Site Reliability Engineering (SRE) and Operations teams responsible for operating virtual machines (VMs) are always looking for ways to provide a more stable, more scalable environment for their development partners. Part of providing that stable experience is having telemetry data (metrics, logs and traces) from systems and applications so you can monitor and troubleshoot effectively. Many Google Cloud services, including VMs, provide basic system metrics out of the box, without the need to install an agent. However, if you want in-depth metrics about your VMs or application telemetry, installing an agent is necessary.
Agent installation options for Google Cloud VMs
Choosing the right solution for installing agents on your VMs can save you a lot of time and effort. Google Cloud’s operations suite offers options ranging from installing on one VM at a time all the way to programmatic, fleet-wide installations. We know you’re overloaded with tools, so the options below use Google Cloud tooling and third-party tools that are likely already in use in your organization.
Before you begin installing agents, determine which Google Cloud agent fits your needs. The Ops Agent is a single agent for both logs and metrics, targeted toward specialized, high-throughput logging workloads. Compared with the standard logging-only agent, it lets you capture more data and avoid out-of-memory errors. As of today, the Ops Agent is in preview, so be sure to confirm which agent will work best for your environment. If the Ops Agent doesn’t meet your needs, use the standard Logging and Monitoring agents.
Single VM via the VM Instances dashboard
If you have only a handful of VMs that need monitoring and logging, and you have determined that the standard Cloud Monitoring and Logging agents are your best options, you can use the VM Instances dashboard in Cloud Monitoring to begin the installation process. This dashboard lists all VMs in your workspace and shows whether agents are installed on each VM. If agents are not installed, you can use the “Install Agent” walkthrough to complete a simple installation flow. If agents are installed but out of date, you can click “Learn more” and follow the linked instructions to upgrade the agent.
Single VM via Google Compute Engine in-context
From the VM Instances page in Compute Engine, you can see important monitoring information about each of your VMs without having to navigate to Cloud Monitoring, and you can also install the Monitoring agent.
Multi-VM with GCP Tooling (Agent Policies)
If you are responsible for operating a fleet of hundreds or thousands of VMs, walking through a UI-based prompt for each machine does not scale. For those who prefer not to use third-party configuration management or provisioning tools such as Ansible or Terraform, we provide a built-in option, called Agent Policies, to automate the installation and management of your agents. Agent Policies is currently in preview.
With one command, you can create a policy that governs new and existing VMs to ensure proper installation and optional auto-upgrade of the Ops Agent, the standard Logging agent, or the standard Monitoring agent on VMs that meet your specified criteria.
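To make the "one command" idea concrete, here is a minimal Python sketch that assembles a policy-creation command without running it. The command path and flag names (`--agent-rules`, `--os-types`, `--group-labels`) are written as we recall them from the beta gcloud reference and should be treated as assumptions; confirm them with `gcloud beta compute instances ops-agents policies create --help` before use. The policy name, OS, label, and project ID are hypothetical placeholders.

```python
import shlex


def build_policy_command(policy_id, agent_types, os_short_name, os_version,
                         group_labels, project):
    """Return the gcloud argument list for creating an agent policy.

    Nothing is executed here; the caller decides whether to run the command.
    Flag names are assumptions based on the beta gcloud reference.
    """
    # One rule per agent type, joined with ';' inside a single flag value.
    rules = ";".join(
        f"type={agent},version=current-major,"
        "package-state=installed,enable-autoupgrade=true"
        for agent in agent_types
    )
    # Labels that a VM must carry to be covered by the policy.
    labels = ",".join(f"{key}={value}" for key, value in sorted(group_labels.items()))
    return [
        "gcloud", "beta", "compute", "instances", "ops-agents", "policies",
        "create", policy_id,
        f"--agent-rules={rules}",
        f"--os-types=short-name={os_short_name},version={os_version}",
        f"--group-labels={labels}",
        f"--project={project}",
    ]


# Hypothetical values for illustration only.
cmd = build_policy_command(
    policy_id="ops-agents-example-policy",
    agent_types=["logging", "metrics"],
    os_short_name="debian",
    os_version="10",
    group_labels={"env": "prod"},
    project="my-project-id",
)

# Print a shell-safe version of the command for review before running it.
print(shlex.join(cmd))
```

Building the command as a list and printing it first keeps the sketch side-effect free: you can review the policy criteria (OS type and labels) before handing the list to `subprocess.run` or pasting the printed line into a terminal.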
Multi-VM with Ansible and Terraform
Administrators, SREs, and IT managers spend enough time learning new tools. So if your organization already uses the open source tool Ansible for configuration management and automation, we want to make sure you can use it to install the agents for Cloud Logging and Cloud Monitoring.
Using the Ansible Role, you can install and configure the agent(s) across your fleet of Linux and Windows VMs. For more information, refer to the Ansible Role for Cloud Ops documentation.
Integrations with other popular configuration management tools, such as Chef and Puppet, are planned for the middle of this year.
If you are already using Terraform, the open source infrastructure-as-code provisioning tool, you can use the Terraform module to install and configure our agents on your VMs. For more information, refer to the Terraform Agent Policy documentation.
Get started today
Whether you are managing a handful of VMs or an entire fleet, having robust observability data from your systems and applications is key to effective monitoring and troubleshooting. With the VM Instances dashboard in Cloud Monitoring, Agent Policies, and open source tooling such as Ansible and Terraform, you have many options for installing agents on your Google Cloud VMs. While Google Cloud’s operations suite services like Cloud Logging and Cloud Monitoring make some VM metrics available out of the box, installing the Ops Agent or the Cloud Monitoring and Cloud Logging agents lets you gather the data that will help you operate your infrastructure and applications at optimal levels.
By: Rahul Harpalani (Product Manager) and John Day (Product Marketing Manager)
Source: Google Cloud Blog