Scale in at your own pace with Compute Engine autoscaler controls

In a Compute Engine environment, managed instance groups (MIGs) offer autoscaling that automatically changes an instance group’s capacity based on current load, so you can rightsize your environment and your costs. Autoscaling adds virtual machines (VMs) to a MIG when load increases (scaling out) and deletes VMs when load decreases (scaling in).

When load declines, the autoscaler removes all unused capacity. This saves costs but can cause the autoscaler to scale in abruptly. For example, if the load drops by 50%, the autoscaler removes about 50% of your VMs as soon as the 10-minute stabilization period ends. Deleting VMs this abruptly might not work well for some workloads. For example, if your VMs take many minutes to initialize, you might want to slow down the rate at which you scale in, so that you keep capacity available for imminent load spikes.
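To make the stabilization behavior concrete, here is a minimal Python sketch. It is not the autoscaler's actual implementation. It assumes one size recommendation per minute and a hypothetical capacity of 10 load units per VM; the names `vms_needed` and `stabilized_size` are illustrative only.

```python
import math

STABILIZATION_MINUTES = 10  # stabilization period described in the article


def vms_needed(load, capacity_per_vm):
    """VMs required to serve `load` (capacity_per_vm is a hypothetical figure)."""
    return math.ceil(load / capacity_per_vm)


def stabilized_size(recommended_per_minute, window=STABILIZATION_MINUTES):
    """Peak of the most recent `window` per-minute recommendations."""
    return max(recommended_per_minute[-window:])


# Hypothetical trace: 5 minutes at full load, then the load halves.
loads = [1000] * 5 + [500] * 15
recommended = [vms_needed(load, 10) for load in loads]

# Group size at each minute, using only the history available at that minute.
sizes = [stabilized_size(recommended[: t + 1]) for t in range(len(recommended))]
print(sizes)
```

In this trace the group stays at 100 VMs while any full-load sample is still inside the 10-minute window, then drops straight to 50 in a single step, which is the abrupt scale-in the paragraph above describes.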

Introducing scale-in controls

New scale-in controls in Compute Engine let you limit the VM deletion rate: they prevent the autoscaler from reducing a MIG’s size by more VM instances than your workload can afford to lose.

When you configure autoscaler scale-in controls, you control the speed at which you scale in. The autoscaler never scales in faster than your configured rate, as shown in this diagram:

Google Cloud - Scale in Controls
  1. When load declines, the autoscaler maintains the size for the group at a level required to serve the peak load observed in the last 10 minutes (the stabilization period). This works the same with and without scale-in controls.
  2. An autoscaler without scale-in controls keeps only as many instances as are required to handle the recently observed load. After the stabilization period, the autoscaler removes all unneeded instances in one step. A sudden drop in load can lead to a dramatic reduction of instance group size.
  3. An autoscaler with scale-in controls limits how many VM instances can be removed in a given period of time (here 10 VMs in 20 minutes). This slows down the instance reduction rate.
  4. When the load spikes again, the autoscaler adds new instances. However, because VMs take a long time to initialize, the new VMs aren’t ready to serve the load. With scale-in controls, the previous capacity hasn’t been deleted yet, so existing VMs can serve the spike.
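The rate limit in step 3 can be sketched in Python as follows. This is a simplified model under stated assumptions, not the autoscaler's real algorithm: it samples once per minute, treats the input as the already-stabilized desired size, and never lets the group fall below the peak size seen in the trailing window minus the allowed reduction. The function names `limit_scale_in` and `simulate` are hypothetical.

```python
def limit_scale_in(size_history, desired, max_scaled_in, window):
    """Return the size to use now.

    The group may not drop below (peak size in the trailing window) minus
    max_scaled_in. Scaling out is never restricted.
    """
    peak = max(size_history[-window:])
    floor = peak - max_scaled_in
    return max(desired, floor)


def simulate(desired_per_minute, max_scaled_in=10, window_minutes=20):
    """Apply the scale-in limit to a per-minute series of desired sizes."""
    sizes = []
    for desired in desired_per_minute:
        if not sizes:
            sizes.append(desired)  # first sample: no history to limit against
        else:
            sizes.append(
                limit_scale_in(sizes, desired, max_scaled_in, window_minutes)
            )
    return sizes


# Hypothetical trace: 100 VMs wanted for 5 minutes, then only 50.
desired = [100] * 5 + [50] * 60
sizes = simulate(desired)  # 10 VMs per 20 minutes, as in the diagram
print(sizes[4], sizes[5], sizes[25], sizes[45])
```

With these settings the group steps down by 10 VMs every 20 minutes instead of losing 50 VMs at once, so most of the earlier capacity is still running if the load returns within the hour.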

Getting started

You can set up scale-in controls in the Google Cloud Console. Select an autoscaled MIG from the instance groups page and click Edit group. Under the Autoscaling section, set your scale-in controls:

Google Cloud - Console Scale in Controls

You can also configure scale-in controls programmatically. Here’s the same configuration using the gcloud CLI:

$ gcloud compute instance-groups managed update-autoscaling autoscaled-group \
--scale-in-control max-scaled-in-replicas=10,time-window=1200

For more details, including configuration through the API, refer to the documentation.
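As a rough illustration of the API route, the Python sketch below builds a request body equivalent to the gcloud flags above. The field names (`autoscalingPolicy`, `scaleInControl`, `maxScaledInReplicas.fixed`, `timeWindowSec`) reflect my understanding of the Compute Engine autoscaler resource and should be treated as assumptions to check against the documentation. The helper `scale_in_control_body` is hypothetical, and the sketch only prints the JSON; it does not call any API.

```python
import json


def scale_in_control_body(max_replicas, time_window_sec):
    """Build an autoscaler patch body (field names assumed, verify in the docs)."""
    return {
        "autoscalingPolicy": {
            "scaleInControl": {
                # Remove at most this many VMs...
                "maxScaledInReplicas": {"fixed": max_replicas},
                # ...within this trailing window, in seconds.
                "timeWindowSec": time_window_sec,
            }
        }
    }


# Same values as the gcloud example: 10 VMs per 1200 seconds (20 minutes).
body = scale_in_control_body(10, 1200)
print(json.dumps(body, indent=2))
```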

Try scale-in controls today

Scale-in controls are generally available across all regions. For more information on how you can control the scale-in rate, check out the docs.

Pawel Wenda
Product Manager

