Azure: optional ETag concurrency on VMSS capacity updates by mboersma · Pull Request #9783 · kubernetes/autoscaler

mboersma · 2026-06-09T20:41:30Z

What type of PR is this?

/kind feature
/area provider/azure

What this PR does / why we need it:

Adds optional ETag-based optimistic concurrency to the Azure VMSS capacity update path. When enabled, the cached VMSS ETag is sent as If-Match on capacity PUTs, so a concurrent modification is rejected with HTTP 412 instead of silently overwritten. On 412 the autoscaler refreshes its caches and re-plans on the next loop. On by default (AZURE_ENABLE_VMSS_ETAG / enableVMSSEtag).

Which issue(s) this PR fixes:

Fixes #2581

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Azure: optional ETag optimistic concurrency on VMSS capacity updates

theunrepentantgeek · 2026-06-09T21:55:30Z


+// setScaleSet replaces the cached entry for a single VMSS, e.g. after a fresh GET.
+// It copies the map before mutating it so readers that obtained the map via
+// getScaleSets() are not exposed to a concurrent map write.


What's the ratio of reads/writes?

If # reads is orders of magnitude more common than writes, returning the internal map might be a good performance optimization - but if they're comparable, returning a clone via getScaleSets() might be advisable to prevent concurrency issues.

Reads dominate because every reconcile loop reads the scale sets, while writes happen only on a GET refresh or right after a capacity update. So returning the live map and paying the copy cost on the rare write (copy-on-write in setScaleSet) is probably the cheaper tradeoff.

Makes sense.

theunrepentantgeek · 2026-06-09T21:56:41Z

 	EnableLabelPredictionsOnTemplate bool `json:"enableLabelPredictionsOnTemplate,omitempty" yaml:"enableLabelPredictionsOnTemplate,omitempty"`
+
+	// EnableVMSSEtag sends the cached VMSS ETag as If-Match on capacity-changing
+	// VMSS PUTs so concurrent modifications are rejected with 412 instead of overwritten.


This seems a positive with very few downsides, so I'm curious why the default is off - is this just to ensure people opt-in, or is there more to it? Might this default to on in the future?

I guess I was just playing it safe, since it changes behavior on the capacity-update path. We can make it on by default or just do away with the flag if you think that's preferable.

I would lean towards on by default with a flag that allows it to be disabled if there's a problem, with the flag being removed in a later release.

Done, it's enabled by default now.

theunrepentantgeek · 2026-06-09T22:07:32Z

+	// A successful PUT changes the server-side ETag. Adopt the new one returned by
+	// the operation so a follow-up PUT before the next cache refresh still carries a
+	// valid If-Match rather than overwriting concurrent changes or hitting a 412.
+	if scaleSet.manager.config.EnableVMSSEtag && resp.Etag != nil {


What should happen if EnableVMSSEtag is true but the returned Etag is nil?

I suspect this is in the category of "things that should (almost) never happen", but if it does, we currently retain the current ETag. Is this the right behaviour?

Azure should always return an ETag here, so this is the "shouldn't happen" path, but if it does, keeping the previous ETag seems like the safe choice.

The next PUT then sends a stale If-Match and gets a 412, which we already handle by invalidating and re-planning. The alternative (clearing it) would make the next PUT unconditional and risk silently overwriting a concurrent change, which is what this feature is meant to prevent.

Sounds good.

mboersma · 2026-06-10T17:57:46Z

/label tide/merge-method-squash

mboersma · 2026-06-10T21:08:20Z

/test pull-autoscaling-e2e-gci-gce-ca-test

GCE flake as far as I can tell.

k8s-ci-robot · 2026-06-10T21:21:13Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mboersma, theunrepentantgeek
Once this PR has been reviewed and has the lgtm label, please assign rakechill for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

cluster-autoscaler/cloudprovider/azure/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

mboersma · 2026-06-11T00:34:26Z

/retest

mboersma · 2026-06-11T14:56:13Z

@jackfrancis @tallaxes please take a look when time permits.

jackfrancis · 2026-06-12T21:09:53Z

/triage accepted

jackfrancis · 2026-06-12T23:18:22Z

 		klog.Errorf("virtualMachineScaleSetsClient.BeginCreateOrUpdate for scale set %q failed: %+v", scaleSet.Name, err)
+		if isPreconditionFailedError(err) {
+			klog.V(2).Infof("VMSS %s update rejected by ETag precondition; invalidating cache to re-plan next loop", scaleSet.Name)
+			scaleSet.invalidateInstanceCache()


I think for safety if we are in this ETAG precondition failure condition we also need to invalidate the size "cache" as well.

// lastSizeRefresh is the time curSize was last refreshed from vmss.Sku.Capacity. // Together with sizeRefreshPeriod, it is used to determine if it is time to refresh curSize.

Seems like we want to mark the ScaleSet as such: "as of now it is time to refresh curSize". We can either do something like this literally:

scaleSet.lastSizeRefresh = time.Now().Add(-1 * scaleSet.sizeRefreshPeriod)

Or create a new function to do it (we can't invoke invalidateLastSizeRefreshWithLock because we are in the flow of initCreateOrUpdate which already holds the sizeMutex lock.

Copilot came up w/ this add'l UT that would that an out-of-band VMSS capacity update would slip through (we've invalidated the instance but the VMSS size reflects the pre-invalidated cache result):

func TestScaleSetETagPreconditionFailureInvalidatesTargetSize(t *testing.T) { ctrl := gomock.NewController(t) defer ctrl.Finish() manager := newTestAzureManager(t) manager.config.EnableVMSSEtag = true vmssName := "vmss-etag-stale" orchMode := armcompute.OrchestrationModeUniform manager.azureCache.setScaleSet(vmssName, &armcompute.VirtualMachineScaleSet{ Name: ptr.To(vmssName), SKU: &armcompute.SKU{Capacity: ptr.To[int64](3)}, Properties: &armcompute.VirtualMachineScaleSetProperties{ OrchestrationMode: &orchMode, }, Etag: ptr.To(`W/"old"`), }) mockDeleteClient := NewMockVMSSDeleteClient(ctrl) mockDeleteClient.EXPECT(). BeginCreateOrUpdate(gomock.Any(), gomock.Any(), gomock.Any(), gomock.Any(), gomock.Any()). Return(nil, &azcore.ResponseError{StatusCode: http.StatusPreconditionFailed}) manager.azClient.vmssClientForDelete = mockDeleteClient scaleSet := newTestScaleSet(manager, vmssName) scaleSet.sizeRefreshPeriod = manager.azureCache.refreshInterval err := scaleSet.IncreaseSize(1) assert.Error(t, err) manager.azureCache.setScaleSet(vmssName, &armcompute.VirtualMachineScaleSet{ Name: ptr.To(vmssName), SKU: &armcompute.SKU{Capacity: ptr.To[int64](5)}, Properties: &armcompute.VirtualMachineScaleSetProperties{ OrchestrationMode: &orchMode, }, Etag: ptr.To(`W/"new"`), }) target, err := scaleSet.TargetSize() assert.NoError(t, err) assert.Equal(t, 5, target) }

jackfrancis · 2026-06-12T23:20:04Z

 		klog.Errorf("virtualMachineScaleSetsClient.BeginCreateOrUpdate for scale set %q failed: %+v", scaleSet.Name, err)
+		if isPreconditionFailedError(err) {
+			klog.V(2).Infof("VMSS %s update rejected by ETag precondition; invalidating cache to re-plan next loop", scaleSet.Name)
+			scaleSet.invalidateInstanceCache()


Copilot came up w/ this add'l UT that would that an out-of-band VMSS capacity update would slip through (we've invalidated the instance but the VMSS size reflects the pre-invalidated cache result):

func TestScaleSetETagPreconditionFailureInvalidatesTargetSize(t *testing.T) { ctrl := gomock.NewController(t) defer ctrl.Finish() manager := newTestAzureManager(t) manager.config.EnableVMSSEtag = true vmssName := "vmss-etag-stale" orchMode := armcompute.OrchestrationModeUniform manager.azureCache.setScaleSet(vmssName, &armcompute.VirtualMachineScaleSet{ Name: ptr.To(vmssName), SKU: &armcompute.SKU{Capacity: ptr.To[int64](3)}, Properties: &armcompute.VirtualMachineScaleSetProperties{ OrchestrationMode: &orchMode, }, Etag: ptr.To(`W/"old"`), }) mockDeleteClient := NewMockVMSSDeleteClient(ctrl) mockDeleteClient.EXPECT(). BeginCreateOrUpdate(gomock.Any(), gomock.Any(), gomock.Any(), gomock.Any(), gomock.Any()). Return(nil, &azcore.ResponseError{StatusCode: http.StatusPreconditionFailed}) manager.azClient.vmssClientForDelete = mockDeleteClient scaleSet := newTestScaleSet(manager, vmssName) scaleSet.sizeRefreshPeriod = manager.azureCache.refreshInterval err := scaleSet.IncreaseSize(1) assert.Error(t, err) manager.azureCache.setScaleSet(vmssName, &armcompute.VirtualMachineScaleSet{ Name: ptr.To(vmssName), SKU: &armcompute.SKU{Capacity: ptr.To[int64](5)}, Properties: &armcompute.VirtualMachineScaleSetProperties{ OrchestrationMode: &orchMode, }, Etag: ptr.To(`W/"new"`), }) target, err := scaleSet.TargetSize() assert.NoError(t, err) assert.Equal(t, 5, target) }

Azure: optional ETag concurrency on VMSS capacity updates

9c379d6

k8s-ci-robot requested review from jackfrancis and tallaxes June 9, 2026 20:41

k8s-ci-robot added area/cluster-autoscaler Issues or PRs related to the Cluster Autoscaler component cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 9, 2026

theunrepentantgeek approved these changes Jun 9, 2026

View reviewed changes

mboersma added 3 commits June 9, 2026 18:01

Azure: address review feedback on VMSS ETag

ef4fdc4

Azure: extract isPreconditionFailedError helper

d5478a7

Azure: drop year from VMSS ETag test boilerplate

fe6eb5a

k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Jun 10, 2026

theunrepentantgeek approved these changes Jun 10, 2026

View reviewed changes

Azure: enable VMSS ETag concurrency by default

6cdfe20

k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 12, 2026

jackfrancis reviewed Jun 12, 2026

View reviewed changes

jackfrancis requested changes Jun 12, 2026

View reviewed changes

Conversation

mboersma commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mboersma commented Jun 10, 2026

Uh oh!

mboersma commented Jun 10, 2026

Uh oh!

k8s-ci-robot commented Jun 10, 2026

Uh oh!

mboersma commented Jun 11, 2026

Uh oh!

mboersma commented Jun 11, 2026

Uh oh!

jackfrancis commented Jun 12, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

mboersma commented Jun 9, 2026 •

edited

Loading