Azure: optional ETag concurrency on VMSS capacity updates#9783
Azure: optional ETag concurrency on VMSS capacity updates#9783mboersma wants to merge 5 commits into
Conversation
|
|
||
| // setScaleSet replaces the cached entry for a single VMSS, e.g. after a fresh GET. | ||
| // It copies the map before mutating it so readers that obtained the map via | ||
| // getScaleSets() are not exposed to a concurrent map write. |
There was a problem hiding this comment.
What's the ratio of reads/writes?
If # reads is orders of magnitude more common than writes, returning the internal map might be a good performance optimization - but if they're comparable, returning a clone via getScaleSets() might be advisable to prevent concurrency issues.
There was a problem hiding this comment.
Reads dominate because every reconcile loop reads the scale sets, while writes happen only on a GET refresh or right after a capacity update. So returning the live map and paying the copy cost on the rare write (copy-on-write in setScaleSet) is probably the cheaper tradeoff.
| EnableLabelPredictionsOnTemplate bool `json:"enableLabelPredictionsOnTemplate,omitempty" yaml:"enableLabelPredictionsOnTemplate,omitempty"` | ||
|
|
||
| // EnableVMSSEtag sends the cached VMSS ETag as If-Match on capacity-changing | ||
| // VMSS PUTs so concurrent modifications are rejected with 412 instead of overwritten. |
There was a problem hiding this comment.
This seems a positive with very few downsides, so I'm curious why the default is off - is this just to ensure people opt-in, or is there more to it? Might this default to on in the future?
There was a problem hiding this comment.
I guess I was just playing it safe, since it changes behavior on the capacity-update path. We can make it on by default or just do away with the flag if you think that's preferable.
There was a problem hiding this comment.
I would lean towards on by default with a flag that allows it to be disabled if there's a problem, with the flag being removed in a later release.
There was a problem hiding this comment.
Done, it's enabled by default now.
| // A successful PUT changes the server-side ETag. Adopt the new one returned by | ||
| // the operation so a follow-up PUT before the next cache refresh still carries a | ||
| // valid If-Match rather than overwriting concurrent changes or hitting a 412. | ||
| if scaleSet.manager.config.EnableVMSSEtag && resp.Etag != nil { |
There was a problem hiding this comment.
What should happen if EnableVMSSEtag is true but the returned Etag is nil?
I suspect this is in the category of "things that should (almost) never happen", but if it does, we currently retain the current ETag. Is this the right behaviour?
There was a problem hiding this comment.
Azure should always return an ETag here, so this is the "shouldn't happen" path, but if it does, keeping the previous ETag seems like the safe choice.
The next PUT then sends a stale If-Match and gets a 412, which we already handle by invalidating and re-planning. The alternative (clearing it) would make the next PUT unconditional and risk silently overwriting a concurrent change, which is what this feature is meant to prevent.
|
/label tide/merge-method-squash |
|
/test pull-autoscaling-e2e-gci-gce-ca-test GCE flake as far as I can tell. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: mboersma, theunrepentantgeek The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
/retest |
|
@jackfrancis @tallaxes please take a look when time permits. |
|
/triage accepted |
| klog.Errorf("virtualMachineScaleSetsClient.BeginCreateOrUpdate for scale set %q failed: %+v", scaleSet.Name, err) | ||
| if isPreconditionFailedError(err) { | ||
| klog.V(2).Infof("VMSS %s update rejected by ETag precondition; invalidating cache to re-plan next loop", scaleSet.Name) | ||
| scaleSet.invalidateInstanceCache() |
There was a problem hiding this comment.
I think for safety if we are in this ETAG precondition failure condition we also need to invalidate the size "cache" as well.
// lastSizeRefresh is the time curSize was last refreshed from vmss.Sku.Capacity.
// Together with sizeRefreshPeriod, it is used to determine if it is time to refresh curSize.
Seems like we want to mark the ScaleSet as such: "as of now it is time to refresh curSize". We can either do something like this literally:
scaleSet.lastSizeRefresh = time.Now().Add(-1 * scaleSet.sizeRefreshPeriod)
Or create a new function to do it (we can't invoke invalidateLastSizeRefreshWithLock because we are in the flow of initCreateOrUpdate which already holds the sizeMutex lock.
There was a problem hiding this comment.
Copilot came up w/ this add'l UT that would that an out-of-band VMSS capacity update would slip through (we've invalidated the instance but the VMSS size reflects the pre-invalidated cache result):
func TestScaleSetETagPreconditionFailureInvalidatesTargetSize(t *testing.T) {
ctrl := gomock.NewController(t)
defer ctrl.Finish()
manager := newTestAzureManager(t)
manager.config.EnableVMSSEtag = true
vmssName := "vmss-etag-stale"
orchMode := armcompute.OrchestrationModeUniform
manager.azureCache.setScaleSet(vmssName, &armcompute.VirtualMachineScaleSet{
Name: ptr.To(vmssName),
SKU: &armcompute.SKU{Capacity: ptr.To[int64](3)},
Properties: &armcompute.VirtualMachineScaleSetProperties{
OrchestrationMode: &orchMode,
},
Etag: ptr.To(`W/"old"`),
})
mockDeleteClient := NewMockVMSSDeleteClient(ctrl)
mockDeleteClient.EXPECT().
BeginCreateOrUpdate(gomock.Any(), gomock.Any(), gomock.Any(), gomock.Any(), gomock.Any()).
Return(nil, &azcore.ResponseError{StatusCode: http.StatusPreconditionFailed})
manager.azClient.vmssClientForDelete = mockDeleteClient
scaleSet := newTestScaleSet(manager, vmssName)
scaleSet.sizeRefreshPeriod = manager.azureCache.refreshInterval
err := scaleSet.IncreaseSize(1)
assert.Error(t, err)
manager.azureCache.setScaleSet(vmssName, &armcompute.VirtualMachineScaleSet{
Name: ptr.To(vmssName),
SKU: &armcompute.SKU{Capacity: ptr.To[int64](5)},
Properties: &armcompute.VirtualMachineScaleSetProperties{
OrchestrationMode: &orchMode,
},
Etag: ptr.To(`W/"new"`),
})
target, err := scaleSet.TargetSize()
assert.NoError(t, err)
assert.Equal(t, 5, target)
}| klog.Errorf("virtualMachineScaleSetsClient.BeginCreateOrUpdate for scale set %q failed: %+v", scaleSet.Name, err) | ||
| if isPreconditionFailedError(err) { | ||
| klog.V(2).Infof("VMSS %s update rejected by ETag precondition; invalidating cache to re-plan next loop", scaleSet.Name) | ||
| scaleSet.invalidateInstanceCache() |
There was a problem hiding this comment.
Copilot came up w/ this add'l UT that would that an out-of-band VMSS capacity update would slip through (we've invalidated the instance but the VMSS size reflects the pre-invalidated cache result):
func TestScaleSetETagPreconditionFailureInvalidatesTargetSize(t *testing.T) {
ctrl := gomock.NewController(t)
defer ctrl.Finish()
manager := newTestAzureManager(t)
manager.config.EnableVMSSEtag = true
vmssName := "vmss-etag-stale"
orchMode := armcompute.OrchestrationModeUniform
manager.azureCache.setScaleSet(vmssName, &armcompute.VirtualMachineScaleSet{
Name: ptr.To(vmssName),
SKU: &armcompute.SKU{Capacity: ptr.To[int64](3)},
Properties: &armcompute.VirtualMachineScaleSetProperties{
OrchestrationMode: &orchMode,
},
Etag: ptr.To(`W/"old"`),
})
mockDeleteClient := NewMockVMSSDeleteClient(ctrl)
mockDeleteClient.EXPECT().
BeginCreateOrUpdate(gomock.Any(), gomock.Any(), gomock.Any(), gomock.Any(), gomock.Any()).
Return(nil, &azcore.ResponseError{StatusCode: http.StatusPreconditionFailed})
manager.azClient.vmssClientForDelete = mockDeleteClient
scaleSet := newTestScaleSet(manager, vmssName)
scaleSet.sizeRefreshPeriod = manager.azureCache.refreshInterval
err := scaleSet.IncreaseSize(1)
assert.Error(t, err)
manager.azureCache.setScaleSet(vmssName, &armcompute.VirtualMachineScaleSet{
Name: ptr.To(vmssName),
SKU: &armcompute.SKU{Capacity: ptr.To[int64](5)},
Properties: &armcompute.VirtualMachineScaleSetProperties{
OrchestrationMode: &orchMode,
},
Etag: ptr.To(`W/"new"`),
})
target, err := scaleSet.TargetSize()
assert.NoError(t, err)
assert.Equal(t, 5, target)
}
What type of PR is this?
/kind feature
/area provider/azure
What this PR does / why we need it:
Adds optional ETag-based optimistic concurrency to the Azure VMSS capacity update path. When enabled, the cached VMSS ETag is sent as
If-Matchon capacity PUTs, so a concurrent modification is rejected with HTTP 412 instead of silently overwritten. On 412 the autoscaler refreshes its caches and re-plans on the next loop. On by default (AZURE_ENABLE_VMSS_ETAG/enableVMSSEtag).Which issue(s) this PR fixes:
Fixes #2581
Special notes for your reviewer:
Does this PR introduce a user-facing change?