Move Tables: throttle based on target credentials#1709
Conversation
There was a problem hiding this comment.
Pull request overview
This PR aims to make move-tables mode throttle against the target cluster by (a) using target credentials when checking control replicas for lag and (b) reading status variables from the appropriate DB connection in move-tables mode.
Changes:
- Add a helper in the throttler to choose control-replica connection credentials based on whether move-tables mode is active.
- Update
Applier.ShowStatusVariableto query status variables from the move-tables target DB connection in move-tables mode. - Add/extend unit tests for the above behavior and clarify CLI help for
--throttle-control-replicasin move-tables mode.
Show a summary per file
| File | Description |
|---|---|
| go/logic/throttler.go | Chooses control-replica credentials from the move-tables target config when in move-tables mode. |
| go/logic/throttler_test.go | Adds a focused test covering the connection-config selection behavior. |
| go/logic/applier.go | Switches status-variable reads to use the move-tables target DB handle when in move-tables mode. |
| go/logic/applier_test.go | Adds tests validating status-variable reads in normal and move-tables modes. |
| go/cmd/gh-ost/main.go | Updates CLI help text for --throttle-control-replicas regarding move-tables behavior. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 5/5 changed files
- Comments generated: 3
…g for move-tables mode
| if !mgtr.migrationContext.IsMoveTablesMode() { | ||
| <-mgtr.firstThrottlingCollected // replication lag | ||
| } | ||
| <-mgtr.firstThrottlingCollected // HTTP status | ||
| <-mgtr.firstThrottlingCollected // other, general metrics |
There was a problem hiding this comment.
Can you explain why only this first firstThrottlingCollected is gated behind move tables mode? Can't really tell just looking at this diff
There was a problem hiding this comment.
Oh I see down below we aren't emitting collectReplicationLag when in move tables. What's the reason for that?
There was a problem hiding this comment.
oh okay, reading into the code more is this because collectReplicationLag relies on the heartbeat table in some way? And we don't do the heartbeat in move tables
There was a problem hiding this comment.
Yes, some of the collect...Lag goroutines in throttler.go get that firstThrottlingCollected channel handed over and write to it as soon as they're set up.
As we're skipping the collectReplicationLag goroutine for move-tables, we also must not await the channel, here. Otherwise it would block forever at the initializeThrottler func.
| if !thlr.migrationContext.IsMoveTablesMode() { | ||
| go thlr.collectReplicationLag(firstThrottlingCollected) | ||
| } | ||
| go thlr.collectControlReplicasLag() |
There was a problem hiding this comment.
Do we need to gate this one behind move tables mode? Quick glance at the code of collectControlReplicasLag, it looks like it relies on the changelog table which we also don't have in move-tables mode right?
There was a problem hiding this comment.
yes, the collectReplicationLag is disabled for move-tables mode.
In that function, gh-ost typically collects the replication lag from the replica it is attached to (reading the binlog data).
So for migrations it then knows the "impact" of the copy-writes it did on the corresponding primary - and if these are causing too much stress on the replica it is attached to. It then begins to throttle automatically.
For move-tables we can't use that mechanism, as we're only connected to the target primary and not reading the binlog on the target side.
sorry misread your question. I'll look into that.
A Pull Request should be associated with an Issue.
Related issue (internal): https://github.com/github/database-infrastructure/issues/8212
Related issue (public): #1681
Description
This PR adapts the throttler to use the target credentials in move-tables mode.
Status variables are fetched from the target server and control replicas are expected to be in the target cluster.
script/cibuildreturns with no formatting errors, build errors or unit test errors.