cmd/compile/internal/arm64: fuse adjacent spill/reload STR/LDR into STP/LDP by gaul · Pull Request #79689 · golang/go

gaul · 2026-05-27T06:24:23Z

The SSA pair pass (cmd/compile/internal/ssa/pair.go) runs before regalloc
and only sees source-level loads and stores. Spill/reload code that
regalloc inserts later for OpStoreReg/OpLoadReg becomes individual STR/LDR
instructions that never get a chance to be paired, even when two spills
target adjacent 8-byte stack slots.

Fuse those pairs as the final step of code generation, in the compiler
rather than the assembler. A new ssagen.ArchInfo.SSAGenFinish hook runs
after genssa has emitted all of a function's Progs, resolved its branch
and jump-table targets, and finalized the frame size in defframe (so the
register-argument spills defframe inserts participate too); on arm64 it
walks the Prog list and rewrites strictly-adjacent AMOVD spill pairs that
share a base register and have consecutive 8-byte offsets into a single
ASTP/ALDP. The second Prog is reduced to a 0-byte ANOP rather than
unlinked so that branch targets referencing it remain valid. The pass is
skipped under -N to keep unoptimized builds unoptimized. Doing this in
the compiler keeps the assembler a simple translator.
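
As an illustration of the rewrite described above, here is a minimal,
self-contained sketch. It is not the PR's pairSpills: the inst type, the
op constants, canFuse, and pairSpillsSketch are hypothetical stand-ins
for obj.Prog and the real opcodes, and the sketch assumes the second
access always sits 8 bytes above the first. It shows only the shape of
the pass: check a strictly adjacent pair, rewrite the first instruction
into the paired form, and blank the second rather than unlinking it.

```go
package main

import "fmt"

// op is a hypothetical, simplified opcode set. The real pass works on
// obj.Prog values using AMOVD, ASTP, ALDP and ANOP.
type op int

const (
	opStore     op = iota // MOVD Rt, off(base)
	opLoad                // MOVD off(base), Rt
	opStorePair           // STP (Rt, Rt2), off(base)
	opLoadPair            // LDP off(base), (Rt, Rt2)
	opNop                 // zero-size placeholder
)

// inst is a hypothetical stand-in for obj.Prog.
type inst struct {
	op       op
	rt, rt2  int   // data registers; rt2 is used only by pairs
	base     int   // base register
	off      int64 // byte offset from base
	isTarget bool  // some branch or jump table points here
}

// canFuse reports whether a and b, in program order, may become one pair.
func canFuse(a, b *inst) bool {
	if a.op != b.op || (a.op != opStore && a.op != opLoad) {
		return false
	}
	if a.base != b.base || b.off != a.off+8 {
		return false
	}
	if b.isTarget {
		return false // a jump to b would skip the fused access
	}
	if a.op == opLoad && (a.rt == b.rt || a.rt == a.base) {
		return false // LDP needs distinct destinations and an intact base
	}
	return true
}

// pairSpillsSketch fuses strictly adjacent pairs in place.
func pairSpillsSketch(prog []inst) {
	for i := 0; i+1 < len(prog); i++ {
		a, b := &prog[i], &prog[i+1]
		if !canFuse(a, b) {
			continue
		}
		if a.op == opStore {
			a.op = opStorePair
		} else {
			a.op = opLoadPair
		}
		a.rt2 = b.rt
		*b = inst{op: opNop} // keep the slot so references to it stay valid
		i++                  // step over the placeholder
	}
}

func main() {
	prog := []inst{
		{op: opStore, rt: 0, base: 31, off: 16},
		{op: opStore, rt: 1, base: 31, off: 24},
		{op: opLoad, rt: 2, base: 31, off: 32},
		{op: opLoad, rt: 3, base: 31, off: 40, isTarget: true},
	}
	pairSpillsSketch(prog)
	for _, p := range prog {
		fmt.Printf("%+v\n", p)
	}
}
```

In this example the two stores fuse, while the two loads stay separate
because the second one is marked as a branch target.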

Fusion is gated on several safety and profitability conditions:

- same base register and same Addr.Name (AUTO or PARAM, the only
  classes spill slots use), distinct destination registers (LDP with
  Rt1 == Rt2 is CONSTRAINED UNPREDICTABLE), and no pre/post-index or
  register-offset addressing
- the resolved offset must encode in LDP/STP's signed 7-bit scaled
  immediate, [-512, 504]: the assembler rewrites an AUTO offset to
  off+framesize+8 and a PARAM offset to off+framesize+24 or +32
  depending on frame alignment (off+8 in a frameless leaf, which is
  not decided until assembly, so a frameless PARAM must fit both).
  Checking the resolved value against the final frame size both
  admits deep spill slots in large frames, where spills are most
  common, and refuses fusions that would need an
  assembler-synthesized address (ADD + LDP), which is no smaller
  than the original pair and serializes through REGTMP
- the first load of a pair must not write the base register: executed
  sequentially, the second load computes its address from the
  just-loaded value, while LDP computes both addresses from the
  original base
- the second instruction is not a branch or jump-table target
  (otherwise paths that jump directly to it would skip the work the
  LDP/STP now does at the first instruction's position) and does not
  carry a statement boundary: genssa promotes instructions it reuses
  as inline marks to statements, and inline marks must never become
  zero-sized, while plain statement boundaries must keep their line
  table entries
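
The offset condition can be sketched as follows. This is a hedged
illustration, not code from the PR: fitsPairImm and resolveAuto are
hypothetical helper names, and only the AUTO rewrite (off+framesize+8)
is modeled, since the PARAM case depends on frame alignment and on
whether the function ends up frameless.

```go
package main

import "fmt"

// fitsPairImm reports whether off can be encoded directly in a 64-bit
// LDP/STP: a signed 7-bit immediate scaled by 8, that is, a multiple
// of 8 in [-512, 504].
func fitsPairImm(off int64) bool {
	return off%8 == 0 && off >= -512 && off <= 504
}

// resolveAuto applies the AUTO rewrite quoted in the list above:
// off+framesize+8. Hypothetical helper; PARAM is intentionally omitted.
func resolveAuto(off, framesize int64) int64 {
	return off + framesize + 8
}

func main() {
	const framesize = 1024
	// AUTO offsets are relative to the top of the frame, so a slot deep
	// in a large frame resolves to a small SP-relative offset.
	for _, off := range []int64{-1000, -8} {
		r := resolveAuto(off, framesize)
		fmt.Printf("AUTO %d resolves to %d, encodable: %v\n", off, r, fitsPairImm(r))
	}
}
```

With a 1024-byte frame, the slot at AUTO offset -1000 resolves to 32 and
can be paired, while the slot at -8 resolves to 1024 and would need a
synthesized address, so it is left alone.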

The prologue's register-argument spills around morestack, which the
assembler inserts during preprocess, are already emitted as STP/LDP
pairs (CL 621556).

TestPairSpills in cmd/compile/internal/arm64 drives pairSpills directly
with hand-constructed Prog chains, asserting the fused operands and
covering each fusion path and each gating condition.
test/codegen/memcombine.go pins down the spill/reload pattern that the
SSA pair pass misses but this pass catches.
test/fixedbugs/spillreload_arm64_pair.go exercises the conditional-call-
with-adjacent-reloads pattern from runtime.schedule that miscompiled
before the branch-target check was added. BenchmarkSpillReloadPair in
cmd/compile/internal/test improves from about 1.92 to 1.78 ns/op on an
Apple M4 Max (~7%).

armlint reports that adjacent STR/LDR pairings drop from 4354 to 235 on
gofmt (94.6% reduction) and from 26022 to 772 on cmd/go (97.0% reduction).
The text section shrinks by 16720 bytes (1.40%) on gofmt and 101312
bytes (1.52%) on cmd/go.
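
The reduction percentages follow directly from the armlint counts. A
small sketch that recomputes them (the reduction helper is a
hypothetical name, introduced only for this check):

```go
package main

import "fmt"

// reduction returns the percentage drop from before to after.
func reduction(before, after int) float64 {
	return 100 * float64(before-after) / float64(before)
}

func main() {
	fmt.Printf("gofmt:  %.1f%%\n", reduction(4354, 235))
	fmt.Printf("cmd/go: %.1f%%\n", reduction(26022, 772))
}
```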

gopherbot · 2026-05-27T06:42:42Z

This PR (HEAD: 8da7882) has been imported to Gerrit for code review.

Please visit Gerrit at https://go-review.googlesource.com/c/go/+/783660.

Important tips:

Don't comment on this PR. All discussion takes place in Gerrit.
You need a Gmail or other Google account to log in to Gerrit.
To change your code in response to feedback:
- Push a new commit to the branch used by your GitHub PR.
- A new "patch set" will then appear in Gerrit.
- Respond to each comment by marking as Done in Gerrit if implemented as suggested. You can alternatively write a reply.
- Critical: you must click the blue Reply button near the top to publish your Gerrit responses.
- Multiple commits in the PR will be squashed by GerritBot.
The title and description of the GitHub PR are used to construct the final commit message.
- Edit these as needed via the GitHub web interface (not via Gerrit or git).
- You should word wrap the PR description at ~76 characters unless you need longer lines (e.g., for tables or URLs).
See the Sending a change via GitHub and Reviews sections of the Contribution Guide as well as the FAQ for details.

gopherbot · 2026-05-27T06:57:26Z

Message from Gopher Robot:

Patch Set 1:

(1 comment)

Please don’t reply on this GitHub thread. Visit golang.org/cl/783660.
After addressing review feedback, remember to publish your drafts!

gopherbot · 2026-05-29T17:48:21Z

Message from Keith Randall:

Patch Set 1:

(1 comment)

Please don’t reply on this GitHub thread. Visit golang.org/cl/783660.
After addressing review feedback, remember to publish your drafts!

gopherbot · 2026-06-02T19:07:23Z

Message from Cherry Mui:

Patch Set 1:

(1 comment)

Please don’t reply on this GitHub thread. Visit golang.org/cl/783660.
After addressing review feedback, remember to publish your drafts!

gopherbot · 2026-06-04T16:53:54Z

Message from Andrew Gaul:

Patch Set 1:

(1 comment)

Please don’t reply on this GitHub thread. Visit golang.org/cl/783660.
After addressing review feedback, remember to publish your drafts!

gopherbot · 2026-06-04T16:54:05Z

This PR (HEAD: 8a00d61) has been imported to Gerrit for code review.

Please visit Gerrit at https://go-review.googlesource.com/c/go/+/783660.


gopherbot · 2026-06-09T21:37:28Z

Message from Keith Randall:

Patch Set 2:

(16 comments)

Please don’t reply on this GitHub thread. Visit golang.org/cl/783660.
After addressing review feedback, remember to publish your drafts!


gopherbot · 2026-06-11T19:23:50Z

Message from Andrew Gaul:

Patch Set 2:

(17 comments)

Please don’t reply on this GitHub thread. Visit golang.org/cl/783660.
After addressing review feedback, remember to publish your drafts!

gopherbot · 2026-06-11T19:23:58Z

This PR (HEAD: b153dbd) has been imported to Gerrit for code review.

Please visit Gerrit at https://go-review.googlesource.com/c/go/+/783660.


hbrooks mentioned this pull request May 28, 2026

cmd/internal/obj/arm64: fuse adjacent spill/reload LDR/STR into LDP/STP ellipsis-dev-test/go#9

Open

gaul force-pushed the arm64/ldp-stp branch 2 times, most recently from bcc3282 to 8a00d61 · June 4, 2026 16:43

gaul changed the title ~~cmd/internal/obj/arm64: fuse adjacent spill/reload LDR/STR into LDP/STP~~ cmd/compile/internal/arm64: fuse adjacent spill/reload STR/LDR into STP/LDP Jun 11, 2026

gaul force-pushed the arm64/ldp-stp branch from 8a00d61 to b153dbd · June 11, 2026 19:02

gaul wants to merge 1 commit into golang:master from gaul:arm64/ldp-stp
