Skip to content

net/http/internal/http2: enable SetReuseFrames in server and Transport#79634

Open
ChrisLundquist wants to merge 7 commits into
golang:masterfrom
ChrisLundquist:h2-frame-cache
Open

net/http/internal/http2: enable SetReuseFrames in server and Transport#79634
ChrisLundquist wants to merge 7 commits into
golang:masterfrom
ChrisLundquist:h2-frame-cache

Conversation

@ChrisLundquist

@ChrisLundquist ChrisLundquist commented May 23, 2026

Copy link
Copy Markdown

This stack extends Framer.SetReuseFrames to cover *WindowUpdateFrame,
*HeadersFrame, and *MetaHeadersFrame (in addition to the existing
*DataFrame), reuses the MetaHeadersFrame Fields backing array across
parses, then calls SetReuseFrames once on the per-connection Framer
in serverConn and Transport.newClientConn. Result: -31% geomean
allocations per op on the package's read-path benchmarks (about -50%
on downloads, -22% on header-bound requests), with no measurable
change on write-path benchmarks and no API change.

A new GODEBUG=http2reuseframes=0 setting is registered alongside
http2client / http2server / http2debug and reverts to the pre-CL
allocate-per-frame behavior.

Background

Framer.SetReuseFrames has shipped as an opt-in Framer method since
golang/net CL 34812 (commit bb807669, 2017-02-27, fixing #18502). It
was added for gRPC-Go, whose Framer profiling showed parseDataFrame
accounting for ~66% of allocations (1.4 GB of 2.2 GB) on a 64-conn /
100-stream / 900K RT/sec benchmark. gRPC-Go calls SetReuseFrames
directly on the Framer it owns.

No in-tree caller in the standard library has ever opted in, so the
cache has been effectively dead code from the stdlib's perspective
for nine years. This CL turns it on.

Prior attempts and reviewer concerns

This is the third attempt to enable frame reuse on the stdlib's
HTTP/2 server/Transport:

  • Issue x/net/http2: improve frame decoding performance by reusing frames #68352 (still open, 2024-07-09): a request to improve
    frame decoding performance. neild rejected a user-facing config
    option ("There's no need for a user-configurable option here.
    If there are performance improvements that can be made to frame
    decoding in the HTTP/2 implementation, we can just make them.")
    but left the door open for internal enablement.

  • golang/net CL 597455 (PR x/http/net: enable SetFrameReuse in server and client connections net#216, 2024-07-10): tried to
    enable on both server and Transport unconditionally. neild's
    blocking review: "this is not correct. The server read goroutine
    (serverConn.readFrames) reads frames and delivers them to the
    main server goroutine for processing," and "Changing the HTTP/2
    server and/or transport to use the frame cache is likely to
    require some significant refactoring." Stale; never landed.

  • golang/net CL 676235 (server-only, 2025-05-25): neild Hold+1 on
    2025-06-24: "this is not a safe change for the server.
    SetReuseFrames causes each frame to be invalidated by the next
    call to ReadFrame, but the server processes frames concurrently
    with ReadFrame calls." Abandoned 2025-10-24.

The blocker on both prior CLs was correctness, specifically retention
of frame data past the next ReadFrame. This CL addresses that with a
static retention audit, an ownership guard that turns retention bugs
into deterministic panics, race-detector tests that exercise the
gated handoff, an end-to-end stress test under -race, and a GODEBUG
escape hatch in case the analysis missed something.

Stack layout

Seven commits, each independently reviewable:

  1. reuse *WindowUpdateFrame across ReadFrame calls
  2. reuse *HeadersFrame across ReadFrame calls
  3. reuse *MetaHeadersFrame across ReadFrame calls, with an
    ownership guard on its accessor methods
  4. reuse the MetaHeadersFrame Fields slice across parses
  5. add microbenchmarks for the SetReuseFrames path
  6. add race-detector tests (positive control, adversarial negative
    control)
  7. enable SetReuseFrames in server and Transport, register
    http2reuseframes GODEBUG, add end-to-end stress test

Commits 1-3 mirror the existing *DataFrame cache pattern from CL
34812: wholesale "*p = T{...}" reset at the top of each parse
function, slot in frameCache, nil-safe getFrame getter. No
behavior change without SetReuseFrames; the caches are dormant unless
opted in.

Commit 3 also gives MetaHeadersFrame the equivalent of DataFrame's
checkValid protection: an owned bit set when readMetaFrame returns
the frame and revoked by the next ReadFrame (the frame becomes the
Framer's lastFrame and overrides invalidate). PseudoValue,
RegularFields, PseudoFields, and rfc9218Priority panic on a stale
frame, so a retention bug fails deterministically instead of
silently reading another stream's headers.

Commit 4 is where most of the HEADERS-path win comes from: the
Fields backing array stays in the cached frame and is cleared and
re-filled by the next meta parse. Two safeguards: the array is
cleared before reuse so header values that can hold secrets (Cookie,
Authorization, HPACK Sensitive fields) are released as soon as the
contract permits, and retained capacity is capped at 64 entries so a
single large HEADERS frame cannot permanently inflate per-connection
memory.

Safety analysis

Static retention audit across every code path that consumes a
reusable *Frame, plus a separate concurrency audit. Per frame type:

*DataFrame
f.Data() is copied into dataBuffer.Write
(databuffer.go:126 copy(chunk[b.w:], p)) before readMore. The
pool-allocated chunk has independent backing.

*WindowUpdateFrame
Struct has only scalar fields. Safe by construction.

*HeadersFrame
Both server and Transport set Framer.ReadMetaHeaders during init,
so a bare *HeadersFrame never reaches consumer code.
readMetaFrame clears headerFragBuf=nil and invalidates the
embedded *HeadersFrame before returning.

*MetaHeadersFrame
The Fields backing array is reused across parses, so consumers
must not retain the slice -- and cannot do so unnoticed: the
accessor methods panic once the next ReadFrame has been called,
and the adversarial race test verifies -race observes
retained-slice reads. hpack.decodeString returns string(u.b) or
buf.String(), both of which copy, so Name/Value strings are
independently allocated and alias neither readBuf nor the Fields
array. Server's processHeaders / newWriterAndRequest and
Transport's handleResponse / processTrailers iterate Fields
synchronously and copy into fresh http.Header maps before
returning. No code stores *MetaHeadersFrame past dispatch.

Concurrency:

Server: serverConn.readFrames sends each parsed frame on
readFrameCh then blocks on gate. The serve goroutine receives,
calls processFrameFromReader synchronously, then res.readMore(),
which releases the gate. The single readMore call site fires
only after the synchronous dispatch returns, so the cache is
never overwritten while consumer code still references it.

Transport: clientConnReadLoop.run reads frames in a tight loop
with no per-frame gate, so each process* function must fully
consume aliased data before returning. Verified: every byte that
escapes the loop is either copied (DataFrame -> bufPipe) or
composed of immutable, independently-allocated Go strings
(Header maps populated from hpack-decoded strings).

Spawned goroutines: runHandler (server) only sees the fresh
*ServerRequest whose Header/Trailer/URL are independently
allocated. writeFrameAsync, body-close, Shutdown, ping writers
don't touch read-side cached frames.

Two retention sites intentionally remain outside the reuse contract
and are documented in place rather than silently exempt:
processPing queues writePingAck{f}, which the write scheduler may
serve after the next ReadFrame, and processGoAway stores f in
cc.goAway, whose scalar fields are read at read-loop teardown. Both
are safe solely because PingFrame and GoAwayFrame are not part of
frameCache; the comments require copying the needed fields before
either type is ever added to the cache.

Benchmarks

End-to-end, -count=5 -benchmem, linux/amd64 i7-6770HQ @ 2.60GHz,
reuse on vs GODEBUG=http2reuseframes=0 on this tree (the GODEBUG-off
path is the pre-CL allocation behavior), via benchstat.

Allocations per op (the main win):

Benchmark                       off      on       delta     p
ClientGzip                         174       64  -63.22% 0.008
DownloadFrameSize/16k           368.6k   196.6k  -46.65% 0.008
DownloadFrameSize/64k           96.23k   49.16k  -48.91% 0.008
DownloadFrameSize/128k          48.78k   24.55k  -49.66% 0.008
DownloadFrameSize/256k          24.39k   12.27k  -49.70% 0.008
DownloadFrameSize/512k          12,124    6,288  -48.14% 0.008
ClientRequestHeaders/0              50       39  -22.00% 0.008
ClientRequestHeaders/10             74       62  -16.22% 0.008
ClientResponseHeaders/0             50       39  -22.00% 0.008
ClientResponseHeaders/10           115      102  -11.30% 0.008
geomean (allocs/op)                            -30.83%

Write-side benchmarks (WriteScheduler*, WriteQueue) are unchanged
within noise; those paths don't touch ReadFrame. Header-bound
benchmarks show small significant latency wins (-1.4% to -2.6% at
0/10 headers); download throughput is flat within socket noise.

Per-cache attribution from microbenchmarks added in commit 5:

Cache                  Default                   Reused
ParseDataFrame         114 ns / 48 B / 1 alloc    75 ns / 0 / 0
ParseWindowUpdateFrame 103 ns / 16 B / 1 alloc    81 ns / 0 / 0
ParseHeadersFrame      128 ns / 48 B / 1 alloc    84 ns / 0 / 0
ReadMetaFrame         1073 ns / 472 B / 9 allocs 706 ns / 88 B / 4

The residual 4 allocations in ReadMetaFrame/Reused come from the
SetEmitFunc closure and its boxed captures; converting the closure
to a Framer method is a separate follow-up out of scope for this
stack.

Test plan

frame_reuse_race_test.go (package http2, synthetic Framer-pair
tests, each run in a "meta" variant where HEADERS surface as
*MetaHeadersFrame and a "raw" variant where HEADERS/CONTINUATION
surface individually and HeaderBlockFragment aliases the read
buffer):

  • TestFrameReuseRaceCorrect: positive control. Models
    serverConn.readFrames with a reader goroutine, a channel, a
    per-frame gate, and a consumer that fully consumes payload
    before signalling done. 800/1000 frames across DATA,
    WINDOW_UPDATE, HEADERS, HEADERS+CONTINUATION. Must PASS under
    -race.

  • TestFrameReuseRaceAdversarial: negative control, gated behind
    H2_REUSE_RACE_NEGATIVE=1. Same plumbing, but deliberately leaks
    slices into a sidecar goroutine across the next ReadFrame. Must
    RACE under -race; verified the detector fires for every
    retained case: Data, HeaderBlockFragment, and the Fields slice
    (which races against the next meta parse's clear-and-append).

frame_reuse_e2e_test.go (package http2_test, real server +
Transport, lands with the enablement commit):

  • TestFrameReuseEndToEndStress: 8 workers x 25 multiplexed
    requests through the real httptest-based HTTP/2 server and
    net/http.Transport, both running with SetReuseFrames enabled.
    Handler and client touch every byte of every header name/value
    and trailer in both directions, drain bodies. Regression test
    against future refactors of processHeaders / handleResponse /
    processTrailers / processData that fail to copy aliased data.

Per-type unit tests land with each cache commit: pointer-stability
and no-alloc tests, poison-overwrite tests for stale-field leaks,
distinct-pointer tests for the reuse-off contract, accessor-panic
tests for the ownership guard, and Fields tests for backing-array
reuse, secret-clearing on reuse, and the retention cap.

Validation runs (linux/amd64):

Run                                              Result
net/http/internal/http2 -race                    PASS
every commit in the stack: net/http/internal/http2  PASS
net/http                                         PASS
internal/godebugs, runtime/metrics               PASS
TestFrameReuseRaceCorrect (meta+raw) -race       PASS
TestFrameReuseRaceAdversarial (meta+raw) under
    H2_REUSE_RACE_NEGATIVE=1 -race               RACE (asserted)
TestFrameReuseEndToEndStress -race               PASS
GODEBUG=http2reuseframes=0 + reuse/e2e tests
    -race                                        PASS

GODEBUG escape hatch

Registered alongside the existing http2 GODEBUGs in
src/internal/godebugs/table.go, with Changed: 27, Old: "0":

Value                              Behavior
unset or http2reuseframes=1        reuse on (default)
GODEBUG=http2reuseframes=0         reuse off (pre-CL behavior)
                                   increments
                                   /godebug/non-default-behavior/
                                   http2reuseframes:events

Documented in doc/godebug.md under the Go 1.27 section. Per the
usual GODEBUG compatibility policy, programs in modules whose go.mod
declares a Go version below 1.27 default to http2reuseframes=0 and
keep the pre-CL behavior. The check-and-count logic lives in a
single Framer helper (setReuseFramesFromGODEBUG) called by both the
server and the Transport.

The intent is a compat hatch, not a tuning knob: operators can
disable without recompiling if the static analysis and race coverage
missed something. This addresses the safety concerns from prior CLs
while staying consistent with the preference against user-facing
performance knobs (this is purely a safety/compat escape).

Compatibility

  • No API change.
  • No change in public behavior; the read loop returns the same
    parsed frame values, only the per-call heap allocation is
    elided.
  • internal/http2 is unimportable from user code; the Framer type
    and SetReuseFrames method remain unreachable from outside the
    stdlib.

Updates #18502
Updates #68352

@google-cla

google-cla Bot commented May 23, 2026

Copy link
Copy Markdown

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@ChrisLundquist

Copy link
Copy Markdown
Author

I guess I'll amend the commits
image

@ChrisLundquist ChrisLundquist marked this pull request as ready for review May 23, 2026 18:40
@gopherbot

Copy link
Copy Markdown
Contributor

This PR (HEAD: 4382c90) has been imported to Gerrit for code review.

Please visit Gerrit at https://go-review.googlesource.com/c/go/+/782280.

Important tips:

  • Don't comment on this PR. All discussion takes place in Gerrit.
  • You need a Gmail or other Google account to log in to Gerrit.
  • To change your code in response to feedback:
    • Push a new commit to the branch used by your GitHub PR.
    • A new "patch set" will then appear in Gerrit.
    • Respond to each comment by marking as Done in Gerrit if implemented as suggested. You can alternatively write a reply.
    • Critical: you must click the blue Reply button near the top to publish your Gerrit responses.
    • Multiple commits in the PR will be squashed by GerritBot.
  • The title and description of the GitHub PR are used to construct the final commit message.
    • Edit these as needed via the GitHub web interface (not via Gerrit or git).
    • You should word wrap the PR description at ~76 characters unless you need longer lines (e.g., for tables or URLs).
  • See the Sending a change via GitHub and Reviews sections of the Contribution Guide as well as the FAQ for details.

@gopherbot

Copy link
Copy Markdown
Contributor

Message from Gopher Robot:

Patch Set 1:

Congratulations on opening your first change. Thank you for your contribution!

Next steps:
A maintainer will review your change and provide feedback. See
https://go.dev/doc/contribute#review for more info and tips to get your
patch through code review.

Most changes in the Go project go through a few rounds of revision. This can be
surprising to people new to the project. The careful, iterative review process
is our way of helping mentor contributors and ensuring that their contributions
have a lasting impact.

During May-July and Nov-Jan the Go project is in a code freeze, during which
little code gets reviewed or merged. If a reviewer responds with a comment like
R=go1.11 or adds a tag like "wait-release", it means that this CL will be
reviewed as part of the next development cycle. See https://go.dev/s/release
for more details.


Please don’t reply on this GitHub thread. Visit golang.org/cl/782280.
After addressing review feedback, remember to publish your drafts!

ChrisLundquist and others added 7 commits June 9, 2026 20:33
Mirror DataFrame's existing reuse pattern for WindowUpdateFrame: add
the struct as a sibling field on frameCache, expose a nil-safe
getWindowUpdateFrame, and have parseWindowUpdateFrame populate the
cached struct. Reuse is opt-in via Framer.SetReuseFrames, matching
DataFrame's existing API contract; without SetReuseFrames,
parseWindowUpdateFrame returns a freshly-allocated *WindowUpdateFrame
and behavior is identical to before this CL.

The cached struct is reset with a composite literal rather than
field-by-field assignment, so every field — including any added in
the future — is overwritten on each parse by construction, and stale
data from a previous frame cannot leak to the next caller.

When SetReuseFrames is in effect, both in-tree consumers extract the
StreamID and Increment fields synchronously before the next
ReadFrame call:

  * On the server, (*serverConn).processWindowUpdate runs on the
    serve goroutine. The readFrames goroutine that calls ReadFrame
    blocks on res.readMore() until the previous frame has been fully
    processed.

  * On the client, (*clientConnReadLoop).processWindowUpdate runs
    inline within readLoop's frame-handling switch and returns
    before the next ReadFrame call.

WindowUpdateFrame parses dominate the heap-allocation profile for
large transfers: the receiver sends one WINDOW_UPDATE per ~16 KiB of
data at both the stream and connection levels.

BenchmarkParseWindowUpdateFrame (added in a later CL in this stack)
measures the cache's contribution on linux/amd64:

  variant           B/op   allocs/op   ns/op
  Default           16     1           ~95
  Reused             0     0           ~77   (-1 alloc, -19%)

Also drop the stale "SetReuseFrames only currently implements reuse
of DataFrames" comments in frame_test.go, which this stack makes
untrue.

Four tests guard the change:

  TestReadFrameReusesWindowUpdate: same pointer across consecutive
  WINDOW_UPDATE parses when SetReuseFrames is in effect.

  TestReadFrameWindowUpdateNoAllocsWhenReused: locks in zero
  allocations per parse with SetReuseFrames.

  TestReadFrameWindowUpdateOverwrites: defensive — poisons the
  cached struct and confirms every field is re-assigned by the
  next parse (guards against regression to a partial reset).

  TestReadFrameWindowUpdateDistinctWithoutReuse: pre-SetReuseFrames
  API contract — distinct pointers per parse, no mutation across
  calls.

Change-Id: Ibcef724f7bc2ee2bd1a74cb2577d61e3117a872e
Mirror DataFrame's existing reuse pattern for HeadersFrame: add it
as a sibling field on frameCache, expose a nil-safe getHeadersFrame,
and have parseHeadersFrame populate the cached struct in place.
Reuse is opt-in via Framer.SetReuseFrames, matching DataFrame and
the prior WindowUpdateFrame CL on this branch; without
SetReuseFrames, parseHeadersFrame allocates a fresh *HeadersFrame
and behavior is identical to before this CL.

The explicit `*hf = HeadersFrame{FrameHeader: fh}` reset at the top
of parseHeadersFrame zeros every field so values from a previous
parse (Priority, headerFragBuf, Flags carried by FrameHeader)
cannot leak into the next caller. The later flag-gated assignments
fill Priority only when FlagHeadersPriority is set.

Reuse safety, when SetReuseFrames is in effect:

  * In net/http's internal use of this package, the *HeadersFrame
    returned by parseHeadersFrame is wrapped into a MetaHeadersFrame
    by readMetaFrame within the same ReadFrame call. The
    MetaHeadersFrame's embedded *HeadersFrame is then invalidated
    (Header().valid = false) before the next ReadFrame.

  * The headerFragBuf slice already aliases the framer's read
    buffer in both the cached and fresh variants, matching the
    existing DataFrame contract.

Every request on every HTTP/2 connection triggers exactly one
parseHeadersFrame call on each side, so when SetReuseFrames is in
effect this allocation is removed from the hottest of hot paths.

BenchmarkParseHeadersFrame (added in a later CL in this stack)
measures the cache's contribution on linux/amd64:

  variant           B/op   allocs/op   ns/op
  Default           48     1           ~125
  Reused             0     0           ~86   (-1 alloc, -31%)

Three tests guard the change:

  TestReadFrameReusesHeadersFrame: same *HeadersFrame pointer
  across consecutive HEADERS parses when SetReuseFrames is in
  effect.

  TestReadFrameHeadersOverwrites: defensive — parses a HEADERS
  frame with Priority + padding then a HEADERS frame with neither,
  asserts Priority and Flags do not bleed.

  TestReadFrameHeadersDistinctWithoutReuse: pre-SetReuseFrames API
  contract — distinct *HeadersFrame pointers per parse, no
  mutation across calls.

Change-Id: Ieacc0f4cb14454169d87a533b9ab06e01142f809
Mirror DataFrame's existing reuse pattern for the MetaHeadersFrame
wrapper produced by readMetaFrame: store it as a sibling field on
frameCache, expose a nil-safe getMetaHeadersFrame, and have
readMetaFrame populate the cached struct in place. Reuse is opt-in
via Framer.SetReuseFrames, matching DataFrame, WindowUpdateFrame,
and HeadersFrame on this branch; without SetReuseFrames,
readMetaFrame allocates a fresh *MetaHeadersFrame and behavior is
identical to before this CL.

The explicit `*mh = MetaHeadersFrame{HeadersFrame: hf}` reset at the
top of readMetaFrame ensures Fields, Truncated, and any future-added
field are zeroed so values from a previous (possibly aborted) parse
cannot leak into the next caller.

Because the cached struct is overwritten in place, a consumer that
retains the frame past the next ReadFrame would otherwise silently
read another stream's headers — a strictly worse failure mode than
DataFrame's, whose Data accessor panics via checkValid on a stale
frame. Give MetaHeadersFrame the equivalent protection: an owned bit
that readMetaFrame sets before returning and the next ReadFrame
clears (readMetaFrame now leaves the returned frame as the Framer's
lastFrame, and MetaHeadersFrame overrides invalidate to revoke the
bit), checked by PseudoValue, RegularFields, PseudoFields, and
rfc9218Priority. The embedded HeadersFrame's valid bit cannot serve
this purpose because readMetaFrame clears it when the header block
fragment is consumed. Like DataFrame's guard, the protection is
exact without SetReuseFrames and best-effort with it (a retained
pointer becomes owned again once the cache is repopulated by a later
meta parse).

Reuse safety, when SetReuseFrames is in effect:

  * The two consumers ((*serverConn).processHeaders and
    (*clientConnReadLoop).processHeaders) extract PseudoValue,
    RegularFields, and Truncated synchronously and never retain
    the *MetaHeadersFrame past the next ReadFrame.

  * The embedded HeadersFrame is the cached one from the prior CL
    on this branch; the *MetaHeadersFrame wrapper is one more level
    of the same sharing, with the same single-Framer-goroutine
    safety.

BenchmarkReadMetaFrame (added in a later CL in this stack) measures
the combined HF+MHF cache contribution on linux/amd64:

  variant           B/op    allocs/op   ns/op
  Default           472     9           ~1062
  Reused            376     7           ~1004  (-2 allocs, -5%)

The -2 allocs reflect the *HeadersFrame (saved by the HF CL) and the
*MetaHeadersFrame (saved here). The remaining 7 allocations come
from HPACK Fields-slice growth (addressed by the next CL in this
stack) and the SetEmitFunc closure (a separate follow-up).

Four tests guard the change:

  TestReadMetaFrameReusesMetaHeadersFrame: same *MetaHeadersFrame
  pointer across consecutive parses when SetReuseFrames is in
  effect.

  TestReadMetaFrameMetaHeadersOverwrites: defensive — first parse
  triggers Truncated, second parse fits; Truncated does not leak.

  TestReadMetaFrameDistinctWithoutReuse: pre-SetReuseFrames API
  contract — distinct *MetaHeadersFrame pointers per parse.

  TestMetaHeadersFrameAccessorPanicsWhenStale: accessor methods
  panic once the next ReadFrame has been called, with and without
  SetReuseFrames.

Change-Id: I93e6916075b1110137eada96b8017bc38b700165
…arses

After the MetaHeadersFrame struct cache, readMetaFrame's largest
remaining per-parse allocation is the Fields slice: for a typical
4-header HEADERS frame, mh.Fields = append(mh.Fields, hf) grows a
nil-backed slice through cap 1, 2, 4 — three backing-array
allocations per parse.

Keep the backing array in the cached MetaHeadersFrame instead: at the
start of readMetaFrame, capture mh.Fields before the wholesale struct
reset and restore it length-zero. Without SetReuseFrames, mh is a
fresh struct with nil Fields, so every parse still gets a fresh array
and behavior is unchanged.

Two safeguards keep the cache from holding header data longer than
intended:

  * Before reuse, the entire backing array is cleared with
    clear(fields[:cap(fields)]). Header values can contain secrets
    (Cookie, Authorization, HPACK Sensitive=true fields); without the
    clear, the slots beyond the new parse's length would keep those
    strings reachable for the lifetime of the connection.

  * The retained capacity is capped at maxRetainedMetaFields (64)
    entries. A single large HEADERS frame would otherwise permanently
    inflate per-Framer memory; above the cap, the array is dropped at
    the start of the next parse, which then pays the modest growth
    cost again.

Reuse safety beyond memory hygiene:

  * The MetaHeadersFrame.Fields documentation already says the
    underlying slice is owned by the Framer and must not be retained
    after the next call to ReadFrame, and the accessor methods panic
    on a stale frame (previous CL in this stack).

  * The two consumers ((*serverConn).processHeaders calling
    newWriterAndRequest, and (*clientConnReadLoop).processHeaders
    calling handleResponse) iterate the fields and copy each header
    into their own http.Header map before returning, synchronously
    within their single-frame-at-a-time loops.

  * The hpack decoder hands out independent Go strings for each
    HeaderField's Name and Value, so reusing the slice's backing
    array does not invalidate strings that earlier callers copied
    out.

Measured with testing.AllocsPerRun on linux/amd64, parsing a 4-field
HEADERS frame with ReadMetaHeaders set:

  without SetReuseFrames:  9 allocs/op
  with, before this CL:    7 allocs/op
  with, after this CL:     4 allocs/op  (-3)

The remaining 4 allocations come from the SetEmitFunc closure and its
boxed captures, which are a separate follow-up and deliberately not
part of this branch.

Three tests guard the change:

  TestReadMetaFrameReusesFieldsSlice: the backing array is stable
  across equal-size parses, including after growth.

  TestReadMetaFrameFieldsClearsSensitiveTail: after a shorter parse,
  no slot beyond the new length still references the previous
  parse's strings.

  TestReadMetaFrameFieldsRetentionCap: an array grown past
  maxRetainedMetaFields is dropped, not retained.

Change-Id: I63cc1afe39636421387a468f35684b43b7a67c2c
Add direct parser microbenchmarks that exercise the SetReuseFrames-
enabled paths so the reuse savings introduced earlier in this stack
are observable per-cache. All four share one helper
(benchmarkReadFrameReuse) that pre-encodes the frame, performs a
warm-up read so the Framer's read-buffer growth stays out of the
measurement, and runs Default and Reused subtests so benchstat can
attribute the delta to the cache rather than to other parts of the
parse path:

  * BenchmarkParseDataFrame           (preexisting DataFrame cache)
  * BenchmarkParseWindowUpdateFrame   (new on this branch)
  * BenchmarkParseHeadersFrame        (new on this branch)
  * BenchmarkReadMetaFrame            (HeadersFrame + MetaHeadersFrame
                                       + Fields caches)

BenchmarkParseDataFrame is included so the contribution of the
preexisting DataFrame cache can be compared directly against the
caches added in this stack; without it, a reviewer has no reference
point.

Microbench, linux/amd64:

  bench                              Default            Reused
  ParseDataFrame                     48 B, 1 alloc      0 B, 0 alloc
  ParseWindowUpdateFrame             16 B, 1 alloc      0 B, 0 alloc
  ParseHeadersFrame                  48 B, 1 alloc      0 B, 0 alloc
  ReadMetaFrame                      472 B, 9 allocs    88 B, 4 allocs

Each parse-path cache saves the same shape (1 alloc per parse). The
ReadMetaFrame Reused result reflects the HeadersFrame cache (1
alloc), the MetaHeadersFrame cache (1 alloc), and the Fields-slice
reuse (3 allocs); wall time drops ~34% (1073 ns to 706 ns per
parse). The remaining 4 allocations come from the SetEmitFunc
closure and its boxed captures, which are a separate follow-up out
of scope for this branch.

The package's existing end-to-end download benchmarks do not exercise
the SetReuseFrames path because no in-tree caller opts in. Wiring
SetReuseFrames into transport.go/server.go (whether via a test-only
hook or a public knob) is a separate change and is deliberately not
part of this branch.

Change-Id: Ic3651e52baae0417e40d2e31bdd5bdee6b4e6ce9
…etention contract

The SetReuseFrames contract is that frames returned by ReadFrame, any
slice aliasing the Framer's read buffer reachable from that frame,
and the MetaHeadersFrame Fields slice are only valid until the next
ReadFrame call. A retention bug under reuse manifests as one
goroutine reading retained data while the reader goroutine writes
into the same backing memory on the next ReadFrame.

The existing reuse tests assert pointer/slice-header stability across
ReadFrame calls but do not exercise the concurrent reader/consumer
shape that the server uses (serverConn.readFrames feeding the serve
goroutine over a channel gate). Add two complementary tests, each run
in two variants: "meta" with Framer.ReadMetaHeaders set, where
HEADERS surface as *MetaHeadersFrame, and "raw" without it, where
HEADERS and CONTINUATION surface individually and their
HeaderBlockFragment aliases the read buffer.

  TestFrameReuseRaceCorrect (positive control)
    Models the gated handoff: a reader goroutine calls ReadFrame and
    delivers the result over a channel, the consumer fully consumes
    the frame's payload (DataFrame.Data, HeaderBlockFragment,
    MetaHeadersFrame.Fields, WindowUpdateFrame.Increment) and only
    then signals the per-frame gate so the reader may proceed.
    Drives 800 (meta) / 1000 (raw) frames across DATA,
    WINDOW_UPDATE, HEADERS, and HEADERS+CONTINUATION. Under -race
    must pass; future changes that eagerly mutate the cached frame
    or its backing buffer before the consumer signals done would
    fail this test.

  TestFrameReuseRaceAdversarial (negative control)
    Same plumbing, but the consumer deliberately leaks the retained
    slice into a sidecar goroutine and closes the gate immediately.
    All payloads are size-stable so the framer's readBuf stops
    growing and the retained slice keeps aliasing the buffer the
    next ReadFrame writes into; header blocks stay under
    maxRetainedMetaFields so the meta path keeps reusing one Fields
    backing array. Guarded behind H2_REUSE_RACE_NEGATIVE=1 so
    default go test skips it; the assertion is the race detector
    itself. Verified to fire in every retained case: Data and
    HeaderBlockFragment race against the read buffer, and the
    retained Fields slice races against the next meta parse's
    clear-and-append into the same backing array.

Without -race, TestFrameReuseRaceCorrect remains a useful integration
test of the gated reader/consumer pattern.

An end-to-end stress test driving the real net/http server and
Transport lands with the CL that enables SetReuseFrames in both, at
the end of this stack.

No production-code behavior change; tests only.

Change-Id: I8a86eb2440618c2f85f3d823ceabcf789cbb596c
Call Framer.SetReuseFrames once on the per-connection Framer in both
serverConn (serveConn) and Transport (newClientConn) so the parsed
*DataFrame, *WindowUpdateFrame, *HeadersFrame, and *MetaHeadersFrame
structs returned by ReadFrame — and the MetaHeadersFrame Fields
backing array — are reused across calls instead of being
heap-allocated each time. SetReuseFrames has shipped as an opt-in
Framer method since CL 34812 (golang/net, 2017) but no in-tree caller
previously opted in, so the cache was effectively dead code from the
standard library's perspective; this CL turns it on.

A new GODEBUG setting, http2reuseframes, is registered as a compat
escape hatch in line with the existing http2client / http2server /
http2debug settings:

  - default (or http2reuseframes=1): reuse on, this CL's behavior.
  - GODEBUG=http2reuseframes=0:      reuse disabled, pre-CL behavior.

The godebugs table entry carries Changed: 27, Old: "0", so programs
in modules whose go.mod declares a Go version below 1.27 keep the
pre-CL behavior automatically, per the usual GODEBUG compatibility
policy. If a deployment encounters an issue the static audit and
-race coverage didn't catch, operators can also flip the setting
without recompiling. The check-and-count logic lives in one Framer
helper (setReuseFramesFromGODEBUG) called by both the server and the
Transport, so the two cannot drift.

Per-frame allocation savings, microbench numbers (linux/amd64, from
the BenchmarkParse* and BenchmarkReadMetaFrame benchmarks added
earlier in this stack):

  bench                            Default            Reused
  ParseDataFrame                   48 B, 1 alloc      0 B, 0 alloc
  ParseWindowUpdateFrame           16 B, 1 alloc      0 B, 0 alloc
  ParseHeadersFrame                48 B, 1 alloc      0 B, 0 alloc
  ReadMetaFrame                    472 B, 9 allocs    88 B, 4 allocs

End-to-end allocation reductions on the package's existing read-path
benchmarks (this tree, reuse on vs GODEBUG=http2reuseframes=0,
-count=5, benchstat):

  bench                       allocs/op delta   p
  ClientGzip                  -63.22%           0.008
  DownloadFrameSize/16k       -46.65%           0.008
  DownloadFrameSize/64k       -48.91%           0.008
  DownloadFrameSize/128k      -49.66%           0.008
  DownloadFrameSize/256k      -49.70%           0.008
  DownloadFrameSize/512k      -48.14%           0.008
  ClientRequestHeaders/0      -22.00%           0.008
  ClientResponseHeaders/0     -22.00%           0.008
  ClientRequestHeaders/10     -16.22%           0.008
  ClientResponseHeaders/10    -11.30%           0.008
  geomean (allocs/op)         -30.83%

Write-side benchmarks (WriteScheduler*, WriteQueue) are unchanged, as
expected; those paths do not exercise ReadFrame. The remaining
allocations in ReadMetaFrame come from the SetEmitFunc closure, which
this stack does not address.

Reuse safety, by frame type:

  DataFrame
    serverConn.processData and clientConnReadLoop.processData read
    Length, StreamID, StreamEnded, and the Data() slice synchronously
    before returning. The bytes from Data() flow through
    {server,client}Conn.body / cs.bufPipe -> dataBuffer.Write, which
    copies into a pool-allocated chunk; no slice retained past
    ReadFrame.

  WindowUpdateFrame
    serverConn.processWindowUpdate and clientConnReadLoop.processWindowUpdate
    read only the scalar StreamID and Increment fields. The struct
    type has no slice fields, so there is nothing to alias the read
    buffer.

  HeadersFrame
    Both server and Transport set Framer.ReadMetaHeaders during init,
    so a bare *HeadersFrame is never delivered to consumer code; the
    Framer always returns *MetaHeadersFrame on a HEADERS frame.
    readMetaFrame clears MetaHeadersFrame.HeadersFrame.headerFragBuf
    and calls invalidate() on the embedded *HeadersFrame before
    returning. The aliased frag buf is therefore not exposed past
    readMetaFrame.

  MetaHeadersFrame
    The Fields backing array is reused across parses (cleared and
    re-filled at the start of the next meta parse), so consumers must
    not retain the slice — and cannot do so unnoticed: the accessor
    methods panic once the next ReadFrame has been called, and the
    adversarial race test verifies -race observes retained-slice
    reads. HPACK Name/Value strings are independently allocated by
    the decoder (hpack.decodeString returns string(u.b) /
    buf.String(), both of which copy), so no string aliases the read
    buffer or the Fields array.
    serverConn.processHeaders / processTrailerHeaders and
    clientConnReadLoop.processHeaders / handleResponse / processTrailers
    iterate Fields synchronously on the read-loop / serve goroutine
    and copy strings into a fresh http.Header before returning.
    No code stores *MetaHeadersFrame past the dispatch.

Server-side gating: readFrames -> readFrameCh -> serve calls
processFrameFromReader synchronously and only then invokes
readMore(), which unblocks readFrames for the next ReadFrame. So
even though the server consumes frames on a different goroutine
from the one calling ReadFrame, every frame is fully consumed
before the cache is overwritten.

Transport-side gating: clientConnReadLoop.run consumes each frame
synchronously on the read-loop goroutine before the next
ReadFrame, and what escapes to the RoundTrip / response-body
goroutines is either copied (DataFrame -> bufPipe) or composed of
immutable, independently-allocated Go strings (Header maps).

Two retention sites intentionally remain outside the reuse contract
and are now documented in place rather than silently exempt:
processPing queues writePingAck{f}, which the write scheduler may
serve after the next ReadFrame, and processGoAway stores f in
cc.goAway, whose scalar fields are read at read-loop teardown. Both
are safe solely because PingFrame and GoAwayFrame are not part of
frameCache; the new comments require copying the needed fields
before either type is ever added to the cache.

TestFrameReuseEndToEndStress (frame_reuse_e2e_test.go) lands with
this CL: it drives concurrent multiplexed requests through the real
net/http HTTP/2 server and Transport — which now both opt in to
SetReuseFrames — and touches every header/trailer field in ways
that would race against the read loop's next ReadFrame if anything
still aliased a cached frame past the readMore gate (server) or
read-loop iteration (Transport). The synthetic Framer-pair tests in
frame_reuse_race_test.go cannot reach the production process* code
paths; this one does.

Verification (linux/amd64):

  net/http/internal/http2: go test -race                  PASS
  net/http:                go test                        PASS
  internal/godebugs, runtime/metrics: go test             PASS
  TestFrameReuseRaceCorrect (meta+raw) -race              PASS
  TestFrameReuseRaceAdversarial (meta+raw) under
    H2_REUSE_RACE_NEGATIVE=1 -race                        RACE (as
    asserted: Data, HeaderBlockFragment, and Fields retention all
    fire the detector)
  TestFrameReuseEndToEndStress -race                      PASS
  GODEBUG=http2reuseframes=0 go test -race
    (frame reuse + e2e tests)                             PASS

The adversarial test deliberately violates the reuse contract and
asserts the race detector fires; the other runs validate the
production code paths both with reuse on (the default) and with
reuse disabled via GODEBUG.

Change-Id: I9d6d0c2e761b0901314c25f362c99062260b31a7
@gopherbot

Copy link
Copy Markdown
Contributor

This PR (HEAD: d5f2fd7) has been imported to Gerrit for code review.

Please visit Gerrit at https://go-review.googlesource.com/c/go/+/782280.

Important tips:

  • Don't comment on this PR. All discussion takes place in Gerrit.
  • You need a Gmail or other Google account to log in to Gerrit.
  • To change your code in response to feedback:
    • Push a new commit to the branch used by your GitHub PR.
    • A new "patch set" will then appear in Gerrit.
    • Respond to each comment by marking as Done in Gerrit if implemented as suggested. You can alternatively write a reply.
    • Critical: you must click the blue Reply button near the top to publish your Gerrit responses.
    • Multiple commits in the PR will be squashed by GerritBot.
  • The title and description of the GitHub PR are used to construct the final commit message.
    • Edit these as needed via the GitHub web interface (not via Gerrit or git).
    • You should word wrap the PR description at ~76 characters unless you need longer lines (e.g., for tables or URLs).
  • See the Sending a change via GitHub and Reviews sections of the Contribution Guide as well as the FAQ for details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants