net/http/internal/http2: enable SetReuseFrames in server and Transport#79634
net/http/internal/http2: enable SetReuseFrames in server and Transport#79634ChrisLundquist wants to merge 7 commits into
Conversation
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
3031bf4 to
4382c90
Compare
|
This PR (HEAD: 4382c90) has been imported to Gerrit for code review. Please visit Gerrit at https://go-review.googlesource.com/c/go/+/782280. Important tips:
|
|
Message from Gopher Robot: Patch Set 1: Congratulations on opening your first change. Thank you for your contribution! Next steps: Most changes in the Go project go through a few rounds of revision. This can be During May-July and Nov-Jan the Go project is in a code freeze, during which Please don’t reply on this GitHub thread. Visit golang.org/cl/782280. |
Mirror DataFrame's existing reuse pattern for WindowUpdateFrame: add
the struct as a sibling field on frameCache, expose a nil-safe
getWindowUpdateFrame, and have parseWindowUpdateFrame populate the
cached struct. Reuse is opt-in via Framer.SetReuseFrames, matching
DataFrame's existing API contract; without SetReuseFrames,
parseWindowUpdateFrame returns a freshly-allocated *WindowUpdateFrame
and behavior is identical to before this CL.
The cached struct is reset with a composite literal rather than
field-by-field assignment, so every field — including any added in
the future — is overwritten on each parse by construction, and stale
data from a previous frame cannot leak to the next caller.
When SetReuseFrames is in effect, both in-tree consumers extract the
StreamID and Increment fields synchronously before the next
ReadFrame call:
* On the server, (*serverConn).processWindowUpdate runs on the
serve goroutine. The readFrames goroutine that calls ReadFrame
blocks on res.readMore() until the previous frame has been fully
processed.
* On the client, (*clientConnReadLoop).processWindowUpdate runs
inline within readLoop's frame-handling switch and returns
before the next ReadFrame call.
WindowUpdateFrame parses dominate the heap-allocation profile for
large transfers: the receiver sends one WINDOW_UPDATE per ~16 KiB of
data at both the stream and connection levels.
BenchmarkParseWindowUpdateFrame (added in a later CL in this stack)
measures the cache's contribution on linux/amd64:
variant B/op allocs/op ns/op
Default 16 1 ~95
Reused 0 0 ~77 (-1 alloc, -19%)
Also drop the stale "SetReuseFrames only currently implements reuse
of DataFrames" comments in frame_test.go, which this stack makes
untrue.
Four tests guard the change:
TestReadFrameReusesWindowUpdate: same pointer across consecutive
WINDOW_UPDATE parses when SetReuseFrames is in effect.
TestReadFrameWindowUpdateNoAllocsWhenReused: locks in zero
allocations per parse with SetReuseFrames.
TestReadFrameWindowUpdateOverwrites: defensive — poisons the
cached struct and confirms every field is re-assigned by the
next parse (guards against regression to a partial reset).
TestReadFrameWindowUpdateDistinctWithoutReuse: pre-SetReuseFrames
API contract — distinct pointers per parse, no mutation across
calls.
Change-Id: Ibcef724f7bc2ee2bd1a74cb2577d61e3117a872e
Mirror DataFrame's existing reuse pattern for HeadersFrame: add it
as a sibling field on frameCache, expose a nil-safe getHeadersFrame,
and have parseHeadersFrame populate the cached struct in place.
Reuse is opt-in via Framer.SetReuseFrames, matching DataFrame and
the prior WindowUpdateFrame CL on this branch; without
SetReuseFrames, parseHeadersFrame allocates a fresh *HeadersFrame
and behavior is identical to before this CL.
The explicit `*hf = HeadersFrame{FrameHeader: fh}` reset at the top
of parseHeadersFrame zeros every field so values from a previous
parse (Priority, headerFragBuf, Flags carried by FrameHeader)
cannot leak into the next caller. The later flag-gated assignments
fill Priority only when FlagHeadersPriority is set.
Reuse safety, when SetReuseFrames is in effect:
* In net/http's internal use of this package, the *HeadersFrame
returned by parseHeadersFrame is wrapped into a MetaHeadersFrame
by readMetaFrame within the same ReadFrame call. The
MetaHeadersFrame's embedded *HeadersFrame is then invalidated
(Header().valid = false) before the next ReadFrame.
* The headerFragBuf slice already aliases the framer's read
buffer in both the cached and fresh variants, matching the
existing DataFrame contract.
Every request on every HTTP/2 connection triggers exactly one
parseHeadersFrame call on each side, so when SetReuseFrames is in
effect this allocation is removed from the hottest of hot paths.
BenchmarkParseHeadersFrame (added in a later CL in this stack)
measures the cache's contribution on linux/amd64:
variant B/op allocs/op ns/op
Default 48 1 ~125
Reused 0 0 ~86 (-1 alloc, -31%)
Three tests guard the change:
TestReadFrameReusesHeadersFrame: same *HeadersFrame pointer
across consecutive HEADERS parses when SetReuseFrames is in
effect.
TestReadFrameHeadersOverwrites: defensive — parses a HEADERS
frame with Priority + padding then a HEADERS frame with neither,
asserts Priority and Flags do not bleed.
TestReadFrameHeadersDistinctWithoutReuse: pre-SetReuseFrames API
contract — distinct *HeadersFrame pointers per parse, no
mutation across calls.
Change-Id: Ieacc0f4cb14454169d87a533b9ab06e01142f809
Mirror DataFrame's existing reuse pattern for the MetaHeadersFrame
wrapper produced by readMetaFrame: store it as a sibling field on
frameCache, expose a nil-safe getMetaHeadersFrame, and have
readMetaFrame populate the cached struct in place. Reuse is opt-in
via Framer.SetReuseFrames, matching DataFrame, WindowUpdateFrame,
and HeadersFrame on this branch; without SetReuseFrames,
readMetaFrame allocates a fresh *MetaHeadersFrame and behavior is
identical to before this CL.
The explicit `*mh = MetaHeadersFrame{HeadersFrame: hf}` reset at the
top of readMetaFrame ensures Fields, Truncated, and any future-added
field are zeroed so values from a previous (possibly aborted) parse
cannot leak into the next caller.
Because the cached struct is overwritten in place, a consumer that
retains the frame past the next ReadFrame would otherwise silently
read another stream's headers — a strictly worse failure mode than
DataFrame's, whose Data accessor panics via checkValid on a stale
frame. Give MetaHeadersFrame the equivalent protection: an owned bit
that readMetaFrame sets before returning and the next ReadFrame
clears (readMetaFrame now leaves the returned frame as the Framer's
lastFrame, and MetaHeadersFrame overrides invalidate to revoke the
bit), checked by PseudoValue, RegularFields, PseudoFields, and
rfc9218Priority. The embedded HeadersFrame's valid bit cannot serve
this purpose because readMetaFrame clears it when the header block
fragment is consumed. Like DataFrame's guard, the protection is
exact without SetReuseFrames and best-effort with it (a retained
pointer becomes owned again once the cache is repopulated by a later
meta parse).
Reuse safety, when SetReuseFrames is in effect:
* The two consumers ((*serverConn).processHeaders and
(*clientConnReadLoop).processHeaders) extract PseudoValue,
RegularFields, and Truncated synchronously and never retain
the *MetaHeadersFrame past the next ReadFrame.
* The embedded HeadersFrame is the cached one from the prior CL
on this branch; the *MetaHeadersFrame wrapper is one more level
of the same sharing, with the same single-Framer-goroutine
safety.
BenchmarkReadMetaFrame (added in a later CL in this stack) measures
the combined HF+MHF cache contribution on linux/amd64:
variant B/op allocs/op ns/op
Default 472 9 ~1062
Reused 376 7 ~1004 (-2 allocs, -5%)
The -2 allocs reflect the *HeadersFrame (saved by the HF CL) and the
*MetaHeadersFrame (saved here). The remaining 7 allocations come
from HPACK Fields-slice growth (addressed by the next CL in this
stack) and the SetEmitFunc closure (a separate follow-up).
Four tests guard the change:
TestReadMetaFrameReusesMetaHeadersFrame: same *MetaHeadersFrame
pointer across consecutive parses when SetReuseFrames is in
effect.
TestReadMetaFrameMetaHeadersOverwrites: defensive — first parse
triggers Truncated, second parse fits; Truncated does not leak.
TestReadMetaFrameDistinctWithoutReuse: pre-SetReuseFrames API
contract — distinct *MetaHeadersFrame pointers per parse.
TestMetaHeadersFrameAccessorPanicsWhenStale: accessor methods
panic once the next ReadFrame has been called, with and without
SetReuseFrames.
Change-Id: I93e6916075b1110137eada96b8017bc38b700165
…arses
After the MetaHeadersFrame struct cache, readMetaFrame's largest
remaining per-parse allocation is the Fields slice: for a typical
4-header HEADERS frame, mh.Fields = append(mh.Fields, hf) grows a
nil-backed slice through cap 1, 2, 4 — three backing-array
allocations per parse.
Keep the backing array in the cached MetaHeadersFrame instead: at the
start of readMetaFrame, capture mh.Fields before the wholesale struct
reset and restore it length-zero. Without SetReuseFrames, mh is a
fresh struct with nil Fields, so every parse still gets a fresh array
and behavior is unchanged.
Two safeguards keep the cache from holding header data longer than
intended:
* Before reuse, the entire backing array is cleared with
clear(fields[:cap(fields)]). Header values can contain secrets
(Cookie, Authorization, HPACK Sensitive=true fields); without the
clear, the slots beyond the new parse's length would keep those
strings reachable for the lifetime of the connection.
* The retained capacity is capped at maxRetainedMetaFields (64)
entries. A single large HEADERS frame would otherwise permanently
inflate per-Framer memory; above the cap, the array is dropped at
the start of the next parse, which then pays the modest growth
cost again.
Reuse safety beyond memory hygiene:
* The MetaHeadersFrame.Fields documentation already says the
underlying slice is owned by the Framer and must not be retained
after the next call to ReadFrame, and the accessor methods panic
on a stale frame (previous CL in this stack).
* The two consumers ((*serverConn).processHeaders calling
newWriterAndRequest, and (*clientConnReadLoop).processHeaders
calling handleResponse) iterate the fields and copy each header
into their own http.Header map before returning, synchronously
within their single-frame-at-a-time loops.
* The hpack decoder hands out independent Go strings for each
HeaderField's Name and Value, so reusing the slice's backing
array does not invalidate strings that earlier callers copied
out.
Measured with testing.AllocsPerRun on linux/amd64, parsing a 4-field
HEADERS frame with ReadMetaHeaders set:
without SetReuseFrames: 9 allocs/op
with, before this CL: 7 allocs/op
with, after this CL: 4 allocs/op (-3)
The remaining 4 allocations come from the SetEmitFunc closure and its
boxed captures, which are a separate follow-up and deliberately not
part of this branch.
Three tests guard the change:
TestReadMetaFrameReusesFieldsSlice: the backing array is stable
across equal-size parses, including after growth.
TestReadMetaFrameFieldsClearsSensitiveTail: after a shorter parse,
no slot beyond the new length still references the previous
parse's strings.
TestReadMetaFrameFieldsRetentionCap: an array grown past
maxRetainedMetaFields is dropped, not retained.
Change-Id: I63cc1afe39636421387a468f35684b43b7a67c2c
Add direct parser microbenchmarks that exercise the SetReuseFrames-
enabled paths so the reuse savings introduced earlier in this stack
are observable per-cache. All four share one helper
(benchmarkReadFrameReuse) that pre-encodes the frame, performs a
warm-up read so the Framer's read-buffer growth stays out of the
measurement, and runs Default and Reused subtests so benchstat can
attribute the delta to the cache rather than to other parts of the
parse path:
* BenchmarkParseDataFrame (preexisting DataFrame cache)
* BenchmarkParseWindowUpdateFrame (new on this branch)
* BenchmarkParseHeadersFrame (new on this branch)
* BenchmarkReadMetaFrame (HeadersFrame + MetaHeadersFrame
+ Fields caches)
BenchmarkParseDataFrame is included so the contribution of the
preexisting DataFrame cache can be compared directly against the
caches added in this stack; without it, a reviewer has no reference
point.
Microbench, linux/amd64:
bench Default Reused
ParseDataFrame 48 B, 1 alloc 0 B, 0 alloc
ParseWindowUpdateFrame 16 B, 1 alloc 0 B, 0 alloc
ParseHeadersFrame 48 B, 1 alloc 0 B, 0 alloc
ReadMetaFrame 472 B, 9 allocs 88 B, 4 allocs
Each parse-path cache saves the same shape (1 alloc per parse). The
ReadMetaFrame Reused result reflects the HeadersFrame cache (1
alloc), the MetaHeadersFrame cache (1 alloc), and the Fields-slice
reuse (3 allocs); wall time drops ~34% (1073 ns to 706 ns per
parse). The remaining 4 allocations come from the SetEmitFunc
closure and its boxed captures, which are a separate follow-up out
of scope for this branch.
The package's existing end-to-end download benchmarks do not exercise
the SetReuseFrames path because no in-tree caller opts in. Wiring
SetReuseFrames into transport.go/server.go (whether via a test-only
hook or a public knob) is a separate change and is deliberately not
part of this branch.
Change-Id: Ic3651e52baae0417e40d2e31bdd5bdee6b4e6ce9
…etention contract
The SetReuseFrames contract is that frames returned by ReadFrame, any
slice aliasing the Framer's read buffer reachable from that frame,
and the MetaHeadersFrame Fields slice are only valid until the next
ReadFrame call. A retention bug under reuse manifests as one
goroutine reading retained data while the reader goroutine writes
into the same backing memory on the next ReadFrame.
The existing reuse tests assert pointer/slice-header stability across
ReadFrame calls but do not exercise the concurrent reader/consumer
shape that the server uses (serverConn.readFrames feeding the serve
goroutine over a channel gate). Add two complementary tests, each run
in two variants: "meta" with Framer.ReadMetaHeaders set, where
HEADERS surface as *MetaHeadersFrame, and "raw" without it, where
HEADERS and CONTINUATION surface individually and their
HeaderBlockFragment aliases the read buffer.
TestFrameReuseRaceCorrect (positive control)
Models the gated handoff: a reader goroutine calls ReadFrame and
delivers the result over a channel, the consumer fully consumes
the frame's payload (DataFrame.Data, HeaderBlockFragment,
MetaHeadersFrame.Fields, WindowUpdateFrame.Increment) and only
then signals the per-frame gate so the reader may proceed.
Drives 800 (meta) / 1000 (raw) frames across DATA,
WINDOW_UPDATE, HEADERS, and HEADERS+CONTINUATION. Under -race
must pass; future changes that eagerly mutate the cached frame
or its backing buffer before the consumer signals done would
fail this test.
TestFrameReuseRaceAdversarial (negative control)
Same plumbing, but the consumer deliberately leaks the retained
slice into a sidecar goroutine and closes the gate immediately.
All payloads are size-stable so the framer's readBuf stops
growing and the retained slice keeps aliasing the buffer the
next ReadFrame writes into; header blocks stay under
maxRetainedMetaFields so the meta path keeps reusing one Fields
backing array. Guarded behind H2_REUSE_RACE_NEGATIVE=1 so
default go test skips it; the assertion is the race detector
itself. Verified to fire in every retained case: Data and
HeaderBlockFragment race against the read buffer, and the
retained Fields slice races against the next meta parse's
clear-and-append into the same backing array.
Without -race, TestFrameReuseRaceCorrect remains a useful integration
test of the gated reader/consumer pattern.
An end-to-end stress test driving the real net/http server and
Transport lands with the CL that enables SetReuseFrames in both, at
the end of this stack.
No production-code behavior change; tests only.
Change-Id: I8a86eb2440618c2f85f3d823ceabcf789cbb596c
Call Framer.SetReuseFrames once on the per-connection Framer in both
serverConn (serveConn) and Transport (newClientConn) so the parsed
*DataFrame, *WindowUpdateFrame, *HeadersFrame, and *MetaHeadersFrame
structs returned by ReadFrame — and the MetaHeadersFrame Fields
backing array — are reused across calls instead of being
heap-allocated each time. SetReuseFrames has shipped as an opt-in
Framer method since CL 34812 (golang/net, 2017) but no in-tree caller
previously opted in, so the cache was effectively dead code from the
standard library's perspective; this CL turns it on.
A new GODEBUG setting, http2reuseframes, is registered as a compat
escape hatch in line with the existing http2client / http2server /
http2debug settings:
- default (or http2reuseframes=1): reuse on, this CL's behavior.
- GODEBUG=http2reuseframes=0: reuse disabled, pre-CL behavior.
The godebugs table entry carries Changed: 27, Old: "0", so programs
in modules whose go.mod declares a Go version below 1.27 keep the
pre-CL behavior automatically, per the usual GODEBUG compatibility
policy. If a deployment encounters an issue the static audit and
-race coverage didn't catch, operators can also flip the setting
without recompiling. The check-and-count logic lives in one Framer
helper (setReuseFramesFromGODEBUG) called by both the server and the
Transport, so the two cannot drift.
Per-frame allocation savings, microbench numbers (linux/amd64, from
the BenchmarkParse* and BenchmarkReadMetaFrame benchmarks added
earlier in this stack):
bench Default Reused
ParseDataFrame 48 B, 1 alloc 0 B, 0 alloc
ParseWindowUpdateFrame 16 B, 1 alloc 0 B, 0 alloc
ParseHeadersFrame 48 B, 1 alloc 0 B, 0 alloc
ReadMetaFrame 472 B, 9 allocs 88 B, 4 allocs
End-to-end allocation reductions on the package's existing read-path
benchmarks (this tree, reuse on vs GODEBUG=http2reuseframes=0,
-count=5, benchstat):
bench allocs/op delta p
ClientGzip -63.22% 0.008
DownloadFrameSize/16k -46.65% 0.008
DownloadFrameSize/64k -48.91% 0.008
DownloadFrameSize/128k -49.66% 0.008
DownloadFrameSize/256k -49.70% 0.008
DownloadFrameSize/512k -48.14% 0.008
ClientRequestHeaders/0 -22.00% 0.008
ClientResponseHeaders/0 -22.00% 0.008
ClientRequestHeaders/10 -16.22% 0.008
ClientResponseHeaders/10 -11.30% 0.008
geomean (allocs/op) -30.83%
Write-side benchmarks (WriteScheduler*, WriteQueue) are unchanged, as
expected; those paths do not exercise ReadFrame. The remaining
allocations in ReadMetaFrame come from the SetEmitFunc closure, which
this stack does not address.
Reuse safety, by frame type:
DataFrame
serverConn.processData and clientConnReadLoop.processData read
Length, StreamID, StreamEnded, and the Data() slice synchronously
before returning. The bytes from Data() flow through
{server,client}Conn.body / cs.bufPipe -> dataBuffer.Write, which
copies into a pool-allocated chunk; no slice retained past
ReadFrame.
WindowUpdateFrame
serverConn.processWindowUpdate and clientConnReadLoop.processWindowUpdate
read only the scalar StreamID and Increment fields. The struct
type has no slice fields, so there is nothing to alias the read
buffer.
HeadersFrame
Both server and Transport set Framer.ReadMetaHeaders during init,
so a bare *HeadersFrame is never delivered to consumer code; the
Framer always returns *MetaHeadersFrame on a HEADERS frame.
readMetaFrame clears MetaHeadersFrame.HeadersFrame.headerFragBuf
and calls invalidate() on the embedded *HeadersFrame before
returning. The aliased frag buf is therefore not exposed past
readMetaFrame.
MetaHeadersFrame
The Fields backing array is reused across parses (cleared and
re-filled at the start of the next meta parse), so consumers must
not retain the slice — and cannot do so unnoticed: the accessor
methods panic once the next ReadFrame has been called, and the
adversarial race test verifies -race observes retained-slice
reads. HPACK Name/Value strings are independently allocated by
the decoder (hpack.decodeString returns string(u.b) /
buf.String(), both of which copy), so no string aliases the read
buffer or the Fields array.
serverConn.processHeaders / processTrailerHeaders and
clientConnReadLoop.processHeaders / handleResponse / processTrailers
iterate Fields synchronously on the read-loop / serve goroutine
and copy strings into a fresh http.Header before returning.
No code stores *MetaHeadersFrame past the dispatch.
Server-side gating: readFrames -> readFrameCh -> serve calls
processFrameFromReader synchronously and only then invokes
readMore(), which unblocks readFrames for the next ReadFrame. So
even though the server consumes frames on a different goroutine
from the one calling ReadFrame, every frame is fully consumed
before the cache is overwritten.
Transport-side gating: clientConnReadLoop.run consumes each frame
synchronously on the read-loop goroutine before the next
ReadFrame, and what escapes to the RoundTrip / response-body
goroutines is either copied (DataFrame -> bufPipe) or composed of
immutable, independently-allocated Go strings (Header maps).
Two retention sites intentionally remain outside the reuse contract
and are now documented in place rather than silently exempt:
processPing queues writePingAck{f}, which the write scheduler may
serve after the next ReadFrame, and processGoAway stores f in
cc.goAway, whose scalar fields are read at read-loop teardown. Both
are safe solely because PingFrame and GoAwayFrame are not part of
frameCache; the new comments require copying the needed fields
before either type is ever added to the cache.
TestFrameReuseEndToEndStress (frame_reuse_e2e_test.go) lands with
this CL: it drives concurrent multiplexed requests through the real
net/http HTTP/2 server and Transport — which now both opt in to
SetReuseFrames — and touches every header/trailer field in ways
that would race against the read loop's next ReadFrame if anything
still aliased a cached frame past the readMore gate (server) or
read-loop iteration (Transport). The synthetic Framer-pair tests in
frame_reuse_race_test.go cannot reach the production process* code
paths; this one does.
Verification (linux/amd64):
net/http/internal/http2: go test -race PASS
net/http: go test PASS
internal/godebugs, runtime/metrics: go test PASS
TestFrameReuseRaceCorrect (meta+raw) -race PASS
TestFrameReuseRaceAdversarial (meta+raw) under
H2_REUSE_RACE_NEGATIVE=1 -race RACE (as
asserted: Data, HeaderBlockFragment, and Fields retention all
fire the detector)
TestFrameReuseEndToEndStress -race PASS
GODEBUG=http2reuseframes=0 go test -race
(frame reuse + e2e tests) PASS
The adversarial test deliberately violates the reuse contract and
asserts the race detector fires; the other runs validate the
production code paths both with reuse on (the default) and with
reuse disabled via GODEBUG.
Change-Id: I9d6d0c2e761b0901314c25f362c99062260b31a7
4382c90 to
d5f2fd7
Compare
|
This PR (HEAD: d5f2fd7) has been imported to Gerrit for code review. Please visit Gerrit at https://go-review.googlesource.com/c/go/+/782280. Important tips:
|

This stack extends Framer.SetReuseFrames to cover *WindowUpdateFrame,
*HeadersFrame, and *MetaHeadersFrame (in addition to the existing
*DataFrame), reuses the MetaHeadersFrame Fields backing array across
parses, then calls SetReuseFrames once on the per-connection Framer
in serverConn and Transport.newClientConn. Result: -31% geomean
allocations per op on the package's read-path benchmarks (about -50%
on downloads, -22% on header-bound requests), with no measurable
change on write-path benchmarks and no API change.
A new GODEBUG=http2reuseframes=0 setting is registered alongside
http2client / http2server / http2debug and reverts to the pre-CL
allocate-per-frame behavior.
Background
Framer.SetReuseFrames has shipped as an opt-in Framer method since
golang/net CL 34812 (commit bb807669, 2017-02-27, fixing #18502). It
was added for gRPC-Go, whose Framer profiling showed parseDataFrame
accounting for ~66% of allocations (1.4 GB of 2.2 GB) on a 64-conn /
100-stream / 900K RT/sec benchmark. gRPC-Go calls SetReuseFrames
directly on the Framer it owns.
No in-tree caller in the standard library has ever opted in, so the
cache has been effectively dead code from the stdlib's perspective
for nine years. This CL turns it on.
Prior attempts and reviewer concerns
This is the third attempt to enable frame reuse on the stdlib's
HTTP/2 server/Transport:
Issue x/net/http2: improve frame decoding performance by reusing frames #68352 (still open, 2024-07-09): a request to improve
frame decoding performance. neild rejected a user-facing config
option ("There's no need for a user-configurable option here.
If there are performance improvements that can be made to frame
decoding in the HTTP/2 implementation, we can just make them.")
but left the door open for internal enablement.
golang/net CL 597455 (PR x/http/net: enable
SetFrameReusein server and client connections net#216, 2024-07-10): tried toenable on both server and Transport unconditionally. neild's
blocking review: "this is not correct. The server read goroutine
(serverConn.readFrames) reads frames and delivers them to the
main server goroutine for processing," and "Changing the HTTP/2
server and/or transport to use the frame cache is likely to
require some significant refactoring." Stale; never landed.
golang/net CL 676235 (server-only, 2025-05-25): neild Hold+1 on
2025-06-24: "this is not a safe change for the server.
SetReuseFrames causes each frame to be invalidated by the next
call to ReadFrame, but the server processes frames concurrently
with ReadFrame calls." Abandoned 2025-10-24.
The blocker on both prior CLs was correctness, specifically retention
of frame data past the next ReadFrame. This CL addresses that with a
static retention audit, an ownership guard that turns retention bugs
into deterministic panics, race-detector tests that exercise the
gated handoff, an end-to-end stress test under -race, and a GODEBUG
escape hatch in case the analysis missed something.
Stack layout
Seven commits, each independently reviewable:
ownership guard on its accessor methods
control)
http2reuseframes GODEBUG, add end-to-end stress test
Commits 1-3 mirror the existing *DataFrame cache pattern from CL
34812: wholesale "*p = T{...}" reset at the top of each parse
function, slot in frameCache, nil-safe getFrame getter. No
behavior change without SetReuseFrames; the caches are dormant unless
opted in.
Commit 3 also gives MetaHeadersFrame the equivalent of DataFrame's
checkValid protection: an owned bit set when readMetaFrame returns
the frame and revoked by the next ReadFrame (the frame becomes the
Framer's lastFrame and overrides invalidate). PseudoValue,
RegularFields, PseudoFields, and rfc9218Priority panic on a stale
frame, so a retention bug fails deterministically instead of
silently reading another stream's headers.
Commit 4 is where most of the HEADERS-path win comes from: the
Fields backing array stays in the cached frame and is cleared and
re-filled by the next meta parse. Two safeguards: the array is
cleared before reuse so header values that can hold secrets (Cookie,
Authorization, HPACK Sensitive fields) are released as soon as the
contract permits, and retained capacity is capped at 64 entries so a
single large HEADERS frame cannot permanently inflate per-connection
memory.
Safety analysis
Static retention audit across every code path that consumes a
reusable *Frame, plus a separate concurrency audit. Per frame type:
*DataFrame
f.Data() is copied into dataBuffer.Write
(databuffer.go:126 copy(chunk[b.w:], p)) before readMore. The
pool-allocated chunk has independent backing.
*WindowUpdateFrame
Struct has only scalar fields. Safe by construction.
*HeadersFrame
Both server and Transport set Framer.ReadMetaHeaders during init,
so a bare *HeadersFrame never reaches consumer code.
readMetaFrame clears headerFragBuf=nil and invalidates the
embedded *HeadersFrame before returning.
*MetaHeadersFrame
The Fields backing array is reused across parses, so consumers
must not retain the slice -- and cannot do so unnoticed: the
accessor methods panic once the next ReadFrame has been called,
and the adversarial race test verifies -race observes
retained-slice reads. hpack.decodeString returns string(u.b) or
buf.String(), both of which copy, so Name/Value strings are
independently allocated and alias neither readBuf nor the Fields
array. Server's processHeaders / newWriterAndRequest and
Transport's handleResponse / processTrailers iterate Fields
synchronously and copy into fresh http.Header maps before
returning. No code stores *MetaHeadersFrame past dispatch.
Concurrency:
Server: serverConn.readFrames sends each parsed frame on
readFrameCh then blocks on gate. The serve goroutine receives,
calls processFrameFromReader synchronously, then res.readMore(),
which releases the gate. The single readMore call site fires
only after the synchronous dispatch returns, so the cache is
never overwritten while consumer code still references it.
Transport: clientConnReadLoop.run reads frames in a tight loop
with no per-frame gate, so each process* function must fully
consume aliased data before returning. Verified: every byte that
escapes the loop is either copied (DataFrame -> bufPipe) or
composed of immutable, independently-allocated Go strings
(Header maps populated from hpack-decoded strings).
Spawned goroutines: runHandler (server) only sees the fresh
*ServerRequest whose Header/Trailer/URL are independently
allocated. writeFrameAsync, body-close, Shutdown, ping writers
don't touch read-side cached frames.
Two retention sites intentionally remain outside the reuse contract
and are documented in place rather than silently exempt:
processPing queues writePingAck{f}, which the write scheduler may
serve after the next ReadFrame, and processGoAway stores f in
cc.goAway, whose scalar fields are read at read-loop teardown. Both
are safe solely because PingFrame and GoAwayFrame are not part of
frameCache; the comments require copying the needed fields before
either type is ever added to the cache.
Benchmarks
End-to-end, -count=5 -benchmem, linux/amd64 i7-6770HQ @ 2.60GHz,
reuse on vs GODEBUG=http2reuseframes=0 on this tree (the GODEBUG-off
path is the pre-CL allocation behavior), via benchstat.
Allocations per op (the main win):
Write-side benchmarks (WriteScheduler*, WriteQueue) are unchanged
within noise; those paths don't touch ReadFrame. Header-bound
benchmarks show small significant latency wins (-1.4% to -2.6% at
0/10 headers); download throughput is flat within socket noise.
Per-cache attribution from microbenchmarks added in commit 5:
The residual 4 allocations in ReadMetaFrame/Reused come from the
SetEmitFunc closure and its boxed captures; converting the closure
to a Framer method is a separate follow-up out of scope for this
stack.
Test plan
frame_reuse_race_test.go (package http2, synthetic Framer-pair
tests, each run in a "meta" variant where HEADERS surface as
*MetaHeadersFrame and a "raw" variant where HEADERS/CONTINUATION
surface individually and HeaderBlockFragment aliases the read
buffer):
TestFrameReuseRaceCorrect: positive control. Models
serverConn.readFrames with a reader goroutine, a channel, a
per-frame gate, and a consumer that fully consumes payload
before signalling done. 800/1000 frames across DATA,
WINDOW_UPDATE, HEADERS, HEADERS+CONTINUATION. Must PASS under
-race.
TestFrameReuseRaceAdversarial: negative control, gated behind
H2_REUSE_RACE_NEGATIVE=1. Same plumbing, but deliberately leaks
slices into a sidecar goroutine across the next ReadFrame. Must
RACE under -race; verified the detector fires for every
retained case: Data, HeaderBlockFragment, and the Fields slice
(which races against the next meta parse's clear-and-append).
frame_reuse_e2e_test.go (package http2_test, real server +
Transport, lands with the enablement commit):
requests through the real httptest-based HTTP/2 server and
net/http.Transport, both running with SetReuseFrames enabled.
Handler and client touch every byte of every header name/value
and trailer in both directions, drain bodies. Regression test
against future refactors of processHeaders / handleResponse /
processTrailers / processData that fail to copy aliased data.
Per-type unit tests land with each cache commit: pointer-stability
and no-alloc tests, poison-overwrite tests for stale-field leaks,
distinct-pointer tests for the reuse-off contract, accessor-panic
tests for the ownership guard, and Fields tests for backing-array
reuse, secret-clearing on reuse, and the retention cap.
Validation runs (linux/amd64):
GODEBUG escape hatch
Registered alongside the existing http2 GODEBUGs in
src/internal/godebugs/table.go, with Changed: 27, Old: "0":
Documented in doc/godebug.md under the Go 1.27 section. Per the
usual GODEBUG compatibility policy, programs in modules whose go.mod
declares a Go version below 1.27 default to http2reuseframes=0 and
keep the pre-CL behavior. The check-and-count logic lives in a
single Framer helper (setReuseFramesFromGODEBUG) called by both the
server and the Transport.
The intent is a compat hatch, not a tuning knob: operators can
disable without recompiling if the static analysis and race coverage
missed something. This addresses the safety concerns from prior CLs
while staying consistent with the preference against user-facing
performance knobs (this is purely a safety/compat escape).
Compatibility
parsed frame values, only the per-call heap allocation is
elided.
and SetReuseFrames method remain unreachable from outside the
stdlib.
Updates #18502
Updates #68352