Add FairQueue single-peer fast path#282
Open
rgbkrk wants to merge 2 commits into
Open
Conversation
With a single connected peer, FairQueue still paid the full multi-peer bookkeeping cost on every message: ready-queue push/pop, queued-set updates, map remove/insert, and requeueing. The one-way large-payload IPC receive benchmarks pointed straight at this common receive-side cost rather than ROUTER identity or echo-send work. Store the lone connected stream outside the ready-queue machinery in a single_stream slot and poll it directly while it is the only stream. When a second stream is inserted, promote both streams back into the existing fair-queue path so multi-peer polling semantics are unchanged. Disconnect callback behavior is preserved on the single-stream path. This layers on top of the existing spin fix that bounds ready events to those present at poll entry; the fast path runs before the bounded multi-peer loop and does not touch the ready queue. Adds regression coverage for single-stream delivery, promotion when a second stream is inserted, and disconnect-callback handling on the single-stream path.
Add a regression test that a single stream left Pending on the fast path is promoted to the multi-peer path when a second peer connects, and that the task parked on the fast path is woken so a real executor re-polls. Document on the single_stream field that promotion is one-way (no demotion after peer churn).
Alexei-Kornienko
left a comment
Collaborator
There was a problem hiding this comment.
Just an idea:
Would it be better to replace fair queue on the top level with an enum type that would have 3 variants:
- empty queue
- single client
- many clients
Each enum variant may have it's own subtype implementing basic logic
and we can have several helper methods that would cleanly transform types into each other.
My main concern is that this type is way to complicated and I would prefer to simplify it for ease of maintenance.
Member
Author
That's a really good idea. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
With a single connected peer,
FairQueuestill pays the full multi-peer bookkeeping cost on every message: ready-queue push/pop, queued-set updates, map remove/insert, and requeueing. The one-way large-payload IPC receive benchmarks pointed straight at this common receive-side cost rather than ROUTER identity or echo-send work. This stores the lone connected stream outside the ready-queue machinery and polls it directly while it remains the only stream, then promotes both streams back into the existing fair-queue path the moment a second peer connects.Design decision
The fast path is a
single_stream: Option<(K, Pin<Box<S>>)>slot onQueueInner. The safety boundary is the stream count:single_streamslotpoll_nextstreams+ready_queueReady(None))on_disconnectfired outside the lock, same as the multi-peer pathremoveinsertis the only promotion site. If a stream already sits instreams, a new insert never takes the fast path, so the fast path can only ever hold the sole stream. The direct poll happens before the bounded multi-peer loop and never touchesready_queue, so it composes with the existing spin fix (ready events are still bounded to those present at poll entry).queue_empty_pollfoldssingle_stream.is_some()into the "stay Pending" check so an idle single stream does not get reported as end-of-stream.This does add complexity to a core scheduling type. The benchmark numbers below are why it carries its weight. The promotion boundary keeps that complexity contained: anything past one peer runs the exact code that ran before.
Benchmark evidence
From the original measurement run (not re-measured here). Benchmark shape: one-way workloads that isolate receive cost from echo/reply.
DEALER/ROUTER one-way: DEALER sendsBATCH_SIZEmessages, ROUTER receivesBATCH_SIZE, no echoPUSH/PULL one-way: PUSH sendsBATCH_SIZE, PULL receivesBATCH_SIZEipc, payload4096BZMQRS_BENCH_SAMPLE_SIZE=20,ZMQRS_BENCH_MEASUREMENT_MS=2000,ZMQRS_BENCH_WARMUP_MS=500zmqrs D/R one-way ipc 40964.8989 ms4.1979 mszmqrs P/P one-way ipc 40965.2582 ms4.1240 msSince each workload moves a fixed number of messages, the lower elapsed time corresponds directly to higher receive-side throughput.
Validation
cargo test --lib fair_queueandcargo test --lib test_fair_queue: 9 passed, 0 failedcargo clippy --all-targets -- --deny warnings: exit 0 (only the repo's pre-existing renamed-lint notes)Recreates #271.