
Add FairQueue single-peer fast path #282

Open
rgbkrk wants to merge 2 commits into master from perf/fair-queue-single-peer

rgbkrk (Member) commented May 29, 2026

Summary

With a single connected peer, FairQueue still pays the full multi-peer bookkeeping cost on every message: ready-queue push/pop, queued-set updates, map remove/insert, and requeueing. The one-way large-payload IPC receive benchmarks pointed straight at this common receive-side cost rather than at ROUTER identity or echo-send work. This PR stores the lone connected stream outside the ready-queue machinery, polls it directly while it remains the only stream, and promotes both streams into the existing fair-queue path as soon as a second peer connects.

Design decision

The fast path is a single_stream: Option<(K, Pin<Box<S>>)> slot on QueueInner. The safety boundary is the stream count:

| state | where the stream lives | how it's polled |
|---|---|---|
| zero streams, then first insert | single_stream slot | polled directly with the task waker at the top of poll_next |
| second insert arrives | both moved into streams + ready_queue | existing multi-peer fair-queue loop, semantics unchanged |
| single stream disconnects (Ready(None)) | slot cleared | on_disconnect fired outside the lock, same as the multi-peer path |
| single stream removed via remove | slot cleared | n/a |

insert is the only promotion site. If a stream already sits in streams, a new insert never takes the fast path, so the fast path can only ever hold the sole stream. The direct poll happens before the bounded multi-peer loop and never touches ready_queue, so it composes with the existing spin fix (ready events are still bounded to those present at poll entry). queue_empty_poll folds single_stream.is_some() into the "stay Pending" check so an idle single stream does not get reported as end-of-stream.
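The stream-count boundary can be modeled in a few dozen lines of std-only Rust. This is a much-simplified sketch, not the zmq.rs code: MiniStream stands in for futures::Stream, and SketchQueue, VecStream, NoopWake and demo are hypothetical names. The multi-peer branch simply rotates Pending streams instead of waiting on per-stream wakers, on_disconnect is omitted, and an empty queue is assumed to stay Pending until a peer is inserted.

```rust
use std::collections::{HashMap, VecDeque};
use std::{hash::Hash, sync::Arc};
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for `futures::Stream`, so the sketch needs no external crates.
trait MiniStream {
    type Item;
    fn poll_next(&mut self, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}

/// Simplified model of the queue state (not the real zmq.rs `QueueInner`).
struct SketchQueue<K, S> {
    single_stream: Option<(K, S)>, // fast-path slot: only ever holds the sole stream
    streams: HashMap<K, S>,        // multi-peer storage
    ready_queue: VecDeque<K>,      // round-robin order for the multi-peer path
    task_waker: Option<Waker>,     // task parked while nothing is ready
}

impl<K: Hash + Eq + Clone, S: MiniStream> SketchQueue<K, S> {
    fn new() -> Self {
        let (streams, ready_queue) = (HashMap::new(), VecDeque::new());
        Self { single_stream: None, streams, ready_queue, task_waker: None }
    }

    /// The only promotion site: the fast path is used only when no stream exists yet.
    fn insert(&mut self, key: K, stream: S) {
        if self.single_stream.is_none() && self.streams.is_empty() {
            self.single_stream = Some((key, stream));
        } else {
            // A second peer arrived: move the lone stream into the multi-peer structures.
            if let Some((k, s)) = self.single_stream.take() {
                self.ready_queue.push_back(k.clone());
                self.streams.insert(k, s);
            }
            self.ready_queue.push_back(key.clone());
            self.streams.insert(key, stream);
        }
        // Wake a parked task so it re-polls and sees the new stream.
        if let Some(w) = self.task_waker.take() { w.wake(); }
    }

    fn poll_next(&mut self, cx: &mut Context<'_>) -> Poll<Option<(K, S::Item)>> {
        // Fast path: poll the sole stream directly, with no ready-queue or map bookkeeping.
        if let Some((key, stream)) = self.single_stream.as_mut() {
            match stream.poll_next(cx) {
                Poll::Ready(Some(item)) => return Poll::Ready(Some((key.clone(), item))),
                // Disconnect: clear the slot (the real code also fires on_disconnect).
                Poll::Ready(None) => self.single_stream = None,
                // Pending: fall through; with one stream the loop below does nothing.
                Poll::Pending => {}
            }
        }
        // Multi-peer path, bounded to the keys queued at entry. Simplification: Pending
        // streams are rotated here instead of waiting on per-stream wakers.
        for _ in 0..self.ready_queue.len() {
            let Some(key) = self.ready_queue.pop_front() else { break };
            let Some(stream) = self.streams.get_mut(&key) else { continue };
            match stream.poll_next(cx) {
                Poll::Ready(Some(item)) => {
                    self.ready_queue.push_back(key.clone());
                    return Poll::Ready(Some((key, item)));
                }
                Poll::Ready(None) => { self.streams.remove(&key); }
                Poll::Pending => self.ready_queue.push_back(key),
            }
        }
        // Nothing ready (or no streams): park and stay Pending instead of ending the stream.
        self.task_waker = Some(cx.waker().clone());
        Poll::Pending
    }
}

/// Hypothetical test stream: yields its queued items, then stays Pending (polled manually).
struct VecStream(VecDeque<u8>);
impl MiniStream for VecStream {
    type Item = u8;
    fn poll_next(&mut self, _cx: &mut Context<'_>) -> Poll<Option<u8>> {
        self.0.pop_front().map_or(Poll::Pending, |x| Poll::Ready(Some(x)))
    }
}

struct NoopWake;
impl Wake for NoopWake { fn wake(self: Arc<Self>) {} }

/// One poll on the fast path, then a second peer forces promotion and fair interleaving.
fn demo() -> (bool, Option<(&'static str, u8)>, bool, Vec<(&'static str, u8)>) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut q = SketchQueue::new();
    q.insert("a", VecStream(VecDeque::from([1, 2, 3])));
    let was_fast = q.single_stream.is_some();
    let first = match q.poll_next(&mut cx) { Poll::Ready(x) => x, Poll::Pending => None };
    q.insert("b", VecStream(VecDeque::from([10, 20])));
    let still_fast = q.single_stream.is_some();
    let mut rest = Vec::new();
    while let Poll::Ready(Some(pair)) = q.poll_next(&mut cx) { rest.push(pair); }
    (was_fast, first, still_fast, rest)
}
```

In demo, the first message comes off the fast path; after the second insert the slot is empty and the two peers are served alternately: ("a", 2), ("b", 10), ("a", 3), ("b", 20).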

This does add complexity to a core scheduling type. The benchmark numbers below are why it carries its weight. The promotion boundary keeps that complexity contained: anything past one peer runs the exact code that ran before.

Benchmark evidence

From the original measurement run (not re-measured here). Benchmark shape: one-way workloads that isolate receive cost from echo/reply.

  • DEALER/ROUTER one-way: DEALER sends BATCH_SIZE messages, ROUTER receives BATCH_SIZE, no echo
  • PUSH/PULL one-way: PUSH sends BATCH_SIZE, PULL receives BATCH_SIZE
  • transport ipc, payload 4096B
  • ZMQRS_BENCH_SAMPLE_SIZE=20, ZMQRS_BENCH_MEASUREMENT_MS=2000, ZMQRS_BENCH_WARMUP_MS=500
| workload | before | after | elapsed reduction | throughput gain |
|---|---|---|---|---|
| zmqrs D/R one-way ipc 4096 | 4.8989 ms | 4.1979 ms | ~14.3% | ~16.7% |
| zmqrs P/P one-way ipc 4096 | 5.2582 ms | 4.1240 ms | ~21.6% | ~27.5% |

Since each workload moves a fixed number of messages, the lower elapsed time corresponds directly to higher receive-side throughput.
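Both percentage columns follow from the elapsed times alone. A small helper makes the arithmetic explicit (speedup is a hypothetical name): with a fixed message count, throughput is proportional to 1/elapsed, so the throughput gain is before/after - 1, while the elapsed reduction is 1 - after/before.

```rust
/// Elapsed-time reduction and throughput gain, both in percent, for a workload
/// that moves a fixed number of messages. `before_ms` and `after_ms` share a unit.
fn speedup(before_ms: f64, after_ms: f64) -> (f64, f64) {
    // Share of the original elapsed time that was saved.
    let elapsed_reduction = (1.0 - after_ms / before_ms) * 100.0;
    // Fixed message count, so throughput is proportional to 1 / elapsed.
    let throughput_gain = (before_ms / after_ms - 1.0) * 100.0;
    (elapsed_reduction, throughput_gain)
}
```

For the D/R row, speedup(4.8989, 4.1979) gives roughly (14.3, 16.7); for P/P, speedup(5.2582, 4.1240) gives roughly (21.6, 27.5), matching the table.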

Validation

  • cargo test --lib fair_queue and cargo test --lib test_fair_queue: 9 passed, 0 failed
  • cargo clippy --all-targets -- --deny warnings: exit 0 (only the repo's pre-existing renamed-lint notes)

Recreates #271.

rgbkrk added 2 commits May 29, 2026 07:15
With a single connected peer, FairQueue still paid the full multi-peer
bookkeeping cost on every message: ready-queue push/pop, queued-set
updates, map remove/insert, and requeueing. The one-way large-payload IPC
receive benchmarks pointed straight at this common receive-side cost
rather than ROUTER identity or echo-send work.

Store the lone connected stream outside the ready-queue machinery in a
single_stream slot and poll it directly while it is the only stream. When
a second stream is inserted, promote both streams back into the existing
fair-queue path so multi-peer polling semantics are unchanged. Disconnect
callback behavior is preserved on the single-stream path.

This layers on top of the existing spin fix that bounds ready events to
those present at poll entry; the fast path runs before the bounded
multi-peer loop and does not touch the ready queue.

Adds regression coverage for single-stream delivery, promotion when a
second stream is inserted, and disconnect-callback handling on the
single-stream path.

Add a regression test that a single stream left Pending on the fast path is
promoted to the multi-peer path when a second peer connects, and that the task
parked on the fast path is woken so a real executor re-polls. Document on the
single_stream field that promotion is one-way (no demotion after peer churn).
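The wake-on-promotion regression described in the second commit can be illustrated with a counting waker. This is a hypothetical, std-only model (Slots, CountingWake and promotion_wakes_parked_task are illustrative names, not the PR's actual test): it tracks only where peers live and which task is parked, and checks that promoting to the multi-peer path wakes the task that parked while the lone stream was Pending.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Wake, Waker};

/// Waker that counts how many times the task was woken.
struct CountingWake(AtomicUsize);

impl Wake for CountingWake {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Hypothetical model that tracks only where peers live and which task is parked.
#[derive(Default)]
struct Slots {
    /// Fast-path slot. Promotion is one-way: no demotion back here after peer churn.
    single_stream: Option<u32>,
    multi: Vec<u32>,
    parked: Option<Waker>,
}

impl Slots {
    /// A poll that finds the lone stream Pending parks the task's waker.
    fn poll_pending(&mut self, cx: &mut Context<'_>) {
        self.parked = Some(cx.waker().clone());
    }

    fn insert(&mut self, peer: u32) {
        if self.single_stream.is_none() && self.multi.is_empty() {
            self.single_stream = Some(peer);
        } else {
            // Promote: move the lone peer (if any) and the new one to the multi-peer path.
            self.multi.extend(self.single_stream.take());
            self.multi.push(peer);
        }
        // Without this wake, a task parked on the lone stream would not notice the
        // new peer until that stream happened to wake it.
        if let Some(w) = self.parked.take() {
            w.wake();
        }
    }
}

/// Returns (wakes before promotion, wakes after, fast path still used, multi-peer set).
fn promotion_wakes_parked_task() -> (usize, usize, bool, Vec<u32>) {
    let counter = Arc::new(CountingWake(AtomicUsize::new(0)));
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);
    let mut slots = Slots::default();
    slots.insert(1);
    slots.poll_pending(&mut cx); // lone stream has nothing yet: task parks
    let before = counter.0.load(Ordering::SeqCst);
    slots.insert(2); // second peer: promote both and wake the parked task
    let after = counter.0.load(Ordering::SeqCst);
    (before, after, slots.single_stream.is_some(), slots.multi.clone())
}
```

The first insert leaves the wake count at 0; the second promotes both peers, clears the fast-path slot, and wakes the parked task exactly once.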
rgbkrk marked this pull request as ready for review May 29, 2026 18:00

Alexei-Kornienko (Collaborator) left a comment

Just an idea:
Would it be better to replace fair queue on the top level with an enum type that would have 3 variants:

  1. empty queue
  2. single client
  3. many clients

Each enum variant may have its own subtype implementing the basic logic,
and we can have several helper methods that cleanly transform the types into each other.

My main concern is that this type is way too complicated, and I would prefer to simplify it for ease of maintenance.

rgbkrk (Member, Author) commented May 29, 2026

> Would it be better to replace fair queue on the top level with an enum type that would have 3 variants:

That's a really good idea.
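A minimal sketch of the three-variant enum suggested above, assuming a HashMap plus VecDeque for the many-peer state. PeerQueue, Many and state are hypothetical names, and polling is left out so only the state transitions are shown. Each transition takes the old state with std::mem::replace and installs the new one, so the single-peer variant carries no multi-peer fields at all.

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;
use std::mem;

/// Hypothetical many-peer state: the map plus round-robin order used today.
struct Many<K, S> {
    streams: HashMap<K, S>,
    ready_queue: VecDeque<K>,
}

impl<K: Hash + Eq + Clone, S> Many<K, S> {
    fn add(&mut self, key: K, stream: S) {
        self.ready_queue.push_back(key.clone());
        self.streams.insert(key, stream);
    }
}

/// Sketch of the suggested top-level enum; each variant owns only the data it needs.
enum PeerQueue<K, S> {
    Empty,
    Single(K, S),
    Many(Many<K, S>),
}

impl<K: Hash + Eq + Clone, S> PeerQueue<K, S> {
    /// Promotion: Empty -> Single -> Many.
    fn insert(&mut self, key: K, stream: S) {
        *self = match mem::replace(self, PeerQueue::Empty) {
            PeerQueue::Empty => PeerQueue::Single(key, stream),
            PeerQueue::Single(k0, s0) => {
                let mut many = Many { streams: HashMap::new(), ready_queue: VecDeque::new() };
                many.add(k0, s0);
                many.add(key, stream);
                PeerQueue::Many(many)
            }
            PeerQueue::Many(mut many) => {
                many.add(key, stream);
                PeerQueue::Many(many)
            }
        };
    }

    /// Removal: Single -> Empty, and Many -> Empty once the last peer leaves.
    fn remove(&mut self, key: &K) -> Option<S> {
        match mem::replace(self, PeerQueue::Empty) {
            PeerQueue::Empty => None,
            PeerQueue::Single(k, s) if &k == key => Some(s),
            single @ PeerQueue::Single(..) => {
                *self = single; // different key: put the lone peer back
                None
            }
            PeerQueue::Many(mut many) => {
                let removed = many.streams.remove(key);
                many.ready_queue.retain(|k| k != key);
                if !many.streams.is_empty() {
                    *self = PeerQueue::Many(many);
                }
                removed
            }
        }
    }

    fn state(&self) -> &'static str {
        match self {
            PeerQueue::Empty => "empty",
            PeerQueue::Single(..) => "single",
            PeerQueue::Many(_) => "many",
        }
    }
}
```

Here removing the last peer returns to Empty, so the next connection gets the fast path again; whether Many should also demote to Single when one peer remains is a separate policy choice.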
