Add FairQueue single-peer fast path by rgbkrk · Pull Request #282 · zeromq/zmq.rs

rgbkrk · 2026-05-29T14:24:06Z

Summary

With a single connected peer, FairQueue still pays the full multi-peer bookkeeping cost on every message: ready-queue push/pop, queued-set updates, map remove/insert, and requeueing. The one-way large-payload IPC receive benchmarks pointed straight at this common receive-side cost rather than ROUTER identity or echo-send work. This stores the lone connected stream outside the ready-queue machinery and polls it directly while it remains the only stream, then promotes both streams back into the existing fair-queue path the moment a second peer connects.

Design decision

The fast path is a single_stream: Option<(K, Pin<Box<S>>)> slot on QueueInner. The safety boundary is the stream count:

state	where the stream lives	how it's polled
zero streams, then first insert	`single_stream` slot	polled directly with the task waker at the top of `poll_next`
second insert arrives	both moved into `streams` + `ready_queue`	existing multi-peer fair-queue loop, semantics unchanged
single stream disconnects (`Ready(None)`)	slot cleared	`on_disconnect` fired outside the lock, same as the multi-peer path
single stream removed via `remove`	slot cleared	n/a
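
The placement rules in the table can be sketched as follows. This is a simplified model, not the PR's code: it stores `S` directly instead of `Pin<Box<S>>`, omits the lock and the queued-set, and the `new`, `on_fast_path`, and `remove` shapes are assumptions made for illustration.

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

/// Simplified stand-in for the queue state; only stream placement is modeled.
struct QueueInner<K, S> {
    /// Holds the sole stream while exactly one has been inserted.
    single_stream: Option<(K, S)>,
    /// Multi-peer storage (the pre-existing path).
    streams: HashMap<K, S>,
    /// Keys waiting to be polled on the multi-peer path.
    ready_queue: VecDeque<K>,
}

impl<K: Hash + Eq + Clone, S> QueueInner<K, S> {
    fn new() -> Self {
        Self {
            single_stream: None,
            streams: HashMap::new(),
            ready_queue: VecDeque::new(),
        }
    }

    fn insert(&mut self, key: K, stream: S) {
        // Zero streams anywhere: the first insert takes the fast path.
        if self.streams.is_empty() && self.single_stream.is_none() {
            self.single_stream = Some((key, stream));
            return;
        }
        // A second stream arrived: promote the lone stream first.
        if let Some((k, s)) = self.single_stream.take() {
            self.ready_queue.push_back(k.clone());
            self.streams.insert(k, s);
        }
        self.ready_queue.push_back(key.clone());
        self.streams.insert(key, stream);
    }

    fn remove(&mut self, key: &K) -> Option<S> {
        // Removing the fast-path stream just clears the slot.
        if matches!(&self.single_stream, Some((k, _)) if k == key) {
            return self.single_stream.take().map(|(_, s)| s);
        }
        self.ready_queue.retain(|k| k != key);
        self.streams.remove(key)
    }

    fn on_fast_path(&self) -> bool {
        self.single_stream.is_some()
    }
}
```

In this model a queue that has been promoted stays on the multi-peer path even if it later drops back to one stream; only a completely empty queue hands its next insert to the slot.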

insert is the only promotion site. If a stream already sits in streams, a new insert never takes the fast path, so the fast path can only ever hold the sole stream. The direct poll happens before the bounded multi-peer loop and never touches ready_queue, so it composes with the existing spin fix (ready events are still bounded to those present at poll entry). queue_empty_poll folds single_stream.is_some() into the "stay Pending" check so an idle single stream does not get reported as end-of-stream.
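
A rough model of the poll ordering described above, using only `std`. The `Scripted` type, the `multi_peer_len` and `disconnects` fields, and the choice to report end-of-stream for an empty queue are all assumptions for illustration; the real `poll_next` takes a `Context`, holds a lock, and runs the bounded multi-peer loop where the comment indicates.

```rust
use std::collections::VecDeque;
use std::task::Poll;

/// Hypothetical stream stand-in: replays a scripted list of poll results.
struct Scripted(VecDeque<Poll<Option<u32>>>);

impl Scripted {
    fn poll(&mut self) -> Poll<Option<u32>> {
        // Once the script runs out, behave like an idle stream.
        self.0.pop_front().unwrap_or(Poll::Pending)
    }
}

struct Queue {
    single_stream: Option<Scripted>,
    /// Number of streams on the multi-peer path (that path is not modeled).
    multi_peer_len: usize,
    /// Stands in for invocations of `on_disconnect`.
    disconnects: usize,
}

impl Queue {
    fn poll_next(&mut self) -> Poll<Option<u32>> {
        // Fast path: poll the lone stream before anything else.
        if let Some(stream) = self.single_stream.as_mut() {
            match stream.poll() {
                Poll::Ready(Some(msg)) => return Poll::Ready(Some(msg)),
                Poll::Pending => return Poll::Pending,
                Poll::Ready(None) => {
                    // Stream ended: clear the slot and record the disconnect.
                    self.single_stream = None;
                    self.disconnects += 1;
                }
            }
        }
        // The bounded multi-peer loop would run here; it is omitted.
        self.queue_empty_poll()
    }

    /// An occupied slot counts as "not empty", so an idle single stream
    /// keeps the queue Pending instead of signalling end-of-stream.
    fn queue_empty_poll(&self) -> Poll<Option<u32>> {
        if self.single_stream.is_some() || self.multi_peer_len > 0 {
            Poll::Pending
        } else {
            Poll::Ready(None)
        }
    }
}
```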

This does add complexity to a core scheduling type. The benchmark numbers below are why it carries its weight. The promotion boundary keeps that complexity contained: anything past one peer runs the exact code that ran before.

Benchmark evidence

From the original measurement run (not re-measured here). Benchmark shape: one-way workloads that isolate receive cost from echo/reply.

DEALER/ROUTER one-way: DEALER sends BATCH_SIZE messages, ROUTER receives BATCH_SIZE, no echo
PUSH/PULL one-way: PUSH sends BATCH_SIZE, PULL receives BATCH_SIZE
transport ipc, payload 4096B
ZMQRS_BENCH_SAMPLE_SIZE=20, ZMQRS_BENCH_MEASUREMENT_MS=2000, ZMQRS_BENCH_WARMUP_MS=500

workload	before	after	elapsed reduction	throughput
`zmqrs D/R one-way ipc 4096`	`4.8989 ms`	`4.1979 ms`	~14.3%	~16.7% more
`zmqrs P/P one-way ipc 4096`	`5.2582 ms`	`4.1240 ms`	~21.6%	~27.5% more

Since each workload moves a fixed number of messages, the lower elapsed time corresponds directly to higher receive-side throughput.
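
For reference, both percentage columns can be derived from the before/after times alone. A small sketch of that arithmetic (the function names are illustrative and not part of the benchmark harness):

```rust
/// Percent reduction in elapsed time going from `before_ms` to `after_ms`.
fn elapsed_reduction_pct(before_ms: f64, after_ms: f64) -> f64 {
    (before_ms - after_ms) / before_ms * 100.0
}

/// Percent throughput gain, assuming the same number of messages is moved
/// in both runs, so throughput is inversely proportional to elapsed time.
fn throughput_gain_pct(before_ms: f64, after_ms: f64) -> f64 {
    (before_ms / after_ms - 1.0) * 100.0
}
```

The throughput gain is always the larger of the two figures because it is measured relative to the smaller (after) time.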

Validation

cargo test --lib fair_queue and cargo test --lib test_fair_queue: 9 passed, 0 failed
cargo clippy --all-targets -- --deny warnings: exit 0 (only the repo's pre-existing renamed-lint notes)

Recreates #271.

With a single connected peer, FairQueue still paid the full multi-peer bookkeeping cost on every message: ready-queue push/pop, queued-set updates, map remove/insert, and requeueing. The one-way large-payload IPC receive benchmarks pointed straight at this common receive-side cost rather than ROUTER identity or echo-send work. Store the lone connected stream outside the ready-queue machinery in a single_stream slot and poll it directly while it is the only stream. When a second stream is inserted, promote both streams back into the existing fair-queue path so multi-peer polling semantics are unchanged. Disconnect callback behavior is preserved on the single-stream path. This layers on top of the existing spin fix that bounds ready events to those present at poll entry; the fast path runs before the bounded multi-peer loop and does not touch the ready queue. Adds regression coverage for single-stream delivery, promotion when a second stream is inserted, and disconnect-callback handling on the single-stream path.

Add a regression test that a single stream left Pending on the fast path is promoted to the multi-peer path when a second peer connects, and that the task parked on the fast path is woken so a real executor re-polls. Document on the single_stream field that promotion is one-way (no demotion after peer churn).
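
The wake-on-promotion property can be illustrated with a counting waker from `std` alone. This is a reduced model, not the test added in the commit: `Queue`, `park_on_fast_path`, and `promotion_wakes_parked_task` are hypothetical names, and the queue is collapsed to a stream count plus a stored waker.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Wake, Waker};

/// Counts wake-ups; stands in for a real executor's re-poll scheduling.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Hypothetical reduction of the queue to what this property needs.
struct Queue {
    stream_count: usize,
    promoted: bool,
    parked: Option<Waker>,
}

impl Queue {
    /// Models a fast-path poll that returned Pending: keep the task's waker.
    fn park_on_fast_path(&mut self, waker: &Waker) {
        self.parked = Some(waker.clone());
    }

    /// Models `insert`: the second stream promotes the queue and wakes
    /// whichever task was parked on the fast path.
    fn insert(&mut self) {
        self.stream_count += 1;
        if self.stream_count == 2 {
            self.promoted = true;
            if let Some(waker) = self.parked.take() {
                waker.wake();
            }
        }
    }
}

/// Returns (promoted, number of wake-ups observed).
fn promotion_wakes_parked_task() -> (bool, usize) {
    let counter = Arc::new(CountingWaker(AtomicUsize::new(0)));
    let waker = Waker::from(counter.clone());
    let mut queue = Queue { stream_count: 0, promoted: false, parked: None };

    queue.insert(); // first peer: fast path
    queue.park_on_fast_path(&waker); // task sees Pending and parks
    queue.insert(); // second peer: promotion must wake the task

    (queue.promoted, counter.0.load(Ordering::SeqCst))
}
```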

Alexei-Kornienko

Just an idea:
Would it be better to replace fair queue on the top level with an enum type that would have 3 variants:

empty queue
single client
many clients

Each enum variant may have its own subtype implementing basic logic,
and we can have several helper methods that would cleanly transform the types into each other.

My main concern is that this type is way too complicated, and I would prefer to simplify it for ease of maintenance.
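
One way to picture the three-variant shape suggested above is the sketch below. It is an illustration only, not code from the PR or the crate: `FairQueueState` and its methods are hypothetical, streams are stored as a plain `S`, and polling is left out entirely so that only the transitions between variants are shown.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical top-level state: each variant could wrap its own subtype.
enum FairQueueState<K, S> {
    Empty,
    Single(K, S),
    Many(HashMap<K, S>),
}

impl<K: Hash + Eq, S> FairQueueState<K, S> {
    /// Consumes the current state and returns the state after an insert.
    fn insert(self, key: K, stream: S) -> Self {
        match self {
            Self::Empty => Self::Single(key, stream),
            // Same key again: replace the stream, stay single.
            Self::Single(k, _) if k == key => Self::Single(key, stream),
            // Second distinct key: promote to the multi-client variant.
            Self::Single(k, s) => Self::Many(HashMap::from([(k, s), (key, stream)])),
            Self::Many(mut map) => {
                map.insert(key, stream);
                Self::Many(map)
            }
        }
    }

    /// Consumes the current state and returns the state after a removal.
    /// `Many` is not demoted to `Single`, mirroring one-way promotion.
    fn remove(self, key: &K) -> Self {
        match self {
            Self::Single(k, _) if &k == key => Self::Empty,
            Self::Many(mut map) => {
                map.remove(key);
                if map.is_empty() { Self::Empty } else { Self::Many(map) }
            }
            other => other,
        }
    }

    fn len(&self) -> usize {
        match self {
            Self::Empty => 0,
            Self::Single(..) => 1,
            Self::Many(map) => map.len(),
        }
    }
}
```

Taking `self` by value makes every transition an explicit `match` arm, which is the maintenance benefit the suggestion is after; whether `Many` should ever collapse back to `Single` is a separate design choice.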

rgbkrk · 2026-05-29T20:11:02Z

> Would it be better to replace fair queue on the top level with an enum type that would have 3 variants:

That's a really good idea.

rgbkrk added 2 commits May 29, 2026 07:15

rgbkrk marked this pull request as ready for review May 29, 2026 18:00

Alexei-Kornienko reviewed May 29, 2026

rgbkrk wants to merge 2 commits into master from perf/fair-queue-single-peer
