Skip to content

Bugfix: RDMA client cannot fall back to TCP when RDMA is not enabled on the server#3350

Open
chenBright wants to merge 1 commit into
apache:masterfrom
chenBright:rdma_client_fallback
Open

Bugfix: RDMA client cannot fall back to TCP when RDMA is not enabled on the server#3350
chenBright wants to merge 1 commit into
apache:masterfrom
chenBright:rdma_client_fallback

Conversation

@chenBright

@chenBright chenBright commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Issue Number: resolve

Problem Summary:

When an RDMA-enabled client connects to a server that supports TCP only,
the client sends its handshake hello ("RDMA"/"RDM3" magic + body) and
then blocks waiting for the server hello. The server does not recognize those
leading bytes as any known protocol and simply closes the connection, so the
client only sees EOF on the socket and has no signal to downgrade to TCP.

What is changed and the side effects?

Changed:

A new server-side-only pseudo-protocol PROTOCOL_RDMA_HANDSHAKE
(registered ahead of PROTOCOL_HTTP so the protocol selector tries it first)
implements a tiny handshake state machine:

  1. Detect: If the first 4 bytes are "RDMA" or "RDM3", enter the fallback path;
    otherwise return TRY_OTHERS and let normal protocol parsing continue.

  2. Drain client hello: Read and drain the full v2 or v3 hello.

  3. Reply an un-negotiable hello:

  • v2: well-formed framing (correct magic + msg_len) but hello_ver = 0xFFFF,
    which makes the client's ValidHelloMessage() fail.
  • v3: a RdmaHello protobuf that parses cleanly (all required fields set) but fails
    ValidRdmaHello() because block_size = 0 < MIN_BLOCK_SIZE and qp_num = 0.
  • In both cases the client sets negotiated = false and downgrades to TCP on the
    same connection rather than erroring at the IO layer.
  1. Drain the client ACK: A per-socket parsing context records that we are now
    expecting the 4B handshake ACK; the next call drops those bytes, clears the
    context, and returns TRY_OTHERS so the subsequent real RPC payload goes
    through normal parsing.

Side effects:

  • Performance effects:

  • Breaking backward compatibility:


Check List:

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a downgrade-path bug where an RDMA-enabled client connecting to a TCP-only server could block waiting for an RDMA server hello and fail to fall back cleanly. It does so by adding a server-side pseudo-protocol that recognizes RDMA handshake magic, drains the client hello, replies an intentionally “un-negotiable” hello to trigger client-side downgrade to TCP on the same connection, then drains the client ACK and returns control to normal protocol parsing.

Changes:

  • Add a server-side rdma_handshake fallback protocol to enable RDMA-client → TCP-server downgrade on the same TCP connection.
  • Centralize RDMA handshake wire constants into a shared header and update RDMA handshake code/tests to use them.
  • Extend RawPacker/RawUnpacker with 16-bit and raw-bytes packing helpers and update v2 handshake serialization to use them.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
test/brpc_rdma_unittest.cpp Updates tests to use shared handshake constants and verifies RDMA-client → TCP-server calls no longer fail.
src/butil/raw_pack.h Adds pack16/unpack16 and pack/unpack raw-bytes helpers used by handshake serialization.
src/brpc/server.cpp Ensures rdma_handshake isn’t filtered out by enabled_protocols whitelist logic.
src/brpc/rdma/rdma_handshake.h Switches handshake constants to the shared constants header.
src/brpc/rdma/rdma_handshake.cpp Refactors v2 hello (de)serialization and uses shared constants for versions/sizes/magic.
src/brpc/rdma/rdma_handshake_constants.h New shared header for RDMA handshake wire constants (magic, lengths, versions, bounds, ACK format).
src/brpc/rdma/rdma_endpoint.cpp Uses shared constants for magic-length handling in server handshake path.
src/brpc/policy/rdma_handshake_fallback_protocol.h Declares the new server-side RDMA-handshake fallback protocol parser.
src/brpc/policy/rdma_handshake_fallback_protocol.cpp Implements the fallback handshake state machine using socket parsing context.
src/brpc/parse_result.h Adds missing stddef.h include (size_t).
src/brpc/options.proto Introduces PROTOCOL_RDMA_HANDSHAKE and changes protocol numeric ordering.
src/brpc/input_messenger.cpp Removes the old special-case RDMA “TRY_OTHERS” hack in input parsing.
src/brpc/global.cpp Registers the new rdma_handshake protocol at startup.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/butil/raw_pack.h
Comment thread src/butil/raw_pack.h
Comment thread src/brpc/policy/rdma_handshake_fallback_protocol.cpp
Comment thread src/brpc/options.proto
Comment thread src/brpc/server.cpp
@chenBright chenBright force-pushed the rdma_client_fallback branch from 556984f to 4d56434 Compare June 18, 2026 17:36
@chenBright chenBright force-pushed the rdma_client_fallback branch from 4d56434 to 7b86333 Compare June 19, 2026 06:37
@chenBright chenBright changed the title Bugfix: RDMA client cannot fall back to TCP when server does not recognize RDMA handshake Bugfix: RDMA client cannot fall back to TCP when RDMA is not enabled on the server Jun 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants