Skip to content

fix: return get_files patches as raw text to avoid double-encoding#2666

Open
rodboev wants to merge 2 commits into
github:mainfrom
rodboev:pr/get-files-raw-patches
Open

fix: return get_files patches as raw text to avoid double-encoding#2666
rodboev wants to merge 2 commits into
github:mainfrom
rodboev:pr/get-files-raw-patches

Conversation

@rodboev

@rodboev rodboev commented Jun 10, 2026

Copy link
Copy Markdown

Summary

Fixes get_files, which double-encodes per-file patch content by marshaling the entire file array to JSON (escaping newlines) and then wrapping that JSON string in a TextContent field that the MCP transport JSON-encodes again, roughly doubling escape overhead on patch content and pushing moderate PRs past token limits.

Why

GetPullRequestFiles passes the []MinimalPRFile slice (including each file's Patch string) to MarshalledTextResult, which calls json.Marshal to produce a JSON array, then wraps it in NewToolResultText. On the MCP wire, TextContent.Text is JSON-encoded a second time, so every \n in every patch becomes \\n (4 bytes instead of 1). For a ~73 KB diff, this inflates the payload to 125 KB+, exceeding tool response token limits on PRs that are not unusually large.

get_diff is not affected: it already returns raw diff text via NewToolResultText(string(raw)), which gets only the single unavoidable MCP wire encoding.

Fixes #2242

What changed

  • pkg/github/pullrequests.go: GetPullRequestFiles now returns a multi-content CallToolResult. Content[0] is a JSON array of file metadata (filename, status, additions, deletions, changes, previous_filename) without patches. Content[1..N] are raw TextContent blocks, one per file with a non-empty patch, containing the filename as a header followed by the patch text. Patches are no longer embedded inside a JSON string value, eliminating the double-encoding.
  • pkg/github/pullrequests_test.go: Test_GetPullRequestFiles verifies the multi-content response structure: metadata JSON in Content[0], raw patch text blocks in Content[1..N], and no patch field in the metadata JSON.

MCP impact

  • No tool or API changes
  • Tool schema or behavior changed
    The tool's response format changes from a single JSON text block with embedded patch strings to multiple text content blocks: metadata JSON plus raw patch text. Tool parameters, name, description, and annotations are unchanged, so toolsnaps and generated docs are unaffected.
  • New tool added

Prompts tested (tool changes only)

Security / limits

  • No security or limits impact
  • Auth / permissions considered
    The tool continues to use the existing pull request file API path and repo permission requirements; no new scope or permission check is added.
  • Data exposure, filtering, or token/size limits considered
    This returns the same file metadata and patch content in separate content blocks, reducing encoding overhead without adding fields or API calls.

Tool renaming

  • I am renaming tools as part of this PR (e.g. a part of a consolidation effort)
    • I have added the new tool aliases in deprecated_tool_aliases.go
  • I am not renaming tools as part of this PR

Lint & tests

  • Linted locally with ./script/lint
    Not run locally.

  • Tested locally with ./script/test
    Not run locally. The full race suite was not run locally.

  • go test -race ./pkg/github -run Test_GetPullRequestFiles - covers: multi-content response structure, metadata-only JSON without patches, raw patch text blocks with filename headers, pagination, and error cases.

Docs

  • Not needed
    Tool parameters and schema are unchanged. The response format change is behavioral, not schema-level. Toolsnaps are unaffected.

@rodboev rodboev requested a review from a team as a code owner June 10, 2026 20:31
@rodboev rodboev force-pushed the pr/get-files-raw-patches branch from 439fd3a to 1503978 Compare June 11, 2026 19:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

get_diff / get_files: JSON serialization inflates diff payload far beyond token limits

1 participant