feat: add ref (commit SHA) to search and symbol navigation results #1307

Open
Amresh-01 wants to merge 1 commit into sourcebot-dev:main from Amresh-01:feature/add-ref-to-search-results

Amresh-01 commented Jun 13, 2026

Fixes: #1173

[FR] Add ref to search results
#1173

Description

This PR addresses issue #1173 by returning the git ref (commit SHA) in search results, symbol definitions, and symbol references. Clients and AI agents can then use that ref to fetch file contents at the exact matching commit via /api/source.

Additionally, this PR fixes local repository setup and indexing bugs on Windows, including failures when paths contain spaces.

Key Changes

1. Schema Enhancements

  • Added ref as an optional string in search (searchFileSchema) and symbol navigation (findRelatedSymbolsResponseSchema) schemas.
  • Regenerated OpenAPI specs (sourcebot-public.openapi.json) to reflect the new ref field.

2. Result Mapping

  • Mapped the Zoekt search version field (which stores the commit SHA) to the ref field in search response parsing.
  • Passed the mapped ref cleanly through the definitions and references endpoints.
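
A minimal sketch of the mapping described above, for reviewers who want to see the shape of the change. The interfaces `ZoektFileMatch` and `SearchFile` and the helper `toSearchFile` are hypothetical, simplified stand-ins and not the actual types in zoektSearcher.ts; only the idea of copying `version` into an optional `ref` comes from this PR.

```typescript
// Hypothetical, simplified shapes; the real Zoekt and Sourcebot types carry more fields.
interface ZoektFileMatch {
    fileName: string;
    repository: string;
    version?: string; // commit SHA recorded by Zoekt for the matched file
}

interface SearchFile {
    fileName: string;
    repository: string;
    ref?: string; // optional, so existing clients that ignore it are unaffected
}

// Copy `version` into `ref`, leaving the key out entirely when it is missing or empty.
function toSearchFile(match: ZoektFileMatch): SearchFile {
    const file: SearchFile = {
        fileName: match.fileName,
        repository: match.repository,
    };
    if (match.version) {
        file.ref = match.version;
    }
    return file;
}
```

Omitting the key instead of emitting an empty string is a design choice in this sketch, made so that an optional schema field stays truly optional.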

3. Agent Tools Update

  • Updated the findSymbolDefinitions and findSymbolReferences agent tools to use file.ref as their revision instead of hardcoding "HEAD".
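
The fallback behaviour of the tools can be sketched roughly as follows. `SymbolFile`, `DEFAULT_REVISION`, and `revisionFor` are illustrative names assumed for this example, not identifiers from the tool sources.

```typescript
// Assumed default, matching the previously hardcoded revision.
const DEFAULT_REVISION = "HEAD";

// Hypothetical minimal shape of a file entry returned by symbol navigation.
interface SymbolFile {
    fileName: string;
    ref?: string;
}

// Prefer the per-file commit SHA; fall back when it is missing or empty.
function revisionFor(file: SymbolFile, fallback: string = DEFAULT_REVISION): string {
    return file.ref || fallback;
}
```

Using `||` rather than `??` means an empty-string ref also falls back to the default, which seems the safer reading for a revision value.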

4. Windows Compatibility Bug Fixes

  • Used fileURLToPath with a fallback mechanism in repoCompileUtils.ts and shared/src/utils.ts to prevent path parsing errors (such as Path /C:/... does not exist) on Windows.
  • Added double quotes around the -index path and repository source path in zoekt-git-index CLI commands to support directories containing spaces (e.g. Open Source).
  • Replaced repository path.join calls with path.posix.join to enforce forward-slash (/) separators consistently across all operating systems.
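
A rough sketch of the two path techniques listed above, using only Node's built-in `node:url` and `node:path` modules. The helper names `fileUrlToFolderPath` and `repoName` are hypothetical, and the fallback shown (decoding the raw URL pathname) is an assumption about what a reasonable fallback might look like, not a copy of the code in this PR.

```typescript
import { fileURLToPath } from "node:url";
import { posix } from "node:path";

// Convert a file:// clone URL to a filesystem path.
// fileURLToPath decodes percent-escapes (e.g. %20) and, on Windows,
// strips the leading slash in front of the drive letter.
function fileUrlToFolderPath(cloneUrl: string): string {
    try {
        return fileURLToPath(cloneUrl);
    } catch (_err) {
        // Assumed fallback: decode the raw pathname if the URL is rejected
        // on the current platform.
        return decodeURIComponent(new URL(cloneUrl).pathname);
    }
}

// Build a repo name with forward slashes regardless of the host OS.
function repoName(host: string, ...segments: string[]): string {
    return posix.join(host, ...segments);
}

console.log(repoName("github.com", "Amresh-01", "sourcebot")); // github.com/Amresh-01/sourcebot
console.log(fileUrlToFolderPath("file:///tmp/Open%20Source/repo"));
```

The second `console.log` prints a platform-dependent path, so no expected output is given for it.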

Verification & Testing

  • Automated Tests: Ran the full workspace tests (yarn test). All 884 tests passed successfully.
  • Integration Test: Verified that the local server successfully indexes local directories on Windows.
  • API Test: Verified that queries to /api/find_references return the correct commit SHA (ref) in the JSON output:
    "files": [
      {
        "fileName": "packages\\web\\src\\features\\codeNav\\api.ts",
        "repository": "github.com\\Amresh-01\\sourcebot",
        "ref": "4ec87e1cc721341d51c1b9e8e03fbe7ea8289f3b"
      }
    ]
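
A client consuming this output might want to confirm that `ref` is a full commit SHA before treating it as a stable reference. The check below is a sketch under the assumption that the repository uses SHA-1 object names (40 lowercase hex characters); a SHA-256 repository would need a different length. `isFullCommitSha` is a hypothetical helper.

```typescript
// Assumption: SHA-1 object names, i.e. exactly 40 lowercase hex characters.
const FULL_SHA1 = /^[0-9a-f]{40}$/;

// Returns true only for a complete commit SHA, not for branch names,
// abbreviated SHAs, or a missing ref.
function isFullCommitSha(ref: string | undefined): boolean {
    return ref !== undefined && FULL_SHA1.test(ref);
}

console.log(isFullCommitSha("4ec87e1cc721341d51c1b9e8e03fbe7ea8289f3b")); // true
console.log(isFullCommitSha("HEAD")); // false
```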
    
    

Summary by CodeRabbit

  • New Features

    • Public search and symbol APIs now include a per-file git commit SHA ("ref") in responses.
  • Bug Fixes

    • More consistent cross-platform path handling and safer command-line argument quoting.
  • Tests

    • Added tests verifying symbol search/definition behavior includes file commit refs.

coderabbitai Bot commented Jun 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 61d4221f-cad1-49f8-a46c-7cc9efdc17ea

📥 Commits

Reviewing files that changed from the base of the PR and between cbbe225 and 22700d9.

📒 Files selected for processing (11)
  • docs/api-reference/sourcebot-public.openapi.json
  • packages/backend/src/repoCompileUtils.ts
  • packages/backend/src/zoekt.ts
  • packages/shared/src/utils.ts
  • packages/web/src/features/codeNav/api.test.ts
  • packages/web/src/features/codeNav/api.ts
  • packages/web/src/features/codeNav/types.ts
  • packages/web/src/features/search/types.ts
  • packages/web/src/features/search/zoektSearcher.ts
  • packages/web/src/features/tools/findSymbolDefinitions.ts
  • packages/web/src/features/tools/findSymbolReferences.ts
🚧 Files skipped from review as they are similar to previous changes (10)
  • packages/web/src/features/codeNav/api.ts
  • packages/web/src/features/tools/findSymbolDefinitions.ts
  • packages/web/src/features/search/zoektSearcher.ts
  • packages/web/src/features/tools/findSymbolReferences.ts
  • docs/api-reference/sourcebot-public.openapi.json
  • packages/shared/src/utils.ts
  • packages/web/src/features/codeNav/types.ts
  • packages/web/src/features/codeNav/api.test.ts
  • packages/backend/src/zoekt.ts
  • packages/backend/src/repoCompileUtils.ts

Walkthrough

This PR propagates git ref/commit SHA information across the system and standardizes repository path handling. It updates public API schemas to include ref fields, normalizes all repository path derivation to use POSIX path joining, fixes file:// URL-to-filesystem path conversion, and threads the ref value through search responses, CodeNav APIs, and symbol tools with comprehensive tests.

Changes

Git Ref Propagation and Path Normalization

| Layer | File(s) | Summary |
|---|---|---|
| Public API Response Schemas with ref | docs/api-reference/sourcebot-public.openapi.json | PublicSearchResponse and PublicFindSymbolsResponse schemas add optional ref string field representing the git commit SHA for each file chunk and file item respectively. |
| Backend Repository Path Normalization | packages/backend/src/repoCompileUtils.ts | All repository compile functions (GitHub, GitLab, Gitea, Gerrit, Bitbucket, generic, Azure DevOps) switch to path.posix.join for consistent forward-slash repo names. File:// URLs derive normalized folderPath via fileURLToPath with try/catch fallback, used for globbing and logging. |
| Generic git host file:// path handling | packages/backend/src/repoCompileUtils.ts | Adds a file://-specific compilation path that converts file:// URLs to native filesystem paths, normalizes separators, and uses that folderPath in glob patterns and log messages. |
| File URL-to-Filesystem Path Conversion | packages/shared/src/utils.ts | getRepoPath utility correctly converts file: protocol clone URLs to filesystem paths using fileURLToPath instead of pathname, with proper character normalization and JSDoc. |
| Zoekt Command Argument Quoting | packages/backend/src/zoekt.ts | zoekt-git-index command construction quotes all dynamic arguments (index cache directory, branches, repository path) for shell robustness; JSDoc added for indexGitRepository. |
| Search Response Processing with ref Field | packages/web/src/features/search/types.ts, packages/web/src/features/search/zoektSearcher.ts | searchFileSchema and transformZoektSearchResponse add ref field sourced from zoekt file.version, propagating commit SHA through the search pipeline. |
| CodeNav API Integration with ref and Tests | packages/web/src/features/codeNav/types.ts, packages/web/src/features/codeNav/api.ts, packages/web/src/features/codeNav/api.test.ts | findRelatedSymbolsResponseSchema adds optional ref field, parseRelatedSymbolsSearchResponse maps ref in file results, and Vitest suite verifies both findSearchBasedSymbolReferences and findSearchBasedSymbolDefinitions propagate commit SHA from search responses. |
| Tool Metadata with Per-File Revision | packages/web/src/features/tools/findSymbolDefinitions.ts, packages/web/src/features/tools/findSymbolReferences.ts | find_symbol_definitions and find_symbol_references now record each file's revision from file.ref when available, falling back to tool default (HEAD), enabling correct commit SHA in tool metadata and derived sources. |

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly Related PRs

  • sourcebot-dev/sourcebot#615: Both PRs modify the web zoekt search response transformation; #615 refactors zoekt parsing while this PR extends the transformed file shape with ref.
  • sourcebot-dev/sourcebot#1066: Modifies PublicFindSymbolsResponse and other public API schemas that are extended with ref fields in this PR.
  • sourcebot-dev/sourcebot#813: Also threads ref (commit SHA) through file/source retrieval; changes may overlap in callers and UI usage of ref.

Suggested Reviewers

  • msukkari
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and concisely summarizes the main change: adding git refs (commit SHAs) to search and symbol navigation results, which aligns with the core functionality across all modified files. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/backend/src/zoekt.ts (1)

17-33: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use execFile/spawn args instead of shell-string exec to prevent command injection.

Line 32 still executes a shell command string assembled from dynamic inputs (Lines 23, 27, 28). Quoting alone is not a complete boundary if inputs contain shell-significant characters. Please pass arguments as an array via execFile (or spawn) to eliminate shell parsing.

Suggested fix
-import { exec } from "child_process";
+import { execFile } from "child_process";
@@
-    const command = [
-        'zoekt-git-index',
-        '-allow_missing_branches',
-        `-index "${INDEX_CACHE_DIR}"`,
-        `-max_trigram_count ${settings.maxTrigramCount}`,
-        `-file_limit ${settings.maxFileSize}`,
-        `-branches "${revisions.join(',')}"`,
-        `-tenant_id ${repo.orgId}`,
-        `-repo_id ${repo.id}`,
-        `-shard_prefix_override ${shardPrefix}`,
-        ...largeFileGlobPatterns.map((pattern) => `-large_file "${pattern}"`),
-        `"${repoPath}"`
-    ].join(' ');
+    const args = [
+        '-allow_missing_branches',
+        '-index', INDEX_CACHE_DIR,
+        '-max_trigram_count', String(settings.maxTrigramCount),
+        '-file_limit', String(settings.maxFileSize),
+        '-branches', revisions.join(','),
+        '-tenant_id', String(repo.orgId),
+        '-repo_id', String(repo.id),
+        '-shard_prefix_override', shardPrefix,
+        ...largeFileGlobPatterns.flatMap((pattern) => ['-large_file', pattern]),
+        repoPath,
+    ];
@@
-        exec(command, { signal }, (error, stdout, stderr) => {
+        execFile('zoekt-git-index', args, { signal }, (error, stdout, stderr) => {
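
To illustrate why an argument array sidesteps the quoting problem, here is a small self-contained sketch. It uses the current Node binary (`process.execPath`) as a stand-in for zoekt-git-index, since that CLI may not be installed; the value `trickyArg` is made up for the demonstration.

```typescript
import { execFileSync } from "node:child_process";

// A made-up value containing a space and shell metacharacters,
// similar in spirit to a Windows path with spaces.
const trickyArg = "C:/Open Source/repo; echo injected";

// execFileSync passes each array element to the child as one argv entry.
// No shell parses the string, so the space and ';' need no quoting.
// With `node -e <script> <arg>`, the extra argument appears as process.argv[1].
const echoed = execFileSync(
    process.execPath,
    ["-e", "process.stdout.write(process.argv[1])", trickyArg],
    { encoding: "utf8" },
);

console.log(echoed === trickyArg); // true
```

The child receives the value byte-for-byte as a single argument, which is the property the suggested `execFile` change relies on.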
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/backend/src/zoekt.ts` around lines 17 - 33, The code currently
builds a shell command string from dynamic inputs (INDEX_CACHE_DIR,
settings.maxTrigramCount, settings.maxFileSize, revisions, repo.orgId, repo.id,
shardPrefix, largeFileGlobPatterns, repoPath) and calls exec, which risks
command injection; change to use execFile or spawn by constructing an args array
instead of a single command string (e.g., args = ['-allow_missing_branches',
'-index', INDEX_CACHE_DIR, '-max_trigram_count',
String(settings.maxTrigramCount), '-file_limit', String(settings.maxFileSize),
'-branches', revisions.join(','), '-tenant_id', String(repo.orgId), '-repo_id',
String(repo.id), '-shard_prefix_override', shardPrefix,
...largeFileGlobPatterns.flatMap(p => ['-large_file', p]), repoPath]) and call
execFile('zoekt-git-index', args, { signal }, (error, stdout, stderr) => ...) so
all dynamic values are passed as argv items (no shell parsing) and preserve the
same callback behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: cdcfd05c-c5ce-42b8-ae9d-1c3e82eaf561

📥 Commits

Reviewing files that changed from the base of the PR and between 4ec87e1 and cbbe225.

📒 Files selected for processing (11)
  • docs/api-reference/sourcebot-public.openapi.json
  • packages/backend/src/repoCompileUtils.ts
  • packages/backend/src/zoekt.ts
  • packages/shared/src/utils.ts
  • packages/web/src/features/codeNav/api.test.ts
  • packages/web/src/features/codeNav/api.ts
  • packages/web/src/features/codeNav/types.ts
  • packages/web/src/features/search/types.ts
  • packages/web/src/features/search/zoektSearcher.ts
  • packages/web/src/features/tools/findSymbolDefinitions.ts
  • packages/web/src/features/tools/findSymbolReferences.ts

@Amresh-01 Amresh-01 force-pushed the feature/add-ref-to-search-results branch from cbbe225 to 22700d9 Compare June 13, 2026 09:51