LCORE-1613: RAG tool calls loop indefinitely#1998
Conversation
ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (7)
📜 Recent review details⏰ Context from checks skipped due to timeout. (7)
🧰 Additional context used📓 Path-based instructions (4)src/**/*.py📄 CodeRabbit inference engine (AGENTS.md)
Files:
src/app/**/*.py📄 CodeRabbit inference engine (AGENTS.md)
Files:
src/models/**/*.py📄 CodeRabbit inference engine (AGENTS.md)
Files:
tests/**/*.py📄 CodeRabbit inference engine (AGENTS.md)
Files:
🧠 Learnings (5)📚 Learning: 2026-02-23T14:56:59.186ZApplied to files:
📚 Learning: 2026-06-24T13:45:37.249ZApplied to files:
📚 Learning: 2026-04-06T20:18:07.852ZApplied to files:
📚 Learning: 2026-01-12T10:58:40.230ZApplied to files:
📚 Learning: 2026-02-25T07:46:33.545ZApplied to files:
🛑 Comments failed to post (1)
🔇 Additional comments (6)
WalkthroughThe PR adds optional inference limit fields to ChangesResponses inference limits
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
| api_params.max_infer_iters = configuration.inference.max_infer_iters | ||
| if "max_tool_calls" not in original_request.model_fields_set: | ||
| api_params.max_tool_calls = configuration.inference.max_tool_calls | ||
|
|
There was a problem hiding this comment.
It's just a nit, but could you please follow the same pattern as other parameter overrides do. Namely, override attributes of updated_request (if not explicitly set) above the model_validate command.
c16c5d3 to
de9d863
Compare
|
/ok-to-test |
|
@arin-deloatch please fix unit tests, config asserts should contain default values of new config options you created |
de9d863 to
59edfd1
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/unit/utils/test_responses.py (1)
1983-2029: 📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick winAssert the propagated inference defaults in a happy-path test.
These tests now instantiate
InferenceConfiguration(), but they still don't verify thatprepare_responses_params()actually forwardsmax_infer_itersandmax_tool_calls. A regression dropping either field would still pass here.Suggested assertion update
result = await prepare_responses_params( mock_client, query_request, None, "token" ) assert result.input == "test" assert result.model == "provider1/model1" assert result.conversation == "llama_conv1" + assert result.max_infer_iters == 10 + assert result.max_tool_calls == 30🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/utils/test_responses.py` around lines 1983 - 2029, The happy-path tests for prepare_responses_params currently set configuration.inference but do not assert that its defaults are propagated, so add assertions in the existing prepare_responses_params test cases to verify the returned params include the inference values for max_infer_iters and max_tool_calls. Use the prepare_responses_params symbol and the InferenceConfiguration setup in these tests to confirm both fields are forwarded in the result.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/models/config.py`:
- Around line 2335-2337: Reject blank model_id values in the config model so
malformed guardrail config is caught during load. Update the Guard/Config
Pydantic model that defines model_id to add a non-empty validation check using a
field_validator (or model_validator if that’s the existing pattern), ensuring
empty or whitespace-only strings are rejected. Keep the validation close to the
model_id field so invalid config cannot pass parsing and only valid IDs reach
later model usage.
---
Outside diff comments:
In `@tests/unit/utils/test_responses.py`:
- Around line 1983-2029: The happy-path tests for prepare_responses_params
currently set configuration.inference but do not assert that its defaults are
propagated, so add assertions in the existing prepare_responses_params test
cases to verify the returned params include the inference values for
max_infer_iters and max_tool_calls. Use the prepare_responses_params symbol and
the InferenceConfiguration setup in these tests to confirm both fields are
forwarded in the result.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: ad61c2fd-da64-4c23-a4e4-859bcc4765ef
📒 Files selected for processing (7)
docs/openapi.jsonsrc/app/endpoints/responses.pysrc/models/config.pysrc/utils/responses.pytests/unit/models/config/test_dump_configuration.pytests/unit/models/config/test_inference_configuration.pytests/unit/utils/test_responses.py
📜 Review details
⏰ Context from checks skipped due to timeout. (7)
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (4)
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules:from authentication import get_auth_dependency
Llama Stack imports: Usefrom llama_stack_client import AsyncLlamaStackClient
Checkconstants.pyfor shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Uselogger = get_logger(__name__)fromlog.pyfor module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Useasync deffor I/O operations and external API calls
Use standard log levels with clear purposes:debug()for diagnostic info,info()for program execution,warning()for unexpected events,error()for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes:Configuration,Error/Exception,Resolver,Interface
Abstract classes must use ABC with@abstractmethoddecorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes
Files:
src/utils/responses.pysrc/app/endpoints/responses.pysrc/models/config.py
src/app/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/app/**/*.py: FastAPI dependencies: Import fromfastapimodule forAPIRouter,HTTPException,Request,status,Depends
Use FastAPIHTTPExceptionwith appropriate status codes for API endpoints and handleAPIConnectionErrorfrom Llama Stack
Files:
src/app/endpoints/responses.py
src/models/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Pydantic models must use
@model_validatorand@field_validatorfor validation and complete type annotations for all attributes, avoidingAnytype
Files:
src/models/config.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Usepytest.mark.asynciomarker for async tests
Files:
tests/unit/models/config/test_inference_configuration.pytests/unit/utils/test_responses.pytests/unit/models/config/test_dump_configuration.py
🧠 Learnings (5)
📚 Learning: 2026-02-23T14:56:59.186Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-192
Timestamp: 2026-02-23T14:56:59.186Z
Learning: In the lightspeed-stack codebase (lightspeed-core/lightspeed-stack), do not enforce de-duplication of duplicate client.models.list() calls in model selection flows (e.g., in src/utils/responses.py prepare_responses_params). These calls are considered relatively cheap and removing duplicates could add unnecessary complexity to the flow. Apply this guideline specifically to this file/context unless similar performance characteristics and design decisions are documented elsewhere.
Applied to files:
src/utils/responses.py
📚 Learning: 2026-06-24T13:45:37.249Z
Learnt from: Jdubrick
Repo: lightspeed-core/lightspeed-stack PR: 1971
File: src/utils/markdown_repair.py:31-36
Timestamp: 2026-06-24T13:45:37.249Z
Learning: In the lightspeed-stack repository, docstrings must use the section header name "Parameters:" (not "Args:") for function arguments, even if the project references Google Python docstring conventions. Ensure docstrings follow the project’s established "Parameters:" header format for any documented function parameters.
Applied to files:
src/utils/responses.pysrc/app/endpoints/responses.pysrc/models/config.pytests/unit/models/config/test_inference_configuration.pytests/unit/utils/test_responses.pytests/unit/models/config/test_dump_configuration.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.
Applied to files:
src/app/endpoints/responses.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.
Applied to files:
src/models/config.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.
Applied to files:
src/models/config.py
🔇 Additional comments (6)
src/models/config.py (1)
681-682: LGTM!Also applies to: 1732-1750
docs/openapi.json (1)
13618-13644: LGTM!tests/unit/models/config/test_inference_configuration.py (1)
95-160: LGTM!tests/unit/models/config/test_dump_configuration.py (1)
166-167: LGTM!Also applies to: 380-381, 738-739, 995-996, 1232-1233, 1464-1465, 1841-1842, 2064-2065, 2287-2288, 2517-2518
src/utils/responses.py (1)
432-433: LGTM!src/app/endpoints/responses.py (1)
467-470: LGTM!
There was a problem hiding this comment.
Caution
Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/unit/utils/test_responses.py (1)
1983-2029: 📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick winAssert the propagated inference defaults in a happy-path test.
These tests now instantiate
InferenceConfiguration(), but they still don't verify thatprepare_responses_params()actually forwardsmax_infer_itersandmax_tool_calls. A regression dropping either field would still pass here.Suggested assertion update
result = await prepare_responses_params( mock_client, query_request, None, "token" ) assert result.input == "test" assert result.model == "provider1/model1" assert result.conversation == "llama_conv1" + assert result.max_infer_iters == 10 + assert result.max_tool_calls == 30🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/utils/test_responses.py` around lines 1983 - 2029, The happy-path tests for prepare_responses_params currently set configuration.inference but do not assert that its defaults are propagated, so add assertions in the existing prepare_responses_params test cases to verify the returned params include the inference values for max_infer_iters and max_tool_calls. Use the prepare_responses_params symbol and the InferenceConfiguration setup in these tests to confirm both fields are forwarded in the result.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/models/config.py`:
- Around line 2335-2337: Reject blank model_id values in the config model so
malformed guardrail config is caught during load. Update the Guard/Config
Pydantic model that defines model_id to add a non-empty validation check using a
field_validator (or model_validator if that’s the existing pattern), ensuring
empty or whitespace-only strings are rejected. Keep the validation close to the
model_id field so invalid config cannot pass parsing and only valid IDs reach
later model usage.
---
Outside diff comments:
In `@tests/unit/utils/test_responses.py`:
- Around line 1983-2029: The happy-path tests for prepare_responses_params
currently set configuration.inference but do not assert that its defaults are
propagated, so add assertions in the existing prepare_responses_params test
cases to verify the returned params include the inference values for
max_infer_iters and max_tool_calls. Use the prepare_responses_params symbol and
the InferenceConfiguration setup in these tests to confirm both fields are
forwarded in the result.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: ad61c2fd-da64-4c23-a4e4-859bcc4765ef
📒 Files selected for processing (7)
docs/openapi.jsonsrc/app/endpoints/responses.pysrc/models/config.pysrc/utils/responses.pytests/unit/models/config/test_dump_configuration.pytests/unit/models/config/test_inference_configuration.pytests/unit/utils/test_responses.py
📜 Review details
🔇 Additional comments (6)
src/models/config.py (1)
681-682: LGTM!Also applies to: 1732-1750
docs/openapi.json (1)
13618-13644: LGTM!tests/unit/models/config/test_inference_configuration.py (1)
95-160: LGTM!tests/unit/models/config/test_dump_configuration.py (1)
166-167: LGTM!Also applies to: 380-381, 738-739, 995-996, 1232-1233, 1464-1465, 1841-1842, 2064-2065, 2287-2288, 2517-2518
src/utils/responses.py (1)
432-433: LGTM!src/app/endpoints/responses.py (1)
467-470: LGTM!
🛑 Comments failed to post (1)
src/models/config.py (1)
2335-2337: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Reject blank
model_idvalues at config load.
straccepts"", so a malformed guardrail config survives parsing and fails later when the model is used. Add non-empty validation here so invalid config is rejected early.Proposed fix
- model_id: str = Field( - ..., title="Model id", description="The model_id to use for the guard" - ) + model_id: str = Field( + ..., + min_length=1, + title="Model id", + description="The model_id to use for the guard", + )As per coding guidelines,
src/models/**/*.py: "Pydantic models must use@model_validatorand@field_validatorfor validation and complete type annotations for all attributes, avoidingAnytype`."📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.model_id: str = Field( ..., min_length=1, title="Model id", description="The model_id to use for the guard", )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/models/config.py` around lines 2335 - 2337, Reject blank model_id values in the config model so malformed guardrail config is caught during load. Update the Guard/Config Pydantic model that defines model_id to add a non-empty validation check using a field_validator (or model_validator if that’s the existing pattern), ensuring empty or whitespace-only strings are rejected. Keep the validation close to the model_id field so invalid config cannot pass parsing and only valid IDs reach later model usage.Source: Coding guidelines
|
/ok-to-test |
Description
When using small models with limited context windows (e.g., Llama 3.1 8B on vLLM), RAG tool calls can loop indefinitely. The model repeatedly searches for a specific piece of text, each iteration appending tokens to the context until the model's context window is exceeded, resulting in a 400 error:
Error code: 400 - This model's maximum context ength is 113920 tokens. However, your request has 116873 input tokens.The
max_infer_itersandmax_tool_callsfields already existed as optional per-request parameters on the Responses API, but both defaulted to None (no limit). There was no server-side mechanism for operators to set safety defaults, so unless a client explicitly set these values, nothing prevented the runaway loop.Changes
Added
max_infer_iters(default 10) andmax_tool_calls(default 30) as configurable fields onInferenceConfigurationinsrc/models/config.py. Operators can override these in their YAML config or set them to None to disable the limit. Per-request values from clients always take precedence.Applied config defaults in
prepare_responses_params()for the/v1/query,/v1/streaming_query, and A2A endpoints.Applied config defaults in
responses_endpoint_handler()for the/v1/responses endpoint, only when the client omits the values.Added 10 unit tests for the new config fields (default values, positive int acceptance, zero/negative rejection, None acceptance).
Configuration example
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
uv run pytest tests/unit/models/config/test_inference_configuration.py -vuv run pytest tests/unit/app/endpoints/test_responses.py -vuv run pytest tests/unit/utils/test_responses.py -vuv run make formatanduv run make verifypass (only pre-existing mypy errors in test_container_lifecycle.py).Test Regressions
8 existing tests in
tests/unit/utils/test_responses.pysetmock_config.inference = None, which causedAttributeErrorwhen the new code accessedconfiguration.inference.max_infer_iters. Updated these mocks to use a realInferenceConfiguration()instance.Summary by CodeRabbit
New Features
Bug Fixes
Tests