Skip to content

LCORE-1613: RAG tool calls loop indefinitely#1998

Open
arin-deloatch wants to merge 5 commits into
lightspeed-core:mainfrom
arin-deloatch:bug/LCORE-1613
Open

LCORE-1613: RAG tool calls loop indefinitely#1998
arin-deloatch wants to merge 5 commits into
lightspeed-core:mainfrom
arin-deloatch:bug/LCORE-1613

Conversation

@arin-deloatch

@arin-deloatch arin-deloatch commented Jun 25, 2026

Copy link
Copy Markdown

Description

When using small models with limited context windows (e.g., Llama 3.1 8B on vLLM), RAG tool calls can loop indefinitely. The model repeatedly searches for a specific piece of text, each iteration appending tokens to the context until the model's context window is exceeded, resulting in a 400 error:

Error code: 400 - This model's maximum context ength is 113920 tokens. However, your request has 116873 input tokens.

The max_infer_iters and max_tool_calls fields already existed as optional per-request parameters on the Responses API, but both defaulted to None (no limit). There was no server-side mechanism for operators to set safety defaults, so unless a client explicitly set these values, nothing prevented the runaway loop.

Changes

Added max_infer_iters (default 10) and max_tool_calls (default 30) as configurable fields on InferenceConfiguration in src/models/config.py. Operators can override these in their YAML config or set them to None to disable the limit. Per-request values from clients always take precedence.

Applied config defaults in prepare_responses_params() for the /v1/query, /v1/streaming_query, and A2A endpoints.

Applied config defaults in responses_endpoint_handler() for the /v1/responses endpoint, only when the client omits the values.

Added 10 unit tests for the new config fields (default values, positive int acceptance, zero/negative rejection, None acceptance).

Configuration example

  inference:

    default_model: "meta-llama/Llama-3.1-8B-Instruct"

    default_provider: "vllm"

    max_infer_iters: 10

    max_tool_calls: 30

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Claude Code (Claude Opus 4.6)

Related Tickets & Documents

  • Closes LCORE-1613

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  1. Run config tests: uv run pytest tests/unit/models/config/test_inference_configuration.py -v
  2. Run responses endpoint tests: uv run pytest tests/unit/app/endpoints/test_responses.py -v
  3. Run responses utils tests: uv run pytest tests/unit/utils/test_responses.py -v
  4. All 286 tests pass across the 3 affected test files.
  5. uv run make format and uv run make verify pass (only pre-existing mypy errors in test_container_lifecycle.py).

Test Regressions

8 existing tests in tests/unit/utils/test_responses.py set mock_config.inference = None, which caused AttributeError when the new code accessed configuration.inference.max_infer_iters. Updated these mocks to use a real InferenceConfiguration() instance.

Summary by CodeRabbit

  • New Features

    • Added configurable server-side limits for inference iterations and tool calls, including defaults (10 and 30) and the ability to disable each by setting to null.
  • Bug Fixes

    • Requests now automatically apply the server-side default limits when those values aren’t explicitly provided, improving consistent behavior during request processing.
  • Tests

    • Expanded unit tests to validate defaults, positive integer constraints (reject zero/negative), and null disabling for the new configuration fields.

@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ad61c2fd-da64-4c23-a4e4-859bcc4765ef

📥 Commits

Reviewing files that changed from the base of the PR and between c16c5d3 and 59edfd1.

📒 Files selected for processing (7)
  • docs/openapi.json
  • src/app/endpoints/responses.py
  • src/models/config.py
  • src/utils/responses.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/models/config/test_inference_configuration.py
  • tests/unit/utils/test_responses.py
📜 Recent review details
⏰ Context from checks skipped due to timeout. (7)
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (4)
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Llama Stack imports: Use from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
Use standard log levels with clear purposes: debug() for diagnostic info, info() for program execution, warning() for unexpected events, error() for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes must use ABC with @abstractmethod decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes

Files:

  • src/utils/responses.py
  • src/app/endpoints/responses.py
  • src/models/config.py
src/app/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/app/**/*.py: FastAPI dependencies: Import from fastapi module for APIRouter, HTTPException, Request, status, Depends
Use FastAPI HTTPException with appropriate status codes for API endpoints and handle APIConnectionError from Llama Stack

Files:

  • src/app/endpoints/responses.py
src/models/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models must use @model_validator and @field_validator for validation and complete type annotations for all attributes, avoiding Any type

Files:

  • src/models/config.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async tests

Files:

  • tests/unit/models/config/test_inference_configuration.py
  • tests/unit/utils/test_responses.py
  • tests/unit/models/config/test_dump_configuration.py
🧠 Learnings (5)
📚 Learning: 2026-02-23T14:56:59.186Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-192
Timestamp: 2026-02-23T14:56:59.186Z
Learning: In the lightspeed-stack codebase (lightspeed-core/lightspeed-stack), do not enforce de-duplication of duplicate client.models.list() calls in model selection flows (e.g., in src/utils/responses.py prepare_responses_params). These calls are considered relatively cheap and removing duplicates could add unnecessary complexity to the flow. Apply this guideline specifically to this file/context unless similar performance characteristics and design decisions are documented elsewhere.

Applied to files:

  • src/utils/responses.py
📚 Learning: 2026-06-24T13:45:37.249Z
Learnt from: Jdubrick
Repo: lightspeed-core/lightspeed-stack PR: 1971
File: src/utils/markdown_repair.py:31-36
Timestamp: 2026-06-24T13:45:37.249Z
Learning: In the lightspeed-stack repository, docstrings must use the section header name "Parameters:" (not "Args:") for function arguments, even if the project references Google Python docstring conventions. Ensure docstrings follow the project’s established "Parameters:" header format for any documented function parameters.

Applied to files:

  • src/utils/responses.py
  • src/app/endpoints/responses.py
  • src/models/config.py
  • tests/unit/models/config/test_inference_configuration.py
  • tests/unit/utils/test_responses.py
  • tests/unit/models/config/test_dump_configuration.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

  • src/app/endpoints/responses.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/config.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

  • src/models/config.py
🛑 Comments failed to post (1)
src/models/config.py (1)

2335-2337: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Reject blank model_id values at config load.

str accepts "", so a malformed guardrail config survives parsing and fails later when the model is used. Add non-empty validation here so invalid config is rejected early.

Proposed fix
-    model_id: str = Field(
-        ..., title="Model id", description="The model_id to use for the guard"
-    )
+    model_id: str = Field(
+        ...,
+        min_length=1,
+        title="Model id",
+        description="The model_id to use for the guard",
+    )

As per coding guidelines, src/models/**/*.py: "Pydantic models must use @model_validator and @field_validator for validation and complete type annotations for all attributes, avoiding Any type`."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    model_id: str = Field(
        ...,
        min_length=1,
        title="Model id",
        description="The model_id to use for the guard",
    )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/models/config.py` around lines 2335 - 2337, Reject blank model_id values
in the config model so malformed guardrail config is caught during load. Update
the Guard/Config Pydantic model that defines model_id to add a non-empty
validation check using a field_validator (or model_validator if that’s the
existing pattern), ensuring empty or whitespace-only strings are rejected. Keep
the validation close to the model_id field so invalid config cannot pass parsing
and only valid IDs reach later model usage.

Source: Coding guidelines

🔇 Additional comments (6)
src/models/config.py (1)

681-682: LGTM!

Also applies to: 1732-1750

docs/openapi.json (1)

13618-13644: LGTM!

tests/unit/models/config/test_inference_configuration.py (1)

95-160: LGTM!

tests/unit/models/config/test_dump_configuration.py (1)

166-167: LGTM!

Also applies to: 380-381, 738-739, 995-996, 1232-1233, 1464-1465, 1841-1842, 2064-2065, 2287-2288, 2517-2518

src/utils/responses.py (1)

432-433: LGTM!

src/app/endpoints/responses.py (1)

467-470: LGTM!


Walkthrough

The PR adds optional inference limit fields to InferenceConfiguration, threads them into Responses request parameters, and applies server-side defaults in the responses endpoint when requests omit them.

Changes

Responses inference limits

Layer / File(s) Summary
InferenceConfiguration limits
src/models/config.py, docs/openapi.json, tests/unit/models/config/test_inference_configuration.py, tests/unit/models/config/test_dump_configuration.py
InferenceConfiguration gains optional max_infer_iters and max_tool_calls fields, the OpenAPI schema documents both defaults, and unit tests cover construction and serialized config output.
Responses param defaults
src/utils/responses.py, tests/unit/utils/test_responses.py
prepare_responses_params adds max_infer_iters and max_tool_calls to ResponsesApiParams, and tests initialize mock_config.inference for the updated request-parameter paths.
Endpoint request defaults
src/app/endpoints/responses.py
The responses endpoint fills missing max_infer_iters and max_tool_calls from server configuration before building api_params.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • max-svistunov
  • tisnik
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly references the main bug being fixed: infinite RAG tool-call loops.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
✨ Simplify code
  • Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@tisnik tisnik requested a review from asimurka June 26, 2026 11:29

@tisnik tisnik left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

api_params.max_infer_iters = configuration.inference.max_infer_iters
if "max_tool_calls" not in original_request.model_fields_set:
api_params.max_tool_calls = configuration.inference.max_tool_calls

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just a nit, but could you please follow the same pattern as other parameter overrides do. Namely, override attributes of updated_request (if not explicitly set) above the model_validate command.

@asimurka

Copy link
Copy Markdown
Contributor

/ok-to-test

@asimurka

Copy link
Copy Markdown
Contributor

@arin-deloatch please fix unit tests, config asserts should contain default values of new config options you created

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/unit/utils/test_responses.py (1)

1983-2029: 📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Assert the propagated inference defaults in a happy-path test.

These tests now instantiate InferenceConfiguration(), but they still don't verify that prepare_responses_params() actually forwards max_infer_iters and max_tool_calls. A regression dropping either field would still pass here.

Suggested assertion update
         result = await prepare_responses_params(
             mock_client, query_request, None, "token"
         )
         assert result.input == "test"
         assert result.model == "provider1/model1"
         assert result.conversation == "llama_conv1"
+        assert result.max_infer_iters == 10
+        assert result.max_tool_calls == 30
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/utils/test_responses.py` around lines 1983 - 2029, The happy-path
tests for prepare_responses_params currently set configuration.inference but do
not assert that its defaults are propagated, so add assertions in the existing
prepare_responses_params test cases to verify the returned params include the
inference values for max_infer_iters and max_tool_calls. Use the
prepare_responses_params symbol and the InferenceConfiguration setup in these
tests to confirm both fields are forwarded in the result.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/models/config.py`:
- Around line 2335-2337: Reject blank model_id values in the config model so
malformed guardrail config is caught during load. Update the Guard/Config
Pydantic model that defines model_id to add a non-empty validation check using a
field_validator (or model_validator if that’s the existing pattern), ensuring
empty or whitespace-only strings are rejected. Keep the validation close to the
model_id field so invalid config cannot pass parsing and only valid IDs reach
later model usage.

---

Outside diff comments:
In `@tests/unit/utils/test_responses.py`:
- Around line 1983-2029: The happy-path tests for prepare_responses_params
currently set configuration.inference but do not assert that its defaults are
propagated, so add assertions in the existing prepare_responses_params test
cases to verify the returned params include the inference values for
max_infer_iters and max_tool_calls. Use the prepare_responses_params symbol and
the InferenceConfiguration setup in these tests to confirm both fields are
forwarded in the result.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ad61c2fd-da64-4c23-a4e4-859bcc4765ef

📥 Commits

Reviewing files that changed from the base of the PR and between c16c5d3 and 59edfd1.

📒 Files selected for processing (7)
  • docs/openapi.json
  • src/app/endpoints/responses.py
  • src/models/config.py
  • src/utils/responses.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/models/config/test_inference_configuration.py
  • tests/unit/utils/test_responses.py
📜 Review details
⏰ Context from checks skipped due to timeout. (7)
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (4)
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Llama Stack imports: Use from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
Use standard log levels with clear purposes: debug() for diagnostic info, info() for program execution, warning() for unexpected events, error() for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes must use ABC with @abstractmethod decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes

Files:

  • src/utils/responses.py
  • src/app/endpoints/responses.py
  • src/models/config.py
src/app/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/app/**/*.py: FastAPI dependencies: Import from fastapi module for APIRouter, HTTPException, Request, status, Depends
Use FastAPI HTTPException with appropriate status codes for API endpoints and handle APIConnectionError from Llama Stack

Files:

  • src/app/endpoints/responses.py
src/models/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models must use @model_validator and @field_validator for validation and complete type annotations for all attributes, avoiding Any type

Files:

  • src/models/config.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async tests

Files:

  • tests/unit/models/config/test_inference_configuration.py
  • tests/unit/utils/test_responses.py
  • tests/unit/models/config/test_dump_configuration.py
🧠 Learnings (5)
📚 Learning: 2026-02-23T14:56:59.186Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-192
Timestamp: 2026-02-23T14:56:59.186Z
Learning: In the lightspeed-stack codebase (lightspeed-core/lightspeed-stack), do not enforce de-duplication of duplicate client.models.list() calls in model selection flows (e.g., in src/utils/responses.py prepare_responses_params). These calls are considered relatively cheap and removing duplicates could add unnecessary complexity to the flow. Apply this guideline specifically to this file/context unless similar performance characteristics and design decisions are documented elsewhere.

Applied to files:

  • src/utils/responses.py
📚 Learning: 2026-06-24T13:45:37.249Z
Learnt from: Jdubrick
Repo: lightspeed-core/lightspeed-stack PR: 1971
File: src/utils/markdown_repair.py:31-36
Timestamp: 2026-06-24T13:45:37.249Z
Learning: In the lightspeed-stack repository, docstrings must use the section header name "Parameters:" (not "Args:") for function arguments, even if the project references Google Python docstring conventions. Ensure docstrings follow the project’s established "Parameters:" header format for any documented function parameters.

Applied to files:

  • src/utils/responses.py
  • src/app/endpoints/responses.py
  • src/models/config.py
  • tests/unit/models/config/test_inference_configuration.py
  • tests/unit/utils/test_responses.py
  • tests/unit/models/config/test_dump_configuration.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

  • src/app/endpoints/responses.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/config.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

  • src/models/config.py
🔇 Additional comments (6)
src/models/config.py (1)

681-682: LGTM!

Also applies to: 1732-1750

docs/openapi.json (1)

13618-13644: LGTM!

tests/unit/models/config/test_inference_configuration.py (1)

95-160: LGTM!

tests/unit/models/config/test_dump_configuration.py (1)

166-167: LGTM!

Also applies to: 380-381, 738-739, 995-996, 1232-1233, 1464-1465, 1841-1842, 2064-2065, 2287-2288, 2517-2518

src/utils/responses.py (1)

432-433: LGTM!

src/app/endpoints/responses.py (1)

467-470: LGTM!

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/unit/utils/test_responses.py (1)

1983-2029: 📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Assert the propagated inference defaults in a happy-path test.

These tests now instantiate InferenceConfiguration(), but they still don't verify that prepare_responses_params() actually forwards max_infer_iters and max_tool_calls. A regression dropping either field would still pass here.

Suggested assertion update
         result = await prepare_responses_params(
             mock_client, query_request, None, "token"
         )
         assert result.input == "test"
         assert result.model == "provider1/model1"
         assert result.conversation == "llama_conv1"
+        assert result.max_infer_iters == 10
+        assert result.max_tool_calls == 30
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/utils/test_responses.py` around lines 1983 - 2029, The happy-path
tests for prepare_responses_params currently set configuration.inference but do
not assert that its defaults are propagated, so add assertions in the existing
prepare_responses_params test cases to verify the returned params include the
inference values for max_infer_iters and max_tool_calls. Use the
prepare_responses_params symbol and the InferenceConfiguration setup in these
tests to confirm both fields are forwarded in the result.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/models/config.py`:
- Around line 2335-2337: Reject blank model_id values in the config model so
malformed guardrail config is caught during load. Update the Guard/Config
Pydantic model that defines model_id to add a non-empty validation check using a
field_validator (or model_validator if that’s the existing pattern), ensuring
empty or whitespace-only strings are rejected. Keep the validation close to the
model_id field so invalid config cannot pass parsing and only valid IDs reach
later model usage.

---

Outside diff comments:
In `@tests/unit/utils/test_responses.py`:
- Around line 1983-2029: The happy-path tests for prepare_responses_params
currently set configuration.inference but do not assert that its defaults are
propagated, so add assertions in the existing prepare_responses_params test
cases to verify the returned params include the inference values for
max_infer_iters and max_tool_calls. Use the prepare_responses_params symbol and
the InferenceConfiguration setup in these tests to confirm both fields are
forwarded in the result.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ad61c2fd-da64-4c23-a4e4-859bcc4765ef

📥 Commits

Reviewing files that changed from the base of the PR and between c16c5d3 and 59edfd1.

📒 Files selected for processing (7)
  • docs/openapi.json
  • src/app/endpoints/responses.py
  • src/models/config.py
  • src/utils/responses.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/models/config/test_inference_configuration.py
  • tests/unit/utils/test_responses.py
📜 Review details
🔇 Additional comments (6)
src/models/config.py (1)

681-682: LGTM!

Also applies to: 1732-1750

docs/openapi.json (1)

13618-13644: LGTM!

tests/unit/models/config/test_inference_configuration.py (1)

95-160: LGTM!

tests/unit/models/config/test_dump_configuration.py (1)

166-167: LGTM!

Also applies to: 380-381, 738-739, 995-996, 1232-1233, 1464-1465, 1841-1842, 2064-2065, 2287-2288, 2517-2518

src/utils/responses.py (1)

432-433: LGTM!

src/app/endpoints/responses.py (1)

467-470: LGTM!

🛑 Comments failed to post (1)
src/models/config.py (1)

2335-2337: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Reject blank model_id values at config load.

str accepts "", so a malformed guardrail config survives parsing and fails later when the model is used. Add non-empty validation here so invalid config is rejected early.

Proposed fix
-    model_id: str = Field(
-        ..., title="Model id", description="The model_id to use for the guard"
-    )
+    model_id: str = Field(
+        ...,
+        min_length=1,
+        title="Model id",
+        description="The model_id to use for the guard",
+    )

As per coding guidelines, src/models/**/*.py: "Pydantic models must use @model_validator and @field_validator for validation and complete type annotations for all attributes, avoiding Any type`."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    model_id: str = Field(
        ...,
        min_length=1,
        title="Model id",
        description="The model_id to use for the guard",
    )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/models/config.py` around lines 2335 - 2337, Reject blank model_id values
in the config model so malformed guardrail config is caught during load. Update
the Guard/Config Pydantic model that defines model_id to add a non-empty
validation check using a field_validator (or model_validator if that’s the
existing pattern), ensuring empty or whitespace-only strings are rejected. Keep
the validation close to the model_id field so invalid config cannot pass parsing
and only valid IDs reach later model usage.

Source: Coding guidelines

@asimurka

Copy link
Copy Markdown
Contributor

/ok-to-test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants