LCORE-: Wired compaction turn persistence into agentic query flows#1953
LCORE-: Wired compaction turn persistence into agentic query flows#1953asimurka wants to merge 1 commit into
Conversation
|
Warning Review limit reached
Next review available in: 15 minutes Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available. How can I continue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews. How do review limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please refer docs for additional details. Review details⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (10)
WalkthroughThe PR removes ChangesCompaction Turn Persistence Refactor
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/pydantic_ai_lightspeed/llamastack/_model.py`:
- Around line 246-277: The _patch_responses_create method mutates the shared
singleton client.responses.create method, causing a race condition where
concurrent LlamaStackResponsesModel instances overwrite each other's patches and
execute the wrong compaction context. Instead of directly assigning to
responses_api.create, implement per-instance method wrapping that preserves the
original create method behavior while maintaining instance-specific state
without mutating the shared client, or create a per-request client instance to
avoid sharing mutable state between concurrent requests.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 3a16c48b-ee7d-4f57-a12e-b55fc68e3576
📒 Files selected for processing (10)
src/app/endpoints/query.pysrc/app/endpoints/streaming_query.pysrc/models/common/turn_summary.pysrc/pydantic_ai_lightspeed/llamastack/__init__.pysrc/pydantic_ai_lightspeed/llamastack/_model.pysrc/utils/agents/query.pysrc/utils/agents/streaming.pysrc/utils/pydantic_ai.pytests/unit/app/endpoints/test_streaming_query.pytests/unit/utils/agents/test_streaming.py
💤 Files with no reviewable changes (2)
- tests/unit/app/endpoints/test_streaming_query.py
- src/models/common/turn_summary.py
📜 Review details
⏰ Context from checks skipped due to timeout. (2)
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-0-6-on-pull-request
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (4)
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules:from authentication import get_auth_dependency
Llama Stack imports: Usefrom llama_stack_client import AsyncLlamaStackClient
Checkconstants.pyfor shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Uselogger = get_logger(__name__)fromlog.pyfor module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Useasync deffor I/O operations and external API calls
Use standard log levels with clear purposes:debug()for diagnostic info,info()for program execution,warning()for unexpected events,error()for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes:Configuration,Error/Exception,Resolver,Interface
Abstract classes must use ABC with@abstractmethoddecorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes
Files:
src/pydantic_ai_lightspeed/llamastack/__init__.pysrc/app/endpoints/query.pysrc/utils/pydantic_ai.pysrc/utils/agents/query.pysrc/utils/agents/streaming.pysrc/app/endpoints/streaming_query.pysrc/pydantic_ai_lightspeed/llamastack/_model.py
src/**/__init__.py
📄 CodeRabbit inference engine (AGENTS.md)
Package
__init__.pyfiles must contain brief package descriptions
Files:
src/pydantic_ai_lightspeed/llamastack/__init__.py
src/app/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/app/**/*.py: FastAPI dependencies: Import fromfastapimodule forAPIRouter,HTTPException,Request,status,Depends
Use FastAPIHTTPExceptionwith appropriate status codes for API endpoints and handleAPIConnectionErrorfrom Llama Stack
Files:
src/app/endpoints/query.pysrc/app/endpoints/streaming_query.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Usepytest.mark.asynciomarker for async tests
Files:
tests/unit/utils/agents/test_streaming.py
🧠 Learnings (2)
📚 Learning: 2026-01-14T09:37:51.612Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 988
File: src/app/endpoints/query.py:319-339
Timestamp: 2026-01-14T09:37:51.612Z
Learning: In the lightspeed-stack repository, when provider_id == "azure", the Azure provider with provider_type "remote::azure" is guaranteed to be present in the providers list. Therefore, avoid defensive StopIteration handling for next() when locating the Azure provider in providers within src/app/endpoints/query.py. This change applies specifically to this file (or nearby provider lookup code) and relies on the invariant that the Azure provider exists; if the invariant could be violated, keep the existing StopIteration handling.
Applied to files:
src/app/endpoints/query.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.
Applied to files:
src/app/endpoints/query.pysrc/app/endpoints/streaming_query.py
🔇 Additional comments (18)
src/app/endpoints/streaming_query.py (5)
337-342: LGTM!
350-361: LGTM!
507-512: LGTM!
872-895: LGTM!
701-706: This code path is unreachable dead code; the compacted branch ingenerate_responseis never executed.The
streaming_query_endpoint_handlerroutes all requests through eithergenerate_response_with_compaction(for compacted paths) orgenerate_agent_response(standard paths). The deprecatedgenerate_responsefunction is never called from the handler, and all test calls use the defaultcompacted=False. Lines 701–706 are therefore unreachable. Either remove this dead code or mark it explicitly as such with a comment.> Likely an incorrect or invalid review comment.src/pydantic_ai_lightspeed/llamastack/_model.py (5)
1-26: LGTM!
28-47: LGTM!
49-90: LGTM!
91-138: LGTM!
171-190: Document the assumption about MCP argument delta format and consider adding validation.Line 175 unconditionally appends
"}"to combined MCP argument deltas. This assumes Llama Stack's MCP event stream always delivers arguments without the closing brace, but this assumption lacks documentation and validation.If the buffered deltas already include the closing brace (or if the format changes), this produces malformed JSON. Add a comment explaining why the closing brace is required based on Llama Stack's MCP streaming behavior, and consider a defensive check:
combined_args.rstrip() + "}"or verify the last buffered delta doesn't already contain}.src/pydantic_ai_lightspeed/llamastack/__init__.py (1)
1-14: LGTM!src/utils/pydantic_ai.py (2)
13-17: LGTM!Also applies to: 116-121
142-157: LGTM!src/utils/agents/query.py (1)
280-333: LGTM!src/utils/agents/streaming.py (2)
83-138: LGTM!
141-266: LGTM!src/app/endpoints/query.py (1)
231-237: LGTM!tests/unit/utils/agents/test_streaming.py (1)
530-566: LGTM!
32f06e4 to
fec773e
Compare
fec773e to
662e633
Compare
There was a problem hiding this comment.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/utils/agents/query.py (2)
285-296: 📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick winAlign the changed signature and docstring with project conventions.
Use
ResponseInput | None, and preferParameters:overArgs:in changed docstrings.As per coding guidelines, functions should use modern union syntax. Based on learnings, docstrings should use
Parameters:rather thanArgs:.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/utils/agents/query.py` around lines 285 - 296, The function signature in retrieve_response still uses Optional[ResponseInput] for original_input; update it to modern union syntax ResponseInput | None to match project conventions. Also revise the retrieve_response docstring section header from Args: to Parameters: while keeping the existing parameter descriptions aligned with the function’s named arguments and symbols.Sources: Coding guidelines, Learnings
304-310: 🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick winPreserve falsy
original_inputvalues.Using
orfalls back to the rewritten input whenoriginal_inputis a valid falsy value, such as""or[]. Check forNoneexplicitly.Proposed fix
- original_input or responses_params.input, + original_input if original_input is not None else responses_params.input,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/utils/agents/query.py` around lines 304 - 310, The blocked-moderation branch in query logic is incorrectly using a truthiness fallback for original_input, which can replace valid falsy values like an empty string or list with responses_params.input. Update the append_turn_items_to_conversation call in the moderation handling path to preserve original_input whenever it is provided by checking explicitly for None instead of using or. Use the nearby identifiers moderation_result, original_input, responses_params.input, and append_turn_items_to_conversation to locate the change.src/app/endpoints/streaming_query.py (1)
703-712: 🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy liftCompacted streaming turns now persist no assistant output.
store_compacted_turn()appendsoriginal_inputplus the suppliedoutput_items. Passing[]here means compacted streams store only the user side of the turn, so the assistant response disappears from conversation history. The blocking path insrc/app/endpoints/query.py:381-386still passesresponse.output, so streaming compaction now diverges from non-streaming persistence.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/app/endpoints/streaming_query.py` around lines 703 - 712, The compacted streaming path in `streaming_query.py` is dropping assistant output because `store_compacted_turn()` is being called with an empty output list. Update the call in the streaming compaction flow to pass the actual streamed assistant output items, matching the non-streaming `query.py` persistence path that uses `response.output`, so `store_compacted_turn()` records both sides of the turn.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/app/endpoints/streaming_query.py`:
- Around line 640-655: The interrupted-stream path is replaying already-emitted
text because build_interrupted_response() returns a suffix that duplicates the
partial response, and this call site then persists full_text and yields the
duplicate suffix. Update the helper contract in build_interrupted_response() so
the returned suffix contains only the new interrupt tail, not the previously
streamed prefix, and ensure full_text used by persist_interrupted_turn() does
not repeat client-visible content. Verify the streaming path in
streaming_query.py still passes turn_summary.partial_tokens through without
re-adding the prefix, and keep the token event payload limited to the
non-duplicated suffix.
In `@src/pydantic_ai_lightspeed/llamastack/_model.py`:
- Around line 91-100: The __init__ signature in _model.py should follow the
project’s modern type and docstring conventions: replace
Optional[CompactionTurnContext] with CompactionTurnContext | None, and update
the docstring header from Args: to Parameters:. Apply the same cleanup
consistently to the other affected constructor or method block referenced in the
review, keeping the naming aligned with the existing symbols like __init__ and
any related stream wrapper methods in this module.
In `@src/utils/agents/streaming.py`:
- Around line 88-98: Update the affected function signatures in streaming.py to
use modern union syntax by replacing Optional[...] with the corresponding | None
form, including the original_input parameter in the streaming helper and any
other touched signatures in this change. Also revise the related docstrings to
follow project conventions by changing the “Args:” section header to
“Parameters:” while keeping the existing parameter descriptions aligned with the
named symbols such as original_input and the streaming return helper.
- Around line 108-113: The append_turn_items_to_conversation call in
streaming.py is using `original_input or responses_params.input`, which can
incorrectly fall back when `original_input` is a valid falsy value. Update the
logic around `append_turn_items_to_conversation` to preserve falsy originals by
checking explicitly for `None` before choosing the fallback input, using the
existing `original_input` and `responses_params.input` symbols.
- Around line 236-258: The HTTPException handling in streaming.py is aborting
the whole stream too early in the topic-summary path. In the try/except around
maybe_get_topic_summary, stop yielding an error event and returning; instead log
the warning, set topic_summary = None, and continue so token consumption, the
end event, and store_query_results() still run. Keep the existing identifiers
maybe_get_topic_summary, topic_summary, and serialize_event in place while
removing the early-exit behavior.
- Around line 205-215: The interruption path is reusing already-streamed tokens,
so the client-visible text and persisted response are duplicated. Update the
handling around build_interrupted_response() in streaming.py so the returned
suffix excludes the partial text that was already yielded, and only the new
interrupt-specific tail is appended. Make sure turn_summary.llm_response is set
to the non-duplicating interrupted content before calling
persist_interrupted_turn(), and keep the persist_guard flow unchanged.
In `@src/utils/pydantic_ai.py`:
- Line 103: Update the changed type annotations in the pydantic_ai helpers to
use modern union syntax instead of Optional. Specifically, adjust the return
annotation on the relevant function signature and the other affected annotation
referenced in the review to use list[AgentCapability[Any]] | None and
ResponseInput | None, keeping the existing symbols and behavior unchanged.
- Around line 144-149: The compaction flow is still typed too narrowly for
library mode, so update the type annotations used by
CompactionTurnContext.client and append_turn_items_to_conversation to the shared
client union accepted by build_agent() (AsyncLlamaStackClient plus
AsyncLlamaStackAsLibraryClient). Adjust the relevant signatures and any related
context construction in pydantic_ai.py so compaction can receive either client
type without a type mismatch.
In `@tests/unit/utils/agents/test_streaming.py`:
- Around line 1053-1054: The interrupted-response test for
build_interrupted_response() is too weak because it only checks containment and
can miss duplicated streamed prefixes. Update the test in test_streaming.py to
assert the exact turn_summary.llm_response for the simple "Hello world" case,
and also verify the final token payload if available. Use the existing
turn_summary and INTERRUPTED_RESPONSE_MESSAGE assertions as anchors, but replace
the partial string checks with exact expected output matching the interrupted
assembly behavior.
---
Outside diff comments:
In `@src/app/endpoints/streaming_query.py`:
- Around line 703-712: The compacted streaming path in `streaming_query.py` is
dropping assistant output because `store_compacted_turn()` is being called with
an empty output list. Update the call in the streaming compaction flow to pass
the actual streamed assistant output items, matching the non-streaming
`query.py` persistence path that uses `response.output`, so
`store_compacted_turn()` records both sides of the turn.
In `@src/utils/agents/query.py`:
- Around line 285-296: The function signature in retrieve_response still uses
Optional[ResponseInput] for original_input; update it to modern union syntax
ResponseInput | None to match project conventions. Also revise the
retrieve_response docstring section header from Args: to Parameters: while
keeping the existing parameter descriptions aligned with the function’s named
arguments and symbols.
- Around line 304-310: The blocked-moderation branch in query logic is
incorrectly using a truthiness fallback for original_input, which can replace
valid falsy values like an empty string or list with responses_params.input.
Update the append_turn_items_to_conversation call in the moderation handling
path to preserve original_input whenever it is provided by checking explicitly
for None instead of using or. Use the nearby identifiers moderation_result,
original_input, responses_params.input, and append_turn_items_to_conversation to
locate the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 39b58bda-ff8e-48e5-90e1-2f688dd44870
📒 Files selected for processing (10)
src/app/endpoints/query.pysrc/app/endpoints/streaming_query.pysrc/models/common/turn_summary.pysrc/pydantic_ai_lightspeed/llamastack/__init__.pysrc/pydantic_ai_lightspeed/llamastack/_model.pysrc/utils/agents/query.pysrc/utils/agents/streaming.pysrc/utils/pydantic_ai.pytests/unit/app/endpoints/test_streaming_query.pytests/unit/utils/agents/test_streaming.py
📜 Review details
⏰ Context from checks skipped due to timeout. (15)
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-0-6-on-pull-request
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: unit_tests (3.12)
- GitHub Check: unit_tests (3.13)
- GitHub Check: build-pr
- GitHub Check: Pylinter
- GitHub Check: integration_tests (3.12)
- GitHub Check: integration_tests (3.13)
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 2
⚠️ CI failures not shown inline (2)
GitHub Actions: OpenAPI (Spectral) / 0_spectral.txt: LCORE-: Wired compaction turn persistence into agentic query flows
Conclusion: failure
##[group]Run set -euo pipefail
�[36;1mset -euo pipefail�[0m
�[36;1muv run python scripts/generate_openapi_schema.py /tmp/openapi-generated.json�[0m
�[36;1mif ! diff -u docs/openapi.json /tmp/openapi-generated.json; then�[0m
�[36;1m echo "::error::docs/openapi.json is out of date. Regenerate with: uv run scripts/generate_openapi_schema.py docs/openapi.json"�[0m
GitHub Actions: OpenAPI (Spectral) / spectral: LCORE-: Wired compaction turn persistence into agentic query flows
Conclusion: failure
##[group]Run set -euo pipefail
�[36;1mset -euo pipefail�[0m
�[36;1muv run python scripts/generate_openapi_schema.py /tmp/openapi-generated.json�[0m
�[36;1mif ! diff -u docs/openapi.json /tmp/openapi-generated.json; then�[0m
�[36;1m echo "::error::docs/openapi.json is out of date. Regenerate with: uv run scripts/generate_openapi_schema.py docs/openapi.json"�[0m
🧰 Additional context used
📓 Path-based instructions (5)
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules:from authentication import get_auth_dependency
Llama Stack imports: Usefrom llama_stack_client import AsyncLlamaStackClient
Checkconstants.pyfor shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Uselogger = get_logger(__name__)fromlog.pyfor module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Useasync deffor I/O operations and external API calls
Use standard log levels with clear purposes:debug()for diagnostic info,info()for program execution,warning()for unexpected events,error()for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes:Configuration,Error/Exception,Resolver,Interface
Abstract classes must use ABC with@abstractmethoddecorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes
Files:
src/pydantic_ai_lightspeed/llamastack/__init__.pysrc/utils/agents/query.pysrc/app/endpoints/query.pysrc/utils/pydantic_ai.pysrc/models/common/turn_summary.pysrc/app/endpoints/streaming_query.pysrc/pydantic_ai_lightspeed/llamastack/_model.pysrc/utils/agents/streaming.py
src/**/__init__.py
📄 CodeRabbit inference engine (AGENTS.md)
Package
__init__.pyfiles must contain brief package descriptions
Files:
src/pydantic_ai_lightspeed/llamastack/__init__.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Usepytest.mark.asynciomarker for async tests
Files:
tests/unit/utils/agents/test_streaming.pytests/unit/app/endpoints/test_streaming_query.py
src/app/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/app/**/*.py: FastAPI dependencies: Import fromfastapimodule forAPIRouter,HTTPException,Request,status,Depends
Use FastAPIHTTPExceptionwith appropriate status codes for API endpoints and handleAPIConnectionErrorfrom Llama Stack
Files:
src/app/endpoints/query.pysrc/app/endpoints/streaming_query.py
src/models/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Pydantic models must use
@model_validatorand@field_validatorfor validation and complete type annotations for all attributes, avoidingAnytype
Files:
src/models/common/turn_summary.py
🧠 Learnings (5)
📚 Learning: 2026-06-24T13:45:37.249Z
Learnt from: Jdubrick
Repo: lightspeed-core/lightspeed-stack PR: 1971
File: src/utils/markdown_repair.py:31-36
Timestamp: 2026-06-24T13:45:37.249Z
Learning: In the lightspeed-stack repository, docstrings must use the section header name "Parameters:" (not "Args:") for function arguments, even if the project references Google Python docstring conventions. Ensure docstrings follow the project’s established "Parameters:" header format for any documented function parameters.
Applied to files:
src/pydantic_ai_lightspeed/llamastack/__init__.pytests/unit/utils/agents/test_streaming.pysrc/utils/agents/query.pysrc/app/endpoints/query.pysrc/utils/pydantic_ai.pysrc/models/common/turn_summary.pysrc/app/endpoints/streaming_query.pytests/unit/app/endpoints/test_streaming_query.pysrc/pydantic_ai_lightspeed/llamastack/_model.pysrc/utils/agents/streaming.py
📚 Learning: 2026-01-14T09:37:51.612Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 988
File: src/app/endpoints/query.py:319-339
Timestamp: 2026-01-14T09:37:51.612Z
Learning: In the lightspeed-stack repository, when provider_id == "azure", the Azure provider with provider_type "remote::azure" is guaranteed to be present in the providers list. Therefore, avoid defensive StopIteration handling for next() when locating the Azure provider in providers within src/app/endpoints/query.py. This change applies specifically to this file (or nearby provider lookup code) and relies on the invariant that the Azure provider exists; if the invariant could be violated, keep the existing StopIteration handling.
Applied to files:
src/app/endpoints/query.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.
Applied to files:
src/app/endpoints/query.pysrc/app/endpoints/streaming_query.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.
Applied to files:
src/models/common/turn_summary.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.
Applied to files:
src/models/common/turn_summary.py
🪛 GitHub Actions: Type checks / 0_mypy.txt
src/pydantic_ai_lightspeed/llamastack/_model.py
[error] 305-305: mypy: No overload variant of "_responses_create" of "OpenAIResponsesModel" matches argument types "list[ModelRequest | ModelResponse]", "bool", "OpenAIResponsesModelSettings", "ModelRequestParameters". [call-overload]
🪛 GitHub Actions: Type checks / mypy
src/pydantic_ai_lightspeed/llamastack/_model.py
[error] 305-305: mypy call-overload: No overload variant of "_responses_create" of "OpenAIResponsesModel" matches argument types "list[ModelRequest | ModelResponse]", "bool", "OpenAIResponsesModelSettings", "ModelRequestParameters".
🔇 Additional comments (1)
src/pydantic_ai_lightspeed/llamastack/__init__.py (1)
3-13: LGTM!
There was a problem hiding this comment.
Caution
Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/utils/agents/query.py (2)
285-296: 📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick winAlign the changed signature and docstring with project conventions.
Use
ResponseInput | None, and preferParameters:overArgs:in changed docstrings.As per coding guidelines, functions should use modern union syntax. Based on learnings, docstrings should use
Parameters:rather thanArgs:.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/utils/agents/query.py` around lines 285 - 296, The function signature in retrieve_response still uses Optional[ResponseInput] for original_input; update it to modern union syntax ResponseInput | None to match project conventions. Also revise the retrieve_response docstring section header from Args: to Parameters: while keeping the existing parameter descriptions aligned with the function’s named arguments and symbols.Sources: Coding guidelines, Learnings
304-310: 🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick winPreserve falsy
original_inputvalues.Using
orfalls back to the rewritten input whenoriginal_inputis a valid falsy value, such as""or[]. Check forNoneexplicitly.Proposed fix
- original_input or responses_params.input, + original_input if original_input is not None else responses_params.input,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/utils/agents/query.py` around lines 304 - 310, The blocked-moderation branch in query logic is incorrectly using a truthiness fallback for original_input, which can replace valid falsy values like an empty string or list with responses_params.input. Update the append_turn_items_to_conversation call in the moderation handling path to preserve original_input whenever it is provided by checking explicitly for None instead of using or. Use the nearby identifiers moderation_result, original_input, responses_params.input, and append_turn_items_to_conversation to locate the change.src/app/endpoints/streaming_query.py (1)
703-712: 🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy liftCompacted streaming turns now persist no assistant output.
store_compacted_turn()appendsoriginal_inputplus the suppliedoutput_items. Passing[]here means compacted streams store only the user side of the turn, so the assistant response disappears from conversation history. The blocking path insrc/app/endpoints/query.py:381-386still passesresponse.output, so streaming compaction now diverges from non-streaming persistence.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/app/endpoints/streaming_query.py` around lines 703 - 712, The compacted streaming path in `streaming_query.py` is dropping assistant output because `store_compacted_turn()` is being called with an empty output list. Update the call in the streaming compaction flow to pass the actual streamed assistant output items, matching the non-streaming `query.py` persistence path that uses `response.output`, so `store_compacted_turn()` records both sides of the turn.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/app/endpoints/streaming_query.py`:
- Around line 640-655: The interrupted-stream path is replaying already-emitted
text because build_interrupted_response() returns a suffix that duplicates the
partial response, and this call site then persists full_text and yields the
duplicate suffix. Update the helper contract in build_interrupted_response() so
the returned suffix contains only the new interrupt tail, not the previously
streamed prefix, and ensure full_text used by persist_interrupted_turn() does
not repeat client-visible content. Verify the streaming path in
streaming_query.py still passes turn_summary.partial_tokens through without
re-adding the prefix, and keep the token event payload limited to the
non-duplicated suffix.
In `@src/pydantic_ai_lightspeed/llamastack/_model.py`:
- Around line 91-100: The __init__ signature in _model.py should follow the
project’s modern type and docstring conventions: replace
Optional[CompactionTurnContext] with CompactionTurnContext | None, and update
the docstring header from Args: to Parameters:. Apply the same cleanup
consistently to the other affected constructor or method block referenced in the
review, keeping the naming aligned with the existing symbols like __init__ and
any related stream wrapper methods in this module.
In `@src/utils/agents/streaming.py`:
- Around line 88-98: Update the affected function signatures in streaming.py to
use modern union syntax by replacing Optional[...] with the corresponding | None
form, including the original_input parameter in the streaming helper and any
other touched signatures in this change. Also revise the related docstrings to
follow project conventions by changing the “Args:” section header to
“Parameters:” while keeping the existing parameter descriptions aligned with the
named symbols such as original_input and the streaming return helper.
- Around line 108-113: The append_turn_items_to_conversation call in
streaming.py is using `original_input or responses_params.input`, which can
incorrectly fall back when `original_input` is a valid falsy value. Update the
logic around `append_turn_items_to_conversation` to preserve falsy originals by
checking explicitly for `None` before choosing the fallback input, using the
existing `original_input` and `responses_params.input` symbols.
- Around line 236-258: The HTTPException handling in streaming.py is aborting
the whole stream too early in the topic-summary path. In the try/except around
maybe_get_topic_summary, stop yielding an error event and returning; instead log
the warning, set topic_summary = None, and continue so token consumption, the
end event, and store_query_results() still run. Keep the existing identifiers
maybe_get_topic_summary, topic_summary, and serialize_event in place while
removing the early-exit behavior.
- Around line 205-215: The interruption path is reusing already-streamed tokens,
so the client-visible text and persisted response are duplicated. Update the
handling around build_interrupted_response() in streaming.py so the returned
suffix excludes the partial text that was already yielded, and only the new
interrupt-specific tail is appended. Make sure turn_summary.llm_response is set
to the non-duplicating interrupted content before calling
persist_interrupted_turn(), and keep the persist_guard flow unchanged.
In `@src/utils/pydantic_ai.py`:
- Line 103: Update the changed type annotations in the pydantic_ai helpers to
use modern union syntax instead of Optional. Specifically, adjust the return
annotation on the relevant function signature and the other affected annotation
referenced in the review to use list[AgentCapability[Any]] | None and
ResponseInput | None, keeping the existing symbols and behavior unchanged.
- Around line 144-149: The compaction flow is still typed too narrowly for
library mode, so update the type annotations used by
CompactionTurnContext.client and append_turn_items_to_conversation to the shared
client union accepted by build_agent() (AsyncLlamaStackClient plus
AsyncLlamaStackAsLibraryClient). Adjust the relevant signatures and any related
context construction in pydantic_ai.py so compaction can receive either client
type without a type mismatch.
In `@tests/unit/utils/agents/test_streaming.py`:
- Around line 1053-1054: The interrupted-response test for
build_interrupted_response() is too weak because it only checks containment and
can miss duplicated streamed prefixes. Update the test in test_streaming.py to
assert the exact turn_summary.llm_response for the simple "Hello world" case,
and also verify the final token payload if available. Use the existing
turn_summary and INTERRUPTED_RESPONSE_MESSAGE assertions as anchors, but replace
the partial string checks with exact expected output matching the interrupted
assembly behavior.
---
Outside diff comments:
In `@src/app/endpoints/streaming_query.py`:
- Around line 703-712: The compacted streaming path in `streaming_query.py` is
dropping assistant output because `store_compacted_turn()` is being called with
an empty output list. Update the call in the streaming compaction flow to pass
the actual streamed assistant output items, matching the non-streaming
`query.py` persistence path that uses `response.output`, so
`store_compacted_turn()` records both sides of the turn.
In `@src/utils/agents/query.py`:
- Around line 285-296: The function signature in retrieve_response still uses
Optional[ResponseInput] for original_input; update it to modern union syntax
ResponseInput | None to match project conventions. Also revise the
retrieve_response docstring section header from Args: to Parameters: while
keeping the existing parameter descriptions aligned with the function’s named
arguments and symbols.
- Around line 304-310: The blocked-moderation branch in query logic is
incorrectly using a truthiness fallback for original_input, which can replace
valid falsy values like an empty string or list with responses_params.input.
Update the append_turn_items_to_conversation call in the moderation handling
path to preserve original_input whenever it is provided by checking explicitly
for None instead of using or. Use the nearby identifiers moderation_result,
original_input, responses_params.input, and append_turn_items_to_conversation to
locate the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 39b58bda-ff8e-48e5-90e1-2f688dd44870
📒 Files selected for processing (10)
src/app/endpoints/query.pysrc/app/endpoints/streaming_query.pysrc/models/common/turn_summary.pysrc/pydantic_ai_lightspeed/llamastack/__init__.pysrc/pydantic_ai_lightspeed/llamastack/_model.pysrc/utils/agents/query.pysrc/utils/agents/streaming.pysrc/utils/pydantic_ai.pytests/unit/app/endpoints/test_streaming_query.pytests/unit/utils/agents/test_streaming.py
📜 Review details
🔇 Additional comments (1)
src/pydantic_ai_lightspeed/llamastack/__init__.py (1)
3-13: LGTM!
🛑 Comments failed to post (9)
src/app/endpoints/streaming_query.py (1)
640-655: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Don't replay the full partial response in the interrupt suffix.
build_interrupted_response()currently returns asuffixthat includes the already-streamed text again, andfull_textrepeats that same prefix in storage. Once any partial tokens have been emitted, interrupted streams will duplicate content in both the client-visible token stream and the persisted transcript. The fix belongs insrc/utils/stream_interrupts.py, but this call path now exposes the bad output contract.Proposed helper fix
def build_interrupted_response(partial_tokens: list[str]) -> tuple[str, str]: partial_text = "".join(partial_tokens) repaired_text = close_open_markdown(partial_text) interrupted_indicator = f"\n\n*{INTERRUPTED_RESPONSE_MESSAGE}*" - suffix = repaired_text + interrupted_indicator - final_text = partial_text + suffix + final_text = repaired_text + interrupted_indicator + suffix = final_text[len(partial_text) :] return final_text, suffix🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/app/endpoints/streaming_query.py` around lines 640 - 655, The interrupted-stream path is replaying already-emitted text because build_interrupted_response() returns a suffix that duplicates the partial response, and this call site then persists full_text and yields the duplicate suffix. Update the helper contract in build_interrupted_response() so the returned suffix contains only the new interrupt tail, not the previously streamed prefix, and ensure full_text used by persist_interrupted_turn() does not repeat client-visible content. Verify the streaming path in streaming_query.py still passes turn_summary.partial_tokens through without re-adding the prefix, and keep the token event payload limited to the non-duplicated suffix.src/pydantic_ai_lightspeed/llamastack/_model.py (1)
91-100: 📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win
Align changed annotations and docstring headers with project conventions.
The new optional annotations should use
CompactionTurnContext | None, and changed docstrings should useParameters:instead ofArgs:.Proposed cleanup
-from typing import Any, Literal, Optional, cast, overload +from typing import Any, Literal, cast, overload @@ - compaction: Optional[CompactionTurnContext] = None, + compaction: CompactionTurnContext | None = None, @@ - Args: + Parameters: @@ - compaction: Optional[CompactionTurnContext] = None, + compaction: CompactionTurnContext | None = None, @@ - Args: + Parameters:As per coding guidelines, functions should use modern union syntax. Based on learnings, docstrings should use
Parameters:rather thanArgs:.Also applies to: 236-253
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/pydantic_ai_lightspeed/llamastack/_model.py` around lines 91 - 100, The __init__ signature in _model.py should follow the project’s modern type and docstring conventions: replace Optional[CompactionTurnContext] with CompactionTurnContext | None, and update the docstring header from Args: to Parameters:. Apply the same cleanup consistently to the other affected constructor or method block referenced in the review, keeping the naming aligned with the existing symbols like __init__ and any related stream wrapper methods in this module.Sources: Coding guidelines, Learnings
src/utils/agents/streaming.py (4)
88-98: 📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win
Align changed signatures and docstrings with project conventions.
Use
ResponseInput | None, and preferParameters:overArgs:in changed docstrings.As per coding guidelines, functions should use modern union syntax. Based on learnings, docstrings should use
Parameters:rather thanArgs:.Also applies to: 149-167
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/utils/agents/streaming.py` around lines 88 - 98, Update the affected function signatures in streaming.py to use modern union syntax by replacing Optional[...] with the corresponding | None form, including the original_input parameter in the streaming helper and any other touched signatures in this change. Also revise the related docstrings to follow project conventions by changing the “Args:” section header to “Parameters:” while keeping the existing parameter descriptions aligned with the named symbols such as original_input and the streaming return helper.Sources: Coding guidelines, Learnings
108-113: 🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick win
Preserve falsy
original_inputvalues.Using
orcan persist the rewritten compacted input instead of a valid falsy original input. Use an explicitNonecheck.Proposed fix
- original_input or responses_params.input, + original_input if original_input is not None else responses_params.input,📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.await append_turn_items_to_conversation( context.client, responses_params.conversation, original_input if original_input is not None else responses_params.input, [context.moderation_result.refusal_response], )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/utils/agents/streaming.py` around lines 108 - 113, The append_turn_items_to_conversation call in streaming.py is using `original_input or responses_params.input`, which can incorrectly fall back when `original_input` is a valid falsy value. Update the logic around `append_turn_items_to_conversation` to preserve falsy originals by checking explicitly for `None` before choosing the fallback input, using the existing `original_input` and `responses_params.input` symbols.
205-215: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Avoid duplicating already-streamed partial tokens on interruption.
build_interrupted_response()currently returns a suffix that includes the full partial text, but those tokens were already yielded. This duplicates client-visible text and the persisted interrupted response.Proposed helper fix
- suffix = repaired_text + interrupted_indicator - final_text = partial_text + suffix + suffix = repaired_text.removeprefix(partial_text) + interrupted_indicator + final_text = repaired_text + interrupted_indicator🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/utils/agents/streaming.py` around lines 205 - 215, The interruption path is reusing already-streamed tokens, so the client-visible text and persisted response are duplicated. Update the handling around build_interrupted_response() in streaming.py so the returned suffix excludes the partial text that was already yielded, and only the new interrupt-specific tail is appended. Make sure turn_summary.llm_response is set to the non-duplicating interrupted content before calling persist_interrupted_turn(), and keep the persist_guard flow unchanged.
236-258: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Do not drop a completed response when topic-summary generation fails.
This path returns before token consumption, the end event, and
store_query_results(). Since topic summaries are ancillary, log the failure and continue withtopic_summary = None.Proposed fix
except HTTPException as exc: logger.warning( "Topic summary failed for request %s: %s", context.request_id, exc.detail, ) - detail: dict[str, str] = exc.detail if isinstance(exc.detail, dict) else {} - yield serialize_event( - ErrorStreamPayload.create( - status_code=exc.status_code, - response=detail.get("response", "Internal server error"), - cause=detail.get("cause", str(exc.detail)), - ), - media_type, - ) - return + topic_summary = None📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.try: topic_summary = await maybe_get_topic_summary( generate_topic_summary=should_generate_topic_summary, input_text=context.query_request.query, client=context.client, model_id=responses_params.model, ) except HTTPException as exc: logger.warning( "Topic summary failed for request %s: %s", context.request_id, exc.detail, ) topic_summary = None🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/utils/agents/streaming.py` around lines 236 - 258, The HTTPException handling in streaming.py is aborting the whole stream too early in the topic-summary path. In the try/except around maybe_get_topic_summary, stop yielding an error event and returning; instead log the warning, set topic_summary = None, and continue so token consumption, the end event, and store_query_results() still run. Keep the existing identifiers maybe_get_topic_summary, topic_summary, and serialize_event in place while removing the early-exit behavior.src/utils/pydantic_ai.py (2)
103-103: 📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win
Use modern optional syntax in changed annotations.
Prefer
list[AgentCapability[Any]] | NoneandResponseInput | Nonefor new/changed annotations.As per coding guidelines, functions should use modern union syntax.
Also applies to: 122-122
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/utils/pydantic_ai.py` at line 103, Update the changed type annotations in the pydantic_ai helpers to use modern union syntax instead of Optional. Specifically, adjust the return annotation on the relevant function signature and the other affected annotation referenced in the review to use list[AgentCapability[Any]] | None and ResponseInput | None, keeping the existing symbols and behavior unchanged.Source: Coding guidelines
144-149: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash rg -n "class CompactionTurnContext|def append_turn_items_to_conversation|AsyncLlamaStackAsLibraryClient" --type py -C 4Repository: lightspeed-core/lightspeed-stack
Length of output: 170
🏁 Script executed:
#!/bin/bash set -euo pipefail # Map the relevant file and nearby symbols. git ls-files 'src/utils/pydantic_ai.py' 'src/**/_model.py' 'src/**/model.py' | sed 's#^`#FILE` #' echo '--- OUTLINE pydantic_ai.py ---' ast-grep outline src/utils/pydantic_ai.py --view expanded || true echo '--- SEARCH relevant symbols ---' rg -n "CompactionTurnContext|append_turn_items_to_conversation|AsyncLlamaStackAsLibraryClient|AsyncLlamaStackClient" src -C 3Repository: lightspeed-core/lightspeed-stack
Length of output: 50388
🏁 Script executed:
#!/bin/bash set -euo pipefail # Read the target file and likely model definitions in small slices. for f in src/utils/pydantic_ai.py; do echo "--- $f (lines 1-260) ---" nl -ba "$f" | sed -n '1,260p' done echo "--- candidate model files containing CompactionTurnContext or append helper ---" rg -n "CompactionTurnContext|append_turn_items_to_conversation|AsyncLlamaStackAsLibraryClient" src -lRepository: lightspeed-core/lightspeed-stack
Length of output: 257
🏁 Script executed:
#!/bin/bash set -euo pipefail # Read the target file and the type definitions it uses. nl -ba src/utils/pydantic_ai.py | sed -n '1,260p' echo '--- related definitions ---' rg -n "class CompactionTurnContext|def append_turn_items_to_conversation|AsyncLlamaStackAsLibraryClient|AsyncLlamaStackClient" src -C 4Repository: lightspeed-core/lightspeed-stack
Length of output: 210
🏁 Script executed:
#!/bin/bash set -euo pipefail echo '--- src/utils/pydantic_ai.py (118-155) ---' sed -n '118,155p' src/utils/pydantic_ai.py echo '--- src/pydantic_ai_lightspeed/llamastack/_model.py (57-76, 150-160, 320-330) ---' sed -n '57,76p;150,160p;320,330p' src/pydantic_ai_lightspeed/llamastack/_model.py echo '--- src/utils/conversations.py (453-470) ---' sed -n '453,470p' src/utils/conversations.pyRepository: lightspeed-core/lightspeed-stack
Length of output: 4395
Broaden the compaction client type.
build_agent()acceptsAsyncLlamaStackAsLibraryClient, butCompactionTurnContext.clientandappend_turn_items_to_conversation()are still typed asAsyncLlamaStackClient. Widen those annotations to the shared client union, or library-mode compaction stays mismatched.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/utils/pydantic_ai.py` around lines 144 - 149, The compaction flow is still typed too narrowly for library mode, so update the type annotations used by CompactionTurnContext.client and append_turn_items_to_conversation to the shared client union accepted by build_agent() (AsyncLlamaStackClient plus AsyncLlamaStackAsLibraryClient). Adjust the relevant signatures and any related context construction in pydantic_ai.py so compaction can receive either client type without a type mismatch.tests/unit/utils/agents/test_streaming.py (1)
1053-1054: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Assert the exact interrupted text here.
These containment checks still pass when interrupted-response assembly duplicates the already-streamed prefix, so they won't catch the regression in
build_interrupted_response(). Please assert the exactturn_summary.llm_response(and ideally the final token payload) for this simple"Hello world"case.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/utils/agents/test_streaming.py` around lines 1053 - 1054, The interrupted-response test for build_interrupted_response() is too weak because it only checks containment and can miss duplicated streamed prefixes. Update the test in test_streaming.py to assert the exact turn_summary.llm_response for the simple "Hello world" case, and also verify the final token payload if available. Use the existing turn_summary and INTERRUPTED_RESPONSE_MESSAGE assertions as anchors, but replace the partial string checks with exact expected output matching the interrupted assembly behavior.
| responses_params, | ||
| moderation_result, | ||
| endpoint_path, | ||
| compaction.original_input if compaction.compacted else None, |
There was a problem hiding this comment.
compaction.compacted is True iff compaction.original_input is not None
| else context.query_request.query | ||
| ), | ||
| turn_summary.output_items, | ||
| [], # field was removed from TurnSummary |
There was a problem hiding this comment.
This is inside deprecated helper function so it has no effect.
| rag_chunks: list[RAGChunk] = Field(default_factory=list) | ||
| referenced_documents: list[ReferencedDocument] = Field(default_factory=list) | ||
| token_usage: TokenCounter = Field(default_factory=TokenCounter) | ||
| output_items: list[OpenAIResponseOutput] = Field( |
There was a problem hiding this comment.
This attribute is unnecessary now, we capture the output inside the agentic loop
662e633 to
4a19bc4
Compare
|
|
||
| assert isinstance(turn_summary, TurnSummary) | ||
| assert turn_summary.llm_response == "Content blocked" | ||
| # Structured refusal captured for compacted-mode persistence (LCORE-1572). |
There was a problem hiding this comment.
Removed some irrelevant tests - persistence now takes place within agentic loop
| responses_params.conversation, | ||
| responses_params.input, | ||
| [context.moderation_result.refusal_response], | ||
| ) |
There was a problem hiding this comment.
Blocked compacted turn never reaches agentic loop, so has to be persisted here
| client, | ||
| responses_params.conversation, | ||
| responses_params.input, | ||
| original_input or responses_params.input, |
There was a problem hiding this comment.
If original input is not None -> store the original prompt, responses_api.input contains the whole compacted history.
| model_request_parameters: ModelRequestParameters, | ||
| ) -> AsyncStream[responses.ResponseStreamEvent]: ... | ||
|
|
||
| async def _responses_create( |
There was a problem hiding this comment.
Patching _responses_create function in order to intercept input and output of each responses api call
| and not stream | ||
| and isinstance(result, responses.Response) | ||
| ): | ||
| await append_turn_items_to_conversation( |
There was a problem hiding this comment.
Store here for non-streaming query
| compaction.latest_round_input, | ||
| cast(Sequence[Any], result.output), | ||
| ) | ||
| compaction.original_input_persisted = True |
There was a problem hiding this comment.
First LLM call contains the whole compacted history, therefore we need to store onpy the original input, subsequent turns are stored normally
| and self._compaction is not None | ||
| ): | ||
| compaction = self._compaction | ||
| await append_turn_items_to_conversation( |
There was a problem hiding this comment.
Store streaming compacted turn from final event
Description
Wires explicit turn compaction for compacted turns in query agentic flows.
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
Refactor
CompactionTurnContextfor compaction-related usage.Tests