
LCORE-2080: Added E2E Steps for Agent Skills#1941

Open
jrobertboos wants to merge 1 commit into
lightspeed-core:mainfrom
jrobertboos:lcore-2080


@jrobertboos

@jrobertboos jrobertboos commented Jun 17, 2026

Contributor

Description

Added the missing E2E steps for testing agent skills.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Cursor (Composer 2.5)
  • Generated by: Cursor (Composer 2.5)

Related Tickets & Documents

  • Related Issue LCORE-2080
  • Closes LCORE-2080

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features
    • Added skills asset support to end-to-end stack setups, with new sample skills for echoing text and summarizing content.
    • Improved end-to-end visibility for skills by capturing and validating tool call/tool result behavior in both streaming and non-streaming responses.
  • Tests
    • Added new end-to-end configurations for skills in both server and library modes.
    • Updated skills test coverage to match the updated tool_calls/tool_results response schema, including skill loading, resource reading, multi-skill discovery, and refreshed expectations.

@coderabbitai

coderabbitai Bot commented Jun 17, 2026

Contributor


Warning

Review limit reached

@jrobertboos, you've reached your PR review limit, so we couldn't start this review.


Review details
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2e5e4291-5088-4470-aaf7-3576a9c91614

📥 Commits

Reviewing files that changed from the base of the PR and between e2990a2 and 9af8d80.

📒 Files selected for processing (14)
  • docker-compose-library.yaml
  • docker-compose.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-skills-directory.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-skills.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-skills-directory.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-skills.yaml
  • tests/e2e/features/skills.feature
  • tests/e2e/features/steps/common_http.py
  • tests/e2e/features/steps/llm_query_response.py
  • tests/e2e/skills/echo/SKILL.md
  • tests/e2e/skills/echo/references/guide.md
  • tests/e2e/skills/summarize/SKILL.md
  • tests/e2e/skills/summarize/references/guide.md
  • tests/e2e/test_list.txt

Walkthrough

Adds e2e skill fixtures, compose mounts, and Lightspeed stack configs for server and library modes. Updates streaming response helpers to capture tool calls and results, and expands the skills feature scenarios to use the new load/read skill flows.

Changes

Skills e2e wiring

Layer / File(s) Summary
Compose mounts and skill fixtures
docker-compose-library.yaml, docker-compose.yaml, tests/e2e/skills/echo/*, tests/e2e/skills/summarize/*
Compose mounts expose the skills test directory, and new echo and summarize skill documents and guides are added.
Lightspeed stack configs
tests/e2e/configuration/library-mode/lightspeed-stack-skills*.yaml, tests/e2e/configuration/server-mode/lightspeed-stack-skills*.yaml
New server-mode and library-mode LCS configs set binding, logging, authentication, llama-stack client wiring, data storage, and skills paths.
Streaming response helpers
tests/e2e/features/steps/common_http.py, tests/e2e/features/steps/llm_query_response.py
A response-field assertion step is added, and streamed SSE parsing now accumulates and exposes tool calls and tool results.
Skills scenarios
tests/e2e/features/skills.feature, tests/e2e/test_list.txt
The skills feature updates tool-name and tool-call assertions across registration, load, read-resource, multi-skill, and progressive disclosure scenarios, and adds the feature to the e2e list.
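The "Streaming response helpers" row says streamed SSE parsing now accumulates tool calls and tool results. The sketch below illustrates that idea only. It assumes, hypothetically, that each event arrives as a single `data: {...}` line whose JSON carries an `event` name (`tool_call` or `tool_result`) and a `data` payload. The function name `collect_tool_activity` and that event shape are illustrative and are not taken from llm_query_response.py.

```python
import json
from typing import Any


def collect_tool_activity(
    sse_lines: list[str],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Accumulate tool calls and tool results from raw SSE lines.

    Hypothetical event shape: data: {"event": "<name>", "data": {...}}
    """
    tool_calls: list[dict[str, Any]] = []
    tool_results: list[dict[str, Any]] = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines, "event:" lines and comments
        payload = json.loads(line[len("data:"):].strip())
        event = payload.get("event")
        if event == "tool_call":
            tool_calls.append(payload["data"])
        elif event == "tool_result":
            tool_results.append(payload["data"])
    return tool_calls, tool_results
```

Collecting both lists across the whole stream lets the same assertions run against streaming and non-streaming responses, which both expose tool_calls and tool_results.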

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • tisnik
  • radofuchs
  • asimurka
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title is concise and accurately reflects the PR’s main theme: adding end-to-end support for agent skills.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

@jrobertboos jrobertboos force-pushed the lcore-2080 branch 3 times, most recently from c201e27 to fe7754f on June 23, 2026 16:29

@SkillsConfig
@SkillsConfig @skip
Scenario: Skill tools are registered when skills are configured

Contributor Author


TODO: Need to reflect skill tools (list_skills, load_skill, read_skill_resource) in /tools.
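Once the skill tools are reflected in /tools, the skipped scenario could be re-enabled with a check along these lines. This is a minimal sketch that assumes, hypothetically, that the /tools listing is a list of dicts exposing each tool's name under a `name` key. `missing_skill_tools` is an illustrative helper, not an existing step.

```python
from typing import Any

# The three skill tools named in the TODO above.
SKILL_TOOLS = {"list_skills", "load_skill", "read_skill_resource"}


def missing_skill_tools(tools: list[dict[str, Any]]) -> set[str]:
    """Return the skill tool names absent from a /tools listing.

    Hypothetical: assumes each entry exposes its tool name under "name".
    """
    present = {tool.get("name") for tool in tools}
    return SKILL_TOOLS - present
```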

"""
And The token metrics have increased

# --- Error handling: unknown skill ---

Contributor Author


The "Error Paths" scenarios will have to be skipped for now: the skill tools do fail and produce a result, but it is of a different type, which the response-building code silently discards.

Below is a helpful part of my conversation with Claude about the issue.


Pydantic-ai catches ModelRetry and wraps the error in a RetryPromptPart (not a ToolReturnPart). The FunctionToolResultEvent.part is typed as ToolReturnPart | RetryPromptPart — it can be either.

Where LCS drops it:

In the non-streaming path, build_turn_summary_from_agent_run only processes ToolReturnPart:

query.py
Lines 266-269

        elif isinstance(message, ModelRequest):
            for request_part in message.parts:
                if isinstance(request_part, ToolReturnPart):
                    process_function_tool_result(state, request_part)

In the streaming path, the same filter exists:

streaming.py
Lines 522-524

    part = event.part
    if not isinstance(part, ToolReturnPart):
        return None

Both paths explicitly ignore RetryPromptPart, so the retry/error message for load_skill is never surfaced as a tool_result in the API response.

The result:

  • Both tool calls appear (because both ToolCallPart instances from the ModelResponse are processed)
  • Only the list_skills result appears (because it succeeded and produced a ToolReturnPart)
  • The load_skill result is missing (because it raised ModelRetry → became a RetryPromptPart → silently dropped)
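One possible direction, sketched under stated assumptions: map both part types to a tool_result instead of dropping RetryPromptPart. The two dataclasses below are minimal stand-ins that mirror only the fields this sketch reads; they are not pydantic-ai's real classes. The `"failure"` status value is hypothetical, while `type` and `round` mirror the tool_results shape shown in the API response later in this thread.

```python
from dataclasses import dataclass
from typing import Any, Optional, Union


# Stand-ins for pydantic-ai's ToolReturnPart / RetryPromptPart (illustrative only).
@dataclass
class ToolReturnPart:
    tool_call_id: str
    content: Any


@dataclass
class RetryPromptPart:
    tool_call_id: str
    content: Any


def to_tool_result(
    part: Union[ToolReturnPart, RetryPromptPart], round_no: int
) -> Optional[dict[str, Any]]:
    """Map a tool outcome to an API tool_result instead of dropping retries."""
    if isinstance(part, ToolReturnPart):
        status = "success"
    elif isinstance(part, RetryPromptPart):
        status = "failure"  # hypothetical status for ModelRetry outcomes
    else:
        return None  # any other part type is still ignored
    return {
        "id": part.tool_call_id,
        "status": status,
        "content": str(part.content),
        "type": "function_call_output",
        "round": round_no,
    }
```

With a mapping like this, the failed load_skill call would surface as a tool_result carrying the retry message, so the "Error Paths" scenarios could assert on it.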

]
"""

# --- Full progressive disclosure flow ---

Contributor Author


This will likely be quite flaky: the LLM is given only the skill "names" (through an addition to the system prompt, I think), so it sometimes calls only load_skill and read_skill_resource, skipping list_skills completely.
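One way to reduce that flakiness is to require the load and read steps while treating list_skills as optional. The sketch below assumes the tool_calls entries carry a `name` key, as in the API response shown later in this thread; `check_skill_flow` is a hypothetical helper, not an existing step.

```python
from typing import Any, Iterable


def check_skill_flow(tool_calls: Iterable[dict[str, Any]]) -> None:
    """Assert the progressive-disclosure flow without pinning list_skills.

    load_skill and read_skill_resource are required; list_skills is allowed
    but optional, since the model may skip discovery when it already knows
    the skill names.
    """
    names = [call["name"] for call in tool_calls]
    for required in ("load_skill", "read_skill_resource"):
        assert required in names, f"expected a {required} call, got {names}"
    allowed = {"list_skills", "load_skill", "read_skill_resource"}
    unexpected = set(names) - allowed
    assert not unexpected, f"unexpected tool calls: {unexpected}"
```

The trade-off is that the scenario no longer proves list_skills is ever used; a separate, explicitly prompted scenario could cover that.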

@jrobertboos jrobertboos force-pushed the lcore-2080 branch 3 times, most recently from bd2b990 to 159e8ae on June 25, 2026 13:43
@jrobertboos

Contributor Author

Please Review:

@jrobertboos jrobertboos marked this pull request as ready for review June 25, 2026 15:37

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/e2e/configuration/server-mode/lightspeed-stack-skills-directory.yaml`:
- Around line 24-26: The skills discovery config is using a relative path in the
`skills.paths` entry, which can break startup when the working directory
changes. Update the YAML to use the absolute mounted path expected by the stack,
and keep the change localized to the `skills` block in
`lightspeed-stack-skills-directory.yaml` so startup consistently finds the
skills directory.

In `@tests/e2e/configuration/server-mode/lightspeed-stack-skills.yaml`:
- Around line 24-26: The skills path in the stack config is CWD-sensitive and
should be pinned to the mounted absolute location instead. Update the
`skills.paths` entry in the YAML so it points to `/app-root/skills/echo` rather
than the relative `skills/echo`, keeping the `skills` configuration
deterministic under the compose mount.

In `@tests/e2e/features/steps/common_http.py`:
- Around line 334-335: The expected JSON in the step implementation still parses
context.text directly, so placeholder tokens like {MODEL} are not substituted
before validation. Update the relevant step in common_http.py to apply the same
placeholder resolution used by the existing partial-body handling before calling
json.loads and validate_json_partially. Keep the fix localized to the step that
consumes context.text and ensure the parsed expected_value reflects substituted
placeholders first.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2beba6a7-1aff-4350-92f7-60524e66a1c4

📥 Commits

Reviewing files that changed from the base of the PR and between 890a6f7 and 1f11ea7.

📜 Review details
⏰ Context from checks skipped due to timeout. (2)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-0-6-on-pull-request
🧰 Additional context used
📓 Path-based instructions (2)
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async tests

Files:

  • tests/e2e/features/steps/common_http.py
  • tests/e2e/features/steps/llm_query_response.py
tests/e2e/**/*.{py,feature}

📄 CodeRabbit inference engine (AGENTS.md)

Use behave (BDD) framework for end-to-end testing with Gherkin feature files

Files:

  • tests/e2e/features/steps/common_http.py
  • tests/e2e/features/steps/llm_query_response.py
  • tests/e2e/features/skills.feature
🧠 Learnings (4)
📚 Learning: 2026-05-20T08:09:30.641Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml:107-110
Timestamp: 2026-05-20T08:09:30.641Z
Learning: In Llama-stack config YAMLs, when defining a Llama Guard safety shield entry, set `provider_shield_id` to the *guard model identifier* (e.g., `meta-llama/Llama-Guard-3-8B`). Do not use a chat/generative model id (e.g., `openai/gpt-4o-mini`): a chat-model id (or `native_override`) indicates only an override landed and does **not** mean the safety shield is actually gating queries. Ensure any E2E coverage for the related implementation (JIRA/E2E tests) exercises a real Llama Guard model to verify that the shield is effective.

Applied to files:

  • docker-compose-library.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-skills-directory.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-skills.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-skills.yaml
  • docker-compose.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-skills-directory.yaml
📚 Learning: 2026-04-07T09:20:26.590Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1467
File: tests/e2e/features/steps/common.py:36-49
Timestamp: 2026-04-07T09:20:26.590Z
Learning: For Behave-based Python tests, rely on Behave’s Context layered stack for attribute lifecycle: Behave pushes a new Context layer when entering feature scope (before_feature) and again for scenario scope (before_scenario). Attributes assigned inside given/when/then steps live on the current scenario layer and are automatically removed when the scenario ends. As a result, step-set attributes should not be expected to persist across scenarios or features, and manual cleanup in after_scenario/after_feature is generally unnecessary for attributes set in step functions. Only perform manual cleanup for attributes that you set explicitly in before_feature/before_scenario, since those live on the respective feature/scenario layers.

Applied to files:

  • tests/e2e/features/steps/common_http.py
  • tests/e2e/features/steps/llm_query_response.py
📚 Learning: 2026-04-13T13:39:54.963Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1490
File: tests/e2e/features/environment.py:206-211
Timestamp: 2026-04-13T13:39:54.963Z
Learning: In lightspeed-stack E2E tests under tests/e2e/features, it is intentional to set context.feature_config inside Background/step functions (scenario-scoped Behave layer). The environment.py after_scenario restore logic should only restore configuration when context.scenario_lightspeed_override_active is True; this flag is set by configure_service only when a real config switch occurs (so restore does not run for scenarios without a switch). Additionally, steps/common.py’s module-level _active_lightspeed_stack_config_basename is used to prevent re-applying the same config across subsequent scenarios, ensuring scenario_lightspeed_override_active stays False after the first apply. Therefore, reviewers should not “fix” this flow as if feature_config were incorrectly scoped or if after_scenario restoration is missing—config switching and restoration are meant to happen exactly once per actual switch, not redundantly per scenario.

Applied to files:

  • tests/e2e/features/steps/common_http.py
  • tests/e2e/features/steps/llm_query_response.py
📚 Learning: 2026-06-24T13:45:37.249Z
Learnt from: Jdubrick
Repo: lightspeed-core/lightspeed-stack PR: 1971
File: src/utils/markdown_repair.py:31-36
Timestamp: 2026-06-24T13:45:37.249Z
Learning: In the lightspeed-stack repository, docstrings must use the section header name "Parameters:" (not "Args:") for function arguments, even if the project references Google Python docstring conventions. Ensure docstrings follow the project’s established "Parameters:" header format for any documented function parameters.

Applied to files:

  • tests/e2e/features/steps/common_http.py
  • tests/e2e/features/steps/llm_query_response.py
🪛 LanguageTool
tests/e2e/skills/echo/SKILL.md

[style] ~17-~17: Using “back” with the verb “return” may be redundant.
Context: ...r's input text 2. Return the exact text back to the user without modification For f...

(RETURN_BACK)

🔇 Additional comments (8)
docker-compose-library.yaml (1)

23-23: LGTM!

docker-compose.yaml (1)

90-90: LGTM!

tests/e2e/skills/echo/SKILL.md (1)

1-19: LGTM!

tests/e2e/skills/echo/references/guide.md (1)

1-20: LGTM!

tests/e2e/skills/summarize/SKILL.md (1)

1-22: LGTM!

tests/e2e/skills/summarize/references/guide.md (1)

1-21: LGTM!

tests/e2e/configuration/library-mode/lightspeed-stack-skills-directory.yaml (1)

1-26: LGTM!

tests/e2e/configuration/library-mode/lightspeed-stack-skills.yaml (1)

1-26: LGTM!

Comment on lines +24 to +26 (tests/e2e/configuration/server-mode/lightspeed-stack-skills-directory.yaml)
skills:
  paths:
    - skills

Contributor


🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win

Use absolute skills path to avoid CWD-dependent startup failures.

skills is relative; if the service working directory changes, skills discovery can fail at startup. Use /app-root/skills to match the compose mount explicitly.

Proposed change
 skills:
   paths:
-    - skills
+    - /app-root/skills
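To see why a relative entry is fragile, the sketch below assumes a loader that resolves relative skill paths against the process working directory. That assumption is not confirmed here; LCS may anchor paths differently. `resolve_skill_paths` and `demo_cwd_sensitivity` are hypothetical helpers. The same relative entry `skills` resolves to two different locations from two different working directories.

```python
import os
import tempfile
from pathlib import Path


def resolve_skill_paths(paths: list[str]) -> list[Path]:
    """Resolve configured skill paths against the current working directory.

    Illustrative only: models a loader that does not anchor relative paths.
    """
    return [Path(p).resolve() for p in paths]


def demo_cwd_sensitivity() -> tuple[list[Path], list[Path]]:
    """Resolve the relative entry "skills" from two different working directories."""
    original = os.getcwd()
    try:
        with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
            os.chdir(first)
            from_first = resolve_skill_paths(["skills"])
            os.chdir(second)
            from_second = resolve_skill_paths(["skills"])
            os.chdir(original)  # leave the temp dirs before they are deleted
    finally:
        os.chdir(original)
    return from_first, from_second
```

An absolute entry such as /app-root/skills resolves to the same location regardless of the working directory, which is the determinism the review asks for.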

Comment on lines +24 to +26 (tests/e2e/configuration/server-mode/lightspeed-stack-skills.yaml)
skills:
  paths:
    - skills/echo

Contributor


🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win

Pin the skill path to the mounted absolute location.

skills/echo is CWD-sensitive. Prefer /app-root/skills/echo for deterministic resolution against the compose mount.

Proposed change
 skills:
   paths:
-    - skills/echo
+    - /app-root/skills/echo

Comment on lines +334 to +335 (tests/e2e/features/steps/common_http.py)
expected_value = json.loads(context.text)
validate_json_partially(actual_value, expected_value)

Contributor


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Apply placeholder substitution before parsing expected JSON.

At Line 334, this step parses context.text directly, so placeholders like {MODEL} won’t be resolved here (unlike the existing partial-body step). That can cause false failures in scenario assertions.

Proposed fix
-    expected_value = json.loads(context.text)
+    json_str = replace_placeholders(context, context.text)
+    expected_value = json.loads(json_str)
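The ordering matters because an unresolved token such as {MODEL} is still valid JSON string content, so json.loads succeeds and the mismatch only shows up later as a false assertion failure. The sketch below uses hypothetical stand-ins: `substitute_placeholders` for the repository's replace_placeholders(context, text), and `matches_partially` for validate_json_partially. The latter's exact semantics, in particular how it compares lists, are an assumption here (this version requires equal-length lists).

```python
import json
import re
from typing import Any


def substitute_placeholders(text: str, values: dict[str, str]) -> str:
    """Replace {NAME} tokens with known values; unknown tokens are left as-is."""
    return re.sub(r"\{([A-Z_]+)\}", lambda m: values.get(m.group(1), m.group(0)), text)


def matches_partially(actual: Any, expected: Any) -> bool:
    """True if every key/value in expected appears in actual (recursively)."""
    if isinstance(expected, dict):
        return isinstance(actual, dict) and all(
            key in actual and matches_partially(actual[key], value)
            for key, value in expected.items()
        )
    if isinstance(expected, list):
        return isinstance(actual, list) and len(actual) == len(expected) and all(
            matches_partially(a, e) for a, e in zip(actual, expected)
        )
    return actual == expected


# Substitute first, then parse, then compare.
raw = '{"model": "{MODEL}", "tool_calls": [{"name": "load_skill"}]}'
expected = json.loads(substitute_placeholders(raw, {"MODEL": "gpt-4o-mini"}))
```

The regex only matches braces that enclose uppercase letters and underscores, so ordinary JSON object braces are left untouched.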

@anik120 anik120 left a comment

Contributor


PS: squashing commits so that a PR has a single commit is the hygienic thing to do (unless multiple commits are by design, in which case the question becomes "why aren't they multiple PRs instead?").

Otherwise they show up as

"fix"

"fix"

"address code rabbit"

when someone is searching through git history trying to figure out what changes were made.

Here's an article I highly recommend reading https://medium.com/@madhav2002/git-hygiene-commits-branching-and-rewriting-history-bc6dee5f953f

@radofuchs radofuchs left a comment

Contributor


LGTM overall, just a few details.

Comment thread tests/e2e/features/steps/common_http.py Outdated
Comment thread tests/e2e/features/steps/llm_query_response.py Outdated
@asimurka

asimurka commented Jun 26, 2026

Contributor

Just a conceptual question: Is the skill invocation really so strict that when you prompt to run a non-existing skill, the LLM really tries to execute it and ends up with failure?

@jrobertboos

jrobertboos commented Jun 26, 2026

Contributor Author

@asimurka when you prompt the LLM to use a skill and are direct enough, it will try to call the load_skill tool with the named skill. For example, this is what it looks like right now:

INPUT

curl -X 'POST' \
  'http://localhost:8080/v1/query' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "gpt-4o-mini",
  "provider": "openai",
  "query": "load the skill '\''non-existent'\''."
}'

OUTPUT

{
  "conversation_id": "eb995e1ee43557e33d6c43feacf47a4afc73565ba7478294",
  "response": "It appears that there are currently no available skills to load. Please let me know if you need assistance with something else!",
  "rag_chunks": [],
  "referenced_documents": [],
  "truncated": false,
  "input_tokens": 2290,
  "output_tokens": 52,
  "available_quotas": {},
  "tool_calls": [
    {
      "id": "call_51npMnMSenv6Qnp7encji746",
      "name": "load_skill",
      "args": {
        "skill_name": "non-existent"
      },
      "type": "function_call"
    },
    {
      "id": "call_retBuV3RnzjkfsR8ltVNLoq3",
      "name": "list_skills",
      "args": {},
      "type": "function_call"
    }
  ],
  "tool_results": [
    {
      "id": "call_retBuV3RnzjkfsR8ltVNLoq3",
      "status": "success",
      "content": "{}",
      "type": "function_call_output",
      "round": 1
    }
  ]
}
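The response above shows the dropped-result symptom directly: two tool_calls but only one tool_results entry. A small check that pairs calls with results by `id` makes that visible in a test. `calls_without_results` is a hypothetical helper; the field names (`tool_calls`, `tool_results`, `id`, `name`) are taken from the response shown above.

```python
from typing import Any


def calls_without_results(response: dict[str, Any]) -> list[str]:
    """Return names of tool calls whose id has no matching tool_results entry."""
    result_ids = {result["id"] for result in response.get("tool_results", [])}
    return [
        call["name"]
        for call in response.get("tool_calls", [])
        if call["id"] not in result_ids
    ]
```

Applied to the response above, this returns ["load_skill"]: the call that raised ModelRetry and whose RetryPromptPart was discarded.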

Does that answer your question?

@asimurka

asimurka commented Jun 26, 2026

Contributor

Is it possible that this is just model-specific behavior? I don't think you should be able to influence the model's behavior like this with a bare prompt.

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/e2e/features/skills.feature (1)

59-92: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Align the advertised tool parameters with the asserted call args.

Line 63 exposes load_skill with parameter name, and Line 87 exposes read_skill_resource with parameter path, but the later call assertions in this same feature expect skill_name and resource_name. Both contracts cannot be correct at once, so either /tools is asserting stale metadata or the tool_calls checks will never match the real invocation shape.
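A mismatch like this can be caught mechanically by comparing each call's argument names with the parameters its tool advertises. In the sketch below, `arg_mismatches` is a hypothetical helper and the advertised-parameter mapping is hand-built. The parameter names (`name`, `path` advertised; `skill_name`, `resource_name` asserted) come from this review comment.

```python
from typing import Any


def arg_mismatches(
    advertised: dict[str, set[str]], tool_calls: list[dict[str, Any]]
) -> list[str]:
    """List call arguments that are not among a tool's advertised parameters."""
    problems: list[str] = []
    for call in tool_calls:
        params = advertised.get(call["name"])
        if params is None:
            problems.append(f"{call['name']}: tool not advertised")
            continue
        for arg in sorted(set(call.get("args", {})) - params):
            problems.append(f"{call['name']}: unexpected argument {arg!r}")
    return problems
```

An empty result means the /tools metadata and the tool_calls assertions agree.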

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/e2e/features/skills.feature` around lines 59 - 92, The tool metadata in
skills.feature is inconsistent with the expected call arguments: load_skill
currently advertises name while the assertions use skill_name, and
read_skill_resource advertises path while the assertions use resource_name.
Update the feature so the parameter names in the tool definitions and the
tool_calls checks match exactly, using the same symbols load_skill and
read_skill_resource throughout.
♻️ Duplicate comments (2)
tests/e2e/configuration/server-mode/lightspeed-stack-skills.yaml (1)

24-26: 🩺 Stability & Availability | 🔴 Critical | ⚡ Quick win

Pin skill path to absolute mounted location.

skills/echo is CWD-sensitive. Use /app-root/skills/echo for deterministic resolution against the compose mount. This was flagged in a previous review and remains unaddressed.

Proposed fix
 skills:
   paths:
-    - skills/echo
+    - /app-root/skills/echo
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/e2e/configuration/server-mode/lightspeed-stack-skills.yaml` around
lines 24 - 26, The skills path in the configuration is still relative and
depends on the working directory, so it should be pinned to the mounted absolute
location instead. Update the `skills.paths` entry in the
`lightspeed-stack-skills.yaml` config to use the compose mount target
`/app-root/skills/echo` so resolution is deterministic. Make sure the change is
applied in the `skills` section and not elsewhere in the e2e configuration.
tests/e2e/configuration/server-mode/lightspeed-stack-skills-directory.yaml (1)

24-26: 🩺 Stability & Availability | 🔴 Critical | ⚡ Quick win

Use absolute path for skills directory to prevent startup failures.

skills is a relative path. If the service working directory differs from /app-root, skill discovery fails at startup. Change to /app-root/skills to match the compose mount explicitly. This was flagged in a previous review and remains unaddressed.

Proposed fix
 skills:
   paths:
-    - skills
+    - /app-root/skills
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/e2e/configuration/server-mode/lightspeed-stack-skills-directory.yaml`
around lines 24 - 26, The skills directory configuration is using a relative
path, which can break startup when the working directory is not the expected
root. Update the skills path in the lightspeed stack config to use the absolute
mounted location instead of the current relative value, and make sure the change
is applied in the skills discovery config that the startup flow reads so skill
loading works reliably regardless of cwd.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docker-compose-library.yaml`:
- Line 23: Align the SELinux label used for the `./tests/e2e/skills` bind mount
in `docker-compose-library.yaml` with the one used in `docker-compose.yaml` to
avoid unnecessary divergence. Update the volume entry under the library compose
service to use the same `ro,z`/`ro,Z` convention consistently across both
compose files, or explicitly document why `docker-compose-library.yaml` should
differ. Use the shared skills mount definition as the reference point when
making the change.

In `@tests/e2e/features/skills.feature`:
- Around line 14-16: The skills feature scenarios are applying the new MCP
skills config before resetting toolgroups, which can leave stale server-mode
registrations in place. Update each affected scenario in skills.feature so
reset_mcp_toolgroups_for_new_configuration runs before The service uses the
lightspeed-stack-skills.yaml configuration, then restart the service afterward.
Keep the step order consistent in all listed scenarios so list_skills and
load_skill assertions always use the fresh toolgroup state.

In `@tests/e2e/skills/echo/SKILL.md`:
- Line 17: The SKILL.md guidance in the echo skill uses the redundant phrase
“return back”; update the wording in the instruction text to say “Return the
exact text to the user without modification” so it stays clear and concise. Make
this edit in the echo skill’s step that describes the response behavior, keeping
the rest of the instruction unchanged.

---

Outside diff comments:
In `@tests/e2e/features/skills.feature`:
- Around lines 59-92: The tool metadata in skills.feature is inconsistent with
the expected call arguments: load_skill currently advertises name while the
assertions use skill_name, and read_skill_resource advertises path while the
assertions use resource_name. Update the feature so that, for both load_skill
and read_skill_resource, the parameter names in the tool definitions match the
argument names in the tool_calls checks exactly.
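A consistency check of this kind can be sketched in Python. The helper name `mismatched_argument_names` and the data shapes are assumptions for illustration, not the suite's actual data model. The shapes are a map from tool name to advertised parameter names, and a list of expected calls with `name`/`args` keys. The sample values reproduce the mismatch described above, and the argument values are illustrative only.

```python
from typing import Any


def mismatched_argument_names(
    tool_parameters: dict[str, list[str]],
    expected_calls: list[dict[str, Any]],
) -> list[str]:
    """List argument names that an expected call uses but its tool does not advertise.

    Parameters:
        tool_parameters: Tool name mapped to the parameter names in its definition.
        expected_calls: Expected tool calls, each with "name" and "args" keys.

    Returns:
        One description per mismatch; an empty list means the metadata is consistent.
    """
    problems: list[str] = []
    for call in expected_calls:
        advertised = set(tool_parameters.get(call["name"], []))
        for arg in call["args"]:
            if arg not in advertised:
                problems.append(f"{call['name']}: '{arg}' not in {sorted(advertised)}")
    return problems


# Current (inconsistent) state as described in the review comment.
tools = {"load_skill": ["name"], "read_skill_resource": ["path"]}
calls = [
    {"name": "load_skill", "args": {"skill_name": "echo"}},
    {"name": "read_skill_resource", "args": {"resource_name": "references/guide.md"}},
]
print(len(mismatched_argument_names(tools, calls)))  # 2

# After aligning the advertised names with the assertions, nothing is flagged.
fixed = {"load_skill": ["skill_name"], "read_skill_resource": ["resource_name"]}
print(mismatched_argument_names(fixed, calls))  # []
```

Either spelling works, as long as the definitions and the assertions use the same one.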

---

Duplicate comments:
In `@tests/e2e/configuration/server-mode/lightspeed-stack-skills-directory.yaml`:
- Around lines 24-26: The skills directory configuration is using a relative
path, which can break startup when the working directory is not the expected
root. Update the skills path in the lightspeed stack config to use the absolute
mounted location instead of the current relative value, and make sure the change
is applied in the skills discovery config that the startup flow reads so skill
loading works reliably regardless of cwd.

In `@tests/e2e/configuration/server-mode/lightspeed-stack-skills.yaml`:
- Around lines 24-26: The skills path in the configuration is still relative and
depends on the working directory, so it should be pinned to the mounted absolute
location instead. Update the `skills.paths` entry in the
`lightspeed-stack-skills.yaml` config to use the compose mount target
`/app-root/skills/echo` so resolution is deterministic. Make sure the change is
applied in the `skills` section and not elsewhere in the e2e configuration.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 07dc80e8-7a8f-4ea7-a655-f00e2ffeee6d

📥 Commits

Reviewing files that changed from the base of the PR and between 1f11ea7 and e2990a2.

📒 Files selected for processing (14)
  • docker-compose-library.yaml
  • docker-compose.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-skills-directory.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-skills.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-skills-directory.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-skills.yaml
  • tests/e2e/features/skills.feature
  • tests/e2e/features/steps/common_http.py
  • tests/e2e/features/steps/llm_query_response.py
  • tests/e2e/skills/echo/SKILL.md
  • tests/e2e/skills/echo/references/guide.md
  • tests/e2e/skills/summarize/SKILL.md
  • tests/e2e/skills/summarize/references/guide.md
  • tests/e2e/test_list.txt
📜 Review details
⏰ Context from checks skipped due to timeout. (12)
  • GitHub Check: integration_tests (3.13)
  • GitHub Check: integration_tests (3.12)
  • GitHub Check: build-pr
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-0-6-on-pull-request
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
⚠️ CI failures not shown inline (2)

GitHub Actions: OpenAPI (Spectral) / spectral: LCORE-2080: Added E2E Steps for Agent Skills

Conclusion: failure

##[group]Run set -euo pipefail
set -euo pipefail
uv run python scripts/generate_openapi_schema.py /tmp/openapi-generated.json
if ! diff -u docs/openapi.json /tmp/openapi-generated.json; then
  echo "::error::docs/openapi.json is out of date. Regenerate with: uv run scripts/generate_openapi_schema.py docs/openapi.json"

GitHub Actions: OpenAPI (Spectral) / 0_spectral.txt: LCORE-2080: Added E2E Steps for Agent Skills

Conclusion: failure

##[group]Run set -euo pipefail
set -euo pipefail
uv run python scripts/generate_openapi_schema.py /tmp/openapi-generated.json
if ! diff -u docs/openapi.json /tmp/openapi-generated.json; then
  echo "::error::docs/openapi.json is out of date. Regenerate with: uv run scripts/generate_openapi_schema.py docs/openapi.json"
🧰 Additional context used
📓 Path-based instructions (2)
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async tests

Files:

  • tests/e2e/features/steps/common_http.py
  • tests/e2e/features/steps/llm_query_response.py
tests/e2e/**/*.{py,feature}

📄 CodeRabbit inference engine (AGENTS.md)

Use behave (BDD) framework for end-to-end testing with Gherkin feature files

Files:

  • tests/e2e/features/steps/common_http.py
  • tests/e2e/features/steps/llm_query_response.py
  • tests/e2e/features/skills.feature
🧠 Learnings (4)
📚 Learning: 2026-05-20T08:09:30.641Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml:107-110
Timestamp: 2026-05-20T08:09:30.641Z
Learning: In Llama-stack config YAMLs, when defining a Llama Guard safety shield entry, set `provider_shield_id` to the *guard model identifier* (e.g., `meta-llama/Llama-Guard-3-8B`). Do not use a chat/generative model id (e.g., `openai/gpt-4o-mini`): a chat-model id (or `native_override`) indicates only an override landed and does **not** mean the safety shield is actually gating queries. Ensure any E2E coverage for the related implementation (JIRA/E2E tests) exercises a real Llama Guard model to verify that the shield is effective.

Applied to files:

  • docker-compose-library.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-skills.yaml
  • docker-compose.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-skills-directory.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-skills-directory.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-skills.yaml
📚 Learning: 2026-04-07T09:20:26.590Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1467
File: tests/e2e/features/steps/common.py:36-49
Timestamp: 2026-04-07T09:20:26.590Z
Learning: For Behave-based Python tests, rely on Behave’s Context layered stack for attribute lifecycle: Behave pushes a new Context layer when entering feature scope (before_feature) and again for scenario scope (before_scenario). Attributes assigned inside given/when/then steps live on the current scenario layer and are automatically removed when the scenario ends. As a result, step-set attributes should not be expected to persist across scenarios or features, and manual cleanup in after_scenario/after_feature is generally unnecessary for attributes set in step functions. Only perform manual cleanup for attributes that you set explicitly in before_feature/before_scenario, since those live on the respective feature/scenario layers.

Applied to files:

  • tests/e2e/features/steps/common_http.py
  • tests/e2e/features/steps/llm_query_response.py
📚 Learning: 2026-04-13T13:39:54.963Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1490
File: tests/e2e/features/environment.py:206-211
Timestamp: 2026-04-13T13:39:54.963Z
Learning: In lightspeed-stack E2E tests under tests/e2e/features, it is intentional to set context.feature_config inside Background/step functions (scenario-scoped Behave layer). The environment.py after_scenario restore logic should only restore configuration when context.scenario_lightspeed_override_active is True; this flag is set by configure_service only when a real config switch occurs (so restore does not run for scenarios without a switch). Additionally, steps/common.py’s module-level _active_lightspeed_stack_config_basename is used to prevent re-applying the same config across subsequent scenarios, ensuring scenario_lightspeed_override_active stays False after the first apply. Therefore, reviewers should not “fix” this flow as if feature_config were incorrectly scoped or if after_scenario restoration is missing—config switching and restoration are meant to happen exactly once per actual switch, not redundantly per scenario.

Applied to files:

  • tests/e2e/features/steps/common_http.py
  • tests/e2e/features/steps/llm_query_response.py
📚 Learning: 2026-06-24T13:45:37.249Z
Learnt from: Jdubrick
Repo: lightspeed-core/lightspeed-stack PR: 1971
File: src/utils/markdown_repair.py:31-36
Timestamp: 2026-06-24T13:45:37.249Z
Learning: In the lightspeed-stack repository, docstrings must use the section header name "Parameters:" (not "Args:") for function arguments, even if the project references Google Python docstring conventions. Ensure docstrings follow the project’s established "Parameters:" header format for any documented function parameters.

Applied to files:

  • tests/e2e/features/steps/common_http.py
  • tests/e2e/features/steps/llm_query_response.py
🪛 LanguageTool
tests/e2e/skills/echo/SKILL.md

[style] ~17-~17: Using “back” with the verb “return” may be redundant.
Context: ...r's input text 2. Return the exact text back to the user without modification For f...

(RETURN_BACK)

🔇 Additional comments (8)
tests/e2e/features/steps/common_http.py (1)

331-333: Apply placeholder substitution before parsing the expected JSON.

This step still calls json.loads(context.text) directly, so {MODEL}-style placeholders here will fail even though the sibling partial-body step resolves them first.
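One way to address this, sketched in Python, is to resolve `{NAME}`-style placeholders on the raw step text first and only then call `json.loads`. The helper names, the uppercase-only placeholder pattern, and the sample values are assumptions. The sibling partial-body step uses its own substitution helper, which is not shown here.

```python
import json
import re
from typing import Any

# Matches {MODEL}, {PROVIDER}, ... but not JSON object braces such as {"a": 1}.
_PLACEHOLDER = re.compile(r"\{([A-Z_]+)\}")


def substitute_placeholders(text: str, values: dict[str, str]) -> str:
    """Replace {NAME}-style placeholders with concrete values.

    Parameters:
        text: Raw step text, e.g. the docstring attached to a Gherkin step.
        values: Placeholder name mapped to its replacement string.

    Returns:
        The text with every known placeholder replaced; unknown ones are left as-is.
    """
    return _PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), text)


def parse_expected_body(text: str, values: dict[str, str]) -> Any:
    """Substitute placeholders, then parse the result as JSON.

    Parameters:
        text: Raw expected-body text from the step.
        values: Placeholder name mapped to its replacement string.

    Returns:
        The parsed JSON value, holding concrete values instead of "{MODEL}".
    """
    return json.loads(substitute_placeholders(text, values))


raw = '{"model": "{MODEL}", "provider": "{PROVIDER}"}'
print(parse_expected_body(raw, {"MODEL": "gpt-4o-mini", "PROVIDER": "openai"}))
# {'model': 'gpt-4o-mini', 'provider': 'openai'}
```

The pattern only matches braces around uppercase names, so ordinary JSON objects such as `{"n": 1}` pass through untouched. Replacement values are inserted verbatim, so a value containing a double quote would still need escaping.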

tests/e2e/features/steps/llm_query_response.py (1)

94-97: LGTM!

Also applies to: 368-404

docker-compose.yaml (1)

90-90: LGTM!

tests/e2e/skills/echo/references/guide.md (1)

1-20: LGTM!

tests/e2e/skills/summarize/SKILL.md (1)

1-21: LGTM!

tests/e2e/skills/summarize/references/guide.md (1)

1-21: LGTM!

tests/e2e/configuration/library-mode/lightspeed-stack-skills-directory.yaml (1)

1-26: LGTM!

tests/e2e/configuration/library-mode/lightspeed-stack-skills.yaml (1)

1-26: LGTM!

Comment thread docker-compose-library.yaml
Comment thread tests/e2e/features/skills.feature
## Instructions

1. Read the user's input text
2. Return the exact text back to the user without modification

Contributor

📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Remove redundant "back" in "return back".

"Return" already implies giving back; "return back" is pleonastic.

Proposed fix
-2. Return the exact text back to the user without modification
+2. Return the exact text to the user without modification

jrobertboos requested a review from radofuchs on June 29, 2026 14:05
refined E2E tests for skills and added necessary step implementations.

close: LCORE-2080

4 participants