feat: Add ReasoningBank for reusable reasoning strategies #702

Open
nebrass wants to merge 13 commits into google:main from nebrass:feature/reasoning-bank

@nebrass nebrass commented Jan 5, 2026

Summary

This PR implements ReasoningBank in ADK Java — a memory framework that lets agents distill
reusable reasoning strategies from their past task executions (both successful and failed)
and retrieve them to guide new, similar tasks.

Ouyang et al. "ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory" (ICLR 2026).
Paper: https://arxiv.org/abs/2509.25140 · Blog: https://research.google/blog/reasoningbank-enabling-agents-to-learn-from-experience/ · Reference implementation: https://github.com/google-research/reasoning-bank

The design mirrors ADK's existing Memory feature (BaseMemoryService, InMemoryMemoryService,
LoadMemoryTool).

What is ReasoningBank?

Unlike memory mechanisms that store raw trajectories (Synapse) or only successful workflows (Agent
Workflow Memory), ReasoningBank distills compact, transferable memory items from both successes
and failures. Failure-derived items become preventative "guardrails" — e.g. "verify the page
identifier before loading more results to avoid infinite-scroll traps."

Components (com.google.adk.reasoning)

Data models

  • ReasoningMemoryItem — immutable memory item with the paper's canonical title / description / content schema, plus sourceTraceSuccessful so failure-derived preventative lessons are first-class (also id, tags, createdAt).
  • ReasoningTrace — a raw task trajectory (task, output, intermediate reasoning steps, successful flag) retained for later distillation.
  • SearchReasoningResponse — search result wrapper.

Service layer

  • BaseReasoningBankService — storage/retrieval contract: storeMemoryItem, storeTrace, searchMemoryItems.
  • InMemoryReasoningBankService — prototype implementation using bag-of-words keyword scoring (title > description > tags > content). Not production-grade — the reference implementation uses embedding-based retrieval.
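
As a rough illustration of the bag-of-words scoring described above, here is a minimal sketch. It is not the module's code: the `Item` record, the class name, and the tokenization are hypothetical stand-ins. The x3 / x2 / x1 weights follow the later commit message in this PR; the size of the flat content bonus (1 here) is an assumption.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical sketch of weighted keyword scoring; not the module's actual code. */
final class KeywordScoringSketch {

  /** Minimal stand-in for ReasoningMemoryItem (assumed shape). */
  record Item(String title, String description, String content, List<String> tags) {}

  /** Lower-cases the text and splits it into a set of alphanumeric words. */
  static Set<String> words(String text) {
    return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+"))
        .filter(w -> !w.isEmpty())
        .collect(Collectors.toSet());
  }

  /** Counts how many distinct query words also occur in the field. */
  static int overlap(Set<String> query, Set<String> field) {
    int hits = 0;
    for (String word : query) {
      if (field.contains(word)) {
        hits++;
      }
    }
    return hits;
  }

  /** Title hits count x3, description x2, tags x1; any content hit adds a flat 1 (assumed). */
  static int score(Item item, String query) {
    Set<String> q = words(query);
    int score = 3 * overlap(q, words(item.title()));
    score += 2 * overlap(q, words(item.description()));
    score += overlap(q, words(String.join(" ", item.tags())));
    if (overlap(q, words(item.content())) > 0) {
      score += 1;
    }
    return score;
  }

  /** Returns up to k items with a positive score, highest score first. */
  static List<Item> search(List<Item> items, String query, int k) {
    return items.stream()
        .filter(item -> score(item, query) > 0)
        .sorted(Comparator.comparingInt((Item item) -> score(item, query)).reversed())
        .limit(k)
        .collect(Collectors.toList());
  }
}
```

Keyword overlap like this misses paraphrases, which is why the description flags embedding-based retrieval as the production-grade alternative.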

Extraction SPI

  • MemoryExtractor (+ NoOpMemoryExtractor) — extension point for the "judge & extract" step of the loop; extract(query, List<ReasoningTrace>) accommodates parallel/sequential MaTTS distillation later without an API break. LLM-backed extractors are intentionally left to downstream modules to keep this contrib module dependency-free.

Tool integration (com.google.adk.tools)

  • LoadReasoningMemoryTool — a FunctionTool exposing retrieval to agents as loadReasoningMemory(query).
  • LoadReasoningMemoryResponse — tool response record.

The closed loop

retrieve ──► act (agent / env) ──► judge (LLM) ──► extract (LLM) ──► consolidate
   ▲                                                                      │
   └──────────────────────────────────────────────────────────────────────┘
  • searchMemoryItems → retrieve · the agent runtime → act · MemoryExtractor → judge & extract · storeMemoryItem → consolidate (append).
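
To make the mapping above concrete, the loop can be sketched in plain Java. Everything here is a hypothetical stand-in (the `Agent`, `Judge`, and `Extractor` interfaces, the records, and the trivial retrieval); the real module uses its own service and SPI types, and the judge is reduced to a boolean for brevity.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the retrieve -> act -> judge -> extract -> consolidate loop. */
final class ClosedLoopSketch {

  /** Minimal stand-ins for the module's trace and memory item types. */
  record Trace(String task, String output) {}

  record MemoryItem(String title, boolean sourceTraceSuccessful) {}

  interface Agent {
    Trace act(String task, List<MemoryItem> retrieved);
  }

  interface Judge {
    boolean isSuccessful(Trace trace);
  }

  interface Extractor {
    List<MemoryItem> extract(Trace trace, boolean successful);
  }

  private final List<MemoryItem> bank = new ArrayList<>();
  private final Agent agent;
  private final Judge judge;
  private final Extractor extractor;

  ClosedLoopSketch(Agent agent, Judge judge, Extractor extractor) {
    this.agent = agent;
    this.judge = judge;
    this.extractor = extractor;
  }

  /** Placeholder retrieval: returns a snapshot of every stored item, ignoring the task. */
  List<MemoryItem> retrieve(String task) {
    return List.copyOf(bank);
  }

  /** Runs one pass of the loop and appends whatever the extractor mints. */
  Trace runOnce(String task) {
    List<MemoryItem> retrieved = retrieve(task);                    // retrieve
    Trace trace = agent.act(task, retrieved);                       // act
    boolean successful = judge.isSuccessful(trace);                 // judge
    List<MemoryItem> minted = extractor.extract(trace, successful); // extract
    bank.addAll(minted);                                            // consolidate (append)
    return trace;
  }
}
```

Because consolidation is a plain append, items minted by one run are visible to the retrieve step of the next run, which is what closes the loop.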

Integration

The module is self-contained and does not modify InvocationContext or ToolContext.
Agents use it by constructing LoadReasoningMemoryTool(reasoningBankService, appName) and adding it
to their tool list (constructor injection). No core ADK changes are required.

Out of scope (documented in the module README)

  • Embedding-based retrieval (the in-memory service uses keyword matching).
  • Memory-aware Test-Time Scaling (MaTTS) driver (parallel self-contrast / sequential refinement).
  • LLM-as-a-judge and LLM extraction prompts (SUCCESSFUL_SI, FAILED_SI, PARALLEL_SI, …).

Usage

BaseReasoningBankService reasoningBank = new InMemoryReasoningBankService();

// Store a distilled memory item (here, a preventative lesson from a failed run)
reasoningBank.storeMemoryItem(
        "myApp",
        ReasoningMemoryItem.builder()
            .id("pagination-guardrail")
            .title("Verify page identifier before pagination")
            .description("Confirm the active page before loading more results.")
            .content(
                "Cross-reference the current page id with active filters to avoid "
                    + "infinite-scroll traps.")
            .tags(ImmutableList.of("web", "pagination"))
            .sourceTraceSuccessful(false)
            .build())
    .blockingAwait();

// Expose retrieval to an agent
LoadReasoningMemoryTool tool = new LoadReasoningMemoryTool(reasoningBank, "myApp");
// add `tool` to your agent's tool list

Test Plan

  • ReasoningMemoryItemTest (4)
  • ReasoningTraceTest (5)
  • InMemoryReasoningBankServiceTest (12) — includes retrieval of failure-derived items
  • NoOpMemoryExtractorTest (2)
  • All 23 module unit tests pass

google-cla Bot commented Jan 5, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@gemini-code-assist

Contributor

Summary of Changes

This pull request introduces the foundational ReasoningBank feature, designed to enhance agent capabilities by allowing them to learn from and reuse successful problem-solving approaches. By providing mechanisms to store and retrieve distilled reasoning strategies and raw execution traces, agents can apply proven methods to new, similar tasks, thereby improving their efficiency and effectiveness. The implementation includes core data models, a service interface with an in-memory prototype, and seamless integration into the existing tool and invocation contexts.

Highlights

  • New Feature: ReasoningBank: Introduces the ReasoningBank feature, enabling agents to store and retrieve proven reasoning strategies, based on the 'Reasoning-Bank: Learning from the Traces of Thought' paper.
  • New Data Models: Added ReasoningStrategy (for distilled reasoning approaches), ReasoningTrace (for raw task execution data), and SearchReasoningResponse (for strategy search results).
  • Service Layer Implementation: Defined BaseReasoningBankService interface and provided an InMemoryReasoningBankService implementation for prototyping, utilizing keyword matching for strategy retrieval.
  • Tool Integration: Integrated LoadReasoningStrategyTool as a function tool, allowing agents to search for and load relevant strategies, along with its corresponding LoadReasoningStrategyResponse.
  • Context Updates: Modified InvocationContext to include the reasoningBankService and ToolContext to expose a searchReasoningStrategies() method for agent access.

@nebrass nebrass force-pushed the feature/reasoning-bank branch from e874b9e to 37a1f5c Compare January 5, 2026 14:19

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces a ReasoningBank feature, which is a significant and well-implemented addition. The new components, including data models, services, and tool integrations, are clearly defined and follow the existing architectural patterns of the project. The code is well-documented and accompanied by a comprehensive set of unit tests, ensuring the new functionality is robust. I have a couple of minor suggestions for code refinement in the InMemoryReasoningBankService to improve conciseness and use more idiomatic Java constructs, but overall, this is excellent work.

Comment thread core/src/main/java/com/google/adk/reasoning/InMemoryReasoningBankService.java Outdated
Comment thread core/src/main/java/com/google/adk/reasoning/InMemoryReasoningBankService.java Outdated
@nebrass nebrass force-pushed the feature/reasoning-bank branch from 37a1f5c to c29b9c6 Compare January 5, 2026 14:23
@glaforge

Contributor

Do you think you could move the contribution to the contrib folder?
In core, we'd like to keep features that are agreed upon and available across all our language runtimes. Here, the Reasoning Bank would only be available in Java (for now at least), so it would make sense to move it to the contrib section, as it is specific to ADK Java.

nebrass commented Feb 11, 2026

Author

Thanks @glaforge, that makes sense. I've moved the entire ReasoningBank contribution to contrib/reasoning-bank/:

  • Created a new contrib/reasoning-bank Maven module with its own pom.xml
  • Moved all reasoning models (ReasoningStrategy, ReasoningTrace, SearchReasoningResponse), the service interface (BaseReasoningBankService), the in-memory implementation, and the tool (LoadReasoningStrategyTool, LoadReasoningStrategyResponse) from core/ to contrib/reasoning-bank/
  • Reverted all changes to InvocationContext and ToolContext in core — no reasoning-specific code remains in core
  • Refactored LoadReasoningStrategyTool to be self-contained: it accepts the BaseReasoningBankService and appName via its constructor rather than relying on ToolContext or InvocationContext
  • All tests (both core and reasoning-bank) pass

@nebrass nebrass force-pushed the feature/reasoning-bank branch 2 times, most recently from 067c700 to e49198b Compare February 11, 2026 14:53
Add a contrib/reasoning-bank module implementing the ReasoningBank
pattern (arXiv:2509.25140) for storing and retrieving proven reasoning
strategies. Includes data models, in-memory service, and a FunctionTool
for agent integration.
@glaforge glaforge force-pushed the feature/reasoning-bank branch from 087e713 to 373fe3d Compare March 18, 2026 11:35
@glaforge

Contributor

Looks like the paper has been updated?
https://research.google/blog/reasoningbank-enabling-agents-to-learn-from-experience/
Does that change anything for your implementation?

The ReasoningBank paper (arXiv:2509.25140) and its reference implementation
at google-research/reasoning-bank were updated; the memory item schema and
loop are now pinned. This commit aligns the contrib module.

Key changes:

* Replace ReasoningStrategy with ReasoningMemoryItem matching the paper's
  schema: title / description / content (+ tags, id, createdAt). The prior
  problemPattern + ordered 'steps' shape was closer to Agent Workflow
  Memory, which the paper explicitly positions ReasoningBank against.
* Add sourceTraceSuccessful flag on memory items. Failure-derived items
  (preventative lessons / guardrails) are first-class, matching the paper's
  emphasis on distilling insights from both successful and failed runs.
* Add MemoryExtractor SPI (+ NoOpMemoryExtractor) to represent the
  'judge & extract' step of the closed loop. LLM-backed extractors stay
  out of this module to keep it dependency-free.
* extract() takes List<ReasoningTrace> so memory-aware test-time scaling
  (MaTTS) parallel/sequential distillation can be layered on later without
  an API break.
* Rename service methods storeStrategy/searchStrategies to
  storeMemoryItem/searchMemoryItems and the tool to LoadReasoningMemoryTool.
* Update InMemoryReasoningBankService scoring: title (x3) > description
  (x2) > tags (x1) > content (flat bonus). Take a snapshot of the
  synchronized list before iterating.
* Add README covering scope, the retrieve -> act -> judge -> extract ->
  consolidate loop, and what is intentionally out of scope (embedding
  retrieval, MaTTS driver, LLM extraction prompts).

All 23 unit tests pass.
@nebrass nebrass force-pushed the feature/reasoning-bank branch from 8357b94 to e80d9c0 Compare June 18, 2026 13:52
nebrass added 4 commits June 18, 2026 21:35
Phase 0 of the closed-loop work. Additive, backward-compatible on the
unreleased schema.

* ReasoningMemoryItem gains sourceTraceId, judgeVerdict, judgeConfidence
  (all nullable) and trust (default 1.0). Provenance makes a judge-minted
  item locatable/evictable and lets failure-derived items be trust-demoted
  at retrieval -- the audit primitives the closed loop needs to be safe.
* InMemoryReasoningBankService default retrieval cap 5 -> 3, matching the
  paper's k-ablation (more retrieved memories monotonically hurt).

Phase 1 of the closed loop. Both impls use core's BaseLlm only -- no new
module dependencies -- and are fully testable offline via a FakeLlm double.

* TrajectoryJudge SPI + Verdict (three-state SUCCESS/FAILURE/INDETERMINATE).
  LlmTrajectoryJudge ports the reference judge's asymmetric-strictness rubric
  (generalized off WebArena): mark failure when uncertain. A judge that ran
  but was unparseable -> FAILURE; a judge that errored/returned nothing ->
  INDETERMINATE (abstain, mint nothing) so a non-run never fabricates a
  preventative guardrail.
* LlmMemoryExtractor implements MemoryExtractor, routing on trajectory
  count/outcome to the SUCCESSFUL_SI / FAILED_SI / PARALLEL_SI prompts,
  emitting JSON parsed via outputSchema-style typing, capped in code
  (3 single / 5 parallel) and never throwing (malformed -> empty list).
  Minted items carry provenance (sourceTraceId, judgeVerdict, outcome).
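
The three-state verdict rule for the judge described in this commit can be sketched as follows. This is an illustration, not the module's parser: the class name, the nullable-string input, and the literal "success" reply format are assumptions made for the example.

```java
import java.util.Locale;

/** Hypothetical sketch of mapping raw judge output to a three-state verdict. */
final class VerdictSketch {

  enum Verdict {
    SUCCESS,
    FAILURE,
    INDETERMINATE
  }

  /**
   * Maps the judge's raw reply to a verdict.
   *
   * @param rawJudgeOutput the judge's reply, or null when the judge call errored (assumed
   *     convention for this sketch)
   */
  static Verdict toVerdict(String rawJudgeOutput) {
    // The judge did not run or returned nothing: abstain rather than guess.
    if (rawJudgeOutput == null || rawJudgeOutput.isBlank()) {
      return Verdict.INDETERMINATE;
    }
    String reply = rawJudgeOutput.trim().toLowerCase(Locale.ROOT);
    if (reply.equals("success")) {
      return Verdict.SUCCESS;
    }
    // Asymmetric strictness: an explicit failure and an unparseable reply both count as FAILURE.
    return Verdict.FAILURE;
  }

  /** Only a judge that actually ran may cause memory items to be minted. */
  static boolean shouldMint(Verdict verdict) {
    return verdict != Verdict.INDETERMINATE;
  }
}
```

Separating "ran but unclear" from "did not run" is what keeps an outage of the judge from being recorded as a failed trajectory and turned into a guardrail.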

13 new tests (judge 6, extractor 7); 39 module tests pass.

Phase 2. One plugin, no ADK core edits (service captured by constructor).

* Retrieve (read-only, always on): beforeModelCallback searches the bank for
  the latest user turn and injects matches as a DE-PRIVILEGED, fenced,
  escaped 'untrusted DATA' user turn -- never a system instruction. Item text
  that tries to close the fence is neutralized, so stored memory cannot inject
  instructions into the agent (a poisoned item is re-injected forever).
* Judge -> extract -> consolidate (write, OPT-IN, triple-gated on
  autoConsolidate + judge + extractor): afterRunCallback judges the
  trajectory, and on a SUCCESS/FAILURE verdict distills and stores items;
  an INDETERMINATE verdict (judge errored) abstains and mints nothing.
  Runs off the critical path (Schedulers.io, onErrorComplete) so it never
  blocks or fails the run.
* Updates README to document the now-complete loop and the safety model.

9 new tests; 48 module tests pass.

…Phase 5)

Driven by an adversarial red-team of the memory-injection path.

* Injection containment is now structural, not marker whack-a-mole: sanitize
  strips format/zero-width/bidi (Cf) controls, collapses every line/paragraph
  separator (incl. U+2028/U+2029/U+0085) to a space, strips C0/C1 controls,
  neutralizes the exact fence markers, and length-caps fields; buildMemoryTurn
  caps item count. Attacker-controlled title/content can no longer forge a
  bullet, preamble, role marker, or confusable/fullwidth fence -- all collapse
  to inert inline data in the de-privileged user turn. 9-case corpus (C1-C12).
* Per-run mint rate-limit (maxItemsPerRun, new constructor overload; existing
  signatures preserved) bounds how much one verdict can write.
* Failure trust-demotion: a failure-derived guardrail surfaces only when no
  success item matched the query; trust() is now a live within-tier tiebreaker.
* ConsolidationPolicy SPI with append-only identity() default (faithful) and a
  boundedByCreatedAt(n) example; InMemoryReasoningBankService store path is now
  read-modify-write under its existing monitor, observationally unchanged by
  default.
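
A minimal sketch of the sanitization order described in the first bullet of this commit, using only `java.util.regex` features. The fence marker strings, the `[fence]` replacement, and the class name are hypothetical; the PR text does not show the real markers. This sketch also does not attempt the confusable/fullwidth handling mentioned above.

```java
/** Hypothetical sketch of structural sanitization; markers and cap are assumptions. */
final class MemorySanitizerSketch {

  // Assumed fence markers for illustration only.
  static final String FENCE_OPEN = "<<<MEMORY";
  static final String FENCE_CLOSE = "MEMORY>>>";

  /** Reduces an attacker-controllable field to inert single-line text of bounded length. */
  static String sanitize(String field, int maxCodePoints) {
    String s = field;
    // 1. Strip format controls (category Cf: zero-width and bidi characters).
    s = s.replaceAll("\\p{Cf}", "");
    // 2. Collapse every line break (\R covers U+0085, U+2028, U+2029) and tab to a space.
    s = s.replaceAll("\\R|\\t", " ");
    // 3. Strip the remaining C0/C1 control characters (category Cc).
    s = s.replaceAll("\\p{Cc}", "");
    // 4. Neutralize the exact fence markers so stored text cannot close the fence.
    s = s.replace(FENCE_OPEN, "[fence]").replace(FENCE_CLOSE, "[fence]");
    // 5. Length-cap by code points so a surrogate pair is never split.
    if (s.codePointCount(0, s.length()) > maxCodePoints) {
      s = s.substring(0, s.offsetByCodePoints(0, maxCodePoints));
    }
    return s;
  }
}
```

The order matters: invisible characters are removed before the marker check, so a marker split by a zero-width character is reassembled and then neutralized rather than slipping through.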

20 new tests; 68 module tests pass.
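
One way to read the failure trust-demotion rule from this commit is sketched below. The `Scored` record and class name are hypothetical, and treating trust as a tiebreaker between equal keyword scores is an assumption about what "within-tier" means.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch of failure trust-demotion at retrieval time. */
final class TrustDemotionSketch {

  /** A matched item with its keyword score and trust (assumed shape). */
  record Scored(String id, boolean sourceTraceSuccessful, int score, double trust) {}

  /**
   * Returns up to k matches. Failure-derived items surface only when no success-derived item
   * matched; within the surviving group, higher score wins and trust breaks ties.
   */
  static List<Scored> rank(List<Scored> matches, int k) {
    boolean anySuccess = matches.stream().anyMatch(Scored::sourceTraceSuccessful);
    return matches.stream()
        .filter(m -> !anySuccess || m.sourceTraceSuccessful())
        .sorted(
            Comparator.comparingInt(Scored::score)
                .reversed()
                .thenComparing(Comparator.comparingDouble(Scored::trust).reversed()))
        .limit(k)
        .collect(Collectors.toList());
  }
}
```

Under this reading a high-scoring guardrail is still hidden whenever any success-derived item matches, which trades some recall for a lower chance of replaying a poisoned or mistaken failure lesson.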
@nebrass

nebrass commented Jun 18, 2026

Author

Thanks @glaforge — good catch, and yes. Looking into it turned into a proper alignment plus building out the rest of the loop.

On the paper: arXiv:2509.25140 now has a camera-ready v2 (16 Mar 2026) and was accepted to ICLR 2026, and the blog accompanies the public release of the official reference implementation (google-research/reasoning-bank) — which didn't exist when I first opened this PR. One correction to my own PR while I was at it: the title was always "ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory" (v1 and v2 match on that) — the "Learning from the Traces of Thought" title in my original description was simply wrong, and I've fixed the description.

What it changed for the implementation — the official code + the blog's crystallized Title / Description / Content schema let me align the port and then realize the full closed loop:

  • Schema — replaced ReasoningStrategy (name/problemPattern/ordered steps, which was actually closer to Agent Workflow Memory, the baseline the paper positions against) with ReasoningMemoryItem (title/description/content) + provenance (sourceTraceId, judgeVerdict, …).
  • Learn from failure — first-class sourceTraceSuccessful; failure-derived items become preventative guardrails.
  • The loop: TrajectoryJudge (LLM-as-a-judge with the reference's asymmetric "mark failure when uncertain" rubric, plus a third INDETERMINATE state so a crashed judge mints nothing), LlmMemoryExtractor (the SUCCESSFUL_SI/FAILED_SI/PARALLEL_SI distillation prompts, capped + structured output), and a ReasoningBankPlugin that wires retrieve-before / opt-in consolidate-after through the plugin callbacks, with no ADK core changes.
  • Safety — retrieved memory is injected as a de-privileged, fenced, structurally-contained untrusted-data turn (never a system instruction), with a per-run mint cap and failure trust-demotion; consolidation is opt-in and append-only by default behind a ConsolidationPolicy seam.
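
For the ConsolidationPolicy seam mentioned in the last bullet, a minimal sketch of an append-only default and a bounded alternative might look like this. The interface shape, the `Bank` class, and the `Item` record are hypothetical; only the names identity() and boundedByCreatedAt(n) come from the commit message.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch of a consolidation seam around an append-only store. */
final class ConsolidationSketch {

  /** Minimal stand-in for a stored memory item. */
  record Item(String id, Instant createdAt) {}

  @FunctionalInterface
  interface ConsolidationPolicy {
    /** Receives the bank contents after the append and returns what should be kept. */
    List<Item> consolidate(List<Item> afterAppend);

    /** Append-only default: keeps everything. */
    static ConsolidationPolicy identity() {
      return items -> items;
    }

    /** Keeps only the n most recently created items, oldest first. */
    static ConsolidationPolicy boundedByCreatedAt(int n) {
      return items ->
          items.stream()
              .sorted(Comparator.comparing(Item::createdAt).reversed())
              .limit(n)
              .sorted(Comparator.comparing(Item::createdAt))
              .collect(Collectors.toList());
    }
  }

  /** Store whose write path is a read-modify-write under a single monitor. */
  static final class Bank {
    private final ConsolidationPolicy policy;
    private List<Item> items = List.of();

    Bank(ConsolidationPolicy policy) {
      this.policy = policy;
    }

    synchronized void store(Item item) {
      List<Item> next = new ArrayList<>(items);
      next.add(item);
      items = List.copyOf(policy.consolidate(next));
    }

    synchronized List<Item> snapshot() {
      return items;
    }
  }
}
```

With identity() the store behaves exactly like a plain append, so existing callers see no difference unless they opt into a bounded policy.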

Kept dependency-free (the LLM impls use core's BaseLlm; embedding-based retrieval and MaTTS fan-out are noted as follow-ups). All behind tests.

This did grow the PR a fair bit — happy to split it (e.g. the schema alignment first, then the judge/extractor/plugin) if that's easier to review.

nebrass added 2 commits June 18, 2026 21:36
Merging main bumped the root POM to 1.4.1-SNAPSHOT, but this module's parent
version was still 0.9.1-SNAPSHOT, breaking the reactor build. Align it with the
root and the other contrib modules.