feat: Add ReasoningBank for reusable reasoning strategies#702
Conversation
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
Summary of ChangesHello @nebrass, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces the foundational ReasoningBank feature, designed to enhance agent capabilities by allowing them to learn from and reuse successful problem-solving approaches. By providing mechanisms to store and retrieve distilled reasoning strategies and raw execution traces, agents can apply proven methods to new, similar tasks, thereby improving their efficiency and effectiveness. The implementation includes core data models, a service interface with an in-memory prototype, and seamless integration into the existing tool and invocation contexts. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
e874b9e to
37a1f5c
Compare
There was a problem hiding this comment.
Code Review
This pull request introduces a ReasoningBank feature, which is a significant and well-implemented addition. The new components, including data models, services, and tool integrations, are clearly defined and follow the existing architectural patterns of the project. The code is well-documented and accompanied by a comprehensive set of unit tests, ensuring the new functionality is robust. I have a couple of minor suggestions for code refinement in the InMemoryReasoningBankService to improve conciseness and use more idiomatic Java constructs, but overall, this is excellent work.
37a1f5c to
c29b9c6
Compare
|
Do you think you could move the contribution in the |
|
Thanks @glaforge, that makes sense. I've moved the entire ReasoningBank contribution to
|
067c700 to
e49198b
Compare
Add a contrib/reasoning-bank module implementing the ReasoningBank pattern (arXiv:2509.25140) for storing and retrieving proven reasoning strategies. Includes data models, in-memory service, and a FunctionTool for agent integration.
087e713 to
373fe3d
Compare
Resolves a build failure caused by an unresolvable parent POM version of 0.5.1-SNAPSHOT in the contrib/reasoning-bank module.
|
Looks like the paper has been updated? |
The ReasoningBank paper (arXiv:2509.25140) and its reference implementation at google-research/reasoning-bank were updated; the memory item schema and loop are now pinned. This commit aligns the contrib module. Key changes: * Replace ReasoningStrategy with ReasoningMemoryItem matching the paper's schema: title / description / content (+ tags, id, createdAt). The prior problemPattern + ordered 'steps' shape was closer to Agent Workflow Memory, which the paper explicitly positions ReasoningBank against. * Add sourceTraceSuccessful flag on memory items. Failure-derived items (preventative lessons / guardrails) are first-class, matching the paper's emphasis on distilling insights from both successful and failed runs. * Add MemoryExtractor SPI (+ NoOpMemoryExtractor) to represent the 'judge & extract' step of the closed loop. LLM-backed extractors stay out of this module to keep it dependency-free. * extract() takes List<ReasoningTrace> so memory-aware test-time scaling (MaTTS) parallel/sequential distillation can be layered on later without an API break. * Rename service methods storeStrategy/searchStrategies to storeMemoryItem/searchMemoryItems and the tool to LoadReasoningMemoryTool. * Update InMemoryReasoningBankService scoring: title (x3) > description (x2) > tags (x1) > content (flat bonus). Take a snapshot of the synchronized list before iterating. * Add README covering scope, the retrieve -> act -> judge -> extract -> consolidate loop, and what is intentionally out of scope (embedding retrieval, MaTTS driver, LLM extraction prompts). All 23 unit tests pass.
8357b94 to
e80d9c0
Compare
Phase 0 of the closed-loop work. Additive, backward-compatible on the unreleased schema. * ReasoningMemoryItem gains sourceTraceId, judgeVerdict, judgeConfidence (all nullable) and trust (default 1.0). Provenance makes a judge-minted item locatable/evictable and lets failure-derived items be trust-demoted at retrieval -- the audit primitives the closed loop needs to be safe. * InMemoryReasoningBankService default retrieval cap 5 -> 3, matching the paper's k-ablation (more retrieved memories monotonically hurt).
Phase 1 of the closed loop. Both impls use core's BaseLlm only -- no new module dependencies -- and are fully testable offline via a FakeLlm double. * TrajectoryJudge SPI + Verdict (three-state SUCCESS/FAILURE/INDETERMINATE). LlmTrajectoryJudge ports the reference judge's asymmetric-strictness rubric (generalized off WebArena): mark failure when uncertain. A judge that ran but was unparseable -> FAILURE; a judge that errored/returned nothing -> INDETERMINATE (abstain, mint nothing) so a non-run never fabricates a preventative guardrail. * LlmMemoryExtractor implements MemoryExtractor, routing on trajectory count/outcome to the SUCCESSFUL_SI / FAILED_SI / PARALLEL_SI prompts, emitting JSON parsed via outputSchema-style typing, capped in code (3 single / 5 parallel) and never throwing (malformed -> empty list). Minted items carry provenance (sourceTraceId, judgeVerdict, outcome). 13 new tests (judge 6, extractor 7); 39 module tests pass.
Phase 2. One plugin, no ADK core edits (service captured by constructor). * Retrieve (read-only, always on): beforeModelCallback searches the bank for the latest user turn and injects matches as a DE-PRIVILEGED, fenced, escaped 'untrusted DATA' user turn -- never a system instruction. Item text that tries to close the fence is neutralized, so stored memory cannot inject instructions into the agent (a poisoned item is re-injected forever). * Judge -> extract -> consolidate (write, OPT-IN, triple-gated on autoConsolidate + judge + extractor): afterRunCallback judges the trajectory, and on a SUCCESS/FAILURE verdict distills and stores items; an INDETERMINATE verdict (judge errored) abstains and mints nothing. Runs off the critical path (Schedulers.io, onErrorComplete) so it never blocks or fails the run. * Updates README to document the now-complete loop and the safety model. 9 new tests; 48 module tests pass.
…Phase 5) Driven by an adversarial red-team of the memory-injection path. * Injection containment is now structural, not marker whack-a-mole: sanitize strips format/zero-width/bidi (Cf) controls, collapses every line/paragraph separator (incl. U+2028/U+2029/U+0085) to a space, strips C0/C1 controls, neutralizes the exact fence markers, and length-caps fields; buildMemoryTurn caps item count. Attacker-controlled title/content can no longer forge a bullet, preamble, role marker, or confusable/fullwidth fence -- all collapse to inert inline data in the de-privileged user turn. 9-case corpus (C1-C12). * Per-run mint rate-limit (maxItemsPerRun, new constructor overload; existing signatures preserved) bounds how much one verdict can write. * Failure trust-demotion: a failure-derived guardrail surfaces only when no success item matched the query; trust() is now a live within-tier tiebreaker. * ConsolidationPolicy SPI with append-only identity() default (faithful) and a boundedByCreatedAt(n) example; InMemoryReasoningBankService store path is now read-modify-write under its existing monitor, observationally unchanged by default. 20 new tests; 68 module tests pass.
|
Thanks @glaforge — good catch, and yes. Looking into it turned into a proper alignment plus building out the rest of the loop. On the paper: arXiv:2509.25140 now has a camera-ready v2 (16 Mar 2026) and was accepted to ICLR 2026, and the blog accompanies the public release of the official reference implementation (google-research/reasoning-bank) — which didn't exist when I first opened this PR. One correction to my own PR while I was at it: the title was always "ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory" (v1 and v2 match on that) — the "Learning from the Traces of Thought" title in my original description was simply wrong, and I've fixed the description. What it changed for the implementation — the official code + the blog's crystallized Title / Description / Content schema let me align the port and then realize the full closed loop:
Kept dependency-free (the LLM impls use core's This did grow the PR a fair bit — happy to split it (e.g. the schema alignment first, then the judge/extractor/plugin) if that's easier to review. |
Merging main bumped the root POM to 1.4.1-SNAPSHOT, but this module's parent version was still 0.9.1-SNAPSHOT, breaking the reactor build. Align it with the root and the other contrib modules.
Summary
This PR implements ReasoningBank in ADK Java — a memory framework that lets agents distill
reusable reasoning strategies from their past task executions (both successful and failed)
and retrieve them to guide new, similar tasks.
The design mirrors ADK's existing Memory feature (
BaseMemoryService,InMemoryMemoryService,LoadMemoryTool).What is ReasoningBank?
Unlike memory mechanisms that store raw trajectories (Synapse) or only successful workflows (Agent
Workflow Memory), ReasoningBank distills compact, transferable memory items from both successes
and failures. Failure-derived items become preventative "guardrails" — e.g. "verify the page
identifier before loading more results to avoid infinite-scroll traps."
Components (
com.google.adk.reasoning)Data models
ReasoningMemoryItem— immutable memory item with the paper's canonicaltitle/description/contentschema, plussourceTraceSuccessfulso failure-derived preventative lessons are first-class (alsoid,tags,createdAt).ReasoningTrace— a raw task trajectory (task, output, intermediate reasoning steps,successfulflag) retained for later distillation.SearchReasoningResponse— search result wrapper.Service layer
BaseReasoningBankService— storage/retrieval contract:storeMemoryItem,storeTrace,searchMemoryItems.InMemoryReasoningBankService— prototype implementation using bag-of-words keyword scoring (title>description>tags>content). Not production-grade — the reference implementation uses embedding-based retrieval.Extraction SPI
MemoryExtractor(+NoOpMemoryExtractor) — extension point for the "judge & extract" step of the loop;extract(query, List<ReasoningTrace>)accommodates parallel/sequential MaTTS distillation later without an API break. LLM-backed extractors are intentionally left to downstream modules to keep this contrib module dependency-free.Tool integration (
com.google.adk.tools)LoadReasoningMemoryTool— aFunctionToolexposing retrieval to agents asloadReasoningMemory(query).LoadReasoningMemoryResponse— tool response record.The closed loop
searchMemoryItems→ retrieve · the agent runtime → act ·MemoryExtractor→ judge & extract ·storeMemoryItem→ consolidate (append).Integration
The module is self-contained and does not modify
InvocationContextorToolContext.Agents use it by constructing
LoadReasoningMemoryTool(reasoningBankService, appName)and adding itto their tool list (constructor injection). No core ADK changes are required.
Out of scope (documented in the module README)
SUCCESSFUL_SI,FAILED_SI,PARALLEL_SI, …).Usage
Test Plan
ReasoningMemoryItemTest(4)ReasoningTraceTest(5)InMemoryReasoningBankServiceTest(12) — includes retrieval of failure-derived itemsNoOpMemoryExtractorTest(2)Related
BaseMemoryService,InMemoryMemoryService,LoadMemoryTool).