Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
97 changes: 97 additions & 0 deletions contrib/reasoning-bank/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,97 @@
# reasoning-bank (contrib)

A Java implementation of **ReasoningBank**, a memory mechanism that lets agents learn from both
successful *and* failed trajectories and apply those lessons to new, similar tasks.

> Ouyang et al. "ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory", ICLR 2026.
> Paper: <https://arxiv.org/abs/2509.25140> · Blog: <https://research.google/blog/reasoningbank-enabling-agents-to-learn-from-experience/>
> Reference implementation: <https://github.com/google-research/reasoning-bank>

This module is **dependency-free** beyond ADK core: the LLM-backed judge and extractor use ADK's
`BaseLlm`, so they add no new model-client dependencies. Embedding-based retrieval (the one piece
that needs the Vertex SDK) is intentionally left to a future sibling module.

## What it provides

| Type | Purpose |
|---|---|
| `ReasoningMemoryItem` | A distilled memory item with the paper's `title` / `description` / `content` schema, plus `sourceTraceSuccessful` and provenance (`sourceTraceId`, `judgeVerdict`, `judgeConfidence`, `trust`) so a judge-minted item is auditable and evictable. |
| `ReasoningTrace` | A raw task trajectory (task, output, intermediate reasoning, success flag) kept for distillation. |
| `BaseReasoningBankService` / `InMemoryReasoningBankService` | Storage + retrieval (`storeMemoryItem`, `storeTrace`, `searchMemoryItems`). The in-memory impl uses bag-of-words keyword scoring — **not production-grade**; the reference uses embedding retrieval. |
| `TrajectoryJudge` (+ `LlmTrajectoryJudge`) | LLM-as-a-judge for the **judge** step. Returns a three-state `Verdict` (SUCCESS / FAILURE / INDETERMINATE). Ports the reference's asymmetric-strictness rubric: *mark failure when uncertain — a false success poisons future behavior.* |
| `MemoryExtractor` (+ `LlmMemoryExtractor`, `NoOpMemoryExtractor`) | The **extract** step. Routes by trajectory count/outcome to the `SUCCESSFUL_SI` / `FAILED_SI` / `PARALLEL_SI` prompts (generalized off WebArena), capped in code (3 single / 5 parallel) and never-throwing. |
| `ReasoningBankPlugin` | Wires the whole loop into the agent lifecycle: auto-retrieve (read-only) + opt-in consolidation. |
| `LoadReasoningMemoryTool` | Optional `FunctionTool` exposing retrieval to agents as `loadReasoningMemory(query)` for explicit/manual use. |

## The closed loop

`ReasoningBankPlugin` realizes the paper's continuous loop:

```
retrieve ──► act (agent/env) ──► judge (LLM) ──► extract (LLM) ──► consolidate
▲ │
└───────────────────────────────────────────────────────────────────────────┘
```

- **retrieve** — `beforeModelCallback` searches the bank for the latest user turn and injects the
matches (read-only, always on).
- **act** — the agent runtime.
- **judge → extract → consolidate** — `afterRunCallback` self-assesses the trajectory
(`TrajectoryJudge`), distills items (`MemoryExtractor`), and appends them (`storeMemoryItem`).
This is **opt-in and triple-gated** (`autoConsolidate` + a judge + an extractor), because enabling
writes turns a read-only system into a self-modifying one under an imperfect judge.

### Safety

Distilled memory is a stored, self-feeding channel — a poisoned item is re-injected on every future
retrieval — so the module defends the *integrity* of the write/inject path, not just accuracy:

- **De-privileged, fenced injection.** Retrieved memory is prepended as an *untrusted user content
turn* inside an escaped fence, never a system instruction (a deliberate divergence from the
reference, which injects into the system prompt).
- **Structural containment.** Each item field is sanitized so it cannot contribute a line boundary
or an invisible control character: format/zero-width/bidi controls are stripped, all line and
paragraph separators collapse to spaces, and fields are length-capped. Forged bullets, fake
preambles, role markers, and confusable/fullwidth fences all collapse to inert inline data.
- **Abstain on non-run.** A judge that errors yields `INDETERMINATE` and mints nothing, so a
non-run never fabricates a guardrail.
- **Bounded blast radius.** A per-run mint cap limits how much one (possibly wrong) verdict can
write; failure-derived guardrails are trust-demoted at retrieval (they surface only when no
success item matches the query).

These controls guarantee retrieved memory stays *untrusted data* and cannot escalate into a
system/instruction position. They do **not** stop a model from reading persuasive text inside an
item — that is the LLM's own instruction-hierarchy responsibility; the module's job is to never
present memory as authoritative.

## Not (yet) implemented

- **Embedding-based retrieval.** The in-memory service uses keyword matching; see the `screening`
function in the reference repo for the Gemini / Qwen3 embedding recipe. The default retrieval cap
is 3 items (the paper's k-ablation: more retrieved monotonically hurts).
- **MaTTS rollout fan-out and sequential refinement.** The parallel self-contrast *distillation*
seam ships (`LlmMemoryExtractor` switches to `PARALLEL_SI` when given >1 trajectory), but running
k same-task trajectories and the sequential prompts are future work.
- **Eviction policy by default.** Consolidation is append-only by default (faithful baseline). The
`ConsolidationPolicy` SPI ships with an `identity()` (append-only) default and a
`boundedByCreatedAt(n)` example; dedup/decay policies can drop in without core changes.

## Example

```java
BaseReasoningBankService bank = new InMemoryReasoningBankService();

// Retrieve-only: the agent draws on past memory, the bank is never written.
ReasoningBankPlugin retrieveOnly = new ReasoningBankPlugin(bank, "my-app");

// Or close the loop (opt-in): judge + distill + consolidate after each run.
ReasoningBankPlugin selfEvolving =
new ReasoningBankPlugin(
bank,
"my-app",
new LlmTrajectoryJudge(llm),
new LlmMemoryExtractor(llm),
/* autoConsolidate= */ true);

// Register the plugin with your Runner / App.
```
92 changes: 92 additions & 0 deletions contrib/reasoning-bank/pom.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,92 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
Copyright 2025 Google LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

<parent>
<groupId>com.google.adk</groupId>
<artifactId>google-adk-parent</artifactId>
<version>1.4.1-SNAPSHOT</version><!-- {x-version-update:google-adk:current} -->
<relativePath>../../pom.xml</relativePath>
</parent>

<artifactId>google-adk-reasoning-bank</artifactId>
<name>Agent Development Kit - Reasoning Bank</name>
<description>Reasoning Bank integration with Agent Development Kit for reusable reasoning strategies</description>

<dependencies>

<dependency>
<groupId>com.google.adk</groupId>
<artifactId>google-adk</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>com.google.auto.value</groupId>
<artifactId>auto-value-annotations</artifactId>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
</dependency>
<dependency>
<groupId>io.reactivex.rxjava3</groupId>
<artifactId>rxjava</artifactId>
</dependency>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>33.0.0-jre</version>
</dependency>
<dependency>
<groupId>com.google.truth</groupId>
<artifactId>truth</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter-api</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter-engine</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.junit.vintage</groupId>
<artifactId>junit-vintage-engine</artifactId>
<scope>test</scope>
</dependency>

</dependencies>

<build>
<plugins>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
</plugin>
<plugin>
<groupId>org.jacoco</groupId>
<artifactId>jacoco-maven-plugin</artifactId>
</plugin>
</plugins>
</build>
</project>
Loading