feat(campaign): add typed policy edit loop #286
tangletools
left a comment
🟢 Value Audit — sound
| Verdict | sound |
| Concerns | 0 (none) |
| Heuristic | 0.0s |
| Duplication | 0.0s |
| Interrogation | 108.3s (2 bridge agents) |
| Total | 108.3s |
💰 Value — sound
Adds a schema-versioned, hash-identified PolicyEdit contract + admission gate + SurfaceProposer so analyst findings become typed, measurable candidate surfaces; plugs cleanly into the existing firewall/grammar/profile-cell primitives with no real duplication.
- What it does: Introduces a typed `PolicyEdit` envelope (src/analyst/policy-edit.ts:86-100) with deterministic sha256 IDs (`computePolicyEditId`, line 155), full validate-round-trip (`validatePolicyEdit`, line 162), and a fail-loud admission gate (`admitPolicyEdit`, line 271) that scores edits on evidence/confidence/expected-gain/target-specificity and rejects judge-derived findings, evidence-less edits, sub-threshold …
- Goals it achieves: (1) Make analyst-proposed edits auditable and hash-stable: every edit carries schema_version + editId + source findings + expected-gain contract, so the improvement loop can diff and attribute candidates. (2) Fail-loud pre-admission gate so unmeasurable edits (no expected gain), evidence-less guesses, judge-leak findings, and high-risk changes never become candidates; the loop measures only edit …
- Assessment: Built firmly in the codebase's grain. It mirrors `AgentProfileCell`'s shape exactly (schemaVersion + sha256 id + validate round-trip + ValidationError subclass). It uses the existing steer firewall (`assertNoJudgeVerdict`, src/analyst/steer-firewall.ts:69) at both the fromFinding and fromFindings entry points, the existing `parseFindingSubject` grammar for routing, the existing `AgentProfileCell` …
- Better / existing approach: Looked for overlap with (a) SkillPatch (src/campaign/skill-patch.ts:55): same anchor-replace idea but on skill-doc surfaces with op-bundles, not prompts/runtime-config; the ~15-line text-replace overlap is sub-scale and the surfaces are genuinely different siblings. (b) analysisEditProposer (src/campaign/proposers/analysis-edit.ts:43): applies findings via ONE LLM call; PolicyEdit is its determi…
- Model: opencode/zai-coding-plan/glm-5.2
- Bridge attempts: 2
- Bridge warning: opencode/kimi-for-coding/k2p7: bridge stream ended without value-audit content
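To make the "deterministic sha256 ID plus validate round-trip" idea above concrete, here is a minimal sketch. It is not the PR's code: `SketchEdit`, `canonicalJson`, `computeEditId`, `makeEdit`, and `validateEdit` are hypothetical names, the envelope is reduced to a few fields, and `canonicalJson` is a stand-in for whatever canonicalization the repository actually uses.

```typescript
import { createHash } from "node:crypto";

// Hypothetical, simplified envelope; the real PolicyEdit carries more fields.
interface SketchEdit {
  schemaVersion: 1;
  editId: string;
  axis: string;
  change: { kind: "append" | "prepend"; value: string };
  expectedGain: number;
}

type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Serialize with sorted object keys so key order never changes the hash.
function canonicalJson(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map((v) => canonicalJson(v)).join(",")}]`;
  const keys = Object.keys(value).sort();
  const parts = keys.map((k) => `${JSON.stringify(k)}:${canonicalJson(value[k] as Json)}`);
  return `{${parts.join(",")}}`;
}

// The ID is the sha256 of everything except the ID itself.
function computeEditId(body: Omit<SketchEdit, "editId">): string {
  return createHash("sha256").update(canonicalJson(body as unknown as Json)).digest("hex");
}

function makeEdit(body: Omit<SketchEdit, "editId">): SketchEdit {
  return { ...body, editId: computeEditId(body) };
}

// Validation recomputes the ID and fails loudly on any mismatch.
function validateEdit(edit: SketchEdit): void {
  const { editId, ...body } = edit;
  if (computeEditId(body) !== editId) {
    throw new Error(`editId mismatch for ${editId}`);
  }
}
```

Under these assumptions, any change to an edit's content after creation is caught by `validateEdit`, and two edits with the same content get the same ID regardless of property order.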
🎯 Usefulness — sound
A typed, content-addressed, admission-gated PolicyEdit contract plus a policyEditProposer that conforms to the existing SurfaceProposer pattern and reuses the canonical FindingSubject grammar, AgentProfileCell, and steer-firewall primitives — net-new capability, no dead surface, no competing pattern
- Assessment: Net-new, coherent capability that fills a real gap: the codebase had LLM-rewrite proposers and best-effort text curators but no typed, validated, content-addressed, admission-gated edit envelope an analyst could emit and the held-out loop could measure. Will be used — it is a new option in the existing proposer menu, drivable by the same selfImprove/runOptimization entry points, with a clear first
- Integration: Fully wired and reachable. Exported from src/campaign/index.ts:160-163 and src/index.ts:93-122. Consumed through the standard runOptimization loop (src/campaign/presets/run-optimization.ts:159-170) which calls proposer.propose({ findings }) like it does for every other proposer; no new loop wiring required. The proposer reads ctx.findings and accepts both pre-typed PolicyEdit objects and legacy An
- Fit with existing patterns: Built in the grain of the codebase. Routes via the existing typed FindingSubject grammar (parseFindingSubject) rather than inventing a new locus scheme — exactly the fix finding-subject.ts documents as necessary. Reuses AgentProfileCell as the canonical target identity, assertNoJudgeVerdict (steer-firewall.ts:69) plus a defense-in-depth derivedFromJudge admission check (policy-edit.ts:281-283), an
- Real-world viability: Holds up past the happy path. Validation recomputes the content-addressed ID and rejects mismatches (policy-edit.ts:188-191), enforces enum/range/JSON-compat invariants, idempotent text append/prepend via includes guard (policy-edit.ts:472,476), sameSurface no-op skip in the proposer (policy-edit.ts:53). Steer firewall at two layers. No-silent-fallback doctrine honored: findings lacking typed expe
- Model: opencode/zai-coding-plan/glm-5.2
- Bridge attempts: 1
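The idempotent append/prepend guard and the no-op skip described above can be sketched as follows. This is an illustration under assumptions: `TextChange`, `applyTextChange`, and `proposeSurface` are hypothetical names, and joining with a newline is a choice made for the example, not something the review states.

```typescript
// Hypothetical change shape covering only the append/prepend cases.
type TextChange =
  | { op: "append"; value: string }
  | { op: "prepend"; value: string };

// Applying the same change twice yields the same surface as applying it once,
// because the text is only added when the surface does not already contain it.
function applyTextChange(surface: string, change: TextChange): string {
  if (surface.includes(change.value)) return surface;
  switch (change.op) {
    case "append":
      return `${surface}\n${change.value}`;
    case "prepend":
      return `${change.value}\n${surface}`;
  }
}

// Returns undefined when the edit would not change the surface, so a caller
// can skip emitting a candidate that is identical to the current surface.
function proposeSurface(current: string, change: TextChange): string | undefined {
  const next = applyTextChange(current, change);
  return next === current ? undefined : next;
}
```

A substring guard like this is deliberately coarse: it also treats the edit as already applied when the text appears anywhere in the surface, which is the trade-off a dedup-via-`includes` check accepts.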
No concerns — sound change, no better or existing approach found. ✅
What this audit checks
It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.
| Pass | What it asks |
|---|---|
| Heuristic | Vague title? Whitespace-only or cruft-bearing diff? (content signals only) |
| Duplication | Do added function/class names already exist elsewhere in the repo? |
| Value Audit | What does it do? What goal does it achieve? Is it good? Better architecture or already-exists? |
| Usefulness Audit | Does it integrate and fit? Will it hold up in real use and actually get used? |
Findings are concerns, not blocks — the human reviewer decides what to do with them.
✅ No Blockers —
tangletools
left a comment
✅ Approved — 14 non-blocking findings — a99e2245
Full multi-shot audit completed 6/6 planned shots over 8 changed files. Global verifier still owns final merge decision.
Full immutable report for this review: trace
Summary comment for this run: full summary
tangletools · 2026-06-28T19:47:35Z · immutable trace
tangletools
left a comment
✅ Refreshed approval after new commits — abdd9f59
A previous trusted approval on this PR was invalidated by new commits.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.
tangletools · auto-approval · reason: stale_approval_refresh · 2026-06-28T19:53:13Z
tangletools
left a comment
🟢 Value Audit — sound
| Verdict | sound |
| Concerns | 1 (1 weak-concern) |
| Heuristic | 0.0s |
| Duplication | 0.0s |
| Interrogation | 115.5s (2 bridge agents) |
| Total | 115.5s |
💰 Value — sound
Adds a typed, hash-addressed PolicyEdit contract + deterministic SurfaceProposer that turns analyst findings into measured candidates without an LLM apply step — reuses every existing primitive (AgentProfileCell, canonicalize, parseFindingSubject, assertNoJudgeVerdict) and conforms exactly to the Su
- What it does: Introduces a `PolicyEdit` envelope (src/analyst/policy-edit.ts, 798 lines): a schema-versioned, sha256-content-addressed contract for analyst-proposed edits. Each edit carries an axis (representation/tool_contract/budget/...), a target (surface + path + optional canonical `AgentProfileCell`), a typed change (text append/prepend/replace or JSON set/merge/remove at a dotted path), an expected gain, …
- Goals it achieves: Give campaigns a DETERMINISTIC, AUDITABLE path from analyst findings to measured candidate surfaces: the counterpart to the LLM-brokered apply step in `analysisEditProposer`/`traceAnalystProposer`/`haloProposer`. Today those proposers hand findings to an LLM and accept whatever text comes back; this PR lets an analyst emit a typed edit (or a finding carrying typed `metadata.policyEdit`) and have …
- Assessment: Fits the codebase's grain precisely. Every needed primitive already exists and is reused rather than reinvented: `AgentProfileCell` + `validateAgentProfileCell` for deployment identity (src/agent-profile-cell.ts:47-56), `canonicalize` from pre-registration for the content hash (src/pre-registration.ts:83), `parseFindingSubject` + the `FindingSubject` grammar for routing (src/analyst/finding-subjec…
- Better / existing approach: none; this is the right approach. I checked for overlap with two nearby primitives: (1) `src/campaign/skill-patch.ts` (`SkillPatchOp` add/delete/replace): it is line-anchored, single-surface, multi-line-splice with partial-apply semantics for `skillOptProposer`; `PolicyEditChange` is substring-anchored with dedup-via-includes for many surfaces plus a JSON-path mode skill-patch has no equivalent …
- Model: opencode/zai-coding-plan/glm-5.2
- Bridge attempts: 2
- Bridge warning: opencode/kimi-for-coding/k2p7: bridge stream ended without value-audit content
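The "JSON set/merge/remove at a dotted path" change type mentioned above can be illustrated with a small sketch. The names `JsonChange` and `applyJsonChange` are hypothetical. The semantics shown here are assumptions for the example rather than the PR's documented behavior: missing parents are created for set/merge, removing a missing path is a no-op, and descending into a non-object throws.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

type JsonChange =
  | { op: "set"; path: string; value: Json }
  | { op: "merge"; path: string; value: JsonObject }
  | { op: "remove"; path: string };

function isObject(v: Json | undefined): v is JsonObject {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Returns a modified deep copy; the input config is never mutated.
function applyJsonChange(root: JsonObject, change: JsonChange): JsonObject {
  const out = JSON.parse(JSON.stringify(root)) as JsonObject;
  const keys = change.path.split(".");
  const last = keys.pop();
  if (!last || keys.some((k) => k === "")) {
    throw new Error(`invalid path: "${change.path}"`);
  }
  let node = out;
  for (const key of keys) {
    const next = node[key];
    if (isObject(next)) {
      node = next;
      continue;
    }
    if (next !== undefined) throw new Error(`"${key}" is not an object`);
    if (change.op === "remove") return out; // nothing to remove
    const created: JsonObject = {};
    node[key] = created;
    node = created;
  }
  if (change.op === "set") {
    node[last] = change.value;
  } else if (change.op === "remove") {
    delete node[last];
  } else {
    const current = node[last];
    if (current !== undefined && !isObject(current)) {
      throw new Error(`"${last}" is not an object`);
    }
    // Shallow merge: keys in change.value win over existing keys.
    node[last] = { ...(isObject(current) ? current : {}), ...change.value };
  }
  return out;
}
```

For instance, setting `runtime.budget.maxTurns` to `20` on `{ runtime: { budget: { maxTurns: 10 } } }` returns a new object with the updated value and leaves the original untouched.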
🎯 Usefulness — sound
Adds a typed, content-addressed PolicyEdit contract + policyEditProposer that fills a real gap (no existing typed finding→edit pipeline exists), plugs into the established SurfaceProposer factory pattern, and respects the judge-firewall invariant.
- Integration: Cleanly wired into the established surface. `policyEditProposer` returns a `SurfaceProposer` (src/campaign/proposers/policy-edit.ts:36-69), the same interface used by every other proposer (gepa, ace, skillOpt, memoryCuration, parameterSweep, fapo, halo, traceAnalyst, evolutionary), verified against `src/campaign/types.ts:286` and the proposer registry in `src/campaign/index.ts:130-174`. It is e…
- Fit with existing patterns: Fits the codebase grain precisely. It is the deterministic counterpart to the existing LLM-apply path (`src/campaign/proposers/analysis-edit.ts:17`: `APPLY_SYSTEM` rewrites the prompt via an LLM from findings). The PR's typed-edit alternative lets an analyst emit a content-addressed edit and skip the LLM rewrite. It reuses the established steer firewall (`assertNoJudgeVerdict` at src/analyst/pol…
- Real-world viability: Holds up beyond the happy path. Validation runs on every make/admit/apply path (`policy-edit.ts:152,275,308`); deterministic SHA-256 editId is asserted to match content at `validatePolicyEdit:188`. Edge cases are handled: missing expected gain drops the finding rather than fabricating one (`policyEditFromFinding:225`); idempotent append/prepend check `surface.includes(change.value)` (applyTextCha…
- Model: opencode/zai-coding-plan/glm-5.2
- Bridge attempts: 1
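The "drop rather than fabricate" behavior and the judge firewall noted above might look roughly like the sketch below. Everything here is hypothetical: the `SketchFinding` and `SketchCandidateEdit` shapes, the `metadata` field names, and the choice to throw on judge-derived findings are assumptions made for illustration only.

```typescript
// Hypothetical finding shape; the real analyst finding type is richer.
interface SketchFinding {
  id: string;
  derivedFromJudge?: boolean;
  metadata?: { expectedGain?: number; appendText?: string };
}

interface SketchCandidateEdit {
  sourceFindingId: string;
  value: string;
  expectedGain: number;
}

// Returns undefined when the finding has no typed expected gain or no typed
// edit text. No default gain is invented for such findings.
function editFromFinding(f: SketchFinding): SketchCandidateEdit | undefined {
  if (f.derivedFromJudge) {
    // Assumed firewall behavior: judge-derived findings fail loudly.
    throw new Error(`finding ${f.id} is judge-derived and cannot steer edits`);
  }
  const gain = f.metadata?.expectedGain;
  const text = f.metadata?.appendText;
  if (typeof gain !== "number" || !Number.isFinite(gain)) return undefined;
  if (typeof text !== "string" || text.length === 0) return undefined;
  return { sourceFindingId: f.id, value: text, expectedGain: gain };
}

// Keeps only findings that convert cleanly; the rest are dropped.
function editsFromFindings(findings: SketchFinding[]): SketchCandidateEdit[] {
  const edits: SketchCandidateEdit[] = [];
  for (const f of findings) {
    const edit = editFromFinding(f);
    if (edit !== undefined) edits.push(edit);
  }
  return edits;
}
```

The design point is that an edit without a stated expected gain cannot be measured against anything, so silently assigning one would make the later comparison meaningless.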
💰 Value Audit
🟡 Readiness-score weights and target-specificity bonuses are ungrounded magic numbers [maintenance]
scorePolicyEditReadiness (src/analyst/policy-edit.ts:262-268) uses fixed weights 0.30/0.25/0.25/0.20 with risk penalties -0.35/-0.20; targetSpecificityScore (line 730-737) uses 0.4/0.25/0.15/0.2/0.1 bonuses. These are heuristics with no documented empirical basis. The calibration test (tests/policy-edit.test.ts:46-71) only pins strong>0.7 and weak<0.3, which is the real contract; the internal weights could drift silently. Not blocking — defaults are tunable via PolicyEditAdmissionOptions and the
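As a rough illustration of the concern, the sketch below names the quoted constants and checks only the behavioral contract (strong above 0.7, weak below 0.3). The mapping of each weight to a component and of each penalty to a risk level is an assumption, as are the names `ReadinessInputs`, `WEIGHTS`, `RISK_PENALTY`, and `readinessScore`; the finding quotes the numbers but not which input each belongs to.

```typescript
// Hypothetical inputs, each expected in [0, 1]; not the PR's actual types.
interface ReadinessInputs {
  evidence: number;
  confidence: number;
  expectedGain: number;
  targetSpecificity: number;
  risk: "low" | "medium" | "high";
}

// Values quoted in the finding. Which weight pairs with which component,
// and which penalty with which risk level, is assumed for this sketch.
const WEIGHTS = {
  evidence: 0.3,
  confidence: 0.25,
  expectedGain: 0.25,
  targetSpecificity: 0.2,
} as const;
const RISK_PENALTY = { low: 0, medium: 0.2, high: 0.35 } as const;

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Weighted sum of the clamped components minus a risk penalty, clamped again.
function readinessScore(i: ReadinessInputs): number {
  const weighted =
    WEIGHTS.evidence * clamp01(i.evidence) +
    WEIGHTS.confidence * clamp01(i.confidence) +
    WEIGHTS.expectedGain * clamp01(i.expectedGain) +
    WEIGHTS.targetSpecificity * clamp01(i.targetSpecificity);
  return clamp01(weighted - RISK_PENALTY[i.risk]);
}
```

Keeping the constants in named tables like this would at least make drift visible in review, while the threshold-style test remains the contract that callers rely on.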
✅ No Blockers —
tangletools
left a comment
✅ Approved — 13 non-blocking findings — abdd9f59
Full multi-shot audit completed 6/6 planned shots over 8 changed files. Global verifier still owns final merge decision.
Full immutable report for this review: trace
Summary comment for this run: full summary
tangletools · 2026-06-28T20:08:03Z · immutable trace
Summary
- Add `PolicyEdit` contract for typed analyst-proposed edits with deterministic IDs, expected gain, risk, source evidence, and optional canonical `AgentProfileCell` target
- Add `policyEditProposer` so campaigns can turn admitted typed edits or compatible analyst findings into candidate surfaces
- Exclude `.claude/worktrees/**` from Vitest discovery so normal test runs do not execute duplicate worktree copies

Checks
- `pnpm exec biome check --write src/analyst/policy-edit.ts src/campaign/proposers/policy-edit.ts src/analyst/index.ts src/campaign/index.ts src/index.ts tests/policy-edit.test.ts tests/campaign/policy-edit-proposer.test.ts`
- `pnpm vitest run tests/policy-edit.test.ts tests/campaign/policy-edit-proposer.test.ts`
- `pnpm typecheck`
- `pnpm lint`
- `pnpm test`
- `NODE_OPTIONS=--max-old-space-size=8192 pnpm build`
- `pnpm verify:package`
- `git merge-tree --write-tree origin/main HEAD`