RFC: Continuity Verification Module for Sandboxed Agents (Pre‑/Post‑Execution Drift Detection) #2873

### Summary

Add an optional continuity verification module to agt-sandbox that captures a cryptographic hash of the agent's observer identity and reference frame (policy, delegation, external state) before sandbox execution, re‑hashes after execution, and detects drift (unauthorised changes). If drift is detected, the module outputs a structured JSON trace with continuity_valid: false, a deterministic DENY decision, and machine‑actionable failure codes. This closes the continuity gap that sandboxes alone do not address.

### Motivation

agt-sandbox isolates untrusted agent code (Hyperlight, Docker, Azure Container Apps) and enforces tool call restrictions + egress allow lists. However, isolation does not verify continuity – i.e., whether the same authorised observer remains in control across the entire execution lifecycle. Policy versions, delegation chains, external reference states, or the observer’s own identity can drift (even inside a sandbox) without triggering any re‑authorisation.

Current AGT provides policy enforcement and execution sandboxing, but it does not offer a built‑in mechanism to detect whether the agent’s authority context has mutated between the start and end of a sandboxed run. This leaves a governance gap: an agent can appear operationally normal while the legal or contractual basis for its action has already changed.

The proposed module closes this gap by adding a lightweight, pluggable continuity check that produces a replayable, auditable trace. The trace is compatible with open‑source verification tools (e.g., DecisionAssure) and can serve as supporting evidence under compliance regimes such as the EU AI Act (Article 12, record‑keeping), where a tamper‑evident audit trail is useful.

### Detailed Design

1. New module: agt_continuity

Add a new optional module inside agt-sandbox (or as a separate package that integrates with AGT). The module provides:

- Pre‑execution hook: captures and stores two hashes:
  - observer_identity_hash: SHA‑256 of the agent’s identity, session, and memory state (canonical JSON)
  - reference_frame_hash: SHA‑256 of the active policy version, delegation chain, and external reference state (canonical JSON)
- Post‑execution hook: recomputes the same hashes from the current state and compares them to the pre‑execution values.
- Diff detection: if either hash changed, the module records a diff object (reference_frame_diff) showing exactly which fields mutated (e.g., policy_version from v1 to v2, or a changed delegation_chain). Because a hash alone cannot say which field changed, the pre‑hook must also retain the canonical pre‑execution snapshot for this comparison.
- Trace output: emits a JSON trace following the DecisionAssure schema v1.1 (or a simplified AGT‑native format) containing:
  - continuity_valid (boolean)
  - decision (DENY if drift, else ALLOW)
  - reference_frame_diff (if any)
  - control_objective_id (optional, e.g., "CO-001" for authority continuity)
  - recommended_next_action (e.g., "reauthorize", "escalate_to_human")

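The hashing and diff steps above can be sketched as follows. This is a minimal illustration, not the proposed implementation: the helper names (`canonical_json`, `sha256_hex`, `frame_diff`) and the `external_state` field are hypothetical, and "canonical JSON" is approximated here with sorted keys and fixed separators rather than a formal scheme such as RFC 8785.

```python
import hashlib
import json


def canonical_json(state: dict) -> bytes:
    """Serialize with sorted keys and fixed separators so equal states give equal bytes.

    Assumption: this simple canonicalization stands in for whatever scheme AGT adopts.
    """
    text = json.dumps(state, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def sha256_hex(state: dict) -> str:
    """SHA-256 digest of the canonical form, as a 64-character hex string."""
    return hashlib.sha256(canonical_json(state)).hexdigest()


def frame_diff(before: dict, after: dict) -> dict:
    """Top-level fields whose values differ, as {"field": {"old": ..., "new": ...}}."""
    diff = {}
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key), after.get(key)
        if old != new:
            diff[key] = {"old": old, "new": new}
    return diff


# Hypothetical reference frame snapshots taken by the pre- and post-hooks.
pre = {"policy_version": "v1", "delegation_chain": ["root", "agent-7"], "external_state": {"rev": 12}}
post = {"policy_version": "v2", "delegation_chain": ["root", "agent-7"], "external_state": {"rev": 12}}

drifted = sha256_hex(pre) != sha256_hex(post)
print(drifted)
print(frame_diff(pre, post))
```

Note that `frame_diff` needs the full pre‑execution snapshot; the hash only answers whether something changed, not what.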
2. Integration with agt-sandbox

- Introduce a configuration flag: --enable-continuity-verification (default: false for backward compatibility).
- When enabled, the sandbox runtime calls the pre‑hook before executing the untrusted code and the post‑hook after code completion (or on any early termination).
- The trace is written to a configurable output location (stdout, a file, or a collector).

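One way to guarantee that the post‑hook runs on early termination is to wrap the sandboxed run in a context manager with a `finally` clause. The sketch below is an assumption about how the integration could look; `continuity_check`, `read_state`, and `emit_trace` are hypothetical names, not part of the current agt-sandbox API.

```python
import hashlib
import json
from contextlib import contextmanager


def _digest(state: dict) -> str:
    data = json.dumps(state, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


@contextmanager
def continuity_check(read_state, emit_trace, enabled=False):
    """Run a pre-hook before the body and a post-hook after it, even on error.

    read_state: callable returning the current authority context as a dict (hypothetical).
    emit_trace: callable receiving the trace dict, e.g. a file or collector writer (hypothetical).
    """
    if not enabled:  # flag off: the run behaves exactly as before
        yield
        return
    before = _digest(read_state())  # pre-execution hook
    try:
        yield
    finally:
        after = _digest(read_state())  # post-execution hook, also on early termination
        valid = before == after
        emit_trace({"continuity_valid": valid, "decision": "ALLOW" if valid else "DENY"})


# Usage: the mutation stands in for drift that happens during the sandboxed run.
state = {"policy_version": "v1"}
traces = []
with continuity_check(lambda: state, traces.append, enabled=True):
    state["policy_version"] = "v2"
print(traces)
```

With `enabled=False` (the default), no hashes are computed and no trace is emitted, which matches the backward‑compatibility requirement.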
3. Compatibility with DecisionAssure (optional)

The emitted trace is designed to be consumable by the open‑source DecisionAssure verifier (governance_score_cli.py), which computes a 0–100 governance score and returns structured findings. This allows AGT users to leverage an existing auditing tool without extra development.

4. Example trace (simplified)

```json
{
  "step_index": 1,
  "sandbox_execution_id": "agt-sbx-001",
  "continuity_valid": false,
  "observer_identity_hash": "0xabcd...",
  "reference_frame_hash": "0xdeadbeef...",
  "reference_frame_diff": {
    "policy_version": { "old": "v1", "new": "v2" }
  },
  "decision": "DENY",
  "recommended_next_action": "reauthorize",
  "control_objective_id": "CO-001"
}
```

Note:

- control_objective_id is optional. It maps a continuity failure to a specific governance control (e.g., "CO-001" for authority binding continuity).
- The decision field is deterministic (DENY if drift detected, ALLOW otherwise).
- recommended_next_action provides a human‑readable suggestion; downstream systems may act on it or ignore it.
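The field rules in the notes above can be made concrete with a small trace builder. This is a sketch under assumptions: `build_trace` is a hypothetical helper, the trace reports the post‑execution hash values (the proposal does not say which side is reported), and "reauthorize" is used as a fixed suggestion.

```python
from typing import Optional


def build_trace(execution_id: str, step_index: int, pre: dict, post: dict,
                diff: dict, control_objective_id: Optional[str] = None) -> dict:
    """Assemble a trace; pre/post map hash names to hex digests from the two hooks."""
    valid = pre == post  # deterministic: any hash mismatch means drift
    trace = {
        "step_index": step_index,
        "sandbox_execution_id": execution_id,
        "continuity_valid": valid,
        # Assumption: report the post-execution values.
        "observer_identity_hash": post["observer_identity_hash"],
        "reference_frame_hash": post["reference_frame_hash"],
        "decision": "ALLOW" if valid else "DENY",
    }
    if not valid:
        trace["reference_frame_diff"] = diff
        trace["recommended_next_action"] = "reauthorize"
    if control_objective_id is not None:  # optional field, omitted when unset
        trace["control_objective_id"] = control_objective_id
    return trace


pre_hashes = {"observer_identity_hash": "aa", "reference_frame_hash": "bb"}
post_hashes = {"observer_identity_hash": "aa", "reference_frame_hash": "cc"}
denied = build_trace("agt-sbx-001", 1, pre_hashes, post_hashes,
                     {"policy_version": {"old": "v1", "new": "v2"}}, "CO-001")
allowed = build_trace("agt-sbx-001", 1, pre_hashes, pre_hashes, {})
print(denied["decision"], allowed["decision"])
```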

### Alternatives Considered


Rely only on post‑execution logging (no hash comparison) – Does not prove that authority remained unchanged; logs are mutable and not cryptographically anchored.

Require external tool (e.g., DecisionAssure) as a mandatory dependency – Would create lock‑in and increase complexity; AGT should remain self‑contained. The module makes the feature optional and format‑compatible without forcing the external tool.

Perform continuity check only at commit boundary (outside sandbox) – Misses intra‑sandbox drift (e.g., policy mutation during agent execution).

Use probabilistic anomaly detection instead of deterministic hashing – Would introduce false positives/negatives and weaken auditability; governance requires deterministic evidence.

### Security Implications

Trust model: The pre‑ and post‑execution hashes are computed inside the sandbox’s trusted environment (same trust boundary as the sandbox itself). The trace is written after execution, so tampering with it afterwards does not affect the sandbox decision. Such tampering is not detectable from the trace alone; it would become detectable through signature verification if traces are signed (future extension).

Cryptographic boundaries: No new keys are introduced. The module uses SHA‑256, which is already present in AGT’s ecosystem.

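Attack surface: The additional hooks run before and after the untrusted code and operate in the same isolated process/container, so they add no new externally reachable interface to the sandbox. Malicious code cannot influence how the pre‑execution hashes are computed because they are captured before it runs; the stored values must still be held where the untrusted code cannot overwrite them (for example, in the sandbox runtime rather than in guest‑writable memory).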

Fail‑closed behaviour: If drift is detected, the module can optionally raise an exception or return a non‑zero exit code, allowing the orchestrator to halt execution (deterministic DENY). This aligns with “fail‑closed” governance principles.
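A fail‑closed consumer of the trace could map it to a process exit code as sketched below. The function names are hypothetical, and treating a missing or malformed trace as a denial is an assumption that extends the fail‑closed principle; the proposal itself only specifies the behaviour on detected drift.

```python
import json


def exit_code_for(trace: dict) -> int:
    """Return 0 only for an explicit, valid ALLOW; anything else is non-zero."""
    if trace.get("continuity_valid") is True and trace.get("decision") == "ALLOW":
        return 0
    return 1


def enforce(trace_json: str) -> int:
    """Parse a trace and return the exit code an orchestrator could act on."""
    try:
        trace = json.loads(trace_json)
    except json.JSONDecodeError:
        return 1  # unreadable trace: fail closed (assumption)
    if not isinstance(trace, dict):
        return 1  # not a trace object: fail closed (assumption)
    return exit_code_for(trace)


print(enforce('{"continuity_valid": true, "decision": "ALLOW"}'))
print(enforce('{"continuity_valid": false, "decision": "DENY"}'))
print(enforce("not json"))
```

An orchestrator would pass the returned value to its own exit or halt logic, so that only an explicit ALLOW lets the next step proceed.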


### Migration / Backward Compatibility

- No breaking changes: The module is opt‑in via a new configuration flag. Existing workflows that do not enable the flag behave exactly as before.
- API additions: New functions (pre‑hook, post‑hook, trace writer) are additive; no existing public API is modified.
- Trace format: The output trace is compatible with DecisionAssure v1.1, but AGT does not require that tool. Users can ignore the trace or process it with their own scripts.

### Scope

Single package

### Target Placement

New package

### Prior Art

DecisionAssure Trace Schema v1.1 – Defines the JSON format used for continuity traces, including reference_frame_diff, control_objective_id, and KEV attribution.
GitHub: https://github.com/a1k7/DecisionAssure-Runtime-Governance

SCQOS: A constitutional gate that tracks reference frame hashes and observer identity, forming the foundation for pre-commit continuity checks.

NIST AI RMF: Highlights the need for traceable accountability and continuity; this proposal provides technical controls to satisfy these requirements.

GLM (Governance Layer Manifest): Provides machine-readable boundary declarations; this proposal's external_anchors concept is inspired by GLM's pattern of providing “informational reference without authority transfer.”

Collusion Interceptor (Akhilesh Warik / DecisionAssure): A real-time multi-agent detection tool that establishes a clear, independent lineage for the core concepts of runtime drift detection.
GitHub: https://github.com/a1k7/collusion-interceptor

Governance Score CLI (Akhilesh Warik / DecisionAssure): Computes a 0–100 governance score from any agentic trace, providing a quantitative measure of continuity, evidence freshness, and rollback viability.
GitHub: https://github.com/a1k7/governance-score

### Checklist

- [x] I have searched existing issues and RFCs for duplicates
- [x] I have read the ADR index (adr/index.md) for related decisions
- [x] I am willing to implement this RFC or help review an implementation
