Skip to content

Feature request: expose extended prompt-cache TTL (1h) for orchestrator-shaped sessions #344

@ppatel-volley

Description

@ppatel-volley

Use case

We run a multi-agent build orchestrator on the Agent SDK: a long-lived main session dispatches specialist subagents (via the Agent tool) and waits for them to return. Those waits routinely exceed the 5-minute prompt-cache TTL.

Measured impact

Profiling one production-shaped session (37 API turns, ~430k-token context, usage taken from the SDK transcript JSONL):

  • Two dispatch-wait gaps of ~8–9 minutes each expired the cache; the next turns show cache_read_input_tokens: 0 and cache_creation_input_tokens of ~426k and ~437k respectively — full-context re-writes at the 1.25× creation rate.
  • Those two events account for ~75% of the session's total cache-creation spend. Steady-state turns are cache-clean (creations of a few hundred tokens against growing reads), so the TTL is the dominant avoidable cost for this workload shape.

The Anthropic API's extended-TTL beta (1-hour cache_control TTL) would convert these re-writes into reads, but as far as we can tell neither Options in sdk.d.ts nor any documented env var exposes it through the SDK.

Ask

A way to opt a session into the extended cache TTL — an Options field, or honouring an env var passed through to the underlying client. Happy to provide fuller traces if useful.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions