Use case
We run a multi-agent build orchestrator on the Agent SDK: a long-lived main session dispatches specialist subagents (via the Agent tool) and waits for them to return. Those waits routinely exceed the 5-minute prompt-cache TTL.
Measured impact
Profiling one production-shaped session (37 API turns, ~430k-token context, usage taken from the SDK transcript JSONL):
- Two dispatch-wait gaps of ~8–9 minutes each expired the cache; the next turns show cache_read_input_tokens: 0 and cache_creation_input_tokens of ~426k and ~437k respectively, i.e. full-context re-writes at the 1.25× creation rate.
- Those two events account for ~75% of the session's total cache-creation spend. Steady-state turns are cache-clean (creations of a few hundred tokens against growing reads), so the TTL is the dominant avoidable cost for this workload shape.
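For anyone wanting to reproduce this kind of breakdown, here is a minimal sketch of the calculation. It assumes the per-turn usage objects have already been pulled out of the transcript JSONL (that parsing step is omitted because the transcript layout is not something we want to assert here). The function name, the 100k-token threshold, and the sample numbers are ours and purely illustrative; they are not the figures from the session above.

```typescript
// Field names follow the usage object returned by the Anthropic Messages API.
interface Usage {
  cache_read_input_tokens: number;
  cache_creation_input_tokens: number;
}

interface CacheSummary {
  totalCreation: number;   // all cache-creation tokens in the session
  expiryCreation: number;  // creation tokens attributed to suspected TTL expiries
  expiryTurns: number[];   // zero-based indices of the suspected expiry turns
  expiryShare: number;     // expiryCreation / totalCreation (0 if no creation)
}

// Heuristic (an assumption, not an SDK feature): a turn after the first is
// treated as a TTL expiry when it read nothing from cache and wrote at least
// `minRewriteTokens`. Turn 0 is skipped because it is a cold write by definition.
function summarizeCacheExpiry(turns: Usage[], minRewriteTokens = 100_000): CacheSummary {
  let totalCreation = 0;
  let expiryCreation = 0;
  const expiryTurns: number[] = [];

  turns.forEach((u, i) => {
    totalCreation += u.cache_creation_input_tokens;
    const fullRewrite =
      i > 0 &&
      u.cache_read_input_tokens === 0 &&
      u.cache_creation_input_tokens >= minRewriteTokens;
    if (fullRewrite) {
      expiryCreation += u.cache_creation_input_tokens;
      expiryTurns.push(i);
    }
  });

  const expiryShare = totalCreation === 0 ? 0 : expiryCreation / totalCreation;
  return { totalCreation, expiryCreation, expiryTurns, expiryShare };
}

// Made-up sample: a cold start, a warm turn, one expiry, then a warm turn.
const sampleTurns: Usage[] = [
  { cache_read_input_tokens: 0, cache_creation_input_tokens: 20_000 },
  { cache_read_input_tokens: 20_000, cache_creation_input_tokens: 500 },
  { cache_read_input_tokens: 0, cache_creation_input_tokens: 400_000 },
  { cache_read_input_tokens: 400_000, cache_creation_input_tokens: 300 },
];
console.log(summarizeCacheExpiry(sampleTurns));
```

The threshold exists only to separate full-context re-writes from the few-hundred-token creations seen on steady-state turns; any value between those two scales should classify the same turns.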
The Anthropic API's extended-TTL beta (1-hour cache_control TTL) would convert these re-writes into reads, but as far as we can tell neither Options in sdk.d.ts nor any documented env var exposes it through the SDK.
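For reference, this is roughly what the opt-in looks like when calling the Messages API directly, as we understand the API documentation: the cache breakpoint carries a ttl of "1h" on its cache_control object. The sketch below only builds the content blocks; the helper name withExtendedTtl and the local types are ours, and whether a beta header is still required (and its exact value) should be checked against the current API docs rather than taken from here.

```typescript
// Local types mirroring the relevant part of a Messages API text block.
// These are written out by hand for the sketch, not imported from any SDK.
type CacheControl = { type: "ephemeral"; ttl?: "5m" | "1h" };

interface TextBlock {
  type: "text";
  text: string;
  cache_control?: CacheControl;
}

// Hypothetical helper: returns a copy of `blocks` whose last block is marked
// as a cache breakpoint with a 1-hour TTL. The input array is not mutated.
function withExtendedTtl(blocks: TextBlock[]): TextBlock[] {
  if (blocks.length === 0) return [];
  const last = blocks[blocks.length - 1];
  return [
    ...blocks.slice(0, -1),
    { ...last, cache_control: { type: "ephemeral", ttl: "1h" } },
  ];
}

// Example: a two-block system prompt with the breakpoint on the final block.
const system = withExtendedTtl([
  { type: "text", text: "You are the build orchestrator." },
  { type: "text", text: "<large, stable project context goes here>" },
]);
console.log(JSON.stringify(system, null, 2));
```

This is the setting we have no way to reach from the Agent SDK, since the SDK constructs the request and places the cache breakpoints itself.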
Ask
A way to opt a session into the extended cache TTL — an Options field, or honouring an env var passed through to the underlying client. Happy to provide fuller traces if useful.