Extended Thinking is a feature introduced with Claude 3.7 Sonnet and carried forward into the Claude 4 models, letting the model perform longer internal reasoning in a "thinking space" before generating a final answer. The design inspiration: when humans solve complex problems, they typically draft, list reasoning steps, verify each step, and only then state a conclusion, rather than directly "thinking up" an answer.
Standard mode vs Extended Thinking mode: in standard mode, Claude receives input and directly generates output — sufficient for simple tasks (translation, summarization, format conversion). In Extended Thinking mode, Claude first reasons in a hidden (optionally visible) space — analyzing problem dimensions, considering solution paths, verifying intermediate steps — then generates the final answer based on this reasoning. This makes it significantly better at multi-step derivation tasks.
Why does "thinking one more step" make answers more accurate? In LLM generation, each output token is predicted from all previous tokens. When Claude "thinks before answering," the reasoning steps become anchors for subsequent generation: each step gives the next one a more accurate starting point, rather than the model jumping directly from a vague question to an answer. This is the same fundamental reason Chain-of-Thought Prompting works.
Where does Extended Thinking show the most significant effects, and where is it not worth enabling?
Most significant task types: mathematics and formal reasoning (multi-step calculations, formula derivation, logical proofs — verifying each step before giving the answer dramatically reduces error rates); complex algorithm design (systematically exploring solution spaces); multi-constraint decision problems; rigorous argumentation analysis.
Limited effect / not worth enabling: translation, format conversion, text summarization (don't require deep reasoning; adding Extended Thinking only increases cost and latency with almost no quality improvement); creative writing (Extended Thinking's logical nature may make creative output overly structured); simple factual queries.
How to correctly enable Extended Thinking in the Claude API? What parameters need to be set?
Enabling Extended Thinking in Anthropic Python SDK:
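import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=16000,  # must be greater than budget_tokens
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=[{"role": "user", "content": "your question"}],
)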
Key parameter details: budget_tokens is the maximum number of tokens the thinking process can use. The API minimum is 1,024, and a budget that small may not allow deep reasoning; as a starting point, try 5,000-10,000 for medium-complexity tasks and 10,000-32,000 for high-difficulty tasks. max_tokens is the total token limit for the entire response, including both thinking and the final answer. It must be greater than budget_tokens; set it to at least budget_tokens + 1,000 so the final answer has room.
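The sizing rule above can be sketched as a small helper. This is a minimal illustration, not part of the SDK: the function name thinking_params, the tier names, and the specific budgets picked inside each recommended range are all assumptions.

```python
# Illustrative budgets chosen from the ranges suggested above (assumptions; tune per workload).
THINKING_BUDGETS = {"medium": 8_000, "high": 24_000}
MIN_BUDGET = 1_024        # smallest budget_tokens value the API accepts
ANSWER_HEADROOM = 1_000   # minimum room reserved for the final answer


def thinking_params(complexity: str, answer_tokens: int = ANSWER_HEADROOM) -> dict:
    """Return max_tokens and thinking kwargs for a (hypothetical) complexity tier."""
    if complexity not in THINKING_BUDGETS:
        raise ValueError(f"unknown complexity tier: {complexity!r}")
    budget = max(THINKING_BUDGETS[complexity], MIN_BUDGET)
    # max_tokens covers thinking plus the final answer, so it must exceed the budget.
    return {
        "max_tokens": budget + max(answer_tokens, ANSWER_HEADROOM),
        "thinking": {"type": "enabled", "budget_tokens": budget},
    }


print(thinking_params("medium"))
```

The returned dictionary can be unpacked into the request, for example client.messages.create(model=..., messages=..., **thinking_params("high")).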
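Cost calculation: thinking tokens are also billed, and they count as output tokens rather than input tokens. Expect roughly 2-5× the cost of standard mode, depending on how much of the budget the model actually uses.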
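To make the cost arithmetic concrete, here is a rough estimator. The prices and token counts are placeholder assumptions for illustration only; check the current price list for the model you use.

```python
def estimate_cost(input_tokens: int, answer_tokens: int, thinking_tokens: int,
                  input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Estimate request cost in dollars; thinking tokens are charged at the output rate."""
    output_tokens = answer_tokens + thinking_tokens
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000


# Placeholder prices (dollars per million tokens) and token counts, not real measurements.
IN_PRICE, OUT_PRICE = 3.0, 15.0
standard = estimate_cost(1_000, 500, 0, IN_PRICE, OUT_PRICE)
with_thinking = estimate_cost(1_000, 500, 2_000, IN_PRICE, OUT_PRICE)
print(f"standard: ${standard:.4f}, with thinking: ${with_thinking:.4f}, "
      f"ratio: {with_thinking / standard:.1f}x")
```

With these placeholder numbers, 2,000 thinking tokens on top of a 500-token answer raise the cost to roughly 3.9× the standard request, which sits inside the 2-5× range mentioned above.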
Viewing the thinking process: the response's content list contains a thinking block holding Claude's reasoning text, followed by a text block with the final answer. Read the thinking block if you want to display the reasoning to users; ignore it if you only need the final answer.
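A minimal sketch of separating the reasoning from the final answer. The helper name split_response is hypothetical, and the SimpleNamespace objects are stand-ins for the blocks in response.content so the example runs without an API call; real blocks expose the same type, thinking, and text attributes.

```python
from types import SimpleNamespace


def split_response(content) -> tuple[str, str]:
    """Split content blocks into (reasoning text, final answer text)."""
    thinking_parts, answer_parts = [], []
    for block in content:
        if block.type == "thinking":
            thinking_parts.append(block.thinking)
        elif block.type == "text":
            answer_parts.append(block.text)
        # Any other block type (for example redacted_thinking) is skipped here.
    return "\n".join(thinking_parts), "".join(answer_parts)


# Stand-in for response.content from a real call.
fake_content = [
    SimpleNamespace(type="thinking", thinking="2 + 2 = 4, so the sum is 4."),
    SimpleNamespace(type="text", text="The answer is 4."),
]
reasoning, answer = split_response(fake_content)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

With a real response, call split_response(response.content) and show reasoning only where exposing it to users is appropriate.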
What's the relationship between Extended Thinking and Chain-of-Thought Prompting? Are they the same thing?
Same underlying principle, different implementations:
Chain-of-Thought Prompting: a prompting technique — add "think step by step" or provide reasoning-process demonstration examples in prompts. CoT is prompt-driven; output is in normal response text; no special API parameters needed.
Extended Thinking: a model-level feature — lets Claude reason in a separate "thinking space" that can be much longer than the final answer; enables more "drafting and backtracking" (the model can "try and fail" in the thinking space); has more direct influence on final output. Requires API's thinking parameter.
Key differences: CoT reasoning steps are part of the output (visible in the response); Extended Thinking reasoning lives in a separate block (optionally hidden from users). Extended Thinking's thinking space is larger and the model has more freedom in it, so it is typically stronger than CoT. For API users with deep-reasoning tasks, Extended Thinking typically outperforms Zero-Shot CoT; claude.ai users can try manually triggering CoT behavior with "think step by step" instructions.
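The difference shows up directly in how the two requests are built. The sketch below only constructs request arguments; the helper names, the token limits, and the exact wording of the CoT instruction are illustrative assumptions.

```python
MODEL = "claude-sonnet-4-5"


def cot_request(question: str, max_tokens: int = 2_000) -> dict:
    """Zero-shot CoT: the reasoning is requested in the prompt and appears in the answer text."""
    prompt = f"{question}\n\nThink step by step, then give your final answer."
    return {
        "model": MODEL,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }


def extended_thinking_request(question: str, budget_tokens: int = 10_000,
                              answer_tokens: int = 2_000) -> dict:
    """Extended Thinking: the prompt is unchanged; reasoning is enabled via the thinking parameter."""
    return {
        "model": MODEL,
        "max_tokens": budget_tokens + answer_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": question}],
    }


question = "Is 1,001 a prime number?"
print(cot_request(question)["messages"][0]["content"])
print(extended_thinking_request(question)["thinking"])
```

Either dictionary can be passed to the SDK as client.messages.create(**request). Only the second one returns the reasoning in a separate thinking block.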
A data scientist needs Claude to help design an algorithm for identifying anomalies in noisy time series data. This involves multiple possible approaches (statistical, machine learning, rule-based) and different conditions of applicability (data volume, noise type, real-time requirements).
Without Extended Thinking: Claude might directly recommend "Isolation Forest or LOF algorithm," provide code, explain this is a standard anomaly detection method. Correct in many scenarios, but skips evaluating whether these methods fit this specific problem.
With Extended Thinking: in the thinking space, Claude first analyzes key problem dimensions: time series characteristics (seasonality, trends), anomaly types (point anomalies, contextual anomalies, collective anomalies), real-time requirements, data volume. Then evaluates several methods' applicable conditions and limitations. Final answer is conditional: "if your data has clear seasonality, use STL decomposition then anomaly detection; if data is stationary, Isolation Forest is a good choice; if real-time requirements exist, consider RRCF."
The second answer is significantly higher quality, but without Extended Thinking, Claude may not have sufficient "space" for this systematic method evaluation.
Extended Thinking's core trade-off: reasoning depth vs cost and speed. When enabled, thinking tokens are also billed (as output tokens, typically bringing the total to roughly 2-5× standard mode), and response time is longer. For tasks that demand high accuracy and where cost and speed aren't primary concerns, this cost is entirely worthwhile. For high-frequency simple tasks, it is unnecessary. The most effective usage is a tiered strategy: enable Extended Thinking only for requests classified as "high complexity requiring deep reasoning" and use standard mode for everything else. This keeps overall costs controlled while giving deep-reasoning tasks the highest quality output.
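One way to sketch the tiered strategy is a router that adds the thinking parameter only for requests flagged as complex. The keyword heuristic below is a deliberately naive placeholder assumption; a production classifier would more likely be a small model or rules tuned on real traffic. The function names and token limits are likewise hypothetical.

```python
# Naive placeholder heuristic: keywords that hint at deep-reasoning requests.
HIGH_COMPLEXITY_HINTS = ("prove", "derive", "algorithm", "trade-off", "optimize")


def is_high_complexity(prompt: str) -> bool:
    """Return True when the prompt contains any of the hint keywords."""
    text = prompt.lower()
    return any(hint in text for hint in HIGH_COMPLEXITY_HINTS)


def build_request(prompt: str, model: str = "claude-sonnet-4-5") -> dict:
    """Build request arguments, enabling Extended Thinking only for complex prompts."""
    request = {
        "model": model,
        "max_tokens": 2_000,
        "messages": [{"role": "user", "content": prompt}],
    }
    if is_high_complexity(prompt):
        budget = 10_000
        request["max_tokens"] = budget + 2_000  # keep room for the final answer
        request["thinking"] = {"type": "enabled", "budget_tokens": budget}
    return request


for prompt in ("Translate 'hello' into French.",
               "Design an algorithm for anomaly detection in noisy time series."):
    mode = "extended thinking" if "thinking" in build_request(prompt) else "standard"
    print(f"{mode}: {prompt}")
```

Routing at request-build time keeps the simple, high-frequency path at standard cost and latency, and confines the extra spend to the requests that benefit from it.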