Independent Media
Not affiliated with any project
Exploring the Frontier of AI Intelligence
claude-me.com

Extended Thinking

core-concepts Intermediate

30-Second Version · For the impatient
A feature that lets Claude reason at length in a separate "thinking space" before giving its final answer. When enabled, Claude first generates a reasoning process (which can be shown or hidden) and then delivers the answer. On some tasks that require deep reasoning, Extended Thinking can reportedly let Claude Sonnet 4.5 outperform Claude Opus 4 running without the feature, giving higher accuracy at lower cost.
Full Explanation
01 · What is this?

Extended Thinking is a feature first introduced with Claude 3.7 Sonnet and carried into the Claude 4 series. It lets the model perform longer internal reasoning in a "thinking space" before generating a final answer. The design mirrors how people solve complex problems: they draft, list reasoning steps, verify each step, and only then state a conclusion, rather than producing an answer in one leap.

Standard mode vs Extended Thinking mode: in standard mode, Claude receives input and directly generates output — sufficient for simple tasks (translation, summarization, format conversion). In Extended Thinking mode, Claude first reasons in a hidden (optionally visible) space — analyzing problem dimensions, considering solution paths, verifying intermediate steps — then generates the final answer based on this reasoning. This makes it significantly better at multi-step derivation tasks.

Why does "thinking one more step" make answers more accurate? An LLM predicts each output token from all the tokens that precede it. When Claude thinks before answering, the reasoning steps become anchors for subsequent generation: each step gives the next one a more accurate starting point, so the model does not have to jump directly from a vague question to an answer. This is the same fundamental reason Chain-of-Thought Prompting works.
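The anchoring argument above can be written as a rough factorization. This is an illustrative sketch of ordinary autoregressive generation, not a statement about Claude's internals; the symbols $q$ (question), $r_1,\dots,r_k$ (reasoning tokens) and $a_1,\dots,a_m$ (answer tokens) are introduced here for illustration only.

```latex
% Standard mode: answer tokens are conditioned on the question alone.
p(a \mid q) = \prod_{j=1}^{m} p(a_j \mid q,\, a_{<j})

% With a thinking phase: reasoning tokens are generated first,
% and every answer token is conditioned on all of them.
p(r, a \mid q) = \prod_{i=1}^{k} p(r_i \mid q,\, r_{<i})
                 \;\prod_{j=1}^{m} p(a_j \mid q,\, r_{1:k},\, a_{<j})
```

In the second form each answer token sees the full reasoning $r_{1:k}$ in its context, which is what "anchor" means here.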

02 · Why does it exist?

Where does Extended Thinking show the most significant effects, and where is it not worth enabling?

Most significant task types: mathematics and formal reasoning (multi-step calculations, formula derivation, logical proofs — verifying each step before giving the answer dramatically reduces error rates); complex algorithm design (systematically exploring solution spaces); multi-constraint decision problems; rigorous argumentation analysis.

Limited effect / not worth enabling: translation, format conversion, text summarization (don't require deep reasoning; adding Extended Thinking only increases cost and latency with almost no quality improvement); creative writing (Extended Thinking's logical nature may make creative output overly structured); simple factual queries.

03 · How does it affect your decisions?

How to correctly enable Extended Thinking in the Claude API? What parameters need to be set?

Enabling Extended Thinking in Anthropic Python SDK:

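import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=16000,  # must exceed budget_tokens; covers thinking plus the final answer
    thinking={"type": "enabled", "budget_tokens": 10000},  # cap on reasoning tokens
    messages=[{"role": "user", "content": "your question"}],
)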

Key parameter details: budget_tokens is the maximum number of tokens the thinking process may use. The API minimum is 1,024, and a budget that small may not allow deep reasoning; as a starting point, use 5,000-10,000 for medium-complexity tasks and 10,000-32,000 for high-difficulty tasks. max_tokens is the total token limit for the entire response, thinking plus final answer; it must be larger than budget_tokens, so set it to at least budget_tokens + 1,000.
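The sizing rules above can be captured in a small helper. This is a minimal sketch: the function name build_thinking_request and the answer_tokens parameter are hypothetical, and the 1,000-token answer headroom simply follows the rule of thumb in the text.

```python
MIN_BUDGET_TOKENS = 1024  # minimum thinking budget accepted by the API


def build_thinking_request(budget_tokens: int, answer_tokens: int = 1000) -> dict:
    """Return max_tokens and thinking kwargs sized so max_tokens > budget_tokens.

    Hypothetical helper: merge the result into client.messages.create(**kwargs).
    """
    if budget_tokens < MIN_BUDGET_TOKENS:
        raise ValueError(f"budget_tokens must be at least {MIN_BUDGET_TOKENS}")
    if answer_tokens <= 0:
        raise ValueError("answer_tokens must be positive")
    return {
        # Total cap covers the thinking budget plus room for the final answer.
        "max_tokens": budget_tokens + answer_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }


params = build_thinking_request(10000)
print(params["max_tokens"])
```

Validating the budget locally means a request with an undersized budget fails before it is sent.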

Cost calculation: thinking tokens are billed as output tokens, not input tokens. Depending on how much of the budget the model actually uses, a request can cost roughly 2-5× what it would in standard mode.
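A worked estimate makes the multiplier concrete. The sketch below assumes thinking tokens are charged at the output rate; the per-million-token prices and the token counts are hypothetical placeholders, so substitute the current published prices for your model.

```python
def estimate_cost_usd(
    input_tokens: int,
    answer_tokens: int,
    thinking_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Estimate request cost, charging thinking tokens at the output rate."""
    output_tokens = answer_tokens + thinking_tokens
    total = input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
    return total / 1_000_000


# Hypothetical prices in USD per million tokens, for illustration only.
INPUT_PRICE = 3.0
OUTPUT_PRICE = 15.0

standard = estimate_cost_usd(2000, 500, 0, INPUT_PRICE, OUTPUT_PRICE)
extended = estimate_cost_usd(2000, 500, 2000, INPUT_PRICE, OUTPUT_PRICE)
print(f"standard=${standard:.4f} extended=${extended:.4f} ratio={extended / standard:.1f}x")
```

With these placeholder numbers, 2,000 thinking tokens raise the cost to about 3.2 times the standard request, inside the 2-5× range quoted above.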

Viewing the thinking process: the response has a thinking content block with Claude's reasoning text. Read from this block to display to users; ignore if you only need the final answer.
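Reading the blocks might look like the sketch below. It assumes each item in response.content exposes a type attribute, with reasoning text in .thinking for "thinking" blocks and answer text in .text for "text" blocks; the split_response helper and the stand-in content are hypothetical, used so the example runs without an API call.

```python
from types import SimpleNamespace


def split_response(content):
    """Separate reasoning text from final-answer text in a list of content blocks."""
    thinking_parts, answer_parts = [], []
    for block in content:
        if block.type == "thinking":
            thinking_parts.append(block.thinking)
        elif block.type == "text":
            answer_parts.append(block.text)
        # Any other block type (for example redacted thinking) is skipped here.
    return "\n".join(thinking_parts), "".join(answer_parts)


# Stand-in for response.content, shaped like the blocks described above.
fake_content = [
    SimpleNamespace(type="thinking", thinking="Step 1: 2 + 2 = 4"),
    SimpleNamespace(type="text", text="The answer is 4."),
]
reasoning, answer = split_response(fake_content)
print(answer)
```

To hide the reasoning from users, display only the second return value.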

04 · What should you do?

What's the relationship between Extended Thinking and Chain-of-Thought Prompting? Are they the same thing?

Same underlying principle, different implementations:

Chain-of-Thought Prompting: a prompting technique — add "think step by step" or provide reasoning-process demonstration examples in prompts. CoT is prompt-driven; output is in normal response text; no special API parameters needed.

Extended Thinking: a model-level feature — lets Claude reason in a separate "thinking space" that can be much longer than the final answer; enables more "drafting and backtracking" (the model can "try and fail" in the thinking space); has more direct influence on final output. Requires API's thinking parameter.

Key differences: CoT reasoning steps are part of the output and visible in the response, whereas Extended Thinking reasoning sits in a separate block that can be hidden from users. The thinking space is also larger and gives the model more freedom, so results are typically stronger than with CoT. For API users with deep-reasoning tasks, Extended Thinking typically outperforms Zero-Shot CoT; where Extended Thinking is not switched on, for example in a claude.ai chat, a "think step by step" instruction can trigger CoT behavior manually.
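The difference in implementation shows up in how the request is built. In this sketch the two builder functions, the prompt wording, and the 2,000-token answer allowance are illustrative assumptions; only the thinking parameter shape comes from the example earlier in this article.

```python
MODEL = "claude-sonnet-4-5"


def cot_request(question: str) -> dict:
    """Chain-of-Thought: reasoning is requested in the prompt and appears in the answer text."""
    prompt = f"{question}\n\nThink step by step before giving your final answer."
    return {
        "model": MODEL,
        "max_tokens": 2000,
        "messages": [{"role": "user", "content": prompt}],
    }


def extended_thinking_request(question: str, budget_tokens: int = 10000) -> dict:
    """Extended Thinking: reasoning is requested via the thinking parameter, prompt unchanged."""
    return {
        "model": MODEL,
        "max_tokens": budget_tokens + 2000,  # thinking budget plus room for the answer
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": question}],
    }


cot = cot_request("Is 221 prime?")
ext = extended_thinking_request("Is 221 prime?")
print(sorted(set(ext) - set(cot)))
```

Either dictionary can be passed to client.messages.create(**kwargs); the only structural difference is the extra thinking key.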

Real-World Example

A data scientist needs Claude to help design an algorithm for identifying anomalies in noisy time series data. This involves multiple possible approaches (statistical, machine learning, rule-based) and different conditions of applicability (data volume, noise type, real-time requirements).

Without Extended Thinking: Claude might directly recommend "Isolation Forest or LOF algorithm," provide code, and explain that this is a standard anomaly detection method. That answer is correct in many scenarios, but it skips evaluating whether these methods fit this specific problem.

With Extended Thinking: in the thinking space, Claude first analyzes the key problem dimensions: time series characteristics (seasonality, trends), anomaly types (point, contextual, and collective anomalies), real-time requirements, and data volume. It then evaluates the applicable conditions and limitations of several methods. The final answer is conditional: "if your data has clear seasonality, use STL decomposition then anomaly detection; if data is stationary, Isolation Forest is a good choice; if real-time requirements exist, consider RRCF."

The second answer is significantly higher quality, but without Extended Thinking, Claude may not have sufficient "space" for this systematic method evaluation.
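The conditional answer in this example has a simple shape that can be written down directly. The sketch below only encodes the three rules quoted above; the function name and boolean flags are hypothetical, and it is not a real method selector.

```python
def recommend_methods(seasonal: bool, stationary: bool, realtime: bool) -> list[str]:
    """Map the data properties named in the example to the methods it suggests."""
    recommendations = []
    if seasonal:
        recommendations.append("STL decomposition, then anomaly detection")
    if stationary:
        recommendations.append("Isolation Forest")
    if realtime:
        recommendations.append("RRCF")
    return recommendations


print(recommend_methods(seasonal=True, stationary=False, realtime=True))
```

The value of the thinking phase lies in working out which conditions matter, not in the lookup itself.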

Common Misconceptions
× Misconception 1: Extended Thinking makes Claude's answers longer, producing more 'filler.' In fact, Extended Thinking lengthens the thinking process, which lives in a separate block, not the final answer. The final answer can be just as concise as without Extended Thinking; the difference is that it is derived from more thorough reasoning, not that it is more verbose. You can choose not to show the thinking process to users, who then see only a more accurate final answer.
× Misconception 2: Extended Thinking is always better when enabled; it should always be on. In fact, Extended Thinking adds almost nothing to tasks that do not require reasoning (translation, summarization, format conversion) while increasing cost and latency. In production, decide per request type whether to enable it: turn it on for deep-reasoning requests and off for simple tasks. This is the most efficient usage strategy.
The Missing Link
Direct Impact

Extended Thinking's core trade-off is reasoning depth versus cost and speed. When enabled, thinking tokens are billed as output tokens, which can make a request cost roughly 2-5× standard mode, and response time is longer. For tasks that demand high accuracy and where cost and speed are not primary concerns, the extra cost is worthwhile. For high-frequency simple tasks it is unnecessary. The most effective usage is a tiered strategy: enable Extended Thinking only for requests classified as "high complexity requiring deep reasoning" and use standard mode for everything else. This keeps overall costs controlled while giving deep-reasoning tasks the highest-quality output.
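A tiered strategy can be as simple as a lookup on the request's task type. In this sketch the task labels and the thinking_kwargs function are hypothetical, and how a request gets its label (rules, a cheap classifier, caller-supplied metadata) is left open.

```python
# Task categories taken from this article; the label strings are illustrative.
DEEP_REASONING_TASKS = {
    "math",
    "algorithm_design",
    "multi_constraint_decision",
    "argument_analysis",
}


def thinking_kwargs(task_type: str, budget_tokens: int = 10000) -> dict:
    """Return extra kwargs for messages.create: thinking on for deep tasks, nothing otherwise."""
    if task_type in DEEP_REASONING_TASKS:
        return {"thinking": {"type": "enabled", "budget_tokens": budget_tokens}}
    # Translation, summarization, format conversion and similar tasks run in standard mode.
    return {}


print(thinking_kwargs("translation"))
print(thinking_kwargs("math"))
```

Returning an empty dictionary for simple tasks keeps the call site uniform: the same messages.create(**base, **thinking_kwargs(task)) line serves both tiers, provided max_tokens in base exceeds the thinking budget.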
