
Emergent Capabilities: Why Scaling AI Models Suddenly Unlocks Abilities That Weren't There Before

30-Second Version · For the impatient

Emergent capabilities are one of the most counterintuitive properties of LLMs: a task that is "nearly impossible" for a small model can suddenly become "quite good" on a large one. The improvement is a jump, not a linear gain. This is one reason Claude 4 can do things Claude 3 simply couldn't.

Hannah Scott · June 05, 2026

Full Explanation

01 · Why did this happen?

Emergent capabilities are LLM abilities that sit near zero until a model reaches a specific scale threshold, then appear and improve rapidly once that threshold is crossed. Commonly cited examples include multi-step arithmetic, the effectiveness of chain-of-thought (CoT) prompting, analogical reasoning, and semantic understanding of code. This non-linear growth pattern explains why a new model generation often brings entirely new capabilities rather than just higher accuracy.

02 · What is the mechanism?

The discovery of emergent capabilities has significant implications for AI safety. If capabilities emerge non-linearly, monitoring and predicting them becomes much harder. You might conclude that a model "doesn't yet have the capability to do something dangerous," but once its scale crosses a threshold, that capability might suddenly appear. This is part of the rationale behind the AI Safety Level (ASL) classification in Anthropic's Responsible Scaling Policy (RSP): safety assessments need to happen before capabilities emerge, not in reaction to their appearance.

03 · How does it affect me?

Understanding emergent capabilities helps you make smarter model choices when using Claude. When Sonnet doesn't handle a task well, before switching to Opus, ask yourself: "Is the capability this task requires one that hasn't fully emerged at Sonnet's scale?" If so, upgrading to Opus may bring not just a linear accuracy improvement but a qualitative capability change. Conversely, if the required capability is already fully emerged at Sonnet's scale, the marginal benefit of upgrading to Opus may be limited.
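One rough way to act on this question is to run the same handful of task samples through both models, grade each output as pass or fail, and compare the pass rates. The sketch below assumes you already have those graded results as lists of booleans. The function names and the cutoffs (floor, min_gain, noise) are illustrative assumptions, not established values.

```python
def pass_rate(results):
    """Fraction of task samples graded as passing (results: list of booleans)."""
    if not results:
        raise ValueError("need at least one graded sample")
    return sum(results) / len(results)


def describe_upgrade(smaller_results, larger_results,
                     floor=0.2, min_gain=0.3, noise=0.05):
    """Classify the gap between two models graded on the same task samples.

    floor, min_gain, and noise are hypothetical cutoffs chosen for
    illustration; tune them to your own task and sample size.
    """
    small = pass_rate(smaller_results)
    large = pass_rate(larger_results)
    gain = large - small
    if small <= floor and gain >= min_gain:
        # The smaller model rarely succeeds, the larger one often does.
        return "qualitative jump"
    if gain < noise:
        # Both models perform about the same on this task.
        return "limited benefit"
    return "incremental gain"


if __name__ == "__main__":
    sonnet_like = [False] * 9 + [True]      # made-up grades: 1 of 10 pass
    opus_like = [True] * 8 + [False] * 2    # made-up grades: 8 of 10 pass
    print(describe_upgrade(sonnet_like, opus_like))
```

With only a few samples the pass rates are noisy, so treat the label as a hint about whether an upgrade is worth testing further, not as a measurement.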

04 · What should I do?

To go deeper on emergent capabilities: (1) "Emergent Abilities of Large Language Models" (Wei et al., 2022, Google), the landmark paper; (2) "Are Emergent Abilities of Large Language Models a Mirage?" (Schaeffer et al., 2023), a critical analysis suggesting that evaluation methods may influence observed emergence; (3) Anthropic's Model Cards, which document capability assessments across Claude versions, showing non-linear improvements between generations.

Full Content

In 2022, Google researchers described a phenomenon that drew wide attention in the AI research community: certain language model capabilities barely exist below a certain scale threshold, but once that threshold is crossed, the capability suddenly appears and rapidly reaches a considerable level. This phenomenon is called "emergent capabilities" (the original paper uses the term "emergent abilities").

What Is Emergence?

"Emergence" originates in complex systems theory: when you combine enough simple components, the system as a whole exhibits new properties that no individual component possesses. In LLMs, emergent capabilities are abilities that stay near zero until the model reaches sufficient scale, then suddenly appear and improve rapidly once a threshold is passed.

Most Typical Emergent Capability Cases

Multi-step arithmetic: Near-random accuracy below the threshold, suddenly high accuracy above it. The change is discontinuous, not linear.

Chain-of-Thought reasoning: CoT prompting has almost no effect on small models; on large models it dramatically improves reasoning accuracy. The effectiveness of CoT itself is an emergent capability.

Analogical reasoning: "Beijing is to China as Tokyo is to (?)" is barely possible for small models and suddenly highly accurate for large ones.

Code understanding and generation: Non-linear improvement past a scale threshold, where models begin to understand code semantics rather than just syntactic patterns.

Why Does Emergence Happen?

Multi-task combination hypothesis: Complex capabilities are combinations of simple sub-capabilities. Only when the model is large enough to simultaneously master all necessary sub-capabilities does the combined capability appear.
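A toy model can make the combination hypothesis concrete. The sketch below assumes each sub-capability improves smoothly with scale along a logistic curve, and that the combined task succeeds only when every sub-capability succeeds independently. The curve shape, the midpoints, and the function names are all assumptions made for illustration, not measurements from any real model.

```python
import math


def sub_skill_accuracy(log_scale, midpoint, steepness=1.0):
    """Hypothetical smooth (logistic) accuracy curve for one sub-capability."""
    return 1.0 / (1.0 + math.exp(-steepness * (log_scale - midpoint)))


def combined_accuracy(log_scale, midpoints, steepness=1.0):
    """Accuracy on a task that needs every sub-capability to succeed.

    Assumes the sub-capabilities succeed independently, so their
    probabilities multiply.
    """
    accuracy = 1.0
    for midpoint in midpoints:
        accuracy *= sub_skill_accuracy(log_scale, midpoint, steepness)
    return accuracy


if __name__ == "__main__":
    midpoints = (8.0, 9.0, 10.0)  # illustrative values, not measured
    for log_scale in range(6, 14):
        parts = [round(sub_skill_accuracy(log_scale, m), 2) for m in midpoints]
        total = round(combined_accuracy(log_scale, midpoints), 3)
        print(log_scale, parts, total)
```

In this toy setup the combined accuracy stays close to zero while the individual curves are already climbing, then rises steeply once the last sub-capability catches up, even though no single curve has a jump in it.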

Noise threshold hypothesis: Small models are already doing the relevant reasoning, just at an accuracy too low to be useful. Once scale pushes accuracy past a practical threshold, the capability shifts from "unusable" to "usable," which makes it look sudden.
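A small piece of arithmetic shows how a smooth underlying improvement can look sudden. Assume, purely for illustration, that a task needs 10 steps, that each step is correct independently with the same probability, and that the answer only counts if every step is right. The function name and the numbers are hypothetical.

```python
def exact_match_accuracy(per_step_accuracy, num_steps):
    """Probability that all steps are correct, assuming independent steps."""
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    return per_step_accuracy ** num_steps


if __name__ == "__main__":
    for p in (0.80, 0.90, 0.95, 0.99):
        print(p, round(exact_match_accuracy(p, 10), 3))
    # 0.8  -> 0.107
    # 0.9  -> 0.349
    # 0.95 -> 0.599
    # 0.99 -> 0.904
```

Per-step accuracy moves only from 0.80 to 0.99, yet the all-or-nothing score climbs from roughly 11% to roughly 90%. This is also the kind of effect the Schaeffer et al. critique points to when it argues that the evaluation metric shapes how abrupt emergence appears.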

What Emergent Capabilities Mean for Understanding Claude

Differences between Claude 4 and Claude 3: These aren't just "a bit more accurate"; certain capabilities qualitatively emerged.

Why CoT sometimes works and sometimes doesn't: If the needed reasoning capacity hasn't emerged at your model's scale, forcing CoT won't help much.

Why AI progress isn't linear: Capabilities appear at breakthrough points, suddenly expanding the range of applications.
