Why Claude Forgets: A Complete Guide to Context Windows

30-Second Version · For the impatient

Claude didn't forget. Your words simply fell outside its context window.

Ryan Holt · June 02, 2026

Full Explanation

01 · Why did this happen?

The context window is the maximum amount of text Claude can process in a single conversation, measured in tokens. Tokens aren't equivalent to words or characters: as a rule of thumb, one token corresponds to about 0.75 English words, while one Chinese character takes approximately 1–2 tokens. Claude Sonnet 4.6 supports up to 200,000 tokens per conversation.
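As a rough illustration of that rule of thumb, the sketch below estimates a token count from a word count. The function names and the 0.75 ratio are assumptions taken from the approximation above; a real tokenizer will give different numbers, so treat the result as a ballpark only.

```python
import math

# Assumption: about 0.75 English words per token (a rule of thumb,
# not the behavior of any real tokenizer).
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Roughly estimate the token count of English text from its word count."""
    word_count = len(text.split())
    return math.ceil(word_count / WORDS_PER_TOKEN)


def fits_in_window(text: str, window_tokens: int = 200_000) -> bool:
    """Check whether the estimated token count fits in the given window."""
    return estimate_tokens(text) <= window_tokens
```

Under this estimate, a 150,000-word English text would land right at a 200,000-token window, before counting any instructions or replies.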

The more critical concept: Claude has no persistent memory across conversations. Every new conversation is a blank slate. The context window is the only source of information Claude can draw from — if you haven't placed something inside it, Claude has no awareness it exists.

02 · What is the mechanism?

The Context Window exists because of how the Transformer architecture fundamentally works. Processing input requires computing "attention" across the entire input sequence — determining which parts of the text relate to which other parts. The computational cost of this scales quadratically with input length, making it extremely expensive. The context window limit is an engineering trade-off between compute cost, memory usage, and inference speed — not an arbitrary design choice.
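To make the quadratic scaling concrete, here is a minimal sketch that counts token-to-token attention scores for a single head in a single layer. The function names are illustrative, and the count ignores the optimizations real systems apply; it only shows how the number of pairs grows with input length.

```python
def attention_pairs(sequence_length: int) -> int:
    """Number of token-to-token scores in full self-attention (one head, one layer)."""
    return sequence_length * sequence_length


def relative_cost(new_length: int, base_length: int) -> float:
    """How many times more pairwise scores new_length needs than base_length."""
    return attention_pairs(new_length) / attention_pairs(base_length)


# Doubling the input length quadruples the number of pairwise scores.
doubling_factor = relative_cost(2_000, 1_000)
```

By the same arithmetic, going from 4,000 to 200,000 tokens multiplies the length by 50 but the pairwise work by 2,500.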

Training data structure also plays a role: models are trained on text sequences with length constraints, and extending inference beyond those lengths requires specialized techniques such as extensions to RoPE position encoding. This is one reason context window sizes vary dramatically across models: they reflect real differences in technical investment and infrastructure.

03 · How does it affect me?

Understanding the Context Window directly shapes how you structure tasks and organize prompts. If you habitually dump large amounts of information into a single conversation, you'll find that output quality quietly degrades as the window fills — Claude starts missing details and giving vaguer answers, but it won't proactively warn you that it's running low on context.

For developers, context window size directly affects API costs. Both input and output tokens are billed, and filling the context on every request accumulates expenses quickly. Learning to include only the information genuinely needed for the current task is a core cost-control skill.
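The cost effect is simple arithmetic, sketched below. The per-million-token prices are placeholders chosen for illustration, not actual pricing, and the function name is hypothetical; substitute the current rates for whichever model you use.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_mtok: float,
                 output_price_per_mtok: float) -> float:
    """Cost of one request, given prices per million tokens."""
    total = (input_tokens * input_price_per_mtok
             + output_tokens * output_price_per_mtok)
    return total / 1_000_000


# Placeholder prices (assumptions for illustration only).
INPUT_PRICE = 2.0    # per million input tokens
OUTPUT_PRICE = 10.0  # per million output tokens

# The same 1,000-token answer, with a full window versus a trimmed one.
full_context = request_cost(200_000, 1_000, INPUT_PRICE, OUTPUT_PRICE)
trimmed_context = request_cost(20_000, 1_000, INPUT_PRICE, OUTPUT_PRICE)
```

With these placeholder rates the input side dominates the bill, which is why trimming what you send matters more than shortening the answer.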

For general users, the most practical implication is this: long conversations will inevitably hit a quality inflection point. It's not a Claude failure — it's a characteristic of the tool. Knowing this in advance, you'll proactively start fresh conversations at the right moment instead of spending time wondering why Claude suddenly seems less capable.

04 · What should I do?

Immediately actionable adjustments:

Front-load critical information: At the start of every new conversation, include your role context, project background, and output format requirements in the first message. Models tend to attend most reliably to the beginning and end of the context, so don't bury critical information in the middle.
Process long documents in chunks: For documents over ~3,000 words, process them in sections. At the end of each section, ask Claude to summarize the key points before moving to the next.
Start fresh proactively: When a conversation has grown long and you notice output quality declining, don't keep pushing in the same session. Open a new conversation and bring only the essential conclusions forward.
Use system prompts for fixed instructions (developers): Move standing instructions into the system prompt so you don't repeat them in every user message. This saves tokens and ensures the model sees them consistently.
Monitor token counts via API: Check the input_tokens field in the usage object of each response. Proactively manage context before truncation silently degrades your outputs.
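The chunking advice above can be sketched roughly as follows. Everything here is hypothetical: `ask_model` stands in for whatever function sends a prompt to Claude and returns its reply, and the 3,000-word chunk size simply mirrors the guideline in the list.

```python
from typing import Callable


def chunk_by_words(text: str, max_words: int = 3000) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def summarize_in_chunks(text: str, ask_model: Callable[[str], str],
                        max_words: int = 3000) -> str:
    """Process chunks in order, carrying a running summary into each prompt."""
    summary = ""
    for chunk in chunk_by_words(text, max_words):
        prompt = (
            f"Summary of earlier sections:\n{summary or '(none yet)'}\n\n"
            f"Next section:\n{chunk}\n\n"
            "Summarize the key points so far."
        )
        # ask_model is a hypothetical callback supplied by the caller.
        summary = ask_model(prompt)
    return summary
```

Splitting on whitespace ignores paragraph and section boundaries; splitting on headings would usually give cleaner chunks, at the cost of uneven sizes.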

Full Content

You've probably experienced this: you're deep into a conversation with Claude, and suddenly it seems to forget something you mentioned at the very beginning. You assumed AI would remember everything, but the responses start contradicting earlier points, as if the context had been wiped.

This isn't a bug. It isn't Claude getting dumber. It's the hard limit of the context window kicking in.

What Is a Context Window?

A context window is the maximum amount of text Claude can "see" during a single conversation. Think of it as Claude's working desk: the desk has a fixed surface area, and as you pile more things on, the earliest items start falling off the edge.

More precisely, a context window is measured in tokens. Tokens aren't quite words: in English, one token corresponds to roughly 0.75 words; in Chinese, each character is roughly 1–2 tokens. Claude Sonnet 4.6 supports up to 200,000 tokens in its context window. That sounds enormous, but a full technical document, several dozen conversational turns, and a block of code can fill that space faster than you'd expect.

Why the Limit Matters

Many people assume AI has some hidden long-term memory. In practice, Claude starts from zero with every new conversation. Your last conversation, something you mentioned last week, your preferences — none of that exists in Claude's "memory" unless you explicitly tell it within the current session.

The context window is the only source of information Claude can draw from. It contains three components:

System Prompt: Instructions set by the deployer before the conversation starts (usually invisible to end users)
Conversation History: Every message exchanged in the current session, from the first to the most recent
Uploaded Content: Any text you paste in or attach as documents

All three combined must fit within the Token Limit.
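One way to picture the shared budget is the small sketch below. The class and field names are illustrative, and the 200,000-token default is the figure quoted earlier in this article; the token counts themselves would come from a tokenizer or from API usage data.

```python
from dataclasses import dataclass


@dataclass
class ContextBudget:
    """Token counts for the three components that share one window."""
    system_prompt_tokens: int
    history_tokens: int
    uploaded_tokens: int
    limit: int = 200_000  # assumption: the window size quoted in this article

    @property
    def used(self) -> int:
        return (self.system_prompt_tokens
                + self.history_tokens
                + self.uploaded_tokens)

    @property
    def remaining(self) -> int:
        return self.limit - self.used

    def fits(self) -> bool:
        """True when all three components together stay within the limit."""
        return self.used <= self.limit
```

For example, a 2,000-token system prompt, 150,000 tokens of history, and a 40,000-token upload leave 8,000 tokens of headroom.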

What Happens When the Window Fills Up?

Different systems handle this differently, but the two most common approaches:

Truncation: The earliest messages are discarded, keeping only the most recent content. This is why Claude "forgets": the information hasn't disappeared from your view, but it is no longer inside Claude's window. You're reading the full conversation; Claude sees only the most recent portion that still fits.

Summarization: Some systems automatically compress older conversation history into a summary and inject it at the beginning of the context. This extends effective memory but introduces information loss — nuance and specifics get dropped.
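Both approaches can be sketched in a few lines. This is a simplified illustration, not how any particular product implements it: the message shape (a dict with "role", "content", and a precomputed "tokens" count) and the `summarize` callback are assumptions.

```python
from typing import Callable


def truncate_oldest(messages: list[dict], budget: int) -> list[dict]:
    """Keep the most recent messages whose combined token count fits the budget."""
    kept = []
    total = 0
    # Walk backwards from the newest message and stop at the first one
    # that would push the total over budget.
    for message in reversed(messages):
        if total + message["tokens"] > budget:
            break
        kept.append(message)
        total += message["tokens"]
    kept.reverse()
    return kept


def summarize_oldest(messages: list[dict], budget: int,
                     summarize: Callable[[list[dict]], dict]) -> list[dict]:
    """Replace the messages that no longer fit with one summary message."""
    recent = truncate_oldest(messages, budget)
    dropped = messages[:len(messages) - len(recent)]
    if not dropped:
        return recent
    # summarize is a hypothetical callback that returns a single message.
    # This sketch does not count the summary's own tokens against the budget.
    return [summarize(dropped)] + recent
```

In the first function the dropped messages are simply gone from the model's view; in the second they survive only as whatever the summarizer chose to keep.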

Lost in the Middle: An Underappreciated Problem

Not every token position in the context window carries equal weight. Research has shown that LLMs pay the most attention to content at the beginning and end of the context, while the middle section tends to be relatively underweighted — a phenomenon called "Lost in the Middle."

Practical implication: if you have a critical document for Claude to reference, don't bury it in the middle of a long exchange. Place it at the start of the conversation, or explicitly re-anchor Claude before asking your question: "Based on the document I included at the beginning of our conversation, please answer the following."
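A minimal sketch of that layout, assuming you assemble the prompt yourself: the reference document goes first, and the question, together with a reminder that points back to the document, goes last. The function name and the exact wording of the reminder are illustrative.

```python
def build_anchored_prompt(document: str, question: str) -> str:
    """Place the reference document first and the question, with a reminder, last."""
    return (
        "Reference document:\n"
        f"{document.strip()}\n\n"
        "Based on the reference document above, please answer the following.\n"
        f"Question: {question.strip()}"
    )
```

This keeps both the document and the question out of the middle of the context, where they are more likely to be underweighted.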

What This Means for Your Daily Usage

Once you understand how context windows work, you'll start interacting with Claude differently:

Break long tasks into segments: Instead of dumping an entire report at once, process it in chunks. At the end of each chunk, ask Claude to summarize the key conclusions. Use those summaries to seed the next segment, keeping the most important context alive within the window.

Front-load important information: Your role, project background, Output Format requirements — put these at the very start of every conversation. Claude won't remember them from last time, and you want them in the highest-attention zone of the context window.

Leverage System Prompts (for developers): Fixed instructions, persona definitions, formatting rules — all of this belongs in the System Prompt, not repeated in every user message. System prompts anchor the start of the context window and are consistently visible to the model.

Monitor token usage: Claude API users can see token counts in the usage object of every response (input_tokens and output_tokens). When you're approaching the limit, proactively reset the conversation or start a new session rather than letting truncation silently degrade output quality.
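As a rough sketch of the monitoring step, the helper below reads a usage dictionary and flags when a conversation is close to the limit. It assumes the usage data has been converted to a plain dict with `input_tokens` and `output_tokens` keys, the field names used in Anthropic's Messages API responses; the 80% threshold and the function names are arbitrary choices for illustration.

```python
def context_usage_ratio(usage: dict, limit: int = 200_000) -> float:
    """Fraction of the window taken by the last request plus its response."""
    return (usage["input_tokens"] + usage["output_tokens"]) / limit


def should_start_fresh(usage: dict, limit: int = 200_000,
                       threshold: float = 0.8) -> bool:
    """Suggest a reset once usage reaches the chosen share of the window."""
    return context_usage_ratio(usage, limit) >= threshold
```

A lower threshold leaves more room for a long final answer; a higher one squeezes more turns out of each session.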

Context Windows Are Growing Fast

GPT-3's original context window was 2,048 tokens. Today's leading models support 100,000 to 200,000 tokens, and some reach 1 million. This growth means Claude can handle increasingly complex tasks, from analyzing a single article to processing an entire book in one session.

But even a million-token context has a boundary. Understanding that boundary — knowing when to split tasks, when to re-anchor, when to start fresh — is one of the most practical skills any serious Claude user can develop.
