What is a token and how do I know how many I have used?
A token is the smallest unit a language model uses to process text — not quite a word and not quite a character, but somewhere in between. In English, roughly 4 characters is 1 token, and a common word is typically 1-2 tokens. A page of English text is approximately 400-600 tokens.
To see how many you have used, the most direct method is Anthropic's official tokenizer tool for estimates, or check the usage field in API responses (which includes input_tokens and output_tokens). Regular Claude.ai users won't see a token count directly, but remembering that 'long documents = many tokens = a lot of space consumed' is enough for practical use.
I pasted a very long document and Claude seems to have missed parts of it. Why?
Two possible reasons. First, the context window overflowed: if your document plus earlier conversation already exceeds the limit, the parts that don't fit get cut off and Claude can only read as far as it reaches. Second, a subtler issue: even if the document fits within the context limit, Claude's attention across a very long input isn't distributed evenly — research shows it pays more attention to the beginning and end of a document, and may underweight the middle.
Practical fix: if your question is only about one section of the document, paste only that section. If you need Claude to synthesize the full document, break it into sections, have Claude summarize each separately, then combine the summaries. This is more reliable than dumping everything in at once.
Is there a difference in context window size between Claude Pro and the free version?
Yes, though the specific numbers change with model updates. Generally, Pro gives access to newer model versions that may have larger context windows; the free version may be limited to older models or smaller context limits. Current Claude flagship models support context windows that can reach hundreds of thousands of tokens, but the specific limit depends on Anthropic's current documentation.
The more practical question: for everyday conversation, even the context window available on older versions is very large, and most users won't hit the limit in normal use. The cases where you're more likely to run into it: you're doing code review and pasting large amounts of code, analyzing long reports, or running a conversation that has gone dozens of turns. In those cases, Pro's larger available space has real practical value.
Advanced: does the system prompt also take up context window space? What does this mean for developers?
Yes, the system prompt also occupies context window space, and it is a fixed cost — every conversation pays for those tokens regardless of whether your system prompt is 100 tokens or 5,000 tokens.
Practical implications for developers: the longer your system prompt, the less space remains for conversation history and user input, and the higher the cost per API call. This is why writing system prompts is an art of conciseness — keep the most critical instructions and rules, cut the redundant explanation. A 5,000-token system prompt and a 500-token one may perform similarly, but the former significantly raises your API costs. Anthropic's prompt caching feature can reduce repeated billing for system prompts, but that is a separate topic.
If you have used Claude for a while, have you ever noticed this: partway through a long conversation, Claude starts 'forgetting' things you said earlier, gives answers that contradict previous ones, or the quality just seems to drop? Almost all of this relates to one thing: the context window — Claude's working memory.
The goal of this article is to help you genuinely understand what a context window is, what goes inside it, what happens when it fills up, and a few practical habits that will make you a smarter user.
The most intuitive analogy: the context window is like a notepad you and Claude share, with a fixed size measured in tokens (Claude's limit ranges from hundreds of thousands to around two million tokens depending on the version). Every message you send, every reply Claude gives, every document you paste in takes up space on this notepad.
A token is the basic unit language models use to process text — it doesn't map exactly to words. In English, roughly every 4 characters is one token, and a typical word is 1-2 tokens. A page of English text is roughly 400-600 tokens. Keep this in mind when you are pasting large documents.
Every time you send a message, Claude doesn't just see that one message — it reads everything on the shared notepad: the system prompt (if there is one, setting Claude's role and rules), all messages in the conversation from the beginning (both yours and Claude's), any documents or files you pasted or uploaded, and your new message.
This is why Claude can 'remember' what you said earlier — it isn't real memory; it re-reads the whole notepad every time. An important implication: Claude has no memory across separate conversations. Every new conversation starts with a blank notepad.
When the conversation plus your documents exceeds the context window limit, there is no obvious error message. What usually happens: the earliest conversation turns are quietly dropped to make room for new content. You may not notice immediately, until Claude gives an answer that ignores something you clearly stated earlier.
This is why answer quality often drops in the later stages of a long conversation — Claude hasn't gotten stupider; it simply can no longer 'see' the important premises you set up at the beginning.
First, start a new conversation at natural breakpoints. Don't let one conversation run indefinitely. When you sense Claude is starting to 'forget,' or when the topic has shifted, open a new conversation and bring in the necessary context with a brief summary. This almost always works better than continuing to push a long conversation.
Second, watch the size of what you paste. Every document you paste takes up context space. If you have a 50-page report but only want to ask about one section, paste only that section — not the whole thing. Or have Claude summarize the long document first, then continue the conversation with the summary.
Third, put important things close to the end. Not all content in the context window is weighted equally — generally, content closer to the end of the conversation gets more attention from Claude. If you have a critical constraint or requirement, repeat it near each point where it matters, rather than stating it once at the start and assuming it persists.
Understanding the context window means you know that Claude's 'forgetting' is not a bug — it is a basic characteristic of how it works. This lets you actively manage conversation quality rather than being confused when problems appear. The simplest way to remember it: Claude re-reads the whole notepad on every reply; your job is to make sure the notepad contains what it needs, and that you are not filling it with things it doesn't.