The basic unit AI uses to process text — roughly 3/4 of an English word, or about 1-2 tokens per Chinese character.
Full Explanation
01 · What is this?
A token is the smallest unit a language model uses to process text — it doesn't equal a character, a word, or a sentence. English tokenization roughly follows this logic: common words are one token (cat = 1 token), while uncommon words get split (tokenization = 4 tokens). Chinese characters are typically 1-2 tokens each, because tokenizers were historically optimized for English.
The most practical reason to understand tokens is cost: Anthropic API billing is by token, not by character count and not by message count. The tokens in your prompt and the tokens in Claude's response together determine the cost of that API call, with input and output tokens priced at different rates. A 1,000-character Chinese article is roughly 1,500-2,000 tokens; the same information in English is about 1,300 tokens, so Chinese costs roughly 20-50% more than English.
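The rules of thumb above can be turned into a rough estimator. The sketch below is a heuristic only: `estimate_tokens` and its constants are hypothetical names, the ratios are the approximations quoted in this entry, and an exact count requires the provider's own token counting tool.

```python
from __future__ import annotations

import re

# Heuristic ratios quoted in this entry; real tokenizers will differ.
TOKENS_PER_ENGLISH_WORD = 4 / 3      # one token is roughly 3/4 of a word
TOKENS_PER_CJK_CHAR = (1.5, 2.0)     # low and high estimate per character

# Basic CJK Unified Ideographs block; other scripts are ignored here.
CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")
ENGLISH_WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def estimate_tokens(text: str) -> tuple[int, int]:
    """Return a (low, high) token estimate for mixed English/Chinese text."""
    cjk_chars = len(CJK_CHAR.findall(text))
    english_words = len(ENGLISH_WORD.findall(text))
    english_tokens = english_words * TOKENS_PER_ENGLISH_WORD
    low = english_tokens + cjk_chars * TOKENS_PER_CJK_CHAR[0]
    high = english_tokens + cjk_chars * TOKENS_PER_CJK_CHAR[1]
    return round(low), round(high)
```

For a 1,000-character Chinese text, `estimate_tokens("字" * 1000)` returns `(1500, 2000)`, matching the range given above. Punctuation, digits, and whitespace are not counted, so treat the result as a floor rather than a quote.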
02 · Why does it exist?
At its core, a language model operates on vectors (embeddings), and the input unit is a token, not a character. If whole words were the unit, the vocabulary would explode in size and still miss rare or newly coined words; if single characters were the unit, sequences would become far longer and the model would struggle to learn effectively. Tokenization is an engineering trade-off: using common subword units allows processing of all languages while keeping the vocabulary manageable (typically 50,000-100,000 tokens). English has the most training data, so tokenizers are most efficient for English; CJK languages have lower token efficiency, which directly affects API costs.
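To make the subword idea concrete, here is a toy sketch that splits words by greedy longest-match against a tiny vocabulary. Everything in it is an assumption for illustration: the vocabulary is invented, `toy_tokenize` is a hypothetical name, and production tokenizers learn their units from data (for example with byte-pair encoding) rather than using this simple lookup.

```python
from __future__ import annotations

# Toy vocabulary invented for this example; real vocabularies hold
# tens of thousands of learned subword units.
TOY_VOCAB = {"cat", "token", "iz", "at", "ion"}


def toy_tokenize(word: str, vocab: set[str] = TOY_VOCAB) -> list[str]:
    """Split a word by greedy longest-match against the vocabulary.

    Characters that match no vocabulary entry fall back to
    single-character tokens, so any input can still be encoded.
    """
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest remaining piece first, shrinking until one matches.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # single-character fallback
            i += 1
    return tokens
```

With this vocabulary, `toy_tokenize("cat")` returns `["cat"]` and `toy_tokenize("tokenization")` returns `["token", "iz", "at", "ion"]`. The pieces are made up and need not match how any real tokenizer splits the word; the point is only that a common word can be a single unit while a rarer one is assembled from several.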
03 · How does it affect your decisions?
Token counts affect three key decisions:
Cost estimation: Before building an API application, estimate the tokens in your prompt and in Claude's expected response, then multiply each by its API unit price. Don't use raw character counts: for Chinese text, character count times 1.5-2 gives a closer approximation.
Context Window allocation: Every token in your System Prompt takes up context and is billed again on every API call. A 2,000-token System Prompt times 10,000 calls per day is 20 million input tokens per day from the System Prompt alone. Streamlining it is the most direct cost optimization lever.
Language selection: If your application is bilingual, the same content in English uses 20-50% fewer tokens than in Chinese. Many developers write System Prompts in English even for Chinese-facing products — a reasonable cost efficiency decision.
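The last two decisions come down to simple arithmetic, sketched below. The function names are hypothetical, and the 20-50% range is the rough figure quoted in this entry, not a measured value for any particular prompt.

```python
from __future__ import annotations


def daily_system_prompt_tokens(system_prompt_tokens: int, calls_per_day: int) -> int:
    """Input tokens per day that come from re-sending the System Prompt."""
    return system_prompt_tokens * calls_per_day


def english_equivalent_range(chinese_tokens: int) -> tuple[int, int]:
    """Rough (low, high) token count if the same content were in English.

    Assumes English uses 20-50% fewer tokens, as estimated in this entry.
    Integer arithmetic keeps the result exact.
    """
    low = chinese_tokens * 50 // 100    # 50% fewer tokens
    high = chinese_tokens * 80 // 100   # 20% fewer tokens
    return low, high
```

`daily_system_prompt_tokens(2_000, 10_000)` returns `20_000_000`, the figure quoted above, and `english_equivalent_range(2_000)` returns `(1000, 1600)`. Measure your own prompts before relying on the range, since the real ratio depends on the content.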
04 · What should you do?
Three immediately actionable steps:
Use Anthropic's token counting tool: In the Console Workbench, entering your prompt directly shows the token count. Calculate your cost structure before deploying — don't discover unexpected costs after launch.
Compress your System Prompt: Remove redundant explanations and examples. Replace descriptive text with directive text. A short directive such as "Reply in Traditional Chinese" uses less than half the tokens of a verbose equivalent instruction.
Track token usage: Log the input/output token counts from the usage field in every API response. When a conversation's input tokens exceed a preset threshold, trigger a prompt for the user to start a new conversation, or automatically compress conversation history.
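The tracking step might look like the minimal sketch below. It assumes each response's usage data has been converted to a plain dict with `input_tokens` and `output_tokens` keys, mirroring the usage field of an API response; `UsageTracker` and the threshold value are hypothetical and should be tuned to your own context budget.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class UsageTracker:
    """Logs per-response token usage and flags oversized conversations."""

    input_threshold: int                      # hypothetical limit, tune per app
    log: list[tuple[int, int]] = field(default_factory=list)

    def record(self, usage: dict) -> bool:
        """Log one response's usage.

        Returns True when input tokens exceed the threshold, meaning the
        caller should suggest a new conversation or compress the history.
        """
        input_tokens = usage["input_tokens"]
        output_tokens = usage["output_tokens"]
        self.log.append((input_tokens, output_tokens))
        return input_tokens > self.input_threshold

    def totals(self) -> tuple[int, int]:
        """Return cumulative (input, output) tokens across logged responses."""
        return (
            sum(i for i, _ in self.log),
            sum(o for _, o in self.log),
        )
```

A caller would invoke `record` after every API response and branch on the return value; what happens next (prompting the user, summarizing earlier turns) is left to the application.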
Real-World Example
Scenario: Building an AI customer service chatbot processing 10,000 conversations per day.
Initial setup: System Prompt 3,000 tokens, average conversation 5 turns (50 user + 200 Claude tokens per turn), using claude-sonnet-4-5.
Cost calculation:
- Input tokens per conversation: System Prompt (3,000) + 5-turn history (1,250) = 4,250 tokens
- Output tokens per conversation: 5 replies (1,000) = 1,000 tokens
- Cost per conversation: ~$0.028, Daily: $280, Monthly: ~$8,400
After System Prompt optimization (compressed from 3,000 to 800 tokens):
- Monthly cost: ~$6,300
- Monthly savings: $2,100 (~25% reduction)
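The arithmetic behind these figures can be reproduced with the sketch below. The per-token prices are assumptions ($3 per million input tokens and $15 per million output tokens, which yield the ~$0.028 per-conversation figure above); check current pricing before reusing them. The sketch also follows the example's simplified accounting, which counts the System Prompt and the history once per conversation. An implementation that sends one request per turn re-sends both each time and will see higher input totals.

```python
# Assumed prices in USD per million tokens; verify against current pricing.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


def conversation_cost(
    system_prompt_tokens: int,
    turns: int = 5,
    user_tokens_per_turn: int = 50,
    reply_tokens_per_turn: int = 200,
) -> float:
    """Cost in USD of one conversation under the simplified accounting."""
    history_tokens = turns * (user_tokens_per_turn + reply_tokens_per_turn)
    input_tokens = system_prompt_tokens + history_tokens
    output_tokens = turns * reply_tokens_per_turn
    return (
        input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    ) / 1_000_000


def monthly_cost(
    system_prompt_tokens: int,
    conversations_per_day: int = 10_000,
    days: int = 30,
) -> float:
    """Monthly cost in USD, assuming a 30-day month by default."""
    return conversation_cost(system_prompt_tokens) * conversations_per_day * days


before = monthly_cost(3_000)   # about 8,325 before rounding
after = monthly_cost(800)      # about 6,345 before rounding
savings = before - after       # about 1,980, roughly 24%
```

The unrounded results differ slightly from the rounded figures in the example because the example rounds the per-conversation cost up to $0.028 before multiplying.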
Key insight: System Prompt is a cost multiplier — it gets billed again on every single call. Compressing it is the most direct cost optimization, requiring no architectural changes.
Common Misconceptions
✕ Misconception 1
Myth: Tokens equal word or character count, so you can estimate costs using character counts. In practice, 100 English words is about 130 tokens, and 100 Chinese characters is about 100-200 tokens. For Chinese text, actual token usage can be 50-100% higher than an estimate based on character count.
✕ Misconception 2
Myth: Input tokens are cheaper than output tokens, so a long System Prompt is fine. In practice, every token in your System Prompt is billed again on every API call. A 3,000-token System Prompt times 10,000 calls per day is 30 million input tokens per day from the System Prompt alone.
The Missing Link
Direct Impact
Token efficiency vs. expression completeness: Compressing prompts saves cost but may lose important context. Every compression requires testing. It's not about being as short as possible, but achieving the required behavior with the minimum tokens.
Chinese vs. English prompts: English costs less, but a Chinese System Prompt lets Claude respond more naturally in Chinese, typically producing better output quality. The cost-quality trade-off requires evaluation based on your specific scenario.
Token
Basic unit of AI text processing
API costs billed per token, not per character
Chinese uses roughly 20-50% more tokens than English for the same content
Compressing System Prompt = direct cost reduction
The Missing Link
Tokens are AI's billing currency: the same info costs roughly 20-50% more in Chinese than in English. Calculate before you build.