Message Batches API

Claude Tools Intermediate

30-Second Version · For the impatient

<a href="/en/glossary/core-concepts/anthropic/">Anthropic</a>'s asynchronous <a href="https://claudecowork-me.com/en/glossary/workflow-automation/batch-processing/" target="_blank">Batch Processing</a> API, priced at 50% of the standard real-time API. The trade-off is that responses are not immediate: batches are processed within 24 hours and typically finish sooner. Suited for high-volume tasks that don't need real-time feedback: nightly data analysis, bulk document summarization, offline content generation. Combined with <a href="/en/glossary/core-concepts/prompt-caching/">Prompt Caching</a>, it can compress effective costs to 5-15% of list price when cache hit rates stay high, making it the highest-savings <a href="/en/glossary/claude-tools/claude-api/">Claude API</a> usage mode.

Full Explanation

01 · What is this?

Message Batches API is Anthropic's asynchronous API — submit many requests at once, the system processes them in the background, then you download results — rather than waiting for each request immediately.

Workflow: build a list of requests, each with a custom_id and a params object holding the usual Messages API fields (model, max_tokens, messages); submit the list via the API to get a batch ID; periodically poll the batch's processing status (in_progress, then ended); once the status is ended, retrieve the results, which come back in JSONL form with one result per line; use custom_id to match each result to its original request.
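The matching step can be sketched without any network calls. The snippet below assumes results arrive as JSONL text in which each line carries a `custom_id` and a `result` object with a `type` field, mirroring the fields read by the Python example later in this entry. The `match_results` helper and the sample lines are hypothetical, made up for illustration.

```python
import json


def match_results(originals, jsonl_text):
    """Pair each result line with its original input via custom_id.

    originals: dict mapping custom_id -> original input (hypothetical shape).
    jsonl_text: one JSON object per line, each with "custom_id" and "result".
    Returns (matched, to_resubmit).
    """
    matched, to_resubmit = {}, []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        entry = json.loads(line)
        cid = entry["custom_id"]
        if entry["result"]["type"] == "succeeded":
            text = entry["result"]["message"]["content"][0]["text"]
            matched[cid] = (originals[cid], text)
        else:
            # Anything that did not succeed is a candidate for resubmission.
            to_resubmit.append(cid)
    return matched, to_resubmit


# Hypothetical sample data, not real API output.
originals = {"doc-0": "first document", "doc-1": "second document"}
sample = "\n".join([
    json.dumps({"custom_id": "doc-1", "result": {"type": "errored"}}),
    json.dumps({"custom_id": "doc-0", "result": {
        "type": "succeeded",
        "message": {"content": [{"type": "text", "text": "summary of first"}]},
    }}),
])
matched, to_resubmit = match_results(originals, sample)
```

The sample lines are deliberately out of submission order: the join relies only on custom_id, never on position.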

Cost advantage: batch requests are billed at 50% of standard Messages API prices. There is no minimum request count; a batch can contain a single request, though the savings only add up at volume.

Time commitment: batches are processed within 24 hours and typically finish sooner (minutes to hours, depending on system load and batch size). Requests still unprocessed when the 24-hour window closes expire rather than complete. If you have a tighter deadline (e.g., 'must complete in 2 hours'), the standard real-time API is the more reliable choice.

Supported models: major Claude models, currently including Fable 5, Opus 4.8, Sonnet 4.6, and Haiku 4.5.

02 · Why does it exist?

How do Batch API and Prompt Caching work together, and how much can actual costs be reduced?

Batch API savings logic: cuts standard API costs in half directly. Sonnet 4.6 output $15/M becomes $7.5/M.

Prompt Caching savings logic: for fixed System Prompts over 1,024 tokens, a cache hit reduces the cost of the cached input tokens by 90%.

Combined calculation (Sonnet 4.6 example, 3,000-token System Prompt + 5,000-token document + 1,000-token output per request, assuming $3/M input and $15/M output): Standard API: ~$0.039/request. Batch + Caching (85% hit rate on the System Prompt, cache-write surcharges ignored): ~$0.016/request, roughly 59% savings. At 10,000 monthly requests: standard API $390/month → Batch + Caching ~$161/month, saving about $229/month.
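The arithmetic can be reproduced with a small model. This is a sketch under stated assumptions, not official pricing logic: it assumes a $3/M input price (the value implied by the $0.039 standard figure; this entry states only the $15/M output price), that the batch discount and the cache discount multiply, and that only the System Prompt is cached. It ignores cache-write surcharges, so changing any of these assumptions changes the result. The `per_request_cost` helper is a hypothetical name.

```python
def per_request_cost(system_tokens, doc_tokens, output_tokens,
                     input_price, output_price,
                     batch=False, cache_hit_rate=0.0):
    """Estimated USD cost of one request. Prices are USD per million tokens."""
    discount = 0.5 if batch else 1.0  # Batch API: 50% of standard price
    in_rate = input_price * discount / 1_000_000
    out_rate = output_price * discount / 1_000_000
    # Cache hits bill the system prompt at 10% of the input rate (90% off);
    # misses pay the full input rate. Cache-write surcharges are ignored.
    system_multiplier = cache_hit_rate * 0.1 + (1 - cache_hit_rate) * 1.0
    return (system_tokens * in_rate * system_multiplier
            + doc_tokens * in_rate
            + output_tokens * out_rate)


standard = per_request_cost(3000, 5000, 1000, 3.0, 15.0)
combined = per_request_cost(3000, 5000, 1000, 3.0, 15.0,
                            batch=True, cache_hit_rate=0.85)
savings = 1 - combined / standard
print(f"standard ${standard:.4f}, combined ${combined:.4f}, savings {savings:.0%}")
```

Multiplying either figure by the monthly request count gives the monthly totals; output tokens never benefit from caching, which is why output-heavy workloads see smaller combined savings.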

Extreme cases: if batch task volume is very high and cache hit rate is maintained above 90%, combined effective cost can reach around 8-10% of standard real-time API list price.

03 · How does it affect your decisions?

What are Batch API's usage limitations and caveats? When is it not suitable?

Technical limits: a batch can hold at most 100,000 requests or 256 MB, whichever limit is reached first; larger jobs must be split across batches. Results are retained for about 29 days and must be downloaded within that window. If some requests in a batch fail, the successful ones are still billed and the failed ones are not, so you need a mechanism to resubmit failures.
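Splitting an oversized job can be sketched as a greedy grouping pass. The `split_into_batches` helper below is hypothetical, and it estimates size from each request's JSON encoding, which only approximates whatever the API actually measures; leaving some headroom under the caps would be prudent.

```python
import json

MAX_REQUESTS = 100_000           # per-batch request cap quoted in this entry
MAX_BYTES = 256 * 1024 * 1024    # per-batch size cap quoted in this entry


def split_into_batches(requests, max_requests=MAX_REQUESTS, max_bytes=MAX_BYTES):
    """Greedily group requests so each group stays under both caps."""
    batches, current, current_bytes = [], [], 0
    for req in requests:
        size = len(json.dumps(req).encode("utf-8"))  # approximate size
        if size > max_bytes:
            raise ValueError(f"request {req.get('custom_id')} exceeds the size cap")
        # Start a new group when adding this request would break either cap.
        if current and (len(current) >= max_requests
                        or current_bytes + size > max_bytes):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(req)
        current_bytes += size
    if current:
        batches.append(current)
    return batches


# Tiny cap so the behaviour is visible with toy data.
toy = [{"custom_id": f"doc-{i}", "params": {"max_tokens": 16}} for i in range(5)]
groups = split_into_batches(toy, max_requests=2)
```

Each group can then be submitted as its own batch; because custom_id values stay unique across groups, results from all batches can be merged into one dictionary afterwards.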

Special feature support: some beta features in Batch API require special handling. Extended output (300K output tokens, Opus 4.8/4.7/4.6 and Sonnet 4.6) requires adding output-300k-2026-03-24 beta header in batch requests.

Scenarios where Batch API is not suitable: (1) any interaction where a user is waiting at the screen, such as chat interfaces, real-time Q&A, or other user-triggered operations; use the standard API instead. (2) Tasks with a guaranteed deadline (e.g., 'must complete in 10 minutes'), which the 24-hour processing window cannot meet. (3) Sequential workflows where each step depends on the previous step's result; the asynchronous round trip makes these impractical.

Best Batch API scenario characteristics: tasks completely independent from each other; results don't need to be visible minutes after triggering; large task volumes (dozens to tens of thousands of requests).
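These criteria can be turned into a small routing check. The sketch below is hypothetical: the `Task` fields and the `choose_api` helper are names invented here, and the 24-hour threshold is simply the processing window quoted in this entry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    user_is_waiting: bool                 # someone is watching a screen for the answer
    hard_deadline_hours: Optional[float]  # guaranteed deadline, or None if flexible
    depends_on_previous: bool             # needs an earlier step's output first


def choose_api(task: Task) -> str:
    """Return "batch" or "realtime" using the rules of thumb in this section."""
    if task.user_is_waiting or task.depends_on_previous:
        return "realtime"
    # Batch processing may take up to 24 hours, so a tighter hard deadline
    # cannot be promised.
    if task.hard_deadline_hours is not None and task.hard_deadline_hours < 24:
        return "realtime"
    return "batch"


nightly_report = Task(user_is_waiting=False, hard_deadline_hours=None,
                      depends_on_previous=False)
choice = choose_api(nightly_report)
```

A nightly report with no hard deadline routes to batch, while anything interactive, sequential, or bound to a deadline shorter than the processing window stays on the real-time API.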

04 · What should you do?

How do you implement the complete Batch API flow in Python?

Complete Python implementation:

import time

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Step 1: Prepare batch requests
# documents_to_analyze is assumed to be a list of document strings you supply.
requests = []
for i, document in enumerate(documents_to_analyze):
    requests.append({
        "custom_id": f"doc-{i}",
        "params": {
            "model": "claude-sonnet-4-6",
            "max_tokens": 1024,
            "system": [{"type": "text", "text": "...",
                        "cache_control": {"type": "ephemeral"}}],
            "messages": [{"role": "user", "content": document}]
        }
    })

# Step 2: Submit batch
batch = client.messages.batches.create(requests=requests)
batch_id = batch.id

# Step 3: Poll status
while True:
    status = client.messages.batches.retrieve(batch_id)
    if status.processing_status == "ended": break
    time.sleep(60)

# Step 4: Download and process results
results = {}
for result in client.messages.batches.results(batch_id):
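    if result.result.type == "succeeded":
        results[result.custom_id] = result.result.message.content[0].text
    elif result.result.type == "errored":
        print(f"Request {result.custom_id} failed: {result.result.error}")
    else:
        # "canceled" or "expired" results carry no error details
        print(f"Request {result.custom_id} did not complete: {result.result.type}")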

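Notes: client.messages.batches.results() streams results, so large result sets can be processed without loading everything into memory. The loop above polls once a minute with no upper bound; in production, add a timeout, and if a completion notification mechanism such as webhooks is available to you, prefer it over polling.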
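A bounded version of the polling loop might look like the sketch below. The `wait_until_ended` helper is hypothetical; it takes any zero-argument callable that returns the processing status string, so with the SDK it could wrap `client.messages.batches.retrieve(batch_id).processing_status`. The demonstration uses a fake status source rather than a real API call.

```python
import time


def wait_until_ended(get_status, interval_s=60.0, timeout_s=24 * 3600,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns "ended" or the timeout elapses.

    Returns True if the batch ended, False if the timeout was reached first.
    """
    deadline = clock() + timeout_s
    while True:
        if get_status() == "ended":
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Demonstration with a fake status source instead of a real API call.
statuses = iter(["in_progress", "in_progress", "ended"])
done = wait_until_ended(lambda: next(statuses), interval_s=0, timeout_s=5)
```

Injecting `sleep` and `clock` keeps the helper testable without waiting in real time; the default timeout matches the 24-hour processing window described in this entry.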

Real-World Example

A market research company needs to analyze 5,000 consumer survey questionnaires (open-ended questions) every week, extracting topic classifications, sentiment, and keywords:

Previous approach (standard real-time API): 5,000 questionnaires, averaging 500 input tokens + 200 output tokens each, using Sonnet 4.6: ~$22.5/week. The bigger problem is time: 5,000 serial API calls plus rate limiting can take hours, so analysts wait until noon on Monday to start working.

With Batch API + Prompt Caching: a 1,500-token System Prompt (the analysis framework) is added to each batch request with Prompt Caching enabled. Batch API: 50% of real-time cost; Prompt Caching hits (90% assumed): 90% savings on the System Prompt portion. Actual cost: ~$8/week, 64% savings. Timing: submit Sunday at 10pm, results ready by Monday 7am, so analysts can start as soon as they arrive.

Dual benefit: a 64% cost reduction, plus the waiting time moves out of the working day and into background processing that nobody notices.

Common Misconceptions

✕ Misconception 1

Batch API is only for very large volumes (tens of thousands of requests); small batches aren't worth it. In fact, Batch API has no minimum request requirement: even 10 requests get the 50% discount, as long as they don't need an immediate response. The word 'batch' suggests it only pays off at scale, but any task that can wait benefits, regardless of size.

✕ Misconception 2

Batch API's 24-hour window is too slow; few business scenarios fit. In fact, 24 hours sounds long, but many business tasks genuinely don't need immediate completion: daily scheduled reports (ready the next morning), batch document classification (processed in the afternoon, viewed the next day), offline content generation (submitted today, used tomorrow). For these scheduled tasks, Batch API isn't just cheaper; it also lets you concentrate compute in off-peak hours and avoid the rate limiting that large volumes of real-time requests can trigger.

The Missing Link

Direct Impact

Batch API's core trade-off is cost versus timeliness. The 50% discount comes with a processing window of up to 24 hours, which rules it out for real-time scenarios. The design logic is clear: you give up timeliness, and Anthropic gains the flexibility to run the work on off-peak capacity, which benefits both sides. For developers, evaluating whether Batch API applies comes down to one question: 'Does this task's result need to be available within minutes, or is hours or even tomorrow acceptable?' If the latter, Batch API is essentially a zero-risk cost optimization.
