The Message Batches API is Anthropic's asynchronous API: you submit many requests at once, the system processes them in the background, and you download the results later instead of waiting for each response in real time.
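Workflow: package multiple requests into a single batch (each entry has a custom_id plus a params object holding model, max_tokens, and messages); submit it via the API to receive a batch ID; periodically poll the batch's processing status (in_progress, canceling, or ended); once the status is ended, download the results, which are delivered as JSONL; use custom_id to match each result to its original request, since results are not guaranteed to come back in submission order.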
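The custom_id bookkeeping in this workflow can be sketched in plain Python. This is a minimal illustration, not the SDK's own API: build_requests and match_results are hypothetical helper names, the entry shape (custom_id plus params) follows the Python implementation later in this article, and results are assumed to be plain dicts mirroring one line of the JSONL result file.

```python
from typing import Dict, Iterable, List


def build_requests(prompts: Dict[str, str], model: str,
                   max_tokens: int = 1024) -> List[dict]:
    """Turn {custom_id: prompt} into a list of batch request entries."""
    return [
        {
            "custom_id": custom_id,
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for custom_id, prompt in prompts.items()
    ]


def match_results(prompts: Dict[str, str],
                  results: Iterable[dict]) -> Dict[str, dict]:
    """Pair each result with its original prompt via custom_id.

    Lookup is by key, never by position, because results may
    arrive in a different order than the requests were submitted.
    """
    matched = {}
    for item in results:
        custom_id = item["custom_id"]
        matched[custom_id] = {
            "prompt": prompts[custom_id],
            "result": item["result"],
        }
    return matched
```

Using the dict keys as custom_id values guarantees uniqueness within a batch, which is what makes the later lookup unambiguous.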
Cost advantage: batch requests are billed at 50% of standard Messages API prices. There is no minimum request count; a batch can contain a single request, though there is little point in doing so.
Time commitment: processing is bounded by a 24-hour window, and batches typically finish much sooner (minutes to hours, depending on system load and batch size). Requests that are not processed within that window expire rather than complete, so if you have hard SLA requirements (e.g., 'must complete in 2 hours'), the standard real-time API is more reliable.
Supported models: currently supports major Claude models including Fable 5, Opus 4.8, Sonnet 4.6, Haiku 4.5.
How do Batch API and Prompt Caching work together, and how much can actual costs be reduced?
Batch API savings logic: cuts standard API costs in half directly. Sonnet 4.6 output $15/M becomes $7.5/M.
Prompt Caching savings logic: for fixed System Prompts over 1,024 tokens, cache hits reduce the cost of the cached input tokens by 90%. Uncached input tokens and all output tokens are billed at the normal rate.
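Combined calculation (Sonnet 4.6 example, 3,000-token System Prompt + 5,000-token document + 1,000-token output per request): Standard API: 8,000 input tokens at $3/M plus 1,000 output tokens at $15/M, ~$0.039/request. Batch + Caching (85% hit rate): the document and the output at half price ($0.0075 each) plus ~$0.0012 for the System Prompt (cache reads at 10% of the batch input rate, misses billed as cache writes at an assumed 1.25x), ~$0.016/request, about 58% savings. At 10,000 monthly requests: standard API $390/month → Batch + Caching ~$162/month, saving ~$228/month.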
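Extreme cases: for input-heavy workloads where batch volume is very high, almost all input tokens are served from cache (hit rate maintained above 90%), and outputs are short, combined effective cost can reach around 8-10% of standard real-time API list price. Output tokens are never cached, so output-heavy tasks stay much closer to the plain 50% batch discount.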
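The arithmetic behind these estimates can be sketched as a small calculator. It is an estimate under stated assumptions, not a billing formula: the $3/M input price is inferred from the ~$0.039 standard figure above, cache misses are assumed to be billed as cache writes at 1.25x the input rate, and the batch and caching discounts are assumed to stack multiplicatively. Depending on those assumptions, the output may not match rounded figures quoted in prose.

```python
from typing import Optional


def cost_per_request(
    system_tokens: int,
    document_tokens: int,
    output_tokens: int,
    *,
    input_price: float = 3.00,     # USD per million input tokens (assumed)
    output_price: float = 15.00,   # USD per million output tokens
    batch: bool = False,           # apply the 50% batch discount
    cache_hit_rate: Optional[float] = None,  # None = caching disabled
    cache_read_mult: float = 0.10,   # cache hit costs 10% of input rate
    cache_write_mult: float = 1.25,  # cache miss billed as a write (assumed)
) -> float:
    """Estimate the expected USD cost of one request."""
    discount = 0.5 if batch else 1.0
    in_rate = input_price * discount / 1_000_000
    out_rate = output_price * discount / 1_000_000

    if cache_hit_rate is None:
        system_cost = system_tokens * in_rate
    else:
        hit_cost = system_tokens * in_rate * cache_read_mult
        miss_cost = system_tokens * in_rate * cache_write_mult
        system_cost = (cache_hit_rate * hit_cost
                       + (1 - cache_hit_rate) * miss_cost)

    # Only the system prompt is cached; document and output are not.
    return system_cost + document_tokens * in_rate + output_tokens * out_rate


# The worked example: 3,000 system + 5,000 document + 1,000 output tokens.
standard = cost_per_request(3000, 5000, 1000)            # about 0.039
combined = cost_per_request(3000, 5000, 1000,
                            batch=True, cache_hit_rate=0.85)  # about 0.0162

# A hypothetical input-heavy case: 50,000 cached tokens, short output.
heavy_standard = cost_per_request(50_000, 200, 100)
heavy_combined = cost_per_request(50_000, 200, 100,
                                  batch=True, cache_hit_rate=0.95)
ratio = heavy_combined / heavy_standard                   # roughly 0.085
```

The second pair shows why the lowest effective costs only appear when cacheable input dominates: the output side never drops below half price.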
What are Batch API's usage limitations and caveats? When is it not suitable?
Technical limits: a batch can hold at most 100,000 requests or 256 MB, whichever limit is reached first; larger workloads must be split across several batches. Results are retained for ~29 days and must be downloaded within that retention period. If some requests in a batch fail, successful requests are still billed and failed ones are not, so you need a mechanism to resubmit failures.
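Splitting and resubmission can be sketched with two small helpers. Both names (chunk_requests, requests_to_retry) are hypothetical, and the size check is an approximation: it measures each entry as compact JSON plus one byte, which is an assumption about how the 256 MB limit is counted, so leaving some headroom is sensible.

```python
import json
from typing import Dict, Iterable, Iterator, List

MAX_REQUESTS = 100_000
MAX_BYTES = 256 * 1024 * 1024


def chunk_requests(requests: Iterable[dict],
                   max_requests: int = MAX_REQUESTS,
                   max_bytes: int = MAX_BYTES) -> Iterator[List[dict]]:
    """Yield lists of requests that each stay under both limits."""
    chunk: List[dict] = []
    size = 0
    for request in requests:
        # Approximate serialized size of this entry (assumption).
        n = len(json.dumps(request).encode("utf-8")) + 1
        if n > max_bytes:
            raise ValueError(f"{request['custom_id']} exceeds the size limit")
        # Start a new chunk if adding this entry would break a limit.
        if chunk and (len(chunk) >= max_requests or size + n > max_bytes):
            yield chunk
            chunk, size = [], 0
        chunk.append(request)
        size += n
    if chunk:
        yield chunk


def requests_to_retry(requests: Iterable[dict],
                      outcomes: Dict[str, str]) -> List[dict]:
    """Select requests whose outcome is missing or not 'succeeded'.

    outcomes maps custom_id to the result type seen in the results.
    """
    return [r for r in requests
            if outcomes.get(r["custom_id"]) != "succeeded"]
```

Each yielded chunk would be submitted as its own batch. Before resubmitting, it is worth inspecting why a request failed: an entry rejected as invalid will fail again unless it is corrected first.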
Special feature support: some beta features in Batch API require special handling. Extended output (300K output tokens, Opus 4.8/4.7/4.6 and Sonnet 4.6) requires adding output-300k-2026-03-24 beta header in batch requests.
Scenarios where Batch API is absolutely unsuitable: any interaction where a user is waiting at the screen (chat interfaces, real-time Q&A, any user-triggered operation) → use standard API. Tasks requiring a tight completion guarantee (e.g., 'must complete in 10 minutes'), which Batch API's 24-hour window can't meet. Sequential workflows where each step depends on the previous step's result, because every step would incur a full asynchronous round trip.
Best Batch API scenario characteristics: tasks completely independent from each other; results don't need to be visible minutes after triggering; large task volumes (dozens to tens of thousands of requests).
How do you implement the complete Batch API flow in Python?
Complete Python implementation:
import time

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# documents_to_analyze is expected to be a list of strings defined by the caller.
# Step 1: Prepare batch requests
requests = []
for i, document in enumerate(documents_to_analyze):
    requests.append({
        "custom_id": f"doc-{i}",  # must be unique within the batch
        "params": {
            "model": "claude-sonnet-4-6",
            "max_tokens": 1024,
            # Shared system prompt, marked as cacheable
            "system": [{"type": "text", "text": "...",
                        "cache_control": {"type": "ephemeral"}}],
            "messages": [{"role": "user", "content": document}]
        }
    })
# Step 2: Submit batch
batch = client.messages.batches.create(requests=requests)
batch_id = batch.id
# Step 3: Poll status
while True:
    status = client.messages.batches.retrieve(batch_id)
    if status.processing_status == "ended":
        break
    time.sleep(60)
# Step 4: Download and process results
results = {}
for result in client.messages.batches.results(batch_id):
    if result.result.type == "succeeded":
        results[result.custom_id] = result.result.message.content[0].text
    elif result.result.type == "errored":
        print(f"Request {result.custom_id} failed: {result.result.error}")
    else:
        # "canceled" or "expired" results carry no error payload
        print(f"Request {result.custom_id} not processed: {result.result.type}")
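Notes: client.messages.batches.results() is streaming, so it handles large result files without loading everything into memory. In production, prefer a completion notification such as a webhook over polling where your setup provides one; if you do poll, use a modest interval and an overall timeout rather than an unbounded loop.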
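Where polling is used, as in Step 3 above, a bounded version may be safer than a bare while True loop. The following is a sketch under assumptions: wait_until_ended is a hypothetical helper, and the status source, sleep function, and clock are all injected so the loop can be exercised without calling the API.

```python
import time
from typing import Callable


def wait_until_ended(
    get_status: Callable[[], str],
    *,
    interval: float = 60.0,            # seconds between polls
    timeout: float = 24 * 60 * 60,     # give up after the 24-hour window
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Block until get_status() returns "ended" or the timeout passes."""
    deadline = clock() + timeout
    while True:
        if get_status() == "ended":
            return
        if clock() >= deadline:
            raise TimeoutError("batch did not end before the timeout")
        sleep(interval)


# With the client from the implementation above, usage would look like:
# wait_until_ended(
#     lambda: client.messages.batches.retrieve(batch_id).processing_status
# )
```

Injecting sleep and clock keeps the helper testable; in production the defaults apply and only get_status needs to be supplied.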
Case study: a market research company needs weekly analysis of 5,000 consumer survey questionnaires (open-ended questions) to extract topic classifications, sentiment tendencies, and keywords:
Previous approach (standard real-time API): 5,000 questionnaires averaging 500 input tokens + 200 output tokens on Sonnet 4.6 cost ~$22.5/week. The bigger problem was time: 5,000 serial API calls plus rate limiting can take hours, so analysts had to wait until noon on Monday to start working.
With Batch API + Prompt Caching: 1,500-token System Prompt (analysis framework) with Prompt Caching enabled, added to batch requests. Batch API: 50% of real-time cost; Prompt Caching hits (90% assumed): 90% savings on System Prompt portion. Actual cost: ~$8/week, 64% savings. Timing: submit Sunday 10pm, results ready Monday 7am — analysts can start as soon as they arrive.
Dual benefit: 64% cost reduction, plus 'waiting time' transformed from workflow disruption into 'background processing, completely imperceptible.'
Batch API's core trade-off: cost vs timeliness. The 50% cost discount comes with a 'maximum 24 hours' processing time commitment — completely unsuitable for real-time scenarios. The design logic is clear: Anthropic trades timeliness for flexibility in off-peak resource usage, beneficial to both sides. For developers, evaluating Batch API applicability comes down to one core question: 'Does this task's result need to be available within minutes, or is hours or even tomorrow acceptable?' If the latter, Batch API is essentially a zero-risk cost optimization choice.