Claude 3.5 Haiku
Claude 3.5 Haiku is worth evaluating for coding, rag, and agents when its provider route and context window match the workload.
Use it for
- Teams evaluating coding, rag, and agents
- Workloads that can use a 200k context window
- Buyers comparing 4 tracked provider routes
Do not use it for
- Workloads where another current model has stronger sourced task evidence
- Family
- Claude 3.5
- Released
- 2024-10-22
- Context
- 200k
- Architecture
- Decoder Only
- Specialization
- general
- Training
- finetuned
Cheapest of 6 routes · Anthropic · cache read $0.080
About
Claude 3.5 Haiku is Anthropic's latest AI model, known for its speed and efficiency while maintaining high intelligence. It is optimized for applications needing rapid response, like interactive chatbots and real-time content moderation. Initially text-only, future plans include image input capabilities. It excels in delivering fast, accurate code suggestions, processing and categorizing information swiftly, and handling large volumes of user interactions. Priced accessibly, it offers advanced coding, tool use, and reasoning abilities. Though initially surpassing Claude 3 Haiku in benchmarks, its pricing reflects its enhanced performance 123457.
Claude 3.5 Haiku is Anthropic's fastest and most cost-efficient model in the Claude 3.5 family, released on October 22, 2024. It supports a 200,000-token context window, processes both text and image inputs, and supports tool use, parallel function calls, and the Message Batches API. These features make it well-suited for high-throughput applications: interactive chatbots, real-time content moderation, automated code review, and large-scale data classification pipelines where latency and cost matter more than frontier accuracy.
On capability benchmarks, Claude 3.5 Haiku surpasses Claude 3 Opus on several evaluations including HumanEval (87.6%) while maintaining Haiku-tier response speed. It is particularly strong on coding, structured output, and agentic tool use with self-correction, outperforming Claude 3 Haiku substantially on instruction-following quality. The knowledge cutoff is July 2024.
Pricing as of late 2024 was $0.80 per million input tokens and $4.00 per million output tokens. Prompt caching reduces effective input cost by up to 90%; the Message Batches API offers an additional 50% reduction for non-latency-sensitive workloads. The model is available through Anthropic's direct API, AWS Bedrock, Google Vertex AI, Fireworks AI, OpenRouter, and the Vercel AI Gateway.
Claude 3.5 Haiku has a 200k-token context window.
Claude 3.5 Haiku input tokens at $0.8/1M, output at $4/1M.
Top use-case fit: coding, agents, and build tasks
Coding
Included by capability and metadata signals in the decision map.
RAG
Included by capability and metadata signals in the decision map.
Agents
Included by capability and metadata signals in the decision map.
Provider price ladder
Compare all 6Compare API pricing across 4 providers for input and output tokens, batch, and cached reads when available.
| Provider | Input / 1M | Output / 1M | Batch in / out | Cache | Route |
|---|---|---|---|---|---|
| Anthropic | $0.800 | $4.00 | $0.400 / $2.00 | read $0.080 / 5m $1.00 / 1h $1.60 | Serverless |
| AWS Bedrock | $0.800 | $4.00 | - | - | Serverless |
| GCP Vertex AI | $0.800 | $4.00 | - | - | Serverless |
| OpenRouter | $0.800 | $4.00 | - | - | Serverless |
Capabilities
Benchmark peer barsfor Coding
No task-mapped benchmark peers are available for this model yet.
Migration checks
No linked migration route is available for this model yet.