Claude 3.5 Haiku

Name: Claude 3.5 Haiku
Author: Anthropic

Released

2024-10-22

Last refreshed

2026-06-29

Status

Researched 47d ago

ProprietaryCommercial use: conditionalCodingRAGAgentsLong contextVisionClassificationJSON / Tool use

Claude 3.5 Haiku is worth evaluating for coding, rag, and agents when its provider route and context window match the workload.

Use it for

Teams evaluating coding, rag, and agents
Workloads that can use a 200k context window
Buyers comparing 4 tracked provider routes

Do not use it for

Workloads where another current model has stronger sourced task evidence

Specifications

Family: Claude 3.5
Released: 2024-10-22
Context: 200k
Architecture: Decoder Only
Specialization: general
Openness: Proprietary
License: ProprietaryCommercial use: conditional
Weights: Not released
Code: Unknown
Training: Fine-tuned

Created by

Anthropic

Developing safe and ethical AI systems.

San Francisco, California, United States

Founded 2021

Website

Pricing

Output / 1M

$4.00

Input / 1M

$0.800

Cheapest of 6 routes · Anthropic · cache read $0.080

Providers(6)

Anthropic OpenRouter AWS Bedrock GCP Vertex AI Replicate API Vercel AI Gateway

View 6 provider routes

About

Claude 3.5 Haiku is Anthropic's latest AI model, known for its speed and efficiency while maintaining high intelligence. It is optimized for applications needing rapid response, like interactive chatbots and real-time content moderation. Initially text-only, future plans include image input capabilities. It excels in delivering fast, accurate code suggestions, processing and categorizing information swiftly, and handling large volumes of user interactions. Priced accessibly, it offers advanced coding, tool use, and reasoning abilities. Though initially surpassing Claude 3 Haiku in benchmarks, its pricing reflects its enhanced performance 123457.

Claude 3.5 Haiku is Anthropic's fastest and most cost-efficient model in the Claude 3.5 family, released on October 22, 2024. It supports a 200,000-token context window, processes both text and image inputs, and supports tool use, parallel function calls, and the Message Batches API. These features make it well-suited for high-throughput applications: interactive chatbots, real-time content moderation, automated code review, and large-scale data classification pipelines where latency and cost matter more than frontier accuracy.

On capability benchmarks, Claude 3.5 Haiku surpasses Claude 3 Opus on several evaluations including HumanEval (87.6%) while maintaining Haiku-tier response speed. It is particularly strong on coding, structured output, and agentic tool use with self-correction, outperforming Claude 3 Haiku substantially on instruction-following quality. The knowledge cutoff is July 2024.

Pricing as of late 2024 was $0.80 per million input tokens and $4.00 per million output tokens. Prompt caching reduces effective input cost by up to 90%; the Message Batches API offers an additional 50% reduction for non-latency-sensitive workloads. The model is available through Anthropic's direct API, AWS Bedrock, Google Vertex AI, Fireworks AI, OpenRouter, and the Vercel AI Gateway.

Claude 3.5 Haiku has a 200k-token context window.

Claude 3.5 Haiku input tokens at $0.8/1M, output at $4/1M.

Top use-case fit: coding, agents, and build tasks

Coding

Included by capability and metadata signals in the decision map.

RAG

Included by capability and metadata signals in the decision map.

Agents

Included by capability and metadata signals in the decision map.

Provider price ladder

Compare all 6

Compare API pricing across 4 providers for input and output tokens, batch, and cached reads when available.

Provider	Input / 1M	Output / 1M	Batch in / out	Cache	Route
Anthropic	$0.800	$4.00	$0.400 / $2.00	read $0.080 / 5m $1.00 / 1h $1.60	Serverless
AWS Bedrock	$0.800	$4.00	-	-	Serverless
GCP Vertex AI	$0.800	$4.00	-	-	Serverless
OpenRouter	$0.800	$4.00	-	-	Serverless

Available via routers & gateways(15)

LiteLLM

Gateway

Open-source Python SDK and proxy server that unifies 100+ LLM APIs behind a single OpenAI-compatible interface, with load balancing, cost tracking, and configurable failover.

Free OSSAnthropicGCP Vertex AI

OpenRouter

Hybrid

Unified hybrid gateway to 400+ models from 60+ providers via a single OpenAI-compatible API, with optional auto-routing that selects the best model per prompt.

PassthroughAnthropicGCP Vertex AI

Portkey

Gateway

Production AI gateway routing to 1,600+ LLMs with failover, load balancing, semantic caching, and guardrails; Apache 2.0 core is fully self-hostable with the complete feature set.

SubscriptionAnthropicGCP Vertex AI

AIRouter

Router

Commercial LLM router that analyzes incoming requests and routes to the optimal model for cost/quality/latency via a drop-in OpenAI-compatible API, with a privacy-preserving embedding mode that avoids sending prompt content.

Passthrough + feeAnthropicGCP Vertex AI

Amazon Bedrock Intelligent Prompt Routing

Router

AWS Bedrock's native intelligent prompt router that routes prompts between Anthropic Claude model tiers (Haiku/Sonnet) based on predicted task complexity, with no extra per-routing charge.

PassthroughAWS Bedrock

Helicone

Gateway

Observability-first AI gateway with routing, caching, rate limiting, and request tracing; Apache 2.0 open-source core with a managed hosted tier for logging and analytics.

SubscriptionAnthropicGCP Vertex AI