How to Choose an LLM for Your Project
There are hundreds of LLMs available today — ranging from free open-weight models you can self-host to frontier APIs costing tens of dollars per million tokens. Picking the wrong one can mean overpaying, hitting capability walls, or locking into a provider that doesn't fit your use case. This guide gives you a framework for choosing the right model in under 10 minutes.
Step 1: Start with your task
The single biggest driver of which model to pick is what you're trying to do. LLMs are not interchangeable — they have meaningfully different strengths.
| If your main task is… | Start here |
|---|---|
| Writing code, debugging, or agentic coding | Best LLMs for coding |
| Drafting or editing text | Best LLMs for writing |
| Long documents, RAG, or document QA | Best LLMs for RAG |
| Multi-step autonomous agents | Best AI agents |
| Image or video understanding | Best vision LLMs |
| Reasoning, math, or complex logic | Best reasoning LLMs |
| Tool use or structured JSON output | Best LLMs for function calling |
| Keeping costs minimal | Cheapest LLM APIs |
| Free usage, no API key | Best free LLMs |
| Running locally or self-hosting | Best open-weight LLMs |
| Enterprise deployment with SLAs | Best enterprise LLMs |
Step 2: Set your budget
Paid LLM API prices span more than two orders of magnitude, from about $0.10 to over $15 per 1M input tokens. Here's a rough cost map:
| Tier | Cost range | What you get |
|---|---|---|
| Free | $0 | Tight rate limits; usually older or smaller models |
| Budget | $0.10–$1 per 1M input tokens | Strong open-weight models (Llama, Mistral, Qwen) |
| Mid-tier | $1–$5 per 1M input tokens | Frontier-adjacent models (GPT-4o, Claude Haiku, Gemini Flash) |
| Frontier | $5–$15+ per 1M input tokens | Top reasoning and coding models (GPT-5, Claude Opus, Gemini Pro) |
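A quick cost estimate: if you're running 10,000 requests per day at an average of 2,000 tokens each (input + output), that's 20M tokens/day. At $1/1M tokens, that's $20/day. At $10/1M tokens (frontier), it's $200/day. This treats input and output as one blended price; output tokens are usually priced higher than input tokens, so check both rates before budgeting.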
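The same arithmetic can be sketched as a small Python helper. The function name `daily_cost_usd` and the single blended price per million tokens are assumptions made for illustration; they mirror the simplified estimate above rather than any provider's actual billing formula.

```python
def daily_cost_usd(requests_per_day: int,
                   tokens_per_request: int,
                   price_per_million_usd: float) -> float:
    """Estimate daily spend from a blended (input + output) per-1M-token price."""
    total_tokens = requests_per_day * tokens_per_request
    return total_tokens / 1_000_000 * price_per_million_usd


# Reproduce the two scenarios from the estimate: 10,000 requests/day, 2,000 tokens each.
for price in (1.0, 10.0):
    cost = daily_cost_usd(10_000, 2_000, price)
    print(f"${price:.2f}/1M tokens -> ${cost:,.2f}/day")
```

Multiply the result by 30 for a rough monthly figure, and rerun it with your own request volume before committing to a tier.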
For prototyping, start with a mid-tier model and optimize later. For production at scale, run the numbers before committing.
→ See live pricing: LLM API pricing comparison
Step 3: Check the context window
The context window is how much text an LLM can read and remember in a single call — the sum of your input and its output. Everything outside the context window is invisible to the model.
| Context window | Good for |
|---|---|
| 8K–32K tokens (~6–24K words) | Short chat, quick Q&A, code snippets |
| 128K tokens (~100K words) | Medium documents, codebase analysis |
| 200K–1M tokens (~150K–750K words) | Full books, large codebases, long transcripts |
| 1M+ tokens | Entire repositories, very long multi-session tasks |
Rule of thumb: take the length of your longest expected input, double it (to account for output), and pick a model whose context window is larger than that number.
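Here is a minimal sketch of that rule of thumb in Python. The helper names are hypothetical, and the 0.75 words-per-token ratio is an assumption taken from the approximate conversions in the table above (128K tokens is roughly 100K words); real tokenizers vary by model and language.

```python
import math

# Assumption: rough English-text ratio implied by the table above.
WORDS_PER_TOKEN = 0.75


def words_to_tokens(words: int) -> int:
    """Convert a word count to an approximate token count."""
    return math.ceil(words / WORDS_PER_TOKEN)


def required_context_tokens(longest_input_tokens: int) -> int:
    """Rule of thumb: double the longest input to leave room for output."""
    return 2 * longest_input_tokens


def fits(longest_input_tokens: int, context_window_tokens: int) -> bool:
    """True if the window is larger than the doubled input length."""
    return context_window_tokens > required_context_tokens(longest_input_tokens)


# Illustrative case: a 60,000-word document.
input_tokens = words_to_tokens(60_000)
print(input_tokens, required_context_tokens(input_tokens))
print(fits(input_tokens, 128_000), fits(input_tokens, 200_000))
```

In this illustrative case a 128K window falls short once the input is doubled, so the 200K tier is the safer pick.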
→ See models ranked by context window: Best long-context LLMs
Step 4: Pick a provider
The same model is often available from multiple providers at different prices, latencies, and reliability levels. For example, Anthropic's Claude models are available via:
- Anthropic direct — official API, all model tiers
- AWS Bedrock — enterprise SLA, VPC, IAM integration
- Google Vertex AI — Google Cloud ecosystem
- OpenRouter — unified multi-provider API, easier switching
When to go direct: if you need the latest model versions, the highest rate limits, or direct support.
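When to use a cloud platform or a router/aggregator: choose a cloud platform (AWS Bedrock, Google Vertex AI) if you need enterprise compliance or multi-region deployment; choose a router such as OpenRouter if you want fallback routing or want to avoid vendor lock-in.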
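Fallback routing is simple in principle: try providers in order and move on when one fails. The sketch below shows the idea with stand-in functions. Every name in it (`complete_with_fallback`, `flaky_primary`, `stub_secondary`) is hypothetical, and it does not call any real provider SDK; in practice each callable would wrap that provider's own client.

```python
from typing import Callable, Sequence, Tuple

# A provider is a (name, callable) pair; the callable takes a prompt and returns text.
Provider = Tuple[str, Callable[[str], str]]


def complete_with_fallback(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try each provider in order; return (provider_name, response) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: this is a sketch, not production code
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def flaky_primary(prompt: str) -> str:
    """Hypothetical stand-in that simulates an outage."""
    raise TimeoutError("simulated outage")


def stub_secondary(prompt: str) -> str:
    """Hypothetical stand-in that always succeeds."""
    return f"echo: {prompt}"


name, text = complete_with_fallback(
    "hello", [("primary", flaky_primary), ("secondary", stub_secondary)]
)
print(name, text)
```

A hosted router does this for you behind one API; writing it yourself mainly makes sense if you already hold accounts with several providers.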
→ See all providers: LLM API providers
Step 5: Consider open-weight vs proprietary
Proprietary models (GPT-5, Claude, Gemini) are closed — you access them only via API, and the provider controls the weights, pricing, and availability.
Open-weight models (Llama, Mistral, Qwen, DeepSeek) have publicly available weights that you can download, self-host, and (often) fine-tune.
| | Proprietary | Open-weight |
|---|---|---|
| Performance ceiling | Highest | Approaching proprietary at large sizes |
| Cost at scale | Pay-per-token | Fixed infra cost after setup |
| Data privacy | Depends on provider agreement | Full control if self-hosted |
| Customization | Limited (fine-tuning on some) | Full fine-tuning available |
| Setup complexity | Low (API key) | High (infra, GPU, serving stack) |
For most teams, start with a proprietary API. Move to open-weight if you need full data control, have very high token volumes, or have specific fine-tuning requirements.
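To judge whether your token volume is "very high", you can estimate a break-even point between pay-per-token pricing and a fixed infrastructure bill. The sketch below is a simplification under stated assumptions: the function name and the dollar figures are hypothetical, and it ignores engineering time, GPU utilization, and any per-token cost of self-hosting.

```python
def break_even_tokens_per_month(fixed_infra_usd_per_month: float,
                                api_price_per_million_usd: float) -> float:
    """Monthly token volume at which a fixed infra bill equals pay-per-token spend."""
    return fixed_infra_usd_per_month / api_price_per_million_usd * 1_000_000


# Hypothetical inputs: $3,000/month of infrastructure vs. a $1 per 1M token API.
tokens = break_even_tokens_per_month(3_000, 1.0)
print(f"Break-even at about {tokens / 1e9:.1f}B tokens/month")
```

Below the break-even volume the API is cheaper on these assumptions; above it, self-hosting starts to pay off, provided you can keep the hardware busy.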
Quick-pick decision table
| Situation | Recommended starting point |
|---|---|
| Just prototyping, cost doesn't matter yet | GPT-4o or Claude Sonnet (balanced quality + cost) |
| Production coding agent | See the coding podium |
| Long document analysis (>100K tokens) | See long-context picks |
| High-volume pipeline, cost-sensitive | DeepSeek V3 or Qwen via OpenRouter |
| Enterprise with AWS | Anthropic Claude on AWS Bedrock |
| Need to self-host | Meta Llama (latest) or Qwen |
| Free usage for testing | Gemini via Google AI Studio (free tier) or Llama via Groq |
What to do next
- Pick a task category from the table in Step 1 and see the current ranked shortlist.
- Run a quick cost estimate using the pricing table at /best/cheap.
- Check the context window for your longest expected input.
- Start with one model; you can switch later. The /compare tool lets you run head-to-head comparisons of any two models.
The model landscape changes fast. LLMReference tracks it daily so you don't have to.