LLM Reference

How to Choose an LLM for Your Project

There are hundreds of LLMs available today — ranging from free open-weight models you can self-host to frontier APIs costing tens of dollars per million tokens. Picking the wrong one can mean overpaying, hitting capability walls, or locking into a provider that doesn't fit your use case. This guide gives you a framework for choosing the right model in under 10 minutes.

Step 1: Start with your task

The single biggest driver of which model to pick is what you're trying to do. LLMs are not interchangeable — they have meaningfully different strengths.

If your main task is… | Start here
Writing code, debugging, or agentic coding | Best LLMs for coding
Drafting or editing text | Best LLMs for writing
Long documents, RAG, or document QA | Best LLMs for RAG
Multi-step autonomous agents | Best AI agents
Image or video understanding | Best vision LLMs
Reasoning, math, or complex logic | Best reasoning LLMs
Tool use or structured JSON output | Best LLMs for function calling
Keeping costs minimal | Cheapest LLM APIs
Free usage, no API key | Best free LLMs
Running locally or self-hosting | Best open-weight LLMs
Enterprise deployment with SLAs | Best enterprise LLMs

Step 2: Set your budget

Paid LLM API prices span more than two orders of magnitude, from roughly $0.10 to $15 or more per 1M input tokens. Here's a rough cost map:

Tier | Cost range | What you get
Free | $0 | Tight rate limits; usually older or smaller models
Budget | $0.10–$1 per 1M input tokens | Strong open-weight models (Llama, Mistral, Qwen)
Mid-tier | $1–$5 per 1M input tokens | Frontier-adjacent models (GPT-4o, Claude Haiku, Gemini Flash)
Frontier | $5–$15+ per 1M input tokens | Top reasoning and coding models (GPT-5, Claude Opus, Gemini Pro)

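A quick cost estimate: if you're running 10,000 requests per day at an average of 2,000 tokens each (input + output), that's 20M tokens/day. At a blended rate of $1 per 1M tokens, that's $20/day. At $10 per 1M tokens (frontier), it's $200/day. Output tokens are usually priced higher than input tokens, so treat a single blended rate as an approximation.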
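
The arithmetic above can be sketched in a few lines of Python. This is a minimal sketch, assuming one blended price for input and output tokens; the function name and parameters are illustrative rather than part of any pricing API.

```python
def daily_cost_usd(requests_per_day: int,
                   tokens_per_request: int,
                   usd_per_million_tokens: float) -> float:
    """Estimate daily spend from request volume and a blended per-token price."""
    tokens_per_day = requests_per_day * tokens_per_request
    return tokens_per_day / 1_000_000 * usd_per_million_tokens


# The example from the text: 10,000 requests/day at 2,000 tokens each.
budget = daily_cost_usd(10_000, 2_000, 1.0)     # $1 per 1M tokens
frontier = daily_cost_usd(10_000, 2_000, 10.0)  # $10 per 1M tokens
print(f"${budget:.2f}/day vs ${frontier:.2f}/day")  # $20.00/day vs $200.00/day
```

For a closer estimate, the same calculation could be run separately for input and output tokens using each model's published rates.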

For prototyping, start with a mid-tier model and optimize later. For production at scale, run the numbers before committing.

→ See live pricing: LLM API pricing comparison

Step 3: Check the context window

The context window is how much text an LLM can read and remember in a single call — the sum of your input and its output. Everything outside the context window is invisible to the model.

Context window | Good for
8K–32K tokens (~6K–24K words) | Short chat, quick Q&A, code snippets
128K tokens (~100K words) | Medium documents, codebase analysis
200K–1M tokens (~150K–750K words) | Full books, large codebases, long transcripts
1M+ tokens | Entire repositories, very long multi-session tasks

Rule of thumb: take the length of your longest expected input, double it (to account for output), and pick a model whose context window is larger than that number.
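
The rule of thumb can be expressed as a small filter. This is a sketch under stated assumptions: the 0.75 words-per-token ratio is the rough conversion implied by the table above (real tokenizers vary by language and content), and the model names and window sizes in `catalogue` are hypothetical placeholders.

```python
import math

# Rough ratio implied by the table above (128K tokens ~ 100K words).
# An approximation only; actual tokenizers vary.
WORDS_PER_TOKEN = 0.75


def required_context_tokens(longest_input_words: int) -> int:
    """Convert words to tokens, then double to leave room for the output."""
    input_tokens = longest_input_words / WORDS_PER_TOKEN
    return math.ceil(input_tokens * 2)


def models_that_fit(longest_input_words: int,
                    context_windows: dict[str, int]) -> list[str]:
    """Return models whose context window is larger than the required size."""
    needed = required_context_tokens(longest_input_words)
    return [name for name, window in context_windows.items() if window > needed]


# Hypothetical catalogue: names and sizes are placeholders, not real models.
catalogue = {"small-model": 32_000, "medium-model": 128_000, "large-model": 1_000_000}

print(required_context_tokens(30_000))     # 80000
print(models_that_fit(30_000, catalogue))  # ['medium-model', 'large-model']
```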

→ See models ranked by context window: Best long-context LLMs

Step 4: Pick a provider

The same model is often available from multiple providers at different prices, latencies, and reliability levels. For example, Anthropic's Claude models are available via:

  • Anthropic direct — official API, all model tiers
  • AWS Bedrock — enterprise SLA, VPC, IAM integration
  • Google Vertex AI — Google Cloud ecosystem
  • OpenRouter — unified multi-provider API, easier switching

When to go direct: if you need the latest model versions, the highest rate limits, or direct support.

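When to use a cloud platform (AWS Bedrock, Google Vertex AI): if you need enterprise compliance, multi-region deployment, or integration with your existing cloud account.

When to use a router/aggregator (OpenRouter): if you need fallback routing or want to avoid vendor lock-in.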
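
To make "fallback routing" concrete, here is a minimal sketch of the idea: try providers in order and use the first one that responds. It is not any provider's or router's actual API. The `Provider` type, `complete_with_fallback`, and the two stand-in functions are hypothetical; in practice each callable would wrap a real client.

```python
from typing import Callable, Sequence

# (name, function that sends a prompt and returns the response text)
Provider = tuple[str, Callable[[str], str]]


def complete_with_fallback(prompt: str,
                           providers: Sequence[Provider]) -> tuple[str, str]:
    """Try each provider in order; return (name, response) from the first success."""
    errors = []
    for name, send in providers:
        try:
            return name, send(prompt)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-in callables that simulate one failing and one healthy provider.
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("simulated timeout")


def stable_secondary(prompt: str) -> str:
    return f"echo: {prompt}"


used, reply = complete_with_fallback(
    "hello", [("primary", flaky_primary), ("secondary", stable_secondary)]
)
print(used, reply)  # secondary echo: hello
```

A router or aggregator typically handles this ordering for you; going direct means writing and maintaining something like it yourself.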

→ See all providers: LLM API providers

Step 5: Consider open-weight vs proprietary

Proprietary models (GPT-5, Claude, Gemini) are closed — you access them only via API, and the provider controls the weights, pricing, and availability.

Open-weight models (Llama, Mistral, Qwen, DeepSeek) have publicly available weights that you can download, self-host, and (often) fine-tune.

Factor | Proprietary | Open-weight
Performance ceiling | Highest | Approaching proprietary at large sizes
Cost at scale | Pay-per-token | Fixed infra cost after setup
Data privacy | Depends on provider agreement | Full control if self-hosted
Customization | Limited (fine-tuning on some) | Full fine-tuning available
Setup complexity | Low (API key) | High (infra, GPU, serving stack)

For most teams, start with a proprietary API. Move to open-weight if you need full data control, have very high token volumes, or have specific fine-tuning requirements.
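
One way to judge whether token volume is "very high" is a break-even estimate between pay-per-token pricing and a fixed infrastructure bill. The sketch below is deliberately simplified: the $3,000/month server cost is a made-up figure for illustration, and the calculation ignores engineering time, hardware throughput limits, and quality differences between models.

```python
def breakeven_tokens_per_month(fixed_infra_usd_per_month: float,
                               api_usd_per_million_tokens: float) -> float:
    """Monthly token volume above which fixed-cost self-hosting beats pay-per-token."""
    return fixed_infra_usd_per_month / api_usd_per_million_tokens * 1_000_000


# Hypothetical inputs: a $3,000/month GPU server vs an API at $1 per 1M tokens.
volume = breakeven_tokens_per_month(3_000, 1.0)
print(f"{volume / 1e9:.1f}B tokens/month")  # 3.0B tokens/month
```

Below the break-even volume, the API is cheaper on these assumptions; above it, self-hosting starts to pay off only if a single deployment can actually serve that load.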

→ See open-weight models: Best open-weight LLMs

Quick-pick decision table

Situation | Recommended starting point
Just prototyping, cost doesn't matter yet | GPT-4o or Claude Sonnet (balanced quality + cost)
Production coding agent | See the coding podium
Long document analysis (>100K tokens) | See long-context picks
High-volume pipeline, cost-sensitive | DeepSeek V3 or Qwen via OpenRouter
Enterprise with AWS | Anthropic Claude on AWS Bedrock
Need to self-host | Meta Llama (latest) or Qwen
Free usage for testing | Gemini via Google AI Studio (free tier) or Llama via Groq

What to do next

  1. Pick a task category from the table in Step 1 and see the current ranked shortlist.
  2. Run a quick cost estimate using the pricing table at /best/cheap.
  3. Check the context window for your longest expected input.
  4. Start with one model; you can switch later. The /compare tool lets you run head-to-head comparisons of any two models.

The model landscape changes fast. LLMReference tracks it daily so you don't have to.