How to Choose an LLM for Your Project

There are hundreds of LLMs available today — ranging from free open-weight models you can self-host to frontier APIs costing tens of dollars per million tokens. Picking the wrong one can mean overpaying, hitting capability walls, or locking into a provider that doesn't fit your use case. This guide gives you a framework for choosing the right model in under 10 minutes.

Step 1: Start with your task

The single biggest driver of which model to pick is what you're trying to do. LLMs are not interchangeable — they have meaningfully different strengths.

If your main task is…	Start here
Writing code, debugging, or agentic coding	Best LLMs for coding
Drafting or editing text	Best LLMs for writing
Long documents, RAG, or document QA	Best LLMs for RAG
Multi-step autonomous agents	Best AI agents
Image or video understanding	Best vision LLMs
Reasoning, math, or complex logic	Best reasoning LLMs
Tool use or structured JSON output	Best LLMs for function calling
Keeping costs minimal	Cheapest LLM APIs
Free usage, no API key	Best free LLMs
Running locally or self-hosting	Best open-weight LLMs
Enterprise deployment with SLAs	Best enterprise LLMs

Step 2: Set your budget

LLM costs vary by three to four orders of magnitude. Here's a rough cost map:

Tier	Cost range	What you get
Free	$0	Tight rate limits; usually older or smaller models
Budget	$0.10–$1 per 1M input tokens	Strong open-weight models (Llama, Mistral, Qwen)
Mid-tier	$1–$5 per 1M input tokens	Frontier-adjacent models (GPT-4o, Claude Haiku, Gemini Flash)
Frontier	$5–$15+ per 1M input tokens	Top reasoning and coding models (GPT-5, Claude Opus, Gemini Pro)

A quick cost estimate: if you're running 10,000 requests per day at an average of 2,000 tokens each (input + output), that's 20M tokens/day. At $1/1M tokens, that's $20/day. At $10/1M tokens (frontier), it's $200/day.
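The arithmetic above can be sketched in a few lines of Python. This is a minimal estimate, not a billing calculator: the function name and the 1,500/500 split between input and output tokens are assumptions for illustration, and real providers usually price input and output tokens differently, so the sketch takes the two prices separately.

```python
def daily_cost_usd(
    requests_per_day: int,
    input_tokens_per_request: int,
    output_tokens_per_request: int,
    input_price_per_million_usd: float,
    output_price_per_million_usd: float,
) -> float:
    """Estimate daily spend in USD from request volume and per-million-token prices."""
    input_cost = requests_per_day * input_tokens_per_request * input_price_per_million_usd
    output_cost = requests_per_day * output_tokens_per_request * output_price_per_million_usd
    return (input_cost + output_cost) / 1_000_000


if __name__ == "__main__":
    # 10,000 requests/day at 2,000 tokens each; the 1,500 input / 500 output
    # split is a hypothetical breakdown of that total.
    for price in (1.0, 10.0):
        cost = daily_cost_usd(10_000, 1_500, 500, price, price)
        print(f"${price:g} per 1M tokens: ${cost:,.2f}/day, about ${cost * 30:,.0f}/month")
```

Setting both prices to the same value reproduces the blended figures in the text; plugging in a provider's actual input and output rates gives a closer estimate.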

For prototyping, start with a mid-tier model and optimize later. For production at scale, run the numbers before committing.

→ See live pricing: LLM API pricing comparison

Step 3: Check the context window

The context window is the maximum amount of text, measured in tokens, that an LLM can process in a single call: your input plus its output. Everything outside the context window is invisible to the model.

Context window	Good for
8K–32K tokens (~6K–24K words)	Short chat, quick Q&A, code snippets
128K tokens (~100K words)	Medium documents, codebase analysis
200K–1M tokens (~150K–750K words)	Full books, large codebases, long transcripts
1M+ tokens	Entire repositories, very long multi-session tasks

Rule of thumb: take the length of your longest expected input, double it (to account for output), and pick a model whose context window is larger than that number.
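As a rough sketch, the rule of thumb can be written as a small check. The 0.75 words-per-token ratio is an assumption inferred from the table above (128K tokens for roughly 100K words); actual ratios vary by tokenizer and language, and the function names are illustrative only.

```python
import math

# Assumed average ratio; real tokenizers differ, especially for code and non-English text.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(word_count: int) -> int:
    """Roughly convert a word count to a token count."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


def required_context(longest_input_tokens: int) -> int:
    """Apply the rule of thumb: double the longest input to leave room for output."""
    return 2 * longest_input_tokens


def fits(longest_input_tokens: int, context_window_tokens: int) -> bool:
    """Return True if the model's window is larger than the doubled input length."""
    return context_window_tokens > required_context(longest_input_tokens)


if __name__ == "__main__":
    input_tokens = estimate_tokens(45_000)  # e.g. a 45,000-word report
    for window in (32_000, 128_000, 200_000):
        print(f"{window:>7} tokens: fits={fits(input_tokens, window)}")
```

Doubling is deliberately conservative. If your outputs are always short (for example, a one-paragraph summary of a long document), a smaller margin may be enough.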

→ See models ranked by context window: Best long-context LLMs

Step 4: Pick a provider

The same model is often available from multiple providers at different prices, latencies, and reliability levels. For example, Anthropic's Claude models are available via:

Anthropic direct — official API, all model tiers
AWS Bedrock — enterprise SLA, VPC, IAM integration
Google Vertex AI — Google Cloud ecosystem
OpenRouter — unified multi-provider API, easier switching

When to go direct: if you need the latest model versions, the highest rate limits, or direct support.

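When to use a cloud platform or router/aggregator: choose a cloud platform (AWS Bedrock, Google Vertex AI) if you need enterprise compliance or multi-region deployment; choose a router such as OpenRouter if you want fallback routing or want to avoid vendor lock-in.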
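Fallback routing, mentioned above, can also be approximated in your own code. The sketch below is provider-agnostic and makes no claims about any real SDK: each provider is represented by a hypothetical callable that takes a prompt and returns a completion, and the two stub functions exist only to demonstrate the behavior.

```python
from typing import Callable, Sequence

# Hypothetical interface: a provider is any function from prompt to completion text.
Provider = Callable[[str], str]


def complete_with_fallback(
    prompt: str, providers: Sequence[tuple[str, Provider]]
) -> tuple[str, str]:
    """Try providers in order; return (provider_name, completion) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # in real code, catch the client's specific error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def flaky_primary(prompt: str) -> str:
    """Stub that simulates an outage at the first-choice provider."""
    raise TimeoutError("simulated outage")


def stub_secondary(prompt: str) -> str:
    """Stub that stands in for a working backup provider."""
    return f"echo: {prompt}"


if __name__ == "__main__":
    chain = [("primary", flaky_primary), ("secondary", stub_secondary)]
    print(complete_with_fallback("hello", chain))
```

A hosted router does this for you and adds unified billing. Rolling your own keeps you independent of the router itself, at the cost of maintaining one client per provider.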

→ See all providers: LLM API providers

Step 5: Consider open-weight vs proprietary

Proprietary models (GPT-5, Claude, Gemini) are closed — you access them only via API, and the provider controls the weights, pricing, and availability.

Open-weight models (Llama, Mistral, Qwen, DeepSeek) have publicly available weights that you can download, self-host, and (often) fine-tune.

	Proprietary	Open-weight
Performance ceiling	Highest	Approaching proprietary at large sizes
Cost at scale	Pay-per-token	Fixed infra cost after setup
Data privacy	Depends on provider agreement	Full control if self-hosted
Customization	Limited (fine-tuning on some)	Full fine-tuning available
Setup complexity	Low (API key)	High (infra, GPU, serving stack)

For most teams, start with a proprietary API. Move to open-weight if you need full data control, have very high token volumes, or have specific fine-tuning requirements.
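To put a number on "very high token volumes", one option is a break-even estimate comparing pay-per-token pricing with a fixed monthly infrastructure cost. The sketch below is a simplification under stated assumptions: the $3,000/month figure is a hypothetical placeholder, and it ignores engineering time, GPU utilization, and any quality differences between models.

```python
def break_even_tokens_per_month(
    fixed_infra_usd_per_month: float, api_price_per_million_usd: float
) -> float:
    """Monthly token volume at which a fixed self-hosting cost equals pay-per-token spend."""
    if api_price_per_million_usd <= 0:
        raise ValueError("API price must be positive")
    return fixed_infra_usd_per_month / api_price_per_million_usd * 1_000_000


if __name__ == "__main__":
    infra_cost = 3_000.0  # hypothetical monthly cost of GPUs and serving stack
    for api_price in (1.0, 10.0):
        tokens = break_even_tokens_per_month(infra_cost, api_price)
        print(
            f"vs ${api_price:g} per 1M tokens: break-even at "
            f"{tokens / 1e9:.1f}B tokens/month ({tokens / 30 / 1e6:.0f}M/day)"
        )
```

Below the break-even volume, the API is cheaper on token cost alone; above it, self-hosting starts to pay off, provided the self-hosted model is good enough for the task.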

→ Best open-weight LLMs

Quick-pick decision table

Situation	Recommended starting point
Just prototyping, cost doesn't matter yet	GPT-4o or Claude Sonnet (balanced quality + cost)
Production coding agent	See the coding podium
Long document analysis (>100K tokens)	See long-context picks
High-volume pipeline, cost-sensitive	DeepSeek V3 or Qwen via OpenRouter
Enterprise with AWS	Anthropic Claude on AWS Bedrock
Need to self-host	Meta Llama (latest) or Qwen
Free usage for testing	Gemini via Google AI Studio (free tier) or Llama via Groq

What to do next

Pick a task category from the table in Step 1 and see the current ranked shortlist.
Run a quick cost estimate using the pricing table at /best/cheap.
Check the context window for your longest expected input.
Start with one model; you can switch later. The /compare tool lets you run head-to-head comparisons of any two models.

The model landscape changes fast. LLMReference tracks it daily so you don't have to.
