LLM Reference

Best LLMs for Writing (2026)

Last refreshed 2026-06-04. Next refresh: weekly.

Top language models for long-form writing, essays, and creative prose. Ranked by Chatbot Arena human-preference scores with MMLU as a fallback.

Looking for ad copy, email, social posts, localization, or brand voice? Use the marketing use-case matrix instead of treating general prose quality as the whole decision.

Verdict

Use Claude Opus 4.7 for long-form writing today.

Claude Opus 4.6 is the runner-up, scoring 1501 on Arena against 1503 for Claude Opus 4.7.

Researched 10d ago.

How we rank

Writing picks are for essays, drafts, and long-form prose. The target rubric is EQ-Bench Creative Writing v3 when coverage is available; the live fallback is Chatbot Arena, then MMLU.

  1. Eligibility: General chat models, excluding pure code/embedding SKUs.
  2. Target ranking: EQ-Bench Creative Writing v3 becomes the primary signal once it exists in seed data for enough current models.
  3. Current live ranking: Until that coverage lands, the page keeps the current Chatbot Arena preference score fallback, then MMLU, then newer release.
  4. Variant collapse: We keep one row per model family (`familySlug` + parameter tier). When headline scores tie within ±0.5 pt (±10 Elo on Chatbot Arena), we pick the canonical SKU by lowest tracked input price, then GA over preview or limited access, then newest `release`. A folded sibling within the benchmark noise band can show a "Tied within margin" chip on that score cell.
  5. Why this ranking? Arena is an imperfect but useful human-preference proxy for prose quality; MMLU is only a general capability floor, not a style or brand-voice metric.
  6. Marketing boundary: Use `/best/marketing` for ad copy, email, social posts, localization, and brand-voice content; this page stays focused on general writing.
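
The eligibility filter, live fallback order, and variant collapse described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the site's actual pipeline: the `Sku` record, its field names, the `kind` labels, and all sample data are hypothetical, and the sketch assumes that a sibling outside the ±10 Elo band simply loses to the top-scoring sibling.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

ARENA_TIE_MARGIN = 10.0  # Elo band quoted in the methodology


@dataclass(frozen=True)
class Sku:
    name: str
    family: str                   # stands in for `familySlug` + parameter tier
    kind: str                     # "chat", "code", or "embedding" (hypothetical labels)
    arena: Optional[float]        # Chatbot Arena score, if any
    mmlu: Optional[float]         # MMLU score, if any
    input_price: Optional[float]  # USD per 1M input tokens, None if untracked
    is_ga: bool                   # False for preview or limited access
    release: date


def rank_key(s: Sku):
    """Live fallback order: Arena, then MMLU, then newer release."""
    missing = float("-inf")
    return (
        s.arena if s.arena is not None else missing,
        s.mmlu if s.mmlu is not None else missing,
        s.release.toordinal(),
    )


def canonical_key(s: Sku):
    """Tie-break order: lowest tracked price, GA first, newest release."""
    return (
        s.input_price if s.input_price is not None else float("inf"),
        0 if s.is_ga else 1,
        -s.release.toordinal(),
    )


def rank(skus):
    # Step 1: keep general chat models only.
    eligible = [s for s in skus if s.kind == "chat"]

    # Step 4: group siblings so each family yields one row.
    families = {}
    for s in eligible:
        families.setdefault(s.family, []).append(s)

    rows = []
    for siblings in families.values():
        best = max(siblings, key=rank_key)
        if best.arena is None:
            # No Arena score in this family, so there is no Elo band to apply.
            rows.append(best)
            continue
        # Siblings within the noise band of the top score count as tied.
        tied = [
            s for s in siblings
            if s.arena is not None and best.arena - s.arena <= ARENA_TIE_MARGIN
        ]
        rows.append(min(tied, key=canonical_key))

    # Step 3: order the surviving rows by the live fallback.
    return sorted(rows, key=rank_key, reverse=True)


# Hypothetical sample data, not taken from the ranking.
skus = [
    Sku("Example Pro", "example-pro", "chat", 1500, None, 5.00, True, date(2026, 1, 10)),
    Sku("Example Pro Preview", "example-pro", "chat", 1504, None, 5.00, False, date(2026, 3, 1)),
    Sku("Example Pro Legacy", "example-pro", "chat", 1450, None, 1.00, True, date(2025, 6, 1)),
    Sku("Example Coder", "example-coder", "code", 1510, None, 0.50, True, date(2026, 2, 1)),
    Sku("Other Chat", "other-chat", "chat", None, 88.0, 2.00, True, date(2026, 2, 1)),
]
ranked_names = [s.name for s in rank(skus)]
print(ranked_names)
```

In this sample the preview SKU scores 4 Elo higher than its GA sibling, which is inside the band, so the tie-break applies: prices match, and GA wins over preview. The legacy SKU is cheaper but sits 54 Elo below the top score, so it never enters the tie-break.
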

| # | Model | Tags | Arena | Input $/1M | Output $/1M |
|---|---|---|---|---|---|
| 1 | Claude Opus 4.7 | Reasoning, Vision, Tools | 1503 | $5.00 | $25.00 |
| 2 | Claude Opus 4.6 | Reasoning, Vision, Tools | 1501 | $5.00 | $25.00 |
| 3 | Gemini 3.1 Pro Preview | Preview, Vision, Tools | 1493 | $2.00 | $12.00 |
| 4 | Muse Spark | Reasoning, Vision, Tools | 1491 | | |
| 5 | GPT-5.5 | Reasoning, Vision, Tools | 1488 | $5.00 | $30.00 |
| 6 | Gemini 3 Pro | Vision, Tools | 1486 | $1.25 | $5.00 |
| 7 | GPT-5.4 | Reasoning, Tools | 1479 | $2.50 | $15.00 |
| 8 | ERNIE 5.1 | Tools | 1476 | $0.59 | $2.65 |
| 9 | Qwen3.7-Max | Reasoning, Tools | 1475 | $1.25 | $3.75 |
| 10 | GLM-5.1 | Reasoning, Tools | 1472 | $0.98 | $3.08 |
| 11 | Gemini 3 Flash | Preview, Vision, Tools | 1467 | $0.50 | $3.00 |
| 12 | Claude Opus 4.5 | Reasoning, Vision, Tools | 1466 | $5.00 | $25.00 |
| 13 | Grok 4.1 | Reasoning, Tools | 1464 | | |
| 14 | DeepSeek V4 Pro | Reasoning, Tools | 1460 | $0.43 | $0.87 |
| 15 | Claude Sonnet 4.6 | Reasoning, Vision, Tools | 1459 | $3.00 | $15.00 |
| 16 | Gemini 3.1 Flash-Lite | Vision, Tools | 1432 | $0.25 | $1.50 |
| 17 | o3 | Reasoning, Vision, Tools | 1412 | $2.00 | $8.00 |
| 18 | Gemini 2.5 Pro | Reasoning, Vision, Tools | 1398 | $1.25 | $10.00 |
| 19 | DeepSeek R1 | Reasoning | 1372 | $0.10 | $0.30 |
| 20 | Llama 4 Maverick 17B Instruct | | 1365 | $0.24 | $0.97 |
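
To turn the per-1M-token prices above into a per-draft figure, a rough estimate can be computed as follows. This is a sketch under assumptions: the function name and the token counts (20,000 input, 8,000 output) are hypothetical, and it assumes simple linear pricing with no caching, batching, or tiered discounts. The rates are the listed prices for Claude Opus 4.7 and DeepSeek V4 Pro.

```python
def draft_cost_usd(input_tokens: int, output_tokens: int,
                   input_per_million: float, output_per_million: float) -> float:
    """Estimate the USD cost of one request from per-1M-token rates."""
    input_cost = input_tokens / 1_000_000 * input_per_million
    output_cost = output_tokens / 1_000_000 * output_per_million
    return input_cost + output_cost


# Hypothetical long-form draft: 20,000 tokens in, 8,000 tokens out.
opus_cost = draft_cost_usd(20_000, 8_000, 5.00, 25.00)      # Claude Opus 4.7 rates
deepseek_cost = draft_cost_usd(20_000, 8_000, 0.43, 0.87)   # DeepSeek V4 Pro rates

print(f"Claude Opus 4.7: ${opus_cost:.4f}")
print(f"DeepSeek V4 Pro: ${deepseek_cost:.4f}")
```

Output tokens dominate the cost of long-form writing, so the output rate usually matters more than the input rate when comparing rows.
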

Honorable mentions

These models hold the next seats in this ranking. Lines below are from each model's stored description in LLMReference seed data; spot-check the model page before relying on a capability claim.

  • #4 Gemini 3 Pro

    Google DeepMind's most advanced reasoning Gemini model. Part of the Gemini 3 series with frontier-class intelligence, multimodal understanding, and 1M token context window.

    Arena: 1486

  • GPT-5.4 is OpenAI's flagship frontier reasoning model, released March 5, 2026. It incorporates advances from GPT-5.3-Codex for coding and agentic workflows, and adds 'Thinking' mode with editable reasoning plans. Key capabilities include computer use (navigating interfaces via Playwright), image understanding and generation integration, full-stack web app generation, tool calling, and deep research. Knowledge cutoff is August 31, 2025. Model ID: gpt-5.4.

    Arena: 1479

  • ERNIE 5.1 is Baidu's fifth-generation flagship language model, officially released May 9, 2026. Achieved via disaggregated fully-asynchronous reinforcement learning and scaled agentic post-training, it delivers leading performance at approximately 6% of the pre-training compute cost of comparable models, with roughly one-third the total parameters and half the active parameters of ERNIE 5.0. ERNIE 5.1 ranks #4 globally and #1 among Chinese models on the LMArena Search leaderboard (score: 1,223), with standout performance in legal reasoning, mathematics (AIME26: 99.6), and business domains. API model ID: ernie-5.1. Context: 128K tokens; max output: 65,536 tokens.

    Arena: 1476