The transcription leaderboard · for creatives

Best for transcription

4 editor picks · 22 eligible models · Speech to text, low WER on the messy stuff.

Editorial pick plus benchmark and API pricing context.

EDITOR'S CHOICE · Researched 29d ago

Whisper large-v3-turbo

OpenAI

Excellent

Low WER on messy, real-world audio.

Lowest WER on noisy real-world audio with the broadest language coverage; cheap to self-host.

The numbers

Pricing: — (see model page)

Context: — (stt)

Also worth picking

Ranked by editorial pick order · Editorial tiers: Excellent, Strong, Solid

# · Model · Tier · Editor's note

Deepgram

Cheapest credible STT with the fastest streaming latency in production (Deepgram).

Best speaker diarization and PII redaction out of the box.

Deepgram

Deepgram's conversational ASR tuned for low-latency voice agents.

Eligibility

A model is eligible if it is tagged with useCases: [stt]. Pins must come from this pool.
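
The eligibility rule can be read as a filter plus a check. Below is a minimal sketch of that reading, not the site's actual implementation: only the `useCases` tag and the `stt` value come from this page, while the `Model` record, the `eligible_pool` and `validate_pins` helpers, and the placeholder model names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    # Hypothetical record shape; only the `useCases` tag name comes from the page.
    name: str
    use_cases: list[str] = field(default_factory=list)


def eligible_pool(models, use_case="stt"):
    """Return the models tagged with the given use case."""
    return [m for m in models if use_case in m.use_cases]


def validate_pins(pins, models, use_case="stt"):
    """Return the pinned names that are NOT in the eligible pool."""
    pool = {m.name for m in eligible_pool(models, use_case)}
    return [p for p in pins if p not in pool]


# Placeholder catalog; these names are illustrative, not real models.
catalog = [
    Model("model-a", ["stt"]),
    Model("model-b", ["tts"]),
    Model("model-c", ["stt", "translation"]),
]

print([m.name for m in eligible_pool(catalog)])
print(validate_pins(["model-a", "model-b"], catalog))
```

An empty list from `validate_pins` would mean every pin comes from the eligible pool; any names it returns are pins that break the rule.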