LLM Reference

OpenAI Whisper Models by OpenAI

OpenAI | Proprietary | Audio
4 models, 2022–2024

Details

Researcher: OpenAI
License: Proprietary
Commercial use: Commercial use with conditions
Models: 4
Released: 2022–2024

Capabilities

Multimodal: All models

About

OpenAI's Whisper family provides multilingual automatic speech recognition and translation models exposed through the OpenAI Audio API.
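
As a rough illustration of the two tasks named above, the sketch below wraps one transcription or translation request. It assumes the official `openai` Python package, whose client exposes `audio.transcriptions.create` and `audio.translations.create`; the `transcribe` helper and the `meeting.mp3` file name are hypothetical, and `whisper-1` is an assumed model identifier that this page does not list.

```python
from pathlib import Path


def transcribe(client, audio_path, model="whisper-1", translate=False):
    """Send one audio file to the Audio API and return the resulting text.

    `client` is expected to behave like `openai.OpenAI()`. The default
    model name "whisper-1" is an assumption, not a value from this page.
    """
    # Transcription keeps the spoken language; translation returns English text.
    endpoint = client.audio.translations if translate else client.audio.transcriptions
    with Path(audio_path).open("rb") as audio_file:
        result = endpoint.create(model=model, file=audio_file)
    return result.text


# Possible wiring (needs the `openai` package and an OPENAI_API_KEY variable):
#   from openai import OpenAI
#   print(transcribe(OpenAI(), "meeting.mp3"))
```

Passing the client in as an argument keeps the helper easy to exercise with a stand-in object before any real request is made.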

Current Variants

The use-when guidance below is derived from each model's seed data: its capabilities, context, release, and replacement fields.

Whisper large-v3-turbo (Current)
Use when the workload needs speech recognition, multimodal inputs, and audio.
2024-01: speech recognition, multimodal inputs, audio

Whisper Base (Current)
Use when the workload needs speech recognition and multimodal inputs in a 74M-parameter model.
2022-12: speech recognition, 74M parameters, multimodal inputs

Whisper Tiny EN (Current)
Use when the workload needs speech recognition and multimodal inputs in a 39M-parameter model.
2022-12: speech recognition, 39M parameters, multimodal inputs

Whisper (Current)
Use when the workload needs speech recognition, multimodal inputs, and audio.
2022-09: speech recognition, multimodal inputs, audio

Release Timeline

3 release groups

2024-01 (1 current)
Whisper large-v3-turbo (Current): speech recognition, multimodal inputs, audio

2022-12 (2 current)
Whisper Base (Current): speech recognition, 74M parameters, multimodal inputs
Whisper Tiny EN (Current): speech recognition, 39M parameters, multimodal inputs

2022-09 (1 current)
Whisper (Current): speech recognition, multimodal inputs, audio

Specifications (4 models)

OpenAI Whisper model specifications comparison
Model | Released | Parameters | Multimodal
Whisper large-v3-turbo | 2024-01 | not listed | Yes
Whisper Base | 2022-12 | 74M | Yes
Whisper Tiny EN | 2022-12 | 39M | Yes
Whisper | 2022-09 | not listed | Yes

Frequently Asked Questions

What is OpenAI Whisper used for?
OpenAI Whisper is used for audio work, chiefly multilingual speech recognition and speech translation. The family description and the listed model capabilities both point to those workloads as the best fit.

How does OpenAI Whisper compare to GPT Realtime 2?
OpenAI Whisper is strongest where you need audio, while GPT Realtime 2, also by OpenAI, is the closest related family to check for realtime voice. The two listings are not directly comparable: OpenAI Whisper has 4 listed variants and this page lists no context length for them, while GPT Realtime 2 reaches up to 131k context. Compare the specs and pricing tables before choosing a production model.

Which OpenAI Whisper model should I use?
If price is the main constraint, start with the pricing table, but note that the local data does not include complete provider pricing for OpenAI Whisper, so confirm costs with the provider. For the most capable and most recent choice in the local data, evaluate Whisper large-v3-turbo, which accepts multimodal inputs.
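
The selection logic in this answer can be sketched against the specification data on this page. This is only an illustration: the `WhisperVariant` class and the `latest` and `smallest_known` helpers are hypothetical names, and the release dates and parameter counts are copied from the listing above rather than checked against OpenAI's own documentation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WhisperVariant:
    name: str
    released: str                # "YYYY-MM", as listed on this page
    parameters_m: Optional[int]  # millions of parameters; None when not listed


VARIANTS = [
    WhisperVariant("Whisper large-v3-turbo", "2024-01", None),
    WhisperVariant("Whisper Base", "2022-12", 74),
    WhisperVariant("Whisper Tiny EN", "2022-12", 39),
    WhisperVariant("Whisper", "2022-09", None),
]


def latest(variants):
    """Return the most recently released variant ("YYYY-MM" strings sort by date)."""
    return max(variants, key=lambda v: v.released)


def smallest_known(variants):
    """Return the variant with the fewest listed parameters, skipping unlisted sizes."""
    sized = [v for v in variants if v.parameters_m is not None]
    return min(sized, key=lambda v: v.parameters_m)


print(latest(VARIANTS).name)          # Whisper large-v3-turbo
print(smallest_known(VARIANTS).name)  # Whisper Tiny EN
```

Variants without a listed parameter count are left out of the size comparison instead of being treated as zero, so a missing value never wins by accident.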