LLM Reference

Google Cloud Speech-to-Text Models by Google

Google · Proprietary · Audio
1 model · 2023

Details

Researcher: Google
License: Proprietary
Commercial use: Commercial use with conditions
Models: 1
Released: 2023

Capabilities

Multimodal: All models

About

Google Cloud hosted automatic speech recognition model family.

Current Variants

Use-when guidance is derived from seed capabilities, context, release, and replacement fields.

Use when the workload needs speech recognition, multimodal inputs, and audio.

2023-01: speech recognition, multimodal inputs, audio

Release Timeline

1 release group
2023-01
1 current
Google Cloud Speech-to-Text
speech recognition, multimodal inputs, audio
Current

Specifications (1 model)

Google Cloud Speech-to-Text model specifications comparison
Model | Released | Multimodal
Google Cloud Speech-to-Text | 2023-01 | Yes

Frequently Asked Questions

What is Google Cloud Speech-to-Text used for?
Google Cloud Speech-to-Text is used for audio and speech recognition workloads, including multimodal pipelines that take audio input. The family description and listed model capabilities point to those workloads as the best fit.
How does Google Cloud Speech-to-Text compare to OpenAI Whisper?
Google Cloud Speech-to-Text (Google) and OpenAI Whisper (OpenAI) both target audio workloads, and Whisper is the closest related family to evaluate alongside it. Google Cloud Speech-to-Text has 1 listed variant, so compare the specs and pricing tables before choosing a production model.
Which Google Cloud Speech-to-Text model should I use?
Only one variant is listed, Google Cloud Speech-to-Text (2023-01), so it is both the most capable and the latest local choice; evaluate it with multimodal inputs. If price is the main constraint, check the pricing table first, but note that the local data does not include complete provider pricing for Google Cloud Speech-to-Text, so the table may be incomplete.