LLM Reference

Azure Speech Services Models by Microsoft Research

Microsoft Research · Proprietary · Audio
2 models · 2023

Details

License: Proprietary
Commercial use: Commercial use with conditions
Models: 2
Released: 2023

Capabilities

Multimodal: 1 of 2 models

About

Microsoft Azure Speech Services model family for speech-to-text and text-to-speech APIs.
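
As a rough illustration of the two API directions named above, the following sketch uses the Speech SDK for Python (the `azure-cognitiveservices-speech` package). It is not taken from this listing. The setting names `SPEECH_KEY` and `SPEECH_REGION`, the helper names, the default language, and the default voice are all assumptions chosen for illustration, and the SDK import is deferred so the helpers can be defined without the package installed.

```python
import os


def load_settings(env=os.environ):
    """Read the key and region from a mapping (the setting names are assumptions)."""
    missing = [name for name in ("SPEECH_KEY", "SPEECH_REGION") if not env.get(name)]
    if missing:
        raise RuntimeError("missing settings: " + ", ".join(missing))
    return env["SPEECH_KEY"], env["SPEECH_REGION"]


def transcribe_wav(path, key, region, language="en-US"):
    """Speech-to-text: recognize a single utterance from a WAV file."""
    # Imported lazily; requires `pip install azure-cognitiveservices-speech`.
    import azure.cognitiveservices.speech as speechsdk

    config = speechsdk.SpeechConfig(subscription=key, region=region)
    config.speech_recognition_language = language
    audio = speechsdk.audio.AudioConfig(filename=path)
    recognizer = speechsdk.SpeechRecognizer(speech_config=config, audio_config=audio)
    result = recognizer.recognize_once()
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        return result.text
    raise RuntimeError(f"recognition did not succeed: {result.reason}")


def synthesize_to_wav(text, path, key, region, voice="en-US-JennyNeural"):
    """Text-to-speech: write synthesized audio for `text` to a WAV file."""
    import azure.cognitiveservices.speech as speechsdk

    config = speechsdk.SpeechConfig(subscription=key, region=region)
    config.speech_synthesis_voice_name = voice  # illustrative default voice
    audio = speechsdk.audio.AudioOutputConfig(filename=path)
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=config, audio_config=audio)
    result = synthesizer.speak_text_async(text).get()
    if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
        raise RuntimeError(f"synthesis did not succeed: {result.reason}")
    return path
```

With a valid key and region, a call such as `transcribe_wav("sample.wav", *load_settings())` would return the recognized text, and `synthesize_to_wav("Hello", "hello.wav", *load_settings())` would write an audio file. Both file names are placeholders.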

Current Variants

Use-when guidance is derived from seed capabilities, context, release, and replacement fields.

Azure Speech Services (TTS)
Use when the workload needs text to speech and audio.
2023-01 · text to speech · audio

Azure Speech Services (STT)
Use when the workload needs speech recognition, multimodal inputs, and audio.
2023-01 · speech recognition · multimodal inputs · audio
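
The use-when guidance above amounts to matching a workload's required capabilities against each variant's capability tags. A minimal sketch of that matching follows, using the tags from this listing. The `Variant` type and the `variants_for` function are hypothetical names, and this is not necessarily how the page derives its guidance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    released: str
    capabilities: frozenset


# Capability tags copied from the listing.
VARIANTS = [
    Variant(
        "Azure Speech Services (TTS)",
        "2023-01",
        frozenset({"text to speech", "audio"}),
    ),
    Variant(
        "Azure Speech Services (STT)",
        "2023-01",
        frozenset({"speech recognition", "multimodal inputs", "audio"}),
    ),
]


def variants_for(required):
    """Return names of variants whose tags cover every required capability."""
    needed = frozenset(required)
    return [v.name for v in VARIANTS if needed <= v.capabilities]
```

For example, `variants_for({"speech recognition", "multimodal inputs"})` selects only the STT variant, while `variants_for({"audio"})` matches both.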

Release Timeline

1 release group

2023-01 (2 current)
Azure Speech Services (STT): speech recognition, multimodal inputs, audio (Current)
Azure Speech Services (TTS): text to speech, audio (Current)

Specifications (2 models)

Azure Speech Services model specifications comparison
Model                          Released   Multimodal
Azure Speech Services (TTS)    2023-01    No
Azure Speech Services (STT)    2023-01    Yes

Frequently Asked Questions

What is Azure Speech Services used for?
Azure Speech Services is used for audio, text to speech, and speech recognition. The family description and listed model capabilities point to those workloads as the best fit.
How does Azure Speech Services compare to Harrier?
Azure Speech Services by Microsoft Research is the stronger fit for audio workloads, while Harrier, also by Microsoft Research, is the closest related family to consider for embedding. The listing shows 2 variants for Azure Speech Services and up to 33k context for Harrier, so compare the specs and pricing tables before choosing a production model.
Which Azure Speech Services model should I use?
If price is the main constraint, start with the pricing table, because the local data does not include complete provider pricing for Azure Speech Services. For the most capable or most recent choice in the local data, evaluate Azure Speech Services (STT), which lists multimodal inputs.