Question 1

What is Azure Speech Services used for?

Accepted Answer

Azure Speech Services is used for audio workloads: text to speech (TTS) and speech recognition (STT). The family description and the capabilities listed for its two variants point to those workloads as the best fit.

Question 2

How does Azure Speech Services compare to Harrier?

Accepted Answer

Azure Speech Services by Microsoft Research is strongest where you need audio (text to speech and speech recognition), while Harrier by Microsoft Research is the closest related family to check for embedding. The two listings emphasize different specs: Azure Speech Services has 2 listed variants, and Harrier reaches up to 33k context. Compare the specs and pricing tables before choosing a production model.

Question 3

Which Azure Speech Services model should I use?

Accepted Answer

If price is the main constraint, check the pricing table first, because Azure Speech Services does not have complete provider pricing in the local data. For the most capable local choice, evaluate Azure Speech Services (STT), the only variant listed with multimodal inputs. Both variants were released in 2023-01, so release date does not separate them.

Model	Use when	Released	Signals	Status
Azure Speech Services (TTS)	Use when the workload needs text to speech and audio.	2023-01	text to speech, audio	Current
Azure Speech Services (STT)	Use when the workload needs speech recognition, multimodal inputs, and audio.	2023-01	speech recognition, multimodal inputs, audio	Current

Model	Released	Multimodal
Azure Speech Services (TTS)	2023-01	No
Azure Speech Services (STT)	2023-01	Yes
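
The selection rule in the tables above can be sketched as a small lookup. This is a minimal illustration only, assuming the two rows listed here are the complete data; the names `Variant`, `VARIANTS`, and `pick_variants` are hypothetical and are not part of any Azure SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One row of the tables above (hypothetical helper type)."""
    name: str
    released: str
    signals: frozenset
    multimodal: bool


# Data copied from the two tables above; nothing else is assumed.
VARIANTS = [
    Variant(
        name="Azure Speech Services (TTS)",
        released="2023-01",
        signals=frozenset({"text to speech", "audio"}),
        multimodal=False,
    ),
    Variant(
        name="Azure Speech Services (STT)",
        released="2023-01",
        signals=frozenset({"speech recognition", "multimodal inputs", "audio"}),
        multimodal=True,
    ),
]


def pick_variants(required):
    """Return the names of variants whose signals cover every required signal."""
    needed = frozenset(required)
    return [v.name for v in VARIANTS if needed <= v.signals]


if __name__ == "__main__":
    print(pick_variants(["text to speech"]))
    print(pick_variants(["speech recognition", "multimodal inputs"]))
```

Asking for a signal that neither variant lists, such as embedding, returns an empty list, which matches the advice above to check a related family for that workload.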

Azure Speech Services Models by Microsoft Research
