Azure Speech Services Models by Microsoft Research
2 models · 2023
Details
Researcher: Microsoft Research
License: Proprietary
Commercial use: Commercial use with conditions
Models: 2
Released: 2023
Capabilities
Multimodal: 1 of 2 models
About
Microsoft Azure Speech Services model family for speech-to-text and text-to-speech APIs.
Current Variants
Use-when guidance is derived from seed capabilities, context, release, and replacement fields.
| Model | Use when | Released | Signals | Status |
|---|---|---|---|---|
| Azure Speech Services (TTS) | Use when the workload needs text to speech and audio. | 2023-01 | text to speech, audio | Current |
| Azure Speech Services (STT) | Use when the workload needs speech recognition, multimodal inputs, and audio. | 2023-01 | speech recognition, multimodal inputs, audio | Current |
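
The use-when guidance in the table above amounts to matching a workload's required capabilities against each variant's signals. The following is a minimal sketch of that lookup, not an official tool: the `Variant` class and `pick_variants` function are hypothetical names introduced for illustration, and the data is copied from the table rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One catalog row (hypothetical structure, for illustration only)."""
    name: str
    released: str
    signals: frozenset


# Rows copied from the Current Variants table.
VARIANTS = [
    Variant("Azure Speech Services (TTS)", "2023-01",
            frozenset({"text to speech", "audio"})),
    Variant("Azure Speech Services (STT)", "2023-01",
            frozenset({"speech recognition", "multimodal inputs", "audio"})),
]


def pick_variants(needed):
    """Return the names of variants whose signals cover every needed capability."""
    needed = frozenset(needed)
    return [v.name for v in VARIANTS if needed <= v.signals]


if __name__ == "__main__":
    print(pick_variants({"text to speech"}))
    print(pick_variants({"speech recognition", "multimodal inputs"}))
```

A workload that needs both speech recognition and text to speech matches neither variant on its own, so the lookup returns an empty list; in that case both variants would have to be evaluated together.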
Release Timeline
1 release group: 2023-01
2 current
Azure Speech Services (STT)
Status: Current. Signals: speech recognition, multimodal inputs, audio
Azure Speech Services (TTS)
Status: Current. Signals: text to speech, audio
Specifications (2 models)
| Model | Released | Multimodal |
|---|---|---|
| Azure Speech Services (TTS) | 2023-01 | No |
| Azure Speech Services (STT) | 2023-01 | Yes |
Frequently Asked Questions
- What is Azure Speech Services used for?
- Azure Speech Services is used for audio, text to speech, and speech recognition. The family description and listed model capabilities point to those workloads as the best fit.
- How does Azure Speech Services compare to Harrier?
- Azure Speech Services by Microsoft Research is strongest for audio workloads, while Harrier, also by Microsoft Research, is the closest related family to check for embedding workloads. Azure Speech Services lists 2 variants, and Harrier supports up to 33k context, so compare the specs and pricing tables before choosing a production model.
- Which Azure Speech Services model should I use?
- If price is the main constraint, check the pricing table first, keeping in mind that the local data does not include complete provider pricing for Azure Speech Services. For the most capable and most recent choice in the local data, evaluate Azure Speech Services (STT), which supports multimodal inputs.
