Azure Cognitive Services

  • Bing Speech
  • Translator Speech
  • Custom Speech
  • Speaker Recognition
  • Speech

These five services are the cognitive speech services available in the Azure cloud. This article points out the differences between them to provide a basis for further decision making. The code shown is for illustration only and does not use the full power of each API. Besides C#, other languages are supported as well.

Quick Reference

If you do not want to read the whole article, this quick reference points out when to use which service.

Bing Speech converts speech into text and text into speech. It offers no customization, but a simple and straightforward API.

Translator Speech translates text from one language into another, including automatic language detection.

Custom Speech, like Bing Speech, converts speech into text and text into speech. The difference: Custom Speech can be adapted to domain-specific terms (product names, dialects, …).

Speaker Recognition recognizes and identifies the person speaking.

Speech combines several services: Bing Speech, Translator Speech, Custom Speech and Custom Voice.

If you are looking for more information, continue reading 🙂

Bing Speech

Bing Speech converts speech to text and, the other way around, text to speech.


  • Convert voice to text
  • Convert text to voice

Depending on how the API is used (REST or client libraries), the available functionality differs.

Use case                                         | REST API | Client Library
Convert audio files with a length of max. 15 sec | Yes      | Yes
Convert audio files longer than 15 sec           | No       | Yes
Stream intermediate results                      | No       | Yes
LUIS integration                                 | No       | Yes
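The table above can be read as a small decision rule. The following Python sketch encodes it; the `Requirements` class and the function name are hypothetical helpers made up for this illustration and are not part of any Azure SDK.

```python
from dataclasses import dataclass

# Limit taken from the table above: the REST API only handles short audio.
REST_MAX_AUDIO_SECONDS = 15


@dataclass
class Requirements:
    """Hypothetical description of what an application needs."""
    audio_seconds: float
    needs_intermediate_results: bool = False
    needs_luis: bool = False


def choose_bing_speech_api(req: Requirements) -> str:
    """Return "REST API" when it is sufficient, otherwise "Client Library"."""
    if req.audio_seconds > REST_MAX_AUDIO_SECONDS:
        return "Client Library"
    if req.needs_intermediate_results or req.needs_luis:
        return "Client Library"
    return "REST API"
```

For example, `choose_bing_speech_api(Requirements(audio_seconds=10))` yields `"REST API"`, while the same audio length with `needs_luis=True` yields `"Client Library"`.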


REST API Example
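
As a rough illustration of a speech-to-text call over REST, the Python sketch below only builds the HTTP request and does not send it. The endpoint URL, the query parameters and the header names are assumptions based on how the Bing Speech REST API was commonly documented; please verify them against the official documentation before relying on them. The function name is hypothetical.

```python
import urllib.parse
import urllib.request

# Assumed endpoint for the Bing Speech REST API (verify before use).
BASE_URL = "https://speech.platform.bing.com/speech/recognition"


def build_recognition_request(audio: bytes, subscription_key: str,
                              language: str = "en-US",
                              mode: str = "interactive") -> urllib.request.Request:
    """Build (but do not send) a POST request carrying WAV audio."""
    query = urllib.parse.urlencode({"language": language, "format": "simple"})
    url = f"{BASE_URL}/{mode}/cognitiveservices/v1?{query}"
    headers = {
        # Assumed header names; the key comes from your Azure subscription.
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Content-Type": "audio/wav; codec=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, data=audio, headers=headers, method="POST")
```

Sending the request would be a matter of passing the result to `urllib.request.urlopen`, which requires a valid subscription key and, per the table in this section, an audio clip of at most 15 seconds.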

C# Desktop API Example

Translator Speech

Translator Speech is a text-to-text translation service.


  • Uses Neural Machine Translation to improve the naturalness and quality of a translation, and thus delivers better results than Statistical Machine Translation
  • Detects the source language
  • Offers alternative translations


Custom Speech (Preview)

Custom Speech adapts speech recognition to a specific scenario, e.g. if you plan to use product names, acronyms or other unusual voice input.


  • Apply domain-specific terms
  • Filter background noise
  • Improve over time by analysing samples

Speaker Recognition

Speaker Recognition is a speaker identification service and is thus able to identify an individual person just by their voice.



Speech

The Speech service combines a variety of services: Bing Speech, Translator Speech, Custom Speech and Custom Voice.


Voice Recognition

  • Converts voice into text (near real time)
  • Converts parts of an audio file into text
  • Offers configurations for specific use cases, such as dictation and conversations
  • Recognizes the end of a conversation, masks offensive input and applies formatting
  • Works hand in hand with LUIS

Voice Synthesis

  • Converts text into a natural sounding voice
  • Offers different voices (gender, dialect)
  • Supports SSML
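
To give an idea of what SSML input looks like, the following Python sketch wraps plain text in a minimal SSML document using only the standard library. The voice name passed in is a placeholder, not a real voice identifier, and a real service may require additional attributes (for example an XML namespace); treat this as an assumption-laden sketch rather than the exact format the service expects.

```python
import xml.etree.ElementTree as ET


def build_ssml(text: str, voice_name: str, lang: str = "en-US") -> str:
    """Wrap plain text in a minimal SSML document."""
    speak = ET.Element("speak", {"version": "1.0", "xml:lang": lang})
    voice = ET.SubElement(speak, "voice", {"name": voice_name})
    # ElementTree escapes characters such as & and < when serializing.
    voice.text = text
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    # "ExampleVoice" is a hypothetical voice name used for illustration.
    print(build_ssml("Hello from the Speech service", "ExampleVoice"))
```

Building the document with an XML library instead of string concatenation keeps special characters in the spoken text from breaking the markup.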

Speech is not only able to receive and process audio, it can also return either text or voice.




Use case                    | Speech | Translator Speech | Bing Speech | Custom Speech | Speaker Recognition
LUIS integration            | Yes    |                   | Yes         |               |
Language Detection          | Yes    | Yes               |             |               |
Domain-specific voice input | Yes    |                   |             | Yes           |
User identification         |        |                   |             |               | Yes
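
The quick reference at the top of this article can also be expressed as a small lookup. The Python sketch below maps capabilities to the services that the article describes as offering them; the capability labels and the function name are made up for this illustration, and listing Speech under every capability of the services it combines is an assumption.

```python
from typing import List

# Capability-to-service mapping, derived from the quick reference in this article.
SERVICES_BY_CAPABILITY = {
    "speech to text": ["Bing Speech", "Custom Speech", "Speech"],
    "text to speech": ["Bing Speech", "Custom Speech", "Speech"],
    "translation": ["Translator Speech", "Speech"],
    "domain-specific terms": ["Custom Speech", "Speech"],
    "speaker identification": ["Speaker Recognition"],
}


def services_for(*capabilities: str) -> List[str]:
    """Return the services that offer every requested capability."""
    if not capabilities:
        return []
    candidates = list(SERVICES_BY_CAPABILITY[capabilities[0]])
    for capability in capabilities[1:]:
        allowed = set(SERVICES_BY_CAPABILITY[capability])
        candidates = [name for name in candidates if name in allowed]
    return candidates
```

Calling `services_for("speech to text", "domain-specific terms")` narrows the choice to Custom Speech and Speech, whereas combining "translation" with "speaker identification" returns an empty list because no single service covers both.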



