Azure Cognitive Services

  • Bing Speech
  • Translator Speech
  • Custom Speech
  • Speaker Recognition
  • Speech

are cognitive speech services available in the Azure cloud. This article points out the differences between the services to provide a basis for further decision making. The provided code samples are only examples and do not use the full power of each API. Besides C#, other languages are supported as well.

Quick Reference

If you do not want to read the whole article, this quick reference points out when to use which service.

Bing Speech converts text into speech and speech into text. It offers no customization, but its API is simple and straightforward.

Translator Speech simply translates text from one language to another (language detection included).

Like Bing Speech, Custom Speech converts text into speech and speech into text. The difference: Custom Speech can be adapted to domain-specific terms (product names, dialects, …).

Speaker Recognition recognizes and identifies the person who is speaking.

Speech combines several services: Bing Speech, Translator Speech, Custom Speech and Custom Voice.

If you are looking for more information, continue reading 🙂

Bing Speech

Bing Speech provides speech-to-text functionality and, the other way around, text-to-speech functionality.


  • Convert voice to text
  • Convert text to voice

The available functionality differs depending on how the API is used (REST API or client library).

Use case                                          | REST API | Client Library
Convert audio files with a length of max. 15 sec  | Yes      | Yes
Convert audio files longer than 15 sec            | No       | Yes
Stream intermediate results                       | No       | Yes
LUIS integration                                  | No       | Yes
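
The comparison above can be read as a simple decision rule. The following is a minimal sketch of that rule, not part of any SDK: the type names `BingSpeechApi` and `BingSpeechApiChooser` and the method `Choose` are made up for illustration. It assumes the REST API is preferred whenever it covers all requirements, because it is the simpler option.

```csharp
using System;

// Hypothetical names for illustration; not part of the Bing Speech SDK.
public enum BingSpeechApi { RestApi, ClientLibrary }

public static class BingSpeechApiChooser
{
    // Limit taken from the comparison: the REST API handles audio of up to 15 seconds.
    private static readonly TimeSpan RestMaxAudioLength = TimeSpan.FromSeconds(15);

    // Returns the REST API when it covers every requirement, otherwise the client library.
    public static BingSpeechApi Choose(TimeSpan audioLength, bool needsIntermediateResults, bool needsLuis)
    {
        bool restIsEnough = audioLength <= RestMaxAudioLength
                            && !needsIntermediateResults
                            && !needsLuis;
        return restIsEnough ? BingSpeechApi.RestApi : BingSpeechApi.ClientLibrary;
    }
}
```

For example, `BingSpeechApiChooser.Choose(TimeSpan.FromSeconds(10), false, false)` selects the REST API, while a 60 second recording or a LUIS requirement leads to the client library.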


REST API Example

C# Desktop API Example

Translator Speech

Translator Speech is a text to text translation service.


  • Uses Neural Machine Translation to improve the naturalness and quality of translations, and thus delivers better results than Statistical Machine Translation
  • Detects the source language
  • Offers alternative translations


  using System;
  using System.Net.Http;
  using System.Text;
  using System.Threading.Tasks;
  using Newtonsoft.Json;

  class Program
  {
      // NOTE: Set this to the host of your Translator endpoint before running.
      static string host = "";
      static string path = "/translate?api-version=3.0";
      // Translate to German and Italian.
      static string params_ = "&to=de&to=it";

      static string uri = host + path + params_;

      // NOTE: Replace this example key with a valid subscription key.
      static string key = "ENTER KEY HERE";

      static string text = "Hello world!";

      static async Task Translate()
      {
          // The request body is a JSON array with one object per text to translate.
          object[] body = { new { Text = text } };
          var requestBody = JsonConvert.SerializeObject(body);

          using (var client = new HttpClient())
          using (var request = new HttpRequestMessage())
          {
              request.Method = HttpMethod.Post;
              request.RequestUri = new Uri(uri);
              request.Content = new StringContent(requestBody, Encoding.UTF8, "application/json");
              request.Headers.Add("Ocp-Apim-Subscription-Key", key);

              var response = await client.SendAsync(request);
              var responseBody = await response.Content.ReadAsStringAsync();
              // Re-serialize the JSON response with indentation so it is easier to read.
              var result = JsonConvert.SerializeObject(JsonConvert.DeserializeObject(responseBody), Formatting.Indented);

              Console.OutputEncoding = Encoding.UTF8;
              Console.WriteLine(result);
          }
      }

      static void Main(string[] args)
      {
          // Block until the asynchronous request has completed.
          Translate().GetAwaiter().GetResult();
      }
  }
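
The sample above only prints the raw JSON. To actually use the detected language and the translations, the response has to be read. The following is a minimal sketch under two assumptions: the response has the shape returned by version 3.0 of the translate operation (an array with one element per input text, each with an optional `detectedLanguage` object and a `translations` list), and `System.Text.Json` is available (it is used here instead of Json.NET only to keep the example free of external packages). The class name `TranslatorResponseReader` is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper for illustration; not part of any Translator SDK.
public static class TranslatorResponseReader
{
    // Returns the detected source language (null if the response has none)
    // and a map from target language code to translated text.
    public static (string DetectedLanguage, Dictionary<string, string> Translations) Read(string responseBody)
    {
        using (JsonDocument doc = JsonDocument.Parse(responseBody))
        {
            // The response is an array with one element per input text; this sketch reads the first one.
            JsonElement first = doc.RootElement[0];

            string detected = null;
            if (first.TryGetProperty("detectedLanguage", out JsonElement detectedLanguage))
            {
                detected = detectedLanguage.GetProperty("language").GetString();
            }

            var translations = new Dictionary<string, string>();
            foreach (JsonElement translation in first.GetProperty("translations").EnumerateArray())
            {
                translations[translation.GetProperty("to").GetString()] = translation.GetProperty("text").GetString();
            }

            return (detected, translations);
        }
    }
}
```

With the request from the sample, `Read(responseBody).Translations["de"]` would hold the German translation and `Translations["it"]` the Italian one.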

Custom Speech (Preview)

Custom Speech tailors speech recognition to your scenario, for example if you plan to use product names, acronyms or other unusual voice input.


  • Apply domain-specific terms
  • Filter background noise
  • Improve over time by analyzing samples

Speaker Recognition

Speaker Recognition is a speaker identification service: it identifies an individual person just by their voice.

using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;
using System.Web;

static class Program
{
    static void Main()
    {
        MakeRequest().GetAwaiter().GetResult();
        Console.WriteLine("Hit ENTER to exit...");
        Console.ReadLine();
    }

    static async Task MakeRequest()
    {
        var client = new HttpClient();
        var queryString = HttpUtility.ParseQueryString(string.Empty);

        // Request headers
        client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "{subscription key}");

        // Request parameters
        queryString["shortAudio"] = "{boolean}";
        // NOTE: Prefix this with the identification endpoint URL of your resource.
        var uri = "{identificationProfileIds}&" + queryString;

        HttpResponseMessage response;

        // Request body
        byte[] byteData = Encoding.UTF8.GetBytes("{body}");

        using (var content = new ByteArrayContent(byteData))
        {
            content.Headers.ContentType = new MediaTypeHeaderValue("< your content type, i.e. application/json >");
            response = await client.PostAsync(uri, content);
        }
    }
}




Speech

The Speech service combines a variety of services: Bing Speech, Translator Speech, Custom Speech and Custom Voice.


Voice Recognition

  • Translates voice into text (near real time)
  • Translates parts of an audio file into text
  • Offers configurations for specific use cases, such as dictation and conversation
  • Recognizes the end of a conversation, masks offensive input, applies formatting
  • Works hand in hand with LUIS

Voice Synthesis

  • Converts text into a natural sounding voice
  • Offers different voices (gender, dialect)
  • Supports SSML
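
SSML (Speech Synthesis Markup Language) is an XML format that describes which voice should read which text. As a rough illustration of what such a document looks like, the following sketch builds a minimal one with `System.Xml.Linq`. The class name `SsmlBuilder` is made up for this example, and the voice name passed in is a placeholder: the voices that are actually available depend on the service and region.

```csharp
using System.Xml.Linq;

// Hypothetical helper for illustration; not part of the Speech SDK.
public static class SsmlBuilder
{
    // XML namespace defined by the W3C SSML specification.
    private static readonly XNamespace Ssml = "http://www.w3.org/2001/10/synthesis";

    // Builds a minimal SSML document: one <speak> root with a single <voice> element.
    // The text is added as an XML text node, so characters such as & and < are escaped.
    public static string Build(string text, string language, string voiceName)
    {
        var speak = new XElement(Ssml + "speak",
            new XAttribute("version", "1.0"),
            new XAttribute(XNamespace.Xml + "lang", language),
            new XElement(Ssml + "voice",
                new XAttribute("name", voiceName),
                text));

        return speak.ToString(SaveOptions.DisableFormatting);
    }
}
```

A call such as `SsmlBuilder.Build("Hello world!", "en-US", "ExampleVoice")` returns the SSML as a string, which could then be sent as the body of a text-to-speech request. Building the document with an XML API rather than string concatenation avoids broken markup when the text contains special characters.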

Speech not only receives and processes audio, it can also return either text or voice.


 using System;
 using System.Threading.Tasks;
 using Microsoft.CognitiveServices.Speech;

 class Program
 {
     public static async Task RecognizeSpeechAsync()
     {
         // Creates an instance of a speech factory with specified
         // subscription key and service region. Replace with your own subscription key
         // and service region (e.g., "westus").
         var factory = SpeechFactory.FromSubscription("YourSubscriptionKey", "YourServiceRegion");

         // Creates a speech recognizer.
         using (var recognizer = factory.CreateSpeechRecognizer())
         {
             Console.WriteLine("Say something...");

             // Performs recognition.
             // RecognizeAsync() returns when the first utterance has been recognized, so it is suitable
             // only for single shot recognition like command or query. For long-running recognition, use
             // StartContinuousRecognitionAsync() instead.
             var result = await recognizer.RecognizeAsync();

             // Checks result.
             if (result.RecognitionStatus != RecognitionStatus.Recognized)
             {
                 Console.WriteLine($"Recognition status: {result.RecognitionStatus}");
                 if (result.RecognitionStatus == RecognitionStatus.Canceled)
                 {
                     Console.WriteLine($"There was an error, reason: {result.RecognitionFailureReason}");
                 }
                 else
                 {
                     Console.WriteLine("No speech could be recognized.\n");
                 }
             }
             else
             {
                 Console.WriteLine($"We recognized: {result.Text}");
             }
         }
     }

     static void Main()
     {
         RecognizeSpeechAsync().Wait();
         Console.WriteLine("Please press a key to continue.");
         Console.ReadLine();
     }
 }



Use case | Speech | Translator Speech | Bing Speech | Custom Speech | Speaker Recognition
LUIS integration
Language Detection
Domain driven voice input
User identification



