Fun With Speech Recognition in WPF

At last week’s CapArea.NET meeting, I demonstrated the built-in speech recognition of Windows Vista with a demo compass application. [source code]

I spoke the direction I wanted the needle to point, and the computer recognized the command and pointed the arrow. After a few commands, the computer told me to “stop bossing it around.”

[Image: arrow app pointing northeast]

It was simple, but it illustrated several points. One, speech can add value to your applications. Two, it’s easy to add. Three, it’s free.

Best of all, it’s fun.

First, you’ll need to add a reference to the System.Speech library. This is where all the speech recognition and speech synthesis classes live.

[Image: arrow app references]

Once your project has the reference, add the following using directives to your code-behind.

    using System.Speech.Recognition;
    using System.Speech.Synthesis;

The Recognition namespace contains the classes that recognize speech, and the Synthesis namespace contains the classes that turn text into speech: input and output, respectively.

With the reference to the speech library in place, we can now instantiate the speech-related objects.

    this._speechSynthesizer = new SpeechSynthesizer();
    this._speechRecognizer = new SpeechRecognizer();

    this._speechRecognizer.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(_speechRecognizer_SpeechRecognized);
    this._speechRecognizer.Enabled = true;

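Both speech objects implement IDisposable, so it is worth thinking about who releases them. Here is a minimal sketch of one way to do that. The SpeechServices class and its member names are hypothetical (they are not part of the demo); the System.Speech calls are the same ones used above.

```csharp
using System;
using System.Speech.Recognition;
using System.Speech.Synthesis;

// Hypothetical helper (not part of the demo app): owns both speech objects
// so they are created and released together.
public sealed class SpeechServices : IDisposable
{
    private readonly EventHandler<SpeechRecognizedEventArgs> _onRecognized;

    public SpeechSynthesizer Synthesizer { get; private set; }
    public SpeechRecognizer Recognizer { get; private set; }

    public SpeechServices(EventHandler<SpeechRecognizedEventArgs> onRecognized)
    {
        _onRecognized = onRecognized;

        Synthesizer = new SpeechSynthesizer();
        Recognizer = new SpeechRecognizer();

        // Same wiring as in the window's code-behind.
        Recognizer.SpeechRecognized += _onRecognized;
        Recognizer.Enabled = true;
    }

    public void Dispose()
    {
        // Unhook the handler first so no event arrives during teardown.
        Recognizer.SpeechRecognized -= _onRecognized;
        Recognizer.Dispose();
        Synthesizer.Dispose();
    }
}
```

In a WPF window you could create one of these in the constructor and call Dispose() from a Closed event handler.
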
When speech gets recognized, the SpeechRecognizer fires an event, appropriately named SpeechRecognized.

    private void _speechRecognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        // The text the engine recognized.
        string directionResult = e.Result.Text;

        // Set Window Title
        this.Title = directionResult;

        // Look up the Storyboard whose x:Key matches the recognized text.
        Storyboard directionStoryboard = this.Resources[directionResult] as Storyboard;

        if (directionStoryboard != null)
        {
            directionStoryboard.Begin();
        }
        else
        {
            this.Title = "Not a storyboard";
        }
    }

The event hands us the results of the recognition inside the SpeechRecognizedEventArgs. I drop the recognized text into a string and set the Window’s Title property to display what the system interpreted the speech to be.

I then use the recognized string as the resource key to get the appropriate Storyboard. I saved myself some time by cleverly naming them. ;)

    <Storyboard x:Key="South">
        <DoubleAnimationUsingKeyFrames BeginTime="00:00:00" Storyboard.TargetName="pthArrow" Storyboard.TargetProperty="(UIElement.RenderTransform).(TransformGroup.Children)[2].(RotateTransform.Angle)">
            <SplineDoubleKeyFrame KeyTime="00:00:00.7000000" Value="179.048"/>
        </DoubleAnimationUsingKeyFrames>
    </Storyboard>
    <Storyboard x:Key="West">
        <DoubleAnimationUsingKeyFrames BeginTime="00:00:00" Storyboard.TargetName="pthArrow" Storyboard.TargetProperty="(UIElement.RenderTransform).(TransformGroup.Children)[2].(RotateTransform.Angle)">
            <SplineDoubleKeyFrame KeyTime="00:00:00.7000000" Value="-89.818"/>
        </DoubleAnimationUsingKeyFrames>
    </Storyboard>

Don’t worry if the Storyboard syntax doesn’t make sense to you. I could talk about Silverlight and WPF animation all day, but here the focus is on speech, not XAML.

When you run the application, you may get the Speech Setup Tutorial if you’ve never run speech recognition before.

You don’t have to run through the tutorial, but I recommend you do, as it will demonstrate the power of the engine built right into the OS.

The system also uses the tutorial to set up your microphone, adjust your settings and start learning your voice.

Once you get past the tutorial (it takes about 10 minutes), you’ll notice the speech recognition toolbar on your desktop.

Stop! Grammar Time

To increase the reliability of the sample app, I added a grammar to limit the number of possibilities the speech recognizer had to consider.

You want to do this to narrow down the potential results from millions of words to dozens. Narrowing the recognition pool increases the accuracy.

Grammars can get quite complex, and there is even a W3C standard (SRGS) for defining them.

However, since we’re dealing with a compass, we really only need eight points: the four directions (North, West, South, East) and the four in-between points.

    private Choices GetChoices()
    {
        Choices choices = new Choices();

        choices.Add("North");
        choices.Add("West");
        choices.Add("East");
        choices.Add("South");

        choices.Add("NorthWest");
        choices.Add("SouthWest");
        choices.Add("NorthEast");
        choices.Add("SouthEast");

        return choices;
    }

I use the following code to load the grammar into my recognizer.

    Choices choices = GetChoices();

    GrammarBuilder grammarBuilder = new GrammarBuilder(choices);
    Grammar grammarDirections = new Grammar(grammarBuilder);

    this._speechRecognizer.LoadGrammar(grammarDirections);
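
If you want to try the grammar without any WPF or animation in the way, the pieces above can be put together in a small console program. This is only a sketch, not the demo app: the CompassConsole class, the handler name and the MinimumConfidence cutoff of 0.6 are my own assumptions, and the confidence check is an extra that the demo does not have. It needs a reference to System.Speech and a Windows machine with speech recognition set up.

```csharp
using System;
using System.Speech.Recognition;

class CompassConsole
{
    // Assumed cutoff, not from the demo; tune it for your microphone and room.
    const float MinimumConfidence = 0.6f;

    static void Main()
    {
        using (SpeechRecognizer recognizer = new SpeechRecognizer())
        {
            // The same eight compass points, passed to the Choices constructor at once.
            Choices choices = new Choices(
                "North", "West", "East", "South",
                "NorthWest", "SouthWest", "NorthEast", "SouthEast");

            recognizer.LoadGrammar(new Grammar(new GrammarBuilder(choices)));
            recognizer.SpeechRecognized += OnSpeechRecognized;
            recognizer.Enabled = true;

            Console.WriteLine("Say a compass direction. Press Enter to quit.");
            Console.ReadLine();
        }
    }

    static void OnSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        // Confidence is a float; higher means the engine is more certain.
        if (e.Result.Confidence < MinimumConfidence)
        {
            Console.WriteLine("Not sure what that was, ignoring it.");
            return;
        }

        Console.WriteLine("Heard: {0} (confidence {1:F2})", e.Result.Text, e.Result.Confidence);
    }
}
```

The confidence check is optional. The grammar already does most of the work, and the cutoff simply lets you drop results the engine itself was unsure about.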

Talk to Me

The code to make the computer speak is actually much easier.

In fact, it can come down to one line of code (two if you count the call to the constructor):

this._speechSynthesizer.Speak("Stop bossing me around!");
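
To show how the "after a few commands" behavior might be wired up, here is a rough sketch as a console program. The CommandCounter class, the limit of 3 and the hard-coded list of commands are assumptions for illustration; the demo app may well do it differently.

```csharp
using System;
using System.Speech.Synthesis;

// Hypothetical helper: counts commands and reports when the limit is reached.
class CommandCounter
{
    private readonly int _limit;
    private int _count;

    public CommandCounter(int limit)
    {
        _limit = limit;
    }

    // Returns true on every _limit-th command, then starts counting again.
    public bool Register()
    {
        _count++;
        if (_count < _limit)
        {
            return false;
        }

        _count = 0;
        return true;
    }
}

class TalkBack
{
    static void Main()
    {
        CommandCounter counter = new CommandCounter(3); // 3 is an assumed limit

        using (SpeechSynthesizer synthesizer = new SpeechSynthesizer())
        {
            synthesizer.SetOutputToDefaultAudioDevice();

            // Stand-ins for recognized commands, so the sketch runs without a microphone.
            string[] commands = { "North", "East", "South", "West" };
            foreach (string command in commands)
            {
                Console.WriteLine("Command: " + command);
                if (counter.Register())
                {
                    // Speak blocks until the phrase has finished playing.
                    synthesizer.Speak("Stop bossing me around!");
                }
            }
        }
    }
}
```

Inside a WPF event handler you may prefer SpeakAsync, which queues the phrase and returns right away, so the window stays responsive while the computer talks.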

I wrote a blog post a little while back just on speech synthesis and its own demo app.

Now you know that it’s actually quite easy to add a little bit of NUI (Natural User Interface) to your applications.


