Sunday, February 19, 2012

Voice Recognition for FM Repeaters

Last year Google released version 11 of its Chrome browser, and with it one really interesting new feature: support for the HTML5 speech input API.

This means you can talk to your computer and Chrome will interpret what you say. The feature has been available for a while on Android devices, so many of you will already be used to it.

If you dig around in the source code, you learn how the speech recognition is implemented:

http://src.chromium.org/viewvc/chrome/trunk/src/content/browser/speech/

Audio is collected from the mic, encoded in FLAC format, and then sent via an HTTPS POST to a Google web service, which responds with a JSON object containing the results.
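
To make the reply format concrete, here is a minimal sketch of pulling the recognized phrase out of that JSON with sed. It assumes the reply carries an "utterance" field followed by a "confidence" field; the sample reply below is made up for illustration, and this is a plain text match rather than a real JSON parser.

```shell
#!/bin/sh
# Print the first recognized phrase found in a JSON reply on stdin.
# Assumes the reply contains "utterance":"<text>"; prints nothing
# if no such field is present.
extract_utterance() {
    sed -n 's/.*"utterance":"\([^"]*\)".*/\1/p'
}

# Made-up sample reply, for illustration only.
sample='{"status":0,"hypotheses":[{"utterance":"what time is it","confidence":0.92}]}'
printf '%s\n' "$sample" | extract_utterance
# prints: what time is it
```

Printing nothing on a failed match makes it easy to detect "no speech recognized" with a simple empty-string test before acting on the text.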

Some Asterisk Telephony enthusiasts have been monkeying with this Google Speech API. This is how I first learned of it.

http://zaf.github.com/asterisk-speech-recog/

Until now, interacting with a repeater has been limited to DTMF commands, for example to query the time.

This API opens up a whole new world: you can craft your own Siri-like repeater system. Just set up a series of IF statements to grep/match the text returned.

You ask "What time is it?" It sees "time", does a time lookup, and speaks it back.
You ask "Where is KB8ZXE?" It sees "where" and "KB8ZXE", passes a query to APRS.fi, and reports back that he was last "2.1 miles northeast of Green Bay", and so on.
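
The keyword matching in those two examples can be sketched as a pair of small shell functions. The names choose_action and find_callsign are hypothetical, and the callsign pattern (one or two letters, a digit, one to three letters) is a simplification that assumes the recognizer returns the callsign as a single token, which it may well not do.

```shell
#!/bin/sh
# Map recognized text to an action name (hypothetical helper).
choose_action() {
    text=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$text" in
        *time*)  echo "time" ;;
        *where*) echo "where" ;;
        *)       echo "unknown" ;;
    esac
}

# Print the first word that looks like a callsign (hypothetical helper).
# The text is uppercased, split into one word per line, and each word
# is matched as a whole against a simplified callsign pattern.
find_callsign() {
    printf '%s\n' "$1" \
        | tr '[:lower:]' '[:upper:]' \
        | tr -cs '[:alnum:]' '\n' \
        | grep -xE '[A-Z]{1,2}[0-9][A-Z]{1,3}' \
        | head -n 1
}

choose_action "What time is it?"     # prints: time
find_callsign "Where is KB8ZXE?"     # prints: KB8ZXE
```

From there, the "where" branch would hand the callsign to whatever APRS lookup you prefer; that part is left out here because it depends on the service you query.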

I've been experimenting with this on an IRLP node/repeater (147.075 MHz) here in Green Bay. It's really quite trivial to implement. I bet we are the first ham radio repeater to implement voice recognition.



Here is all you really need to get started:
 #!/bin/sh
 # Convert the recording to 16 kHz FLAC, send it to Google, and print the recognized text.
 echo "1 SoX Sound Exchange - Convert WAV to FLAC at 16000 Hz"
 sox message.wav message.flac rate 16k
 echo "2 Submit to Google Voice Recognition"
 wget -q -U "Mozilla/5.0" --post-file message.flac --header="Content-Type: audio/x-flac; rate=16000" -O message.ret "http://www.google.com/speech-api/v1/recognize?lang=en-US&client=chromium"
 echo "3 SED Extract recognized text"
 # Strip everything before and after the "utterance" value in the JSON reply
 sed -e 's/.*utterance":"//' -e 's/","confidence.*//' message.ret > message.txt
 echo "4 Remove Temporary Files"
 rm message.flac
 #rm message.ret
 echo "5 Show Text"
 cat message.txt

{edit 2014}
This blog entry is over a year old and is meant as a starting point for someone with some Linux experience. Since then the Google speech API has changed a bit: it now blocks queries without a server key.

Step 0. Using an existing Google/Gmail account, join the Chrome-Dev Group.
https://groups.google.com/a/chromium.org/forum/#!forum/chromium-dev
Step 1. Create a new Project here (e.g. Speech Recognition)
Step 2. Click on your newly created project and choose APIs & auth.
Step 3. Turn ON Speech API by clicking on its Status button.
Step 4. Click on Credentials in APIs & auth and choose Create New Key -> Server key. Leave the IP address restriction blank.
Step 5. Write down your new API key or copy it to the clipboard.
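
Once you have a key, it has to be sent along with each request. Here is a rough sketch of building the request URL. Note that the endpoint path and the "output" and "key" query parameters are assumptions on my part (they are not spelled out in this post), and build_speech_url and GOOGLE_API_KEY are hypothetical names, so check them against whatever the service currently expects.

```shell
#!/bin/sh
# ASSUMPTION: this endpoint and its parameter names are not confirmed
# by the post; adjust them to match the service you are using.
SPEECH_URL="https://www.google.com/speech-api/v2/recognize"

# Build the request URL (hypothetical helper).
# $1 = API key, $2 = language code (defaults to en-US)
build_speech_url() {
    printf '%s?output=json&lang=%s&key=%s\n' "$SPEECH_URL" "${2:-en-US}" "$1"
}

# Usage (not run here), reusing the wget call from the original script:
# wget -q -U "Mozilla/5.0" --post-file message.flac \
#   --header="Content-Type: audio/x-flac; rate=16000" \
#   -O message.ret "$(build_speech_url "$GOOGLE_API_KEY")"
```

Keeping the key in an environment variable rather than in the script itself makes it less likely to end up pasted into a forum post.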

I have easily added this code to existing IRLP and Allstar Linux computers. Both IRLP and Allstar have hooks that catch DTMF strings, which can invoke this application to record your spoken commands and submit them for recognition. From there you can code keyword triggers in a number of ways. An easy example is to use grep.

 # If the recognized text contains "time", key up and speak the current time
 if grep --quiet time /tmp/message.txt; then
  /home/irlp/bin/key
  TIME=$(date "+%l:%M %p")
  echo "the time is $TIME" | festival --tts
  /home/irlp/bin/unkey
 fi



1 comment:

Bill Chellis said...

Steve,

This is pretty cool.
I might just have to setup my own repeater now just to play with this.
God knows no one else will around here.

Once again, please know you are not the only ham out there who realizes their ham ticket is a bona fide license to TINKER.

Bill, KB1ROP