Sunday, February 19, 2012

Voice Recognition for FM Repeaters

Last year Google pushed version 11 of their Chrome browser, and along with it one really interesting new feature: support for the HTML5 speech input API.

This means that you'll be able to talk to your computer, and Chrome will be able to interpret what you say. This feature has been available for a while on Android devices, so many of you will already be used to it and will welcome the new feature.

If you dig around in the source code, you can see how the speech recognition is implemented:

http://src.chromium.org/viewvc/chrome/trunk/src/content/browser/speech/

Audio is collected from the mic, encoded in FLAC format, and then passed via an HTTPS POST to a Google web service, which responds with a JSON object containing the results.

Some Asterisk Telephony enthusiasts have been monkeying with this Google Speech API. This is how I first learned of it.

http://zaf.github.com/asterisk-speech-recog/

Until now, interacting with a repeater has been limited to DTMF commands, for example to query the time.

This API opens up a whole new world: you can craft your own Siri-like repeater system. Just set up a series of IF statements to grep/match the text returned.

You ask "What time is it?" It sees "time", does a time lookup, and speaks it back.
You ask "Where is KB8ZXE?" It sees "where" and "KB8ZXE", passes a query to APRS.fi, and reports back that he was last "2.1 miles northeast of Green Bay", etc.
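
Picking the call sign out of the recognized text can be sketched with grep. This is only a sketch: `extract_callsign` is a made-up helper name, the pattern only covers US-style call signs, and it assumes the recognizer hands the call sign back as one run of letters and digits, which may not always be the case.

```shell
#!/bin/sh
# Sketch: pull a US-style call sign out of recognized text.
# extract_callsign is a hypothetical helper, not part of IRLP or Allstar.
extract_callsign() {
    # Upper-case the text, then print the first whole word shaped like a
    # call sign: 1-2 letters (first one A, K, N or W), one digit, 1-3 letters.
    printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]' |
        grep -owE '[AKNW][A-Z]?[0-9][A-Z]{1,3}' | head -n 1
}

extract_callsign "where is kb8zxe"   # prints KB8ZXE
```

If nothing in the text looks like a call sign, the function prints nothing, so the caller can test for an empty result before querying APRS.fi.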

I've been experimenting with this on the IRLP node/repeater (147.075 MHz) here in Green Bay. It's really quite trivial to implement. I bet we are the first ham radio repeater to implement voice recognition.



Here is all you really need to get started:
 #!/bin/sh  
 echo "1 SoX Sound Exchange - Convert WAV to FLAC with 16000"   
 sox message.wav message.flac rate 16k  
 echo "2 Submit to Google Voice Recognition"  
 wget -q -U "Mozilla/5.0" --post-file message.flac --header="Content-Type: audio/x-flac; rate=16000" -O - "http://www.google.com/speech-api/v1/recognize?lang=en-US&client=chromium" > message.ret   
 echo "3 SED Extract recognized text"   
 cat message.ret | sed 's/.*utterance":"//' | sed 's/","confidence.*//' > message.txt  
 echo "4 Remove Temporary Files"  
 rm message.flac  
 #rm message.ret  
 echo "5 Show Text "  
 cat message.txt  
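
To see what step 3 is doing, here is a small sketch that runs the same two sed substitutions against a sample reply. The sample JSON is made up to match the field names the sed expressions look for, and `extract_utterance` is a hypothetical helper name. It also guards against one thing worth knowing: when the reply has no `utterance` field, sed passes the raw reply through unchanged, so the sketch checks for the field first.

```shell
#!/bin/sh
# Made-up sample reply in the shape the sed expressions expect (an assumption).
sample='{"status":0,"id":"x","hypotheses":[{"utterance":"what time is it","confidence":0.92}]}'

# extract_utterance is a hypothetical helper: same substitutions as step 3,
# combined into one sed call, with a check that the field is present.
extract_utterance() {
    case "$1" in
        *'utterance":"'*)
            printf '%s\n' "$1" |
                sed -e 's/.*utterance":"//' -e 's/","confidence.*//' ;;
        *)
            # Nothing recognized: report failure instead of echoing raw JSON.
            return 1 ;;
    esac
}

extract_utterance "$sample"   # prints: what time is it
```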

Steve Ford, WB8IMY, picked up on this blog and published it in the July 2012 issue of QST magazine.


{edit 2014}

This blog entry is over a year old and is meant as a starting place for someone who has some Linux experience.  Since that time the Google speech API has changed a bit.  They now block queries without a server key.

Step 0. Using an existing Google/Gmail account, join the Chrome-Dev Group.
https://console.developers.google.com/project
Step 1. Create a new Project here (e.g. Speech Recognition)
Step 2. Click on your newly created project and choose APIs & auth.
Step 3. Turn ON Speech API by clicking on its Status button.
Step 4. Click on Credentials in APIs & auth and choose Create New Key -> Server key. Leave the IP address restriction blank.
Step 5. Write down your new API key or copy it to the clipboard.
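
Once you have a key, one option is to keep it out of the script and read it from the environment instead. The sketch below assumes a variable named `GOOGLE_SPEECH_KEY` and a helper named `build_recognize_url`; both names are made up for illustration. The URL is the version 2 endpoint used elsewhere in this post.

```shell
#!/bin/sh
# build_recognize_url is a hypothetical helper; GOOGLE_SPEECH_KEY is an
# assumed variable name, not something Google or IRLP defines.
build_recognize_url() {
    # Fail early with a message if the key has not been set.
    if [ -z "${GOOGLE_SPEECH_KEY:-}" ]; then
        echo "GOOGLE_SPEECH_KEY is not set" >&2
        return 1
    fi
    printf '%s\n' "http://www.google.com/speech-api/v2/recognize?lang=en-us&client=chromium&key=${GOOGLE_SPEECH_KEY}"
}

# Demo in a subshell with a placeholder key (replace with your own).
( GOOGLE_SPEECH_KEY="your-key-here"; build_recognize_url )
```

In a real script you would export the variable once (for example from a file only the repeater user can read) and pass the output of `build_recognize_url` to wget as the request URL.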





For version 2 of the API, you submit the request like so (replace the key value with your own API key):


 echo "1 SoX Sound Exchange - Convert WAV to FLAC with 16000"  
 sox message.wav message.flac rate 16k  
 echo "2 Submit to Google Voice Recognition"  
 wget -q -U "Mozilla/5.0" --post-file message.flac --header "Content-Type: audio/x-flac; rate=16000" -O - "http://www.google.com/speech-api/v2/recognize?lang=en-us&client=chromium&key=AxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxY" > message.ret   
 echo "3 SED Extract recognized text"  
 cat message.ret | sed 's/.*transcript":"//' | awk -F '"}' '{print $1}' | tail -1 > message.txt  
 echo "4 Remove Temporary Files"  
 rm message.flac  
 rm message.ret  
 echo "5 Show Text "  
 cat message.txt  
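
One thing to watch in step 3: the `.*` in the sed expression is greedy, so if a reply line contains more than one `transcript` field, it keeps the text after the last one. The sketch below picks the first `transcript` field instead. The sample reply is made up (an empty first line followed by a line with two alternatives) and the real reply format is an assumption here; `first_transcript` is a hypothetical helper name.

```shell
#!/bin/sh
# Made-up reply: an empty result line, then a line with two alternatives.
sample='{"result":[]}
{"result":[{"alternative":[{"transcript":"what time is it","confidence":0.97},{"transcript":"what time is that"}],"final":true}],"result_index":0}'

# first_transcript is a hypothetical helper.
first_transcript() {
    # grep -o prints each "transcript":"..." match on its own line,
    # head keeps the first one, and cut takes the text between the
    # third and fourth double quotes.
    printf '%s\n' "$1" |
        grep -o '"transcript":"[^"]*"' | head -n 1 | cut -d '"' -f 4
}

first_transcript "$sample"   # prints: what time is it
```

This simple pattern stops at the first double quote, so a transcript containing an escaped quote would be cut short.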


I have easily added this code to existing IRLP and Allstar Linux computers.  Both IRLP and Allstar have hooks that catch DTMF strings, which you can use to invoke this application to record your spoken commands and submit them for recognition.  From there you can code keyword triggers a number of ways.  An easy example is to use grep.

 if grep --quiet time /tmp/message.txt; then  
  /home/irlp/bin/key 
  TIME=`date "+%l:%M %p"`  
  echo "the time is $TIME" | festival --tts   
  /home/irlp/bin/unkey
 fi  
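
Another way to code the triggers, once you have more than one or two keywords, is a single `case` statement. This is a rough sketch: `pick_action` and the action names it prints are made up for illustration, and you would hang your own key, speak, and unkey commands off each result.

```shell
#!/bin/sh
# pick_action is a hypothetical helper: it maps recognized text to an
# action name that the rest of your script can act on.
pick_action() {
    # Lower-case first so "Time" and "time" both match.
    text=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$text" in
        *time*)  echo "say_time" ;;
        *where*) echo "locate_station" ;;
        *)       echo "unknown" ;;
    esac
}

pick_action "What time is it"    # prints say_time
pick_action "Where is KB8ZXE"    # prints locate_station
```

The patterns are plain substring matches, so "sometimes" would also trigger the time branch; the first matching branch wins, so order the keywords from most to least specific.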


Freely Available STTs:
Google STT
IBM STT
Wit.ai STT
AT&T STT


I highly recommend the book "Building a Virtual Assistant for Raspberry Pi" by Tanay Pant.

{Edit 10/2018}
Chris Lam, KM6VGZ  - “Make amateur radio cool again”, said Mr Artificial Intelligence. - A project on building a speech recognition system for amateur radio communication.

2 comments:

Bill Chellis said...

Steve,

This is pretty cool.
I might just have to set up my own repeater now just to play with this.
God knows no one else will around here.

Once again, please know you are not the only ham out there who realizes their ham ticket is a bona fide license to TINKER.

Bill, KB1ROP

Matt said...

I'm going to give this a go on my test box linked to AllStar and Echolink. Would be great to be able to give a voice command like "Ok repeater, link me to gb3np" rather than have to look up the ID and remember the DTMF shortcuts.