Let's Make Robots!

Speech Recognition - Robot Thinking and Understanding

Programming a robot to do a single specific task, such as following a line, is hard enough for most people. So how do we program a robot so it can perform many different tasks? How do we command and control it?

Recently I have been wanting to use the speech and voice recognition of an Android mobile phone. My friend Bing wrote a simple app for me that allows my controller to connect to an Android phone via Bluetooth and then send the phone a list of words that I want the robot to recognize. For example: "OddJob,go,forward,backward,turn,stop"

The first word in the list is the robot's name. If the app does not hear a sentence starting with the robot's name, it ignores whatever is being said. Once it hears a sentence starting with the robot's name, it looks for words from the supplied word list. The Android software offers several possible results when it recognizes a sentence, so the app looks for the first result in which all words are recognized. This means your instructions must be made from words in the list, but you can say them in different combinations.
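A minimal sketch of that selection step, assuming the recognizer's alternatives arrive as a list of strings in order of preference. The class and method names here are made up for illustration and are not taken from Bing's app.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class CandidatePicker {
    /**
     * Returns the first candidate that starts with the robot's name
     * (the first entry of wordList) and uses only words from wordList,
     * or null if no candidate qualifies.
     */
    static String pickCandidate(List<String> candidates, List<String> wordList) {
        Set<String> known = new HashSet<>();
        for (String w : wordList) {
            known.add(w.toLowerCase(Locale.ROOT));
        }
        String robotName = wordList.get(0).toLowerCase(Locale.ROOT);

        for (String candidate : candidates) {
            String[] words = candidate.trim().toLowerCase(Locale.ROOT).split("\\s+");
            if (!words[0].equals(robotName)) {
                continue; // sentence is not addressed to the robot
            }
            if (known.containsAll(Arrays.asList(words))) {
                return candidate; // every word is in the list
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> wordList =
            Arrays.asList("OddJob", "go", "forward", "backward", "turn", "stop");
        // Hypothetical alternatives for one spoken sentence.
        List<String> candidates = Arrays.asList(
            "odd job go forward",  // name split in two, rejected
            "OddJob go for word",  // "for" and "word" are not in the list
            "OddJob go forward");
        System.out.println(pickCandidate(candidates, wordList)); // prints "OddJob go forward"
    }
}
```

Comparison is case-insensitive here, which is a guess about how the recognizer capitalizes its results.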

Once the app finds a sentence where all words are recognized, it returns a number for each word it recognized. For example, the robot's name is always first in the list, so the app will always send a 1 to indicate the robot's name. Using my previous word list example, go=2, forward=3, etc. The next problem is making sense of the words (numbers). The simplest solution is probably to group the words into different types and then treat the different types like rotary switches.
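The rotary switch idea could be sketched roughly like this. The two word types (ACTION and DIRECTION) and the class name are assumptions made for the example; the word numbers follow the list above (OddJob=1, go=2, forward=3, backward=4, turn=5, stop=6).

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

class RotarySwitches {
    // Hypothetical word types. Each type behaves like one rotary switch:
    // it always holds exactly one selected position.
    enum Type { ACTION, DIRECTION }

    private final Map<Integer, Type> typeOfWord = new HashMap<>();
    private final Map<Type, Integer> position = new EnumMap<>(Type.class);

    RotarySwitches() {
        typeOfWord.put(2, Type.ACTION);     // go
        typeOfWord.put(5, Type.ACTION);     // turn
        typeOfWord.put(6, Type.ACTION);     // stop
        typeOfWord.put(3, Type.DIRECTION);  // forward
        typeOfWord.put(4, Type.DIRECTION);  // backward
    }

    /** Applies one recognized sentence; each word turns the switch of its own type. */
    void apply(int[] wordNumbers) {
        for (int n : wordNumbers) {
            Type t = typeOfWord.get(n);
            if (t != null) {
                position.put(t, n); // 1 (the name) has no type and is skipped
            }
        }
    }

    /** Returns the selected word number for a type, or 0 if none was selected yet. */
    int get(Type t) {
        return position.getOrDefault(t, 0);
    }

    public static void main(String[] args) {
        RotarySwitches switches = new RotarySwitches();
        switches.apply(new int[] {1, 2, 3});              // "OddJob go forward"
        switches.apply(new int[] {1, 4});                 // "OddJob backward"
        System.out.println(switches.get(Type.ACTION));    // prints 2 (still "go")
        System.out.println(switches.get(Type.DIRECTION)); // prints 4 ("backward")
    }
}
```

A word that only changes one switch leaves the others where they were, so "OddJob backward" after "OddJob go forward" keeps the robot going but reverses it.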

This could be done with switch case statements or a 4-dimensional array, but the logic would not be flexible. If you wanted to add new words, or even change the code so that the robot could learn new words, then switch case statements would be a nightmare to update and a 4D array would quickly become too big to fit in memory.

So now I am trying to think of a more flexible logic system. Preferably one that can learn new words without the code being re-written. I was thinking of some of the early A.I. programs that you could have a conversation with. Although they might not pass a Turing test they might be good enough for the robot to follow your instructions.



I realize this thread is old, but I thought I might as well.  Cool stuff and right up my alley.

A simple way would be to create a simple in-memory database of words in Android by defining something like:

Hashtable<String, Word> _Dictionary

or some such for holding words and synonyms of words, where synonyms refer back to other "root" words. In this case, "Word" is a generic Word class that you would write.

Optionally, you can also add another for "Word Handlers", routing words to classes for handling a word:

Hashtable<String, IWordHandler> _WordHandlers

In this definition, IWordHandler is a custom interface that you would write.
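Here is a rough sketch of how those two tables could fit together. The fields of Word, the single method on IWordHandler, and the WordRouter class around them are all assumptions for illustration; only the two Hashtable declarations come from the text above.

```java
import java.util.Hashtable;
import java.util.Locale;

class WordRouter {
    /** Hypothetical Word class: a spoken form plus the root word it stands for. */
    static class Word {
        final String text;
        final String root;
        Word(String text, String root) { this.text = text; this.root = root; }
    }

    /** Hypothetical handler interface; returns a description of the action taken. */
    interface IWordHandler {
        String handle(Word word);
    }

    private final Hashtable<String, Word> _Dictionary = new Hashtable<>();
    private final Hashtable<String, IWordHandler> _WordHandlers = new Hashtable<>();

    private static String key(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    /** Registers a root word together with the handler that acts on it. */
    void addWord(String root, IWordHandler handler) {
        _Dictionary.put(key(root), new Word(key(root), key(root)));
        _WordHandlers.put(key(root), handler);
    }

    /** Registers a synonym that refers back to an existing root word. */
    void addSynonym(String synonym, String root) {
        _Dictionary.put(key(synonym), new Word(key(synonym), key(root)));
    }

    /** Looks up the spoken word, follows it to its root, and runs the root's handler. */
    String dispatch(String spoken) {
        Word word = _Dictionary.get(key(spoken));
        if (word == null) return null; // unknown word
        IWordHandler handler = _WordHandlers.get(word.root);
        return handler == null ? null : handler.handle(word);
    }

    public static void main(String[] args) {
        WordRouter router = new WordRouter();
        router.addWord("forward", w -> "drive forward");
        router.addSynonym("ahead", "forward");
        System.out.println(router.dispatch("Ahead")); // prints "drive forward"
    }
}
```

New words and synonyms are added at run time with addWord and addSynonym, so nothing has to be recompiled when the vocabulary grows.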

Alternatively, you can build whatever database structure you want using SQLite and deploy it with the Android app. I have written large relational databases for Android phones, and they can certainly handle it.

Lastly, you can call off-phone to a website or web service and do it there. The advantage is that a very large command set can be maintained there without having to sync it with the phone (which can be complicated), and you can use regular expression search algorithms for a lot more natural language ability and varied sentence structures.

I have tried all of these at various times. SuperDroidBot "Anna" once used the in-memory version for a very basic set of mission-critical commands (around 50-100 words) and went off-bot for an extended set (another 500-1000 or so). I have since turned off most of the local language engine as the off-bot one matured.

I personally found that, whatever solution I used for this, I needed a lot of synonym commands that "sound close" to the actual commands, as the phone's voice recognition services will return a lot of misinterpreted words.

Take a look at Backus-Naur form. It might offer part of what you are after.

Thanks for that, not something I would have normally Googled :D

After a quick look at the Wikipedia article, it reminds me a bit of the logic simplification I was taught at college, where a set of rules was used to simplify a complex logic circuit into its minimal form.

I think that is the first thing I need to do: take a sentence and break it down to its simplest logical form.
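As a rough illustration of how a Backus-Naur style grammar can reduce a sentence to a simple form, here is a tiny recursive descent parser. The grammar in the comment, the class name, and the output strings are all invented for this sketch; it only uses the words from the example list.

```java
import java.util.Locale;

class CommandParser {
    // Hypothetical grammar in Backus-Naur form:
    //   <command>   ::= "oddjob" <action>
    //   <action>    ::= "stop" | "turn" | "go" <direction>
    //   <direction> ::= "forward" | "backward"

    private final String[] tokens;
    private int pos = 0;

    private CommandParser(String sentence) {
        tokens = sentence.trim().toLowerCase(Locale.ROOT).split("\\s+");
    }

    /** Returns a compact form such as "GO FORWARD", or null if the sentence does not fit. */
    static String parse(String sentence) {
        CommandParser p = new CommandParser(sentence);
        String result = p.command();
        // Reject sentences with leftover words after a valid command.
        return (result != null && p.pos == p.tokens.length) ? result : null;
    }

    /** Consumes the next token if it equals the given word. */
    private boolean accept(String word) {
        if (pos < tokens.length && tokens[pos].equals(word)) {
            pos++;
            return true;
        }
        return false;
    }

    private String command() {
        return accept("oddjob") ? action() : null;
    }

    private String action() {
        if (accept("stop")) return "STOP";
        if (accept("turn")) return "TURN";
        if (accept("go")) {
            String d = direction();
            return d == null ? null : "GO " + d;
        }
        return null;
    }

    private String direction() {
        if (accept("forward")) return "FORWARD";
        if (accept("backward")) return "BACKWARD";
        return null;
    }

    public static void main(String[] args) {
        System.out.println(parse("OddJob go forward")); // prints "GO FORWARD"
        System.out.println(parse("OddJob go"));         // prints "null"
    }
}
```

Each grammar rule becomes one small method, so extending the language means adding a rule and a method rather than reworking nested switch statements.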


Doing what you propose is very challenging, but I found a research project similar to your objective. You can have a look at this task because they have some data available as well:


Nevertheless, it is a very interesting project you have there. Please keep us up to date with it :)