Right now I'm not really concerned with understanding the sound, but more with localizing it so I can point the bot's head at the speaker.
I was thinking of two cheap microphones, each with something behind it to absorb sound. Then I could just focus on anything I can hear with both mics and turn the head until the sound is loudest in both ears. If the sound is directly behind the bot, it should be possible to detect that with a very small microphone in the rear of the head.
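To make the idea concrete, here is a rough sketch of the steering decision in Python. It is only an illustration under some assumptions: the function names (`rms`, `steer`), the thresholds, and the command strings are made up, the samples are taken to be short blocks of floats from each mic, and "loudest in both ears" is approximated by turning toward the louder side until the two levels roughly match.

```python
import math

def rms(samples):
    """Root-mean-square level of one block of audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def steer(left, right, rear, balance_tol=0.1, silence=1e-3):
    """Pick a head movement from one block of samples per microphone.

    balance_tol and silence are hypothetical tuning values, not measured ones.
    """
    l, r, b = rms(left), rms(right), rms(rear)
    front = max(l, r)
    if max(front, b) < silence:
        return "idle"          # nothing audible on any mic
    if b > front:
        return "turn_around"   # rear mic hears it best: source is behind
    # Normalized left/right level difference, in the range -1..1.
    diff = (l - r) / (l + r)
    if diff > balance_tol:
        return "turn_left"     # left ear is louder
    if diff < -balance_tol:
        return "turn_right"    # right ear is louder
    return "hold"              # levels match: roughly facing the source
```

Calling `steer` once per audio block and turning a small step each time would, in principle, walk the head toward the source. Real microphones would need their gains matched first, otherwise the balance point ends up off-center.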
Has anybody used a scheme like this?