MyRobotLab - Template Matching - Bot finds teet..

myrobotlab, open source, Java, service based framework, robotics and creative machine control

Logitech C600 Camera - PanTilt Rig with $16 BBB Arduino Clone - not super fancy but it works..


Template Matching is now available through MRL.

Template Matching is a process of matching a small sub image within a larger global image.  As an exercise I chose a wall socket since this could be the goal for a self charging robot. 
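
As a rough illustration of the idea (not MRL's or OpenCV's actual code), the sketch below slides a small template over a larger grayscale image and scores each position by the sum of squared differences, where 0 means a perfect match. The function name match_template and the tiny hand-made "scene" are assumptions made for this example. OpenCV offers several scoring methods, and which one the MRL filter uses is not stated here.

```python
def match_template(image, template):
    """Slide template over image; return (best_row, best_col, best_score).

    The score is the sum of squared differences (SSD): 0 is a perfect match.
    Both arguments are 2D lists of grayscale values (hypothetical data layout).
    """
    img_h, img_w = len(image), len(image[0])
    tpl_h, tpl_w = len(template), len(template[0])
    best = None
    for row in range(img_h - tpl_h + 1):
        for col in range(img_w - tpl_w + 1):
            ssd = 0
            for dy in range(tpl_h):
                for dx in range(tpl_w):
                    diff = image[row + dy][col + dx] - template[dy][dx]
                    ssd += diff * diff
            # Keep the position with the lowest difference seen so far.
            if best is None or ssd < best[2]:
                best = (row, col, ssd)
    return best


# Tiny grayscale "scene" with a bright 2x2 patch (the "socket") at row 1, col 2.
scene = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 8, 0],
    [0, 0, 7, 9, 0],
    [0, 0, 0, 0, 0],
]
socket = [
    [9, 8],
    [7, 9],
]
location = match_template(scene, socket)
print(location)  # (1, 2, 0)
```

Because the score compares raw pixel values, any change in brightness or size of the target raises the difference, which is why this kind of matching is delicate.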

When the target is locked and centered, an event will fire off.  If this were a mobile platform and the goal was to mate with the socket, the next behavior would be to move closer to the socket and avoid obstacles.  Since this is not a mobile platform, I have chosen to send the event to a Text To Speech service with the appropriate verbiage.

The interface for creating a template can be programmed with coordinate numbers, or selected through the video feed.  To select a new template, the Matching Template filter should be highlighted, then simply select the top left and bottom right corners of the new template's rectangle.  You will see the template image become visible in the Photo Reel section of the OpenCV gui.

Currently, I am using the Face Tracking service in MRL.  The Face Tracking service will soon be decomposed into a more generalized Tracking service, which can be used to track a target with any appropriate sensor data.  Previously I found tracking problematic.  The pan/tilt platform would bounce back and forth and overcompensate (hysteresis).  The lag that video processing incurs makes tracking difficult.  To compensate for this, I have recently integrated a PID controller into the Tracking service, and have been very pleased with the results.  The tracking bounces around much less, although there is still room for improvement.

PID is a method (and art form) for error correction in complex systems.  Initially a set of values must be chosen for the specific system.  There are 3 major values:

  • Kp = Proportional constant - weights the present error
  • Ki = Integral constant - weights the accumulation of past errors
  • Kd = Derivative constant - weights the rate of change of the error, in an attempt to predict future errors
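
To make the three terms concrete, here is a minimal PID sketch. It is not the code inside the MRL Tracking service. The class name PID, the gain values, and the idea that the error is the target's pixel offset from the image center are all assumptions chosen for illustration.

```python
class PID:
    """Minimal PID controller sketch (hypothetical, not MRL's implementation)."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0      # running sum of past errors
        self.prev_error = None   # last error, used for the derivative term

    def update(self, error, dt):
        """Return a correction for the current error, dt seconds after the last call."""
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0     # no history yet on the first call
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Assumed example: the target is 40 px off center, then 30 px, 0.1 s apart.
pid = PID(kp=0.5, ki=0.1, kd=0.05)
first = pid.update(error=40.0, dt=0.1)   # proportional term dominates
second = pid.update(error=30.0, dt=0.1)  # derivative term damps the correction
print(first, second)
```

On the second call the error is shrinking, so the derivative term is negative and reduces the correction. That damping is what helps keep a laggy pan/tilt rig from overshooting.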

The video will show the initial setup.  This involves connecting an Arduino to a COM port, then connecting 2 servos to the Arduino (1 for pan & another for tilt).  After this is done, I begin selecting different templates to match as the test continues.  The template match value in the upper left corner represents the quality of the match.

The states which can occur:

  • "I found my num num" - first time tracking
  • "I got my num num" - lock and centered on target
  • "I wan't dat" - not centered - but tracking to target
  • "boo who who - where is my num num?" - lost tracking
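
The four states above can be sketched as a small state machine that only speaks when the state changes. This is an illustration, not the MRL event code. The function names, the 0.8 match threshold, the 10 pixel centering tolerance, and the single-axis offset are all assumptions.

```python
PHRASES = {
    "found": "I found my num num",
    "locked": "I got my num num",
    "tracking": "I wan't dat",
    "lost": "boo who who - where is my num num?",
}


def next_state(previous, match_score, offset_px,
               match_threshold=0.8, center_tolerance_px=10):
    """Pick the next tracking state (thresholds are assumed values)."""
    if match_score < match_threshold:
        # Nothing to lose if we never had a target in the first place.
        return None if previous is None else "lost"
    if previous in (None, "lost"):
        return "found"
    if abs(offset_px) <= center_tolerance_px:
        return "locked"
    return "tracking"


def run(observations):
    """Return the phrases that would be sent to text-to-speech, in order."""
    state = None
    spoken = []
    for score, offset in observations:
        new_state = next_state(state, score, offset)
        if new_state != state and new_state is not None:
            spoken.append(PHRASES[new_state])
        state = new_state
    return spoken


# (match score, horizontal offset from center in pixels) per frame
frames = [(0.2, 0), (0.9, 50), (0.9, 30), (0.95, 4), (0.3, 0)]
for phrase in run(frames):
    print(phrase)
```

Speaking only on transitions keeps the robot from repeating the same phrase on every frame.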

More to Come

In the video you can see when I switch off the lights the lock is lost.  Template matching is sensitive to scale, rotation, and light changes.  Haar object detection is more robust and less sensitive to scale and lighting changes.  The next step will be taking  the template and proceeding with Haar training. 

Kinect & Bag Of Words - association

Update 2011.08.08

I was on vacation for a week, however, when I got back I wanted to make sure the latest (mrl 13) was cross platform compatible.
I had some problems with Fedora Core 15 / GNOME 3 desktop / Java & OpenCV.
FC15 can install opencv version 2.2 but 2.3 is available for download.
I removed the 2.2 version - and did a clean install of mrl 13

The desktop is still acting a little "goofy" but after:

  • copying *.so's from the bin directory to /usr/lib (blech) 
  • ldconfig
  • loading opencv service - and refreshing the gui screen (blech)
  • using the Gray/PyramidDown/MatchTemplate filters

I got template matching on the MAAHR brain going at 78 ms per frame running with debug logging (almost 13 frames per second!).
It says 93 ms because the screen capture slows the process down.


MAAHR is currently running on a 12V SLA
The CPU and power supply are cool 
None of the CPUs are over 60%, and this should drop off significantly if the video feed were not being displayed.
MRL-13 has now been tested on Windows XP & 7, Fedora Core 13 & 15 -  The next version should come with opencv binaries compatible with an ARM-7, although Ro-Bot-X's Chumby appears down for some reason.....



Did you remove myrobotlab.jar and download the latest?
Yes, that's how I got the keyboard service.

You're right...

Something's fishy with the google "Otto" poster ...  Grrrr.... it says it updated it, and sent the file .. but something is very wonky...

Alright, I did it manually - you should be able to try again - the label says "no really! - it does have 'send strings'"

A simple system that I have used quite a bit is a letter/number combination. For example, you could send direction and speed as F100 (for forward 100%) or R100 (reverse 100%). Maybe L90 for left 90 degrees. I also use P5 for "Play track number 5" off of the mp3 player. This simple system has worked flawlessly for a lot of projects of mine. Just a thought.
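
A minimal sketch of parsing such commands might look like the following. The function names and the ACTIONS table are made up for illustration, and only the letters mentioned above (F, R, L, P) are included.

```python
# Hypothetical mapping from command letter to a human-readable action.
ACTIONS = {
    "F": "forward {}%",
    "R": "reverse {}%",
    "L": "left {} degrees",
    "P": "play track {}",
}


def parse_command(text):
    """Split a command such as 'F100' into ('F', 100)."""
    text = text.strip()
    if len(text) < 2 or not text[0].isalpha() or not text[1:].isdigit():
        raise ValueError("expected a letter followed by digits, got %r" % text)
    return text[0].upper(), int(text[1:])


def describe(text):
    """Return a description of what the command would do."""
    letter, value = parse_command(text)
    if letter not in ACTIONS:
        raise ValueError("unknown command letter %r" % letter)
    return ACTIONS[letter].format(value)


for command in ("F100", "R100", "L90", "P5"):
    print(command, "->", describe(command))
```

Keeping the format to one letter plus a number makes the commands easy to type by hand and easy to validate on the receiving side.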

Now, the real question is: Does Butler Bot have any wheels attached yet or are we still just "staying alive" and blinking leds?

I'd like to make a GUI in MRL where you can dynamically create and bind buttons to Strings... I'm guessing you're doing that with your Android project?  When are you going to start coding for MRL again :)

Actually Chris, I think I have a pretty good system: each robot has an I.D. byte and a pre-programmed list of commands. If a robot is listening and hears its I.D., it will listen for the command. This gives me 256 reusable commands for each robot. I think I will redo butler bot in the near future. I will make it out of more lightweight materials and add a mini fridge on the bottom, and a pump going to the top which will have glasses. Also, I may add a small "snack bar" in the middle.
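
One way to picture that scheme is a stream of two-byte frames: an I.D. byte followed by a command byte. The sketch below assumes that fixed framing, which is a guess and not necessarily how the real robots do it. The function name commands_for and the sample byte values are made up.

```python
def commands_for(robot_id, stream):
    """Yield the command bytes addressed to robot_id.

    Assumes the stream is a sequence of (I.D. byte, command byte) pairs.
    """
    incoming = iter(stream)
    for heard_id in incoming:
        command = next(incoming, None)
        if command is None:
            break  # stream ended in the middle of a frame
        if heard_id == robot_id:
            yield command


# Robot 0x02 ignores the frames meant for robots 0x01 and 0x03.
radio = bytes([0x01, 0x10, 0x02, 0x20, 0x02, 0x21, 0x03, 0x05])
mine = list(commands_for(0x02, radio))
print(mine)  # [32, 33]
```

With one byte for the command, each robot gets 256 possible commands, and the same command numbers can be reused by every robot. A real link would likely also need some way to resynchronize if a byte is dropped.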

So, the user can select a box in the image and MRL can track it. How can we store several selections? How can we make the selection automated? I mean, I want the robot to be able to find a wall socket, a cup, a beer, a flower, a toy, whatever. To do that, it has to have pre-stored selections that it can try to match with the current video image. If it knows what to look for, it compares just one selection to the image. But if it needs to recognise some object, it needs to be able to make the selection of the object in the image by itself, then compare it to the pre-stored selections. If it can't find it, then it can ask for a name for that object and store it as a new object. Do you already have a service (function) for this?

It's headed that way...
The next simple (almost trivial) step is to store more than 1 selection..  I'll be doing that shortly..

You and I are on the same track ....  we know what we want as an end goal....

Here are some things to be aware of though:
TemplateMatching works pretty well if the image's lighting conditions are the same and the scale is the same...   
This "matching" is pretty delicate..  To have more meaningful matching we'll need to do Haar training / SURF.  What this does is basically deconstruct the image into a data "map".  Imagine taking several sheets of stretchy tracing paper - overlaying the image of interest with them, and tracing it.  On one page you trace the most "obvious" form.. and on subsequent pages you fill in more details.

This process is called Haar Training, and its results (the sheets of tracing paper) are referred to as Haar cascades.  They are more robust than template matching - since they are traces - they are much less sensitive to light changes..  Since they are done on stretchy paper... you can scale them quickly to attempt to match objects at different ranges...

In some ways it's like getting closer to the "essence" of an object - which is closer to how we objectify things..

When we think of a "table" we build a meta-table image in our head - although it can be paired with a specific table image, the construct or essence of table is universal.  We can save the image and bind it to the construct, but the generalized table is what we find useful to communicate with.

I'm pretty close to this type of thing...  I got a Kinect working with MRL now...

Object isolation is key in object identification...  You've got to first isolate the thing you're interested in.  You can do that with a Kinect since it gives you range info - you just say you're interested in objects from 30cm to 60cm in range.  Qbo does isolation using stereo cameras - it finds "the nearest thing"
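
As a sketch of range-based isolation, the code below keeps only the pixels whose distance falls between 30cm and 60cm and then finds the box around them. It works on a plain 2D list of distances in cm, which is a made-up stand-in for a real depth frame. The function names are hypothetical and this is not the MRL Kinect code.

```python
def isolate_by_range(depth_cm, near_cm=30, far_cm=60):
    """Return a True/False mask of pixels within [near_cm, far_cm]."""
    return [[near_cm <= d <= far_cm for d in row] for row in depth_cm]


def bounding_box(mask):
    """Return (top, left, bottom, right) of the True pixels, or None if empty."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, hit in enumerate(row) if hit]
    if not rows:
        return None
    return min(rows), min(cols), max(rows), max(cols)


# Assumed depth frame: a wall at 200cm with a small object 40-55cm away.
depth = [
    [200, 200, 200, 200],
    [200,  45,  50, 200],
    [200,  40,  55, 200],
    [200, 200, 200, 120],
]
mask = isolate_by_range(depth)
box = bounding_box(mask)
print(box)  # (1, 1, 2, 2)
```

The resulting box could then serve as the selection to match or train on, in place of a rectangle picked by hand.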

Predator - does isolation by tracking the object and isolating an arbitrary size around the tracking point.

I like the idea of user selection to provide some of the information on how to isolate objects.  It is pretty convenient to look through the robot's eyes and point/click what you want it to learn. The really exciting part is the idea that the robot's vision can be streamed to the internet, and many users could be point/click/teaching the robot what to learn - the processing of these images to Haar cascades is pretty intensive, but could be done by more than one computer.  The idea is the robot then would have access to a large online library of work done by people to help identify important "objects".

I'd like to start such a library... and I'm making small steps to that end goal...


I know the Kinect is the best thing to use, but I don't really want to add a Kinect to a Chumby based robot.

What if we add an IR sensor and a US sensor to the webcam to give it range info? The microcontroller that pans/tilts the camera can also read the range sensors.

Or, use a projected laser line and use the height of the line in the image to detect the closest object? 

Which of these alternatives is better/easier to use with MRL?