Let's Make Robots!

MyRobotLab - Template Matching - Bot finds teet..

myrobotlab, open source, Java, service based framework, robotics and creative machine control

Logitech C600 Camera - PanTilt Rig with $16 BBB Arduino Clone - not super fancy but it works..


Template Matching is now available through MRL.

Template Matching is a process of matching a small sub image within a larger global image.  As an exercise I chose a wall socket since this could be the goal for a self charging robot. 
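
As a rough illustration of the idea, here is a minimal pure-Python sketch that slides a template over a grayscale image and scores every position by the sum of squared differences. This is not the MRL or OpenCV implementation, which is far faster and offers several scoring methods. The function name match_template and the tiny test image are made up for this example.

```python
def match_template(image, template):
    """Return (row, col, score) of the best match; a lower score is better.

    image and template are lists of rows of grayscale values.
    The score is the sum of squared differences (0 means a perfect match).
    """
    img_h, img_w = len(image), len(image[0])
    tpl_h, tpl_w = len(template), len(template[0])
    best = None
    # Try every position where the template fits entirely inside the image.
    for r in range(img_h - tpl_h + 1):
        for c in range(img_w - tpl_w + 1):
            score = 0
            for tr in range(tpl_h):
                for tc in range(tpl_w):
                    diff = image[r + tr][c + tc] - template[tr][tc]
                    score += diff * diff
            if best is None or score < best[2]:
                best = (r, c, score)
    return best


# Hypothetical 4x5 "global image" with a bright 2x2 patch at row 1, col 2.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 8, 0],
    [0, 0, 7, 9, 0],
    [0, 0, 0, 0, 0],
]
template = [
    [9, 8],
    [7, 9],
]
print(match_template(image, template))  # (1, 2, 0)
```

Because the score compares raw pixel values, a change in brightness or in the apparent size of the object raises the score at the true location.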

When the target is locked and centered, an event will fire off.  If this were a mobile platform and the goal was to mate with the socket, the next behavior would be to move closer to the socket and avoid obstacles.  Since this is not a mobile platform, I have chosen to send the event to a Text To Speech service with the appropriate verbiage.

The interface for creating a template can be programmed with coordinate numbers, or the template can be selected through the video feed.  To select a new template, the Matching Template filter should be highlighted, then simply select the top-left and bottom-right corners of the new template's rectangle.  You will see the template image become visible in the Photo Reel section of the OpenCV GUI.

Currently, I am using the Face Tracking service in MRL.  The Face Tracking service will soon be decomposed into a more generalized Tracking service, which can be used to track a target with any sort of appropriate sensor data.  Previously I found tracking problematic.  The pan/tilt platform would bounce back and forth and overcompensate (overshoot).  The lag which video processing incurs makes the tracking difficult.  In an attempt to compensate for this issue, I have recently added a PID controller to the Tracking service, and have been very pleased with the results.  The tracking bounces around much less, although there is still room for improvement.

PID is a method (and artform) which allows error correction in complex systems.  Initially a set of values must be chosen for the specific system.  There are 3 major values.  

  • Kp = Proportional constant - scales the response to the present error
  • Ki = Integral constant - scales the response to the accumulation of past errors
  • Kd = Derivative constant - scales the response to the rate of change of the error, which attempts to predict future errors
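
The three terms can be sketched as a textbook PID loop. This is a generic illustration, not the code inside the MRL Tracking service. The class name, the gain values and the simulated pan axis are all assumptions, and the simulation ignores the video lag that makes real tuning hard.

```python
class PID:
    """Textbook PID controller; gains and time step are illustrative."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0      # accumulated past error
        self.prev_error = None   # no derivative on the first update

    def update(self, error, dt):
        """Return the correction for the current error after dt seconds."""
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return (self.kp * error
                + self.ki * self.integral
                + self.kd * derivative)


# Hypothetical pan axis: the target sits at 90 degrees, the servo starts at 60.
pid = PID(kp=0.4, ki=0.05, kd=0.01)
angle = 60.0
for _ in range(10):
    error = 90.0 - angle          # present error in degrees
    angle += pid.update(error, dt=0.1)
    print(round(angle, 2))
```

With these made-up gains the proportional term does most of the work. Raising Kp too far is what produces the bouncing described above.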

The video will show the initial setup.  This involves connecting an Arduino to a COM port, then connecting 2 Servos to the Arduino (1 for pan & another for tilt).  After this is done, I begin selecting different templates to match as the test continues. The template match value in the upper left corner represents the quality of matching.

The states which can occur:

  • "I found my num num" - first time tracking
  • "I got my num num" - lock and centered on target
  • "I wan't dat" - not centered - but tracking to target
  • "boo who who - where is my num num?" - lost tracking
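
One way to picture how these four states could be chosen is the small decision function below. It is only a guess at the logic, not the actual MRL code. The function name tracking_event, the 10 pixel tolerance and the sample frames are assumptions made for this sketch.

```python
FOUND = "I found my num num"
LOCKED = "I got my num num"
TRACKING = "I wan't dat"
LOST = "boo who who - where is my num num?"


def tracking_event(was_tracking, matched, dx, dy, tolerance=10):
    """Pick the phrase for one frame, or None if there is nothing to say.

    was_tracking: True if the template matched on the previous frame.
    matched:      True if the template matched on this frame.
    dx, dy:       pixel offset of the match from the image centre.
    """
    if not matched:
        return LOST if was_tracking else None
    if not was_tracking:
        return FOUND
    if abs(dx) <= tolerance and abs(dy) <= tolerance:
        return LOCKED
    return TRACKING


# Hypothetical sequence of frames: (matched, dx, dy)
frames = [(False, 0, 0), (True, 40, -5), (True, 25, 0), (True, 3, 2), (False, 0, 0)]
was_tracking = False
for matched, dx, dy in frames:
    phrase = tracking_event(was_tracking, matched, dx, dy)
    if phrase:
        print(phrase)  # in MRL this would go to the Text To Speech service
    was_tracking = matched
```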

More to Come

In the video you can see when I switch off the lights the lock is lost.  Template matching is sensitive to scale, rotation, and light changes.  Haar object detection is more robust and less sensitive to scale and lighting changes.  The next step will be taking  the template and proceeding with Haar training. 

Kinect & Bag Of Words - http://en.wikipedia.org/wiki/Bag_of_words_model_in_computer_vision association

Update 2011.08.08

I was on vacation for a week, but when I got back I wanted to make sure the latest version (mrl 13) was cross-platform compatible.
I had some problems with Fedora Core 15 / GNOME 3 desktop / Java & OpenCV.
FC15 can install opencv version 2.2 but 2.3 is available for download.
I removed the 2.2 version - and did a clean install of mrl 13

The desktop is still acting a little "goofy" but after:

  • copying *.so's from the bin directory to /usr/lib (blech) 
  • ldconfig
  • loading opencv service - and refreshing the gui screen (blech)
  • using the Gray/PyramidDown/MatchTemplate filters

I got template matching on the MAAHR brain going at 78 ms per frame running with debug logging (almost 13 frames per second!).
The display says 93 ms because the screen capture slows the process down.
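
For readers wondering why the Gray and PyramidDown filters come before MatchTemplate: they shrink the amount of data the matcher has to scan. The sketch below shows the idea in plain Python. It is a simplification, not the OpenCV code: OpenCV's pyramid-down step blurs with a Gaussian kernel before dropping rows and columns, whereas this version just averages each 2x2 block. The function names and the sample frame are made up.

```python
def to_gray(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of integer luminance values."""
    return [[(299 * r + 587 * g + 114 * b) // 1000 for (r, g, b) in row]
            for row in rgb_image]


def pyramid_down(gray):
    """Halve width and height by averaging each 2x2 block.

    A trailing odd row or column is dropped.
    """
    h, w = len(gray) // 2, len(gray[0]) // 2
    return [[(gray[2 * r][2 * c] + gray[2 * r][2 * c + 1]
              + gray[2 * r + 1][2 * c] + gray[2 * r + 1][2 * c + 1]) // 4
             for c in range(w)]
            for r in range(h)]


# Hypothetical 4x4 colour frame: one channel instead of three,
# then a quarter of the pixels left for the matcher to scan.
frame = [[(200, 200, 200)] * 4 for _ in range(4)]
small = pyramid_down(to_gray(frame))
print(len(small), len(small[0]))  # 2 2
```

The template has to go through the same two steps, otherwise it no longer has the same scale as the image it is matched against.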


MAAHR is currently running on a 12V SLA
The CPU and power supply are cool 
None of the CPUs are over 60% and this should drop off significantly if the video feed were not being displayed.
MRL-13 has now been tested on Windows XP & 7, Fedora Core 13 & 15 -  The next version should come with opencv binaries compatible with an ARM-7, although Ro-Bot-X's Chumby appears down for some reason.....




Making good headway, good headway indeed. I genuflect in your direction, sir.

So, I'll try to get a virtual machine with similar platform running...

MRL version 1 Face detect works well ..... finds video source

MRL version 12 FaceTracking unable to find video source.(tried all video in grab methods too)

The face tracking service in V12 needs updating... but "hopefully" the OpenCV service worked?

I was wondering if I could get some of your help testing on other platforms?

I've published a new version (myrobotlab-0012) here - http://code.google.com/p/myrobotlab/downloads/list 

It's still very rough in the OpenCV frame grabbing area - the GUI had to be re-worked in order to (eventually) offer a lot more functionality - specifically the ability to configure different frame grabbers.  At some point this will allow the saving of video footage for future unit testing, and the ability to read in pre-saved video files.  It has the Kinect frame grabber, with a mode I added to do object segmentation/isolation called "interleave".

I'm interested specifically in the "Windows" platforms.  It currently works with Linux, and I have a virtual Windows XP x86 machine on which I successfully tested the OpenCVFrameGrabber - There are a few other exotic hardware frame grabbers - no need to test those if you don't have a FlyCapture or DC1394 camera.  If it fails please send me the Log.txt file generated in the directory where myrobotlab was started and the type of webcam you have.

I really appreciate it !


I have a Win7/64 machine and it works! I can also use the second camera if I wish. So I can experiment with MRL!


One more thing, I just got the replacement motherboard for my Chumby!!! Time to install it! Wish me luck!

Here is a screen capture, if you click on the picture, you'll get the large one:

I have a Windows 7 / 64 bit laptop.

So, the user can select a box in the image and MRL can track it. How can we store several selections? How can we make the selection automated? I mean, I want the robot to be able to find a wall socket, a cup, a beer, a flower, a toy, whatever. To do that, it has to have pre-stored selections that it can try to match with the current video image. If it knows what to look for, it compares just one selection to the image. But if it needs to recognise some object, it needs to be able to make the selection of the object in the image by itself, then compare it to the pre-stored selections. If it can't find it, then it can ask for a name for that object and store it as a new object. Do you already have a service (function) for this?

It's headed that way...
The next simple (almost trivial step) is to store more than 1 selection..  I'll be doing that shortly.. 

You and I are on the same track ....  we know what we want as an end goal....

Here are some things to be aware of though:
TemplateMatching works pretty well if the image's lighting conditions are the same and the scale is the same...   
This "matching" is pretty delicate..  To have more meaningful matching we'll need to do Haar training / SURF.  What this does is basically deconstruct the image into a data "map".  Imagine taking several sheets of stretchy tracing paper - overlaying the image of interest with it, and tracing it.  On one page you trace the most "obvious" form.. and on subsequent pages you fill in more details.

This process is called Haar Training, and its result (the sheets of tracing paper) are referred to as Haar cascades.  They are more robust than template matching - since they are traces - they are much less sensitive to light changes..  Since they are done on stretchy paper... you can scale them quickly to attempt to match objects at different ranges...

In some ways it's like getting closer to the "essence" of an object - which is closer to how we objectify things..

When we think of a "table" we build a meta-table image in our head - although it can be paired with a specific table image, the construct or essence of table is universal.  We can save the image and bind it to the construct, but the generalized table is what we find useful to communicate with.

I'm pretty close to this type of thing...  I got a Kinect working with MRL now...

Object isolation is key in object identification...  You've got to first isolate the thing you're interested in.  You can do that with a Kinect since it gives you range info - you just say you're interested in objects from 30cm to 60cm in range.  Qbo does isolation using stereo cameras - it finds "the nearest thing".

Predator - does isolation by tracking the object and isolating an arbitrary size around the tracking point.
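
The Kinect range filter described above can be sketched in a few lines. This is only an illustration of the idea, not the "interleave" mode in MRL. It assumes the depth frame has already been converted to centimetres, and the function names and the sample depth values are made up.

```python
def isolate_by_range(depth_cm, near=30, far=60):
    """Return a mask with 1 where the depth lies in [near, far], else 0."""
    return [[1 if near <= d <= far else 0 for d in row] for row in depth_cm]


def bounding_box(mask):
    """Return (top, left, bottom, right) of the masked pixels, or None."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return (rows[0], cols[0], rows[-1], cols[-1])


# Hypothetical 4x4 depth frame: a wall at 200 cm, an object at 40-55 cm,
# and one pixel at 25 cm that is too close to count.
depth = [
    [200, 200, 200, 200],
    [200,  45,  50, 200],
    [200,  40,  55, 200],
    [200, 200, 200,  25],
]
mask = isolate_by_range(depth)
print(bounding_box(mask))  # (1, 1, 2, 2)
```

The bounding box could then play the same role as a rectangle selected by hand in the video feed.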

I like the idea of user selection to provide some of the information on how to isolate objects.  It is pretty convenient to look through the robot's eyes and point/click what you want it to learn. The really exciting part about it is the idea that the robot's vision can be streamed to the internet, and many users could be point/click/teaching the robot what to learn - the processing of these images to Haar Cascades is pretty intensive, but could be done by more than one computer.  The idea is the robot then would have access to a large online library of work done by people to help identify important "objects".

I'd like to start such a library... and I'm making small steps to that end goal...


I know the Kinect is the best thing to use, but I don't really want to add a Kinect to a Chumby based robot.

What if we add an IR sensor and a US sensor to the webcam to give it range info? The microcontroller that pans/tilts the camera can also read the range sensors.

Or, use a projected laser line and use the height of the line in the image to detect the closest object? 

Which of these alternatives is better/easier to use with MRL?