Let's Make Robots!

Software/Hardware infrastructure project to control the vehicle over Internet

Drive around, stream video back to the driver.

We are working on the software and hardware infrastructure to control vehicles (car, tank, quad-copter, etc.) over the internet. This is a completely open-source project, with all the source code available here.

Video is streamed in real time from the on-board camera using an adaptive algorithm to keep a stable framerate, which is required for comfortable control. A BeagleBoard is used as the on-board computer. In addition to high-level data processing such as video compression, our goal was to control the servos from the BeagleBoard as well, without additional microcontrollers. The GPIOs and the two built-in PWM generators are used. The only additional electronics required are voltage level shifters to convert the 1.8V logic levels to the standard 5V required by servos and speed controllers.
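The PWM side of this is mostly a matter of mapping a desired servo position to a pulse width. As a minimal sketch (not the project's actual code), here is the angle-to-pulse-width conversion a hobby servo driven from a hardware PWM generator needs; the 1.0–2.0 ms pulse range over a 20 ms period is the common convention, but the exact endpoints vary per servo, so they are parameters here:

```python
def servo_pulse_ns(angle_deg, min_pulse_us=1000, max_pulse_us=2000):
    """Map a servo angle (0-180 degrees) to a PWM pulse width in
    nanoseconds, as typically written to a hardware PWM duty register.

    Standard hobby servos expect a 1.0-2.0 ms pulse repeated every
    20 ms; angles outside 0-180 are clamped to the valid range.
    """
    angle_deg = max(0.0, min(180.0, angle_deg))
    pulse_us = min_pulse_us + (max_pulse_us - min_pulse_us) * angle_deg / 180.0
    return int(pulse_us * 1000)  # microseconds -> nanoseconds
```

On the BeagleBoard the resulting value would be written to the PWM generator's duty-cycle register (or the kernel's PWM interface); the level shifters then bring the 1.8V output up to the 5V the servo expects.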

A hacked RC racing car was used as the first prototype. Currently we are working on a much slower tank, which is more convenient to use at home, because the RC car is simply too fast for indoors. Our main focus at the moment is to implement stereo vision, and then to build a quad-copter with this software and hardware.


I had thought about adding 3D to a telepresence system using a couple of webcams like you guys are doing.  Hasn't gotten off the ground.  Still just rolling around my head but I would love to bounce ideas about 3D video transmission and display techniques.

Sorry for the slow reaction. I was on vacation.

Regarding stereo vision - we decided to go the simplest way. What we do is simply merge the two captured frames side by side into one wide frame using GStreamer's videomixer element. After that, everything is the same as for a normal single-camera source until the last visualization step. There, we use a simple function written with OpenCV which splits the wide image into two parts and generates an anaglyph picture.

You can get a general idea of how to use the videomixer element by looking at this example:
https://www.gitorious.org/veter/veter/blobs/master/misc/car.config#line58
It is not exactly what is necessary (I have not checked the code in yet for various reasons), but it is very close.

The function to generate anaglyph image from two merged frames could be found here:
https://www.gitorious.org/veter/cockpit/blobs/master/src/toanaglyph.cpp#line29
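The linked toanaglyph.cpp does this with OpenCV; as an illustration only, here is the same idea as a small numpy sketch (function name, RGB channel order, and left-red/right-cyan mapping are my assumptions, not necessarily what the repo's code uses):

```python
import numpy as np

def side_by_side_to_anaglyph(frame):
    """Split a side-by-side stereo frame (H x 2W x 3, RGB) into left and
    right halves, then build a red-cyan anaglyph: the red channel comes
    from the left eye, green and blue from the right eye."""
    h, w2, _ = frame.shape
    w = w2 // 2
    left, right = frame[:, :w], frame[:, w:]
    out = right.copy()          # green and blue channels from the right image
    out[:, :, 0] = left[:, :, 0]  # red channel from the left image
    return out
```

Viewed through red-cyan glasses, each eye then sees only "its" half of the original wide frame.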

Hey, I'm going to be experimenting with and writing a semi-custom 3D encoder based on the x264 code base in the next few months for my robot project. If I come up with something decent, I'll post the source.


Good luck with your efforts!

At the GStreamer Conference 2010 I attended a very interesting presentation about stereoscopic encoding by Martin Bisson. If you are interested, you can take a look at his presentation slides. There is also a video of the presentation available in HD quality (124 MB) and 450×800 (39 MB).

So you are compressing and transmitting the side-by-side video stream to the driver computer, and then the driver computer is generating the anaglyph for display - correct?  It seems like since you are using anaglyph, you would save half your bandwidth by generating the anaglyph on the robot and transmitting an anaglyph video stream.  Not only would you reduce your video image size by half, but it seems that the color change to anaglyph works like a lossy filter.  It would reduce the amount of information in the image (fewer colors to encode) and make the image even more compressible.  The additional processing time on the robot would probably be offset by a greatly reduced transmission size and you might see a lot less total latency.  Maybe I'm missing something in the details?

Of course this only applies for anaglyph.  If you were trying to do something cooler, like display the dual images on a 3D monitor or VR glasses, then you would need both images intact.  I have not done any testing myself, but I have done some thinking about the best way to transmit the dual images.  Side-by-side is a very logical choice and is probably best, but another possibility would be interlaced - the reason being that the two images should be nearly the same.  The same pixels in each image should be highly correlated, so the interlaced image should be very compressible (since vertically adjacent pixels would often, but not always, be the same color).  It would be an interesting experiment to see which compressed better - the two side-by-side images with the large vertical discontinuity in the middle, or the interlaced images with smaller local discontinuities.
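To make the interlacing idea concrete, here is a minimal numpy sketch (my own illustration; nothing in the project implements this). It packs the two views row by row, keeping the same total pixel count as side-by-side, so that each pixel's vertical neighbor comes from the other eye's view of nearly the same scene point - the correlation an encoder would hopefully exploit:

```python
import numpy as np

def interlace_rows(left, right):
    """Interleave two same-sized images row by row: even rows from the
    left view, odd rows from the right view. Result is twice the height,
    same total pixels as the side-by-side packing."""
    assert left.shape == right.shape, "stereo views must be the same size"
    out = np.empty((left.shape[0] * 2,) + left.shape[1:], dtype=left.dtype)
    out[0::2] = left
    out[1::2] = right
    return out
```

De-interlacing on the driver side is just the inverse slicing (`out[0::2]` and `out[1::2]`).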

>So you are compressing and transmitting the side-by-side video
>stream to the driver computer, and then the driver computer is
>generating the anaglyph for display - correct?

Yes. Correct.

>It seems like since you are using anaglyph, you would save half
>your bandwidth by generating the anaglyph on the robot

Your considerations about the advantages of generating the anaglyph image on the robot are absolutely correct. As you pointed out, anaglyph generation leads to a loss of information, and this is exactly why we decided not to do it. Our goal is to make the complete data available on the driver side. The reason is possible further processing, ranging from higher-quality (full-color) stereo vision to obstacle detection, etc. We do not want to lose information which might be important for that processing. In general, our strategy is to use the robot to collect the sensor data (including video) and to do as much processing as possible on the driver computer, or even on a dedicated cluster (CUDA, OpenCL, OpenCV, etc.) for complex decisions in real time.

>but another possibility would be interlaced - the reason being
>that each image should be nearly the same.

Very interesting idea! I had not considered it before.

>It would be an interesting experiment to see which compressed
>better - the two side-by-side images with the large vertical
>discontinuity in the middle or the interlaced images with smaller
>local discontinuities.

It would definitely be an interesting experiment! I also believe it might be interesting enough to publish the results at a conference. That is actually why I am trying hard to build a community around this project and to encourage robotics and computer vision students to use what we have done as a base for their further experiments. I am even thinking about offering a kit with the required hardware to make it easier to get started.

> Our goal is to make the complete data available on the driver side. The reason for it is possible further processing ranging from higher quality (full color) stereo vision to obstacle detection, etc.

There may be other ways to utilize the correlation between the stereo images to compress the feed into a single image and then reconstruct useful stereo images.  For example, you could take a 24-bit color image, convert it to 12-bit grayscale, and then use the extra 4 bits as an inter-image delta value.  Assuming the pixels are highly correlated between the images, you might only need a few bits to encode the second image.  If image pixel A(0,0) = 31 and, in the other image, pixel B(0,0) = 32, then a delta value of +1 is all you would need to reconstruct pixel B(0,0).

Of course this will involve some loss of information, and there will be some artifacts if the dynamic range of the delta is not high enough (i.e. delta > 16), but you might be able to find a compromise of bit depths that allows you to reconstruct stereo images with enough visual fidelity for robust analysis (edge detection, object detection, etc.).  Color would require higher bit depths for good results, but you probably don't need a full 48 bits to encode a stereoscopic pixel.  And for visual display, information loss and artifacts are less a real problem than a simple nuisance.
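The delta idea above can be sketched in a few lines of numpy (my own illustration; the function names, the signed-delta convention, and the 4-bit default are assumptions, not anything from the project). The clipping step is exactly the artifact source mentioned: deltas outside the representable range are saturated, so the reconstructed pixel is only approximate there:

```python
import numpy as np

def encode_delta(a, b, bits=4):
    """Encode image b as a clipped signed per-pixel delta against image a.
    With `bits` bits the delta range is [-2**(bits-1), 2**(bits-1)-1];
    larger inter-image differences are clipped (lossy)."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return np.clip(b.astype(np.int16) - a.astype(np.int16), lo, hi)

def decode_delta(a, delta):
    """Reconstruct an approximation of b from a and its clipped delta."""
    return np.clip(a.astype(np.int16) + delta, 0, 255).astype(np.uint8)
```

For the example in the text, A(0,0) = 31 and B(0,0) = 32 give a delta of +1, which reconstructs B exactly; a pixel pair differing by 100 would be clipped to the 4-bit range and reconstruct with a visible error.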


This is a very nice robot... It's very similar to what I had in mind for my own 3D vision robot; I even bought the same cameras.

I went for a different type of brain, though: an Atom-based PC with an Arduino or two to interface it to the "real" world.

> This is a very nice robot...

Thanks! :-)

> I went for a different type of brain though: an Atom-based PC with an Arduino

The very first versions of mine were also made with an Atom-based Mini-ITX board and a microcontroller. The advantage of that solution is that the Atom CPU is rather powerful, so more things can be done on board. However, I decided to use the BeagleBoard without a microcontroller. The reasons are: a) the BeagleBoard requires considerably less power (much longer run time on batteries); b) using the DSP, it is not a problem to compress H.264 video in real time; c) the BeagleBoard has enough I/O pins, so there is no need for an additional microcontroller, which makes the system more compact and cheaper; d) the BeagleBoard is smaller. Taking into account that I am going to do the complicated image processing and decision making *not* on board, but on the "driver" side or even on a special separate computer (GPU, etc.), the BeagleBoard has enough horsepower to deal with all the tasks that need to be done on board.