Let's Make Robots!

help with 3d image reconstruction (from stereo or multiple images)

Does anybody have any experience with, or any good references on, reconstructing a 3D scene from a stereo image, or from lots of images taken from different positions?

My use case is obviously a robot (MmBot) which has a stereo camera mounted on the front. I don't want to use a proper depth camera, as they are intrusive so I couldn't have multiple robots in the same room all seeing at once.

This is MmBot with the stereo cameras on the left, and the image she sends to the pc on the right.


I need to get a depth image from the stereo feed.






OpenCV has functions for stereo cameras. Additionally, it has other fun algorithms like SURF, which is what the 123D Catch program mentioned by airuno2I primarily uses.

Good luck, we always need "new and improved" wheels :)

Thx grog - I think OpenCV is definitely a good starting point. I'm wondering if there are any stereo algorithms that can take 'hints' from things at known depths, calculated via feature matching.

It sounds like you're interested in "horizontal disparity" to indicate depth, which all stereo algorithms use at some level. The difficult part is figuring out the "which, where & how" details of the features to match.

That's why I mentioned OpenCV's SURF implementation. It appears to be pretty good at finding & matching features in two different frames.
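For anyone wondering how horizontal disparity turns into an actual distance: for a rectified stereo pair (both cameras parallel, image rows aligned), depth is focal length times baseline divided by disparity. Here is a minimal sketch of that relation. The function name and the numbers (700 px focal length, 10 cm baseline) are made up for illustration and are not MmBot's real calibration.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in metres of a point seen by a rectified stereo pair.

    focal_px     -- focal length in pixels (from camera calibration)
    baseline_m   -- distance between the two camera centres, in metres
    disparity_px -- x_left - x_right for the same point, in pixels
    """
    if disparity_px <= 0:
        # Zero disparity means the point is effectively at infinity
        # (or the match is invalid), so there is no finite depth.
        return float("inf")
    return focal_px * baseline_m / disparity_px


# Hypothetical calibration values, chosen only to show the trend.
for d in (70, 35, 7):
    print(d, "px ->", depth_from_disparity(700.0, 0.1, d), "m")
```

Note that depth goes as 1/disparity, so nearby objects are resolved much more finely than distant ones, and a wider baseline helps at range.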

OpenCV is hands down the way to go for stereo if you ask me. Some of the world's best in the field publish their code there. 

If you're interested in only using one monocular camera, this program is pretty cool


But all the processing is done in "the cloud". I don't know why they insist on it working that way...but they do, so it's not really useful for robots.

Grog has already done this: http://letsmakerobots.com/node/2256

Why is it that newbies don't know how to use Google?

Hey, I'm not a newbie - I spent hours trawling Google for this one :) Like I said, it's almost a case of too much info rather than too little - there are lots of techniques and I'm trying to narrow down which ones to try. Looks like Grog's done a good job of picking out lots of places to start looking though.

First thing that comes to mind for PC vision processing is OpenCV. Not only does OpenCV take care of many fancy vision functions for you, it's also extremely flexible and therefore powerful.

I know OpenCV has stereo imaging abilities, but I can't say I've ever used those tools myself. However you will find more than a few people here on LMR and around the 'net that have plenty of OpenCV experience. It's a popular platform for a good reason.

BTW MmBot is coming along nicely, and I wouldn't feel right if I didn't mention that I love LBP/LBP2 =)

Yup - I'm thinking OpenCV will be my first stop - hopefully by the time I get my Raspberry Pi somebody will have ported it as well :) Ideally, though, I need an algorithm I can pull out and GPUify.

And thx for loving LBP - that's why we made it :)

Don't confuse depth perception with the generation of a map in three dimensions. A stereo camera might (or might not, see below) help you with the first by calculating the distance to some perceived objects, but it won't really help with the second in a meaningful way, because of limitations in scope and the fact that objects can hide behind one another. This can happen with just about any sensor system, of course. Most 3D camera usage in robotics seems to be oriented towards telepresence, and most 3D "scene building" seems to be done with a laser scanner or a one-camera/ping hybrid system (like a Kinect). You already have both, but object recognition, especially in 3D, is probably asking a little much of a Mega. My imagination fails when it comes to even generating a flow chart for using a stereo camera to judge distances. I don't know how you'd determine a reliable reference depth with purely visual information. Any ideas?

I agree depth perception and generation of a map aren't the same thing; however, depth perception is the main prerequisite for building a 3D map, and is also handy for other things.

I'm not too concerned about computing power - my robot sends the camera feed back to a PC for processing, which then communicates with the Arduino over Bluetooth to give instructions or read sensors, i.e. the brain is running on a nice i7 PC :)

So, my first task is to use stereo imaging to derive a depth image from 2 photos taken from different, known positions. It's a problem that's been worked on quite a bit, but there are a lot of very complex, scattered sources of info, so it's tricky to know where to look for the right stuff.
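As a starting point for that first task, here is a rough sketch of the simplest dense approach: brute-force block matching with a sum-of-absolute-differences (SAD) cost. It assumes the two images are already rectified grayscale images stored as lists of rows; the function names and the synthetic test pair are made up for illustration. It is plain Python and far too slow for a live feed, so treat it as a way to see the idea rather than as what OpenCV actually does internally.

```python
def sad(left, right, y, xl, xr, half):
    """Sum of absolute differences between two (2*half+1)-square windows."""
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += abs(left[y + dy][xl + dx] - right[y + dy][xr + dx])
    return total


def disparity_map(left, right, max_disp, half=1):
    """Brute-force block matching on a rectified grayscale pair.

    Returns a disparity map the same size as the inputs. Border pixels
    that cannot hold a full window are left at 0.
    """
    h, w = len(left), len(left[0])
    disp = [[0] * w for _ in range(h)]
    for y in range(half, h - half):
        for x in range(half, w - half):
            best_d, best_cost = 0, None
            # A point at column x in the left image shows up at column
            # x - d in the right image, so search along the same row.
            for d in range(max_disp + 1):
                if x - d - half < 0:
                    break  # window would fall off the left edge
                cost = sad(left, right, y, x, x - d, half)
                if best_cost is None or cost < best_cost:
                    best_d, best_cost = d, cost
            disp[y][x] = best_d
    return disp


# Synthetic pair: the left image is the right image shifted 2 px to the right,
# which mimics a flat surface at a single depth.
H, W, SHIFT = 5, 12, 2
right = [[x * x + 10 * y for x in range(W)] for y in range(H)]
left = [[right[y][max(x - SHIFT, 0)] for x in range(W)] for y in range(H)]

disp = disparity_map(left, right, max_disp=4)
print(disp[2])
```

On this synthetic pair the interior pixels recover the 2 px shift; the columns near the left edge do not, because the true match lies outside the image there. Real images also need handling for textureless regions and occlusions, which is where the more elaborate algorithms earn their keep.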