Let's Make Robots!

Kinect Video Stream Masking

 

Here is some streaming video being masked by the depth image of the Kinect.
I use OpenCV's inRangeS to build a mask from a depth range, then cvCopy to copy the video image through the mask.

One strange thing I've noticed is the video image is down and to the left of the mask.  If you look on the right edge you can see another anomaly, which is the boundary of the depth mask.  Apparently the depth image is 632 X 480 vs 640 X 480.

Is the shift part of this too?

Inspiration - http://10k3d.com/kinect-learning-to-recognize-objects-using-op
Interesting link - http://www.ruialmeida.org/?p=218

 

The offset seems to be ~32 pixels as the crow flies - which means I have to shift the mask 25 pixels left and 25 pixels down...

I'm wondering:

1. Is this for all Kinects or just mine? I suspect it's all of them because of the way it works and the 632-pixel width, but I really haven't found much info on the subject.

2. How the heck am I going to move the mask over given the set of OpenCV functions?

 

The Results

Looks better! The alignment falls off as the target gets closer to the edges of the 640 X 480 screen, but it's pretty good when centered, which is what I'm going to work with for object identification.

 


More Progress..
I managed to implement video stream multiplexing.

This is where:

  • streams can be forked off into separate streams and separate displays
  • the streams can be processed separately
  • streams can be recombined and processed in a workflow together
  • any stream can be paired with any display
  • new screens and forks can be created dynamically - just select the screen and press the fork button

Here the depth data from the Kinect (top right) is forked into an InRange mask (bottom left) and recombined with the original image (top left).
The recombined stream is sent to the Find Contours filter, and the coordinates of the polygons are published to any other service listening.

 

Lookin good man, lookin good.

I reckon you could do some cool not-so-useful stuff using what you've put together:
• depth map + gaussian blur = faux depth of field effect
• depth map + saturation and/or brightness = something that looks really cool?

Maybe I'll have to get one of these things for myself =)

Thanks TeleFox...

Haven't thought of that.. but yes, could have "zowie" potential...

My first thought was object isolation/segmentation..... So the computer and I can play the "What's this?" game...
You know...
Me:  "What's this ?"
Computer: "I don't know"
Me:  "It's a hand"
Computer: "Ok"
Me: "What's this ?"
Computer: "It looks like a hand" 

Thanks for the response!  Wow, great link too..

After scratching my head last night, I decided I'll try this, which is close to what you suggested except it only needs one empty image and one copy:

1. create a completely black image with the same dimensions as the RGB image - 640 X 480
2. use cvSetImageROI on the depth data - if (25, 25) is correct -
I would cvSetImageROI(kinectMask, cvRect(25, 0, 615 - 8, 455)); ... the (-8) to get rid of the irritating band on the right :)

3. then cvSetImageROI on the black target - cvSetImageROI(black, cvRect(0, 25, 615 - 8, 455))
4. do a single copy from the depth to the black blank - cvCopy(depth, black);

Sounds good in "theory"

Without doing any kind of border interpolation or anything like that, you'll have to live with a smaller image if you want an image that only contains true info from both the normal camera and IR depth mapper. The largest correlated image you can get will be (632 - 25) X (480 - 25), i.e. 607 X 455... what awful numbers =)

Anyway, that assumes that the scaling between the two images is the same, and that they really are offset by 25, 25 pixels.

There are probably more elegant ways to process the two images before you combine them (remap()?), but the first method that comes to mind is using getRectSubPix(). Create two empty 607 X 455 images, rip out the appropriate pixels from the two sources, then cvCopy() to combine the sub pixel images. That way you get your offset and cropping taken care of in one step per source image.

Check out this link: it isn't just your Kinect that's a bit weird after all =)