Where are you now on the project? How far have you gotten?
It would help if you added the links to:
This way we can all catch up on what you are looking at now. Once we know where you are (which tutorials you are following, etc.), we can figure out how best to help.
I don't think the Kinect does any computer vision processing; as far as I know, it just outputs video pixels plus depth information. You may be able to implement some simple processing, such as blob detection, on an Arduino. To recognize human gestures from video, however, you will probably need quite a bit more CPU power than an 8-bit Arduino has, and you will probably want to use OpenCV.
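To give a feel for what simple blob detection on depth data involves, here is a minimal sketch in Python. It assumes the depth frame arrives as a 2D list of distances in millimetres; the function name `find_blobs` and its parameters are hypothetical. It keeps only pixels inside a distance band and groups touching pixels with a breadth-first flood fill. Even this needs a visited flag per pixel, and a full Kinect depth frame is far larger than the 2 KB of RAM on an Uno-class board, so in practice you would run it on a PC, where OpenCV's `connectedComponentsWithStats` does the same job much faster.

```python
from collections import deque


def find_blobs(depth, near_mm, far_mm, min_size=1):
    """Group 4-connected pixels whose depth lies in [near_mm, far_mm] into blobs.

    Returns a list of dicts with the blob's pixel count and (row, col) centroid,
    in the order the blobs are first encountered scanning row by row.
    """
    rows = len(depth)
    cols = len(depth[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            # Skip pixels already assigned to a blob or outside the depth band.
            if seen[r][c] or not (near_mm <= depth[r][c] <= far_mm):
                continue
            # Breadth-first flood fill starting from this seed pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols and not seen[ny][nx]
                            and near_mm <= depth[ny][nx] <= far_mm):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Discard tiny blobs, which are usually sensor noise.
            if len(pixels) >= min_size:
                cy = sum(p[0] for p in pixels) / len(pixels)
                cx = sum(p[1] for p in pixels) / len(pixels)
                blobs.append({"size": len(pixels), "centroid": (cy, cx)})
    return blobs
```

For example, with a band of 500 to 1000 mm, anything closer than a metre (such as a hand held out toward the sensor) shows up as a blob whose centroid you can track from frame to frame.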
I have seen some hacks done with a desktop or notebook computer, such as a quadcopter that avoids obstacles (google for it), and I think it may also be possible with a Raspberry Pi (which is actually a small PC). It's worth researching; you can do great things with this. It IS possible, but you may need a starting point ;)
The Kinect cannot communicate directly with the Arduino. What I suggest is using some sort of computer, e.g. a Raspberry Pi or a netbook, to run Processing (http://processing.org/), which could then send commands to the Arduino to move the motors in a given direction.
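A minimal sketch of the computer side, written in Python rather than Processing (Processing's own Serial library can do the same from a sketch). It assumes the hand's x position has already been obtained from the Kinect, e.g. via Processing or OpenCV, and that the Arduino sketch reads single command bytes. The command letters, function names, 9600 baud rate and the 2-second wait are hypothetical choices, and sending requires the pyserial package. The port is opened once and kept open, because opening the serial port typically resets an Uno-style board.

```python
import time


def command_for_hand_x(hand_x, frame_width, dead_zone=0.15):
    """Map a hand's x pixel coordinate to a one-byte motor command.

    b"L" = turn left, b"R" = turn right, b"S" = stop (hand near the centre).
    These bytes are hypothetical; the Arduino sketch must use the same ones.
    """
    # Normalise to -1.0 (left edge) .. +1.0 (right edge) of the frame.
    half = frame_width / 2
    offset = (hand_x - half) / half
    if offset < -dead_zone:
        return b"L"
    if offset > dead_zone:
        return b"R"
    return b"S"


def send_commands(port_name, commands, baud=9600):
    """Write command bytes to the Arduino over USB serial (requires pyserial)."""
    import serial  # pyserial; imported here so the mapping above works without it

    with serial.Serial(port_name, baud, timeout=1) as link:
        time.sleep(2)  # give the board time to finish resetting after the port opens
        for cmd in commands:
            link.write(cmd)
```

For example, `send_commands("/dev/ttyACM0", [command_for_hand_x(100, 640)])` would send `b"L"` on a typical Linux setup; the port name varies (on Windows it looks like `COM3`). On the Arduino side, the sketch would read each byte with `Serial.read()` and drive the motors accordingly.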
Check out this page about skeleton tracking with the Kinect and Processing: http://learning.codasign.com/index.php?title=Skeleton_Tracking_with_the_Kinect
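Once a skeleton tracker gives you joint positions, recognizing a simple gesture can come down to a few comparisons. The sketch below assumes the joints arrive as a dictionary of (x, y, z) tuples in metres with y increasing upward; the joint names, the 0.1 m margin and the function name are hypothetical, and depending on the library, "left" and "right" may be mirrored relative to the user.

```python
from typing import Dict, Tuple

# (x, y, z) position; y is assumed to increase upward, units assumed to be metres.
Joint = Tuple[float, float, float]


def classify_gesture(joints: Dict[str, Joint], margin: float = 0.1) -> str:
    """Rule-based check for which hands are raised clearly above the head."""
    head_y = joints["head"][1]
    # A hand counts as raised only if it is at least `margin` above the head.
    left_up = joints["left_hand"][1] > head_y + margin
    right_up = joints["right_hand"][1] > head_y + margin
    if left_up and right_up:
        return "both_hands_up"
    if left_up:
        return "left_hand_up"
    if right_up:
        return "right_hand_up"
    return "none"
```

The returned label could then be turned into a motor command and sent to the Arduino as described above; more complex gestures (waves, swipes) would need to look at joint positions over several frames rather than a single snapshot.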