Virtual hand from the SOFA project
What can you do with hand tracking?
The ability to track the position (x/y/z) and orientation (yaw/pitch/roll) of the hands enables many things:
- A very natural way to interact with virtual objects. Fine-grained interaction might also require understanding finger positions and grab/release gestures, but just being able to naturally reach out to a virtual object seems to me to be really important.
- Another way to choose between menu items and other user-interface options. A 'yes or no' question could ideally be answered in many ways: by touching a virtual 'yes' button in the air, by nodding the head to indicate 'yes', by saying 'yes' into a microphone, or even by showing a thumbs-up sign. Just as you can answer a 'yes or no' question on a computer screen with the mouse, keyboard or speech, you should be able to do the same and more in a virtual world.
- Locating the hands will continue to be useful in fitness or sports games, as many of the Microsoft Kinect games have demonstrated. Imagine being a quarterback whose arms are tracked so the game can understand the trajectory and velocity of your throw as well as your release point.
Hand tracking approaches
Historically, hand and arm tracking involved wearing body suits, special markers, or active sensors on the body, such as this one from XSENS used to create the movie "Ted":
An alternative approach uses an array of cameras in a reasonably sterile space such as the demonstration below from Organic Motion:
While these demos are very nice, they require lots of equipment, clean spaces and special gear. The XSENS demo uses active sensors, which means a user needs a power source (e.g. batteries) and must strap the sensors onto the body - great for specialized applications, but not simple and inexpensive enough for consumer use.
Tracking technologies that are certainly more consumer-friendly are those that use one or two cameras. The Kinect uses a structured-light approach, which projects a known light pattern onto the objects to be tracked and then analyzes it with a camera. Dual-camera solutions such as this and this are essentially 3D cameras: they correlate features across the views of cameras with known relative positions and then triangulate the position of the object. Leap Motion uses three light sources and two cameras in a compact sensor. Other sensors use the time-of-flight method which, similar to how radar works, measures the distance of an object from the sensor. Below is a nice demo from the Intel Perceptual Computing initiative which uses a time-of-flight sensor:
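To make the camera-based approaches a bit more concrete, here is a minimal sketch of the geometry behind them: a rectified dual-camera setup recovers depth from the disparity between the two views, while a time-of-flight sensor recovers it from the round trip of a light pulse. The focal length, baseline and timing values below are illustrative placeholders, not the specifications of any of the sensors mentioned above.

```python
# Minimal sketch of the two depth-sensing principles described above.
# All numbers are illustrative placeholders, not real sensor specifications.

SPEED_OF_LIGHT_M_S = 299_792_458.0

def stereo_depth(disparity_px: float, focal_length_px: float, baseline_m: float) -> float:
    """Depth (meters) of a feature matched across a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("A visible feature must shift between the two camera views")
    return focal_length_px * baseline_m / disparity_px

def time_of_flight_depth(round_trip_s: float) -> float:
    """Distance (meters) from the round-trip time of a light pulse, radar-style."""
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0

# A hand whose image shifts 40 px between two cameras 6 cm apart (700 px focal length)
print(stereo_depth(disparity_px=40.0, focal_length_px=700.0, baseline_m=0.06))  # ~1.05 m
# A light pulse that returns after 7 nanoseconds
print(time_of_flight_depth(round_trip_s=7e-9))                                  # ~1.05 m
```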
Some technologies such as the Kinect produce a depth map - essentially a representation of how far each pixel in the image is away from the camera - and then use sophisticated algorithms to turn this into a skeleton and try to understand what this skeleton means. Depending on how smart these algorithms are, the left hand may easily be mistaken for the right hand or, in the worst cases, for a leg or something else.
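As a rough illustration of the first step in that pipeline, the sketch below back-projects a single depth-map pixel into a 3D point using a pinhole camera model; fitting a skeleton to the resulting point cloud is the much harder part handled by those algorithms. The intrinsic parameters are assumed placeholder values, not the Kinect's actual calibration.

```python
# Minimal sketch: turning one depth-map pixel into a 3D point in camera space.
# fx, fy (focal lengths) and cx, cy (principal point) are placeholder intrinsics.

def depth_pixel_to_point(u: int, v: int, depth_m: float,
                         fx: float, fy: float, cx: float, cy: float) -> tuple:
    """Back-project pixel (u, v), lying depth_m meters from the camera, into (x, y, z)."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)

# A pixel near the center of a 640x480 depth image, reported 1.2 m from the sensor
print(depth_pixel_to_point(u=350, v=220, depth_m=1.2,
                           fx=580.0, fy=580.0, cx=320.0, cy=240.0))
# approximately (0.062, -0.041, 1.2)
```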
In contrast, marker-based technologies don't extract the full skeleton but rather the known locations of certain markers attached to specific parts of the body. Because each marker's identity is known in advance, there is little chance that the left and right hands are confused. However, markers have the disadvantage that they must be worn and calibrated.
Below are examples of marker tracking from WorldViz (shown together with a Sensics HMD)
and ART (where the markers are placed on the HMD and the controller to obtain precise position and orientation).
Marker tracking on an HMD, from IEEE VR 2013
Motion tracking with a body suit, photographed at SIGGRAPH
In the next post, we will discuss the applicability of these technologies to virtual reality goggles.