The annual IEEE conference on virtual reality took place in Minneapolis last week. It was a unique opportunity to meet some of the leading VR researchers in the world, to showcase new product innovations and to exchange views on the present and future of VR.
I had the pleasure of sharing the stage in "the battle of the HMDs" panel session at the conference, together with David A Smith, Chief Innovation Officer for Lockheed Martin, Stephen Ellis who leads the Advanced Displays and Spatial Perception Laboratory at NASA and Dr. Jason Jerald of NextGen Interactions.
Below are a (slightly edited) version of my slide and a free-form version of the accompanying text. The audience was primarily VR researchers, so if one thinks of "R&D" as "Research and Development", this talk was aimed more at the research side than the development side.
I believe that there are three layers to what I call the "HMD value pyramid": baseline technology, sensing and context. As one would expect, the pyramid cannot stand without its baseline technology, which we will discuss shortly, but once baseline technology exists, additional layers of value build upon it. While the baseline technologies are mandatory, the real value in my opinion is in the layers above them. This is where I am hoping the audience will focus their research: making these layers work, and then developing methods and algorithms to make these capabilities affordable and thus widespread.
There are several components that form the baseline of the VR visual experience:
- Optics that adapt the displays to the appropriate viewing distance and provide the desired field of view, eye relief and other optical qualities.
- Ergonomics: a way to wear these optics and displays comfortably on the head, understanding that there are different head sizes and facial shapes, and to quickly adjust them to an optimal position.
- Wireless video, which allows disconnecting an HMD from a host computer, thus allowing freedom of motion without risk of cable entanglement
- Processing power, whether performing the simple tasks of controlling the displays, performing calculation-intensive activities such as distortion correction or ultimately allowing applications to run completely inside the HMD without the need to connect to an external computing device.
Once the underlying technologies of the HMD are in place, we can move to the next layer, which I think is more interesting and more valuable: the sensory layer. I've spoken and written about this before: beyond a head-worn display, the HMD is a platform. It is a computing platform but it is first and foremost a sensory platform that is uniquely positioned to gather real-time information about the user. Some of the key sensors:
- Head orientation sensors (yaw/pitch/roll) that have become commonplace in HMDs
- Head position sensors (X/Y/Z)
- Position and orientation sensors for other body parts such as arms or legs
- Sensors to detect hands and fingers
- Eye tracking which provides real-time reporting of gaze direction
- Biometric sensors - heart rate, skin conductivity, EEG
- Outward-facing cameras that can provide a real-time image of the surroundings (whether visible, IR or depth)
- Inward-facing cameras that might provide clues about facial expressions
HMD eye tracking sensors are behind in the development curve. Yes, it is possible to buy excellent HMD-based eye trackers for $10K-$20K, but at these prices, only a few can afford them. What would it take to have a "good enough" eye tracker follow the price curve of the orientation tracker?
HMD-based hand and finger sensors are probably even farther behind in terms of robustness, responsiveness, detection field and analysis capabilities.
All these sensors could bring tremendous benefits to the user experience, to the ability of the application to effectively serve the user, or even to the ability of remote users to naturally communicate with each other while wearing HMDs. I think the challenge to this audience is to advance these frontiers: make these sensors work; make them work robustly (e.g. across many users, in many different conditions and not just in the lab) and then make them in such a way that they can be mass-produced inexpensively. Whether these required breakthroughs are in new types of sensing elements or in new computational algorithms is up to you to decide, but I can't overemphasize how important sensors are beyond the basic capabilities of HMDs.
Additional examples of context that is derived from multiple sensors: the user is walking; or jumping; or excited (through biometric data and pupil size); or smiling; or scared. The user is about to run into the sofa. The user is next to Joe. The user is holding a toy gun and is aiming at the window.
Sometimes, there are many ways to express the same thing. Consider a "yes/no" dialog box in Windows. The user might click on "yes" using the mouse, or "tab" over to the "yes" button and hit space, or press Alt-Y, or say "yes", and there are perhaps a few other ways to achieve the same result. Similarly in VR, the user might speak "yes", or might nod her head up and down in a "yes" gesture, or might give the thumbs-up sign, or might touch a virtual "yes" button in space. Context enables the multi-modal interface that focuses on "what" you are trying to express as opposed to exactly "how" you are doing it.
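To make the "what" versus "how" idea concrete, here is a minimal sketch in Python of how several input modes might resolve to a single intent. Everything in it is an assumption made for illustration: the modality and signal names, the `InputEvent` type, the `resolve_intent` function and the 0.7 confidence threshold are hypothetical and do not come from any particular HMD or SDK.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical table: several (modality, signal) pairs express the same intent.
INTENT_TABLE = {
    ("speech", "yes"): "confirm",
    ("head_gesture", "nod"): "confirm",
    ("hand_gesture", "thumbs_up"): "confirm",
    ("touch", "yes_button"): "confirm",
    ("speech", "no"): "reject",
    ("head_gesture", "shake"): "reject",
}


@dataclass
class InputEvent:
    modality: str      # which sensor or recognizer produced the event
    signal: str        # what that recognizer believes it detected
    confidence: float  # 0.0 to 1.0, as reported by the (assumed) recognizer


def resolve_intent(event: InputEvent, min_confidence: float = 0.7) -> Optional[str]:
    """Map an input event to an intent, ignoring low-confidence detections."""
    if event.confidence < min_confidence:
        return None  # not trusted enough to act on
    # Unknown (modality, signal) pairs also resolve to None.
    return INTENT_TABLE.get((event.modality, event.signal))


# The application only sees the intent, not how it was expressed.
spoken = resolve_intent(InputEvent("speech", "yes", 0.92))
nodded = resolve_intent(InputEvent("head_gesture", "nod", 0.81))
unsure = resolve_intent(InputEvent("head_gesture", "nod", 0.40))
```

In this sketch the application code would react to "confirm" or "reject" and never needs to know whether the user spoke, nodded or touched a button. The confidence threshold is a crude stand-in for the harder question of how much each sensor's data can be trusted.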
Context, of course, requires a lot of research. Which sensors are available? How much can their data be trusted? How can we minimize training? How can we reduce false negatives and false positives? This is yet another great challenge to this community.