
Sunday, June 11, 2017

How does eye tracking work?

Eye tracking could become a standard peripheral in VR/AR headsets. Tracking gaze direction can deliver many benefits. Foveated rendering, for instance, uses eye tracking data to optimize GPU resources: higher-resolution images are shown at the central vision area and lower-resolution images outside it. Understanding gaze direction can lead to more natural interaction. Additionally, people with certain disabilities can use their eyes instead of their hands. Eye tracking can detect concussions in athletes and can even help people see better. It can also help advertisers understand what interests customers.

Eye tracking is complex. Scientists and vendors have spent many years perfecting algorithms and techniques.

But how does it work? Let's look at a high-level overview.

Most eye tracking systems use a camera pointed at the eye and infrared (IR) light. The IR light illuminates the eye, and a camera sensitive to IR analyzes the reflections. The wavelength of the light is often 850 nanometers - just outside the visible spectrum of 390 to 700 nanometers - so the eye can't detect the illumination but the camera can.

We see the world when our retina detects light entering through the pupil. IR light also enters the eye through the pupil. Outside the pupil area, light does not enter the eye; instead, it reflects back towards the camera. Thus, the camera sees the pupil as a dark area - no reflection - whereas the rest of the eye is brighter. This is "dark pupil" eye tracking. If the IR light source is near the camera's optical axis, it can reflect from the back of the eye, and the pupil then appears bright. This is "bright pupil" eye tracking, similar to the "red eye" effect in flash photography. Whether we use dark or bright pupil, the key point is that the pupil looks different from the rest of the eye.
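As a rough illustration, a dark-pupil tracker can locate the pupil by looking for the largest dark blob in the IR image. The Python/OpenCV sketch below is illustrative only - the threshold and minimum blob area are made-up values, and real trackers use far more robust methods:

```python
# A minimal dark-pupil detection sketch using OpenCV (assumed available).
# The threshold and minimum blob area are illustrative, not tuned values.
import cv2
import numpy as np

def find_pupil_center(ir_frame_gray, threshold=40, min_area=100):
    """Locate the pupil in an IR eye image as the largest dark blob."""
    # The pupil returns almost no IR to the camera, so it appears dark;
    # invert the threshold so the pupil becomes a white blob in the mask.
    _, mask = cv2.threshold(ir_frame_gray, threshold, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    if cv2.contourArea(largest) < min_area:
        return None
    # Use the blob centroid as the pupil center in image coordinates.
    m = cv2.moments(largest)
    return (m["m10"] / m["m00"], m["m01"] / m["m00"])
```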

The image captured by the camera is then processed to determine the location of the pupil, which allows estimating the direction of gaze of the observed eye. Processing is sometimes done on a PC, phone or other connected processor; other vendors have developed special-purpose chips that offload the processing from the main CPU. If eye tracking cameras observe both eyes, the gaze readings from both eyes can be combined to estimate the fixation point of the user in real or virtual 3D space.
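If both eyes are tracked, the two gaze rays can be combined to estimate a 3D fixation point. Here is a minimal sketch of one common approach - taking the midpoint of the shortest segment between the two rays - assuming per-eye gaze origins and directions are already available from the previous step:

```python
# A sketch (not any vendor's API) of turning two per-eye gaze rays into a
# single 3D fixation estimate: the point closest to both rays.
import numpy as np

def fixation_point(origin_l, dir_l, origin_r, dir_r):
    """Midpoint of the shortest segment between the two gaze rays."""
    d_l = dir_l / np.linalg.norm(dir_l)
    d_r = dir_r / np.linalg.norm(dir_r)
    w0 = origin_l - origin_r
    a, b, c = d_l @ d_l, d_l @ d_r, d_r @ d_r
    d, e = d_l @ w0, d_r @ w0
    denom = a * c - b * b
    if abs(denom) < 1e-9:
        # Rays are (nearly) parallel: project the left origin onto the right ray.
        t_l, t_r = 0.0, e / c
    else:
        t_l = (b * e - c * d) / denom
        t_r = (a * e - b * d) / denom
    p_l = origin_l + t_l * d_l   # closest point on the left-eye ray
    p_r = origin_r + t_r * d_r   # closest point on the right-eye ray
    return (p_l + p_r) / 2.0
```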

There are other eye tracking approaches that are less popular. For instance, some have tried to detect movements of the eye muscles. This method provides high-speed data but is less accurate than camera-based tracking.

How often should we calculate the gaze direction? The eyes have several types of movements. Saccadic movements are fast and happen when we shift gaze from one area to another. Vergence shifts are small movements that help in depth perception; they aim to get the image of an object to fall on corresponding spots on both retinas. Smooth pursuit is how the eyes move when we track a moving object. To capture saccadic movements, one needs to sample the eye hundreds of times per second. But saccadic movements do not provide gaze direction, so they are interesting for research applications but not for mass-market eye tracking. Vergence and smooth pursuit movements are slower, and tens of samples per second are often enough. Since many VR applications want to have the freshest data, there is a trend to track the eyes at the VR frame rate.
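One simple way to see the difference between these movement types is a velocity-threshold classifier: saccades produce very high angular velocities, while fixations and smooth pursuit do not. The sketch below is a toy example; the sample format and the 30 degrees-per-second threshold are assumptions made for illustration:

```python
# A toy velocity-threshold classifier sketch. Gaze samples are assumed to be
# (timestamp_s, yaw_deg, pitch_deg) tuples; the threshold is illustrative.
def label_samples(samples, saccade_threshold_deg_s=30.0):
    labels = ["fixation"]                      # first sample has no velocity
    for (t0, y0, p0), (t1, y1, p1) in zip(samples, samples[1:]):
        dt = t1 - t0
        # Approximate angular speed from the change in yaw and pitch.
        speed = ((y1 - y0) ** 2 + (p1 - p0) ** 2) ** 0.5 / dt
        labels.append("saccade" if speed > saccade_threshold_deg_s else "fixation")
    return labels
```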

Eye tracking systems need to compensate for movements of the camera relative to the eye. For instance, a head-mounted display can slide and shift relative to the eyes. One popular technique is to use reflections of the light source from the cornea. These reflections are called Purkinje reflections. They change little during eye rotation and can serve as an anchor for the algorithm. Other algorithms try to identify the corners of the eye as an anchor point.
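A common form of this compensation is to work with the vector from the corneal reflection (glint) to the pupil center rather than the raw pupil position: a small shift of the headset moves both features together, so the vector between them changes little. A minimal sketch, reusing the pupil detector above and assuming a hypothetical find_glint_center helper (a bright-spot detector):

```python
# Sketch of the pupil-minus-glint feature for slippage compensation.
# find_glint_center is a hypothetical helper that locates the corneal
# reflection (the brightest small spot near the pupil).
def gaze_feature(ir_frame_gray):
    pupil = find_pupil_center(ir_frame_gray)
    glint = find_glint_center(ir_frame_gray)
    if pupil is None or glint is None:
        return None
    # The vector from glint to pupil is largely insensitive to small
    # shifts of the camera relative to the eye.
    return (pupil[0] - glint[0], pupil[1] - glint[1])
```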

There are other variables that an algorithm needs to compensate for. The eye is not a perfect sphere. Some people have bulging eyes and others have inset eyes. The location of the eye relative to the camera is not constant between users. These and other variables are often addressed during a calibration procedure. Simple calibration presents a cross on the screen at a known location and asks the user to fixate on it. By repeating this for a few locations, the algorithm calibrates the tracker to a user.
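One simple way to implement such a calibration is to fit a low-order polynomial mapping from the tracker's feature (for example, the pupil-glint vector) to the known on-screen target positions. The sketch below uses a least-squares fit; real calibration procedures typically add outlier rejection, per-eye models and more calibration points:

```python
# A minimal calibration sketch: fit a quadratic mapping from the tracked
# feature (x, y) to known on-screen target positions via least squares.
import numpy as np

def fit_calibration(features, targets):
    """features: Nx2 pupil-glint vectors; targets: Nx2 screen coordinates."""
    f = np.asarray(features, dtype=float)
    t = np.asarray(targets, dtype=float)
    x, y = f[:, 0], f[:, 1]
    # Design matrix with quadratic terms: [1, x, y, x*y, x^2, y^2]
    A = np.column_stack([np.ones_like(x), x, y, x * y, x ** 2, y ** 2])
    coeffs, *_ = np.linalg.lstsq(A, t, rcond=None)
    return coeffs                              # 6x2 coefficient matrix

def apply_calibration(coeffs, feature):
    x, y = feature
    row = np.array([1.0, x, y, x * y, x ** 2, y ** 2])
    return row @ coeffs                        # estimated (screen_x, screen_y)
```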

Beyond the algorithm, the optical system of the tracker presents extra challenges. It aims to be lightweight. It tries to avoid imposing constraints on the optics used to present the actual VR/AR image to the user. It needs to work with a wide range of facial structures. For a discussion on optical configurations for eye tracking, please see here.

Eye trackers used to be expensive. This was not the result of expensive components, but rather of a limited market. When only researchers bought eye trackers, companies charged more to cover their R&D expenses. As eye tracking moves into the mainstream, eye trackers will become inexpensive.

Sunday, July 19, 2015

Beyond gaming: virtual reality helps people with vision disabilities

Over the past two years, Sensics has been working with our customer Visionize and a group of researchers from the Wilmer Eye Institute at Johns Hopkins University on applying the group's combined expertise towards creating a solution to help people with vision disabilities. The Los Angeles Times published a story today about one of the models in this line. It's a good opportunity to describe this low-vision project in greater detail and shine a light on non-gaming applications of consumer VR.

A prototype of the Visionize low-vision system is used to magnify the image of the boy above the monitor. The monitor - used for illustration purposes only - shows a screencast of what the user of the system sees in each eye.
Low vision is a common problem. It is estimated that about 2.5 million people in the United States - over 0.75% of the population - suffer from low vision (defined as best corrected visual acuity of less than 20/60 in the better-seeing eye). While low vision is typically associated with aging, there are also a large number of kids who are born with vision disabilities or develop them in their early years. Additionally, hundreds of thousands of additional patients enter the low-vision population every year.

The impact of low vision ranges from difficulty in reading to difficulties in recognizing people, places and objects. Disease progression can often be controlled, but the existing damage is permanent. Macular degeneration, a disease that destroys the area of central vision (the fovea), is the most common low-vision pathology. Because the resolution at the fovea is higher than in the rest of the eye, losing it reduces the overall visual acuity.

Optical or digital magnifiers are popular with the low vision population and can be effective for static activities such as watching TV or reading. However, they are more challenging for use in dynamic activities such as walking:
A magnifier hides part of the text


  • They might be too large or cumbersome to hold
  • If they magnify the entire visual field, the user loses peripheral vision in a significant way. For instance, if a person using a 5x magnifier has a total field of vision of 100 degrees and the magnifier covers their entire visual field, then only 20 degrees of the real world are mapped into those 100 degrees, preventing effective movement.
  • If just part of the image is magnified, there is part of the scene that is completely hidden underneath the magnification, as is illustrated in the diagram on the right

To address this problem, we developed a non-linear magnification algorithm that magnifies the image at the point of interest but creates a continuous image so that nothing is lost at the edges. In the model covered in the LA Times, a Samsung Gear VR system is used. The on-board camera provides a live view of the environment, and the customized algorithms perform real-time 60 FPS enhancement to present a smartly-magnified image to the user. Parameters such as the size and amount of magnification of the "bubble" can be easily controlled. In some cases, these depend on the viewing conditions; in others, they can be customized to the particular vision disability of the user.
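To give a feel for what such a remapping can look like, here is an illustrative radial "bubble" magnifier in Python/OpenCV. It is not the Visionize algorithm - just a simple sketch in which the magnification is highest at the bubble center and the mapping returns to the identity at the bubble edge, so the image stays continuous with its unmagnified surroundings:

```python
# Illustrative radial "bubble" magnification via a remap. Pixels near the
# bubble center are magnified; the source radius returns to the output
# radius at the bubble edge, so there is no seam with the surroundings.
import cv2
import numpy as np

def bubble_magnify(image, center, radius, magnification=2.0):
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    dx, dy = xs - center[0], ys - center[1]
    r = np.sqrt(dx * dx + dy * dy)
    with np.errstate(divide="ignore", invalid="ignore"):
        # Source radius: slope 1/m at the center (center magnified m times),
        # rising back to r at the bubble edge for a continuous image.
        src_r = r / magnification + (1.0 - 1.0 / magnification) * (r * r) / radius
        scale = np.where(r < radius, np.where(r > 0, src_r / r, 1.0), 1.0)
    map_x = (center[0] + dx * scale).astype(np.float32)
    map_y = (center[1] + dy * scale).astype(np.float32)
    return cv2.remap(image, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```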

More advanced models use different types of HMDs and have been tested with multiple cameras and other powerful additions. More about this, as well as additional vision enhancements, will perhaps be covered at a future opportunity.

An illustration of the magnified bubble can be seen in the diagram below:


and a video illustrating how to operate an ATM with the system (as well as other examples) can be seen on the Visionize site:


These days, gaming gets the majority of the press attention for virtual reality, but other applications exist. For us, the ability to work on a product that genuinely improves the quality of life for people with vision disabilities is truly heart-warming.