Eye tracking could become a standard peripheral in VR/AR headsets. Tracking gaze direction can deliver many benefits. Foveated rendering, for instance, uses eye tracking data to optimize GPU resources: images are rendered at higher resolution in the area of central vision and at lower resolution outside it. Understanding gaze direction can also lead to more natural interaction. Additionally, people with certain disabilities can use their eyes instead of their hands. Eye tracking can detect concussions in athletes and can even help people see better. It can also help advertisers understand what interests customers.
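To make the foveated rendering idea concrete, here is a minimal Python sketch that assigns a rendering scale to each screen tile based on its distance from the gaze point, then estimates the resulting shading cost. The zone radii, the three scale levels and the tile size are hypothetical values chosen for illustration; real systems typically define zones in degrees of visual angle and rely on GPU features such as variable-rate shading rather than a loop like this.

```python
import math

# Hypothetical zone radii in pixels. Real systems usually specify these
# in degrees of visual angle around the gaze point.
FOVEA_RADIUS_PX = 200
MID_RADIUS_PX = 500


def resolution_scale(px: float, py: float, gaze_x: float, gaze_y: float) -> float:
    """Per-axis rendering scale for a screen location, given the gaze point."""
    d = math.hypot(px - gaze_x, py - gaze_y)
    if d <= FOVEA_RADIUS_PX:
        return 1.0   # full resolution where the user is looking
    if d <= MID_RADIUS_PX:
        return 0.5   # half resolution in each axis
    return 0.25      # quarter resolution in the far periphery


def relative_shading_cost(width: int, height: int,
                          gaze_x: float, gaze_y: float, tile: int = 16) -> float:
    """Shading work relative to rendering the whole frame at full resolution.

    The frame is split into tiles; each tile uses the scale at its center,
    and its cost is proportional to the square of the per-axis scale.
    """
    total = 0.0
    tiles = 0
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            s = resolution_scale(tx + tile / 2, ty + tile / 2, gaze_x, gaze_y)
            total += s * s
            tiles += 1
    return total / tiles


print(relative_shading_cost(1920, 1080, 960, 540))
```

With the gaze at the center of a 1920x1080 frame, most tiles fall in the low-resolution periphery, so the estimated cost comes out well below a quarter of full-resolution rendering under these assumed radii.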
Eye tracking is complex. Scientists and vendors have spent many years perfecting algorithms and techniques.
But how does it work? Let's look at a high-level overview.
Most eye tracking systems use a camera pointing at the eye and infrared (IR) light. IR light illuminates the eye, and a camera sensitive to IR analyzes the reflections. The wavelength of the light is often 850 nanometers, just outside the visible spectrum of roughly 390 to 700 nanometers. The eye can't detect the illumination, but the camera can.
We see the world when our retina detects light entering through the pupil. IR light also enters the eye through the pupil. Outside the pupil area, light does not enter the eye; instead, it reflects back toward the camera. Thus, the camera sees the pupil as a dark area (no reflection), whereas the rest of the eye is brighter. This is "dark pupil eye tracking". If the IR light source is near the camera's optical axis, light entering the pupil can reflect from the back of the eye toward the camera. In this case, the pupil appears bright. This is "bright pupil eye tracking", and it resembles the "red eye" effect in flash photography. Whether we use dark or bright pupil tracking, the key point is that the pupil looks different from the rest of the eye.
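The dark/bright pupil principle suggests a very simple way to locate the pupil: treat the darkest (or brightest) pixels as pupil and take their centroid. The Python sketch below does this on a grayscale image stored as a list of rows. The function name, the fixed threshold and the image representation are assumptions for illustration; production trackers add filtering, reject eyelashes and shadows, and typically fit an ellipse to the pupil boundary.

```python
from typing import List, Optional, Tuple


def find_pupil_center(
    image: List[List[int]], threshold: int, bright_pupil: bool = False
) -> Optional[Tuple[float, float]]:
    """Estimate the pupil center as the centroid of pupil-like pixels.

    image: rows of grayscale intensities (0 = black, 255 = white).
    threshold: hypothetical fixed cutoff separating pupil from the rest of the eye.
    bright_pupil: if True, pupil pixels are those ABOVE the threshold
                  (bright pupil tracking); otherwise those BELOW it (dark pupil).
    Returns (x, y) in pixel coordinates, or None if no pixel qualifies.
    """
    sum_x = sum_y = count = 0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            is_pupil = value > threshold if bright_pupil else value < threshold
            if is_pupil:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None
    return (sum_x / count, sum_y / count)


# Synthetic 7x7 dark-pupil image: bright surroundings (200) with a dark 3x3 pupil.
image = [[200] * 7 for _ in range(7)]
for y in range(2, 5):
    for x in range(2, 5):
        image[y][x] = 20
print(find_pupil_center(image, threshold=100))  # (3.0, 3.0)
```

For a bright pupil image, the same centroid logic applies with the comparison flipped, which is why a single flag is enough here.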
The image captured by the camera is then processed to determine the location of the pupil, from which the gaze direction of the observed eye can be estimated. Processing is sometimes done on a PC, phone, or other connected processor. Other vendors have developed special-purpose chips that offload the processing from the main CPU. If eye tracking cameras observe both eyes, the gaze readings from both eyes can be combined to estimate the user's fixation point in real or virtual 3D space.
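Combining the two eyes can be sketched as a small geometry problem: each eye contributes a gaze ray (an eye position plus a direction), and because measured rays rarely intersect exactly, a common approach is to take the midpoint of the shortest segment between them. The sketch below assumes both rays are already expressed in the same 3D coordinate frame (in meters); the function names and the 64 mm eye separation in the example are hypothetical.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def along(p: Vec3, d: Vec3, t: float) -> Vec3:
    """Point p + t * d."""
    return (p[0] + t * d[0], p[1] + t * d[1], p[2] + t * d[2])


def fixation_point(left_eye: Vec3, left_dir: Vec3,
                   right_eye: Vec3, right_dir: Vec3,
                   eps: float = 1e-12) -> Optional[Vec3]:
    """Midpoint of the closest approach between the left and right gaze rays.

    Returns None when the rays are (nearly) parallel, e.g. when looking at a
    very distant object, because the fixation depth is then undefined.
    This sketch does not check that the closest points lie in front of the eyes.
    """
    w0 = sub(left_eye, right_eye)
    a = dot(left_dir, left_dir)
    b = dot(left_dir, right_dir)
    c = dot(right_dir, right_dir)
    d = dot(left_dir, w0)
    e = dot(right_dir, w0)
    denom = a * c - b * b
    if abs(denom) < eps:
        return None
    t = (b * e - c * d) / denom  # parameter along the left ray
    s = (a * e - b * d) / denom  # parameter along the right ray
    p = along(left_eye, left_dir, t)
    q = along(right_eye, right_dir, s)
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2, (p[2] + q[2]) / 2)


# Eyes 64 mm apart, both looking at a point 0.5 m straight ahead.
left, right = (-0.032, 0.0, 0.0), (0.032, 0.0, 0.0)
print(fixation_point(left, (0.032, 0.0, 0.5), right, (-0.032, 0.0, 0.5)))
```

The example prints a point very close to (0.0, 0.0, 0.5). Taking the midpoint rather than forcing an exact intersection keeps the estimate stable when noise makes the two rays slightly skew.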
There are other eye tracking approaches that are less popular. For instance, some have tried to detect movements of the eye muscles. This method provides high-speed data but is less accurate than camera-based tracking.
How often should we calculate the gaze direction? The eyes have several types of movements. Saccadic movements are fast and happen when we shift gaze from one area to another. Vergence shifts are small movements that help in depth perception; they aim to get the image of an object to fall on corresponding spots on both retinas. Smooth pursuit is the movement the eyes make when tracking a moving object. To track saccadic movements, one needs to sample the eye hundreds of times per second. However, saccadic movements do not provide gaze direction. Thus, they are interesting for research applications but not for mass-market eye tracking. Vergence and smooth pursuit movements are slower, and tens of samples per second are often enough. Since many VR applications want the freshest data, there is a trend to track the eyes at the VR frame rate.
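A quick back-of-the-envelope calculation shows why saccades call for much higher sampling rates. The sketch below assumes a short saccade lasting 30 ms (an illustrative value; actual saccade duration grows with amplitude) and counts the minimum number of samples guaranteed to land inside it at several rates, including a hypothetical 90 Hz headset frame rate.

```python
# Assumed duration of a short saccade in milliseconds. Real saccades vary
# with amplitude, so treat this as an illustrative value only.
SACCADE_MS = 30


def guaranteed_samples(event_ms: int, rate_hz: int) -> int:
    """Minimum number of samples that land inside an event of event_ms milliseconds.

    Depending on the sampling phase, one more sample may also land inside it.
    """
    return event_ms * rate_hz // 1000


for rate in (60, 90, 500, 1000):
    interval_ms = 1000 / rate
    print(f"{rate:>5} Hz: sample every {interval_ms:5.1f} ms, "
          f"at least {guaranteed_samples(SACCADE_MS, rate)} samples per saccade")
```

At 60 Hz and 90 Hz only one or two samples are guaranteed to fall inside such a saccade, too few to describe its shape, while 500 Hz and 1000 Hz give 15 and 30.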
Eye tracking systems need to compensate for movements of the camera relative to the eye. For instance, a head-mounted display can slide and shift relative to the eyes. One popular technique is to use reflections of the light source from the cornea. These reflections are called Purkinje reflections. They change little during eye rotation and can serve as an anchor for the algorithm. Other algorithms try to identify the corners of the eye as an anchor point.
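The corneal reflection is useful because a small shift of the headset moves the reflection (the "glint") and the pupil by roughly the same amount in the camera image, while an eye rotation moves the pupil relative to the glint. Many trackers therefore use the vector from the glint to the pupil center, an approach often called pupil center corneal reflection (PCCR), as their gaze feature. The sketch below is a simplified 2D illustration under a translation-only assumption; real headset slippage also changes the viewing geometry, which it ignores. The function names and coordinates are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]


def pccr_vector(pupil: Point, glint: Point) -> Point:
    """Glint-to-pupil vector in image pixels (pupil center minus glint center)."""
    return (pupil[0] - glint[0], pupil[1] - glint[1])


def shift(p: Point, dx: float, dy: float) -> Point:
    """Translate an image point, modelling a small slip of the headset."""
    return (p[0] + dx, p[1] + dy)


pupil, glint = (120.0, 80.0), (110.0, 85.0)
before = pccr_vector(pupil, glint)
# Headset slips: both features move by the same (hypothetical) offset in the image.
after_slip = pccr_vector(shift(pupil, 5.0, -3.0), shift(glint, 5.0, -3.0))
# Eye rotates: the pupil moves while the glint stays approximately put.
after_rotation = pccr_vector(shift(pupil, 8.0, 0.0), glint)
print(before, after_slip, after_rotation)  # (10.0, -5.0) (10.0, -5.0) (18.0, -5.0)
```

The slip leaves the vector unchanged while the rotation changes it, which is exactly the property that lets the glint act as an anchor.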
There are other variables that an algorithm needs to compensate for. The eye is not a perfect sphere. Some people have bulging eyes and others have inset eyes. The location of the eye relative to the camera is not constant between users. These and other variables are often addressed during a calibration procedure. A simple calibration presents a cross at a known location on the screen and asks the user to fixate on it. By repeating this for a few locations, the algorithm calibrates the tracker to the user.
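The calibration step can be sketched as a least-squares fit. Each fixation on a known target yields a pair: a feature measured by the tracker (for example, the pupil position or a pupil-to-glint vector in the camera image) and the target's screen position. The Python sketch below fits an affine map from a 2D feature to screen coordinates using the normal equations. The affine model, the function names and the synthetic example data are assumptions for illustration; real trackers often use higher-order polynomial or 3D eye-model fits.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def solve3(m: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [r] for row, r in zip(m, rhs)]  # augmented matrix
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("degenerate calibration points (e.g. collinear)")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def fit_axis(features: Sequence[Point], targets: Sequence[float]) -> List[float]:
    """Least-squares fit of target = c0 + c1*vx + c2*vy via the normal equations."""
    ata = [[0.0] * 3 for _ in range(3)]
    atb = [0.0] * 3
    for (vx, vy), t in zip(features, targets):
        row = (1.0, vx, vy)
        for i in range(3):
            atb[i] += row[i] * t
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    return solve3(ata, atb)


def calibrate(features: Sequence[Point],
              screen_points: Sequence[Point]) -> Callable[[Point], Point]:
    """Fit separate affine maps for screen x and y; return a feature-to-screen mapper."""
    cx = fit_axis(features, [p[0] for p in screen_points])
    cy = fit_axis(features, [p[1] for p in screen_points])

    def gaze_to_screen(v: Point) -> Point:
        vx, vy = v
        return (cx[0] + cx[1] * vx + cx[2] * vy, cy[0] + cy[1] * vx + cy[2] * vy)

    return gaze_to_screen


# Synthetic example: features generated from a hypothetical "true" mapping
# sx = 960 + 40*vx, sy = 540 + 45*vy, using five calibration targets.
targets = [(960, 540), (160, 140), (1760, 140), (160, 940), (1760, 940)]
features = [((sx - 960) / 40, (sy - 540) / 45) for sx, sy in targets]
gaze_to_screen = calibrate(features, targets)
print(gaze_to_screen((10.0, 4.0)))  # close to (1360.0, 720.0)
```

Using more targets than the three strictly needed lets the least-squares fit average out fixation noise; collinear targets are rejected because they cannot constrain both axes.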
Beyond the algorithm, the optical system of the tracker presents extra challenges. It should be lightweight. It should avoid imposing constraints on the optics used to present the actual VR/AR image to the user. It needs to work with a wide range of facial structures. For a discussion on optical configurations for eye tracking, please see here.
Eye trackers used to be expensive. This was not the result of expensive components, but rather of a limited market: when only researchers bought eye trackers, companies charged more to cover their R&D expenses. As eye trackers move into the mainstream, they will become inexpensive.