Wednesday, June 22, 2016

The Three (and a half) Configurations of Eye Trackers

Eye tracking could become a critical sensor in HMDs. In previous posts such as here, here and here we discussed some of the ways that eye trackers could be useful as input devices, as ways to reduce rendering load and more. But how are eye trackers installed inside an HMD? An appropriate placement of the eye tracking camera gives a quality image of the eye regardless of the gaze direction. If the eye image is bad, the tracking quality will be bad. It's truly a 'garbage in, garbage out' situation. The three typical ways to install a camera are:
  1. Underneath the optics
  2. Combined with the optics via a hot mirror (or an internal reflection)
  3. Inside the optics.
In this post, we describe these configurations.

Underneath the optics

This configuration is illustrated in the image on the right, which shows the Sensics zSight HMD with an integrated Ergoneers eye tracker. The tracker is the small camera that is visible underneath the left eyepiece. The angle at which the camera is installed is important. A camera that is perpendicular to the eye - practically looking straight into it - will typically get an excellent image. If the camera angle is steep, the anatomy of the eye - eyelids, eyelashes, inset eyes - gets in the way of getting a good image. If the eye relief (the distance from the cornea to the first element of the optics) is small, the camera needs to be placed at a steeper angle than if the eye relief were large. If the diameter of the optics is large, the camera needs to be placed lower, and thus at a steeper angle, than if the diameter were smaller. If the user wears glasses, an eye tracker that is placed underneath the optics might "see" the frame of the glasses instead of the eye. Having said that, the advantage of this approach is that it places few constraints on the optics: eye tracking cameras can usually be added below optics that were not designed to accommodate eye tracking.
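To get a feel for how eye relief and lens diameter drive the camera angle, here is a rough geometric sketch. The numbers and the small clearance below the lens rim are hypothetical, not taken from any specific HMD:

```python
import math

def camera_view_angle(eye_relief_mm, optics_diameter_mm, clearance_mm=2.0):
    """Approximate angle (measured from the optical axis) at which a camera
    mounted just below the lens rim views the eye. clearance_mm is a
    hypothetical extra offset between the lens edge and the camera center."""
    camera_drop = optics_diameter_mm / 2.0 + clearance_mm
    return math.degrees(math.atan2(camera_drop, eye_relief_mm))

# Shorter eye relief or larger optics force a steeper viewing angle:
print(round(camera_view_angle(eye_relief_mm=25, optics_diameter_mm=40), 1))  # ~41 deg
print(round(camera_view_angle(eye_relief_mm=15, optics_diameter_mm=50), 1))  # ~61 deg
```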

Eye tracker that is combined with the optics

Eye tracking cameras are often infra-red (IR) cameras that look at IR light reflected off the eye. As such, eye tracking cameras don't need visible light. This allows using what is called a hot mirror: a mirror that reflects IR light yet passes visible light. Consider the optical system shown to the right (copyright Sensics). Light from the screen (right side) passes through a lens, a hot mirror and another lens and reaches the eye. In contrast, if the eye is lit by an IR light source, IR light coming back from the eye is reflected off the hot mirror towards the upper part of the optical system. If a camera is placed there, it can have an excellent view of the eye without interfering with the optical quality. This configuration also gives more flexibility with regard to the camera being used. For instance, a larger camera (perhaps with a very high frame rate) would not be feasible if placed under the optics, but when placed separately from the optical system, such as above the mirror, it might fit. The downside of this configuration, other than the need to add the hot mirror, is that the optical system needs to leave enough room for the mirror, and this introduces a mechanical constraint that limits the options of the optical designer.

A variation on this design (what I referred to in the title as "the half" configuration) is having the IR light reflect off one of the optical surfaces, assuming that surface is coated with an IR-reflective coating. You can see this in the configuration on the right (also copyright Sensics). An optical element is curved and the IR light reflects off it into the camera. The image received by the camera might be somewhat distorted, but since that image is processed by an algorithm, the algorithm can compensate for the distortion. This solution removes the need for a hot mirror, but it does require a lens that is shaped in a way that reflects the IR light into the camera, as well as the additional expense of an IR coating.

Eye tracker integrated with the optics

The third configuration is even simpler. A miniature camera is used: a small hole is drilled through the optics and the camera is placed through it. The angle and location of the camera balance getting a good image of the eye against the need to not introduce a significant visual distraction. This is shown on the right as part of the eye tracking option of the Sensics dSight. This configuration gives excellent flexibility with regard to camera placement, but it does introduce some visual distraction and requires careful drilling of a hole through the optics.

Thursday, June 16, 2016

Notes from the Zero Latency Free-Roam VR Gameplay

I spent this past weekend in Australia working with Sensics customer Zero Latency towards their upcoming VR deployment at SEGA's Joypolis park in Tokyo. As part of the visit, I had the chance to go through the Zero Latency "Zombie Outbreak" experience and I thought I would share some notes from it. Zero Latency has been running this experience for quite some time and has had nearly 10,000 paying customers go through it. The experience is about an hour long, including about 10 minutes of pre-game briefing and equipment setup, 45 minutes of play and 5 minutes to take the equipment off and get the space ready for the next group. There are 6 customer slots per hour and everyone plays together in the same space at the same time. To date, Zero Latency has opened this to customers for about 29 hours a week - mostly on weekends - but will now be adding weeknights for a total of 40 game hours per week. A ticket costs 88 Australian dollars (about 75 US dollars) and there is typically a 6-week waiting list to get in. The experience is located in the Zero Latency office, a converted warehouse on the north side of Melbourne, Australia. Most of the warehouse is taken up by the rectangular game space, about 15 x 25 meters (roughly 50 x 80 feet), or 375 m² (about 4000 sq ft). The rest of the warehouse is used for two floors of engineering and administrative offices. One can peek through the office windows at the customers playing, and during the day you can constantly hear the shouts of excitement, squeals of joy and screams of horror coming from the game space.

I had a chance to go through the game twice: once with a group of Zero Latency employees before the space was opened to customers, and once as the 6th man of a 5-person group of paying customers late at night. Once customers come in, they are greeted by a 'game master' who provides a pre-mission briefing, explains the rules and explains the gaming gun. The gun can switch between a semi-automatic rifle and a shotgun. It has a trigger, a button to switch modes, a reload button and a pump to load bullets into the shotgun and load grenades when in rifle mode. I found the gun to be comfortable and balanced, and it seems that it has undergone many iterations before arriving at its current form. Players wear a backpack that includes a lightweight Alienware portable computer, a battery and a control box. The HMD and the gun have lighted spheres on them - reminiscent of the PlayStation Move - that are used to track the players and the weapons throughout the space. Players also wear Razer headsets that provide two-way audio so that players can easily communicate with each other as well as hear instructions from the game master.

The game starts with a few minutes of acclimation where players walk across the space to a virtual shooting range and spend a couple of minutes getting comfortable with operating their weapons. Then the game proper begins. It is essentially a simple game - players fight their way through the space while shooting zombies and other menacing characters, some of which shoot back. Every few minutes, players switch scenes by going through an elevator or teleportation waypoints, circles on the ground where each of the six players has to stand before the next scene can be reached. Sometimes you fight in an urban setting, sometimes on a rooftop, inside a cafeteria and so forth. Zombies can be killed by a direct shot to the head or multiple shots to the body. The players can also be killed, but then return to the game after about 10 seconds of appearing as a 'ghost'. Game 'power ups' are sometimes found throughout the space. For instance, during my gameplay I found an AK-47 assault rifle and later a heavy machine gun. At the end of the game, each player is shown their score and ranking, where the score is calculated based on the number of kills and the number of player deaths. That score sheet is emailed to players and is available for later viewing on the Web. The graphics are fine, and an attacking zombie is quite compelling when it is right in your face, but what I truly found compelling in the game was not so much the graphics and gameplay but rather a few other things:
  • Free-roam VR is great. The large space offers fantastic freedom of movement. You can see players move throughout the space, duck to take cover, turn around quickly with no hesitation at all. This generates an excellent feeling of immersion. You can truly feel that you could hide behind corners or walk anywhere with no apparent limitations. Of course, every space has physical limitations and Zero Latency has implemented a system where if you get too close to a player or a wall, something like a radar appears on your screen showing you at the center and the obstacles (players, walls) on it so that you know how to avoid them. If you get too close, the game pauses until you are farther away. This felt very natural. Throughout nearly two hours of active gameplay I think I brushed once or twice against another player but no more than that, even though players were in close proximity. Immersion is such that players don't notice people that are not players around them. In the current Zero Latency office, the bathroom for the office (the "Loo" in "Australian") is right across from the playing space so to get there you can either take a detour walking alongside the walls or go straight through the playing area where the players couldn't care less because they don't even know that you are walking by.
  • The social aspect is very compelling. This game is not about 6 individuals playing separately in a space. It is about 6 players acting as a team within the space. You can definitely hear "you take the right corridor and I'll take the left", or "watch your back" or "I need some help here!" shouts from one player to another. Players that work individually have little chance to stop the zombie invasion coming from all directions, but playing together gives you that chance.
  • Tracking - for both the head and the weapon - is very smooth, to the point where you don't think about it. Because multiple players are tracked in the space, you can see their avatars around you (sometimes with name tags). The graphics of players walking in the game need some work in my opinion, but you can clearly see where everyone is and what they are doing.
  • 45 minutes of game play go by very quickly and the game masters control the pace very well. As you can imagine, some groups take longer than others to get to the next waypoints, and the game uses waiting for elevators or helicopters as a way to condense or extend the total time. For instance, once you arrive in the cafeteria a sign shows up that the elevator will arrive in 100 seconds. I would imagine that if a group arrived earlier, they would have to wait longer for the elevator or if a group took more time, they would wait less.
The space itself is essentially empty save for the overhead tracking cameras. Thus, the same space and tracking system can be used for many different experiences. Unfortunately, we had to work from time to time and I did not have a chance to try some of the newer experiences that Zero Latency is working on, especially since the space was occupied by customers most of the time. I'm certainly looking forward to coming back and continuing to save the world.

Monday, June 6, 2016

Understanding Pixel Density and Eye-Limiting Resolution

If the human eye were a digital camera, its "data sheet" would say that it has a resolution of 60 pixels/degree at the fovea (the part of the retina where the visual acuity is highest). This is called eye-limiting resolution.

This means that if there were an image with 3600 pixels (60 x 60) and that image fell on a 1° x 1° area of the fovea, a person would not be able to tell it apart from an image with 8100 pixels (90 x 90) falling on the same 1° x 1° area of the fovea.

Note 1: the 60 pixels per degree figure is sometimes expressed as "1 arc-minute per pixel". Not surprisingly, an arc-minute is an angular measurement defined as 1/60th of a degree.

Note 2: this kind of calculation is the basis for what Apple refers to as a "retina display", a screen that when held at the right distance would generate this kind of pixel density on the retina.

If you have a VR goggle, you can calculate the pixel density - how many pixels per degree it presents to the eye - by dividing the number of pixels in a horizontal display line by the horizontal field of view provided by the eyepiece. For instance, the Oculus DK1 (yes, I know that was quite a while ago) had 1280 x 800 pixels across both eyes, so 640 x 800 pixels per eye, and with a monocular horizontal field of view of about 90 degrees it had a pixel density of 640 / 90, so just over 7 pixels/degree.

Not to pile on the DK1 (it had many good things, though resolution was not one of them), 7 pixels/degree is the linear pixel density. When you think about it in terms of pixel density per surface area, it is not just 8.5 times worse than the human eye (60 / 7 ≈ 8.5) but actually a lot worse (8.5 x 8.5, which is over 70). The following table compares pixel densities for some popular consumer and professional HMDs:

Product | Horizontal pixels per eye | Approx. horizontal FOV (degrees per eye) | Approx. pixel density (pixels/degree)
Oculus DK1 | 640 | 90 | 7.1
OSVR HDK | 960 | 90 | 10.7
HTC VIVE | 1080 | 90 | 12.0
Sensics dSight | 1920 | 95 | 20.2
Sensics zSight | 1280 | 48 | 26.6
Sensics zSight 1920 | 1920 | 60 | 32.0
Human fovea | - | - | 60.0
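For readers who want to reproduce these numbers, here is a minimal sketch of the calculation described above; the resolution and field-of-view values are the approximate ones from the table:

```python
def pixels_per_degree(horizontal_pixels_per_eye, horizontal_fov_deg):
    """Linear pixel density presented to the eye."""
    return horizontal_pixels_per_eye / horizontal_fov_deg

hmds = {
    "Oculus DK1":          (640, 90),
    "OSVR HDK":            (960, 90),
    "HTC VIVE":            (1080, 90),
    "Sensics dSight":      (1920, 95),
    "Sensics zSight":      (1280, 48),
    "Sensics zSight 1920": (1920, 60),
}

for name, (pixels, fov) in hmds.items():
    density = pixels_per_degree(pixels, fov)
    # The shortfall versus the eye compounds in both dimensions (areal density)
    areal_gap = (60.0 / density) ** 2
    print(f"{name:20s} {density:5.1f} px/deg  (~{areal_gap:4.0f}x fewer pixels per unit area than the fovea)")
```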

Higher pixel density allows you to see finer details - read text, see the grain of the leather on a car's dashboard, spot a target at a greater distance - and in general contributes to an increasingly realistic image.

Historically, one of the things that separated professional-grade HMDs from consumer HMDs was that the professional HMDs had higher pixel density. Let's simulate this using the following four images. Let's assume that the first image, taken from Unreal Engine's Showdown demo, is shown at the full 60 pixels/degree density. We can then re-sample it at half the pixel density - simulating 30 pixels/degree - and then half again (resulting in 15 pixels/degree) and half again (7.5 pixels/degree). Notice the stark differences as we go to lower and lower pixel densities.
[Image: full resolution, simulating 60 pixels/degree]

[Image: half resolution, simulating 30 pixels/degree]

[Image: quarter resolution, simulating 15 pixels/degree]

[Image: eighth resolution, simulating 7.5 pixels/degree]
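If you want to generate this kind of comparison yourself, a simple way is block-averaging followed by nearest-neighbor upsampling. This is only a sketch of the resampling described above; it assumes the source image is already loaded as a NumPy array, and the variable name showdown_rgb is hypothetical:

```python
import numpy as np

def simulate_lower_density(image, factor):
    """Approximate how an image would look at 1/factor of the pixel density:
    block-average each factor x factor group of pixels, then scale back up
    with nearest-neighbor so the output keeps the original dimensions."""
    h, w = image.shape[0], image.shape[1]
    h2, w2 = h - h % factor, w - w % factor        # crop to a multiple of factor
    cropped = image[:h2, :w2].astype(np.float32)
    # Block-average downsample
    small = cropped.reshape(h2 // factor, factor, w2 // factor, factor, -1).mean(axis=(1, 3))
    # Nearest-neighbor upsample back to the cropped size
    return np.repeat(np.repeat(small, factor, axis=0), factor, axis=1).astype(image.dtype)

# Hypothetical usage, with 'showdown_rgb' standing in for the 60 pixels/degree source:
# half    = simulate_lower_density(showdown_rgb, 2)   # ~30 pixels/degree
# quarter = simulate_lower_density(showdown_rgb, 4)   # ~15 pixels/degree
# eighth  = simulate_lower_density(showdown_rgb, 8)   # ~7.5 pixels/degree
```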

Higher pixel density for the visual system is not the same as higher pixel density for the screen, because pixels on the screen are magnified through the optics. The same screen could be magnified differently by two different optical systems, resulting in different pixel densities presented to the eye. It is true, though, that given the same optical system, a higher pixel density on the screen does translate to a higher pixel density presented to the eye. As screens get better and better, we will get increasingly closer to eye-limiting resolution in the HMD and thus to essentially photo-realistic experiences.

Tuesday, May 31, 2016

How binocular overlap impacts horizontal field of view

In a previous post, we discussed binocular overlap and how it relates to the overall horizontal (and diagonal) field of view. HMD manufacturers sometimes create partially overlapped systems (i.e. less than 100% overlap) to increase the overall horizontal field of view.

For example, imagine an eyepiece that provides a 90 degree horizontal field of view that subtends from 45° to the left to 45° to the right. If both left and right eyepieces point at the same angle, the overall horizontal field of view of the goggles is also from 45° to the left to 45° to the right, so a total of 90 degrees. When both eyepieces cover the same angles, as in this example, we call this 100% overlap.

But now let's assume that the left eyepiece is rotated a bit to the left so that it subtends from 50° to the left to 40° to the right. The monocular field of view is unchanged at 90°. If the right eyepiece is symmetrically rotated, it now covers from 40° to the left to 50° to the right. In this case, the binocular (overall) horizontal field of view is 100°, a bit larger than in the 100% case, and the overlap is 80° (from 40° to the left to 40° to the right), or 80/90 = 88.9%.

The following tables provide a useful reference for how the percentage of binocular overlap impacts the horizontal (and thus also the diagonal) field of view. We provide two tables, one for displays with a 16:9 aspect ratio (such as 2560x1440 or 1920x1080) and the other for a 9:10 aspect ratio (such as the 1080x1200 display in the HTC VIVE).

For instance, if we look at the 16:9 table we can read through an example of a 90° diagonal field of view, which translates into 82.1° horizontal and 52.2° vertical if the entire screen is visible. Going down the table we can see that at 100% overlap, the binocular horizontal field of view remains the same, i.e. 82.1°, and the diagonal also remains the same. However, if we choose 80% binocular overlap, the binocular horizontal field of view grows to 98.6°, the vertical stays the same and the diagonal grows to 103.2°.
[Table: binocular overlap vs. field of view for 16:9 aspect ratio displays]

[Table: binocular overlap vs. field of view for 9:10 aspect ratio displays]

For those interested, the exact math is below.
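Here is a minimal sketch of the math, assuming the overlap percentage is defined as the overlapped angle divided by the monocular horizontal field of view (as in the example above); it reproduces the 90° diagonal, 16:9 example from the table:

```python
import math

def monocular_h_v_from_diagonal(diag_deg, aspect_w=16.0, aspect_h=9.0):
    """Split a monocular diagonal FOV into horizontal/vertical FOV using the
    display aspect ratio. The split uses tangents, not a linear ratio."""
    d = math.hypot(aspect_w, aspect_h)
    t = math.tan(math.radians(diag_deg / 2.0))
    h = 2.0 * math.degrees(math.atan(t * aspect_w / d))
    v = 2.0 * math.degrees(math.atan(t * aspect_h / d))
    return h, v

def binocular_fov(mono_diag_deg, overlap_percent, aspect_w=16.0, aspect_h=9.0):
    """Binocular horizontal, vertical and diagonal FOV for a given overlap,
    where overlap is a percentage of the monocular horizontal FOV."""
    h, v = monocular_h_v_from_diagonal(mono_diag_deg, aspect_w, aspect_h)
    bin_h = h * (2.0 - overlap_percent / 100.0)
    bin_d = 2.0 * math.degrees(math.atan(math.hypot(
        math.tan(math.radians(bin_h / 2.0)),
        math.tan(math.radians(v / 2.0)))))
    return bin_h, v, bin_d

# The 16:9, 90-degree-diagonal example from the text:
print(binocular_fov(90, 100))  # ~ (82.1, 52.2, 90.0)
print(binocular_fov(90, 80))   # ~ (98.6, 52.2, 103.2)
```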

Sunday, May 8, 2016

Understanding Predictive Tracking

Image source: Adrian Boeing blog
In the context of AR and VR systems, predictive tracking refers to the process of predicting the future orientation and/or position of an object or body part. For instance, one might want to predict the orientation of the head or the position of the hand.

Why is predictive tracking useful?

One common use of predictive tracking is to reduce the apparent "motion to photon" latency, meaning the time between movement and when that movement is reflected in the drawn scene. Since there is some delay between movement and an updated display (more on the sources of that delay below), using an estimated future orientation and position when updating the display can shorten that perceived latency.

While a lot of attention has been focused on predictive tracking in virtual reality applications, it is also very important in augmented reality. For instance, if you are displaying a graphical overlay that should appear on top of a physical object seen through augmented reality goggles, it is important that the overlay stays on the object even when you rotate your head. The object might be recognized with a camera, but it takes time for the camera to capture the frame, for a processor to determine where the object is in the frame and for a graphics chip to render the new overlay. By using predictive tracking, you can get better apparent registration between the overlay and the physical object.

How does it work? 

If you saw a car travelling at a constant speed and you wanted to predict where that car will be one second in the future, you could probably make a fairly accurate prediction. You know the current position of the car, you might know (or can estimate) the current velocity, and thus you can extrapolate the position into the near future.

Of course if you compare your prediction with where the car actually is in one second, your prediction is unlikely to be 100% accurate every time: the car might change direction or speed during that time. The farther out you are trying to predict, the less accurate your prediction will be: predicting where the car will be in one second is likely much more accurate than predicting where it will be in one minute.

The more you know about the car and its behavior, the better chance you have of making an accurate prediction. For instance, if you were able to measure not only the velocity but also the acceleration, you can make a more accurate prediction.

If you have additional information about the behavior of the tracked body, this can also improve prediction accuracy. For instance, when doing head tracking, understanding how fast the head can possibly rotate and what common rotation speeds are can improve the tracking model. Similarly, if you are doing eye tracking, you can use the eye tracking information to anticipate head movements, as discussed in this post.

Sources of latency

The desire to perform predictive tracking comes from having some latency between actual movement and displaying an image that reflects that movement. Latency can come from multiple sources, such as:
  • Sensing delays. The sensors (e.g. gyroscope) may be bandwidth-limited and do not instantaneously report orientation or position changes. Similarly, camera-based sensors may exhibit delay between when the pixel on the camera sensor receives light from the tracked object to that frame being ready to be sent to the host processor.
  • Processing delays. Sensors are often combined using some kind of sensor fusion algorithm, and executing this algorithm can add latency.
  • Data smoothing. Sensor data is sometimes noisy and to avoid erroneous jitter, software or hardware-based low pass algorithms are executed.
  • Transmission delays. For example, if orientation sensing is done using a USB-connected device, there is some non-zero time between the data being ready to be read by the host processor and the completion of the data transfer over USB.
  • Rendering delays. When rendering a non-trivial scene, it takes some time to have the image ready to be sent to the display device.
  • Frame rate delays. If a display is operating at 100 Hz, for instance, there is a 10 mSec interval between successive frames. Information that arrives just after a particular pixel is drawn may need to wait until the next time that pixel is drawn on the display.
Some of these delays are very small, but unfortunately all of them add up and predictive tracking, along with other techniques such as time warping, are helpful in reducing the apparent latency.

How much to track into the future?

In two words: it depends. You will want to estimate the end-to-end latency of your system as a starting point and then tune the prediction interval to your liking.

It may be that you will need to predict several time points into the future at any given time. Here are some examples of why this may be required:
  • There are objects with different end-to-end delays. For instance, a hand tracked with a camera may have different latency than a head tracker, but both need to be drawn in sync in the same scene, so predictive tracking with different 'look ahead' times will be used.
  • In configurations where a single screen - such as a cell phone screen - is used to provide imagery to both eyes, it is often the case that the image for one eye appears with a delay of half a frame (e.g. half of 1/60 seconds, or approximately 8 mSec) relative to the other eye. In this case, it is best to use predictive tracking that looks ahead 8 mSec more for the delayed half of the screen. A small numerical sketch follows this list.
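To make the 'look ahead' idea concrete, here is a small sketch with entirely hypothetical latency numbers; the individual delay values are made up for illustration only:

```python
# All numbers below are hypothetical placeholders, used only to show how
# different objects and screen halves can call for different prediction intervals.
head_tracker_latency_ms = 2 + 1 + 1 + 12   # sensing + fusion/smoothing + USB + render & scan-out
hand_camera_latency_ms = 17 + 5 + 1 + 12   # camera exposure/readout + detection + transfer + render & scan-out

half_frame_ms = 0.5 * (1000.0 / 60.0)      # ~8.3 ms extra for the later-displayed half of a 60 Hz screen

print("head prediction interval:", head_tracker_latency_ms, "ms")
print("hand prediction interval:", hand_camera_latency_ms, "ms")
print("delayed screen half adds:", round(half_frame_ms, 1), "ms on top of the above")
```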

Common prediction algorithms

Here is a sampling of predictive tracking algorithms:
  • Dead reckoning. This is a very simple algorithm: if the position and velocity (or angular position and angular velocity) are known at a given time, the prediction assumes that the last known position and velocity are correct and that the velocity remains the same. For instance, if the last known position is 100 units and the last known velocity is 10 units/sec, then the predicted position 10 mSec (0.01 seconds) into the future is 100 + 10 x 0.01 = 100.1. While this is very simple to compute, it assumes that the last position and velocity are accurate (i.e. not subject to any measurement noise) and that the velocity is constant. Both of these assumptions are often incorrect.
  • Kalman predictor. This is based on the popular Kalman filter, which is used to reduce sensor noise in systems where there exists a mathematical model of the system's operation. See here for a more detailed explanation of the Kalman filter.
  • Alpha-beta-gamma. The ABG predictor is closely related to the Kalman predictor, but is less general and has simpler math, which we can explain here at a high level. ABG tries to continuously estimate both velocity and acceleration and use them in prediction. Because the estimates take actual data into account, they provide some measurement noise reduction. Configuring the parameters (alpha, beta and gamma) provides the ability to trade off responsiveness against noise reduction. If you'd like to follow the math, a sketch follows below.
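Here is a minimal sketch of dead reckoning and an alpha-beta-gamma predictor, in a standard textbook formulation rather than any specific product's implementation; the gain values are arbitrary examples:

```python
def dead_reckoning(position, velocity, dt):
    """Simplest predictor: assume the last measured velocity stays constant."""
    return position + velocity * dt


class AlphaBetaGammaPredictor:
    """Textbook alpha-beta-gamma filter used as a predictor.

    Estimates position, velocity and acceleration from noisy position
    measurements; predict() then extrapolates the estimate into the future.
    Larger alpha/beta/gamma gains favor responsiveness over noise reduction.
    """

    def __init__(self, alpha=0.5, beta=0.1, gamma=0.01):
        self.alpha, self.beta, self.gamma = alpha, beta, gamma
        self.x = 0.0  # position estimate
        self.v = 0.0  # velocity estimate
        self.a = 0.0  # acceleration estimate

    def update(self, measured_x, dt):
        # Propagate the previous estimate to the time of the new measurement
        x_pred = self.x + self.v * dt + 0.5 * self.a * dt * dt
        v_pred = self.v + self.a * dt
        # Correct using the measurement residual
        residual = measured_x - x_pred
        self.x = x_pred + self.alpha * residual
        self.v = v_pred + (self.beta / dt) * residual
        self.a = self.a + (2.0 * self.gamma / (dt * dt)) * residual

    def predict(self, lookahead):
        """Extrapolate the current estimate 'lookahead' seconds into the future."""
        return self.x + self.v * lookahead + 0.5 * self.a * lookahead * lookahead


# The dead-reckoning example from the text: 100 units, 10 units/sec, 10 mSec ahead
print(dead_reckoning(100.0, 10.0, 0.01))  # -> 100.1
```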

Summary

Predictive tracking is a useful and commonly-used technique for reducing apparent latency. Whether implemented simply or with sophistication, it requires some thought and analysis, but it is well worth it.

Saturday, April 30, 2016

VR and AR in 12 variations

I've been thinking about how to classify VR and AR headsets and am starting to look at them along three dimensions (no pun intended):
  1. VR vs AR
  2. PC-powered vs. Phone-powered vs. Self-powered. This looks at where the processing and video generation come from. Is it connected to a PC? Is it using a standard phone? Or does it embed processing inside the headset?
  3. Wide field of view vs. Narrow FOV
This generates a total of 2 x 3 x 2 = 12 options, as follows:


1: VR, PC-powered, Wide-field
Examples: Oculus, HTC Vive, Sensics dSight, OSVR HDK. This immersive VR configuration is used in many applications, though the most popular one is gaming. One attribute that separates consumer-grade goggles like the HTC Vive from professional-grade goggles such as the Sensics dSight is pixel density: the number of pixels per degree. You can think about this as the difference between watching a movie on a 50 inch standard-definition TV as opposed to a 50 inch HDTV.

2: VR, PC-powered, Narrow-field
Example: Sensics zSight 1920. With a given number of pixels per eye, narrow-field systems allow for much higher pixel density, which allows observing fine details or very small objects. For instance, imagine that you are training to land a UAV. The first step in landing a UAV is spotting it in the sky. The higher the pixel density is, the farther out you can spot an object of a given size. The zSight 1920 has about 32 pixels/degree whereas a modern consumer goggle like the HTC Vive has less than half that.

3: VR, Phone-powered, Wide-field
Examples: Samsung Gear VR, Google Cardboard, Zeiss VR One. This configuration, where the phone is inserted into some kind of holster, is used for general-purpose mobile VR. The advantages of this configuration are its portability as well as its low cost - assuming you already own a compatible phone. The downside of this configuration is that the processing power of a phone is inferior to a high-end PC, and thus the experience is more limited in terms of frame rate and scene complexity. Today's phones were not fully designed with VR in mind, so there are sometimes concerns about overheating and battery life.

4: VR, Phone-powered, Narrow-field
Example: LG 360 VR. In this configuration, the phone is not carried on the head but rather connected via a thin wire to a smaller unit on the head. The advantage of this configuration is that it can be very lightweight and compact. Also, the phone could potentially be used as an input pad. The downside is that the phone is connected via a cable. Another downside is often the cost: because this configuration does not use the phone's screen, it needs to include its own screens, which add to the cost. Yet another downside is that the phone's camera cannot be used for video see-through or for sensing.

5: VR, Self-powered, Wide-field
Examples: Gameface Labs, Pico Neo. These configurations aim for standalone, mobile VR without requiring a mobile phone. They potentially save weight by not including unnecessary phone components such as the casing and touch screen, but would typically be more expensive than phone-based VR for users who already own a suitable phone. They might have additional flexibility with regard to which sensors to include, camera placement and battery location. They are more difficult to upgrade relative to a phone-based VR solution, but the fact that the phone cannot be taken out might be an advantage for applications such as public VR, where a fully-integrated system that cannot be easily taken apart is a plus.

6: VR, Self-powered, Narrow-field
Example: Sensics SmartGoggles. These configurations are less popular today. Even the Sensics SmartGoggles, which included an on-board Android processor as well as wide-area hand sensors, were built with a relatively narrow field of view (60 degrees) because of the components available at the time.

7: AR, PC-powered, Wide-field
Example: Meta 2. In many augmented reality applications, people ask for a wide field so that, for instance, a virtual object that appears overlaid on the real world does not disappear when the user looks to the side. This configuration may end up being transient because in many cases the value of augmented reality is in being able to interact with the real world, and the user's motion when tethered to a PC is more limited. However, one might see uses in applications such as an engineering workstation.

8: AR, PC-powered, Narrow-field
I am not aware of good examples of this configuration. It combines the limitations of narrow-field AR with a tether to the PC.

9: AR, Phone-powered, Wide-field
This could become one of the standard AR configurations, just like phone-powered, wide-field VR is becoming a mainstream configuration. To get there, the processing power and the optics/display technology need to catch up with the requirements.

10: AR, Phone-powered, Narrow-field
Example: Seebright. In this configuration, a phone is worn on the head and its screen becomes the display for the goggles. Semi-transparent optics combine phone-generated imagery with the real world. I believe this is primarily a transient configuration until wide-field models appear.

11: AR, Self-powered, Wide-field
I am unaware of current examples of this configuration, though one would assume it could be very attractive because of the mobility on one hand and the ability to interact with a wide field of view on the other.

12: AR, Self-powered, Narrow-field
Examples: Microsoft HoloLens, Google Glass, Vuzix M300. There are two types of devices here. One is an 'information appliance' like Google Glass, designed to provide contextually-relevant information without taking over the field of view. These configurations are very attractive in industrial settings for applications like field technicians, workers in a warehouse or even customer service representatives needing a mobile, wearable terminal, often to connect with a cloud-based database. The second type of device, exemplified by the HoloLens, seeks to augment reality by placing virtual objects locked in space. I am sure the HoloLens would like to be a wide-field model; it is narrow-field at the moment because of the limitations of its current display technology.


Looking forward to feedback and comments.

Monday, April 11, 2016

Understanding Foveated Rendering

Foveated rendering is a rendering technique that takes advantage of the fact that the resolution of the eye is highest in the fovea (the central vision area) and lower in the peripheral areas. As a result, if one can sense the gaze direction (with an eye tracker), GPU computational load can be reduced by rendering an image that has higher resolution in the direction of gaze and lower resolution elsewhere.

The challenge in turning this from theory to reality is to find the optimal function and parameters that maximally reduce GPU computation while maintaining highest quality visual experience. If done well, the user shouldn’t be able to tell that foveated rendering is being used. The main questions to address are:
  1. In what angle around the center of vision should we keep the highest resolution? 
  2. Is there a mid-level resolution that is best to use? 
  3. What is the drop-off in “pixel density” between central and peripheral vision? 
  4. What is the maximum speed that the eye can move? This question is important because even though the eye is normally looking at the center of the image, the eye can potentially rotate so that the fovea is aimed at image areas with lower resolution.
Let's address these questions:

1. In what angle around the center of vision should we keep the highest resolution?

Source: Wikipedia
The macula portion of the retina is responsible for fine detail. It spans the central 18˚ around the gaze point, or 9˚ eccentricity (the angular distance away from the center of gaze). This would be the best place to put the boundary of the inner layer. Fine detail is processed by cones (as opposed to rods), and at eccentricities past 9˚ you see a rapid fall off of cone density, so this makes sense biologically as well. Furthermore, the “central visual field” ends at 30˚ eccentricity, and everything past that is considered periphery. This is a logical spot to put the boundary between the middle and outermost layer for foveated rendering.

2. Is there a mid-level resolution that is best to use?  and 3. What is the drop-off in “pixel density” between central and peripheral vision? 

Some vendors such as Sensomotoric Instruments (SMI) use an inner layer at full native resolution, a middle layer at 60% resolution, and an outer layer at 20% resolution. When selecting the resolution dropoff, it is important to ensure that at the layer boundaries, the resolution is at or above the eye’s acuity at that eccentricity. At 9˚ eccentricity, acuity drops to 20% of the maximum acuity, and at 30˚ acuity drops to 7.4% of the max acuity. Given this, it appears that SMI’s values work, but are generous compared to what the eye can see.
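As a quick sanity check of these numbers, here is a small sketch comparing the layer resolutions quoted above against the acuity figures quoted above; the boundaries and fractions are treated as illustrative values, not an authoritative vision model:

```python
# Layer setup as quoted above: each entry is (inner boundary eccentricity in
# degrees, rendered resolution as a fraction of native). Within a layer, the
# eye's requirement is highest at the layer's inner boundary.
layers = [
    (0.0, 1.00),   # inner layer: full resolution, 0 to ~9 degrees
    (9.0, 0.60),   # middle layer: 60% resolution, ~9 to ~30 degrees
    (30.0, 0.20),  # outer layer: 20% resolution, beyond ~30 degrees
]
# Relative acuity (fraction of peak) at those eccentricities, from the text
relative_acuity = {0.0: 1.00, 9.0: 0.20, 30.0: 0.074}

for inner_boundary, resolution in layers:
    needed = relative_acuity[inner_boundary]
    verdict = "OK" if resolution >= needed else "too low"
    print(f"layer starting at {inner_boundary:>4}°: renders {resolution:.0%}, "
          f"eye needs ~{needed:.1%} -> {verdict}")
```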

4. What is the maximum speed that the eye can move?


Source: Indiana University
A saccade is a rapid movement of the eye between fixation points. Saccade speed is determined by the distance between the current gaze and the stimulus. If the stimulus is as far as 50˚ away, then peak saccade velocity can get up to around 900˚/sec. This is important because you want the high resolution layer to be large enough so that the eye can't move to the lower resolution portion in the time it takes to get the gaze position and render the scene. So if the system latency is 20 msec, and the eye can move at up to 900˚/sec, the eye could move 18˚ in that time, meaning you would want the inner (highest resolution) layer radius to be greater than that; but that is only if the stimulus presented is 50˚ away from the current gaze.
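In code, the worst-case bound described above is a one-liner; the 20 msec latency is just an example value:

```python
def min_inner_layer_radius_deg(system_latency_s, peak_saccade_deg_per_s=900.0):
    """Worst-case angular distance the gaze can travel before an updated frame
    reaches the eye; a lower bound on the full-resolution layer radius."""
    return peak_saccade_deg_per_s * system_latency_s

print(min_inner_layer_radius_deg(0.020))  # 20 msec latency -> 18.0 degrees
```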

Additional thoughts

Source: Vision and Ocular Motility by Gunter Noorden

Visual acuity decreases on the temporal side (i.e. towards the ear) somewhat more rapidly than on the nasal side. It also decreases more sharply below and, especially, above the fovea, so that lines connecting points of equal visual acuity are elliptic, paralleling the outer margins of the visual field. Following this, it might make sense to render the different layers in ellipses rather than circles. The image shows the lines of equal visual acuity for the visual field of the left eye, so one can see that it extends farther to the left (temporal side) for the left eye; for the right eye, the visual field would extend farther to the right.

For additional reading

This paper from Microsoft research is particularly interesting. 
They approach the foveated rendering problem in a more technical way – optimizing to find layer parameters based on a simple but fundamental idea: for a given acuity falloff line, find the eccentricity layer sizes which support at least that much resolution at every eccentricity, while minimizing the total number of pixels across all layers. It explains their methodology though does not give their results for the resolution values and layer sizes.

Note: special thanks to Emma Hafermann for her research on this post

For additional VR tutorials on this blog, click here
Expert interviews and tutorials can also be found on the Sensics Insight page here