Wednesday, October 19, 2016

Peeking inside the Sensics Goggles for Public VR

Earlier this week, we made the Sensics Goggles for Public VR available for purchase on the OSVR Store. This is a limited pre-production run as we gear up for production of larger quantities.
We designed this product to address the needs of those that operate VR in public places such as theme parks, entertainment venues and shopping malls.
Goggles for public VR have different requirements than goggles for home use just like an exercise treadmill at a gym or health club needs to be different than a treadmill at home. Specifically, goggles for public VR need to be:
  • Durable, so that they withstand use by a large number of people. Unlike users of a VR goggle at home, users of a VR goggle at a public place might care less about handling it carefully.
  • Easy to clean, so that every user can get a clean, fresh feeling when wearing the goggles, regardless of who wore it before them.
  • Easy to maintain, in case something breaks.
  • Designed to allow maximum throughput of guests so as to maximize the number of people that can experience the attraction.
At the same time, the visual experience needs to be at least as good as goggles for home use because guests typically expect an experience beyond what they can get at home.
To achieve these goals, we used mass-produced 2160x1200 90 Hz OLED screens, high-quality dual-element optics with an individual focusing mechanism for each eye and an accurate 9-axis orientation tracker, and incorporated them into a novel, patent-pending design. Below are some of the highlights of this design. To illustrate them, we mostly use the CAD drawings because they make it easy to show internal parts.
Here is a CAD model of the entire unit (each part is colored differently in this model to make it stand out) next to the actual unit:
The back side of the unit has a cable clip to allow easy insertion and removal of cables as required. This ensures that cables don’t get in the way of the user.
The front of the unit includes a window that is transparent to IR. This allows inclusion of a Leap Motion camera inside the unit to facilitate natural interaction with the hands. The fact that the controller is embedded inside the goggle eliminates the need to route cables externally. This approach is superior to external mounting of the controller because when mounted externally, the controller might be easier to detach from the goggles. Note that in the CAD model, the IR window has been removed so that the Leap Motion unit is clearly visible.
To fit a wide range of users, the goggles were designed with adjustable optics. These allow people who normally wear eyeglasses to take them off and still see an excellent picture. Individual knobs (highlighted by the arrows in the CAD drawing, shown from a bottom view) allow focusing of each eye independently. It is also possible to design optics that have a large enough eye relief to accommodate glasses, but we chose adjustments in this particular design.
The face mask — the part that touches the user’s face — is easily removable and replaceable. It is designed with a groove (not shown in the picture) that allows an operator to quickly and accurately replace the mask when needed without requiring any special tools.
VR experiences can be very intense. For instance, guests to SEGA Joypolis run around in a special warehouse and shoot zombies. It is important to keep these guests cool and dry. That’s why the public VR goggles include dual silent fans that whisk away humidity and heat.
The diagram has arrows pointing to an air vent (one on each side) and the holes through which the air exits the goggles.
An important feature in the Sensics design is the ability to separate the “passive part” of the goggles (facemask and head strap) from the “active part” (electronics, optics, etc.). This feature provides several important benefits:
  1. It allows guests to don the passive part while waiting in line. They can adjust the fit to their heads and make sure the strap is comfortable. While doing so, the front part of the passive unit is completely open, so guests can still see the real world or take a selfie with the strap on. Only when the activity is about to begin does the operator attach the active part to the passive part.
  2. It permits various cleaning strategies for the passive part — the part that touches the head. For instance, an attraction operator can have many more passive parts than active parts and then clean the passive parts in batch at the end of the day.
  3. Separating the face mask from the active part of the goggles allows for multiple sizes of the face mask to fit kids, different facial structures and so forth.
The two parts of the goggles — active and passive — are illustrated in the photos below by Sensics team member Yaron Kaufman.
Detaching the passive part from the active part is done by pressing two buttons, one on each side of the goggles. The button is shown in yellow and highlighted by the arrow in the diagram. The clasp holding the two parts together is made of metal, and thus designed for numerous grab/release cycles.

Two additional parts are highlighted in the diagram: configurable buttons on the top right side of the goggles serve as programmable user-interface controls. This could be to increase/decrease volume, to pause the game, select a menu item or any other function. The mechanical design allows for one, two or three buttons per the preferences of the customer.
Last, an audio output jack appears on the bottom. The goggles can also support a permanent audio solution, which attaches where the large yellow ellipse is shown in the diagram to the right.
We put a lot of thought into designing this product. We hope you will get a chance to try it and appreciate its suitability to public VR applications.

Tuesday, August 9, 2016

Why did Sensics launch the OSVR Store?

Last week, the OSVR Store came online. It offers a range of OSVR-related products, services, accessories and components. It also contains useful information, most of it adapted from this blog.

But why did the Sensics team launch it?

The first answer that comes to mind is “to make money”. That’s an obvious reason, as Sensics is a for-profit company. We invest a lot in developing OSVR and would love to see returns on our investments.

But that’s not the only reason, nor perhaps the most important one. Here are some others.

We wanted the OSVR Store to be helpful to the VR enthusiast and hacker. That’s why we offer components: optics, tracking boards from various vendors, IR camera. More components are coming. Some will use those to upgrade an existing system, others to build a new one.

We wanted a place for hardware developers, a platform to market their innovations. If you make something OSVR-related, we invite you to sell it on the OSVR Store. It can be an OSVR-supported HMD. It can be an accessory or component that can help OSVR users. It can even be OSVR-related services. We strive to offer fair and simple terms. If you can build it, we can help you promote it. Drop us a note at to get started.

To me, OSVR has always been about choice. About democratizing VR. Not forcing users to buy everything from the same vendor. Encouraging applications to run on many devices. Supporting more than one operating system.

The OSVR Store is one more way to give everyone choice. Check it out.

Monday, August 1, 2016

OSVR - a Look Ahead


OSVR is an open source software platform and VR goggle. Sensics and Razer launched OSVR 18 months ago with the intent of democratizing VR. We wanted to provide an open alternative to walled-garden, single-device approaches.

It turns out that others share this vision. We saw exponential growth in participation in OSVR. Acer, NVIDIA, Valve, Ubisoft, Leap Motion and many others joined the ecosystem. The OSVR goggle – called the Hacker Development Kit – has seen several major hardware improvements. The founding team and many other contributors expanded the functionality of the OSVR software.

I’d like to describe how I hope to see OSVR develop given past and present industry trends.

Increased Device Diversity leads to more Choices for Customers


An avalanche of new virtual reality devices arrived. We see goggles, motion trackers, haptics, eye trackers, motion chairs and body suits. There is no slowdown in sight: many new devices will launch in the coming months. What is common to all these devices? They need software: game engine plugins, compatible content and software utilities. For device manufacturers, this software is not a core competency but ‘a necessary evil’. Without software, these new devices are almost useless.

At the same time, content providers realize it’s best not to limit their content to one device. The VR market is too small for that. The more devices you support, the larger your addressable market becomes.

With such rapid innovation, what was the best VR system six months ago is anything but that today. The dream VR system might be a goggle from one vendor, input devices from another and tracking from a third. Wait another six months and you’ll want something else. Does everything need to come from the same vendor? Maybe not. The lessons of home electronics apply to VR: you don’t need a single vendor to make all your devices.

This ‘mix and match’ ability is even more critical for enterprise customers. VR arcades, for instance, might use custom hardware or professional tracking systems. They want a software environment that is flexible and extensible. They want an environment that supports ‘off-the-shelf’ products yet extends for ‘custom’ designs.

OSVR Implications

OSVR already supports hundreds of devices. The up-to-date list is here: . Every month, device vendors, VR enthusiasts and the core OSVR team add new devices. Most OSVR plugins (extension modules) are open-sourced. Thus, it is often possible to use an existing plugin as a baseline for a new one. With every new device, we come closer to achieving universal device support.

A key OSVR goal is to create abstract device interfaces. This allows applications to work without regard to the particular device or technology choice. For example, head tracking can come from optical trackers or inertial ones. The option of a “mix and match” approach overcomes the risk of a single vendor lock-in. You don’t change your word processor when you buy a new printer. Likewise, you shouldn’t have to change your applications when you get a new VR device.
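
As a minimal sketch of what an abstract device interface buys the application, consider the toy example below. Every name in it (HeadTracker, InertialTracker, OpticalTracker, describe_view) is hypothetical and chosen for illustration; this is not the OSVR API, and the two "devices" simply return fixed readings.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Orientation:
    """Head orientation in degrees (hypothetical type, not an OSVR structure)."""
    yaw: float
    pitch: float
    roll: float


class HeadTracker(ABC):
    """Abstract interface: the application only ever sees this class."""

    @abstractmethod
    def read_orientation(self) -> Orientation:
        ...


class InertialTracker(HeadTracker):
    """Stand-in for an IMU-based device; returns a fixed reading here."""

    def read_orientation(self) -> Orientation:
        return Orientation(yaw=10.0, pitch=0.0, roll=0.0)


class OpticalTracker(HeadTracker):
    """Stand-in for a camera-based device; returns a fixed reading here."""

    def read_orientation(self) -> Orientation:
        return Orientation(yaw=12.0, pitch=1.0, roll=0.0)


def describe_view(tracker: HeadTracker) -> str:
    """Application code: identical no matter which tracker is plugged in."""
    o = tracker.read_orientation()
    return f"looking at yaw={o.yaw:.1f}, pitch={o.pitch:.1f}"
```

Swapping InertialTracker for OpticalTracker changes nothing in describe_view, which is the "new printer, same word processor" idea in code form.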

We try to make it easy to add OSVR support to any device. We worked with several goggle manufacturers to create plugins for their products. Others did this work themselves. Once such a plugin is ready, customers instantly gain access to all OSVR content. Many game engines and platforms, such as Unity, Unreal and SteamVR, immediately support it.

The same is also true for input and output peripherals such as eye trackers and haptic devices. If developers use an API from one peripheral vendor, they need to learn a new API for each new device. If developers use the OSVR API, they don’t need to bother with vendor-specific interfaces.

I would love to see more enhancements to the abstract OSVR interfaces. They should reflect new capabilities, support new devices and integrate smart plugins.

More People Exposed to more VR Applications in More Places


Just a few years ago, the biggest VR-centric conference of the year had 500 attendees. Most attendees had advanced computer science degrees. My company was one of about 10 presenting vendors. Today, you can experience a VR demo at a Best Buy. You can use a VR device on a roller coaster. With a $10 investment, you can turn your phone into a simple VR device.

In the past, to set up a VR system you had to be a geek with plenty of time. Now, ordinary people expect to do it with ease.

More than ever, businesses are experimenting with adopting VR. Applications that have always been the subject of dreams are becoming practical. We see entertainment, therapy, home improvement, tourism, meditation, design and many other applications.

These businesses are discovering that different applications have different hardware and software requirements. A treadmill at home is not going to survive the intensive use at a gym. Likewise, a VR device designed for home use is not suitable for use in a high-traffic shopping mall. The computing and packaging requirements for these applications are different from use to use. Some accept a high-end gaming PC, while others prefer inexpensive Android machines. I expect to see the full gamut of hardware platforms and a wide variety of cost and packaging options.

OSVR Implications

“Any customer can have a car painted any color that he wants so long as it is black”, said Henry Ford. I’d like to see a different approach, one that encourages variety and customization.

On the hardware side, Sensics is designing many products that use OSVR components. For instance, our “Goggles for public VR” use OSVR parts in an amusement park goggle. We also help other companies use OSVR components inside their own packages. For those that want to design their own hardware, the OSVR goggle is a good reference design.

On the software side, I would like to see OSVR expand to support more platforms. I’d like to see better Mac support and more complete coverage of Android and Linux platforms. I’d like to see VR work well on mid-range PCs and not limited to the newest graphics cards. This will lower the barriers to experience good VR and bring more people into the fold. I’d like to see device-specific optimizations to make the most of available capabilities. The OpenCV image processing library has optimizations for many processors. OSVR could follow a similar path.

Additionally, it is important to automate or at least simplify the end-user experience. Make it as close to plug-and-play as possible. The task of identifying available devices and configuring them should be quick and simple.

Simplicity is not limited to configuration. We’d like to see easier ways to choose, buy and deploy software.

Reducing Latency is Becoming Complex


Presence in VR requires low latency, and reducing latency is not easy. Low latency is also not the result of one single technique. Instead, many methods work together to achieve the desired result. Asynchronous time warp modifies the image just before sending it to the display. Predictive tracking lowers perceived latency by estimating future orientation. Direct mode bypasses the operating system. Foveated rendering reduces render complexity by understanding eye position. Render masking removes pixels from hidden areas in the image.
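
To make one of these techniques concrete, here is a rough sketch of predictive tracking. It assumes the simplest possible motion model: the head keeps rotating at its current angular velocity for the duration of the lookahead interval. The function names and the (w, x, y, z) quaternion layout are choices made for this example, not taken from OSVR, and real predictors are usually more elaborate.

```python
import math


def quat_multiply(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def predict_orientation(q, angular_velocity, lookahead_s):
    """Extrapolate orientation q by lookahead_s seconds.

    Assumes the angular velocity (rad/s, in the sensor's own frame)
    stays constant over the lookahead interval.
    """
    wx, wy, wz = angular_velocity
    speed = math.sqrt(wx * wx + wy * wy + wz * wz)
    angle = speed * lookahead_s
    if angle < 1e-12:
        return q  # effectively no rotation to extrapolate
    # Rotation of `angle` radians about the unit axis (wx, wy, wz) / speed.
    s = math.sin(angle / 2.0) / speed
    delta = (math.cos(angle / 2.0), wx * s, wy * s, wz * s)
    return quat_multiply(q, delta)
```

A renderer would call predict_orientation with a lookahead roughly equal to the time until the frame reaches the display, and render from the predicted pose rather than the last measured one.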

If this sounds complex, it is just the beginning. One needs to measure optical distortion and correct it in real-time. Frame rates continue to increase, thus lowering the available time to render a frame. Engines can optimize rendering by using similarities between the left- and right-eye images. Techniques that used to be exotic are now becoming mainstream.

A handful of companies have the money and people to master all these techniques. Most other organizations prefer to focus on their core competencies. What should they do?

OSVR implications

A key goal of OSVR is to “make hard things easy without making easy things hard”. The OSVR Render Manager exemplifies this. OSVR makes these latency-reduction methods available to everyone. We work with graphics vendors to achieve direct mode through their API. We work with game engines to provide native integration of OSVR into their code.

I expect the OSVR community to continue to keep track of the state of the art, and improve the code-base. Developers using OSVR can focus away from the plumbing of rendering. OSVR will continue to allow developers to focus on great experiences.

The Peripherals are Coming


A PC is useful with a mouse and keyboard. Likewise, a goggle is useful with a head tracker. A PC is better when adding a printer, a high-quality microphone and a scanner. A goggle is better with an eye tracker, a hand controller and a haptic device. VR peripherals increase immersion and bring more senses into play.

In a PC environment, there are many ways to achieve the same task. You select an option using the mouse, the keyboard, by touching the screen, or even with your voice. In VR, you can do this with a hand gesture, with a head nod or by pressing a button. Applications want to focus on what you want to do rather than how you express your wishes.

More peripherals mean more configurations. If you are in a car racing experience, you’d love to use a rumble chair if you have it. Even though rumble chairs are not commonplace, there are several types of them. Applications need to be able to sense what peripherals are available and make use of them.

Even a fundamental capability like tracking will have many variants. Maybe you have a wireless goggle that allows you to roam around. Maybe you sit in front of a desk with limited space. Maybe you have room to reach forward with your hands. Maybe you are on a train and can’t do so. Applications can’t assume just one configuration.

OSVR implications

OSVR embeds Virtual Reality Peripheral Network (VRPN), an established open-source library. Supporting many devices and focusing on the what, not the how, is in our DNA.

I expect OSVR to continue to improve its support for new devices. We might need to enhance the generic eye tracker interface as eye trackers become more common. We will need to look for common characteristics of haptic devices. We might even be able to standardize how vendors specify optical distortion.

This is a community effort, not handed down from some elder council in an imperial palace. I would love to see working groups formed to address areas of common interest.

Turning Data into Information


A stream of XYZ hand coordinates is useful. Knowing that this stream represents a ‘figure 8’ is more useful. Smart software can turn data into higher-level information. Augmented reality tools detect objects in video feeds. Eye tracking software converts eye images into gaze direction. Hand tracking software converts hand position into gestures.

Analyzing real-time data gets us closer to understanding emotion and intent. In turn, applications that make use of this information can become more compelling. A game can use gaze direction to improve the quality of interaction with a virtual character. Monitoring body vitals can help achieve the desired level of relaxation or excitement.

As users experience this enhanced interaction, they will demand more of it.

OSVR Implications

Desktop applications don’t have code to detect a mouse double-click. They rely on the operating system to convert mouse data into the double-click event. OSVR needs to provide applications with both low-level data and high-level information.
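
The double-click case is small enough to sketch in full. The function below is a toy illustration of turning low-level data (click timestamps) into higher-level information (double-click events). The function name and the 0.4-second threshold are assumptions made for this example, not values taken from any operating system or from OSVR.

```python
def find_double_clicks(click_times, max_gap_s=0.4):
    """Return timestamps at which a double-click completes.

    click_times: ascending click timestamps in seconds (the raw data).
    max_gap_s: assumed threshold; two clicks closer than this form a pair.
    """
    events = []
    previous = None
    for t in click_times:
        if previous is not None and t - previous <= max_gap_s:
            events.append(t)
            previous = None  # a click belongs to at most one double-click
        else:
            previous = t
    return events
```

An application that consumes the returned events never needs to know the threshold or see the raw timestamps, although both remain available to code that wants them.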

In “OSVR speak”, an analysis plugin is the software that converts data into information. While early OSVR work focused on lower-level tasks, several analysis plugins are already available. For example, DAQRI integrated a plugin that detects objects in a video stream.

I expect many more plugins will become available. The open OSVR architecture opens plugin development to everyone. If you are an eye tracking expert, you can add an eye tracking plugin. If you have code that detects gestures, it is easy to connect it to OSVR. One might also expect a plugin marketplace, like an asset store, to help find and deploy plugins.

Augmenting Reality

Market trends

Most existing consumer-level devices are virtual reality devices. Google Glass has not been as successful as hoped. Magic Leap is not commercial yet. Microsoft Hololens kits are shipping to developers, but are not priced for consumers yet.

With time, augmented-reality headsets will become consumer products. AR products share many of the needs of their VR cousins. They need abstract interfaces. They need to turn data into information. They need high-performance rendering and flexible sensing.

OSVR Implications

The OSVR architecture supports AR just as it supports VR. Because AR and VR have so much in common, many components are already in place.

AR devices are less likely to tether to a Windows PC. The multi-platform and multi-OS capabilities of OSVR will be an advantage. Wherever possible, I hope to continue to see a consistent cross-platform API for OSVR. This will allow developers to tailor deployment options to the customer needs.


We designed OSVR to provide universal connectivity between engines and devices. OSVR makes hard things easy so developers can focus on fantastic experiences, not plumbing. It is open so that the rate of innovation is not constrained by a single company. I expect it to be invaluable for many years to come. Please join the OSVR team and me for this exciting journey.

To learn more about our work in OSVR, please visit this page

This post was written by Yuval Boger, CEO of Sensics and co-founder of OSVR. Yuval and his team designed the OSVR software platform and built key parts of the OSVR offering.

Thursday, July 21, 2016

Key Parameters for Optical Designs

At Sensics, we have completed many optical designs for VR over the years, and are busy these days with new ones to accommodate new displays and new sets of requirements. For those thinking about optics, here is a collection of some important parameters to consider when focusing on optical systems for VR.

Field of View: typically measured in degrees, the field of view defines the horizontal, vertical and diagonal extent that can be viewed at any given point. This is often specified as a monocular (single eye) field of view, but it is also customary to specify the binocular field of view and thus the binocular overlap.
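
As a small worked example of how the monocular field of view, binocular field of view and binocular overlap relate, the sketch below assumes two identical monocular fields placed symmetrically side by side (horizontal direction only). The function name and the sample numbers are made up for illustration.

```python
def binocular_overlap_deg(monocular_fov_deg, binocular_fov_deg):
    """Horizontal overlap between the two eyes' fields, in degrees.

    Assumes identical, symmetrically arranged monocular fields, so that
    binocular = 2 * monocular - overlap.
    """
    if not monocular_fov_deg <= binocular_fov_deg <= 2 * monocular_fov_deg:
        raise ValueError("binocular FOV must lie between 1x and 2x monocular FOV")
    return 2 * monocular_fov_deg - binocular_fov_deg
```

For instance, two 90-degree monocular fields that together span 110 degrees overlap by 70 degrees; if they span only 90 degrees, the overlap is complete.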

Eye relief: typically measured in millimeters, the eye relief indicates the distance between the eye and the closest optical element as seen in the illustration below. 

Eye Relief
Illustration of eye relief

Regular eyeglasses have an eye relief of about 12mm. Advantages of larger eye relief:
  • If the optics are too close to the eye, they generate discomfort such as when the eyelashes touch the optics.
  • If the eye relief is large enough, the system might be able to accommodate people wearing glasses without the need to provide a focusing mechanism to compensate for not having glasses.
Disadvantages of larger eye relief:
  • The total depth of the optical system (distance from eye to screen) becomes larger and the overall system potentially more cumbersome.
  • The minimal diameter of the first optical element is dictated by a combination of the desired field of view and eye relief. Larger eye relief requires the lens to be wider and thus likely heavier.
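
The relationship between field of view, eye relief and lens diameter in the last bullet can be approximated with simple geometry. The sketch below is a first-order estimate only: it treats the eye as a point on the optical axis, ignores lens thickness and curvature, and optionally adds the eye box width as extra margin. The function name and parameters are illustrative assumptions.

```python
import math


def min_lens_diameter_mm(fov_deg, eye_relief_mm, eye_box_mm=0.0):
    """First-order estimate of the clear diameter of the first element.

    The edge ray of a field of view `fov_deg` leaves the eye at half that
    angle and reaches the lens after travelling `eye_relief_mm` along the axis.
    """
    half_angle = math.radians(fov_deg / 2.0)
    return 2.0 * eye_relief_mm * math.tan(half_angle) + eye_box_mm
```

With a 90-degree field of view, moving from 12 mm to 18 mm of eye relief raises the estimate from about 24 mm to about 36 mm, which is the "wider and thus likely heavier" effect described above.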
Eye box: often specified in millimeters, the eye box determines how much the eye can move up/down/left/right from the optimal position without significant degradation in the image quality. Some optical systems such as rifle scopes have a very narrow eye box because they want to 'force' the eye to be in the optimal position. Other optical systems, such as HMDs used in soldier training, might desire larger eye boxes to allow the trainee to see a good image even as the HMD moves on the head while the trainee is running. The image quality at the optimal position is almost always best, but if the eye box is too narrow, the user will not obtain a good image without tedious adjustments. For instance, the diagram below shows the simulation results of an optical design at the nominal eye position (left) and at 4 mm away from the optimal position:
Eye box simulation
Comparing optical quality at a distance away from the optimal eye position

Material and type of lens: a lens is typically made from optical-grade plastic or from glass. There are hundreds of different optical-grade glass types but only about a dozen optical-grade plastic materials. Different materials provide different light-bending properties (e.g. index of refraction), so it is quite common for multi-element optical systems to be made with more than one material. Glass is typically heavier and more expensive to mold, but has greater variety, provides better surface quality and is often physically harder (e.g. more resistant to scratches). Plastic is cheaper and lighter. Additional lens types and non-linear optical elements such as Fresnel lenses and polarizers are also available.

Distortion: optical distortion is one type of imperfection in an optical design. Distortion causes straight lines to appear curved when viewed through the optics. An example of this is shown below.

Optical distortion
Distortion is reported in percentage units. If a pixel is placed at a distance of 100 pixels (or mm or degrees or inches or whichever unit you prefer) and appears as if it is at a distance of 110, the distortion at that particular point is (110-100)/100 = 10%. During the process of optical design, distortion graphs are commonly reviewed at each iteration of the design. For instance, consider the distortion graph below for a design with a 96-degree field of view (2 x 48): 

Sample Distortion Curve
Distortion graph

The graph shows, for instance, that at 30 degrees away from the center, distortion is still about 2-3%, but at 40 degrees away from the center it increases to about 8%. The effect of distortion is sometimes shown in a distortion grid such as the one below. If the optical design were perfect and had no distortion, each blue cross would line up perfectly with the grid intersection points. 

Distortion Grid
Sometimes, distortion is monotonic, meaning that it gradually increases as one moves towards the edge. Non-monotonic distortion can cause the appearance of a 'bubble' if not corrected. 
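
The percentage definition and the notion of monotonic distortion are easy to express in a few lines. The sketch below is illustrative only: the function names are invented here, and any curve passed to the monotonicity check is assumed to be a list of distortion values sampled from the center outward.

```python
def distortion_percent(ideal_distance, observed_distance):
    """Distortion at one field point, in percent.

    Both distances are measured from the image center, in any common unit.
    """
    if ideal_distance == 0:
        raise ValueError("distortion is undefined at the exact center")
    return 100.0 * (observed_distance - ideal_distance) / ideal_distance


def is_monotonic(distortion_curve):
    """True if distortion never decreases from the center towards the edge."""
    return all(b >= a for a, b in zip(distortion_curve, distortion_curve[1:]))
```

Calling distortion_percent(100, 110) reproduces the 10% example from the text. A curve that rises, dips and rises again fails is_monotonic, which is the situation that can produce the 'bubble' appearance if left uncorrected.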

Chromatic aberration: Just like white light breaks into various colors when passing through a prism, an optical system might behave differently for different wavelengths/colors. This could cause color breakup. It is useful to explore how much the system is 'color corrected' so as to minimize this color breakup. The image below shows a nice picture at the center of the optical system but fairly significant color breakup at the edges. 

Color breakup
Relative illumination: the ability of an optical system to collect light can change throughout the image. Consider a uniformly-lit surface that is viewed through an optical system. Often, the perceived brightness at the center of the optics is the highest and it drops as one moves towards the edges. This is numerically expressed as relative illumination, as in the graph below. While the human eye has amazing dynamic range, non-monotonic illumination can cause the appearance of dark or bright 'rings' in the image. 

Relative Illumination

Spot size: imagine a screen with a pattern of tiny dots. In a perfect world, all dots would appear with the same size and no smear when looking through the optical system. In reality, the dot size typically increases as one moves away from the center. The numerical measurement of this is the spot size and diagrams indicating the spot size at different points through the optics often look something like this:

Spot Size
Other characteristics: depending on the desired use case, there are often size, weight and cost limitations that need to be considered to narrow the range of acceptable solutions to the specifications. Just as it is easier to fit a higher-degree polynomial to a set of data points because more terms provide additional degrees of freedom, it is easier to achieve a set of desired optical parameters with additional lenses (or more precisely with additional surfaces), but extra lenses often add cost, size and weight. 

Putting it all together: it is practically impossible to find a car that is inexpensive, has amazing fuel efficiency, offers fantastic acceleration, seats 7 people and is very pleasing to the eye. Similarly, it is difficult to design an optical system that has no distortion, provides wide field of view, large eye box, costs next to nothing and is very thin. When contracting the design of an optical system, it is useful to define all desired characteristics but specify which parameters are key and which parameters are less important.

Wednesday, June 22, 2016

The Three (and a half) Configurations of Eye Trackers

Eye tracking could become a critical sensor in HMDs. In previous posts such as here, here and here we discussed some of the ways that eye trackers could be useful as input devices, as ways to reduce rendering load and more. But how are eye trackers installed inside an HMD? An appropriate placement of the eye tracking camera gives a quality image of the eye regardless of the gaze direction. If the eye image is bad, the tracking quality will be bad. It's truly a 'garbage in, garbage out' situation. The three typical ways to install a camera are:
  1. Underneath the optics
  2. Combined with the optics via a hot mirror (or an internal reflection)
  3. Inside the optics.
In this post, we describe these configurations.

Underneath the optics

This configuration is illustrated in the image on the right, which shows the Sensics zSight HMD with an integrated Ergoneers eye tracker. The tracker is the small camera that is visible underneath the left eyepiece.

The angle at which the camera is installed is important. A camera that is perpendicular, practically looking into the eye, will typically get an excellent image. If the camera angle is steep, the anatomy of the eye (eyelids, eyelashes, inset eyes) gets in the way of getting a good image. If the eye relief (distance from cornea to first element of the optics) is small, the camera will need to be placed at a steeper angle than if the eye relief were large. If the diameter of the optics is large, the camera would need to be placed lower and thus at a steeper angle than if the diameter of the optics were smaller. If the user wears glasses, an eye tracker that is placed underneath the optics might "see" the frame of the glasses instead of the eye.

Having said that, the advantage of this approach is that it does not place many constraints on the optics. Eye tracker cameras could usually be added below optics that were not designed to accommodate eye tracking.
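
The dependence of camera angle on eye relief and optics diameter can be estimated with basic trigonometry. The sketch below makes several simplifying assumptions that are not from the original design: the camera sits in the plane of the first optical element, just below its rim, and looks straight at an eye located on the optical axis. The function name and the clearance parameter are hypothetical.

```python
import math


def camera_angle_deg(lens_diameter_mm, eye_relief_mm, clearance_mm=0.0):
    """Angle between the camera's line of sight and the optical axis.

    0 degrees would mean looking straight into the eye; larger is steeper.
    clearance_mm is the assumed gap between the lens rim and the camera.
    """
    offset_mm = lens_diameter_mm / 2.0 + clearance_mm
    return math.degrees(math.atan2(offset_mm, eye_relief_mm))
```

Under these assumptions, a 40 mm lens at 20 mm eye relief gives a 45-degree viewing angle, and either shrinking the eye relief or widening the lens makes the angle steeper, matching the qualitative description above.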

Eye tracker that is combined with the optics

Eye tracking cameras are often infra-red cameras that look at IR light that is reflected off the eye. As such, eye tracking cameras don't need visible light. This allows using what is called a hot mirror: a mirror that reflects IR light yet passes visible light. Consider the optical system shown to the right (copyright Sensics). Light from the screen (right side) passes through a lens, a hot mirror and another lens and reaches the eye. In contrast, if the eye is lit by an IR light source, IR light coming back from the eye is reflected off the hot mirror towards the upper part of the optical system. If a camera is placed there, it can have an excellent view of the eye without interfering with the optical quality.

This configuration also gives more flexibility with regard to the camera being used. For instance, a larger camera (perhaps with a very high frame rate) would not be feasible if placed under the optics. However, when placed separately from the optical system, such as above the mirror, it might fit. The downside of this configuration, other than the need to add the hot mirror, is that the optical system needs to leave enough room for the hot mirror, and this introduces a mechanical constraint that limits the options of the optical designer.

A variation on this design (what I referred to in the title as "the half" configuration) is having the IR light reflect off one of the optical surfaces, assuming this surface is coated with an IR-reflective coating. You can see this in the configuration on the right (also copyright Sensics). An optical element is curved and the IR light reflects off it into the camera. The image received by the camera might be somewhat distorted, but since that image is processed by an algorithm, that algorithm could compensate for the image distortion. This solution removes the need for a hot mirror but does require a lens that is shaped in a way that reflects the IR light into the camera. It also requires the additional expense of an IR coating.

Eye tracker integrated with the optics

[Image: dSight with Ergoneers eye tracker]

The third configuration is even simpler. A miniature camera is used. A small hole is drilled through the optics and the camera is placed through it. The angle and location of the camera are balanced between getting a good image of the eye and the need to not introduce a significant visual distraction. This is shown on the right as part of the eye tracking option of the Sensics dSight. This configuration gives excellent flexibility with regard to camera placement, but does introduce some visual distraction and requires careful drilling of a hole through the optics.

Thursday, June 16, 2016

Notes from the Zero Latency Free-Roam VR Gameplay

I spent this past weekend in Australia working with Sensics customer Zero Latency towards their upcoming VR deployment at SEGA's Joypolis park in Tokyo. As part of the visit, I had the chance to go through the Zero Latency "Zombie Outbreak" experience and I thought I would share some notes from it.

Zero Latency has been running this experience for quite some time and has had nearly 10,000 paying customers do it. The experience is about 1 hour long, including about 10 minutes of pre-game briefing and equipment setup, 45 minutes of play and 5 minutes to take the equipment off and get the space ready for the next group. There are 6 customer slots per hour and everyone plays together in the same space at the same time. To date, Zero Latency has opened this to customers for about 29 hours a week - mostly on weekends - but will now be adding weeknights for a total of 40 game hours per week. A ticket costs 88 Australian dollars (about 75 US dollars) and there is typically a 6-week waiting list to get in.

The experience is located in the Zero Latency office, a converted warehouse on the north side of Melbourne, Australia. Most of the warehouse is taken up by the rectangular game space, about 15 x 25 meters (50 x 80 feet), or 375 m² (4000 sq ft) to be precise. The rest of the warehouse is used for two floors of engineering and administrative offices. One can peek through the office windows at the customers playing, and during the day you can constantly hear the shouts of excitement, squeals of joy and screams of horror coming from the game space.

I had a chance to go through the game twice: once with a group of Zero Latency employees before the space was opened to customers, and once as the 6th man of a 5-person group of paying customers late at night. Once customers come in, they are greeted by a 'game master' who provides a pre-mission briefing, explains the rules and explains how the gaming gun works. The gun can switch between a semi-automatic rifle and a shotgun. It has a trigger, a button to switch modes, a reload button and a pump to load bullets into the shotgun and load grenades when in rifle mode. I found the gun to be comfortable and balanced, and it seems that it has undergone many iterations before arriving at its current form. Players wear a backpack that includes a lightweight Alienware portable computer, a battery and a control box. The HMD and the gun have lighted spheres on them - reminiscent of the PlayStation Move - that are used to track the players and the weapons throughout the space. Players also wear Razer headsets that provide two-way audio so that players can easily communicate with each other as well as hear instructions from the game master.

The game starts with a few minutes of acclimation where players walk across the space to a virtual shooting range and spend a couple of minutes getting comfortable with operating their weapons. The game then starts. It is essentially a simple game - players fight their way through the space while shooting zombies and other menacing characters, some of which shoot back at you. Every few minutes, players switch scenes by going through an elevator or teleportation waypoints, circles on the ground where each of the six players has to stand before the next scene can be reached. Sometimes you fight in an urban setting, sometimes on a rooftop, inside a cafeteria and so forth. Zombies can be killed by a direct shot to the head or multiple shots to the body. The players can also be killed, but then return to the game after about 10 seconds of appearing as a 'ghost'. Game 'power ups' are sometimes found throughout the space. For instance, during my gameplay I found an AK-47 assault rifle and later a heavy machine gun. At the end of the game, each player is shown their score and ranking, where the score is calculated based on the number of kills and the number of player deaths. That score sheet is emailed to players and is available for later viewing on the Web. The graphics are fine and an attacking zombie is quite compelling when it is right in your face, but the things I truly found compelling in the game are not so much the graphics and gameplay but rather a few other things:
  • Free-roam VR is great. The large space offers fantastic freedom of movement. You can see players move throughout the space, duck to take cover, turn around quickly with no hesitation at all. This generates an excellent feeling of immersion. You can truly feel that you could hide behind corners or walk anywhere with no apparent limitations. Of course, every space has physical limitations and Zero Latency has implemented a system where if you get too close to a player or a wall, something like a radar appears on your screen showing you at the center and the obstacles (players, walls) on it so that you know how to avoid them. If you get too close, the game pauses until you are farther away. This felt very natural. Throughout nearly two hours of active gameplay I think I brushed once or twice against another player but no more than that, even though players were in close proximity. Immersion is such that players don't notice people that are not players around them. In the current Zero Latency office, the bathroom for the office (the "Loo" in "Australian") is right across from the playing space so to get there you can either take a detour walking alongside the walls or go straight through the playing area where the players couldn't care less because they don't even know that you are walking by.
  • The social aspect is very compelling. This game is not about 6 individuals playing separately in a space. It is about 6 players acting as a team within the space. You can definitely hear "you take the right corridor and I'll take the left", or "watch your back" or "I need some help here!" shouts from one player to another. Players that work individually have little chance to stop the zombie invasion coming from all directions, but playing together gives you that chance.
  • Tracking - for both the head and the weapon - is very smooth, to the point where you don't think about it. Because multiple players are tracked in the space you can see their avatars around you (sometimes with name tags). The graphics of players walking in the game need some work in my opinion, but you can clearly see where everyone is and what they are doing.
  • 45 minutes of game play go by very quickly and the game masters control the pace very well. As you can imagine, some groups take longer than others to get to the next waypoints, and the game uses waiting for elevators or helicopters as a way to condense or extend the total time. For instance, once you arrive in the cafeteria a sign shows up that the elevator will arrive in 100 seconds. I would imagine that if a group arrived earlier, they would have to wait longer for the elevator or if a group took more time, they would wait less.
The space itself is essentially empty save for the overhead tracking cameras. Thus, the same space and the same tracking system can be used for many different experiences. Unfortunately, we had to work from time to time and I did not have a chance to try some of the newer experiences that Zero Latency is working on, especially since the space was occupied by customers most of the time. I'm certainly looking forward to coming there again and continuing to save the world.

Monday, June 6, 2016

Understanding Pixel Density and Eye-Limiting Resolution

If the human eye were a digital camera, its "data sheet" would say that it has a resolution of 60 pixels/degree at the fovea (the part of the retina where the visual acuity is highest). This is called eye-limiting resolution.

This means that if there were an image with 3600 pixels (60 x 60) and that image fell on a 1° x 1° area of the fovea, a person would not be able to tell it apart from an image with 8100 pixels (90 x 90) that fell on a 1° x 1° area of the fovea.

Note 1: the 60 pixels per degree figure is sometimes expressed as "1 arc-minute per pixel". Not surprisingly, an arc-minute is an angular measurement defined as 1/60th of a degree.

Note 2: this kind of calculation is the basis for what Apple refers to as a "retina display", a screen that when held at the right distance would generate this kind of pixel density on the retina.

If you have a VR goggle, you can calculate the pixel density - how many pixels per degree it presents to the eye - by dividing the number of pixels in a horizontal display line by the horizontal field of view provided by the eyepiece. For instance, the Oculus DK1 (yes, I know that was quite a while ago) had 1280 x 800 pixels across both eyes, so 640 x 800 pixels per eye, and with a monocular horizontal field of view of about 90 degrees, it had a pixel density of 640 / 90, so just over 7 pixels/degree.

Not to pile on the DK1 (it had many good things, though resolution was not one of them), 7 pixels/degree is the linear pixel density. When you think about it in terms of pixel density per surface area, it is not just 8.5 times worse than the human eye (60 / 7 = 8.5) but actually a lot worse (8.5 * 8.5, which is over 70). The following table compares pixel densities for some popular consumer and professional HMDs:

| Product | Horizontal pixels per eye | Approximate horizontal field of view (degrees per eye) | Approximate pixel density (pixels/degree) |
| --- | --- | --- | --- |
| Oculus DK1 | 640 | 90 | 7.1 |
| OSVR HDK | 960 | 90 | 10.7 |
| HTC VIVE | 1080 | 90 | 12.0 |
| Sensics dSight | 1920 | 95 | 20.2 |
| Sensics zSight | 1280 | 48 | 26.7 |
| Sensics zSight 1920 | 1920 | 60 | 32.0 |
| Human fovea | | | 60.0 |
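As a quick check of the calculation described above, here is a minimal Python sketch that divides horizontal pixels per eye by horizontal field of view for the products in the table. The function and variable names are illustrative only; the pixel counts and fields of view are the approximate figures from the table.

```python
EYE_LIMITING_DENSITY = 60.0  # pixels/degree at the fovea, as stated in the post


def pixel_density(horizontal_pixels_per_eye, horizontal_fov_deg):
    """Linear pixel density in pixels/degree."""
    return horizontal_pixels_per_eye / horizontal_fov_deg


def areal_shortfall(density):
    """How many times fewer pixels per unit area than eye-limiting resolution."""
    return (EYE_LIMITING_DENSITY / density) ** 2


# (horizontal pixels per eye, approximate horizontal FOV in degrees per eye)
HMDS = {
    "Oculus DK1": (640, 90),
    "OSVR HDK": (960, 90),
    "HTC VIVE": (1080, 90),
    "Sensics dSight": (1920, 95),
    "Sensics zSight": (1280, 48),
    "Sensics zSight 1920": (1920, 60),
}

for name, (pixels, fov) in HMDS.items():
    density = pixel_density(pixels, fov)
    print(f"{name}: {density:.1f} pixels/degree, "
          f"{areal_shortfall(density):.0f}x below the fovea per unit area")
```

The areal figure is simply the square of the linear ratio, which is why a seemingly modest linear gap turns into a large difference in perceived detail.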

Higher pixel density allows you to see finer details - read text; see the grain of the leather on a car's dashboard; spot a target at a greater distance - and in general contributes to an increasingly realistic image.

Historically, one of the things that separated professional-grade HMDs from consumer HMDs was that the professional HMDs had higher pixel density. Let's simulate this using the following four images. Let's assume that the first image, taken from Unreal Engine's Showdown demo, is shown at full 60 pixels/degree density. We can then re-sample it at half the pixel density - simulating 30 pixels/degree - and then half again (resulting in 15 pixels/degree) and half again (7.5 pixels/degree). Notice the stark differences as we go to lower and lower pixel densities.
Full resolution  (simulating 60 pixels/degree)

Half resolution  (simulating 30 pixels/degree)
Simulating 15 pixels/degree

Simulating 7.5 pixels/degree
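For readers who want to try a similar simulation, here is a minimal sketch of halving an image's resolution. It assumes a simple 2x2 box filter on a grayscale image stored as a list of rows; the post does not say which resampling method was used for the images above, so treat the filter choice and the function name as assumptions.

```python
def downsample_2x(image):
    """Halve resolution by averaging each 2x2 block.

    `image` is a list of equal-length rows of grayscale values.
    Odd trailing rows or columns are dropped.
    """
    out = []
    for r in range(0, len(image) - 1, 2):
        row = []
        for c in range(0, len(image[r]) - 1, 2):
            total = (image[r][c] + image[r][c + 1]
                     + image[r + 1][c] + image[r + 1][c + 1])
            row.append(total / 4)
        out.append(row)
    return out


# Applying the function three times goes from 60 to 30 to 15 to 7.5 pixels/degree.
tiny = [[0, 2, 4, 4],
        [2, 4, 4, 4]]
print(downsample_2x(tiny))
```

To view the result at the original size, each low-resolution pixel would then be drawn as a 2x2 (or larger) block.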

Higher pixel density for the visual system is not the same as higher pixel density for the screen because pixels on the screen are magnified through the optics. The same screen could be magnified differently with two different optical systems resulting in different pixel densities presented to the eye. It is true, though, that given the same optical system, higher pixel density of pixels on the screen does translate to higher pixel density presented to the eye. As screens get better and better, we will get increasingly closer to eye-limiting resolution in the HMD and thus to essentially photo-realistic experiences.

Tuesday, May 31, 2016

How binocular overlap impacts horizontal field of view

In a previous post, we discussed binocular overlap, which affects the overall horizontal (and diagonal) field of view. HMD manufacturers sometimes create partially overlapped systems (i.e. overlap less than 100%) to increase the overall horizontal field of view.

For example, imagine an eyepiece that provides a 90 degree horizontal field of view that subtends from 45° to the left to 45° to the right. If both left and right eyepieces point at the same angle, the overall horizontal field of view of the goggles is also from 45° to the left to 45° to the right, so a total of 90 degrees. When both eyepieces cover the same angles, as in this example, we call this 100% overlap.

But now let's assume that the left eyepiece is rotated a bit to the left so that it subtends from 50° to the left to 40° to the right. The monocular field of view is unchanged at 90°. If the right eyepiece is symmetrically rotated, it now covers from 40° to the left to 50° to the right. In this case, the binocular (overall) horizontal field of view is 100°, so a bit larger than in the 100% case, and the overlap is 80° (40° to the left to 40° to the right) or 80/90 = 88.9%.

The following tables provide a useful reference to see how the percentage of binocular overlap impacts the horizontal (and thus also the diagonal) field of view. We provide two tables, one for displays with a 16:9 aspect ratio (such as 2560x1440 or 1920x1080) and the other for a 9:10 aspect ratio (such as the 1080x1200 display in the HTC VIVE).

For instance, if we look at the 16:9 table we can read through an example of a 90° diagonal field of view, which would translate into 82.1° horizontal and 52.2° vertical if the entire screen was visible. Going down the table we can see that at 100% overlap, the binocular horizontal field of view remains the same, i.e. 82.1°, and the diagonal also remains the same. However, if we chose 80% binocular overlap, the binocular horizontal field of view grows to 98.6°, vertical stays the same and diagonal grows to 103.2°.
overlap for 16-9 aspect ratio

overlap for 9-10 aspect ratio

For those interested, the exact math is below: overlap equations
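The numbers quoted above can be reproduced with a short Python sketch. It assumes a flat screen seen through distortion-free optics (so angles convert to screen distances through the tangent function) and defines overlap as the fraction of the monocular horizontal field of view that both eyes share. These modeling choices and the function names are assumptions for illustration; they match the worked examples in this post but are not necessarily the exact equations from the original figure.

```python
import math


def monocular_fov(diagonal_deg, aspect_w, aspect_h):
    """Split a diagonal FOV into (horizontal, vertical) FOV in degrees."""
    half_diag = math.tan(math.radians(diagonal_deg / 2))
    hyp = math.hypot(aspect_w, aspect_h)
    horizontal = 2 * math.degrees(math.atan(half_diag * aspect_w / hyp))
    vertical = 2 * math.degrees(math.atan(half_diag * aspect_h / hyp))
    return horizontal, vertical


def binocular_horizontal_fov(monocular_h_deg, overlap):
    """Overall horizontal FOV; overlap is a fraction (1.0 means 100%)."""
    return monocular_h_deg * (2 - overlap)


def diagonal_fov(horizontal_deg, vertical_deg):
    """Diagonal FOV of a rectangle with the given horizontal/vertical FOV."""
    th = math.tan(math.radians(horizontal_deg / 2))
    tv = math.tan(math.radians(vertical_deg / 2))
    return 2 * math.degrees(math.atan(math.hypot(th, tv)))


h, v = monocular_fov(90, 16, 9)          # 16:9 screen, 90 degree diagonal
wide = binocular_horizontal_fov(h, 0.8)  # 80% overlap
print(f"monocular: {h:.1f} x {v:.1f} degrees")
print(f"binocular horizontal: {wide:.1f}, diagonal: {diagonal_fov(wide, v):.1f}")
```

The same `binocular_horizontal_fov` function covers the earlier eyepiece example: a 90° monocular field with 80/90 overlap gives 100°.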

Sunday, May 8, 2016

Understanding Predictive Tracking

Image source: Adrian Boeing blog
In the context of AR and VR systems, predictive tracking refers to the process of predicting the future orientation and/or position of an object or body part. For instance, one might want to predict the orientation of the head or the position of the hand.

Why is predictive tracking useful?

One common use of predictive tracking is to reduce the apparent "motion to photon" latency, meaning the time between movement and when that movement is reflected in the drawn scene. Since there is some delay between movement and an updated display (more on the sources of that delay below), using an estimated future orientation and position as the data for updating the display can shorten that perceived latency.

While a lot of attention has been focused on predictive tracking in virtual reality applications, it is also very important in augmented reality. For instance, if you are displaying a graphical overlay to appear on top of a physical object that you see with augmented reality goggles, it is important that the overlay stays on the object even when you rotate your head. The object might be recognized with a camera, but it takes time for the camera to capture the frame, for a processor to determine where the object is in the frame and for a graphics chip to render the new overlay. By using predictive tracking, you can get better apparent registration between the overlay and the physical object.

How does it work? 

If you saw a car travelling at a constant speed and you wanted to predict where that car will be one second in the future, you could probably make a fairly accurate prediction. You know the current position of the car, you might know (or can estimate) the current velocity, and thus you can extrapolate the position into the near future.

Of course if you compare your prediction with where the car actually is in one second, your prediction is unlikely to be 100% accurate every time: the car might change direction or speed during that time. The farther out you are trying to predict, the less accurate your prediction will be: predicting where the car will be in one second is likely much more accurate than predicting where it will be in one minute.

The more you know about the car and its behavior, the better chance you have of making an accurate prediction. For instance, if you were able to measure not only the velocity but also the acceleration, you can make a more accurate prediction.

If you have additional information about the behavior of the tracked body, this can also improve prediction accuracy. For instance, when doing head tracking, understanding how fast the head can possibly rotate and what the common rotation speeds are can improve the tracking model. Similarly, if you are doing eye tracking, you can use the eye tracking information to anticipate head movements, as discussed in this post.

Sources of latency

The desire to perform predictive tracking comes from having some latency between actual movement and displaying an image that reflects that movement. Latency can come from multiple sources, such as:
  • Sensing delays. The sensors (e.g. gyroscope) may be bandwidth-limited and do not instantaneously report orientation or position changes. Similarly, camera-based sensors may exhibit delay between when the pixel on the camera sensor receives light from the tracked object to that frame being ready to be sent to the host processor.
  • Processing delays. Sensors are often combined using some kind of sensor fusion algorithm, and executing this algorithm can add latency.
  • Data smoothing. Sensor data is sometimes noisy and to avoid erroneous jitter, software or hardware-based low pass algorithms are executed.
  • Transmission delays. For example, if orientation sensing is done using a USB-connected device, there is some non-zero time between the data being available to be read by the host processor and the time the data transfer over USB is completed.
  • Rendering delays. When rendering a non-trivial scene, it takes some time to have the image ready to be sent to the display device.
  • Frame rate delays. If a display is operating at 100 Hz, for instance, there is a 10 mSec time between successive frames. Information that is not precisely current to when a particular pixel is drawn may need to wait until the next time that pixel is drawn on the display.
Some of these delays are very small, but unfortunately all of them add up. Predictive tracking, along with other techniques such as time warping, is helpful in reducing the apparent latency.

How much to track into the future?

In two words: it depends. You will want to estimate the end-to-end latency of your system as a starting point and then tune the look-ahead time to your liking.

It may be that you will need to predict several timepoints into the future at any given time. Here are some examples why this may be required:
  • There are objects with different end-to-end delays. For instance, a hand tracked with a camera may have different latency than a head tracker, but both need to be drawn in sync in the same scene, so predictive tracking with different 'look ahead' times will be used.
  • In configurations where a single screen - such as a cell phone screen - is used to provide imagery to both eyes, it is often the case that the image for one eye appears with a delay of half a frame (e.g. half of 1/60 seconds, or approx 8 mSec) relative to the other eye. In this case, it is best to use predictive tracking that looks ahead 8 mSec more for that delayed half of the screen.
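The two cases above can be sketched in a few lines of Python. The 20 mSec base latency used in the example is a made-up placeholder, not a measured value, and the function names are illustrative; the only figures taken from the text are the 100 Hz and 60 Hz refresh rates.

```python
def frame_period_ms(refresh_hz):
    """Time between successive frames, in milliseconds."""
    return 1000.0 / refresh_hz


def look_ahead_ms(base_latency_ms, refresh_hz, delayed_half=False):
    """Look-ahead time for one eye's image.

    base_latency_ms is the estimated end-to-end latency of the system
    (hypothetical here). If this eye is drawn half a frame later than
    the other, add half a frame period.
    """
    extra = frame_period_ms(refresh_hz) / 2 if delayed_half else 0.0
    return base_latency_ms + extra


print(frame_period_ms(100))                      # 100 Hz display
print(look_ahead_ms(20.0, 60))                   # eye drawn first
print(look_ahead_ms(20.0, 60, delayed_half=True))  # eye drawn half a frame later
```

Objects with different sensing paths (for example a camera-tracked hand versus an IMU-tracked head) would simply be given different `base_latency_ms` values.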

Common prediction algorithms

Here is a sampling of predictive tracking algorithms:
  • Dead reckoning. This is a very simple algorithm: if the position and velocity (or angular position and angular velocity) are known at a given time, the predicted position assumes that the last known position and velocity are correct and that the velocity remains the same. For instance, if the last known position is 100 units and the last known velocity is 10 units/sec, then the predicted position 10 mSec (0.01 seconds) into the future is 100 + 10 x 0.01 = 100.1. While this is very simple to compute, it assumes that the last position and velocity are accurate (i.e. not subject to any measurement noise) and that the velocity is constant. Both these assumptions are often incorrect.
  • Kalman predictor. This is based on a popular Kalman filter that is used to reduce sensor noise in systems where there exists a mathematical model of the system's operation. See here for more detailed explanation of the Kalman filter.
  • Alpha-beta-gamma. The ABG predictor is closely related to the Kalman predictor, but is less general and has simpler math, which we can explain here at a high level. ABG tries to continuously estimate both velocity and acceleration and use them in prediction. Because the estimates take into account actual data, they provide some measurement noise reduction. Configuring the parameters (alpha, beta and gamma) provides the ability to emphasize responsiveness as opposed to noise reduction. If you'd like to follow the math, here it goes:
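Here is a minimal one-dimensional Python sketch of dead reckoning and of an alpha-beta-gamma predictor. The ABG update follows a common textbook formulation (predict with constant acceleration, then correct position, velocity and acceleration in proportion to the residual); it is not necessarily the exact set of equations intended for this post, and the default gains are arbitrary illustrative values that would need tuning for a real tracker.

```python
def dead_reckoning(position, velocity, look_ahead_s):
    """Predict assuming the last known velocity stays constant."""
    return position + velocity * look_ahead_s


class AlphaBetaGamma:
    """1-D alpha-beta-gamma filter (textbook form, illustrative gains)."""

    def __init__(self, x0, alpha=0.5, beta=0.4, gamma=0.1):
        self.x, self.v, self.a = x0, 0.0, 0.0
        self.alpha, self.beta, self.gamma = alpha, beta, gamma

    def update(self, measurement, dt):
        """Fold in a new measurement taken dt seconds after the last one."""
        # Predict the state forward by dt using the current estimates.
        x_pred = self.x + self.v * dt + 0.5 * self.a * dt * dt
        v_pred = self.v + self.a * dt
        # Residual: how far the measurement is from the prediction.
        r = measurement - x_pred
        # Correct each estimate by a fraction of the residual.
        self.x = x_pred + self.alpha * r
        self.v = v_pred + self.beta * r / dt
        self.a = self.a + 2.0 * self.gamma * r / (dt * dt)

    def predict(self, look_ahead_s):
        """Extrapolate the position look_ahead_s seconds into the future."""
        t = look_ahead_s
        return self.x + self.v * t + 0.5 * self.a * t * t


# The dead reckoning example from the text: 100 units, 10 units/sec, 10 mSec.
print(dead_reckoning(100.0, 10.0, 0.01))

# Feed the filter noise-free samples of an object moving at 10 units/sec.
abg = AlphaBetaGamma(x0=100.0)
dt = 0.01
for k in range(1, 501):
    abg.update(100.0 + 10.0 * k * dt, dt)
print(abg.v, abg.predict(0.01))
```

Larger gains make the filter follow the measurements more closely (more responsive, more noise); smaller gains smooth more but react more slowly.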


Predictive tracking is a useful and commonly-used technique for reducing apparent latency. It offers simple or sophisticated implementations, requires some thought and analysis, but it is well worth it.

Saturday, April 30, 2016

VR and AR in 12 variations

I've been thinking about how to classify VR and AR headsets and am starting to look at them along three dimensions (no pun intended):
  1. VR vs AR
  2. PC-powered vs. Phone-powered vs. Self-powered. This looks at where the processing and video generation is coming from. Is it connected to a PC? Is it using a standard phone? Or does it embed processing inside the headset?
  3. Wide field of view vs. Narrow FOV
This generates a total of 2 x 3 x 2 = 12 options as follows
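The 12 combinations can be enumerated mechanically. The short Python sketch below uses `itertools.product`, which happens to produce them in the same order as the numbered configurations that follow; the variable names are illustrative.

```python
from itertools import product

REALITY = ("VR", "AR")
POWER = ("PC-powered", "Phone-powered", "Self-powered")
FIELD = ("Wide-field", "Narrow-field")

# The rightmost dimension varies fastest, so FIELD alternates on every entry.
variations = [", ".join(combo) for combo in product(REALITY, POWER, FIELD)]

for number, name in enumerate(variations, start=1):
    print(f"{number}: {name}")
```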


Example and typical use

1: VR, PC-powered, Wide-field
Examples: Oculus, HTC Vive, Sensics dSight, OSVR HDK. This immersive VR configuration is used in many applications, though the most popular one is gaming. One attribute that separates consumer-grade goggles like the HTC Vive from professional-grade goggles such as the Sensics dSight is pixel density: the number of pixels per degree. You can think about this as the difference between watching a movie on a 50 inch standard-definition TV as opposed to a 50 inch HDTV.

2: VR, PC-powered, Narrow-field
Example: Sensics zSight 1920. With a given number of pixels per eye, narrow-field systems allow for much higher pixel density, which allows observing fine details or very small objects. For instance, imagine that you are training to land a UAV. The first step in landing a UAV is spotting it in the sky. The higher the pixel density is, the farther out you can spot an object of a given size. The zSight 1920 has about 32 pixels/degree whereas a modern consumer goggle like the HTC Vive has less than half that.

3: VR, Phone-powered, Wide-field
Examples: Samsung Gear VR, Google Cardboard, Zeiss VROne. This configuration, where the phone is inserted into some kind of holster, is used for general-purpose mobile VR. The advantages of this configuration are its portability as well as its low cost - assuming you already own a compatible phone. The downside of this configuration is that the processing power of a phone is inferior to a high-end PC and thus the experience is more limited in terms of frame rate and scene complexity. Today's phones were not fully designed with VR in mind, so there are sometimes concerns about overheating and battery life.

4: VR, Phone-powered, Narrow-field
Example: LG 360 VR. In this configuration, the phone is not carried on the head but rather connected via a thin wire to a smaller unit on the head. The advantage of this configuration is that it can be very lightweight and compact. Also, the phone could potentially be used as an input pad. The downside is that the phone is connected via a cable. Another downside is often the cost. Because this configuration does not use the phone's screen, it needs to include its own screens, which might add to the cost. Another disadvantage is that the phone camera cannot be used for video see-through or for sensing.

5: VR, Self-powered, Wide-field
Examples: Gameface Labs, Pico Neo. These configurations aim for standalone, mobile VR without requiring the mobile phone. They potentially save weight by not using unnecessary phone components such as the casing and touch screen, but would typically be more expensive than phone based VR for those users that already own the phone. They might have additional flexibility with regards to which sensors to include, camera placement and battery location. They are more difficult to upgrade relative to a phone-based VR solution, but the fact that the phone cannot be taken out might be an advantage for applications such as public VR where a fully-integrated system that cannot be easily taken apart is a plus.

6: VR, Self-powered, Narrow-field
Example: Sensics SmartGoggles. These configurations are less popular today. Even the Sensics SmartGoggles, which included an on-board Android processor as well as wide-area hand sensors, was built with a relatively narrow field of view (60 degrees) because of the components available at the time.

7: AR, PC-powered, Wide-field
Example: Meta 2. In many augmented reality applications, people ask for a wide field of view so that, for instance, a virtual object that appears overlaid on the real world does not disappear when the user looks to the side. This configuration may end up being transient because in many cases the value of augmented reality is in being able to interact with the real world, and the user's motion when tethered to a PC is more limited. However, one might see uses in applications such as engineering workstations.

8: AR, PC-powered, Narrow-field
I am not aware of good examples of this configuration. It combines the limit of narrow-field AR with the tether to the PC.

9: AR, Phone-powered, Wide-field
This could become one of the standard AR configurations, just like phone-powered, wide-field VR is becoming a mainstream configuration. To get there, the processing power and optics/display technology need to catch up with the requirements.

10: AR, Phone-powered, Narrow-field
Example: Seebright. In this configuration, a phone is worn on the head and its screen becomes the display for the goggles. Semi-transparent optics combine phone-generated imagery with the real world. I believe this is primarily a transient configuration until wide-field models appear.

11: AR, Self-powered, Wide-field
I am unaware of current examples of this configuration, though one would assume it could be very attractive because of the mobility on one hand and the ability to interact in a wide field of view on the other.

12: AR, Self-powered, Narrow-field
Examples: Microsoft HoloLens, Google Glass, Vuzix M300. There are two types of devices here: one is an 'information appliance' like Google Glass, designed to provide contextually-relevant information without taking over the field of view. These configurations are very attractive in industrial settings for applications like field technicians, workers in a warehouse or even customer service representatives needing a mobile, wearable terminal, often to connect with a cloud-based database. The second type of device, exemplified by the HoloLens, seeks to augment reality by placing virtual objects locked in space. I am sure the HoloLens would like to be a wide-field model, and it is narrow field at the moment because of the limitations of its current display technology.

Looking forward to feedback and comments.

Monday, April 11, 2016

Understanding Foveated Rendering

Foveated rendering is a rendering technique that takes advantage of the fact that the resolution of the eye is highest in the fovea (the central vision area) and lower in the peripheral areas. As a result, if one can sense the gaze direction (with an eye tracker), GPU computational load can be reduced by rendering an image that has higher resolution in the direction of gaze and lower resolution elsewhere.

The challenge in turning this from theory to reality is to find the optimal function and parameters that maximally reduce GPU computation while maintaining highest quality visual experience. If done well, the user shouldn’t be able to tell that foveated rendering is being used. The main questions to address are:
  1. In what angle around the center of vision should we keep the highest resolution? 
  2. Is there a mid-level resolution that is best to use? 
  3. What is the drop-off in “pixel density” between central and peripheral vision? 
  4. What is the maximum speed that the eye can move? This question is important because even though the eye is normally looking at the center of the image, the eye can potentially rotate so that the fovea is aimed at image areas with lower resolution.
Let's address these questions:

1. In what angle around the center of vision should we keep the highest resolution?

Source: Wikipedia
The macula portion of the retina is responsible for fine detail. It spans the central 18˚ around the gaze point, or 9˚ eccentricity (the angular distance away from the center of gaze). This would be the best place to put the boundary of the inner layer. Fine detail is processed by cones (as opposed to rods), and at eccentricities past 9˚ you see a rapid fall off of cone density, so this makes sense biologically as well. Furthermore, the “central visual field” ends at 30˚ eccentricity, and everything past that is considered periphery. This is a logical spot to put the boundary between the middle and outermost layer for foveated rendering.

2. Is there a mid-level resolution that is best to use?  and 3. What is the drop-off in “pixel density” between central and peripheral vision? 

Some vendors such as SensoMotoric Instruments (SMI) use an inner layer at full native resolution, a middle layer at 60% resolution, and an outer layer at 20% resolution. When selecting the resolution dropoff, it is important to ensure that at the layer boundaries, the resolution is at or above the eye’s acuity at that eccentricity. At 9˚ eccentricity, acuity drops to 20% of the maximum acuity, and at 30˚ acuity drops to 7.4% of the max acuity. Given this, it appears that SMI’s values work, but are generous compared to what the eye can see.
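The boundary check described above can be written out as a small Python sketch. The acuity and resolution figures are the ones quoted in this post; the data layout and function name are illustrative assumptions.

```python
# Relative acuity (fraction of maximum) at the two layer boundaries.
ACUITY_AT_ECCENTRICITY = {9: 0.20, 30: 0.074}

# (layer name, eccentricity in degrees where the layer begins, relative resolution)
LAYERS = [
    ("middle", 9, 0.60),
    ("outer", 30, 0.20),
]


def layer_margins(layers, acuity):
    """Ratio of rendered resolution to eye acuity at each layer's inner edge.

    A ratio of 1.0 or more means the layer is at least as sharp as the
    eye can resolve at that eccentricity.
    """
    return {name: resolution / acuity[ecc] for name, ecc, resolution in layers}


for name, margin in layer_margins(LAYERS, ACUITY_AT_ECCENTRICITY).items():
    status = "ok" if margin >= 1.0 else "too coarse"
    print(f"{name} layer: {margin:.1f}x the eye's acuity at its inner edge ({status})")
```

Both margins come out well above 1, which is what "generous" means here: the layer resolutions could in principle be lowered further before the eye would notice.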

4. What is the maximum speed that the eye can move?

Source: Indiana University
A saccade is a rapid movement of the eye between fixation points. Saccade speed is determined by the distance between the current gaze and the stimulus. If the stimulus is as far as 50˚ away, then peak saccade velocity can get up to around 900˚/sec. This is important because you want the high resolution layer to be large enough so that the eye can’t move to the lower resolution portion in the time it takes to get the gaze position and render the scene. So if system latency is 20 msec and we assume the eye can move at 900˚/sec, the eye could move 18˚ in that time, meaning you would want the inner (highest-resolution) layer radius to be greater than that – but that is only if the stimulus presented is 50˚ away from the current gaze.
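The arithmetic above is easy to repeat for other latencies. This minimal Python sketch treats the eye as moving at peak velocity for the whole latency interval, which makes the result an upper bound; the function name and the extra latency values in the loop are illustrative.

```python
PEAK_SACCADE_DEG_PER_S = 900.0  # peak velocity quoted above for a 50 degree saccade


def max_eye_travel_deg(latency_s, velocity_deg_per_s=PEAK_SACCADE_DEG_PER_S):
    """Upper bound on how far the gaze can move during the system latency."""
    return latency_s * velocity_deg_per_s


for latency_ms in (5, 10, 20):
    travel = max_eye_travel_deg(latency_ms / 1000.0)
    print(f"{latency_ms} msec latency: gaze can move up to {travel:.1f} degrees")
```

Halving the latency halves the required safety margin around the gaze point, which is one reason low-latency eye tracking matters so much for foveated rendering.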

Additional thoughts

Source: Vision and Ocular Motility by Gunter Noorden

Visual acuity decreases on the temporal side (i.e. towards the ear) somewhat more rapidly than on the nasal side. It also decreases more sharply below and, especially, above the fovea, so that lines connecting points of equal visual acuity are elliptic, paralleling the outer margins of the visual field. Following this, it might make sense to render the different layers as ellipses rather than circles. The image shows the lines of equal visual acuity for the visual field of the left eye – so one can see that it extends farther to the left (temporal side) for the left eye, and for the right eye the visual field would extend farther to the right.

For additional reading

This paper from Microsoft Research is particularly interesting.
They approach the foveated rendering problem in a more technical way – optimizing to find layer parameters based on a simple but fundamental idea: for a given acuity falloff line, find the eccentricity layer sizes which support at least that much resolution at every eccentricity, while minimizing the total number of pixels across all layers. It explains their methodology though does not give their results for the resolution values and layer sizes.

Note: special thanks to Emma Hafermann for her research on this post
