Tuesday, April 5, 2016

Time-warp Explained

In the context of virtual reality, time warp is a technique that reduces the apparent latency between head movement and the corresponding image that appears inside an HMD.

In an ideal world, the rendering engine would render an image using the measured head pose (orientation and position) immediately before the image is displayed on the screen. However, in the real world, rendering takes time, so the rendering engine uses a pose reading that is a few milliseconds before the image is displayed on the screen. During these few milliseconds, the head moves, so the displayed image lags a little bit after the actual pose reading.

Let's take a numerical example. Assume we need to render at 90 frames per second, so there are approximately 11 milliseconds to render each frame. Assume also that head-tracking data is available essentially continuously, but that rendering takes 10 milliseconds. Knowing the rendering time, the rendering engine starts rendering as late as possible, which is 10 milliseconds before the frame needs to be displayed. Thus, the rendering engine uses head-tracking data that is 10 milliseconds old. If the head rotates at a rate of 200 degrees/second, these 10 milliseconds are equivalent to 2 degrees. If the horizontal field of view of the HMD is 100 degrees and spans 1000 pixels, a 2-degree error means that the image lags actual head movement by about 20 pixels.
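The arithmetic above can be captured in a few lines. This is just the worked example, with the same (illustrative) numbers from the text: 10 ms render time, 200 degrees/second head rotation, a 100-degree field of view across 1000 pixels.

```python
def pixel_lag(render_time_s, head_rate_deg_s, fov_deg, width_px):
    """Pixels by which the displayed image lags the head, given stale pose data."""
    rotation_deg = head_rate_deg_s * render_time_s  # head movement during render
    px_per_deg = width_px / fov_deg                 # angular resolution of the display
    return rotation_deg * px_per_deg

lag = pixel_lag(render_time_s=0.010, head_rate_deg_s=200.0,
                fov_deg=100.0, width_px=1000)
print(lag)  # 20.0 pixels, as in the example
```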

However, it turns out that even a 2-degree head rotation does not dramatically change the perspective from which the image is drawn. Thus, if there were a way to shift the image by 20 pixels on the screen (i.e. 2 degrees in the example), the resulting image would be very close to what the render engine would have drawn had the reported head pose been changed by two degrees.

That's precisely what time warping (or "TW" for short) does: it quickly (in less than 1 millisecond) shifts the image a little bit, based on how much the head rotated between the time the render engine sampled the head orientation and the time the time warp begins.
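As a minimal sketch of that shift (illustrative only, not any vendor's implementation, and ignoring lens distortion): treat the frame as rows of pixels and displace each row by the number of pixels corresponding to the yaw accumulated since the render pose was sampled.

```python
PX_PER_DEG = 10.0  # from the example above: 1000 px across a 100-degree FOV

def time_warp(image, yaw_delta_deg):
    """Shift each row by yaw_delta_deg worth of pixels, filling exposed pixels with 0."""
    shift = round(yaw_delta_deg * PX_PER_DEG)
    warped = []
    for row in image:
        if shift >= 0:
            warped.append([0] * shift + row[:len(row) - shift])
        else:
            warped.append(row[-shift:] + [0] * (-shift))
    return warped

frame = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]  # tiny stand-in for a rendered frame
print(time_warp(frame, 0.2))                 # 0.2 degrees -> a 2-pixel shift
```

The zero-filled pixels on the trailing edge are exactly why a real implementation renders a frame somewhat larger than the display, as discussed further below.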

The process with time warping is fairly simple: the render engine renders and then when the render engine is done, the time-warp is quickly applied to the resultant image.

But what happens if the render engine takes more time than is available between frames? In this case, a version of time warping called asynchronous time warping ("ATW") is often used. ATW takes the last available frame and applies time warping to it: if the render engine did not finish in time, ATW takes the previous frame and warps that instead. If the previous frame is used, the head has probably rotated even more, so a greater shift is required. While not as good as having the render engine finish on time, ATW on the previous frame is still better than simply missing a frame, which typically manifests itself as 'judder' - uneven movement on the screen. This is why ATW is sometimes referred to as a "safety net" for rendering, stepping in when the render did not complete on time. The "asynchronous" part of ATW comes from the fact that ATW is an independent process/thread from the main render engine, running at a higher priority so that it can present an updated frame to the display even if the render engine did not finish on time.

Let's finish with a few finer technical points:

  • The time-warping example might lead one to believe that only left-right (i.e. yaw) head motion can be compensated. In practice, all three rotation axes - yaw, pitch and roll - can be compensated, as can head position under some assumptions. For instance, OSVR actually performs 6-DOF warping based on the assumption that objects are 2 meters from the center of projection. It handles rotation about the gaze direction and approximates all other translations and rotations.
  • Moving objects in the scene - such as hands - will still exhibit judder if the render engine misses a frame, in spite of time-warping. 
  • For time warping to work well, the rendered frame needs to be somewhat bigger than the size of the display. Otherwise, shifting the image might move empty pixels into the visible area. Exactly how much larger the rendered frame needs to be depends on the frame rate and the expected velocity of head rotation. Larger frames mean more pixels to render and more memory, so time warping is not completely 'free'.
  • If the image inside the HMD is rendered onto a single display (as opposed to two displays, one per eye), time warping may need to use a different warping amount for each eye, because typically one eye is drawn on screen before the other.
  • Objects such as a menu that are in "head space" (i.e. should stay fixed relative to the head) need to be rendered and submitted to the time-warp code separately, since they should not be shifted to compensate for head movement.
  • Predictive tracking (estimating a future pose based on previous readings of orientation, position and angular/linear velocity) can help as input to the render engine, but an actual measurement is always preferable to an estimate of the future pose.
  • Depending on the configuration of the HMD displays, there may be some rendering delay between the left eye and the right eye (for instance, if the screen is a portrait-mode screen that renders top to bottom, and the left eye maps to the top part of the screen). In this case, one can use different time-warp values for each eye.
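The render-target sizing point above lends itself to a back-of-the-envelope calculation. The numbers here are assumptions for illustration, not a vendor specification: the overscan margin must cover the largest shift expected between the pose sample and scanout.

```python
import math

def overscan_px(max_head_rate_deg_s, worst_latency_s, px_per_deg):
    """Extra pixels needed on each side so a warp never exposes empty pixels."""
    return math.ceil(max_head_rate_deg_s * worst_latency_s * px_per_deg)

# Assume a fast 300 deg/s head turn and one full 11 ms frame of latency,
# at 10 px/degree as in the running example.
margin = overscan_px(300.0, 0.011, 10.0)
print(margin)  # 33 extra pixels per side
```

Doubling the worst-case latency (e.g. when ATW falls back to the previous frame) doubles the required margin, which is one reason the margin, and thus the rendering cost, is a real trade-off.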

For additional VR tutorials on this blog, click here
Expert interviews and tutorials can also be found on the Sensics Insight page here


Ard van Breemen said...

What I don't know is this:
is timewarp just a translation, or a warped translation?
Moving your head 2 degrees in 10ms results in 20 pixels, so the render start has T-pixels to translate, the render end has T+20 pixels to translate, whichever direction the movement is.
And what I also don't get is why in portrait mode the display is not considered as 2 separate renders of a left and a right position? Especially the right eye seems to be lagging in time, and I fear that might have real problems for people using an HMD often. Things like lazy eyes. (Also 3D HDMI usually delivers 2 frames stitched together as one big frame.)
Or am I now interfering with nvidia and ati territory as my territory is none of the two (arm framebuffers, it's a lot easier ;-) ).

VRGuy said...

The time warping also takes the lens distortion into account, so if I understand your question correctly, it is a warped translation.

On a single screen, the display is considered as two separate renders, and indeed in some implementations (such as OSVR) each eye can be configured to be time-warped differently, depending on the display structure of the particular HMD.

Ard van Breemen said...

Hi, no, I actually meant: display pixel(0,0) needs another translation than display pixel(right,bottom). So even on a single left or right eye frame, each individual pixel has a unique time-dependent translation before being eye-warped. Or actually, the eye warping and the time-dependent translation are mixed with each other, because the pixel placement depends on the eye warp...
So actually it is a reversed warp:
for each pixel on the display, you take the draw time T of that pixel on the screen, unwarp it (reversed eye correction), and then use the pixels that should have been drawn there according to the time T translation.
I don't know how to express it otherwise, sorry :-).
But my original question was (rephrased): do current timewarp implementations take into account that each pixel has its own timewarp?
And my second question was (rephrased): do current left eye/right eye implementations in single-display portrait mode implement a very laggy right eye, or do they finish the rendering of the right eye halfway through the displaying of the left eye? That last part is easy to do if you have access to the framebuffer, have a good timer, and can bitblit in time to the framebuffer. I'm not so sure about that on the PC: if you look at the complexity that AMD and Nvidia made of displaying a single frame, I doubt having access to the framebuffer will give you any benefits without good timing access.

VRGuy said...

Varying the timewarping per pixel is possible, but it is more often associated with 'beam chasing' implementations. Regular timewarping does not use a different transform for every pixel.

Yes, single-screen implementations render one eye with lag, but we can time-warp each half of the screen with a different transform to account for that. For instance, if you implement predictive tracking (as OSVR does), you can use a different look-ahead time for each half of the screen.