Private:3DV Remote Rendering

Here we describe the components of a 3D video remote rendering system for mobile devices based on cloud computing services. We also discuss the main design choices and challenges that need to be addressed in such a system.

Components

The system will be composed of three main components:

Mobile receiver(s)
Adaptation proxy
View synthesis and rendering cloud service

Transmission is to be carried via unicast over an unreliable wireless channel. A feedback channel would be necessary between the receiver and the proxy. This channel would carry information about the current or desired viewpoint, buffer status, and network conditions, as well as about the mobile device itself: current battery level, screen resolution, expected power available for processing, etc. Such a feedback channel is crucial for a fully adaptive algorithm that can react quickly to a change in any of these parameters.
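As a rough illustration of what one feedback report could carry, the sketch below packs the parameters listed above into a single message. All field names, units, and the JSON encoding are assumptions made for illustration; no wire format has been fixed for the system.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ReceiverFeedback:
    """One feedback report from a mobile receiver to the adaptation proxy.

    Every field name and unit below is hypothetical.
    """
    viewpoint: float        # desired viewpoint: 0.0 = leftmost stored view, 1.0 = rightmost
    buffer_ms: int          # playout buffer occupancy, in milliseconds
    throughput_kbps: float  # receiver-side throughput estimate
    loss_rate: float        # observed packet loss rate in [0, 1]
    battery_pct: int        # remaining battery, 0-100
    screen_width: int       # display width in pixels
    screen_height: int      # display height in pixels
    power_budget_mw: int    # power the device expects to spend on processing

    def to_json(self) -> str:
        """Serialize the report for the feedback channel."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "ReceiverFeedback":
        """Rebuild a report on the proxy side."""
        return cls(**json.loads(payload))
```

The proxy would parse each report with `ReceiverFeedback.from_json(...)` and feed the values into its view selection and rate adaptation decisions.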

Because of the limited wireless bandwidth, we need efficient and adaptive compression of transmitted views/layers. In addition, an unequal error protection (UEP) technique will be required to overcome the unreliable nature of the wireless channel. Multiple description coding (MDC) has already been used in some experiments, and the results seem quite promising.
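To make the UEP idea concrete, here is a minimal sketch that splits a fixed budget of parity (FEC) packets across layers so that more important layers receive proportionally more protection. The importance weights, the layer split, and the proportional rule itself are assumptions for illustration, not a scheme chosen for this system.

```python
def allocate_parity(source_packets, weights, parity_budget):
    """Split a parity-packet budget across layers in proportion to weight * size.

    source_packets: number of source packets per layer
    weights:        hypothetical importance weight per layer (higher = more critical)
    parity_budget:  total number of parity packets available
    """
    scores = [n * w for n, w in zip(source_packets, weights)]
    total = sum(scores)
    if total == 0:
        return [0] * len(scores)
    exact = [parity_budget * s / total for s in scores]
    alloc = [int(x) for x in exact]
    # Hand out the packets lost to rounding down, largest fractional part first.
    leftover = parity_budget - sum(alloc)
    order = sorted(range(len(exact)), key=lambda i: exact[i] - alloc[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc
```

For example, with two equally sized layers where the first (say, the base view) is weighted three times higher than the second, `allocate_parity([10, 10], [3, 1], 8)` gives the first layer six parity packets and the second two.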

Based on receiver feedback, the adaptation proxy is responsible for selecting the best views to send to the receiver, performing rate adaptation based on current network conditions, and encoding the selected views quickly and efficiently. We can utilize rate-distortion (RD) optimization techniques for rate adaptation.
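A minimal sketch of RD-optimized rate adaptation at the proxy follows. It assumes the proxy already knows a rate and a distortion value for each candidate encoding, and picks the one minimizing the Lagrangian cost D + lambda * R among those that fit the reported bandwidth. The candidate labels, units, and the lambda value are hypothetical.

```python
def pick_operating_point(candidates, bandwidth_kbps, lam):
    """Choose the candidate minimizing D + lam * R within the bandwidth limit.

    candidates:     list of (label, rate_kbps, distortion) tuples
    bandwidth_kbps: bandwidth estimate taken from receiver feedback
    lam:            Lagrange multiplier trading distortion against rate
    """
    feasible = [c for c in candidates if c[1] <= bandwidth_kbps]
    if not feasible:
        # Nothing fits: fall back to the lowest-rate candidate.
        return min(candidates, key=lambda c: c[1])
    return min(feasible, key=lambda c: c[2] + lam * c[1])


# Hypothetical operating points: (label, rate in kbps, distortion as MSE).
points = [("low", 300, 40.0), ("mid", 600, 25.0), ("high", 1200, 18.0)]
choice = pick_operating_point(points, bandwidth_kbps=800, lam=0.01)
```

Here `choice` is the "mid" point, since "high" exceeds the 800 kbps limit and "mid" has the lower Lagrangian cost of the two remaining candidates.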

It is important to distinguish what the proxy and the cloud service would each accomplish. For example, a multiview-plus-depth scheme could be used in order to support a broad array of devices, such as phones and tablets. From a few real, filmed views, a server might interpolate the extra views required, where the number of extra views depends on the nature of the mobile device. This would give considerable flexibility at a low storage cost: only the original views would need to be stored. We would have to determine experimentally whether extra interpolated views can be generated on the fly, or whether such views need to be stored on the server as well.
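As a small illustration of device-dependent interpolation, the sketch below computes where the extra virtual cameras would sit between two stored views. Even spacing along the baseline and the function name are assumptions; the actual synthesis of each view is not shown.

```python
def virtual_view_positions(left_pos, right_pos, extra_views):
    """Return evenly spaced camera positions strictly between two stored views.

    left_pos, right_pos: positions of the two real cameras along the baseline
    extra_views:         number of interpolated views the device requires
    """
    if extra_views < 0:
        raise ValueError("extra_views must be non-negative")
    step = (right_pos - left_pos) / (extra_views + 1)
    return [left_pos + step * k for k in range(1, extra_views + 1)]
```

A stereo phone might need no extra views (`extra_views=0`), while a multiview display might request several; only the two real views would need to be stored in either case.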

Design Choices

What is the format of the stored video files?
How many views (and possibly depth maps) need to be sent to the receiver?
- two views (receiver needs to construct a disparity/depth map and synthesize intermediate views)
- two views + two depth maps (receiver can then synthesize any intermediate view between the received ones)
- one view + depth map (yields a limited view synthesis range)
- more than two views (plus depth maps) for larger displays, such as the iPad and in-car displays/TVs
What compression format should be used to compress the texture images of the views? The choice could be driven by the resolution of the display, since a high level of texture detail might not be noticeable on a small screen.
What compression format is efficient for compressing the depth maps without affecting the quality of synthesized views? Should depth maps be compressed at all?
- Will MVC be suitable for depth maps?
How much will reducing the quality of one of the views (to save bandwidth) affect the synthesis process at the receiver side?
- Will the effect be significant given that the receiver's display is small?
How will view synthesis and associated operations (e.g. 3D warping and hole filling) at the receiver-side affect the power consumption of the device?
Should we focus only on reducing the amount of data that needs to be transmitted, as antennas consume a significant amount of power? Or should we experiment to see how these two variables (transmission and receiver-side processing) are tied together, given that decoding 3D video is a much more resource-intensive process?
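The receiver-side operations mentioned above (3D warping and hole filling) can be sketched for a single image row as follows. This is a simplified illustration, assuming rectified, horizontally aligned cameras, integer-pixel disparities, and a naive background-based hole filling; function names, the sign convention, and parameter values are hypothetical. Even this toy version touches every pixel once per synthesized view, which hints at why its power cost on a mobile device needs to be measured.

```python
def warp_row(texture, depth, focal_px, baseline, hole=None):
    """Forward-warp one image row to a virtual camera shifted by `baseline`.

    A pixel at depth Z (must be positive) moves horizontally by the
    disparity d = focal_px * baseline / Z, rounded to whole pixels.
    """
    width = len(texture)
    out = [hole] * width
    out_depth = [float("inf")] * width
    for x in range(width):
        d = int(round(focal_px * baseline / depth[x]))
        tx = x - d
        # Z-buffer test: the nearer pixel wins when two land on the same target.
        if 0 <= tx < width and depth[x] < out_depth[tx]:
            out[tx] = texture[x]
            out_depth[tx] = depth[x]
    return out, out_depth


def fill_holes(row, row_depth, hole=None):
    """Fill each hole from its nearest valid neighbour on either side,
    preferring the farther one, since disocclusions usually expose background."""
    filled = list(row)
    width = len(row)
    for x in range(width):
        if row[x] is not hole:
            continue
        left = next((i for i in range(x - 1, -1, -1) if row[i] is not hole), None)
        right = next((i for i in range(x + 1, width) if row[i] is not hole), None)
        neighbours = [i for i in (left, right) if i is not None]
        if neighbours:
            src = max(neighbours, key=lambda i: row_depth[i])
            filled[x] = row[src]
    return filled


# Toy row: a near object (depth 2) in front of a far background (depth 4).
texture = [10, 20, 30, 40, 50, 60, 70, 80]
depth = [4, 4, 4, 2, 2, 4, 4, 4]
warped, warped_depth = warp_row(texture, depth, focal_px=4, baseline=1)
result = fill_holes(warped, warped_depth)
```

In this toy row the near object shifts by two pixels and the background by one, so the object overwrites one background pixel and leaves a hole behind it, which `fill_holes` then covers with the adjacent background value.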

Tools

Joint Multiview Video Coding (JMVC) Reference Software (CVS root: :pserver:jvtuser@garcon.ient.rwth-aachen.de:/cvs/jvt)
View Synthesis Based on Disparity/Depth (ViSBD) Reference Software
Compute Unified Device Architecture (CUDA)