Private:3DV Remote Rendering

From NMSL
Revision as of 11:51, 7 April 2011

Here we describe the components of a 3D video remote rendering system for mobile devices based on cloud computing services. We also discuss the main design choices and challenges that need to be addressed in such a system.


Components

The system will be composed of three main components:

  • Mobile receiver(s)
  • Adaptation proxy
  • View synthesis and rendering cloud service


Transmission is to be carried via unicast over an unreliable wireless channel. A feedback channel would be necessary between the receiver and the proxy. This channel would be utilized to send information about the current/desired viewpoint, buffer status, and network conditions, in addition to statistics about the mobile device itself (e.g. current battery level, screen resolution, expected power required for processing, etc.). Such a feedback channel is crucial for a fully adaptive algorithm that can quickly adapt to changes in any of these parameters.
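As a rough sketch, the per-receiver state carried over the feedback channel might look like the following; all field names and units are illustrative assumptions, not a defined protocol:

```python
from dataclasses import dataclass

@dataclass
class ReceiverFeedback:
    # Viewpoint state
    current_viewpoint: float   # index of the currently rendered view
    desired_viewpoint: float   # viewpoint the user is navigating toward
    # Playback / network state
    buffer_ms: int             # playout buffer occupancy, milliseconds
    bandwidth_kbps: int        # estimated downlink bandwidth
    loss_rate: float           # observed packet loss ratio
    # Device state
    battery_pct: int           # remaining battery, percent
    screen_width: int
    screen_height: int
    synthesis_cost_mw: int     # expected power draw of view synthesis, mW
```

The proxy would read such a message periodically and re-run its view selection and rate adaptation whenever any field crosses a threshold.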

Because of the limited wireless bandwidth, we need efficient and adaptive compression of the transmitted views/layers. In addition, an unequal error protection (UEP) technique will be required to overcome the unreliable nature of the wireless channel. Multiple description coding (MDC) has already been used in some experiments, and the results seem quite promising for both multiview video coding and video plus depth.
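To illustrate the MDC idea in its simplest form, a temporal odd/even split yields two independently decodable descriptions: either one alone gives half-frame-rate playback, and both together reconstruct the full sequence. This is a toy sketch, not the coding scheme used in the experiments cited above:

```python
def mdc_split(frames):
    """Split a frame sequence into two descriptions (even/odd frames)."""
    return frames[0::2], frames[1::2]

def mdc_merge(even, odd):
    """Reconstruct the full sequence when both descriptions arrive."""
    merged = []
    for e, o in zip(even, odd):
        merged.extend([e, o])
    # handle a trailing even frame when the counts differ
    merged.extend(even[len(odd):])
    return merged
```

UEP then follows naturally: the description (or layer) whose loss hurts most gets the stronger channel protection.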

It is assumed that the mobile receiver has a display capable of rendering at least two views. Interfacing with the mobile receiver's display may be an issue, since this will only be possible through a predefined driver API. Whether these APIs will be exposed, and which 3D image format they expect, will vary from one device to another. We believe that such autostereoscopic displays will probably come with their own IP hardware to perform rendering operations such as 3D warping, hole filling, etc. Whether we would have control over this process is still unknown. In the meantime, it is possible to experiment by sending data that can be rendered in 2D, just like most of the experiments we have read about so far. This would enable us to establish the feasibility of our scheme and benchmark it against previous works.

Based on receiver feedback, the adaptation proxy is responsible for selecting the best views to send, performing rate adaptation based on current network conditions, and encoding the views quickly and efficiently. We can utilize rate-distortion (RD) optimization techniques for rate adaptation.
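The standard RD-optimization criterion the proxy could apply is the Lagrangian cost J = D + λR: among the candidate encodings of a view, pick the one minimizing J for the λ implied by the current bandwidth budget. A minimal sketch (the candidate operating points and λ are illustrative):

```python
def select_operating_point(points, lmbda):
    """Pick the (rate, distortion) candidate minimizing the Lagrangian
    cost J = D + lambda * R.  `points` is a list of (rate_kbps,
    distortion) pairs, e.g. one pair per quantization parameter."""
    return min(points, key=lambda p: p[1] + lmbda * p[0])
```

A larger λ penalizes rate more heavily, pushing the proxy toward coarser encodings when the channel degrades.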

It is important to distinguish what the proxy and the cloud would each accomplish. For example, a multiview-plus-depth scheme could be used to support a broad array of devices, such as phones and tablets. From a few real, filmed views, a server could interpolate the extra views required, where the number of extra views depends on the nature of the mobile device. This would give considerable flexibility at a very low cost: only the original views would need to be stored. Experimentally, we would have to see whether it is feasible to generate the extra interpolated views on the fly, or whether such views need to be stored on the server as well.
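On-the-fly interpolation of an intermediate view typically warps both reference views to the target viewpoint and then blends them. A toy sketch of the blending step only, assuming the warping has already been done and holes are marked as None (the parameter names are illustrative):

```python
def blend_views(left_warped, right_warped, alpha):
    """Blend two reference views already warped to the same target
    viewpoint.  alpha in [0, 1] is the normalized position between the
    left (0) and right (1) reference cameras; a hole in one view is
    filled from the other, and holes in both are left for inpainting."""
    out = []
    for l, r in zip(left_warped, right_warped):
        if l is None and r is None:
            out.append(None)                      # hole in both views
        elif l is None:
            out.append(r)
        elif r is None:
            out.append(l)
        else:
            out.append((1 - alpha) * l + alpha * r)
    return out
```

Whether this runs fast enough per requested viewpoint is exactly the on-the-fly-vs-precomputed question raised above.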


Design Choices

  • What is the format of the stored video files?
  • How many views (and possible depth maps) need to be sent to the receiver?
    • two views (receiver needs to construct a disparity/depth map and synthesize intermediate views)
    • two views + two depth maps (receiver can then synthesize any intermediate view between the received ones)
    • one view + depth map (yields a limited view synthesis range)
    • more than two views (and depth maps) for larger displays, such as the iPad and displays/TVs in cars
  • What compression format should be used to compress the texture images of the views? This could be driven by the resolution of the display where a high level of texture might not be noticeable.
  • What compression format is efficient for compressing the depth maps without affecting the quality of synthesized views? Should the depth maps be compressed at all?
    • Will MVC be suitable for depth maps?
  • How much will quality reduction of one of the views to reduce bandwidth affect the synthesis process at the receiver side?
    • Will the effect be significant given that the receiver's display is small?
  • How will view synthesis and associated operations (e.g. 3D warping and hole filling) at the receiver-side affect the power consumption of the device?
  • Should we focus only on reducing the amount of data to be transmitted, as antennas consume a significant amount of power? Or should we experiment to see how these two variables are tied together, given that decoding 3D videos is a much more resource-intensive process?
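To make the design space above concrete, a toy policy for choosing among the configurations listed earlier might look as follows; the configuration names, counts, and thresholds are purely illustrative assumptions:

```python
# Hypothetical stream configurations mirroring the list above.
CONFIGS = {
    "2_views":         {"views": 2, "depth_maps": 0},  # receiver estimates depth itself
    "2_views_2_depth": {"views": 2, "depth_maps": 2},  # free intermediate synthesis
    "1_view_1_depth":  {"views": 1, "depth_maps": 1},  # limited synthesis range
    "multiview":       {"views": 4, "depth_maps": 4},  # larger displays
}

def pick_config(screen_width, battery_pct):
    """Toy policy: large displays get more views; a low battery avoids
    receiver-side depth estimation, the most power-hungry option."""
    if screen_width >= 1024:
        return "multiview"
    if battery_pct < 20:
        return "1_view_1_depth"
    return "2_views_2_depth"
```

A real policy would also weigh the feedback-channel statistics (bandwidth, buffer status) rather than device properties alone.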


Thoughts

  • If we reduce the depth maps' resolution by down-sampling to reduce the bitrate of the transmitted stream, we will need to perform depth enhancement operations after reconstruction at the receiver side. These operations can be computationally expensive and may drain the mobile receiver's power.
  • When considering which stream is more important in order to implement a prioritization technique, we are also faced with a dilemma. A high-resolution texture stream is required for backward compatibility in case the device is only 2D-capable. However, a high-quality depth map is very important to avoid shape/surface deformations after reconstructing the 3D scene.
  • Dividing the task of rendering the 3D video between the client and the server is also non-trivial. For example, generating on the server part of the view that will be rendered on the client is not possible, because we do not know in advance which viewpoint the client will render based on user input. Moreover, this defeats the goal of sending the client only two neighboring views and delegating the rendering of intermediate views to it, in order to reduce viewpoint-change latency.
  • Utilizing the GPU to speed up the view synthesis process also has its challenges. One main issue that may hinder significant speedups is the 3D warping process: the mapping between pixels in the reference view and pixels in the target view is not one-to-one. Several pixels may map to the same location in the target view, causing a conflict that must be resolved based on which pixel represents a foreground object and which represents the background. Thus, warping the pixels in parallel will exhibit shared-resource contention. How much effect this has on achievable speedups needs to be determined.
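The pixel competition described above can be sketched as a forward warp of one scanline with a z-buffer resolving conflicts. This sketch assumes rectified views, so the warp reduces to a horizontal shift proportional to depth; a parallel GPU version would need atomic updates to the z-buffer at exactly these conflict points:

```python
import numpy as np

def forward_warp_row(texture, depth, shift_scale):
    """Horizontal-shift DIBR warp of one scanline.  Several source
    pixels can land on the same target column; the z-buffer keeps the
    one nearest the camera (here: larger depth value = nearer), which
    is the contention a parallel implementation must resolve."""
    w = len(texture)
    target = np.zeros(w)            # 0 marks holes for later hole filling
    zbuf = np.full(w, -np.inf)
    for x in range(w):
        d = depth[x]
        tx = x + int(round(shift_scale * d))   # disparity from depth
        if 0 <= tx < w and d > zbuf[tx]:       # foreground wins the conflict
            zbuf[tx] = d
            target[tx] = texture[x]
    return target
```

Zero-valued holes left by the warp are where disoccluded background must be filled from the second reference view or by inpainting.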


Tools

  • Joint Multiview Video Coding (JMVC) Reference Software (:pserver:jvtuser@garcon.ient.rwth-aachen.de:/cvs/jvt)
  • View Synthesis Based on Disparity/Depth (ViSBD) Reference Software
  • Compute Unified Device Architecture (CUDA)

