MASH: Adaptive Streaming of Multiview Videos over HTTP

From NMSL

Revision as of 15:10, 18 November 2016

People

  • Khaled Diab
  • Mohamed Hefeeda


Overview

Multiview videos offer an unprecedented experience by allowing users to explore scenes from different angles and perspectives. Such videos have therefore been gaining substantial interest from major content providers such as Google and Facebook. Adaptive streaming of multiview videos is challenging, however, because of Internet dynamics and the diversity of users' interests and network conditions. To address this challenge, we propose a novel rate adaptation algorithm for multiview videos, called MASH. Streaming multiview videos is more user-centric than streaming single-view videos, because it depends heavily on how users interact with the different views. To efficiently support this interactivity, MASH constructs probabilistic view switching models that capture the switching behavior of the user in the current session, as well as the aggregate switching behavior across all previous sessions of the same video. MASH then uses these models to dynamically assign relative importance to different views. Furthermore, MASH uses a new buffer-based approach to request video segments of various views at different qualities, such that the quality of the streamed video is maximized while network bandwidth is not wasted. We have implemented a multiview video player and integrated MASH into it. We compare MASH against the state-of-the-art algorithm used by YouTube for streaming multiview videos. Our experimental results show that MASH produces much higher and smoother quality than the algorithm used by YouTube, while using network bandwidth more efficiently. In addition, we conduct large-scale experiments with up to 100 concurrent multiview streaming sessions, and show that MASH maintains fairness across competing sessions and does not overload the streaming server.

Details

Figure 1 shows a high-level overview of MASH, which runs on the client side. MASH combines the outputs of the global and local view switching models to produce a relative importance factor <math>\beta_i</math> for each view <math>V_i</math>. MASH also constructs a buffer-rate function <math>f_i</math> for each view <math>V_i</math>, which maps the current buffer occupancy to the segment quality to be requested. The buffer-rate functions are updated dynamically during the session, whenever a view switch happens. MASH strives to produce smooth, high-quality playback for all views, and it avoids wasting bandwidth by prefetching only the views that are likely to be watched.


Fig. 1: High-level overview of MASH.


View Switching Models

MASH combines the outputs of two stochastic models (local and global) to estimate the likelihood of different views being watched. We define each view switching model as a discrete-time Markov chain (DTMC) with <math>N</math> states, where <math>N</math> is the number of views. View switching is allowed at discrete time steps of length <math>\Delta</math>. The time step <math>\Delta</math> reflects the physical constraint on how fast the user can interact with the video.

Local Model: It captures the user's activity during the current streaming session, and it evolves with time. That is, the model is dynamic and is updated with every view switching event that happens in the session. The model maintains a count matrix <math>M(t)</math> of size <math>N \times N</math>, where <math>M_{ij}(t)</math> is proportional to the number of times the user switched from view <math>V_i</math> to <math>V_j</math>, from the beginning of the session up to time <math>t</math>. The count matrix <math>M(t)</math> is initialized to all ones. Whenever a view switch occurs, the corresponding element in <math>M(t)</math> is incremented. The count matrix is used to compute the transition probability matrix of the local model, <math>L(t)</math>.
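The bookkeeping of the local model can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class and method names are hypothetical, and normalizing each row of <math>M(t)</math> to obtain <math>L(t)</math> is an assumption, since the text only states that <math>L(t)</math> is computed from the count matrix.

```python
class LocalModel:
    """Count-based local view switching model (illustrative sketch)."""

    def __init__(self, num_views):
        # M(t) is initialized to all ones, as described in the text.
        self.counts = [[1] * num_views for _ in range(num_views)]

    def record_switch(self, src, dst):
        """Increment M[src][dst] when the user switches from V_src to V_dst."""
        self.counts[src][dst] += 1

    def transition_matrix(self):
        """Return L(t). Row normalization is an assumption, not stated in the text."""
        return [[c / sum(row) for c in row] for row in self.counts]


model = LocalModel(num_views=3)
model.record_switch(0, 1)          # user switches from V_0 to V_1
L = model.transition_matrix()
print(L[0])                        # row of switching probabilities out of V_0
```

Starting from all ones means that no transition has zero probability before the user has switched views, so early predictions stay uniform rather than undefined.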

Global Model: This model aggregates user activity across all streaming sessions that have been served by the server so far. At the beginning of the streaming session, the client downloads the global model parameters from the server. We use <math>G</math> to denote the transition matrix of the global model, where <math>G_{ij} = p(V_j | V_i)</math> is the probability of switching to <math>V_j</math> given <math>V_i</math>. If this is the first streaming session, <math>G_{ij}</math> is initialized to <math>1/N</math> for every <math>i</math> and <math>j</math>.

Combined Model: The local and global models complement each other in predicting the (complex) switching behavior of users while watching multiview videos. For example, in some streaming sessions, the user's activity may deviate significantly from what the global model expects, because the user is exploring the video from different viewing angles than most previous users did. Or the multiview video may be new, and the global model has not yet captured the expected view switching pattern. On the other hand, the local model may not be very helpful when the user has not yet made enough view switches, e.g., at the beginning of a streaming session. We therefore compute an importance factor <math>\beta_i</math> for each view <math>V_i</math> by linearly combining <math>G</math> and <math>L(t)</math> using a weight factor <math>\alpha_i</math>. This weight factor is computed dynamically to adjust the relative weights of the global and local models.
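One way to picture the linear combination is the sketch below. It is only an illustration under stated assumptions: the function names are hypothetical, the weights <math>\alpha_i</math> are passed in rather than computed (the text does not give their formula), <math>\alpha_i</math> is assumed to weight the global model and <math>1 - \alpha_i</math> the local model, and only the row of the currently active view is assumed to be used.

```python
def uniform_global(num_views):
    """First-session global model: G_ij = 1/N for every i and j (from the text)."""
    return [[1.0 / num_views] * num_views for _ in range(num_views)]


def importance_factors(G, L, active, alpha):
    """Blend G and L(t) into one importance factor per view.

    Assumptions (not from the text): alpha[i] weights the global model,
    1 - alpha[i] weights the local model, and only the transition
    probabilities out of the active view are used.
    """
    n = len(G)
    return [alpha[i] * G[active][i] + (1 - alpha[i]) * L[active][i]
            for i in range(n)]


G = uniform_global(2)
L = [[0.25, 0.75], [0.5, 0.5]]     # hypothetical local transition matrix
beta = importance_factors(G, L, active=0, alpha=[0.5, 0.5])
print(beta)
```

With a weight close to 1 the prediction follows the aggregate behavior of earlier sessions, which suits the start of a session; a weight close to 0 lets the current user's own switching pattern dominate.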

MASH: The Proposed Algorithm

MASH is a buffer-based rate adaptation algorithm for multiview videos, which means it determines the requested segment quality based on the buffer occupancy level, and it does not need to estimate the network capacity.

Rate adaptation for multiview videos is far more complex than for single-view videos, as it needs to handle many views of different importance without wasting network bandwidth or causing frequent playback stalls for re-buffering. To handle this complexity, we propose employing a family of buffer-rate functions, which considers the relative importance of the active and inactive views and how this relative importance changes dynamically during the streaming session. Specifically, we define a function <math>f_i (B_i(t))</math> for each view <math>V_i</math>, which maps the buffer level <math>B_i(t)</math> of that view to a target quality <math>Q_i(t)</math> based on its importance factor <math>\beta_i</math> at time <math>t</math>. We use <math>\beta_i</math> to limit the maximum buffer occupancy level for view <math>V_i</math> as: <math>B_{max,i} = \beta_i \times B_{max}</math>. Since we set <math>\beta_i = 1</math> for the active view, the algorithm can request segments up to the maximum quality <math>Q_{max,i}</math>.

For inactive views, MASH can request segments at up to a fraction of their maximum qualities. Figure 2 illustrates the buffer-rate functions for two views <math>V_i</math> and <math>V_j</math>. <math>V_i</math> is the active view, so <math>B_{max,i} = B_{max}</math>. The figure shows when the requests stop for both <math>V_i</math> and <math>V_j</math>, and the maximum bitrate difference, which reflects the importance of each view.


Fig. 2: Proposed buffer-rate functions of views <math>V_i</math> (active) and <math>V_j</math> (inactive)
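The effect of <math>\beta_i</math> on a buffer-rate function can be sketched as follows. Only the relation <math>B_{max,i} = \beta_i \times B_{max}</math> comes from the text; the linear shape of the function, the slope shared by all views, and the names `max_buffer` and `target_quality` are assumptions made for illustration.

```python
def max_buffer(beta_i, b_max):
    """B_max,i = beta_i * B_max, as defined in the text."""
    return beta_i * b_max


def target_quality(buffer_level, beta_i, b_max, q_min, q_max):
    """Map a view's buffer level to a target bitrate, or None to pause requests.

    Assumptions (not from the text): the function is linear, and every view
    shares the slope (q_max - q_min) / b_max.
    """
    cap = max_buffer(beta_i, b_max)
    if buffer_level >= cap:
        return None                # this view's buffer is full: stop requesting
    slope = (q_max - q_min) / b_max
    return q_min + slope * max(buffer_level, 0.0)


# Hypothetical numbers: 30 s maximum buffer, bitrates from 1000 to 5000 kbps.
print(target_quality(15, 1.0, 30, 1000, 5000))   # active view, half-full buffer
print(target_quality(15, 0.5, 30, 1000, 5000))   # inactive view, cap reached
```

Under these assumptions an inactive view with <math>\beta_j = 0.5</math> stops requesting at half the buffer level of the active view, so the highest bitrate it reaches is only a fraction of the active view's maximum. A real player would additionally round the result to one of the bitrates actually available for the video.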


Evaluation