Difference between revisions of "MASH: Adaptive Streaming of Multiview Videos over HTTP"

Revision as of 16:00, 18 November 2016

People

Khaled Diab
Mohamed Hefeeda

Overview

Multiview videos offer an unprecedented experience by allowing users to explore scenes from different angles and perspectives. Thus, such videos have been gaining substantial interest from major content providers such as Google and Facebook. Adaptive streaming of multiview videos is, however, challenging because of Internet dynamics and the diversity of users' interests and network conditions. To address this challenge, we propose a novel rate adaptation algorithm for multiview videos (called MASH). Streaming multiview videos is more user-centric than streaming single-view videos, because it heavily depends on how users interact with the different views. To efficiently support this interactivity, MASH constructs probabilistic view switching models that capture the switching behavior of the user in the current session, as well as the aggregate switching behavior across all previous sessions of the same video. MASH then utilizes these models to dynamically assign relative importance to different views. Furthermore, MASH uses a new buffer-based approach to request video segments of various views at different qualities, such that the quality of the streamed videos is maximized while the network bandwidth is not wasted. We have implemented a multiview video player and integrated MASH in it. We compare MASH against the state-of-the-art algorithm used by YouTube for streaming multiview videos. Our experimental results show that MASH can produce much higher and smoother quality than the algorithm used by YouTube, while using the network bandwidth more efficiently. In addition, we conduct large-scale experiments with up to 100 concurrent multiview streaming sessions, and we show that MASH maintains fairness across competing sessions and does not overload the streaming server.

Details

Figure 1 shows a high-level overview of MASH, which runs at the client side. MASH combines the outputs of the global and local view switching models to produce a relative importance factor <math>\beta_i</math> for each view <math>V_i</math>. MASH also constructs a buffer-rate function <math>f_i</math> for each view <math>V_i</math>, which maps the current buffer occupancy to the segment quality to be requested. The buffer-rate functions are dynamically updated during the session whenever a view switch happens. MASH strives to produce smooth, high-quality playback for all views, while avoiding wasted bandwidth by carefully prefetching only the views that are likely to be watched.

Fig. 1: High-level overview of MASH.
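
The idea of a buffer-rate function <math>f_i</math> can be illustrated with a minimal sketch. The shape used here (a lower reservoir, a linear ramp, and a cap at the highest bitrate) is a generic buffer-based mapping chosen only for illustration; this page does not specify how MASH actually shapes or updates each <math>f_i</math>. The function name, the parameters <code>reservoir_s</code> and <code>cushion_s</code>, and the bitrate ladder are all hypothetical.

```python
def buffer_rate(buffer_s, reservoir_s, cushion_s, bitrates_kbps):
    """Map buffer occupancy (seconds) to one of the available bitrates.

    Illustrative assumption, not the MASH definition of f_i:
      - below the reservoir, request the lowest bitrate;
      - above reservoir + cushion, request the highest bitrate;
      - in between, ramp linearly and round down to an available bitrate.
    bitrates_kbps must be sorted in ascending order.
    """
    r_min, r_max = bitrates_kbps[0], bitrates_kbps[-1]
    if buffer_s <= reservoir_s:
        return r_min
    if buffer_s >= reservoir_s + cushion_s:
        return r_max
    # Linear interpolation between the lowest and highest bitrate.
    target = r_min + (r_max - r_min) * (buffer_s - reservoir_s) / cushion_s
    # Highest available bitrate that does not exceed the target.
    return max(r for r in bitrates_kbps if r <= target)


if __name__ == "__main__":
    ladder = [300, 750, 1500, 3000]  # hypothetical bitrate ladder (kbps)
    for occupancy in (2, 15, 30):
        print(occupancy, buffer_rate(occupancy, 5, 20, ladder))
```

In a per-view setting, one such function would be kept for each view <math>V_i</math>, with its parameters adjusted whenever a view switch happens.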

View Switching Models

MASH combines the outputs of two stochastic models (local and global) to estimate the likelihood of different views being watched. We define each view switching model as a discrete-time Markov chain (DTMC) with <math>N</math> (number of views) states. View switching is allowed at discrete time steps of length <math>\Delta</math>. The time step <math>\Delta</math> is the physical constraint on how fast the user can interact with the video.

Local Model: It captures the user activities during the current streaming session, and it evolves with time. That is, the model is dynamic and is updated with every view switching event that happens in the session. The model maintains a count matrix <math>M(t)</math> of size <math>N \times N</math>, where <math>M_{ij}(t)</math> is proportional to the number of times the user switched from view <math>V_i</math> to <math>V_j</math>, from the beginning of the session up to time <math>t</math>. The count matrix <math>M(t)</math> is initialized to all ones. Whenever a view switch occurs, the corresponding element in <math>M(t)</math> is incremented. The count matrix is used to compute the transition probability matrix of the local model <math>L(t)</math>.
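
The local model bookkeeping can be sketched as follows. The all-ones initialization and the increment on each switch follow the description above. The row normalization used to obtain <math>L(t)</math> from <math>M(t)</math> is an assumption (the usual way to turn counts into DTMC transition probabilities), and the function names are hypothetical.

```python
def new_count_matrix(num_views):
    """Create the N x N count matrix M(t), initialized to all ones."""
    return [[1.0] * num_views for _ in range(num_views)]


def record_switch(counts, src, dst):
    """Increment M[src][dst] when the user switches from view src to view dst."""
    counts[src][dst] += 1.0


def local_transition_matrix(counts):
    """Derive L(t) from M(t).

    Assumption: each row of M(t) is divided by its row sum, so that
    every row of L(t) is a probability distribution over the next view.
    """
    return [[c / sum(row) for c in row] for row in counts]


if __name__ == "__main__":
    M = new_count_matrix(3)
    record_switch(M, 0, 1)  # user switches from V_0 to V_1
    for row in local_transition_matrix(M):
        print(row)
```

Because the counts start at one, every transition keeps a non-zero probability, which avoids ruling out views the user has not visited yet.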

Global Model: This model aggregates user activities across all streaming sessions that have been served by the server so far. At the beginning of the streaming session, the client downloads the global model parameters from the server. We use <math>G</math> to denote the transition matrix of the global model, where <math>G_{ij} = p(V_j | V_i)</math> is the probability of switching to <math>V_j</math> given <math>V_i</math>. If this is the first streaming session, <math>G_{ij}</math> is initialized to <math>1/N</math> for every <math>i</math> and <math>j</math>.

Combined Model: The local and global models complement each other in predicting the (complex) switching behavior of users while watching multiview videos. For example, in some streaming sessions, the user activity may significantly deviate from the global model expectations, because the user is exploring the video from different viewing angles than most previous users have. Or the multiview video may be new, and the global model has not captured the expected view switching pattern yet. On the other hand, the local model may not be very helpful when the user has not had enough view switches yet, e.g., at the beginning of a streaming session. We combine the local and global models to compute an importance factor <math>\beta_i</math> for each view <math>V_i</math> by linearly combining <math>G</math> and <math>L(t)</math> using weight factor <math>\alpha_i</math>.
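
One possible reading of this linear combination is sketched below. The exact formula is not given on this page, so the form <math>\beta_i = \alpha_i G_{ai} + (1 - \alpha_i) L_{ai}(t)</math>, where <math>V_a</math> is the currently active view, is an assumption made for illustration. How <math>\alpha_i</math> is chosen is also not specified here, and the function names are hypothetical.

```python
def uniform_global_model(num_views):
    """Global matrix G for the first session: G_ij = 1/N for all i, j."""
    return [[1.0 / num_views] * num_views for _ in range(num_views)]


def importance_factors(global_matrix, local_matrix, active_view, alphas):
    """Compute one importance factor per view.

    Assumed form (not confirmed by the source):
        beta_i = alpha_i * G[a][i] + (1 - alpha_i) * L[a][i]
    where a is the index of the currently active view and
    alphas[i] in [0, 1] weights the global model for view i.
    """
    g_row = global_matrix[active_view]
    l_row = local_matrix[active_view]
    return [a * g + (1.0 - a) * l for a, g, l in zip(alphas, g_row, l_row)]


if __name__ == "__main__":
    G = uniform_global_model(2)
    L = [[0.25, 0.75], [0.5, 0.5]]  # example local transition matrix
    print(importance_factors(G, L, active_view=0, alphas=[0.5, 0.5]))
```

Under this reading, a weight close to 1 makes the importance factor follow the global model (useful early in a session), while a weight close to 0 makes it follow the user's own switching history.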

MASH: The Proposed Algorithm

Evaluation

@@ Line 53: / Line 53: @@
 MASH combines the outputs of two stochastic models (local and global) to estimate the likelihood of different views being watched. We define each view switching model as a discrete-time Markov chain (DTMC) with <math>N</math> (number of views) states. View switching is allowed at discrete time steps of length <math>\Delta</math>. The time step <math>\Delta</math> is the physical constraint on how fast the user can interact with the video.
-'''Local Model:''' It captures the user activities during the current streaming session, and it evolves with time. That is, the model is dynamic and is updated with every view switching event that happens in the session. The model maintains a count matrix <math>M(t)</math> of size <math>N \times N</math> , where <math>M_{ij}(t)</math> is proportional to the number of times the user switched from view <math>V_i</math> to <math>V_j</math>, from the beginning of the session up to time <math>t</math>. The count matrix <math>M(t)</math> is initialized to all ones. Whenever a view switching occurs, the corresponding element in <math>M(t)</math> is incremented.
+'''Local Model:''' It captures the user activities during the current streaming session, and it evolves with time. That is, the model is dynamic and is updated with every view switching event that happens in the session. The model maintains a count matrix <math>M(t)</math> of size <math>N \times N</math> , where <math>M_{ij}(t)</math> is proportional to the number of times the user switched from view <math>V_i</math> to <math>V_j</math>, from the beginning of the session up to time <math>t</math>. The count matrix <math>M(t)</math> is initialized to all ones. Whenever a view switching occurs, the corresponding element in <math>M(t)</math> is incremented. The count matrix is used to compute the probability transition matrix of the local model <math>L(t)</math>.
 '''Global Model:''' This model aggregates user activities across all streaming sessions that have been served by the server so far. At the beginning of the streaming session, the client downloads the global model parameters from the server. We use <math>G</math> to denote the transition matrix of the global model, where <math>G_{ij} = p(V_j | V_i)</math> is the probability of switching to <math>V_j</math> given <math>V_i</math>. If this is the first streaming session, <math>G_{ij}</math> is initialized to <math>1/N</math> for every <math>i</math> and <math>j</math>.
-'''Combined Model:'''
+'''Combined Model:''' The local and global models complement each other in predicting the (complex) switching behavior of users while watching multiview videos. For example, in some streaming sessions, the user activity may significantly deviate from the global model expectations, because the user is exploring the video from different viewing angles than most previous users have. Or the multiview video may be new, and the global model has not captured the expected view switching pattern yet. On the other hand, the local model may not be very helpful when the user has not had enough view switches yet, e.g., at the beginning of a streaming session. We combine the local and global models to compute an importance factor <math>\beta_i</math> for each view <math>V_i</math> by linearly combining <math>G</math> and <math>L(t)</math> using weight factor <math>\alpha_i</math>.
 === MASH: The Proposed Algorithm ===