MASH: Adaptive Streaming of Multiview Videos over HTTP
People
- Khaled Diab
- Mohamed Hefeeda
Overview
Multiview videos offer an unprecedented experience by allowing users to explore scenes from different angles and perspectives. Thus, such videos have been gaining substantial interest from major content providers such as Google and Facebook. Adaptive streaming of multiview videos is, however, challenging because of the Internet dynamics and the diversity of users' interests and network conditions. To address this challenge, we propose a novel rate adaptation algorithm for multiview videos (called MASH). Streaming multiview videos is more user centric than streaming single-view videos, because it heavily depends on how users interact with the different views. To efficiently support this interactivity, MASH constructs probabilistic view switching models that capture the switching behavior of the user in the current session, as well as the aggregate switching behavior across all previous sessions of the same video. MASH then utilizes these models to dynamically assign relative importance to different views. Furthermore, MASH uses a new buffer-based approach to request video segments of various views at different qualities, such that the quality of the streamed videos is maximized while the network bandwidth is not wasted. We have implemented a multiview video player and integrated MASH into it. We compare MASH against the state-of-the-art algorithm used by YouTube for streaming multiview videos. Our experimental results show that MASH can produce much higher and smoother quality than the algorithm used by YouTube, while using the network bandwidth more efficiently. In addition, we conduct large-scale experiments with up to 100 concurrent multiview streaming sessions, and we show that MASH maintains fairness across competing sessions and does not overload the streaming server.
Details
Figure 1 shows a high-level overview of MASH, which runs at the client side. MASH combines the outputs of the global and local view switching models to produce a relative importance factor <math>\beta_i</math> for each view <math>V_i</math>. MASH also constructs a buffer-rate function <math>f_i</math> for each view <math>V_i</math>, which maps the current buffer occupancy to the segment quality to be requested. The buffer-rate functions are dynamically updated during the session, whenever a view switch happens. MASH strives to produce smooth, high-quality playback for all views, while not wasting bandwidth, by carefully prefetching views that will likely be watched.
Fig. 1: High-level overview of MASH.
View Switching Models
MASH combines the outputs of two stochastic models (local and global) to estimate the likelihood of different views being watched. We define each view switching model as a discrete-time Markov chain (DTMC) with <math>N</math> (number of views) states. View switching is allowed at discrete time steps of length <math>\Delta</math>. The time step <math>\Delta</math> is the physical constraint on how fast the user can interact with the video.
Local Model: This model captures the user's activities during the current streaming session, and it evolves with time. That is, the model is dynamic and is updated with every view switching event that happens in the session. The model maintains a count matrix <math>M(t)</math> of size <math>N \times N</math>, where <math>M_{ij}(t)</math> is proportional to the number of times the user switched from view <math>V_i</math> to <math>V_j</math>, from the beginning of the session up to time <math>t</math>. The count matrix <math>M(t)</math> is initialized to all ones. Whenever a view switch occurs, the corresponding element in <math>M(t)</math> is incremented.
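The bookkeeping of the local model can be illustrated with a minimal Python sketch. The class and method names (`LocalViewSwitchModel`, `record_switch`, `transition_probabilities`) are hypothetical and are not taken from the MASH player. The row normalization that turns counts into probabilities is an assumption about how the counts could be used; the text above specifies only the initialization and the increment.

```python
class LocalViewSwitchModel:
    """Illustrative sketch of the local count matrix M(t); names are hypothetical."""

    def __init__(self, num_views):
        self.num_views = num_views
        # M(t) is initialized to all ones, as described in the text.
        self.counts = [[1] * num_views for _ in range(num_views)]

    def record_switch(self, from_view, to_view):
        # Increment the element for a switch from V_i to V_j.
        self.counts[from_view][to_view] += 1

    def transition_probabilities(self):
        # Assumption: normalize each row so that it sums to one,
        # giving an estimate of p(V_j | V_i) from the counts.
        return [[c / sum(row) for c in row] for row in self.counts]


model = LocalViewSwitchModel(num_views=3)
model.record_switch(0, 1)  # the user switches from V_0 to V_1
probs = model.transition_probabilities()
```

Because every count starts at one, each row is a valid distribution even before any switch has been observed, so no transition is ever assigned zero probability.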
Global Model: This model aggregates users' activities across all streaming sessions that have been served by the server so far. At the beginning of the streaming session, the client downloads the global model parameters from the server. We use <math>G</math> to denote the transition matrix of the global model, where <math>G_{ij} = p(V_j | V_i)</math> is the probability of switching to <math>V_j</math> given <math>V_i</math>. If this is the first streaming session, <math>G_{ij}</math> is initialized to <math>1/N</math> for every <math>i</math> and <math>j</math>.
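As a small illustration of the first-session case, the following Python sketch builds the uniform matrix <math>G_{ij} = 1/N</math> and checks that a matrix is a valid DTMC transition matrix. The function names (`initial_global_model`, `is_row_stochastic`) are hypothetical, and the validity check is an assumption about what a client might do after downloading the parameters; it is not described in the text.

```python
def initial_global_model(num_views):
    """Uniform transition matrix used when no earlier session exists."""
    # Each row is built separately so that rows do not share storage.
    return [[1.0 / num_views] * num_views for _ in range(num_views)]


def is_row_stochastic(matrix, tol=1e-9):
    """Check that all entries are non-negative and every row sums to one."""
    return all(
        all(p >= 0 for p in row) and abs(sum(row) - 1.0) <= tol
        for row in matrix
    )


G = initial_global_model(4)   # every entry equals 1/4
valid = is_row_stochastic(G)  # each row of G sums to one
```

Each row of <math>G</math> is a probability distribution over the next view, so the rows must sum to one regardless of how many sessions have contributed to the model.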
Combined Model: