Spring 2015 (RA)

Courses: NA

January 29

3D Streaming System
- Three centralized algorithms for storage management.
- Some assumptions on video library and system resources.
- Tried different orderings and assumptions to avoid eviction and make the analysis easier.
- No combination of ordering or assumption help in avoid eviction
- The simple algorithm is optimal (and correct) but not realistic
- The approx. algorithm is polynomial but may be incorrect (with no eviction).
- The approx. algorithm is correct but exponential (with eviction).

January 12

3D Streaming System
- Model modification to reflect the demand, IO bandwidth, and network bandwidth. Also, the processing times and capacities are more accurate now.
- The replication factor depends on the version demands, watching probability and streaming servers.
- I considered a model where the streaming servers are learned during the optimization given the replication factor, the model is slow even using a small data set. I believe that learning the replication factor from the optimization would result in the same slow optimization, so I went back to:
  - static assignment of streaming servers, and
  - calculation of replication factor as described in the report
- To be discussed in the meeting (from my side):
  - The current model has one issue: when the replication factor > 1, at most one replica is processed not stored. For example, is replication factor is five, we can have 4 stored versions and 1 processed one. (The resources are not exceeded). We can solve this using non-linear constraint, but this would complicate things?
  - The multi-site data center is still in progress.
  - The distributed algorithm would work with the new model, but need to add the "streaming servers" semantic to it.

Fall 2014 (RA)

Courses:
- CMPT-726: Machine Learning
- CMPT-880: Cloud Computing

December 15

3D Streaming System
- I needed to take a rest after the semester had finished, I worked for four days and was almost in a hibernation mode in the rest of the days. The current progress report doesn't have much updates compared to the previous ones:
  - I updated the storage management model with streaming redundancy (Section I.V.B) using predefined streaming servers assignments.
  - I implemented the streaming redundancy (Section I.V.B.3) to let the solver chooses the disks, processors and streaming servers. This one was very slow as expected since we increased the number of variables and constraints (100 videos take about 5 hours to find the solution). So, I will drop this option and use the original one plus the redundancy.
  - For this report, I didn't update the distributed algorithm.
- My plan:
  - The current distributed algorithm is "good" in the sense that we can generalize it for the update algorithm (in the case of new videos or popularity peaks). It is not "the best" in the terms of the output cost, I believe we can find better assignments.
  - The "redundancy factor" will enable us to extend the model/algorithm from centralized datacenter to CDN+cloud. However, the resources provisioning is still an issue that is not handled by the current model.

I hope to find answers for these questions (especially the first one) by next Wed.

Courses
- Cloud computing: A
- Machine learning: A+

October 02

3D Streaming System
- This week I focused on many system design aspects in more details (not just some machines/disks over a network):
  - Frontend HTTP servers: scale-in or out based on the request load.
  - Manager: has a global view of the system, and runs the storage management algorithm. At one moment, when the load increases, we may have many managers in the system, one master (runs the storage management algorithm(s)), and slaves. Each manager is responsible of specific versions, we will distribute versions based on their popularity such that all managers have the same load. The state will be stored in the manager's local memory.
    - We have a distributed in-memory key-value store which stores the whole system state. This one is replicated to a disk periodically. Each manager updates this key-value store periodically, so we don't lost information. Specifically, manager's local memory acts as a L1 cache for this key-value store.
  - Storage System: as before, couple of SSDs and HDDs besides the original videos.
  - Streaming servers: machines are responsible of the actual streaming from the "Storage System" to the end user. The "streaming manager" selects a "streaming server" and negotiates with it to stream the requested segment. Once agreed, the "streaming server" starts the actual streaming. To reduce the overhead of streaming server<->manager communication, thy negotiate to stream X segments, not just one segment. The buffer management part is not clear yet. But, the "streaming server" itself has a memory replacement scheme (e.g. LRU) but evicts the segments to its local disk. Also, each segment fetched from the storage system is replicated to K nearest streaming servers local disks, they will help in recovery and peer-assisted streaming. For example, when a "streaming server" detects that it is loaded (bandwidth or memory), it redirects its streaming responsibility to a nearest streaming server, and updates the manager.
  - Haertbeat-based status update, we have tested this technique before, and was fine to the task.
- I defined recovery routines for each module:
  - Streaming server: if it fails, the manager detects this and redirects the streaming to an alive nearest streaming server. This nearest server has replicated segments, overhead is to (1) redirect streaming from the failing server to the alive one, and (2) copy segments from alive server's local disk to its memory.
  - Manager: Start a new manager (either a new machine, or a streaming server). If the failing one is a master, one of the managers takes its responsibilities.
  - Storage System: While replication is fine, but it is expensive. Original 3D videos are replicated, but we need a different (smarter) idea for 3D versions.

Storage Management
- I assume that we have millions of 3D versions, so we comprise accuracy with speed. I will go one step back to the original problem without prefix as a start.
- Initial idea is to use an approximate greedy (but parallel) algorithm.
  - For each tree level: 1- for each version we calculate the cost on each disk and machine. 2- for each disk/machine, select the set of videos to be processed/stored without exceeding resources.
  - Versions Number: v, Disks Number: d, Machines number: p. Complexity is O(v*d*p + v*(p+d)). If we parallelize steps (e.g. MapReduce) complexity can be O(v*d*p/m + v*(p+d)/r + grouping overhead), where m is mappers number and r is the reducers number.
- Second direction is to get rid of level-by-level traversal, but this may incur less accuracy.

Courses
- Machine Learning: submitted an assignment (coding + math questions) this Monday.
- Cloud Computing: regular material readings (this week: MapReduce, GFS paper, and two students paper + Spark streaming paper)

Summer 2014 (RA)

Courses: NA

August 16

Storage Management
- I sent a new report, discussing a new research direction to employ multi-tiered storage architectures in 3D streaming systems, and propose a resources management algorithm in such systems.
- The report is almost complete especially in the technical details of the proposed algorithm. This is a list of incomplete parts:
  - Add one paragraph in the introduction about the evolution of disks and multi-tiered storage systems.
  - Add one more related work from ACM MM'13.
  - Add background materials.
  - Small modification to the problem definition (straightforward).
  - Complete the formulation part of the second algorithm (straightforward).
- Also, I propose different simulations scenarios with better performance metrics (storage, user hold time, playback discontinuity, price, and power consumption).

3D Measurement Study:
- The libpcap wrapper and HTTP parser are already implemented (from last week).
- Start to implement the client/server model using HTTP with basic authentication. The server side is supposed to be multi-process, and both the server and the client using non-blocking socket routines.

Next week plan:
- Multi-tier Storage: Finalize important parts of the report, based on our meeting.
- Multi-tier Storage: Start the simulations, based on our next Skype meeting.
- Measurement Study: Run the client with dummynet. Start to design cases with dummynet, and run them.
- Measurement Study: Finish the client/server implementation, and finish the system design as well.

June 23

Storage Management
- I wrote the report describing current models.
- Implementation:
  - Workload generator is done using python. I had some challenges in running the generator. The users requests generator is very slow. I implemented it using CPU threads pool and GPU but it's still slow (1000 videos, 45000 versions, with 10 users/sec may take about 12 hrs!). The GPU implementation is written by CUDA/C and called by the python code. Currently, I am trying to approximate the generation of the users requests to make it faster, while preserving the average request rate.
  - I am starting implementing the models (prefix storage and storage management) this week and the next one. I will use CPLEX with python apis, and an open source C++ library for min-cost max-flow (by Yuri Boykov's group http://vision.csd.uwo.ca/code/). I spent some time reading about cplex apis.

May 30

Storage Management
- I have finished the prefix storage and storage management models in a distributed setup of disks and machines.
- Prefix Storage:
  - I aim not only to manage the storage medium (disks), but to reduce the storage requirements for each version (if possible). Hence, the system needs to store initial segments (prefix) of the version, while keeping the original video untouched for future processing when required. In this problem, I am using the results of user behavior studies by Yu et al. (EuroSys'06) and Dobrian et al. (SIGCOMM'11).
  - To address this problem, I developed two models. First one is an ILP optimization and the second one is a Bayesian Net. model. I still don't know which is the best (each one has pros/cons)
    - ILP: minimizes the storage requirement, while satisfying the expected viewing length of the version, and maximizing the joint probability of watching the segments to be stored. The first one is calculated using the quality and video length. While the joint probability is based on the segment index within the video (viewing probability decreases when the index increases). Generally, it is an NP-hard problem.
    - Bayesian Net.: a probabilistic model was the first thing to think about (before the ILP), since watching a segment is a random process by itself. The first models (false starts) were Markov chain and Hidden Markov Models. These models are naive to capture the dependencies between watching and storing the segment. On the other hand, Bayesian network looks promising, since it captures these dependencies and can be extended to be used in other problems (like caching, same watching model we build is used in different problems). Storing each segment is dependent on watching that segment and watching the previous one. Also, watching the segment is dependent on watching the previous segment. We want to infer the joint probability of storing consecutive segments, and select those segments with the highest probability. The inference task can be solved using belief propagation algorithms. I am still not sure if these algorithms can run in polynomial time, but my intuition is that belief propagation algorithms seem to be decentralized and can run fast.
- Storage Management:
  - Unlike our previous work, where we have two cases of storing or processing the version, here we have three cases: storing the whole version, processing the whole version, or partially storing and processing the version. Modelling such a problem as an optimization problem may lead to an increase in the number of variables, hence, increasing the running time. I was motivated by the network flow literature to model the problem. I had some false starts like min-cuts, supply-demand with circulations. Finally, I modeled the problem as a min-cost max-flow, where each edge has a cost (processing/storage costs) and a capacity (number of versions). Mainly, we need to send a flow of N versions from the source, and determine which edges encounter the minimum costs to reach the sink (which is connected to the disks and machines), while satisfying the storage and processing requirements and latency. min-cost max-flow has polynomial running time. This model looks promising. I had finished the design of the graph structure (which becomes harder in the case of prefix storage), and assigning costs and capacities.
  - Factors like: number of users, network latency between disks and machines, disks bandwidth and total waiting time are handled as well. While some factors like seek time and rotational delays (~20-30 msec) are not yet handled.
- Next steps:
  - Currently, I am writing the report that describes the above three models with more details.
  - I will start implementing the prefix storage models next week, and compare between them as well. Hopefully, I can start with the management algorithm itself as well.
  - I have some ideas about the evaluation, I will include them in the report as well.
- Extras:
  - I started an activity 4-6 weeks ago, where I am trying to make a comprehensive literature review on topics in streaming. My objectives, in addition to including those contributions in our future research, are to build a knowledge base for myself and criticizing each work and how to address its shortcomings. I am trying to select seminal works from big conferences and journals. The main topics are:
    - Perceptual Modeling and Issues in 3D Videos,
    - Streaming Systems,
    - Storage Management in Streaming Systems,
    - Rate Adaptation Algorithms,
    - Energy Saving in Streaming Systems,
    - Caching Algorithms in Streaming Systems,
    - Prefetching Algorithms in Streaming Systems,
    - Traffic Analysis of Streaming Systems,
    - User Behaviors in Streaming Systems,
    - Distributed and Cloud-based Streaming Systems,
    - Performance Metrics in Streaming Systems

FVV project:
- I was working on the prefetch stream prediction:
  - Ahmed suggested using velocity and direction of the view switching to predict the prefetch stream. However, Pterovic (thesis'13) had proposed the same technique.
  - I suggested using a simple probabilistic model like Markov chain, to capture the relationship between the views. Ahmed had some doubts about it, hence, I didn't go with this direction.
  - I suggested using a heuristic-based technique based on moving Gaussian while changing its skewness to capture the user behavior of choosing the direction. My intuition was to capture the probability of ending up with view vj when I am at view vi, and depending on this probability value, we either fetch an extra stream or not. Ahmed had some doubts about using Gaussian, but my idea was "Gaussian is the normal". We didn't go with this direction.
  - Finally, I suggested making a real study conducted by 10-20 subjects. Then, we collect the data and build the conditional probability distribution. This one looks the more promising and Ahmed is still thinking about it.

May 18

Storage management
- I am reviewing some studies about the user behavior, user interactivity and content popularity in streaming sessions.
- I believe that we may employ the results of some of these studies in our storage management, but not performing such studies.
- These studies will affect the prefix storage size, and may affect the selection of some costs in the graph.
- I am trying to study the distributed streaming systems as well (e.g. cloud-based, or a dedicated cluster). Storage and processing heterogeneity affects the choice of costs.
- Disk seek time and network latency between disks and processing machines should be included as well.
- We need to employ the version bitrate, we ignored the bitrate in the previous work.
- The algorithm will output one of these outputs
  - If there is feasible solution, a set of stored versions, and a set of to-be-processed versions
  - Otherwise, the algorithm proposes quantitative suggestions like adding more storage, machines or modifying the waiting time.
- I think modeling the problem as min-cost max-flow is sufficient, and our contribution is to aggregate all of the above issues as costs.
- The main challenges
  - Define clear objectives and the scenario of the study
  - Abstracting the heterogeneity of storage and processing architectures using throughput-oriented costs "How many segments/second this machine can generate? How many segments this disk can store?"
  - Selection of prefix storage size for each version.
  - Relating the maximum waiting time (the input parameter) with the system parameters like seek time, network delays, etc...

Measurement study
- I am revisiting some studies that performed measurement studies on real systems.
- I am still trying to know exactly what to measure/evaluate given the packets history of a session. How to detect different events like buffering, re-buffering, smooth viewing etc...
- Then how to infer results from the measurements.
- Conducting the study on different devices, browsers, and networks is important to fully understand what happens.
- Implementation: we can build a packet sniffer and start a streaming session. I am thinking of using C and libpcap (http://www.tcpdump.org/) for sniffing. I read that python-based sniffers perform bad (high CPU loads). I will start some implementation this week, and see how to detect events from packets.

Spring 2014 (RA)

Courses:
- CMPT-705: Design and Analysis of Algorithms
- CMPT-820: Multimedia Systems

Submissions:
- NA

Publications:
- [accepted] K. Diab, T. ElGamal, K. Calagary, and M. Hefeeda, "Storage optimization for 3d streaming systems," in Proc.of ACM Multimedia Systems Conference (MMSys’14), Singapore, March 2014.

Submissions Plans:
- ACM Multimedia in March 31, 2014 (http://www.acmmm.org/2014/)

Mar 23

FVV project:
- Implementation of a program to calculate PSNR and MSE values for reference views between the uncompressed videos and the 10 representations we have. I wrote a script to automate the calculations over our dataset.
- For the next 3-5 days, I will work on 'related work' section and start the evaluation plan.

Courses work:
- Algorithm: quiz, I spent some time reading from the textbook and external materials.
- Multimedia: paper presentation on perceptual issues in stereoscopic images, the presentation lasted 80 min divided on two classes; there was an interest from the audience to discuss different aspects of the paper. Also, I proposed an idea for the term project, where I am currently working on.

Feb 23

No major progress in the research of the FVV project. Mainly, this is because I was investigating different ideas to control the segment buffer.
Some extensions ideas had been discussed with Ahmed Hamza and I added them to my notes.
In the implementation part, I was reading the libdash code and the current implementation of Ahmed Hamza. I think that I had discovered the reason of the unexpected behavior in case of view switching:
- Segment buffers are controlled by one thread. Each frame buffer is controlled by a different thread. Also, there is a thread to handle the view switching. When the view switching happens, the view switching thread tries to modify the buffers while other threads accessing these buffers. There are locks, and here comes the problem of the deadlock.
- My solution is: when the view switching happens, the view switching thread sends a signal/message to other threads to modify their buffers. The view switching thread mustn't modify the buffers, hence, avoiding deadlocks.
- I discussed the solution with Ahmed Hamza and it looks fine, I will applying it to the current implementation and see what happens.
For the courses:
- Multimedia: I will deliver the H264 encoder/decoder this week. Also, I will present the first paper.
- Algorithms: I have a quiz in Feb. 25, 2014.

Feb 07

Currently, I am working with Ahmed Hamza for the FFV rate adaptation project
Working, with Ahmed, on the optimization problem formulation
- Specifically, I am investigating to employ the queuing theory in the formulation to increase decoders utilization and keep buffers full as we can. We model each decoder as a server, and each segment as a client. Some challenges are modeling the arrival rate distribution, and relating the queuing theory formulas to our formulas.
- Number of decoders may increase in order to increase the throughput of the whole decoding step.
Working on re-implementing this pipeline in the client-side: downloading -> buffering -> decoding. Some of our objectives:
- Reduce overheads of spawning new threads for decoders and other steps. Each group of threads should be in a thread pool.
- Increase the decoding throughput. Currently, the segments at time "t" are decoded in order, even if there are some downloaded segments of time "t+1". Mainly, we need to keep in-order decoding but enable decoding of segments at different timestamps. To do this, I will remove the decoding thread (which acts as a distributor but a barrier at the same time) and use a push model to push segments to decoders pool thread. On the output side, I will use a priority queue rather than the normal queue, to store the decoded frames in-order. Frames should be indexed properly.
- The current control flow is not easy to change or modify which may lead to more bugs. I am trying to simplify and organize the control flow. For example, what does the player do when the user changes the viewpoint?
There was a discussion about moving some of the processing to the server side.

Jan 24

I worked on the camera ready version of our MMSys'14 paper.
Currently, I am working on the literature survey for the rate adaptation project. This report should have been delivered tonight, but I couldn't complete all the sections that I want to write. I will deliver it and complete it by the next weekend.
For the rate adaptation project:
- There had been different approaches in the academia for 2D videos.
- Rate adaptation for 3D videos are rare. This means -somehow- that we are in the good direction.
- In the next four weeks, I should take some important decisions. Some questions that I need to answer:
  - Do we need to deal with 3D videos as two different views? and not only as a 2D image. This will let us exploit some facts about 3D videos.
  - There are many objectives that define a "good" rate adaptation algorithms. What objectives do we need to handle in our algorithm?
  - What is the problem definition?
  - How are we going to model the problem and derive the solution? I am thinking of two approaches right now. Optimization problem or Vector space modeling. In both cases, a "good" objective function or a "good" similarity metric should be designed.
  - Human Visual System for 3D perception should be included in the model as well
  - How to evaluate our model? (Simulation, Emulation, real system?)
  - How to integrate such a model with the current system? How to extend the current storage optimization algorithm to support the storage requirements of multiple-bitrate versions?
For the courses:
- Multimedia: I will deliver a hierarchical JPEG encoder/decoder implementation next week. I had implemented it using OpenCV. TODO components are: Huffman coding (can use external code), hierarchical flow, GUI and the document. Also, I will present two papers (during the course):
  - 3D-TV content creation: Automatic 2D-to-3D video conversion
  - Perceptual issues in stereoscopic signal processing
- Algorithms: I have a quiz in Feb. 4, 2014.

Private:progress-diab

Contents

Spring 2015 (RA)

January 29

January 12

Fall 2014 (RA)

December 15

October 02

Summer 2014 (RA)

August 16

June 23

May 30

May 18

Spring 2014 (RA)

Mar 23

Feb 23

Feb 07

Jan 24