Difference between revisions of "Private:progress-hamza"
From NMSL
+ | = Spring 2015 (RA) = | ||
+ | |||
+ | === Feb 9 === | ||
+ | * Ran additional experiments with a wider range of network bandwidth values for the two video sequences in the evaluation. | ||
+ | * Captured the network packets for a couple of streaming sessions to check the cause for the gap between the actual and estimated network bandwidth values. | ||
+ | ** So far, I do not see any indication that the TCP window is being reduced. I did however find that sometimes the server was closing one or two of the opened connections after some time. | ||
+ | ** After further investigation, I found that the default KeepAliveTimeout value in the Apache2 Web server was 5 seconds. The two closed connections were a result of this timeout being triggered at some point during playback. I changed the server's configuration to make the timeout 20 seconds. | ||
+ | * Implemented an independent segment encoding, decoding, and transcoding utility. This is needed to automate the process for obtaining the average quality per segment for each representation (and will be helpful in future work if we dynamically need to change a segment's bitrate online). | ||
+ | ** Also implemented a DASH segmenter based on GPAC's API (this will be useful if we need to generate DASH segments from a live stream). | ||
+ | * Working on editing the scripts to include the average qualities per segment for the reference views in the MPD file. | ||
+ | ** Also modifying the player's code to take this information into consideration in its rate adaptation logic. | ||
+ | |||
+ | |||
+ | = Spring 2014 (RA) = | ||
+ | * '''Courses:''' None | ||
+ | |||
+ | * '''Publications:''' | ||
+ | ** A DASH-based Free-Viewpoint Video Streaming System (NOSSDAV'14), Mar 2014. | ||
+ | |||
+ | |||
+ | === Apr 12 === | ||
+ | * Mainly working on refactoring the client implementation to get it working better with the new features. | ||
+ | * Cheng provided a GPU machine for me and I'm trying to get the results for the RD measurements faster using multiple machines (I also tried GPU instances on AWS but ran into some configuration problems, so I decided to look into that later). | ||
+ | |||
+ | |||
+ | === Apr 5 === | ||
+ | * The virtual view distortion calculation scripts are running for one sequence but may take some time to finish (they have been running for two days so far). Synthesizing each virtual view, writing it to disk, and calculating its distortion for each possible combination of bit rates is time consuming. | ||
+ | * Discussed with Khaled the refactoring of the client code. We are hoping to get a lock-free design to eliminate all random behaviours when switching views. | ||
+ | * Completed the writing of 5 sections. | ||
+ | * Working on the evaluation plan. | ||
+ | |||
+ | |||
+ | === Mar 28 === | ||
+ | * Realized that the virtual view distortion model does not really need to include the D-Q relationship since the videos are already pre-coded. Therefore, we will include the per-segment distortion in the MPD and I'm comparing the accuracy of the model against the cubic model proposed by Velisavljevic et al. and will use the one with higher accuracy. | ||
+ | * Still working on the pre-fetching code. | ||
+ | * The write-up so far is available on the SVN server. | ||
+ | |||
+ | |||
+ | === Mar 21 === | ||
+ | * Writing scripts to calculate the distortions of the reference view representations and the virtual view distortions (PSNR and SSIM) corresponding to different combinations of reference view representations. The goal is to obtain the values for the different coefficients in the virtual view distortion model. | ||
+ | * Working on the write-up for the paper based on several discussions and meetings with ChengHsin. | ||
+ | * Fixing some issues with the client and adding the distortion model based adaptation logic and stream pre-fetching. | ||
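The PSNR part of the distortion scripts mentioned in this entry can be illustrated with a minimal sketch. It assumes 8-bit samples passed as flat sequences; the function name `psnr` and the `peak` parameter are illustrative and are not taken from the actual scripts. SSIM is not covered here.

```python
import math


def psnr(reference, distorted, peak=255):
    """Return the PSNR in dB between two equally sized sample sequences.

    `reference` and `distorted` are flat sequences of pixel values
    (an assumption made for this sketch), and `peak` is the maximum
    possible sample value (255 for 8-bit video).
    """
    if len(reference) != len(distorted) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples.
    mse = sum((a - b) ** 2 for a, b in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: distortion-free
    return 10 * math.log10(peak ** 2 / mse)
```

For a synthesized virtual view, `reference` would be the view synthesized from uncompressed reference views (or a captured view at that position) and `distorted` the view synthesized from the decoded representations.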
+ | |||
+ | |||
+ | === Mar 7 === | ||
+ | * Working on the implementation of the adaptation based on the distortion model. | ||
+ | * Worked with Khaled on debugging and fixing the deadlock problem in the client. | ||
+ | * Discussed with Khaled today how to refactor the client and make it more generic. Meeting with him tomorrow to discuss the evaluation plan. | ||
+ | * Initiated the write-up for the paper. | ||
+ | * Preparing the NOSSDAV presentation. | ||
+ | |||
+ | |||
+ | === Feb 21 === | ||
+ | * Almost done with the problem formulation. | ||
+ | * Met with Khaled and, based on our discussion, the client will pre-fetch two side views and allocate the bandwidth to them differently based on the current viewpoint and the navigation speed and direction. Therefore, if the user is changing views in the left direction: | ||
+ | ** The side view on the right will be assigned the minimum rate and the remaining available bandwidth will be divided between the two reference views and the left side view. | ||
+ | ** The left side view will be allocated a percentage <math>p</math> of the remaining bandwidth (e.g., 30 %). | ||
+ | ** The bitrate of the reference views' components will be allocated based on the rate-distortion formulation. | ||
+ | * The main difficulty with the implementation is that unlike 2D where we know there is only one stream being decoded and played back, we now have multiple streams. In the case of 2D streaming, we fill the buffers up to a certain threshold to absorb sudden changes in the network bandwidth. In the multi-view case, however, it is still not clear to me how this can be done efficiently. | ||
+ | ** If we base our segment selection decisions on recent viewer behavior (for example, knowing the velocity and direction of view switching during the previous period), we won't be able to make the decision until the playback of the current segment is done, which is too late. Increasing the segment buffer level to a certain threshold before playback will also not work, since the decision made in this case will be based on information that is far behind. | ||
+ | * Will start working on the code next week. | ||
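The bandwidth split discussed with Khaled in this entry can be sketched as follows. This is only a sketch under stated assumptions: the names `split_bandwidth`, `min_rate_kbps`, and `p` are hypothetical, the total bandwidth is assumed to be at least the minimum rate, and the rate-distortion step that divides the reference share among the texture and depth components is not shown.

```python
def split_bandwidth(total_kbps, min_rate_kbps, p, direction):
    """Split the available bandwidth among reference and side views.

    `direction` is "left" or "right": the way the viewer is navigating.
    The side view opposite to the navigation direction gets the minimum
    rate, the side view in the navigation direction gets a fraction `p`
    of what remains, and the rest goes to the two reference views.
    """
    if direction not in ("left", "right"):
        raise ValueError("direction must be 'left' or 'right'")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if total_kbps < min_rate_kbps:
        raise ValueError("total bandwidth is below the minimum rate")

    remaining = total_kbps - min_rate_kbps
    ahead_share = p * remaining                 # side view the user moves toward
    reference_share = remaining - ahead_share   # shared by both reference views
    opposite = "right" if direction == "left" else "left"
    return {
        f"{direction}_side": ahead_share,
        f"{opposite}_side": min_rate_kbps,
        "reference_views": reference_share,
    }
```

With 10,000 kbps available, a 500 kbps minimum rate, and p = 0.3 while navigating left, the left side view receives roughly 2,850 kbps and the two reference views share roughly 6,650 kbps.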
+ | |||
+ | |||
+ | === Feb 7 === | ||
+ | * Working on the camera-ready version for the NOSSDAV paper. | ||
+ | * Working on the extension of the paper for ACM MM. | ||
+ | * Held a meeting with Khaled and we brainstormed about what needs to be done and the tasks that he will be working on. This will mainly include: | ||
+ | ** The possibility of employing queuing theory in the formulation to model the buffer dynamics. | ||
+ | ** Increasing the efficiency of the player by eliminating the bottleneck caused by thread synchronization. | ||
+ | |||
+ | |||
+ | === Jan 31 === | ||
+ | * '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-projects/3DVideo/DASH-FTV/documents/techReps/dash-ftv-rate-adaptation/doc/dash-ftv-rate-adaptation.pdf here] | ||
+ | * Completed the formulation of the rate adaptation optimization problem based on the distortion of the virtual view. | ||
+ | * Working on adding other objectives and constraints such as quality variation and buffer levels. | ||
+ | * Discussed the work with Khaled. He is taking a look at the paper, reports, and code. I will assign to him a task after he is done with his class quiz on Monday. | ||
+ | |||
+ | |||
+ | === Jan 24 === | ||
+ | * I have a few models for the virtual view quality. None of them completely addresses what we are looking for. | ||
+ | ** One model relates the average distortions of the reference views' components to the distortion of the virtual view. We used it in our previous work, but the criticism we received was that the linear model is not accurate. | ||
+ | ** Another model relates the quantization step value of a component (texture or depth) to the distortion induced by that component in the virtual view. However, it does not consider the case where the quantization step values for the left and right images are different, and the evaluation was based on MVC encoding. Since in DASH the components of each view will be encoded separately and the adaptation logic may choose different bitrates for the left and right streams, this model may only be useful in the case of equal <math>Q_{\text{step}}</math> values. The validity of the model in the case of simulcast encoding, however, needs to be checked. | ||
+ | ** The same authors also propose a relationship between the <math>Q_{\text{step}}</math> of the texture components and <math>Q_{\text{step}}</math> of the depth components in order to find the optimum allocation for bit rate between them. | ||
+ | ** The last two models seem to only consider the middle virtual view position. | ||
+ | * If we are to use a model that relates <math>Q_{\text{step}}</math> (or the quantization parameter QP) to the distortion of the virtual view, then we need to consider variable bit rate encoded representations, since the QP or <math>Q_{\text{step}}</math> will be fixed. | ||
+ | * I'm still working on the formulation of the problem and how to relate the virtual view distortion to the bitrate/quality of individual components of the reference views. | ||
+ | * I have also looked into several works incorporating the power consumption aspect into the rate adaptation decision. Some also adjust the scheduling of segment fetch times in order to allow the radio interface to go to sleep for longer times without having the segment buffer completely drained. Given the fact that we have multiple streams, this seems to be an interesting problem. One aspect that was not considered in the previous work is the decoding power consumption for the chosen representation. | ||
+ | |||
+ | |||
+ | === Jan 17 === | ||
+ | * Working on formulating the optimization problem for rate adaptation. | ||
+ | * Making the player more stable and eliminating thread synchronization issues. | ||
+ | |||
+ | |||
+ | |||
+ | = Fall 2013 (RA) = | ||
+ | * '''Courses:''' None | ||
+ | |||
+ | * '''Submissions:''' | ||
+ | ** A DASH-based Free-Viewpoint Video Streaming System (NOSSDAV'14) | ||
+ | |||
+ | |||
+ | |||
+ | = Spring 2013 (TA) = | ||
+ | * '''Courses:''' None | ||
+ | |||
+ | |||
+ | === March 4 === | ||
+ | * Developed a stereo player around the sample player that comes with libdash. We are now able to render two video streams side-by-side. However, currently the MPD files of the two streams must have the same structure. This does not seem to be a problem for now, as we can generate similar MPDs. | ||
+ | * Som has informed me that they have migrated the sampleplayer code to Qt. This should make modifications easier in the future and allow us to use more GUI components. | ||
+ | * Had a meeting with Som and we have two main issues we are working on right now: | ||
+ | ** Making sure that the frame pair being rendered from the two streams is synchronized. If one of the streams is faster than the other, it is possible that one side will be overwritten by several frames before a new frame is drawn to the other side. We are currently looking at intelligent ways to make sure this does not happen or fix it. | ||
+ | ** The view switching logic. This will require attaching and detaching several decoders on-the-fly to the renderer during run-time. We are also considering the implications of having multiple receivers/decoders running simultaneously to enable fast view switching. | ||
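One possible way to keep the rendered frame pair synchronized, as raised in the first issue above, is to pair frames by presentation timestamp and drop frames from whichever stream is behind. The sketch below illustrates that policy only; it is not what the player implements. The queue layout (tuples of timestamp and frame) and the name `next_synced_pair` are assumptions made for this example.

```python
from collections import deque


def next_synced_pair(left, right):
    """Pop and return (pts, left_frame, right_frame) for the earliest
    timestamp available in both queues, or None if no pair is ready.

    `left` and `right` are deques of (pts, frame) tuples in increasing
    pts order. Frames whose timestamp is older than the head of the
    other queue are dropped, so neither side is drawn without its
    matching frame.
    """
    while left and right:
        left_pts, right_pts = left[0][0], right[0][0]
        if left_pts == right_pts:
            pts, left_frame = left.popleft()
            _, right_frame = right.popleft()
            return pts, left_frame, right_frame
        # Drop the stale frame from the stream that is lagging.
        if left_pts < right_pts:
            left.popleft()
        else:
            right.popleft()
    return None
```

The renderer would call `next_synced_pair` once per refresh and redraw both halves of the window only when it returns a pair. Dropping frames is the simplest policy; blocking the faster decoder instead is another option.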
+ | |||
+ | |||
+ | === Feb 4 === | ||
+ | * Discussed with Som and surveyed potential software for the FTV project. We created a Wiki page [[Private:FTV|here]] with our findings and updates. | ||
+ | * We decided on several libraries and a 3D video player for the implementation and demonstration of our system. We are now looking into the code of libdash, DASHEncoder, and Bino to understand where we can add our changes and whether we can extract some functionalities out from their codebase. | ||
+ | * We also settled on the '''isoff-main''' MPD profile and on initially having the views coded separately. | ||
+ | * Managed to get the sample DASH player that comes with libdash working. However, the playback is slow and the quality is not very good. We are looking into the cause of this. In the meantime, we are making the necessary modifications to the code to render two video streams side-by-side. This turned out to be non-trivial since it involves using OpenGL in addition to SDL (which the sample player was using). However, this is progressing. | ||
+ | * Had a meeting with Khaled regarding the retargeting project and the evaluation of the work. Agreed on comparing the work first against simple scaling and making sure that the outcome is better in terms of preserving the shapes of important objects. Also, the comparison will be done against seam carving to illustrate the significant speed-up. | ||
+ | |||
+ | |||
+ | |||
+ | = Fall 2012 (RA) = | ||
+ | * '''Courses:''' None | ||
+ | |||
+ | |||
+ | * Worked on the survey paper and finishing the revision based on the comments. Will send it soon. | ||
+ | * Attended WaveFront's [http://www.wavefrontac.com/wavefront-events/academic-industry-relations-waveguide-seminar/ Academic-Industry Relations WaveGuide Seminar]. It is apparent that the companies that presented (e.g. Sierra Wireless, Nokia, Ericsson) have an interest in machine-to-machine (M2M) communications and the Internet of Things (IoT). Research done in some labs at UBC is in that direction. | ||
+ | * Discussed with Khaled the progress of the retargeting project and the next steps. Apparently he had a bug in the code that was causing a long memory allocation time due to the removal of an initialization line that was present in my code. Khaled has prepared the communication framework for the distribution of the retargeting process and is now integrating my GPU retargeting code in that framework. | ||
+ | * In my meeting with him, we concluded that we should mainly focus on optimizing the performance of retargeting a single frame, and determine whether the current implementation can be improved further and whether any possible bottleneck can be avoided. I suggested that Khaled check out the NVIDIA Visual Profiler and collaborate with me to pinpoint the bottleneck on the GPU. | ||
+ | * Based on the work presented in the paper '''Coarse-to-fine temporal optimization for video retargeting based on seam carving''', it might be possible to avoid constructing an energy map in each frame and instead do this for key frames. This is possible by exploiting the motion vectors that are available in the encoded bitstream. The idea is that pixels to be removed from a certain frame should also be removed from following frames to maintain consistency. Therefore, we can locate the region where the seam found in one frame is located in a following frame using the motion vectors. | ||
+ | * Another possible enhancement when using the gradient magnitude to construct the energy map is to avoid repeating the convolution operator that gives us the gradient magnitude on the entire frame after seam removal, and to restrict it to the region surrounding the seam that was removed in the previous iteration. This is based on the fact that pixels surrounding the seam will be neighbors after seam removal, so only this region within the frame has changed and there is no need to calculate the gradient magnitude for the rest of the frame. However, whether this will be practical (and efficient) on the GPU will need to be determined. | ||
+ | * After upgrading the NSL machine that has the GPU installed, I'm having authentication problems and I'm not able to use the machine. The upgrade was needed to compile Khaled's code, which apparently uses the new C++11 standard that was not supported by the compiler version on the machine. I'm working with Ben and Jason to fix this as soon as possible (we may reinstall the operating system) so that I can continue working on the code with Khaled. | ||
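The idea of restricting the gradient computation to the neighborhood of the removed seam can be sketched on the CPU as follows. This is a minimal sketch, not the GPU code: it assumes a simple energy of |dx| + |dy| with clamped central differences (standing in for the convolution-based gradient magnitude), a grayscale image stored as a list of rows, and a connected vertical seam that moves at most one column between rows. All function names are illustrative. Under these assumptions, only the two columns adjacent to the removed pixel in each row need to be recomputed.

```python
def pixel_energy(img, r, c):
    """Gradient-magnitude proxy: |dx| + |dy| with clamped central differences."""
    h, w = len(img), len(img[0])
    dx = img[r][min(c + 1, w - 1)] - img[r][max(c - 1, 0)]
    dy = img[min(r + 1, h - 1)][c] - img[max(r - 1, 0)][c]
    return abs(dx) + abs(dy)


def full_energy(img):
    """Energy map computed over the entire frame (the baseline approach)."""
    return [[pixel_energy(img, r, c) for c in range(len(img[0]))]
            for r in range(len(img))]


def remove_seam_and_update(img, energy, seam):
    """Remove a vertical seam and update the energy map locally.

    `seam[r]` is the column removed from row r. Energy values away from
    the seam are reused (shifted left where needed); only the columns
    seam[r] - 1 and seam[r] of the new image are recomputed in each row.
    """
    new_img = [row[:c] + row[c + 1:] for row, c in zip(img, seam)]
    new_energy = [row[:c] + row[c + 1:] for row, c in zip(energy, seam)]
    new_width = len(new_img[0])
    for r, s in enumerate(seam):
        for c in (s - 1, s):
            if 0 <= c < new_width:
                new_energy[r][c] = pixel_energy(new_img, r, c)
    return new_img, new_energy
```

A wider gradient kernel (for example a 3x3 Sobel operator) would enlarge the band that has to be recomputed, so the band width here is tied to the assumed energy function.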
+ | |||
+ | |||
+ | |||
+ | = Spring 2012 (RA) = | ||
+ | * '''Courses:''' None | ||
+ | |||
+ | |||
+ | === March 17 === | ||
+ | * '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/hamza/reports/March2012/doc/reportTemplate.pdf here]. | ||
+ | * Implemented the seam carving algorithm using the dynamic programming approach and OpenCV. | ||
+ | * Searched for a suitable graph cut library to utilize for performing the energy minimization in seam carving for videos as per Avidan and Shamir's second paper. The Centre for Biomedical Image Analysis (CBIA) provides a recent library that seems to be flexible enough. I contacted them to gain access to it and am working my way through the API. | ||
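The dynamic programming step of seam carving mentioned in this entry can be sketched in plain Python. This is a minimal sketch and not the OpenCV-based implementation: it assumes the energy map is already available as a list of rows, and the function names are illustrative.

```python
def find_vertical_seam(energy):
    """Return, for each row, the column of the minimum-energy vertical seam.

    `energy` is a list of equally long rows of non-negative numbers.
    A seam moves at most one column left or right between rows.
    """
    rows, cols = len(energy), len(energy[0])

    # Forward pass: cost[r][c] is the cheapest seam ending at (r, c).
    cost = [list(energy[0])]
    for r in range(1, rows):
        prev = cost[-1]
        row = []
        for c in range(cols):
            lo, hi = max(c - 1, 0), min(c + 1, cols - 1)
            row.append(energy[r][c] + min(prev[lo:hi + 1]))
        cost.append(row)

    # Backward pass: start from the cheapest bottom pixel and walk up.
    seam = [0] * rows
    c = min(range(cols), key=cost[-1].__getitem__)
    seam[-1] = c
    for r in range(rows - 2, -1, -1):
        lo, hi = max(c - 1, 0), min(c + 1, cols - 1)
        c = min(range(lo, hi + 1), key=cost[r].__getitem__)
        seam[r] = c
    return seam


def remove_vertical_seam(image, seam):
    """Delete one pixel per row along the seam, reducing the width by one."""
    return [row[:c] + row[c + 1:] for row, c in zip(image, seam)]
```

Calling `find_vertical_seam` and `remove_vertical_seam` repeatedly, recomputing the energy map in between, narrows the image one column at a time.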
+ | |||
+ | |||
+ | === March 6 === | ||
+ | * Surveyed recent research on video re-targeting. Report can be found [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/hamza/reports/March2012/doc/reportTemplate.pdf here]. (last updated: Mar 6) | ||
+ | |||
+ | |||
+ | === February 21 === | ||
+ | * Attending MMSys'12 | ||
+ | * Preparing for depth exam | ||
+ | |||
+ | |||
+ | === February 7 === | ||
+ | * Finalizing depth exam report and preparing slides | ||
+ | * Preparing for MMSys'12 presentation and Nokia meeting | ||
+ | |||
+ | |||
+ | === January 24 === | ||
+ | * Completed the implementation and added a section on the allocation algorithm to the TOMCCAP paper. Almost done with the review and will send the draft in a few hours. | ||
+ | |||
+ | |||
+ | === January 10 === | ||
+ | * I implemented the scheduling algorithm and had to change many parts of the simulator. It is running now but I'm not getting the expected results. I tried comparing my logic with both Cheng's and Som's code. Everything seems to be in order and correct. I'm stuck at this point and still trying to figure out why the simulator is not finding any feasible solution, even when reducing the number of channels (streams) and after implementing logic to reduce the number of layers under consideration from the texture component. | ||
+ | |||
+ | |||
+ | |||
+ | = Fall 2011 (RA) = | ||
+ | * '''Courses:''' | ||
+ | ** CMPT-726: Machine Learning (Audit) | ||
+ | * '''Publications:''' | ||
+ | ** Multicasting of Multiview 3D Videos over Wireless Networks (MoVid'12) | ||
+ | |||
+ | |||
+ | |||
+ | === December 27 === | ||
+ | * Went over previous burst scheduling work by Cheng, Som, and Farid. | ||
+ | * In our MoVid paper, we have already decided on which layers to transmit from each stream based on the bit rates of the substreams and the channel capacity. It is then necessary to organize the frames of each stream into bursts within the scheduling window in order to minimize the energy consumption at receivers while satisfying the buffer constraints (no overflow or underflow). | ||
+ | * The difficulty of allocating MVD2 3D video bursts is that each video stream is actually four components which we group into two streams (one stream for texture components and one stream for the depth components) in our S3VM algorithm. In the double buffering technique used in Cheng's work, each channel had only a single stream and each receiver had one receiving buffer. In order to decode a frame in the 3D video stream, corresponding video data from all four streams should be available at the decoding deadline of the frame. The question now is, can we treat the texture and depth components' streams as a single stream during the burst scheduling process? And should we consider a single buffer for both components at the receiver side or do we need two separate receiving buffers (one for each component)? It should be noted that the depth components' stream will have a lower bit rate than texture. | ||
+ | * I am considering summing up the rates of the texture and depth components and performing the schedule allocation as if they were a single stream. This way, we will still maintain a single receiver buffer. It is assumed that some signaling is available in the received data which enables the receiver to distinguish the packets or NAL units of the four streams (left texture, left depth, right texture, right depth). After filling the receiver buffer during a window, the receiver will swap the buffers and drain the receiver buffer in the next window while distributing the NALUs to four different NALU buffers. | ||
+ | * I am also considering, as future work, performing a finer-grained allocation of the OFDMA resources. This, however, may require reformulating the problem from the beginning. In our MoVid work, we considered that the entire OFDM downlink subframe will be allocated to a single video channel and that the modulation and coding used for all resources within the stream are fixed. However, on a finer scale, each resource within the downlink subframe may be allocated to a different channel and use a different modulation and coding scheme. I came across three recent papers discussing this and performing adaptive resource allocation for single layer and scalable video multicast. | ||
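The single-buffer idea considered in this entry can be sketched as follows. This is a rough illustration under stated assumptions, not part of the simulator: each received NAL unit is assumed to carry a tag naming its component stream (the signaling mentioned above), and the names `STREAMS`, `combined_rate`, and `drain_to_nalu_buffers` are hypothetical.

```python
STREAMS = ("left_texture", "left_depth", "right_texture", "right_depth")


def combined_rate(rates_kbps):
    """Rate of the single logical stream used during burst scheduling:
    the sum of the rates of the four component streams."""
    return sum(rates_kbps[name] for name in STREAMS)


def drain_to_nalu_buffers(receiver_buffer):
    """Distribute tagged NAL units from the drained receiver buffer into
    one buffer per component stream, preserving arrival order.

    `receiver_buffer` is an iterable of (stream_name, nalu) tuples.
    """
    nalu_buffers = {name: [] for name in STREAMS}
    for stream_name, nalu in receiver_buffer:
        nalu_buffers[stream_name].append(nalu)
    return nalu_buffers
```

The scheduler would then size the bursts from `combined_rate`, while the decoder reads from the four per-component buffers after each swap.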
+ | |||
+ | |||
+ | === December 13 === | ||
+ | * '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/hamza/reports/December2011/doc/reportTemplate.pdf here] (last updated: Dec 13) | ||
+ | * Added another paper related to VoD delivery with the assistance of cloud CDNs to the report. Also added a discussion on primitives related to view manipulation and synthesis in 3D videos. | ||
+ | * Created a list of open source computer vision and image and video processing libraries. List is accessible [[Private:OSS_Multimedia_Libraries|here]]. Most image processing tools required are already in OpenCV. However, GPU-based implementations that achieve higher speed-up are also available. The latter will be useful in real-time systems and can be utilized on Amazon EC2 GPU nodes for example. | ||
+ | * Have been trying to come up with a novel idea to extend our MoVid work. However, I have not found a good direction yet. One side issue that can be checked is to address one of the comments regarding using PSNR as a metric (although we also mentioned that SSIM, which is a perceptual metric, gives similar results). To provide virtual view adaptation we need a good quality metric for the virtual views. A [http://hal.archives-ouvertes.fr/docs/00/62/80/66/PDF/bosc.pdf recent ICIP paper] indicates that new methods are required for assessing synthesized virtual views because pixel-based and perceptual-based metrics fail. The paper also mentions that depth should be taken into account. We can also evaluate the Peak Signal-to-Perceptual Noise metric used by MPEG in their evaluations (currently looking into this). | ||
+ | * I was also thinking whether we should check if other virtual view distortion models yield better results. However, this may require a totally different problem formulation. Another issue is how to schedule the bursts to satisfy buffer constraints, but this will be very similar to previous works by Cheng, Farid, and Som. So, it will not be very new. | ||
+ | * Managed to encode a video sequence with good quality when rendered on the new display. | ||
+ | |||
+ | |||
+ | === November 29 === | ||
+ | * Working on MoVid paper and depth exam report. | ||
+ | * Looking into container formats for media content as well as libraries that enable creating them and extracting content from them. This is required for selectively extracting content from stored files when performing adaptation. Two EU projects that have been working on a similar system (no explicit mentioning of the cloud though) are the [http://www.coast-fp7.eu/ COAST] and [http://www.ist-sea.eu/ SEA] (SEAmless Content Delivery) projects. A couple of months ago, COAST presented the world’s first working prototype of a [http://www.coast-fp7.eu/press_release_1.html 3D adaptive video receiver] based on the novel Dynamic and Adaptive HTTP Streaming (DASH) standard. To avoid starting from scratch, I'm going over their deliverables and reports to determine what can be leveraged from their work. | ||
+ | * Gathering information on available tools to perform the different multimedia processing primitives. Will create a Wiki page and add my findings so far. | ||
+ | * Managed to generate both still image and video content that the new display accepts. Video quality was not so good. But I believe it is just a matter of the encoding and codec parameters. Will try a different codec as well as attempt to fine tune the encoding parameters. Next step will be providing the display with all the views instead of having its rendering component generate them. | ||
+ | * Following MPEG's meeting in Geneva this week. They are working on developing an efficient encoder for 3DV content. Many companies and universities have responded to their call for proposals. They are working in two directions in parallel: proposals based on AVC and proposals based on the new HEVC (high-efficiency video coding) standard. It seems that for AVC-based coders they have settled on the software presented by Nokia because it showed good performance (lower bitrate, better quality, reduced time complexity). For HEVC, HHI's proposal was chosen and tools from other proposals will be integrated into it. Code for HHI's implementation is available, but I'm not sure about Nokia's code (haven't found it yet). Nokia's implementation is, however, based on JM and VSRS. | ||
+ | |||
+ | |||
+ | === November 15 === | ||
+ | * '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/hamza/reports/November2011/doc/reportTemplate.pdf here] (last updated: Nov 15) | ||
+ | * Working on depth exam report. Current draft version can be found [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/hamza/reports/DepthExam/doc/reportTemplate.pdf here]. | ||
+ | * Prepared presentation for Nokia UR forum this Thursday [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-projects/3DVideo/3DVScheduling/documents/talks/NokiaUR_Nov17.pptx pptx]. | ||
+ | * Set up the new display and machine and tested them. Both are working properly. Experimenting with providing our own material using both video+depth or multiple views. For some reason they were hesitating on providing me with some necessary tools. Finally managed to get them today. | ||
+ | * Testing the DepthGate demo software to see if it will be beneficial to obtain a full-version license. | ||
+ | * Surveyed some necessary background and added some preliminary thoughts to report regarding server-side adaptation. | ||
+ | |||
+ | |||
+ | === October 19 === | ||
+ | * '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/hamza/reports/October2011/doc/reportTemplate.pdf here] (last updated: Oct 22) | ||
+ | * Dissected and understood the view synthesis reference software source code and implementation (which will be useful if we are going to implement this process on an adaptation server or cloud service). | ||
+ | * Trying to formulate a burst scheduling problem for transmitting 3D videos over wireless broadband access networks. | ||
+ | * Read several papers on adaptive display power management (the concept is summarized and described in report). It seems like an interesting idea. It may be possible to implement for 3D displays as well. However, some power measurement studies may be required on actual mobile 3D displays. | ||
+ | * Also doing some readings on [http://multimediacommunication.blogspot.com/2011/02/dynamic-adaptive-streaming-over-http.html Dynamic Adaptive Streaming over HTTP (DASH)] to see if it will be suitable for wireless networks and 3D video. | ||
+ | * Working on my depth exam report. | ||
+ | * Attended: | ||
+ | ** ICNP 2011 | ||
+ | ** Dr. El Saddik's talk: ''Research in Collaborative Haptic-Audio-Visual Environments and Serious Games for Health Application'' | ||
+ | ** Haiyang Wang's depth exam: ''Migration of Internet Services to Cloud Platforms: Issues, Challenges, and Approaches'' | ||
+ | |||
+ | |||
+ | |||
+ | === September 6 === | ||
+ | * '''Report:''' Sent by e-mail (SVN server was not functioning properly when I attempted to commit) | ||
+ | * Completed implementation and ran initial experiments to verify that implementation is working properly. Results are ok. | ||
+ | * Completed the review of the report. Addressed all comments, added more explanations/details, and restructured. | ||
+ | * I'm limited by the number of available video sequences so I'm trying to address this by starting from different frames and concatenating some sequences together. The encoding using JSVM is time consuming, which delayed the final results. I'm currently performing encoding on a couple of machines in the lab. | ||
+ | * The experiments will require varying the capacity (by varying the scheduling window size) and varying the number of videos to be transmitted. We can compare the objective function value obtained by our approximate solution to that of the optimal solution. We can also measure how much faster our algorithm runs compared to an exhaustive search that attempts to find the optimal value. | ||
+ | |||
+ | |||
+ | |||
+ | = Summer 2011 (RA) = | ||
+ | * '''Courses:''' None | ||
+ | |||
+ | |||
+ | |||
+ | === August 22 === | ||
+ | * Completed setting up an emulation environment within the lab using [http://wanem.sourceforge.net/ WANem] and [http://www.linuxfoundation.org/collaborate/workgroups/networking/netem netem]. | ||
+ | * Reviewed all the papers/work on asymmetric stereo encoding and transmission. | ||
+ | |||
+ | |||
+ | |||
+ | === August 8 === | ||
+ | * The first idea mentioned in the comments, concerning unicast and choosing the best representation to transmit based on the user's viewpoint, was my initial direction. However, as I mentioned in my May 31 progress summary below, this problem has a fixed number of classes (in MCKP terminology): only 4 streams are to be transmitted (2 texture streams (L and R) and 2 depth maps (L and R)). This means that, unlike Som's work for example, where it was assumed that we have a large number of streams (e.g. between 10 and 50 in his experiments), selecting the best substreams in this case is indeed achievable in real-time. Assuming each of the 4 streams is encoded into 4 layers, we have <math>4^4 = 256</math> combinations in the search space, which can easily be enumerated. | ||
+ | * Implemented a framework for solving knapsack problems including the 0-1 MCKP which we are using in the formulation. Also, added code to dynamically formulate the optimization based on a given number of input 3D videos and optimally solve this mixed integer programming (MIP) problem using the GNU Linear Programming Kit (GLPK) API. | ||
+ | * Also implemented the main classes for simulating the scheduling process at the base station for the multicast/broadcast scenario. For experimenting with a large number of video streams, one issue is that the number of available 3D video sequences that we have is 20. To increase the size of the input, I was thinking of segmenting the videos (e.g. every 1 second) in the hope of getting different average bit rates even for segments from the same sequence and use those segments as if they were different sequences. | ||
+ | * Another issue is having long video sequences. All of the sequences are quite short (a few seconds). This is not a problem per se, as we can just repeat the sequence. However, JSVM is complex and extremely slow, so I have to reduce the dimensions of the sequences, for example to CIF resolution. This in turn will require re-estimating the depth maps, since simply scaling the depth maps down will not give correct view synthesis results. | ||
+ | * Looked into creating a testbed. I read Som's report and discussed with Saleh since I understand he was also looking for a WiMAX testbed. I have created an account at [http://www.orbit-lab.org/ ORBIT lab] to check out their WiMAX testbed. And I'm looking into how flexible their platform is. However, I feel that not having the equipment under our control makes it more difficult to run a demo since we need our own receiving device which is capable of decoding and rendering the video content. | ||
+ | * For 3D mobile devices, the current status is that there are no devices with a display that renders more than two views. Large auto-stereoscopic displays which are capable of rendering 26 views are however available (and yes we can buy them, actually the MPEG group are using them in their experiments). Another issue with setting up such a testbed is that the receiving device should be capable of decoding SVC streams. The [http://sourceforge.net/projects/opensvcdecoder/ OpenSVCDecoder] is indeed available, but can it be ported to Android without problems? And how far can it go? Two things that need to be tested. | ||
+ | ** [http://masterimage3d.com/ Masterimage] has revealed a [http://www.youtube.com/watch?v=9nxzu04_fws glasses-free 3D tablet reference design] that uses Masterimage's cell-matrix parallax barrier technology. Masterimage's Cell tablet is based on Texas Instruments' OMAP 4430 chip. Software outfit Cyberlink announced its partnership with Masterimage to create 3D video playback software that makes use of Masterimage's 3D display. | ||
+ | ** LG has already shown its first 3D tablet. The [http://www.youtube.com/watch?v=nWM29aXhS6w Optimus Pad 3D] has an 8.9 inch 15:9 display, with 1280 x 768 resolution. Unlike the Optimus smartphone, it's not autostereoscopic - it requires you to wear 3D spex. Still, the advantages of its 3D functionality are clear. The Optimus pad, which runs the Android 3.0 Honeycomb OS, has dual 5MP cameras for 3D photography and camcorder stereoscopy. 3D footage can be viewed on the tablet or by connecting to a 3D TV. | ||
+ | ** HTC also has the [http://www.youtube.com/watch?v=u0EDhhY_gKA EVO 3D] smartphone. | ||
+ | ** It is rumored that the next generation iPad (iPad 3) will have a glasses-free display. | ||
+ | ** At Computex, Asus unveiled the [http://www.youtube.com/watch?v=0p2AS5kVPkc Eee Pad MeMo 3D]. This Honeycomb alternative has a 7-inch parallax barrier no-glasses 3D display with a resolution of 1280 x 800. | ||
+ | * Attended a [http://gruvi.cs.sfu.ca/ GrUVi] talk on "Predicting stereoscopic viewing comfort". | ||
+ | * I'm reviewing the report again to address the comments that were listed and will post the updated version when done. | ||
+ | |||
+ | |||
+ | |||
+ | === July 12 === | ||
+ | * '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/hamza/reports/June2011/doc/reportTemplate.pdf here] (last updated: July 13) | ||
+ | |||
+ | |||
+ | === June 28 === | ||
+ | * '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/hamza/reports/June2011/doc/reportTemplate.pdf here] (last updated: June 28) | ||
+ | |||
+ | |||
+ | === June 14 === | ||
+ | * '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/hamza/reports/June2011/doc/reportTemplate.pdf here] (last updated: June 13) | ||
+ | |||
+ | |||
+ | === May 31 === | ||
+ | * '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/hamza/reports/May2011/doc/reportTemplate.pdf here] (last updated: May 31) | ||
+ | * While working on formulating the optimization that we discussed in the last meeting, it came to my attention that although the problem I'm trying to solve is an instance of the multiple-choice knapsack problem (MCKP), the number of classes is fixed and very small (only 4 streams). This means that, unlike Som's work where it was assumed that we have a large number of streams (e.g. between 10 and 50 in his experiments), the problem of selecting the best substreams in this case is indeed achievable in real-time. Assuming each stream is encoded into 4 layers, we have 4^4 = 256 combinations which can easily be enumerated. | ||
+ | * While trying to think about an actual problem, I'm considering extending Som's work and utilizing client-driven multicast where the client subscribes to the channels of the desired views. A number of 3D video streams are to be transmitted. Each stream has N views. The views are encoded using SVC into a number of layers. The ''i''th view of the streams is multiplexed over a single broadcast/multicast channel. The receiver tunes in to (joins) channels i and i+2 to receive two reference views, which can be utilized to synthesize any view in between. | ||
+ | * The problem now however seems to be more complicated as this will be a strange variant of the knapsack problem where we have two knapsacks but items that go into each knapsack come from different classes while there is only one joint objective function. I'm trying to figure out whether this resembles any known variant of the knapsack problem, but until now I was not successful. | ||
+ | * After thinking more about it, I decided that if there are only two views and the client is going to be receiving from only two broadcast/multicast channels, then it doesn't matter to which channel a substream is allocated and we can relax any assignment restrictions. This leads to a multiple choice multiple knapsack problem (MCMKP) which is still NP-hard. | ||
+ | * One way in which the MCMKP can be tackled is by partitioning it (sub-optimally) into two sub-problems: a multiple choice knapsack problem (MCKP) and a multiple knapsack problem (MKP). A similar approach was taken [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4150841 here] for the problem of providing QoS support for prioritized, bandwidth-adaptive, and fair media streaming using a multimedia server cluster. '''Note:''' I should mention that I wasn't able to find many resources about the MCMKP problem and it seems to me it is somehow similar to the multi-dimensional multiple-choice multi-knapsack problem (MMMKP) which also is scarce when it comes to material. | ||
+ | * After selecting the optimal substreams by solving the MCKP over the aggregate capacity of the two channels, we need to perform an assignment for each selected substream to one of the two channels in a way that will minimize any bandwidth fragmentation. | ||
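The exhaustive search over the 4^4 = 256 layer combinations described above can be sketched as follows. This is only a minimal illustration in Python, not the actual implementation: the stream names, the rate and distortion values, and the assumption that the total distortion is simply the sum of the per-stream distortions are all hypothetical placeholders.

```python
from itertools import product

# Hypothetical (rate in kbps, distortion) pairs, one list per stream (class),
# one entry per layer. The numbers are illustrative only.
STREAMS = {
    "texture_L": [(200, 40.0), (400, 25.0), (800, 15.0), (1600, 10.0)],
    "texture_R": [(200, 40.0), (400, 25.0), (800, 15.0), (1600, 10.0)],
    "depth_L": [(50, 12.0), (100, 8.0), (200, 5.0), (400, 3.0)],
    "depth_R": [(50, 12.0), (100, 8.0), (200, 5.0), (400, 3.0)],
}


def count_combinations(streams):
    """Size of the search space: product of the number of layers per stream."""
    total = 1
    for layers in streams.values():
        total *= len(layers)
    return total


def best_combination(streams, capacity):
    """Pick exactly one layer per stream (MCKP) by brute-force enumeration.

    Returns (distortion, rate, {stream: layer index}) for the feasible
    combination with the lowest summed distortion, or None if even the
    lowest layers do not fit into the capacity.
    """
    names = list(streams)
    best = None
    for choice in product(*(range(len(streams[n])) for n in names)):
        rate = sum(streams[n][k][0] for n, k in zip(names, choice))
        if rate > capacity:
            continue  # violates the channel capacity constraint
        dist = sum(streams[n][k][1] for n, k in zip(names, choice))
        if best is None or dist < best[0]:
            best = (dist, rate, dict(zip(names, choice)))
    return best
```

With only four classes the loop body runs 256 times, which is why enumeration is feasible here, whereas it would not be for the tens of streams considered in the multicast setting.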
+ | |||
+ | |||
+ | === May 19 === | ||
+ | * '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/hamza/reports/May2011/doc/reportTemplate.pdf here] | ||
+ | * Found a more recent and simple [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5728855 model] for the distortion of synthesized views based on the distortions of reference views and their depth maps. | ||
+ | * Formulating an optimization problem to select the best combination of substreams to transmit out of the two reference views and their corresponding depth maps in order to minimize the average distortion over all intermediate synthesized views while not exceeding the current channel capacity. | ||
+ | * There has been relevant work in the context of joint bit allocation between texture videos and depth maps in 3D video coding. In addition, another model has been utilized in a recent work on RD-optimized interactive streaming of multiview video. However, in that work, the authors assume the presence of multiple encodings of the views at the server-side. Our work will attempt to utilize SVC to encode the views at various qualities/bitrates and extract the best substreams that maximize the quality and satisfy the capacity constraint. | ||
+ | * Could not find the power consumption characteristics of the wireless chipsets mentioned by the reviewer of the ToM paper. The companies do not reveal them in their datasheets and only advertise that the chips are ultra-low power. One [http://www.soccentral.com/results.asp?CatID=552&EntryID=20724 article] claims that the Broadcom BCM4326 and BCM4328 Wi-Fi chips enable a 54-Mbps full-rate active receive power consumption of less than 270mW, but no more details are given. | ||
+ | |||
+ | |||
+ | |||
+ | === May 5 === | ||
+ | * Currently working on formulating the rate adaptation problem for 3D video streaming using SVC that I mentioned in the last meeting. I'm writing a report on the problem and working on formally formulating it as an optimization problem. Expecting to be done with the formulation by the end of this week. | ||
+ | * Discussed with Som about his work on hybrid multicast/unicast systems but we could not find a common ground for leveraging that work to solve the high bit rate problem of 3D videos. The main issue is that in such systems patching is used to recover the leading portion from the beginning of the video stream which the multicast session has already passed. Attempting to transmit depth streams for example using separate unicast channels or patching does not apply here because the texture and depth streams are synchronized and are utilized concurrently. Moreover, other than streaming different views using separate multicast channels (which has been already proposed in several papers), it is not clear to me how multicasting would enable an interactive free viewpoint experience where the user is free to navigate to any desired viewpoint of the scene. | ||
+ | |||
+ | |||
+ | |||
= Spring 2011 (RA) = | = Spring 2011 (RA) = | ||
* '''Courses:''' None | * '''Courses:''' None | ||
+ | |||
+ | |||
+ | === Apr 22 === | ||
+ | * Downloaded and compiled Insight Segmentation and Registration Toolkit (ITK) and the Visualization ToolKit (VTK). The two libraries are huge and it took at least an hour to compile each. They utilize the '''cmake''' utility for configuring and building (even for creating new projects), and VTK provides a [http://www.vtk.org/Wiki/CMake:Eclipse_UNIX_Tutorial tutorial] on how to utilize cmake with Eclipse. Managed to read DICOM slices and save them as volume in the MetaImage format. Was also able to render the generated MetaImage as a volume using VTK. To take DICOM images as input and view them through VTK, we should first open the files and save them to volume as indicated in the Insight/Examples/IO/DicomSeriesReadImageWrite2.cxx example. Then we visualize the volume as shown in the example InsightApplications/Auxiliary/vtk/itkReadITKImageShowVTK.cxx. Some modifications to the latter source file were necessary to render a 3D volume. | ||
+ | * In biomedicine, 3-D data are acquired by a multitude of imaging devices [magnetic resonance imaging (MRI), CT, 3-D microscopy, etc.]. In most cases, 3-D images are represented as a sequence of two-dimensional (2-D) parallel image slices. Three-dimensional visualization is a collection of theories, methods, and techniques that apply computer graphics, image processing, and human-computer interaction to transform the data produced by scientific computing into graphics. | ||
+ | * DICOM files consist of a header and a body of image data. The header contains standardized as well as free-form fields. The set of standardized fields is called the public DICOM dictionary. A single DICOM file can contain multiple frames, allowing storage of volumes or animations. Image data can be compressed using a large variety of standards, including JPEG (both lossy and lossless), LZW (Lempel Ziv Welch), and RLE (Run-length encoding). | ||
+ | * Going from slices to a surface model (e.g. a mesh) requires some work. The most important is the segmentation. One needs to isolate on each slice the tissue that will be used to create the 3D model. Generally, there are three main steps to generate a mesh from a series of DICOM slices: | ||
+ | ** read DICOM image(s): vtkDICOMImageReader | ||
+ | ** extract isocontour to produce a mesh: vtkContourFilter | ||
+ | ** write mesh in STL file format: vtkSTLWriter | ||
+ | * It seems progressive meshes may not be very appropriate for representing the objects in medical applications. The doctors need to slice the object and look at cross sections, whereas meshes will only show the outer surface. The anatomical structure or the region of interest needs to be delineated and separated out so that it can be viewed individually. This process is known as ''image segmentation'' in the world of medical imaging. However, segmentation of organs or regions of interest from a single image is of hardly any significance for volume rendering. What is more important is the segmentation from 3D volumes (which are basically consecutive images stacked together); such techniques are known as ''volume segmentation''. A good, yet probably outdated, survey of volume segmentation techniques is given by Lakare in [http://www.cs.sunysb.edu/~mueller/teaching/cse616/sarangRPE.pdf this] report. A more recent evaluation of four different 3D segmentation algorithms with respect to their performance on three different CT data sets is given by Bulu and Alpkocak [http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4298660 here]. | ||
+ | * As mentioned by Lakare, segmentation in medical imaging is generally considered a very difficult problem. There are many approaches for volume segmentation proposed in the literature. These vary widely depending on the specific application, imaging modality (CT, MRI, etc.), and other factors. For example, the segmentation of the lungs has different issues than the segmentation of the colon. The same algorithm which gives excellent results for one application might not even work for another. According to Lakare, at the time of writing of his report, there was no segmentation method that provided acceptable results for every type of medical dataset. | ||
+ | * A somewhat old, yet still valid, [http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=865875 tutorial] on visualizing using the VTK was published in IEEE Computer Graphics and Applications in 2000. The vtkImageData object can be used to represent one-, two-, and three-dimensional image data. As a sub-class of vtkDataSet, vtkImageData can be represented by a vtkActor and rendered with a vtkDataSetMapper. In 3D this data can be considered a volume. Alternatively, it can be represented by a vtkVolume and rendered with a subclass of vtkVolumeMapper. Since some subclasses of vtkVolumeMapper use geometric techniques to render the volume data, the distinction between volumes and actors mostly arises from the different terminology and parameters used in volumetric rendering as opposed to the underlying rendering method. VTK currently supports three types of volume rendering—ray tracing, 2D texture mapping, and a method that uses the VolumePro graphics board. | ||
+ | * VTK can render using the OpenGL API, or more recently Manta. The iPhone (and other devices such as Android) use OpenGL ES, which is essentially a subset of OpenGL targeted at embedded systems. A recent [http://www.vtk.org/pipermail/vtk-developers/2010-December/009076.html post] (December 2010) on the VTK mailing list indicates that there is interest in writing/collaborating on a port of VTK's rendering for OpenGL ES. | ||
+ | * A paper on mesh decimation using VTK can be found [http://www.cg.tuwien.ac.at/courses/Seminar/SS2002/Knapp_paper.pdf here]. | ||
+ | * [http://meshlab.sourceforge.net/ MeshLab] is an open source, portable, and extensible system for the processing and editing of unstructured 3D triangular meshes. | ||
+ | |||
+ | |||
+ | |||
+ | === Apr 15 === | ||
+ | * Jang ''et al.'' proposed a real-time implementation of a multi-view image synthesis system. This implementation is based on lookup tables (LUTs). In their implementation, the sizes of the LUTs for rotation conversion and disparity are 1.1 MBytes and 900 Bytes for each viewpoint, respectively. The processing time to create the left and right images before using LUT was 3.845 sec, which doesn't enable real-time synthesis. Using LUTs reduced the processing time to 0.062 sec. | ||
+ | * Park ''et al.'' presented a depth-image-based rendering (DIBR) technique for 3DTV service over terrestrial-digital multimedia broadcasting (T-DMB), the mobile TV standard adopted by Korea. They leverage the previously mentioned real-time view synthesis technique by Jang ''et al.'' to overcome the computational cost of generating the auto-stereoscopic image. Moreover, they propose a depth pre-processing method using two adaptive smoothing filters to minimize the amount of resulting holes due to disocclusion during the view synthesis process. | ||
+ | * Gurler ''et al.'' presented a multi-core decoding architecture for multiview video encoded in MVC. Their proposal is based on the idea of decomposing the input N-view stream into M-independently decodable sub-streams and performing decoding of each sub-stream by separate threads using multiple instances of the MVC decoder. However, to obtain such independently decodable sub-streams, the video must be encoded using special inter-view prediction schemes depending on the number of cores. | ||
+ | * As indicated by Yuan ''et al.'', the distortion of virtual views is influenced by four factors in 3DV systems: | ||
+ | ** compression of texture videos and depth maps | ||
+ | ** performance of the view synthesis algorithm | ||
+ | ** inherent inaccuracy of depth maps | ||
+ | ** whether the captured texture videos are well rectified | ||
+ | * Trying to encode two-view texture and depth map streams using JMVC (the multiview reference encoder) to get an idea of how much overhead is incurred by transmitting an additional view along with depth maps when sending a 3D video over wireless channels. Managed to compile the source and edit the configuration files, but still get errors when encoding. Looking more into the configuration file parameters. | ||
+ | * Looked more into DICOM slices: they are simply parallel 2D sections of an object. Using those slices, and knowing the inter-slice distance, medical imaging software is able to reconstruct the 3D representation. The more recent versions of the DICOM standard enable packaging all the slices into one file to reduce the overhead of headers by eliminating redundant ones. | ||
+ | |||
+ | |||
+ | === Apr 8 === | ||
+ | * Gathered different thoughts from my readings in the ''Readings and Thoughts'' section of the [[Private:3DV_Remote_Rendering | 3D Video Remote Rendering and Adaptation System]] Wiki page. | ||
+ | * Could not find any work on distributed view synthesis. | ||
+ | * I went over the work done by Dr. Hamarneh's students. I read the publications and the report he sent. However, as far as I can see, it is an implementation work for porting an existing open source medical image analysis toolkit to the iOS platform. There are no algorithms or theory involved. That said, one of their future goals is to facilitate reading, writing, and processing of 3D or higher dimensional medical images on iOS (which only supports normal 2D image formats). Current visualization of such imagery on desktop machines is performed via the [http://www.vtk.org/ Visualization ToolKit (VTK)]. One of their goals is to also port this toolkit to iOS. Another possible tool that I found that is also based on VTK is [http://www.slicer.org/pages/Introduction Slicer], an open source software package for visualization and image analysis. | ||
+ | * Based on my readings on progressive mesh streaming, it should be applicable in this context. However, I'm still not familiar with the standard formats and the encoding of such meshes (especially in medical image analysis and visualization applications). Generally, it seems that medical images have their own formats such as the [http://medical.nema.org/ DICOM] standard. Their initial thought is to transmit a number of what are known as DICOM slices to the receiver and then have the receiver construct the 3D model from them. So, this is still not very clear to me, nor is it clear whether 3D video technologies may play a role in this. | ||
+ | |||
+ | |||
+ | === Mar 14 === | ||
+ | * '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/hamza/reports/February2011/doc/reportTemplate.pdf here] | ||
+ | * Added more details on homographies in the report. | ||
+ | * Implemented double warping and blending, as well as inverse warping, using the [http://arma.sourceforge.net/ Armadillo] C++ linear algebra library. | ||
+ | |||
+ | |||
+ | === Mar 7 === | ||
+ | * Added more detailed description of the view synthesis process. | ||
+ | * Implemented the first phase of the process (forward warping) and the z-buffer competition resolution technique in C/C++. I tested it on the Breakdancers sequence from MSR. | ||
+ | * Working on profiling the code using [http://oprofile.sourceforge.net/about/ OProfile] to calculate the number of cycles required by the view synthesis process to derive preliminary estimates of power consumption. | ||
+ | * Implementing double warping and a hole filling technique to get a feeling of the final quality that can be obtained. | ||
+ | * Understanding homography matrices and how they are used to speed up the synthesis process. | ||
+ | * Working on deriving a formal analysis of the time complexity of the view synthesis process. The projection phase basically involves a number of matrix multiplications. | ||
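The forward warping phase with z-buffer competition resolution mentioned above can be illustrated with a heavily simplified sketch. It is written in Python rather than the C/C++ of the actual implementation, and it assumes rectified cameras so that the general projection (a chain of matrix multiplications) reduces to a horizontal shift inversely proportional to depth. The function name and the `disparity_scale` parameter (standing in for focal length times baseline) are hypothetical.

```python
def forward_warp(texture, depth, disparity_scale):
    """Warp a reference view horizontally and resolve collisions by depth.

    texture: list of rows of pixel values
    depth:   list of rows of depth values (smaller means closer to the camera)
    disparity_scale: hypothetical constant (focal length * baseline)

    Returns the warped image; positions that no source pixel maps to are
    left as None (holes caused by disocclusion).
    """
    height, width = len(texture), len(texture[0])
    warped = [[None] * width for _ in range(height)]
    zbuf = [[float("inf")] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            z = depth[y][x]
            shift = int(round(disparity_scale / z))
            xt = x - shift  # target column in the virtual view
            # z-buffer test: keep the pixel that is nearest to the camera
            if 0 <= xt < width and z < zbuf[y][xt]:
                zbuf[y][xt] = z
                warped[y][xt] = texture[y][x]
    return warped
```

Each pixel is visited once, so under these assumptions the cost is linear in the number of pixels; the remaining None entries are what a hole filling step would have to handle.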
+ | |||
+ | |||
+ | === Feb 28 === | ||
+ | |||
+ | |||
+ | |||
+ | === Feb 21 === | ||
+ | * Familiarizing myself with JSVM and its tools and options. | ||
+ | * Contacted the lab that developed the reference software for disparity estimation and view synthesis described in the MPEG technical reports. Still haven't received a reply. | ||
+ | |||
+ | |||
+ | === Feb 14 === | ||
+ | * Reading about SVC and how to perform bitstream extraction | ||
+ | * Reading Cheng's paper on viewing time scalability and Som's IWQoS paper. | ||
+ | * Reading a couple of papers on optimized substream extraction | ||
+ | * Reading papers on modelling the synthesized view distortion in V+D 3D videos | ||
+ | |||
=== Jan 24 === | === Jan 24 === | ||
− | * The mobile market seems to shifting towards multicore processors. At CES 2011, at least two companies showcased their new mobile | + | * '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/hamza/reports/January2011/doc/reportTemplate.pdf here] |
+ | * [[Private:3DV_Remote_Rendering | 3D Video Remote Rendering and Adaptation System]] | ||
+ | * Market survey: | ||
+ | ** The mobile market seems to be shifting towards multicore processors. At CES 2011, at least two companies showcased their new mobile phones (LG Optimus 2X and Motorola ATRIX 4G) based on the [http://www.nvidia.com/object/tegra-2.html NVIDIA Tegra 2] dual-core ARM Cortex A9 processor. This looks promising as it may enable smoother graphics capabilities and may be useful for fast view synthesis on the mobile device. However, some evaluation of power consumption needs to be performed. The chip also includes an ''ultra-low power'' (ULP) GeForce GPU and is capable of decoding 1080p HD video. [http://www.youtube.com/watch?v=PNwRdDT5uFI Demo Video] | ||
+ | ** Tablets emerging in the market nowadays are using the Tegra 2 processor (e.g. Dell [http://www.youtube.com/watch?v=6Lh7SUfNg3M Streak 7] and Motorola [http://www.youtube.com/watch?v=D7zheLybA-Q XOOM]) | ||
+ | ** Qualcomm [http://www.qualcomm.com/products_services/chipsets/snapdragon.html Snapdragon], Samsung [http://www.samsung.com/global/business/semiconductor/newsView.do?news_id=1195 Orion] ([http://www.youtube.com/watch?v=PKPtnZxoWO8 Video]), and Texas Instruments [http://focus.ti.com/general/docs/wtbu/wtbuproductcontent.tsp?templateId=6123&navigationId=12842&contentId=53247 OMAP4] are all dual-core processors expected in the first half of 2011. | ||
+ | ** Slides [http://www.pcmag.com/article2/0,2817,2376152,00.asp leaked] this weekend from NVIDIA's presentation at the Mobile World Congress indicate that the company will be shipping a '''Tegra 2 3D''' processor this year intended for use in mobile gadgets featuring a 3D screen! Although this is ''yet to be confirmed'', it is expected that devices such as LG's G-Slate which is expected to have a glasses-free, three-dimensional display and will be shipping around the same time will run on this processor. Moreover, an announcement of a Tegra 3 processor is expected in February. | ||
+ | ** The recent release of Gingerbread (Android 2.3) has witnessed a concurrent release of a new NDKr5 which allows application lifecycle management and window management to be performed outside Java. This means an application can be written entirely in C/C++/ARM assembly code without need to develop Java or JNI bindings. | ||
+ | |||
=== Jan 17 === | === Jan 17 === | ||
Line 11: | Line 389: | ||
* Reading about stereo-based view synthesis. | * Reading about stereo-based view synthesis. | ||
* Went over 3 papers on real-time view synthesis using GPUs. | * Went over 3 papers on real-time view synthesis using GPUs. | ||
− | + | ||
=== Jan 10 === | === Jan 10 === |
Latest revision as of 01:00, 10 February 2015
Spring 2015 (RA)
Feb 9
- Ran additional experiments with a wider range of network bandwidth values for the two video sequences in the evaluation.
- Captured the network packets for a couple of streaming sessions to check the cause for the gap between the actual and estimated network bandwidth values.
- So far, I do not see any indication that the TCP window is being reduced. I did however find that sometimes the server was closing one or two of the opened connections after some time.
- After further investigation, I found that the default KeepAliveTimeout value in the Apache2 Web server was 5 seconds. The two closed connections were a result of this timeout being triggered at some point during playback. I changed the server's configuration to make the timeout 20 seconds.
- Implemented an independent segment encoding, decoding, and transcoding utility. This is needed to automate the process for obtaining the average quality per segment for each representation (and will be helpful in future work if we dynamically need to change a segment's bitrate online).
- Also implemented a DASH segmenter based on GPAC's API (this will be useful if we need to generate DASH segments from a live stream).
- Working on editing the scripts to include the average qualities per segment for the reference views in the MPD file.
- Also modifying the player's code to take this information into consideration in its rate adaptation logic.
Spring 2014 (RA)
- Courses: None
- Publications:
- A DASH-based Free-Viewpoint Video Streaming System (NOSSDAV'14), Mar 2014.
Apr 12
- Mainly working on refactoring the client implementation to get it working better with the new features.
- Cheng provided a GPU machine for me and I'm trying to get the results for the RD measurements faster using multiple machines (I also tried GPU instances on AWS but ran into some configuration problems, so I decided to look into it later).
Apr 5
- The virtual view distortion calculation scripts are running for one sequence but may take some time to finish (running for two days so far). Synthesizing each virtual view, writing it to disk, and calculating its distortion for each possible combination of bit rates is time consuming.
- Discussed with Khaled the refactoring of the client code. We are hoping to get a lock-free design to eliminate all random behaviours when switching views.
- Completed the writing of 5 sections.
- Working on the evaluation plan.
Mar 28
- Realized that the virtual view distortion model does not really need to include the D-Q relationship since the videos are already pre-coded. Therefore, we will include the per-segment distortion in the MPD and I'm comparing the accuracy of the model against the cubic model proposed by Velisavljevic et al. and will use the one with higher accuracy.
- Still working on the pre-fetching code.
- The write-up so far is available on the SVN server.
Mar 21
- Writing scripts to calculate the distortions of reference views representations and the virtual views distortions (PSNR and SSIM) corresponding to different combinations of reference views representations. The goal is to obtain the values for the different coefficients in the virtual view distortion model.
- Working on the write-up for the paper based on several discussions and meetings with ChengHsin.
- Fixing some issues with the client and adding the distortion model based adaptation logic and stream pre-fetching.
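The PSNR part of the distortion scripts mentioned above boils down to the standard formula PSNR = 10 log10(peak^2 / MSE). The following is a minimal sketch in Python, assuming 8-bit samples and frames given as flat lists of luma values; the function names are hypothetical and SSIM is not covered.

```python
import math


def mse(reference, synthesized):
    """Mean squared error between two equally sized lists of samples."""
    if len(reference) != len(synthesized) or not reference:
        raise ValueError("frames must be non-empty and of equal size")
    total = sum((r - s) ** 2 for r, s in zip(reference, synthesized))
    return total / len(reference)


def psnr(reference, synthesized, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical frames."""
    err = mse(reference, synthesized)
    if err == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / err)
```

Running this once per virtual view position and per combination of reference view representations would yield the samples needed to fit the coefficients of the distortion model.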
Mar 7
- Working on the implementation of the adaptation based on the distortion model.
- Worked with Khaled on debugging and fixing the deadlock problem in the client.
- Discussed with Khaled today how to refactor the client and make it more generic. Meeting with him tomorrow to discuss the evaluation plan.
- Initiated the write-up for the paper.
- Preparing the NOSSDAV presentation.
Feb 21
- Almost done with the problem formulation.
- Met with Khaled and, based on our discussion, the client will pre-fetch two side views and allocate the bandwidth to them differently based on the current viewpoint and the navigation speed and direction. Therefore, if the user is changing views in the left direction:
- The side view on the right will be assigned the minimum rate and the remaining available bandwidth will be divided between the two reference views and the left side view.
- The left side view will be allocated a percentage <math>p</math> of the remaining bandwidth (e.g., 30 %).
- The bitrate of the reference views' components will be allocated based on the rate-distortion formulation.
- The main difficulty with the implementation is that unlike 2D where we know there is only one stream being decoded and played back, we now have multiple streams. In the case of 2D streaming, we fill the buffers up to a certain threshold to absorb sudden changes in the network bandwidth. In the multi-view case, however, it is still not clear to me how this can be done efficiently.
- If we based our segment selection decisions on recent viewer behavior (for example knowing the velocity and direction of view switching during the previous period), we won't be able to make the decision until the playback of the current segment is done, which is too late. Increasing the segment buffer level to a certain threshold before playback will also not work since the decision made in this case will be based on information which is far behind.
- Will start working on the code next week.
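The bandwidth split agreed on in the Feb 21 discussion can be sketched as below. This is a minimal illustration in Python under stated assumptions, not the client code: the function and key names are hypothetical, the mirrored behaviour for navigation to the right is assumed by symmetry, and the further division of the reference views' share among their components (done via the rate-distortion formulation) is left out.

```python
def allocate_bandwidth(total, min_rate, p, direction):
    """Split the available bandwidth among side views and reference views.

    total:     estimated available bandwidth
    min_rate:  minimum rate given to the side view opposite to the navigation
    p:         fraction of the remaining bandwidth for the side view that the
               user is moving towards (e.g. 0.3)
    direction: "left" or "right", the current navigation direction
    """
    if direction not in ("left", "right"):
        raise ValueError("direction must be 'left' or 'right'")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    remaining = total - min_rate
    if remaining < 0:
        raise ValueError("total bandwidth is below the minimum rate")
    ahead = p * remaining            # side view in the navigation direction
    reference = remaining - ahead    # shared by the two reference views
    away = "right" if direction == "left" else "left"
    return {
        "side_" + direction: ahead,
        "side_" + away: min_rate,
        "reference": reference,
    }
```

For example, `allocate_bandwidth(10000, 500, 0.25, "left")` leaves the right side view at the minimum rate and gives a quarter of the remaining bandwidth to the left side view.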
Feb 7
- Working on the camera-ready version for the NOSSDAV paper.
- Working on the extension of the paper for ACM MM.
- Held a meeting with Khaled and we brainstormed about what needs to be done and the tasks that he will be working on. This will mainly include:
- The possibility of employing queuing theory in the formulation to model the buffers dynamics.
- Increasing the efficiency of the player by eliminating the bottleneck caused by thread synchronization.
Jan 31
- Report: here
- Completed the formulation of the rate adaptation optimization problem based on the distortion of the virtual view.
- Working on adding other objectives and constraints such as quality variation and buffer levels.
- Discussed the work with Khaled. He is taking a look at the paper, reports, and code. I will assign to him a task after he is done with his class quiz on Monday.
Jan 24
- I have a few models for the virtual view quality. None of them completely addresses what we are looking for.
- One model relates the average distortions of the reference views' components to the distortion of the virtual view. We have used that in our previous work but the criticism that we got was that the linear model is not an accurate one.
- Another model relates the quantization step value of a component (texture or depth) to the distortion induced by that component in the virtual view. However, this does not consider the case where the quantization step values for the left and right images are different, and the evaluation was based on MVC encoding. Since in DASH the components of each view will be encoded separately and the adaptation logic may choose different bitrates for the left and right streams, this model may be useful only in the case of equal Qstep values. The validity of the model in the case of simulcast encoding, however, needs to be checked.
- The same authors also propose a relationship between the <math>Q_{\text{step}}</math> of the texture components and <math>Q_{\text{step}}</math> of the depth components in order to find the optimum allocation for bit rate between them.
- The last two models seem to only consider the middle virtual view position.
- If we are to use a model that relates <math>Q_{\text{step}}</math> (or the quantization parameter QP) to the distortion of the virtual view, then we need to consider variable bit rate encoded representations, since the QP or <math>Q_{\text{step}}</math> will be fixed.
- I'm still working on the formulation of the problem and how to relate the virtual view distortion to the bitrate/quality of individual components of the reference views.
- I have also looked into several works incorporating the power consumption aspect into the rate adaptation decision. Some also adjust the scheduling of segment fetch times in order to allow the radio interface to go to sleep for longer times without having the segment buffer completely drained. Given the fact that we have multiple streams, this seems to be an interesting problem. One aspect that was not considered in the previous work is the decoding power consumption for the chosen representation.
Jan 17
- Working on formulating the optimization problem for rate adaptation.
- Making the player more stable and eliminating thread synchronization issues.
Fall 2013 (RA)
- Courses: None
- Submissions:
- A DASH-based Free-Viewpoint Video Streaming System (NOSSDAV'14)
Spring 2013 (TA)
- Courses: None
March 4
- Developed a stereo player around the sample player that comes with libdash. We are now able to render two video streams side-by-side. However, currently the MPD files of the two streams must have the same structure. This does not seem to be a problem for now as we can generate similar MPDs.
- Som has informed me that they have migrated the sampleplayer code to Qt. This should make modifications easier in the future and allow us to use more GUI components.
- Had a meeting with Som and we have two main issues we are working on right now:
- Making sure that the frame pair being rendered from the two streams is synchronized. If one of the streams is faster than the other, it is possible that one side will be overwritten by several frames before a new frame is drawn to the other side. We are currently looking at intelligent ways to make sure this does not happen or fix it.
- The view switching logic. This will require attaching and detaching several decoders on-the-fly to the renderer during run-time. We are also considering the implications of having multiple receivers/decoders running simultaneously to enable fast view switching.
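The frame-pair synchronization issue above can be illustrated with a minimal sketch. It is not the player's actual code: the class name `StereoPairer`, the use of a per-stream frame index, and the assumption that each decoder delivers frames in increasing index order are all hypothetical. A pair is released to the renderer only when both sides hold a frame with the same index, so neither side is overwritten several times while the other one waits.

```python
from collections import deque


class StereoPairer:
    """Pair decoded frames from the left and right streams by frame index.

    Hypothetical helper: assumes each decoder delivers frames in
    increasing index order.
    """

    def __init__(self):
        self.left = deque()
        self.right = deque()

    def push(self, side, index, frame):
        """Queue a decoded frame; `side` is "left" or "right"."""
        queue = self.left if side == "left" else self.right
        queue.append((index, frame))

    def pop_pair(self):
        """Return (index, left_frame, right_frame), or None if no pair is ready."""
        while self.left and self.right:
            left_index, left_frame = self.left[0]
            right_index, right_frame = self.right[0]
            if left_index == right_index:
                self.left.popleft()
                self.right.popleft()
                return left_index, left_frame, right_frame
            # The older frame has no partner any more (it was skipped or
            # lost on the other side), so drop it instead of rendering it.
            if left_index < right_index:
                self.left.popleft()
            else:
                self.right.popleft()
        return None
```

The renderer would call `pop_pair()` once per display refresh and keep showing the previous pair whenever it returns `None`.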
Feb 4
- Discussed with Som and surveyed potential software for the FTV project. We created a Wiki page here with our findings and updates.
- We decided on several libraries and a 3D video player for the implementation and demonstration of our system. We are now looking into the code of libdash, DASHEncoder, and Bino to understand where we can add our changes and whether we can extract some functionalities out from their codebase.
- We also settled on the isoff-main MPD profile and on initially having the views coded separately.
- Managed to get the sample DASH player that comes with libdash working. But the playback is slow and the quality is not very good. We are looking into the cause of this. In the meantime, we are making necessary modifications to the code to render two video streams side-by-side. This turned out to be non-trivial since it involves using OpenGL in addition to SDL (which the sample player was using). However, this is progressing.
- Had a meeting with Khaled regarding the retargeting project and the evaluation of the work. Agreed on comparing the work first against simple scaling and making sure that the outcome is better in terms of preserving the shapes of important objects. Also, the comparison will be done against seam carving to illustrate the significant speed-up.
Fall 2012 (RA)
- Courses: None
- Worked on the survey paper and finishing the revision based on the comments. Will send it soon.
- Attended WaveFront's Academic-Industry Relations WaveGuide Seminar. It is apparent that the companies that presented (e.g. Sierra Wireless, Nokia, Ericsson) have interest in machine-to-machine (M2M) communications and the Internet of Things (IoT). Research done in some labs at UBC is in that direction.
- Discussed with Khaled about the progress of the retargeting project and the next steps. Apparently he had a bug in the code that was causing a long memory allocation time due to the removal of an initialization line that was present in my code. Khaled has prepared the communication framework for the distribution of the retargeting process and is now integrating my GPU retargeting code in that framework.
- In my meeting with him, we concluded that we should mainly focus on optimizing the performance of retargeting a single frame, and determine whether the current implementation can be improved further and whether any bottleneck can be avoided. I suggested that Khaled try the NVIDIA Visual Profiler and collaborate with me to pinpoint the bottleneck on the GPU.
- Based on the work presented in the paper Coarse-to-fine temporal optimization for video retargeting based on seam carving, it might be possible to avoid constructing an energy map in each frame and instead do this for key frames. This is possible by exploiting the motion vectors that are available in the encoded bitstream. The idea is that pixels to be removed from a certain frame should also be removed from following frames to maintain consistency. Therefore, we can locate the region where the seam found in one frame is located in a following frame using the motion vectors.
- Another possible enhancement when using the gradient magnitude to construct the energy map is to avoid re-applying the convolution operator that gives us the gradient magnitude to the entire frame after seam removal, and instead restrict it to the region surrounding the seam that was removed in the previous iteration. This is based on the fact that the pixels surrounding the seam become neighbors after seam removal, so only this region of the frame has changed and there is no need to recalculate the gradient magnitude for the rest of the frame. However, whether this will be practical (and efficient) on the GPU still needs to be determined.
- The machine at NSL that has the GPU installed was upgraded in order to compile Khaled's code, which apparently uses the new C++11 standard that the previous compiler version on the machine did not support. Since the upgrade, I have been having authentication problems and cannot use the machine. I'm working with Ben and Jason to fix this as soon as possible (we may reinstall the operating system) so that I can continue working on the code with Khaled.
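The idea of restricting the gradient computation to the region around the removed seam can be sketched on the CPU as follows. This is only an illustration under stated assumptions, not the GPU code: it uses NumPy, a grayscale frame, a simple central-difference L1 gradient with edge replication in place of the actual convolution operator, and a connected vertical seam (adjacent rows differ by at most one column). The function names and the `radius` parameter are hypothetical.

```python
import numpy as np


def gradient_energy(img):
    """L1 gradient magnitude from central differences with edge replication."""
    p = np.pad(img.astype(np.float64), 1, mode="edge")
    dx = np.abs(p[1:-1, 2:] - p[1:-1, :-2])
    dy = np.abs(p[2:, 1:-1] - p[:-2, 1:-1])
    return dx + dy


def remove_seam(arr, seam):
    """Remove one pixel per row (column seam[r] in row r) from a 2D array."""
    h, w = arr.shape
    mask = np.ones((h, w), dtype=bool)
    mask[np.arange(h), seam] = False
    return arr[mask].reshape(h, w - 1)


def update_energy(old_energy, new_img, seam, radius=2):
    """Update the energy map after a seam removal, recomputing only a band.

    Values away from the seam are reused (shifted into place); only the
    columns within `radius` of the removed seam are recomputed per row.
    """
    h, w = new_img.shape
    energy = remove_seam(old_energy, seam)  # reuse the unchanged values
    p = np.pad(new_img.astype(np.float64), 1, mode="edge")
    for r in range(h):
        lo = max(seam[r] - radius, 0)
        hi = min(seam[r] + radius, w)
        # Padded coordinates are offset by one in both directions.
        dx = np.abs(p[r + 1, lo + 2:hi + 2] - p[r + 1, lo:hi])
        dy = np.abs(p[r + 2, lo + 1:hi + 1] - p[r, lo + 1:hi + 1])
        energy[r, lo:hi] = dx + dy
    return energy
```

For this 3x3-support operator only the two pixels that become neighbors in each row actually change, so a band of two columns on each side of the seam is a conservative choice. A wider operator would need a larger `radius`.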
Spring 2012 (RA)
- Courses: None
March 17
- Report: here.
- Implemented the seam carving algorithm using the dynamic programming approach and OpenCV.
- Searched for a suitable graph cut library to utilize for performing the energy minimization in seam carving for videos as per Avidan and Shamir's second paper. The Centre for Biomedical Image Analysis (CBIA) provides a recent library that seems to be flexible enough. I contacted them to gain access to it and am working my way through the API.
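For reference, the dynamic programming step of seam carving for a single image can be sketched as below. This is a minimal NumPy-only illustration, not the OpenCV-based implementation mentioned above. The function names are hypothetical, and the energy map is assumed to be given (for example, a gradient magnitude image).

```python
import numpy as np


def find_vertical_seam(energy):
    """Return, for each row, the column of the minimum-energy vertical seam."""
    h, w = energy.shape
    cost = energy.astype(np.float64).copy()
    # Forward pass: cost[r, c] is the cheapest seam ending at (r, c).
    for r in range(1, h):
        prev = cost[r - 1]
        left = np.concatenate(([np.inf], prev[:-1]))
        right = np.concatenate((prev[1:], [np.inf]))
        cost[r] += np.minimum(np.minimum(left, prev), right)
    # Backward pass: follow the cheapest predecessor from the bottom row up.
    seam = np.empty(h, dtype=np.int64)
    seam[-1] = int(np.argmin(cost[-1]))
    for r in range(h - 2, -1, -1):
        c = seam[r + 1]
        lo, hi = max(c - 1, 0), min(c + 2, w)
        seam[r] = lo + int(np.argmin(cost[r, lo:hi]))
    return seam


def remove_vertical_seam(image, seam):
    """Remove one pixel per row from a grayscale (H, W) or color (H, W, C) image."""
    h, w = image.shape[:2]
    mask = np.ones((h, w), dtype=bool)
    mask[np.arange(h), seam] = False
    return image[mask].reshape(h, w - 1, *image.shape[2:])
```

Reducing the width by k pixels repeats these two steps k times, recomputing (or locally updating) the energy map after each removal.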
March 6
- Surveyed recent research on video re-targeting. Report can be found here. (last updated: Mar 6)
February 21
- Attending MMSys'12
- Preparing for depth exam
February 7
- Finalizing depth exam report and preparing slides
- Preparing for MMSys'12 presentation and Nokia meeting
January 24
- Completed the implementation and added a section on the allocation algorithm to TOMCCAP paper. Almost done with the review and will send draft in a few hours.
January 10
- I implemented the scheduling algorithm and had to change many parts in the simulator. It is running now but I'm not getting the expected results. I tried comparing my logic with both Cheng's and Som's codes. Everything seems to be in order and correct. I'm stuck at this point and still trying to figure out why the simulator is not getting any feasible solution, even when reducing the number of channels (streams) and after implementing a logic to reduce the number of layers under consideration from the texture component!
Fall 2011 (RA)
- Courses:
- CMPT-726: Machine Learning (Audit)
- Publications:
- Multicasting of Multiview 3D Videos over Wireless Networks (MoVid'12)
December 27
- Went over previous burst scheduling work by Cheng, Som, and Farid.
- In our MoVid paper, we have already decided on which layers to transmit from each stream based on the bit rates of the substreams and the channel capacity. It is then necessary to organize the frames of each stream into bursts within the scheduling window in order to minimize the energy consumption at receivers while satisfying the buffer constraints (no overflow or underflow).
- The difficulty of allocating MVD2 3D video bursts is that each video stream is actually four components which we group into two streams (one stream for texture components and one stream for the depth components) in our S3VM algorithm. In the double buffering technique used in Cheng's work, each channel had only a single stream and each receiver had one receiving buffer. In order to decode a frame in the 3D video stream, corresponding video data from all four streams should be available at the decoding deadline of the frame. The question now is, can we treat the texture and depth components' streams as a single stream during the burst scheduling process? And should we consider a single buffer for both components at the receiver side or do we need two separate receiving buffers (one for each component)? It should be noted that the depth components' stream will have a lower bit rate than texture.
- I am considering summing up the rates of the texture and depth components and performing the schedule allocation as if they were a single stream. This way, we will still maintain a single receiver buffer. It is assumed that some signaling is available in the received data which enables the receiver to distinguish the packets or NAL units of the four streams (left texture, left depth, right texture, right depth). After filling the receiver buffer during a window, the receiver will swap the buffers and drain the receiver buffer in the next window while distributing the NALUs to four different NALU buffers.
- I am also considering, as future work, performing a finer-grained allocation of the OFDMA resources. This however may require reformulating the problem from the beginning. In our MoVid work, we considered that the entire OFDM downlink subframe will be allocated to a single video channel and that the modulation and coding used for all resources within the stream are fixed. However, on a finer scale, each resource within the downlink subframe may be allocated to a different channel and use a different modulation and coding scheme. I came across three recent papers discussing this and performing adaptive resource allocation for single layer and scalable video multicast.
December 13
- Report: here (last updated: Dec 13)
- Added another paper related to VoD delivery with the assistance of cloud CDNs to the report. Also added a discussion on primitives related to view manipulation and synthesis in 3D videos.
- Created a list of open source computer vision and image and video processing libraries. List is accessible here. Most image processing tools required are already in OpenCV. However, GPU-based implementations that achieve higher speed-up are also available. The latter will be useful in real-time systems and can be utilized on Amazon EC2 GPU nodes for example.
- Have been trying to come up with a novel idea to extend our MoVid work. However, still did not find a good direction. One side issue that can be checked is to address one of the comments regarding using PSNR as a metric (although we also mentioned SSIM, which is a perceptual metric, will give similar results). To provide virtual view adaptation we need a good quality metric for the virtual views. A recent ICIP paper indicates that new methods are required for assessing virtual synthesized views because pixel-based and perceptual-based metrics fail. The paper also mentions that depth should also be taken into account. We can also evaluate the Peak Signal-to-Perceptual Noise metric used by MPEG in their evaluations (currently looking into this).
- I was also thinking whether we should check if other virtual view distortion models yield better results. However, this may require a totally different problem formulation. Another issue is how to schedule the bursts to satisfy buffer constraints, but this will be very similar to previous works by Cheng, Farid, and Som. So, it will not be very new.
- Managed to encode a video sequence with good quality when rendered on the new display.
November 29
- Working on MoVid paper and depth exam report.
- Looking into container formats for media content as well as libraries that enable creating them and extracting content from them. This is required for selectively extracting content from stored files when performing adaptation. Two EU projects that have been working on a similar system (no explicit mentioning of the cloud though) are the COAST and SEA (SEAmless Content Delivery) projects. A couple of months ago, COAST presented the world’s first working prototype of a 3D adaptive video receiver based on the novel Dynamic and Adaptive HTTP Streaming (DASH) standard. To avoid starting from scratch, I'm going over their deliverables and reports to determine what can be leveraged from their work.
- Gathering information on available tools to perform the different multimedia processing primitives. Will create a Wiki page and add my findings so far.
- Managed to generate both still image and video content that the new display accepts. Video quality was not so good. But I believe it is just a matter of the encoding and codec parameters. Will try a different codec as well as attempt to fine tune the encoding parameters. Next step will be providing the display with all the views instead of having its rendering component generate them.
- Following MPEG's meeting in Geneva this week. They are working on developing an efficient encoder for 3DV content. Many companies and universities have responded to their call for proposals. They are working in two directions in parallel: proposals based on AVC and proposals based on the new HEVC (high-efficiency video coding) standard. It seems that for AVC-based coders they have settled on the software presented by Nokia because it showed good performance (less bitrate, better quality, reduced time complexity). For HEVC, HHI's proposal was chosen and other tools will be integrated into it from other proposals. Code for HHI's implementation is available, but not sure about Nokia's code (haven't found it yet). Nokia's implementation is however based on JM and VSRS.
November 15
- Report: here (last updated: Nov 15)
- Working on depth exam report. Current draft version can be found here.
- Prepared presentation for Nokia UR forum this Thursday pptx.
- Set up the new display and machine and tested them. Both are working properly. Experimenting with providing our own material using both video+depth or multiple views. For some reason they were hesitating on providing me with some necessary tools. Finally managed to get them today.
- Testing the DepthGate demo software to see if it will be beneficial to obtain a full-version license.
- Surveyed some necessary background and added some preliminary thoughts to report regarding server-side adaptation.
October 19
- Report: here (last updated: Oct 22)
- Dissected and understood the view synthesis reference software source code and implementation (which will be useful if we are going to implement this process on an adaptation server or cloud service).
- Trying to formulate a burst scheduling problem for transmitting 3D videos over wireless broadband access networks.
- Read several papers on adaptive display power management (the concept is summarized and described in report). It seems like an interesting idea. It may be possible to implement for 3D displays as well. However, some power measurement studies may be required on actual mobile 3D displays.
- Also doing some readings on Dynamic Adaptive Streaming over HTTP (DASH) to see if it will be suitable for wireless networks and 3D video.
- Working on my depth exam report.
- Attended:
- ICNP 2011
- Dr. El Saddik's talk: Research in Collaborative Haptic-Audio-Visual Environments and Serious Games for Health Application
- Haiyang Wang's depth exam: Migration of Internet Services to Cloud Platforms: Issues, Challenges, and Approaches
September 6
- Report: Sent by e-mail (SVN server was not functioning properly when I attempted to commit)
- Completed implementation and ran initial experiments to verify that implementation is working properly. Results are ok.
- Completed review of report. Addressed all comments, added more explanations/details, and restructured.
- I'm limited by the number of available video sequences so I'm trying to address this by starting from different frames and concatenating some sequences together. The encoding using JSVM is time consuming, which delayed the final results. I'm currently performing encoding on a couple of machines in the lab.
- The experiments will require varying the capacity (by varying the scheduling window size) and varying the number of videos to be transmitted. We can compare the objective function value obtained by our approximate solution to that of the optimal solution. We can also calculate how much faster the running time is compared to an exhaustive search solution that attempts to find the optimal value.
Summer 2011 (RA)
- Courses: None
August 22
- Completed setting up an emulation environment within the lab using WANem and netem.
- Reviewed all the papers/work on asymmetric stereo encoding and transmission.
August 8
- The first idea mentioned in the comments concerning unicast and choosing the best representation to transmit based on the user's viewpoint was my initial direction. However, as I mentioned in my May 31 progress summary below, this problem has a fixed number of classes (in MCKP terminology): only 4 streams are to be transmitted (2 texture (L and R) and 2 depth maps (L and R)). This means that, unlike Som's work for example, where it was assumed that we have a large number of streams (e.g. between 10 and 50 in his experiments), the problem of selecting the best substreams in this case is indeed achievable in real-time. Assuming each of the 4 streams is encoded into 4 layers, we have 4^4 = 256 combinations in the search space, which can easily be enumerated.
- Implemented a framework for solving knapsack problems including the 0-1 MCKP which we are using in the formulation. Also, added code to dynamically formulate the optimization based on a given number of input 3D videos and optimally solve this mixed integer programming (MIP) problem using the GNU Linear Programming Kit (GLPK) API.
- Also implemented the main classes for simulating the scheduling process at the base station for the multicast/broadcast scenario. For experimenting with a large number of video streams, one issue is that the number of available 3D video sequences that we have is 20. To increase the size of the input, I was thinking of segmenting the videos (e.g. every 1 second) in the hope of getting different average bit rates even for segments from the same sequence and use those segments as if they were different sequences.
- Another issue is having long video sequences. All of the sequences are quite short (a few seconds). This is not a problem per se, as we can just repeat the sequence. However, JSVM is complex and extremely slow. So, I have to reduce the dimensions of the sequences to the CIF resolution for example. However, this will require re-estimating the depth maps since simply scaling the depth maps down will not give correct view synthesis results.
- Looked into creating a testbed. I read Som's report and discussed with Saleh since I understand he was also looking for a WiMAX testbed. I have created an account at ORBIT lab to check out their WiMAX testbed. And I'm looking into how flexible their platform is. However, I feel that not having the equipment under our control makes it more difficult to run a demo since we need our own receiving device which is capable of decoding and rendering the video content.
- For 3D mobile devices, the current status is that there are no devices with a display that renders more than two views. Large auto-stereoscopic displays which are capable of rendering 26 views are however available (and yes we can buy them, actually the MPEG group are using them in their experiments). Another issue with setting up such a testbed is that the receiving device should be capable of decoding SVC streams. The OpenSVCDecoder is indeed available, but can it be ported to Android without problems? And how far can it go? Two things that need to be tested.
- Masterimage has revealed a glasses-free 3D tablet reference design that uses Masterimage's cell-matrix parallax barrier technology. Masterimage's Cell tablet is based on Texas Instruments' OMAP 4430 chip. Software outfit Cyberlink announced its partnership with Masterimage to create 3D video playback software that makes use of Masterimage's 3D display.
- LG has already shown its first 3D tablet. The Optimus Pad 3D has an 8.9 inch 15:9 display, with 1280 x 768 resolution. Unlike the Optimus smartphone, it's not autostereoscopic - it requires you to wear 3D spex. Still, the advantages of its 3D functionality are clear. The Optimus pad, which runs the Android 3.0 Honeycomb OS, has dual 5MP cameras for 3D photography and camcorder stereoscopy. 3D footage can be viewed on the tablet or by connecting to a 3D TV.
- HTC also has the EVO 3D smartphone.
- It is rumored that the next generation iPad (iPad 3) will have a glasses-free display.
- At Computex, Asus unveiled the Eee Pad MeMo 3D. This Honeycomb alternative has a 7-inch parallax barrier no-glasses 3D display with a resolution of 1280 x 800.
- Attended a GrUVi talk on "Predicting stereoscopic viewing comfort".
- I'm reviewing the report again to address the comments that were listed and will post the updated version when done.
July 12
- Report: here (last updated: July 13)
June 28
- Report: here (last updated: June 28)
June 14
- Report: here (last updated: June 13)
May 31
- Report: here (last updated: May 31)
- While working on formulating the optimization that we discussed in the last meeting, it came to my attention that although the problem I'm trying to solve is an instance of the multiple-choice knapsack problem (MCKP), the number of classes is fixed and very small (only 4 streams). This means that, unlike Som's work where it was assumed that we have a large number of streams (e.g. between 10 and 50 in his experiments), the problem of selecting the best substreams in this case is indeed achievable in real-time. Assuming each stream is encoded into 4 layers, we have 4^4 = 256 combinations which can easily be enumerated.
- While trying to think about an actual problem, I'm considering extending Som's work and utilizing client-driven multicast, where the client subscribes to the channels of the desired views. A number of 3D video streams are to be transmitted. Each stream has N views. The views are encoded using SVC to a number of layers. The ith view of the streams is multiplexed over a single broadcast/multicast channel. The receiver tunes-in/joins channels i and i+2 to receive two reference views which can be utilized to synthesize any view in-between.
- The problem now however seems to be more complicated as this will be a strange variant of the knapsack problem where we have two knapsacks but items that go into each knapsack come from different classes while there is only one joint objective function. I'm trying to figure out whether this resembles any known variant of the knapsack problem, but until now I was not successful.
- After thinking more about it, I decided that if there are only two views and the client is going to be receiving from only two broadcast/multicast channels, then it doesn't matter to which channel a substream is allocated and we can relax any assignment restrictions. This leads to a multiple choice multiple knapsack problem (MCMKP) which is still NP-hard.
- One way in which the MCMKP can be tackled is by partitioning it (sub-optimally) into two sub-problems: a multiple choice knapsack problem (MCKP) and a multiple knapsack problem (MKP). A similar approach was taken here for the problem of providing QoS support for prioritized, bandwidth-adaptive, and fair media streaming using a multimedia server cluster. Note: I should mention that I wasn't able to find many resources about the MCMKP problem and it seems to me it is somehow similar to the multi-dimensional multiple-choice multi-knapsack problem (MMMKP) which also is scarce when it comes to material.
- After selecting the optimal substreams by solving the MCKP over the aggregate capacity of the two channels, we need to perform an assignment for each selected substream to one of the two channels in a way that will minimize any bandwidth fragmentation.
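The observation that the 4-stream MCKP instance is small enough to enumerate can be illustrated with a short sketch. It is not the actual formulation. For simplicity it assumes an additive quality score per stream (the real objective is the distortion of the synthesized view), and the function name, inputs, and example numbers are all hypothetical.

```python
from itertools import product


def best_substreams(rates, qualities, capacity):
    """Exhaustively pick one layer per stream to maximize the total quality.

    rates[s][l] and qualities[s][l] are the cumulative bit rate and a quality
    score of stream s when layers 0..l are transmitted (hypothetical inputs).
    Returns (best_quality, best_choice), or (None, None) if nothing fits.
    """
    best_quality, best_choice = None, None
    layer_ranges = [range(len(stream_rates)) for stream_rates in rates]
    # One class per stream, exactly one item (layer) chosen from each class.
    for choice in product(*layer_ranges):
        total_rate = sum(rates[s][l] for s, l in enumerate(choice))
        if total_rate > capacity:
            continue  # violates the channel capacity constraint
        total_quality = sum(qualities[s][l] for s, l in enumerate(choice))
        if best_quality is None or total_quality > best_quality:
            best_quality, best_choice = total_quality, choice
    return best_quality, best_choice


# Streams in the order: left texture, left depth, right texture, right depth.
example_rates = [[400, 800, 1200, 1600], [100, 200, 300, 400]] * 2
example_qualities = [[30, 34, 36, 37], [5, 7, 8, 9]] * 2
```

With 4 streams of 4 layers each, the loop visits 4^4 = 256 combinations, which is why an exact search is feasible here, unlike the case with tens of streams.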
May 19
- Report: here
- Found a more recent and simple model for the distortion of synthesized views based on the distortions of reference views and their depth maps.
- Formulating an optimization problem to select the best combination of substreams to transmit out of the two reference views and their corresponding depth maps in order to minimize the average distortion over all intermediate synthesized views while not exceeding the current channel capacity.
- There has been relevant work in the context of joint bit allocation between texture videos and depth maps in 3D video coding. In addition, another model has been utilized in a recent work on RD-optimized interactive streaming of multiview video. However, in that work, the authors assume the presence of multiple encodings of the views at the server-side. Our work will attempt to utilize SVC to encode the views at various qualities/bitrates and extract the best substreams that maximize the quality and satisfy the capacity constraint.
- Could not find the power consumption characteristics of the wireless chipsets mentioned by reviewer of ToM paper. The companies are not revealing them in their datasheets and they only advertise that they are ultra-low power. One article claims that the Broadcom BCM4326 and BCM4328 Wi-Fi chips enable a 54-Mbps full-rate active receive power consumption of less than 270mW. But no more details are given.
May 5
- Currently working on formulating the rate adaptation problem for 3D video streaming using SVC that I mentioned in the last meeting. I'm writing a report on the problem and working on formally formulating it as an optimization problem. Expecting to be done with the formulation by the end of this week.
- Discussed with Som about his work on hybrid multicast/unicast systems but we could not find a common ground for leveraging that work to solve the high bit rate problem of 3D videos. The main issue is that in such systems patching is used to recover the leading portion from the beginning of the video stream which the multicast session has already passed. Attempting to transmit depth streams for example using separate unicast channels or patching does not apply here because the texture and depth streams are synchronized and are utilized concurrently. Moreover, other than streaming different views using separate multicast channels (which has been already proposed in several papers), it is not clear to me how multicasting would enable an interactive free viewpoint experience where the user is free to navigate to any desired viewpoint of the scene.
Spring 2011 (RA)
- Courses: None
Apr 22
- Downloaded and compiled Insight Segmentation and Registration Toolkit (ITK) and the Visualization ToolKit (VTK). The two libraries are huge and it took at least an hour to compile each. They utilize the cmake utility for configuring and building (even for creating new projects), and VTK provides a tutorial on how to utilize cmake with Eclipse. Managed to read DICOM slices and save them as volume in the MetaImage format. Was also able to render the generated MetaImage as a volume using VTK. To take DICOM images as input and view them through VTK, we should first open the files and save them to volume as indicated in the Insight/Examples/IO/DicomSeriesReadImageWrite2.cxx example. Then we visualize the volume as shown in the example InsightApplications/Auxiliary/vtk/itkReadITKImageShowVTK.cxx. Some modifications to the latter source file were necessary to render a 3D volume.
- In biomedicine, 3-D data are acquired by a multitude of imaging devices [magnetic resonance imaging (MRI), CT, 3-D microscopy, etc.]. In most cases, 3-D images are represented as a sequence of two-dimensional (2-D) parallel image slices. Three-dimensional visualization is a series of theories, methods, and techniques that applies computer graphics, image processing, and human-computer interaction techniques to transform the data resulting from scientific computing into graphics.
- DICOM files consist of a header and a body of image data. The header contains standardized as well as free-form fields. The set of standardized fields is called the public DICOM dictionary. A single DICOM file can contain multiple frames, allowing storage of volumes or animations. Image data can be compressed using a large variety of standards, including JPEG (both lossy and lossless), LZW (Lempel Ziv Welch), and RLE (Run-length encoding).
- Going from slices to a surface model (e.g. a mesh) requires some work. The most important is the segmentation. One needs to isolate on each slice the tissue that will be used to create the 3D model. Generally, there are three main steps to generate a mesh from a series of DICOM slices:
- read DICOM image(s): vtkDICOMImageReader
- extract isocontour to produce a mesh: vtkContourFilter
- write mesh in STL file format: vtkSTLWriter
- It seems progressive meshes may not be very appropriate for representing the objects in medical applications. The doctors need to slice the object and look at cross sections. Meshes will only show the outer surface. The anatomical structure or the region of interest needs to be delineated and separated out so that it can be viewed individually. This process is known as image segmentation in the world of medical imaging. However, segmentation of organs or regions of interest from a single image is of little significance for volume rendering. What is more important is the segmentation from 3D volumes (which are basically consecutive images stacked together); such techniques are known as volume segmentation. A good, yet probably outdated, survey of volume segmentation techniques is given by Lakare in this report. A more recent evaluation of four different 3D segmentation algorithms with respect to their performance on three different CT Data Sets is given by Bulu and Alpkocak here.
- As mentioned by Lakare, segmentation in medical imaging is generally considered a very difficult problem. There are many approaches for volume segmentation proposed in literature. These vary widely depending on the specific application, imaging modality (CT, MRI, etc.), and other factors. For example, the segmentation of the lungs has different issues than the segmentation of the colon. The same algorithm which gives excellent results for one application might not even work for another. According to Lakare, at the time of writing of his report, there was no segmentation method that provides acceptable results for every type of medical dataset.
- A somewhat old, yet still valid, tutorial on visualizing with VTK was published in IEEE Computer Graphics and Applications in 2000. The vtkImageData object can be used to represent one-, two-, and three-dimensional image data. As a subclass of vtkDataSet, vtkImageData can be represented by a vtkActor and rendered with a vtkDataSetMapper. In 3D this data can be considered a volume. Alternatively, it can be represented by a vtkVolume and rendered with a subclass of vtkVolumeMapper. Since some subclasses of vtkVolumeMapper use geometric techniques to render the volume data, the distinction between volumes and actors mostly arises from the different terminology and parameters used in volumetric rendering as opposed to the underlying rendering method. According to the tutorial, VTK supported three types of volume rendering at the time: ray tracing, 2D texture mapping, and a method that uses the VolumePro graphics board.
- VTK can render using the OpenGL API, or more recently Manta. The iPhone (and other platforms such as Android) uses OpenGL ES, which is essentially a subset of OpenGL targeted at embedded systems. A recent post (December 2010) on the VTK mailing list indicates that there is interest in writing/collaborating on a port of VTK's rendering for OpenGL ES.
- A paper on mesh decimation using VTK can be found here.
- MeshLab is an open source, portable, and extensible system for the processing and editing of unstructured 3D triangular meshes.
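To make the "volume as consecutive images stacked together" idea from the notes above concrete, here is a minimal pure-Python sketch of stacking 2D slices and cutting cross sections along the other two axes. It is only an illustration: the function names are hypothetical, it does not use VTK or any medical imaging library, and it ignores voxel spacing and orientation.

```python
def stack_slices(slices):
    """Stack equally sized 2D slices (lists of rows) into volume[z][y][x]."""
    height = len(slices[0])
    width = len(slices[0][0])
    for s in slices:
        if len(s) != height or any(len(row) != width for row in s):
            raise ValueError("all slices must have the same dimensions")
    # Copy the rows so the volume does not alias the input slices.
    return [[row[:] for row in s] for s in slices]


def section_y(volume, y):
    """Cross section at a fixed row index y: one row taken from every slice."""
    return [volume[z][y][:] for z in range(len(volume))]


def section_x(volume, x):
    """Cross section at a fixed column index x: one column from every slice."""
    return [[volume[z][y][x] for y in range(len(volume[0]))]
            for z in range(len(volume))]


if __name__ == "__main__":
    vol = stack_slices([[[1, 2, 3], [4, 5, 6]],
                        [[7, 8, 9], [10, 11, 12]]])
    print(section_y(vol, 1))
    print(section_x(vol, 2))
```

A surface mesh cannot answer these queries because it stores no interior samples, which is the limitation noted for the medical use case.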
Apr 15
- Jang et al. proposed a real-time implementation of a multi-view image synthesis system. This implementation is based on lookup tables (LUTs). In their implementation, the sizes of the LUTs for rotation conversion and disparity are 1.1 MBytes and 900 Bytes for each viewpoint, respectively. The processing time to create the left and right images without LUTs was 3.845 sec, which doesn't enable real-time synthesis. Using LUTs reduced the processing time to 0.062 sec (roughly a 62x speedup).
- Park et al. presented a depth-image-based rendering (DIBR) technique for 3DTV service over terrestrial-digital multimedia broadcasting (T-DMB), the mobile TV standard adopted by Korea. They leverage the previously mentioned real-time view synthesis technique by Jang et al. to overcome the computational cost of generating the auto-stereoscopic image. Moreover, they propose a depth pre-processing method using two adaptive smoothing filters to minimize the number of holes caused by disocclusion during the view synthesis process.
- Gurler et al. presented a multi-core decoding architecture for multiview video encoded in MVC. Their proposal is based on the idea of decomposing the input N-view stream into M independently decodable sub-streams and decoding each sub-stream in a separate thread using multiple instances of the MVC decoder. However, to obtain such independently decodable sub-streams, the video must be encoded using special inter-view prediction schemes depending on the number of cores.
- As indicated by Yuan et al., the distortion of virtual views is influenced by four factors in 3DV systems:
- compression of texture videos and depth maps
- performance of the view synthesis algorithm
- inherent inaccuracy of depth maps
- whether the captured texture videos are well rectified
- Trying to encode two-view texture and depth map streams using JMVC (the multiview reference encoder) to get an idea of how much overhead is incurred by transmitting an additional view along with depth maps when sending a 3D video over wireless channels. Managed to compile the source and edit the configuration files, but still get errors when encoding. Looking more into the configuration file parameters.
- Looked more into DICOM slices: they are simply parallel 2D sections of an object. Using those slices, and knowing the inter-slice distance, medical imaging software is able to reconstruct the 3D representation. The more recent versions of the DICOM standard enable packaging all the slices into one file to reduce the overhead of headers by eliminating redundant ones.
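The lookup-table idea behind the real-time synthesis work noted earlier in this entry can be sketched as follows. This is not the LUT layout used by Jang et al.; it assumes an 8-bit depth map quantized in inverse depth between `z_near` and `z_far` and a rectified camera pair, so that disparity is `focal_px * baseline / Z`. All names and parameter values are hypothetical.

```python
def build_disparity_lut(focal_px, baseline, z_near, z_far):
    """Precompute the disparity (in pixels) for each of the 256 depth levels.

    Assumes depth value v encodes 1/Z linearly between 1/z_far (v = 0)
    and 1/z_near (v = 255).
    """
    lut = []
    for v in range(256):
        inv_z = (v / 255.0) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
        lut.append(focal_px * baseline * inv_z)
    return lut


def disparity_map(depth_map, lut):
    """Replace the per-pixel division by a table lookup."""
    return [[lut[v] for v in row] for row in depth_map]


if __name__ == "__main__":
    lut = build_disparity_lut(focal_px=1000.0, baseline=0.05,
                              z_near=1.0, z_far=100.0)
    print(disparity_map([[0, 128, 255]], lut))
```

The table is built once per camera configuration, so the per-pixel cost during synthesis drops to a single indexed read.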
Apr 8
- Gathered different thoughts from my readings in the Readings and Thoughts section of the 3D Video Remote Rendering and Adaptation System Wiki page.
- Could not find any work on distributed view synthesis.
- I went over the work done by Dr. Hamarneh's students. I read the publications and the report he sent. However, as far as I can see, it is implementation work: porting an existing open source medical image analysis toolkit to the iOS platform. There are no algorithms or theory involved. That said, one of their future goals is to facilitate reading, writing, and processing of 3D or higher dimensional medical images on iOS (which only supports normal 2D image formats). Current visualization of such imagery on desktop machines is performed via the Visualization ToolKit (VTK). One of their goals is to also port this toolkit to iOS. Another possible tool that I found that is also based on VTK is Slicer, an open source software package for visualization and image analysis.
- Based on my readings on progressive mesh streaming, it should be applicable in this context. However, I'm still not familiar with the standard formats and the encoding of such meshes (especially in medical image analysis and visualization applications). Generally, it seems that medical images have their own formats such as the DICOM standard. Their initial thought is to transmit a number of what are known as DICOM slices to the receiver, which would then construct the 3D model from them. So, this is still not very clear to me, nor is it clear whether 3D video technologies may play a role in this.
Mar 14
- Report: here
- Added more details on homographies in the report.
- Implemented double warping and blending, as well as inverse warping, using the Armadillo C++ linear algebra library.
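For reference, the homography-based inverse warping and blending steps mentioned above can be sketched in a few lines of pure Python. This is not the Armadillo implementation from the log; it is a simplified illustration with hypothetical function names, nearest-neighbour sampling, single-channel images stored as lists of rows, and `None` marking holes. It also assumes the homogeneous coordinate `w` is never zero.

```python
def apply_homography(H, x, y):
    """Map pixel (x, y) through the 3x3 homography H (list of rows)."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return xh / w, yh / w


def inverse_warp(src, H_inv, width, height):
    """For each target pixel, fetch the source pixel it maps back to.

    Target pixels that map outside the source image stay None (holes).
    """
    out = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            sx, sy = apply_homography(H_inv, x, y)
            ix, iy = int(round(sx)), int(round(sy))
            if 0 <= iy < len(src) and 0 <= ix < len(src[0]):
                out[y][x] = src[iy][ix]
    return out


def blend(view_a, view_b, alpha):
    """Blend two warped views; alpha is the weight of view_a.

    A hole in one view is filled from the other; a hole in both stays None.
    """
    out = []
    for row_a, row_b in zip(view_a, view_b):
        row = []
        for a, b in zip(row_a, row_b):
            if a is None:
                row.append(b)
            elif b is None:
                row.append(a)
            else:
                row.append(alpha * a + (1.0 - alpha) * b)
        out.append(row)
    return out


if __name__ == "__main__":
    shift = [[1, 0, 1], [0, 1, 0], [0, 0, 1]]  # target x maps to source x + 1
    left = inverse_warp([[10, 20, 30]], shift, 3, 1)
    print(blend(left, [[None, 10, 40]], 0.5))
```

Inverse warping visits every target pixel exactly once, so it avoids the cracks that forward warping leaves, at the cost of needing the inverse mapping.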
Mar 7
- Added a more detailed description of the view synthesis process.
- Implemented the first phase of the process (forward warping) and the z-buffer competition resolution technique in C/C++. I tested it on the Breakdancers sequence from MSR.
- Working on profiling the code using OProfile to calculate the number of cycles required by the view synthesis process to derive preliminary estimates of power consumption.
- Implementing double warping and a hole filling technique to get a feeling of the final quality that can be obtained.
- Understanding homography matrices and how they are used to speed up the synthesis process.
- Working on deriving a formal analysis of the time complexity of the view synthesis process. The projection phase basically involves a number of matrix multiplications.
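The forward warping phase with z-buffer conflict resolution described in this entry can be illustrated with the simplified Python sketch below. It is not the C/C++ code from the log: instead of the full projection with camera matrices, it assumes rectified cameras so that each pixel only shifts horizontally by a given per-pixel disparity. Function and parameter names are hypothetical.

```python
def forward_warp(texture, depth, disparity):
    """Forward-warp a single-channel image into the virtual view.

    texture, depth, disparity: lists of rows with identical dimensions.
    Smaller depth means closer to the camera. When several source pixels
    land on the same target pixel, the closest one wins (z-buffer test).
    Target pixels that receive no source pixel stay None (holes).
    """
    height, width = len(texture), len(texture[0])
    out = [[None] * width for _ in range(height)]
    zbuf = [[float("inf")] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            tx = x + int(round(disparity[y][x]))
            if 0 <= tx < width and depth[y][x] < zbuf[y][tx]:
                zbuf[y][tx] = depth[y][x]
                out[y][tx] = texture[y][x]
    return out


if __name__ == "__main__":
    print(forward_warp([[1, 2, 3, 4]],
                       [[5.0, 1.0, 5.0, 5.0]],
                       [[0, 1, 0, 0]]))
```

Each source pixel is visited once with constant work per pixel, so this phase is linear in the number of pixels; in the full pipeline the constant is dominated by the per-pixel matrix multiplications of the projection.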
Feb 28
Feb 21
- Familiarizing myself with JSVM and its tools and options.
- Contacted the lab that developed the reference software for disparity estimation and view synthesis described in the MPEG technical reports. Still haven't received a reply.
Feb 14
- Reading about SVC and how to perform bitstream extraction
- Reading Cheng's paper on viewing time scalability and Som's IWQoS paper.
- Reading a couple of papers on optimized substream extraction
- Reading papers on modelling the synthesized view distortion in V+D 3D videos
Jan 24
- Report: here
- 3D Video Remote Rendering and Adaptation System
- Market survey:
- The mobile market seems to be shifting towards multicore processors. At CES 2011, at least two companies showcased their new mobile phones (LG Optimus 2X and Motorola ATRIX 4G) based on the NVIDIA Tegra 2 dual-core ARM Cortex A9 processor. This looks promising as it may enable smoother graphics capabilities and may be useful for fast view synthesis on the mobile device. However, some evaluation of power consumption needs to be performed. The chip also includes an ultra-low power (ULP) GeForce GPU and is capable of decoding 1080p HD video. Demo Video
- Tablets emerging in the market nowadays are using the Tegra 2 processor (e.g. Dell Streak 7 and Motorola XOOM)
- Qualcomm Snapdragon, Samsung Orion (Video), and Texas Instruments OMAP4 are all dual-core processors expected in the first half of 2011.
- Slides leaked this weekend from NVIDIA's presentation at the Mobile World Congress indicate that the company will be shipping a Tegra 2 3D processor this year intended for use in mobile gadgets featuring a 3D screen! Although this is yet to be confirmed, devices such as LG's G-Slate, which is expected to have a glasses-free, three-dimensional display and to ship around the same time, are expected to run on this processor. Moreover, an announcement of a Tegra 3 processor is expected in February.
- The recent release of Gingerbread (Android 2.3) was accompanied by a new NDK release (r5), which allows application lifecycle management and window management to be performed outside Java. This means an application can be written entirely in C/C++/ARM assembly code without the need to write Java code or JNI bindings.
Jan 17
- Concentrating on view synthesis in 3D video systems and read two recent survey papers about the topic.
- Reading about multiple view geometry to understand the warping process and the related terms from epipolar, trifocal, and projective geometry.
- Understanding the commonly used camera pinhole model.
- Reading about stereo-based view synthesis.
- Went over 3 papers on real-time view synthesis using GPUs.
Jan 10
- Exploring potential research directions in 3D videos, including: adaptive virtual view rendering in free-viewpoint video, view synthesis, and rate adaptation in 3D video streaming.
- Investigating the potential of cloud computing as a platform for enabling remote rendering of 3D video for mobile devices.
Fall 2010 (RA)
- Courses:
- CMPT-765: Computer Communication Networks
- Submissions:
- Energy Saving in Multiplayer Mobile Games (TOM'11)
- Publications:
- Energy-Efficient Gaming on Mobile Devices using Dead Reckoning-based Power Management (NetGames'10)
Summer 2010 (DGS-GF)
- Courses: None
- Submissions:
- Energy-Efficient Gaming on Mobile Devices using Dead Reckoning-based Power Management (NetGames'10)
Spring 2010 (TA)
- Courses:
- CMPT-705: Design and Analysis of Algorithms
Fall 2009 (RA)
- Courses:
- CMPT-771: Internet Architecture and Protocols
- Submissions:
- Efficient AS Path Computation and Its Application to Peer Matching (NSDI'10)
Summer 2009 (RA)
- Submissions:
- Efficient Peer Matching Algorithms (CoNEXT'09)
Spring 2009 (TA)
- Courses:
- CMPT-820: Multimedia Systems