This page describes the envisioned free-viewpoint TV (FTV) system, including design decisions, useful resources, and tools.
Updates
- [Meeting on:2/1]
- Decided to use the Bino player for the final decoding/demo of 3D FTV. There is a 3D video plugin for VLC, but at this time it looks less stable than Bino.
- For our initial version, decided to use multiple independent AVC-encoded streams for streaming the different views. This will result in less compression than MVC-encoded streams, but right now no media player offers real-time MVC decoding support.
- Decided on using the libdash library (which became the MPEG-DASH reference client library in MPEG's meeting this month) for implementing the client-side DASH components and adaptation logic. Since two views are being downloaded at any given point in time, we need two instances of libdash on the client side.
- Decided to use DASHEncoder for DASH content and MPD generation at server side.
- [Som:1/28] MySQL is not required for DASHEncoder, but DASHEncoder only implements the 'isoff-basic-on-demand' profile while we need the 'stereoid' profile. We can start with the MPD described for 3-view video in the MPEG-DASH standard [Section G.4, p. 120]. If we leave out audio for the time being, the DASHEncoder code base suggests that the major work will be replacing the x264-related parts with JMVC and updating the MP4Box-related parts. Does MP4Box support the files generated by JMVC? If not, we need further code to implement the ISOBMFF parts for the JMVC-generated bitstream. This is the first thing we need working: add an H.264/MVC bitstream into an MP4 container and check whether the MP4 file can be played back by any player. For generating the MVC bitstream, we should select a test sequence with at least three views (c1, c2, c3) such that (c1+c2) and (c2+c3) produce stereo views.
- [Ahmed:1/28] Added detailed description of creating proper content for the 3D display.
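The view selection discussed in the updates above (adjacent views such as c1+c2 and c2+c3 forming stereo pairs, with each view of the pair fetched by its own libdash instance) could be sketched as follows. This is only an illustration: the function name stereo_pair, the NUM_VIEWS variable, and the c<N> naming are assumptions made for this sketch and are not part of libdash or DASHEncoder.

```shell
#!/bin/bash
# Hypothetical helper: map a viewpoint index to the two adjacent camera
# views that form its stereo pair (c1+c2 for viewpoint 1, c2+c3 for 2).
NUM_VIEWS=3   # assumption: three encoded views, c1..c3

stereo_pair() {
    local viewpoint=$1
    # With N views there are only N-1 adjacent pairs.
    if [ "$viewpoint" -lt 1 ] || [ "$viewpoint" -ge "$NUM_VIEWS" ]; then
        echo "viewpoint must be in 1..$((NUM_VIEWS - 1))" >&2
        return 1
    fi
    echo "c${viewpoint} c$((viewpoint + 1))"
}

# Each element of the pair would be requested by its own DASH client instance.
stereo_pair 1   # prints: c1 c2
stereo_pair 2   # prints: c2 c3
```

Switching the viewpoint from 1 to 2 keeps c2 in the pair, so only one of the two client instances would need to change the view it downloads.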
People
Project Tree
https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-projects/3DVideo/DASH-FTV
3D Display
The Dimenco auto-stereoscopic display accepts a 2D+Z image (also known as V+D for videos), in which the video frame containing the texture component is combined with a depth map. The two components (texture and depth) are concatenated side by side after the width of each is reduced by half. The video frames are then compressed and the resulting file is renamed with the extension .s3d so that the player can detect that it is in the 2D+Z format. This extension makes sure that the Dimenco 3D Player sets the display in 3D mode with the correct settings. The Dimenco 3D Player can also display 3D images in the 2D+Z format. These images should be RGB (24-bit uncompressed) BMP files that are renamed to have the extension .b3d.
The total resolution for the input video should be 1920x540. The left half (with a resolution of 960x540) contains a 2D RGB image and the right half (also with a 960x540 resolution) contains a grey-scale image representing the depth map. It should be noted that the grey-scale image is still a 24-bit (i.e., three-channel) picture. Therefore, the R, G, and B bytes of each depth-map pixel have the same value.
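To make the layout concrete, the following sketch prints the region occupied by each half of a 2D+Z frame in ImageMagick geometry notation (WIDTHxHEIGHT+X+Y). The function name twodz_geometry is made up for this example.

```shell
#!/bin/bash
# Print the geometry of the texture (left) and depth (right) halves of a
# side-by-side 2D+Z frame with the given total width and height.
twodz_geometry() {
    local width=$1 height=$2
    local half=$((width / 2))
    echo "texture ${half}x${height}+0+0"
    echo "depth ${half}x${height}+${half}+0"
}

twodz_geometry 1920 540
# prints:
#   texture 960x540+0+0
#   depth 960x540+960+0
```

These geometries can be passed to convert -crop to cut one half back out of a combined frame, which is a quick way to check that the texture and depth ended up on the correct sides.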
The first step in preparing the content is resizing each of the two components to a resolution of 960x540. This can be achieved using ImageMagick tools (namely, using the convert utility). An example is given below (note that adding the character ! to the size forces -resize to ignore the aspect ratio and distort the image so it always generates an image exactly the size specified).
convert Dancer_color_1920x1088.bmp -resize 960x540\! Dancer_color_960x540.bmp
convert Dancer_depth_1920x1088.bmp -resize 960x540\! Dancer_depth_960x540.bmp
Once we have the two components in the correct size, we need to combine them (side-by-side) into a single image of size 1920x540. This can be done using another ImageMagick utility called montage. The -mode concatenate option tells montage to simply glue the images together, and the -tile option can be used to specify the layout it should use. If the layout is not specified, the default is to concatenate the images horizontally (side-by-side).
montage -mode concatenate Dancer_color_960x540.bmp Dancer_depth_960x540.bmp Dancer_2dz_1920x540.bmp
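If the layout should not depend on the default, it can be spelled out with -tile, where 2x1 means two columns and one row. The sketch below only builds and prints the command so it can be inspected first; the helper name montage_cmd is an assumption for this example.

```shell
#!/bin/bash
# Build (but do not run) a montage command with an explicit tile layout.
# Arguments: layout (COLUMNSxROWS), texture image, depth image, output image.
montage_cmd() {
    local layout=$1 color=$2 depth=$3 out=$4
    echo "montage -mode concatenate -tile ${layout} ${color} ${depth} ${out}"
}

montage_cmd 2x1 Dancer_color_960x540.bmp Dancer_depth_960x540.bmp Dancer_2dz_1920x540.bmp
```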
Finally, the resulting image should be renamed with the extension .b3d. If the images are frames from a YUV sequence, they can be extracted using ffmpeg:
ffmpeg -s 1920x1088 -i Dancer_c_5_1920x1088.yuv -r 1 -t 00:00:30 -f image2 ~/Dancer_color_1920x1088.bmp
ffmpeg -s 1920x1088 -i Dancer_d_5_1920x1088.yuv -r 1 -t 00:00:30 -f image2 ~/Dancer_depth_1920x1088.bmp
It is mentioned that the frames should be prepared as a sequence of bitmap images. However, I don't believe that this is mandatory. It should be possible to prepare them as YUV sequences as well. But to follow the guidelines, I describe here the process using bitmap images.
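When working with raw YUV sequences, it helps to know how many frames a file holds before choosing the frame range to extract. The sketch below assumes 8-bit YUV 4:2:0 sampling, which this page does not state explicitly, so the numbers should be checked against the actual sequences. The function names are made up for this example.

```shell
#!/bin/bash
# Bytes per frame of raw 8-bit YUV 4:2:0: one full-resolution luma plane
# plus two chroma planes at a quarter of the resolution each (W*H*3/2).
yuv420_frame_bytes() {
    local width=$1 height=$2
    echo $(( width * height * 3 / 2 ))
}

# Number of whole frames in a raw file of the given size in bytes.
yuv420_frame_count() {
    local file_bytes=$1 width=$2 height=$3
    echo $(( file_bytes / $(yuv420_frame_bytes "$width" "$height") ))
}

yuv420_frame_bytes 1920 1088             # prints 3133440
yuv420_frame_count 313344000 1920 1088   # prints 100
```

On GNU systems the file size in bytes can be obtained with stat -c %s followed by the file name.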
The following script performs the resizing on all the frames of one component. It is assumed that the filename of the frame contains its number and the user specifies the range of frames to be resized as well as the target directory to save the resized frames. We can place the resized frames of both components in the same target directory.
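#!/bin/bash
# Resize every frame of one component (texture or depth) to WIDTHxHEIGHT.
FILENAME=$1
FRAME_START=$2
FRAME_END=$3
WIDTH=$4
HEIGHT=$5
TARGET_DIRECTORY=$6
for i in $(seq -w "$FRAME_START" "$FRAME_END"); do
    F="${FILENAME}${i}.bmp"
    # The trailing ! forces the exact size, ignoring the aspect ratio.
    convert "$F" -resize "${WIDTH}x${HEIGHT}"\! "${TARGET_DIRECTORY}/$F"
done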
Next, to combine the texture frame with the corresponding depth frame, the following script can be used.
#!/bin/bash
# Combine each texture frame with its depth frame into one side-by-side 2D+Z frame.
SEQ_NAME=$1
FILENAME=$2
FRAME_START=$3
FRAME_END=$4
TARGET_DIRECTORY=$5
for i in $(seq -w "$FRAME_START" "$FRAME_END"); do
    COLOR="color-${FILENAME}${i}.bmp"
    DEPTH="depth-${FILENAME}${i}.bmp"
    montage -mode concatenate "$COLOR" "$DEPTH" "${TARGET_DIRECTORY}/${SEQ_NAME}_2dz_1920x540_f${i}.bmp"
done
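Before running the combining script on a long sequence, it may be worth checking that every texture frame has a matching depth frame. The following sketch lists the frame numbers for which one of the two files is missing. It follows the color-/depth- naming used by the script above; the function name missing_pairs and its directory argument are assumptions for this example.

```shell
#!/bin/bash
# List frame numbers for which the color or the depth bitmap is missing.
# Arguments: directory, filename stem, first frame, last frame.
missing_pairs() {
    local dir=$1 name=$2 start=$3 end=$4 i
    for i in $(seq -w "$start" "$end"); do
        if [ ! -f "${dir}/color-${name}${i}.bmp" ] || [ ! -f "${dir}/depth-${name}${i}.bmp" ]; then
            echo "$i"
        fi
    done
}

# Example call (hypothetical file stem), checking frames 00..99 in the
# current directory:
#   missing_pairs . cam2-f 0 99
```

An empty output means all pairs are present and the montage loop should not fail on a missing input.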
Now we can use WMEncoder or ffmpeg to encode the frames. I personally prefer ffmpeg. We need to generate an uncompressed AVI file from the combined frames. This can be achieved using the following command:
ffmpeg -i BALLET_2dz_1920x540_f%02d.bmp -vcodec rawvideo -y ballet_2dz_1920x540.avi
We then perform video compression and place the result in a VOB container.
ffmpeg -i ballet_2dz_1920x540.avi -vcodec mpeg2video -b 24000 -bt 16000 -aspect 32:9 -s 1920x540 -y ballet_2dz_1920x540.s3d.vob
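The -aspect 32:9 value in the command above appears to be simply the 1920x540 frame size reduced to lowest terms, while each 960x540 half on its own is 16:9. The sketch below shows the arithmetic; it assumes square pixels, and the function names are made up for this example.

```shell
#!/bin/bash
# Greatest common divisor by the Euclidean algorithm.
gcd() {
    local a=$1 b=$2 t
    while [ "$b" -ne 0 ]; do
        t=$((a % b))
        a=$b
        b=$t
    done
    echo "$a"
}

# Reduce WIDTHxHEIGHT to an aspect ratio in lowest terms.
aspect_ratio() {
    local w=$1 h=$2 g
    g=$(gcd "$w" "$h")
    echo "$((w / g)):$((h / g))"
}

aspect_ratio 1920 540   # prints 32:9
aspect_ratio 960 540    # prints 16:9
```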
Note: The option -hq was mentioned in the documentation. However, this option seems to be no longer supported by ffmpeg. You can try using -mbd 1 instead of -hq.
Finally, we convert the VOB file to a WMV file. A fairly simple way of doing this is using ffmpeg:
ffmpeg -i filename.vob -vcodec wmv2 -acodec wmav2 -sameq -s 720x576 filename.wmv
JMVC Tutorial
TBA
Bino 3D Video Player
- Bino can read commands from one or multiple script files. Use the --read-commands option to specify script files. A script file can be a standard text file, but it can also be a named pipe, which allows other programs or scripts to submit commands to Bino as they see fit. The following is an example of executing a script stored in a text file called script.txt. The --no-gui option is not necessary, but it prevents the GUI from remembering the changed brightness setting in future sessions.
$ bino --read-commands /path/to/script.txt --no-gui
- With OpenGL, the default output technique for stereoscopic 3D input is OpenGL quad-buffered stereo, which is often used with active shutter glasses. However, graphics card manufacturers tend to enable this technique only on expensive high-end hardware. If the graphics card does not support it, Bino falls back to red/cyan anaglyph output.
- Code structure:
- dispatch.h: Defines the following classes:
- command A command that can be sent to the dispatch by a controller.
- dispatch The dispatch (singleton).
- controller The controller interface.
- notification A notification that can be sent to controllers by the dispatch (signals that the corresponding value has changed).
- open_input_data Contains everything that is needed to open a media input.
- command_file.h: Defines the following class:
- command_file Extends controller
- video_output.h: Defines the following class:
- video_output Extends controller
- video_output_qt.h: Defines the following classes:
- gl_thread Extends QThread
- video_output_qt_widget Extends QGLWidget
- video_container_widget Extends both QWidget and controller
- video_output_qt Extends video_output
- media_data.h: Defines the following classes:
- device_request Extends serializable
- parameters Extends serializable
- video_frame
- audio_blob
- subtitle_box Extends serializable
- media_object.h: Defines the following class:
- media_object
- player.h: Defines the following class:
- player
- gui.h: Defines the following classes:
- audio_dialog Extends QDialog and controller
- color_dialog Extends QDialog and controller
- controls_widget Extends QWidget and controller
- in_out_widget Extends QWidget and controller
- full_screen_dialog Extends QDialog and controller
- crosstalk_dialog Extends QDialog and controller
- quality_dialog Extends QDialog and controller
- zoom_dialog Extends QDialog and controller
- subtitle_dialog Extends QDialog and controller
- video_dialog Extends QDialog and controller
- sdi_output_dialog Extends QDialog and controller
- open_device_dialog Extends QDialog
- main_window Extends QMainWindow and controller
- gui Contains a reference to the main_window object
- main.h: Contains the main() function.
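As noted at the start of this section, the file given to --read-commands can be a named pipe, so another script can feed commands to a running Bino instance. The sketch below shows one way this might look. The helper bino_send and the path /tmp/bino-commands are made up for this example, and the command names open and set-brightness are quoted from memory of the Bino manual, so they should be checked against the manual of the installed version.

```shell
#!/bin/bash
# Append one command line to a Bino command source (a named pipe or a file).
# Arguments: target path, then the command and its arguments.
bino_send() {
    local target=$1
    shift
    printf '%s\n' "$*" >> "$target"
}

# Typical use (commented out so that the sketch has no side effects):
#   mkfifo /tmp/bino-commands
#   bino --read-commands /tmp/bino-commands --no-gui &
#   bino_send /tmp/bino-commands open /path/to/video.mp4   # assumed command name
#   bino_send /tmp/bino-commands set-brightness 0.5        # assumed command name
```

Note that a write to a named pipe blocks until some process has opened the pipe for reading, so Bino should be started before the first command is sent.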
DASH Options
- Apple's HTTP Live Streaming (HLS)
- MPEG-DASH specification (ISO/IEC 23009-1)
Software and Data
- Joint Multiview Video Coder (JMVC)
- Bino 3D video player
- FFmpeg and x264 Encoding Guide
- Example: Encoding the views of the MSR video sequences (use -qp 0 or -crf 0 to produce a lossless output):
ffmpeg -i color-cam2-f%03d.bmp -c:v libx264 -preset slow -crf 22 -r 10 -c:a copy cam2.mp4
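Since the initial version streams each view as an independent AVC stream, the cam2 example above has to be repeated for every camera. The following sketch prints one ffmpeg command per view so that the commands can be reviewed before they are run. The camera indices 0 to 2 and the helper name encode_view_cmd are assumptions; the -c:a copy option from the example is left out here because the BMP input carries no audio.

```shell
#!/bin/bash
# Print the ffmpeg command that encodes the frames of one camera view.
# Arguments: camera index, optional CRF value (default 22; 0 is lossless).
encode_view_cmd() {
    local cam=$1 crf=${2:-22}
    echo "ffmpeg -i color-cam${cam}-f%03d.bmp -c:v libx264 -preset slow -crf ${crf} -r 10 cam${cam}.mp4"
}

# Hypothetical set of views; adjust to the cameras actually used.
for cam in 0 1 2; do
    encode_view_cmd "$cam"
done
```

Once the printed commands look right, the output of the loop can be piped to sh to run them.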
References
- A. Vetro and I. Sodagar, The MPEG-DASH Standard for Multimedia Streaming over the Internet, IEEE Multimedia, 2011
- T. Stockhammer, Dynamic Adaptive Streaming over HTTP - Standards and Design Principles, Proc. of ACM Multimedia Systems, 2011
- Deliverable 5.2: Media Signalling Specification for 3D Video Content Types, COAST Project
- G. Park, J. Lee, G. Lee, and K. Kim, Efficient 3D Adaptive HTTP Streaming Scheme over Internet TV, Proc. of IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB), 2012
- C. Müller and C. Timmerer, A VLC Media Player Plugin enabling Dynamic Adaptive Streaming over HTTP, Proc. of ACM Multimedia, 2011