This page describes the envisioned free-viewpoint TV system, including: design decisions, useful resources, tools, etc.
Updates
- [Meeting on:2/1]
- Decided to use the Bino player for the end decoding/demo of 3D FTV. There is a 3D video plugin for VLC, but at this time it looks less stable than Bino.
- For our initial version, decided to use multiple independent AVC encoded streams for streaming the different views. This is going to result in less compression than MVC encoded streams but right now we don't have real time MVC decoding support in any media player.
- Decided on using the libdash library (which became the MPEG-DASH reference client library in MPEG's meeting this month) for implementing the client-side DASH components and adaptation logic. Since two views are being downloaded at any given point in time, we need two instances of libdash on the client side.
- Decided to use DASHEncoder for DASH content and MPD generation at server side.
- [Som:1/28] MySQL is not required for DASHEncoder, but it only implements the 'isoff-basic-on-demand' profile while we need the 'stereoid' profile. We can start with the MPD described for 3-view video in the MPEG-DASH standard [Section G.4, p. 120]. If we leave out audio for the time being, from the code base of DASHEncoder it seems the major work will be in replacing the x264-relevant parts with JMVC and updating the MP4Box-related parts. Does MP4Box support the files generated by JMVC? If not, we need further code to implement the ISOBMFF parts for the JMVC-generated bitstream. This is the first thing we need working: add an H.264/MVC bitstream into an MP4 container and check whether the MP4 file can be played back by any player. For generating the MVC bitstream we should select a test sequence with at least three views (c1, c2, c3) such that (c1+c2) and (c2+c3) produce stereo views.
- [Ahmed:1/28] Added detailed description of creating proper content for the 3D display.
People
Project Tree
https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-projects/3DVideo/DASH-FTV
3D Display
The Dimenco auto-stereoscopic display accepts a 2D+Z image (also known as V+D for videos). The video frame containing the texture component is combined with a depth map. The two components (texture and depth) are concatenated side by side after reducing the width of each by half. The video frames are then compressed and the resulting file is renamed with the extension .s3d so that the player can detect that it is in the 2D+Z format. This extension makes sure that the Dimenco 3D Player sets the display in 3D mode with the correct settings. The Dimenco 3D Player can also display 3D images in the 2D+Z format. These images should be RGB (24-bit uncompressed) BMP files that are renamed to have the extension .b3d.
The total resolution for the input video should be 1920x540. The left half (with a resolution of 960x540) contains a 2D RGB image and the right half (also with a 960x540 resolution) contains a grey scale image representing the depth map. It should be noted that the grey scale image is still a 24-bit (i.e., three-channel) picture. Therefore, the R, G, and B bytes of each depth map pixel have the same value.
The first step in preparing the content is resizing each of the two components to a resolution of 960x540. This can be achieved using ImageMagick tools (namely, using the convert utility). An example is given below (note that adding the character ! to the size forces -resize to ignore the aspect ratio and distort the image so it always generates an image exactly the size specified).
convert Dancer_color_1920x1088.bmp -resize 960x540\! Dancer_color_960x540.bmp
convert Dancer_depth_1920x1088.bmp -resize 960x540\! Dancer_depth_960x540.bmp
Once the two components have the correct size, we combine them side by side into a single image of size 1920x540. This can be done using another ImageMagick utility called montage. The -mode concatenate option tells montage to simply glue the images together, and the -tile option can be used to specify the layout it should use. If the layout is not specified, the default is to concatenate the images horizontally (side by side).
montage -mode concatenate Dancer_color_960x540.bmp Dancer_depth_960x540.bmp Dancer_2dz_1920x540.bmp
Finally, the resulting image should be renamed with the extension .b3d. If the images are frames from a YUV sequence, they can be extracted using ffmpeg (the %03d pattern in the output name numbers the extracted frames, which the image2 format needs when more than one frame is written):
ffmpeg -s 1920x1088 -i Dancer_c_5_1920x1088.yuv -r 1 -t 00:00:30 -f image2 ~/Dancer_color_1920x1088_f%03d.bmp
ffmpeg -s 1920x1088 -i Dancer_d_5_1920x1088.yuv -r 1 -t 00:00:30 -f image2 ~/Dancer_depth_1920x1088_f%03d.bmp
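The renaming step mentioned above can be scripted when there are many combined images. The following is a minimal sketch, assuming the directory passed in contains only finished 2D+Z bitmaps; the helper name rename_to_b3d and the directory name in the usage comment are made up for illustration.

```bash
#!/bin/bash
# Rename every .bmp file in a directory to .b3d so that the player treats it
# as a 2D+Z image. rename_to_b3d is a hypothetical helper name.
rename_to_b3d() {
    local dir="$1"
    local f
    for f in "$dir"/*.bmp; do
        # Skip the unexpanded pattern when the directory has no .bmp files.
        [ -e "$f" ] || continue
        mv -- "$f" "${f%.bmp}.b3d"
    done
}

# Usage (hypothetical directory name):
#   rename_to_b3d ./frames_2dz
```

Keeping the combined images in their own directory avoids renaming the separate texture and depth bitmaps by accident.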
It is mentioned that the frames should be prepared as a sequence of bitmap images. However, I don't believe that this is mandatory. It should be possible to prepare them as YUV sequences as well. But to follow the guidelines, I describe here the process using bitmap images.
The following script performs the resizing on all the frames of one component. It is assumed that the filename of the frame contains its number and the user specifies the range of frames to be resized as well as the target directory to save the resized frames. We can place the resized frames of both components in the same target directory.
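#!/bin/bash
# Resize all frames of one component (texture or depth) to WIDTH x HEIGHT.
# Arguments: FILENAME FRAME_START FRAME_END WIDTH HEIGHT TARGET_DIRECTORY
FILENAME=$1
FRAME_START=$2
FRAME_END=$3
WIDTH=$4
HEIGHT=$5
TARGET_DIRECTORY=$6
# seq -w pads the frame numbers with zeros to a common width.
for i in $(seq -w "$FRAME_START" "$FRAME_END"); do
  F="${FILENAME}$i.bmp"
  # The trailing ! makes -resize ignore the aspect ratio.
  convert "$F" -resize "${WIDTH}x${HEIGHT}!" "${TARGET_DIRECTORY}/$F"
done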
Next, to combine the texture frame with the corresponding depth frame, the following script can be used.
#!/bin/bash
# Combine each texture frame with its depth frame into one side-by-side 2D+Z frame.
# Arguments: SEQ_NAME FILENAME FRAME_START FRAME_END TARGET_DIRECTORY
SEQ_NAME=$1
FILENAME=$2
FRAME_START=$3
FRAME_END=$4
TARGET_DIRECTORY=$5
for i in $(seq -w "$FRAME_START" "$FRAME_END"); do
  COLOR="color-${FILENAME}$i.bmp"
  DEPTH="depth-${FILENAME}$i.bmp"
  montage -mode concatenate "$COLOR" "$DEPTH" "${TARGET_DIRECTORY}/${SEQ_NAME}_2dz_1920x540_f${i}.bmp"
done
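Before combining, it can help to confirm that every frame in the range has both a texture and a depth bitmap. The sketch below assumes the same color-/depth- naming as the combining script; missing_frames is a hypothetical helper, not part of any tool used here.

```bash
#!/bin/bash
# Print the (zero-padded) number of every frame in the range for which the
# color or the depth bitmap is missing. Assumes files are named
# color-<FILENAME><n>.bmp and depth-<FILENAME><n>.bmp inside DIR.
missing_frames() {
    local dir="$1" filename="$2" start="$3" end="$4"
    local i
    for i in $(seq -w "$start" "$end"); do
        if [ ! -f "$dir/color-${filename}$i.bmp" ] || [ ! -f "$dir/depth-${filename}$i.bmp" ]; then
            echo "$i"
        fi
    done
}

# Usage (hypothetical names):
#   missing_frames . cam2-f 0 99
```

An empty output means every frame pair is present and the combining script can run over the whole range.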
Now we can use WMEncoder or ffmpeg to encode the frames. I personally prefer ffmpeg. We need to generate an uncompressed AVI file from the combined frames. This can be achieved using the following command:
ffmpeg -i BALLET_2dz_1920x540_f%02d.bmp -vcodec rawvideo -y ballet_2dz_1920x540.avi
We then perform video compression and place the result in a VOB container.
ffmpeg -i ballet_2dz_1920x540.avi -vcodec mpeg2video -b 24000 -bt 16000 -aspect 32:9 -s 1920x540 -y ballet_2dz_1920x540.s3d.vob
Note: The option -hq was mentioned in the documentation. However, this option seems to be no longer supported by ffmpeg. You can try using -mbd 1 instead of -hq.
Finally, we convert the VOB file to a WMV file. A fairly simple way of doing this is using ffmpeg:
ffmpeg -i filename.vob -vcodec wmv2 -acodec wmav2 -sameq -s 720x576 filename.wmv
JMVC Tutorial
TBA
Bino 3D Video Player
- Bino can read commands from one or multiple script files. Use the --read-commands option to specify script files. A script file can be a standard text file, but it can also be a named pipe, which allows other programs or scripts to submit commands to Bino as they see fit. The following is an example of executing a script stored in a text file called script.txt. The --no-gui option is not necessary, but it prevents the GUI from remembering the changed brightness setting in future sessions.
$ bino --read-commands /path/to/script.txt --no-gui
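As a rough sketch of the named-pipe variant, the helper below writes one command per line to a script file or pipe. The helper name write_bino_commands and the paths are made up, and the command names in the usage comment (pause, play) are assumptions that should be checked against the list of script commands in the Bino manual.

```bash
#!/bin/bash
# Write Bino script commands, one per line, to a script file or named pipe.
# write_bino_commands is a hypothetical helper name.
write_bino_commands() {
    local target="$1"
    shift
    printf '%s\n' "$@" > "$target"
}

# Usage sketch (paths and command names are assumptions):
#   mkfifo /tmp/bino-commands
#   bino --read-commands /tmp/bino-commands video.mp4 &
#   write_bino_commands /tmp/bino-commands "pause"
#   write_bino_commands /tmp/bino-commands "play"
```

Writing to a named pipe blocks until a reader has opened it, so Bino should be started before the first command is sent.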
- GL/gl.h is the base OpenGL header file, which gives you the OpenGL 1.1 function and token declarations, and maybe more. For anything going beyond version 1.1, you must use the OpenGL extension mechanism. Since this is a tedious task, it has been automated by the GLEW project, which offers all the dirty details packed up in an easy-to-use library. The declarations of this library are found in the header file GL/glew.h. Since OpenGL extensions don't make sense without basic OpenGL, the GLEW header implicitly includes the regular OpenGL header, so once you include GL/glew.h you no longer need to include GL/gl.h.
- With OpenGL, the default method to display stereoscopic 3D content is OpenGL quad-buffered stereo, often used with active shutter glasses. However, graphics card manufacturers tend to enable this output technique only on expensive high-end hardware. In Bino, the default output technique for stereoscopic 3D input is therefore OpenGL quad-buffered stereo if the graphics card supports it, and red/cyan anaglyph otherwise.
- Code structure:
- dispatch.h: Defines the following classes:
- command A command that can be sent to the dispatch by a controller.
- dispatch The dispatch (singleton). Contains a vector of controller objects. It also has a reference to player, media_input, video_output, audio_output, and gui objects. Important methods include:
- init() Takes an open_input_data object as an argument
- register_controller() and deregister_controller()
- receive_cmd()
- process_all_events()
- get_media_input(), get_video_output(), and get_audio_output()
- step()
- controller The controller interface. A controller can send commands to the dispatch (e.g. "pause", "seek", "set_saturation", "adjust colors", ...). The dispatch then reacts to this command and sends a notification to all controllers afterwards. The controllers may react to the notification or ignore it. The main functions of this class are:
- receive_notification() The controller receives notifications through calls to this function.
- process_events() Calls to this function request that the controller processes the events.
- send_cmd() Used by the controller to send commands to the player.
- notification A notification that can be sent to controllers by the dispatch (signals that the corresponding value has changed).
- open_input_data Contains everything that is needed to open a media input.
- command_file.h: Defines the following class:
- command_file Extends controller
- video_output.h: Defines the following class:
- video_output Extends controller. The class defines a number of private GL helper functions. Two frames are managed internally by the video_output class (a two-element array of video_frame), each with its own set of properties. The active frame is the one that is displayed; the other frame is the one being prepared for display. Each frame contains a left view and may contain a right view. This class also has a member variable referring to the rendering parameters. The parameters are obtained from the dispatch in the display_current_frame() method. The main defined functions are:
- set_suitable_size() Set a video area size suitable for the given input/output settings.
- prepare_next_frame() Prepare a new frame for display.
- activate_next_frame() Switch to the next frame (make it the current one).
- display_current_frame() Overloaded function. This function is very important as the implementation is responsible for rendering the frames using OpenGL according to the chosen output layout. Check implementation and calls to glWindowPos2i() and glDrawPixels(). At the beginning of the function, the rendering parameters are first obtained from the dispatch object. The stereo mode is set based on stereo_mode parameter passed when calling this function.
- video_output_qt.h: Defines the following classes:
- gl_thread Extends QThread
- video_output_qt_widget Extends QGLWidget
- video_container_widget Extends both QWidget and controller
- video_output_qt Extends video_output
- media_data.h: Defines the following classes:
- device_request Extends serializable
- parameters Extends serializable. Contains enumerations for stereo layout (describes how left and right view are stored) and stereo_mode (the output mode for left and right view).
- video_frame
- audio_blob
- subtitle_box Extends serializable
- media_object.h: Defines the following class:
- media_object
- player.h: Defines the following class:
- player
- gui.h: Defines the following classes:
- audio_dialog Extends QDialog and controller
- color_dialog Extends QDialog and controller
- controls_widget Extends QWidget and controller
- in_out_widget Extends QWidget and controller
- full_screen_dialog Extends QDialog and controller
- crosstalk_dialog Extends QDialog and controller
- quality_dialog Extends QDialog and controller
- zoom_dialog Extends QDialog and controller
- subtitle_dialog Extends QDialog and controller
- video_dialog Extends QDialog and controller
- sdi_output_dialog Extends QDialog and controller
- open_device_dialog Extends QDialog
- main_window Extends QMainWindow and controller
- gui Contains a reference to the main_window object
- main.h: Contains the main() function.
DASHEncoder
- One DASH segment can have multiple MP4 fragments (subsegments in DASH terminology), so one may generate segments with multiple subsegments. More information about this here.
- The input can be any file you like: if x264 is compiled with GPAC support, it can read any MP4 file; otherwise only YUV files work. To transcode from any other format, use the ffmpeg-input pipe option.
- Gop: The number of frames in a group of pictures. It is usually a multiple of the frame rate. It makes sense to synchronize it with the segment length, e.g. 24 fps with a GOP size of 48 and a segment length of 2 sec.
- scenecut: The automatic IDR frame placement of x264 based on scene cut detection. Setting it to 0 disables it, and IDR frames are then placed according to the -keyint parameter.
- passes: The number of encoding passes. 1 pass is used for VBV mode of x264 … 2 passes for CBR mode of x264. More information about x264 parameters can be found here.
- AnyOption.h
- Defines the AnyOption class. The internal state of an object from this class hold an in-memory representation of the configuration options.
- DASHEncoder.cpp
- Contains the main() function. Upon starting the program, the parse() function is called. The parse() function creates an AnyOption object and calls two methods on that object: one to parse the configuration file, and one to parse the command-line parameters. After this step, MPDGenerator and MP4BoxMultiplexer objects are created, as well as one of the concrete implementations of AbstractVideoEncoder (e.g., x264Encoder). For video encoding, the program iterates over each bitrate defined in the bitrate string in the configuration. For each bitrate, the video encoder is configured for that bitrate and the encode() method of the AbstractVideoEncoder object is invoked. After encoding, a folder is created to hold the DASH-encoded representation to be generated. The video bitstream is moved to that folder. If an AV mux combination is specified, the audio files are copied to the folder. The convertMPDs() function converts the MPDs to the actual standard.
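The GOP guideline in the notes above (24 fps, 2 sec segments, GOP size 48) is simple arithmetic that can be checked before launching an encode. The following is a small sketch with made-up helper names; it handles integer frame rates only.

```bash
#!/bin/bash
# Number of frames in one segment, i.e. the GOP size that gives exactly one
# GOP per segment. Integer frame rates and segment lengths only.
gop_size() {
    local fps="$1" segment_seconds="$2"
    echo $(( fps * segment_seconds ))
}

# Succeeds (exit status 0) when a segment holds a whole number of GOPs.
segment_is_gop_aligned() {
    local fps="$1" segment_seconds="$2" gop="$3"
    [ $(( (fps * segment_seconds) % gop )) -eq 0 ]
}

# Usage:
#   gop_size 24 2                    # prints 48
#   segment_is_gop_aligned 24 2 48   # succeeds
```

If the check fails, segment boundaries will not coincide with GOP boundaries for that combination of frame rate, segment length, and GOP size.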
libdash
- When attempting to build libdash from source, I got the following error when configuring using cmake:
CMake Error at /usr/share/cmake-2.8/Modules/FindPackageHandleStandardArgs.cmake:91 (MESSAGE):
  Could NOT find CURL (missing: CURL_LIBRARY CURL_INCLUDE_DIR)
- This was resolved by installing one of the following packages: libcurl4-gnutls-dev or libcurl4-openssl-dev (depending on the SSL implementation you like to use).
- When attempting to configure and build the sampleplayer, the following error was encountered:
/usr/local/include/libavutil/common.h: In function ‘int32_t av_clipl_int32_c(int64_t)’:
/usr/local/include/libavutil/common.h:173:47: error: ‘UINT64_C’ was not declared in this scope
- This was because cmake does not add -D__STDC_CONSTANT_MACROS to the C++ compiler flags when generating the Makefile. This can be resolved by adding the following line after the CMAKE_MODULE_PATH line in the CMakeLists.txt file of the sampleplayer (appending to CMAKE_CXX_FLAGS keeps any flags that are already set).
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -D__STDC_CONSTANT_MACROS")
- Another error, encountered in the sampleplayer::decoder::LibavDecoder::decode() method when building the sampleplayer, was:
error: ‘Sleep’ was not declared in this scope
- Sleep() is a Windows API function. One alternative on Linux is usleep(); note that Sleep() takes its argument in milliseconds, whereas usleep() takes microseconds. This requires adding a preprocessor directive to select one of the two functions based on the OS.
- To build the sampleplayer along with libdash, we need to edit the CMakeLists.txt by adding the following line:
add_subdirectory(sampleplayer)
- However, because the SDL headers included with libdash are intended for Windows, we get the following error when reaching the sampleplayer code when compiling:
/libdash/sdl/include/SDL_config_minimal.h:38:22: error: conflicting declaration ‘typedef unsigned int size_t’
/usr/lib/gcc/x86_64-linux-gnu/4.6/include/stddef.h:212:23: error: ‘size_t’ has a previous declaration as ‘typedef long unsigned int size_t’
- The SDL header files should never try to redefine size_t on Linux. We should properly install the SDL development package (libsdl1.2-dev) through apt-get/synaptic.
DASH Options
- Apple's HTTP Live Streaming (HLS)
- MPEG-DASH specification (ISO/IEC 23009-1)
Software and Data
- Joint Multiview Video Coder (JMVC)
- Bino 3D video player
- FFmpeg and x264 Encoding Guide
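- Example: Encoding the views of the MSR video sequences (use -qp 0 or -crf 0 instead of -crf 22 for lossless output). Here -r 10 is given before -i so that the images are read at 10 fps; as an output option it would drop frames relative to the 25 fps at which ffmpeg reads image sequences by default:
ffmpeg -r 10 -i color-cam2-f%03d.bmp -c:v libx264 -preset slow -crf 22 cam2.mp4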
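The lossless variant mentioned above can be wrapped in a small helper that prints the command instead of running it when DRY_RUN=1, which makes it easy to review before a long encode. This is only a sketch: encode_lossless and the output file name are made up, the 10 fps rate is assumed from the example, and -r 10 is placed before -i so that it sets the rate at which the images are read.

```bash
#!/bin/bash
# Build an ffmpeg command for a lossless x264 encode (-crf 0) of one view.
# With DRY_RUN=1 the command is printed instead of executed.
# encode_lossless is a hypothetical helper name.
encode_lossless() {
    local pattern="$1" output="$2"
    # -r before -i: read the image sequence at 10 fps (assumed frame rate).
    set -- ffmpeg -r 10 -i "$pattern" -c:v libx264 -preset slow -crf 0 "$output"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage (hypothetical output name):
#   DRY_RUN=1; encode_lossless 'color-cam2-f%03d.bmp' cam2_lossless.mp4
```

Lossless here is relative to the pixel format that ffmpeg selects for the encoder, so the output should still be compared against the source frames if exact RGB values matter.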
References
- A. Vetro and I. Sodagar, The MPEG-DASH Standard for Multimedia Streaming over the Internet, IEEE Multimedia, 2011
- T. Stockhammer, Dynamic Adaptive Streaming over HTTP - Standards and Design Principles, Proc. of ACM Multimedia Systems, 2011
- Deliverable 5.2: Media Signalling Specification for 3D Video Content Types, COAST Project
- G. Park, J. Lee, G. Lee, and K. Kim, Efficient 3D Adaptive HTTP Streaming Scheme over Internet TV, Proc. of IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB), 2012
- C. Müller and C. Timmerer, A VLC Media Player Plugin enabling Dynamic Adaptive Streaming over HTTP, Proc. of ACM Multimedia, 2011