Private:FTV

This page describes the envisioned free-viewpoint TV system, including: design decisions, useful resources, tools, etc.

Updates

[Meeting on:2/1]
[Som:1/28] MySQL is not required for DASHEncoder but it only implements 'isoff-basic-on-demand' profile while we need the 'stereoid' profile. We can start with the MPD described for 3 view video in MPEG-DASH standard[Section G.4, pp 120]. If we leave out audio for the time being, from the code base of DASHEncoder it seems the major work will be in replacing x264 relevant parts with JMVC and updating the MP4Box related parts. Does MP4Box support the files generated by JMVC? if not we need further code to implement the ISOBFF parts for JMVC generated bitstream. This is the first thing we need working : Add H.264/MVC bitsream into mp4 container and check if the mp4 file can be played back by any player. For generating the mvc bistream we should select a test sequence with at least three views (c1,c2,c3) such that (c1+c2) and (c2+c3) produce stereo views.
[Ahmed:1/28] Added detailed description of creating proper content for the 3D display.

People

Mohamed Hefeeda

Ahmed Hamza

Somsubhra Sharangi

Target 3D Display

The Dimenco auto-stereoscopic display accepts a 2D+Z image (also known as V+D for videos). The video frame containing the texture component is combined with a depth map. The two components (texture and depth) are concatenated together to after reducing the width of each by half. The video frames are then compressed and the resulting file is renamed with the extension .s3d so that the player can detect that it is in the 2D+Z format. This extension makes sure that the Dimenco 3D Player sets the display in 3D mode with the correct settings. The Dimenco 3D Player can also display 3D images in the 2D+Z format. These images should be RGB (24-bit uncompressed) BMP files that are renamed to have the extension .b3d.

The total resolution for the input video should be 1920x540. The left half (with a resolution of 960x540) contains a 2D RGB image and the the right half (also with a 960x540 resolution) contains a grey scale image representing the depth map. It should be noted that the grey scale image is still a 24-bit (i.e., three channel) picture. Therefore, the R, G, and B bytes of the depth map have the same value.

The first step in preparing the content is resizing each of the two components to a resolution of 960x540. This can be achieved using ImageMagick tools (namely, using the convert utility). An example is given below (note that adding the character ! to the size forces -resize to ignore the aspect ratio and distort the image so it always generates an image exactly the size specified).

convert Dancer_color_1920x1088.bmp -resize 960x540\! Dancer_color_960x540.bmp
convert Dancer_depth_1920x1088.bmp -resize 960x540\! Dancer_depth_960x540.bmp

Once we have the two components in the correct size, we then need to combine them together (side-by-side) into a single image of size 1920x540. This can be done using another ImageMagick utility called montage. Then -mode concatenate option tells montage to just glue the together and the -tile option can be used to specify the layout it should use. If the layout is not specified, the default is to concatenate the images horizontally (side-by-side).

montage -mode concatenate Dancer_color_960x540.bmp Dancer_depth_960x540.bmp Dancer_2dz_1920x540.bmp

Finally, the resulting image should be renamed with the extension .b3d. If the images are frames from a YUV sequence, they can be extracted using ffmpeg:

ffmpeg -s 1920x1088 -i Dancer_c_5_1920x1088.yuv -r 1 -t 00:00:30 -f image2 ~/Dancer_color_1920x1088.bmp
ffmpeg -s 1920x1088 -i Dancer_d_5_1920x1088.yuv -r 1 -t 00:00:30 -f image2 ~/Dancer_depth_1920x1088.bmp

It is mentioned that the frames should be prepared as a sequence of bitmap images. However, I don't believe that this is mandatory. It should be possible to prepare them as YUV sequences as well. But to follow the guidelines, I describe here the process using bitmap images.

The following script performs the resizing on all the frames of one component. It is assumed that the filename of the frame contains its number and the user specifies the range of frames to be resized as well as the target directory to save the resized frames. We can place the resized frames of both components in the same target directory.

#!/bin/bash

FILENAME=$1
FRAME_START=$2
FRAME_END=$3
WIDTH=$4
HEIGHT=$5
TARGET_DIRECTORY=$6

for i in `seq -w $FRAME_START $FRAME_END`;
do
        F=$(echo "${FILENAME}$i.bmp")
        convert $F -resize ${WIDTH}x${HEIGHT}\! ${TARGET_DIRECTORY}/$F
done

Next, to combine the texture frame with the corresponding depth frame, the following script can be used.

#!/bin/bash

SEQ_NAME=$1
FILENAME=$2
FRAME_START=$3
FRAME_END=$4
TARGET_DIRECTORY=$5


for i in `seq -w $FRAME_START $FRAME_END`;
do
        COLOR=$(echo "color-${FILENAME}$i.bmp")
        DEPTH=$(echo "depth-${FILENAME}$i.bmp")
        montage -mode concatenate $COLOR $DEPTH ${TARGET_DIRECTORY}/${SEQ_NAME}_2dz_1920x540_f${i}.bmp
done

Now we can use WMEncoder or ffmpeg to encode the frames. I personally prefer ffmpeg. We need to generate an uncompressed AVI file from the combined frames. This can be achieved using the following command:

ffmpeg -i BALLET_2dz_1920x540_f%02d.bmp -vcodec rawvideo -y ballet_2dz_1920x540.avi

We then perform video compression and place the result in a VOB container.

ffmpeg -i ballet_2dz_1920x540.avi -vcodec mpeg2video -b 24000 -bt 16000 -aspect 32:9 -s 1920x540 -y ballet_2dz_1920x540.s3d.vob

Note: The option -hq was mentioned in the documentation. However, this option seems to be no longer supported by ffmpeg. You can try using -mbd 1 instead of -hq.

Finally, we convert the VOB file to a WMV file. A fairly simple way of doing this is using ffmpeg:

ffmpeg -i filename.vob -vcodec wmv2 -acodec wmav2 -sameq -s 720x576 filename.wmv

DASH Options

Apple's HTTP Live Streaming (HLS)

MPEG-DASH specification (ISO/IEC 23009-1)

Software and Data

Joint Multiview Video Coder (JMVC)

GPAC

Bino 3D video player

FFmpeg and x264 Encoding Guide
- Example: Encoding MSR video sequences views:

ffmpeg -i color-cam2-f%03d.bmp -c:v libx264 -preset slow -crf 22 -r 10 -c:a copy cam2.mp4

References

A. Vetro and I. Sodagar, The MPEG-DASH Standard for Multimedia Streaming over the Internet, IEEE Multimedia, 2011
T. Stockhammer, Dynamic Adaptive Streaming over HTTP - Standards and Design Principles, Proc. of ACM Multimedia Systems, 2011
Deliverable 5.2: Media Signalling Specification for 3D Video Content Types, COAST Project
G. Park, J. Lee, G. Lee, and K. Kim, Efficient 3D Adaptive HTTP Streaming Scheme over Internet TV, Proc. of IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB), 2012
C. Müller and C. Timmerer, A VLC Media Player Plugin enabling Dynamic Adaptive Streaming over HTTP, Proc. of ACM Multimedia, 2011