Private:progress-harvey

From NMSL
Revision as of 15:53, 11 April 2011 by Rch3 (talk | contribs)

Spring 2011 (GF)

  • Courses: None

working on: Video Copy Detection using Optical Flow

April 9 - April 12

  • Random Concerns about using motion vectors for the signature
    • I used ffmpeg to encode a 3 minute video clip from a larger clip using the code in (1) below
      • I got some really large motion vectors: 565 motion vectors were greater than 100 in magnitude. The average magnitude of all motion vectors went from 3.97 in the original clip to 6.54 in the transcoded clip.
      • This is a difference of about 50%.
      • The average motion direction was 57.09 degrees in the original clip and 85.06 degrees in the transcoded clip.
      • This is a difference of about 40%.
    • There is a me_range setting which allows one to specify the search radius. I used this in (2) below
      • All motion vectors were of magnitude 5 or smaller.
      • I got an average magnitude of 3.65 - closer to the original.
      • The average direction was 80.92 degrees - closer to that in (1).

(1) ffmpeg -i LargeClip.mpg -g 12 -bf 2 -vframes 4500 -b 1344k out.mpg
(2) ffmpeg -i LargeClip.mpg -g 12 -bf 2 -vframes 4500 -b 1344k -me_range 5 out.mpg

Clearly the magnitudes of the motion vectors are arbitrary. I think the direction of the motion vectors must also be a function of the encoding process. There are a number of encoding algorithms which use a search pattern to find a local minimum difference between MBs - patterns such as Diamond, Cross Diamond, Kite Cross Diamond, Exhaustive, etc. These patterns may find local minima in different directions. I find it unusual that the average mot
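As a sanity check on these averages, the per-clip statistics can be sketched like this (a toy helper, not the actual extraction code; the vector list is a placeholder, and the angle average is the same naive arithmetic mean used above, not a circular mean):

```python
import math

def mv_stats(vectors):
    """Average magnitude and direction (degrees) over a list of (vx, vy)
    motion vectors, plus a count of implausibly large vectors (> 100)."""
    mags = [math.hypot(vx, vy) for vx, vy in vectors]
    dirs = [math.degrees(math.atan2(vy, vx)) % 360 for vx, vy in vectors]
    avg_mag = sum(mags) / len(mags)
    avg_dir = sum(dirs) / len(dirs)   # naive mean of angles
    n_large = sum(1 for m in mags if m > 100)
    return avg_mag, avg_dir, n_large

# Example: two small vectors and one implausibly large one
avg_mag, avg_dir, n_large = mv_stats([(3, 4), (0, 2), (120, 0)])
```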

March 30 - April 8

To deal with the large spike in the data around the (0,0) vector, I have been working on two approaches:

  • Looking at motion vectors which are away from the (0,0) vector
    • I looked only at motion vectors more than 20 pixels in magnitude away from the center. Results are significantly worse.
    • I will run a script to go through all magnitudes away from the center and see if there is an optimum
  • Increasing the granularity of the direction bins
    • I went from 8 to 12 to 24 bins for the directions. There was little change in the results
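The direction binning can be sketched as follows (a hypothetical helper, not the actual code; the experiments above used 8, 12 and 24 bins, and zero vectors are skipped here since they have no direction):

```python
import math

def direction_histogram(vectors, n_bins=8):
    """Count motion vectors into n_bins equal angular bins by direction."""
    hist = [0] * n_bins
    for vx, vy in vectors:
        if vx == 0 and vy == 0:
            continue  # (0,0) vectors have no direction
        angle = math.degrees(math.atan2(vy, vx)) % 360
        hist[int(angle // (360 / n_bins)) % n_bins] += 1
    return hist

# Four unit vectors pointing right, up, left, down, plus one zero vector
h8 = direction_histogram([(1, 0), (0, 1), (-1, 0), (0, -1), (0, 0)], n_bins=8)
```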

Neither approach is providing me with good results. The results I expect are that the distances between the same video under different transformations will be reasonably consistent AND the distances between different video clips will be larger than all of these.

Another motion vector approach is used in this paper. It calculates the median magnitude of motion vectors for each direction bin. It is used for seeking. It does not claim to be useful for transformed videos, but it may be a good approach. We would need to implement a sequence matching algorithm to evaluate the results.
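A sketch of that median-magnitude-per-direction-bin signature (the bin layout and angle convention here are my assumptions, not taken from the paper):

```python
import math
from statistics import median

def median_magnitude_signature(vectors, n_bins=8):
    """Per-direction-bin median of motion-vector magnitudes."""
    bins = [[] for _ in range(n_bins)]
    for vx, vy in vectors:
        if vx == 0 and vy == 0:
            continue  # zero vectors have no direction
        angle = math.degrees(math.atan2(vy, vx)) % 360
        bins[int(angle // (360 / n_bins)) % n_bins].append(math.hypot(vx, vy))
    # Empty bins contribute 0.0 to the signature
    return [median(b) if b else 0.0 for b in bins]

# Three rightward vectors (magnitudes 1, 3, 5) and one upward (magnitude 2)
sig = median_magnitude_signature([(1, 0), (3, 0), (5, 0), (0, 2)])
```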

March 15 - March 29

I have analyzed the results from the transformations using both cosine and euclidean distance metrics. I then got motion histograms for the 399 videos in the TRECVID 2009 database. I used ffmpeg to get a 3 minute clip and then created the motion histogram for the clips. I got the pairwise distance between all combinations of pairs and compared these distances to the ones obtained for the video clip under transformations. I discovered that many of the distances for pairs of different clips were smaller than the distance for the same clip under mild transformations. This presents a problem for using motion vectors as a signature.
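The pairwise comparison can be sketched as follows (toy two-bin histograms for illustration; the real ones come from the 3 minute clips):

```python
import math
from itertools import combinations

def cosine_distance(a, b):
    """1 - cosine similarity between two histograms."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pairwise(hists, metric):
    """Distances between all pairs of histograms, keyed by index pair."""
    return {(i, j): metric(hists[i], hists[j])
            for i, j in combinations(range(len(hists)), 2)}

hists = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
d = pairwise(hists, cosine_distance)
```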

Most of the motion is small. On average, 40% of the motion vectors are (0,0) and 23% are of magnitude 1. This leads to 63% of the motion being placed in the first few bins. I am experimenting with looking at motion outside this region as the fingerprint.
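A quick sketch of how those fractions are measured, with the complementary filter for "motion outside this region" (toy vectors for illustration):

```python
import math

def small_motion_fraction(vectors, max_mag=1.0):
    """Fraction of vectors with magnitude <= max_mag, i.e. the (0,0) and
    magnitude-1 vectors that dominate the first few bins."""
    small = sum(1 for vx, vy in vectors if math.hypot(vx, vy) <= max_mag)
    return small / len(vectors)

def outside_region(vectors, min_mag=1.0):
    """Keep only the vectors beyond the small-motion region."""
    return [(vx, vy) for vx, vy in vectors if math.hypot(vx, vy) > min_mag]

vecs = [(0, 0), (0, 0), (1, 0), (3, 4), (0, 2)]
frac = small_motion_fraction(vecs)   # 3 of the 5 are (0,0) or magnitude 1
kept = outside_region(vecs)
```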


March 8 - March 14

I have made copies of a clip with the following transformations applied:

  • Resizing to 90%, 80%, 70%, 60%, 50% of the original.
  • Cropping of 10, 20, 30, 40, 50 pixels from the border around the clip
  • Addition of a logo to the clip
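Generating the transformed copies can be sketched with standard ffmpeg scale/crop/overlay filters (the file names are placeholders, and interpreting "pixels from border" as pixels removed per side is my assumption):

```python
def transform_commands(src="clip.mpg"):
    """Build ffmpeg command lines for the transformations listed above."""
    cmds = []
    # Resize to 90%..50% of the original dimensions
    for pct in (90, 80, 70, 60, 50):
        f = pct / 100
        cmds.append(f"ffmpeg -i {src} -vf scale=iw*{f}:ih*{f} resized_{pct}.mpg")
    # Crop px pixels from each side of the border
    for px in (10, 20, 30, 40, 50):
        cmds.append(f"ffmpeg -i {src} -vf crop=iw-{2*px}:ih-{2*px} cropped_{px}.mpg")
    # Overlay a logo in the top-left corner
    cmds.append(f"ffmpeg -i {src} -i logo.png -filter_complex overlay=10:10 logo.mpg")
    return cmds

cmds = transform_commands()
```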

I am now trying to analyse the histograms to see how well the motion signature matches between clips. I expect to have results by the end of Tuesday.

March 2 - March 7

I worked on writing up the algorithm - I had some trouble envisioning how to combine the two independent elements (motion vectors and SURF features) into one signature. As per discussion with Dr. Hefeeda, I am looking just at the motion vectors for now.

Several papers use motion vectors.

  • Hampapur et al. describe a signature based on the motion of the video across frames. Each frame is partitioned into N = Nx x Ny blocks. They examine a block in frame t centered at (xt, yt) and look in frame t+1 within a specified search region for a block which has the minimum Sum of Average Pixel Differences (SAPD). The difference between the patch location in frame t and the best match in frame t+1 produces a motion vector.
  • Tasdemir et al. is the only paper I found which used the motion vectors directly. This paper claims motion vectors are a good parameter, but that "a complete CBCD system should be capable of fusing the information coming from motion, color and SIFT feature sets in an intelligent manner to reach a decision"
  • No paper that I saw attempted to aggregate motion vectors across frames for the signature
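A toy sketch of the exhaustive block-matching scheme Hampapur et al. describe (using a plain sum of absolute differences as the block cost; frames as 2-D lists of pixel values, with block size and search range made up for illustration):

```python
def sad(block_a, block_b):
    """Sum of absolute pixel differences between two equal-sized blocks."""
    return sum(abs(a - b)
               for ra, rb in zip(block_a, block_b)
               for a, b in zip(ra, rb))

def best_motion_vector(frame_t, frame_t1, x, y, bsize=4, search=2):
    """Find the (dx, dy) in the search window around (x, y) minimising the
    block difference between frame t and frame t+1."""
    block = [row[x:x + bsize] for row in frame_t[y:y + bsize]]
    best, best_cost = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            nx, ny = x + dx, y + dy
            if nx < 0 or ny < 0 or ny + bsize > len(frame_t1) \
                    or nx + bsize > len(frame_t1[0]):
                continue  # candidate block falls outside the frame
            cand = [row[nx:nx + bsize] for row in frame_t1[ny:ny + bsize]]
            cost = sad(block, cand)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best

# An 8x8 frame with a bright 4x4 patch that shifts one pixel right
W = H = 8
frame_t  = [[9 if 2 <= x < 6 and 2 <= y < 6 else 0 for x in range(W)] for y in range(H)]
frame_t1 = [[9 if 3 <= x < 7 and 2 <= y < 6 else 0 for x in range(W)] for y in range(H)]
mv = best_motion_vector(frame_t, frame_t1, 2, 2)
```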

I am working on testing the motion vectors under a few transformations to see how the histograms compare.


Feb 23 - March 1

  • I have created several test cases to examine the motion vectors I extract and validate them.
  • Experimenting with the extracted motion vectors gave results which did not seem correct. I have been digging into the source code to try and figure out what is going on. I have been able to fix one major problem relating to whether the prediction was based on the previous frame or the next frame. I am still looking into some others.

Feb 16 - Feb 22

  • Using ffmpeg APIs I am now able to do the following:
    • Extract I-frames and save in jpg format for later analysis
    • Extract Motion vectors for B and P frames which I will use to build my histogram

Note: There does not seem to be a way to do this with ffmpeg in the compressed domain. I have to make a call to avcodec_decode_frame() in order to get the motion information and the frame type. There is a workaround for the I-frames: I can use a utility called ffprobe (use the R92 branch) with the -show_frames option, from which a list of I-frames can be constructed, but since I need to decode the I-frames to decode subsequent B/P frames, I will just loop through and decode all frames. Theoretically, this can be run in the compressed domain if one knew enough about the encoding protocol and had lots of time to write their own parser. I think that my approach is enough to show proof of concept for now.
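The ffprobe workaround can be sketched like this (assuming the key=value [FRAME] block format that ffprobe -show_frames emits; the parsing is separated from the subprocess call so it can be tested without a video file):

```python
import subprocess

def iframe_indices_from_output(text):
    """Parse ffprobe -show_frames output and return the indices of frames
    whose pict_type is I. No decoding is involved here, only parsing."""
    indices, frame_no = [], -1
    for line in text.splitlines():
        line = line.strip()
        if line == "[FRAME]":
            frame_no += 1
        elif line.startswith("pict_type=") and line.split("=", 1)[1] == "I":
            indices.append(frame_no)
    return indices

def iframe_indices(path):
    """Run ffprobe on a file and extract the I-frame list."""
    out = subprocess.run(["ffprobe", "-show_frames", path],
                         capture_output=True, text=True).stdout
    return iframe_indices_from_output(out)

# Offline example using a hand-written fragment of ffprobe output
sample = ("[FRAME]\npict_type=I\n[/FRAME]\n"
          "[FRAME]\npict_type=P\n[/FRAME]\n"
          "[FRAME]\npict_type=I\n[/FRAME]\n")
idx = iframe_indices_from_output(sample)
```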

Feb 9 - Feb 15

  • Worked on motion vector extraction using ffmpeg APIs

Feb 2 - Feb 8

  • Downloaded and compiled ffprobe
  • Started coding using ffmpeg libraries to extract I-frames and motion vectors. I have almost got the I-Frame portion worked out, and I will look into the motion vectors next week

Jan 19 - Feb 1

  • Clustered SURF pts with both k-means and x-means
  • Looking into how to extract I-frames and motion information from video sequences
  • Downloaded and compiled source files for ffmpeg
    • I think I can write something using this which will parse the mpegs
    • I am having problems with permissions when I try to install - working with Ahmed and Jason
  • Found a Matlab m-file for extracting motion vectors - I am not sure this will be all that useful: Matlab m files
  • Also found a reworking of mplayer: modified mplayer (modified for flow)
    • Janez Pers: The modifications are relatively minor, but ugly. The code that draws motion vectors is changed to dump the arrow directions and length into the text file. I cannot offer any support for compiling Mplayer though. The binary-only (windows) version is available here, it has added example video (it is better to start with this): windows binary. If you like the code and will use it in your scientific work, you can check my paper, which uses the same code for the second batch of experiments: Pers's paper.
    • Short instructions:
      • if you have MPEG4 already (I used the mpeg4 encoding as a fast way to get vectors as well), then skip the first step: mencoder original.avi -ovc lavc -lavcopts vcodec=mpeg4 -o mpeg4encoded.avi
      • now extract the motion vectors, without displaying the video (you can display the video as well, if you like, it was just more convenient for me) mplayer mpeg4encoded.avi -benchmark -lavdopts vismv=1
      • Now, the file opticalflow.dat will appear. Do not forget option vismv=1, the extraction is part of the visualisation.
      • The file opticalflow.dat has the following format: framenum,x,y,vx,vy (vx vy being the vectors, x y being the position of the block).
    • Be aware that the data for the I frames will be missing (no flow there). And, in my experience, lower bitrates give better flow than high ones - with high ones the encoder does not need to bother with the motion vectors, since it has enough bandwidth already...
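A small parser for that opticalflow.dat format (assuming all five fields are integers, which is my reading of the description above):

```python
def parse_opticalflow(lines):
    """Parse framenum,x,y,vx,vy records into a dict mapping frame number
    to a list of (x, y, vx, vy) tuples. I-frames simply never appear."""
    frames = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        framenum, x, y, vx, vy = (int(v) for v in line.split(","))
        frames.setdefault(framenum, []).append((x, y, vx, vy))
    return frames

# Example: two vectors in frame 1, one in frame 2
data = parse_opticalflow(["1,16,16,2,-1", "1,32,16,0,0", "2,16,16,1,1"])
```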

Jan 19-25

  • TrecVid 2008 final transformation document with examples: Transformation Document
  • TrecVid 2008 explanation of how transformations are generated: Transformation Explanation
  • 2010 TrecVid Requirements
  • Downloaded the TrecVid 2007 and 2009 databases with test cases, and test cases for 2010
  • Downloaded and investigating x-means experimental software (Licensed to me for research purposes only)
  • Experimenting with SURF interest points and x-means clustering

Jan 12-18

  • Finished survey of Video Copy Detection methods Survey
  • Prepared presentation on Optical Flow My Presentation

Jan 11

  • Survey of State of the Art Techniques

Fall 2010 (RA)

  • Courses:
    • CMPT-820: Multimedia Systems
  • worked on:
    • Video Copy Detection

Summer 2010 (TA)

  • Courses:
    • None
  • worked on:
    • Energy-Efficient Gaming on Mobile Devices using Dead Reckoning-based Power Management
  • submitted
    • NetGames 2010: Energy-Efficient Gaming on Mobile Devices using Dead Reckoning-based Power Management (accepted)

Spring 2010 (RA)

  • Courses:
    • CMPT-822: Special topics in Database Systems
    • CMPT 884: Computational Vision
  • worked on:
    • Energy-Efficient Gaming on Mobile Devices
  • submitted
    • Nosdav 2010: Energy-Efficient Gaming on Mobile Devices (not accepted)

Fall 2009 (TA)

  • Courses:
    • CMPT-705: Algorithm
    • CMPT-771: Internet Architecture and Protocols

To Do List

  • Full write up of the proposed algorithm. Start with a centralized approach and then show how it can be distributed.
    • Claim: This algorithm is novel, better, more efficient, and can be easily distributed
    • State how it can be implemented in a distributed manner. Naively - 1 node per video clip - is there a better way?
    • State evaluation criteria. Prioritize which transformations are the most important and how our algorithm is expected to deal with each.


Consider implementing a non-evenly distributed histogram for magnitudes. Consider the behaviour of the tail of the magnitude data. Also look into the search window size in ffmpeg and set the maximum magnitude to this value.
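A sketch of what such a non-uniform magnitude histogram might look like (the bin edges here are a guess at a reasonable layout, not derived from the data; the final bin catches everything at or above the cap, which would be set to the encoder's search range):

```python
def magnitude_histogram(mags, edges=(0, 1, 2, 4, 8, 16)):
    """Histogram over non-uniform bins: bin i holds values in
    [edges[i], edges[i+1]); the last bin is an overflow bin for
    everything >= edges[-1]."""
    hist = [0] * len(edges)
    for m in mags:
        placed = False
        for i in range(len(edges) - 1):
            if edges[i] <= m < edges[i + 1]:
                hist[i] += 1
                placed = True
                break
        if not placed:
            hist[-1] += 1  # overflow bin
    return hist

# Two near-zero magnitudes, one of 1, one of 3, one beyond the cap
h = magnitude_histogram([0, 0.5, 1, 3, 20])
```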

Watch for the number of motion vectors received from 8x8, 16x8 and 8x16 blocks and decide if it is worth considering these.

Want to get something together for a conference in mid-April.