Summer 2011 (RA)

Courses: None

working on: Video Copy Detection using SURF count signature and temporal ordinal measures Updated Thesis Report

Aug 16-

Working on thesis:

Edited Chapter 6: evaluation
Running experiments involving combinations of transformations

Aug 5- Aug 15

Working on editing my thesis report

Jul 12 - Aug 7

Made the following video transformations - these are mostly the extreme limits of TrecVid evaluation criteria.

Gaussian blur of 3 pixels in radius
Gamma shift to .5
Gamma shift to 1.6
Image rotation by 10 degrees
Image shift by 50 pixels horizontally and 40 pixels vertically
Image crop of 20 pixels from all sides
Addition of random noise to 20% of the pixels
Addition of text
Image height scaled by 75% creating both a horizontal stretch and a letterbox effect
Camera angle adjusted by 20° horizontally and 20° vertically
Image resize to 50% of its original size
Image resize to 200% of its original size
Contrast decreased 30%
Contrast increased 30%
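
A few of the transformations above can be sketched with NumPy. This is only an illustration: the log does not say which tool produced the transformed videos, and the function names, the gamma convention (output = input ** gamma on values scaled to 0..1), and the noise model (replacing pixels with uniform random values) are all assumptions.

```python
import numpy as np

def gamma_shift(frame, gamma):
    """Apply a gamma shift to a uint8 frame (assumed convention: out = in ** gamma)."""
    normalized = frame.astype(np.float64) / 255.0
    return np.round(255.0 * normalized ** gamma).astype(np.uint8)

def crop_border(frame, pixels):
    """Remove `pixels` rows and columns from every side of the frame."""
    if pixels == 0:
        return frame
    return frame[pixels:-pixels, pixels:-pixels]

def add_random_noise(frame, fraction, rng):
    """Replace roughly `fraction` of the pixels with uniform random values."""
    noisy = frame.copy()
    mask = rng.random(frame.shape[:2]) < fraction
    noisy[mask] = rng.integers(0, 256, size=noisy[mask].shape, dtype=np.uint8)
    return noisy
```

For example, `crop_border(frame, 20)` corresponds to the 20 pixel crop and `add_random_noise(frame, 0.2, np.random.default_rng())` to the 20% noise case.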

The precision and recall were calculated. Implemented a sampling algorithm to greatly speed up the search. 105 queries were searched against the database of 399 videos in 18 minutes sampling at 1 frame in every 10.
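
As a rough illustration of the two ideas in this entry, the sketch below computes precision and recall from sets of detections and keeps 1 frame in every 10. The representation of a detection as a (query, video) pair and the function names are assumptions, not the code used for the thesis.

```python
def precision_recall(detected, ground_truth):
    """Return (precision, recall) for collections of (query_id, video_id) pairs."""
    detected, ground_truth = set(detected), set(ground_truth)
    true_positives = len(detected & ground_truth)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall

def sample_frames(frames, step=10):
    """Keep 1 frame in every `step` frames."""
    return frames[::step]
```

Sampling reduces the number of signatures to compare by roughly a factor of `step`, which is where the speed-up comes from.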

The effects of different sampling rates were examined and the results added to the report.

The effect of different grid sizes was examined.

Looked at 2x2, 3x3, and 4x4 grids. Determined that the 2x2 grid was the best configuration, as it was faster with virtually no loss of quality.

Rewrote almost the entire report.

Jul 5 - Jul 11

I have been creating the database - It has taken 4 days to analyze 107 Gb of data. While this has been going on, I have been creating some test queries on 7 randomly selected videos:

Gamma Correction
- 10 is default (ranges from 1 to 28)
  - Set to 1
  - Set to 5
  - Set to 20
  - Set to 28
Blurring (5 pixels)
Resizing
- 50%
- 75%
- 150%
- 200%
Inserted Logo
Noise (0-100%)
- 10%
- 25%
- 50%

June 23 - Jul 4

I have implemented a function to remove letterbox and pillarbox effects which are commonly found. The SURF features are extracted from a subimage which does not have the letterbox and pillarbox effects.
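
A minimal sketch of one way to find the subimage, assuming the letterbox and pillarbox bars are near-black and the frame is available as a grayscale NumPy array. The threshold value and function names are illustrative and not taken from the actual implementation.

```python
import numpy as np

def content_bounds(gray, threshold=16):
    """Return (top, bottom, left, right) of the non-dark area; bottom/right are exclusive."""
    rows = np.where(gray.mean(axis=1) > threshold)[0]
    cols = np.where(gray.mean(axis=0) > threshold)[0]
    if rows.size == 0 or cols.size == 0:
        # Entirely dark frame: keep the whole frame rather than return nothing.
        return 0, gray.shape[0], 0, gray.shape[1]
    return int(rows[0]), int(rows[-1]) + 1, int(cols[0]), int(cols[-1]) + 1

def remove_borders(gray, threshold=16):
    """Crop away dark bars so features are extracted from the content only."""
    top, bottom, left, right = content_bounds(gray, threshold)
    return gray[top:bottom, left:right]
```

In practice the bounds would probably be estimated over several frames, since a single dark scene can look like a border.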
I found a bug in my distance calculations which was causing me a lot of problems. I finally realized what was going on and was able to find and fix it.
I am still struggling with the mask for inserted text and logos. It is not quite working correctly, but I am very close.
My running time was really poor - it was taking me forever to calculate the distances. This was due to the large amount of sorting needed. I was naively sorting from scratch every time I shifted my sliding window. I spent some time on figuring out a smart algorithm and implementing it. Running time is now good.
I have found and downloaded groundtruth data for the TrecVid 2009 dataset. I will post a link to it later.
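
The log does not describe the algorithm that replaced the naive re-sort. One common approach, sketched here as an assumption, keeps the window contents in a sorted list and updates it with binary search when the window slides by one position.

```python
from bisect import bisect_left, insort

class SortedWindow:
    """Keeps the values of a sliding window in sorted order."""

    def __init__(self, values):
        self.sorted_values = sorted(values)

    def slide(self, outgoing, incoming):
        """Drop the value leaving the window and insert the one entering it."""
        index = bisect_left(self.sorted_values, outgoing)
        del self.sorted_values[index]  # `outgoing` must currently be in the window
        insort(self.sorted_values, incoming)
```

Each slide costs a binary search plus a list shift instead of a full sort of the window.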

June 14 - Jun 22

I have implemented the Temporal-Spatial SURF count based video copy detection scheme and it seems to be working well. I was able to detect and locate an unaltered copy of the original both by using every frame as well as by subsampling the frames. I subsampled every 10th frame. There has been no experimentation as to how often to sample, so my choice is completely arbitrary at this point.

I have also implemented the ability to detect static content embedded into the original clip. This is implemented by examining the variance of each pixel over time and thresholding it. If a pixel does not vary enough, it is considered static. I was able to isolate some static text from the original, and in regions where the number of static pixels exceeded 5% of the total number of pixels in the region, I excluded that region from the distance calculation. I tested this method with a video clip that had embedded text and was able to locate the copy, but the position was off by 100 frames. Without isolating the static content, I was not able to locate the copy.
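
The static-content test described above can be sketched as follows, assuming the frames are stacked into a (T, H, W) grayscale array. The variance threshold value, the grid size, and the function names are placeholders; only the 5% rule comes from the entry.

```python
import numpy as np

def static_pixel_mask(frames, variance_threshold):
    """True where a pixel's variance over time falls below the threshold."""
    return frames.astype(np.float64).var(axis=0) < variance_threshold

def usable_regions(static_mask, grid=(2, 2), max_static_fraction=0.05):
    """Boolean grid: False for regions with more than 5% static pixels."""
    rows, cols = grid
    height, width = static_mask.shape
    usable = np.ones(grid, dtype=bool)
    for r in range(rows):
        for c in range(cols):
            region = static_mask[r * height // rows:(r + 1) * height // rows,
                                 c * width // cols:(c + 1) * width // cols]
            usable[r, c] = region.mean() <= max_static_fraction
    return usable
```

Regions marked `False` would then be left out of the distance calculation.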

I tested a video clip with a Gaussian blur of 3 pixels and was able to locate it.

I have been trying to integrate the static mask with the original method proposed by Roth et al., but it is not working. They calculate their distance based on a normalized L1-distance. This does not seem to be robust to shutting off regions, and I am not sure how to fix this. It would be good to have this working to make benchmark comparisons.

June 6 - Jun 13

I have done the following:

-I have changed the video frame extraction process to be done programmatically rather than using ffmpeg manually. This allows frame analysis to be done faster and without huge amounts of storage space

-I have streamlined the Video Copy Detection system developed for the multimedia project.

-I have implemented the proposed temporal ordinal SURF based signature. I am currently testing it. I expect to have a very basic implementation working within the next day or so.

May 29 - Jun 6

I went through papers again with the idea of improving the algorithm implemented as part of CMPT 820. I have suggested building on the work by counting the number of SURF features in each region and using an ordinal method instead. The temporal-ordinal method of Chen and Stentiford will be adopted for its superior performance. It is hoped that the adopted signature will capitalize on the temporal information within a video sequence.
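
A rough sketch of the proposed signature, under several assumptions: keypoints are given as (x, y) positions (the SURF extraction itself is omitted), each region's counts are ranked along the time axis, and signatures are compared by mean absolute rank difference. It illustrates the idea and is not necessarily the exact formulation of Chen and Stentiford or of the thesis.

```python
import numpy as np

def region_counts(keypoints, frame_size, grid=(2, 2)):
    """Count keypoints (x, y) per grid cell; frame_size is (width, height)."""
    width, height = frame_size
    rows, cols = grid
    counts = np.zeros(grid, dtype=int)
    for x, y in keypoints:
        r = min(int(y * rows / height), rows - 1)
        c = min(int(x * cols / width), cols - 1)
        counts[r, c] += 1
    return counts

def temporal_ordinal_signature(counts_per_frame):
    """Rank each region's counts along time; input shape is (T, rows, cols)."""
    counts = np.asarray(counts_per_frame)
    flat = counts.reshape(len(counts), -1)
    # argsort of argsort gives ranks; the stable sort breaks ties by frame order.
    order = np.argsort(flat, axis=0, kind="stable")
    return np.argsort(order, axis=0, kind="stable")

def signature_distance(sig_a, sig_b):
    """Mean absolute rank difference over all frames and regions."""
    return float(np.abs(sig_a - sig_b).mean())
```

Because only the ordering of the counts matters, a transformation that changes all counts monotonically leaves the signature unchanged.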

Attended the NOSSDAV Conference. It was very interesting. Thank you for the opportunity!

I am currently working on implementing the proposed solution. I expect this will take a couple of weeks.

May 10 - May 28

Rewrote and clarified much of my work relating to using MPEG motion vectors for video copy detection. I am looking into implementing some ideas relating to optical flow.

April 27 - May 9

I have been reviewing more papers and trying to come up with a new idea for copy detection. I propose to find a weighted average location of the brightest pixels in a frame. The weighting gives more weight to the center of a frame and less weight to areas more likely to have edit effects. The location will need to be normalized based on the frame size. For each frame we will store the normalized location. The signature is quite compact as for each frame we need only to store 2 integers (or 2 floating points?)

Searching will be done with a sequence matching algorithm.

The brightest pixels I assume will be those with the highest luminance value in the YUV color space. I am looking into how easy it is to get luminance information from video content.
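
The proposed signature could look roughly like the sketch below. Everything specific here is an assumption made for illustration: the BT.601 luma weights for converting RGB to luminance, the choice of the top 5% of pixels as "brightest", and a Gaussian weight that favours the frame center.

```python
import numpy as np

def luminance(rgb):
    """Approximate luma (Y) from an (H, W, 3) RGB frame using BT.601 weights."""
    return rgb[..., 0] * 0.299 + rgb[..., 1] * 0.587 + rgb[..., 2] * 0.114

def bright_centroid(rgb, top_fraction=0.05, sigma=0.5):
    """Center-weighted mean location of the brightest pixels, normalized to [0, 1]."""
    luma = luminance(rgb.astype(np.float64))
    height, width = luma.shape
    ys, xs = np.mgrid[0:height, 0:width]
    nx = (xs + 0.5) / width   # normalized so the result is independent of frame size
    ny = (ys + 0.5) / height
    bright = luma >= np.quantile(luma, 1.0 - top_fraction)
    center_weight = np.exp(-((nx - 0.5) ** 2 + (ny - 0.5) ** 2) / (2 * sigma ** 2))
    weights = center_weight * bright
    total = weights.sum()
    return float((weights * nx).sum() / total), float((weights * ny).sum() / total)
```

The result is two floating point numbers per frame, which keeps the signature compact.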

Spring 2011 (GF)

Courses: None

working on: Video Copy Detection using Optical Flow

April 13 - April 27

I added up all the motion vectors in one frame to get a global motion vector (x,y), where x and y are the sums of the x and y components of all the macroblocks in the frame. The direction is calculated as arctan(y/x). The motion vectors of the original video clip are not tracked well by the motion vectors in the transformed clips. The transformations examined were rotation and blur. Blurring was expected to perform really well because virtually nothing in the video is changed motionwise. I calculated the Euclidean distance between clips of 4500 frames. The distances between the transformed clips and the original clip fell in a narrow range of 8187.6 - 8838.4. I am not sure how reliable this distance metric is due to the occurrence of I-frames. When an I-frame is encountered, there are two possible approaches. One can assume the difference between an I-frame and any other frame is zero. The approach I took was to make the direction of an I-frame the average of the 2 adjacent frames. The distance metric between 2 completely different clips ranged from 8287.6 - 11552. This is encouraging in that the range is much larger, but there are many results which fall within the range of the transformed clips. Thus far the motion vectors do not seem discriminating enough. Perhaps this is because motion vectors are of too fine a granularity.
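
The computation described in this entry can be sketched as below. Two choices are mine rather than the log's: `atan2` is used instead of arctan(y/x) so that the quadrant is preserved and x = 0 does not divide by zero, and I-frames are represented as `None` entries. Averaging angles directly, as done here, ignores wrap-around at ±180 degrees.

```python
import math

def global_direction(motion_vectors):
    """Sum a frame's (x, y) motion vectors and return the direction in degrees."""
    sum_x = sum(x for x, _ in motion_vectors)
    sum_y = sum(y for _, y in motion_vectors)
    return math.degrees(math.atan2(sum_y, sum_x))

def fill_i_frames(directions):
    """Replace None entries (I-frames) with the average of the adjacent frames."""
    filled = list(directions)
    for i, value in enumerate(directions):
        if value is None:
            neighbours = [directions[j] for j in (i - 1, i + 1)
                          if 0 <= j < len(directions) and directions[j] is not None]
            filled[i] = sum(neighbours) / len(neighbours) if neighbours else 0.0
    return filled

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length direction sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
```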

April 9 - April 12

Random Concerns about using motion vectors for the signature
- I used ffmpeg to encode a 3 minute video clip from a larger clip using the code in (1) below
  - I got some really large motion vectors. 565 motion vectors were greater than 100 in magnitude. The average Magnitude of all motion Vectors went from 3.97 in the original clip to 6.54 in the transcoded clip.
    - This is about a 50% difference.
  - The average motion direction was 138.90 degrees in the original clip and 145.46 degrees in the transcoded clip.
    - This is a difference of about 4.61%.
- There is a me_range setting which allows one to specify the search radius. I used this in (2) below
  - All motion vectors were of magnitude 5 or smaller.
  - I got an average magnitude of 2.25
    - This is a 55% difference
  - The average direction was 144.97 degrees
    - This is a difference of 4.09%
    - Closer to that in (1), and clearly more robust than magnitude

(1) ffmpeg -i LargeClip.mpg -g 12 -bf 2 -vframes 4500 -b 1344k out.mpg
(2) ffmpeg -i LargeClip.mpg -g 12 -bf 2 -vframes 4500 -b 1344k -me_range 5 out.mpg
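
The log does not state how the percentage differences were computed. A symmetric percent difference (absolute difference divided by the mean of the two values) reproduces the 4.61% figure and is close to the roughly 50% and 55% magnitude figures, so it is used here as an assumption; other figures may have been computed or rounded differently.

```python
def percent_difference(a, b):
    """Symmetric percent difference: |a - b| relative to the mean of a and b."""
    return abs(a - b) / ((a + b) / 2) * 100

print(round(percent_difference(3.97, 6.54), 1))      # magnitude, original vs (1): 48.9
print(round(percent_difference(138.90, 145.46), 2))  # direction, original vs (1): 4.61
print(round(percent_difference(3.97, 2.25), 1))      # magnitude, original vs (2): 55.3
```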

Clearly the magnitudes of the motion vectors are arbitrary. They depend largely on the settings in the encoder, which cannot be known when doing copy detection. The direction of the motion vectors seems more robust, but I think it must also be a function of the encoding process. There are a number of encoding algorithms which use a search pattern to find a local minimum difference in MBs, such as Diamond, Cross Diamond, Kite Cross Diamond, Exhaustive, etc. These patterns may find local minima in different directions. I am investigating this now.

March 30 - April 8

To deal with the large spike in the data around the (0,0) vector, I have been working on two approaches:

Looking at motion vectors which are away from the (0,0) vector
- I went a magnitude of 20 pixels away from the center. Results are significantly worse.
- I will run a script to go through all magnitudes away from the center and see if there is an optimum
Increasing the granularity of the direction bins
- I went from 8 to 12 to 24 bins for the directions. There was little change in the results
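
Both approaches can be expressed as parameters of a single direction histogram, sketched below. The function name, the normalization to fractions, and the use of a minimum-magnitude cutoff to exclude vectors near (0,0) are assumptions about how the experiment might be set up.

```python
import math

def direction_histogram(vectors, bins=8, min_magnitude=0.0):
    """Normalized histogram of motion directions.

    Zero vectors are always skipped (they have no direction); vectors shorter
    than `min_magnitude` are skipped as well.
    """
    counts = [0] * bins
    for x, y in vectors:
        magnitude = math.hypot(x, y)
        if magnitude == 0 or magnitude < min_magnitude:
            continue
        angle = math.atan2(y, x) % (2 * math.pi)
        counts[int(angle / (2 * math.pi) * bins) % bins] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins
```

Calling it with `bins=8`, `12`, or `24` and a range of `min_magnitude` values would cover the parameter sweep described above.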

Neither approach is providing me with good results. The results I expect are that the distances between the same video under different transformations will be reasonably consistent AND the distances between different video clips will be larger than all of these.

Another motion vector approach is used in this paper. It calculates the median magnitude of motion vectors for each direction bin. It is used for seeking. It does not claim to be useful for transformed videos, but it may be a good approach. We would need to implement a sequence matching algorithm to evaluate the results.

March 15 - March 29

I have analyzed the results from the transformations using both cosine and Euclidean distance metrics. I then got motion histograms for the 399 videos in the TrecVid 2009 database. I used ffmpeg to get a 3 minute clip from each video and then created the motion histogram for the clips. I computed the pairwise distance between all pairs of clips and compared these distances to the ones obtained for the video clip under transformations. I discovered that many of the distances for pairs of different clips were smaller than the distance for the same clip under mild transformations. This presents a problem for using motion vectors as a signature.
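
A small sketch of the comparison described here, assuming each clip's motion histogram is a list of numbers. The `is_separable` check is a hypothetical helper that states the requirement in code: every distance between a clip and its transformed copies should be smaller than every distance between different clips.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two histograms of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either histogram is all zeros."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return 1.0 - dot / norm if norm else 1.0

def is_separable(same_clip_distances, different_clip_distances):
    """True when transformed copies are always closer than unrelated clips."""
    return max(same_clip_distances) < min(different_clip_distances)
```

The problem reported above corresponds to `is_separable` returning `False`.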

Most of the motion is small. On average, 40% of the motion vectors are (0,0) and 23% are of magnitude 1. This leads to 63% of the motion being placed in the first few bins. I am experimenting with looking at motion outside this region as the fingerprint

March 8 - March 14

I have made copies of a clip with the following transformations applied:

Resizing to 90%, 80%, 70%, 60%, 50% of the original.
Cropping of 10, 20, 30, 40, 50 pixels from border around clip
Addition of a logo to the clip

I am now trying to analyse the histograms to see how well the motion signature matches between clips. I expect to have results by the end of Tuesday

March 2 - March 7

I worked on writing up the algorithm - I had some trouble envisioning how to combine the two independent elements (motion vectors and SURF features) into one signature. As per discussion with Dr. Hefeeda, I am looking just at the motion vectors for now.

Several papers use motion vectors.

Hampapur et al. describe a signature based on the motion of the video across frames. Each frame is partitioned into N = Nx × Ny blocks. They examine a block in frame t centered at (xt, yt) and look in frame t+1, within a specified search region, for the block which has the minimum Sum of Average Pixel Differences (SAPD). The difference between the block location in frame t and the best match in frame t+1 produces a motion vector.
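
The block search can be illustrated with a generic block-matching sketch. It uses the mean absolute pixel difference as the matching cost, which is a stand-in and may differ from the exact SAPD definition in the paper; the block size, search radius, and function name are also assumptions.

```python
import numpy as np

def block_motion_vector(frame_t, frame_next, center, block=8, search=4):
    """Return the displacement (dx, dy) of the best-matching block in frame_next."""
    cx, cy = center
    top, left = cy - block // 2, cx - block // 2
    reference = frame_t[top:top + block, left:left + block].astype(np.float64)
    best, best_cost = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidate blocks that fall outside the frame.
            if y < 0 or x < 0 or y + block > frame_next.shape[0] or x + block > frame_next.shape[1]:
                continue
            candidate = frame_next[y:y + block, x:x + block].astype(np.float64)
            cost = np.abs(reference - candidate).mean()
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best
```

Repeating this for the center of each of the N blocks gives one motion vector per block per frame pair.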

Tasdemir et al. is the only paper I found which used the motion vectors directly. This paper claims motion vectors are a good parameter, but that "a complete CBCD system should be capable of fusing the information coming from motion, color and SIFT feature sets in an intelligent manner to reach a decision".

No paper that I saw attempted to aggregate motion vectors across frames for the signature

I am working on testing the motion vectors under a few transformations and see how the histograms compare.

Feb 23 - March 1

I have created several test cases to examine the motion vectors I extract and validate them.
Experimenting with the extracted motion vectors gave results which did not seem correct. I have been digging into the source code to try to figure out what is going on. I have been able to fix one major problem relating to whether the prediction was based on the previous frame or the next frame. I am still looking into some others.

Feb 16 - Feb 22

Using ffmpeg APIs I am now able to do the following:
- Extract I-frames and save in jpg format for later analysis
- Extract Motion vectors for B and P frames which I will use to build my histogram

Note: There does not seem to be a way to do this with ffmpeg in the compressed domain. I have to make a call to avcodec_decode_frame() in order to get the motion information and the frame type. There is a workaround for the I-frames. I can use a utility called ffprobe (use the R92 branch) with the -show_frames option, from which a list of I-frames can be constructed, but since I need to decode the I-frames to decode subsequent B/P frames, I will just loop through and decode all frames. Theoretically, this can be run in the compressed domain if one knew enough about the encoding protocol and had lots of time to write a custom parser. I think that my approach is enough to show proof of concept for now.

Feb 9 - Feb 15

Worked on motion vector extraction using ffmpeg APIs

Feb 2 - Feb 8

Downloaded and compiled ffprobe
Started coding using ffmpeg libraries to extract I-frames and motion vectors. I have almost got the I-Frame portion worked out, and I will look into the motion vectors next week

Jan 19 - Feb 1

Clustered SURF pts with both k-means and x-means
Looking into how to extract I-frames and motion information from video sequences
Downloaded and compiled source files for ffmpeg
- I think I can write something using this which will parse the mpegs
- I am having problems with permissions when I try to install - working with Ahmed and Jason
Found a Matlab m-file for extracting motion vectors - I am not sure this will be all that useful: Matlab m files
Also found a reworking of mplayer: modified mplayer (modified for flow)
- Janez Pers: The modifications are relatively minor, but ugly. The code that draws motion vectors is changed to dump the arrow directions and length into the text file. I cannot offer any support for compiling Mplayer though. The binary-only (windows) version is available here, it has added example video (it is better to start with this): windows binary. If you like the code and will use it in your scientific work, you can check my paper, which uses the same code for the second batch of experiments: Pers's paper.
- Short instructions:
  - if you have MPEG4 already (I used the mpeg4 encoding as a fast way to get vectors as well), then skip the first step: mencoder original.avi -ovc lavc -lavcopts vcodec=mpeg4 -o mpeg4encoded.avi
  - now extract the motion vectors, without displaying the video (you can display the video as well, if you like, it was just more convenient for me) mplayer mpeg4encoded.avi -benchmark -lavdopts vismv=1
  - Now, the file opticalflow.dat will appear. Do not forget option vismv=1, the extraction is part of the visualisation.
  - The file opticalflow.dat has the following format: framenum,x,y,vx,vy (vx vy being the vectors, x y being the position of the block).
- Be aware that the data for the I frames will be missing (no flow there). And, in my experience, lower bitrates give better flow than high ones - with high ones the encoder does not need to bother with the motion vectors, since it has enough bandwidth already...
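
A minimal sketch of reading opticalflow.dat, assuming the file is plain comma-separated text with exactly the five fields listed above (framenum,x,y,vx,vy) and no header line. The function name is hypothetical.

```python
import csv
from collections import defaultdict

def parse_optical_flow(lines):
    """Group (x, y, vx, vy) tuples by frame number."""
    frames = defaultdict(list)
    for row in csv.reader(lines):
        if len(row) != 5:
            continue  # skip blank or malformed lines
        frame_number, x, y, vx, vy = (float(value) for value in row)
        frames[int(frame_number)].append((x, y, vx, vy))
    return dict(frames)
```

Usage would be along the lines of `with open("opticalflow.dat") as f: flow = parse_optical_flow(f)`. Frame numbers missing from the result would correspond to I frames, which have no flow data.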

Jan 19-25

TrecVid 2008 final transformation document with examples: Transformation Document
TrecVid 2008 explanation of how transformations are generated: Transformation Explanation
2010 TrecVid Requirements
Downloaded the TrecVid 2007 and 2009 databases with test cases and testcases for 2010
Downloaded and investigating x-means experimental software (Licensed to me for research purposes only)
Experimenting with SURF interest points and x-means clustering

Jan 12-18

Finished survey of Video Copy Detection methods: Survey
Prepared presentation on Optical Flow: My Presentation

Jan 11

Survey of State of the Art Techniques

Fall 2010 (RA)

Courses:
- CMPT-820: Multimedia Systems
worked on:
- Video Copy Detection

Summer 2010 (TA)

Courses:
- None
worked on:
- Energy-Efficient Gaming on Mobile Devices using Dead Reckoning-based Power Management
submitted
- NetGames 2010: Energy-Efficient Gaming on Mobile Devices using Dead Reckoning-based Power Management (accepted)

Spring 2010 (RA)

Courses:
- CMPT-822: Special topics in Database Systems
- CMPT 884: Computational Vision
worked on:
- Energy-Efficient Gaming on Mobile Devices
submitted
- NOSSDAV 2010: Energy-Efficient Gaming on Mobile Devices (not accepted)

Fall 2009 (TA)

Courses:
- CMPT-705: Algorithm
- CMPT-771: Internet Architecture and Protocols

To Do List

Full write-up of the proposed algorithm. Start with a centralized approach and then show how it can be distributed.
- Claim: This algorithm is novel, better, more efficient, and can be easily distributed
- State how it can be implemented distributively. Naively - 1 node/video clip - is there a better way?
- State evaluation criteria. Prioritize which transformations are the most important and how our algorithm is expected to deal with each.
