Difference between revisions of "Private:copyDetection: Notes"

From NMSL
 
Latest revision as of 12:30, 29 December 2010

Video Copy Detection

Early Idea (discussed with Dr. Wael Abd-Almageed)

  • Collect a large set of videos (possibly from TREC)
  • Extract local features (e.g., SIFT) from I frames
  • Cluster these features into K clusters using, for example, the K-means method (need to try different values of K)
  • To create signatures
    • Divide each video into groups of frames (possibly one group per GOP)
    • Extract local features from I frames
    • Map these features to the K clusters (one probability value for each cluster)
    • Normalize the probabilities so that they sum to 1
    • Use these K probabilities as a signature.
    • In addition, extract motion vectors from the non-I frames in the GOP.
    • Quantize these motion vectors into a fixed number of bins, say B
    • Build a histogram on these bins
    • Normalize and compute probabilities (vector of size B).
    • Now, use a combined signature from local features (K vector) and motion info (B vector).
  • For comparing:
    • Create signatures for each GOP in the target video.
    • Compare signatures by comparing their vectors (some formal methods exist for this, check with Hamed).
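
A minimal sketch of the signature and comparison steps above, in plain Python. Several choices here are assumptions the notes do not fix: the cluster probabilities come from a softmax over negative squared distances, motion vectors are quantized by direction only, and signatures are compared with an L1 distance as a stand-in for whichever formal method is eventually chosen. All function names are hypothetical, and the centroids are assumed to come from the K-means step.

```python
import math

def soft_assign(descriptor, centroids):
    """Return one probability per cluster for a single descriptor.

    Assumption: softmax over negative squared Euclidean distances.
    """
    d2 = [sum((x - c) ** 2 for x, c in zip(descriptor, cen)) for cen in centroids]
    m = min(d2)  # subtract the minimum for numerical stability
    weights = [math.exp(-(d - m)) for d in d2]
    total = sum(weights)
    return [w / total for w in weights]

def feature_signature(descriptors, centroids):
    """Average the per-descriptor probabilities into a K vector that sums to 1."""
    k = len(centroids)
    if not descriptors:
        return [1.0 / k] * k  # assumption: uniform fallback when no features exist
    acc = [0.0] * k
    for desc in descriptors:
        for i, p in enumerate(soft_assign(desc, centroids)):
            acc[i] += p
    return [a / len(descriptors) for a in acc]

def motion_signature(motion_vectors, num_bins):
    """Normalized histogram of motion-vector directions (B vector).

    Assumption: bins cover the direction angle only; magnitude is ignored.
    """
    hist = [0.0] * num_bins
    for dx, dy in motion_vectors:
        angle = math.atan2(dy, dx) % (2 * math.pi)
        b = min(int(angle / (2 * math.pi) * num_bins), num_bins - 1)
        hist[b] += 1
    total = sum(hist)
    if total == 0:
        return [1.0 / num_bins] * num_bins
    return [h / total for h in hist]

def gop_signature(descriptors, motion_vectors, centroids, num_bins):
    """Concatenate the K vector and the B vector into one combined signature."""
    return (feature_signature(descriptors, centroids)
            + motion_signature(motion_vectors, num_bins))

def l1_distance(sig_a, sig_b):
    """Smaller values mean more similar signatures (stand-in comparison)."""
    return sum(abs(a - b) for a, b in zip(sig_a, sig_b))
```

With K centroids and B bins, each GOP yields a vector of length K + B. A target GOP would then be flagged as a candidate copy when its distance to some reference signature falls below a threshold, which would have to be tuned experimentally.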


  • Notes:
    • Signature creation can be done on a moving window, i.e., shifting with each frame (computationally expensive though).
    • Later, we can create another level of abstraction to improve performance: use the K vector (local features) and build a topic model on top of it using, for example, LDA. That is, each K vector will be used as a word. The topic model will identify collections of words that commonly occur together (each such collection is called a topic).
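
To make the cost of the moving-window variant concrete, the sketch below contrasts fixed groups with windows that shift by one frame. The function names and the `size` and `step` parameters are hypothetical; frames are represented by arbitrary list items.

```python
def fixed_groups(frames, size):
    """Non-overlapping groups: roughly len(frames) / size signatures."""
    return [frames[i:i + size] for i in range(0, len(frames), size)]

def sliding_windows(frames, size, step=1):
    """Overlapping windows shifted by `step` frames.

    With step=1 this gives len(frames) - size + 1 windows,
    each of which needs its own signature.
    """
    return [frames[i:i + size] for i in range(0, len(frames) - size + 1, step)]
```

For a clip of N frames and window size W, the fixed grouping computes about N / W signatures, while the one-frame shift computes N - W + 1, which is where the extra computation comes from.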


3D Video Copy Detection

  • Ideas, previous work?