Private:progress-dameh
Summer 2011 (TA)
- Research: Gram Matrix approximation
Jun 17
- Report should be available here: https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/dameh/summer11/doc/main.pdf
Jun 03
- Proposed a new metric to be used instead of the FN ratio (more powerful for comparing different families and for picking the optimal value of k)
- More analysis of spectral hashing
- Writing up the report with the results, to be sent soon
May 19
- Spectral hashing is data-dependent and more powerful than random-projection hashing; see the comparison examples in 2D.
- Random Projection Hashing
- Method (a sketch follows this list):
- Take random projections of the data.
- Quantize each projection to obtain the bits of the hash value.
- Disadvantage:
- Data-independent hashing.
- Advantage:
- Strong theoretical guarantees.
- Simple and fast to compute (linear time).
- Types:
- Euclidean distance: two parameters to vary that have opposite effects (the number of random vectors k and the quantization width w).
- L1 distance: uncontrolled; as k increases, the error increases.
- Cosine distance: more controlled than L1 as k increases.
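A minimal sketch of the two random-projection families above, assuming NumPy; the function names and the toy data are illustrative, not from the report:

    import numpy as np

    rng = np.random.default_rng(0)

    def cosine_lsh(X, k):
        # Random-hyperplane LSH for cosine distance:
        # k random projections, each quantized to one sign bit.
        R = rng.standard_normal((X.shape[1], k))
        return (X @ R > 0).astype(np.uint8)

    def euclidean_lsh(X, k, w):
        # p-stable LSH for Euclidean distance; k (number of random
        # vectors) and w (quantization width) pull the error in
        # opposite directions.
        R = rng.standard_normal((X.shape[1], k))
        b = rng.uniform(0, w, size=k)
        return np.floor((X @ R + b) / w).astype(np.int64)

    X = rng.standard_normal((1000, 64))      # toy data, not from the report
    codes = euclidean_lsh(X, k=16, w=4.0)    # 1000 x 16 integer hash codes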
- Data-Dependent Hashing
- Advantage:
- Data-distribution dependent; more powerful than random projection.
- Disadvantage:
- More complex to compute than Random Projection.
- Types:
- Spectral hashing (sketched below):
- Compute PCA.
- Calculate the k smallest single-dimension analytical eigenfunctions.
- Threshold the analytical eigenfunctions at zero to obtain the bits of the hash value.
- KernelLSH (a second sketch follows):
- The key idea is to first embed the data into a lower-dimensional space using the kernel G and then do LSH in that lower-dimensional space.
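A minimal NumPy sketch of the spectral-hashing steps above (PCA, smallest analytical eigenfunctions of a uniform distribution along each principal direction, threshold at zero); the function name is illustrative:

    import numpy as np

    def spectral_hash(X, nbits):
        # 1. PCA: center and project onto the top principal directions.
        Xc = X - X.mean(axis=0)
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        P = Xc @ Vt[:nbits].T
        # 2. On each direction the analytical eigenfunction of mode m over
        #    the range [a, b] has an eigenvalue monotone in m / (b - a);
        #    pick the nbits smallest modes across all directions.
        a, b = P.min(axis=0), P.max(axis=0)
        modes = sorted((m / (b[d] - a[d]), d, m)
                       for d in range(P.shape[1])
                       for m in range(1, nbits + 1))
        # 3. Threshold each chosen eigenfunction sin(pi/2 + m*pi*x/(b-a))
        #    at zero to obtain one bit of the hash value.
        bits = [(np.sin(np.pi / 2 + m * np.pi * (P[:, d] - a[d]) / (b[d] - a[d])) > 0)
                for _, d, m in modes[:nbits]]
        return np.stack(bits, axis=1).astype(np.uint8)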
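The KernelLSH entry above only states the key idea, so the following illustrates that idea rather than the exact published algorithm: embed the points into a low-dimensional space through the kernel (here via a landmark-sample map, an assumption) and run ordinary random-hyperplane LSH there. The RBF kernel and all names are illustrative:

    import numpy as np

    rng = np.random.default_rng(0)

    def rbf(A, B, gamma=0.5):
        # Example kernel; any positive-definite kernel fits here.
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)

    def kernel_lsh_bits(X, kernel, n_landmarks=100, nbits=16):
        # Embed all points into a low-dimensional kernel feature space
        # built from a landmark sample, then hash with plain
        # random-hyperplane LSH in that space.
        idx = rng.choice(len(X), n_landmarks, replace=False)
        L = X[idx]
        w, V = np.linalg.eigh(kernel(L, L))        # landmark Gram matrix
        W = V / np.sqrt(np.maximum(w, 1e-10))      # whitening map K_LL^(-1/2)
        emb = kernel(X, L) @ W                     # low-dimensional embedding
        R = rng.standard_normal((emb.shape[1], nbits))
        return (emb @ R > 0).astype(np.uint8)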
May 05
- Recap
- Problem: the Gram matrix of kernel methods is O(N^2), which is infeasible in both computation and memory for large-scale datasets
- Previous Work:
- Low rank matrix approximation:
- using EigenDecomposition: Optimal solution
- Problem: requires O(N^3) computation to construct the low rank matrix
- Solution: use the Nyström method to approximate the eigendecomposition of the matrix, which requires O(m^2 N), where m << N (sketched below).
- Problem: despite reducing the computational complexity, these methods still need O(N^2) memory to store the matrix, so the next methods reduce both computation and memory.
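A hedged sketch of the Nyström approximation referenced above, assuming NumPy; kernel(A, B) returning the cross-Gram matrix is an assumed interface:

    import numpy as np

    def nystrom_factors(X, kernel, m, rng=np.random.default_rng(0)):
        # Approximate the N x N Gram matrix as K ~= C @ W_pinv @ C.T from
        # m << N sampled landmark columns: O(m^2 N) work instead of the
        # O(N^3) eigendecomposition, and the full matrix is never stored.
        idx = rng.choice(len(X), m, replace=False)
        C = kernel(X, X[idx])            # N x m slice of the Gram matrix
        W = C[idx]                       # m x m block between the landmarks
        return C, np.linalg.pinv(W)      # evaluate (C @ W_pinv) @ C.T on demand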
- Efficient implementation for computing the kernel matrix:
- Using Spatial Indexing:
- Use Z-order, H-order (Hilbert curve), kd-trees, and other spatial indexing techniques to order the points in space, then use a sliding window where the kernel function is computed only between points inside that window (sketched below).
- Problems: requires O(N log N) preprocessing to order the points, fails for very high-dimensional data, and is hard to parallelize.
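A low-dimensional sketch of the spatial-indexing approach above, assuming NumPy; it uses Z-order (Morton) codes and, for simplicity, non-overlapping windows instead of a true sliding window:

    import numpy as np

    def morton_codes(Q, dim, bits=10):
        # Interleave the bits of the integer coordinates (Z-order).
        code = np.zeros(len(Q), dtype=np.uint64)
        for b in range(bits):
            for d in range(dim):
                bit = (Q[:, d] >> np.uint64(b)) & np.uint64(1)
                code |= bit << np.uint64(b * dim + d)
        return code

    def windowed_gram_blocks(X, kernel, window=64, bits=10):
        # Order the points along the Z-curve, then evaluate the kernel
        # only between points that fall in the same window.
        lo, hi = X.min(axis=0), X.max(axis=0)
        Q = ((X - lo) / (hi - lo + 1e-12) * (2**bits - 1)).astype(np.uint64)
        order = np.argsort(morton_codes(Q, X.shape[1], bits))
        for s in range(0, len(X), window):
            block = order[s:s + window]
            yield block, kernel(X[block], X[block])   # small dense block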
- Solution: we proposed a method that works well for large-scale, high-dimensional data and overcomes the drawbacks of all previous work: it is easy to parallelize, requires linear-time preprocessing, and reduces both computation and memory usage from quadratic to subquadratic:
- Use spatial hashing or locality-sensitive hashing, where the kernel function and the clustering methods work on the buckets individually (a sketch follows this list).
- LSH families:
- Random projection:
- Hamming distance, Euclidean Distance, Cosine Distance
- Problem with random projection: it assumes the data are uniformly distributed and does not take into account whether the data lie in a kernel space. Solutions:
- Spectral hashing: overcomes the problems of the previous families.
- Kernelized LSH: problem: more complex to compute
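A minimal sketch of the proposed bucket-wise scheme, assuming NumPy; the hash family here is random-hyperplane LSH purely for illustration, and the kernel interface is the same assumed kernel(A, B) as above:

    import numpy as np
    from collections import defaultdict

    rng = np.random.default_rng(0)

    def lsh_buckets(X, nbits=8):
        # Hash every point once (linear-time preprocessing); points that
        # share a code land in the same bucket.
        R = rng.standard_normal((X.shape[1], nbits))
        bits = (X @ R > 0).astype(np.int64)
        codes = bits @ (1 << np.arange(nbits))    # pack bits into an integer
        buckets = defaultdict(list)
        for i, c in enumerate(codes):
            buckets[int(c)].append(i)
        return buckets

    def per_bucket_gram(X, kernel, buckets):
        # Evaluate the kernel only inside each bucket; total work is the
        # sum of |bucket|^2, subquadratic when the buckets stay small.
        grams = {}
        for code, members in buckets.items():
            I = np.array(members)
            grams[code] = kernel(X[I], X[I])
        return grams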
- In progress: Comparing different LSH families
Spring 2011 (TA/RA)
- Courses:
- CMPT 885: Special Topics in Computer Architecture
- CMPT 886: Special Topics in Operating Systems
Fall 2010 (FAS-GF/RA)
- Courses:
- CMPT 705: Design/Analysis Algorithms
- CMPT 820: Multimedia Systems
- Worked on Gram matrix approximation and on case studies of high-performance, large-scale applications
Summer 2010 (TA)
- Worked on large-scale kernel methods
Spring 2010 (RA)
- Courses:
- CMPT 768: Sound Synthesis
- Worked on locality-sensitive hashing families and their applications and implementations