Summer 2011 (TA)
- Research: Gram Matrix approximation
  - Recap:
    - Problem: in kernel methods the Gram matrix is O(N^2), which is infeasible in both computation and memory for large-scale datasets.
    - Previous work:
      - Low-rank matrix approximation:
        - Using eigendecomposition: the optimal solution, but constructing the low-rank matrix requires O(N^3) computation.
        - Solution: use the Nyström method to approximate the eigendecomposition of the matrix, which requires O(m^2 N) with m << N (both approaches are sketched after this sub-list).
        - Problem: despite the reduced computational complexity, these methods still need O(N^2) memory to store the matrix, so the next methods reduce both computation and memory.
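A minimal sketch of both low-rank approaches in Python/NumPy, assuming an RBF kernel; the kernel choice, function names, and parameter values (m, k, gamma) are illustrative assumptions rather than details from the original work:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2)."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def low_rank_eig(K, k):
    """Best rank-k approximation via eigendecomposition: O(N^3)."""
    w, V = np.linalg.eigh(K)          # eigenvalues in ascending order
    w, V = w[-k:], V[:, -k:]          # keep the k largest
    return (V * w) @ V.T              # V diag(w) V^T

def nystrom_features(X, m, k, gamma=1.0, seed=0):
    """Nystrom approximation from m << N landmark points: O(m^2 N).
    Returns an N x k feature map L with K ~ L @ L.T, so the full
    N x N matrix never has to be formed."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)
    C = rbf_kernel(X, X[idx], gamma)  # N x m cross-kernel block
    W = C[idx]                        # m x m landmark block
    w, V = np.linalg.eigh(W)
    w, V = w[-k:], V[:, -k:]
    return C @ (V / np.sqrt(np.maximum(w, 1e-12)))

X = np.random.default_rng(1).normal(size=(500, 10))
K = rbf_kernel(X, X)                  # full N x N matrix, for checking only
L = nystrom_features(X, m=50, k=20)
exact = low_rank_eig(K, k=20)         # O(N^3) baseline
for name, A in [("eig rank-20", exact), ("nystrom", L @ L.T)]:
    print(name, np.linalg.norm(K - A) / np.linalg.norm(K))
```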
      - Efficient implementation for computing the kernel matrix:
        - Using spatial indexing: use Z-order, H-order, k-d trees, or other spatial indexing techniques to order the points in space, then slide a window over the ordering and compute the kernel function only between points inside the window (sketched below).
          - Problems: requires O(N log N) preprocessing to order the points, fails for very high-dimensional data, and is hard to parallelize.
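A minimal sketch of the sliding-window idea, assuming a 2-D Z-order (Morton) curve and an RBF kernel; the 16-bit quantization, the window size, and all names are illustrative assumptions:

```python
import numpy as np

def part1by1(v):
    """Spread the low 16 bits of v so a zero bit sits between each pair."""
    v &= 0xFFFF
    v = (v | (v << 8)) & 0x00FF00FF
    v = (v | (v << 4)) & 0x0F0F0F0F
    v = (v | (v << 2)) & 0x33333333
    v = (v | (v << 1)) & 0x55555555
    return v

def morton2d(x, y):
    """Interleave the bits of two 16-bit coordinates into a Z-order key."""
    return (part1by1(y) << 1) | part1by1(x)

def zorder_kernel_pairs(X, window, gamma=1.0):
    """Sort points along the Z-curve, then evaluate the kernel only
    between points at most `window` apart in the sorted order."""
    lo, hi = X.min(0), X.max(0)
    q = ((X - lo) / (hi - lo) * 0xFFFF).astype(np.uint32)  # 16-bit grid
    order = np.argsort([morton2d(int(a), int(b)) for a, b in q])
    Xs = X[order]
    entries = {}                  # sparse approximation of the Gram matrix
    for i in range(len(Xs)):
        for j in range(i + 1, min(i + 1 + window, len(Xs))):
            d2 = ((Xs[i] - Xs[j]) ** 2).sum()
            entries[(order[i], order[j])] = np.exp(-gamma * d2)
    return entries

X = np.random.default_rng(0).uniform(size=(1000, 2))
K_sparse = zorder_kernel_pairs(X, window=16)
print(len(K_sparse), "of", 1000 * 999 // 2, "pairs evaluated")
```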
So we proposed a method that works well for large-scale, high-dimensional data and overcomes the drawbacks of all previous work: it is easy to parallelize, requires linear-time preprocessing, and reduces both computation and memory usage from quadratic to subquadratic:
        - Using spatial hashing or locality-sensitive hashing (LSH): the kernel function and the clustering methods operate on each hash bucket individually (a sketch of the whole scheme follows the list of families below).
          - LSH families:
            - Random projection: Hamming distance, Euclidean distance, cosine distance.
              Problem: these families assume the data are uniformly distributed. Solution:
            - Spectral Hashing: overcomes the problems of the previous families.
              General problem for random projection: it does not take into account that the data may lie in a kernel space. Solution:
            - Kernelized LSH. Problem: more complex to compute.
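A minimal sketch of the overall scheme, assuming the random-projection family for the cosine distance (sign-of-projection hashing): points whose codes collide share a bucket, and a small Gram block is built per bucket. The bit count, the RBF kernel, and all names are illustrative assumptions; Spectral Hashing or Kernelized LSH could replace the family:

```python
import numpy as np
from collections import defaultdict

def signed_projection_hash(X, n_bits=8, seed=0):
    """Random-projection LSH for the cosine distance: one sign bit per
    random hyperplane; each bit differs with probability theta/pi."""
    rng = np.random.default_rng(seed)
    H = rng.normal(size=(X.shape[1], n_bits))  # random hyperplane normals
    return (X @ H > 0).astype(np.uint8)

def bucketed_gram_blocks(X, codes, gamma=1.0):
    """Group points by hash code and build one small RBF Gram block per
    bucket, so cost is the sum of squared bucket sizes instead of N^2."""
    buckets = defaultdict(list)
    for i, c in enumerate(codes):
        buckets[c.tobytes()].append(i)
    blocks = {}
    for key, idx in buckets.items():
        P = X[np.array(idx)]
        sq = ((P[:, None, :] - P[None, :, :]) ** 2).sum(-1)
        blocks[key] = (idx, np.exp(-gamma * sq))
    return blocks

X = np.random.default_rng(1).normal(size=(2000, 10))
codes = signed_projection_hash(X, n_bits=8)
blocks = bucketed_gram_blocks(X, codes)
cost = sum(len(idx) ** 2 for idx, _ in blocks.values())
print(f"{cost} kernel evaluations instead of {2000 ** 2}")
```

Since each block is confined to its bucket, the work stays subquadratic whenever the hash spreads the points across many buckets, and the buckets can be processed in parallel.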
  - In progress: Comparing different LSH families
Spring 2011 (TA/RA)
- Courses:
  - CMPT 885: Special Topics in Computer Architecture
  - CMPT 886: Special Topics in Operating Systems
Fall 2010 (FAS-GF/RA)
- Courses:
  - CMPT 705: Design/Analysis Algorithms
  - CMPT 820: Multimedia Systems
- Worked on Gram matrix approximation and on high-performance / large-scale application case studies
Summer 2010 (TA)
- Worked on large-scale kernel methods
Spring 2010 (RA)
- Courses:
  - CMPT 768: Sound Synthesis
- Worked on locality-sensitive hashing families and their applications and implementations