Difference between revisions of "Private:progress-dameh"
From NMSL
(52 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
− | = Spring 2011 (TA) = | + | = Fall 2011 and Spring 2012 (RA) = |
+ | |||
+ | |||
+ | === Jan 27 === | ||
+ | * Implementing my method on cluster, report to be sent soon. | ||
+ | |||
+ | |||
+ | === Dec 23 === | ||
+ | * Finished the study of different methods, the analysis and the implementation on a single machine. | ||
+ | |||
+ | |||
+ | === Nov 9 === | ||
+ | * Scaled the Spectral Hashing. | ||
+ | * Working on the US Census Data (1990) [number of features (dimensions) = 69 ] | ||
+ | * run experiments on small DataSet (4k) using the 3 methods LSH Random Projection, LSH Spectral Hashing and Hilbert Curve | ||
+ | * now running on large DataSet (1M) using LSH only (as H-curve doesn't work on large Scale) | ||
+ | * experiment results to be added to the report | ||
+ | |||
+ | |||
+ | === Oct 26 === | ||
+ | * I am facing problems working on very high dimensional data (d~100000) [curse of dimensionality] | ||
+ | ** I can't use small dataSets (1k points) to compare different methods (Hilbert curve fails for high dimensional but LSH still works) | ||
+ | ** The concept of distance becomes less precise as the number of dimensions grows. | ||
+ | ** There are other techniques to cluster the high-dimensional data (no usage of gram matrix) such as Subspace Clustering, Projected Clustering and Correlation Clustering. | ||
+ | * I start working on different real DataSets with low dimension (~100) | ||
+ | * Working on Scaling the Spectral Hashing by scaling the Principle Component Analysis function on matlab | ||
+ | |||
+ | |||
+ | === Oct 12 === | ||
+ | * Process Very-High dimensional and Very-Large Scale real DataSets | ||
+ | ** I use document clustering, where each document i is commonly represented by a term-frequency vector xi = [x1i, x2i, ..., xDi], where D is the total number of keywords in the given document corpus, and xji is the number of occurrences of keyword j in document i. Keywords correspond to the words that remain after the pre-processing of stop-words removal and words stemming. Below N is the total number of documents. | ||
+ | * DataSets: source: http://archive.ics.uci.edu/ml/datasets/Bag+of+Words: | ||
+ | ** NYTimes news articles: | ||
+ | *** orig source: ldc.upenn.edu | ||
+ | *** N=300000 | ||
+ | *** D=102660 | ||
+ | ** PubMed abstracts: (PubMed comprises more than 21 million citations for biomedical literature from MEDLINE, life science journals, and online books) | ||
+ | *** orig source: www.pubmed.gov | ||
+ | *** N=8200000 | ||
+ | *** D=141043 | ||
+ | * Currently I process 1M documents from PubMed with the dimesion 141043, total size of this DataSet on HD is about 250GB. | ||
+ | * To Do: compare the performance using two different LSH families. | ||
+ | |||
+ | |||
+ | === Spet 28 === | ||
+ | * Updated report is here: | ||
+ | ** /nsl/students/dameh/reports/doc/main.pdf | ||
+ | * updated sections : | ||
+ | ** IV. PERFORMANCE METRICS | ||
+ | ** VI. RESULTS | ||
+ | |||
+ | |||
+ | === Sept 12 === | ||
+ | * Comparison Results for H-curve, LSH-Spectral hashing and LSH-Random Projection: | ||
+ | * Metrics used for comparison: | ||
+ | ** FN.SpaceReduction product Metric : Gram matrix level comparison which takes in account both the FN and the space reduction for the approximated gram matrix (as FN decreases space reduction increases) | ||
+ | *** to unify the comparison we set H-curve window_width = DataSets_size/2^k. where k is the number of the LSH hash bits (number of buckets 2^k), in this case we get same matrix size for both H-curve and LSH. | ||
+ | *** this metric gives us the optimal value of k (maximum). | ||
+ | *** LSH-Spectral hashing over performs LSH-Random projection. H-curve over performs both. | ||
+ | *** Graph results can be found at these addresses : | ||
+ | **** FN for different values of k: dameh/reports/dameh/fall11/FN.pdf | ||
+ | **** gram matrix size for different values of k: dameh/reports/dameh/fall11/Size.pdf | ||
+ | **** FN.SpaceReduction : dameh/reports/dameh/fall11/FSprod.pdf | ||
+ | ** Average distance between each point and its Exemplar: which is clustering error metric | ||
+ | *** to unify the comparison we are comparing Avg distance against the number of Exemplars, (we change a parameter in the affinity propagation to control number of Exemplars, or its an initialization parameter for k-means clustering) | ||
+ | *** again, LSH-Spectral hashing over performs LSH-Random projection, and LSH-Spectral hashing over performs H-curve only for smaller number of clusters | ||
+ | *** Graph result can be found at this address: dameh/reports/dameh/fall11/ClusteringError.pdf | ||
+ | ** over all, the advantage of LSH over H-cuve is the parallel property of LSH, where we can get a speedup up to 2^k for the LSH (each node takes care of each bucket) | ||
+ | |||
+ | |||
+ | = Summer 2011 (TA) = | ||
+ | * Research: Gram Matrix approximation | ||
+ | |||
+ | |||
+ | === Aug 22 === | ||
+ | *fixing bugs in the Hilbert curve and the affinity propagation codes (still working on them) | ||
+ | *revised the metric to be FN.SpaceRedcution product (Inspired from the Energy.Delay product used in the Energy-Efficient Computing ) | ||
+ | *cmpt300 Final grading | ||
+ | |||
+ | |||
+ | === Aug 8 === | ||
+ | *working on comparing LSH methods with Space filling Curve methods. | ||
+ | ** for the low level comparing (matrix level using average FN): we get almost same average FN for both spectral hashing and H-curve. | ||
+ | ** comparing in the high level (clustering error): | ||
+ | *** I am picking the optimal value of k for spectral hashing and choosing a widow width for H-curve that gives us same approximated matrix size, using the formula (winWidth=DataSize/2^k) | ||
+ | *** In the affinity propagation, using small value of self similarity (s(i,i)) we get small numbers of clusters and vise verse , so I am changing self similarity values between max and min to get same number of clusters for both methods, which will be the x-axis, on the other hand, the y-axis will be the average distance between each point and its Exemplar. | ||
+ | * PS: I was sick for a week after I came back form China and then busy in setting up and grading assignments for cmpt300. | ||
+ | |||
+ | |||
+ | ===July 10 - July 24 === | ||
+ | * Attending workshop on [http://www.nudt.edu.cn/summerschool/eng/lessoninfo.html High Performance Computing] at National University of Defense Technology, China. | ||
+ | |||
+ | |||
+ | === July 8 === | ||
+ | * Comparing our method using the "Locality Sensitive Hashing" to the method using "Hilbert Space filling Curve" | ||
+ | * Initial results can be found at this address students/dameh/reports/dameh/summer11/H-CurveVSlsh.pdf | ||
+ | |||
+ | |||
+ | === Jun 17 === | ||
+ | * Report should be available [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/dameh/summer11/doc/main.pdf here] | ||
+ | |||
+ | |||
+ | === Jun 03 === | ||
+ | * Proposed new metric to be used instead of the FN ratio (more powerful to compare different families and to pick the optimal value of k) | ||
+ | * More analysis on the spectral hashing | ||
+ | * Writing up the report with the results to be sent soon | ||
+ | |||
+ | |||
+ | === May 19 === | ||
+ | * Spectral Hashing is Data dependent hashing and more powerful than the random projection, see this [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/dameh/summer11/doc/RandomVsSpectral.pdf comparison examples] in the 2D. | ||
+ | * Random Projection Hashing | ||
+ | ** Method: | ||
+ | *** Take random projections of data. | ||
+ | *** Quantize the projections to hash value’s bits. | ||
+ | ** Disadvantage: | ||
+ | *** Data Independent Hashing. | ||
+ | ** Advantage: | ||
+ | *** Strong theoretical guarantees. | ||
+ | *** Simple and fast to compute (linear time). | ||
+ | ** Types: | ||
+ | *** Euclidean distance: two parameters to vary that have the opposite affect (number of random vectors k and quantization width w). | ||
+ | *** L1 distance: uncontrolled, as k increase error increase. | ||
+ | *** Cosine distance: more controlled than L1 as k increase. | ||
+ | * Data-Dependent Hashing | ||
+ | **Advantage: | ||
+ | *** Data-distribution dependent; powerful than Random Projection. | ||
+ | ** Disadvantage: | ||
+ | *** More complex to compute than Random Projection. | ||
+ | ** Types: | ||
+ | *** Spectral hashing: | ||
+ | **** Compute PCA. | ||
+ | **** Calculate the k smallest single-dimension analytical EigenFunctions. | ||
+ | **** Threshold the analytical EigenFnction of zero to obtain hash value's bits. | ||
+ | *** KernelLSH: | ||
+ | **** The key idea is first embed the data into a lower space using the kernel G and then do LSH in the lower dimensional space. | ||
+ | |||
+ | |||
+ | === May 5 === | ||
+ | * Recap | ||
+ | ** Problem: kernel methods' gram matrix is O(N^2), which is computationally and memory usage infeasible for large-scale datasets | ||
+ | ** Previous Work: | ||
+ | *** Low rank matrix approximation: | ||
+ | **** using EigenDecomposition: Optimal solution | ||
+ | ***** Problem: requires O(N^3) computation to construct the low rank matrix | ||
+ | ***** Solution: use Nytsrom theorem to approximate the EigenDecomposition of the matrix which requires O(m^2N) where m << N. | ||
+ | ***** Problem: In this methods in despite of reducing the computation complexity still we need O(N^2) to store the matrix, so the next methods reduce both computation and memory. | ||
+ | *** Efficient implementation for computing the kernel matrix: | ||
+ | **** Using Spatial Indexing: | ||
+ | ***** using Z-order, H-order, kd-tress and other spatial indexing techniques to order the points in the space and then using sliding window where the kernel function computed only between points in that window. | ||
+ | ***** Problems: requires O(NlogN) as preprocessing to order the points, fails for very high dimensional Data and hard to parallel | ||
+ | ***** Solution: We proposed a method that works fine for large-scale and highly dimensional data that overcomes all pervious works drawbacks, easy to parallel, requires linear time preprocessing and reduce both computation and memory usage down form quadratic to subquadratic: | ||
+ | **** Using Spatial Hashing or Locality sensitive hashing: where the kernel function and the cluster methods works on the buckets individually. | ||
+ | ***** LSH families : | ||
+ | ****** Random projection: | ||
+ | ******* Hamming distance, Euclidean Distance, Cosine Distance | ||
+ | ******* Problem for random projection, assume data uniformly distrubuted and it doesn’t take in account if data are in a kernel space, Solution: | ||
+ | ****** Spectral Hashing: overcomes previous families problems | ||
+ | ****** Kernelized LSH : Problem: more complex to compute | ||
+ | ** In progress: Comparing different LSH families | ||
+ | |||
+ | |||
+ | = Spring 2011 (TA/RA) = | ||
* Courses: | * Courses: | ||
+ | **CMPT 885: Special Topics in Computer Architecture | ||
+ | **CMPT 886: Special Topics in Operating Systems | ||
Line 6: | Line 169: | ||
* Courses: | * Courses: | ||
**CMPT 705: Design/Analysis Algorithms | **CMPT 705: Design/Analysis Algorithms | ||
− | **CMPT | + | **CMPT 820: Multimedia Systems |
* Worked on Gram matrix approximation, High Performance/large scale application case studies | * Worked on Gram matrix approximation, High Performance/large scale application case studies |
Latest revision as of 17:19, 26 January 2012
Fall 2011 and Spring 2012 (RA)
Jan 27
- Implementing my method on cluster, report to be sent soon.
Dec 23
- Finished the study of different methods, the analysis and the implementation on a single machine.
Nov 9
- Scaled the Spectral Hashing.
- Working on the US Census Data (1990) [number of features (dimensions) = 69 ]
- run experiments on small DataSet (4k) using the 3 methods LSH Random Projection, LSH Spectral Hashing and Hilbert Curve
- now running on large DataSet (1M) using LSH only (as H-curve doesn't work on large Scale)
- experiment results to be added to the report
Oct 26
- I am facing problems working on very high dimensional data (d~100000) [curse of dimensionality]
- I can't use small dataSets (1k points) to compare different methods (Hilbert curve fails for high dimensional but LSH still works)
- The concept of distance becomes less precise as the number of dimensions grows.
- There are other techniques to cluster the high-dimensional data (no usage of gram matrix) such as Subspace Clustering, Projected Clustering and Correlation Clustering.
- I start working on different real DataSets with low dimension (~100)
- Working on Scaling the Spectral Hashing by scaling the Principle Component Analysis function on matlab
Oct 12
- Process Very-High dimensional and Very-Large Scale real DataSets
- I use document clustering, where each document i is commonly represented by a term-frequency vector xi = [x1i, x2i, ..., xDi], where D is the total number of keywords in the given document corpus, and xji is the number of occurrences of keyword j in document i. Keywords correspond to the words that remain after the pre-processing of stop-words removal and words stemming. Below N is the total number of documents.
- DataSets: source: http://archive.ics.uci.edu/ml/datasets/Bag+of+Words:
- NYTimes news articles:
- orig source: ldc.upenn.edu
- N=300000
- D=102660
- PubMed abstracts: (PubMed comprises more than 21 million citations for biomedical literature from MEDLINE, life science journals, and online books)
- orig source: www.pubmed.gov
- N=8200000
- D=141043
- NYTimes news articles:
- Currently I process 1M documents from PubMed with the dimesion 141043, total size of this DataSet on HD is about 250GB.
- To Do: compare the performance using two different LSH families.
Spet 28
- Updated report is here:
- /nsl/students/dameh/reports/doc/main.pdf
- updated sections :
- IV. PERFORMANCE METRICS
- VI. RESULTS
Sept 12
- Comparison Results for H-curve, LSH-Spectral hashing and LSH-Random Projection:
- Metrics used for comparison:
- FN.SpaceReduction product Metric : Gram matrix level comparison which takes in account both the FN and the space reduction for the approximated gram matrix (as FN decreases space reduction increases)
- to unify the comparison we set H-curve window_width = DataSets_size/2^k. where k is the number of the LSH hash bits (number of buckets 2^k), in this case we get same matrix size for both H-curve and LSH.
- this metric gives us the optimal value of k (maximum).
- LSH-Spectral hashing over performs LSH-Random projection. H-curve over performs both.
- Graph results can be found at these addresses :
- FN for different values of k: dameh/reports/dameh/fall11/FN.pdf
- gram matrix size for different values of k: dameh/reports/dameh/fall11/Size.pdf
- FN.SpaceReduction : dameh/reports/dameh/fall11/FSprod.pdf
- Average distance between each point and its Exemplar: which is clustering error metric
- to unify the comparison we are comparing Avg distance against the number of Exemplars, (we change a parameter in the affinity propagation to control number of Exemplars, or its an initialization parameter for k-means clustering)
- again, LSH-Spectral hashing over performs LSH-Random projection, and LSH-Spectral hashing over performs H-curve only for smaller number of clusters
- Graph result can be found at this address: dameh/reports/dameh/fall11/ClusteringError.pdf
- over all, the advantage of LSH over H-cuve is the parallel property of LSH, where we can get a speedup up to 2^k for the LSH (each node takes care of each bucket)
- FN.SpaceReduction product Metric : Gram matrix level comparison which takes in account both the FN and the space reduction for the approximated gram matrix (as FN decreases space reduction increases)
Summer 2011 (TA)
- Research: Gram Matrix approximation
Aug 22
- fixing bugs in the Hilbert curve and the affinity propagation codes (still working on them)
- revised the metric to be FN.SpaceRedcution product (Inspired from the Energy.Delay product used in the Energy-Efficient Computing )
- cmpt300 Final grading
Aug 8
- working on comparing LSH methods with Space filling Curve methods.
- for the low level comparing (matrix level using average FN): we get almost same average FN for both spectral hashing and H-curve.
- comparing in the high level (clustering error):
- I am picking the optimal value of k for spectral hashing and choosing a widow width for H-curve that gives us same approximated matrix size, using the formula (winWidth=DataSize/2^k)
- In the affinity propagation, using small value of self similarity (s(i,i)) we get small numbers of clusters and vise verse , so I am changing self similarity values between max and min to get same number of clusters for both methods, which will be the x-axis, on the other hand, the y-axis will be the average distance between each point and its Exemplar.
- PS: I was sick for a week after I came back form China and then busy in setting up and grading assignments for cmpt300.
July 10 - July 24
- Attending workshop on High Performance Computing at National University of Defense Technology, China.
July 8
- Comparing our method using the "Locality Sensitive Hashing" to the method using "Hilbert Space filling Curve"
- Initial results can be found at this address students/dameh/reports/dameh/summer11/H-CurveVSlsh.pdf
Jun 17
- Report should be available here
Jun 03
- Proposed new metric to be used instead of the FN ratio (more powerful to compare different families and to pick the optimal value of k)
- More analysis on the spectral hashing
- Writing up the report with the results to be sent soon
May 19
- Spectral Hashing is Data dependent hashing and more powerful than the random projection, see this comparison examples in the 2D.
- Random Projection Hashing
- Method:
- Take random projections of data.
- Quantize the projections to hash value’s bits.
- Disadvantage:
- Data Independent Hashing.
- Advantage:
- Strong theoretical guarantees.
- Simple and fast to compute (linear time).
- Types:
- Euclidean distance: two parameters to vary that have the opposite affect (number of random vectors k and quantization width w).
- L1 distance: uncontrolled, as k increase error increase.
- Cosine distance: more controlled than L1 as k increase.
- Method:
- Data-Dependent Hashing
- Advantage:
- Data-distribution dependent; powerful than Random Projection.
- Disadvantage:
- More complex to compute than Random Projection.
- Types:
- Spectral hashing:
- Compute PCA.
- Calculate the k smallest single-dimension analytical EigenFunctions.
- Threshold the analytical EigenFnction of zero to obtain hash value's bits.
- KernelLSH:
- The key idea is first embed the data into a lower space using the kernel G and then do LSH in the lower dimensional space.
- Spectral hashing:
- Advantage:
May 5
- Recap
- Problem: kernel methods' gram matrix is O(N^2), which is computationally and memory usage infeasible for large-scale datasets
- Previous Work:
- Low rank matrix approximation:
- using EigenDecomposition: Optimal solution
- Problem: requires O(N^3) computation to construct the low rank matrix
- Solution: use Nytsrom theorem to approximate the EigenDecomposition of the matrix which requires O(m^2N) where m << N.
- Problem: In this methods in despite of reducing the computation complexity still we need O(N^2) to store the matrix, so the next methods reduce both computation and memory.
- using EigenDecomposition: Optimal solution
- Efficient implementation for computing the kernel matrix:
- Using Spatial Indexing:
- using Z-order, H-order, kd-tress and other spatial indexing techniques to order the points in the space and then using sliding window where the kernel function computed only between points in that window.
- Problems: requires O(NlogN) as preprocessing to order the points, fails for very high dimensional Data and hard to parallel
- Solution: We proposed a method that works fine for large-scale and highly dimensional data that overcomes all pervious works drawbacks, easy to parallel, requires linear time preprocessing and reduce both computation and memory usage down form quadratic to subquadratic:
- Using Spatial Hashing or Locality sensitive hashing: where the kernel function and the cluster methods works on the buckets individually.
- LSH families :
- Random projection:
- Hamming distance, Euclidean Distance, Cosine Distance
- Problem for random projection, assume data uniformly distrubuted and it doesn’t take in account if data are in a kernel space, Solution:
- Spectral Hashing: overcomes previous families problems
- Kernelized LSH : Problem: more complex to compute
- Random projection:
- LSH families :
- Using Spatial Indexing:
- Low rank matrix approximation:
- In progress: Comparing different LSH families
Spring 2011 (TA/RA)
- Courses:
- CMPT 885: Special Topics in Computer Architecture
- CMPT 886: Special Topics in Operating Systems
Fall 2010 (FAS-GF/RA)
- Courses:
- CMPT 705: Design/Analysis Algorithms
- CMPT 820: Multimedia Systems
- Worked on Gram matrix approximation, High Performance/large scale application case studies
Summer 2010 (TA)
- Worked on large Scale Kernel methods
Spring 2010 (RA)
- Courses:
- CMPT 768: Sound Synthesis
- Worked on Locality sensitive hashing families and their applications and implementations