Revision as of 00:02, 22 March 2011

Spring 2011 (TA)

Courses:
- None.

Mar 21st

Report: here
Corrected bugs in the original Mahout code. The original source code has a "java.lang.RuntimeException: java.lang.ClassNotFoundException" bug, this error appears in several places in the code.
Implementing evaluation measurement.

Mar 14th

Set up Mahout (together with Hadoop) environment and run tests
Implementing the proposed idea on Mahout
Found out a nice optimizing opportunity, named as, "Time Multiplexing". The optimization is based on the observation: In common cases, the Spectral Clustering step should execute only after the preprocessing step. However, we observe that there can be intersection (time multiplexing) between LSH step and Spectral Clustering step.

Mar 7th

Trying to add extra code in the library of Mahout to implement the proposed idea.
The affected classes: AffinityMatrixInputJob (to perform gram matrix calculation), DistributedRowMatrix (to construct the diagonal matrix), DistributedLanczosSolver (to perform eigen-decomposition)
Merging the previous reports and making necessary changes

Feb 28

Hadoop's two important design features, which have great influence over the experiment's performance are examined: data placement policy and task scheduling policy.
It is noted that, for map tasks, the scheduler uses a locality optimization technique. After selecting a job, the scheduler picks the map task in the job with data closest to the slave, on the same node if possible, otherwise on the same rack, or finally on a remote rack. For reduce tasks, the jobtracker just takes the next in the reduce tasks' list and assign it to the tasktracker.

Feb 14

Formalized the proposed design. Analyzing why a specific family of function are used, the speedup attained, the evaluation measurement.

Feb 7

Studied LSH and explored different LSH families according to the distance measurement they use.
Based on how LSH methods work, analysed how these LSH methods can be parallelized in distributed environment.

Jan 31

Examined how the main class clustering algorithms can be parallelized, respectively.
Explained spectral clustering from theoretical aspect, from graph cut viewpoint.
Based on the understanding of main clustering algorithms, proposed optimizing method for spectral clustering to deal with large data set.
The proposed method makes use of LSH to do pre-precessing.

Jan 17

Understanding Spectral clustering and distributed implementation.
Mahout experimenting.

Jan 10

Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.
Mahout experimenting.

Fall 2010 (FELLOWSHIP)

Courses:
- CMPT 771: Internet Architecture and Protcols
- CMPT 741: Data Mining

Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.

Summer 2010 (RA)

Courses:
- None

Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.

Spring 2010 (TA+RA)

Courses:
- CMPT 886: Special Topics in Operating Systems and Computer Architecture

Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.

Fall 2009 (TA)

Courses:
- CMPT 705: Design and Analysis of Algorithms
- CMPT 726: Machine Learning

Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.

@@ Line 2: / Line 2: @@
 * Courses:
 **None.
+=== Mar 21st ===
+* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Mar_21st.pdf here]
+* Corrected bugs in the original Mahout code. The original source code has a "java.lang.RuntimeException: java.lang.ClassNotFoundException" bug, this error appears in several places in the code.
+* Implementing evaluation measurement.
 === Mar 14th ===
-* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Mar_14th.pdf here]
 * Set up Mahout (together with Hadoop) environment and run tests
 * Implementing the proposed idea on Mahout

Difference between revisions of "Private:progress-gao"

Revision as of 00:02, 22 March 2011

Contents

Spring 2011 (TA)

Mar 21st

Mar 14th

Mar 7th

Feb 28

Feb 14

Feb 7

Jan 31

Jan 17

Jan 10

Fall 2010 (FELLOWSHIP)

Summer 2010 (RA)

Spring 2010 (TA+RA)

Fall 2009 (TA)