<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-CA">
	<id>https://nmsl.cs.sfu.ca/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Feig</id>
	<title>NMSL - User contributions [en-ca]</title>
	<link rel="self" type="application/atom+xml" href="https://nmsl.cs.sfu.ca/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Feig"/>
	<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php/Special:Contributions/Feig"/>
	<updated>2026-06-05T13:42:44Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.35.1</generator>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=News&amp;diff=4746</id>
		<title>News</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=News&amp;diff=4746"/>
		<updated>2012-01-05T06:53:34Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 2011 ==&lt;br /&gt;
&lt;br /&gt;
* December 2011: Fei Gao graduates (MSc) from NSL. Congratulations!&lt;br /&gt;
&lt;br /&gt;
* September 2011: Cameron Harvey graduates (MSc) from NSL and joins Amazon.com in Seattle, WA. Congratulations! Abdullah and Masum join NSL as MSc students. Welcome aboard. &lt;br /&gt;
&lt;br /&gt;
* June 2011: Dr. Hefeeda is awarded one of the prestigious [http://www.nserc-crsng.gc.ca/Professors-Professeurs/Grants-Subs/DGAS-SGSA_eng.asp NSERC Discovery Accelerator Supplements (DAS)], which are granted to 123 researchers in all Science and Engineering disciplines in Canada  (selected from more than 3,400 applicants in 2011). &lt;br /&gt;
&lt;br /&gt;
* June 2011: Hamed Neshat graduates (MSc) from NSL and joins Microsoft in Seattle, WA. Congratulations!&lt;br /&gt;
&lt;br /&gt;
* April 2011: Farid Tabrizi  graduates (MSc) from NSL. Congratulations!&lt;br /&gt;
&lt;br /&gt;
* April 2011: Mohammad Alkurbi graduates (MSc-Project) from NSL. Congratulations!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== 2010 == &lt;br /&gt;
&lt;br /&gt;
* December 2010: Shabnam Mirshokraie graduates (MSc) from NSL and joins UBC as a PhD student. Congratulations!&lt;br /&gt;
&lt;br /&gt;
* December 2010: Somsubhra Sharangi  graduates (MSc) from NSL. Congratulations! &lt;br /&gt;
&lt;br /&gt;
* October 2010: Our [http://www.nserc-crsng.gc.ca/professors-professeurs/rpp-pp/spg-sps_eng.asp NSERC Strategic Project Grant] application on Mobile Gaming and 3D Video Systems: Next Generation Services for Wireless Networks is accepted. &lt;br /&gt;
&lt;br /&gt;
* October 2010:  Dr. Hefeeda becomes an Associate Editor of the premier [http://tomccap.acm.org/ ACM Transactions on Multimedia Computing, Communications and Applications (TOMCCAP)] journal. &lt;br /&gt;
&lt;br /&gt;
* September 2010:  Dr. Hefeeda becomes program co-chair of the [http://www.icme2011.org/ IEEE International Conference on Multimedia and Expo (ICME 2011)] conference. &lt;br /&gt;
&lt;br /&gt;
* August 2010: Yuanbin Shen  graduates (MSc) from NSL. Congratulations!&lt;br /&gt;
&lt;br /&gt;
* '''July 2010: Dr. Hefeeda is promoted to the associate professor rank and granted tenure. '''&lt;br /&gt;
&lt;br /&gt;
* June 2010: Cong Ly graduates (MSc) from NSL and joins Microsoft in Vancouver, BC. Congratulations!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== 2009 == &lt;br /&gt;
&lt;br /&gt;
* December 2009: Yi Liu graduates (MSc) from NSL and joins Incognito Software Inc. in Vancouver, BC. Congratulations!&lt;br /&gt;
&lt;br /&gt;
* December 2009: Greg Kowalski graduates (MSc) from NSL. Congratulations!&lt;br /&gt;
&lt;br /&gt;
* December 2009: Dr. Hefeeda becomes the vice chair of the Distributed Multimedia track in the [http://www.ftrg.org/emc2010/index.php Embedded and Multimedia Computing (EMC-10) conference]. &lt;br /&gt;
&lt;br /&gt;
* '''November 2009: Cheng-Hsin Hsu graduates (PhD) from NSL and joins the Deutsche Telekom R&amp;amp;D Lab in California as a senior researcher. Congratulations Dr. Hsu -- the first PhD graduate from our group. ''' &lt;br /&gt;
&lt;br /&gt;
* November 2009: Dr. Hefeeda becomes a '''Senior Member of the IEEE'''. &lt;br /&gt;
&lt;br /&gt;
* November 2009: Dr. Hefeeda becomes the Preservation Editor of the ACM Special Interest Group on Multimedia (SIGMM) Web Magazine.&lt;br /&gt;
&lt;br /&gt;
*  November 2009:  Farid and Fei join NSL as MSc students. Welcome aboard. &lt;br /&gt;
&lt;br /&gt;
* October 2009: Dr. Hefeeda becomes the program chair of [http://nsl.cs.sfu.ca/nossdav10/ NOSSDAV 2010].&lt;br /&gt;
&lt;br /&gt;
*  September 2009:  Cameron and Hamed join NSL as MSc students. Welcome aboard. &lt;br /&gt;
&lt;br /&gt;
*  September 2009: Kianoosh Mokhtarian graduates (MSc) from NSL and joins Mobidia in Richmond, BC. Congratulations!&lt;br /&gt;
&lt;br /&gt;
* July 2009:  Our mobile TV research is also featured in the July issue of the [http://technews.acm.org/archives.cfm?fo=2009-07-jul/jul-01-2009.html#418148 ACM Tech News.]&lt;br /&gt;
&lt;br /&gt;
* June 2009: Our mobile TV research is featured on CTV British Columbia News: [http://www.ctvbc.ctv.ca/servlet/an/local/CTVNews/20090625/bc_mobile_tv_090625/ see article] or [[media:ctv09.pdf | local PDF.]]&lt;br /&gt;
&lt;br /&gt;
* May 2009: Cheng-Hsin Hsu (NSL PhD student)  is featured in [http://www.sfu.ca/sfunews/_pvw9C438741/news/story_05290909.shtml SFU News.] &lt;br /&gt;
&lt;br /&gt;
* January 2009: Ahmed Hamza joins NSL as a PhD student. Mohammad Alkurbi joins NSL as an MSc-Project student. Welcome aboard.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== 2008 == &lt;br /&gt;
&lt;br /&gt;
* December 2008: Our paper on [http://www.cs.sfu.ca/~mhefeeda/Papers/innovations08.pdf Energy Optimization in Mobile TV Broadcast Networks] wins the '''Best Paper Award''' in the IEEE Innovations 2008 Conference. &lt;br /&gt;
&lt;br /&gt;
* October 2008: Our Mobile TV Testbed wins the [http://www.cs.sfu.ca/~mhefeeda/Papers/mm08DemoAward.pdf Best Technical Demonstration Award] in ACM Multimedia 2008. &lt;br /&gt;
&lt;br /&gt;
* September 2008: Cong Ly, Yuanbin Shen, and Shabnam Mirshokraie join NSL as MSc students. Welcome aboard. &lt;br /&gt;
&lt;br /&gt;
* ''' March 2008:  This Wiki system has been launched.'''&lt;br /&gt;
&lt;br /&gt;
* February 2008: Yi Liu joins NSL as an MSc student. Welcome aboard, Yi!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== 2007 == &lt;br /&gt;
&lt;br /&gt;
* September 2007: Kianoosh Mokhtarian joins NSL as an MSc student. Welcome aboard, Kianoosh!&lt;br /&gt;
&lt;br /&gt;
* December 2007: Behrooz Noorizadeh graduates (MSc) from NSL and joins Eyeball Networks in Vancouver.&lt;br /&gt;
&lt;br /&gt;
* July 2007: Hossein Ahmadi graduates (MSc) from NSL and joins UIUC as a PhD student.&lt;br /&gt;
&lt;br /&gt;
* April 2007: Majid Bagheri graduates (MSc) from NSL and takes a job in Iran.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== 2006 == &lt;br /&gt;
&lt;br /&gt;
* '''November 2006: Osama Saleh graduates (MSc) from NSL and joins Eyeball Networks in Vancouver. Congratulations Osama -- the first MSc graduate from our group. '''&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4690</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4690"/>
		<updated>2011-11-16T08:01:11Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Fall 2011 (RA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Nov 16th ===&lt;br /&gt;
* Leveraged an Amazon Elastic MapReduce cluster to run the proposed method on the Wikipedia dataset, and obtained a large reduction in computing time compared with the 5-node cluster.&lt;br /&gt;
* Ran the proposed method, Parallel Spectral Clustering, and Spectral Clustering on the Wikipedia datasets. The results show that the proposed method performs better than the other two methods.&lt;br /&gt;
&lt;br /&gt;
=== Oct 17th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Oct_17th/reportTemplate.pdf here]&lt;br /&gt;
* Experimented with tuning the parameter &amp;quot;how many terms to keep&amp;quot;.&lt;br /&gt;
* Ran the proposed algorithm on a large dataset with good performance (91.38%).&lt;br /&gt;
&lt;br /&gt;
=== Oct 4th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Oct_4th/reportTemplate.pdf here]&lt;br /&gt;
* Finished stop-word removal and stemming.&lt;br /&gt;
* In the similarity matrix computation, used the first F items of each document's content to reduce the feature set.&lt;br /&gt;
* Spectral clustering accuracy is 96.6%; k-means accuracy is 62.22%.&lt;br /&gt;
* Will run the proposed algorithm on the dataset.&lt;br /&gt;
&lt;br /&gt;
=== Sep 20th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Sep_20th/reportTemplate.pdf here]&lt;br /&gt;
* Successfully crawled 3,550,567 web pages from Wikipedia, forming 739,144 categories.&lt;br /&gt;
* Processed the raw HTML files and have them ready for testing.&lt;br /&gt;
* Now working on stop-word removal and stemming.&lt;br /&gt;
* In the next 4-5 days, will have the dataset fully processed and ready. &lt;br /&gt;
* Then, one day to compute tf-idf on the dataset (since that part of the code is done). &lt;br /&gt;
* Following that, will first test the dataset with k-means, and then try spectral clustering on the data.&lt;br /&gt;
&lt;br /&gt;
= Summer 2011 (RA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== May 30th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_May_30th/reportTemplate.pdf here]&lt;br /&gt;
* Added the Nyström method to the accuracy comparison between the proposed method and the PSC method.&lt;br /&gt;
* Continued to run the experiment on the large dataset. Will continue the experiment and collect more data.&lt;br /&gt;
* Analysed the experiment results in the experiment results part.&lt;br /&gt;
* Included more sources to show the legitimacy of the evaluation metrics.&lt;br /&gt;
* Refined the structure of figures.&lt;br /&gt;
&lt;br /&gt;
=== May 16th ===&lt;br /&gt;
* Got the Parallel SC method (implemented using MPICH2) running on the NSL cluster and obtained results for comparison. The hardware platform remains unchanged (NSL cluster).&lt;br /&gt;
* Examined and ran a few experiments with the Nyström Extension Spectral Clustering method (in Matlab) and the &amp;quot;Fast approximate spectral clustering&amp;quot; method (in R). However, both methods run on a single machine, which partly defeats the purpose of the comparison (because the platform changes). Moreover, Matlab's memory limit causes the experiment to fail on the 2^12 dataset.&lt;br /&gt;
* Areas affected by the update: experiment setup and evaluation parts.&lt;br /&gt;
&lt;br /&gt;
=== May 5th ===&lt;br /&gt;
* Found two recent works (with implementations) for comparison, and set up the cluster to run them.&lt;br /&gt;
* Other similar papers on the same topic have been added to the related work part.&lt;br /&gt;
* Further clarified the results (as well as the methodology of our proposed method).&lt;br /&gt;
&lt;br /&gt;
= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Apr 19th ===&lt;br /&gt;
* Applied the proposed scheme to larger datasets and got a HUGE reduction in computing time (a ratio of about 1/18) with guaranteed error; more can be found in the report.&lt;br /&gt;
* Read through related publications, compared our method with existing ones, and updated the related work part of the paper.&lt;br /&gt;
* TA duties: marking of assignment five, final exam supervision and marking.&lt;br /&gt;
&lt;br /&gt;
=== Apr 11th ===&lt;br /&gt;
* Corrected the implementation of the Dunn Index (which is used for evaluation).&lt;br /&gt;
* Solved the memory shortage in gram matrix generation (needed for comparison). To compare the proposed method with the original method, the raw gram matrix must be computed beforehand. However, for datasets of size 2^13 or larger, memory runs out. The problem is solved by using a buffer (instead of keeping all the matrix entries in memory).&lt;br /&gt;
* This fix will enable us to extend the comparison to larger datasets.&lt;br /&gt;
&lt;br /&gt;
=== Apr 3rd ===&lt;br /&gt;
* Implemented the whole proposed design.&lt;br /&gt;
* Implemented the evaluation measurement.&lt;br /&gt;
* Got results for datasets of size 2^6 to 2^13; more results will be collected.&lt;br /&gt;
&lt;br /&gt;
=== Mar 21st ===&lt;br /&gt;
* Corrected bugs in the original Mahout code. The original source code has a &amp;quot;java.lang.RuntimeException: java.lang.ClassNotFoundException&amp;quot; bug; this error appears in several places in the code. &lt;br /&gt;
* Implementing the evaluation measurement.&lt;br /&gt;
&lt;br /&gt;
=== Mar 14th ===&lt;br /&gt;
* Set up the Mahout (together with Hadoop) environment and ran tests.&lt;br /&gt;
* Implementing the proposed idea on Mahout.&lt;br /&gt;
* Found a nice optimization opportunity, named &amp;quot;Time Multiplexing&amp;quot;. The optimization is based on the following observation: normally, the Spectral Clustering step executes only after the preprocessing step has finished. However, we observe that the LSH step and the Spectral Clustering step can overlap in time (time multiplexing).&lt;br /&gt;
&lt;br /&gt;
=== Mar 7th ===&lt;br /&gt;
* Trying to add extra code to the Mahout library to implement the proposed idea.&lt;br /&gt;
* The affected classes: AffinityMatrixInputJob (to perform gram matrix calculation), DistributedRowMatrix (to construct the diagonal matrix), DistributedLanczosSolver (to perform eigen-decomposition).&lt;br /&gt;
* Merging the previous reports and making necessary changes.&lt;br /&gt;
&lt;br /&gt;
=== Feb 28 ===&lt;br /&gt;
* Examined two important design features of Hadoop that have great influence on the experiment's performance: the data placement policy and the task scheduling policy.&lt;br /&gt;
* For map tasks, the scheduler uses a locality optimization technique. After selecting a job, the scheduler picks the map task in the job whose data is closest to the slave: on the same node if possible, otherwise on the same rack, or finally on a remote rack. For reduce tasks, the jobtracker simply takes the next task in the list of reduce tasks and assigns it to the tasktracker.&lt;br /&gt;
&lt;br /&gt;
=== Feb 14 ===&lt;br /&gt;
* Formalized the proposed design. Analyzing why a specific family of functions is used, the speedup attained, and the evaluation measurement.&lt;br /&gt;
&lt;br /&gt;
=== Feb 7 ===&lt;br /&gt;
* Studied LSH and explored different LSH families according to the distance measure they use.&lt;br /&gt;
* Based on how LSH methods work, analysed how they can be parallelized in a distributed environment.&lt;br /&gt;
&lt;br /&gt;
=== Jan 31 ===&lt;br /&gt;
* Examined how each of the main classes of clustering algorithms can be parallelized.&lt;br /&gt;
* Explained spectral clustering theoretically, from the graph cut viewpoint. &lt;br /&gt;
* Based on the understanding of the main clustering algorithms, proposed an optimization method for spectral clustering to deal with large datasets.&lt;br /&gt;
* The proposed method uses LSH for preprocessing.&lt;br /&gt;
&lt;br /&gt;
=== Jan 17 ===&lt;br /&gt;
* Understanding Spectral clustering and distributed implementation.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Surveyed the main clustering algorithms and their distributed map-reduce implementations.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on approximation of gram matrices using Locality Sensitive Hashing on a cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on band approximation of gram matrices (large high-dimensional datasets) using the Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on band approximation of gram matrices (large high-dimensional datasets) using the Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4689</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4689"/>
		<updated>2011-11-16T08:00:51Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Fall 2011 (RA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Nov 10th ===&lt;br /&gt;
* Leveraged an Amazon Elastic MapReduce cluster to run the proposed method on the Wikipedia dataset, and obtained a large reduction in computing time compared with the 5-node cluster.&lt;br /&gt;
* Ran the proposed method, Parallel Spectral Clustering, and Spectral Clustering on the Wikipedia datasets. The results show that the proposed method performs better than the other two methods.&lt;br /&gt;
&lt;br /&gt;
=== Oct 17th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Oct_17th/reportTemplate.pdf here]&lt;br /&gt;
* Experimented with tuning the parameter &amp;quot;how many terms to keep&amp;quot;.&lt;br /&gt;
* Ran the proposed algorithm on a large dataset with good performance (91.38%).&lt;br /&gt;
&lt;br /&gt;
=== Oct 4th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Oct_4th/reportTemplate.pdf here]&lt;br /&gt;
* Finished stop-word removal and stemming.&lt;br /&gt;
* In the similarity matrix computation, used the first F items of each document's content to reduce the feature set.&lt;br /&gt;
* Spectral clustering accuracy is 96.6%; k-means accuracy is 62.22%.&lt;br /&gt;
* Will run the proposed algorithm on the dataset.&lt;br /&gt;
&lt;br /&gt;
=== Sep 20th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Sep_20th/reportTemplate.pdf here]&lt;br /&gt;
* Successfully crawled 3,550,567 web pages from Wikipedia, forming 739,144 categories.&lt;br /&gt;
* Processed the raw HTML files and have them ready for testing.&lt;br /&gt;
* Now working on stop-word removal and stemming.&lt;br /&gt;
* In the next 4-5 days, will have the dataset fully processed and ready. &lt;br /&gt;
* Then, one day to compute tf-idf on the dataset (since that part of the code is done). &lt;br /&gt;
* Following that, will first test the dataset with k-means, and then try spectral clustering on the data.&lt;br /&gt;
&lt;br /&gt;
= Summer 2011 (RA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== May 30th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_May_30th/reportTemplate.pdf here]&lt;br /&gt;
* Added the Nyström method to the accuracy comparison between the proposed method and the PSC method.&lt;br /&gt;
* Continued to run the experiment on the large dataset. Will continue the experiment and collect more data.&lt;br /&gt;
* Analysed the experiment results in the experiment results part.&lt;br /&gt;
* Included more sources to show the legitimacy of the evaluation metrics.&lt;br /&gt;
* Refined the structure of figures.&lt;br /&gt;
&lt;br /&gt;
=== May 16th ===&lt;br /&gt;
* Got the Parallel SC method (implemented using MPICH2) running on the NSL cluster and obtained results for comparison. The hardware platform remains unchanged (NSL cluster).&lt;br /&gt;
* Examined and ran a few experiments with the Nyström Extension Spectral Clustering method (in Matlab) and the &amp;quot;Fast approximate spectral clustering&amp;quot; method (in R). However, both methods run on a single machine, which partly defeats the purpose of the comparison (because the platform changes). Moreover, Matlab's memory limit causes the experiment to fail on the 2^12 dataset.&lt;br /&gt;
* Areas affected by the update: experiment setup and evaluation parts.&lt;br /&gt;
&lt;br /&gt;
=== May 5th ===&lt;br /&gt;
* Found two recent works (with implementations) for comparison, and set up the cluster to run them.&lt;br /&gt;
* Other similar papers on the same topic have been added to the related work part.&lt;br /&gt;
* Further clarified the results (as well as the methodology of our proposed method).&lt;br /&gt;
&lt;br /&gt;
= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Apr 19th ===&lt;br /&gt;
* Applied the proposed scheme to larger datasets and got a HUGE reduction in computing time (a ratio of about 1/18) with guaranteed error; more can be found in the report.&lt;br /&gt;
* Read through related publications, compared our method with existing ones, and updated the related work part of the paper.&lt;br /&gt;
* TA duties: marking of assignment five, final exam supervision and marking.&lt;br /&gt;
&lt;br /&gt;
=== Apr 11th ===&lt;br /&gt;
* Corrected the implementation of the Dunn Index (which is used for evaluation).&lt;br /&gt;
* Solved the memory shortage in gram matrix generation (needed for comparison). To compare the proposed method with the original method, the raw gram matrix must be computed beforehand. However, for datasets of size 2^13 or larger, memory runs out. The problem is solved by using a buffer (instead of keeping all the matrix entries in memory).&lt;br /&gt;
* This fix will enable us to extend the comparison to larger datasets.&lt;br /&gt;
&lt;br /&gt;
=== Apr 3rd ===&lt;br /&gt;
* Implemented the whole proposed design.&lt;br /&gt;
* Implemented the evaluation measurement.&lt;br /&gt;
* Got results for datasets of size 2^6 to 2^13; more results will be collected.&lt;br /&gt;
&lt;br /&gt;
=== Mar 21st ===&lt;br /&gt;
* Corrected bugs in the original Mahout code. The original source code has a &amp;quot;java.lang.RuntimeException: java.lang.ClassNotFoundException&amp;quot; bug; this error appears in several places in the code. &lt;br /&gt;
* Implementing the evaluation measurement.&lt;br /&gt;
&lt;br /&gt;
=== Mar 14th ===&lt;br /&gt;
* Set up the Mahout (together with Hadoop) environment and ran tests.&lt;br /&gt;
* Implementing the proposed idea on Mahout.&lt;br /&gt;
* Found a nice optimization opportunity, named &amp;quot;Time Multiplexing&amp;quot;. The optimization is based on the following observation: normally, the Spectral Clustering step executes only after the preprocessing step has finished. However, we observe that the LSH step and the Spectral Clustering step can overlap in time (time multiplexing).&lt;br /&gt;
&lt;br /&gt;
=== Mar 7th ===&lt;br /&gt;
* Trying to add extra code to the Mahout library to implement the proposed idea.&lt;br /&gt;
* The affected classes: AffinityMatrixInputJob (to perform gram matrix calculation), DistributedRowMatrix (to construct the diagonal matrix), DistributedLanczosSolver (to perform eigen-decomposition).&lt;br /&gt;
* Merging the previous reports and making necessary changes.&lt;br /&gt;
&lt;br /&gt;
=== Feb 28 ===&lt;br /&gt;
* Examined two important design features of Hadoop that have great influence on the experiment's performance: the data placement policy and the task scheduling policy.&lt;br /&gt;
* For map tasks, the scheduler uses a locality optimization technique. After selecting a job, the scheduler picks the map task in the job whose data is closest to the slave: on the same node if possible, otherwise on the same rack, or finally on a remote rack. For reduce tasks, the jobtracker simply takes the next task in the list of reduce tasks and assigns it to the tasktracker.&lt;br /&gt;
&lt;br /&gt;
=== Feb 14 ===&lt;br /&gt;
* Formalized the proposed design. Analyzing why a specific family of functions is used, the speedup attained, and the evaluation measurement.&lt;br /&gt;
&lt;br /&gt;
=== Feb 7 ===&lt;br /&gt;
* Studied LSH and explored different LSH families according to the distance measure they use.&lt;br /&gt;
* Based on how LSH methods work, analysed how they can be parallelized in a distributed environment.&lt;br /&gt;
&lt;br /&gt;
=== Jan 31 ===&lt;br /&gt;
* Examined how each of the main classes of clustering algorithms can be parallelized.&lt;br /&gt;
* Explained spectral clustering theoretically, from the graph cut viewpoint. &lt;br /&gt;
* Based on the understanding of the main clustering algorithms, proposed an optimization method for spectral clustering to deal with large datasets.&lt;br /&gt;
* The proposed method uses LSH for preprocessing.&lt;br /&gt;
&lt;br /&gt;
=== Jan 17 ===&lt;br /&gt;
* Understanding Spectral clustering and distributed implementation.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Surveyed the main clustering algorithms and their distributed map-reduce implementations.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on approximation of gram matrices using Locality Sensitive Hashing on a cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on band approximation of gram matrices (large high-dimensional datasets) using the Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on band approximation of gram matrices (large high-dimensional datasets) using the Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4687</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4687"/>
		<updated>2011-11-16T06:09:26Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Fall 2011 (RA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Nov 10th ===&lt;br /&gt;
* Leveraged an Amazon Elastic MapReduce cluster to run the proposed method on the Wikipedia dataset, and obtained a large reduction in computing time compared with the 5-node cluster.&lt;br /&gt;
* Ran the proposed method, distributed Spectral Clustering, and Spectral Clustering on the Wikipedia datasets. The results show that the proposed method performs better than the other two methods.&lt;br /&gt;
&lt;br /&gt;
=== Oct 17th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Oct_17th/reportTemplate.pdf here]&lt;br /&gt;
* Experimented with tuning the parameter &amp;quot;how many terms to keep&amp;quot;.&lt;br /&gt;
* Ran the proposed algorithm on a large dataset with good performance (91.38%).&lt;br /&gt;
&lt;br /&gt;
=== Oct 4th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Oct_4th/reportTemplate.pdf here]&lt;br /&gt;
* Finished stop-word removal and stemming.&lt;br /&gt;
* In the similarity matrix computation, used the first F items of each document's content to reduce the feature set.&lt;br /&gt;
* Spectral clustering accuracy is 96.6%; k-means accuracy is 62.22%.&lt;br /&gt;
* Will run the proposed algorithm on the dataset.&lt;br /&gt;
&lt;br /&gt;
=== Sep 20th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Sep_20th/reportTemplate.pdf here]&lt;br /&gt;
* Successfully crawled 3,550,567 web pages from Wikipedia, forming 739,144 categories.&lt;br /&gt;
* Processed the raw HTML files and have them ready for testing.&lt;br /&gt;
* Now working on stop-word removal and stemming.&lt;br /&gt;
* In the next 4-5 days, will have the dataset fully processed and ready. &lt;br /&gt;
* Then, one day to compute tf-idf on the dataset (since that part of the code is done). &lt;br /&gt;
* Following that, will first test the dataset with k-means, and then try spectral clustering on the data.&lt;br /&gt;
&lt;br /&gt;
= Summer 2011 (RA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== May 30th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_May_30th/reportTemplate.pdf here]&lt;br /&gt;
* Added the Nyström method to the accuracy comparison between the proposed method and the PSC method.&lt;br /&gt;
* Continued to run the experiment on the large dataset. Will continue the experiment and collect more data.&lt;br /&gt;
* Analysed the experiment results in the experiment results part.&lt;br /&gt;
* Included more sources to show the legitimacy of the evaluation metrics.&lt;br /&gt;
* Refined the structure of figures.&lt;br /&gt;
&lt;br /&gt;
=== May 16th ===&lt;br /&gt;
* Got the Parallel SC method (implemented using MPICH2) running on the NSL cluster and obtained results for comparison. The hardware platform remains unchanged (NSL cluster).&lt;br /&gt;
* Examined and ran a few experiments with the Nyström Extension Spectral Clustering method (in Matlab) and the &amp;quot;Fast approximate spectral clustering&amp;quot; method (in R). However, both methods run on a single machine, which partly defeats the purpose of the comparison (because the platform changes). Moreover, Matlab's memory limit causes the experiment to fail on the 2^12 dataset.&lt;br /&gt;
* Areas affected by the update: experiment setup and evaluation parts.&lt;br /&gt;
&lt;br /&gt;
=== May 5th ===&lt;br /&gt;
* Found two recent works (with implementations) for comparison, and set up the cluster to run them.&lt;br /&gt;
* Other similar papers on the same topic have been added to the related work part.&lt;br /&gt;
* Further clarified the results (as well as the methodology of our proposed method).&lt;br /&gt;
&lt;br /&gt;
= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Apr 19th ===&lt;br /&gt;
* Applied larger datasets to the proposed scheme and obtained a large reduction in computing time (a ratio of about 1/18) with guaranteed error; more details can be found in the report.&lt;br /&gt;
* Read through related publications, compared our method with the already existing ones, and updated the related work part in the paper.&lt;br /&gt;
* TA course assignment five marking, final exam supervision and marking&lt;br /&gt;
&lt;br /&gt;
=== Apr 11th ===&lt;br /&gt;
* Corrected the implementation of Dunn Index (which is used for evaluation purpose).&lt;br /&gt;
* Solved the memory shortage in Gram matrix generation (for comparison purposes). To compare the proposed method with the original method, the raw Gram matrix must be computed beforehand; however, for datasets of size 2^13 or larger, it does not fit in memory. The problem is solved by using a buffer instead of keeping all the matrix entries in memory.&lt;br /&gt;
* The above mentioned fix will enable us to extend the comparison to larger datasets.&lt;br /&gt;
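A minimal sketch of the buffering idea from the Apr 11th notes, assuming a Gaussian similarity function and a flat binary file as output. The names gaussian_similarity, write_gram_matrix, read_gram_row and rows_per_buffer are hypothetical and are not taken from the actual implementation.&lt;br /&gt;

```python
import math
import os
import struct
import tempfile


def gaussian_similarity(x, y, sigma=1.0):
    """Gaussian (RBF) similarity between two equal-length points (assumed kernel)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def write_gram_matrix(points, path, rows_per_buffer=2, sigma=1.0):
    """Write the n-by-n Gram matrix to `path` as row-major float64 values.

    Only `rows_per_buffer` rows are held in memory at any time.
    """
    n = len(points)
    with open(path, "wb") as out:
        for start in range(0, n, rows_per_buffer):
            block = points[start:start + rows_per_buffer]
            buffer = []
            for x in block:
                buffer.extend(gaussian_similarity(x, y, sigma) for y in points)
            # Flush the buffered rows to disk, then let the buffer go.
            out.write(struct.pack("=%dd" % len(buffer), *buffer))


def read_gram_row(path, n, i):
    """Read row i of an n-by-n matrix written by write_gram_matrix."""
    with open(path, "rb") as f:
        f.seek(i * n * 8)  # each float64 entry occupies 8 bytes
        return list(struct.unpack("=%dd" % n, f.read(n * 8)))


if __name__ == "__main__":
    points = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
    path = os.path.join(tempfile.mkdtemp(), "gram.bin")
    write_gram_matrix(points, path, rows_per_buffer=2)
    print(read_gram_row(path, len(points), 0))
```

With this layout, peak memory is proportional to rows_per_buffer times n rather than n squared, and a later comparison step can read back one row at a time.&lt;br /&gt;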
&lt;br /&gt;
=== Apr 3rd ===&lt;br /&gt;
* Implemented the whole proposed design.&lt;br /&gt;
* Implemented evaluation measurement.&lt;br /&gt;
* Got the result of running dataset with size from 2^6 to 2^13, more results will be collected.&lt;br /&gt;
&lt;br /&gt;
=== Mar 21st ===&lt;br /&gt;
* Corrected bugs in the original Mahout code. The original source code has a &amp;quot;java.lang.RuntimeException: java.lang.ClassNotFoundException&amp;quot; bug; this error appears in several places in the code.&lt;br /&gt;
* Implementing evaluation measurement.&lt;br /&gt;
&lt;br /&gt;
=== Mar 14th ===&lt;br /&gt;
* Set up Mahout (together with Hadoop) environment and run tests&lt;br /&gt;
* Implementing the proposed idea on Mahout&lt;br /&gt;
* Found a promising optimization opportunity, named &amp;quot;Time Multiplexing&amp;quot;. It is based on the following observation: normally, the Spectral Clustering step executes only after the preprocessing step has finished. However, we observe that the LSH step and the Spectral Clustering step can overlap in time (time multiplexing).&lt;br /&gt;
&lt;br /&gt;
=== Mar 7th ===&lt;br /&gt;
* Trying to add extra code in the library of Mahout to implement the proposed idea.&lt;br /&gt;
* The affected classes: AffinityMatrixInputJob (to perform gram matrix calculation), DistributedRowMatrix (to construct the diagonal matrix), DistributedLanczosSolver (to perform eigen-decomposition)&lt;br /&gt;
* Merging the previous reports and making necessary changes&lt;br /&gt;
&lt;br /&gt;
=== Feb 28 ===&lt;br /&gt;
* Examined two important Hadoop design features that greatly influence the experiment's performance: the data placement policy and the task scheduling policy.&lt;br /&gt;
* It is noted that, for map tasks, the scheduler uses a locality optimization technique. After selecting a job, the scheduler picks the map task in the job whose data is closest to the slave: on the same node if possible, otherwise on the same rack, or finally on a remote rack. For reduce tasks, the jobtracker simply takes the next task in the reduce tasks' list and assigns it to the tasktracker.&lt;br /&gt;
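The locality preference described above can be sketched as follows. This is an illustrative Python model, not Hadoop's scheduler code; the names locality_level, pick_map_task and rack_of, and the host and rack labels, are hypothetical.&lt;br /&gt;

```python
def locality_level(replica_hosts, node, rack_of):
    """Return 0 for node-local, 1 for rack-local, 2 for remote data."""
    if node in replica_hosts:
        return 0
    if rack_of[node] in {rack_of[host] for host in replica_hosts}:
        return 1
    return 2


def pick_map_task(pending_tasks, node, rack_of):
    """Pick the pending map task whose input data is closest to `node`.

    `pending_tasks` maps a task id to the hosts holding its input block.
    Returns None when no task is pending.
    """
    if not pending_tasks:
        return None
    return min(
        pending_tasks,
        key=lambda task: locality_level(pending_tasks[task], node, rack_of),
    )


if __name__ == "__main__":
    rack_of = {"n1": "rackA", "n2": "rackA", "n3": "rackB"}
    pending = {"t1": ["n3"], "t2": ["n2"], "t3": ["n1"]}
    # A slave on n1 asks for work; the node-local task is preferred.
    print(pick_map_task(pending, "n1", rack_of))
```

Reduce tasks have no such preference in the description above, so a plain first-in-first-out queue would model them.&lt;br /&gt;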
&lt;br /&gt;
=== Feb 14 ===&lt;br /&gt;
* Formalized the proposed design. Analyzing why a specific family of functions is used, the speedup attained, and the evaluation measurement.&lt;br /&gt;
&lt;br /&gt;
=== Feb 7 ===&lt;br /&gt;
* Studied LSH and explored different LSH families according to the distance measurement they use.&lt;br /&gt;
* Based on how LSH methods work, analysed how these LSH methods can be parallelized in distributed environment.&lt;br /&gt;
&lt;br /&gt;
=== Jan 31 ===&lt;br /&gt;
* Examined how the main class clustering algorithms can be parallelized, respectively.&lt;br /&gt;
* Explained spectral clustering from theoretical aspect, from graph cut viewpoint. &lt;br /&gt;
* Based on the understanding of main clustering algorithms, proposed optimizing method for spectral clustering to deal with large data set.&lt;br /&gt;
* The proposed method makes use of LSH to do pre-processing.&lt;br /&gt;
&lt;br /&gt;
=== Jan 17 ===&lt;br /&gt;
* Understanding Spectral clustering and distributed implementation.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4669</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4669"/>
		<updated>2011-10-18T08:43:18Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Fall 2011 (RA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Oct 17th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Oct_17th/reportTemplate.pdf here]&lt;br /&gt;
* Experimented with the tuning of the parameter &amp;quot;how many terms to keep&amp;quot;&lt;br /&gt;
* Ran the proposed algorithm on the large dataset with good performance (91.38%).&lt;br /&gt;
&lt;br /&gt;
=== Oct 4th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Oct_4th/reportTemplate.pdf here]&lt;br /&gt;
* Done with stop-word removing and stemming.&lt;br /&gt;
* In similarity matrix computation part, in order to reduce feature set, used first-F items in one doc's content.&lt;br /&gt;
* Spectral clustering accuracy is 96.6%. Kmeans accuracy is 62.22%.&lt;br /&gt;
* Will use the proposed algorithm on the dataset.&lt;br /&gt;
&lt;br /&gt;
=== Sep 20th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Sep_20th/reportTemplate.pdf here]&lt;br /&gt;
* Successfully crawled 3,550,567 web pages from Wikipedia, forming 739,144 categories.&lt;br /&gt;
* Processed the raw HTML files and have them ready for testing.&lt;br /&gt;
* Now working on stop-word removing and stemming.&lt;br /&gt;
* In the following 4-5 days, will have the dataset fully processed and ready. &lt;br /&gt;
* And then, one day to do tf-idf on the dataset (since that part of the code is done). &lt;br /&gt;
* Following that, first test the dataset on kmeans; and then, try spectral clustering on the data.&lt;br /&gt;
&lt;br /&gt;
= Summer 2011 (RA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== May 30th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_May_30th/reportTemplate.pdf here]&lt;br /&gt;
* Added Nyström method into the comparison of proposed method and PSC method, in the aspect of accuracy.&lt;br /&gt;
* Continued to run the experiment on large dataset. Will continue the experiment and collect more data.&lt;br /&gt;
* Analysed the experiment results in the experiment result part.&lt;br /&gt;
* Included more sources to show the legitimacy of the evaluation metrics.&lt;br /&gt;
* Refined the structure of figures.&lt;br /&gt;
&lt;br /&gt;
=== May 16th ===&lt;br /&gt;
* Got Parallel SC method (using MPICH2 in implementation) running on NSL cluster and got the result for comparison purpose. The hardware platform remains unchanged (NSL cluster).&lt;br /&gt;
* Examined and ran a few experiments with the Nyström extension spectral clustering method (in MATLAB) and the &amp;quot;Fast approximate spectral clustering&amp;quot; method (in R). However, both methods run on a single machine, which undermines the comparison because the platform differs. Moreover, MATLAB's memory limit causes the experiment to fail on the 2^12 dataset.&lt;br /&gt;
* Area affected by update: experiment setup and evaluation parts.&lt;br /&gt;
&lt;br /&gt;
=== May 5th ===&lt;br /&gt;
* Managed to find two recent works (implementation) for comparison purpose, set up the cluster to run the implementation.&lt;br /&gt;
* There are also other similar papers on the same topic, which have been added into the related work part.&lt;br /&gt;
* Further clarification of the results (as well as the methodology of our proposed method).&lt;br /&gt;
&lt;br /&gt;
= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Apr 19th ===&lt;br /&gt;
* Applied larger datasets to the proposed scheme and obtained a large reduction in computing time (a ratio of about 1/18) with guaranteed error; more details can be found in the report.&lt;br /&gt;
* Read through related publications, compared our method with the already existing ones, and updated the related work part in the paper.&lt;br /&gt;
* TA course assignment five marking, final exam supervision and marking&lt;br /&gt;
&lt;br /&gt;
=== Apr 11th ===&lt;br /&gt;
* Corrected the implementation of Dunn Index (which is used for evaluation purpose).&lt;br /&gt;
* Solved the memory shortage in Gram matrix generation (for comparison purposes). To compare the proposed method with the original method, the raw Gram matrix must be computed beforehand; however, for datasets of size 2^13 or larger, it does not fit in memory. The problem is solved by using a buffer instead of keeping all the matrix entries in memory.&lt;br /&gt;
* The above mentioned fix will enable us to extend the comparison to larger datasets.&lt;br /&gt;
&lt;br /&gt;
=== Apr 3rd ===&lt;br /&gt;
* Implemented the whole proposed design.&lt;br /&gt;
* Implemented evaluation measurement.&lt;br /&gt;
* Got the result of running dataset with size from 2^6 to 2^13, more results will be collected.&lt;br /&gt;
&lt;br /&gt;
=== Mar 21st ===&lt;br /&gt;
* Corrected bugs in the original Mahout code. The original source code has a &amp;quot;java.lang.RuntimeException: java.lang.ClassNotFoundException&amp;quot; bug; this error appears in several places in the code.&lt;br /&gt;
* Implementing evaluation measurement.&lt;br /&gt;
&lt;br /&gt;
=== Mar 14th ===&lt;br /&gt;
* Set up Mahout (together with Hadoop) environment and run tests&lt;br /&gt;
* Implementing the proposed idea on Mahout&lt;br /&gt;
* Found a promising optimization opportunity, named &amp;quot;Time Multiplexing&amp;quot;. It is based on the following observation: normally, the Spectral Clustering step executes only after the preprocessing step has finished. However, we observe that the LSH step and the Spectral Clustering step can overlap in time (time multiplexing).&lt;br /&gt;
&lt;br /&gt;
=== Mar 7th ===&lt;br /&gt;
* Trying to add extra code in the library of Mahout to implement the proposed idea.&lt;br /&gt;
* The affected classes: AffinityMatrixInputJob (to perform gram matrix calculation), DistributedRowMatrix (to construct the diagonal matrix), DistributedLanczosSolver (to perform eigen-decomposition)&lt;br /&gt;
* Merging the previous reports and making necessary changes&lt;br /&gt;
&lt;br /&gt;
=== Feb 28 ===&lt;br /&gt;
* Examined two important Hadoop design features that greatly influence the experiment's performance: the data placement policy and the task scheduling policy.&lt;br /&gt;
* It is noted that, for map tasks, the scheduler uses a locality optimization technique. After selecting a job, the scheduler picks the map task in the job whose data is closest to the slave: on the same node if possible, otherwise on the same rack, or finally on a remote rack. For reduce tasks, the jobtracker simply takes the next task in the reduce tasks' list and assigns it to the tasktracker.&lt;br /&gt;
&lt;br /&gt;
=== Feb 14 ===&lt;br /&gt;
* Formalized the proposed design. Analyzing why a specific family of functions is used, the speedup attained, and the evaluation measurement.&lt;br /&gt;
&lt;br /&gt;
=== Feb 7 ===&lt;br /&gt;
* Studied LSH and explored different LSH families according to the distance measurement they use.&lt;br /&gt;
* Based on how LSH methods work, analysed how these LSH methods can be parallelized in distributed environment.&lt;br /&gt;
&lt;br /&gt;
=== Jan 31 ===&lt;br /&gt;
* Examined how the main class clustering algorithms can be parallelized, respectively.&lt;br /&gt;
* Explained spectral clustering from theoretical aspect, from graph cut viewpoint. &lt;br /&gt;
* Based on the understanding of main clustering algorithms, proposed optimizing method for spectral clustering to deal with large data set.&lt;br /&gt;
* The proposed method makes use of LSH to do pre-processing.&lt;br /&gt;
&lt;br /&gt;
=== Jan 17 ===&lt;br /&gt;
* Understanding Spectral clustering and distributed implementation.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4668</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4668"/>
		<updated>2011-10-18T08:42:27Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Fall 2011 (RA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Oct 17th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Oct_17th/reportTemplate.pdf here]&lt;br /&gt;
* Experimented with the tuning of the parameter &amp;quot;how many terms to keep&amp;quot;&lt;br /&gt;
* Ran the proposed algorithm on the large dataset with good performance.&lt;br /&gt;
&lt;br /&gt;
=== Oct 4th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Oct_4th/reportTemplate.pdf here]&lt;br /&gt;
* Done with stop-word removing and stemming.&lt;br /&gt;
* In similarity matrix computation part, in order to reduce feature set, used first-F items in one doc's content.&lt;br /&gt;
* Spectral clustering accuracy is 96.6%. Kmeans accuracy is 62.22%.&lt;br /&gt;
* Will use the proposed algorithm on the dataset.&lt;br /&gt;
&lt;br /&gt;
=== Sep 20th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Sep_20th/reportTemplate.pdf here]&lt;br /&gt;
* Successfully crawled 3,550,567 web pages from Wikipedia, forming 739,144 categories.&lt;br /&gt;
* Processed the raw HTML files and have them ready for testing.&lt;br /&gt;
* Now working on stop-word removing and stemming.&lt;br /&gt;
* In the following 4-5 days, will have the dataset fully processed and ready. &lt;br /&gt;
* And then, one day to do tf-idf on the dataset (since that part of the code is done). &lt;br /&gt;
* Following that, first test the dataset on kmeans; and then, try spectral clustering on the data.&lt;br /&gt;
&lt;br /&gt;
= Summer 2011 (RA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== May 30th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_May_30th/reportTemplate.pdf here]&lt;br /&gt;
* Added Nyström method into the comparison of proposed method and PSC method, in the aspect of accuracy.&lt;br /&gt;
* Continued to run the experiment on large dataset. Will continue the experiment and collect more data.&lt;br /&gt;
* Analysed the experiment results in the experiment result part.&lt;br /&gt;
* Included more sources to show the legitimacy of the evaluation metrics.&lt;br /&gt;
* Refined the structure of figures.&lt;br /&gt;
&lt;br /&gt;
=== May 16th ===&lt;br /&gt;
* Got Parallel SC method (using MPICH2 in implementation) running on NSL cluster and got the result for comparison purpose. The hardware platform remains unchanged (NSL cluster).&lt;br /&gt;
* Examined and ran a few experiments with the Nyström extension spectral clustering method (in MATLAB) and the &amp;quot;Fast approximate spectral clustering&amp;quot; method (in R). However, both methods run on a single machine, which undermines the comparison because the platform differs. Moreover, MATLAB's memory limit causes the experiment to fail on the 2^12 dataset.&lt;br /&gt;
* Area affected by update: experiment setup and evaluation parts.&lt;br /&gt;
&lt;br /&gt;
=== May 5th ===&lt;br /&gt;
* Managed to find two recent works (implementation) for comparison purpose, set up the cluster to run the implementation.&lt;br /&gt;
* There are also other similar papers on the same topic, which have been added into the related work part.&lt;br /&gt;
* Further clarification of the results (as well as the methodology of our proposed method).&lt;br /&gt;
&lt;br /&gt;
= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Apr 19th ===&lt;br /&gt;
* Applied larger datasets to the proposed scheme and obtained a large reduction in computing time (a ratio of about 1/18) with guaranteed error; more details can be found in the report.&lt;br /&gt;
* Read through related publications, compared our method with the already existing ones, and updated the related work part in the paper.&lt;br /&gt;
* TA course assignment five marking, final exam supervision and marking&lt;br /&gt;
&lt;br /&gt;
=== Apr 11th ===&lt;br /&gt;
* Corrected the implementation of Dunn Index (which is used for evaluation purpose).&lt;br /&gt;
* Solved the memory shortage in Gram matrix generation (for comparison purposes). To compare the proposed method with the original method, the raw Gram matrix must be computed beforehand; however, for datasets of size 2^13 or larger, it does not fit in memory. The problem is solved by using a buffer instead of keeping all the matrix entries in memory.&lt;br /&gt;
* The above mentioned fix will enable us to extend the comparison to larger datasets.&lt;br /&gt;
&lt;br /&gt;
=== Apr 3rd ===&lt;br /&gt;
* Implemented the whole proposed design.&lt;br /&gt;
* Implemented evaluation measurement.&lt;br /&gt;
* Got the result of running dataset with size from 2^6 to 2^13, more results will be collected.&lt;br /&gt;
&lt;br /&gt;
=== Mar 21st ===&lt;br /&gt;
* Corrected bugs in the original Mahout code. The original source code has a &amp;quot;java.lang.RuntimeException: java.lang.ClassNotFoundException&amp;quot; bug; this error appears in several places in the code.&lt;br /&gt;
* Implementing evaluation measurement.&lt;br /&gt;
&lt;br /&gt;
=== Mar 14th ===&lt;br /&gt;
* Set up Mahout (together with Hadoop) environment and run tests&lt;br /&gt;
* Implementing the proposed idea on Mahout&lt;br /&gt;
* Found a promising optimization opportunity, named &amp;quot;Time Multiplexing&amp;quot;. It is based on the following observation: normally, the Spectral Clustering step executes only after the preprocessing step has finished. However, we observe that the LSH step and the Spectral Clustering step can overlap in time (time multiplexing).&lt;br /&gt;
&lt;br /&gt;
=== Mar 7th ===&lt;br /&gt;
* Trying to add extra code in the library of Mahout to implement the proposed idea.&lt;br /&gt;
* The affected classes: AffinityMatrixInputJob (to perform gram matrix calculation), DistributedRowMatrix (to construct the diagonal matrix), DistributedLanczosSolver (to perform eigen-decomposition)&lt;br /&gt;
* Merging the previous reports and making necessary changes&lt;br /&gt;
&lt;br /&gt;
=== Feb 28 ===&lt;br /&gt;
* Examined two important Hadoop design features that greatly influence the experiment's performance: the data placement policy and the task scheduling policy.&lt;br /&gt;
* It is noted that, for map tasks, the scheduler uses a locality optimization technique. After selecting a job, the scheduler picks the map task in the job whose data is closest to the slave: on the same node if possible, otherwise on the same rack, or finally on a remote rack. For reduce tasks, the jobtracker simply takes the next task in the reduce tasks' list and assigns it to the tasktracker.&lt;br /&gt;
&lt;br /&gt;
=== Feb 14 ===&lt;br /&gt;
* Formalized the proposed design. Analyzing why a specific family of functions is used, the speedup attained, and the evaluation measurement.&lt;br /&gt;
&lt;br /&gt;
=== Feb 7 ===&lt;br /&gt;
* Studied LSH and explored different LSH families according to the distance measurement they use.&lt;br /&gt;
* Based on how LSH methods work, analysed how these LSH methods can be parallelized in distributed environment.&lt;br /&gt;
&lt;br /&gt;
=== Jan 31 ===&lt;br /&gt;
* Examined how the main class clustering algorithms can be parallelized, respectively.&lt;br /&gt;
* Explained spectral clustering from theoretical aspect, from graph cut viewpoint. &lt;br /&gt;
* Based on the understanding of main clustering algorithms, proposed optimizing method for spectral clustering to deal with large data set.&lt;br /&gt;
* The proposed method makes use of LSH to do pre-processing.&lt;br /&gt;
&lt;br /&gt;
=== Jan 17 ===&lt;br /&gt;
* Understanding Spectral clustering and distributed implementation.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4646</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4646"/>
		<updated>2011-10-05T05:56:34Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Fall 2011 (RA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Oct 4th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Oct_4th/reportTemplate.pdf here]&lt;br /&gt;
* Done with stop-word removing and stemming.&lt;br /&gt;
* In similarity matrix computation part, in order to reduce feature set, used first-F items in one doc's content.&lt;br /&gt;
* Spectral clustering accuracy is 96.6%. Kmeans accuracy is 62.22%.&lt;br /&gt;
* Will use the proposed algorithm on the dataset.&lt;br /&gt;
&lt;br /&gt;
=== Sep 20th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Sep_20th/reportTemplate.pdf here]&lt;br /&gt;
* Successfully crawled 3,550,567 web pages from Wikipedia, forming 739,144 categories.&lt;br /&gt;
* Processed the raw HTML files and have them ready for testing.&lt;br /&gt;
* Now working on stop-word removing and stemming.&lt;br /&gt;
* In the following 4-5 days, will have the dataset fully processed and ready. &lt;br /&gt;
* And then, one day to do tf-idf on the dataset (since that part of the code is done). &lt;br /&gt;
* Following that, first test the dataset on kmeans; and then, try spectral clustering on the data.&lt;br /&gt;
&lt;br /&gt;
= Summer 2011 (RA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== May 30th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_May_30th/reportTemplate.pdf here]&lt;br /&gt;
* Added Nyström method into the comparison of proposed method and PSC method, in the aspect of accuracy.&lt;br /&gt;
* Continued to run the experiment on large dataset. Will continue the experiment and collect more data.&lt;br /&gt;
* Analysed the experiment results in the experiment result part.&lt;br /&gt;
* Included more sources to show the legitimacy of the evaluation metrics.&lt;br /&gt;
* Refined the structure of figures.&lt;br /&gt;
&lt;br /&gt;
=== May 16th ===&lt;br /&gt;
* Got Parallel SC method (using MPICH2 in implementation) running on NSL cluster and got the result for comparison purpose. The hardware platform remains unchanged (NSL cluster).&lt;br /&gt;
* Examined and ran a few experiments with the Nyström extension spectral clustering method (in MATLAB) and the &amp;quot;Fast approximate spectral clustering&amp;quot; method (in R). However, both methods run on a single machine, which undermines the comparison because the platform differs. Moreover, MATLAB's memory limit causes the experiment to fail on the 2^12 dataset.&lt;br /&gt;
* Area affected by update: experiment setup and evaluation parts.&lt;br /&gt;
&lt;br /&gt;
=== May 5th ===&lt;br /&gt;
* Managed to find two recent works (implementation) for comparison purpose, set up the cluster to run the implementation.&lt;br /&gt;
* There are also other similar papers on the same topic, which have been added into the related work part.&lt;br /&gt;
* Further clarification of the results (as well as the methodology of our proposed method).&lt;br /&gt;
&lt;br /&gt;
= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Apr 19th ===&lt;br /&gt;
* Applied larger datasets to the proposed scheme and obtained a large reduction in computing time (a ratio of about 1/18) with guaranteed error; more details can be found in the report.&lt;br /&gt;
* Read through related publications, compared our method with the already existing ones, and updated the related work part in the paper.&lt;br /&gt;
* TA course assignment five marking, final exam supervision and marking&lt;br /&gt;
&lt;br /&gt;
=== Apr 11th ===&lt;br /&gt;
* Corrected the implementation of Dunn Index (which is used for evaluation purpose).&lt;br /&gt;
* Solved the memory shortage in Gram matrix generation (for comparison purposes). To compare the proposed method with the original method, the raw Gram matrix must be computed beforehand; however, for datasets of size 2^13 or larger, it does not fit in memory. The problem is solved by using a buffer instead of keeping all the matrix entries in memory.&lt;br /&gt;
* The above mentioned fix will enable us to extend the comparison to larger datasets.&lt;br /&gt;
&lt;br /&gt;
=== Apr 3rd ===&lt;br /&gt;
* Implemented the whole proposed design.&lt;br /&gt;
* Implemented evaluation measurement.&lt;br /&gt;
* Got the result of running dataset with size from 2^6 to 2^13, more results will be collected.&lt;br /&gt;
&lt;br /&gt;
=== Mar 21st ===&lt;br /&gt;
* Corrected bugs in the original Mahout code. The original source code has a &amp;quot;java.lang.RuntimeException: java.lang.ClassNotFoundException&amp;quot; bug; this error appears in several places in the code.&lt;br /&gt;
* Implementing evaluation measurement.&lt;br /&gt;
&lt;br /&gt;
=== Mar 14th ===&lt;br /&gt;
* Set up Mahout (together with Hadoop) environment and run tests&lt;br /&gt;
* Implementing the proposed idea on Mahout&lt;br /&gt;
* Found a promising optimization opportunity, named &amp;quot;Time Multiplexing&amp;quot;. It is based on the following observation: normally, the Spectral Clustering step executes only after the preprocessing step has finished. However, we observe that the LSH step and the Spectral Clustering step can overlap in time (time multiplexing).&lt;br /&gt;
&lt;br /&gt;
=== Mar 7th ===&lt;br /&gt;
* Trying to add extra code in the library of Mahout to implement the proposed idea.&lt;br /&gt;
* The affected classes: AffinityMatrixInputJob (to perform gram matrix calculation), DistributedRowMatrix (to construct the diagonal matrix), DistributedLanczosSolver (to perform eigen-decomposition)&lt;br /&gt;
* Merging the previous reports and making necessary changes&lt;br /&gt;
&lt;br /&gt;
=== Feb 28 ===&lt;br /&gt;
* Examined two important Hadoop design features that greatly influence the experiment's performance: the data placement policy and the task scheduling policy.&lt;br /&gt;
* It is noted that, for map tasks, the scheduler uses a locality optimization technique. After selecting a job, the scheduler picks the map task in the job whose data is closest to the slave: on the same node if possible, otherwise on the same rack, or finally on a remote rack. For reduce tasks, the jobtracker simply takes the next task in the reduce tasks' list and assigns it to the tasktracker.&lt;br /&gt;
&lt;br /&gt;
=== Feb 14 ===&lt;br /&gt;
* Formalized the proposed design. Analyzing why a specific family of functions is used, the speedup attained, and the evaluation measurement.&lt;br /&gt;
&lt;br /&gt;
=== Feb 7 ===&lt;br /&gt;
* Studied LSH and explored different LSH families according to the distance measurement they use.&lt;br /&gt;
* Based on how LSH methods work, analysed how they can be parallelized in a distributed environment.&lt;br /&gt;
&lt;br /&gt;
=== Jan 31 ===&lt;br /&gt;
* Examined how each of the main classes of clustering algorithms can be parallelized.&lt;br /&gt;
* Explained spectral clustering from a theoretical perspective, using the graph-cut viewpoint. &lt;br /&gt;
* Based on the understanding of the main clustering algorithms, proposed an optimization method for spectral clustering to deal with large datasets.&lt;br /&gt;
* The proposed method uses LSH for pre-processing.&lt;br /&gt;
&lt;br /&gt;
=== Jan 17 ===&lt;br /&gt;
* Understanding Spectral clustering and distributed implementation.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4625</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4625"/>
		<updated>2011-09-21T03:19:09Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Fall 2011 (RA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Sep 20th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Sep_20th/reportTemplate.pdf here]&lt;br /&gt;
* Successfully crawled 3,550,567 web pages from Wikipedia, forming 739,144 categories.&lt;br /&gt;
* Processed the raw HTML files and have them ready for testing.&lt;br /&gt;
* Now working on stop-word removal and stemming.&lt;br /&gt;
* In the next 4-5 days, will have the dataset fully processed and ready. &lt;br /&gt;
* After that, one day to compute tf-idf on the dataset (since that part of the code is done). &lt;br /&gt;
* Following that, will first test the dataset with k-means and then try spectral clustering on the data.&lt;br /&gt;
&lt;br /&gt;
= Summer 2011 (RA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== May 30th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_May_30th/reportTemplate.pdf here]&lt;br /&gt;
* Added the Nyström method to the accuracy comparison between the proposed method and the PSC method.&lt;br /&gt;
* Continued to run the experiment on large dataset. Will continue the experiment and collect more data.&lt;br /&gt;
* Analysed the experiment results in the experiment result part.&lt;br /&gt;
* Included more sources to show the legitimacy of the evaluation metrics.&lt;br /&gt;
* Refined the structure of figures.&lt;br /&gt;
&lt;br /&gt;
=== May 16th ===&lt;br /&gt;
* Got the Parallel SC method (implemented using MPICH2) running on the NSL cluster and obtained results for comparison purposes. The hardware platform remains unchanged (NSL cluster).&lt;br /&gt;
* Examined and ran a few experiments with the Nyström Extension Spectral Clustering method (in MATLAB) and the &amp;quot;Fast approximate spectral clustering&amp;quot; method (in R). However, both methods run on a single machine, which somewhat defeats the purpose of the comparison (because the platform is changed). Moreover, MATLAB's memory usage limit causes the experiment to fail on the 2^12 dataset.&lt;br /&gt;
* Area affected by update: experiment setup and evaluation parts.&lt;br /&gt;
&lt;br /&gt;
=== May 5th ===&lt;br /&gt;
* Found two recent works (with implementations) for comparison purposes and set up the cluster to run them.&lt;br /&gt;
* There are also other similar papers on the same topic, which have been added into the related work part.&lt;br /&gt;
* Further clarification of the results (as well as the methodology of our proposed method).&lt;br /&gt;
&lt;br /&gt;
= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Apr 19th ===&lt;br /&gt;
* Applied larger datasets to the proposed scheme and obtained a large reduction in computing time (a ratio of about 1/18) with guaranteed error; more can be found in the report.&lt;br /&gt;
* Read through related publications, compared our method with the already existing ones, and updated the related work part in the paper.&lt;br /&gt;
* TA course assignment five marking, final exam supervision and marking&lt;br /&gt;
&lt;br /&gt;
=== Apr 11th ===&lt;br /&gt;
* Corrected the implementation of Dunn Index (which is used for evaluation purpose).&lt;br /&gt;
* Solved the problem of insufficient memory in gram matrix generation (for comparison purposes). To compare the proposed method with the original method, we need the raw gram matrix computed beforehand. However, for datasets of size 2^13 or larger, memory runs out. The problem is solved by using a buffer (instead of keeping all the matrix entries in memory).&lt;br /&gt;
* The above-mentioned fix will enable us to extend the comparison to larger datasets.&lt;br /&gt;
&lt;br /&gt;
=== Apr 3rd ===&lt;br /&gt;
* Implemented the whole proposed design.&lt;br /&gt;
* Implemented evaluation measurement.&lt;br /&gt;
* Obtained results for datasets of size 2^6 to 2^13; more results will be collected.&lt;br /&gt;
&lt;br /&gt;
=== Mar 21st ===&lt;br /&gt;
* Corrected bugs in the original Mahout code. The original source code has a &amp;quot;java.lang.RuntimeException: java.lang.ClassNotFoundException&amp;quot; bug, which appears in several places in the code. &lt;br /&gt;
* Implementing evaluation measurement.&lt;br /&gt;
&lt;br /&gt;
=== Mar 14th ===&lt;br /&gt;
* Set up the Mahout environment (together with Hadoop) and ran tests.&lt;br /&gt;
* Implementing the proposed idea on Mahout.&lt;br /&gt;
* Found a promising optimization opportunity, named &amp;quot;Time Multiplexing&amp;quot;. Normally, the Spectral Clustering step executes only after the preprocessing step has finished. However, we observe that the LSH step and the Spectral Clustering step can overlap in time (time multiplexing).&lt;br /&gt;
&lt;br /&gt;
=== Mar 7th ===&lt;br /&gt;
* Trying to add extra code in the library of Mahout to implement the proposed idea.&lt;br /&gt;
* The affected classes: AffinityMatrixInputJob (to perform gram matrix calculation), DistributedRowMatrix (to construct the diagonal matrix), DistributedLanczosSolver (to perform eigen-decomposition)&lt;br /&gt;
* Merging the previous reports and making necessary changes&lt;br /&gt;
&lt;br /&gt;
=== Feb 28 ===&lt;br /&gt;
* Examined two important Hadoop design features that have great influence on the experiment's performance: the data placement policy and the task scheduling policy.&lt;br /&gt;
* For map tasks, the scheduler uses a locality optimization technique: after selecting a job, it picks the map task in that job whose data is closest to the slave, on the same node if possible, otherwise on the same rack, or finally on a remote rack. For reduce tasks, the jobtracker simply takes the next task in the reduce task list and assigns it to the tasktracker.&lt;br /&gt;
&lt;br /&gt;
=== Feb 14 ===&lt;br /&gt;
* Formalized the proposed design. Analyzing why a specific family of functions is used, the speedup attained, and the evaluation measurement.&lt;br /&gt;
&lt;br /&gt;
=== Feb 7 ===&lt;br /&gt;
* Studied LSH and explored different LSH families according to the distance measurement they use.&lt;br /&gt;
* Based on how LSH methods work, analysed how they can be parallelized in a distributed environment.&lt;br /&gt;
&lt;br /&gt;
=== Jan 31 ===&lt;br /&gt;
* Examined how each of the main classes of clustering algorithms can be parallelized.&lt;br /&gt;
* Explained spectral clustering from a theoretical perspective, using the graph-cut viewpoint. &lt;br /&gt;
* Based on the understanding of the main clustering algorithms, proposed an optimization method for spectral clustering to deal with large datasets.&lt;br /&gt;
* The proposed method uses LSH for pre-processing.&lt;br /&gt;
&lt;br /&gt;
=== Jan 17 ===&lt;br /&gt;
* Understanding Spectral clustering and distributed implementation.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4477</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4477"/>
		<updated>2011-06-01T02:31:12Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Summer 2011 (RA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== May 30th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_May_30th/reportTemplate.pdf here]&lt;br /&gt;
* Added the Nyström method to the accuracy comparison between the proposed method and the PSC method.&lt;br /&gt;
* Continued to run the experiment on large dataset. Will continue the experiment and collect more data.&lt;br /&gt;
* Analysed the experiment results in the experiment result part.&lt;br /&gt;
* Included more sources to show the legitimacy of the evaluation metrics.&lt;br /&gt;
* Refined the structure of figures.&lt;br /&gt;
&lt;br /&gt;
=== May 16th ===&lt;br /&gt;
* Got the Parallel SC method (implemented using MPICH2) running on the NSL cluster and obtained results for comparison purposes. The hardware platform remains unchanged (NSL cluster).&lt;br /&gt;
* Examined and ran a few experiments with the Nyström Extension Spectral Clustering method (in MATLAB) and the &amp;quot;Fast approximate spectral clustering&amp;quot; method (in R). However, both methods run on a single machine, which somewhat defeats the purpose of the comparison (because the platform is changed). Moreover, MATLAB's memory usage limit causes the experiment to fail on the 2^12 dataset.&lt;br /&gt;
* Area affected by update: experiment setup and evaluation parts.&lt;br /&gt;
&lt;br /&gt;
=== May 5th ===&lt;br /&gt;
* Found two recent works (with implementations) for comparison purposes and set up the cluster to run them.&lt;br /&gt;
* There are also other similar papers on the same topic, which have been added into the related work part.&lt;br /&gt;
* Further clarification of the results (as well as the methodology of our proposed method).&lt;br /&gt;
&lt;br /&gt;
= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Apr 19th ===&lt;br /&gt;
* Applied larger datasets to the proposed scheme and obtained a large reduction in computing time (a ratio of about 1/18) with guaranteed error; more can be found in the report.&lt;br /&gt;
* Read through related publications, compared our method with the already existing ones, and updated the related work part in the paper.&lt;br /&gt;
* TA course assignment five marking, final exam supervision and marking&lt;br /&gt;
&lt;br /&gt;
=== Apr 11th ===&lt;br /&gt;
* Corrected the implementation of Dunn Index (which is used for evaluation purpose).&lt;br /&gt;
* Solved the problem of insufficient memory in gram matrix generation (for comparison purposes). To compare the proposed method with the original method, we need the raw gram matrix computed beforehand. However, for datasets of size 2^13 or larger, memory runs out. The problem is solved by using a buffer (instead of keeping all the matrix entries in memory).&lt;br /&gt;
* The above-mentioned fix will enable us to extend the comparison to larger datasets.&lt;br /&gt;
&lt;br /&gt;
=== Apr 3rd ===&lt;br /&gt;
* Implemented the whole proposed design.&lt;br /&gt;
* Implemented evaluation measurement.&lt;br /&gt;
* Obtained results for datasets of size 2^6 to 2^13; more results will be collected.&lt;br /&gt;
&lt;br /&gt;
=== Mar 21st ===&lt;br /&gt;
* Corrected bugs in the original Mahout code. The original source code has a &amp;quot;java.lang.RuntimeException: java.lang.ClassNotFoundException&amp;quot; bug, which appears in several places in the code. &lt;br /&gt;
* Implementing evaluation measurement.&lt;br /&gt;
&lt;br /&gt;
=== Mar 14th ===&lt;br /&gt;
* Set up the Mahout environment (together with Hadoop) and ran tests.&lt;br /&gt;
* Implementing the proposed idea on Mahout.&lt;br /&gt;
* Found a promising optimization opportunity, named &amp;quot;Time Multiplexing&amp;quot;. Normally, the Spectral Clustering step executes only after the preprocessing step has finished. However, we observe that the LSH step and the Spectral Clustering step can overlap in time (time multiplexing).&lt;br /&gt;
&lt;br /&gt;
=== Mar 7th ===&lt;br /&gt;
* Trying to add extra code in the library of Mahout to implement the proposed idea.&lt;br /&gt;
* The affected classes: AffinityMatrixInputJob (to perform gram matrix calculation), DistributedRowMatrix (to construct the diagonal matrix), DistributedLanczosSolver (to perform eigen-decomposition)&lt;br /&gt;
* Merging the previous reports and making necessary changes&lt;br /&gt;
&lt;br /&gt;
=== Feb 28 ===&lt;br /&gt;
* Examined two important Hadoop design features that have great influence on the experiment's performance: the data placement policy and the task scheduling policy.&lt;br /&gt;
* For map tasks, the scheduler uses a locality optimization technique: after selecting a job, it picks the map task in that job whose data is closest to the slave, on the same node if possible, otherwise on the same rack, or finally on a remote rack. For reduce tasks, the jobtracker simply takes the next task in the reduce task list and assigns it to the tasktracker.&lt;br /&gt;
&lt;br /&gt;
=== Feb 14 ===&lt;br /&gt;
* Formalized the proposed design. Analyzing why a specific family of functions is used, the speedup attained, and the evaluation measurement.&lt;br /&gt;
&lt;br /&gt;
=== Feb 7 ===&lt;br /&gt;
* Studied LSH and explored different LSH families according to the distance measurement they use.&lt;br /&gt;
* Based on how LSH methods work, analysed how they can be parallelized in a distributed environment.&lt;br /&gt;
&lt;br /&gt;
=== Jan 31 ===&lt;br /&gt;
* Examined how each of the main classes of clustering algorithms can be parallelized.&lt;br /&gt;
* Explained spectral clustering from a theoretical perspective, using the graph-cut viewpoint. &lt;br /&gt;
* Based on the understanding of the main clustering algorithms, proposed an optimization method for spectral clustering to deal with large datasets.&lt;br /&gt;
* The proposed method uses LSH for pre-processing.&lt;br /&gt;
&lt;br /&gt;
=== Jan 17 ===&lt;br /&gt;
* Understanding Spectral clustering and distributed implementation.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4476</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4476"/>
		<updated>2011-06-01T02:25:34Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Summer 2011 (RA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== May 30th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_May_30th/reportTemplate.pdf here]&lt;br /&gt;
* Added the Nyström method to the accuracy comparison between the proposed method and the PSC method.&lt;br /&gt;
* Continued to run the experiment on large dataset.&lt;br /&gt;
* Analysed the experiment results in the experiment result part.&lt;br /&gt;
* Included more sources to show the legitimacy of the evaluation metrics.&lt;br /&gt;
* Refined the structure of figures.&lt;br /&gt;
&lt;br /&gt;
=== May 16th ===&lt;br /&gt;
* Got the Parallel SC method (implemented using MPICH2) running on the NSL cluster and obtained results for comparison purposes. The hardware platform remains unchanged (NSL cluster).&lt;br /&gt;
* Examined and ran a few experiments with the Nyström Extension Spectral Clustering method (in MATLAB) and the &amp;quot;Fast approximate spectral clustering&amp;quot; method (in R). However, both methods run on a single machine, which somewhat defeats the purpose of the comparison (because the platform is changed). Moreover, MATLAB's memory usage limit causes the experiment to fail on the 2^12 dataset.&lt;br /&gt;
* Area affected by update: experiment setup and evaluation parts.&lt;br /&gt;
&lt;br /&gt;
=== May 5th ===&lt;br /&gt;
* Found two recent works (with implementations) for comparison purposes and set up the cluster to run them.&lt;br /&gt;
* There are also other similar papers on the same topic, which have been added into the related work part.&lt;br /&gt;
* Further clarification of the results (as well as the methodology of our proposed method).&lt;br /&gt;
&lt;br /&gt;
= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Apr 19th ===&lt;br /&gt;
* Applied larger datasets to the proposed scheme and obtained a large reduction in computing time (a ratio of about 1/18) with guaranteed error; more can be found in the report.&lt;br /&gt;
* Read through related publications, compared our method with the already existing ones, and updated the related work part in the paper.&lt;br /&gt;
* TA course assignment five marking, final exam supervision and marking&lt;br /&gt;
&lt;br /&gt;
=== Apr 11th ===&lt;br /&gt;
* Corrected the implementation of Dunn Index (which is used for evaluation purpose).&lt;br /&gt;
* Solved the problem of insufficient memory in gram matrix generation (for comparison purposes). To compare the proposed method with the original method, we need the raw gram matrix computed beforehand. However, for datasets of size 2^13 or larger, memory runs out. The problem is solved by using a buffer (instead of keeping all the matrix entries in memory).&lt;br /&gt;
* The above-mentioned fix will enable us to extend the comparison to larger datasets.&lt;br /&gt;
&lt;br /&gt;
=== Apr 3rd ===&lt;br /&gt;
* Implemented the whole proposed design.&lt;br /&gt;
* Implemented evaluation measurement.&lt;br /&gt;
* Obtained results for datasets of size 2^6 to 2^13; more results will be collected.&lt;br /&gt;
&lt;br /&gt;
=== Mar 21st ===&lt;br /&gt;
* Corrected bugs in the original Mahout code. The original source code has a &amp;quot;java.lang.RuntimeException: java.lang.ClassNotFoundException&amp;quot; bug, which appears in several places in the code. &lt;br /&gt;
* Implementing evaluation measurement.&lt;br /&gt;
&lt;br /&gt;
=== Mar 14th ===&lt;br /&gt;
* Set up the Mahout environment (together with Hadoop) and ran tests.&lt;br /&gt;
* Implementing the proposed idea on Mahout.&lt;br /&gt;
* Found a promising optimization opportunity, named &amp;quot;Time Multiplexing&amp;quot;. Normally, the Spectral Clustering step executes only after the preprocessing step has finished. However, we observe that the LSH step and the Spectral Clustering step can overlap in time (time multiplexing).&lt;br /&gt;
&lt;br /&gt;
=== Mar 7th ===&lt;br /&gt;
* Trying to add extra code in the library of Mahout to implement the proposed idea.&lt;br /&gt;
* The affected classes: AffinityMatrixInputJob (to perform gram matrix calculation), DistributedRowMatrix (to construct the diagonal matrix), DistributedLanczosSolver (to perform eigen-decomposition)&lt;br /&gt;
* Merging the previous reports and making necessary changes&lt;br /&gt;
&lt;br /&gt;
=== Feb 28 ===&lt;br /&gt;
* Examined two important Hadoop design features that have great influence on the experiment's performance: the data placement policy and the task scheduling policy.&lt;br /&gt;
* For map tasks, the scheduler uses a locality optimization technique: after selecting a job, it picks the map task in that job whose data is closest to the slave, on the same node if possible, otherwise on the same rack, or finally on a remote rack. For reduce tasks, the jobtracker simply takes the next task in the reduce task list and assigns it to the tasktracker.&lt;br /&gt;
&lt;br /&gt;
=== Feb 14 ===&lt;br /&gt;
* Formalized the proposed design. Analyzing why a specific family of functions is used, the speedup attained, and the evaluation measurement.&lt;br /&gt;
&lt;br /&gt;
=== Feb 7 ===&lt;br /&gt;
* Studied LSH and explored different LSH families according to the distance measurement they use.&lt;br /&gt;
* Based on how LSH methods work, analysed how they can be parallelized in a distributed environment.&lt;br /&gt;
&lt;br /&gt;
=== Jan 31 ===&lt;br /&gt;
* Examined how each of the main classes of clustering algorithms can be parallelized.&lt;br /&gt;
* Explained spectral clustering from a theoretical perspective, using the graph-cut viewpoint. &lt;br /&gt;
* Based on the understanding of the main clustering algorithms, proposed an optimization method for spectral clustering to deal with large datasets.&lt;br /&gt;
* The proposed method uses LSH for pre-processing.&lt;br /&gt;
&lt;br /&gt;
=== Jan 17 ===&lt;br /&gt;
* Understanding Spectral clustering and distributed implementation.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4446</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4446"/>
		<updated>2011-05-17T03:43:15Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Summer 2011 (RA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== May 16th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_May_17th/reportTemplate.pdf here]&lt;br /&gt;
* Got the Parallel SC method (implemented using MPICH2) running on the NSL cluster and obtained results for comparison purposes. The hardware platform remains unchanged (NSL cluster).&lt;br /&gt;
* Examined and ran a few experiments with the Nyström Extension Spectral Clustering method (in MATLAB) and the &amp;quot;Fast approximate spectral clustering&amp;quot; method (in R). However, both methods run on a single machine, which somewhat defeats the purpose of the comparison (because the platform is changed). Moreover, MATLAB's memory usage limit causes the experiment to fail on the 2^12 dataset.&lt;br /&gt;
* Area affected by update: experiment setup and evaluation parts.&lt;br /&gt;
&lt;br /&gt;
=== May 5th ===&lt;br /&gt;
* Found two recent works (with implementations) for comparison purposes and set up the cluster to run them.&lt;br /&gt;
* There are also other similar papers on the same topic, which have been added into the related work part.&lt;br /&gt;
* Further clarification of the results (as well as the methodology of our proposed method).&lt;br /&gt;
&lt;br /&gt;
= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Apr 19th ===&lt;br /&gt;
* Applied larger datasets to the proposed scheme and obtained a large reduction in computing time (a ratio of about 1/18) with guaranteed error; more can be found in the report.&lt;br /&gt;
* Read through related publications, compared our method with the already existing ones, and updated the related work part in the paper.&lt;br /&gt;
* TA course assignment five marking, final exam supervision and marking&lt;br /&gt;
&lt;br /&gt;
=== Apr 11th ===&lt;br /&gt;
* Corrected the implementation of Dunn Index (which is used for evaluation purpose).&lt;br /&gt;
* Solved the problem of insufficient memory in gram matrix generation (for comparison purposes). To compare the proposed method with the original method, we need the raw gram matrix computed beforehand. However, for datasets of size 2^13 or larger, memory runs out. The problem is solved by using a buffer (instead of keeping all the matrix entries in memory).&lt;br /&gt;
* The above-mentioned fix will enable us to extend the comparison to larger datasets.&lt;br /&gt;
&lt;br /&gt;
=== Apr 3rd ===&lt;br /&gt;
* Implemented the whole proposed design.&lt;br /&gt;
* Implemented evaluation measurement.&lt;br /&gt;
* Obtained results for datasets of size 2^6 to 2^13; more results will be collected.&lt;br /&gt;
&lt;br /&gt;
=== Mar 21st ===&lt;br /&gt;
* Corrected bugs in the original Mahout code. The original source code has a &amp;quot;java.lang.RuntimeException: java.lang.ClassNotFoundException&amp;quot; bug, which appears in several places in the code. &lt;br /&gt;
* Implementing evaluation measurement.&lt;br /&gt;
&lt;br /&gt;
=== Mar 14th ===&lt;br /&gt;
* Set up the Mahout environment (together with Hadoop) and ran tests.&lt;br /&gt;
* Implementing the proposed idea on Mahout.&lt;br /&gt;
* Found a promising optimization opportunity, named &amp;quot;Time Multiplexing&amp;quot;. Normally, the Spectral Clustering step executes only after the preprocessing step has finished. However, we observe that the LSH step and the Spectral Clustering step can overlap in time (time multiplexing).&lt;br /&gt;
&lt;br /&gt;
=== Mar 7th ===&lt;br /&gt;
* Trying to add extra code in the library of Mahout to implement the proposed idea.&lt;br /&gt;
* The affected classes: AffinityMatrixInputJob (to perform gram matrix calculation), DistributedRowMatrix (to construct the diagonal matrix), DistributedLanczosSolver (to perform eigen-decomposition)&lt;br /&gt;
* Merging the previous reports and making necessary changes&lt;br /&gt;
&lt;br /&gt;
=== Feb 28 ===&lt;br /&gt;
* Examined two Hadoop design features that strongly influence the experiment's performance: the data placement policy and the task scheduling policy.&lt;br /&gt;
* For map tasks, the scheduler uses a locality optimization. After selecting a job, it picks the map task in that job whose data is closest to the slave: on the same node if possible, otherwise on the same rack, or finally on a remote rack. For reduce tasks, the jobtracker simply takes the next task in the reduce task list and assigns it to the tasktracker.&lt;br /&gt;
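The scheduling rule described above can be sketched as follows. This is an illustrative model only, not Hadoop's scheduler code; the dictionary fields (node, rack, nodes, racks) and the function names are hypothetical.&lt;br /&gt;

```python
def locality_rank(task, slave):
    # 0 = input data on the same node, 1 = same rack, 2 = remote rack.
    if slave["node"] in task["nodes"]:
        return 0
    if slave["rack"] in task["racks"]:
        return 1
    return 2


def pick_map_task(pending_maps, slave):
    # Choose the pending map task whose input data is closest to the slave.
    if not pending_maps:
        return None
    return min(pending_maps, key=lambda t: locality_rank(t, slave))


def pick_reduce_task(pending_reduces):
    # Reduce tasks ignore locality: take the next one in the list.
    return pending_reduces.pop(0) if pending_reduces else None
```

The ranking makes the preference order explicit: a node-local task always wins over a rack-local one, which in turn wins over a task whose data sits on a remote rack.&lt;br /&gt;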
&lt;br /&gt;
=== Feb 14 ===&lt;br /&gt;
* Formalized the proposed design. Analyzing why a specific family of functions is used, the speedup attained, and the evaluation measurement.&lt;br /&gt;
&lt;br /&gt;
=== Feb 7 ===&lt;br /&gt;
* Studied LSH and explored different LSH families according to the distance measurement they use.&lt;br /&gt;
* Based on how LSH methods work, analysed how they can be parallelized in a distributed environment.&lt;br /&gt;
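As one example of an LSH family tied to a distance measure, the sketch below uses random hyperplanes (sign random projection), which is associated with cosine distance. The log does not say which family was chosen, so this choice and all names here are assumptions. Each signature depends only on its own point, so in a map-reduce setting the map phase could emit (signature, point) pairs and the reduce phase could group them into buckets.&lt;br /&gt;

```python
import random


def make_hyperplanes(dim, bits, seed=0):
    # One random Gaussian normal vector per signature bit.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def signature(point, planes):
    # One bit per hyperplane: which side of the plane the point falls on.
    return tuple(
        1 if sum(w * x for w, x in zip(plane, point)) >= 0 else 0
        for plane in planes
    )


def bucketize(points, planes):
    # Group point indices by signature; points in one bucket are candidates
    # for being similar.
    buckets = {}
    for idx, p in enumerate(points):
        buckets.setdefault(signature(p, planes), []).append(idx)
    return buckets
```

Points that differ only by a positive scale factor always land in the same bucket, because scaling does not change the side of any hyperplane.&lt;br /&gt;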
&lt;br /&gt;
=== Jan 31 ===&lt;br /&gt;
* Examined how each of the main classes of clustering algorithms can be parallelized.&lt;br /&gt;
* Explained spectral clustering theoretically, from the graph cut viewpoint. &lt;br /&gt;
* Based on this understanding of the main clustering algorithms, proposed an optimization of spectral clustering for large datasets.&lt;br /&gt;
* The proposed method uses LSH for pre-processing.&lt;br /&gt;
&lt;br /&gt;
=== Jan 17 ===&lt;br /&gt;
* Understanding Spectral clustering and distributed implementation.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4445</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4445"/>
		<updated>2011-05-17T03:35:58Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Summer 2011 (RA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== May 16th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_May_17th/reportTemplate.pdf here]&lt;br /&gt;
* Got the Parallel SC method (implemented with MPICH2) running on the NSL cluster and obtained results for comparison. The hardware platform remains unchanged (NSL cluster).&lt;br /&gt;
* Examined and ran a few experiments with the Nystrom Extension Spectral Clustering method (in MATLAB) and the &amp;quot;Fast approximate spectral clustering&amp;quot; method (in R). However, both methods run on a single machine, which undermines the comparison because the platform differs. &lt;br /&gt;
* Area affected by update: experiment setup and evaluation parts.&lt;br /&gt;
&lt;br /&gt;
=== May 5th ===&lt;br /&gt;
* Found two recent works with implementations for comparison, and set up the cluster to run them.&lt;br /&gt;
* There are also other similar papers on the same topic, which have been added into the related work part.&lt;br /&gt;
* Further clarification of the results (as well as the methodology of our proposed method).&lt;br /&gt;
&lt;br /&gt;
= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Apr 19th ===&lt;br /&gt;
* Applied the proposed scheme to larger datasets and obtained a large reduction in computing time (a ratio of about 1/18) with guaranteed error; more can be found in the report.&lt;br /&gt;
* Read through related publications, compared our method with the already existing ones, and updated the related work part in the paper.&lt;br /&gt;
* TA course assignment five marking, final exam supervision and marking&lt;br /&gt;
&lt;br /&gt;
=== Apr 11th ===&lt;br /&gt;
* Corrected the implementation of Dunn Index (which is used for evaluation purpose).&lt;br /&gt;
* Solved the out-of-memory problem in gram matrix generation (needed for comparison). To compare the proposed method with the original one, the raw gram matrix must be computed beforehand, but for datasets of size 2^13 or larger it no longer fits in memory. The fix is to write matrix entries out through a buffer instead of keeping the whole matrix in memory.&lt;br /&gt;
* This fix will let us extend the comparison to larger datasets.&lt;br /&gt;
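For reference, a small sketch of the Dunn Index mentioned in this entry, under stated assumptions: it uses one common variant (smallest distance between points of different clusters divided by the largest cluster diameter) with Euclidean distance. The log does not say which variant the evaluation code uses, and the function names are illustrative.&lt;br /&gt;

```python
import math
from itertools import combinations


def dist(p, q):
    # Euclidean distance between two points given as tuples.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def dunn_index(clusters):
    # clusters: list of clusters, each a list of points.
    # Needs at least two clusters and one cluster with two or more points.
    # Numerator: smallest distance between points in different clusters.
    min_between = min(
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    # Denominator: largest distance between two points in the same cluster.
    max_diameter = max(
        dist(p, q)
        for c in clusters
        for p, q in combinations(c, 2)
    )
    return min_between / max_diameter
```

Higher values indicate compact, well-separated clusters. For example, the clusters [(0, 0), (0, 1)] and [(3, 0), (3, 2)] have a minimum separation of 3 and a maximum diameter of 2, giving 1.5.&lt;br /&gt;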
&lt;br /&gt;
=== Apr 3rd ===&lt;br /&gt;
* Implemented the whole proposed design.&lt;br /&gt;
* Implemented evaluation measurement.&lt;br /&gt;
* Got results for datasets of size 2^6 to 2^13; more results will be collected.&lt;br /&gt;
&lt;br /&gt;
=== Mar 21st ===&lt;br /&gt;
* Corrected bugs in the original Mahout code. The original source code throws &amp;quot;java.lang.RuntimeException: java.lang.ClassNotFoundException&amp;quot; in several places. &lt;br /&gt;
* Implementing evaluation measurement.&lt;br /&gt;
&lt;br /&gt;
=== Mar 14th ===&lt;br /&gt;
* Set up the Mahout environment (together with Hadoop) and ran tests&lt;br /&gt;
* Implementing the proposed idea on Mahout&lt;br /&gt;
* Found a promising optimization opportunity, which we call &amp;quot;Time Multiplexing&amp;quot;. Normally the Spectral Clustering step executes only after the preprocessing step has finished. However, we observe that the LSH step and the Spectral Clustering step can overlap in time.&lt;br /&gt;
&lt;br /&gt;
=== Mar 7th ===&lt;br /&gt;
* Trying to add extra code in the library of Mahout to implement the proposed idea.&lt;br /&gt;
* The affected classes: AffinityMatrixInputJob (to perform gram matrix calculation), DistributedRowMatrix (to construct the diagonal matrix), DistributedLanczosSolver (to perform eigen-decomposition)&lt;br /&gt;
* Merging the previous reports and making necessary changes&lt;br /&gt;
&lt;br /&gt;
=== Feb 28 ===&lt;br /&gt;
* Examined two Hadoop design features that strongly influence the experiment's performance: the data placement policy and the task scheduling policy.&lt;br /&gt;
* For map tasks, the scheduler uses a locality optimization. After selecting a job, it picks the map task in that job whose data is closest to the slave: on the same node if possible, otherwise on the same rack, or finally on a remote rack. For reduce tasks, the jobtracker simply takes the next task in the reduce task list and assigns it to the tasktracker.&lt;br /&gt;
&lt;br /&gt;
=== Feb 14 ===&lt;br /&gt;
* Formalized the proposed design. Analyzing why a specific family of functions is used, the speedup attained, and the evaluation measurement.&lt;br /&gt;
&lt;br /&gt;
=== Feb 7 ===&lt;br /&gt;
* Studied LSH and explored different LSH families according to the distance measurement they use.&lt;br /&gt;
* Based on how LSH methods work, analysed how they can be parallelized in a distributed environment.&lt;br /&gt;
&lt;br /&gt;
=== Jan 31 ===&lt;br /&gt;
* Examined how each of the main classes of clustering algorithms can be parallelized.&lt;br /&gt;
* Explained spectral clustering theoretically, from the graph cut viewpoint. &lt;br /&gt;
* Based on this understanding of the main clustering algorithms, proposed an optimization of spectral clustering for large datasets.&lt;br /&gt;
* The proposed method uses LSH for pre-processing.&lt;br /&gt;
&lt;br /&gt;
=== Jan 17 ===&lt;br /&gt;
* Understanding Spectral clustering and distributed implementation.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4408</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4408"/>
		<updated>2011-05-06T01:56:22Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== May 5th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_May_5th/reportTemplate.pdf here]&lt;br /&gt;
* Found two recent works with implementations for comparison, and set up the cluster to run them.&lt;br /&gt;
* There are also other similar papers on the same topic, which have been added into the related work part.&lt;br /&gt;
* Further clarification of the results (as well as the methodology of our proposed method).&lt;br /&gt;
&lt;br /&gt;
=== Apr 19th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Apr_19th.pdf here]&lt;br /&gt;
* Applied the proposed scheme to larger datasets and obtained a large reduction in computing time (a ratio of about 1/18) with guaranteed error; more can be found in the report.&lt;br /&gt;
* Read through related publications, compared our method with the already existing ones, and updated the related work part in the paper.&lt;br /&gt;
* TA course assignment five marking, final exam supervision and marking&lt;br /&gt;
&lt;br /&gt;
=== Apr 11th ===&lt;br /&gt;
* Corrected the implementation of Dunn Index (which is used for evaluation purpose).&lt;br /&gt;
* Solved the out-of-memory problem in gram matrix generation (needed for comparison). To compare the proposed method with the original one, the raw gram matrix must be computed beforehand, but for datasets of size 2^13 or larger it no longer fits in memory. The fix is to write matrix entries out through a buffer instead of keeping the whole matrix in memory.&lt;br /&gt;
* This fix will let us extend the comparison to larger datasets.&lt;br /&gt;
&lt;br /&gt;
=== Apr 3rd ===&lt;br /&gt;
* Implemented the whole proposed design.&lt;br /&gt;
* Implemented evaluation measurement.&lt;br /&gt;
* Got results for datasets of size 2^6 to 2^13; more results will be collected.&lt;br /&gt;
&lt;br /&gt;
=== Mar 21st ===&lt;br /&gt;
* Corrected bugs in the original Mahout code. The original source code throws &amp;quot;java.lang.RuntimeException: java.lang.ClassNotFoundException&amp;quot; in several places. &lt;br /&gt;
* Implementing evaluation measurement.&lt;br /&gt;
&lt;br /&gt;
=== Mar 14th ===&lt;br /&gt;
* Set up the Mahout environment (together with Hadoop) and ran tests&lt;br /&gt;
* Implementing the proposed idea on Mahout&lt;br /&gt;
* Found a promising optimization opportunity, which we call &amp;quot;Time Multiplexing&amp;quot;. Normally the Spectral Clustering step executes only after the preprocessing step has finished. However, we observe that the LSH step and the Spectral Clustering step can overlap in time.&lt;br /&gt;
&lt;br /&gt;
=== Mar 7th ===&lt;br /&gt;
* Trying to add extra code in the library of Mahout to implement the proposed idea.&lt;br /&gt;
* The affected classes: AffinityMatrixInputJob (to perform gram matrix calculation), DistributedRowMatrix (to construct the diagonal matrix), DistributedLanczosSolver (to perform eigen-decomposition)&lt;br /&gt;
* Merging the previous reports and making necessary changes&lt;br /&gt;
&lt;br /&gt;
=== Feb 28 ===&lt;br /&gt;
* Examined two Hadoop design features that strongly influence the experiment's performance: the data placement policy and the task scheduling policy.&lt;br /&gt;
* For map tasks, the scheduler uses a locality optimization. After selecting a job, it picks the map task in that job whose data is closest to the slave: on the same node if possible, otherwise on the same rack, or finally on a remote rack. For reduce tasks, the jobtracker simply takes the next task in the reduce task list and assigns it to the tasktracker.&lt;br /&gt;
&lt;br /&gt;
=== Feb 14 ===&lt;br /&gt;
* Formalized the proposed design. Analyzing why a specific family of functions is used, the speedup attained, and the evaluation measurement.&lt;br /&gt;
&lt;br /&gt;
=== Feb 7 ===&lt;br /&gt;
* Studied LSH and explored different LSH families according to the distance measurement they use.&lt;br /&gt;
* Based on how LSH methods work, analysed how they can be parallelized in a distributed environment.&lt;br /&gt;
&lt;br /&gt;
=== Jan 31 ===&lt;br /&gt;
* Examined how each of the main classes of clustering algorithms can be parallelized.&lt;br /&gt;
* Explained spectral clustering theoretically, from the graph cut viewpoint. &lt;br /&gt;
* Based on this understanding of the main clustering algorithms, proposed an optimization of spectral clustering for large datasets.&lt;br /&gt;
* The proposed method uses LSH for pre-processing.&lt;br /&gt;
&lt;br /&gt;
=== Jan 17 ===&lt;br /&gt;
* Understanding Spectral clustering and distributed implementation.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4399</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4399"/>
		<updated>2011-04-21T04:48:56Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Apr 19th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Apr_19th.pdf here]&lt;br /&gt;
* Applied the proposed scheme to larger datasets and obtained a large reduction in computing time (a ratio of about 1/18) with guaranteed error; more can be found in the report.&lt;br /&gt;
* Read through related publications, compared our method with the already existing ones, and updated the related work part in the paper.&lt;br /&gt;
* TA course assignment five marking, final exam supervision and marking&lt;br /&gt;
&lt;br /&gt;
=== Apr 11th ===&lt;br /&gt;
* Corrected the implementation of Dunn Index (which is used for evaluation purpose).&lt;br /&gt;
* Solved the out-of-memory problem in gram matrix generation (needed for comparison). To compare the proposed method with the original one, the raw gram matrix must be computed beforehand, but for datasets of size 2^13 or larger it no longer fits in memory. The fix is to write matrix entries out through a buffer instead of keeping the whole matrix in memory.&lt;br /&gt;
* This fix will let us extend the comparison to larger datasets.&lt;br /&gt;
&lt;br /&gt;
=== Apr 3rd ===&lt;br /&gt;
* Implemented the whole proposed design.&lt;br /&gt;
* Implemented evaluation measurement.&lt;br /&gt;
* Got results for datasets of size 2^6 to 2^13; more results will be collected.&lt;br /&gt;
&lt;br /&gt;
=== Mar 21st ===&lt;br /&gt;
* Corrected bugs in the original Mahout code. The original source code throws &amp;quot;java.lang.RuntimeException: java.lang.ClassNotFoundException&amp;quot; in several places. &lt;br /&gt;
* Implementing evaluation measurement.&lt;br /&gt;
&lt;br /&gt;
=== Mar 14th ===&lt;br /&gt;
* Set up the Mahout environment (together with Hadoop) and ran tests&lt;br /&gt;
* Implementing the proposed idea on Mahout&lt;br /&gt;
* Found a promising optimization opportunity, which we call &amp;quot;Time Multiplexing&amp;quot;. Normally the Spectral Clustering step executes only after the preprocessing step has finished. However, we observe that the LSH step and the Spectral Clustering step can overlap in time.&lt;br /&gt;
&lt;br /&gt;
=== Mar 7th ===&lt;br /&gt;
* Trying to add extra code in the library of Mahout to implement the proposed idea.&lt;br /&gt;
* The affected classes: AffinityMatrixInputJob (to perform gram matrix calculation), DistributedRowMatrix (to construct the diagonal matrix), DistributedLanczosSolver (to perform eigen-decomposition)&lt;br /&gt;
* Merging the previous reports and making necessary changes&lt;br /&gt;
&lt;br /&gt;
=== Feb 28 ===&lt;br /&gt;
* Examined two Hadoop design features that strongly influence the experiment's performance: the data placement policy and the task scheduling policy.&lt;br /&gt;
* For map tasks, the scheduler uses a locality optimization. After selecting a job, it picks the map task in that job whose data is closest to the slave: on the same node if possible, otherwise on the same rack, or finally on a remote rack. For reduce tasks, the jobtracker simply takes the next task in the reduce task list and assigns it to the tasktracker.&lt;br /&gt;
&lt;br /&gt;
=== Feb 14 ===&lt;br /&gt;
* Formalized the proposed design. Analyzing why a specific family of functions is used, the speedup attained, and the evaluation measurement.&lt;br /&gt;
&lt;br /&gt;
=== Feb 7 ===&lt;br /&gt;
* Studied LSH and explored different LSH families according to the distance measurement they use.&lt;br /&gt;
* Based on how LSH methods work, analysed how they can be parallelized in a distributed environment.&lt;br /&gt;
&lt;br /&gt;
=== Jan 31 ===&lt;br /&gt;
* Examined how each of the main classes of clustering algorithms can be parallelized.&lt;br /&gt;
* Explained spectral clustering theoretically, from the graph cut viewpoint. &lt;br /&gt;
* Based on this understanding of the main clustering algorithms, proposed an optimization of spectral clustering for large datasets.&lt;br /&gt;
* The proposed method uses LSH for pre-processing.&lt;br /&gt;
&lt;br /&gt;
=== Jan 17 ===&lt;br /&gt;
* Understanding Spectral clustering and distributed implementation.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4398</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4398"/>
		<updated>2011-04-21T04:38:17Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Apr 19th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Apr_19th.pdf here]&lt;br /&gt;
* TA course assignment five marking, final exam supervision and marking&lt;br /&gt;
* Applied the proposed scheme to larger datasets and obtained a large reduction in computing time (a ratio of about 1/18) with guaranteed error; more can be found in the report.&lt;br /&gt;
* Read through related publications, compared our method with the already existing ones, and updated the related work part in the paper.&lt;br /&gt;
&lt;br /&gt;
=== Apr 11th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Apr_3rd.pdf here]&lt;br /&gt;
* Corrected the implementation of Dunn Index (which is used for evaluation purpose).&lt;br /&gt;
* Solved the out-of-memory problem in gram matrix generation (needed for comparison). To compare the proposed method with the original one, the raw gram matrix must be computed beforehand, but for datasets of size 2^13 or larger it no longer fits in memory. The fix is to write matrix entries out through a buffer instead of keeping the whole matrix in memory.&lt;br /&gt;
* This fix will let us extend the comparison to larger datasets.&lt;br /&gt;
&lt;br /&gt;
=== Apr 3rd ===&lt;br /&gt;
* Implemented the whole proposed design.&lt;br /&gt;
* Implemented evaluation measurement.&lt;br /&gt;
* Got results for datasets of size 2^6 to 2^13; more results will be collected.&lt;br /&gt;
&lt;br /&gt;
=== Mar 21st ===&lt;br /&gt;
* Corrected bugs in the original Mahout code. The original source code throws &amp;quot;java.lang.RuntimeException: java.lang.ClassNotFoundException&amp;quot; in several places. &lt;br /&gt;
* Implementing evaluation measurement.&lt;br /&gt;
&lt;br /&gt;
=== Mar 14th ===&lt;br /&gt;
* Set up the Mahout environment (together with Hadoop) and ran tests&lt;br /&gt;
* Implementing the proposed idea on Mahout&lt;br /&gt;
* Found a promising optimization opportunity, which we call &amp;quot;Time Multiplexing&amp;quot;. Normally the Spectral Clustering step executes only after the preprocessing step has finished. However, we observe that the LSH step and the Spectral Clustering step can overlap in time.&lt;br /&gt;
&lt;br /&gt;
=== Mar 7th ===&lt;br /&gt;
* Trying to add extra code in the library of Mahout to implement the proposed idea.&lt;br /&gt;
* The affected classes: AffinityMatrixInputJob (to perform gram matrix calculation), DistributedRowMatrix (to construct the diagonal matrix), DistributedLanczosSolver (to perform eigen-decomposition)&lt;br /&gt;
* Merging the previous reports and making necessary changes&lt;br /&gt;
&lt;br /&gt;
=== Feb 28 ===&lt;br /&gt;
* Examined two Hadoop design features that strongly influence the experiment's performance: the data placement policy and the task scheduling policy.&lt;br /&gt;
* For map tasks, the scheduler uses a locality optimization. After selecting a job, it picks the map task in that job whose data is closest to the slave: on the same node if possible, otherwise on the same rack, or finally on a remote rack. For reduce tasks, the jobtracker simply takes the next task in the reduce task list and assigns it to the tasktracker.&lt;br /&gt;
&lt;br /&gt;
=== Feb 14 ===&lt;br /&gt;
* Formalized the proposed design. Analyzing why a specific family of functions is used, the speedup attained, and the evaluation measurement.&lt;br /&gt;
&lt;br /&gt;
=== Feb 7 ===&lt;br /&gt;
* Studied LSH and explored different LSH families according to the distance measurement they use.&lt;br /&gt;
* Based on how LSH methods work, analysed how they can be parallelized in a distributed environment.&lt;br /&gt;
&lt;br /&gt;
=== Jan 31 ===&lt;br /&gt;
* Examined how each of the main classes of clustering algorithms can be parallelized.&lt;br /&gt;
* Explained spectral clustering theoretically, from the graph cut viewpoint. &lt;br /&gt;
* Based on this understanding of the main clustering algorithms, proposed an optimization of spectral clustering for large datasets.&lt;br /&gt;
* The proposed method uses LSH for pre-processing.&lt;br /&gt;
&lt;br /&gt;
=== Jan 17 ===&lt;br /&gt;
* Understanding Spectral clustering and distributed implementation.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4397</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4397"/>
		<updated>2011-04-21T04:36:55Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Apr 19th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Apr_19th.pdf here]&lt;br /&gt;
* TA course assignment five marking, final exam supervision and marking&lt;br /&gt;
* Applied the proposed scheme to larger datasets and obtained a large reduction in computing time (a ratio of about 1/18); more can be found in the report.&lt;br /&gt;
* Read through related publications, compared our method with the already existing ones, and updated the related work part in the paper.&lt;br /&gt;
&lt;br /&gt;
=== Apr 11th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Apr_3rd.pdf here]&lt;br /&gt;
* Corrected the implementation of Dunn Index (which is used for evaluation purpose).&lt;br /&gt;
* Solved the out-of-memory problem in gram matrix generation (needed for comparison). To compare the proposed method with the original one, the raw gram matrix must be computed beforehand, but for datasets of size 2^13 or larger it no longer fits in memory. The fix is to write matrix entries out through a buffer instead of keeping the whole matrix in memory.&lt;br /&gt;
* This fix will let us extend the comparison to larger datasets.&lt;br /&gt;
&lt;br /&gt;
=== Apr 3rd ===&lt;br /&gt;
* Implemented the whole proposed design.&lt;br /&gt;
* Implemented evaluation measurement.&lt;br /&gt;
* Obtained results for datasets with sizes from 2^6 to 2^13; more results will be collected.&lt;br /&gt;
&lt;br /&gt;
=== Mar 21st ===&lt;br /&gt;
* Corrected bugs in the original Mahout code. The original source code raises a &amp;quot;java.lang.RuntimeException: java.lang.ClassNotFoundException&amp;quot; error, which appears in several places in the code.&lt;br /&gt;
* Implementing evaluation measurement.&lt;br /&gt;
&lt;br /&gt;
=== Mar 14th ===&lt;br /&gt;
* Set up the Mahout environment (together with Hadoop) and ran tests.&lt;br /&gt;
* Implementing the proposed idea on Mahout&lt;br /&gt;
* Found a promising optimization opportunity, named &amp;quot;Time Multiplexing&amp;quot;. The optimization is based on the following observation: normally, the Spectral Clustering step executes only after the preprocessing step has finished. However, we observe that the LSH step and the Spectral Clustering step can overlap in time (time multiplexing).&lt;br /&gt;
&lt;br /&gt;
=== Mar 7th ===&lt;br /&gt;
* Trying to add extra code in the library of Mahout to implement the proposed idea.&lt;br /&gt;
* The affected classes: AffinityMatrixInputJob (to perform gram matrix calculation), DistributedRowMatrix (to construct the diagonal matrix), DistributedLanczosSolver (to perform eigen-decomposition)&lt;br /&gt;
* Merging the previous reports and making necessary changes&lt;br /&gt;
&lt;br /&gt;
=== Feb 28 ===&lt;br /&gt;
* Examined two important design features of Hadoop that strongly influence the experiment's performance: the data placement policy and the task scheduling policy.&lt;br /&gt;
* For map tasks, the scheduler uses a locality optimization: after selecting a job, it picks the map task in that job whose data is closest to the slave, on the same node if possible, otherwise on the same rack, or finally on a remote rack. For reduce tasks, the jobtracker simply takes the next task in the list of reduce tasks and assigns it to the tasktracker.&lt;br /&gt;
&lt;br /&gt;
=== Feb 14 ===&lt;br /&gt;
* Formalized the proposed design. Analyzing why a specific family of functions is used, the speedup attained, and the evaluation measures.&lt;br /&gt;
&lt;br /&gt;
=== Feb 7 ===&lt;br /&gt;
* Studied LSH and explored different LSH families according to the distance measure they use.&lt;br /&gt;
* Based on how LSH methods work, analysed how they can be parallelized in a distributed environment.&lt;br /&gt;
&lt;br /&gt;
=== Jan 31 ===&lt;br /&gt;
* Examined how each of the main classes of clustering algorithms can be parallelized.&lt;br /&gt;
* Explained spectral clustering theoretically, from the graph cut viewpoint.&lt;br /&gt;
* Based on the understanding of the main clustering algorithms, proposed an optimization method for spectral clustering to deal with large datasets.&lt;br /&gt;
* The proposed method makes use of LSH for pre-processing.&lt;br /&gt;
&lt;br /&gt;
=== Jan 17 ===&lt;br /&gt;
* Understanding Spectral clustering and distributed implementation.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4360</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4360"/>
		<updated>2011-04-12T00:59:40Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Apr 11th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Apr_3rd.pdf here]&lt;br /&gt;
* Corrected the implementation of the Dunn index (which is used for evaluation).&lt;br /&gt;
* Solved the out-of-memory problem in gram matrix generation (needed for comparison). To compare the proposed method with the original method, we need the raw gram matrix computed beforehand. However, for datasets of size 2^13 or larger, memory is insufficient. The problem is solved by using a buffer (instead of keeping all the matrix entries in memory).&lt;br /&gt;
* The above-mentioned fix will enable us to extend the comparison to larger datasets.&lt;br /&gt;
&lt;br /&gt;
=== Apr 3rd ===&lt;br /&gt;
* Implemented the whole proposed design.&lt;br /&gt;
* Implemented evaluation measurement.&lt;br /&gt;
* Obtained results for datasets with sizes from 2^6 to 2^13; more results will be collected.&lt;br /&gt;
&lt;br /&gt;
=== Mar 21st ===&lt;br /&gt;
* Corrected bugs in the original Mahout code. The original source code raises a &amp;quot;java.lang.RuntimeException: java.lang.ClassNotFoundException&amp;quot; error, which appears in several places in the code.&lt;br /&gt;
* Implementing evaluation measurement.&lt;br /&gt;
&lt;br /&gt;
=== Mar 14th ===&lt;br /&gt;
* Set up the Mahout environment (together with Hadoop) and ran tests.&lt;br /&gt;
* Implementing the proposed idea on Mahout&lt;br /&gt;
* Found a promising optimization opportunity, named &amp;quot;Time Multiplexing&amp;quot;. The optimization is based on the following observation: normally, the Spectral Clustering step executes only after the preprocessing step has finished. However, we observe that the LSH step and the Spectral Clustering step can overlap in time (time multiplexing).&lt;br /&gt;
&lt;br /&gt;
=== Mar 7th ===&lt;br /&gt;
* Trying to add extra code in the library of Mahout to implement the proposed idea.&lt;br /&gt;
* The affected classes: AffinityMatrixInputJob (to perform gram matrix calculation), DistributedRowMatrix (to construct the diagonal matrix), DistributedLanczosSolver (to perform eigen-decomposition)&lt;br /&gt;
* Merging the previous reports and making necessary changes&lt;br /&gt;
&lt;br /&gt;
=== Feb 28 ===&lt;br /&gt;
* Examined two important design features of Hadoop that strongly influence the experiment's performance: the data placement policy and the task scheduling policy.&lt;br /&gt;
* For map tasks, the scheduler uses a locality optimization: after selecting a job, it picks the map task in that job whose data is closest to the slave, on the same node if possible, otherwise on the same rack, or finally on a remote rack. For reduce tasks, the jobtracker simply takes the next task in the list of reduce tasks and assigns it to the tasktracker.&lt;br /&gt;
&lt;br /&gt;
=== Feb 14 ===&lt;br /&gt;
* Formalized the proposed design. Analyzing why a specific family of functions is used, the speedup attained, and the evaluation measures.&lt;br /&gt;
&lt;br /&gt;
=== Feb 7 ===&lt;br /&gt;
* Studied LSH and explored different LSH families according to the distance measure they use.&lt;br /&gt;
* Based on how LSH methods work, analysed how they can be parallelized in a distributed environment.&lt;br /&gt;
&lt;br /&gt;
=== Jan 31 ===&lt;br /&gt;
* Examined how each of the main classes of clustering algorithms can be parallelized.&lt;br /&gt;
* Explained spectral clustering theoretically, from the graph cut viewpoint.&lt;br /&gt;
* Based on the understanding of the main clustering algorithms, proposed an optimization method for spectral clustering to deal with large datasets.&lt;br /&gt;
* The proposed method makes use of LSH for pre-processing.&lt;br /&gt;
&lt;br /&gt;
=== Jan 17 ===&lt;br /&gt;
* Understanding Spectral clustering and distributed implementation.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4329</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4329"/>
		<updated>2011-04-04T06:07:31Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Apr 3rd ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Apr_3rd.pdf here]&lt;br /&gt;
* Implemented the whole proposed design.&lt;br /&gt;
* Implemented evaluation measurement.&lt;br /&gt;
* Obtained results for datasets with sizes from 2^6 to 2^13; more results will be collected.&lt;br /&gt;
&lt;br /&gt;
=== Mar 21st ===&lt;br /&gt;
* Corrected bugs in the original Mahout code. The original source code raises a &amp;quot;java.lang.RuntimeException: java.lang.ClassNotFoundException&amp;quot; error, which appears in several places in the code.&lt;br /&gt;
* Implementing evaluation measurement.&lt;br /&gt;
&lt;br /&gt;
=== Mar 14th ===&lt;br /&gt;
* Set up the Mahout environment (together with Hadoop) and ran tests.&lt;br /&gt;
* Implementing the proposed idea on Mahout&lt;br /&gt;
* Found a promising optimization opportunity, named &amp;quot;Time Multiplexing&amp;quot;. The optimization is based on the following observation: normally, the Spectral Clustering step executes only after the preprocessing step has finished. However, we observe that the LSH step and the Spectral Clustering step can overlap in time (time multiplexing).&lt;br /&gt;
&lt;br /&gt;
=== Mar 7th ===&lt;br /&gt;
* Trying to add extra code in the library of Mahout to implement the proposed idea.&lt;br /&gt;
* The affected classes: AffinityMatrixInputJob (to perform gram matrix calculation), DistributedRowMatrix (to construct the diagonal matrix), DistributedLanczosSolver (to perform eigen-decomposition)&lt;br /&gt;
* Merging the previous reports and making necessary changes&lt;br /&gt;
&lt;br /&gt;
=== Feb 28 ===&lt;br /&gt;
* Examined two important design features of Hadoop that strongly influence the experiment's performance: the data placement policy and the task scheduling policy.&lt;br /&gt;
* For map tasks, the scheduler uses a locality optimization: after selecting a job, it picks the map task in that job whose data is closest to the slave, on the same node if possible, otherwise on the same rack, or finally on a remote rack. For reduce tasks, the jobtracker simply takes the next task in the list of reduce tasks and assigns it to the tasktracker.&lt;br /&gt;
&lt;br /&gt;
=== Feb 14 ===&lt;br /&gt;
* Formalized the proposed design. Analyzing why a specific family of functions is used, the speedup attained, and the evaluation measures.&lt;br /&gt;
&lt;br /&gt;
=== Feb 7 ===&lt;br /&gt;
* Studied LSH and explored different LSH families according to the distance measure they use.&lt;br /&gt;
* Based on how LSH methods work, analysed how they can be parallelized in a distributed environment.&lt;br /&gt;
&lt;br /&gt;
=== Jan 31 ===&lt;br /&gt;
* Examined how each of the main classes of clustering algorithms can be parallelized.&lt;br /&gt;
* Explained spectral clustering theoretically, from the graph cut viewpoint.&lt;br /&gt;
* Based on the understanding of the main clustering algorithms, proposed an optimization method for spectral clustering to deal with large datasets.&lt;br /&gt;
* The proposed method makes use of LSH for pre-processing.&lt;br /&gt;
&lt;br /&gt;
=== Jan 17 ===&lt;br /&gt;
* Understanding Spectral clustering and distributed implementation.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4328</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4328"/>
		<updated>2011-04-04T06:07:13Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Apr 3rd ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Apr_3rd.pdf here]&lt;br /&gt;
* Implemented the whole proposed design.&lt;br /&gt;
* Implemented evaluation measurement.&lt;br /&gt;
* Obtained results for datasets with sizes from 2^6 to 2^13; more results will be collected.&lt;br /&gt;
&lt;br /&gt;
=== Mar 21st ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Mar_21st.pdf here]&lt;br /&gt;
* Corrected bugs in the original Mahout code. The original source code raises a &amp;quot;java.lang.RuntimeException: java.lang.ClassNotFoundException&amp;quot; error, which appears in several places in the code.&lt;br /&gt;
* Implementing evaluation measurement.&lt;br /&gt;
&lt;br /&gt;
=== Mar 14th ===&lt;br /&gt;
* Set up the Mahout environment (together with Hadoop) and ran tests.&lt;br /&gt;
* Implementing the proposed idea on Mahout&lt;br /&gt;
* Found a promising optimization opportunity, named &amp;quot;Time Multiplexing&amp;quot;. The optimization is based on the following observation: normally, the Spectral Clustering step executes only after the preprocessing step has finished. However, we observe that the LSH step and the Spectral Clustering step can overlap in time (time multiplexing).&lt;br /&gt;
&lt;br /&gt;
=== Mar 7th ===&lt;br /&gt;
* Trying to add extra code in the library of Mahout to implement the proposed idea.&lt;br /&gt;
* The affected classes: AffinityMatrixInputJob (to perform gram matrix calculation), DistributedRowMatrix (to construct the diagonal matrix), DistributedLanczosSolver (to perform eigen-decomposition)&lt;br /&gt;
* Merging the previous reports and making necessary changes&lt;br /&gt;
&lt;br /&gt;
=== Feb 28 ===&lt;br /&gt;
* Examined two important design features of Hadoop that strongly influence the experiment's performance: the data placement policy and the task scheduling policy.&lt;br /&gt;
* For map tasks, the scheduler uses a locality optimization: after selecting a job, it picks the map task in that job whose data is closest to the slave, on the same node if possible, otherwise on the same rack, or finally on a remote rack. For reduce tasks, the jobtracker simply takes the next task in the list of reduce tasks and assigns it to the tasktracker.&lt;br /&gt;
&lt;br /&gt;
=== Feb 14 ===&lt;br /&gt;
* Formalized the proposed design. Analyzing why a specific family of functions is used, the speedup attained, and the evaluation measures.&lt;br /&gt;
&lt;br /&gt;
=== Feb 7 ===&lt;br /&gt;
* Studied LSH and explored different LSH families according to the distance measure they use.&lt;br /&gt;
* Based on how LSH methods work, analysed how they can be parallelized in a distributed environment.&lt;br /&gt;
&lt;br /&gt;
=== Jan 31 ===&lt;br /&gt;
* Examined how each of the main classes of clustering algorithms can be parallelized.&lt;br /&gt;
* Explained spectral clustering theoretically, from the graph cut viewpoint.&lt;br /&gt;
* Based on the understanding of the main clustering algorithms, proposed an optimization method for spectral clustering to deal with large datasets.&lt;br /&gt;
* The proposed method makes use of LSH for pre-processing.&lt;br /&gt;
&lt;br /&gt;
=== Jan 17 ===&lt;br /&gt;
* Understanding Spectral clustering and distributed implementation.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4314</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4314"/>
		<updated>2011-03-22T07:02:13Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Mar 21st ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Mar_21st.pdf here]&lt;br /&gt;
* Corrected bugs in the original Mahout code. The original source code raises a &amp;quot;java.lang.RuntimeException: java.lang.ClassNotFoundException&amp;quot; error, which appears in several places in the code.&lt;br /&gt;
* Implementing evaluation measurement.&lt;br /&gt;
&lt;br /&gt;
=== Mar 14th ===&lt;br /&gt;
* Set up the Mahout environment (together with Hadoop) and ran tests.&lt;br /&gt;
* Implementing the proposed idea on Mahout&lt;br /&gt;
* Found a promising optimization opportunity, named &amp;quot;Time Multiplexing&amp;quot;. The optimization is based on the following observation: normally, the Spectral Clustering step executes only after the preprocessing step has finished. However, we observe that the LSH step and the Spectral Clustering step can overlap in time (time multiplexing).&lt;br /&gt;
&lt;br /&gt;
=== Mar 7th ===&lt;br /&gt;
* Trying to add extra code in the library of Mahout to implement the proposed idea.&lt;br /&gt;
* The affected classes: AffinityMatrixInputJob (to perform gram matrix calculation), DistributedRowMatrix (to construct the diagonal matrix), DistributedLanczosSolver (to perform eigen-decomposition)&lt;br /&gt;
* Merging the previous reports and making necessary changes&lt;br /&gt;
&lt;br /&gt;
=== Feb 28 ===&lt;br /&gt;
* Examined two important design features of Hadoop that strongly influence the experiment's performance: the data placement policy and the task scheduling policy.&lt;br /&gt;
* For map tasks, the scheduler uses a locality optimization: after selecting a job, it picks the map task in that job whose data is closest to the slave, on the same node if possible, otherwise on the same rack, or finally on a remote rack. For reduce tasks, the jobtracker simply takes the next task in the list of reduce tasks and assigns it to the tasktracker.&lt;br /&gt;
&lt;br /&gt;
=== Feb 14 ===&lt;br /&gt;
* Formalized the proposed design. Analyzing why a specific family of functions is used, the speedup attained, and the evaluation measures.&lt;br /&gt;
&lt;br /&gt;
=== Feb 7 ===&lt;br /&gt;
* Studied LSH and explored different LSH families according to the distance measure they use.&lt;br /&gt;
* Based on how LSH methods work, analysed how they can be parallelized in a distributed environment.&lt;br /&gt;
&lt;br /&gt;
=== Jan 31 ===&lt;br /&gt;
* Examined how each of the main classes of clustering algorithms can be parallelized.&lt;br /&gt;
* Explained spectral clustering theoretically, from the graph cut viewpoint.&lt;br /&gt;
* Based on the understanding of the main clustering algorithms, proposed an optimization method for spectral clustering to deal with large datasets.&lt;br /&gt;
* The proposed method makes use of LSH for pre-processing.&lt;br /&gt;
&lt;br /&gt;
=== Jan 17 ===&lt;br /&gt;
* Understanding Spectral clustering and distributed implementation.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4313</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4313"/>
		<updated>2011-03-21T23:35:16Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Mar 14th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Mar_14th.pdf here]&lt;br /&gt;
* Set up the Mahout environment (together with Hadoop) and ran tests.&lt;br /&gt;
* Implementing the proposed idea on Mahout&lt;br /&gt;
* Found a promising optimization opportunity, named &amp;quot;Time Multiplexing&amp;quot;. The optimization is based on the following observation: normally, the Spectral Clustering step executes only after the preprocessing step has finished. However, we observe that the LSH step and the Spectral Clustering step can overlap in time (time multiplexing).&lt;br /&gt;
&lt;br /&gt;
=== Mar 7th ===&lt;br /&gt;
* Trying to add extra code in the library of Mahout to implement the proposed idea.&lt;br /&gt;
* The affected classes: AffinityMatrixInputJob (to perform gram matrix calculation), DistributedRowMatrix (to construct the diagonal matrix), DistributedLanczosSolver (to perform eigen-decomposition)&lt;br /&gt;
* Merging the previous reports and making necessary changes&lt;br /&gt;
&lt;br /&gt;
=== Feb 28 ===&lt;br /&gt;
* Examined two important design features of Hadoop that strongly influence the experiment's performance: the data placement policy and the task scheduling policy.&lt;br /&gt;
* For map tasks, the scheduler uses a locality optimization: after selecting a job, it picks the map task in that job whose data is closest to the slave, on the same node if possible, otherwise on the same rack, or finally on a remote rack. For reduce tasks, the jobtracker simply takes the next task in the list of reduce tasks and assigns it to the tasktracker.&lt;br /&gt;
&lt;br /&gt;
=== Feb 14 ===&lt;br /&gt;
* Formalized the proposed design. Analyzing why a specific family of functions is used, the speedup attained, and the evaluation measures.&lt;br /&gt;
&lt;br /&gt;
=== Feb 7 ===&lt;br /&gt;
* Studied LSH and explored different LSH families according to the distance measure they use.&lt;br /&gt;
* Based on how LSH methods work, analysed how they can be parallelized in a distributed environment.&lt;br /&gt;
&lt;br /&gt;
=== Jan 31 ===&lt;br /&gt;
* Examined how each of the main classes of clustering algorithms can be parallelized.&lt;br /&gt;
* Explained spectral clustering theoretically, from the graph cut viewpoint.&lt;br /&gt;
* Based on the understanding of the main clustering algorithms, proposed an optimization method for spectral clustering to deal with large datasets.&lt;br /&gt;
* The proposed method makes use of LSH for pre-processing.&lt;br /&gt;
&lt;br /&gt;
=== Jan 17 ===&lt;br /&gt;
* Understanding Spectral clustering and distributed implementation.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4280</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4280"/>
		<updated>2011-03-15T01:36:52Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Mar 14th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Mar_14th.pdf here]&lt;br /&gt;
* Set up the Mahout environment (together with Hadoop) and ran tests.&lt;br /&gt;
* Implementing the proposed idea on Mahout&lt;br /&gt;
* Found a promising optimization opportunity, named &amp;quot;Time Multiplexing&amp;quot;. The optimization is based on the following observation: normally, the Spectral Clustering step executes only after the preprocessing step has finished. However, we observe that the LSH step and the Spectral Clustering step can overlap in time (time multiplexing).&lt;br /&gt;
&lt;br /&gt;
=== Mar 7th ===&lt;br /&gt;
* Trying to add extra code in the library of Mahout to implement the proposed idea.&lt;br /&gt;
* The affected classes: AffinityMatrixInputJob (to perform gram matrix calculation), DistributedRowMatrix (to construct the diagonal matrix), DistributedLanczosSolver (to perform eigen-decomposition)&lt;br /&gt;
* Merging the previous reports and making necessary changes&lt;br /&gt;
&lt;br /&gt;
=== Feb 28 ===&lt;br /&gt;
* Examined two Hadoop design features that strongly influence the experiment's performance: the data placement policy and the task scheduling policy.&lt;br /&gt;
* For map tasks, the scheduler uses a locality optimization: after selecting a job, it picks the map task in that job whose data is closest to the slave, on the same node if possible, otherwise on the same rack, or finally on a remote rack. For reduce tasks, the jobtracker simply takes the next task in the list of reduce tasks and assigns it to the tasktracker.&lt;br /&gt;
&lt;br /&gt;
=== Feb 14 ===&lt;br /&gt;
* Formalized the proposed design. Analyzing why a specific family of functions is used, the speedup attained, and the evaluation measure.&lt;br /&gt;
&lt;br /&gt;
=== Feb 7 ===&lt;br /&gt;
* Studied LSH and explored different LSH families according to the distance measure they use.&lt;br /&gt;
* Based on how LSH methods work, analysed how they can be parallelized in a distributed environment.&lt;br /&gt;
&lt;br /&gt;
=== Jan 31 ===&lt;br /&gt;
* Examined how each of the main classes of clustering algorithms can be parallelized.&lt;br /&gt;
* Explained spectral clustering theoretically, from the graph cut viewpoint.&lt;br /&gt;
* Based on this understanding of the main clustering algorithms, proposed an optimization of spectral clustering for large data sets.&lt;br /&gt;
* The proposed method uses LSH for pre-processing.&lt;br /&gt;
&lt;br /&gt;
=== Jan 17 ===&lt;br /&gt;
* Understanding Spectral clustering and distributed implementation.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4279</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4279"/>
		<updated>2011-03-15T01:36:37Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Mar 14th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Mar_14th.pdf here]&lt;br /&gt;
* Set up the Mahout environment (together with Hadoop) and ran tests&lt;br /&gt;
* Implementing the proposed idea on Mahout&lt;br /&gt;
* Found an optimization opportunity, named &amp;quot;Time Multiplexing&amp;quot;. It is based on the following observation: normally, the Spectral Clustering step executes only after the preprocessing step has finished. However, we observe that the LSH step and the Spectral Clustering step can overlap in time (time multiplexing).&lt;br /&gt;
&lt;br /&gt;
=== Mar 7th ===&lt;br /&gt;
* Trying to add extra code in the library of Mahout to implement the proposed idea.&lt;br /&gt;
* The affected classes: AffinityMatrixInputJob (to perform gram matrix calculation), DistributedRowMatrix (to construct the diagonal matrix), DistributedLanczosSolver (to perform eigen-decomposition)&lt;br /&gt;
* Merging the previous reports and making necessary changes&lt;br /&gt;
&lt;br /&gt;
=== Feb 28 ===&lt;br /&gt;
* Examined two Hadoop design features that strongly influence the experiment's performance: the data placement policy and the task scheduling policy.&lt;br /&gt;
* For map tasks, the scheduler uses a locality optimization: after selecting a job, it picks the map task in that job whose data is closest to the slave, on the same node if possible, otherwise on the same rack, or finally on a remote rack. For reduce tasks, the jobtracker simply takes the next task in the list of reduce tasks and assigns it to the tasktracker.&lt;br /&gt;
&lt;br /&gt;
=== Feb 14 ===&lt;br /&gt;
* Formalized the proposed design. Analyzing why a specific family of functions is used, the speedup attained, and the evaluation measure.&lt;br /&gt;
&lt;br /&gt;
=== Feb 7 ===&lt;br /&gt;
* Studied LSH and explored different LSH families according to the distance measure they use.&lt;br /&gt;
* Based on how LSH methods work, analysed how they can be parallelized in a distributed environment.&lt;br /&gt;
&lt;br /&gt;
=== Jan 31 ===&lt;br /&gt;
* Examined how each of the main classes of clustering algorithms can be parallelized.&lt;br /&gt;
* Explained spectral clustering theoretically, from the graph cut viewpoint.&lt;br /&gt;
* Based on this understanding of the main clustering algorithms, proposed an optimization of spectral clustering for large data sets.&lt;br /&gt;
* The proposed method uses LSH for pre-processing.&lt;br /&gt;
&lt;br /&gt;
=== Jan 17 ===&lt;br /&gt;
* Understanding Spectral clustering and distributed implementation.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4278</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4278"/>
		<updated>2011-03-15T01:35:55Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Mar 14th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Mar_14th.pdf here]&lt;br /&gt;
* Set up the Mahout environment (together with Hadoop) and ran tests&lt;br /&gt;
* Implementing the proposed idea on Mahout&lt;br /&gt;
* Found an optimization opportunity, named &amp;quot;Time Multiplexing&amp;quot;. It is based on the following observation: normally, the Spectral Clustering step executes only after the preprocessing step has finished. However, we observe that a Spectral Clustering instance can be started as soon as one bin of data is ready; that is, the LSH step and the Spectral Clustering step can overlap in time (time multiplexing).&lt;br /&gt;
&lt;br /&gt;
=== Mar 7th ===&lt;br /&gt;
* Trying to add extra code in the library of Mahout to implement the proposed idea.&lt;br /&gt;
* The affected classes: AffinityMatrixInputJob (to perform gram matrix calculation), DistributedRowMatrix (to construct the diagonal matrix), DistributedLanczosSolver (to perform eigen-decomposition)&lt;br /&gt;
* Merging the previous reports and making necessary changes&lt;br /&gt;
&lt;br /&gt;
=== Feb 28 ===&lt;br /&gt;
* Examined two Hadoop design features that strongly influence the experiment's performance: the data placement policy and the task scheduling policy.&lt;br /&gt;
* For map tasks, the scheduler uses a locality optimization: after selecting a job, it picks the map task in that job whose data is closest to the slave, on the same node if possible, otherwise on the same rack, or finally on a remote rack. For reduce tasks, the jobtracker simply takes the next task in the list of reduce tasks and assigns it to the tasktracker.&lt;br /&gt;
&lt;br /&gt;
=== Feb 14 ===&lt;br /&gt;
* Formalized the proposed design. Analyzing why a specific family of functions is used, the speedup attained, and the evaluation measure.&lt;br /&gt;
&lt;br /&gt;
=== Feb 7 ===&lt;br /&gt;
* Studied LSH and explored different LSH families according to the distance measure they use.&lt;br /&gt;
* Based on how LSH methods work, analysed how they can be parallelized in a distributed environment.&lt;br /&gt;
&lt;br /&gt;
=== Jan 31 ===&lt;br /&gt;
* Examined how each of the main classes of clustering algorithms can be parallelized.&lt;br /&gt;
* Explained spectral clustering theoretically, from the graph cut viewpoint.&lt;br /&gt;
* Based on this understanding of the main clustering algorithms, proposed an optimization of spectral clustering for large data sets.&lt;br /&gt;
* The proposed method uses LSH for pre-processing.&lt;br /&gt;
&lt;br /&gt;
=== Jan 17 ===&lt;br /&gt;
* Understanding Spectral clustering and distributed implementation.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4277</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4277"/>
		<updated>2011-03-15T01:35:39Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Mar 14th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Mar_14th.pdf here]&lt;br /&gt;
* Set up the Mahout environment (together with Hadoop) and ran tests&lt;br /&gt;
* Implementing the proposed idea on Mahout&lt;br /&gt;
* Found an optimization opportunity, named &amp;quot;Time Multiplexing&amp;quot;. It is based on the following observation: normally, the Spectral Clustering step executes only after the preprocessing step has finished. However, we observe that a Spectral Clustering instance can be started as soon as one bin of data is ready; that is, the LSH step and the Spectral Clustering step can overlap in time (time multiplexing).&lt;br /&gt;
&lt;br /&gt;
=== Mar 7th ===&lt;br /&gt;
* Trying to add extra code in the library of Mahout to implement the proposed idea.&lt;br /&gt;
* The affected classes: AffinityMatrixInputJob (to perform gram matrix calculation), DistributedRowMatrix (to construct the diagonal matrix), DistributedLanczosSolver (to perform eigen-decomposition)&lt;br /&gt;
* Merging the previous reports and making necessary changes&lt;br /&gt;
&lt;br /&gt;
=== Feb 28 ===&lt;br /&gt;
* Examined two Hadoop design features that strongly influence the experiment's performance: the data placement policy and the task scheduling policy.&lt;br /&gt;
* For map tasks, the scheduler uses a locality optimization: after selecting a job, it picks the map task in that job whose data is closest to the slave, on the same node if possible, otherwise on the same rack, or finally on a remote rack. For reduce tasks, the jobtracker simply takes the next task in the list of reduce tasks and assigns it to the tasktracker.&lt;br /&gt;
&lt;br /&gt;
=== Feb 14 ===&lt;br /&gt;
* Formalized the proposed design. Analyzing why a specific family of functions is used, the speedup attained, and the evaluation measure.&lt;br /&gt;
&lt;br /&gt;
=== Feb 7 ===&lt;br /&gt;
* Studied LSH and explored different LSH families according to the distance measure they use.&lt;br /&gt;
* Based on how LSH methods work, analysed how they can be parallelized in a distributed environment.&lt;br /&gt;
&lt;br /&gt;
=== Jan 31 ===&lt;br /&gt;
* Examined how each of the main classes of clustering algorithms can be parallelized.&lt;br /&gt;
* Explained spectral clustering theoretically, from the graph cut viewpoint.&lt;br /&gt;
* Based on this understanding of the main clustering algorithms, proposed an optimization of spectral clustering for large data sets.&lt;br /&gt;
* The proposed method uses LSH for pre-processing.&lt;br /&gt;
&lt;br /&gt;
=== Jan 17 ===&lt;br /&gt;
* Understanding Spectral clustering and distributed implementation.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4249</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4249"/>
		<updated>2011-03-07T06:05:48Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Mar 7th ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Mar_7th.pdf here]&lt;br /&gt;
* Trying to add extra code in the library of Mahout to implement the proposed idea.&lt;br /&gt;
* The affected classes: AffinityMatrixInputJob (to perform gram matrix calculation), DistributedRowMatrix (to construct the diagonal matrix), DistributedLanczosSolver (to perform eigen-decomposition)&lt;br /&gt;
* Merging the previous reports and making necessary changes&lt;br /&gt;
&lt;br /&gt;
=== Feb 28 ===&lt;br /&gt;
* Examined two Hadoop design features that strongly influence the experiment's performance: the data placement policy and the task scheduling policy.&lt;br /&gt;
* For map tasks, the scheduler uses a locality optimization: after selecting a job, it picks the map task in that job whose data is closest to the slave, on the same node if possible, otherwise on the same rack, or finally on a remote rack. For reduce tasks, the jobtracker simply takes the next task in the list of reduce tasks and assigns it to the tasktracker.&lt;br /&gt;
&lt;br /&gt;
=== Feb 14 ===&lt;br /&gt;
* Formalized the proposed design. Analyzing why a specific family of functions is used, the speedup attained, and the evaluation measure.&lt;br /&gt;
&lt;br /&gt;
=== Feb 7 ===&lt;br /&gt;
* Studied LSH and explored different LSH families according to the distance measure they use.&lt;br /&gt;
* Based on how LSH methods work, analysed how they can be parallelized in a distributed environment.&lt;br /&gt;
&lt;br /&gt;
=== Jan 31 ===&lt;br /&gt;
* Examined how each of the main classes of clustering algorithms can be parallelized.&lt;br /&gt;
* Explained spectral clustering theoretically, from the graph cut viewpoint.&lt;br /&gt;
* Based on this understanding of the main clustering algorithms, proposed an optimization of spectral clustering for large data sets.&lt;br /&gt;
* The proposed method uses LSH for pre-processing.&lt;br /&gt;
&lt;br /&gt;
=== Jan 17 ===&lt;br /&gt;
* Understanding Spectral clustering and distributed implementation.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4223</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4223"/>
		<updated>2011-02-28T05:13:36Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Feb 28 ===&lt;br /&gt;
* '''Report:''' [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/reports/Report_Feb_28th.pdf here]&lt;br /&gt;
* Examined two Hadoop design features that strongly influence the experiment's performance: the data placement policy and the task scheduling policy.&lt;br /&gt;
* For map tasks, the scheduler uses a locality optimization: after selecting a job, it picks the map task in that job whose data is closest to the slave, on the same node if possible, otherwise on the same rack, or finally on a remote rack. For reduce tasks, the jobtracker simply takes the next task in the list of reduce tasks and assigns it to the tasktracker.&lt;br /&gt;
&lt;br /&gt;
=== Feb 14 ===&lt;br /&gt;
* Formalized the proposed design. Analyzing why a specific family of functions is used, the speedup attained, and the evaluation measure.&lt;br /&gt;
&lt;br /&gt;
=== Feb 7 ===&lt;br /&gt;
* Studied LSH and explored different LSH families according to the distance measure they use.&lt;br /&gt;
* Based on how LSH methods work, analysed how they can be parallelized in a distributed environment.&lt;br /&gt;
&lt;br /&gt;
=== Jan 31 ===&lt;br /&gt;
* Examined how each of the main classes of clustering algorithms can be parallelized.&lt;br /&gt;
* Explained spectral clustering theoretically, from the graph cut viewpoint.&lt;br /&gt;
* Based on this understanding of the main clustering algorithms, proposed an optimization of spectral clustering for large data sets.&lt;br /&gt;
* The proposed method uses LSH for pre-processing.&lt;br /&gt;
&lt;br /&gt;
=== Jan 17 ===&lt;br /&gt;
* Understanding Spectral clustering and distributed implementation.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4222</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4222"/>
		<updated>2011-02-28T05:08:10Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Feb 28 ===&lt;br /&gt;
* Examined two Hadoop design features that strongly influence the experiment's performance: the data placement policy and the task scheduling policy.&lt;br /&gt;
* For map tasks, the scheduler uses a locality optimization: after selecting a job, it picks the map task in that job whose data is closest to the slave, on the same node if possible, otherwise on the same rack, or finally on a remote rack. For reduce tasks, the jobtracker simply takes the next task in the list of reduce tasks and assigns it to the tasktracker.&lt;br /&gt;
&lt;br /&gt;
=== Feb 14 ===&lt;br /&gt;
* Formalized the proposed design. Analyzing why a specific family of functions is used, the speedup attained, and the evaluation measure.&lt;br /&gt;
&lt;br /&gt;
=== Feb 7 ===&lt;br /&gt;
* Studied LSH and explored different LSH families according to the distance measure they use.&lt;br /&gt;
* Based on how LSH methods work, analysed how they can be parallelized in a distributed environment.&lt;br /&gt;
&lt;br /&gt;
=== Jan 31 ===&lt;br /&gt;
* Examined how each of the main classes of clustering algorithms can be parallelized.&lt;br /&gt;
* Explained spectral clustering theoretically, from the graph cut viewpoint.&lt;br /&gt;
* Based on this understanding of the main clustering algorithms, proposed an optimization of spectral clustering for large data sets.&lt;br /&gt;
* The proposed method uses LSH for pre-processing.&lt;br /&gt;
&lt;br /&gt;
=== Jan 17 ===&lt;br /&gt;
* Understanding Spectral clustering and distributed implementation.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4221</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4221"/>
		<updated>2011-02-28T05:07:48Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Feb 28 ===&lt;br /&gt;
* Examined two Hadoop design features that strongly influence the experiment's performance: the data placement policy and the task scheduling policy.&lt;br /&gt;
* For map tasks, the scheduler uses a locality optimization: after selecting a job, it picks the map task in that job whose data is closest to the slave, on the same node if possible, otherwise on the same rack, or finally on a remote rack. For reduce tasks, the jobtracker simply takes the next task in the list of reduce tasks and assigns it to the tasktracker.&lt;br /&gt;
&lt;br /&gt;
=== Feb 14 ===&lt;br /&gt;
* Formalized the proposed design. Analyzing why a specific family of functions is used, the speedup attained, and the evaluation measure.&lt;br /&gt;
&lt;br /&gt;
=== Feb 7 ===&lt;br /&gt;
* Studied LSH and explored different LSH families according to the distance measure they use.&lt;br /&gt;
* Based on how LSH methods work, analysed how they can be parallelized in a distributed environment.&lt;br /&gt;
&lt;br /&gt;
=== Jan 31 ===&lt;br /&gt;
* Examined how each of the main classes of clustering algorithms can be parallelized.&lt;br /&gt;
* Explained spectral clustering theoretically, from the graph cut viewpoint.&lt;br /&gt;
* Based on this understanding of the main clustering algorithms, proposed an optimization of spectral clustering for large data sets.&lt;br /&gt;
* The proposed method uses LSH for pre-processing.&lt;br /&gt;
&lt;br /&gt;
=== Jan 17 ===&lt;br /&gt;
* Understanding Spectral clustering and distributed implementation.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4220</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4220"/>
		<updated>2011-02-28T05:07:37Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Feb 28 ===&lt;br /&gt;
* Examined two Hadoop design features that strongly influence the experiment's performance: the data placement policy and the task scheduling policy.&lt;br /&gt;
* For map tasks, the scheduler uses a locality optimization. After selecting a job, it picks the map task in that job whose data is closest to the slave: on the same node if possible, otherwise on the same rack, or finally on a remote rack. For reduce tasks, the jobtracker simply takes the next task in the reduce tasks' list and assigns it to the tasktracker.&lt;br /&gt;
&lt;br /&gt;
=== Feb 14 ===&lt;br /&gt;
* Formalized the proposed design. Analyzed why a specific family of functions is used, the speedup attained, and the evaluation measurement.&lt;br /&gt;
&lt;br /&gt;
=== Feb 7 ===&lt;br /&gt;
* Studied LSH and explored different LSH families according to the distance measure they use.&lt;br /&gt;
* Based on how LSH methods work, analysed how they can be parallelized in a distributed environment.&lt;br /&gt;
&lt;br /&gt;
=== Jan 31 ===&lt;br /&gt;
* Examined how each of the main classes of clustering algorithms can be parallelized.&lt;br /&gt;
* Explained spectral clustering from a theoretical perspective, using the graph-cut viewpoint. &lt;br /&gt;
* Based on this understanding of the main clustering algorithms, proposed an optimization method that lets spectral clustering handle large data sets.&lt;br /&gt;
* The proposed method uses LSH for pre-processing.&lt;br /&gt;
&lt;br /&gt;
=== Jan 17 ===&lt;br /&gt;
* Understanding Spectral clustering and distributed implementation.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4206</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4206"/>
		<updated>2011-02-19T00:23:20Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Feb 14 ===&lt;br /&gt;
* Formalized the proposed design. Analyzed why a specific family of functions is used, the speedup attained, and the evaluation measurement.&lt;br /&gt;
&lt;br /&gt;
=== Feb 7 ===&lt;br /&gt;
* Studied LSH and explored different LSH families according to the distance measure they use.&lt;br /&gt;
* Based on how LSH methods work, analysed how they can be parallelized in a distributed environment.&lt;br /&gt;
&lt;br /&gt;
=== Jan 31 ===&lt;br /&gt;
* Examined how each of the main classes of clustering algorithms can be parallelized.&lt;br /&gt;
* Explained spectral clustering from a theoretical perspective, using the graph-cut viewpoint. &lt;br /&gt;
* Based on this understanding of the main clustering algorithms, proposed an optimization method that lets spectral clustering handle large data sets.&lt;br /&gt;
* The proposed method uses LSH for pre-processing.&lt;br /&gt;
&lt;br /&gt;
=== Jan 17 ===&lt;br /&gt;
* Understanding Spectral clustering and distributed implementation.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4163</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4163"/>
		<updated>2011-02-08T05:42:20Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Feb 7 ===&lt;br /&gt;
* Studied LSH and explored different LSH families according to the distance measure they use.&lt;br /&gt;
* Based on how LSH methods work, analysed how they can be parallelized in a distributed environment.&lt;br /&gt;
&lt;br /&gt;
=== Jan 31 ===&lt;br /&gt;
* Examined how each of the main classes of clustering algorithms can be parallelized.&lt;br /&gt;
* Explained spectral clustering from a theoretical perspective, using the graph-cut viewpoint. &lt;br /&gt;
* Based on this understanding of the main clustering algorithms, proposed an optimization method that lets spectral clustering handle large data sets.&lt;br /&gt;
* The proposed method uses LSH for pre-processing.&lt;br /&gt;
&lt;br /&gt;
=== Jan 17 ===&lt;br /&gt;
* Understanding Spectral clustering and distributed implementation.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4150</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4150"/>
		<updated>2011-02-01T16:56:31Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Jan 31 ===&lt;br /&gt;
* Examined how each of the main classes of clustering algorithms can be parallelized.&lt;br /&gt;
* Explained spectral clustering from a theoretical perspective, using the graph-cut viewpoint. &lt;br /&gt;
* Based on this understanding of the main clustering algorithms, proposed an optimization method that lets spectral clustering handle large data sets.&lt;br /&gt;
* The proposed method uses LSH for pre-processing.&lt;br /&gt;
&lt;br /&gt;
=== Jan 17 ===&lt;br /&gt;
* Understanding Spectral clustering and distributed implementation.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4148</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4148"/>
		<updated>2011-02-01T05:15:57Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Jan 31 ===&lt;br /&gt;
* Based on an understanding of the main clustering algorithms, proposed an optimization method that lets spectral clustering handle large data sets.&lt;br /&gt;
&lt;br /&gt;
=== Jan 17 ===&lt;br /&gt;
* Understanding Spectral clustering and distributed implementation.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=group_meeting&amp;diff=4039</id>
		<title>group meeting</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=group_meeting&amp;diff=4039"/>
		<updated>2011-01-21T00:17:03Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;We hold regular meetings for discussion and for every student to update the group on his/her progress. In some of the meetings, graduate students present talks summarizing their research progress so far. &lt;br /&gt;
&lt;br /&gt;
The meetings are good opportunities for students to practice their presentation skills and to get constructive feedback from the group on their research. The meetings keep the group members informed about the different research problems being addressed in the group. They are also very helpful for finding research topics, especially for new students. &lt;br /&gt;
&lt;br /&gt;
Everybody is welcome to attend. Meeting time: Every Tuesday, 10:00 AM - 12:00 PM, room SUR 4010.   &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Spring 2011==&lt;br /&gt;
&lt;br /&gt;
* 1 Feb: Ahmed Bu-khamsin, Top Ten Computationally-Complex Problems in Oil and Gas Exploration Field&lt;br /&gt;
&lt;br /&gt;
* 25 Jan: Naghmeh, 3D Video Copy Detection&lt;br /&gt;
&lt;br /&gt;
* 18 Jan: Cameron, [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/harvey/talks/Copy%20Detection%20Using%20Optical%20Flow.pptx Video Copy Detection using Optical Flow]&lt;br /&gt;
&lt;br /&gt;
* 11 Jan: Mathieu,  [[media:3D_VideosOverview.pptx | 3D Media - An Overview]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Group meetings were held biweekly up to Dec 2010. &lt;br /&gt;
&lt;br /&gt;
== Fall 2010==&lt;br /&gt;
&lt;br /&gt;
* 21 Dec 10: Group discussion.&lt;br /&gt;
&lt;br /&gt;
* 9 Dec 10: Mohammad, [[media:Botnet-Detection-2.0.pptx | Detection of SIP Botnets Based on C&amp;amp;C Communication]]&lt;br /&gt;
&lt;br /&gt;
* 23 Nov 10: Taher, Approximation algorithms for Web-Scale Kernel Methods&lt;br /&gt;
&lt;br /&gt;
* 19 Oct 10: Jeff, [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/gao/talks/LSH_Cluster.pdf Gram Matrix Approximation Using Locality Sensitive Hashing on Cluster]&lt;br /&gt;
&lt;br /&gt;
* 7 Oct 10: Dr. Rocky Chang (Hong Kong Polytechnic University),&amp;lt;br/&amp;gt; [[media:Rocky-SFU-7-Oct-2010.pdf | Active Measurement of Data-Path Quality in a Non-cooperative Internet]]&lt;br /&gt;
&lt;br /&gt;
* 28 Sep 10: Ahmed, [[media:DRS.pdf | Energy-Efficient Gaming on Mobile Devices using Dead Reckoning-based Power Management]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Spring/Summer 2010==&lt;br /&gt;
&lt;br /&gt;
* 31 Aug 10: Hamed, [[media:Predicting_Click_Through_Rate_for_new_ads.pdf | Predicting Click Through Rate for New Ads with Semantically Similarity Measurement]]&lt;br /&gt;
&lt;br /&gt;
* 9 Aug 10: Yuanbin, [[media:mutisendP2Pstream.pdf | Efficient Algorithms for Multi-Sender Data Transmission in Swarm-based P2P Streaming Systems]]&lt;br /&gt;
&lt;br /&gt;
* 3 Aug 10: Azin, [[media:CognitiveRadio.ppt | Cognitive Radio Networks]]&lt;br /&gt;
&lt;br /&gt;
* 20 July 10: Farid, [[media:movid10.pdf | Optimal Scalable Video Multiplexing in Mobile Broadcast Networks]]&lt;br /&gt;
&lt;br /&gt;
* 17 May 10: Cameron, Reducing Energy Consumption in Online Network Games on Mobile Devices&lt;br /&gt;
&lt;br /&gt;
* 10 May 10: Cong, Latency Reduction in Online Network Games&lt;br /&gt;
&lt;br /&gt;
* 19 April 10: Shabnam, [[media:Svc-nc.ppt | Live P2P Streaming with Scalable Video Coding and Network Coding]]&lt;br /&gt;
&lt;br /&gt;
* 29 March 10: Jeff and Taher, [[media:AppAlgo.ppt | Approximation algorithms for Kernel Methods on Multi-core CPUs and GPUs]]&lt;br /&gt;
&lt;br /&gt;
* 15 March 10: Som, [http://www.cs.sfu.ca/~ssa121/personal/wimaxSVC.pdf Video Streaming over WiMAX]&lt;br /&gt;
&lt;br /&gt;
* 1 March 10: Farid, [[media:scalable_video_streaming_for_mobiletv.pptx | Scalable Video Streaming for MobileTV]]&lt;br /&gt;
&lt;br /&gt;
* 1 Feb 10: Ahmed, Design of pCDN with Scalable Video Coding&lt;br /&gt;
&lt;br /&gt;
* 18 Jan 10: Shabnam, P2P Streaming with Network Coding and Scalable Video Coding&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Fall 2009==&lt;br /&gt;
&lt;br /&gt;
* 8 December 09: Yi, Video Streaming over Cooperative Wireless Networks&lt;br /&gt;
&lt;br /&gt;
* 10 Nov 09: Cheng, [[media:testbed.ppt | Design of a Mobile TV Testbed]]&lt;br /&gt;
&lt;br /&gt;
* 27 October 09: Yuanbin, Segment Scheduling in P2P Streaming Systems&lt;br /&gt;
&lt;br /&gt;
* 13 October 09: Ahmed, [[media:LTE.pdf | Long Term Evolution (LTE) - A Tutorial]]&lt;br /&gt;
&lt;br /&gt;
* 6 October 09: Cheng, [[media:Mm09.ppt | Statistical Multiplexing of VBR Video Streams]] (ACM MM 09 talk)&lt;br /&gt;
&lt;br /&gt;
* 22 September 09: Som, Video Streaming over WiMAX Networks&lt;br /&gt;
&lt;br /&gt;
* 8 September 09: Cong, Minimizing Round-Trip Time in Online Games&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Summer 2009==&lt;br /&gt;
&lt;br /&gt;
* 18 August 09: Mohammad and Cong: 30 min each. Present their Directed Reading projects. &lt;br /&gt;
&lt;br /&gt;
* 14 July 09: Cheng, [[media:wimaxTV.pptx | Broadcasting Variable-Bit-Rate Videos in 802.16e-Like Mobile Networks]] &lt;br /&gt;
&lt;br /&gt;
*  7 July 09:  Yi&lt;br /&gt;
&lt;br /&gt;
* 26 June 09: Ahmed &lt;br /&gt;
&lt;br /&gt;
* 5 June 09: '''Canceled''' (Mohamed attending NOSSDAV'09)&lt;br /&gt;
&lt;br /&gt;
* 29 May 09: Kianoosh, End-to-End Secure Delivery of Scalable Video Streams &lt;br /&gt;
&lt;br /&gt;
* 22 May 09: Cong, [[media:wimax.pptx| Multimedia Streaming over WiMAX Networks]]&lt;br /&gt;
&lt;br /&gt;
* 8 May 09:  Kianoosh,   Analysis of Authentication Schemes for Nonscalable Video Streams&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Spring 2009 ==&lt;br /&gt;
&lt;br /&gt;
* 17 Apr 09: [[media:infocom09.pptx|Cheng (practice your infocom presentation)]]&lt;br /&gt;
&lt;br /&gt;
* 27 March 09: Andreas Berger, [[media:Nsl_vancouver.odp | Network-based Detection of SIP Bots]]&lt;br /&gt;
&lt;br /&gt;
* 27 Feb 09: Shabnam and Yuanbin&lt;br /&gt;
&lt;br /&gt;
* 23 Jan 09: Cheng (rehearse your PhD proposal) and Kianoosh&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4027</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=4027"/>
		<updated>2011-01-20T05:38:28Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Jan 17 ===&lt;br /&gt;
* Understanding Spectral clustering and distributed implementation.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=3975</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=3975"/>
		<updated>2011-01-11T04:02:09Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.&lt;br /&gt;
* Mahout experimenting.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=3974</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=3974"/>
		<updated>2011-01-11T03:56:35Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.&lt;br /&gt;
* Mahout code reading (clustering part)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=3973</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=3973"/>
		<updated>2011-01-11T03:56:15Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.&lt;br /&gt;
* Mahout code reading (clustering part)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=3972</id>
		<title>Private:progress-gao</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=Private:progress-gao&amp;diff=3972"/>
		<updated>2011-01-11T03:55:32Z</updated>

		<summary type="html">&lt;p&gt;Feig: New page: = Spring 2011 (TA) = * Courses: **None.  === Jan 10 === * Survey on main clustering algorithms and the distributed map-reduce method of these algorithms. * Mahout code reading (clustering ...&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Spring 2011 (TA) =&lt;br /&gt;
* Courses:&lt;br /&gt;
**None.&lt;br /&gt;
&lt;br /&gt;
=== Jan 10 ===&lt;br /&gt;
* Survey on main clustering algorithms and the distributed map-reduce method of these algorithms.&lt;br /&gt;
* Mahout code reading (clustering part)&lt;br /&gt;
&lt;br /&gt;
== Jan 3 == &lt;br /&gt;
* having fun, not really working&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Fall 2010 (FELLOWSHIP) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 771: Internet Architecture and Protocols&lt;br /&gt;
**CMPT 741: Data Mining&lt;br /&gt;
&lt;br /&gt;
* Worked on efficient approximation of gram matrix using map-reduce framework, focusing on LSH performance evaluation and network communication measurement.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Summer 2010 (RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**None&lt;br /&gt;
&lt;br /&gt;
* Worked on Approximation of gram matrices using Locality Sensitive Hashing on Cluster.&lt;br /&gt;
&lt;br /&gt;
= Spring 2010 (TA+RA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 886: Special Topics in Operating Systems and Computer Architecture&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;br /&gt;
&lt;br /&gt;
= Fall 2009 (TA) = &lt;br /&gt;
* Courses:&lt;br /&gt;
**CMPT 705: Design and Analysis of Algorithms&lt;br /&gt;
**CMPT 726: Machine Learning&lt;br /&gt;
&lt;br /&gt;
* Worked on Band approximation of gram matrices (large high-dimensional dataset) using Hilbert curve on multicore.&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=group_meeting&amp;diff=3859</id>
		<title>group meeting</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=group_meeting&amp;diff=3859"/>
		<updated>2010-10-19T23:46:54Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;We hold regular meetings (mostly bi-weekly) for discussion. In each meeting, a graduate student presents his/her research progress, followed by a 15-20 minute discussion. The presenter can also choose a recent paper and present it to the group. The paper must be from the top conferences/journals in our research areas, such as ACM Multimedia, SIGCOMM, INFOCOM, ICNP, IEEE Transactions on Networking, ACM TOMCCAP, and IEEE Transactions on Multimedia. &lt;br /&gt;
&lt;br /&gt;
The meetings are good opportunities for students to practice their presentation skills and to get constructive feedback from the group on their research.  The meetings keep the group members informed about different research problems being addressed in the group. They are also very helpful in finding research topics specially for new students. &lt;br /&gt;
&lt;br /&gt;
Everybody is welcome to attend. Meeting time: Every other Tuesday, 11:00 AM -12:00 PM, room SUR 4040.   &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Fall 2010==&lt;br /&gt;
&lt;br /&gt;
* 19 Oct 10: Jeff, Gram Matrix Approximation Using Locality Sensitive Hashing on Cluster&lt;br /&gt;
&lt;br /&gt;
* 7 Oct 10: Dr. Rocky Chang (Hong Kong Polytechnic University),&amp;lt;br/&amp;gt; [[media:Rocky-SFU-7-Oct-2010.pdf | Active Measurement of Data-Path Quality in a Non-cooperative Internet]]&lt;br /&gt;
&lt;br /&gt;
* 28 Sep 10: Ahmed, [[media:DRS.pdf | Energy-Efficient Gaming on Mobile Devices using Dead Reckoning-based Power Management]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Spring/Summer 2010==&lt;br /&gt;
&lt;br /&gt;
* 31 Aug 10: Hamed, [[media:Predicting_Click_Through_Rate_for_new_ads.pdf | Predicting Click Through Rate for New Ads with Semantically Similarity Measurement]]&lt;br /&gt;
&lt;br /&gt;
* 9 Aug 10: Yuanbin, [[media:mutisendP2Pstream.pdf | Efficient Algorithms for Multi-Sender Data Transmission in Swarm-based P2P Streaming Systems]]&lt;br /&gt;
&lt;br /&gt;
* 3 Aug 10: Azin, [[media:CognitiveRadio.ppt | Cognitive Radio Networks]]&lt;br /&gt;
&lt;br /&gt;
* 20 July 10: Farid, Optimal Scalable Video Multiplexing in Mobile Broadcast Networks&lt;br /&gt;
&lt;br /&gt;
* 17 May 10: Cameron, Reducing Energy Consumption in Online Network Games on Mobile Devices&lt;br /&gt;
&lt;br /&gt;
* 10 May 10: Cong, Latency Reduction in Online Network Games&lt;br /&gt;
&lt;br /&gt;
* 19 April 10: Shabnam, [[media:Svc-nc.ppt | Live P2P Streaming with Scalable Video Coding and Network Coding]]&lt;br /&gt;
&lt;br /&gt;
* 29 March 10: Jeff and Taher, [[media:AppAlgo.ppt | Approximation algorithms for Kernel Methods on Multi-core CPUs and GPUs]]&lt;br /&gt;
&lt;br /&gt;
* 15 March 10: Som, [http://www.cs.sfu.ca/~ssa121/personal/wimaxSVC.pdf Video Streaming over WiMAX]&lt;br /&gt;
&lt;br /&gt;
* 1 March 10: Farid, Mobile Video Streaming &lt;br /&gt;
&lt;br /&gt;
* 1 Feb 10: Ahmed, Design of pCDN with Scalable Video Coding&lt;br /&gt;
&lt;br /&gt;
* 18 Jan 10: Shabnam, P2P Streaming with Network Coding and Scalable Video Coding&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Fall 2009==&lt;br /&gt;
&lt;br /&gt;
* 8 December 09: Yi, Video Streaming over Cooperative Wireless Networks&lt;br /&gt;
&lt;br /&gt;
* 10 Nov 09: Cheng, [[media:testbed.ppt | Design of a Mobile TV Testbed]]&lt;br /&gt;
&lt;br /&gt;
* 27 October 09: Yuanbin, Segment Scheduling in P2P Streaming Systems&lt;br /&gt;
&lt;br /&gt;
* 13 October 09: Ahmed, [[media:LTE.pdf | Long Term Evolution (LTE) - A Tutorial]]&lt;br /&gt;
&lt;br /&gt;
* 6 October 09: Cheng, [[media:Mm09.ppt | Statistical Multiplexing of VBR Video Streams]] (ACM MM 09 talk)&lt;br /&gt;
&lt;br /&gt;
* 22 September 09: Som, Video Streaming over WiMAX Networks&lt;br /&gt;
&lt;br /&gt;
* 8 September 09: Cong, Minimizing Round-Trip Time in Online Games&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Summer 2009==&lt;br /&gt;
&lt;br /&gt;
* 18 August 09: Mohammad and Cong: 30 min each. Present their Directed Reading projects. &lt;br /&gt;
&lt;br /&gt;
* 14 July 09: Cheng, [[media:wimaxTV.pptx | Broadcasting Variable-Bit-Rate Videos in 802.16e-Like Mobile Networks]] &lt;br /&gt;
&lt;br /&gt;
*  7 July 09:  Yi&lt;br /&gt;
&lt;br /&gt;
* 26 June 09: Ahmed &lt;br /&gt;
&lt;br /&gt;
* 5 June 09: '''Canceled''' (Mohamed attending NOSSDAV'09)&lt;br /&gt;
&lt;br /&gt;
* 29 May 09: Kianoosh, End-to-End Secure Delivery of Scalable Video Streams &lt;br /&gt;
&lt;br /&gt;
* 22 May 09: Cong, [[media:wimax.pptx| Multimedia Streaming over WiMAX Networks]]&lt;br /&gt;
&lt;br /&gt;
* 8 May 09:  Kianoosh,   Analysis of Authentication Schemes for Nonscalable Video Streams&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Spring 2009 ==&lt;br /&gt;
&lt;br /&gt;
* 17 Apr 09: [[media:infocom09.pptx|Cheng (practice your infocom presentation)]]&lt;br /&gt;
&lt;br /&gt;
* 27 March 09: Andreas Berger, [[media:Nsl_vancouver.odp | Network-based Detection of SIP Bots]]&lt;br /&gt;
&lt;br /&gt;
* 27 Feb 09: Shabnam and Yuanbin&lt;br /&gt;
&lt;br /&gt;
* 23 Jan 09: Cheng (rehearse your PhD proposal) and Kianoosh&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=group_meeting&amp;diff=3858</id>
		<title>group meeting</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=group_meeting&amp;diff=3858"/>
		<updated>2010-10-19T19:35:31Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;We hold regular meetings (mostly bi-weekly) for discussion. In each meeting, a graduate student presents his/her research progress, followed by a 15-20 minute discussion. The presenter can also choose a recent paper and present it to the group. The paper must be from the top conferences/journals in our research areas, such as ACM Multimedia, SIGCOMM, INFOCOM, ICNP, IEEE/ACM Transactions on Networking, ACM TOMCCAP, and IEEE Transactions on Multimedia. &lt;br /&gt;
&lt;br /&gt;
The meetings are good opportunities for students to practice their presentation skills and to get constructive feedback from the group on their research. The meetings keep the group members informed about the different research problems being addressed in the group. They are also very helpful for finding research topics, especially for new students. &lt;br /&gt;
&lt;br /&gt;
Everybody is welcome to attend. Meeting time: every other Tuesday, 11:00 AM - 12:00 PM, room SUR 4040.   &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Fall 2010==&lt;br /&gt;
&lt;br /&gt;
* 19 Oct 10: Jeff, [[media:LSH_Cluster.pdf | Gram Matrix Approximation Using Locality Sensitive Hashing on Cluster]]&lt;br /&gt;
&lt;br /&gt;
* 7 Oct 10: Dr. Rocky Chang (Hong Kong Polytechnic University),&amp;lt;br/&amp;gt; [[media:Rocky-SFU-7-Oct-2010.pdf | Active Measurement of Data-Path Quality in a Non-cooperative Internet]]&lt;br /&gt;
&lt;br /&gt;
* 28 Sep 10: Ahmed, [[media:DRS.pdf | Energy-Efficient Gaming on Mobile Devices using Dead Reckoning-based Power Management]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Spring/Summer 2010==&lt;br /&gt;
&lt;br /&gt;
* 31 Aug 10: Hamed, [[media:Predicting_Click_Through_Rate_for_new_ads.pdf | Predicting Click Through Rate for New Ads with Semantically Similarity Measurement]]&lt;br /&gt;
&lt;br /&gt;
* 9 Aug 10: Yuanbin, [[media:mutisendP2Pstream.pdf | Efficient Algorithms for Multi-Sender Data Transmission in Swarm-based P2P Streaming Systems]]&lt;br /&gt;
&lt;br /&gt;
* 3 Aug 10: Azin, [[media:CognitiveRadio.ppt | Cognitive Radio Networks]]&lt;br /&gt;
&lt;br /&gt;
* 20 July 10: Farid, Optimal Scalable Video Multiplexing in Mobile Broadcast Networks&lt;br /&gt;
&lt;br /&gt;
* 17 May 10: Cameron, Reducing Energy Consumption in Online Network Games on Mobile Devices&lt;br /&gt;
&lt;br /&gt;
* 10 May 10: Cong, Latency Reduction in Online Network Games&lt;br /&gt;
&lt;br /&gt;
* 19 April 10: Shabnam, [[media:Svc-nc.ppt | Live P2P Streaming with Scalable Video Coding and Network Coding]]&lt;br /&gt;
&lt;br /&gt;
* 29 March 10: Jeff and Taher, [[media:AppAlgo.ppt | Approximation algorithms for Kernel Methods on Multi-core CPUs and GPUs]]&lt;br /&gt;
&lt;br /&gt;
* 15 March 10: Som, [http://www.cs.sfu.ca/~ssa121/personal/wimaxSVC.pdf Video Streaming over WiMAX]&lt;br /&gt;
&lt;br /&gt;
* 1 March 10: Farid, Mobile Video Streaming &lt;br /&gt;
&lt;br /&gt;
* 1 Feb 10: Ahmed, Design of pCDN with Scalable Video Coding&lt;br /&gt;
&lt;br /&gt;
* 18 Jan 10: Shabnam, P2P Streaming with Network Coding and Scalable Video Coding&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Fall 2009==&lt;br /&gt;
&lt;br /&gt;
* 8 December 09: Yi, Video Streaming over Cooperative Wireless Networks&lt;br /&gt;
&lt;br /&gt;
* 10 Nov 09: Cheng, [[media:testbed.ppt | Design of a Mobile TV Testbed]]&lt;br /&gt;
&lt;br /&gt;
* 27 October 09: Yuanbin, Segment Scheduling in P2P Streaming Systems&lt;br /&gt;
&lt;br /&gt;
* 13 October 09: Ahmed, [[media:LTE.pdf | Long Term Evolution (LTE) - A Tutorial]]&lt;br /&gt;
&lt;br /&gt;
* 6 October 09: Cheng, [[media:Mm09.ppt | Statistical Multiplexing of VBR Video Streams]] (ACM MM 09 talk)&lt;br /&gt;
&lt;br /&gt;
* 22 September 09: Som, Video Streaming over WiMAX Networks&lt;br /&gt;
&lt;br /&gt;
* 8 September 09: Cong, Minimizing Round-Trip Time in Online Games&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Summer 2009==&lt;br /&gt;
&lt;br /&gt;
* 18 August 09: Mohammad and Cong: 30 min each. Present their Directed Reading projects. &lt;br /&gt;
&lt;br /&gt;
* 14 July 09: Cheng, [[media:wimaxTV.pptx | Broadcasting Variable-Bit-Rate Videos in 802.16e-Like Mobile Networks]] &lt;br /&gt;
&lt;br /&gt;
*  7 July 09:  Yi&lt;br /&gt;
&lt;br /&gt;
* 26 June 09: Ahmed &lt;br /&gt;
&lt;br /&gt;
* 5 June 09: '''Canceled''' (Mohamed attending NOSSDAV'09)&lt;br /&gt;
&lt;br /&gt;
* 29 May 09: Kianoosh, End-to-End Secure Delivery of Scalable Video Streams &lt;br /&gt;
&lt;br /&gt;
* 22 May 09: Cong, [[media:wimax.pptx| Multimedia Streaming over WiMAX Networks]]&lt;br /&gt;
&lt;br /&gt;
* 8 May 09:  Kianoosh,   Analysis of Authentication Schemes for Nonscalable Video Streams&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Spring 2009 ==&lt;br /&gt;
&lt;br /&gt;
* 17 Apr 09: [[media:infocom09.pptx|Cheng (practice your infocom presentation)]]&lt;br /&gt;
&lt;br /&gt;
* 27 March 09: Andreas Berger, [[media:Nsl_vancouver.odp | Network-based Detection of SIP Bots]]&lt;br /&gt;
&lt;br /&gt;
* 27 Feb 09: Shabnam and Yuanbin&lt;br /&gt;
&lt;br /&gt;
* 23 Jan 09: Cheng (rehearse your PhD proposal) and Kianoosh&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=File:LSH_Cluster.pdf&amp;diff=3857</id>
		<title>File:LSH Cluster.pdf</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=File:LSH_Cluster.pdf&amp;diff=3857"/>
		<updated>2010-10-19T19:33:41Z</updated>

		<summary type="html">&lt;p&gt;Feig: Gram Matrix Approximation�Using Locality Sensitive Hashing�on Cluster&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Gram Matrix Approximation Using Locality Sensitive Hashing on Cluster&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=group_meeting&amp;diff=3856</id>
		<title>group meeting</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=group_meeting&amp;diff=3856"/>
		<updated>2010-10-19T17:15:25Z</updated>

		<summary type="html">&lt;p&gt;Feig: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;We hold regular meetings (mostly bi-weekly) for discussion. In each meeting, a graduate student presents his/her research progress, followed by a 15-20 minute discussion. The presenter can also choose a recent paper and present it to the group. The paper must be from the top conferences/journals in our research areas, such as ACM Multimedia, SIGCOMM, INFOCOM, ICNP, IEEE/ACM Transactions on Networking, ACM TOMCCAP, and IEEE Transactions on Multimedia. &lt;br /&gt;
&lt;br /&gt;
The meetings are good opportunities for students to practice their presentation skills and to get constructive feedback from the group on their research. The meetings keep the group members informed about the different research problems being addressed in the group. They are also very helpful for finding research topics, especially for new students. &lt;br /&gt;
&lt;br /&gt;
Everybody is welcome to attend. Meeting time: every other Tuesday, 11:00 AM - 12:00 PM, room SUR 4040.   &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Fall 2010==&lt;br /&gt;
&lt;br /&gt;
* 19 Oct 10: Jeff, Gram Matrix Approximation Using Locality Sensitive Hashing on Cluster&lt;br /&gt;
&lt;br /&gt;
* 7 Oct 10: Dr. Rocky Chang (Hong Kong Polytechnic University),&amp;lt;br/&amp;gt; [[media:Rocky-SFU-7-Oct-2010.pdf | Active Measurement of Data-Path Quality in a Non-cooperative Internet]]&lt;br /&gt;
&lt;br /&gt;
* 28 Sep 10: Ahmed, [[media:DRS.pdf | Energy-Efficient Gaming on Mobile Devices using Dead Reckoning-based Power Management]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Spring/Summer 2010==&lt;br /&gt;
&lt;br /&gt;
* 31 Aug 10: Hamed, [[media:Predicting_Click_Through_Rate_for_new_ads.pdf | Predicting Click Through Rate for New Ads with Semantically Similarity Measurement]]&lt;br /&gt;
&lt;br /&gt;
* 9 Aug 10: Yuanbin, [[media:mutisendP2Pstream.pdf | Efficient Algorithms for Multi-Sender Data Transmission in Swarm-based P2P Streaming Systems]]&lt;br /&gt;
&lt;br /&gt;
* 3 Aug 10: Azin, [[media:CognitiveRadio.ppt | Cognitive Radio Networks]]&lt;br /&gt;
&lt;br /&gt;
* 20 July 10: Farid, Optimal Scalable Video Multiplexing in Mobile Broadcast Networks&lt;br /&gt;
&lt;br /&gt;
* 17 May 10: Cameron, Reducing Energy Consumption in Online Network Games on Mobile Devices&lt;br /&gt;
&lt;br /&gt;
* 10 May 10: Cong, Latency Reduction in Online Network Games&lt;br /&gt;
&lt;br /&gt;
* 19 April 10: Shabnam, [[media:Svc-nc.ppt | Live P2P Streaming with Scalable Video Coding and Network Coding]]&lt;br /&gt;
&lt;br /&gt;
* 29 March 10: Jeff and Taher, [[media:AppAlgo.ppt | Approximation algorithms for Kernel Methods on Multi-core CPUs and GPUs]]&lt;br /&gt;
&lt;br /&gt;
* 15 March 10: Som, [http://www.cs.sfu.ca/~ssa121/personal/wimaxSVC.pdf Video Streaming over WiMAX]&lt;br /&gt;
&lt;br /&gt;
* 1 March 10: Farid, Mobile Video Streaming &lt;br /&gt;
&lt;br /&gt;
* 1 Feb 10: Ahmed, Design of pCDN with Scalable Video Coding&lt;br /&gt;
&lt;br /&gt;
* 18 Jan 10: Shabnam, P2P Streaming with Network Coding and Scalable Video Coding&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Fall 2009==&lt;br /&gt;
&lt;br /&gt;
* 8 December 09: Yi, Video Streaming over Cooperative Wireless Networks&lt;br /&gt;
&lt;br /&gt;
* 10 Nov 09: Cheng, [[media:testbed.ppt | Design of a Mobile TV Testbed]]&lt;br /&gt;
&lt;br /&gt;
* 27 October 09: Yuanbin, Segment Scheduling in P2P Streaming Systems&lt;br /&gt;
&lt;br /&gt;
* 13 October 09: Ahmed, [[media:LTE.pdf | Long Term Evolution (LTE) - A Tutorial]]&lt;br /&gt;
&lt;br /&gt;
* 6 October 09: Cheng, [[media:Mm09.ppt | Statistical Multiplexing of VBR Video Streams]] (ACM MM 09 talk)&lt;br /&gt;
&lt;br /&gt;
* 22 September 09: Som, Video Streaming over WiMAX Networks&lt;br /&gt;
&lt;br /&gt;
* 8 September 09: Cong, Minimizing Round-Trip Time in Online Games&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Summer 2009==&lt;br /&gt;
&lt;br /&gt;
* 18 August 09: Mohammad and Cong: 30 min each. Present their Directed Reading projects. &lt;br /&gt;
&lt;br /&gt;
* 14 July 09: Cheng, [[media:wimaxTV.pptx | Broadcasting Variable-Bit-Rate Videos in 802.16e-Like Mobile Networks]] &lt;br /&gt;
&lt;br /&gt;
*  7 July 09:  Yi&lt;br /&gt;
&lt;br /&gt;
* 26 June 09: Ahmed &lt;br /&gt;
&lt;br /&gt;
* 5 June 09: '''Canceled''' (Mohamed attending NOSSDAV'09)&lt;br /&gt;
&lt;br /&gt;
* 29 May 09: Kianoosh, End-to-End Secure Delivery of Scalable Video Streams &lt;br /&gt;
&lt;br /&gt;
* 22 May 09: Cong, [[media:wimax.pptx| Multimedia Streaming over WiMAX Networks]]&lt;br /&gt;
&lt;br /&gt;
* 8 May 09:  Kianoosh,   Analysis of Authentication Schemes for Nonscalable Video Streams&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Spring 2009 ==&lt;br /&gt;
&lt;br /&gt;
* 17 Apr 09: [[media:infocom09.pptx|Cheng (practice your infocom presentation)]]&lt;br /&gt;
&lt;br /&gt;
* 27 March 09: Andreas Berger, [[media:Nsl_vancouver.odp | Network-based Detection of SIP Bots]]&lt;br /&gt;
&lt;br /&gt;
* 27 Feb 09: Shabnam and Yuanbin&lt;br /&gt;
&lt;br /&gt;
* 23 Jan 09: Cheng (rehearse your PhD proposal) and Kianoosh&lt;/div&gt;</summary>
		<author><name>Feig</name></author>
	</entry>
</feed>