Difference between revisions of "Private:progress-aldhamin"

From NMSL
 
(33 intermediate revisions by the same user not shown)
Line 1: Line 1:
 +
= '''Summer 2013''' =
 +
* '''Courses:
 +
** CMPT 880: Large-scale Multimedia Systems and Cloud Computing'''
 +
 +
== June 2013 ==
 +
* ''June 24:''
 +
** A memory issue related to the action table sparse matrix has been detected and fixed. With this problem resolved, the algorithm now converges smoothly (mostly towards 0).
 +
** Working on different experiments to try different combination of block size and sampling interval to find where the algorithm works best. So far, the experiments I have performed show that the larger the block size with reasonable sampling period, the higher the hit ration.
 +
 +
= '''Spring 2013''' =
 +
* '''Courses: none'''
 +
 +
== April 2013 ==
 +
* ''April 28:''
 +
** Building a simple simulation program to see the effect of the algorithm. The program will measure the SSD hit according to the action performed by the algorithm.
 +
** The output of the algorithm (Action Table) will be used for the simulation.
 +
 +
* ''April 15:''
 +
** We have been able to trace what's causing the application to behave in such a strange way with regard to the table size. In our algorithm we generate random numbers in two places:
 +
1. simulating next state (random between 0 and 1).
 +
 +
2. random action (0s and 1s).
 +
 +
Aside from the main program, I generated two lists of random numbers, one for each of the above. Then, I used these lists in the main program as follows:
 +
First, I used both of those lists in the main program and it always converged at the 1000 iterations regardless of the table size. I set it to 2M, 3M, 4M and it always converges at 1000 iterations. Second, I used the first list of random numbers and used it to generate simulating next state, and let the random action to be generated on the fly within in the main program. When doing this, the application termination iteration is dependant on the table size; when I set the table size to 4M it converges at 1000 iterations and when I set it to 3M it didn't converges even at the 3M iteration. Third, I used the second list in the main program for the random action and let the simulation generate random number on the fly. When doing this, the program always terminates at the 1000 iteration, similar to the first case.
 +
 +
This observation lead me to conclude that the problem is related to random number generator for the random action. Let's meet tomorrow to discuss this and see what should we do next.
 +
 +
* ''April 10:''
 +
** We're witnessing a problem that's not directly related to the algorithm, but has a crucial impact on it. It's related to the value table size that we specify on the implementation. In some cases we need to increase the value table size to accommodate for more calculations, but when we do so the algorithm executes in a strange manner which leads the function to converges to the threshold value sooner. For example, in one of the cases we have table size set to 2M but it wasn't enough for the function to converges. However, when we increased the size into 3M, the function converges very soon say at 9000 iteration or below at other experiments. I couldn't figure out what exactly is causing this issue. But, one thing that I noticed on my recent experiments is that it occurs with the extended version of the algorithm (3-most-frequent blocks). I'm striving to resolve this issue ASAP.
 +
 +
* ''April 9:''
 +
** We've been working on finding a suitable composition for both the chunk size as well as sampling period. And we think that we some good selections based on some empirical observations.
 +
* We've collected a larger workload for the Postmark benchmark and we're using this as an input to our algorithms. The new workload represents the I/O traces for 6,500,000 Read/Write transactions. We're now testing the algorithm with this new workload and hopefully we can draw some good conclusions to go to next step.
 +
* Two crucial bugs in the code was captured and has been fixed. The first was related to calculating the cumulative probability and simulating the next state of the algorithm, whereas the second was to do with generating a row-major transition matrix instead of a column matrix. This also results on performing wrong calculations for the next state.
 +
 +
== February 2013 ==
 +
* ''February 25:''
 +
* We faced a memory issue when testing one of the cases. It took us time to debug and find a permanent fix for it. The code is now running fine and we have verified the function convergence for several cases.
 +
* We're half-way on the implementation of an extended version of the algorithm that considers the 3-most-frequent accessed blocks.
 +
 +
* ''February 14:''
 +
* The off-line version of our dynamic programming algorithm is done. We've tested few cases and it shows some good results in terms. The tests performed included different "chunk" sizes to measure the time it takes for the evaluation function to converge to some threshold we can set. Overall, the algorithm seems to perform as expected. More parameter-tuning will be performed later to ensure it meets the problem requirements.
 +
* The off-line version of the algorithm is designed with a basic state space of Markov decision process. It only considers the most frequently accessed block in time/sampling period. Now, we are working on extending the algorithm. First, we are looking at a more complex scenario  for the state space, where we explore the three most frequently accessed blocks. We think this step will enhance the outcome of the algorithm. We expect a better learning and convergence of the algorithm.
 +
* We'll be working on testing the algorithm on a real scenario using some of the benchmarks available. Currently, we've Postmark benchmark, but we'll be looking at couple more excellent benchmarks to test our algorithm.
 +
 +
----
 +
 +
== January 2013 ==
 +
* "January 31":
 +
* We were able to implement the first phase of the real-time dynamic programming algorithm based on Markov Decision Process. We will start the next phase which is working on fine-tuning the MDP tuple (S, T, A, R).
 +
 +
* ''January 6'':
 +
* The development has already started. I'm using Java as I'm mostly confortable with. I've built the Markov model. The output of the model is the transition matrix T and the state space S. Now I'm working on implementing the second part of the algorithm, the dynamic programming algorithm. I've found a working open-source library that provide us with sparse matrix implementation but the issue now is that it only provides "double" primitive type which results in loss of precision (very important). I'm trying different workarounds to resolve this and proceed with the implementation.
 +
* An email sent earlier last week, but I'm yet to receive an answer from Hystor authors regarding our request of some binaries and/or APIs for our system.
 +
 +
----
 +
 +
* Since we need a sparse matrix to represent the transition matrix (T), I've started with this. My implementation didn't work well, and I found out that I'm missing some important aspects of the sparse implementation. I've found some available good implementations on the web, which I'm looking at to choose from an efficient (and possibly supported) implementation.
 +
* Second, the State space (S) includes all possible blocks on disk, i.e. all disk blocks, I'm trying to find an efficient way where I can get all blocks addresses on disk.
 +
* Third, when I look at the I/O traces I've collected I can see that it includes all I/O type request, not only read and write, but also queued, merged and other type of I/O requests performed by the system. I guess we need to filter them to include only read/write as this of interest to us. Am I right?
 +
 +
= '''Fall 2012''' =
 +
* '''Courses: none'''
 +
 +
== November 2012 ==
 +
* * '''Report''' : [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/aldhamin/reports/Nov12/reportTemplate.pdf]
 +
 +
== October 2012 ==
 +
* '''Report''' :'
 +
* We've used the Postmark benchmark (developed by NetApp) to create a random workload and created the Markov model for it on MatLab.
 +
* Searching for open-source and data-intensive high-performance computing application to work with.
 +
* I tried many open-source applications I've found so far. Some of which built and compiled correctly and some didn't compile or that they require huge computational power to run.
 +
* We've chosen the Parallel Ocean Program (POP) application to work with. It's an open-source MPI ocean circulation model.
 +
* Using my Linux machine, I ran couple of experiments to collect the I/O log of the application.
 +
* Using MatLab, we've created the Markov model of the collected I/O logs and we're working on the analysis right now.
 +
 +
== September 2012 ==
 +
* Have been working with Dr. Hegazy on developing and refining the research problem.
 +
* Reading and understanding the Markov Decision Process MDP; the Models andtheir orders of magnitude.
 +
* Working on defining the MDP model of our problem.
 +
 +
 +
= '''Summer 2012''' =
 +
* '''Courses: none'''
 +
* Working on my research...
 +
 +
== June'12 ==
 +
* '''Report''' : [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/aldhamin/reports/June12/reportTemplate.pdf]
 +
* ''June 21'': I compared available Linux tools and I believe that collectl is the best to serve our need. The tool is very powerful and can even graph the data directly, but that will limit our ability to playback and message raw data again. Now I'm waiting for the approval from application support @ Aramco to add it in the job script. Hopefully I can get an answer very soon.
 +
* ''June 18'': Learning about "collectl", "sar" and "ganglia" Linux tools for system I/O logs. I need root/sudo access into my Linux machine to be able to install and test each one of them.
 +
* ''June 11'': We experimented few Linux tools together on some benchmarks to see if we will be able to get some useful data. I think we do. But, my colleague at Saudi Aramco, who's working with me,  is out of office for few days attending ICS12 in Hamburg, Germany. Because the clusters are not exposed to public network, I cannot have access to them remotely and so I'll have to wait for him to comeback to resume the work.
 +
 +
* I've been in close contact with Expec Computer Center (ECC) to collect I/O data about the application. We're facing some technical challenges to achieve this. First, the application's clusters are connected to two different high performance storage filers (Panasas and NetAppGX). When trying to collect data from the filers side we cannot tell which job is the owner of a particular I/O operation. That said, we've been trying find an alternative way to collect the data from the node side such that we automate it in the job script, so all nodes participate and will have a file of stats at the end of the jobs that can be correlated , manipulated. Linux provides several tools that can help on this but we want the least expensive (i.e. it has a minimal affect the performance of the running job). I'm working with my colleagues there from both HPC and Data Storage teams on the task.
 +
* Learning and practising Intel Open Storage toolkit.
 +
* learning and practising blktrace tool.
 +
 +
== May'12 ==
 +
* Report: [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/aldhamin/reports/May/reportTemplate.pdf]
 +
* Doing a final review of the "Hystor" paper and organizing my thoughts
 +
 +
 
= '''Spring 2012''' =
 
= '''Spring 2012''' =
 
* '''Courses:'''
 
* '''Courses:'''
Line 4: Line 106:
 
** CMPT 886: Special Topics in Operating Systems
 
** CMPT 886: Special Topics in Operating Systems
  
== February ==
+
== March'12 ==
 +
* Report: [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/aldhamin/reports/Mar12/reportTemplate.pdf]
 +
* Working on my 886 project and there is a high chance for a conference publication.
 +
 
 +
== February'12 ==
 
* Report: [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/aldhamin/reports/Feb12/doc/reportTemplate.pdf]
 
* Report: [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/aldhamin/reports/Feb12/doc/reportTemplate.pdf]
* Summarized common problems in SSD-based applications
+
* Summarized common problems in SSD-based applications.
 +
* Working on my 886 term project.
 +
* Submitted the final draft and my talk to ISMS'12 committee. My paper was accepted in the conference, which took place between Feb 8 - 10.
  
== January 27 ==
+
== January'12 ==
 
* Report: [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/aldhamin/reports/Jan12/doc/reportTemplate.pdf]
 
* Report: [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-members/aldhamin/reports/Jan12/doc/reportTemplate.pdf]
 
* Continued my survey into more researches to get a wider horizon into classifying the studies on this prospect.
 
* Continued my survey into more researches to get a wider horizon into classifying the studies on this prospect.

Latest revision as of 14:50, 24 June 2013

Summer 2013

  • Courses:
    • CMPT 880: Large-scale Multimedia Systems and Cloud Computing

June 2013

  • June 24:
    • A memory issue related to the action table sparse matrix has been detected and fixed. With this problem resolved, the algorithm now converges smoothly (mostly towards 0).
    • Working on different experiments to try different combination of block size and sampling interval to find where the algorithm works best. So far, the experiments I have performed show that the larger the block size with reasonable sampling period, the higher the hit ration.

Spring 2013

  • Courses: none

April 2013

  • April 28:
    • Building a simple simulation program to see the effect of the algorithm. The program will measure the SSD hit according to the action performed by the algorithm.
    • The output of the algorithm (Action Table) will be used for the simulation.
  • April 15:
    • We have been able to trace what's causing the application to behave in such a strange way with regard to the table size. In our algorithm we generate random numbers in two places:

1. simulating next state (random between 0 and 1).

2. random action (0s and 1s).

Aside from the main program, I generated two lists of random numbers, one for each of the above. Then, I used these lists in the main program as follows: First, I used both of those lists in the main program and it always converged at the 1000 iterations regardless of the table size. I set it to 2M, 3M, 4M and it always converges at 1000 iterations. Second, I used the first list of random numbers and used it to generate simulating next state, and let the random action to be generated on the fly within in the main program. When doing this, the application termination iteration is dependant on the table size; when I set the table size to 4M it converges at 1000 iterations and when I set it to 3M it didn't converges even at the 3M iteration. Third, I used the second list in the main program for the random action and let the simulation generate random number on the fly. When doing this, the program always terminates at the 1000 iteration, similar to the first case.

This observation lead me to conclude that the problem is related to random number generator for the random action. Let's meet tomorrow to discuss this and see what should we do next.

  • April 10:
    • We're witnessing a problem that's not directly related to the algorithm, but has a crucial impact on it. It's related to the value table size that we specify on the implementation. In some cases we need to increase the value table size to accommodate for more calculations, but when we do so the algorithm executes in a strange manner which leads the function to converges to the threshold value sooner. For example, in one of the cases we have table size set to 2M but it wasn't enough for the function to converges. However, when we increased the size into 3M, the function converges very soon say at 9000 iteration or below at other experiments. I couldn't figure out what exactly is causing this issue. But, one thing that I noticed on my recent experiments is that it occurs with the extended version of the algorithm (3-most-frequent blocks). I'm striving to resolve this issue ASAP.
  • April 9:
    • We've been working on finding a suitable composition for both the chunk size as well as sampling period. And we think that we some good selections based on some empirical observations.
  • We've collected a larger workload for the Postmark benchmark and we're using this as an input to our algorithms. The new workload represents the I/O traces for 6,500,000 Read/Write transactions. We're now testing the algorithm with this new workload and hopefully we can draw some good conclusions to go to next step.
  • Two crucial bugs in the code was captured and has been fixed. The first was related to calculating the cumulative probability and simulating the next state of the algorithm, whereas the second was to do with generating a row-major transition matrix instead of a column matrix. This also results on performing wrong calculations for the next state.

February 2013

  • February 25:
  • We faced a memory issue when testing one of the cases. It took us time to debug and find a permanent fix for it. The code is now running fine and we have verified the function convergence for several cases.
  • We're half-way on the implementation of an extended version of the algorithm that considers the 3-most-frequent accessed blocks.
  • February 14:
  • The off-line version of our dynamic programming algorithm is done. We've tested few cases and it shows some good results in terms. The tests performed included different "chunk" sizes to measure the time it takes for the evaluation function to converge to some threshold we can set. Overall, the algorithm seems to perform as expected. More parameter-tuning will be performed later to ensure it meets the problem requirements.
  • The off-line version of the algorithm is designed with a basic state space of Markov decision process. It only considers the most frequently accessed block in time/sampling period. Now, we are working on extending the algorithm. First, we are looking at a more complex scenario for the state space, where we explore the three most frequently accessed blocks. We think this step will enhance the outcome of the algorithm. We expect a better learning and convergence of the algorithm.
  • We'll be working on testing the algorithm on a real scenario using some of the benchmarks available. Currently, we've Postmark benchmark, but we'll be looking at couple more excellent benchmarks to test our algorithm.

January 2013

  • "January 31":
  • We were able to implement the first phase of the real-time dynamic programming algorithm based on Markov Decision Process. We will start the next phase which is working on fine-tuning the MDP tuple (S, T, A, R).
  • January 6:
  • The development has already started. I'm using Java as I'm mostly confortable with. I've built the Markov model. The output of the model is the transition matrix T and the state space S. Now I'm working on implementing the second part of the algorithm, the dynamic programming algorithm. I've found a working open-source library that provide us with sparse matrix implementation but the issue now is that it only provides "double" primitive type which results in loss of precision (very important). I'm trying different workarounds to resolve this and proceed with the implementation.
  • An email sent earlier last week, but I'm yet to receive an answer from Hystor authors regarding our request of some binaries and/or APIs for our system.

  • Since we need a sparse matrix to represent the transition matrix (T), I've started with this. My implementation didn't work well, and I found out that I'm missing some important aspects of the sparse implementation. I've found some available good implementations on the web, which I'm looking at to choose from an efficient (and possibly supported) implementation.
  • Second, the State space (S) includes all possible blocks on disk, i.e. all disk blocks, I'm trying to find an efficient way where I can get all blocks addresses on disk.
  • Third, when I look at the I/O traces I've collected I can see that it includes all I/O type request, not only read and write, but also queued, merged and other type of I/O requests performed by the system. I guess we need to filter them to include only read/write as this of interest to us. Am I right?

Fall 2012

  • Courses: none

November 2012

  • * Report : [1]

October 2012

  • Report :'
  • We've used the Postmark benchmark (developed by NetApp) to create a random workload and created the Markov model for it on MatLab.
  • Searching for open-source and data-intensive high-performance computing application to work with.
  • I tried many open-source applications I've found so far. Some of which built and compiled correctly and some didn't compile or that they require huge computational power to run.
  • We've chosen the Parallel Ocean Program (POP) application to work with. It's an open-source MPI ocean circulation model.
  • Using my Linux machine, I ran couple of experiments to collect the I/O log of the application.
  • Using MatLab, we've created the Markov model of the collected I/O logs and we're working on the analysis right now.

September 2012

  • Have been working with Dr. Hegazy on developing and refining the research problem.
  • Reading and understanding the Markov Decision Process MDP; the Models andtheir orders of magnitude.
  • Working on defining the MDP model of our problem.


Summer 2012

  • Courses: none
  • Working on my research...

June'12

  • Report : [2]
  • June 21: I compared available Linux tools and I believe that collectl is the best to serve our need. The tool is very powerful and can even graph the data directly, but that will limit our ability to playback and message raw data again. Now I'm waiting for the approval from application support @ Aramco to add it in the job script. Hopefully I can get an answer very soon.
  • June 18: Learning about "collectl", "sar" and "ganglia" Linux tools for system I/O logs. I need root/sudo access into my Linux machine to be able to install and test each one of them.
  • June 11: We experimented few Linux tools together on some benchmarks to see if we will be able to get some useful data. I think we do. But, my colleague at Saudi Aramco, who's working with me, is out of office for few days attending ICS12 in Hamburg, Germany. Because the clusters are not exposed to public network, I cannot have access to them remotely and so I'll have to wait for him to comeback to resume the work.
  • I've been in close contact with Expec Computer Center (ECC) to collect I/O data about the application. We're facing some technical challenges to achieve this. First, the application's clusters are connected to two different high performance storage filers (Panasas and NetAppGX). When trying to collect data from the filers side we cannot tell which job is the owner of a particular I/O operation. That said, we've been trying find an alternative way to collect the data from the node side such that we automate it in the job script, so all nodes participate and will have a file of stats at the end of the jobs that can be correlated , manipulated. Linux provides several tools that can help on this but we want the least expensive (i.e. it has a minimal affect the performance of the running job). I'm working with my colleagues there from both HPC and Data Storage teams on the task.
  • Learning and practising Intel Open Storage toolkit.
  • learning and practising blktrace tool.

May'12

  • Report: [3]
  • Doing a final review of the "Hystor" paper and organizing my thoughts


Spring 2012

  • Courses:
    • CMPT 781: Technical Communication
    • CMPT 886: Special Topics in Operating Systems

March'12

  • Report: [4]
  • Working on my 886 project and there is a high chance for a conference publication.

February'12

  • Report: [5]
  • Summarized common problems in SSD-based applications.
  • Working on my 886 term project.
  • Submitted the final draft and my talk to ISMS'12 committee. My paper was accepted in the conference, which took place between Feb 8 - 10.

January'12

  • Report: [6]
  • Continued my survey into more researches to get a wider horizon into classifying the studies on this prospect.
  • I'm reading about the impact of PCI Express and phase-change memory on this field.
  • I will utilize my research on the 781 course requirement to utilize the time for both.
  • I'm still having difficulty reading papers (in terms of the time it takes to completely finishing a paper) but it's improving with time considering that both courses I'm taken this semester do have a considerable amount of readings of that type.

Fall 2011

  • Courses:
    • CMPT-705: Design and Analysis of Algorithms
    • CMPT 880: Programming Parallel and Distributed Systems

November 14

  • Report (No updates): [7]
  • No progress regarding the research. Courses are consuming all the time I have!!


October 11

  • Report (No major updates): [8]
  • Still working on a separate report for each of the problems listed on the progress report.


September 22

  • Continuing the research on various topics of flash memory and high performance computing.
  • Five (5) general ideas have been listed.