Difference between revisions of "Private:progress-aldhamin"

From NMSL
Line 1: Line 1:
 
= '''Spring 2013''' =
 
= '''Spring 2013''' =
 
* '''Courses: none'''
 
* '''Courses: none'''
 +
 +
== April 2013 ==
 +
* ''April 9:''
 +
*We've been working on finding a suitable composition for both the chunk size as well as sampling period. And we think that we some good selections based on some empirical observations.
 +
* We've collected a larger workload for the Postmark benchmark and we're using this as an input to our algorithms. The new workload represents the I/O traces for 6,500,000 Read/Write transactions. We're now testing the algorithm with this new workload and hopefully we can draw some good conclusions to go to next step.
 +
* Two crucial bugs in the code was captured and has been fixed. The first was related to calculating the cumulative probability and simulating the next state of the algorithm, whereas the second was to do with generating a row-major transition matrix instead of a column matrix. This also results on performing wrong calculations for the next state.
  
 
== February 2013 ==
 
== February 2013 ==

Revision as of 20:55, 8 April 2013

Spring 2013

  • Courses: none

April 2013

  • April 9:
  • We've been working on finding a suitable composition for both the chunk size as well as sampling period. And we think that we some good selections based on some empirical observations.
  • We've collected a larger workload for the Postmark benchmark and we're using this as an input to our algorithms. The new workload represents the I/O traces for 6,500,000 Read/Write transactions. We're now testing the algorithm with this new workload and hopefully we can draw some good conclusions to go to next step.
  • Two crucial bugs in the code was captured and has been fixed. The first was related to calculating the cumulative probability and simulating the next state of the algorithm, whereas the second was to do with generating a row-major transition matrix instead of a column matrix. This also results on performing wrong calculations for the next state.

February 2013

  • February 25:
  • We faced a memory issue when testing one of the cases. It took us time to debug and find a permanent fix for it. The code is now running fine and we have verified the function convergence for several cases.
  • We're half-way on the implementation of an extended version of the algorithm that considers the 3-most-frequent accessed blocks.
  • February 14:
  • The off-line version of our dynamic programming algorithm is done. We've tested few cases and it shows some good results in terms. The tests performed included different "chunk" sizes to measure the time it takes for the evaluation function to converge to some threshold we can set. Overall, the algorithm seems to perform as expected. More parameter-tuning will be performed later to ensure it meets the problem requirements.
  • The off-line version of the algorithm is designed with a basic state space of Markov decision process. It only considers the most frequently accessed block in time/sampling period. Now, we are working on extending the algorithm. First, we are looking at a more complex scenario for the state space, where we explore the three most frequently accessed blocks. We think this step will enhance the outcome of the algorithm. We expect a better learning and convergence of the algorithm.
  • We'll be working on testing the algorithm on a real scenario using some of the benchmarks available. Currently, we've Postmark benchmark, but we'll be looking at couple more excellent benchmarks to test our algorithm.

January 2013

  • "January 31":
  • We were able to implement the first phase of the real-time dynamic programming algorithm based on Markov Decision Process. We will start the next phase which is working on fine-tuning the MDP tuple (S, T, A, R).
  • January 6:
  • The development has already started. I'm using Java as I'm mostly confortable with. I've built the Markov model. The output of the model is the transition matrix T and the state space S. Now I'm working on implementing the second part of the algorithm, the dynamic programming algorithm. I've found a working open-source library that provide us with sparse matrix implementation but the issue now is that it only provides "double" primitive type which results in loss of precision (very important). I'm trying different workarounds to resolve this and proceed with the implementation.
  • An email sent earlier last week, but I'm yet to receive an answer from Hystor authors regarding our request of some binaries and/or APIs for our system.

  • Since we need a sparse matrix to represent the transition matrix (T), I've started with this. My implementation didn't work well, and I found out that I'm missing some important aspects of the sparse implementation. I've found some available good implementations on the web, which I'm looking at to choose from an efficient (and possibly supported) implementation.
  • Second, the State space (S) includes all possible blocks on disk, i.e. all disk blocks, I'm trying to find an efficient way where I can get all blocks addresses on disk.
  • Third, when I look at the I/O traces I've collected I can see that it includes all I/O type request, not only read and write, but also queued, merged and other type of I/O requests performed by the system. I guess we need to filter them to include only read/write as this of interest to us. Am I right?

Fall 2012

  • Courses: none

November 2012

  • * Report : [1]

October 2012

  • Report :'
  • We've used the Postmark benchmark (developed by NetApp) to create a random workload and created the Markov model for it on MatLab.
  • Searching for open-source and data-intensive high-performance computing application to work with.
  • I tried many open-source applications I've found so far. Some of which built and compiled correctly and some didn't compile or that they require huge computational power to run.
  • We've chosen the Parallel Ocean Program (POP) application to work with. It's an open-source MPI ocean circulation model.
  • Using my Linux machine, I ran couple of experiments to collect the I/O log of the application.
  • Using MatLab, we've created the Markov model of the collected I/O logs and we're working on the analysis right now.

September 2012

  • Have been working with Dr. Hegazy on developing and refining the research problem.
  • Reading and understanding the Markov Decision Process MDP; the Models andtheir orders of magnitude.
  • Working on defining the MDP model of our problem.


Summer 2012

  • Courses: none
  • Working on my research...

June'12

  • Report : [2]
  • June 21: I compared available Linux tools and I believe that collectl is the best to serve our need. The tool is very powerful and can even graph the data directly, but that will limit our ability to playback and message raw data again. Now I'm waiting for the approval from application support @ Aramco to add it in the job script. Hopefully I can get an answer very soon.
  • June 18: Learning about "collectl", "sar" and "ganglia" Linux tools for system I/O logs. I need root/sudo access into my Linux machine to be able to install and test each one of them.
  • June 11: We experimented few Linux tools together on some benchmarks to see if we will be able to get some useful data. I think we do. But, my colleague at Saudi Aramco, who's working with me, is out of office for few days attending ICS12 in Hamburg, Germany. Because the clusters are not exposed to public network, I cannot have access to them remotely and so I'll have to wait for him to comeback to resume the work.
  • I've been in close contact with Expec Computer Center (ECC) to collect I/O data about the application. We're facing some technical challenges to achieve this. First, the application's clusters are connected to two different high performance storage filers (Panasas and NetAppGX). When trying to collect data from the filers side we cannot tell which job is the owner of a particular I/O operation. That said, we've been trying find an alternative way to collect the data from the node side such that we automate it in the job script, so all nodes participate and will have a file of stats at the end of the jobs that can be correlated , manipulated. Linux provides several tools that can help on this but we want the least expensive (i.e. it has a minimal affect the performance of the running job). I'm working with my colleagues there from both HPC and Data Storage teams on the task.
  • Learning and practising Intel Open Storage toolkit.
  • learning and practising blktrace tool.

May'12

  • Report: [3]
  • Doing a final review of the "Hystor" paper and organizing my thoughts


Spring 2012

  • Courses:
    • CMPT 781: Technical Communication
    • CMPT 886: Special Topics in Operating Systems

March'12

  • Report: [4]
  • Working on my 886 project and there is a high chance for a conference publication.

February'12

  • Report: [5]
  • Summarized common problems in SSD-based applications.
  • Working on my 886 term project.
  • Submitted the final draft and my talk to ISMS'12 committee. My paper was accepted in the conference, which took place between Feb 8 - 10.

January'12

  • Report: [6]
  • Continued my survey into more researches to get a wider horizon into classifying the studies on this prospect.
  • I'm reading about the impact of PCI Express and phase-change memory on this field.
  • I will utilize my research on the 781 course requirement to utilize the time for both.
  • I'm still having difficulty reading papers (in terms of the time it takes to completely finishing a paper) but it's improving with time considering that both courses I'm taken this semester do have a considerable amount of readings of that type.

Fall 2011

  • Courses:
    • CMPT-705: Design and Analysis of Algorithms
    • CMPT 880: Programming Parallel and Distributed Systems

November 14

  • Report (No updates): [7]
  • No progress regarding the research. Courses are consuming all the time I have!!


October 11

  • Report (No major updates): [8]
  • Still working on a separate report for each of the problems listed on the progress report.


September 22

  • Continuing the research on various topics of flash memory and high performance computing.
  • Five (5) general ideas have been listed.