This page lists possible improvements to the current pCache implementation. The items are roughly ordered by priority. We also give a person-day estimate for each item, assuming a good graduate student who works 9 hours a day. The estimates include unit tests.
 
 
= Implementation Tasks =
 
 
To get a reliable system that can be deployed, the following items should be done:
 
 
#Nonvolatile storage system: To keep cached data across reboots, we need to write the in-memory metainfo to the super blocks (in the pCache file system). This requires save(...)/restore(...) functions for a few in-memory hashtables (a sketch of such an interface appears after this list). This task needs 4~5 ppl-days.
#Proxy performance: We should integrate our code with the latest TProxy for better performance (we can ignore the TCP splicing part for now). After integration, we should quantify the performance of TProxy by emulating a large number of P2P clients in two private subnets. If possible, we can identify the bottlenecks in TProxy and improve it; we can then contribute the code back to the community. This can be a small side research project. TProxy integration takes 1 ppl-day. Designing and implementing the emulation and writing up the comparison and bottleneck analysis takes 5~10 ppl-days.
#Event-driven connection manager: We should define a stateful connection class, rewrite the connection manager into an event handler, and use epoll (for the network) and aio (for the disk) to improve scalability (an event-loop skeleton appears after this list). Finally, a test similar to the TProxy test should be performed. Designing it takes 4 ppl-days, implementing it takes 8~10 ppl-days, and evaluating it takes 3 ppl-days, assuming we have gained experience from evaluating TProxy.
#Simpler segment matching: For every incoming request, we either request it in its entirety or we do not request it at all. The current partial-request code is over-complicated. This takes 1 ppl-day, but may depend on (overlap with) the event-driven connection manager.
#Improve compatibility: Identify the unsupported BT/Gnutella clients and locate the root causes (which message types cause the problem), then fix them. I imagine this will take some time; I cannot give a time estimate as of now.
#Better logging system: We currently use a home-made logging system, but in an inconsistent way: some modules log to stderr rather than through the logging system. If time permits, we may switch to an open-source logging library similar to log4c (a thin wrapper sketch appears after this list). This takes 5~7 ppl-days, given that there are many logging statements in the system.
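
Below are rough sketches for a few of the items above. For the nonvolatile storage task, a minimal sketch of the save(...)/restore(...) interface: it serializes each in-use hashtable entry as a fixed-size record and rebuilds the table on restore. The entry layout, table structure, and hashing are hypothetical stand-ins for the actual pCache metainfo and superblock format.
<pre>
/* Sketch: persist an in-memory hashtable of cache metadata across
 * reboots.  The entry layout, table structure, and file path are
 * hypothetical stand-ins for the real pCache metainfo/superblock. */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE 4096

struct meta_entry {
    uint8_t  object_id[20];   /* e.g., info-hash or segment identifier */
    uint64_t disk_offset;     /* where the cached segment lives on disk */
    uint32_t length;
    int      in_use;
};

static struct meta_entry table[TABLE_SIZE];

/* save(...): write a count followed by every in-use entry. */
int save_table(const char *path)
{
    FILE *fp = fopen(path, "wb");
    if (!fp)
        return -1;
    uint32_t count = 0;
    for (int i = 0; i < TABLE_SIZE; i++)
        if (table[i].in_use)
            count++;
    fwrite(&count, sizeof(count), 1, fp);
    for (int i = 0; i < TABLE_SIZE; i++)
        if (table[i].in_use)
            fwrite(&table[i], sizeof(table[i]), 1, fp);
    return fclose(fp) == 0 ? 0 : -1;
}

/* restore(...): rebuild the table after a reboot (toy hash + linear probing). */
int restore_table(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (!fp)
        return -1;
    uint32_t count = 0;
    if (fread(&count, sizeof(count), 1, fp) != 1) {
        fclose(fp);
        return -1;
    }
    memset(table, 0, sizeof(table));
    for (uint32_t n = 0; n < count && n < TABLE_SIZE; n++) {
        struct meta_entry e;
        if (fread(&e, sizeof(e), 1, fp) != 1)
            break;
        uint32_t slot = e.object_id[0] % TABLE_SIZE;
        while (table[slot].in_use)
            slot = (slot + 1) % TABLE_SIZE;
        table[slot] = e;
    }
    fclose(fp);
    return 0;
}
</pre>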
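
For the event-driven connection manager, a skeleton of an epoll-based event loop. The per-connection state machine, accept() handling, and the aio disk path are omitted; handle_readable()/handle_writable() are placeholders, not existing pCache functions.
<pre>
/* Sketch: epoll-based event loop for the connection manager.
 * The per-connection state machine, accept() handling, and the aio
 * disk path are omitted; the handlers below are placeholders. */
#include <sys/epoll.h>
#include <stdio.h>

#define MAX_EVENTS 64

struct connection {
    int fd;
    int state;   /* e.g., PARSING_REQUEST, SERVING_FROM_CACHE, RELAYING */
};

static void handle_readable(struct connection *c) { (void)c; /* parse P2P messages */ }
static void handle_writable(struct connection *c) { (void)c; /* flush pending data  */ }

int run_event_loop(int listen_fd)
{
    int epfd = epoll_create(1024);
    if (epfd < 0) {
        perror("epoll_create");
        return -1;
    }

    struct epoll_event ev, events[MAX_EVENTS];
    ev.events   = EPOLLIN;
    ev.data.ptr = NULL;                       /* NULL marks the listening socket */
    epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

    for (;;) {
        int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
        for (int i = 0; i < n; i++) {
            if (events[i].data.ptr == NULL) {
                /* accept() the new client, allocate a struct connection,
                 * and register its fd here with EPOLLIN | EPOLLOUT */
                continue;
            }
            struct connection *c = events[i].data.ptr;
            if (events[i].events & EPOLLIN)
                handle_readable(c);
            if (events[i].events & EPOLLOUT)
                handle_writable(c);
        }
    }
}
</pre>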
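
For the logging task, a thin wrapper like the one below would let every module log through one interface now and allow swapping in log4c (or a similar library) later without touching the call sites. The function and macro names are hypothetical.
<pre>
/* Sketch: a thin logging wrapper so every module logs through one
 * interface; the backend (stderr today, log4c or similar later) can be
 * swapped inside pcache_log() without touching the call sites. */
#include <stdarg.h>
#include <stdio.h>
#include <time.h>

enum log_level { LOG_DEBUG, LOG_INFO, LOG_WARN, LOG_ERROR };

static enum log_level current_level = LOG_INFO;

void pcache_log(enum log_level lvl, const char *fmt, ...)
{
    static const char *names[] = { "DEBUG", "INFO", "WARN", "ERROR" };
    if (lvl < current_level)
        return;

    time_t now = time(NULL);
    char ts[16];
    strftime(ts, sizeof(ts), "%H:%M:%S", localtime(&now));
    fprintf(stderr, "[%s] %-5s ", ts, names[lvl]);

    va_list ap;
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);   /* swap in the library backend here later */
    va_end(ap);
    fputc('\n', stderr);
}

#define LOG_DBG(...) pcache_log(LOG_DEBUG, __VA_ARGS__)
#define LOG_ERR(...) pcache_log(LOG_ERROR, __VA_ARGS__)
</pre>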
 
  
 
= Possible Research Problems =
 
#Data localizer: This can be complementary to the caching mechanism, or be a standalone system. Using the traffic identifier in pCache, we can extract, track, and manipulate the piece availability info in the BitTorrent network. See Jin's PV 2007 paper: he proposed allowing only one copy of each data piece to pass through the ISP-PoP link. Similarly, since we sit at the AS-Internet boundary, we can control the availability info so that each piece passes through us only once. Taken to the extreme, we can simply let peers cache for us. This may also require us to manipulate the replies that contain the sender list, so that peers inside our AS can learn about each other. Manipulating the reply messages also allows us to segment our network into a few groups; for example, we can prevent any computer in group A from communicating with any computer in group B (e.g., because the link between A and B is expensive). A bitfield-masking sketch appears after this list.
#Generalize the traffic identifier into an (in-kernel) TCP connection classifier that works with TProxy. A connection classifier is an inspector that checks the first few packets of each TCP connection to determine the connection type. Based on the connection type, the classifier may decide to perform a "reverse TCP splicing", which means breaking an established TCP connection into two so that the application (pCache) can manipulate the data. Reverse TCP splicing is more efficient because we do not need to "translate" sequence numbers for non-P2P connections. Note that a connection classifier is a specialized deep packet inspector, and we might make it more efficient than general DPIs. Lastly, part of the connection classifier could run on GPUs, similar to what was done at UW-Madison for DPIs (see here). A user-space classification sketch appears after this list.
#Collect requests and compare various replacement policies. Explore the different traffic models of various P2P systems.
#Enable both directions of caching, but control the upload rates to optimize local benefit. The local benefit is defined as minimizing the amount of upload while maximizing the download speed. I think this problem is still open, even for a single BitTorrent client.
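
For the data localizer, the core operation is rewriting availability messages in flight. The sketch below masks a BitTorrent BITFIELD payload so that pieces already available inside our AS are not advertised to internal peers; piece_inside_as() is a hypothetical placeholder for pCache's own availability lookup.
<pre>
/* Sketch: mask a BitTorrent BITFIELD payload so that pieces already
 * available inside our AS are not advertised to internal peers.  The
 * MSB-first bit layout follows the BitTorrent wire protocol;
 * piece_inside_as() is a placeholder for pCache's availability lookup. */
#include <stdint.h>
#include <stddef.h>

/* Placeholder: does pCache (or a peer inside our AS) already hold this piece? */
static int piece_inside_as(uint32_t piece_index)
{
    (void)piece_index;
    return 0;   /* stub */
}

void mask_bitfield(uint8_t *bitfield, size_t nbytes, uint32_t num_pieces)
{
    for (uint32_t piece = 0; piece < num_pieces; piece++) {
        size_t  byte = piece / 8;
        uint8_t bit  = (uint8_t)(0x80 >> (piece % 8));
        if (byte >= nbytes)
            break;
        if ((bitfield[byte] & bit) && piece_inside_as(piece))
            bitfield[byte] &= (uint8_t)~bit;   /* hide the piece */
    }
}
</pre>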
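
For the connection classifier, a user-space sketch of the classification step: peek at the first bytes of a new connection and decide whether it is BitTorrent, Gnutella, or something that should be passed through untouched. The handshake signatures are standard; the in-kernel hook and the reverse-splicing decision are not shown.
<pre>
/* Sketch: classify a new TCP connection by peeking at its first bytes.
 * The signatures are the standard BitTorrent and Gnutella handshakes;
 * the in-kernel hook and the reverse-splicing decision are not shown. */
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

enum conn_type { CONN_UNDECIDED, CONN_BITTORRENT, CONN_GNUTELLA, CONN_OTHER };

static enum conn_type classify_bytes(const unsigned char *buf, size_t len)
{
    /* BitTorrent handshake: <19>"BitTorrent protocol"... */
    if (len >= 20 && buf[0] == 19 &&
        memcmp(buf + 1, "BitTorrent protocol", 19) == 0)
        return CONN_BITTORRENT;

    /* Gnutella handshake: "GNUTELLA CONNECT/..." */
    if (len >= 16 && memcmp(buf, "GNUTELLA CONNECT", 16) == 0)
        return CONN_GNUTELLA;

    /* Enough bytes seen and no P2P signature: splice it through untouched. */
    return len >= 20 ? CONN_OTHER : CONN_UNDECIDED;
}

/* Peek without consuming, so non-P2P connections can be passed on intact. */
enum conn_type classify_connection(int fd)
{
    unsigned char buf[64];
    ssize_t n = recv(fd, buf, sizeof(buf), MSG_PEEK);
    if (n <= 0)
        return CONN_UNDECIDED;
    return classify_bytes(buf, (size_t)n);
}
</pre>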
 
= What can we ask from potential ISP operators =
 
 
#Traffic traces. These are for simulating various replacement policies. We can implement a utility to do that (we have a partial prototype for BitTorrent); a sketch of such a utility appears below.
#Later on, we can ask them to perform the compatibility tests, as these tests will take a lot of effort.
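
A minimal sketch of the trace-replay utility mentioned in item 1: it reads one request per line from stdin and reports the byte hit rate under LRU. The trace format, cache size, and data structures are assumptions; other replacement policies would plug into the same loop.
<pre>
/* Sketch: trace-driven comparison of replacement policies (LRU shown).
 * Assumed trace format: one "<object-id> <bytes>" request per line on
 * stdin.  The cache size and data structures are arbitrary choices. */
#include <stdio.h>
#include <string.h>

#define CACHE_BYTES (1LL * 1024 * 1024 * 1024)   /* assume a 1 GB cache */
#define MAX_OBJECTS 100000

struct obj { char id[64]; long long size, last_used; int cached; };

static struct obj objs[MAX_OBJECTS];
static int nobjs;
static long long used_bytes, ticks;

static struct obj *lookup(const char *id)
{
    for (int i = 0; i < nobjs; i++)        /* linear scan: fine for a sketch */
        if (strcmp(objs[i].id, id) == 0)
            return &objs[i];
    if (nobjs == MAX_OBJECTS)
        return NULL;
    strncpy(objs[nobjs].id, id, sizeof(objs[nobjs].id) - 1);
    return &objs[nobjs++];
}

/* Evict least-recently-used objects until the new object fits. */
static void evict_until_fits(long long need)
{
    while (used_bytes + need > CACHE_BYTES) {
        struct obj *victim = NULL;
        for (int i = 0; i < nobjs; i++)
            if (objs[i].cached && (!victim || objs[i].last_used < victim->last_used))
                victim = &objs[i];
        if (!victim)
            break;
        victim->cached = 0;
        used_bytes -= victim->size;
    }
}

int main(void)
{
    char id[64];
    long long size, hit_bytes = 0, total_bytes = 0;
    while (scanf("%63s %lld", id, &size) == 2) {
        struct obj *o = lookup(id);
        if (!o)
            continue;
        o->last_used = ++ticks;
        total_bytes += size;
        if (o->cached) {
            hit_bytes += size;
        } else {
            o->size = size;
            evict_until_fits(size);
            o->cached = 1;
            used_bytes += size;
        }
    }
    printf("byte hit rate: %.2f%%\n",
           total_bytes ? 100.0 * hit_bytes / total_bytes : 0.0);
    return 0;
}
</pre>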
 
