Private:pCache progress

From NMSL
Revision as of 15:31, 15 September 2008 by MediaWiki default

This page lists possible improvements to the current pCache implementation, roughly in order of priority. For each item we also give an estimate in person-days, assuming a good graduate student who works 9 hours a day. The estimates include unit tests.

Implementation Tasks

To obtain a reliable system that can be deployed, the following items should be completed:

  1. Nonvolatile storage system: To keep cached data across reboots, we need to write the in-memory metainfo to the superblocks (in the pCache file systems). This requires save(...)/restore(...) functions for a few in-memory hash tables. This task takes 4-5 person-days.
  2. Proxy performance: We should integrate our code with the latest TProxy for better performance (the TCP splicing part can be ignored for now). After integration, we should quantify TProxy's performance by emulating a large number of P2P clients in two private subnets. If possible, we should identify the bottlenecks in TProxy, improve it, and contribute the code back to the community; this could be a small side research project. TProxy integration takes 1 person-day. Designing and implementing the emulation, and writing up the comparison and bottleneck analysis, takes 5-10 person-days.
  3. Event-driven connection manager: To improve scalability, we should define a stateful connection class, rewrite the connection manager as an event handler, and use epoll for network I/O and aio for disk I/O. Finally, we should run a test similar to the TProxy test. Design takes 4 person-days, implementation 8-10 person-days, and evaluation 3 person-days, assuming we have gained experience from evaluating TProxy.
  4. Simpler segment matching: For every incoming request, we should either request it in its entirety or not request it at all; the current partial-request code is over-complicated. This takes 1 person-day, but may depend on (overlap with) the event-driven connection manager.
  5. Better compatibility: Identify the unsupported BT/Gnutella clients, locate the root causes (i.e., which message types cause the problems), and fix them. I expect this to take some time, but I cannot give a time estimate yet.
  6. Better logging system: We currently use a home-made logging system, but inconsistently: some modules write to stderr instead of going through the logging system. If time permits, we may switch to an open-source logging library like log4c. This takes 5-7 person-days, given the large number of logging statements in the system.
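
For item 1, the save/restore step might look roughly like the sketch below. It assumes a hypothetical flat image layout (a magic number, an entry count, then a packed array of entries in native byte order) and a hypothetical seg_entry record; the real pCache hash tables and superblock layout are not described on this page. The functions only serialize to and from a memory buffer; writing that buffer to the reserved superblock area (for example with pwrite) is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define META_MAGIC 0x50434D31u   /* hypothetical marker ("PCM1") */
#define META_HDR   (2 * sizeof(uint32_t))

/* Hypothetical hash-table entry: where a cached segment lives on disk. */
struct seg_entry {
    uint64_t object_id;   /* key of the cached object */
    uint64_t offset;      /* segment offset within the object */
    uint64_t length;      /* segment length in bytes */
    uint64_t disk_block;  /* first block of the segment in the cache fs */
};

/* save(): serialize n entries as [magic][count][entries] into buf.
 * Returns bytes written, or 0 if buf is too small. */
size_t meta_save(const struct seg_entry *e, uint32_t n, unsigned char *buf, size_t cap)
{
    uint32_t magic = META_MAGIC;
    size_t need = META_HDR + (size_t)n * sizeof *e;

    assert(e != NULL || n == 0);
    if (need > cap)
        return 0;
    memcpy(buf, &magic, sizeof magic);
    memcpy(buf + sizeof magic, &n, sizeof n);
    memcpy(buf + META_HDR, e, (size_t)n * sizeof *e);
    return need;
}

/* restore(): rebuild up to max entries from buf. Returns the entry count,
 * or -1 if the image is truncated, too large, or lacks the magic number. */
long meta_restore(const unsigned char *buf, size_t len, struct seg_entry *out, uint32_t max)
{
    uint32_t magic, n;

    if (len < META_HDR)
        return -1;
    memcpy(&magic, buf, sizeof magic);
    memcpy(&n, buf + sizeof magic, sizeof n);
    if (magic != META_MAGIC || n > max || len < META_HDR + (size_t)n * sizeof *out)
        return -1;
    memcpy(out, buf + META_HDR, (size_t)n * sizeof *out);
    return (long)n;
}
```

A production version would probably also store a format version and a checksum, so that a superblock left half-written by a crash is rejected on restore rather than trusted.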
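
For item 2, the following is a minimal sketch of a listener that accepts TProxy-redirected connections. It assumes the TProxy v4 interface merged into mainline Linux, that is, the IP_TRANSPARENT socket option combined with an iptables TPROXY rule (plus the matching policy-routing setup, not shown) that sends P2P traffic to this listener. The function names open_transparent_listener and original_destination are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IP_TRANSPARENT
#define IP_TRANSPARENT 19  /* value from <linux/in.h>, for older libc headers */
#endif

/* Open a TCP listener that can accept connections redirected to it by an
 * iptables TPROXY rule. Returns the fd, or -1 with errno set. Setting
 * IP_TRANSPARENT requires CAP_NET_ADMIN; unprivileged callers get EPERM. */
int open_transparent_listener(uint16_t port)
{
    int one = 1, saved;
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_IP, IP_TRANSPARENT, &one, sizeof one) < 0 ||
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one) < 0)
        goto fail;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 || listen(fd, 128) < 0)
        goto fail;
    return fd;
fail:
    saved = errno;   /* close() may overwrite errno */
    close(fd);
    errno = saved;
    return -1;
}

/* For a connection accepted on such a listener, the socket's local address
 * is the destination the P2P client originally tried to reach. */
int original_destination(int conn_fd, struct sockaddr_in *dst)
{
    socklen_t len = sizeof *dst;

    assert(dst != NULL);
    return getsockname(conn_fd, (struct sockaddr *)dst, &len);
}
```

Knowing the original destination is what lets the proxy open its own upstream connection to the real peer when a request cannot be served from the cache.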
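
For item 3, the core of an event-driven connection manager is a per-connection state record plus a loop that dispatches readiness events to it. The sketch below is a minimal illustration using Linux epoll. The conn structure, its three states, and the function names are hypothetical. Handshake parsing is only a placeholder, and the aio path for disk I/O is not shown.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical connection states; a real manager would add more
 * (connecting upstream, waiting on a disk read, ...). */
enum conn_state { CONN_WAIT_HANDSHAKE, CONN_ESTABLISHED, CONN_CLOSED };

struct conn {
    int fd;
    enum conn_state state;
    size_t bytes_in;   /* total bytes read on this connection */
};

/* Handle a readable fd: drain it without blocking and advance the state. */
static void conn_on_readable(struct conn *c)
{
    char buf[4096];

    for (;;) {
        ssize_t n = read(c->fd, buf, sizeof buf);
        if (n > 0) {
            c->bytes_in += (size_t)n;
            if (c->state == CONN_WAIT_HANDSHAKE)
                c->state = CONN_ESTABLISHED;  /* placeholder for real handshake parsing */
        } else if (n == 0) {
            c->state = CONN_CLOSED;           /* peer closed the connection */
            return;
        } else {
            if (errno == EINTR)
                continue;                     /* interrupted: retry */
            if (errno != EAGAIN && errno != EWOULDBLOCK)
                c->state = CONN_CLOSED;       /* real error */
            return;                           /* nothing more to read for now */
        }
    }
}

/* Make the fd non-blocking and register it; the conn pointer rides along in
 * the event so the loop can reach the connection's state directly. */
int conn_register(int epfd, struct conn *c)
{
    struct epoll_event ev = {0};
    int flags;

    assert(c != NULL && c->fd >= 0);
    flags = fcntl(c->fd, F_GETFL, 0);
    if (flags < 0 || fcntl(c->fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;
    ev.events = EPOLLIN;
    ev.data.ptr = c;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, c->fd, &ev);
}

/* One pass of the event loop: wait for events and dispatch them.
 * Returns the number of events handled, or -1 on error. */
int event_loop_once(int epfd, int timeout_ms)
{
    struct epoll_event evs[64];
    int n = epoll_wait(epfd, evs, 64, timeout_ms);

    for (int i = 0; i < n; i++) {
        struct conn *c = evs[i].data.ptr;
        if (evs[i].events & (EPOLLIN | EPOLLHUP | EPOLLERR))
            conn_on_readable(c);
        if (c->state == CONN_CLOSED)
            epoll_ctl(epfd, EPOLL_CTL_DEL, c->fd, NULL);
    }
    return n;
}
```

Storing the conn pointer in epoll_event.data.ptr lets the loop reach a connection's state without a separate fd-to-connection table. Because each handler drains its fd until EAGAIN, the same code would remain correct if the registration were switched to edge-triggered mode (EPOLLET).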
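
For item 4, one reading of the all-or-nothing rule is this: serve a request from the cache only when every requested byte is cached, and otherwise forward the whole request. Under that reading, matching reduces to a single coverage check. The sketch assumes that each object's cached segments are kept sorted by start offset and do not overlap. The segment type and function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A cached byte range [start, end) of one object. */
struct segment { uint64_t start, end; };

/* Returns 1 if [off, off + len) is entirely covered by the cached segments
 * (sorted by start, non-overlapping; adjacent segments are fine), else 0.
 * On 0 the caller forwards the whole request; no partial serving. */
int request_fully_cached(const struct segment *segs, size_t n, uint64_t off, uint64_t len)
{
    uint64_t need = off;          /* first byte not yet known to be cached */
    uint64_t end = off + len;

    assert(len > 0);
    for (size_t i = 0; i < n && need < end; i++) {
        if (segs[i].end <= need)
            continue;             /* segment lies entirely before `need` */
        if (segs[i].start > need)
            return 0;             /* hole at `need` */
        need = segs[i].end;       /* coverage extends to this segment's end */
    }
    return need >= end;
}
```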
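
For item 5, one way to find the message types that trip up pCache is to scan captured BitTorrent peer-wire traffic and tally the message IDs pCache does not handle. In the peer-wire protocol (BEP 3), each message after the handshake is a 4-byte big-endian length, then a 1-byte message ID and a payload; a zero length is a keep-alive. Purely for illustration, the sketch assumes that pCache supports only the core IDs 0 to 8 (choke through cancel). The real supported set must be taken from the pCache source, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed, for illustration only: pCache handles the core BEP 3 messages
 * 0 (choke) through 8 (cancel). Replace with pCache's real list. */
static int id_supported(uint8_t id)
{
    return id <= 8;
}

/* Scan post-handshake peer-wire bytes and count unsupported message IDs in
 * unknown[256]. Returns the number of bytes consumed; a trailing partial
 * message is left for the next call. */
size_t scan_bt_messages(const uint8_t *buf, size_t len, unsigned unknown[256])
{
    size_t pos = 0;

    assert(unknown != NULL);
    while (len - pos >= 4) {
        uint32_t mlen = (uint32_t)buf[pos] << 24 | (uint32_t)buf[pos + 1] << 16 |
                        (uint32_t)buf[pos + 2] << 8 | (uint32_t)buf[pos + 3];
        if (len - pos - 4 < mlen)
            break;                         /* incomplete message */
        if (mlen > 0 && !id_supported(buf[pos + 4]))
            unknown[buf[pos + 4]]++;       /* length 0 is a keep-alive */
        pos += 4 + (size_t)mlen;
    }
    return pos;
}
```

Running this over traces from each failing client would show, for example, whether the failures correlate with extension-protocol messages (ID 20, BEP 10) or Fast Extension messages (BEP 6).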
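
For item 6, the main fix is to send every log statement through a single function, so that destination and verbosity are configured in one place. The sketch below shows such a central entry point with hypothetical names (pc_log, pc_log_config, pc_level_parse). It is not log4c's API; switching to a library like log4c would replace these functions with that library's own calls.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical levels for a central pCache logger. */
enum pc_level { PC_DEBUG, PC_INFO, PC_WARN, PC_ERROR };

static const char *const pc_level_names[] = { "DEBUG", "INFO", "WARN", "ERROR" };
static FILE *pc_log_sink;                      /* NULL means stderr */
static enum pc_level pc_log_threshold = PC_INFO;

/* Choose the destination and minimum level once, at startup. */
void pc_log_config(FILE *sink, enum pc_level threshold)
{
    pc_log_sink = sink;
    pc_log_threshold = threshold;
}

/* Map a level name from a config file (e.g. "WARN") to a level, or -1. */
int pc_level_parse(const char *name)
{
    for (int i = PC_DEBUG; i <= PC_ERROR; i++)
        if (strcmp(name, pc_level_names[i]) == 0)
            return i;
    return -1;
}

/* Every module logs through this function instead of writing to stderr. */
void pc_log(enum pc_level lvl, const char *module, const char *fmt, ...)
{
    FILE *out = pc_log_sink ? pc_log_sink : stderr;
    va_list ap;

    assert(lvl <= PC_ERROR);
    if (lvl < pc_log_threshold)
        return;
    fprintf(out, "[%s] %s: ", pc_level_names[lvl], module);
    va_start(ap, fmt);
    vfprintf(out, fmt, ap);
    va_end(ap);
    fputc('\n', out);
}
```

Once all modules call pc_log, replacing the stray stderr writes becomes a mechanical search-and-replace, and a later move to a library only touches these few functions.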

Possible Research Problems

  1. Generalize the traffic identifier into an (in-kernel) TCP connection classifier that works with TProxy. A connection classifier is an inspector that checks the first few packets of each TCP connection to determine the connection type. Based on that type, the classifier may decide to perform "reverse TCP splicing", i.e., breaking an established TCP connection into two so that an application (pCache) can manipulate the data. Reverse TCP splicing is more efficient, because sequence numbers do not need to be "translated" for non-P2P connections. Note that a connection classifier is a specialized deep packet inspector (DPI), so we might be able to make it more efficient than general DPIs. Finally, part of the connection classifier could run on GPUs, similar to what was done at UW-Madison for DPIs (see here).
  2. Collect requests and compare various replacement policies. Explore the different traffic models of various P2P systems.
  3. Enable caching in both directions, but control the upload rates to optimize the local benefit, defined as minimizing the amount of uploaded data while maximizing the download speed. I think this problem is still open, even for a single BitTorrent client.
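
For research problem 1, the user-space sketch below illustrates the decision a connection classifier makes from the first bytes of a TCP stream. It matches the BitTorrent handshake (a byte with value 19 followed by "BitTorrent protocol") and the Gnutella 0.6 handshake ("GNUTELLA CONNECT/"), and it reports when more bytes are needed. It is only an illustration: the classifier described above would run in the kernel alongside TProxy, and encrypted or obfuscated P2P traffic would not match these prefixes. The enum and the function names (classify, should_intercept) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum conn_type { CT_NEED_MORE, CT_BITTORRENT, CT_GNUTELLA, CT_OTHER };

/* 1: buf starts with pat; 0: mismatch; -1: buf matches so far but is shorter than pat. */
static int match_prefix(const unsigned char *buf, size_t len, const char *pat, size_t plen)
{
    size_t n = len < plen ? len : plen;

    if (memcmp(buf, pat, n) != 0)
        return 0;
    return len >= plen ? 1 : -1;
}

/* Classify a TCP stream from its first payload bytes (client to server). */
enum conn_type classify(const unsigned char *buf, size_t len)
{
    /* BEP 3 handshake: length byte 19, then "BitTorrent protocol".
     * The literal is split so "\x13" is not parsed as "\x13B". */
    static const char bt[] = "\x13" "BitTorrent protocol";
    /* Gnutella 0.6 connection handshake */
    static const char gnu[] = "GNUTELLA CONNECT/";
    int r_bt, r_gnu;

    assert(buf != NULL || len == 0);
    r_bt = match_prefix(buf, len, bt, sizeof bt - 1);
    if (r_bt == 1)
        return CT_BITTORRENT;
    r_gnu = match_prefix(buf, len, gnu, sizeof gnu - 1);
    if (r_gnu == 1)
        return CT_GNUTELLA;
    return (r_bt == -1 || r_gnu == -1) ? CT_NEED_MORE : CT_OTHER;
}

/* Only P2P connections are split in two and handed to pCache;
 * everything else stays on the normal forwarding path. */
int should_intercept(enum conn_type t)
{
    return t == CT_BITTORRENT || t == CT_GNUTELLA;
}
```

Returning CT_NEED_MORE instead of guessing keeps the decision deferred until enough bytes have arrived, which matters because a handshake can be split across several packets.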
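
For research problem 2, and for the trace-driven simulation mentioned in the ISP section, a replacement-policy comparison can start as a simple trace replay. The sketch below replays a hypothetical trace of (object id, size) requests against a cache with a fixed byte capacity. It reports the byte hit ratio under LRU or under an in-cache LFU, where frequency counts are discarded on eviction and ties go to the older entry. Victim selection uses a linear scan for clarity; this is fine for small experiments but too slow for full ISP traces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One request in a hypothetical trace format: object id and size in bytes. */
struct req { uint32_t id; uint64_t size; };

enum policy { POLICY_LRU, POLICY_LFU };

#define MAX_OBJS 1024  /* simulator limit on simultaneously cached objects */

struct entry { uint32_t id; uint64_t size, last_use, hits; };

/* Pick the victim: least recently used (LRU), or fewest hits with the
 * older entry losing ties (in-cache LFU). */
static size_t pick_victim(const struct entry *c, size_t used, enum policy p)
{
    size_t v = 0;
    for (size_t j = 1; j < used; j++) {
        int worse = p == POLICY_LRU
            ? c[j].last_use < c[v].last_use
            : c[j].hits < c[v].hits ||
              (c[j].hits == c[v].hits && c[j].last_use < c[v].last_use);
        if (worse)
            v = j;
    }
    return v;
}

/* Replay the trace against a cache of `capacity` bytes; return the byte hit ratio. */
double simulate(const struct req *trace, size_t n, uint64_t capacity, enum policy p)
{
    struct entry cache[MAX_OBJS];
    size_t used = 0;
    uint64_t bytes = 0, hit_bytes = 0, total_bytes = 0;

    assert(trace != NULL || n == 0);
    for (size_t t = 0; t < n; t++) {
        const struct req *r = &trace[t];
        size_t i;

        total_bytes += r->size;
        for (i = 0; i < used; i++)
            if (cache[i].id == r->id)
                break;
        if (i < used) {                     /* hit */
            hit_bytes += r->size;
            cache[i].last_use = t;
            cache[i].hits++;
            continue;
        }
        if (r->size > capacity)             /* can never be cached */
            continue;
        while (bytes + r->size > capacity || used == MAX_OBJS) {
            size_t v = pick_victim(cache, used, p);
            bytes -= cache[v].size;
            cache[v] = cache[--used];       /* move last entry into the hole */
        }
        cache[used++] = (struct entry){ r->id, r->size, t, 1 };
        bytes += r->size;
    }
    return total_bytes ? (double)hit_bytes / (double)total_bytes : 0.0;
}
```

Other policies (for example, size-aware ones) can be added by extending pick_victim, so every policy is measured on exactly the same replay.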
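
For research problem 3, controlling upload requires a mechanism that enforces whatever rate is chosen, and a token bucket is one standard option. The sketch below is a minimal token bucket with hypothetical names. It is driven by explicit microsecond timestamps, so it can be unit-tested or replayed inside a simulator. Credit is stored in units of bytes times 10^6, so frequent small refills do not lose fractional bytes to rounding. The sketch only enforces a rate; choosing the rate that minimizes upload while maximizing download speed is the open problem itself.

```c
#include <assert.h>
#include <stdint.h>

#define US_PER_S 1000000u

/* Hypothetical per-direction upload limiter. */
struct token_bucket {
    uint64_t rate;     /* refill rate in bytes per second (must be > 0) */
    uint64_t burst;    /* bucket capacity in bytes */
    uint64_t credit;   /* available tokens in bytes * 1e6 (no rounding loss) */
    uint64_t last_us;  /* time of the last refill, in microseconds */
};

void tb_init(struct token_bucket *tb, uint64_t rate, uint64_t burst, uint64_t now_us)
{
    assert(rate > 0);
    tb->rate = rate;
    tb->burst = burst;
    tb->credit = burst * US_PER_S;  /* start with a full bucket */
    tb->last_us = now_us;
}

/* Refill for the time elapsed since the last call, then grant up to `want`
 * bytes. Returns the number of bytes the caller may upload now. */
uint64_t tb_take(struct token_bucket *tb, uint64_t want, uint64_t now_us)
{
    uint64_t cap = tb->burst * US_PER_S;
    uint64_t elapsed, add, grant;

    assert(now_us >= tb->last_us);
    elapsed = now_us - tb->last_us;
    /* microseconds * bytes/second = bytes * 1e6, the unit of credit;
     * clamp first so the multiplication cannot overflow */
    add = elapsed > cap / tb->rate ? cap : elapsed * tb->rate;
    tb->credit = cap - tb->credit < add ? cap : tb->credit + add;
    tb->last_us = now_us;

    grant = tb->credit / US_PER_S;
    if (grant > want)
        grant = want;
    tb->credit -= grant * US_PER_S;
    return grant;
}
```

With one bucket per direction (or per peer), an upload-control policy under study only needs to adjust rate and burst over time.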

What we can ask of potential ISP operators

  1. Traffic traces, which we can use to simulate various replacement policies. We can implement a utility for this (we already have a partial prototype for BitTorrent).
  2. Later on, we can ask them to perform the compatibility tests, since these tests will take a lot of effort.