Private:pCache progress

From NMSL

Possible Research Problems

  1. Data localizer: This can complement the caching mechanism or run as a standalone system. Using the traffic identifier in pCache, we can extract, track, and manipulate the piece availability information in a BitTorrent network. See Jin's PV 2007 paper, which proposes allowing only one copy of each data piece to pass through the ISP-PoP link. Similarly, because we sit at the AS-Internet boundary, we can control the availability information so that each piece passes through us only once. In the extreme case, we can let the peers do the caching for us. This may also require us to manipulate the replies that contain the sender list, so that peers inside our AS learn about each other. Manipulating the reply messages also lets us segment our network into a few groups. For example, we can prevent computers in group A from communicating with computers in group B (e.g., because the link between A and B is expensive).
  2. Generalize the traffic identifier into a TCP connection classifier (in-kernel) that works with TProxy. A connection classifier is an inspector that checks the first few packets of each TCP connection to determine the connection type. Based on the connection type, the classifier may decide to perform a "reverse TCP splicing", which means breaking an established TCP connection into two so that the application (pCache) can manipulate the data. Reverse TCP splicing is more efficient because we do not need to "translate" sequence numbers for non-P2P connections. Note that a connection classifier is a specialized deep packet inspector (DPI), so we might make it more efficient than general-purpose DPIs. Finally, part of the connection classifier could run on GPUs, similar to what was done at UW-Madison for DPIs (see here).
  3. Collect requests and compare various replacement policies. Explore the different traffic models of various P2P systems.
  4. Enable caching in both directions, but control the upload rates to optimize local benefit. Local benefit is defined as minimizing the amount of uploaded data while maximizing download speed. I think the problem is still open, even for a single BitTorrent client.
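
For the data localizer in item 1, one way to control piece availability is to rewrite the bitfield that an external peer advertises, clearing the bits of pieces that have already entered our AS. The sketch below is only an illustration under assumptions: the function name mask_bitfield and the already_fetched set are hypothetical and not part of pCache. The bit layout (high bit of the first byte is piece 0) follows the BitTorrent protocol specification. A real localizer would also have to handle the per-piece "have" messages, which this sketch ignores.

```python
from typing import Iterable


def mask_bitfield(bitfield: bytes, already_fetched: Iterable[int]) -> bytes:
    """Clear the bits of pieces that have already entered the AS.

    Hypothetical helper for illustration only. Bit layout per the BitTorrent
    specification: the high bit of byte 0 corresponds to piece index 0.
    """
    out = bytearray(bitfield)
    for piece in already_fetched:
        byte_idx, bit = divmod(piece, 8)
        if byte_idx < len(out):  # ignore indices beyond the bitfield
            out[byte_idx] &= ~(0x80 >> bit) & 0xFF
    return bytes(out)
```

With the masked bitfield, a peer inside the AS no longer sees the external peer as a source for those pieces and would have to fetch them from local peers instead.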
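
The connection classifier in item 2 can be sketched in user space as a check on the first payload bytes of a connection. The example relies on one fact from the BitTorrent protocol specification: the peer-wire handshake starts with the byte 19 followed by the string "BitTorrent protocol". Everything else is an assumption made for illustration, including the function name classify and the three labels; it does not cover the in-kernel or TProxy part, and other P2P protocols would need their own signatures.

```python
# BitTorrent peer-wire handshake prefix: one length byte (19) followed by
# the protocol string, as given in the BitTorrent protocol specification.
BT_PREFIX = bytes([19]) + b"BitTorrent protocol"


def classify(first_bytes: bytes) -> str:
    """Return 'p2p', 'other', or 'undecided' from the first payload bytes.

    Hypothetical helper: the name and the labels are illustrative only.
    """
    n = min(len(first_bytes), len(BT_PREFIX))
    if first_bytes[:n] != BT_PREFIX[:n]:
        return "other"      # cannot be a BitTorrent handshake
    if n < len(BT_PREFIX):
        return "undecided"  # matches so far; wait for more payload
    return "p2p"            # full prefix seen; candidate for reverse splicing
```

A connection labeled "other" can be left untouched, which is where the saving comes from: only connections labeled "p2p" need to be split and handed to the cache.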
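
For item 3, comparing replacement policies amounts to replaying a collected request trace against each policy and counting hits. The following is a minimal sketch, assuming unit-size objects and a trace given as a list of keys; LRU and LFU are chosen here only as two familiar examples, and the function names are hypothetical. The LFU variant keeps request counts for every key ever seen and breaks ties by key order to stay deterministic.

```python
from collections import Counter, OrderedDict


def lru_hits(trace, capacity):
    """Count cache hits when replaying the trace under LRU replacement."""
    cache = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[key] = True
    return hits


def lfu_hits(trace, capacity):
    """Count cache hits when replaying the trace under LFU replacement."""
    cache = set()
    freq = Counter()                       # request counts for all keys seen
    hits = 0
    for key in trace:
        freq[key] += 1
        if key in cache:
            hits += 1
        else:
            if len(cache) >= capacity:
                # Evict the least frequently requested key (ties by key).
                victim = min(cache, key=lambda k: (freq[k], k))
                cache.remove(victim)
            cache.add(key)
    return hits
```

Replaying the same trace through both functions with the same capacity gives directly comparable hit counts. A byte hit rate would additionally need object sizes, which this sketch leaves out.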