Modeling and Caching of P2P Traffic



Peer-to-peer (P2P) file sharing systems generate a major portion of the Internet traffic, and this portion is expected to increase in the future. The sheer volume and expected high growth of P2P traffic have negative consequences, including: (i) significantly increased load on the Internet backbone, hence, higher chances of congestion; and (ii) increased cost on Internet Service Providers (ISPs), hence, higher service charges for all Internet users.

A potential solution for alleviating those negative impacts is to cache a fraction of the P2P traffic such that future requests for the same objects could be served from a cache in the requester’s autonomous system (AS). Caching in the Internet has mainly been considered for web and video streaming traffic, with little attention to P2P traffic. Many caching algorithms for web traffic and for video streaming systems have been proposed and analyzed. Directly applying such algorithms to cache P2P traffic may not yield the best cache performance, because of the different traffic characteristics and caching objectives. For instance, reducing user-perceived access latency is a key objective for web caches. Consequently, web caching algorithms often incorporate information about the cost (latency) of a cache miss when deciding which object to cache/evict. Although latency is important to P2P users, the goal of a P2P cache is often focused on the ISP’s primary concern, namely the amount of bandwidth consumed by large P2P transfers. Consequently, the byte hit rate, i.e., the ratio of the number of bytes served from the cache to the total number of transferred bytes, is more important than latency.

We are developing caching algorithms that capitalize on the P2P traffic characteristics. We are also exploring the potential of cooperative caching of P2P traffic, where multiple caches deployed in different ASes (which could have a peering relationship) or within a large AS (e.g., a Tier-1 ISP) cooperate to serve traffic from each other's clients. Cooperation reduces the load on expensive inter-ISP links. Furthermore, we are implementing all of our algorithms and ideas in a prototype caching system.


People

  • Cheng-Hsin Hsu (PhD Student)
  • Kianoosh Mokhtarian (MSc Student)
  • Behrooz Noorizadeh (MSc Student, Graduated Fall 2007)
  • Osama Saleh (MSc Student, Graduated Fall 2006)


Publications

  • M. Hefeeda and O. Saleh, Traffic Modeling and Proportional Partial Caching for Peer-to-Peer Systems, IEEE/ACM Transactions on Networking, Accepted October 2007.
  • M. Hefeeda and B. Noorizadeh, Cooperative Caching: The Case for P2P Traffic, In Proc. of IEEE Conference on Local Computer Networks (LCN'08), Montreal, Canada, October 2008.

pCache Software

Overview

We have designed and implemented a proxy cache for P2P traffic, which we call pCache. pCache is to be deployed by autonomous systems (ASes) or ISPs that are interested in reducing the burden of P2P traffic. pCache would be deployed at or near the gateway router of an AS. At a high level, a client participating in a particular P2P network issues a request to download an object. This request is intercepted by pCache. If the requested object or parts of it are stored in the cache, they are served to the requesting client. This saves bandwidth on the external (expensive) links to the Internet. If a part of the requested object is not found in the cache, the request is forwarded to the P2P network. When the response comes back, pCache may store a copy of the object for future requests from other clients in its AS. Clients inside the AS as well as external clients are not aware of pCache, i.e., pCache is fully transparent in both directions.

Our C++ implementation of pCache has more than 11,000 lines of code. We have rigorously validated and evaluated the performance of pCache as well as its impacts on ISPs and clients. Our experimental results show that pCache benefits both the clients and the ISP in which the cache is deployed, without hurting the performance of the P2P networks. Specifically, clients behind the cache achieve much higher download speeds than other clients running in the same conditions without the cache. In addition, a significant portion of the traffic is served from the cache, which reduces the load on the expensive WAN links for the ISP. Our results also show that the cache does not reduce the connectivity of clients behind it, nor does it reduce their upload speeds. This is important for the whole P2P network, because reduced connectivity could lead to decreased availability of peers and the content stored on them.


Detailed Design

Transparent Proxy and P2P Traffic Identifier

These two components reside on the gateway router. They transparently inspect traffic going through the router and forward only P2P connections to pCache. Traffic that does not belong to any P2P system is processed by the router in the regular way and is not affected by the presence of pCache. This is done using the Netfilter framework for custom packet handling.

Netfilter defines hook points at various packet processing stages, such as PREROUTING, LOCAL_IN, LOCAL_OUT, FORWARD, and POSTROUTING. Netfilter allows us to register callback functions at any of these hook points, to be invoked when packets reach them. Netfilter is commonly used with iptables, which provides an interface for defining rulesets to be applied to packets. Each rule has a number of classifiers (fields to be matched) and an action. To support a transparent web proxy, a callback function is registered at the PREROUTING hook point to intercept TCP packets with destination port 80. Once a packet is intercepted, its destination IP address and port number are changed to those of the process running the proxy cache. Thus, HTTP packets are redirected to the web proxy cache for further processing. Although the original destination IP address is lost during this redirection, the web proxy cache can still determine the address of the web server, because HTTP/1.1 requests carry the server name in the Host header. This simple redirection, however, does not work for P2P proxy caches, because the address of the remote peer is not included in the request messages, so the proxy cannot find the remote peer. Hence, packets need to be redirected to the proxy process without changing their destination IP addresses and port numbers. We note that Netfilter supports very flexible and complex forwarding rulesets. This allows us to run the gateway router and pCache as two processes on the same machine, or to run them on two separate machines.
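
To illustrate why this simple redirection suffices for a web proxy but not for a P2P proxy, the following is a minimal sketch (in C++, not taken from pCache) of how a web proxy can recover the origin server from an intercepted HTTP/1.1 request; P2P request messages carry no equivalent of the Host header, so nothing similar is possible there.
<pre>
#include <string>

// Hypothetical helper (not part of pCache): recover the origin server from the
// Host header of an HTTP/1.1 request whose destination address was rewritten.
// Header-name case-insensitivity and folding are ignored for brevity.
std::string extract_http_host(const std::string& request) {
    const std::string key = "\r\nHost:";
    std::string::size_type pos = request.find(key);
    if (pos == std::string::npos) return "";                   // no Host header found
    pos += key.size();
    while (pos < request.size() && request[pos] == ' ') ++pos; // skip spaces
    std::string::size_type end = request.find("\r\n", pos);
    return request.substr(pos, end == std::string::npos ? std::string::npos
                                                        : end - pos);
}
</pre>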

We implement our transparent proxy based on the TPROXY project. In our implementation, the proxy process creates a listening socket. A callback function is registered at the PREROUTING hook point to intercept packets that might be of interest to the proxy process. This function sets a pointer to the listening socket in the structure containing the packet itself. It also sets a flag in the packet. The route lookup procedure is modified to check this flag bit: if it is set, the packet is sent to the local IP stack, even though its destination is an external IP address. Using the pointer in the packet structure, the packet is then redirected to the listening socket of the proxy. A new (connected) socket is created between the proxy and the internal host. This new socket uses the IP address and port number of the external host, not those of the proxy process. Another socket is created between the proxy and the external host; it uses the IP address and port number of the internal host. Two new entries are added to the socket table for these two sockets. Traffic packets passing through the gateway router are checked at the PREROUTING hook point to see whether they match either of these sockets.
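
The userspace side of this interception can be sketched as follows. This is only an illustration under the assumption of a kernel exposing TPROXY support through the IP_TRANSPARENT socket option (the original TPROXY patches used a slightly different interface); the port number is hypothetical and error handling is omitted.
<pre>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IP_TRANSPARENT
#define IP_TRANSPARENT 19   // from <linux/in.h> on older systems
#endif

int main() {
    int one = 1;

    // Listening socket of the proxy process; IP_TRANSPARENT lets it accept
    // connections whose destination is not a local address.
    int lsock = socket(AF_INET, SOCK_STREAM, 0);
    setsockopt(lsock, IPPROTO_IP, IP_TRANSPARENT, &one, sizeof(one));
    sockaddr_in any = sockaddr_in();
    any.sin_family = AF_INET;
    any.sin_port = htons(8080);               // hypothetical proxy port
    any.sin_addr.s_addr = INADDR_ANY;
    bind(lsock, (sockaddr*)&any, sizeof(any));
    listen(lsock, 128);

    // Socket toward the internal peer. Because of the redirection, the local
    // address reported by getsockname() is the *external* peer's address.
    sockaddr_in internal_peer = sockaddr_in(), external_peer = sockaddr_in();
    socklen_t len = sizeof(internal_peer);
    int to_internal = accept(lsock, (sockaddr*)&internal_peer, &len);
    len = sizeof(external_peer);
    getsockname(to_internal, (sockaddr*)&external_peer, &len);

    // Socket toward the external peer. Binding it (again with IP_TRANSPARENT)
    // to the internal peer's address keeps the cache invisible to both sides.
    int to_external = socket(AF_INET, SOCK_STREAM, 0);
    setsockopt(to_external, IPPROTO_IP, IP_TRANSPARENT, &one, sizeof(one));
    bind(to_external, (sockaddr*)&internal_peer, sizeof(internal_peer));
    connect(to_external, (sockaddr*)&external_peer, sizeof(external_peer));

    // ... relay and cache traffic between to_internal and to_external ...
    close(to_internal); close(to_external); close(lsock);
    return 0;
}
</pre>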

The P2P Traffic Identifier determines whether a connection belongs to any P2P system known to the cache. This is done by comparing a number of bytes from the connection stream against known P2P application signatures. We have implemented identifiers for BitTorrent and Gnutella, which are the most common P2P systems nowadays. We can readily support traffic identification for other protocols by adding new identifiers to the cache.
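
As an illustration of signature-based identification (hypothetical code, not pCache's actual identifier): the BitTorrent handshake begins with the byte 19 followed by the string "BitTorrent protocol", and a Gnutella connection begins with "GNUTELLA CONNECT/", so a few bytes from the start of the stream suffice to classify these two protocols.
<pre>
#include <cstddef>
#include <cstring>

enum P2PProtocol { PROTO_NONE, PROTO_BITTORRENT, PROTO_GNUTELLA };

// Compare the first bytes of an intercepted stream against known signatures.
P2PProtocol identify(const unsigned char* data, std::size_t len) {
    // BitTorrent handshake: <19>"BitTorrent protocol"...
    static const char bt[] = "\x13" "BitTorrent protocol";
    if (len >= sizeof(bt) - 1 && std::memcmp(data, bt, sizeof(bt) - 1) == 0)
        return PROTO_BITTORRENT;

    // Gnutella handshake: "GNUTELLA CONNECT/<version>\r\n"
    static const char gnut[] = "GNUTELLA CONNECT/";
    if (len >= sizeof(gnut) - 1 && std::memcmp(data, gnut, sizeof(gnut) - 1) == 0)
        return PROTO_GNUTELLA;

    return PROTO_NONE;   // not (yet) identifiable as P2P traffic
}
</pre>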

Since P2P systems use dynamic ports, the proxy process may initially intercept some connections that do not belong to P2P systems. This can only be discovered after inspecting a few packets using the P2P Traffic Identifier module. Each intercepted connection is split into a pair of connections, and all packets have to go through the proxy process. This imposes overhead on the proxy cache and may increase the end-to-end delay of the connections. To reduce this overhead, we splice each pair of non-P2P connections using TCP splicing techniques, which are usually used in layer-7 switching. We modify our Netfilter callback function to support TCP splicing as well. Our implementation is similar to that of a layer-7 switching project that is now largely abandoned. For spliced connections, the sockets in the proxy process are closed and packets are relayed in the kernel stack instead of being passed up to the proxy process in the application layer. Implementation details such as adjusting sequence numbers in the spliced TCP connections had to be addressed, because the two TCP connections start from different initial sequence numbers.
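
The sequence-number adjustment can be sketched as follows (structure and function names are hypothetical; in the real implementation the rewriting happens on the packet inside a Netfilter callback, and the TCP checksum must be updated as well). Each spliced leg starts from its own initial sequence number (ISN), so forwarded segments are rebased by the difference between the ISNs, using the wrap-around arithmetic that unsigned 32-bit integers provide naturally.
<pre>
#include <stdint.h>

struct SpliceOffsets {
    uint32_t d_in;   // ISN(proxy -> external) - ISN(internal -> proxy)
    uint32_t d_out;  // ISN(proxy -> internal) - ISN(external -> proxy)
};

struct TcpSeqFields { uint32_t seq, ack; };

// Rewrite a segment traveling from the internal peer toward the external peer.
void splice_rewrite_outbound(TcpSeqFields& h, const SpliceOffsets& o) {
    h.seq += o.d_in;    // rebase the internal peer's data into the outgoing leg
    h.ack -= o.d_out;   // rebase the acknowledged external-peer sequence numbers
}

// Rewrite a segment traveling from the external peer toward the internal peer.
void splice_rewrite_inbound(TcpSeqFields& h, const SpliceOffsets& o) {
    h.seq += o.d_out;
    h.ack -= o.d_in;
}
</pre>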


Connection Manager

When a connection is identified as belonging to a P2P system, it is passed to the Connection Manager, which coordinates the different components of pCache to store and serve requests from this connection. For example, upon seeing a request message, the Connection Manager calls a lookup function in the Storage System Manager to determine whether this request can be fulfilled with previously cached data, either in memory or on disk. In addition, if only parts of the requested data are available in the cache, the Connection Manager sends a message to the actual external peer to request the missing portion of the data. It then assembles this data with the cached data into a protocol-specific message and sends it to the client.

Since each peer may open many connections to request a single file and pCache is supposed to serve a large number of peers, efficient support of concurrent connections is important. A simple solution for concurrency is multithreading, where a thread is created to handle each new connection. This is simple because the states of connections are isolated from each other and processed by identical threads. The downside of this solution is increased overhead for creation/deletion, scheduling, and context switching of threads. Some of this overhead can be significantly reduced using user-level thread libraries such as Capriccio, which is reported to scale to a hundred thousand threads. Our current implementation of pCache uses multithreading.
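
A minimal thread-per-connection sketch (for illustration only, not pCache's actual connection-handling code) is shown below; each accepted connection is handed to its own detached thread, keeping per-connection state isolated at the cost of thread creation and context-switching overhead.
<pre>
#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>

// Each connection is handled in its own thread.
static void* handle_connection(void* arg) {
    int fd = *static_cast<int*>(arg);
    delete static_cast<int*>(arg);
    // ... identify the protocol, parse requests, consult the storage system ...
    close(fd);
    return 0;
}

void serve_forever(int listen_fd) {
    for (;;) {
        int fd = accept(listen_fd, 0, 0);
        if (fd < 0) continue;
        pthread_t tid;
        pthread_create(&tid, 0, handle_connection, new int(fd));
        pthread_detach(tid);   // the thread cleans up after itself; no join needed
    }
}
</pre>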

More sophisticated concurrency solutions that employ non-blocking (asynchronous) I/O operations can also be used with pCache. For example, the single-process event-driven model uses only one thread to detect events on multiple sockets, using non-blocking sockets together with event-notification mechanisms such as epoll and select. These events are then scheduled for an event handler to process. Unlike socket operations, asynchronous disk operations are either poorly supported or nonexistent on most UNIX systems. To mitigate this problem, multi-process event-driven models have been proposed, which can be roughly categorized into asymmetric and symmetric models. The asymmetric models create a single process to handle events from the network sockets, and multiple processes to handle disk operations. In contrast, the symmetric models create multiple event schedulers that can each handle both disk and network events. The above concurrency models have been proposed and used mostly for web servers. Unlike pCache, web servers may not need to provide full transparency and connection splicing, which could impact these concurrency models. We are currently designing and implementing new concurrency models that are more suitable for P2P proxy caching systems.
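
For comparison, a minimal sketch of the single-process event-driven model using Linux epoll is shown below (illustration only; as noted above, pCache currently uses multithreading).
<pre>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

// One thread waits for readiness events on many sockets and dispatches them.
void event_loop(int listen_fd) {
    int ep = epoll_create(1024);                 // size hint, ignored by modern kernels
    epoll_event ev = epoll_event(), events[64];
    ev.events = EPOLLIN;
    ev.data.fd = listen_fd;
    epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev);

    for (;;) {
        int n = epoll_wait(ep, events, 64, -1);  // block until something is ready
        for (int i = 0; i < n; ++i) {
            int fd = events[i].data.fd;
            if (fd == listen_fd) {               // new connection: start watching it
                int conn = accept(listen_fd, 0, 0);
                epoll_event cev = epoll_event();
                cev.events = EPOLLIN;
                cev.data.fd = conn;
                epoll_ctl(ep, EPOLL_CTL_ADD, conn, &cev);
            } else {
                // ... non-blocking read, feed the parser, schedule disk I/O ...
            }
        }
    }
}
</pre>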


Storage System Manager

We propose a new storage management system optimized for P2P traffic. The proposed storage system contains three modules: in-memory structures, a block allocation method, and a replacement policy. The in-memory structures contain metadata to support storing and serving byte ranges of objects, and memory buffers to reduce disk I/O operations. The block allocation method organizes the layout of data on the disk. The replacement policy decides which segments of objects to evict from the cache in order to make room for a newly requested segment.

Two structures are maintained in memory: metadata and page buffers. The metadata is a two-level lookup table designed to enable efficient segment lookups. The first level is a hash table keyed on object IDs; collisions are resolved using common chaining techniques. Every entry points to the second level of the table, which is a set of cached segments belonging to the same object. Every segment entry consists of a few fields, which include Offset for the absolute location of the segment within the object and RefCnt for the number of connections currently using this segment. RefCnt is used to prevent evicting a buffer page while there are connections currently using it. The set of cached segments is implemented as a balanced (red-black) binary search tree sorted on the Offset field. Segments inserted into the cached segment set are adjusted to be mutually disjoint, which ensures that the same data is never stored more than once in the cache. Using this structure, partial hits can be found in at most <math>O(\log S)</math> steps, where <math>S</math> is the number of segments in the object; this is done by searching on the Offset field. Segment insertions and deletions are also done in a logarithmic number of steps. Notice that segments stored in the set are not necessarily contiguous.
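
The lookup structure can be sketched with standard containers (names are hypothetical, and the first level is shown as an ordered map purely to keep the sketch short; the text above describes it as a hash table, while std::map is typically a red-black tree like the per-object segment set).
<pre>
#include <map>
#include <string>
#include <stdint.h>

struct Segment {
    uint64_t offset;   // absolute location of the segment within the object
    uint64_t length;
    int      refcnt;   // connections currently using this segment
    // ... plus the memory pages and/or disk blocks holding the data ...
};

typedef std::map<uint64_t, Segment>       SegmentSet;   // key = Offset, kept disjoint
typedef std::map<std::string, SegmentSet> LookupTable;  // key = object ID

// Find the cached segment containing byte `pos` of object `oid`, if any: O(log S).
const Segment* find_partial_hit(const LookupTable& table,
                                const std::string& oid, uint64_t pos) {
    LookupTable::const_iterator obj = table.find(oid);
    if (obj == table.end()) return 0;
    const SegmentSet& segs = obj->second;
    SegmentSet::const_iterator it = segs.upper_bound(pos);  // first Offset > pos
    if (it == segs.begin()) return 0;
    --it;                                                   // last Offset <= pos
    const Segment& s = it->second;
    return (pos < s.offset + s.length) ? &s : 0;
}
</pre>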

The second part of the in-memory structures is the page buffers. Page buffers are used to reduce disk I/O operations as well as to perform segment merging. As shown in Fig. 3, we define multiple sizes of page buffers. We pre-allocate these pages in memory to avoid the processing overhead of memory allocation and deallocation. We maintain unoccupied pages of the same size in the same free-page list. If peers request segments that are in the buffers, they are served from memory and no disk I/O operations are issued. If the requested segments are on the disk, they need to be swapped into free memory buffers. When all free buffers are used up, the least popular data in some of the buffers is swapped out to the disk if it has been modified since it was brought into memory; otherwise, it is simply overwritten.
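
A sketch of the pre-allocated page buffers kept in per-size free lists follows (class and size parameters are hypothetical; Fig. 3 defines the actual page sizes used by pCache).
<pre>
#include <cstddef>
#include <list>
#include <vector>

// One pool per page size: pages are allocated once, up front, and recycled
// through a free list to avoid per-request allocation overhead.
class PagePool {
public:
    PagePool(std::size_t page_size, std::size_t count)
        : page_size_(page_size), storage_(page_size * count) {
        for (std::size_t i = 0; i < count; ++i)
            free_pages_.push_back(&storage_[i * page_size]);
    }
    char* acquire() {                      // returns 0 when the caller must evict
        if (free_pages_.empty()) return 0;
        char* p = free_pages_.front();
        free_pages_.pop_front();
        return p;
    }
    void release(char* p) { free_pages_.push_back(p); }
private:
    std::size_t       page_size_;
    std::vector<char> storage_;            // one contiguous pre-allocated arena
    std::list<char*>  free_pages_;         // unoccupied pages of this size
};
</pre>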

Another benefit of having memory pages is anti-interleaving. Since the cache is expected to receive many requests from clients at the same time, segments of different objects will be multiplexed; that is, segments of object <math>x</math> could be interleaved with segments of object <math>y</math>. Therefore, an anti-interleaving scheme is needed before segments are swapped out to the disk. We propose to merge neighboring segments whenever there is no gap between them, which serves as our anti-interleaving scheme. Segment merging reduces the number of entries in the lookup table and accelerates searching for partial hits. In addition, the merging process creates larger segments, which reduces the number of disk read/write operations, because the requested data is read/written in larger chunks. Furthermore, segments are stored on contiguous disk blocks, which reduces the number of head movements and increases disk throughput. Segment merging is implemented in two steps. First, we combine the memory buffers belonging to the two adjacent segments. Then, if the disk blocks of the two old segments are not contiguous, they are returned to the free-block set, and the memory buffer containing the new (larger) segment is marked as modified. Modified buffers are written to the disk when they are chosen to be swapped out of memory. If the disk blocks are contiguous, the buffer is marked as unmodified.
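
The merge test itself is simple; in the sketch below (which simplifies a segment to an offset-to-length mapping, names hypothetical) two adjacent entries are merged whenever the first ends exactly where the second begins. Combining the underlying memory buffers and releasing non-contiguous disk blocks, as described above, is omitted.
<pre>
#include <map>
#include <stdint.h>

typedef std::map<uint64_t, uint64_t> SegmentMap;   // Offset -> length, kept disjoint

// Merge the given segment with its successor if there is no gap between them.
void merge_with_successor(SegmentMap& segs, SegmentMap::iterator it) {
    SegmentMap::iterator next = it;
    ++next;
    if (next == segs.end()) return;
    if (it->first + it->second == next->first) {    // contiguous: no gap
        it->second += next->second;                 // grow the first segment
        segs.erase(next);                           // drop the now-merged entry
    }
}
</pre>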

Next, we describe the organization of disk blocks. We have two types of blocks: super blocks and normal blocks. Super blocks are used for persistent storage of the metadata so that the lookup table can be reconstructed after system reboots. Recall that proxy caches have relaxed data integrity requirements compared to regular workstations, because cached objects can be retrieved from the P2P networks again. Therefore, the metadata needs to be written to the disk only occasionally. Disk blocks are allocated to data segments in a contiguous manner to increase disk throughput. Unoccupied disk blocks are maintained in a free-block set, which is implemented as a red-black tree sorted on the block number. When a segment of data is to be swapped from memory buffers to the disk, a simple first-fit scheme is used to find a contiguous run of disk blocks to store this segment. If no contiguous run of free blocks can satisfy the request, the blocks nearest to the largest run of contiguous free blocks are evicted from the cache to make up the deficit. No expensive disk de-fragmentation process is needed.
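
A sketch of the first-fit search over the free-block set follows (std::set, like the red-black tree described above, keeps block numbers sorted, so consecutive free blocks appear as consecutive iterator values).
<pre>
#include <set>
#include <stdint.h>

// Return the first block of the first run of `need` consecutive free blocks,
// or -1 if no such run exists (the replacement policy must then evict blocks
// adjacent to the largest free run to make up the deficit).
int64_t first_fit(const std::set<uint64_t>& free_blocks, uint64_t need) {
    uint64_t run_start = 0, run_len = 0;
    for (std::set<uint64_t>::const_iterator it = free_blocks.begin();
         it != free_blocks.end(); ++it) {
        if (run_len == 0 || *it != run_start + run_len) {   // run broken: restart
            run_start = *it;
            run_len = 1;
        } else {
            ++run_len;
        }
        if (run_len >= need)
            return static_cast<int64_t>(run_start);
    }
    return -1;
}
</pre>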


P2P Traffic Processor

pCache needs to communicate with peers from different P2P systems. For each supported P2P system, the P2P Traffic Processor provides three modules to enable this communication: Parser, Composer, and Analyzer. The Parser performs functions such as identifying control and payload messages, and extracting messages that are of interest to the cache, such as object request messages. The Composer constructs properly formatted messages to be sent to peers. The Analyzer is a placeholder for any auxiliary functions that may need to be performed on P2P traffic from different systems. For example, in BitTorrent the Analyzer infers information needed by pCache, such as the piece length, that is not included in the messages exchanged between peers.

In more detail, the Parser inspects the byte stream of a connection to determine the boundaries of messages exchanged between peers, and it extracts the request and response messages that are of interest to the cache. The Parser returns the ID of the object being downloaded in the session, as well as the requested byte range (start and end bytes); the byte range is relative to the whole object. The Composer prepares protocol-specific messages, and may combine data stored in the cache with data obtained from the network into one message to be sent to a peer.
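
As a concrete example of what the Parser extracts, a sketch of parsing one BitTorrent request message follows (function and type names are hypothetical). After the handshake, each BitTorrent message consists of a 4-byte big-endian length prefix, a 1-byte type, and a payload; type 6 is a request carrying the piece index, the offset within the piece, and the number of bytes requested, which the Parser converts into an absolute byte range using the piece length inferred by the Analyzer.
<pre>
#include <cstddef>
#include <stdint.h>

struct ByteRange { uint64_t start, end; };   // absolute range within the object

static uint32_t read_u32(const unsigned char* p) {       // big-endian read
    return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
           (uint32_t(p[2]) << 8)  |  uint32_t(p[3]);
}

// Returns false if the buffer does not hold a complete request message.
// `piece_length` comes from the Analyzer; it is not carried in peer messages.
bool parse_bt_request(const unsigned char* buf, std::size_t len,
                      uint64_t piece_length, ByteRange& out) {
    if (len < 17) return false;                           // 4 + 1 + 12 bytes
    if (read_u32(buf) != 13 || buf[4] != 6) return false; // not a request message
    uint32_t index  = read_u32(buf + 5);                  // piece index
    uint32_t begin  = read_u32(buf + 9);                  // offset within the piece
    uint32_t length = read_u32(buf + 13);                 // requested bytes
    out.start = uint64_t(index) * piece_length + begin;
    out.end   = out.start + length - 1;
    return true;
}
</pre>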

Browse and Download Code

  • We are continuously improving our pCache implementation. The latest development branch can be browsed on our Subversion server at https://cs-svn.cs.surrey.sfu.ca/nsl/browser/p2pcache

  • Download pCache Code. Installation and Configuration Steps.


Known Issues and Future Enhancements

  • Revisit the Gnutella traffic parser/composer. In particular, we need to properly handle cancel messages. To achieve this, the cache manager needs to allow partially received segments (i.e., changing the segment length of admitted data messages).
  • Define a stateful connection class and rewrite the connection manager as an event handler. Also use epoll to improve scalability.
  • Adopt a simpler segment matching algorithm: for every incoming request, either request the missing data in its entirety or do not request it at all. The current partial-request code is over-complicated, especially when multiple P2P protocols are considered.
  • Implement the traffic identifier as a Netfilter module and implement reverse TCP splicing.

Feedback and Comments

We welcome all comments and suggestions. You can enter your comments here.

Related Caching Systems and Commercial Products


P2P Traffic Traces

  • If you are interested in the traces, please send us an email along with a brief description of your research and the university/organization you are affiliated with. A brief description of our traces can be found in this readme.txt file.