Modeling and Caching of P2P Traffic
 
Peer-to-peer (P2P) file sharing systems generate a major portion of the Internet traffic, and this portion is expected to increase in the future. The sheer volume and expected high growth of P2P traffic have negative consequences, including: (i) significantly increased load on the Internet backbone, and hence higher chances of congestion; and (ii) increased cost for Internet Service Providers (ISPs), and hence higher service charges for all Internet users.

A potential solution for alleviating these negative impacts is to cache a fraction of the P2P traffic such that future requests for the same objects can be served from a cache in the requester's autonomous system (AS). Caching in the Internet has mainly been considered for web and video streaming traffic, with little attention paid to P2P traffic. Many caching algorithms for web traffic and for video streaming systems have been proposed and analyzed. Directly applying such algorithms to P2P traffic may not yield the best cache performance, because of the different traffic characteristics and caching objectives. For instance, reducing user-perceived access latency is a key objective for web caches; consequently, web caching algorithms often incorporate information about the cost (latency) of a cache miss when deciding which object to cache or evict. Although latency matters to P2P users, the goal of a P2P cache is usually aligned with the ISP's primary concern: the amount of bandwidth consumed by large P2P transfers. Consequently, the byte hit rate, i.e., the ratio of the number of bytes served from the cache to the total number of transferred bytes, is more important than latency.
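
Stated as a formula (with <math>B_{\mathrm{cache}}</math> and <math>B_{\mathrm{total}}</math> used here only as shorthand for the bytes served from the cache and the total bytes transferred, respectively):

<math>\mbox{byte hit rate} = \frac{B_{\mathrm{cache}}}{B_{\mathrm{total}}}</math>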
  
We are developing caching algorithms that capitalize on the P2P traffic characteristics. We are also exploring the potential of cooperative caching of P2P traffic, where multiple caches deployed in different ASes (which could have a peering relationship) or within a large AS (e.g., a Tier-1 ISP) cooperate to serve traffic from each other's clients. Cooperation reduces the load on expensive inter-ISP links. Furthermore, we are implementing all of our algorithms and ideas in a prototype caching system.
  
  
== People ==

* [http://www.cs.sfu.ca/~mhefeeda/ Mohamed Hefeeda]
* [http://www.sfu.ca/~cha16/ Cheng-Hsin Hsu (PhD Student)]
* [http://www.cs.sfu.ca/~kma26/personal/ Kianoosh Mokhtarian (MSc Student)]
* Behrooz Noorizadeh (MSc Student, Graduated Fall 2007)
* Osama Saleh (MSc Student, Graduated Fall 2006)
== Publications ==

* M. Hefeeda, C. Hsu, and K. Mokhtarian, [http://www.cs.sfu.ca/~mhefeeda/Papers/tc11.pdf Design and Evaluation of a Proxy Cache for Peer-to-Peer Traffic], ''IEEE Transactions on Computers'', 60(7), pp. 964--977, July 2011.

* M. Hefeeda and B. Noorizadeh, [http://www.cs.sfu.ca/~mhefeeda/Papers/tpds10_caching.pdf On the Benefits of Cooperative Proxy Caching for Peer-to-Peer Traffic], ''IEEE Transactions on Parallel and Distributed Systems'', 21(7), pp. 998--1010, July 2010.

* M. Hefeeda and O. Saleh, [http://www.cs.sfu.ca/~mhefeeda/Papers/ton08.pdf Traffic Modeling and Proportional Partial Caching for Peer-to-Peer Systems], ''IEEE/ACM Transactions on Networking'', 16(6), pp. 1447--1460, December 2008.

* M. Hefeeda, C. Hsu, and K. Mokhtarian, [http://www.cs.sfu.ca/~mhefeeda/Papers/sigcomm08demo_abstract.pdf pCache: A Proxy Cache for Peer-to-Peer Traffic], ACM SIGCOMM'08 Technical Demonstration, Seattle, WA, August 2008. [http://www.cs.sfu.ca/~mhefeeda/Papers/sigcomm08demo.pdf Poster: pdf] [http://www.cs.sfu.ca/~mhefeeda/Papers/sigcomm08demo.ppt Poster: ppt]

* M. Hefeeda and B. Noorizadeh, [http://www.cs.sfu.ca/~mhefeeda/Papers/lcn08.pdf Cooperative Caching: The Case for P2P Traffic], In Proc. of IEEE Conference on Local Computer Networks (LCN'08), Montreal, Canada, October 2008.

* O. Saleh and M. Hefeeda, [http://www.cs.sfu.ca/~mhefeeda/Papers/icnp06.pdf Modeling and Caching of Peer-to-Peer Traffic], In Proc. of IEEE International Conference on Network Protocols (ICNP'06), pp. 249--258, Santa Barbara, CA, November 2006. (Acceptance rate: 14%)
  
 
== pCache Software ==

[[Image:caching.jpg|frame|right|P2P Caching]]

=== Overview ===
 
 
We have designed and implemented a proxy cache for P2P traffic, which we call pCache. pCache is to be deployed by autonomous systems (ASes) or ISPs that are interested in reducing the burden of P2P traffic. pCache would be deployed at or near the gateway router of an AS. At a high level, a client participating in a particular P2P network issues a request to download an object. This request is intercepted by pCache. If the requested object or parts of it are stored in the cache, they are served to the requesting client. This saves bandwidth on the external (expensive) links to the Internet. If a part of the requested object is not found in the cache, the request is forwarded to the P2P network. When the response comes back, pCache may store a copy of the object for future requests from other clients in its AS. Clients inside the AS as well as external clients are not aware of pCache, i.e., pCache is fully transparent in both directions.

Our C++ implementation of pCache has more than 11,000 lines of code. We have rigorously validated and evaluated the performance of pCache as well as its impact on ISPs and clients. Our experimental results show that pCache benefits both the clients and the ISP in which the cache is deployed, without hurting the performance of the P2P networks. Specifically, clients behind the cache achieve much higher download speeds than other clients running in the same conditions without the cache. In addition, a significant portion of the traffic is served from the cache, which reduces the load on the expensive WAN links of the ISP. Our results also show that the cache does not reduce the connectivity of clients behind it, nor does it reduce their upload speeds. This is important for the whole P2P network, because reduced connectivity could lead to decreased availability of peers and the content stored on them.

The following figure shows the main components of pCache. A brief description of each component is given on the [[pCacheOverview|pCache overview page]]. The detailed design and performance evaluations are presented in our IEEE Transactions on Computers paper.

[[Image:pCache-design.jpg|center|Components]]

=== Detailed Design ===
  
==== Transparent Proxy and P2P Traffic Identifier ====
These two components reside on the gateway router. They transparently inspect traffic going through the router and forward only P2P connections to pCache. Traffic that does not belong to any P2P system is processed by the router in the regular way and is not affected by the presence of pCache. This is done using the [http://www.netfilter.org/ Netfilter] framework for custom packet handling.

[http://www.netfilter.org/ Netfilter] defines hook points at various packet processing stages, such as PREROUTING, LOCAL_INPUT, LOCAL_OUT, FORWARD, and POSTROUTING, and allows us to register callback functions at any of these hook points, to be invoked when packets reach them. Netfilter is commonly used with [http://www.netfilter.org/projects/iptables/index.html iptables], which provides an interface to define rulesets to be applied to packets. Each ruleset has a number of classifiers (fields to be matched) and an action. To support a transparent web proxy, a callback function is registered at the PREROUTING hook point to intercept TCP packets with destination port 80. Once intercepted, the destination IP address and port number of each packet are changed to those of the process running the proxy cache, so HTTP packets are redirected to the web proxy cache for further processing. Although the destination IP address is lost during this redirection, the web proxy cache can still learn the address of the web server, because HTTP 1.1 requests include the server location in the header. This simple redirection, however, does not work for proxy caches for P2P traffic, because the address of the remote peer is not included in the request messages, so the proxy server cannot find the remote peer. Hence, packets need to be redirected to the proxy process without changing their destination IP addresses and port numbers. We note that Netfilter supports very flexible and complex forwarding rulesets, which allows us to run the gateway router and pCache as two processes on the same machine, or to run them on two separate machines.
 
  
We implement our transparent proxy based on the [http://www.balabit.com/support/community/products/tproxy/ tproxy] project. In this implementation, the proxy process creates a listening socket. A callback function is registered at the PREROUTING hook point to intercept packets that might be of interest to the proxy process. This function sets a pointer to the listening socket in the structure containing the packet itself, and it also sets a flag in the packet. The route lookup procedure is modified to check this flag bit: if it is set, the packet is sent to the local IP stack, even though its destination is an external IP address. Using the pointer in the packet structure, the packet is then redirected to the listening socket of the proxy. A new (connected) socket is created between the proxy and the internal host; this socket uses the IP address and port number of the external host, not of the proxy process. Another socket is created between the proxy and the external host; it uses the IP address and port number of the internal host. Two new entries are added to the socket table for these two sockets. Traffic passing through the gateway router is checked at the PREROUTING hook point to see whether it matches either of these sockets.
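
For illustration only, the sketch below shows the user-space side of this kind of interception, using the IP_TRANSPARENT socket option that later mainline kernels expose for TPROXY-style proxies; it is not pCache code, and pCache's patched 2.6.23 kernel achieves the equivalent effect through the tproxy patches described above. Port 7072 matches the --on-port value used in the iptables rules shown later on this page.

 // Illustrative sketch only (not part of pCache): accept a diverted connection
 // and recover the address of the external peer it was originally headed to.
 #include <arpa/inet.h>
 #include <netinet/in.h>
 #include <sys/socket.h>
 #include <unistd.h>
 #include <cstdio>
 
 #ifndef IP_TRANSPARENT
 #define IP_TRANSPARENT 19          // provided by kernels with tproxy support
 #endif
 
 int main() {
     int lsock = socket(AF_INET, SOCK_STREAM, 0);
     int one = 1;
     // Allow accepting connections addressed to non-local IPs (the external
     // peer addresses that the gateway diverts to this process).
     setsockopt(lsock, IPPROTO_IP, IP_TRANSPARENT, &one, sizeof(one));
     setsockopt(lsock, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
 
     sockaddr_in addr{};
     addr.sin_family = AF_INET;
     addr.sin_addr.s_addr = htonl(INADDR_ANY);
     addr.sin_port = htons(7072);   // must match --on-port in the iptables rules
     bind(lsock, (sockaddr*)&addr, sizeof(addr));
     listen(lsock, 128);
 
     sockaddr_in client{}, orig_dst{};
     socklen_t len = sizeof(client);
     int csock = accept(lsock, (sockaddr*)&client, &len);  // internal peer
     len = sizeof(orig_dst);
     getsockname(csock, (sockaddr*)&orig_dst, &len);       // original external peer
 
     char from[INET_ADDRSTRLEN], to[INET_ADDRSTRLEN];
     std::printf("intercepted %s -> %s:%d\n",
                 inet_ntop(AF_INET, &client.sin_addr, from, sizeof(from)),
                 inet_ntop(AF_INET, &orig_dst.sin_addr, to, sizeof(to)),
                 ntohs(orig_dst.sin_port));
     close(csock);
     close(lsock);
     return 0;
 }

The outgoing connection to the external peer would similarly bind a transparent socket to the internal client's address before connecting, so that neither endpoint ever sees the proxy's own address.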
 
  
The P2P Traffic Identifier determines whether a connection belongs to any P2P system known to the cache. This is done by comparing a number of bytes from the connection stream against known P2P application signatures. We have implemented identifiers for BitTorrent and Gnutella, which are currently the most common P2P systems. Traffic identification for other protocols can readily be supported by adding new identifiers to the cache.
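
As a concrete, illustrative example of such signature matching (not the actual pCache identifier): a BitTorrent connection opens with the 20-byte handshake prefix consisting of the byte 19 followed by the string "BitTorrent protocol", while a Gnutella connection opens with an ASCII "GNUTELLA CONNECT/" line. A minimal matcher could look like this:

 // Minimal sketch of signature-based identification (illustrative only).
 #include <cstddef>
 #include <cstring>
 
 enum class P2PProtocol { Unknown, BitTorrent, Gnutella };
 
 P2PProtocol identify(const char* buf, std::size_t len) {
     static const char bt[] = "\x13" "BitTorrent protocol";      // 20 bytes
     if (len >= 20 && std::memcmp(buf, bt, 20) == 0)
         return P2PProtocol::BitTorrent;
     if (len >= 17 && std::memcmp(buf, "GNUTELLA CONNECT/", 17) == 0)
         return P2PProtocol::Gnutella;
     return P2PProtocol::Unknown;   // not identified yet: keep buffering, or splice
 }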
 
  
Since P2P systems use dynamic ports, the proxy process may initially intercept some connections that do not belong to P2P systems. This can only be discovered after inspecting a few packets using the P2P Traffic Identification module. Each intercepted connection is split into a pair of connections, and all packets have to go through the proxy process. This imposes overhead on the proxy cache and may increase the end-to-end delay of the connections. To reduce this overhead, we splice each pair of non-P2P connections using TCP splicing techniques, which are usually used in layer-7 switching. We modify our [http://www.netfilter.org/ Netfilter] callback function to support TCP splicing as well. Our implementation is similar to an inactive layer-7 switching project called [http://www.linux-l7sw.org/ l7switch]. For spliced connections, the sockets in the proxy process are closed and packets are relayed in the kernel stack instead of being passed up to the proxy process in the application layer. Implementation details such as adjusting sequence numbers in the spliced TCP connections had to be addressed, because the two TCP connections start from different initial sequence numbers.
 
  
  
==== Connection Manager ====
 
When a connection is identified as belonging to a P2P system, it is passed to the Connection Manager, which coordinates the different components of pCache to store and serve requests from this connection. For example, upon seeing a request message, the Connection Manager calls a lookup function in the Storage System Manager to determine whether the request can be fulfilled with previously cached data, either in memory or on disk. In addition, if only parts of the requested data are available in the cache, the Connection Manager sends a message to the actual external peer to request the missing portion of the data. It then assembles this data with the cached data in a protocol-specific message and sends it to the client.
 
  
Since each peer may open many connections to request a single file, and pCache is supposed to serve a large number of peers, efficient support for concurrent connections is important. A simple solution for concurrency is multithreading, where a thread is created to handle each new connection. This is simple because the states of connections are isolated from each other and processed by identical threads. The downside of this solution is the increased overhead of creating/deleting, scheduling, and context switching threads. Some of these overheads can be significantly reduced using user-level thread libraries such as Capriccio, which is reported to scale to hundreds of thousands of threads. Our current implementation of pCache uses multithreading.
 
  
More sophisticated solutions for efficient concurrency that employ non-blocking (asynchronous) I/O operations can also be used with pCache. For example, the single-process event-driven model uses only one thread to detect events on multiple sockets using non-blocking socket operations such as epoll and select; these events are then scheduled for an event handler to process. Unlike socket operations, asynchronous disk operations are either poorly supported or nonexistent on most UNIX systems. To mitigate this problem, multi-process event-driven models have been proposed, which can roughly be categorized into asymmetric and symmetric models. Asymmetric models create a single process to handle events from the network sockets and multiple processes to handle disk operations. In contrast, symmetric models create multiple event schedulers that can handle both disk and network events. The above concurrency models have been proposed and used mostly for web servers. Unlike pCache, web servers may not need to provide full transparency and connection splicing, which could impact these concurrency models. We are currently designing and implementing new concurrency models that are more suitable for P2P proxy caching systems.
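
For reference, the skeleton below sketches what a single-process event-driven dispatcher based on epoll might look like; it is an assumed design sketch, not the multithreaded code pCache currently uses.

 // Illustrative single-process event loop (assumed design, not pCache code).
 #include <sys/epoll.h>
 #include <sys/socket.h>
 #include <unistd.h>
 
 void event_loop(int listen_fd) {
     int ep = epoll_create1(0);
     epoll_event ev{};
     ev.events = EPOLLIN;
     ev.data.fd = listen_fd;
     epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev);
 
     epoll_event events[64];
     for (;;) {
         int n = epoll_wait(ep, events, 64, -1);
         for (int i = 0; i < n; ++i) {
             int fd = events[i].data.fd;
             if (fd == listen_fd) {
                 // new intercepted connection: register it for read events
                 int c = accept(listen_fd, nullptr, nullptr);
                 epoll_event cev{};
                 cev.events = EPOLLIN;
                 cev.data.fd = c;
                 epoll_ctl(ep, EPOLL_CTL_ADD, c, &cev);
             } else {
                 // readable connection: advance its per-connection state machine
                 // (identify protocol, parse requests, issue disk/network I/O, ...)
             }
         }
     }
 }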
 
  
  
==== Storage System Manager ====
 
  
We propose a new storage management system optimized for P2P traffic. The proposed storage system contains three modules: in-memory structures, the block allocation method, and the replacement policy. The in-memory structures contain metadata to support storing and serving byte ranges of objects, and memory buffers to reduce disk I/O operations. The block allocation method organizes the layout of data on the disk. The replacement policy decides which segments of objects to evict from the cache in order to make room for a newly requested segment.
 
  
Two structures are maintained in memory: metadata and page buffers. The metadata is a two-level lookup table designed to enable efficient segment lookups. The first level is a hash table keyed on object IDs; collisions are resolved using common chaining techniques. Every entry points to the second level of the table, which is the set of cached segments belonging to the same object. Every segment entry consists of a few fields, including Offset, the absolute location of the segment within the object, and RefCnt, the number of connections currently using this segment. RefCnt is used to prevent evicting a buffer page while connections are still using it. The set of cached segments is implemented as a balanced (red-black) binary tree sorted on the Offset field. Segments inserted into the cached segment set are adjusted to be mutually disjoint, which ensures that the same data is never stored more than once in the cache. Using this structure, partial hits can be found in at most <math>O(\log S)</math> steps, where <math>S</math> is the number of segments in the object, by searching on the Offset field. Segment insertions and deletions also take a logarithmic number of steps. Notice that the segments stored in the set are not necessarily contiguous.
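
The sketch below illustrates this two-level structure and a partial-hit lookup using standard containers (an std::map is a red-black tree); the field and type names are our own assumptions, not the actual pCache definitions.

 // Illustrative two-level lookup table (field/type names are assumptions).
 #include <cstdint>
 #include <map>
 #include <string>
 #include <unordered_map>
 
 struct Segment {
     uint64_t offset;    // absolute position within the object
     uint64_t length;    // number of cached bytes
     int      ref_cnt;   // connections currently using this segment
 };
 
 // First level: hash table keyed on object ID.
 // Second level: disjoint segments of one object, sorted by offset.
 using SegmentSet  = std::map<uint64_t, Segment>;            // key = offset
 using LookupTable = std::unordered_map<std::string, SegmentSet>;
 
 // Return the cached segment covering `offset` (a partial hit), or nullptr.
 const Segment* lookup(const LookupTable& table,
                       const std::string& object_id, uint64_t offset) {
     auto obj = table.find(object_id);
     if (obj == table.end()) return nullptr;
     const SegmentSet& segs = obj->second;
     auto it = segs.upper_bound(offset);     // first segment starting after offset
     if (it == segs.begin()) return nullptr;
     --it;                                   // segment starting at or before offset
     const Segment& s = it->second;
     return offset < s.offset + s.length ? &s : nullptr;
 }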
 
  
The second part of the in-memory structures is the page buffers. Page buffers are used to reduce disk I/O operations and to perform segment merging. We define multiple sizes of page buffers and pre-allocate these pages in memory to avoid the processing overhead of repeated memory allocation and deallocation. Unoccupied pages of the same size are kept in the same free-page list. If peers request segments that are in the buffers, they are served from memory and no disk I/O operations are issued. If the requested segments are on the disk, they need to be swapped into some free memory buffers. When all free buffers are used up, the least popular data in some of the buffers is swapped out to the disk if it has been modified since it was brought into memory, and is overwritten otherwise.

Another benefit of the memory pages is anti-interleaving. The cache is expected to receive many requests from different clients at the same time, so segments of different objects will be multiplexed; that is, segments of object <math>x</math> could be interleaved with segments of object <math>y</math>. Therefore, an anti-interleaving step is needed before segments are swapped out to the disk. We merge neighboring segments together whenever there is no gap between them, which serves as our anti-interleaving scheme. Segment merging reduces the number of entries in the lookup table and accelerates searching for partial hits. In addition, the merging process creates larger segments, which reduces the number of disk read/write operations, because the requested data is read/written in larger chunks. Furthermore, segments are stored on contiguous disk blocks, which reduces the number of head movements and increases disk throughput. Segment merging is implemented in two steps. First, we combine the memory buffers belonging to the two adjacent segments. Then, if the disk blocks of the two old segments are not contiguous, they are returned to the free-block set, and the memory buffer containing the new (larger) segment is marked as modified; modified buffers are written to the disk when they are chosen to be swapped out of memory. If the disk blocks are contiguous, the buffer is marked as unmodified.
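
A minimal sketch of the merge step over such an offset-sorted segment set is shown below (illustrative only; the real code also has to combine the underlying memory buffers and disk blocks as described above):

 // Illustrative anti-interleaving merge: combine a segment with its neighbors
 // whenever they are back to back (no gap). Buffer/disk handling is omitted.
 #include <cstdint>
 #include <iterator>
 #include <map>
 
 struct Segment { uint64_t offset; uint64_t length; };
 using SegmentSet = std::map<uint64_t, Segment>;   // keyed and sorted by offset
 
 void merge_neighbors(SegmentSet& segs, SegmentSet::iterator it) {
     if (it != segs.begin()) {                     // try the predecessor first
         auto prev = std::prev(it);
         if (prev->second.offset + prev->second.length == it->second.offset) {
             prev->second.length += it->second.length;
             segs.erase(it);
             it = prev;
         }
     }
     auto next = std::next(it);                    // then try the successor
     if (next != segs.end() &&
         it->second.offset + it->second.length == next->second.offset) {
         it->second.length += next->second.length;
         segs.erase(next);
     }
 }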
 
 
 
Next, we describe the organization of disk blocks. We have two types of blocks: super blocks and normal blocks. Super blocks are used for persistent storage of the metadata, so that the lookup table can be reconstructed after system reboots. Recall that proxy caches have relaxed data integrity requirements compared to regular workstations, because cached objects can be retrieved from the P2P networks again; therefore, the metadata only needs to be written to the disk occasionally. Disk blocks are allocated to data segments in a contiguous manner to increase disk throughput. Unoccupied disk blocks are maintained in a free-block set, which is implemented as a red-black tree sorted on the block number. When a segment of data is to be swapped from memory buffers to the disk, a simple first-fit scheme is used to find a contiguous run of disk blocks to store this segment. If no contiguous run can satisfy the request, blocks nearest to the largest run of contiguous free blocks are evicted from the cache to make up the deficit. No expensive disk de-fragmentation process is needed.
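
The following sketch shows first-fit allocation of a contiguous run over a free-block set sorted by block number (illustrative; the eviction step on failure is left out):

 // Illustrative first-fit contiguous allocation over the free-block set.
 #include <cstdint>
 #include <set>
 #include <vector>
 
 // Returns `count` consecutive block numbers removed from `free_blocks`,
 // or an empty vector if no such run exists (the cache would then evict).
 std::vector<uint64_t> allocate_contiguous(std::set<uint64_t>& free_blocks,
                                           uint64_t count) {
     uint64_t run_start = 0, run_len = 0, prev = 0;
     for (uint64_t b : free_blocks) {
         if (run_len == 0 || b != prev + 1) { run_start = b; run_len = 1; }
         else                               { ++run_len; }
         prev = b;
         if (run_len == count) {
             std::vector<uint64_t> out;
             for (uint64_t i = 0; i < count; ++i) out.push_back(run_start + i);
             for (uint64_t blk : out) free_blocks.erase(blk);
             return out;
         }
     }
     return {};
 }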
 
  
 
==== P2P Traffic Processor ====
 
pCache needs to communicate with peers from different P2P systems, and to store and serve P2P traffic it must perform several functions beyond identifying the traffic. These functions are provided by the P2P Traffic Processor, which contains three modules for each supported P2P system: the Parser, the Composer, and the Analyzer.

By inspecting the byte stream of a connection, the Parser determines the boundaries of the messages exchanged between peers and extracts the messages that are of interest to the cache, such as object request messages. For a request, the Parser returns the ID of the object being downloaded in the session as well as the requested byte range (start and end bytes); the byte range is relative to the whole object. The Composer constructs properly formatted, protocol-specific messages to be sent to peers, and it may combine data stored in the cache with data obtained from the network into one message. The Analyzer is a placeholder for any auxiliary functions that may need to be performed on P2P traffic from different systems. For example, in BitTorrent the Analyzer infers information (the piece length) that is needed by pCache but is not included in the messages exchanged between peers.
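
As an illustration of the Parser's job (not the actual pCache code): a BitTorrent request message consists of a 4-byte length prefix equal to 13, a message id of 6, and three 4-byte big-endian fields (index, begin, length); mapping it to an absolute byte range additionally needs the piece length that the Analyzer infers:

 // Illustrative mapping of a BitTorrent "request" message to an absolute
 // byte range; not the pCache Parser itself.
 #include <cstdint>
 
 struct ByteRange { uint64_t start; uint64_t end; };   // [start, end)
 
 static uint32_t be32(const unsigned char* p) {
     return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
            (uint32_t(p[2]) << 8)  |  uint32_t(p[3]);
 }
 
 // `msg` points at the 4-byte length prefix of one complete message;
 // `piece_length` is the value the Analyzer infers for this torrent.
 bool parse_bt_request(const unsigned char* msg, uint64_t piece_length,
                       ByteRange& out) {
     const uint32_t len = be32(msg);
     if (len != 13 || msg[4] != 6) return false;        // not a request message
     const uint32_t index  = be32(msg + 5);
     const uint32_t begin  = be32(msg + 9);
     const uint32_t length = be32(msg + 13);
     out.start = uint64_t(index) * piece_length + begin;
     out.end   = out.start + length;
     return true;
 }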
 
 
 
=== Browse and Download Code ===

This project has been concluded and no future releases are scheduled. [https://cs-nsl-svn.cs.surrey.sfu.ca/cssvn/nsl-projects/P2P/p2pCache/code The SVN repository is here] (for local access only).

pCache code is released under the [http://www.gnu.org/licenses/gpl-3.0.txt GPLv3].

pCache has two parts: the Patched Linux Kernel and the pCache Application. The Patched Linux Kernel bundles all patches required to support the transparent proxy, which simplifies setting up the required environment. This patched kernel contains code from the [http://www.kernel.org/ mainstream Linux kernel], [http://www.netfilter.org/ netfilter], [http://www.balabit.com/support/community/products/tproxy/ tproxy], and [http://www.linux-l7sw.org/ layer7switch]. The pCache Application implements the components described above. A patched iptables is also provided that accepts the additional arguments supported by [http://www.balabit.com/support/community/products/tproxy/ tproxy].

* Patched Linux Kernel [[media:linux-2.6.23.tgz]]
* pCache Application Source Code [[media:pCache-0.0.1.tgz]]
* iptables (patched for additional arguments) [[media:iptables-1.4.0rc1.tgz]]

To set up your pCache system, follow these steps:

* Download the Patched Linux Kernel, then compile and install it. Note that this tar file also includes a sample .config file.
* Download the patched iptables, then compile and install it. (See the INSTALL file included in the tar file for installation instructions.)
* Download the pCache Application source code and compile it into a binary called pCache.
 
  
 
To run pCache, first configure the forwarding rules on the gateway. For example, we use the following script to configure our forwarding table:

 iptables -t mangle -N DIVERT
 # bypass low ports
 iptables -t mangle -A PREROUTING -p tcp --sport 1:1024 -j ACCEPT
 iptables -t mangle -A PREROUTING -p tcp --dport 1:1024 -j ACCEPT
 iptables -t mangle -A PREROUTING -p tcp -m socket -j DIVERT
 iptables -t mangle -A PREROUTING -p tcp -j TPROXY --tproxy-mark 0x1/0x1 --on-port 7072
 iptables -t mangle -A DIVERT -j MARK --set-mark 1
 iptables -t mangle -A DIVERT -j ACCEPT
 ip rule add fwmark 1 lookup 100
 ip route add local 0.0.0.0/0 dev lo table 100
 # disable TCP features that are not supported by the TCP splicing module
 sysctl -w net.ipv4.tcp_sack=0
 sysctl -w net.ipv4.tcp_dsack=0
 sysctl -w net.ipv4.tcp_window_scaling=0
 #sysctl -w net.ipv4.tcp_tw_recycle=1
 #echo 8 > /proc/sys/kernel/printk
 
Then, configure the conf.txt file under the pCache directory. See the comments in conf.txt for the purpose of each setting. The most important settings are briefly described below, followed by an illustrative sample configuration. Note that many settings are for experimental use and are not useful in an actual deployment.
# BLOCK_SIZE and BLOCK_NUM determine the layout of the hard disk as well as its capacity. The resulting disk size must never exceed the actual disk size.
# ACCESOR_TYPE selects the disk storage scheme. The following schemes are supported: flat directory (1), single file on a file system (4), and single file on a raw disk (5). Other types are experimental only.
# ROOT_DIR or DEV_NAME defines the location of the cache storage. DEV_NAME is used for the raw-disk scheme, and ROOT_DIR for all others. Examples of DEV_NAME include /dev/hda2 and /dev/sda1; an example of ROOT_DIR is /mnt/pCache, which needs to be mounted first.
# SUBNET and NETMASK define the local subnet. pCache only inspects outgoing requests; incoming requests are always forwarded.
# MAX_FILLED_SIZE and MIN_FILLED_SIZE determine when the cache replacement routine is invoked and when it stops.
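
For concreteness, a configuration for roughly a 100 GB cache stored as a single file on a raw partition might look like the sketch below. The key names come from the list above, but the value syntax, units, and numbers are only assumptions for illustration; the comments in the distributed conf.txt are authoritative.

 # illustrative values only -- consult the comments in conf.txt
 BLOCK_SIZE       65536           # bytes per block
 BLOCK_NUM        1600000         # 65536 * 1600000 ~ 100 GB, below the real disk size
 ACCESOR_TYPE     5               # single file on raw disk
 DEV_NAME         /dev/sda1       # raw partition dedicated to the cache
 SUBNET           192.168.1.0     # local clients whose outgoing requests are cached
 NETMASK          255.255.255.0
 MAX_FILLED_SIZE  95              # start cache replacement at this fill level ...
 MIN_FILLED_SIZE  85              # ... and stop once usage drops back to this level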
 
  
After setting up the conf.txt file, run pCache from the command line. A log.txt file will be generated with detailed debugging messages. pCache also provides a Web-based monitoring interface, which can be accessed by pointing any Web browser at ''http://<your-ip>:8000''. The interface looks like this:

[[Image:pCache_Web.jpg|center|border|576px]]

pCache also includes an applet that reports real-time events; it can be launched by clicking the ''Details'' button and looks like this:

[[Image:pCache_Applet.jpg|center|border|402px]]
 
  
 
=== Future Enhancements ===
  
The following is a list of possible improvements to pCache, roughly ordered by priority. For each item we also give a person-day estimate, assuming a good graduate student working 8 hours a day; the estimates include unit tests.

# Nonvolatile storage system: to keep cached data across reboots, the in-memory metadata must be written to the super blocks (in the pCache file system). This requires save(...)/restore(...) functions for a few in-memory hash tables. Estimate: 4~5 person-days.
# Proxy performance: integrate our code with the latest tproxy for better performance (the TCP splicing part can be ignored for now). After integration, quantify the performance of tproxy by emulating a large number of P2P clients in two private subnets; if possible, identify the bottlenecks in tproxy, improve it, and contribute the code back to the community. This could be a small side research project. Estimate: 1 person-day for the integration, plus 5~10 person-days to design and implement the emulation and write up the comparison and bottleneck analysis.
# Event-driven connection manager: define a stateful connection class, rewrite the connection manager as an event handler, and use epoll (for the network) and aio (for the disk) to improve scalability. Finally, run a test similar to the tproxy test above. Estimate: 4 person-days for the design, 8~10 person-days for the implementation, and 3 person-days for the evaluation, assuming we have gained experience from evaluating tproxy.
# Simpler segment matching: for every incoming request, either request the data in its entirety or do not request it at all; the current partial-request code is over-complicated, especially when multiple P2P protocols are considered. Estimate: 1 person-day, but this may depend on (overlap with) the event-driven connection manager.
# Improved compatibility: identify the unsupported BitTorrent/Gnutella clients, locate the root causes (which message types cause the problems), and fix them. No time estimate is available yet.
# Better logging system: we currently use a home-made logging system, but in an inconsistent way; some modules log through stderr rather than the logging system. If time permits, we may switch to an open-source logging library such as log4c. Estimate: 5~7 person-days, given the many logging statements in the system.
 
  
 
=== Feedback and Comments ===
  
 
We welcome all comments and suggestions. You can enter your comments [http://www.sfu.ca/~cha16/feedback.html here].
  
 
=== Related Caching Systems and Commercial Products ===
  
* [http://www.oversi.com/index.php?option=com_content&task=view&id=38&Itemid=114 OverCache P2P Caching and Delivery Platform]: Oversi's MSP platform provides multi-service caching for P2P and other applications. For P2P caching, MSP takes a quite different approach than pCache: an MSP device actively participates in P2P networks, acting as an ultra-peer that serves only peers within the deploying ISP. We believe this approach negatively impacts fairness in P2P networks such as BitTorrent, which employ algorithms to counter the free-rider problem. In fact, peers in ISPs where Oversi's MSP is deployed have little reason to upload anymore, because they expect to get the data for free from the MSP platform. As the number of free-riders increases, the P2P network performance degrades, which in turn affects P2P users all over the world.

* [http://www.peerapp.com/products-ultraband.aspx PeerApp UltraBand Family]: Unlike OverCache, PeerApp's products support transparent caching of P2P traffic. The supported P2P protocols are BitTorrent, Gnutella, eDonkey, and FastTrack (the last two are no longer popular). Like OverCache, however, PeerApp's products do not support cross-protocol caching: a file cached through a BitTorrent download will not be served to a Gnutella user requesting the same file (or vice versa). We have already provided basic support for cross-protocol caching in our prototype, and this feature will be fully added to its next version.

In addition, our prototype software is open source, while the above products are commercial and very expensive.

=== On-going Research Problems ===

We list the current research problems and draft solutions in [[Private:pCache_progress|a separate document (login required)]].
  
  
 
== P2P Traffic Traces ==
  
* A brief description of our traces can be found in this [http://nsl.cs.surrey.sfu.ca/projects/p2p/traces_readme.txt readme.txt] file. The trace files can be downloaded [http://nsl.cs.sfu.ca/traces/p2p/anonymized/ here].
