<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-CA">
	<id>https://nmsl.cs.sfu.ca/index.php?action=history&amp;feed=atom&amp;title=PCacheOverview</id>
	<title>PCacheOverview - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://nmsl.cs.sfu.ca/index.php?action=history&amp;feed=atom&amp;title=PCacheOverview"/>
	<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=PCacheOverview&amp;action=history"/>
	<updated>2026-05-02T11:50:51Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.35.1</generator>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=PCacheOverview&amp;diff=6636&amp;oldid=prev</id>
		<title>Omossad: Created page with &quot;__NOTOC__   Overview of the main components of pCache; more details can be found in [http://www.cs.sfu.ca/~mhefeeda/Papers/tc11.pdf  our paper]. Image:pCache-design.jpg||fra...&quot;</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=PCacheOverview&amp;diff=6636&amp;oldid=prev"/>
		<updated>2021-08-09T06:44:15Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;__NOTOC__   Overview of the main components of pCache; more details can be found in [http://www.cs.sfu.ca/~mhefeeda/Papers/tc11.pdf  our paper]. Image:pCache-design.jpg||fra...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;__NOTOC__ &lt;br /&gt;
&lt;br /&gt;
Overview of the main components of pCache; more details can be found in [http://www.cs.sfu.ca/~mhefeeda/Papers/tc11.pdf  our paper].&lt;br /&gt;
[[Image:pCache-design.jpg||frame|right|pCache Architecture]]&lt;br /&gt;
&lt;br /&gt;
== Transparent Proxy and P2P Traffic Identifier ==&lt;br /&gt;
&lt;br /&gt;
These two components reside on the gateway router. They transparently inspect&lt;br /&gt;
traffic going through the router and forward only P2P connections to pCache.&lt;br /&gt;
Traffic that does not belong to any P2P system is processed by the router in&lt;br /&gt;
the regular way and is not affected by the presence of pCache. This is done&lt;br /&gt;
using the [http://www.netfilter.org/ Netfilter] framework for custom packet handling. &lt;br /&gt;
&lt;br /&gt;
Netfilter defines hook points at various packet processing stages, such as&lt;br /&gt;
PREROUTING, LOCAL_IN, LOCAL_OUT, FORWARD, and POSTROUTING. It allows&lt;br /&gt;
us to register callback functions to be invoked when packets reach any of&lt;br /&gt;
these hook points. Netfilter is commonly used with [http://www.netfilter.org/projects/iptables/index.html iptables],&lt;br /&gt;
which provides an interface to define rulesets to be applied on packets. Each&lt;br /&gt;
rule has a number of classifiers (fields to be matched) and an action. To&lt;br /&gt;
support transparent web proxy, a callback function is registered at the&lt;br /&gt;
PREROUTING hook point to intercept packets with the destination port number set&lt;br /&gt;
to 80 on TCP. Once intercepted, the destination IP address and port number of&lt;br /&gt;
each packet will be changed to those of the process running the proxy cache.&lt;br /&gt;
Thus, HTTP packets will be redirected to the web proxy cache for further&lt;br /&gt;
processing. Although the destination IP address is lost during this&lt;br /&gt;
redirection, the web proxy cache can still learn the address of the web server&lt;br /&gt;
because HTTP/1.1 requests carry it in the Host header. This simple&lt;br /&gt;
redirection, however, does not work for proxy caches for P2P traffic, because&lt;br /&gt;
the address of the remote peer is not included in the request messages, and the&lt;br /&gt;
proxy server cannot find the remote peer. Hence, packets need to be redirected&lt;br /&gt;
to the proxy process without changing their destination IP and port numbers.&lt;br /&gt;
We note that Netfilter supports very flexible forwarding rulesets. This&lt;br /&gt;
allows us to run the gateway router and pCache as two processes on the same&lt;br /&gt;
machine, or to run them on two separate machines.&lt;br /&gt;
&lt;br /&gt;
We implement our transparent proxy based on the [http://www.balabit.com/support/community/products/tproxy/ tproxy] project. In our&lt;br /&gt;
implementation, the proxy process creates a listening socket. A callback&lt;br /&gt;
function is registered at the PREROUTING hook point to intercept packets that&lt;br /&gt;
might be of interest to the proxy process. This function sets a pointer to the&lt;br /&gt;
listening socket in the structure containing the packet itself. It also sets a&lt;br /&gt;
flag in the packet. The route lookup procedure is modified to check the flag&lt;br /&gt;
bit. If it is set, the packet is sent to the local IP stack, even though its&lt;br /&gt;
destination is an external IP address. Using the pointer in the packet&lt;br /&gt;
structure, the packet is then redirected to the listening socket of the proxy.&lt;br /&gt;
A new (connected) socket is created between the proxy and the internal host.&lt;br /&gt;
This new socket uses the IP address and port number of the external host, not&lt;br /&gt;
of the proxy process. Another socket is created between the proxy and the&lt;br /&gt;
external host; it uses the IP address and port number of the internal host. Two&lt;br /&gt;
new entries are added to the socket table for these two sockets. Traffic&lt;br /&gt;
packets passing through the gateway router are checked at the PREROUTING hook&lt;br /&gt;
point to see whether they match any of these sockets. &lt;br /&gt;
&lt;br /&gt;
The P2P Traffic Identifier determines whether a connection belongs to any P2P&lt;br /&gt;
system known to the cache. This is done by comparing a number of bytes from the&lt;br /&gt;
connection stream against known P2P application signatures. We have implemented&lt;br /&gt;
identifiers for BitTorrent and Gnutella, which are the most common P2P systems&lt;br /&gt;
nowadays. We can readily support traffic identification for other protocols by&lt;br /&gt;
adding new identifiers to the cache. &lt;br /&gt;
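&lt;br /&gt;
As a sketch, signature matching against the first bytes of a connection stream can look like the following. The two byte-string prefixes are the actual BitTorrent and Gnutella handshake signatures; the function and its name are ours:&lt;br /&gt;

```python
# Illustrative sketch of signature-based P2P traffic identification.
# The prefixes are real protocol signatures; the structure is ours.

SIGNATURES = {
    # BitTorrent handshake: one length byte (19) followed by the
    # protocol string "BitTorrent protocol".
    "bittorrent": b"\x13BitTorrent protocol",
    # Gnutella connection request line.
    "gnutella": b"GNUTELLA CONNECT/",
}

def identify_p2p(first_bytes):
    """Return the protocol name if the first bytes of a connection
    stream match a known P2P signature, else None."""
    for proto, sig in SIGNATURES.items():
        if first_bytes.startswith(sig):
            return proto
    return None
```

Supporting a new protocol amounts to adding one more entry to the signature table, mirroring how new identifiers are added to the cache.&lt;br /&gt;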
&lt;br /&gt;
Since P2P systems use dynamic ports, the proxy process may initially intercept&lt;br /&gt;
some connections that do not belong to P2P systems. This can only be discovered&lt;br /&gt;
after inspecting a few packets using the P2P Traffic Identification module.&lt;br /&gt;
Each intercepted connection is split into a pair of connections, and all&lt;br /&gt;
packets have to go through the proxy process. This imposes overhead on the&lt;br /&gt;
proxy cache and may increase the end-to-end delay of the connections. To reduce&lt;br /&gt;
this overhead, we splice each pair of non-P2P connections using TCP splicing&lt;br /&gt;
techniques, which are usually used in layer-7 switching. We modify our&lt;br /&gt;
[http://www.netfilter.org/ Netfilter] callback function to support TCP splicing as well. Our implementation&lt;br /&gt;
is similar to an inactive layer-7 switching project, called [http://www.linux-l7sw.org/ l7switch]. For&lt;br /&gt;
spliced connections, the sockets in the proxy process are closed and packets&lt;br /&gt;
are relayed in the kernel stack instead of passing them up to the proxy process&lt;br /&gt;
in the application layer. Implementation details such as adjusting sequence&lt;br /&gt;
numbers in the spliced TCP connections had to be addressed, because these two&lt;br /&gt;
TCP connections start from different initial sequence numbers.&lt;br /&gt;
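&lt;br /&gt;
The sequence-number adjustment can be sketched as modular arithmetic over the 32-bit sequence space. Real splicing rewrites packet headers in the kernel; the helper names here are ours:&lt;br /&gt;

```python
# Hypothetical sketch of the fix-up applied when two TCP connections
# with different initial sequence numbers (ISNs) are spliced.

def seq_delta(isn_a, isn_b):
    """Constant offset mapping connection A's sequence space onto
    connection B's, with 32-bit wraparound."""
    return (isn_b - isn_a) % 2**32

def translate_seq(seq, delta):
    """Rewrite one sequence number before relaying the packet."""
    return (seq + delta) % 2**32
```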
&lt;br /&gt;
== Connection Manager ==&lt;br /&gt;
&lt;br /&gt;
When a connection is identified as belonging to a P2P system, it is passed to&lt;br /&gt;
the Connection Manager, which coordinates different components of pCache to&lt;br /&gt;
store and serve requests from this connection. For example, upon seeing a&lt;br /&gt;
request message, the Connection Manager calls a lookup function in the Storage&lt;br /&gt;
System Manager to determine whether this request can be fulfilled with&lt;br /&gt;
previously cached data either in memory or on disk. In addition, if only parts&lt;br /&gt;
of the requested data are available in the cache, the Connection Manager sends&lt;br /&gt;
a message to the actual external peer to request the missing portion of data.&lt;br /&gt;
It then assembles this data with cached data in a protocol-specific message and&lt;br /&gt;
sends it to the client. &lt;br /&gt;
&lt;br /&gt;
Since each peer may open many connections to request a single file and pCache&lt;br /&gt;
is supposed to serve a large number of peers, efficient support of concurrent&lt;br /&gt;
connections is important. A simple solution for concurrency is to use&lt;br /&gt;
multithreading, where a thread is created to handle each new connection. This&lt;br /&gt;
is simple because the states of connections are isolated from each other and&lt;br /&gt;
processed by identical threads.  The downside of this solution is increased&lt;br /&gt;
overhead in terms of creation/deletion, scheduling, and context switching of&lt;br /&gt;
threads. Some of these overheads can be significantly reduced using user-level&lt;br /&gt;
thread libraries such as Capriccio, which is reported to scale to hundreds of&lt;br /&gt;
thousands of threads. Our current implementation of pCache uses multithreading.&lt;br /&gt;
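&lt;br /&gt;
A minimal thread-per-connection sketch; the names and the echo body are ours, standing in for the cache's actual request handling:&lt;br /&gt;

```python
# Thread-per-connection concurrency: one handler thread per socket,
# so each connection's state is isolated in its own thread.
import socket
import threading

def handle(conn, addr):
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)   # placeholder for parsing and serving

def serve(listener):
    """Spawn one handler thread per accepted connection."""
    while True:
        conn, addr = listener.accept()
        threading.Thread(target=handle, args=(conn, addr), daemon=True).start()
```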
&lt;br /&gt;
More sophisticated solutions to support efficient concurrency that employ&lt;br /&gt;
non-blocking (asynchronous) I/O operations can also be used with pCache. For&lt;br /&gt;
example, the single-process event-driven model uses only one thread to detect&lt;br /&gt;
events on multiple sockets using readiness notification mechanisms such as epoll&lt;br /&gt;
and select. These events are then scheduled for an event handler to process&lt;br /&gt;
them. Unlike socket operations, asynchronous disk operations are either poorly&lt;br /&gt;
supported or nonexistent on most UNIX systems. To mitigate this problem,&lt;br /&gt;
multi-process event-driven models have been proposed, which can be roughly&lt;br /&gt;
categorized into asymmetric and symmetric models. The asymmetric models create&lt;br /&gt;
a single process to handle events from the network sockets, and multiple&lt;br /&gt;
processes to handle disk operations. In contrast, the symmetric models create&lt;br /&gt;
multiple event schedulers that can handle both disk and network events. The&lt;br /&gt;
above concurrency models have been proposed and used mostly for web servers.&lt;br /&gt;
Unlike our pCache, web servers may not need to provide full transparency and&lt;br /&gt;
connection splicing, which could impact these concurrency models. We are&lt;br /&gt;
currently designing and implementing new concurrency models that are more&lt;br /&gt;
suitable for P2P proxy caching systems.&lt;br /&gt;
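&lt;br /&gt;
The single-process event-driven model can be sketched with Python's selectors module, which uses epoll on Linux and falls back to select elsewhere; the function names are ours:&lt;br /&gt;

```python
# One thread multiplexes all sockets through a readiness API.
import selectors

def dispatch_once(sel, timeout=1.0):
    """One iteration: wait for readiness events and invoke the handler
    callback stored with each registered socket."""
    for key, mask in sel.select(timeout):
        key.data(key.fileobj, mask)

def event_loop(sel):
    """The single thread that drives all connections."""
    while True:
        dispatch_once(sel)
```

Sockets are registered with sel.register(sock, selectors.EVENT_READ, callback), and the loop dispatches each readiness event to its callback.&lt;br /&gt;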
&lt;br /&gt;
&lt;br /&gt;
== Storage System Manager ==&lt;br /&gt;
&lt;br /&gt;
We propose a new storage management system optimized for P2P traffic.  The&lt;br /&gt;
proposed storage system contains three modules: in-memory structures, block&lt;br /&gt;
allocation method, and replacement policy. The in-memory structures contain&lt;br /&gt;
metadata to support storing and serving byte ranges of objects, and memory&lt;br /&gt;
buffers to reduce disk I/O operations. The block allocation method organizes&lt;br /&gt;
the layout of data on the disk. The replacement policy decides which segments&lt;br /&gt;
of objects to evict from the cache in order to make room for a new requested&lt;br /&gt;
segment.&lt;br /&gt;
&lt;br /&gt;
Two structures are maintained in memory: metadata and page buffers. The&lt;br /&gt;
metadata is a two-level lookup table designed to enable efficient segment&lt;br /&gt;
lookups. The first level is a hash table keyed on object IDs; collisions are&lt;br /&gt;
resolved using common chaining techniques. Every entry points to the second&lt;br /&gt;
level of the table, which is a set of cached segments belonging to the same&lt;br /&gt;
object. Every segment entry consists of a few fields, including Offset for&lt;br /&gt;
the absolute segment location within the object and RefCnt for how many&lt;br /&gt;
connections are currently using this segment. RefCnt is used to prevent&lt;br /&gt;
evicting a buffer page while there are connections using it. The set of&lt;br /&gt;
cached segments is implemented as a balanced (red-black) binary tree, which is&lt;br /&gt;
sorted by the Offset field. Segments inserted into the cached segment set&lt;br /&gt;
are adjusted to be mutually disjoint. This ensures that the same data is never&lt;br /&gt;
stored more than once in the cache. Using this structure, partial hits can be&lt;br /&gt;
found in at most &amp;lt;math&amp;gt;O(log S)&amp;lt;/math&amp;gt; steps, where &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; is the&lt;br /&gt;
number of segments in the object. This is done by searching on the offset&lt;br /&gt;
field. Segment insertions and deletions are done in logarithmic steps. Notice&lt;br /&gt;
that segments stored in the set are not necessarily contiguous.  &lt;br /&gt;
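&lt;br /&gt;
A sketch of this two-level lookup, using a sorted list with binary search in place of the red-black tree (same logarithmic search cost); all names are ours:&lt;br /&gt;

```python
# First level: hash table keyed on object ID. Second level:
# offset-sorted disjoint segments, searched with bisect in O(log S).
import bisect

class Metadata:
    def __init__(self):
        self.objects = {}   # object ID -> sorted list of (offset, length)

    def insert(self, obj_id, offset, length):
        # The caller keeps segments mutually disjoint (see merging).
        segs = self.objects.setdefault(obj_id, [])
        bisect.insort(segs, (offset, length))

    def lookup(self, obj_id, offset):
        """Return the (offset, length) of the cached segment covering
        `offset`, or None on a miss."""
        segs = self.objects.get(obj_id, ())
        i = bisect.bisect_right(segs, (offset, float("inf"))) - 1
        if i >= 0:
            start, length = segs[i]
            if start + length > offset:
                return (start, length)
        return None
```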
&lt;br /&gt;
The second part of the in-memory structures is the page buffers. Page buffers&lt;br /&gt;
are used to reduce disk I/O operations as well as to perform segment merging.&lt;br /&gt;
As shown in Fig. 3, we define multiple sizes of page buffers.  We pre-allocate&lt;br /&gt;
these pages in memory to avoid processing overhead caused by memory allocation&lt;br /&gt;
and deallocation. We maintain unoccupied pages of the same size in the same&lt;br /&gt;
free-page list. If peers request segments that are in the buffers, they are&lt;br /&gt;
served from memory and no disk I/O operations are issued. If the requested&lt;br /&gt;
segments are on the disk, they need to be swapped into free memory buffers.&lt;br /&gt;
When all free buffers are used up, the least popular data in some of the&lt;br /&gt;
buffers is swapped out to the disk if it has been modified since it was&lt;br /&gt;
brought into memory, and is simply overwritten otherwise.&lt;br /&gt;
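&lt;br /&gt;
A sketch of the pre-allocated per-size free-page lists; the sizes, counts, and names are illustrative:&lt;br /&gt;

```python
# Pages are allocated once up front and recycled through free lists,
# avoiding per-request memory allocation and deallocation.

class PagePool:
    def __init__(self, sizes_and_counts):
        # One free list per page size.
        self.free_lists = {
            size: [bytearray(size) for _ in range(count)]
            for size, count in sizes_and_counts
        }

    def acquire(self, size):
        """Take a free page of the given size; None means the caller
        must first swap out the least popular buffered data."""
        free = self.free_lists[size]
        return free.pop() if free else None

    def release(self, size, page):
        self.free_lists[size].append(page)
```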
&lt;br /&gt;
Another benefit of having memory pages is anti-interleaving. Since the cache&lt;br /&gt;
is expected to receive many requests issued by clients at the same time,&lt;br /&gt;
segments of different objects will be multiplexed. That is, segments of&lt;br /&gt;
object &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; could be interleaved with segments of object&lt;br /&gt;
&amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt;. Therefore, an anti-interleaving scheme is needed before&lt;br /&gt;
segments are swapped out to the disk. We propose to merge neighboring segments&lt;br /&gt;
together whenever there is no gap between them, which serves as our&lt;br /&gt;
anti-interleaving scheme.  Segment merging reduces the number of entries in the&lt;br /&gt;
lookup table and accelerates searching for partial hits. In addition, the&lt;br /&gt;
merging process creates larger segments, which reduces the number of disk&lt;br /&gt;
read/write operations. This is because the requested data will be read/written&lt;br /&gt;
in larger chunks. Furthermore, segments are stored on contiguous disk blocks,&lt;br /&gt;
which reduces the number of head movements and increases disk throughput.&lt;br /&gt;
Segment merging is implemented in two steps. First, we combine the memory&lt;br /&gt;
buffers belonging to the two adjacent segments. Then, if the disk blocks of the&lt;br /&gt;
old two segments are not contiguous, they are returned to the free block set,&lt;br /&gt;
and the memory buffer containing the new (large) segment is marked as modified.&lt;br /&gt;
Modified buffers are written to the disk when they are chosen to be swapped out&lt;br /&gt;
of the memory. If the disk blocks are contiguous, the buffer is marked as&lt;br /&gt;
unmodified.&lt;br /&gt;
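&lt;br /&gt;
The gap-free merge rule can be sketched as follows (names are ours):&lt;br /&gt;

```python
# Anti-interleaving: adjacent segments of the same object are merged
# whenever there is no gap between them.

def merge_adjacent(segs):
    """Given an offset-sorted list of disjoint (offset, length)
    segments, merge every pair with no gap between them."""
    merged = []
    for offset, length in segs:
        if merged and merged[-1][0] + merged[-1][1] == offset:
            prev_off, prev_len = merged[-1]
            merged[-1] = (prev_off, prev_len + length)  # one larger segment
        else:
            merged.append((offset, length))
    return merged
```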
&lt;br /&gt;
Next, we describe the organization of disk blocks. We have two types of&lt;br /&gt;
blocks: super blocks and normal blocks. Super blocks are used for persistent&lt;br /&gt;
storage of the metadata to reconstruct the lookup table after system reboots.&lt;br /&gt;
Recall that proxy caches have relaxed data integrity requirements compared to&lt;br /&gt;
regular workstations, because cached objects can be retrieved from the P2P&lt;br /&gt;
networks again. Therefore, the metadata can be written to the disk only&lt;br /&gt;
occasionally. Disk blocks are allocated to data segments in a contiguous manner&lt;br /&gt;
to increase disk throughput. Unoccupied disk blocks are maintained in a&lt;br /&gt;
free-block set, which is implemented as a red-black tree sorted on the block&lt;br /&gt;
number. When a segment of data is to be swapped from memory buffers to the&lt;br /&gt;
disk, a simple first-fit scheme is used to find a contiguous number of disk&lt;br /&gt;
blocks to store this segment. If no contiguous blocks can satisfy the request,&lt;br /&gt;
blocks nearest to the largest number of contiguous free blocks are evicted from&lt;br /&gt;
the cache to make up for the deficit. No expensive disk de-fragmentation&lt;br /&gt;
process is needed.&lt;br /&gt;
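&lt;br /&gt;
A first-fit sketch over free block numbers sorted in increasing order (the eviction fallback is omitted; names are ours):&lt;br /&gt;

```python
# First-fit allocation of contiguous disk blocks from the free-block
# set, which the cache keeps sorted by block number.

def first_fit(free_blocks, need):
    """Scan the sorted free block numbers and return the first run of
    `need` contiguous blocks, or None if no run is long enough."""
    run_start, run_len = None, 0
    for blk in free_blocks:
        if run_len and blk == run_start + run_len:
            run_len += 1
        else:
            run_start, run_len = blk, 1
        if run_len == need:
            return list(range(run_start, run_start + need))
    return None
```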
&lt;br /&gt;
&lt;br /&gt;
== P2P Traffic Processor ==&lt;br /&gt;
&lt;br /&gt;
pCache needs to communicate with peers from different P2P systems. For each&lt;br /&gt;
supported P2P system, the P2P Traffic Processor provides three modules to&lt;br /&gt;
enable this communication: Parser, Composer, and Analyzer. The Parser performs&lt;br /&gt;
functions such as identifying control and payload messages, and extracting&lt;br /&gt;
messages that could be of interest to the cache such as object request&lt;br /&gt;
messages. The Composer constructs properly-formatted messages to be sent to&lt;br /&gt;
peers. The Analyzer is a placeholder for any auxiliary functions that may need&lt;br /&gt;
to be performed on P2P traffic from different systems. For example, in&lt;br /&gt;
BitTorrent the Analyzer infers information (piece length) needed by pCache that&lt;br /&gt;
is not included in messages exchanged between peers. &lt;br /&gt;
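&lt;br /&gt;
As an illustration of the Parser and Analyzer working together, the following sketch decodes one BitTorrent request message into an absolute byte range. The wire layout (4-byte length prefix, message id 6, then piece index, begin offset, and block length) follows the BitTorrent peer protocol; the function name is ours, and the piece length is the value the Analyzer infers:&lt;br /&gt;

```python
# Decoding a BitTorrent `request` message into the absolute byte
# range it asks for within the object.
import struct

REQUEST_ID = 6

def parse_request(msg, piece_length):
    """If `msg` is a complete request message, return the absolute
    (start, end) byte range requested, else None. `piece_length`
    comes from the Analyzer, since peers never exchange it directly."""
    if len(msg) != 17:
        return None
    length, msg_id, index, begin, blk_len = struct.unpack("!IBIII", msg)
    if length != 13 or msg_id != REQUEST_ID:
        return None
    start = index * piece_length + begin
    return (start, start + blk_len - 1)
```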
&lt;br /&gt;
To store and serve P2P traffic, the cache needs to perform several functions&lt;br /&gt;
beyond identifying the traffic; these are the roles of the three modules. By&lt;br /&gt;
inspecting the byte stream of the connection, the Parser determines the&lt;br /&gt;
boundaries of messages exchanged between peers, and it extracts the request and&lt;br /&gt;
response messages that are of interest to the cache. The Parser returns the ID&lt;br /&gt;
of the object being downloaded in the session, as well as the requested byte&lt;br /&gt;
range (start and end bytes). The byte range is relative to the whole object.&lt;br /&gt;
The Composer prepares protocol-specific messages, and may combine data stored&lt;br /&gt;
in the cache with data obtained from the network into one message to be sent to&lt;br /&gt;
a peer.&lt;/div&gt;</summary>
		<author><name>Omossad</name></author>
	</entry>
</feed>