<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-CA">
	<id>https://nmsl.cs.sfu.ca/index.php?action=history&amp;feed=atom&amp;title=PCacheOverview</id>
	<title>PCacheOverview - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://nmsl.cs.sfu.ca/index.php?action=history&amp;feed=atom&amp;title=PCacheOverview"/>
	<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=PCacheOverview&amp;action=history"/>
	<updated>2026-05-02T11:50:51Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.35.1</generator>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=PCacheOverview&amp;diff=6636&amp;oldid=prev</id>
		<title>Omossad: Created page with &quot;__NOTOC__   Overview of the main components of pCache; more details can be found in [http://www.cs.sfu.ca/~mhefeeda/Papers/tc11.pdf  our paper]. Image:pCache-design.jpg||fra...&quot;</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=PCacheOverview&amp;diff=6636&amp;oldid=prev"/>
		<updated>2021-08-09T06:44:15Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;__NOTOC__   Overview of the main components of pCache; more details can be found in [http://www.cs.sfu.ca/~mhefeeda/Papers/tc11.pdf  our paper]. Image:pCache-design.jpg||fra...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;__NOTOC__ &lt;br /&gt;
&lt;br /&gt;
Overview of the main components of pCache; more details can be found in [http://www.cs.sfu.ca/~mhefeeda/Papers/tc11.pdf  our paper].&lt;br /&gt;
[[Image:pCache-design.jpg||frame|right|pCache Architecture]]&lt;br /&gt;
&lt;br /&gt;
== Transparent Proxy and P2P Traffic Identifier ==&lt;br /&gt;
&lt;br /&gt;
These two components reside on the gateway router. They transparently inspect&lt;br /&gt;
traffic going through the router and forward only P2P connections to pCache.&lt;br /&gt;
Traffic that does not belong to any P2P system is processed by the router in&lt;br /&gt;
the regular way and is not affected by the presence of pCache. This is done&lt;br /&gt;
using the [http://www.netfilter.org/ Netfilter] framework for custom packet handling. &lt;br /&gt;
&lt;br /&gt;
Netfilter defines hook points at various packet processing stages, such as&lt;br /&gt;
PREROUTING, LOCAL_IN, LOCAL_OUT, FORWARD, and POSTROUTING. It allows&lt;br /&gt;
us to register callback functions to be invoked when packets reach any of&lt;br /&gt;
these hook points. Netfilter is commonly used with [http://www.netfilter.org/projects/iptables/index.html iptables],&lt;br /&gt;
which provides an interface to define rulesets to be applied on packets. Each&lt;br /&gt;
rule has a number of classifiers (fields to be matched) and an action. To&lt;br /&gt;
support transparent web proxy, a callback function is registered at the&lt;br /&gt;
PREROUTING hook point to intercept packets with the destination port number set&lt;br /&gt;
to 80 on TCP. Once intercepted, the destination IP address and port number of&lt;br /&gt;
each packet will be changed to those of the process running the proxy cache.&lt;br /&gt;
Thus, HTTP packets will be redirected to the web proxy cache for further&lt;br /&gt;
processing. Although the destination IP address is lost during this&lt;br /&gt;
redirection, the web proxy cache can still learn the address of the web server&lt;br /&gt;
because HTTP/1.1 requests carry it in the Host header. This simple&lt;br /&gt;
redirection, however, does not work for proxy caches for P2P traffic, because&lt;br /&gt;
the address of the remote peer is not included in the request messages, and the&lt;br /&gt;
proxy server cannot find the remote peer. Hence, packets need to be redirected&lt;br /&gt;
to the proxy process without changing their destination IP and port numbers.&lt;br /&gt;
We note that Netfilter supports very flexible forwarding rulesets. This&lt;br /&gt;
allows us to run the gateway router and pCache as two processes on the same&lt;br /&gt;
machine, or to run them on two separate machines.&lt;br /&gt;
&lt;br /&gt;
We implement our transparent proxy based on the [http://www.balabit.com/support/community/products/tproxy/ tproxy] project. In our&lt;br /&gt;
implementation, the proxy process creates a listening socket. A callback&lt;br /&gt;
function is registered at the PREROUTING hook point to intercept packets that&lt;br /&gt;
might be of interest to the proxy process. This function sets a pointer to the&lt;br /&gt;
listening socket in the structure containing the packet itself. It also sets a&lt;br /&gt;
flag in the packet. The route lookup procedure is modified to check the flag&lt;br /&gt;
bit. If it is set, the packet is sent to the local IP stack, even though its&lt;br /&gt;
destination is an external IP address. Using the pointer in the packet&lt;br /&gt;
structure, the packet is then redirected to the listening socket of the proxy.&lt;br /&gt;
A new (connected) socket is created between the proxy and the internal host.&lt;br /&gt;
This new socket uses the IP address and port number of the external host, not&lt;br /&gt;
of the proxy process. Another socket is created between the proxy and the&lt;br /&gt;
external host; it uses the IP address and port number of the internal host. Two&lt;br /&gt;
new entries are added to the socket table for these two sockets. Traffic&lt;br /&gt;
packets passing through the gateway router are checked at the PREROUTING hook&lt;br /&gt;
point to see whether they match any of these sockets. &lt;br /&gt;
&lt;br /&gt;
The P2P Traffic Identifier determines whether a connection belongs to any P2P&lt;br /&gt;
system known to the cache. This is done by comparing a number of bytes from the&lt;br /&gt;
connection stream against known P2P application signatures. We have implemented&lt;br /&gt;
identifiers for BitTorrent and Gnutella, which are the most common P2P systems&lt;br /&gt;
nowadays. We can readily support traffic identification for other protocols by&lt;br /&gt;
adding new identifiers to the cache. &lt;br /&gt;
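&lt;br /&gt;
As a sketch, signature matching against the first bytes of a connection stream can look like the following. The two byte-string prefixes are the actual BitTorrent and Gnutella handshake signatures; the function and its name are ours:&lt;br /&gt;

```python
# Illustrative sketch of signature-based P2P traffic identification.
# The prefixes are real protocol signatures; the structure is ours.

SIGNATURES = {
    # BitTorrent handshake: one length byte (19) followed by the
    # protocol string "BitTorrent protocol".
    "bittorrent": b"\x13BitTorrent protocol",
    # Gnutella connection request line.
    "gnutella": b"GNUTELLA CONNECT/",
}

def identify_p2p(first_bytes):
    """Return the protocol name if the first bytes of a connection
    stream match a known P2P signature, else None."""
    for proto, sig in SIGNATURES.items():
        if first_bytes.startswith(sig):
            return proto
    return None
```

Supporting a new protocol amounts to adding one more entry to the signature table, mirroring how new identifiers are added to the cache.&lt;br /&gt;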
&lt;br /&gt;
Since P2P systems use dynamic ports, the proxy process may initially intercept&lt;br /&gt;
some connections that do not belong to P2P systems. This can only be discovered&lt;br /&gt;
after inspecting a few packets using the P2P Traffic Identification module.&lt;br /&gt;
Each intercepted connection is split into a pair of connections, and all&lt;br /&gt;
packets have to go through the proxy process. This imposes overhead on the&lt;br /&gt;
proxy cache and may increase the end-to-end delay of the connections. To reduce&lt;br /&gt;
this overhead, we splice each pair of non-P2P connections using TCP splicing&lt;br /&gt;
techniques, which are usually used in layer-7 switching. We modify our&lt;br /&gt;
[http://www.netfilter.org/ Netfilter] callback function to support TCP splicing as well. Our implementation&lt;br /&gt;
is similar to an inactive layer-7 switching project, called [http://www.linux-l7sw.org/ l7switch]. For&lt;br /&gt;
spliced connections, the sockets in the proxy process are closed and packets&lt;br /&gt;
are relayed in the kernel stack instead of passing them up to the proxy process&lt;br /&gt;
in the application layer. Implementation details such as adjusting sequence&lt;br /&gt;
numbers in the spliced TCP connections had to be addressed, because these two&lt;br /&gt;
TCP connections start from different initial sequence numbers.&lt;br /&gt;
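&lt;br /&gt;
The sequence-number adjustment can be sketched as modular arithmetic over the 32-bit sequence space. Real splicing rewrites packet headers in the kernel; the helper names here are ours:&lt;br /&gt;

```python
# Hypothetical sketch of the fix-up applied when two TCP connections
# with different initial sequence numbers (ISNs) are spliced.

def seq_delta(isn_a, isn_b):
    """Constant offset mapping connection A's sequence space onto
    connection B's, with 32-bit wraparound."""
    return (isn_b - isn_a) % 2**32

def translate_seq(seq, delta):
    """Rewrite one sequence number before relaying the packet."""
    return (seq + delta) % 2**32
```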
&lt;br /&gt;
== Connection Manager ==&lt;br /&gt;
&lt;br /&gt;
When a connection is identified as belonging to a P2P system, it is passed to&lt;br /&gt;
the Connection Manager, which coordinates different components of pCache to&lt;br /&gt;
store and serve requests from this connection. For example, upon seeing a&lt;br /&gt;
request message, the Connection Manager calls a lookup function in the Storage&lt;br /&gt;
System Manager to determine whether this request can be fulfilled with&lt;br /&gt;
previously cached data either in memory or on disk. In addition, if only parts&lt;br /&gt;
of the requested data are available in the cache, the Connection Manager sends&lt;br /&gt;
a message to the actual external peer to request the missing portion of data.&lt;br /&gt;
It then assembles this data with cached data in a protocol-specific message and&lt;br /&gt;
sends it to the client. &lt;br /&gt;
&lt;br /&gt;
Since each peer may open many connections to request a single file and pCache&lt;br /&gt;
is supposed to serve a large number of peers, efficient support of concurrent&lt;br /&gt;
connections is important. A simple solution for concurrency is to use&lt;br /&gt;
multithreading, where a thread is created to handle each new connection. This&lt;br /&gt;
is simple because the states of connections are isolated from each other and&lt;br /&gt;
processed by identical threads.  The downside of this solution is increased&lt;br /&gt;
overhead in terms of creation/deletion, scheduling, and context switching of&lt;br /&gt;
threads. Some of these overheads can be significantly reduced using user-level&lt;br /&gt;
thread libraries such as Capriccio, which is reported to scale to hundreds of&lt;br /&gt;
thousands of threads. Our current implementation of pCache uses multithreading.&lt;br /&gt;
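&lt;br /&gt;
A minimal thread-per-connection sketch; the names and the echo body are ours, standing in for the cache's actual request handling:&lt;br /&gt;

```python
# Thread-per-connection concurrency: one handler thread per socket,
# so each connection's state is isolated in its own thread.
import socket
import threading

def handle(conn, addr):
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)   # placeholder for parsing and serving

def serve(listener):
    """Spawn one handler thread per accepted connection."""
    while True:
        conn, addr = listener.accept()
        threading.Thread(target=handle, args=(conn, addr), daemon=True).start()
```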
&lt;br /&gt;
More sophisticated solutions to support efficient concurrency that employ&lt;br /&gt;
non-blocking (asynchronous) I/O operations can also be used with pCache. For&lt;br /&gt;
example, the single-process event-driven model uses only one thread to detect&lt;br /&gt;
events on multiple sockets using readiness notification mechanisms such as epoll&lt;br /&gt;
and select. These events are then scheduled for an event handler to process&lt;br /&gt;
them. Unlike socket operations, asynchronous disk operations are either poorly&lt;br /&gt;
supported or nonexistent on most UNIX systems. To mitigate this problem,&lt;br /&gt;
multi-process event-driven models have been proposed, which can be roughly&lt;br /&gt;
categorized into asymmetric and symmetric models. The asymmetric models create&lt;br /&gt;
a single process to handle events from the network sockets, and multiple&lt;br /&gt;
processes to handle disk operations. In contrast, the symmetric models create&lt;br /&gt;
multiple event schedulers that can handle both disk and network events. The&lt;br /&gt;
above concurrency models have been proposed and used mostly for web servers.&lt;br /&gt;
Unlike our pCache, web servers may not need to provide full transparency and&lt;br /&gt;
connection splicing, which could impact these concurrency models. We are&lt;br /&gt;
currently designing and implementing new concurrency models that are more&lt;br /&gt;
suitable for P2P proxy caching systems.&lt;br /&gt;
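&lt;br /&gt;
The single-process event-driven model can be sketched with Python's selectors module, which uses epoll on Linux and falls back to select elsewhere; the function names are ours:&lt;br /&gt;

```python
# One thread multiplexes all sockets through a readiness API.
import selectors

def dispatch_once(sel, timeout=1.0):
    """One iteration: wait for readiness events and invoke the handler
    callback stored with each registered socket."""
    for key, mask in sel.select(timeout):
        key.data(key.fileobj, mask)

def event_loop(sel):
    """The single thread that drives all connections."""
    while True:
        dispatch_once(sel)
```

Sockets are registered with sel.register(sock, selectors.EVENT_READ, callback), and the loop dispatches each readiness event to its callback.&lt;br /&gt;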
&lt;br /&gt;
&lt;br /&gt;
== Storage System Manager ==&lt;br /&gt;
&lt;br /&gt;
We propose a new storage management system optimized for P2P traffic.  The&lt;br /&gt;
proposed storage system contains three modules: in-memory structures, block&lt;br /&gt;
allocation method, and replacement policy. The in-memory structures contain&lt;br /&gt;
metadata to support storing and serving byte ranges of objects, and memory&lt;br /&gt;
buffers to reduce disk I/O operations. The block allocation method organizes&lt;br /&gt;
the layout of data on the disk. The replacement policy decides which segments&lt;br /&gt;
of objects to evict from the cache in order to make room for a new requested&lt;br /&gt;
segment.&lt;br /&gt;
&lt;br /&gt;
Two structures are maintained in memory: metadata and page buffers. The&lt;br /&gt;
metadata is a two-level lookup table designed to enable efficient segment&lt;br /&gt;
lookups. The first level is a hash table keyed on object IDs; collisions are&lt;br /&gt;
resolved using common chaining techniques. Every entry points to the second&lt;br /&gt;
level of the table, which is a set of cached segments belonging to the same&lt;br /&gt;
object. Every segment entry consists of a few fields, including Offset for&lt;br /&gt;
the absolute segment location within the object and RefCnt for how many&lt;br /&gt;
connections are currently using this segment. RefCnt is used to prevent&lt;br /&gt;
evicting a buffer page while there are connections using it. The set of&lt;br /&gt;
cached segments is implemented as a balanced (red-black) binary tree, which is&lt;br /&gt;
sorted by the Offset field. Segments inserted into the cached segment set&lt;br /&gt;
are adjusted to be mutually disjoint. This ensures that the same data is never&lt;br /&gt;
stored more than once in the cache. Using this structure, partial hits can be&lt;br /&gt;
found in at most &amp;lt;math&amp;gt;O(log S)&amp;lt;/math&amp;gt; steps, where &amp;lt;math&amp;gt;S&amp;lt;/math&amp;gt; is the&lt;br /&gt;
number of segments in the object. This is done by searching on the offset&lt;br /&gt;
field. Segment insertions and deletions are done in logarithmic steps. Notice&lt;br /&gt;
that segments stored in the set are not necessarily contiguous.  &lt;br /&gt;
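&lt;br /&gt;
A sketch of this two-level lookup, using a sorted list with binary search in place of the red-black tree (same logarithmic search cost); all names are ours:&lt;br /&gt;

```python
# First level: hash table keyed on object ID. Second level:
# offset-sorted disjoint segments, searched with bisect in O(log S).
import bisect

class Metadata:
    def __init__(self):
        self.objects = {}   # object ID -> sorted list of (offset, length)

    def insert(self, obj_id, offset, length):
        # The caller keeps segments mutually disjoint (see merging).
        segs = self.objects.setdefault(obj_id, [])
        bisect.insort(segs, (offset, length))

    def lookup(self, obj_id, offset):
        """Return the (offset, length) of the cached segment covering
        `offset`, or None on a miss."""
        segs = self.objects.get(obj_id, ())
        i = bisect.bisect_right(segs, (offset, float("inf"))) - 1
        if i >= 0:
            start, length = segs[i]
            if start + length > offset:
                return (start, length)
        return None
```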
&lt;br /&gt;
The second part of the in-memory structures is the page buffers. Page buffers&lt;br /&gt;
are used to reduce disk I/O operations as well as to perform segment merging.&lt;br /&gt;
As shown in Fig. 3, we define multiple sizes of page buffers.  We pre-allocate&lt;br /&gt;
these pages in memory to avoid processing overhead caused by memory allocation&lt;br /&gt;
and deallocation. We maintain unoccupied pages of the same size in the same&lt;br /&gt;
free-page list. If peers request segments that are in the buffers, they are&lt;br /&gt;
served from memory and no disk I/O operations are issued. If the requested&lt;br /&gt;
segments are on the disk, they need to be swapped into free memory buffers.&lt;br /&gt;
When all free buffers are used up, the least popular data in some of the&lt;br /&gt;
buffers is swapped out to the disk if it has been modified since it was&lt;br /&gt;
brought into memory, and is simply overwritten otherwise.&lt;br /&gt;
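&lt;br /&gt;
A sketch of the pre-allocated per-size free-page lists; the sizes, counts, and names are illustrative:&lt;br /&gt;

```python
# Pages are allocated once up front and recycled through free lists,
# avoiding per-request memory allocation and deallocation.

class PagePool:
    def __init__(self, sizes_and_counts):
        # One free list per page size.
        self.free_lists = {
            size: [bytearray(size) for _ in range(count)]
            for size, count in sizes_and_counts
        }

    def acquire(self, size):
        """Take a free page of the given size; None means the caller
        must first swap out the least popular buffered data."""
        free = self.free_lists[size]
        return free.pop() if free else None

    def release(self, size, page):
        self.free_lists[size].append(page)
```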
&lt;br /&gt;
Another benefit of having memory pages is anti-interleaving. Since the cache&lt;br /&gt;
is expected to receive many requests issued by clients at the same time,&lt;br /&gt;
segments of different objects will be multiplexed. That is, segments of&lt;br /&gt;
object &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; could be interleaved with segments of object&lt;br /&gt;
&amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt;. Therefore, an anti-interleaving scheme is needed before&lt;br /&gt;
segments are swapped out to the disk. We propose to merge neighboring segments&lt;br /&gt;
together whenever there is no gap between them, which serves as our&lt;br /&gt;
anti-interleaving scheme.  Segment merging reduces the number of entries in the&lt;br /&gt;
lookup table and accelerates searching for partial hits. In addition, the&lt;br /&gt;
merging process creates larger segments, which reduces the number of disk&lt;br /&gt;
read/write operations. This is because the requested data will be read/written&lt;br /&gt;
in larger chunks. Furthermore, segments are stored on contiguous disk blocks,&lt;br /&gt;
which reduces the number of head movements and increases disk throughput.&lt;br /&gt;
Segment merging is implemented in two steps. First, we combine the memory&lt;br /&gt;
buffers belonging to the two adjacent segments. Then, if the disk blocks of the&lt;br /&gt;
old two segments are not contiguous, they are returned to the free block set,&lt;br /&gt;
and the memory buffer containing the new (large) segment is marked as modified.&lt;br /&gt;
Modified buffers are written to the disk when they are chosen to be swapped out&lt;br /&gt;
of the memory. If the disk blocks are contiguous, the buffer is marked as&lt;br /&gt;
unmodified.&lt;br /&gt;
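&lt;br /&gt;
The gap-free merge rule can be sketched as follows (names are ours):&lt;br /&gt;

```python
# Anti-interleaving: adjacent segments of the same object are merged
# whenever there is no gap between them.

def merge_adjacent(segs):
    """Given an offset-sorted list of disjoint (offset, length)
    segments, merge every pair with no gap between them."""
    merged = []
    for offset, length in segs:
        if merged and merged[-1][0] + merged[-1][1] == offset:
            prev_off, prev_len = merged[-1]
            merged[-1] = (prev_off, prev_len + length)  # one larger segment
        else:
            merged.append((offset, length))
    return merged
```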
&lt;br /&gt;
Next, we describe the organization of disk blocks. We have two types of&lt;br /&gt;
blocks: super blocks and normal blocks. Super blocks are used for persistent&lt;br /&gt;
storage of the metadata to reconstruct the lookup table after system reboots.&lt;br /&gt;
Recall that proxy caches have relaxed data integrity requirements compared to&lt;br /&gt;
regular workstations, because cached objects can be retrieved from the P2P&lt;br /&gt;
networks again. Therefore, the metadata can be written to the disk only&lt;br /&gt;
occasionally. Disk blocks are allocated to data segments in a contiguous manner&lt;br /&gt;
to increase disk throughput. Unoccupied disk blocks are maintained in a&lt;br /&gt;
free-block set, which is implemented as a red-black tree sorted on the block&lt;br /&gt;
number. When a segment of data is to be swapped from memory buffers to the&lt;br /&gt;
disk, a simple first-fit scheme is used to find a contiguous number of disk&lt;br /&gt;
blocks to store this segment. If no contiguous blocks can satisfy the request,&lt;br /&gt;
blocks nearest to the largest number of contiguous free blocks are evicted from&lt;br /&gt;
the cache to make up for the deficit. No expensive disk de-fragmentation&lt;br /&gt;
process is needed.&lt;br /&gt;
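&lt;br /&gt;
A first-fit sketch over free block numbers sorted in increasing order (the eviction fallback is omitted; names are ours):&lt;br /&gt;

```python
# First-fit allocation of contiguous disk blocks from the free-block
# set, which the cache keeps sorted by block number.

def first_fit(free_blocks, need):
    """Scan the sorted free block numbers and return the first run of
    `need` contiguous blocks, or None if no run is long enough."""
    run_start, run_len = None, 0
    for blk in free_blocks:
        if run_len and blk == run_start + run_len:
            run_len += 1
        else:
            run_start, run_len = blk, 1
        if run_len == need:
            return list(range(run_start, run_start + need))
    return None
```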
&lt;br /&gt;
&lt;br /&gt;
== P2P Traffic Processor ==&lt;br /&gt;
&lt;br /&gt;
pCache needs to communicate with peers from different P2P systems. For each&lt;br /&gt;
supported P2P system, the P2P Traffic Processor provides three modules to&lt;br /&gt;
enable this communication: Parser, Composer, and Analyzer. The Parser performs&lt;br /&gt;
functions such as identifying control and payload messages, and extracting&lt;br /&gt;
messages that could be of interest to the cache such as object request&lt;br /&gt;
messages. The Composer constructs properly-formatted messages to be sent to&lt;br /&gt;
peers. The Analyzer is a placeholder for any auxiliary functions that may need&lt;br /&gt;
to be performed on P2P traffic from different systems. For example, in&lt;br /&gt;
BitTorrent the Analyzer infers information (piece length) needed by pCache that&lt;br /&gt;
is not included in messages exchanged between peers. &lt;br /&gt;
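&lt;br /&gt;
As an illustration of the Parser and Analyzer working together, the following sketch decodes one BitTorrent request message into an absolute byte range. The wire layout (4-byte length prefix, message id 6, then piece index, begin offset, and block length) follows the BitTorrent peer protocol; the function name is ours, and the piece length is the value the Analyzer infers:&lt;br /&gt;

```python
# Decoding a BitTorrent `request` message into the absolute byte
# range it asks for within the object.
import struct

REQUEST_ID = 6

def parse_request(msg, piece_length):
    """If `msg` is a complete request message, return the absolute
    (start, end) byte range requested, else None. `piece_length`
    comes from the Analyzer, since peers never exchange it directly."""
    if len(msg) != 17:
        return None
    length, msg_id, index, begin, blk_len = struct.unpack("!IBIII", msg)
    if length != 13 or msg_id != REQUEST_ID:
        return None
    start = index * piece_length + begin
    return (start, start + blk_len - 1)
```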
&lt;br /&gt;
To store and serve P2P traffic, the cache needs to perform several functions&lt;br /&gt;
beyond identifying the traffic; these are the roles of the three modules. By&lt;br /&gt;
inspecting the byte stream of the connection, the Parser determines the&lt;br /&gt;
boundaries of messages exchanged between peers, and it extracts the request and&lt;br /&gt;
response messages that are of interest to the cache. The Parser returns the ID&lt;br /&gt;
of the object being downloaded in the session, as well as the requested byte&lt;br /&gt;
range (start and end bytes). The byte range is relative to the whole object.&lt;br /&gt;
The Composer prepares protocol-specific messages, and may combine data stored&lt;br /&gt;
in the cache with data obtained from the network into one message to be sent to&lt;br /&gt;
a peer.&lt;/div&gt;</summary>
		<author><name>Omossad</name></author>
	</entry>
</feed>