Difference between revisions of "Private:pCDN:Peer Matching"

From NMSL
Line 2: Line 2:
 
* ACM MM: April 17, 2009
 
* ACM MM: April 17, 2009
 
* ACM MM (short): May 8, 2009
 
* ACM MM (short): May 8, 2009
 +
 +
 +
== Brief Description of BitTorrent ==
 +
In BitTorrent (BT), a content file is split into several pieces (typically 256KB in size). A separate file, called the torrent file, containing some metadata about the original file is then created by the content provider and uploaded to a central server called the tracker. The tracker keeps track of all peers participating in distributing the file. The metadata  contained within the torrent file include: the address of the tracker, the chosen piece size, and an SHA-1 hash for
 +
each piece.
 +
 +
Peers wishing to download the file first obtain the torrent file by downloading it from a Web server via the Hyper-Text Transfer Protocol (HTTP). The client running on the peer then contacts the tracker to obtain a list of peers to start downloading the original content file from. The list returned by the tracker returns k number of peers (default is 50) which are chosen at random to the inquiring peer. After reveiving this list, the peer attempts to connect with the majority of them and exchanges bitfield messages with them during the handshaking process. These messages advertise the availability information for the pieces that a peer already owns.
 +
 +
If a neighbor has a piece that the peer needs, the peer informs the neighbor with an interested message. However, the neighbor is not obliged to send the piece to the requesting peer immediately. A peer decides which of its interested neighbors it should send data to based on a so called choking/unchoking mechanism which executes every 10 seconds. When a neighbor is choked, the peer will refuse to send pieces to it. An unchoke message is sent to a small number of neighbors (default is 4) to inform them of the willingness of the peer to send the pieces that they are interested in. The decision of which neighbors to unchoke is based on a tit-for-tat policy in which the peer unchokes those neighbors which have given data to it at a high rate. This policy enables BT to deter free riding (i.e. selfish peers choosing only to download without contributing). In addition, a fifth neighbor is randomly unchoked every 30 seconds (i.e. every three cycles of the chocking/unchoking process), this is called optimistic unchoking. Optimistic unchoking is important in order to allow brand new peers (that do not have anything to contribute) to bootstrap.
 +
 +
When a peer gets unchoked, it has to decide which of the pieces that are available at the unchoking neighbor (and that it does not currently have) to request. This decision is based on an interest array that the peer creates and maintains after receiving the bitfield messages of its neighbors. This contains the number of neighboring peers owning each of the pieces of the file and is updated whenever the peer receives a have message from one of its neighbors announcing the availability of a new piece that the neighbor has just downloaded. A peer uses a local rarest-first policy in which it requests the piece that is least replicated. This policy enables fast replication of rarest pieces and ensures that torrent file will not become extinct in case a peer having these pieces leaves.
 +
  
 
== Previous (2008) Simulation==
 
== Previous (2008) Simulation==

Revision as of 13:12, 1 April 2009

Important Dates

  • ACM MM: April 17, 2009
  • ACM MM (short): May 8, 2009


Brief Description of BitTorrent

In BitTorrent (BT), a content file is split into several pieces (typically 256KB in size). A separate file, called the torrent file, containing some metadata about the original file is then created by the content provider and uploaded to a central server called the tracker. The tracker keeps track of all peers participating in distributing the file. The metadata contained within the torrent file include: the address of the tracker, the chosen piece size, and an SHA-1 hash for each piece.

Peers wishing to download the file first obtain the torrent file by downloading it from a Web server via the Hyper-Text Transfer Protocol (HTTP). The client running on the peer then contacts the tracker to obtain a list of peers to start downloading the original content file from. The list returned by the tracker returns k number of peers (default is 50) which are chosen at random to the inquiring peer. After reveiving this list, the peer attempts to connect with the majority of them and exchanges bitfield messages with them during the handshaking process. These messages advertise the availability information for the pieces that a peer already owns.

If a neighbor has a piece that the peer needs, the peer informs the neighbor with an interested message. However, the neighbor is not obliged to send the piece to the requesting peer immediately. A peer decides which of its interested neighbors it should send data to based on a so called choking/unchoking mechanism which executes every 10 seconds. When a neighbor is choked, the peer will refuse to send pieces to it. An unchoke message is sent to a small number of neighbors (default is 4) to inform them of the willingness of the peer to send the pieces that they are interested in. The decision of which neighbors to unchoke is based on a tit-for-tat policy in which the peer unchokes those neighbors which have given data to it at a high rate. This policy enables BT to deter free riding (i.e. selfish peers choosing only to download without contributing). In addition, a fifth neighbor is randomly unchoked every 30 seconds (i.e. every three cycles of the chocking/unchoking process), this is called optimistic unchoking. Optimistic unchoking is important in order to allow brand new peers (that do not have anything to contribute) to bootstrap.

When a peer gets unchoked, it has to decide which of the pieces that are available at the unchoking neighbor (and that it does not currently have) to request. This decision is based on an interest array that the peer creates and maintains after receiving the bitfield messages of its neighbors. This contains the number of neighboring peers owning each of the pieces of the file and is updated whenever the peer receives a have message from one of its neighbors announcing the availability of a new piece that the neighbor has just downloaded. A peer uses a local rarest-first policy in which it requests the piece that is least replicated. This policy enables fast replication of rarest pieces and ensures that torrent file will not become extinct in case a peer having these pieces leaves.


Previous (2008) Simulation

Experimental Setup

  • First, we get info of all the ASes in the world from Team CYRMU's IP to ASNum Mapping Project using netcat. Using this project, we get all ASes' numbers and their names along with the country they belong to. We filter to derive only CA ASes.
  • CAIDA provides dataset on inter-AS relationships among all ASes. We filter to derive inter-AS relationships within Canada using only CA ASes generated from step 1. This accounts for 375 ASes, 643 links (53 p2p, 5 s2s, 585 c2p).
  • Using these links and ASes, we generate a shortest policy path matrix which contains the shortest paths between any two ASes using the algorithm discussed in [Mao et al].
  • AT & T provides IP Range and AS number mapping. We use this dataset to extract only Canadian IP Ranges and corresponding ASes. We generate 84000 number of IPs belonging to these ASes; 10 each from every IP range – AS mapping.
  • Using IP-Geo database, we identify the latitude and longitude for each of these peers.
    • Maxmind IP-Geo database
  • We test 4 algorithms - AS Order, Geo-location, Random, Network - on a varied size of set of peers {100, 300, 500, 1000, 3000, 5000, 10000, 15000, 20000} randomly picked from 84000 peers generated in step 4. We run all 3 algorithms on the same set of peers and note insertion, lookup, deletion times as well as AS Hops. We repeat this experiment 5 times.
    • For testing of N peers, we take an array of 2N peers. First N peers are used for inserting and deleting. The rest N peers are used for searching. That is, we first insert N peers (from 1 to N), then lookup for next N peers (N+1 to 2N) and then delete N peers (from 1 to N). We note each individual time.
  • The simulation has been implemented in Java and the experiments have been performed on a 2GHz and 2GB RAM machine.

ISP Friendly Algorithm

Steps:

  • We populate IP to AS Map (IPASMap) using the IP Range and AS mapping present in AT & T dataset. This map can be used for searching an AS for a given IP.
  • We then load the pre-computed shortest policy path matrix (ShortestPathMatrix) which contains the shortest distance between every 2 individual ASes.
  • The data structure we use is a HashMap (ASMap) which contains ASNum as the key and the peerlist (the list of peers in that AS) corresponding to that AS.

Ideas

  • Verification of AS Shortest Paths
  • Experimental Setup
    • # Peers: {100, 300, 500, 1000, 3000, 5000, 8000}
    • # Senders: {2, 5, 10, 15, 20}
    • # Sessions: 1000
    • Algos: Random, Network, Geo, AS_Order(ISP-Friendly)
    • Metrics: Min, Avg, Max, Output
    • Output: Hops (Aggregate, P2P, C2P)
  • Search Efficiency:
    • If multiple shortest paths are existing, choose the least "cost" (s2s=0, p2p=1, c2p=2)
  • Check practicality
    • What kind of data we need initially and periodically
    • How often we need them
    • We implement the algorithm to determine AS relationships
  • Large graphs
    • Can we get the US & CA graph?
    • Can we optimize this matrix.
  • Usage of ISP + Network / Geolocation Algorithms
    • We reduce the number of IP hops
  • Diameter of the Canada's AS graph: 13 AS Hops


References and Links

  • IP to ASNum Mapping project from Team CYRMU
  • CAIDA's Inter-AS relationships
  • On AS-level Path Inference
  • AT & T IP-Range and ASNum mapping
  • C. Tian, X. Liu, H. Jiang, W. Liu, and Y. Wang, “Improving BitTorrent traffic performance by exploiting geographic locality,” in IEEE Global Telecommunications Conference (GLOBECOM 2008), 30 2008-Dec. 4 2008, pp. 1–5.
  • B. Liu, Y. Cui, Y. Lu, and Y. Xue, “Locality-awareness in BitTorrent-like P2P applications,” IEEE Transactions on Multimedia, vol. 11, no. 3, pp. 361–371, April 2009.
  • W. Li, S. Chen, and T. Yu, “UTAPS: An underlying topology-aware peer selection algorithm in BitTorrent,” in Proceedings of the 22nd International Conference on Advanced Information Networking and Applications (AINA 2008), March 2008, pp. 539–545.
  • J. Ledlie, P. Gardner, and M. I. Seltzer, “Network coordinates in the wild,” in Proceedings of the 4th USENIX Symposium on Networked Systems Design and Implementation, 2007, pp. 299–311.
  • M. Steiner and E. W. Biersack, “Where is my peer? evaluation of the vivaldi network coordinate system in azureus,” in Work in Progress Proceedings of the 8th International IFIP-TC6 Networking Conference (Networking 2009), Aachen, Germany, May 2009.
  • S. L. Blond, A. Legout, and W. Dabbous, “Pushing BitTorrent locality to the limit,” INRIA, Tech. Rep. INRIA-00343822, Dec. 2008.