Difference between revisions of "Private:pCDN:Peer Matching"

From NMSL
 
(25 intermediate revisions by 4 users not shown)
Line 1: Line 1:
 +
== Important Dates ==
 +
* ACM MM: April 17, 2009
 +
* ACM MM (short): May 8, 2009
  
  
== Proposed Algorithm ==
+
== Brief Description of BitTorrent ==
 +
In BitTorrent (BT), a content file is split into several '''pieces''' (typically 256KB in size). The pieces are sub-divided further into what are called '''blocks''' - typically 16K. A separate file, called the '''torrent''' file, containing some metadata about the original file is then created by the content provider and uploaded to a central server called the '''tracker'''. The tracker keeps track of all peers participating in distributing the file. The metadata  contained within the torrent file include: the ''address of the tracker'', the chosen ''piece size'', and an ''SHA-1 hash for each piece''.
  
 +
Peers wishing to download the file first obtain the torrent file by downloading it from a Web server via the Hyper-Text Transfer Protocol (HTTP). The client running on the peer then contacts the tracker to obtain a list of peers to start downloading the original content file from. The list returned by the tracker returns ''k'' number of peers (default is 50) which are chosen at random to the inquiring peer. After receiving this list, the peer attempts to connect with the majority of them and exchanges '''bitfield''' messages with them during the ''handshaking'' process. These messages advertise the availability information for the pieces that a peer already owns.
  
 +
If a neighbor has a piece that the peer needs, the peer informs the neighbor with an '''interested''' message. However, the neighbor is not obliged to send the piece to the requesting peer immediately. A peer decides which of its interested neighbors it should send data to based on a so called '''choking/unchoking''' mechanism which executes every 10 seconds. When a neighbor is choked, the peer will refuse to send pieces to it. An '''unchoke''' message is sent to a small number of neighbors (default is 4) to inform them of the willingness of the peer to send the pieces that they are interested in. The decision of which neighbors to unchoke is based on a '''tit-for-tat policy''' in which the peer unchokes those neighbors which have given data to it at a high rate. This policy enables BT to deter free riding (i.e. selfish peers choosing only to download without contributing). In addition, a fifth neighbor is randomly unchoked every 30 seconds (i.e. every three cycles of the chocking/unchoking process), this is called '''optimistic unchoking'''. Optimistic unchoking is important in order to allow brand new peers (that do not have anything to contribute) to bootstrap.
  
== Ideas ==
+
When a peer gets unchoked, it has to decide which of the pieces that are available at the unchoking neighbor (and that it does not currently have) to request. This decision is based on an '''interest array''' that the peer creates and maintains after receiving the bitfield messages of its neighbors. This contains the number of neighboring peers owning each of the pieces of the file and is updated whenever the peer receives a have message from one of its neighbors announcing the availability of a new piece that the neighbor has just downloaded. A peer uses a '''local rarest-first''' policy in which it requests the piece that is least replicated. This policy enables fast replication of rarest pieces and ensures that torrent file will not become extinct in case a peer having these pieces leaves.
 +
 
 +
If the client has not received anything after a certain period (default: 60 seconds), it marks a connection as '''snubbed''', in that the peer on the other end has chosen not to send. See the definition of choked for reasons why an uploader might mark a connection as choked. The real function of keeping track of this variable is to improve download speeds. Occasionally the client will find itself in a state where even though it is connected to many peers, it is choked by all of them. The client uses the snubbed flag in an attempt to prevent this situation. It notes that a peer with whom it would like to trade pieces with has not sent anything in a while, and rather than leaving it up to the optimistic choking to eventually select that peer, it instead reserves one of its upload slots for sending to that peer.
 +
 
 +
The group of users that are collectively connected for a particular file are called the '''swarm'''. Example, if you start a BitTorrent client and it tells you that you're connected to 5 peers and 1 seeds, then the swarm consists of you and those 6 other people.
 +
 
 +
When a download is almost complete, there's a tendency for the last few blocks to trickle in slowly. To speed this up, the client sends requests for all of its missing blocks to all of its peers. This is called the '''end-game mode'''. To keep this from becoming horribly inefficient, the client also sends a cancel to everyone else every time a block arrives. When to enter end game mode is an area of discussion. Some clients enter end game when all pieces have been requested.
 +
 
 +
Occasionally a BitTorrent peer will be choked by all peers which it was formerly downloading from. In such cases it will usually continue to get poor download rates until the optimistic unchoke finds better peers. To mitigate this problem, when over a minute goes by without getting any piece data while downloading from a peer, BitTorrent assumes it is "snubbed" by that peer and doesn't upload to it except as an optimistic unchoke. This frequently results in more than one concurrent optimistic unchoke, (an exception to the exactly one optimistic unchoke rule mentioned above), which causes download rates to recover much more quickly when they falter.
 +
 
 +
 
 +
== Previous (2008) Simulation==
 +
=== Experimental Setup===
 +
* First, we get info of all the ASes in the world from Team CYRMU's IP to ASNum Mapping Project using netcat. Using this project, we get all ASes' numbers and their names along with the country they belong to. We filter to derive only CA ASes.
 +
** http://www.team-cymru.org/?sec=8&opt=26
 +
* CAIDA provides dataset on inter-AS relationships among all ASes. We filter to derive inter-AS relationships within Canada using only CA ASes generated from step 1. This accounts for 375 ASes, 643 links (53 p2p, 5 s2s, 585 c2p).
 +
** http://as-rank.caida.org/data/2008/as-rel.20080414.a0.01000.txt
 +
* Using these links and ASes, we generate a shortest policy path matrix which contains the shortest paths between any two ASes using the algorithm discussed in [Mao et al].
 +
** http://www.eecs.umich.edu/~zmao/Papers/routescope.pdf
 +
** To test whether the distances computed are correct, we verified with Rogers BGP site and Teleglobe
 +
*** https://supernoc.rogerstelecom.net/ops/ and
 +
*** http://lg.as6453.net/bin/lg.cgi
 +
* AT & T provides IP Range and AS number mapping. We use this dataset to extract only Canadian IP Ranges and corresponding ASes. We generate 84000 number of IPs belonging to these ASes; 10 each from every IP range – AS mapping.
 +
** http://www.research.att.com/~jiawang/as_traceroute/
 +
* Using IP-Geo database, we identify the latitude and longitude for each of these peers.
 +
** Maxmind IP-Geo database
 +
* We test 4 algorithms - AS Order, Geo-location, Random, Network - on a varied size of set of peers {100, 300, 500, 1000, 3000, 5000, 10000, 15000, 20000} randomly picked from 84000 peers generated in step 4. We run all 3 algorithms on the same set of peers and note insertion, lookup, deletion times as well as AS Hops. We repeat this experiment 5 times.
 +
** For testing of N peers, we take an array of 2N peers. First N peers are used for inserting and deleting. The rest N peers are used for searching. That is, we first insert N peers (from 1 to N), then lookup for next N peers (N+1 to 2N) and then delete N peers (from 1 to N). We note each individual time.
 +
* The simulation has been implemented in Java and the experiments have been performed on a 2GHz and 2GB RAM machine.
 +
 
 +
===ISP Friendly Algorithm===
 +
Steps:
 +
* We populate IP to AS Map (IPASMap) using the IP Range and AS mapping present in AT & T dataset. This map can be used for searching an AS for a given IP.
 +
 
 +
* We then load the pre-computed shortest policy path matrix (ShortestPathMatrix) which contains the shortest distance between every 2 individual ASes.
 +
 
 +
* The data structure we use is a HashMap (ASMap) which contains ASNum as the key and the peerlist (the list of peers in that AS) corresponding to that AS.
 +
 
 +
===Ideas===
  
 
* Verification of AS Shortest Paths
 
* Verification of AS Shortest Paths
Line 32: Line 76:
 
** We reduce the number of IP hops
 
** We reduce the number of IP hops
  
* Diameter of the graph
+
* Diameter of the Canada's AS graph: 13 AS Hops
 +
 
  
== References and Links ==
+
==References and Links==
  
 
* IP to ASNum Mapping project from Team CYRMU
 
* IP to ASNum Mapping project from Team CYRMU
Line 44: Line 89:
 
* AT & T IP-Range and ASNum mapping
 
* AT & T IP-Range and ASNum mapping
 
** a. http://www.research.att.com/~jiawang/as_traceroute/
 
** a. http://www.research.att.com/~jiawang/as_traceroute/
 +
* Azureus User Guide
 +
** http://azureuswiki.com/index.php/User_Guide
 +
* Explanation of some of the funny words used in BitTorrent
 +
** http://azureuswiki.com/index.php/This_funny_word
 +
* [http://mm.aueb.gr/index.php?option=com_content&view=article&id=53:bittorrent&catid=49:software&Itemid=94 BitTorrent for the OMNeT++ Simulation Platform]
 +
* C. Tian, X. Liu, H. Jiang, W. Liu, and Y. Wang, “Improving BitTorrent traffic performance by exploiting geographic locality,” in IEEE Global Telecommunications Conference (GLOBECOM 2008), 30 2008-Dec. 4 2008, pp. 1–5.
 +
* B. Liu, Y. Cui, Y. Lu, and Y. Xue, “Locality-awareness in BitTorrent-like P2P applications,” IEEE Transactions on Multimedia, vol. 11, no. 3, pp. 361–371, April 2009.
 +
* W. Li, S. Chen, and T. Yu, “UTAPS: An underlying topology-aware peer selection algorithm in BitTorrent,” in Proceedings of the 22nd International Conference on Advanced Information Networking and Applications (AINA 2008), March 2008, pp. 539–545.
 +
* J. Ledlie, P. Gardner, and M. I. Seltzer, “Network coordinates in the wild,” in Proceedings of the 4th USENIX Symposium on Networked Systems Design and Implementation, 2007, pp. 299–311.
 +
* Y. Chen, Y. Xiong, X. Shi, B. Deng, and X. Li, "Pharos: A Decentralized and Hierarchical Network Coordinate System for Internet Distance Prediction," in Proceedings of the IEEE Global Telecommunications Conference, Nov. 2007, pp. 421-426.
 +
* M. Steiner and E. W. Biersack, “Where is my peer? Evaluation of the Vivaldi network coordinate system in Azureus,” in Work in Progress Proceedings of the 8th International IFIP-TC6 Networking Conference (Networking 2009), Aachen, Germany, May 2009.
 +
* S. L. Blond, A. Legout, and W. Dabbous, “Pushing BitTorrent locality to the limit,” INRIA, Tech. Rep. INRIA-00343822, Dec. 2008.
 +
* Z. Chen, H. Yin, C. Lin, F. Wu, and X. Liu, "Towards a Universal Friendly P2P Media Streaming Application: An Evaluation Framework," LNCS 5353, Springer, 2008.
 +
* B. Liu, Y. Lu, Y. Cui, and Y. Xue, "A Measurement Study on AS-aware P2P Streaming Strategies," in Proceedings of the 3rd International Conference on Communications and Networking, China, 2008.

Latest revision as of 17:54, 6 August 2010

Important Dates

  • ACM MM: April 17, 2009
  • ACM MM (short): May 8, 2009


Brief Description of BitTorrent

In BitTorrent (BT), a content file is split into several pieces (typically 256KB in size). The pieces are sub-divided further into what are called blocks - typically 16K. A separate file, called the torrent file, containing some metadata about the original file is then created by the content provider and uploaded to a central server called the tracker. The tracker keeps track of all peers participating in distributing the file. The metadata contained within the torrent file include: the address of the tracker, the chosen piece size, and an SHA-1 hash for each piece.

Peers wishing to download the file first obtain the torrent file by downloading it from a Web server via the Hyper-Text Transfer Protocol (HTTP). The client running on the peer then contacts the tracker to obtain a list of peers to start downloading the original content file from. The list returned by the tracker returns k number of peers (default is 50) which are chosen at random to the inquiring peer. After receiving this list, the peer attempts to connect with the majority of them and exchanges bitfield messages with them during the handshaking process. These messages advertise the availability information for the pieces that a peer already owns.

If a neighbor has a piece that the peer needs, the peer informs the neighbor with an interested message. However, the neighbor is not obliged to send the piece to the requesting peer immediately. A peer decides which of its interested neighbors it should send data to based on a so called choking/unchoking mechanism which executes every 10 seconds. When a neighbor is choked, the peer will refuse to send pieces to it. An unchoke message is sent to a small number of neighbors (default is 4) to inform them of the willingness of the peer to send the pieces that they are interested in. The decision of which neighbors to unchoke is based on a tit-for-tat policy in which the peer unchokes those neighbors which have given data to it at a high rate. This policy enables BT to deter free riding (i.e. selfish peers choosing only to download without contributing). In addition, a fifth neighbor is randomly unchoked every 30 seconds (i.e. every three cycles of the chocking/unchoking process), this is called optimistic unchoking. Optimistic unchoking is important in order to allow brand new peers (that do not have anything to contribute) to bootstrap.

When a peer gets unchoked, it has to decide which of the pieces that are available at the unchoking neighbor (and that it does not currently have) to request. This decision is based on an interest array that the peer creates and maintains after receiving the bitfield messages of its neighbors. This contains the number of neighboring peers owning each of the pieces of the file and is updated whenever the peer receives a have message from one of its neighbors announcing the availability of a new piece that the neighbor has just downloaded. A peer uses a local rarest-first policy in which it requests the piece that is least replicated. This policy enables fast replication of rarest pieces and ensures that torrent file will not become extinct in case a peer having these pieces leaves.

If the client has not received anything after a certain period (default: 60 seconds), it marks a connection as snubbed, in that the peer on the other end has chosen not to send. See the definition of choked for reasons why an uploader might mark a connection as choked. The real function of keeping track of this variable is to improve download speeds. Occasionally the client will find itself in a state where even though it is connected to many peers, it is choked by all of them. The client uses the snubbed flag in an attempt to prevent this situation. It notes that a peer with whom it would like to trade pieces with has not sent anything in a while, and rather than leaving it up to the optimistic choking to eventually select that peer, it instead reserves one of its upload slots for sending to that peer.

The group of users that are collectively connected for a particular file are called the swarm. Example, if you start a BitTorrent client and it tells you that you're connected to 5 peers and 1 seeds, then the swarm consists of you and those 6 other people.

When a download is almost complete, there's a tendency for the last few blocks to trickle in slowly. To speed this up, the client sends requests for all of its missing blocks to all of its peers. This is called the end-game mode. To keep this from becoming horribly inefficient, the client also sends a cancel to everyone else every time a block arrives. When to enter end game mode is an area of discussion. Some clients enter end game when all pieces have been requested.

Occasionally a BitTorrent peer will be choked by all peers which it was formerly downloading from. In such cases it will usually continue to get poor download rates until the optimistic unchoke finds better peers. To mitigate this problem, when over a minute goes by without getting any piece data while downloading from a peer, BitTorrent assumes it is "snubbed" by that peer and doesn't upload to it except as an optimistic unchoke. This frequently results in more than one concurrent optimistic unchoke, (an exception to the exactly one optimistic unchoke rule mentioned above), which causes download rates to recover much more quickly when they falter.


Previous (2008) Simulation

Experimental Setup

  • First, we get info of all the ASes in the world from Team CYRMU's IP to ASNum Mapping Project using netcat. Using this project, we get all ASes' numbers and their names along with the country they belong to. We filter to derive only CA ASes.
  • CAIDA provides dataset on inter-AS relationships among all ASes. We filter to derive inter-AS relationships within Canada using only CA ASes generated from step 1. This accounts for 375 ASes, 643 links (53 p2p, 5 s2s, 585 c2p).
  • Using these links and ASes, we generate a shortest policy path matrix which contains the shortest paths between any two ASes using the algorithm discussed in [Mao et al].
  • AT & T provides IP Range and AS number mapping. We use this dataset to extract only Canadian IP Ranges and corresponding ASes. We generate 84000 number of IPs belonging to these ASes; 10 each from every IP range – AS mapping.
  • Using IP-Geo database, we identify the latitude and longitude for each of these peers.
    • Maxmind IP-Geo database
  • We test 4 algorithms - AS Order, Geo-location, Random, Network - on a varied size of set of peers {100, 300, 500, 1000, 3000, 5000, 10000, 15000, 20000} randomly picked from 84000 peers generated in step 4. We run all 3 algorithms on the same set of peers and note insertion, lookup, deletion times as well as AS Hops. We repeat this experiment 5 times.
    • For testing of N peers, we take an array of 2N peers. First N peers are used for inserting and deleting. The rest N peers are used for searching. That is, we first insert N peers (from 1 to N), then lookup for next N peers (N+1 to 2N) and then delete N peers (from 1 to N). We note each individual time.
  • The simulation has been implemented in Java and the experiments have been performed on a 2GHz and 2GB RAM machine.

ISP Friendly Algorithm

Steps:

  • We populate IP to AS Map (IPASMap) using the IP Range and AS mapping present in AT & T dataset. This map can be used for searching an AS for a given IP.
  • We then load the pre-computed shortest policy path matrix (ShortestPathMatrix) which contains the shortest distance between every 2 individual ASes.
  • The data structure we use is a HashMap (ASMap) which contains ASNum as the key and the peerlist (the list of peers in that AS) corresponding to that AS.

Ideas

  • Verification of AS Shortest Paths
  • Experimental Setup
    • # Peers: {100, 300, 500, 1000, 3000, 5000, 8000}
    • # Senders: {2, 5, 10, 15, 20}
    • # Sessions: 1000
    • Algos: Random, Network, Geo, AS_Order(ISP-Friendly)
    • Metrics: Min, Avg, Max, Output
    • Output: Hops (Aggregate, P2P, C2P)
  • Search Efficiency:
    • If multiple shortest paths are existing, choose the least "cost" (s2s=0, p2p=1, c2p=2)
  • Check practicality
    • What kind of data we need initially and periodically
    • How often we need them
    • We implement the algorithm to determine AS relationships
  • Large graphs
    • Can we get the US & CA graph?
    • Can we optimize this matrix.
  • Usage of ISP + Network / Geolocation Algorithms
    • We reduce the number of IP hops
  • Diameter of the Canada's AS graph: 13 AS Hops


References and Links

  • IP to ASNum Mapping project from Team CYRMU
  • CAIDA's Inter-AS relationships
  • On AS-level Path Inference
  • AT & T IP-Range and ASNum mapping
  • Azureus User Guide
  • Explanation of some of the funny words used in BitTorrent
  • BitTorrent for the OMNeT++ Simulation Platform
  • C. Tian, X. Liu, H. Jiang, W. Liu, and Y. Wang, “Improving BitTorrent traffic performance by exploiting geographic locality,” in IEEE Global Telecommunications Conference (GLOBECOM 2008), 30 2008-Dec. 4 2008, pp. 1–5.
  • B. Liu, Y. Cui, Y. Lu, and Y. Xue, “Locality-awareness in BitTorrent-like P2P applications,” IEEE Transactions on Multimedia, vol. 11, no. 3, pp. 361–371, April 2009.
  • W. Li, S. Chen, and T. Yu, “UTAPS: An underlying topology-aware peer selection algorithm in BitTorrent,” in Proceedings of the 22nd International Conference on Advanced Information Networking and Applications (AINA 2008), March 2008, pp. 539–545.
  • J. Ledlie, P. Gardner, and M. I. Seltzer, “Network coordinates in the wild,” in Proceedings of the 4th USENIX Symposium on Networked Systems Design and Implementation, 2007, pp. 299–311.
  • Y. Chen, Y. Xiong, X. Shi, B. Deng, and X. Li, "Pharos: A Decentralized and Hierarchical Network Coordinate System for Internet Distance Prediction," in Proceedings of the IEEE Global Telecommunications Conference, Nov. 2007, pp. 421-426.
  • M. Steiner and E. W. Biersack, “Where is my peer? Evaluation of the Vivaldi network coordinate system in Azureus,” in Work in Progress Proceedings of the 8th International IFIP-TC6 Networking Conference (Networking 2009), Aachen, Germany, May 2009.
  • S. L. Blond, A. Legout, and W. Dabbous, “Pushing BitTorrent locality to the limit,” INRIA, Tech. Rep. INRIA-00343822, Dec. 2008.
  • Z. Chen, H. Yin, C. Lin, F. Wu, and X. Liu, "Towards a Universal Friendly P2P Media Streaming Application: An Evaluation Framework," LNCS 5353, Springer, 2008.
  • B. Liu, Y. Lu, Y. Cui, and Y. Xue, "A Measurement Study on AS-aware P2P Streaming Strategies," in Proceedings of the 3rd International Conference on Communications and Networking, China, 2008.