Private:pCDN:Systems Issues

From NMSL

This page describes the system issues we have encountered while developing the pCDN system.

Dynamic IPs and Ports

We use heartbeat messages to maintain client state on the server. That is, whenever the server has not seen a heartbeat message from a particular client for a timeout period, it removes that client from the connection table and invalidates all of its contents. In addition, we use a client's IP addresses and port numbers (both private and public) as its identifier. This combination causes several issues.

Consider a client behind a NAT box that times out a UDP port mapping after <math>t_n</math> seconds of inactivity. If we send heartbeat messages less often than once every <math>t_n</math> seconds, the NAT box may remove our port mapping at time <math>t_n</math>. Later, say at time <math>t_s</math> where <math>t_s > t_n</math>, the same client sends another message to the server. This message implicitly creates a new port mapping on the NAT box. Depending on the NAT implementation, the new port mapping may not be identical to the previous one. In that case, the server cannot identify the existing client and treats it as a new client with no content. The server assumes that the client has no content because only Join/Leave messages carry the list of available contents. The server then adds this new client (with no content) to the connection table. A few minutes later, the entry with the old NAT port mapping times out and is removed. At that point, the server loses track of the contents that are actually available at that client.
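The failure described above can be reproduced with a small sketch of a connection table keyed by socket address. This is only an illustration, not the pCDN server code: the class name, method names, the 180-second timeout, and the example addresses are all assumptions.

```python
# Hypothetical server-side table keyed by the client's public (IP, port).
HEARTBEAT_TIMEOUT = 180.0  # seconds; assumed value, not taken from pCDN


class AddressKeyedServer:
    def __init__(self):
        # (ip, port) -> {"contents": set of names, "last_seen": timestamp}
        self.table = {}

    def on_join(self, addr, contents, now):
        # Only Join messages carry the list of available contents.
        self.table[addr] = {"contents": set(contents), "last_seen": now}

    def on_heartbeat(self, addr, now):
        # An unknown address is treated as a new client with no content.
        entry = self.table.setdefault(
            addr, {"contents": set(), "last_seen": now})
        entry["last_seen"] = now

    def expire(self, now):
        # Drop every entry that has been silent for longer than the timeout.
        stale = [addr for addr, entry in self.table.items()
                 if now - entry["last_seen"] > HEARTBEAT_TIMEOUT]
        for addr in stale:
            del self.table[addr]

    def holders(self, content):
        return {addr for addr, entry in self.table.items()
                if content in entry["contents"]}


server = AddressKeyedServer()
old_addr = ("203.0.113.7", 40001)  # mapping created by the Join message
new_addr = ("203.0.113.7", 40002)  # mapping created after the NAT timeout

server.on_join(old_addr, ["movie.part1"], now=0.0)
server.on_heartbeat(new_addr, now=150.0)  # same client, new port mapping
server.expire(now=200.0)                  # the old entry is now stale
print(server.holders("movie.part1"))      # no holder is left for the content
```

After the last step the client is still listed (under its new address), but the server no longer knows that it holds any content.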

There are two trivial solutions. First, we can ask clients with unknown socket addresses to rejoin our network. However, this forces these clients to resend their complete content lists, which could occupy a significant portion of the server bandwidth. Second, we can send heartbeat messages more often to prevent the port mappings on NAT boxes from timing out. Unfortunately, this may also increase the network load.

To better resolve this, we propose two mechanisms: a permanent Id and an adaptive heartbeat interval.

  • Permanent Id: Clients should maintain a unique and permanent Id. We achieve uniqueness using an MD5 hash, and we achieve permanency by storing the Id in the client's configuration file. Note that the configuration file of a freshly installed client should contain no Id. This can be implemented by reserving one Id as the Initial Id, such as all 0xFF. When a client starts up, it loads its Id from the configuration file. If the configuration file contains the Initial Id, the client generates a new Id by hashing the first MAC address of the client machine. If this fails, we hash a random number and use the result as the new Id. All messages from clients to the server should contain the client's Id, which enables the server to determine whether there has been an IP/port change. If the server notices an IP/port change, it performs two tasks. First, the server sends a JoinAck message to the client, so that the client can update its public IP address and port number. Second, the server updates its connection and content tables.
  • Adaptive HeartBeat Interval: There are two alternatives for making the heartbeat interval adapt to the NAT box implementation.
    • Alternative #1: We distinguish between clients that are behind NATs and clients that are not. For clients that are not behind NATs, we set the heartbeat interval <math>t_s</math> to 30 minutes and the maximum number of retries to 3. For clients behind NATs, we use an adaptive heartbeat interval: <math>t_s</math> is initialized to 2 minutes and is reduced by 5 seconds whenever the server notices an IP/port change. <math>t_s</math> should never go below 5 seconds.
    • Alternative #2: We create two types of heartbeat messages: one for maintaining client state at the server and one for maintaining UDP state at the NAT box. We call these type-S and type-N heartbeat messages, respectively. Type-S heartbeat messages are sent at a very low frequency: at 30-minute intervals with 3 retries. Unlike in Alternative #1, even clients behind NATs send type-S heartbeats at this low rate. Both the server and the clients send type-S heartbeat messages. Type-N heartbeat messages are sent only by clients, at a very high frequency: potentially once every second. Their purpose is to refresh the port mapping on NAT boxes. To avoid flooding the network with messages, type-N heartbeats should have a small TTL value, say <math>n</math>. Choosing a proper <math>n</math> is not easy, because there might be several chained NAT boxes between a client and the server. A feasible <math>n</math> falls between the hop count to the outermost NAT box and the hop count to the server. An optimal <math>n</math> is as close to the hop count to the outermost NAT box as possible. We set <math> n = 3 </math> as the initial TTL value. We increase <math>n</math> by 1 if the server observes an IP/port change, which indicates that our port mapping still times out because the type-N heartbeats did not travel far enough. We decrease <math>n</math> by 1 if the server receives a type-N heartbeat, which indicates that type-N heartbeats are flooding our server. We set <math>n=0</math> if we observe that <math>n</math> fluctuates 3 times. Clients with a zero <math>n</math> do not send type-N heartbeats.
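The permanent Id mechanism can be sketched as follows. This is a minimal illustration under several assumptions: a plain dictionary stands in for the configuration file, Python's `uuid.getnode()` stands in for "the first MAC address", and the names `load_or_create_id`, `IdKeyedServer`, and `on_message` are hypothetical. The handling of an unknown Id is not covered by the text and is left open here.

```python
import hashlib
import os
import uuid

INITIAL_ID = "ff" * 16  # reserved "no Id yet" value: 16 bytes of 0xFF, in hex


def generate_id():
    """Hash the MAC address, or random bytes if no MAC could be read."""
    node = uuid.getnode()
    # uuid.getnode() sets the multicast bit when it falls back to a random
    # number, so that case is treated here as "MAC lookup failed".
    if (node >> 40) & 1:
        seed = os.urandom(16)
    else:
        seed = node.to_bytes(6, "big")
    return hashlib.md5(seed).hexdigest()


def load_or_create_id(config):
    """Return the stored Id, generating one if only the Initial Id is set."""
    if config.get("id", INITIAL_ID) == INITIAL_ID:
        config["id"] = generate_id()
    return config["id"]


class IdKeyedServer:
    def __init__(self):
        # client Id -> {"addr": (ip, port), "contents": set of names}
        self.table = {}

    def on_join(self, client_id, addr, contents):
        self.table[client_id] = {"addr": addr, "contents": set(contents)}

    def on_message(self, client_id, addr):
        """Return "JoinAck" if the sender's IP/port changed, else None."""
        entry = self.table.get(client_id)
        if entry is None:
            return None  # unknown Id: not specified in the text
        if entry["addr"] != addr:
            entry["addr"] = addr  # update the table, keep the content list
            return "JoinAck"      # lets the client learn its new address
        return None


config = {"id": INITIAL_ID}  # fresh installation
client_id = load_or_create_id(config)

server = IdKeyedServer()
server.on_join(client_id, ("203.0.113.7", 40001), ["movie.part1"])
reply = server.on_message(client_id, ("203.0.113.7", 40002))
print(reply, server.table[client_id]["contents"])
```

Because the table is keyed by Id, a changed port mapping only updates the stored address, and the content list survives.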
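The two adaptation rules for the heartbeat can be sketched as follows. The names `next_nat_interval` and `TypeNTtl` are hypothetical, and the text does not say whether this state lives on the server or on the client. The text also does not define "fluctuates", so this sketch assumes it means a reversal in the direction of adjustment (an increase followed by a decrease, or the reverse).

```python
def next_nat_interval(current_secs):
    """Alternative #1: shrink the interval by 5 s per IP/port change,
    never going below 5 s."""
    return max(current_secs - 5, 5)


class TypeNTtl:
    """Alternative #2: adapt the TTL n of type-N heartbeats."""

    INITIAL_TTL = 3
    MAX_FLUCTUATIONS = 3

    def __init__(self):
        self.n = self.INITIAL_TTL
        self.last_step = 0     # +1 or -1 for the previous adjustment
        self.fluctuations = 0  # direction reversals seen so far (assumed meaning)

    def _adjust(self, step):
        if self.n == 0:
            return  # type-N heartbeats are already disabled
        if self.last_step and step != self.last_step:
            self.fluctuations += 1
        self.last_step = step
        if self.fluctuations >= self.MAX_FLUCTUATIONS:
            self.n = 0  # give up: n keeps oscillating
        else:
            self.n = max(self.n + step, 0)

    def on_address_change(self):
        # The mapping still timed out: type-N heartbeats did not go far enough.
        self._adjust(+1)

    def on_type_n_reached_server(self):
        # Type-N heartbeats travel too far and reach the server.
        self._adjust(-1)

    def should_send(self):
        return self.n > 0


interval = next_nat_interval(120)  # 2 minutes, after one IP/port change

ttl = TypeNTtl()
ttl.on_address_change()          # n: 3 -> 4
ttl.on_type_n_reached_server()   # first reversal,  n: 4 -> 3
ttl.on_address_change()          # second reversal, n: 3 -> 4
ttl.on_type_n_reached_server()   # third reversal,  n set to 0
print(interval, ttl.n, ttl.should_send())
```

On the client side, the chosen `n` would typically be applied to the UDP socket before sending, for example with `sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, n)`.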

Streaming Friendly Scheduling

NAT Traversal