Computer Networks
Title: Startup Process and Initial Offset Placement in P2P Live Streaming Systems
Authors: C. Li, C. Chen, and B. Zhang

Summary:
This paper first presents a measurement study on PPLive, which is a popular P2P
live streaming system. The authors use both passive (Ethereal) and active
(crawler) measurements to study the startup process of PPLive. The authors
divide the startup process of PPLive into several stages and statistically
analyze these stages over thousands of experiments. They then describe a
Proportional Placement (PP) model by which a new PPLive peer selects its initial
offset, and they fit the measurement data to this model to estimate its
parameters. In the second part of the paper, the authors show that the PP model
is stable if the initial offset is determined by the offset lag, and is unstable
if the initial offset is determined by the buffer width. Last, they identify
conditions under which the model is stable when the initial offset is
determined by the buffer width.

Recommendation:
Major revision.

Comments:
I like the analysis of the PPLive startup process and the measurement study. The
authors have clearly put significant effort into them. However, the paper lacks
a clear motivation. In the first half of the paper, the authors reverse
engineer the PPLive streaming system, and they summarize (in Sec. 2.7) that
PPLive "is very likely" using the presented PP model. In addition, PPLive "also
likely" implements a good peer selection algorithm. These two remarks are not
concrete: the authors should at least quantify how likely it is that PPLive
uses the PP model and a good peer selection algorithm. Not to mention that the
definition of "good peer selection" is very vague and is not discussed in the
paper at all. More importantly, based on these two conclusions (in Sec. 2.7),
the PPLive system works just fine, if not great. If this is true, what problem
are the authors trying to address? Furthermore, the authors indicate in various
paragraphs that they are collaborating with PPLive engineers. Then why bother
to reverse engineer the PPLive client and settle for only partially good
results?


In the second half of the paper, the authors claim that their analytical
results can guide engineers toward better designs. But it is not clear how
engineers can use the two observations given in Sec. 3.2 to improve a
streaming system such as PPLive. More importantly, the recommendations are not
evaluated at all: simulations (at least) are required to show that the
recommendations indeed result in better performance, which is not clear from
the current presentation.


I encourage the authors to properly motivate the problem(s), clearly describe
their contributions, and improve the presentation of this paper. Detailed
comments are listed below.

1) There are some typos and grammar issues, as well as indentation problems.
Please proofread the manuscript.

2) In Sec. 2.2, why conduct the Ethereal measurement from a single host? The
results may apply only to that host. It would be more convincing if the
measurements were conducted at various locations.

3) In Secs. 2.2 and 2.3, the authors conduct two measurements: Ethereal and
crawler. The authors then combine the results from both for analysis. It is
not clear whether such aggregation is reasonable because, as pointed out by
the authors, the Ethereal measurements were done at a single host whereas the
crawler measurements are global. The authors should justify why the
aggregation is legitimate.

4) In the description of Fig. 3 (and of other figures), the authors refer to
the curves by color, which is not a good idea because many readers only have
monochrome printers. Distinguishing the curves by markers and line types would
be much clearer.

5) On page 8, the authors propose two interpolations to estimate $T_{off}$.
Neither interpolation makes sense to me, and the authors should justify them.
Fig. 4 does not show anything because it only contains the results from the
hypothesized interpolations; the fact that the curves in Fig. 4 are close may
simply mean that all the interpolations are equally *bad*. I strongly
recommend that the authors compare the interpolation outcomes against the
measurement data collected by Ethereal, which can serve as the ground truth.

6) Some notations/functions are used before being defined. For example, on page
10, the scope curve is used before being properly defined.

7) Several figures are not well explained. For example, on page 13, the authors
claim: "It seems more likely that PPLive adopts the PP scheme based on peer
buffer width since $\alpha_W$ shown in Fig. 9 has sharper distribution." But
the authors never explain what is plotted in Fig. 9, which prevents many
readers from getting any message out of the figure (some other figures have
the same issue).

8) In Sec. 2.6, what is "the availability problem"? Please define it.

9) I'm not sure what is presented in Fig. 10. Is it a procedure, an algorithm,
or a protocol? The caption and the description are not consistent. More details
need to be added to help readers understand the pseudocode.

10) In Sec. 3, a quantitative evaluation of the proposed schemes for choosing
the initial offset is required. This can be done by analysis, such as proving
optimality, or by simulation, such as showing the performance gain in a
P2P network. In addition, the purpose of each lemma/claim must be clear.
Several lemmas/claims in Sec. 3 are not well elaborated and thus cannot be used
by the research community or by engineers. I really look forward to seeing a
list of recommendations that can be readily adopted by real systems. I believe
the material is there, but it is not presented in the best way.