Computer Networks

Title: Startup Process and Initial Offset Placement in P2P Live Streaming Systems

Authors: C. Li, C. Chen, and B. Zhang

Summary: This paper first presents a measurement study of PPLive, a popular P2P live streaming system. The authors use both passive (Ethereal) and active (crawler) measurements to study the PPLive startup process. They divide the startup process into several stages and statistically analyze these stages over thousands of experiments. They then describe a Proportional Placement (PP) model by which a new PPLive peer selects its initial offset, and they fit the measurement data to this model to obtain its parameters. In the second part of the paper, the authors show that the PP model is stable if the initial offset is determined by the offset lag and unstable if it is determined by the buffer width. Last, they identify conditions under which the model is stable when the initial offset is determined by the buffer width.

Recommendation: Major revision.

Comments: I like the analysis of the PPLive startup process and the measurement study; the authors have clearly put significant effort into them. However, I miss a real motivation for this paper. In the first half, the authors reverse engineer the PPLive streaming system and summarize (in Sec. 2.7) that PPLive "is very likely" using the presented PP model and "also likely" implements a good peer selection algorithm. These two remarks are not concrete: the authors should at least quantify how likely it is that PPLive uses the PP model and a good peer selection algorithm. Not to mention that the definition of "good peer selection" is very vague and is not discussed in the paper at all. More importantly, according to these two conclusions (in Sec. 2.7), the PPLive system works just fine, if not great. If this is true, what problem are the authors trying to address?
Furthermore, the authors indicate in various paragraphs that they are collaborating with PPLive engineers. Why, then, bother to reverse engineer the PPLive client and obtain only partially good results?

In the second half of the paper, the authors claim that their analytical results can guide engineers toward better designs. But it is not clear how engineers could use the two observations given in Sec. 3.2 to improve a streaming system such as PPLive. More importantly, the recommendations are not evaluated at all: simulations (at least) are required to show that the recommendations indeed result in better performance, which is not evident from the current presentation.

I encourage the authors to properly motivate the problem(s), clearly describe their contributions, and improve the presentation of the paper. Detailed comments are listed below.

1) There are some typos and grammar issues, as well as indentation problems. Please proofread the manuscript.

2) In Sec. 2.2, why is the Ethereal measurement conducted from a single host? The results may apply only to that computer. It would be more convincing if the measurements were conducted at various locations.

3) In Secs. 2.2 and 2.3, the authors conduct two measurements, Ethereal and crawler, and then combine the results of both for analysis. It is not clear whether such aggregation is reasonable because, as the authors point out, the Ethereal measurements were done at a single host while the crawler measurements are global. The authors may want to clarify why the aggregation is legitimate.

4) In the description of Fig. 3 (and of other figures), the authors refer to the curves by color, which is not a good idea because many readers only have monochrome printers. Using markers and line types would be much better.

5) On page 8, the authors propose two interpolations to estimate $T_{off}$. Neither interpolation makes sense to me, and the authors should justify them. Fig. 4 does not show anything because it only contains the results of the hypothesized interpolations; the fact that the curves in Fig. 4 are close may simply mean that all the interpolations are equally *bad*. I strongly recommend that the authors compare the interpolation outcomes against the measurement data collected by Ethereal, which can serve as the ground truth.

6) Some notations/functions are used before being defined. For example, on page 10, the scope curve is used before being properly defined.

7) Several figures are not well explained. For example, on page 13, the authors claim: "It seems more likely that PPLive adopts the PP scheme based on peer buffer width since $\alpha_W$ shown in Fig. 9 has sharper distribution." But the authors never explain what is plotted in Fig. 9, which prevents many readers from getting any message out of it (some other figures have the same issue).

8) In Sec. 2.6, what is "the availability problem"? Please define it.

9) I am not sure what is presented in Fig. 10. Is it a procedure, an algorithm, or a protocol? The caption and the description are not consistent. More details need to be added to help readers understand the pseudocode.

10) In Sec. 3, a quantitative evaluation of the proposed schemes for choosing the initial offset is required. This can be done by analysis, such as proving optimality, or by simulation, such as showing the performance gain in a P2P network. In addition, the purpose of each lemma/claim must be clear. Several lemmas/claims in Sec. 3 are not well elaborated and thus cannot be used by the research community or by engineers. I really look forward to seeing a list of recommendations that can be readily adopted by real systems. I believe the material is there, but it is not presented in the best way.