Review 1 Reviewer Originality Technical Merit Readability Relevance Reviewer Confidence Top 25%, but not top 10% (3) Top 10% (4) Top 10% (4) Top 10% (4) Informed outsider (2) Short summary assessment (What are the main contributions of this paper? Do you consider the issues addressed important and/or interesting? Comment on novelty, creativity, impact and technical depth(1-5 sentences)) The paper proposes two fully decentralized algorithms to solve the Distributed Rate Limiting problem. This mechanism can be utilized to control the aggregate bandwidth in cloud-based services. The authors address a tough issue through a rigorous methodology that improves the previous empirical approach. Strengths: (What are the most important reasons to accept this paper? (1-3 sentences)) The topic is relevant. There is a clear improvement with respect to the state of the art. The formalization of the problem and the mathematical demonstrations of convergence and stability properties are sound. Weaknesses: (What are the most important reasons NOT to accept this paper? (1-3 sentences).) None Detailed Comment to the Authors (Please provide detailed comments that can be used by the authors to improve this paper) This is a good paper that does not require many additional efforts. There is some strong assumptions for a context of a geographically distributed system, such as - synchronous updates of local-limiter capacity information, - no loss of information. I would appreciate a discussion on the effects of communication delays and node failures (or temporary unreachable) on the stability of the proposed algorithms. The implementation issues would require a more detailed analysis. The curves in the figures must be re-plotted because they are appreciable only if color printed. The text requires some corrections. There are too many footnotes. Review 2 Reviewer Originality Technical Merit Readability Relevance Reviewer Confidence Top 10% (4) Top 10% (4) Top 10% (4) Top 10% (4) Fairly confident, well-versed in this area (3) Short summary assessment (What are the main contributions of this paper? Do you consider the issues addressed important and/or interesting? Comment on novelty, creativity, impact and technical depth(1-5 sentences)) The paper presents two algorithms that achieve emulation of a centralized best-effort and processor sharing algos in a distributed manner. The results are theoretically proven with convergence properties, and are backed up with simulation results. These algorithms are important for distributed services and systems. Strengths: (What are the most important reasons to accept this paper? (1-3 sentences)) The alorithms presented in the paper achieve good properties in a distributed manner that approach the properties of a centralized scheduler. This is useful to achieve in today's distributed service environments. The theoretical results seem to be correct, and are also backed up by simulation results. The paper is well-written and results well-explained intuitively. Weaknesses: (What are the most important reasons NOT to accept this paper? (1-3 sentences).) Some of the asaumptions in the simulation based on theoretical assumptions could have been relaxed. A real implementation even if at a small scale would have been more convincing, esp. regarding the asynchronous update issue, and in terms of overhead. Detailed Comment to the Authors (Please provide detailed comments that can be used by the authors to improve this paper) The results seem to be interesting and useful for distributed rate control. The theoretical results seem to be correct, and the simulation results also seem to back up the main claims. While the results do hold up for the assumptions made in the theoretical framework, it would have been nice to relax some of the assumptions in the results to see how well they hold up. In particular, many of the issues discussed in Sec 5 should have been added to the simulations. I found the discussion on node failuers a little too simplistic. What is the guarantee that a best friend can pick up the slack capacity of a failed node? Also, while not required for this paper, a real implementation would be useful as well. There are several typos in the paper that need to be fixed: - Pg 1: enterprizes->enterprises - Sec 2: "We wise"? - Sec 3, Eqn at bottom of Pg 7: RHS should be \sum_{s=1}... - Comment 5: "since it that case" -> "since in that case" - In 4.1, Table references to Tbl 1 and 2 appear as Tbl 4.1. - Pg 11: "drown"->"drawn" Review 3 Reviewer Originality Technical Merit Readability Relevance Reviewer Confidence Top 50%, but not top 25% (2) Top 10% (4) Top 50%, but not top 25% (2) Top 10% (4) Informed outsider (2) Short summary assessment (What are the main contributions of this paper? Do you consider the issues addressed important and/or interesting? Comment on novelty, creativity, impact and technical depth(1-5 sentences)) The paper presents two algorithms, termed C3P and D2R2, to control network usage in cloud-based systems. The paper shows that existing algorithms (GRD/FPS) recently proposed at SIGCOMM07 can be significantly improved while preserving simplicity and yet adding analytical tractability and provable convergence. Compared to previous work, numerical results for C3P and D2R2 show quite similar performance, but much reduced loss rate (C3P) or much better fairness (D2R2). Overall, the paper makes a convincing, technical and yet practical contribution that would be valuable for SIGMETRICS 2008. Strengths: (What are the most important reasons to accept this paper? (1-3 sentences)) - The proposed methods are analytically tractable, practical and easy to implement - The results on loss rate and fairness are quite good compared to GRD/FPS - Important technical issues such as stability/convergence are addressed Weaknesses: (What are the most important reasons NOT to accept this paper? (1-3 sentences).) - The performance of C3P/D2R2 does not increase too much compared to GRD/FPS - The experimental part is quite small: representativeness of experiments is claimed but not proved (e.g., with sensitivity analyses or random experiments) - The paper requires some effort to read Detailed Comment to the Authors (Please provide detailed comments that can be used by the authors to improve this paper) The paper has several merits and some demerits, but overall is a good technical contribution and I believe has sufficient merit for acceptance. From one side, it is nice to have a work on a topic (cloud-based services) that is new and emerging and still lacks of analytical evaluations as those provided in the paper. On the negative side, there is a certain fear in the reviewer that the author is applying some standard distributed algorithm to a problem that is fancy, but not technically different from other analyses of distributed environments, I think you should better pointing out the aspect of novelty of your technical part with respect to existing schemes for usage control in distributed/nertwork environments. These doubts motivate the low score I placed on the originality, where I also accounted the fact that the authors are starting from a previous comprehensive analysis in [24], and therefore do not seem to provide a completely original problem as it is evident in the fact that C3P and D2R2 (which recall well-known "star wars" names, but this is nice) seem to be correspective of GRD and FPS, respectively. The techincal part is well-done and it is nice to see how a simple algorithms yields the nice technical derivations in the main theorems. The authors had the good idea of keeping complexity at the minimum level to preserve analytical tractability and interest. As a final remark, the validation shows clearly that C3P and D2R2 are better than GRD and FPS in terms of loss rates or fairness, but perform quite similarly. The numerical experiments seem the weak part of the paper, since limited to really few experiments and, as I explain later in my review, still lacking of certain information on the parameters. I would have preferred to see more experiments and a sensitivity analysis with respect to the model parameters, but overall I feel quite convinced by your validation and I believe that, despite this important lacks, the paper is acceptable for publication. Detailed comments on the individual sections are given below. **Detailed remarks * Section 1 The introduction has the merit of pointing out clearly the scope and the practical relevance of the work, but in my opinion does not fully do its task of introducing to the technical part. Here I personally feel the lack of a figure that sketches the structure of a cloud-based service and of the DRL mechanisms. Further, the fairness postulate expressed in this way where does it come from? Cite [24] if you took it from there. The subsection 1.1 is not very well written as well, a table with a summary of notation would help, further many things are only partially defined. Just to make some examples: - {\cal F}_i is a "flow population" which is unclear what is it, is it a number of jobs, or a rate, or what else? - C_i is a portion of the aggregate bandwidth, but it is introduced as a "level" which is too generic. This does not help the reader. - u_s^(i) I may guess that is a demand, but is not defined precisely I frankly suggest the authors to fix, possibly rewrite, this subsection in a much more careful way. My feeling as a reader is that you put a barrier here to the real understanding of the model. Equation (3) seem to come out of nowhere as well, please explain better why (2) would imply (3). The footnote 3 is not completely clarifying, why you should name a maximum throughput "fair-share"? Section 1.2 is fine to me. In Section 1.3 it is unclear if 50 limiters is a number that may be reached in practice, is a limiter uniquely associate with each single farm? Can you point to data or papers which clearly indicate that real infrastructures can need much more than 50 limiters? *Section 2 This is a pretty nice section and it is rather interesting to see how C3P develops very well in formulas. I have two recommendations here: first you should better explain the role and the optimal choice of the parameters \eta . Theorem 1 is claryfying, but it comes too late in my opinion, you should first hint at the meaning of this paramters early in the section. The second passage in the proof of Lemma 1 is also not very clear, I have the impression that you are basically saying that all contributions over the different arcs of the graph eventually sum to zero, but the way you remove the summation on i is not very clear. Please add a comment. The proof of Theorem 1 is really nice. As a minor comment, I would first define lambda as lambda(t)=M(t)-p_i(t) and also carry on the lambda(t) notation instead of lambda which may raise the suspect that is a constant. Can you also provide a proper reference for Comment 1? This may be an easy comment but it is not immediately evident to a non-technical reader. As a comment that holds for this section but also for the rest of the paper, I frequently noted a difficulty in matching names and verbs.. e.g., just before (18) should be "Then if \eta satisfies" and not satisfy. I invite the authors to carefully recheck this type of error which I frequently found throughout the text. In Section 2.2 you should better explain and provide reference for the AIMD environments. How does this restriction affect the statement in Theorem 3, what if the environment is not AIMD? A simulation experiment at the end of the paper or qualitative comments would clarify the reader. *Setion 3 Reading the introduction to D2R2, I think that your C3P introduction should move in this direction. For instance, alpha and eta are properly commented here. In the proof of Theorem 4, expression of B(t) there is a beta that seem to be a typo, please correct it. Further, saying that entries are greater than delta does not tell me that a matrix is stochastic, so the sentence "Thus B(t) is a stochastic matrix" should rephrased with proper comments. The part between Theorem 4 and Theorem 5 is not particurlarly interesting without a text comment, how does the fact that "v^alpha <= v^*" may help in pratice? You give Theorem 5 to further refine the information, but you should better build intution in the reader on the practical importance of knowing the value of the fair shares. *Section 4 You say at the beginning of the section that 1% loss rate have negligible effects, but it would be more informative to know for which threshold of loss rate I do see an effect. Is it 2% or 5%? In Section 4.1 I would add a table which summarizes the parameters used in experiments. Further, there is really little evidence to convince that these parameters have not been chosen by the authors to give a lucky, although artificial, case study where C3P and D2R2 are better than GRd and FPS. Given that I feel that this is serious work, I think I can trust the authors, but in general the idea of using a single validating example depending on 10 parameters it makes me unconfortable and reduces work credibility. If I were you, I would try to compact space somewhere in the paper and do a small subsection here that at least gives intuition on how the result depend on the parameters in play. In addition, I didn't find a description of the demands you have simulated, this is a very negative lack. The results in Table 1 are presented to stress a 10% gap, but in my opinion this is not very convincing. From what I see, more or less the methods perform similarly, except at most for GRD, and therefore I would point out much more clearly that the main advantage of D2R3 and C3P is visible on other indexes such as loss rate or fariness. Concerning the latter, the use of JFI is OK to me because I am aware that is a index easily found in statistic textbook, but I would have preferred to have some comments saying something like "this is a standard index to assess fairness" with a citation. Another remark concerns the quality of the figures, it seems to me that you are strecthing in some nasty way the plots using some latesx comments, with the result of having poorly readable text in the figures. For instance, reading "Loss rate" on the y-axis of Figure 4 requires a good pair of glasses. Also Figure 5 seem to have simlar problems. In the last sentence of Section 4 you mention that for a certain value of eta, the convergence seem to depend on the traffic mix distributions. I have the impression that in the paper there is need of a more systematic treatment of the selection of eta: given that this is a degree of freedom in the implementation of the algorithms, how would you select it? *Section 5 This seem a good complement part, but maybe you should put it in appendix. Review 4 Reviewer Originality Technical Merit Readability Relevance Reviewer Confidence Top 25%, but not top 10% (3) Top 25%, but not top 10% (3) Top 25%, but not top 10% (3) Top 25%, but not top 10% (3) Informed outsider (2) Short summary assessment (What are the main contributions of this paper? Do you consider the issues addressed important and/or interesting? Comment on novelty, creativity, impact and technical depth(1-5 sentences)) This paper addresses the issue of designing an a decentralized algorithm for rate limiters. Previous distributed rate limiter (DRL) algorithms (GRD and FPS) have high communication overhead due to broadcast gossip. The proposed algorithms in the paper (C3P and D2R2) are based on immediate neighbor information exchange. Theoretical proofs of convergence are provided. Empirical tests and comparisons are given. Strengths: (What are the most important reasons to accept this paper? (1-3 sentences)) - Improvement on existing DRL algorithms. - Simple decentralized algorithms, based on neighbors state and linear filtering. - Analytic foundation is established. Weaknesses: (What are the most important reasons NOT to accept this paper? (1-3 sentences).) - Not clear how the proposed algorithms will behave in a highly dynamic environment. - Only a ring connectivity model is used in the evaluation. Detailed Comment to the Authors (Please provide detailed comments that can be used by the authors to improve this paper) - C3P and D2R2 have a flavor of stochastic approximation and linear filtering, respectively. It might be worthwhile to investigate their behavior given that sense. - Have you considered other connectivity models (random planar graphs, small world graphs, ...) - Might be useful if you drive the algorithms with realistic, dynamic traffic data.