Review 1
Recommendation
A = (top 10% of reviewer's perception of all INFOCOM submissions, but not top 5%) (5)
Contributions (What are the major issues addressed in the paper? Do you consider them important? Comment on the novelty, creativity, impact, and technical depth in the paper.)
This paper addresses load distribution across networks of processors in the context of "cloud" computing for large Internet services. The basic idea is that service demand is divided among multiple processor clusters to minimize energy consumption (and, more generally, energy cost), based on a non-linear relationship between load/processor use and power. The authors first define a load distribution method for a single cluster with N processors and prove its optimality. They then devise distributed synchronous and asynchronous methods for dividing load among multiple server clusters, again proving the optimality of these approaches. Although load distribution itself is not a new idea compared with other load balancing problems, the context of the problem adds some novelty, and the non-linear relationship between load and power, together with the related proofs, adds technical depth to the paper.
Strengths (What are the major reasons to accept the paper? [Be brief.])
The introduction motivates the problem very well. The use of real energy prices adds a realistic element to the paper. The related works section covers a broad range of related topics.
Weaknesses (What are the major reasons NOT to accept the paper? [Be brief.])
Nothing significant.
Detailed Comments (Please provide detailed comments that will help the TPC assess the paper and help provide feedback to the authors.)
The paper presents a distributed dynamic load balancing scheme for data centers. The objective is to minimize total energy cost by dynamically allocating load to different data centers according to their (different) energy costs. One key observation is used: at the optimal point, the loads at all distributed locations are such that the derivative of the modified cost function is the same everywhere. Using this observation, a distributed load-balancing scheme is proposed and its convergence properties are analyzed. The paper is very well motivated, with a good set of real data. The assumptions made in the paper are well justified. The related work is discussed clearly and with good insight. The paper presents a nice, clean technical approach. It also connects its solution to two other problems that are unrelated in topic but share the same technical insight. The authors state that the paper's main technical contribution is the non-trivial convergence results for IDC and AIDC. However, the proofs are entirely omitted from the paper (they are included in an online document). Given their importance, it would be good if the authors could include at least a sketch of the proofs in the paper. In the numerical results (Figure 6), there is a region where the cost reduction increases slightly with the load. This needs better justification; otherwise, it raises concerns about the validity of the numerical results.
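The key observation above (equal marginal cost at the optimum) can be illustrated with a small centralized sketch. It assumes a hypothetical cost of p_i * x_i**alpha at site i (electricity price times a convex speed-scaling power curve, alpha > 1), which is not necessarily the paper's modified cost function, and it finds the common marginal cost by bisection rather than with the paper's IDC/AIDC iterations.

```python
"""Sketch: split a total load across sites so that marginal costs are equal.

Hypothetical cost model (an assumption, not the paper's): site i costs
prices[i] * x_i**alpha with alpha > 1.
"""


def marginal_cost(price: float, load: float, alpha: float) -> float:
    # d/dx [price * x**alpha] = alpha * price * x**(alpha - 1)
    return alpha * price * load ** (alpha - 1)


def load_at_marginal(price: float, lam: float, alpha: float) -> float:
    # Invert the marginal cost: the load x >= 0 at which marginal_cost == lam.
    return (lam / (alpha * price)) ** (1.0 / (alpha - 1))


def equal_marginal_split(prices, total_load, alpha=2.0, iters=200):
    """Bisection on the common marginal cost lam until the loads sum to total_load."""
    lo, hi = 0.0, 1.0
    # Grow the upper bracket until the implied loads cover the demand.
    while sum(load_at_marginal(p, hi, alpha) for p in prices) < total_load:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(load_at_marginal(p, mid, alpha) for p in prices) < total_load:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return [load_at_marginal(p, lam, alpha) for p in prices]
```

For prices [1.0, 2.0, 4.0], a total load of 7.0 and alpha = 2, the split converges to loads of about [4.0, 2.0, 1.0], where every site's marginal cost equals 8. Because the assumed marginal cost is zero at zero load, every site receives some load; a cost with a non-zero marginal cost at zero load would also need the x_i >= 0 boundary handled explicitly.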
Review 2
Recommendation
B+ = (top 20% of reviewer's perception of all INFOCOM submissions, but not top 10%) (4)
Contributions (What are the major issues addressed in the paper? Do you consider them important? Comment on the novelty, creativity, impact, and technical depth in the paper.)
This paper addresses the interesting problem of allocating demand across a network of processors to minimize global energy consumption subject to a performance constraint.
Strengths (What are the major reasons to accept the paper? [Be brief.])
It proposes decentralized algorithms to solve the problem and proves their convergence. The paper is written in a coherent way. The problem is clearly introduced and well formulated. The proposed algorithms are simple and promising. The simulation results help demonstrate and evaluate the proposed solutions.
Weaknesses (What are the major reasons NOT to accept the paper? [Be brief.])
The formulation of the optimization problem: although convex objective functions are easy to formulate and solve, it is not obvious that the energy consumption function is convex, so it would be interesting to explore other objective functions.
Detailed Comments (Please provide detailed comments that will help the TPC assess the paper and help provide feedback to the authors.)
There are several possible directions for extending the paper:
1. A logarithmic barrier is introduced in the objective function to avoid dealing with boundary constraints. However, this introduces error into the optimal solution, and it would be more convincing to deal with the original optimization problem directly.
2. The algorithms require pairwise communication. It would be important to explore the effect of communication delay, especially heterogeneous delay, on the performance of the algorithms.
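To make the pairwise-communication point concrete, here is a minimal gossip-style sketch of such exchanges. It assumes a hypothetical cost of p_i * x**alpha per site, and the function name pairwise_balance is illustrative only; it is not the paper's IDC/AIDC. It also assumes instantaneous, reliable messages, which is exactly the assumption the comment above asks the authors to relax.

```python
import random


def pairwise_balance(loads, prices, alpha=2.0, rounds=2000, seed=0):
    """Randomized pairwise load exchanges between sites (hypothetical sketch).

    Assumed cost model: site i costs prices[i] * x**alpha with alpha > 1.
    When two sites talk, they re-split their combined load so that their
    marginal costs match; for this model, that means x_i is proportional to
    prices[i] ** (-1 / (alpha - 1)). Messages are assumed instantaneous.
    """
    rng = random.Random(seed)
    x = list(loads)
    w = [p ** (-1.0 / (alpha - 1)) for p in prices]
    n = len(x)
    for _ in range(rounds):
        i, j = rng.sample(range(n), 2)  # pick two distinct sites
        pooled = x[i] + x[j]  # each exchange conserves the pair's total load
        x[i] = pooled * w[i] / (w[i] + w[j])
        x[j] = pooled * w[j] / (w[i] + w[j])
    return x
```

Starting from an even split of 7.0 over three sites with prices [1.0, 2.0, 4.0], the loads approach [4.0, 2.0, 1.0], the same equal-marginal-cost point. Modelling delay would mean letting each exchange act on stale values of x[i] and x[j], which this sketch does not do.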
Review 3
Recommendation
C = (top 50% of reviewer's perception of all INFOCOM submissions, but not top 30%) - weak reject (2)
Contributions (What are the major issues addressed in the paper? Do you consider them important? Comment on the novelty, creativity, impact, and technical depth in the paper.)
The paper addresses the problem of optimizing power consumption in a distributed data center, where each site has a different power cost and the goal is to distribute the aggregate demand across the sites so as to minimize cost. The problem is important given the growing cost concern that power poses for data centers today. The paper's main contribution is the formulation of a nonlinear optimization problem that optimizes power under a given speed scaling model, i.e., the power required to support a given request rate. However, the validation is based on simulations using mostly synthetic data, and it is not clear that the problem model is realistic enough.
Strengths (What are the major reasons to accept the paper? [Be brief.])
Power cost optimization in distributed data centers is an interesting and important problem. The problem model is theoretically interesting, and the conditions for convergence of the algorithm are formalized.
Weaknesses (What are the major reasons NOT to accept the paper? [Be brief.])
Simplistic problem model: e.g., the bandwidth costs of assigning jobs to different locations, message-passing delays, and the granularity at which jobs can be split are all ignored. The simulation evaluation uses a largely synthetic workload. It is unclear to what extent the claimed savings will hold in reality.
Detailed Comments (Please provide detailed comments that will help the TPC assess the paper and help provide feedback to the authors.)
The proposed problem model is theoretically somewhat interesting, but the model and evaluation seem too simplistic. The paper's results are sensitive to the speed scaling model, which is not validated in the paper. The references point either to theory papers or to the single-processor model of an INFOCOM'09 paper [45]. Applying this model to a data center cluster while ignoring bandwidth and I/O costs, as well as delay, seems too simplistic. The request workload used in the evaluation is not really described. The proposed solution is not compared with prior solutions, only with a naive randomized load balancer. Also, it is unclear in Figure 6 why the curve for a higher amortization cost of gamma = 0.5 yields more cost reduction than gamma = 0.1. Intuition suggests the opposite, i.e., power savings should go down as fixed idle power costs increase. The savings of almost 50% under low load relative to the randomized load balancer, even with gamma = 0.5, seem rather high, and I am not able to understand why random load balancing performs so poorly given that the cost is a convex function of the load. Many other fixed costs of running web servers, database servers, etc., are not accounted for. The number of processors is assumed to be fixed, whereas in practice processors can be shut down completely; the model does not account for this. The authors may want to look at the following recent related work: “Cutting the Electric Bill for Internet-Scale Systems,” A. Qureshi, R. Weber, H. Balakrishnan, J. Guttag and B. Maggs, SIGCOMM 2009.
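The questions about the randomized baseline and the idle cost can be explored with a back-of-the-envelope calculation. The sketch assumes a pure power-law cost p_i * x**alpha, approximates a uniformly random balancer by its expected (even) split, and models idle power as a hypothetical fixed cost per site (idle_cost); none of this is taken from the paper, and idle_cost is not the paper's gamma.

```python
def optimal_cost(prices, total_load, alpha=2.0):
    # Equal-marginal-cost split: x_i proportional to w_i = p_i**(-1/(alpha-1)),
    # which gives the closed-form minimum total_load**alpha * W**(1 - alpha).
    W = sum(p ** (-1.0 / (alpha - 1)) for p in prices)
    return total_load ** alpha * W ** (1 - alpha)


def equal_split_cost(prices, total_load, alpha=2.0):
    # Expected per-site load under uniform random assignment is total_load / N.
    # Only the expected load is used; load fluctuations are ignored.
    n = len(prices)
    return sum(p * (total_load / n) ** alpha for p in prices)


def savings_fraction(prices, total_load, alpha=2.0, idle_cost=0.0):
    # idle_cost: hypothetical fixed cost per site, paid under both policies.
    fixed = idle_cost * len(prices)
    opt = optimal_cost(prices, total_load, alpha) + fixed
    rnd = equal_split_cost(prices, total_load, alpha) + fixed
    return 1.0 - opt / rnd
```

Under these assumptions, prices [1.0, 2.0, 4.0] with alpha = 2 give a saving of 13/49 (about 27%) at any load, because both costs scale as total_load**alpha. Adding a fixed per-site cost lowers the saving, and lowers it most at low load, which matches the intuition stated above. Because the cost is convex, the random balancer's true expected cost is higher than the even-split figure used here (Jensen's inequality), so this sketch understates its gap.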