How Green is Green?
2008
Confronted with an array of breakfast options ranging from “Synthetic Plasti-Puffs” to “Shredded Fiber Bricks,” most of us resort to habit or random choice. But standard nutrition content labeling allows a diligent consumer to bring information to bear on the choice.
Of course we would devote more care to selecting a new computing system than we expend on our breakfast choice, but it is unfortunate that in one important respect we have far more data to inform the latter task than the former.
Wu Feng and others have been working to remedy that situation. The establishment of the Green500 list elevated performance per watt to the system level. Though many, Wu included, have acknowledged the choice of Linpack FLOPS per Watt was overly simplistic, the list was a step in the right direction.
Now it is time to take the next steps. Wu will be hosting a birds-of-a-feather gathering at SC08 to continue the discussion that has been going on informally up to now. It is good and right that he and his colleagues in the user community are driving this effort: we need a rational set of metrics to keep vendors honest. There has already been too much green paint applied to obviously brown designs.
I learned a long time ago that the key to effective design is being able to recognize a good solution and a bad solution. So it is with the design of benchmarks and metrics. What would constitute a good “green” metric?
Multiple Constituencies
First, the metric must address multiple levels in the political decision tree. In some form it must allow comparison between two systems at the CIO level (where the numbers most probably take the form of a scalar). At the same time the metric must be usable by problem domain experts: it should be possible to answer the question “how efficient is this system likely to be for my particular application?” That suggests a vector of measures, each of which can be independently weighted (to form a composite score) or isolated (to allow feature-by-feature comparisons among multiple systems). So the green score should be a vector whose magnitude can be taken and has meaning. (In particular, larger magnitude should be “better.”)
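A minimal sketch of the vector idea, assuming each component has already been reduced to a dimensionless "larger is better" efficiency score. The component names, the numbers, and the helper functions here are all hypothetical, chosen only to illustrate weighting, isolating, and taking a magnitude.

```python
import math

def weighted(vector, weights):
    """Scale each component by its weight; a missing weight defaults to 1.0."""
    return {name: weights.get(name, 1.0) * value for name, value in vector.items()}

def magnitude(vector):
    """Euclidean length of the vector: one scalar for a CIO-level comparison."""
    return math.sqrt(sum(value ** 2 for value in vector.values()))

def better_on(a, b):
    """Feature-by-feature comparison: components where system a beats system b."""
    return sorted(name for name in a if a[name] > b[name])

# Hypothetical, dimensionless efficiency scores (larger is better).
system_a = {"arithmetic": 3.0, "memory": 4.0, "communication": 0.0}
system_b = {"arithmetic": 2.0, "memory": 2.0, "communication": 1.0}

# A domain expert with a memory-bound application might weight only memory.
memory_bound = {"arithmetic": 0.0, "memory": 1.0, "communication": 0.0}

print(magnitude(system_a), magnitude(system_b))      # unweighted scalars
print(magnitude(weighted(system_a, memory_bound)))   # application-specific scalar
print(better_on(system_a, system_b))                 # isolated components
```

The same vector serves both audiences: the unweighted magnitude gives the single number, while the weights and the per-component comparison let an application expert ask a narrower question.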
Meaningful Measures
I’ve made some suggestions elsewhere that components of the HPC Challenge suite form a good set of basis vectors to characterize power efficiency. HPCC has the advantage that it is widely accepted, and can characterize arithmetic, memory, cache, and communication performance at the system level. Further, many developers can use HPCC measures to estimate performance on their own applications or to drive the design of new applications. Whatever we use for a “green-ness” measure, it should have this key property.
Game Proof vs. Good Behavior
Eventually every benchmark gets perverted or worked around by designers who optimize for the benchmark independent of actual utility. As this is inevitable, the benchmark suite should encourage the construction of useful, well-balanced systems. SPEC CPU attempted to do this by using the geometric mean of all the component scores to create the composite score. This had the property of “discarding outliers” so that a processor vendor couldn’t reach spectacular heights by building a one-trick pony. (For instance, if SPEC FP had contained a program that spent all its time calculating Hankel functions, we might have seen hardware implementations of H_a^(1)(x), were it not for the fact that the geometric mean moderates the influence of any single benchmark component.)
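To see why the geometric mean discourages a one-trick pony, compare it with the arithmetic mean on two made-up score sets. This is only an illustration with hypothetical numbers, not SPEC's actual procedure or data. Strictly speaking, the geometric mean dampens an outlier rather than discarding it.

```python
import math

def geometric_mean(scores):
    """n-th root of the product, computed via logs; scores must be positive."""
    return math.exp(sum(math.log(s) for s in scores) / len(scores))

def arithmetic_mean(scores):
    """Plain average, shown only for contrast."""
    return sum(scores) / len(scores)

# Hypothetical component scores relative to some baseline.
balanced = [2.0, 2.0, 2.0, 2.0]     # twice the baseline on everything
one_trick = [1.0, 1.0, 1.0, 16.0]   # 16x on one component, baseline elsewhere

for name, scores in (("balanced", balanced), ("one_trick", one_trick)):
    print(name, arithmetic_mean(scores), round(geometric_mean(scores), 6))
```

Under the arithmetic mean the one-trick design looks more than twice as good as the balanced one (4.75 against 2.0). Under the geometric mean the two tie at 2.0, so a 16x gain on one component buys no more than a modest 2x gain everywhere.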
The desire to reduce a set of numbers to a single scalar suggests a weighting function must be applied to the set of measures. SPEC CPU chose a “normative” processor standard for each generation, and this is likely to be useful for green comparisons as well. Unfortunately, choosing a norm is harder than it looks, as the normalization inherently creates a weighting function: the normal machine implicitly creates a model of “good.” As a result, new designs that are influenced by their composite green score could end up looking like the normative machine. Therefore, it is important that the norm should look like a machine that we’d want to be programming in a few years. So, if inter-processor communication is important, the norm should not use Ethernet or twisted string for its inter-node fabric. Similarly, if memory bandwidth is important, the norm should have a good balance between calculation speed (say DGEMM) and memory performance (as in STREAM Triad, for instance). A good benchmark should encourage good system design.
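A rough sketch of how a normative machine turns raw measurements into a composite, assuming per-watt scores are divided by the norm's per-watt scores and then combined with a geometric mean. The machines, the measure names, the power figures, and every number below are invented for illustration; none of them are real HPCC results.

```python
import math

def per_watt(scores, watts):
    """Divide each raw benchmark score by the system's rated power."""
    return {name: value / watts for name, value in scores.items()}

def ratios_to_norm(system, norm):
    """Express each per-watt score as a multiple of the normative machine's."""
    return {name: system[name] / norm[name] for name in norm}

def composite(ratios):
    """Geometric mean of the normalized ratios; the norm itself scores 1.0."""
    values = list(ratios.values())
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical machines: raw scores and rated power in watts.
norm = per_watt({"dgemm_gflops": 100.0, "stream_triad_gbs": 50.0}, watts=1000.0)
candidate = per_watt({"dgemm_gflops": 400.0, "stream_triad_gbs": 50.0}, watts=1000.0)

ratios = ratios_to_norm(candidate, norm)
print(ratios)             # about 4x the norm on DGEMM, 1x on STREAM Triad
print(composite(ratios))  # about 2.0
```

Because every ratio is taken against the norm, the norm's own balance between DGEMM and STREAM Triad silently becomes the definition of a "balanced" machine, which is why the choice of norm matters so much.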
I am sure that there are other considerations, and the community should talk about them. SiCortex is proposing a measurement and analysis methodology on a new website dedicated to exploring this issue. It includes a calculator that allows head-to-head comparison of systems using their HPCC reported scores and rated power. Check it out there and join in the discussion on their blog. (I’ll see you over there when I change into my other hat. If the website isn’t active yet, check back in a day or two. We’re attempting a simultaneous release of this posting and the new site, but the chicken/egg race is inevitable.)
Wu Feng will be hosting a BoF gathering at SC08 to bring some structure to the discussion he has been leading for several years. It should make for a very interesting afternoon.

On a personal note, it has been several months since the last posting. My wife and I spent a busy summer and fall preparing for the arrival of our son, Benjamin. We are grateful to our friends and colleagues who have been so supportive in our new adventure.
For the record, the cereal of choice in the Reilly household is Cheerios. It is the universal toddler appeasement food, and is easy to remove from seat covers and carpets.
matt

Cereal Computing
11/5/08
Faced with a staggering array of choices and sensory overload, how are we to make an over-informed decision?
The rise of “Green Computing” adds yet another dimension to consider in the choice of a new computer system. As yet, there are few rational metrics to inform the decision process.