Make It Green
2008
When I was a little boy, my father built a jungle gym in our back yard. To an adult, it was a cylindrical arrangement of pipes and fittings. To my friends and me, it was a spaceship, a submarine, a helicopter, the site of a thousand and one adventures: a vivid example of the power of a child’s mind to transform one thing to another simply by saying it is so.
This flexible notion of reality is called “imagination” when we see it in children. When adults adopt it, it is called “delusion” or “buncombe.” The latter is an accurate characterization of much of what I’ve heard from vendors on “green computing.” Sure, some systems can lay legitimate claim to being energy efficient, but far too many trumpet achievements of dubious merit.
I have pointed out some of these before. I’m especially irked by otherwise responsible vendors who shout loud and often about how in-cabinet water cooling jackets shrink the cost of the cooling system by up to 30%. Such measures decrease total compute energy consumption by 30% of the cooling cost (which is about 1/3 of the total energy), so the net improvement is closer to 10%. Yes, we should be building better and more efficient cooling plants, but concentrating energy efficiency efforts on cooling is too little, too late: we should be reducing the amount of wasted power in the computing devices themselves. Moving the heat around is not nearly as effective as not wasting it in the first place.
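The arithmetic behind that 10% figure is easy to check. The sketch below is only an illustration, assuming (as the paragraph above does) that cooling accounts for about one third of total energy; the function name `net_saving` is made up for this example.

```python
def net_saving(cooling_share, cooling_reduction):
    """Fraction of total energy saved when only the cooling portion shrinks."""
    return cooling_share * cooling_reduction


# Figures from the text: cooling is roughly 1/3 of total energy,
# and the vendor claim is a cut of up to 30% in cooling cost.
saving = net_saving(cooling_share=1 / 3, cooling_reduction=0.30)
print(f"Net saving on total energy: {saving:.0%}")
```

Varying `cooling_share` shows how strongly the headline number depends on that one-third assumption.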
It is truly puzzling to hear rational, respected, creative industry leaders present such a weak case for their green initiatives: they know better. IBM BlueGene systems have shown the superiority of small, relatively slow, simple processors versus the bloated high-clock-rate, complex desktop processors when it comes to delivered computes per watt. (And, of course, SiCortex has taken this a few steps further along.)
But the problem does not lie entirely with the vendors: delusional talk has too often found an uncritical audience. It is time that the high performance and high processor count community developed a set of rational metrics that can cut through the marketing talk and truly gauge the power efficiency of a system. Up to now, we’ve used Linpack FLOPS per Watt as a criterion. But HPC is more than just multiplying two large matrices together: the user community is much broader than that. We’ve seen, of late, absurd claims for power efficiency by systems whose primary purpose is matrix multiplication. Yes, these are the most power efficient DGEMM implementations, but claiming the green flag for a multi-megawatt behemoth should not rest on a single function.
As I’ve pointed out elsewhere, time to solution can be viewed as the combination of four components: time to complete the arithmetic calculations, time spent moving data to and from memory, time spent communicating between processors, and time spent on I/O. Few universally credible benchmarks exist for the last of these: the best tend to be financial in nature -- to wit, how much money did you spend on disk spindles? The first three, however, are routinely measured, and I believe the HPCC benchmark suite has good metrics for each.
I’m willing to admit that the HPCC Linpack measure is as good as we’re going to get at estimating the capability of a processor to do arithmetic. Let’s use that to measure the arithmetic component, T_arith. Further, I think John McCalpin was right on target with the Stream TRIAD metric: we should use that to measure the memory transfer capability of a machine. Finally, the communications aspect is best measured -- in my view -- by the PTRANS (parallel matrix transposition) bandwidth. The other communications measures are too specific and too removed from communication operations that are both useful and general.
So I propose that we measure green-ness by a linear combination of the Linpack, embarrassingly parallel Stream TRIAD, and PTRANS scores as reported by the HPCC suite. Divide each of these scores by the wall power consumed by the system. Now normalize each with respect to a well balanced system. I propose here the Cray XT3: its single processor Opteron was very well balanced with respect to its excellent memory bandwidth and world-class communication hardware. Effective high-processor-count systems should be measured against this venerable and well balanced solution. The HPCC database has a report filed for an 1100-processor, four-rack XT3. Assuming the datasheet dissipation of 15 kW per rack (60 kW for the four racks), the standard normal system power metrics are:
• Linpack MFLOPS/Watt: 52.4
• Stream TRIAD MBytes/Second/Watt: 57.3
• Global PTRANS MBytes/Second/Watt: 2.81
So, if we choose the XT3 as the normalizing factor, we obtain a relative weighting for each of the three metrics. For those who absolutely need a single number to characterize a system, add the three measures together and divide by three.
How would this work? Green claims would be validated against just these three metrics. Take the system’s Linpack MFLOPS/Watt and divide by 52.4. Do the same for the Stream and PTRANS benchmarks, dividing by 57.3 and 2.81, respectively. Now add the three results and divide by three. This is the system’s green score. Here’s what happened when I did this experiment for three well known systems:

That feels much better than relying on a single metric. The spreadsheet that generated the metric is available here: power_perf.xls. Note that I extrapolated some of the power measurements from other reports, as power numbers are rather hard to come by. I’d appreciate corrections. (Note to those who read this via an RSS feed: for reasons I can’t explain, some readers miss the images and graphs, like the one above. See if you can go right to the BigNComputing website.)
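For readers who would rather not open a spreadsheet, the scoring recipe described earlier fits in a short function. This is a minimal sketch, not the spreadsheet’s actual formulae: the names `XT3_BASELINE` and `green_score` are invented here, and the “hypothetical” system is made-up input chosen to make the arithmetic easy to follow, not a measurement of any real machine.

```python
# XT3 per-watt baseline figures quoted in the text.
XT3_BASELINE = {
    "linpack_mflops_per_watt": 52.4,
    "stream_triad_mbytes_per_sec_per_watt": 57.3,
    "ptrans_mbytes_per_sec_per_watt": 2.81,
}


def green_score(per_watt, baseline=XT3_BASELINE):
    """Average of the three per-watt metrics, each normalized to the baseline."""
    ratios = [per_watt[name] / baseline[name] for name in baseline]
    return sum(ratios) / len(ratios)


# The baseline system scores 1.0 against itself by construction.
print(green_score(XT3_BASELINE))

# Hypothetical (invented) system: twice the XT3 on Linpack per watt,
# equal on Stream TRIAD, half on PTRANS.
hypothetical = {
    "linpack_mflops_per_watt": 104.8,
    "stream_triad_mbytes_per_sec_per_watt": 57.3,
    "ptrans_mbytes_per_sec_per_watt": 1.405,
}
print(round(green_score(hypothetical), 3))
```

A score above 1.0 means the system does better per watt than the XT3 on average, but the three ratios are worth reporting individually as well, since a strong Linpack ratio can mask a weak PTRANS ratio in the average.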
Brighter minds than mine have proposed using the entire HPCC benchmark suite rather than just three scores. This is a case where brighter minds are wrong. From the mathematical point of view, I’m more comfortable with a basis set of three components, as I’m pretty sure that they are, by and large, orthogonal to each other. The complete HPCC suite has far too much redundancy (or non-orthogonality). But, more importantly, quantifying against the larger suite offers many opportunities for manipulation and mischief. In particular, clever vendors and marketing departments will quickly figure out that they can game the measurement by building a system that is absurdly heavy in a few communication related metrics. Keeping the set to just three components allows us to wrap our minds around the whole picture. Simplicity, simplicity, simplicity.
We should also consider whether cooling power should be factored in. I don’t have a dog in that hunt, but I will say that measuring the burden on the cooling system is a very difficult proposition. In any case, the power measurement should at least include wall power delivered to the entire system (including switches), not chip power or unit power.
This seems like an improvement over our current approach of dividing Linpack score by power. That approach was useful, as it raised awareness of the issue. But now the community needs and deserves better. We have an opportunity to set a new tone in the green computing discussion.
Childhood and Green Computing
7/24/08
The twinkies bulge with
durable stuff they call “creme”
unknown to real cows.
(“Twinkies” is a registered trademark of the Interstate Bakeries Corporation.)