On with the show!
2008
Most of Monday was taken up with booth preparation and final touch-ups to demos. The show officially opened last night with munchies and fanfare.
During a lull I dropped by the Convey booth in the early afternoon while they were still setting up. The team is justifiably proud of what they’ve done. (Last week I got a chance to talk with Steve Wallach and to see the sneak preview he gave at the Society of Exploration Geophysicists.)
So what IS the new Convey machine about? Here’s my take:
The website (http://www.conveycomputer.com/) says it best: "First, we start with a high performance memory system." The basic concept behind the architecture is to build a "coprocessor" and attach it to a standard Intel x86 desktop processor.
A system consists of an x86 socket with a link to a second module that houses roughly 10 Xilinx FPGAs. Several of the FPGAs are connected to banks of memory DIMMs that look to be fully buffered (I saw an "extra" non-DRAM chip on each DIMM). Steve said that the total power dissipation was "400 watts for the x86 processor and 400 watts for the coprocessor."
In operation, an application is compiled with a C or FORTRAN compiler based on the (heavily modified) Open64 compiler suite. The compiler generates code for the x86 and looks for opportunities to move functions from the x86 into the coprocessor. When it finds these, it generates code for the coprocessor that can run in parallel with code on the x86. The code for both is included in a "fat binary." One compiler. One program image. At runtime, all memory references from the x86 go through the coprocessor so the coprocessor knows how to maintain coherence.
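To make the "fat binary" idea concrete, here is a toy sketch in Python. It is only an illustration of one routine carrying two code paths, not how Convey's compiler or runtime actually works; the names FatFunction, host_impl, coproc_impl and coprocessor_present are all made up for this example, and the "coprocessor" version is just ordinary Python standing in for offloaded code.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class FatFunction:
    """One routine with two code paths, loosely like an entry in a fat binary.

    Hypothetical illustration only: a real fat binary holds machine code for
    each target, and the selection is made by the toolchain and runtime.
    """
    host_impl: Callable[[Sequence[float]], float]
    coproc_impl: Optional[Callable[[Sequence[float]], float]] = None

    def __call__(self, data: Sequence[float], coprocessor_present: bool = False) -> float:
        # Use the coprocessor path only when one exists and hardware is present;
        # otherwise fall back to the plain host (x86) path.
        if coprocessor_present and self.coproc_impl is not None:
            return self.coproc_impl(data)
        return self.host_impl(data)


def sum_squares_host(data: Sequence[float]) -> float:
    """Scalar loop, the kind of code the host processor would run."""
    total = 0.0
    for x in data:
        total += x * x
    return total


def sum_squares_coproc(data: Sequence[float]) -> float:
    """Stand-in for the offloaded version; it must return the same answer."""
    return float(sum(x * x for x in data))


# One callable, one "program image", two implementations behind it.
sum_squares = FatFunction(sum_squares_host, sum_squares_coproc)

if __name__ == "__main__":
    print(sum_squares([1.0, 2.0, 3.0]))
    print(sum_squares([1.0, 2.0, 3.0], coprocessor_present=True))
```

The point of the sketch is that the caller never changes: the same call works with or without the coprocessor, which is what makes the single-compiler, single-image approach attractive for existing code.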
What does this mean? I think it means that Convey is building an accelerator that is much better integrated than the GPGPU approach: it is part of the main processor’s memory pipeline and uses a single compiler that eats standard dusty deck programs in C and FORTRAN. That’s a pretty powerful answer to the approach taken by heterogeneous systems so far. Ignore the “reprogrammability” and the FPGA stuff: Convey has connected a coprocessor to an x86 and given it the memory channel that high performance technical computing workloads crave: 80GB/s spread over 8 (or more?) independent RAM arrays. That’s the key.
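A quick back-of-the-envelope on that bandwidth figure, as a small Python sketch. The 80 GB/s total and the count of 8 arrays come from the numbers above; the even split across arrays and the 16-array case are my own assumptions (the latter only because the real count might be higher), and per_array_bandwidth is just a helper name invented here.

```python
def per_array_bandwidth(total_gb_per_s: float, num_arrays: int) -> float:
    """Average bandwidth each memory array must sustain, assuming an even split."""
    if num_arrays <= 0:
        raise ValueError("num_arrays must be positive")
    return total_gb_per_s / num_arrays


TOTAL_GB_PER_S = 80.0  # aggregate figure quoted above

if __name__ == "__main__":
    # 8 is the count I believe I saw; 16 is a hypothetical "or more" case.
    for arrays in (8, 16):
        share = per_array_bandwidth(TOTAL_GB_PER_S, arrays)
        print(f"{arrays} arrays: {share} GB/s per array")
```

With 8 arrays, each one has to deliver about 10 GB/s on average to reach the aggregate number.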
In the larger scheme of things, it also means that invention in the computer industry is still possible. We may have a lot of computing history behind us, but the end of the story will never be written. Congratulations to Steve and his investors on an interesting new systems company.
Supercomputing Day 2
11/18/08
Congrats to the Purdue Cluster Challenge team on a very good first day.