High Performance Computing

OK, this subject is just too big. High performance computing covers everything from hot graphics workstations to servers to supercomputers, and each of them has different needs and constraints. The industry has been almost exclusively focused on the needs of the compute market for the last two decades, but that focus has been on the consumer end of the market, and it continues today as that market swings heavily into portables. The huge volumes in the consumer market enable those vendors to get most of their needs fairly well served. But high performance computing is getting pushed farther and farther towards the fringe. That is a problem, particularly as cloud-based services grow in importance, even though the number of computers involved cannot reach the same magnitude as in the consumer products space.

Supercomputer developers have been the most innovative computer builders on the planet since computers were invented. They have always placed very aggressive demands on semiconductor vendors. They need more transistors in less space at lower power, interconnected with higher bandwidth and lower latency than any other application. The most vivid testimony to the fact that the semiconductor industry has not been able to keep up with their needs is the number and size of the substations and cooling plants being built around the computer rooms where these machines are installed.

The most visible trend in high performance computing has been the rise of multi-processor computing techniques. The trend has been so pervasive that processor vendors have gotten in on the act by producing multi-core processors. Clusters of closely connected processors have become the architectural norm. But there is close and then there is intimate. Tezzaron’s Di3D™ technology offers a degree of interconnect intimacy that is actually better than what can be achieved between processors on a single multi-core processor die. Suppose you had an 8 x 8 array of cores, stacked eight high, to produce a 512-processor die that was no longer, wider or thicker than an ordinary 2D die? Or suppose layers of memory bits were stacked between the processor cores, leaving the amps and decoders that operate those bits on the same die with the processor, so that all those bits were just as fast as they would have been had they been left on-die with the processor cores? How many cores might fit on the die now? Or how much smaller and cheaper might that die be?
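The arithmetic behind that 512-processor figure can be sketched in a few lines. The function names below are hypothetical, and the hop-count comparison assumes a plain mesh in which each core talks only to its nearest neighbors; that is an illustrative model chosen here, not a description of how Di3D actually wires cores together.

```python
def stacked_core_count(rows: int, cols: int, layers: int) -> int:
    """Total cores when a rows x cols array of cores is stacked `layers` high."""
    return rows * cols * layers


def mesh_diameter(*dims: int) -> int:
    """Worst-case number of nearest-neighbor hops between two cores.

    Assumes a simple mesh (an illustrative model only): the farthest pair
    sits at opposite corners, so the distance is the sum of (size - 1)
    over every dimension.
    """
    return sum(d - 1 for d in dims)


# The example from the text: an 8 x 8 array of cores stacked eight high.
cores = stacked_core_count(8, 8, 8)
print(cores)                    # 512

# The same 512 cores laid flat as a hypothetical 16 x 32 array,
# compared with the 8 x 8 x 8 stack.
print(mesh_diameter(16, 32))    # 46
print(mesh_diameter(8, 8, 8))   # 21
```

Under these assumptions the stack cuts the longest core-to-core path to less than half of the flat layout, which is the intuition behind the shorter interconnect delays claimed for stacking.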

What about servers? What about PCs? What about portables, for that matter? The supercomputer folks may be the earliest adopters of Tezzaron’s Di3D™ architecture because, clearly, they need it the most and have a history of being very innovative. But innovators who decide they need an unfair advantage to make a dent in other compute-oriented markets will turn to Di3D for the same fundamental reasons: for higher transistor density, for shorter interconnect delays, for lower interconnect power, and for lower development costs than they can get by relying on conventional 2D techniques, particularly if they cannot get access to the most aggressive 2D technologies before their larger competitors.