What's to become of the generation of designers who grew up aspiring to be computer architects? With even Intel throwing in the towel on making faster CPU cores and switching its focus to on-chip multiprocessing, it appears that the age of the architect is over.

It was good while it lasted. Microprocessor architects managed to re-create almost the whole history of the mainframe computer industry, from the first primitive vacuum-tube state machines to the massive superscalar architectures of the last IBM 360s. They used all the tricks, from microprogramming and stripped-down pipelines with load-store architectures to speculative execution and branch prediction. Best of all, hardly anyone was unkind enough to point out that all this ground had been covered before, just at a lesser level of integration.

But the gurus of cores have run into diminishing returns on several fronts. Most fundamentally, their ability to increase clock frequency with each new generation has been undermined by the leveling off of timing budgets that set in with 90-nanometer processes. That leveling has a number of causes, including the rapid increase in energy density, the inability to control leakage current in fast transistors and the growing variation in circuit parameters from even the most stable processes.

There have been other limits as well. The notion of simply adding execution pipelines to increase the number of instructions executed per clock cycle has hit a wall, as computer scientists warned it would, somewhere between three and five; only very unusual hand-optimized loops appear to benefit from more instruction-level parallelism than that. Similarly, the value of supporting multiple threads in hardware appears to roll off after about three to five.

With both clock frequency and instructions per clock flattening out, it appears the game is about up for CPUs. Even worse, in most systems theoretical CPU performance is rendered almost irrelevant by the fact that the CPU sits idle most of the time, waiting on memory accesses. Even multithreading can go only so far toward solving that problem.

So the next logical step is to put multiple cores on one die and hope the task mix is rich enough to keep them all active. The good news is that in some commercially important applications, such as network routers that perform deep packet inspection on a large number of virtual connections, that may be a good assumption.

The bad news? Such multicore architectures put incredible strain on two of the weakest links in VLSI design: on-chip interconnect and package/board design. The problem is one of memory bandwidth: if the memory wasn't fast enough to keep one CPU from sitting idle, it probably isn't fast enough to keep eight CPUs from sitting idle (the sketch at the end of this column puts rough numbers on that). And if designers already cite signal integrity and interconnect variation among the biggest problems in chip design, putting multiple CPU local buses, multiple instances of two-level caches and multiple memory buses on one chip probably isn't going to make things easier.

So now a lot of energy is going not into the CPU core but into its surroundings. Several recent designs, including the IBM Cell chip and the Raza Microelectronics XLR, have employed point-to-point ring networks on-chip, reminiscent of optical networks, to join multiple cache structures to multiple DRAM ports. Now it's memory and interconnect that are getting complex. So maybe there's still a role for architects after all.
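To put rough numbers on the bandwidth argument, here is a minimal back-of-envelope sketch in Python; the function and the bandwidth figures are illustrative assumptions, not measurements of any chip named above. A core that just saturates its memory port runs at full speed, while eight such cores sharing that same port each sustain only an eighth of their peak:

    # Toy model: identical cores sharing one fixed memory-bandwidth ceiling.
    # All figures are illustrative assumptions, not chip measurements.
    def utilization(cores, demand_gbs, mem_bw_gbs):
        """Fraction of peak each core sustains when aggregate demand
        is capped by the shared memory bandwidth."""
        total_demand = cores * demand_gbs
        return 1.0 if total_demand <= mem_bw_gbs else mem_bw_gbs / total_demand

    # One core just saturating a 6.4-GB/s memory port runs at full speed...
    print(utilization(1, 6.4, 6.4))  # 1.0
    # ...but eight cores on the same port each get an eighth of what they need.
    print(utilization(8, 6.4, 6.4))  # 0.125

The model is deliberately crude: it ignores caching and latency hiding entirely, which is exactly why the caches, ring interconnects and multiple DRAM ports become the interesting part of the design.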
Ron Wilson (rwilson@cmp.com) covers microprocessors, programmable/reconfigurable logic and the chip design process.