Optimizing High-Performance CPUs, GPUs and DSPs? Use logic and memory IP - Part II
Ken Brock, Synopsys
EDN (November 21, 2013)
In Part I of this two-part series we described how the combination of logic libraries and embedded memories within an EDA design flow can be used to optimize area in CPU, GPU and DSP cores. In Part II we explore how logic libraries and embedded memories can be used to optimize performance and power consumption in these processor cores.
Maximizing Performance in CPU, GPU and DSP Cores
Clock frequency is the most highly publicized attribute of CPU, GPU and DSP cores. Companies that sell products employing CPU cores often use clock frequency as a proxy for system-level value. Historically, for standalone processors in desktop PCs, this approach has had some merit. For embedded CPUs, however, it is not always easy to compare one vendor’s performance number to another’s, because the measurements are heavily influenced by many design and operating parameters. Those measurement parameters often do not accompany the performance claims made in public materials, and even when vendors do make them available, it is still difficult to compare two processors that were not implemented identically or measured under the same operating conditions.
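To see why raw performance numbers resist direct comparison, consider the minimal Python sketch below. It is an illustration rather than material from the article: the vendor names, scores, frequencies and operating conditions are all hypothetical. It normalizes a reported benchmark score to a per-MHz figure and keeps the measurement conditions attached, so two cores are only treated as comparable when those conditions match.

```python
# Hypothetical vendor performance claims: a benchmark score, the clock
# frequency, and the conditions under which each number was measured.
claims = [
    {"core": "vendor_X", "coremarks": 9000, "freq_mhz": 2000,
     "process": "28nm", "corner": "SS", "voltage_v": 0.81, "temp_c": 125},
    {"core": "vendor_Y", "coremarks": 7500, "freq_mhz": 1500,
     "process": "28nm", "corner": "TT", "voltage_v": 0.90, "temp_c": 25},
]

def per_mhz(claim):
    """Frequency-normalized score; still only meaningful at matching conditions."""
    return claim["coremarks"] / claim["freq_mhz"]

def comparable(a, b):
    """Treat two claims as comparable only when process and conditions match."""
    keys = ("process", "corner", "voltage_v", "temp_c")
    return all(a[k] == b[k] for k in keys)

a, b = claims
print(f"{a['core']}: {per_mhz(a):.2f} per MHz, {b['core']}: {per_mhz(b):.2f} per MHz")
print("directly comparable:", comparable(a, b))
```

In this made-up example, the second core scores higher per MHz, but because the two claims were measured at different corners, voltages and temperatures, the comparison is not meaningful without re-characterizing both under common conditions.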
Further complicating matters for consumers of processor IP, real-world applications have critical product goals beyond just performance. Practical tradeoffs in performance, power consumption and die area — to which we refer collectively as “PPA” — must be made in virtually every SoC implementation; rarely does the design team pursue frequency at all costs. Schedule, total cost and other configuration and integration factors are also significant criteria that should be considered when selecting processor IP for an SoC design.
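To make the idea of a PPA tradeoff concrete, the short sketch below scores two hypothetical core implementations with a weighted figure of merit. The weights and the frequency, power and area numbers are illustrative assumptions, not measured data or a method described in the article; each metric is normalized to the best value among the candidates so the score is unitless.

```python
# Illustrative PPA comparison of two hypothetical core implementations.
# All numbers and weights below are assumptions for the sake of the example.
candidates = {
    "core_A": {"freq_mhz": 1800, "power_mw": 350, "area_mm2": 1.20},
    "core_B": {"freq_mhz": 1500, "power_mw": 240, "area_mm2": 0.95},
}

# Relative importance of performance, power and area for a given product;
# a battery-powered design might weight power more heavily than frequency.
weights = {"perf": 0.4, "power": 0.4, "area": 0.2}

best_freq = max(c["freq_mhz"] for c in candidates.values())
best_power = min(c["power_mw"] for c in candidates.values())
best_area = min(c["area_mm2"] for c in candidates.values())

def ppa_score(core):
    """Weighted figure of merit; each metric's best value contributes its full weight."""
    return (weights["perf"] * core["freq_mhz"] / best_freq
            + weights["power"] * best_power / core["power_mw"]
            + weights["area"] * best_area / core["area_mm2"])

for name, core in candidates.items():
    print(f"{name}: PPA score = {ppa_score(core):.3f}")
```

Shifting the weights toward power or area, as a schedule- or cost-driven project might, can easily reverse the ranking, which is the point: the "best" core depends on the product's priorities, not on frequency alone.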
Understanding the effect that common processor implementation parameters have on a core’s PPA and on other important criteria such as cost and yield is key to putting IP vendors’ claims in perspective. Table 3 summarizes the effects that these common implementation parameters may have on a CPU core’s performance and other key product metrics.