Configurable VLIW is well-suited for SoC

By Cary Ussery, President and Chief Executive Officer, Improv Systems Inc., Beverly, Mass., EE Times
August 14, 2000 (3:01 p.m. EST)
URL: http://www.eetimes.com/story/OEG20000814S0034

Configurable processors are poised to dramatically alter the landscape of advanced system-on-chip IC design. Traditionally, designers have had to choose between rapid time-to-market with programmable processors (both microprocessors and digital signal processors) and high performance using custom ASICs. Configurable processors help to bridge the gap by offering programmability coupled with the ability to rapidly add custom logic to accelerate performance.

A handful of companies have begun offering configurable processors. There are two major classes: configurable RISC processors and configurable very-long-instruction-word (VLIW) processors. While configurable RISC processors offer designers incremental capabilities over today's embedded processor solutions, configurable VLIW processors open a broad range of opportunities and offer a credible alternative to custom ASIC design.

For embedded microcontrollers, reduced-instruction-set computing (RISC) architectures have become the mainstay. There are a handful of configurable RISC processors, including those from Tensilica (Santa Clara, Calif.) and Arc Cores Ltd. (Elstree, England). These processors provide the ability to introduce custom instructions into the RISC processor to accelerate a common operation. The custom logic for those operations is added into the sequential data path of the processor.

While useful, that type of acceleration does not provide enough added performance to make RISC processors a viable alternative to ASICs. The performance improvements come from collapsing operations that take multiple RISC instructions into a single operation. This can remove the need to include a custom logic block as a co-processor in a system. However, these incremental improvements cannot rival the parallel data flow through a custom logic block, which is the approach taken by most high-performance logic blocks.
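A back-of-the-envelope estimate shows why gains from instruction fusion tend to be incremental. The sketch below is illustrative only: the function name, the one-cycle-per-instruction assumption and the workload numbers are hypothetical and do not come from any vendor's data.

```python
def fused_speedup(fraction: float, fused_count: int) -> float:
    """Estimated speedup when `fused_count` instructions collapse into one.

    fraction: share of executed instructions (0..1) covered by the pattern.
    Assumes every instruction takes one cycle (an assumption made for
    this sketch, not a figure from the article).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if fused_count < 1:
        raise ValueError("fused_count must be at least 1")
    # Untouched instructions keep their cost; the fused pattern shrinks
    # to 1/fused_count of its original cost.
    return 1.0 / ((1.0 - fraction) + fraction / fused_count)


# Hypothetical workload: 30% of executed instructions belong to a
# four-instruction pattern that becomes a single custom instruction.
print(round(fused_speedup(0.30, 4), 2))  # about 1.29
```

Under these assumptions the whole program speeds up by roughly 1.3x, because the 70 percent of instructions outside the pattern still execute one at a time.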

In recent years, VLIW approaches have become increasingly prominent in the high-end DSP sector. The reason for this is straightforward: VLIW provides parallel execution of operations to significantly increase performance. Unlike superscalar approaches, the overhead of determining this parallelism is paid in the compiler rather than each time the application is run. The price to be paid for performance comes in the width of the instruction and the resultant potential increase in the memory image for a given application.
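The idea of paying for parallelism in the compiler can be sketched with a simple greedy list scheduler that packs independent operations into wide instruction words ahead of time. This is a minimal illustration, not the algorithm of any particular VLIW compiler; the operation names and the dependency graph are hypothetical.

```python
from typing import Dict, List, Set


def schedule_bundles(deps: Dict[str, Set[str]], width: int) -> List[List[str]]:
    """Greedily pack operations into instruction words ("bundles").

    deps maps each operation to the set of operations it depends on.
    width is the number of operations one instruction word can hold.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    done: Set[str] = set()
    remaining = list(deps)  # preserves program order
    bundles: List[List[str]] = []
    while remaining:
        # An operation is ready once all of its inputs were produced
        # by an earlier bundle.
        ready = [op for op in remaining if deps[op] <= done]
        if not ready:
            raise ValueError("dependency cycle or unknown operation")
        bundle = ready[:width]
        bundles.append(bundle)
        done.update(bundle)
        remaining = [op for op in remaining if op not in bundle]
    return bundles


# Hypothetical data flow: two multiplies feed an add, which feeds a
# shift; an independent load can go wherever there is room.
example = {
    "mul0": set(),
    "mul1": set(),
    "add0": {"mul0", "mul1"},
    "shl0": {"add0"},
    "ld0": set(),
}

print(len(schedule_bundles(example, 1)))  # 5 instructions when issued one at a time
print(schedule_bundles(example, 3))       # 3 wider instructions
```

The wider schedule needs fewer instructions, but each instruction reserves room for three operations even when only one is ready, which is the memory-image cost described above.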

Like configurable RISC processors, designers working with configurable VLIW processors can achieve significant gains by adding custom logic into the data path. However, the VLIW approach offers significant opportunities for designers above and beyond those afforded by configurable RISC processors. The possible opportunities for configuring a VLIW processor include:

- Defining the collection of parallel data path elements in the processor;

- Adding custom computation units into the processor;

- Configuring the VLIW instruction to trade off parallelism for instruction word width; and

- Changing the number of memory accesses in and out of the processor data path.

With so many significant opportunities for providing a configurable VLIW processor, one must ask why more are not available. The reason is that providing these capabilities is extremely challenging technically, both architecturally and in the tools that support the processor. In particular, VLIW processors are difficult to design and to support with compilers. Some recent VLIW-based or superscalar cores such as BOPS, 3DSP, ZSP and TI's C6x series provide no configurability at all.

However, even recent entries in configurable VLIW have fallen far short of these opportunities. For instance, some processors, such as the Carmel from Infineon (Munich, Germany), introduced the limited capability to add custom units into predefined locations in the data path of the Carmel processor. However, as with the Carmel approach in general, access to those extensions is available only through custom assembly programming.

Improv's Jazz processor is a configurable VLIW processor that has some architectural characteristics that are specifically tuned to address configurability. Unlike many VLIW processors, the Jazz processor does not aggregate computation units (ALU, multiplier, shifter) into a single data path but provides a flat collection of computation units. That lets the compiler use the computation units to their best advantage in each instruction. Also, the Jazz processor is part of Improv's general programmable system architecture (PSA), which provides a unique approach to combining multiple processors into a single structure. One aspect of the approach is the ability to attach multiple memory ports into each Jazz processor.

The Jazz processor is configured using a graphical tool, called the Jazz Composer, that provides an intuitive drag-and-drop facility. The designer can configure specific characteristics of the base processor structure, including data width of the processor, number of constant registers and depth of the hardware task queue. Similar features are available in most configurable processors. Jazz Composer takes configurable processing to a new level by allowing the designer to address all of the opportunities discussed earlier.

To increase performance with a configurable processor, the general belief is that the designer must add custom logic and instructions. However, with Improv's Jazz processor, the designer can increase performance without any hardware design. This is achieved by creating different combinations of computation units in the processor to create a mix that is specifically tuned to an application domain.

Creative opportunities

The Jazz processor can contain multiple computation units, including arithmetic logic units (ALUs), MACs and shifters. Improv provides a robust collection of such units in its base offering. Designers can define the collection of computation units in the processor to change the number and type of operations that can be executed in each instruction. For instance, a designer might want to create a processor with three ALUs, one shifter and one MAC for ALU-intensive application domains or create a processor with two ALUs, two shifters and two MACs for more MAC-intensive and balanced application domains.
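The effect of choosing a unit mix can be estimated with a simple resource bound: the busiest type of unit sets a floor on the number of instructions a kernel needs. The sketch below applies that bound to the two mixes mentioned above. The kernel operation counts are hypothetical, and the bound ignores data dependencies and memory ports, so it is only a first approximation.

```python
import math
from typing import Dict


def min_instructions(op_counts: Dict[str, int], unit_counts: Dict[str, int]) -> int:
    """Lower bound on instructions for one kernel iteration.

    Each unit executes at most one operation per instruction, so the
    most heavily loaded unit type determines the bound.
    """
    bound = 0
    for kind, count in op_counts.items():
        if count == 0:
            continue
        units = unit_counts.get(kind, 0)
        if units == 0:
            raise ValueError(f"no unit can execute {kind!r} operations")
        bound = max(bound, math.ceil(count / units))
    return bound


# The two unit mixes described in the text.
ALU_HEAVY = {"alu": 3, "shifter": 1, "mac": 1}
BALANCED = {"alu": 2, "shifter": 2, "mac": 2}

# Hypothetical per-iteration operation counts for two kernels.
control_kernel = {"alu": 12, "shifter": 2, "mac": 1}
filter_kernel = {"alu": 4, "shifter": 2, "mac": 8}

print(min_instructions(control_kernel, ALU_HEAVY))  # 4
print(min_instructions(control_kernel, BALANCED))   # 6
print(min_instructions(filter_kernel, ALU_HEAVY))   # 8
print(min_instructions(filter_kernel, BALANCED))    # 4
```

With these assumed counts, each kernel runs best on the mix tuned to its domain, and neither result required any new hardware design.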

For most applications, combinations of general-purpose computation units can provide enough performance. However, for very high-performance applications, such as network processing, multichannel speech processing and image/video processing, it can be important to find every opportunity to increase performance while maintaining programmability. Designers can analyze applications and identify critical, high-impact operations that can be implemented in custom logic and added into the processor.

In Improv's system, designers can define their own custom computation units, called designer-defined computation units (DDCUs).

Those units are described in Verilog (supported by a pre-processor), starting from a template file provided by Improv. The Verilog files are added into the system and are used to generate complete processors and multiprocessor structures.

Rather than limit operations to execute on a specific computation unit, Improv provides a more flexible methodology. Operations are implemented as Java methods with an embedded directive that identifies the op-code mnemonic to which the operation is to be mapped. By separating the definition of operations from the definition of computation units, the designer can define operations that are implemented on different computation units. During compilation, the compiler selects the specific computation unit that will execute the operation.
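The separation between operations and computation units can be pictured as a lookup from op-code mnemonic to the units able to execute it, with the choice made at compile time. The following is a conceptual sketch in Python, not Improv's Java-based flow; the table contents, the function name and the assumption that a MAC can also perform a plain add are all hypothetical.

```python
from typing import Dict, List, Optional, Set

# Hypothetical table: which unit types implement each op-code mnemonic.
IMPLEMENTED_BY: Dict[str, List[str]] = {
    "add": ["alu", "mac"],  # assumption: the MAC can also do a plain add
    "shl": ["shifter"],
    "mac": ["mac"],
}


def select_unit(mnemonic: str, busy: Set[str]) -> Optional[str]:
    """Pick the first unit that implements `mnemonic` and is still free
    in the current instruction; return None if every candidate is busy."""
    for unit in IMPLEMENTED_BY.get(mnemonic, []):
        if unit not in busy:
            return unit
    return None


print(select_unit("add", set()))        # alu
print(select_unit("add", {"alu"}))      # mac
print(select_unit("shl", {"shifter"}))  # None
```

Because the operation names only a mnemonic, the same source can be compiled for processors with different unit mixes without being rewritten.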

VLIW offers significant performance opportunities. But for some applications, the trade-off between the size of the instruction word and potential performance needs to be considered. Improv's Jazz Composer allows the designer to define the number of slots available in the instruction for computation units and then assign one or more computation units into each slot. That lets the designer populate the processor with a generous mix of computation units without paying a high price in instruction width.
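The slot trade-off can be made concrete with a rough width model. The sketch below assumes that each slot carries one fixed-size operation field plus enough selector bits to choose among the units sharing that slot. The encoding, the 24-bit field size and the unit names are hypothetical and are not taken from the Jazz instruction format.

```python
from typing import List


def instruction_width(slots: List[List[str]], bits_per_op: int) -> int:
    """Instruction width in bits under a simple assumed encoding:
    one operation field per slot plus unit-selector bits."""
    width = 0
    for units in slots:
        if not units:
            raise ValueError("every slot needs at least one unit")
        # Bits needed to pick one of len(units) units (0 bits for one unit).
        selector_bits = (len(units) - 1).bit_length()
        width += bits_per_op + selector_bits
    return width


# Six units, each with its own slot: up to six operations per instruction.
one_per_unit = [["alu0"], ["alu1"], ["shifter0"], ["shifter1"], ["mac0"], ["mac1"]]
# The same six units sharing three slots: up to three operations per instruction.
shared = [["alu0", "alu1"], ["shifter0", "shifter1"], ["mac0", "mac1"]]

print(instruction_width(one_per_unit, 24))  # 144
print(instruction_width(shared, 24))        # 75
```

Under this model, sharing slots roughly halves the instruction width while keeping all six units available, at the cost of issuing at most three operations per instruction.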

Configurable VLIW processors bring a new range of capabilities that will revolutionize the SoC market. With high levels of parallelism, coupled with custom data path elements, such processors can remove the need for custom logic blocks throughout the SoC design.