ASIC generation revamped for IP reuse

Today, a large amount of intellectual property (IP) represents the remnants of yesterday's ASIC applications, implemented in various process technologies. True IP reuse for system-on-chip (SoC) design, however, becomes possible only when the system designer and chipmaker move away from the fixed-function ASIC approach and adopt reusable IP cores. This approach is based on general-purpose components surrounded by application-specific functions such as accelerators and peripherals.

Take advanced consumer electronics applications, for example. Deploying a programmable VLIW processor IP core with the required application-specific accelerators lets the designer get the most from previous design efforts. This approach is becoming more important as more SoC functionality migrates from hardware into software. The linchpin for complete IP reusability in this instance is the anchor block, the programmable VLIW processor core, along with programmable buses and interface ports.

This design methodology differs slightly from conventional IC design. Instead of a monolithic IC model, the design engineer uses a modular design approach. Interfaces to these modules, which are OEM-dependent, are especially critical in SoC designs. Many OEMs rely on their own bus models, for example, since they have extensive design experience with those buses and a good handle on their performance. Moreover, they may already have a number of peripherals integrated on those buses. A system designer can therefore integrate these proven blocks with licensed anchor blocks or cores in an expedient manner. Here, the programmable anchor block should be able to support multiple buses within a finite amount of time.
Thanks to programmable interfaces, IP reuse becomes possible when these cores or modules are designed to be bus-independent, allowing them to be used in other applications as well.

There are other factors design engineers should consider to achieve efficient IP reuse. A main consideration is that SoC platforms should comply with specific market segments, each with its own traits. However, there is commonality within those traits that the designer should pay special attention to: performance, power, die size, flexibility, technology and IP reuse itself. Depending on the application, a SoC design may call for low, medium or high performance, while power ranges from high to low; in most designs today, however, low power is in great demand. Die size is an important consideration from a design-cost point of view, and flexibility is playing an ever more critical role due to evolving standards and user demands for more features and functions. Some SoC designs require special technologies, for example in processing, packaging, custom IP cores or special voltages. Lastly, as for IP reuse, the designer should determine how important wide reuse of these IP cores is.

Portability the rule

With the increased usage of the Internet in virtually every aspect of life, portability is becoming the rule rather than the exception for Internet-enabled appliances, and portability demands low power consumption. Reducing power consumption in a SoC design is paramount for IP reuse, since a great majority of applications now demand increasingly lower power. The designer should get a good handle on the techniques that help pare down power consumption: low-voltage design, power management and active clocking. Lowering the voltage in a SoC design is the best way to cut power consumption, since power is a function of the voltage squared.
The designer can lower the voltage on SoC designs that aren't latency-sensitive. The logic slows down, which requires a slower clock. To maintain bandwidth, the designer can add pipeline stages to cut back the amount of logic between each stage. Consequently, the engineer gets the same bandwidth and clock frequency, but a much lower-power SoC design.

In the power-management category, dropping the power rail to ground in inactive cores can turn them off completely, eliminating power loss during the off period. However, the designer must keep critical states in nonvolatile storage and hold outputs at proper switching levels to minimize totem-pole current in neighboring devices. To offset this complete core turnoff, the designer must add reset circuitry and special scheduling, since a large number of clock cycles are needed to turn the inactive core back on.

As for active clocking techniques to reduce power, the designer has two options: clock enables and clock frequency control. Power consumed in a SoC design is proportional to clock frequency. Units or cores not active in a SoC can be turned off using clock enables, since all the cores don't need to operate at the same time. With clock frequency control, the designer slows down the original clock frequency of selected clocks. Many parallel operations in a SoC design aren't used or finish earlier than anticipated, yet must continue to operate to handle periodic interrupt polling. In these instances, parallel operations can be slowed down by controlling a core's clock frequency, thus reducing power.

OEMs using SoC designs may already have an established framework for logic verification. The essence of supporting co-simulation is to provide clear functional interfaces to the co-simulation units, meaning the points where the simulated system needs to be partitioned to insert interfaces to whatever co-simulation framework is to be used.
The result is a simulation system that supports a SoC model of simulation in which different cores can be plugged in, and standalone simulators or SoC simulators, using different co-simulation frameworks, can be readily built. Not all combinations need to be implemented, only the ones system designers need.

This simulator architecture is a modified discrete-event simulation based on all-in-C implementations of co-simulatable units (CSUs), which, when wrapped to interface to a particular co-simulation framework, form atomic co-simulation units. These CSUs have clear functional interfaces, meaning function calls and callbacks rather than ports and signals, which allow co-simulation shells to be created easily for any co-simulation framework. If co-simulation is not being performed (i.e., standalone simulators), the implementation is entirely in C and free of any co-simulation framework.

Aside from co-simulation, the clear interfaces between CSUs allow easy evolution and adaptation of a system in standalone simulation. For example, a purely instruction-level simulation can be supported by bypassing the bus and memory models; or a system might initially model caches and SDRAM timing but omit bus simulation, later adding cycle-accurate simulation of the bus timing for greater accuracy.

To support SoC customization efficiently without seriously compromising standalone simulation speed, the designer needs to choose the boundaries between CSUs wisely; too many interfaces can introduce unnecessary performance overhead. Conforming to the goals of this architecture, a system is usually divided into CSUs representing the processor core itself, the system bus interfaces, and the devices on the system buses. CSUs are isolated from dependencies on one another so that they can easily be replaced or reconnected in different system configurations.
Direct connections between them can be established without one unit having to know anything about the other except for its clearly defined interfaces.

Co-simulation shell

TSS is Philips' proprietary hardware/software modeling system that supports clock cycle-based, signal-level interfaces between modules. To support TSS co-simulation, each CSU is wrapped in a co-simulation shell. Each shell is a regular TSS module that implements all the module ports it represents in the TSS netlist. It also implements all signaling protocols in terms of the functional interfaces of the C-based CSU. Nontrivial protocols are implemented via state machines clocked by the appropriate TSS clock; they respond to function calls from the CSU's functional interface and make callbacks into the CSU's interface. A clock process triggered by the proper TSS clock calls the CSU's clock function.

CoWare, used for hardware/software partitioning, supports functional or clock cycle-based, signal-level interfaces between software modules and clock cycle-based, signal-level interfaces between hardware modules. CoWare provides the designer with a framework for plugging in all the modules required in a given SoC design, including the anchor block and all hardware acceleration blocks.

Leaves GUI behind

SystemC is an open-source answer to CoWare without the GUI. It is a general hardware/software modeling system that supports functional- or clock cycle-based, signal-level interfaces between software modules and clock cycle-based, signal-level interfaces between hardware modules. SystemC is built with C++ classes, which provide extensibility without extending the language. To support SystemC co-simulation, each CSU is wrapped in a co-simulation shell. Each shell is a module implemented at the bus-cycle-accurate or cycle-accurate abstraction level using bus ports.
Seamless is a specialized processor and system modeling system in which the processor core and its bus interfaces are replaced by software simulations. It is supplied as a set of C library functions and macros from Mentor Graphics Corp. (Beaverton, Ore.). Only the processor core and bus interfaces are replaced by a C simulation; the remaining portion of the system is simulated by the logic simulator at RTL. Interfacing a TriMedia core simulation to Seamless is achieved by writing a wrapper that interfaces the core CSU to the Seamless CVE kernel.
All material on this site Copyright © 2017 Design And Reuse S.A. All rights reserved.