ESL enables software-driven SoCs

By Seyul Choe
VP and General Manager
Asia-Pacific
CoWare Inc.

Electronic system-level (ESL) design is a set of methodologies that enables SoC engineers to efficiently develop, optimize and verify complex system architectures and embedded software. ESL design also provides the foundation for the verification of downstream register-transfer level (RTL) implementation. IC vendors have adopted ESL design to develop software-rich, multiprocessor devices that deliver the advanced functionality and high performance essential in next-generation devices.

Embedded software

A survey of SoC engineers by International Business Strategies (Figure 1) found that the relative design efforts expended on embedded software and hardware architectures have increased greatly. Hardware implementation efforts such as RTL design, synthesis and physical design have increased to a much lesser extent. Note that the embedded software to which this survey refers is that provided by the semiconductor manufacturer. It does not include the software developed by the system manufacturer to differentiate the end-product.

Figure 1: The significant shift in effort from implementation to software and architecture shows that the chip has truly become an embedded system.

The detailed conclusions are:

The embedded software development effort at the 250nm node constituted about 35 percent of total design effort. By the 90nm node, it had grown to 55 percent of total effort. The report also shows that the embedded software development cost can be as much as $30 million (at U.S. labor rates) for an 80Mgate SoC, despite extensive software reuse.
The architectural design effort at 250nm was negligible. By 90nm, it had grown to 11 percent of total effort. For an 80Mgate SoC, the architectural development cost can exceed $6 million.
By contrast, the hardware implementation effort shrank from 63 percent of total effort at 250nm to 33 percent at 90nm. Of course, the absolute hardware-implementation effort has grown because the chips deploy more resources. However, its growth has significantly lagged compared to those of the other two categories.

The increase in embedded software effort is mainly due to the proliferation of wireless and multimedia standards that enable consumer product compatibility and interoperability. Standards such as JPEG, MPEG, 3G, GSM/EDGE, 802.11a/b/g WLAN, Bluetooth and ultrawideband are essential to the commercial success of the modern electronics industry.

Meanwhile, the increase in architectural development effort is attributed to the need to assemble and optimize complex processing and storage resources and the communication protocols necessary to execute the embedded software with the requisite performance. Indeed, advanced SoCs now deploy at least three microprocessors and three DSPs. Even a mainstream design has one microprocessor and two DSPs, while two of each is common.

In other words, embedded software now drives SoC design.

Semiconductor vendors have adopted ESL design because it enables early software development and faster design and hardware/software (HW/SW) verification. It also provides a functional testbench that can verify whether the downstream RTL implementation complies with the system specification.

Moreover, ESL design tools offer the synthesis of application-optimized custom processors, as well as the rapid development and implementation of advanced algorithms.

Early software development: With such a large software development task, it is important to begin software development as early as possible, even when there is a high degree of legacy software reuse.

With ESL design methodologies based on the SystemC language, designers can build a high-level model that emulates the SoC’s behavior and its cycle-accurate timing. This model, known as the transaction-level model (TLM), enables software engineers to commence development months before the availability of an RTL design or silicon prototype.

Fast derivative design: The fickle consumer market demands a constant flow of “new and improved” products. Some improvements may be achieved by re-programming the SoC, but more software may demand more hardware resources. Hence, designers must have rapid hardware and derivative design methodologies.

RTL platforms were devised to alleviate the derivative design problem by providing a pre-verified architecture for future designs. However, the difficulty of optimizing the RTL architecture and integrating RTL IP to meet new market demands slows the process considerably. A non-optimized architecture can negatively impact performance and power consumption. Ultimately, the design team may be forced to exclude functionality to meet performance and power consumption targets.

TLM operates at the level of function calls and data packet transfers. This is the abstraction level at which “design intent” is most meaningfully captured, a level that provides SoC designers with a direct and clear view of system behavior. SystemC TLM models of silicon IP are easily integrated into the SoC architecture TLM. This enables the SoC architect to rapidly explore and analyze multiple candidate hardware architectures and HW/SW partitioning schemes, each with different performance and economic trade-offs, to identify the optimum architecture. This methodology clearly speeds up original design, but the biggest payback is in fast-turn derivative designs using the original SoC TLM as an easily modifiable platform.

Fast verification: The TLM’s level of abstraction is clearly higher than that of the RTL, which details intra-block circuit states, nanosecond-accurate transitions and bit-accurate bus behavior. Consequently, the use of cycle-accurate TLM speeds up hardware verification and HW/SW co-verification by a factor of 1,000 or more over RTL. This methodology not only enables the generation of a functional testbench to verify system behavior and RTL implementation, but also supports the co-simulation of SystemC with RTL. This enables the use of the SoC TLM as a testbed in which the downstream RTL implementation blocks may be verified as they become available.

Qualcomm Inc.’s experience demonstrates improved HW/SW co-verification at the system level over that at the C/RTL implementation level. A Viterbi decoder design executed a packet in 20ms, but took 6hrs to simulate at C/RTL level. Qualcomm estimates that 1,000 packets must be simulated to achieve a reasonable confidence level, but considers the necessary 6,000hrs of simulation time to be impractical. Co-verification of 1,000 packets with a TLM would have taken 6hrs or less.

Application-optimized processor synthesis: The ever-increasing need for processing capacity is often met by the deployment of additional standard general-purpose processor cores. However, a general-purpose core is architected to address a wide range of applications. It may not execute a given software algorithm with the requisite performance and may consume more chip area and power than necessary. And it generally costs a great deal in additional IP license and royalty fees.

This problem can be solved with a processor that uses an instruction set optimized to the needs of the application. A custom instruction-set processor can deliver the required performance with only those hardware resources that are necessary.

Using ESL tools, such a processor can be automatically synthesized either from an architectural description or the custom instruction set itself. The ESL tools also automatically generate the processor’s software development tools, such as instruction-set simulators, assemblers, linkers, disassemblers, debuggers and C compilers.

Advanced algorithm development: Many of the advanced algorithms used in consumer devices, such as JPEG and MPEG, are DSP algorithms. Such algorithms must be implemented to meet the device’s performance and power consumption targets, which can vary for different devices. Advanced algorithms are generally designed first as reference algorithms in floating-point arithmetic. Indeed, standard algorithms such as JPEG and MPEG are generally made available in this form. The algorithm is then transformed into a fixed-point arithmetic form from which embedded software and RTL implementations are then derived.
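
The floating-point to fixed-point step described above can be illustrated with a minimal sketch. It assumes a three-tap FIR filter and the signed 16-bit Q15 format, neither of which comes from the article; the function names, coefficients and sample values are hypothetical, and plain Python is used only to keep the example short.

```python
# Hypothetical sketch: derive a Q15 fixed-point FIR from a floating-point reference.
Q = 15                 # fractional bits (Q15 is an assumption made for this sketch)
SCALE = 1 << Q         # 32768


def to_q15(x):
    """Quantize a value in [-1, 1) to a signed 16-bit Q15 integer, saturating."""
    v = int(round(x * SCALE))
    return max(-SCALE, min(SCALE - 1, v))


def fir_float(coeffs, samples):
    """Floating-point reference: y[n] = sum over k of h[k] * x[n-k]."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(coeffs):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out


def fir_q15(coeffs_q, samples_q):
    """Fixed-point version: integer multiply-accumulate, then scale back to Q15."""
    out = []
    for n in range(len(samples_q)):
        acc = 0  # wide accumulator; each product is in Q30
        for k, h in enumerate(coeffs_q):
            if n - k >= 0:
                acc += h * samples_q[n - k]
        y = acc >> Q  # arithmetic shift back to Q15 (rounds toward minus infinity)
        out.append(max(-SCALE, min(SCALE - 1, y)))
    return out


coeffs = [0.2, 0.6, 0.2]                          # illustrative low-pass taps
samples = [0.0, 0.5, -0.5, 0.25, 0.75, -0.125]    # illustrative input

reference = fir_float(coeffs, samples)
fixed_raw = fir_q15([to_q15(c) for c in coeffs], [to_q15(s) for s in samples])
fixed = [y / SCALE for y in fixed_raw]

# Worst-case deviation of the fixed-point result from the reference.
max_err = max(abs(a - b) for a, b in zip(reference, fixed))
```

Comparing `max_err` against the error budget of the application is the kind of check a designer would run before deriving embedded software or RTL from the fixed-point form.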

Graphical ESL design and simulation tools targeted at DSP-type algorithms enable this flow. Algorithm development is accelerated with predesigned libraries of customizable DSP algorithms for communications and multimedia applications. There are also libraries of standards-compliant algorithms such as 3G W-CDMA, GSM/EDGE, IS-95 CDMA, 802.11a/b/g WLAN, Bluetooth and ultrawideband. After HW/SW partitioning, development of the RTL implementation is accelerated by the use of microarchitecture libraries.

TLM methodology

TLM is the virtual integration platform that enables early software development and ESL design and verification tasks. The central position of TLM in SoC design is shown in Figure 2.

Figure 2: In the TLM method, blocks communicate via buses.

The SoC TLM is essentially a network model of the device’s resources, devoid of implementation detail. The behavior of functional blocks is modeled in terms of their input stimuli and output responses. The blocks communicate via buses, to which each block is connected by an API. Communications are modeled as data flow schemes with associated data transfers. This avoids unnecessary implementation detail that clouds the designer’s view of system behavior and slows simulation. The separation of block behavior from communications enables the fast modification or replacement of functional blocks without bus redesign, and vice versa, which is critical to rapid IP integration and complex “what if” analysis.
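
The separation of block behavior from communications can be sketched as follows. This is a simplified illustration, not the SystemC TLM API: the class names (`Memory`, `Timer`, `Bus`), the address map and the `read`/`write` interface are all assumptions chosen for the example, and Python stands in for SystemC only for brevity.

```python
# Hypothetical sketch: functional blocks behind a common read/write API,
# with a bus model that only routes transactions by address.

class Memory:
    """Functional model of a memory block: behavior only, no bus detail."""

    def __init__(self, size):
        self.data = bytearray(size)

    def read(self, offset):
        return self.data[offset]

    def write(self, offset, value):
        self.data[offset] = value & 0xFF


class Timer:
    """Functional model of a peripheral with a single 8-bit register."""

    def __init__(self):
        self.count = 0

    def read(self, offset):
        return self.count

    def write(self, offset, value):
        self.count = value & 0xFF


class Bus:
    """Communication model: forwards each transaction to the owning block."""

    def __init__(self):
        self.regions = []  # list of (base, size, target)

    def attach(self, base, size, target):
        self.regions.append((base, size, target))

    def _route(self, addr):
        for base, size, target in self.regions:
            if base <= addr < base + size:
                return target, addr - base
        raise ValueError(f"no block mapped at address {addr:#x}")

    def read(self, addr):
        target, offset = self._route(addr)
        return target.read(offset)

    def write(self, addr, value):
        target, offset = self._route(addr)
        target.write(offset, value)


bus = Bus()
bus.attach(0x0000, 0x100, Memory(0x100))   # illustrative address map
bus.attach(0x1000, 0x4, Timer())

bus.write(0x0010, 0xAB)   # lands in the memory block
bus.write(0x1000, 7)      # lands in the timer register
```

Because `Bus` depends only on the `read`/`write` interface, either block can be replaced by a different model, and the bus can be replaced by a different interconnect model, without touching the other side.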

There are three common TLM use cases: programmer’s view (PV), architect’s view (AV) and verification view (VV). Although these cases represent three different system views, most of the models can be deployed in all three.

Programmer’s view: This TLM is a functionally correct model of the SoC, which enables the deployment of legacy software and early development of new software. Typically, the PV TLM consists of functional models of the processors, memories, peripherals and a router that directs transactions to the correct memory or peripheral (Figure 3).

Figure 3: Looking at the programmer’s view, the API of the target RTOS and processor’s compiler are used to develop the software object code.

PV gives the software developer access to the necessary system resources and attributes, such as register visibility, register accuracy and interrupt handling, as well as direct links with the instruction-set simulator of the target processors and debug environments. The API of the target RTOS and the target processor’s compiler are used to develop the software object code. Application software development at this stage requires only a data flow schema. Consequently, the PV is untimed and a PV simulation executes well into the range of millions of instructions per second.

Architect’s view: This TLM is the same model plus the SoC’s timing attributes. This model enables the design team to analyze SoC performance to identify bottlenecks well in advance of design implementation (Figure 4). Final HW/SW partitioning decisions are made in this view. Timing is captured either explicitly or implicitly. Explicit timing closely models the performance of the SoC’s hardware. It is expressed as a function of system events and event synchronizations, while intra-block timing can be arbitrarily accurate.

Figure 4: Architect’s view TLM enables analysis of SoC performance to identify bottlenecks well in advance of design implementation.

An implicitly timed model uses timing annotation embedded in the TLM API calls. Timing annotation is thus independent of functionality. This enables rapid modification and performance-profiling of various functional-block candidate architectures and implementations. It also increases simulation speed over that of the explicitly timed model.
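
A minimal sketch of implicit timing annotation, under stated assumptions: the wrapper class `TimedPort`, the cycle counts and the block-copy workload are hypothetical, and the code is plain Python rather than SystemC. The point it illustrates is that the latency figures live in the API layer, so the functional model is left unchanged while candidate timings are profiled.

```python
# Hypothetical sketch: timing annotation carried by the API calls,
# kept separate from the functional model.

class Ram:
    """Untimed functional model of a memory."""

    def __init__(self):
        self.cells = {}

    def read(self, addr):
        return self.cells.get(addr, 0)

    def write(self, addr, value):
        self.cells[addr] = value


class TimedPort:
    """Wraps an untimed target and adds a latency to every transaction."""

    def __init__(self, target, read_cycles, write_cycles):
        self.target = target
        self.read_cycles = read_cycles
        self.write_cycles = write_cycles
        self.elapsed = 0  # accumulated cycles

    def read(self, addr):
        self.elapsed += self.read_cycles
        return self.target.read(addr)

    def write(self, addr, value):
        self.elapsed += self.write_cycles
        self.target.write(addr, value)


def copy_block(port, src, dst, count):
    """Workload under test: copy `count` words through the timed port."""
    for i in range(count):
        port.write(dst + i, port.read(src + i))


def profile(read_cycles, write_cycles):
    """Run the same workload against one candidate timing annotation."""
    port = TimedPort(Ram(), read_cycles, write_cycles)
    for i in range(16):
        port.target.write(i, i)  # preload directly, bypassing the timing layer
    copy_block(port, 0, 100, 16)
    result = [port.target.read(100 + i) for i in range(16)]
    return port.elapsed, result


fast_cycles, fast_data = profile(read_cycles=1, write_cycles=1)
slow_cycles, slow_data = profile(read_cycles=4, write_cycles=2)
```

Both runs produce identical data; only the cycle totals differ, which is what allows timing candidates to be compared without re-validating functionality.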

Figure 5: In HW/SW development using the three primary TLMs, the different models co-execute using easy-to-design transactors and converters.

An instruction-accurate instruction-set simulator can be connected to an architect’s view via a PV-AV transactor, enabling analysis of system performance while executing software. A PV-AV transactor can also enable the evaluation of an architecture executing applications in an OS. The OS is booted in PV mode, while the application is executed in AV mode.

Verification view: This is essentially the AV model enhanced with cycle-accurate timing. This model enables hardware and HW/SW co-verification with accuracy that is predictive of real chip timing. It also enables the development team to create a testbench for RTL verification.

RTL models written in Verilog and/or VHDL may be instantiated into the VV TLM model as they become available, enabling in-system verification and debug prior to the availability of the complete SoC implementation model.

VV simulation typically executes three orders of magnitude faster than both RTL simulation and C/RTL HW/SW co-verification.
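
To put three orders of magnitude in concrete terms, the Qualcomm figures quoted earlier in this article can be worked through. The sketch below uses only numbers stated in the article (20ms per packet on silicon, 6hrs per packet at C/RTL, 1,000 packets, a speed-up of 1,000); the variable names are illustrative, and the exact factor of 1,000 is the article's typical figure rather than a measured result.

```python
# Worked arithmetic using the figures quoted in the article.
RTL_HOURS_PER_PACKET = 6      # C/RTL simulation time for one packet
PACKETS_NEEDED = 1000         # packets needed for reasonable confidence
TLM_SPEEDUP = 1000            # "three orders of magnitude"
SILICON_MS_PER_PACKET = 20    # real execution time for one packet

# Total C/RTL simulation time, in hours and in days of continuous running.
rtl_hours = RTL_HOURS_PER_PACKET * PACKETS_NEEDED
rtl_days = rtl_hours / 24

# The same workload at the TLM level.
tlm_hours = rtl_hours / TLM_SPEEDUP

# How much slower C/RTL simulation is than the real hardware.
rtl_ms_per_packet = RTL_HOURS_PER_PACKET * 3600 * 1000
rtl_slowdown = rtl_ms_per_packet // SILICON_MS_PER_PACKET
```

On these figures the C/RTL run would occupy 250 days of continuous simulation, against roughly 6 hours at the TLM level, which matches the estimate given in the Qualcomm example.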

The three TLM views are combined into an overall ESL design flow (Figure 5). The different models can co-execute using easy-to-design transactors and converters.

Design adoption success

Semiconductor companies have moved to a standard operating procedure of developing software on pre-silicon multiprocessor system simulation models that execute with near real-time performance. In some cases, simulation models boot common RTOSes such as embedded Linux in only 2s. The TLM methodology also possesses the accuracy that is critical to the optimization of chip architecture for performance and power, prior to committing to silicon prototype production.

For instance, a large Japanese printer company adopted ESL design methodology because its RTL-based methodology could no longer cope with the architectural modifications required by each generation of printers. The company uses the same basic algorithms for its whole product range, from low-end home printers to high-end network printers. However, major variations in data communications, processing and storage requirements between the different printer types mandate different implementations of those algorithms, including different memory and communications bus architectures.

Optimization of these different architectures could be effectively undertaken only at the TLM level of abstraction offered by ESL design. However, simply moving to ESL design without a link to the RTL implementation would have caused downstream problems. The company established that link with pin-accurate transactors that enable co-verification of SystemC TLM with the RTL.

This move from RTL to ESL design is a classic migration path—it is a “middle-out” design flow that enables the reuse of legacy RTL IP. However, where there are no legacy constraints, the “top-down” design flow can be adopted. This is what Toshiba did with the design kit for its user-configurable media embedded processor. The foundation of the kit is an ESL design environment that enables the designer to customize the configuration for a particular application. Designers can explore different configurations to determine the optimum and not only validate the architecture, but also verify that individual hardware and software modules meet the system requirements.

LSI Logic adopted a similar approach with its ZSP DSP cores. LSI developed cycle- and transaction-accurate SystemC models for each of the core variants to enable designers to model the core’s performance within an SoC architecture. The models also enable the designer to debug the hardware and software interactions. Designers can analyze processor throughput and latency, as well as memory performance.

ESL design and verification methodologies enable the designer to focus on those system design attributes that differentiate and impart value to products and IP, namely functionality and performance. These are determined by advanced algorithms, complex multiprocessor and memory architectures, sophisticated communications protocols and application-optimized processors, all driven by embedded software. The elegance of the RTL implementation is relevant to efficient implementation, but the value is in the system design.