A Complete Design Solution for Structured ASICs
Introduction
The two conventional digital IC implementation technologies – ASICs and FPGAs – have recently been augmented by an emerging technology known as structured ASICs (SAs). These devices offer complexity and performance that approach that of traditional ASIC components coupled with the flexibility and low non-recurring engineering (NRE) costs that are typically associated with FPGAs.
In order to take full advantage of the various SA technologies, however, design engineers require access to a complete, integrated electronic design automation (EDA) environment. This whitepaper first summarizes the problems associated with conventional ASIC implementation technologies; next, introduces SA platforms and architectures; then describes the requirements for a true SA-capable design environment; and lastly, reviews architectural evaluation and development tools.
Problems with Traditional ASIC Platforms
Conventional ASIC implementation technologies are applicable to the largest, most complex, and highest performance designs. They also have a low per-unit cost when used in production runs of 50,000 or more. However, a large proportion of today’s digital IC designs require mid-volume runs on the order of 10,000 to 50,000 units. These designs are not well-served by conventional ASICs – the dominant form of which is known as standard cell (SC) – because such devices are extremely expensive and time-consuming to develop.
As SC-based ASIC designs move into the deep submicron (DSM) domain – specifically the 90-nanometer node and below – power, timing, and signal integrity (SI) issues become ever more complex. Reaching closure on these issues now takes so much effort that design teams devote more time to addressing these aspects of the device than they spend architecting, capturing, and verifying the logical functionality of the design.
In addition to protracted development times, the photomasks associated with a new ASIC are becoming prohibitively expensive (on the order of $1 million for a 90-nanometer device of typical complexity). Furthermore, the lengthy manufacturing turnaround times required to actually fabricate these devices significantly impact their time-to-market. The long development and manufacturing times associated with SC-based ASICs pose particular problems with regard to today’s short-lived product life cycles; they also affect this implementation technology’s ability to address constantly evolving standards and protocols.
Introducing Structured ASIC Platforms
One solution to the problems associated with conventional ASIC technologies is the SA concept, featuring a mixture of pre-defined logic and pre-defined interconnects. Depending on the particular SA architecture, design engineers need specify only one, two, or very few metallization layers in order to complete the device.
Because the design team needs to realize only a limited number of metallization layers, the costs associated with generating the device’s photomasks are dramatically reduced. Furthermore, the fact that the device is largely prefabricated allows the base component costs to be shared by multiple users and dramatically shrinks the turnaround time to working silicon. In turn, this means that SAs can undergo faster and cheaper modification cycles in order to accommodate evolving standards and protocols.
Overall, the capacity, performance, and power consumption of an SA are much closer to those of a traditional SC-based ASIC realization of the design than to those of an FPGA implementation. Additionally, the faster design time, lower photomask costs, and quicker turnaround to final silicon – along with the lower costs resulting from the fact that the majority of the device is prefabricated – mean that the per-unit cost of SAs is extremely reasonable for medium-low to medium-high production runs.
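The volume argument above can be made concrete with a simple cost model in which total cost equals the one-time NRE plus the per-unit cost multiplied by the production volume. The following Python sketch is illustrative only: every dollar figure in it is a hypothetical placeholder (not vendor or market data), chosen so that the crossover points fall near the 10,000- and 50,000-unit boundaries mentioned earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Technology:
    """Cost profile of one implementation technology (all figures hypothetical)."""
    name: str
    nre: float        # one-time cost in USD: masks, tooling, engineering
    unit_cost: float  # cost in USD of each manufactured device

    def total_cost(self, volume: int) -> float:
        return self.nre + self.unit_cost * volume


def cheapest(techs, volume):
    """Return the technology with the lowest total cost at this volume."""
    return min(techs, key=lambda t: t.total_cost(volume))


def break_even_volume(a, b):
    """Volume at which a and b cost the same, or None if there is no crossover."""
    if a.unit_cost == b.unit_cost:
        return None
    v = (b.nre - a.nre) / (a.unit_cost - b.unit_cost)
    return v if v > 0 else None


# Placeholder numbers, assumed purely for illustration.
FPGA = Technology("FPGA", nre=0.0, unit_cost=45.0)
SA = Technology("Structured ASIC", nre=200_000.0, unit_cost=25.0)
SC = Technology("Standard cell ASIC", nre=1_200_000.0, unit_cost=5.0)

if __name__ == "__main__":
    for volume in (5_000, 30_000, 100_000):
        print(volume, cheapest([FPGA, SA, SC], volume).name)
```

Under these assumed figures the FPGA is cheapest at low volume, the structured ASIC in the mid-volume range, and the standard cell ASIC at high volume. With real quotes substituted for the placeholders, the same three functions give the actual crossover points for a given project.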
A Generic Structured ASIC Architecture
Although there is a wide variety of alternative SA architectures, they are all based on a basic logic element called a “tile” or a “module.” This tile contains a small amount of combinatorial logic and – depending on the particular architecture – it may also contain registers, buffer drivers, and possibly a small amount of local RAM.
An array (“sea”) of these tiles is then prefabricated across the face of the chip. SAs also typically contain additional prefabricated elements, which may include configurable general-purpose I/O blocks, microprocessor cores, embedded (block) RAM, phase-locked loops (PLLs), and so forth (see Figure 1).
Figure 1. A generic SA architecture
In many respects, SAs are similar to the gate array form of ASIC. The key differentiator with regard to SAs is that the majority of the metallization layers are also prefabricated. This means that the transistors forming the core logical functions comprising each tile (gates, multiplexers, lookup tables, and so on) are already wired together. Also, much of the local and global interconnect has been pre-implemented.
Fine-Grained versus Medium-Grained Architectures
Many SA vendors have opted for a medium-grained architecture. In this case, the basic tile might contain some generic logic in the form of gates and/or multiplexers along with one or more flip-flops (Figure 2a). Alternatively, some medium-grained SA architectures are based on tiles containing one or more lookup tables combined with some register elements (Figure 2b).
Figure 2. Two examples of medium-grained SA tiles
In both of these cases, the polarity of the flip-flops’ clock inputs (i.e. whether each register should be positive- or negative-edge-triggered), and the polarity of their set and reset inputs, can be determined by the customized metallization layers.
Alternatively, some SA architectures are based on an extremely fine-grained version of a tile that comprises only unconnected transistors and resistors. These architectures are extremely close to those of modern high-end gate array devices. The difference is that – in the case of the SA – metallization has already been added that nearly connects these components in a variety of pre-defined configurations. The user-definable metallization layers are used to complete the appropriate connections and to link the tiles into the local and global routing architecture.
Hierarchical Architectures
As yet another alternative, some SA architectures commence with a base tile containing only generic logic in the form of prefabricated gates and/or multiplexers and/or lookup tables (Figure 3).
Figure 3. An example base tile featuring gates and multiplexers
In this case, an array of these base tiles (say 4 x 4, or 8 x 8, or 16 x 16) is combined with special tiles containing registers, memory elements, and other logic to form a master tile; an array (“sea”) of these master tiles is then prefabricated across the face of the chip.
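The hierarchy just described (base tiles grouped with special tiles into a master tile, and master tiles tiled across the chip) maps naturally onto a small nested data structure. The sketch below is a hypothetical model for tallying resources; the class names and all resource counts are assumptions made for illustration and do not describe any particular vendor's device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaseTile:
    """Generic-logic tile; the resource counts are illustrative."""
    gates: int
    muxes: int


@dataclass(frozen=True)
class SpecialTile:
    """Tile carrying registers and/or local memory."""
    registers: int = 0
    ram_bits: int = 0


@dataclass(frozen=True)
class MasterTile:
    """A rows x cols array of identical base tiles plus some special tiles."""
    base: BaseTile
    rows: int
    cols: int
    specials: tuple = ()

    def resources(self) -> dict:
        n = self.rows * self.cols
        return {
            "gates": n * self.base.gates,
            "muxes": n * self.base.muxes,
            "registers": sum(s.registers for s in self.specials),
            "ram_bits": sum(s.ram_bits for s in self.specials),
        }


def chip_resources(master: MasterTile, rows: int, cols: int) -> dict:
    """Total resources of a rows x cols sea of identical master tiles."""
    return {k: v * rows * cols for k, v in master.resources().items()}


# Hypothetical 4 x 4 master tile with one register tile and one RAM tile.
master = MasterTile(
    base=BaseTile(gates=2, muxes=1),
    rows=4,
    cols=4,
    specials=(SpecialTile(registers=8), SpecialTile(ram_bits=256)),
)

if __name__ == "__main__":
    print(master.resources())
    print(chip_resources(master, rows=10, cols=10))
```

A model of this kind is also the sort of input a packing or placement step would need, since it states how many logic and sequential resources are available at each level of the hierarchy.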
Additional Architectural Considerations
Fine-grained tiles have the advantage of better logic utilization and less waste, but this comes at the price of more connections, associated routing congestion, larger interconnect delays, and – in some cases – lower performance. By comparison, medium-grained architectures offer lower routing congestion, lower interconnect delays, and – possibly – higher performance, but this usually comes at the cost of less efficient logic utilization. To further complicate things, different designs may be better suited to fine- or medium-grained architectures depending on the style of the design and the class of the target application. (See also the discussions on Architectural Evaluation and Development tools below.)
Alternative SA implementations require the customization of different numbers of metallization layers ranging from one to four tracking layers (and associated via layers). In fact, at least one SA vendor has presented an architecture requiring the customization of only a single via layer. Using fewer customizable layers reduces photomask costs, production costs, and back-end production times. Fewer customizable layers also mean that the prefabricated track segments are extremely well characterized in terms of parasitic effects, delays, and signal integrity issues. However, limiting the number of customizable layers also correlates to lower performance and places a greater strain on the design tools, which have to work within the constraints imposed by the predefined metallization.
Some SA architectures provide pre-routed clock and global structures, while others require these to be user-defined; in the latter case, a design environment that includes clock synthesis technology is important. Furthermore, SA architectures with a pre-routed clock structure also come equipped with pre-defined SCAN elements, while others require these to be user-defined. In the latter case, the design environment must include an integrated design-for-test (DFT) capability.
The Requirements for a True SA-Centric Design Environment
One problem associated with SA technologies is that many design tool offerings are not well-suited to the task. ASIC tools that were originally conceived for use with SC-based architectures are extremely efficient when working at the fine-grained level in a flexible floorplan, but these tools typically have problems working with coarser hierarchical structures based on larger tiles and a fixed floorplan imposed by SA vendors.
By comparison, tools that were created with FPGAs in mind are better suited to handle the hierarchical structures found in many SA architectures, but they are not well-equipped to handle ASIC-like global and detailed placement, parasitic extraction, clock-tree synthesis, the creation of DFT structures, SI analysis, and timing and noise sign-off.
Another key consideration is that of routing: SA design tools based on traditional SC routing algorithms are employing a variety of ad hoc solutions, such as disabling certain metal layers or writing scripts to focus the router on specific areas. However, most of these solutions require a significant amount of user intervention and produce inefficient results for SA technologies that require programming of just one or two metal layers to finish the design.
What is required is a unified design environment that combines the ability to work with hierarchical architectures (including the ability to perform architecture-specific synthesis such as Boolean matching and LUT/multiplexer mapping) with high-end ASIC capabilities such as heterogeneous placement, clock-tree synthesis, and DFT, along with parasitic extraction, timing analysis, and SI analysis.
Furthermore, in order to handle today’s extremely large and complex designs, the design environment needs to employ a single unified data model. This model, which must remain resident in memory to provide the high levels of performance required by large and complex designs, must contain all of the logical, timing, and physical data required by the implementation engines (optimization, placement, routing, and so on) and the analysis engines (parasitic extraction, static timing analysis, signal integrity analysis, and so on).
In order to address the complexities of the ultra-deep submicron domain, all of the implementation and analysis engines must have immediate and concurrent access to exactly the same data. What this means in real terms is that at the same time the router is laying down a track, the parasitics are being extracted, delay calculations are being performed, the signal integrity of that route is being evaluated, and the router immediately uses this data to make any necessary modifications on the fly.
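The idea of engines sharing one in-memory model can be sketched in a few lines. In the toy Python example below, a router adds a wire segment to a net, the net's parasitics and delay are recomputed immediately from the same object, and the router rips up the segment and tries another layer if the timing budget is violated. Everything here is an assumption made for illustration: the class and function names, the per-layer resistance and capacitance values, and the crude 0.5·R·C wire-delay estimate. It is not a description of any commercial tool's internals.

```python
from dataclasses import dataclass, field

# Hypothetical per-layer wire parameters: (ohms per micron, farads per micron).
LAYER_RC = {"M3": (0.40, 0.20e-15), "M4": (0.10, 0.25e-15)}


@dataclass
class Net:
    name: str
    budget_s: float                                # allowed wire delay, seconds
    segments: list = field(default_factory=list)   # (layer, length in microns)

    # "Extraction": parasitics are derived directly from the routed segments.
    def resistance(self) -> float:
        return sum(LAYER_RC[layer][0] * length for layer, length in self.segments)

    def capacitance(self) -> float:
        return sum(LAYER_RC[layer][1] * length for layer, length in self.segments)

    # "Timing": crude estimate using total wire R and C (toy model only).
    def delay(self) -> float:
        return 0.5 * self.resistance() * self.capacitance()


@dataclass
class DesignModel:
    """Single in-memory model read and written by every engine."""
    nets: dict = field(default_factory=dict)


def route(model: DesignModel, net_name: str, length_um: float, layers=("M3", "M4")):
    """Try each layer in turn; keep the first one that meets the delay budget."""
    net = model.nets[net_name]
    for layer in layers:
        net.segments.append((layer, length_um))   # router lays down a track
        if net.delay() <= net.budget_s:           # analysis sees it immediately
            return layer
        net.segments.pop()                        # rip up and try the next layer
    return None


if __name__ == "__main__":
    model = DesignModel()
    model.nets["n1"] = Net("n1", budget_s=20e-12)
    print(route(model, "n1", length_um=1000), model.nets["n1"].segments)
```

Because the router and the analysis code operate on the same `Net` object, there is no export and import step between them; this is the property the paragraph above attributes to a unified data model.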
The Solution: Blast Create SA and Blast Fusion SA
Unlike the vast majority of EDA tools – which are designed to address either ASIC or FPGA designs, but not both – Magma’s Blast Create™ logic design and Blast Fusion® physical design environments are equipped to handle the requirements of both architectural extremes. This means that these tools are well-equipped to address the needs of emerging SA technologies. In fact, special versions of these tools – Blast Create SA and Blast Fusion SA – have been made available to fully address the needs of SA architectures. The combination of Blast Create SA and Blast Fusion SA provides a complete RTL-to-GDSII flow for any SA architecture.
Blast Create SA: RTL-to-Placed Netlist for Structured ASICs
By combining Magma’s industry-leading Blast Create technologies with SA-based structure-specific optimizations, Blast Create SA offers a better quality-of-results (QoR) and higher predictability than point-tool solutions for SA designs.
SA tiles can contain a mixture of logic gates, multiplexers, and LUTs along with advanced sequential elements, local memory, and active buffers to drive high fan-out nets. In addition to containing multiple layers of tile hierarchy, SA architectures can also feature a variety of hard, firm and soft IP blocks.
Blast Create SA automatically imports all of the SA logic and physical cell libraries required to automatically create a floorplan that subsequently drives the entire implementation process. Transparent to the user, the Blast Create SA physical synthesis engine performs a mixture of Boolean matching and LUT/multiplexer mapping as required by each unique SA architecture. This engine also fully utilizes any on-chip buffer and dedicated driver cells to meet timing constraints as required.
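To give a feel for what Boolean matching means in this context, the sketch below checks by brute force whether a small target function can be implemented by a single 2:1 multiplexer tile whose three pins (select and two data inputs) may each be tied to a constant, a design variable, or its complement. This is a generic textbook-style illustration written for this discussion; the function names are hypothetical, and it is not Blast Create SA's algorithm, which the source does not describe.

```python
from itertools import product


def truth_table(fn, n):
    """Evaluate fn on all 2**n input combinations, in a fixed order."""
    return tuple(fn(*bits) for bits in product((0, 1), repeat=n))


def mux(sel, a, b):
    """2:1 multiplexer: output a when sel is 0, b when sel is 1."""
    return b if sel else a


def candidate_sources(n):
    """Signals a tile pin may be tied to: constants, variables, complements."""
    srcs = {"0": lambda *v: 0, "1": lambda *v: 1}
    for i in range(n):
        srcs[f"x{i}"] = lambda *v, i=i: v[i]
        srcs[f"~x{i}"] = lambda *v, i=i: 1 - v[i]
    return srcs


def match_to_mux(target, n):
    """Return a pin binding that makes one mux equal target, or None."""
    want = truth_table(target, n)
    srcs = candidate_sources(n)
    for (sn, s), (an, a), (bn, b) in product(srcs.items(), repeat=3):
        got = truth_table(lambda *v: mux(s(*v), a(*v), b(*v)), n)
        if got == want:
            return {"sel": sn, "a": an, "b": bn}
    return None


if __name__ == "__main__":
    print(match_to_mux(lambda x, y: x & y, 2))   # AND fits in one mux
    print(match_to_mux(lambda x, y: x ^ y, 2))   # XOR fits, using a complement
    majority = lambda x, y, z: (x & y) | (y & z) | (x & z)
    print(match_to_mux(majority, 3))             # needs more than one mux
```

Exhaustive enumeration is practical only for tiny cells and few variables; it is used here because it makes the definition of a match explicit. A function that fails to match, such as the three-input majority above, would have to be decomposed across several tiles.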
Physical synthesis includes global placement; buffer insertion; retiming, replication, and re-synthesis; mapping, un-mapping, and re-mapping; and detailed placement followed by legalization. All of these tasks rely on the availability of accurate timing models that drive the whole process to create design-rule-check (DRC)-clean, manufacturable SA devices. Blast Create SA has the detailed knowledge of the different SA architectures and the built-in analysis capabilities required to create these highly customized, architecture-specific timing models.
Unlike traditional ASIC-based synthesis solutions, Blast Create SA takes full advantage of the various SA tile structures and maps directly to SA tiles and on-chip hierarchical resources to achieve the highest possible performance. Blast Create SA also has the ability to pack unrelated logic together in the same tile to achieve more optimal resource utilization.
Unlike conventional FPGA-centric synthesis solutions, the Blast Create SA physical synthesis engine is fully capable of handling the complex physical constraints imposed by the various SA architectures. Its heterogeneous placement algorithm simultaneously places a mix of basic tiles and hard IP such as embedded memory blocks to achieve an optimal solution. Furthermore, in the case of conventional FPGA synthesis solutions, the packing algorithms are typically limited to one or two LUTs or multiplexers and registers in a logic element (LE) or logic cell (LC). By comparison, Blast Create SA is capable of packing throughout the hierarchy from basic tiles to clusters of tiles and master tiles.
Blast Fusion SA: Netlist-to-GDSII for Structured ASICs
Blast Fusion SA combines Magma’s industry-leading Blast Fusion technology with SA-based structure-specific optimizations, thereby providing a better QoR and higher predictability than point-tool solutions for SA designs.
Unlike traditional ASIC-based synthesis solutions, Blast Fusion SA’s physically-aware optimization algorithms take full advantage of the various SA tile structures. Similarly, Blast Fusion SA’s advanced detailed placement and routing technologies feature algorithms that are tuned to handle the complex constraints inherent in the pre-defined routing associated with SA architectures.
Unlike conventional FPGA synthesis technologies, Blast Fusion SA features advanced clock tree synthesis with simultaneous optimization of multiple clock domains for use with those SA architectures that don’t have pre-fabricated implementations of these structures. Furthermore, Blast Fusion SA includes ASIC-class (sign off) parasitic extraction, timing analysis, signal integrity analysis, power analysis, and DRC.
All of Blast Fusion SA’s implementation and analysis engines have immediate and concurrent access to Magma’s single unified data model, allowing the detailed placement and routing engines to access timing and signal integrity information and make any necessary modifications on the fly. The result is a legalized, placed, SI- and DRC-violation-free manufacturable solution that meets design constraints in the shortest time.
Architectural Evaluation
Unlike the various FPGA architectures that have undergone extensive analysis by the industry and academia, there is little data comparing the advantages and disadvantages of the various SA architectures.
The tradeoffs among different tile and hierarchical structures, the ratio between combinational and sequential resources, the use of hybrid resources such as RAM and DSP blocks, and the various interconnect structures are extremely complex. In order to address this issue, Magma has a tool called ArchEvaluator™, which allows users to fully assess the characteristics of different SA architectures.
In addition to its application by design teams who wish to evaluate and compare different SA architectures in the context of their particular design requirements in order to determine which device is best suited for the task, ArchEvaluator is also applicable to SA vendors in the process of developing and evolving new architectures.
Summary
Structured ASICs offer complexity and performance that approach that of traditional standard cell ASIC components coupled with the flexibility and low non-recurring engineering (NRE) costs that are typically associated with FPGAs. In order to take full advantage of the various structured ASIC technologies, however, design engineers require access to an appropriate electronic design automation (EDA) environment.
In order to address this issue, Magma has introduced Blast Create SA and Blast Fusion SA, which fully address the needs of SA architectures, and which provide a complete RTL-to-GDSII flow for any SA architecture. In the case of existing Magma users who are working with Blast Create and Blast Fusion to design conventional ASICs, there is no significant learning curve when it comes to using Blast Create SA and Blast Fusion SA.
Because Magma’s Blast Create and Blast Fusion environments are equipped to handle the requirements of the architectural extremes associated with ASICs, structured ASICs, and FPGAs, these environments facilitate the design of system-on-chip (SoC) devices that combine traditional standard cell technologies with embedded structured ASIC cores, all in a single, unified design flow.
Magma is working closely with all leading structured ASIC vendors, library/IP providers and also with vendors specializing in the use of embedded structured ASIC cores in standard cell design flows.
In conclusion, Blast Create SA and Blast Fusion SA provide an easy-to-adopt solution that delivers higher predictability and better quality-of-results for structured ASIC designs compared to ASIC-centric or FPGA-based approaches. Also, the combination of Blast Create, Blast Create SA, Blast Fusion, and Blast Fusion SA provides for a seamless migration between FPGA, structured ASIC, and standard cell ASIC realizations of a design.
More information: www.magma-da.com