Large PLDs need own physical models

By Jeff Garrison and Balaji Thirumalai, EE Times
October 16, 2000 (4:00 p.m. EST)
URL: http://www.eetimes.com/story/OEG20001016S0058

Jeff Garrison, Director of PLD Products, Synplicity Inc., Sunnyvale, Calif., Balaji Thirumalai, Manager, CAE Tools Product Planning, Altera Corp., San Jose, Calif.

As system complexity increases and time-to-market shrinks, designers frustrated by the high cost and long development cycles of traditional ASICs are turning to multimillion-gate programmable-logic devices (PLDs). Greater PLD complexity and lower development costs are resulting in a dramatic rise in the number and types of applications being implemented in PLDs, from high-volume consumer products to networking and telecommunications equipment.

But deploying the latest PLD technologies presents an entirely new set of difficulties. PLD process technology has reached very deep-submicron gate lengths, and PLD designers are facing the same problems ASIC designers had to deal with three to four years ago. In the very deep-submicron era, timing delay is now dominated by the interconnect and no longer by the logic. Traditional tools and design flows, which assume that delay is dominated by the logic, cannot account for interconnect effects early in the design cycle, before synthesis.

The solution is design automation that can perform physical modeling. But using already proven ASIC design techniques for PLDs is not the answer because of the unique interconnect architectures of PLDs. What is needed is an entirely new design automation technology aimed at PLDs. In particular, PLD physical synthesis can link more productive high-level design with the physical design stages, playing a key role in realizing these advanced complex designs.

Multimillion-gate PLDs with six or more layers of metal, clock speeds approaching 200 MHz and on-chip memory and intellectual-property cores are the leading edge of today's most complex programmable logic. But the same fine-line process technology that produces such high-density, high-speed devices has brought with it ASIC-like problems.

It's the interconnections

In particular, timing delays, once dominated by logic, are now determined largely by the interconnections between logic. In process technologies below 0.25 micron, up to 70 percent or even 80 percent of delay is due to routing interconnect. High-productivity design methods such as synthesis fail to adequately account for these effects. As a result, timing performance, which is largely dependent on interconnect, remains uncertain until after place and route.
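The arithmetic behind that uncertainty can be sketched in a few lines of Python. This is only an illustration: the function names and every delay value below are invented for the example and do not come from any device data sheet or tool.

```python
def interconnect_share(logic_ns, routing_ns):
    """Fraction of a path's total delay that comes from routing."""
    total = sum(logic_ns) + sum(routing_ns)
    if total <= 0:
        raise ValueError("path must have positive total delay")
    return sum(routing_ns) / total


def fmax_mhz(path_delay_ns):
    """Maximum clock frequency for a register-to-register path delay."""
    return 1000.0 / path_delay_ns


# Hypothetical three-stage path; every number here is invented.
logic_ns = [0.5, 0.5, 0.5]      # delay through each logic cell
routing_ns = [1.0, 1.5, 1.0]    # delay of the wire after each cell

share = interconnect_share(logic_ns, routing_ns)
logic_only_fmax = fmax_mhz(sum(logic_ns))
routed_fmax = fmax_mhz(sum(logic_ns) + sum(routing_ns))

print(f"routing share of delay: {share:.0%}")
print(f"logic-only estimate: {logic_only_fmax:.0f} MHz")
print(f"after routing: {routed_fmax:.0f} MHz")
```

With routing contributing most of the delay, an estimate that counts only logic is far too optimistic, which is why the real figure is unknown until routing delays are available.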

But waiting for the details of timing until this late in the design cycle often requires repeated, time-consuming iterations through synthesis and place and route before designers are certain that critical timing requirements have been met. And because the changes made at this stage tend to be at the gate level, their effect on timing is small.

Some designers attempt to address the problem by using techniques created for ASIC design in their PLD designs. But these tools lack sufficient intelligence for PLD architectures. The physical interconnections of PLDs are formed differently, and the rules for how they are made, as well as the electrical characteristics of those connections, are significantly different from those of ASICs. Moreover, PLD architectures and their interconnect characteristics differ substantially among vendors, so the standard physical models used by ASIC design tools cannot be applied.

Other designers, often those working in design teams, have resorted to gate-level floor planning, in which a circuit's HDL code is synthesized to get a netlist. Using a floor-planning tool, the designer imposes constraints on placement. These constraints, which usually ensure that certain key groups of gates stay near one another, are intended to keep critical timing paths as short as possible during the place-and-route phase.

But gate-level floor planning is a very tedious process. Because designers code their circuits in HDL, describing functional blocks at the register-transfer level (RTL), working at the gate level to determine which gates are actually part of the block in question means that the designer must manage a hundred times as many objects. Moreover, it is extremely difficult to relate these gate-level objects back to the HDL code that generated them. Additionally, small changes in the design can completely invalidate a gate-level floor plan, possibly requiring that weeks of effort be thrown away.

Gate-level floor planning can also negate the reason for which PLDs are used in the first place: fast turnaround. Once critical blocks are placed with the floor planner, any change to the design requires that the floor plan be entirely re-created. This quickly translates into weeks of modifying RTL code, resynthesizing it to get a new netlist and floor planning again at the gate level to achieve a design's timing specification.

Finally, and most important, gate-level floor-planning tools cannot change the actual circuit implementation to improve a circuit for performance. Improvements such as logic tunneling and logic replication, which automatically move registers into different physical regions, have the ability to significantly enhance performance. Thus, using traditional gate-level methods is exceedingly difficult and time-consuming.

If it were possible to get physical routing information about a design at the beginning of the design flow, before synthesis, sophisticated synthesis algorithms would then be able to perform simultaneous placement and optimization on the netlist. ASIC designers are starting to employ this strategy with first-generation ASICs.

A recent advance in PLD design automation brings a similar advantage to PLD designers. Physical synthesis technology promises to simplify and improve the process of achieving timing convergence in fast, complex PLDs. Using physical synthesis, designers can quickly and easily apply physical constraints along with standard timing constraints after the HDL source code is compiled but before a circuit is optimized and mapped to a particular PLD.

Superficially, these physical constraints resemble an RTL floor plan; however, physical synthesis actually restructures the logic of a design based on its physical characteristics and creates placement. This restructuring reduces or eliminates iterations between synthesis and place-and-route and improves productivity as well as design performance. Additionally, working at the RTL makes the constraint-creation process much faster and more intuitive for designers than is possible at the gate level.

One key advantage of physical synthesis at the RTL is that even if a designer makes changes to a functional block (for example, expanding the width of a bus), any physical constraints imposed on that block remain valid. By contrast, a gate-level floor planner requires that the block be floor planned again each time it is modified.
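A small Python sketch shows why a constraint attached to an RTL block survives a design change while gate-level constraints do not. It is a toy model only: the block name, gate names and region label are hypothetical, and it assumes that resynthesis regenerates gate instance names, which is typical but not guaranteed.

```python
def stale_constraints(constraint_targets, design_objects):
    """Return the constraint targets that no longer exist in the design."""
    return sorted(t for t in constraint_targets if t not in design_objects)


def all_gates(design):
    """Flatten a {block: [gate, ...]} design into a set of gate names."""
    return {gate for gates in design.values() for gate in gates}


# Hypothetical design before and after widening a bus in block "bus_if".
# Resynthesis is assumed to regenerate the gate instance names.
before = {"bus_if": ["bus_if/g0", "bus_if/g1"]}
after = {"bus_if": ["bus_if/n0", "bus_if/n1", "bus_if/n2"]}

# The same placement intent, expressed at two levels of abstraction.
rtl_constraints = {"bus_if": "region_A"}
gate_constraints = {gate: "region_A" for gate in before["bus_if"]}

stale_rtl = stale_constraints(rtl_constraints, after.keys())
stale_gate = stale_constraints(gate_constraints, all_gates(after))

print("stale RTL constraints:", stale_rtl)
print("stale gate constraints:", stale_gate)
```

The RTL constraint still resolves because the block name is unchanged, whereas every gate-level constraint points at an object that no longer exists and must be re-created.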

In some cases, the simultaneous optimization and placement algorithms of physical synthesis improve a design's timing performance by as much as 40 percent. This opens up the option of using slower speed-grade devices, which can save tens to hundreds of thousands of dollars per design, depending on volume.

Synplicity's Amplify Physical Optimizer tool is the first physical-synthesis tool designed specifically for programmable logic. Amplify combines innovative physical-optimization technology with the logic-synthesis algorithms of Synplicity's Synplify synthesis environment for PLDs. It is a hierarchical optimization engine that leverages circuit topology and placement knowledge to produce significantly improved netlist results after physical-optimization synthesis.

Amplify uses its knowledge of PLD architectures and user-specified physical design constraints to make more predictable timing estimations within defined physical regions (that is, rows or MegaLABs in Altera's Flex 10K, Apex and Acex devices). With this physical information, the tool derives more accurate timing estimations and uses them to perform additional optimization techniques during synthesis.

Certain critical-path-optimization techniques are possible only when physical information is known. Two of these techniques are automatic tunneling and logic replication. Automatic tunneling uses boundary optimization techniques to move logic automatically among the PLD's physical regions, reducing interconnect delay and improving speed, so the synthesis tool can perform placement to improve a critical path's timing. Logic replication automatically replicates logic cells in order to improve the timing of a critical path. It is used on paths going to multiple regions when some of those paths are critical. Replication makes separate copies of the block for each critical path, reducing fanout or improving the logic packing and the predictability of routing delay or both.
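The idea behind logic replication can be illustrated with a short Python sketch. It is not Amplify's algorithm, which is not described here; it is a simplified model in which a driver is copied into every remote region that contains a critical sink, and all names and regions are hypothetical.

```python
from collections import defaultdict


def replicate_driver(driver, driver_region, sinks):
    """Assign sinks to the original driver or to per-region replicas.

    sinks is a list of (sink_name, region, is_critical) tuples.
    Returns {(driver_instance, region): [sink_name, ...]}.
    """
    by_region = defaultdict(list)
    critical_regions = set()
    for name, region, is_critical in sinks:
        by_region[region].append(name)
        if is_critical and region != driver_region:
            critical_regions.add(region)

    assignment = {(driver, driver_region): []}
    for region, names in by_region.items():
        if region in critical_regions:
            # A local copy drives every sink in this region.
            assignment[(f"{driver}_rep_{region}", region)] = names
        else:
            # Non-critical regions keep using the original driver.
            assignment[(driver, driver_region)].extend(names)
    return assignment


# Hypothetical register "reg_q" in region R0 with four sinks.
sinks = [
    ("a", "R0", False),
    ("b", "R1", True),   # critical path crossing into R1
    ("c", "R1", False),
    ("d", "R2", False),
]
result = replicate_driver("reg_q", "R0", sinks)
for (instance, region), driven in result.items():
    print(instance, "in", region, "drives", driven)
```

In this toy case the original register's fanout drops from four sinks to two, and the critical sink in R1 is driven from within its own region rather than across a region boundary.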

Actual benchmarks conducted by Altera and Synplicity demonstrate that designs that have been run through the Amplify physical-synthesis tool show significant performance improvements over netlists that have been traditionally synthesized. However, the benefits of physical synthesis are not merely the result of forward annotating synthesized placement constraints. The netlist produced by Amplify is structurally different from one produced with synthesis that is devoid of physical optimization. In theory, functionally grouping logic can improve design performance. But it does not prevent a critical path from traversing through some or all functional modules and thereby nullifying any advantage provided by functional grouping. Physical synthesis combines the benefits of improved delay estimation based on the physical constraints with critical-path optimization techniques.

Highly predictable

Since physical synthesis relies on the optimization of design structures based on device architecture, it works particularly well when the device architecture is highly ordered and predictable. For example, the hierarchical architectures found in Altera devices lend themselves to physical synthesis. The MultiCore embedded architecture in Altera's Apex devices is an innovative combination of three different types of PLD structures: look-up tables, like those found in Flex 10K and Flex 6000 devices; product-term blocks, like those found in MAX 7000 devices; and enhanced embedded memory blocks, like those found in Flex 10KE devices. Together, these structures make the integration of complex functions an easy and efficient process. The MultiCore architecture is made up of logic array blocks (LABs), each consisting of 10 Flex 6000 logic elements (LEs). These are combined into a new hierarchical structure called a MegaLAB structure, which is an array of LABs. Each MegaLAB structure contains 16 LABs and an advanced embedded structure called an embedded system block (ESB) to implement memories.
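The hierarchy implies a simple capacity calculation, sketched here in Python. The two constants come from the figures above; the helper function and the assumption of perfect packing (every LE usable) are illustrative simplifications, not a description of how any tool sizes a region.

```python
import math

LES_PER_LAB = 10        # logic elements per LAB, as stated above
LABS_PER_MEGALAB = 16   # LABs per MegaLAB structure, as stated above
LES_PER_MEGALAB = LES_PER_LAB * LABS_PER_MEGALAB


def megalabs_needed(le_count):
    """Smallest number of MegaLABs that can hold le_count logic elements.

    Assumes perfect packing, which real designs rarely achieve.
    """
    if le_count < 0:
        raise ValueError("le_count must be non-negative")
    return math.ceil(le_count / LES_PER_MEGALAB)


print("LEs per MegaLAB:", LES_PER_MEGALAB)
print("MegaLABs for a 500-LE block:", megalabs_needed(500))
```

A designer constraining a block to a single MegaLAB can use this kind of estimate to check that the block has a chance of fitting before place-and-route.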

The MultiCore architecture enhances the continuous metal-routing structure of the Apex devices by introducing a fourth level to the routing hierarchy. In addition to the global row and column interconnect, the MegaLAB interconnect connects all LABs and the ESB within a MegaLAB structure. The MegaLAB interconnect allows increased performance by using local routing resources instead of global routing resources. Local interconnect also connects the LEs within the same LAB to adjacent LABs, as in the LAB interleaving of the Flex 6000.

The hierarchical structure of logic and interconnect in Altera devices lends itself well to physical synthesis because users can direct a critical path to be placed into several well-defined blocks, such as MegaLABs, MegaLAB rows, half-columns of MegaLABs and ESBs. These placement constraints can be forward annotated to the Quartus floor planner before place-and-route. Physical synthesis in Altera devices leads to real performance gains, as demonstrated by a suite of 10 real-world designs that was benchmarked to study Amplify's performance benefits. The benchmark data clearly shows that Amplify helped improve performance by 16 percent on average over a Synplify-only flow and helped meet or exceed the target frequency five out of 10 times compared to Synplify alone.

Working together, Altera and Synplicity have taken the first step to solve the problem of achieving timing closure by using physical synthesis.