So you have an algorithm or a compute-intensive function you want to implement in hardware. Does that mean you have to go through the traditional ASIC design flow, writing register-transfer-level VHDL or Verilog? Not at all, say an increasing number of providers of electronic system-level design and silicon intellectual property.
But neither a magical nor 100 percent pushbutton solution exists. Most often, the move to higher levels of abstraction means making trade-offs in area, performance or quality-of-results and learning new methodologies and tools.
EE Times' "Easy Paths to Silicon" seminar at last month's Embedded Systems Conference in San Francisco heard a number of alternatives to the traditional RTL ASIC design flow. The discussions covered C-language synthesis for ASICs and FPGAs, application-engine synthesis, coprocessor synthesis, application-specific processors and configurable processors. Most of these solutions aim at datapath-intensive designs.
Not a single presenter offered a solution that was fully automated or appropriate for all design styles. Most noted that their solutions still require some knowledge of hardware design.
"I know of no tools, including ours, that can offer pushbutton software-to-hardware compilation," said David Pellerin, CTO of Impulse Accelerated Technologies, which provides C-language synthesis for FPGAs. "A good VHDL or Verilog programmer with experience in FPGAs can still beat what a compiler can do today."
What Pellerin and other presenters did say is that the new electronic system-level (ESL) solutions can help companies without extensive chip-design expertise get into silicon relatively quickly, shaving days, weeks or months off of a design cycle. And as chip complexity increases because of new IC process technologies, such solutions may become the only practical way to do design, they say.
Gary Smith, chief EDA analyst at Gartner Dataquest, said the solutions discussed at the "Easy Paths to Silicon" seminar are all part of an emerging "algorithmic" ESL methodology. This flow, he said, has two types of tools: "algorithmic synthesizers" and "processor engine compilers."
Algorithmic synthesizers, which develop a design from an architecture, include tools like Forte Design Systems' Cynthesizer, Mentor Graphics' Catapult C and Celoxica's DK, Smith said. Processor engine compilers are based on predefined platforms, though some of those can be used outside of a platform-based flow. They include tools from CriticalBlue, Synfora, CoWare and Tensilica.
Hardware is not software
As tempting as it might be to think that a software developer can push a button and get hardware, it doesn't happen that way. Software and hardware design are fundamentally different, said Forte Design Systems technical marketing manager David Pursley.
Software design optimizes execution speed and clarity, he said, while hardware design focuses on throughput, latency, area, power and routability. In software, use of large arrays for intermediate memory storage is "free," in contrast to hardware, where large internal memories carry a substantial cost.
But behavioral hardware design is not RTL, Pursley noted. With behavioral design, state machines are implicit rather than explicit, there's no sense of a clock and loops can be rolled or unrolled. That's why behavioral design is much faster.
Forte offers a SystemC-based synthesis tool, Cynthesizer, that's thus far been adopted mainly by Japanese consumer electronics companies. It takes a SystemC description and performs compilation, data path analysis, resource allocation, scheduling, binding and production of Verilog RTL code.
There are some basic steps common to any behavioral synthesis flow, Pursley said. The starting point is to "identify the design" and figure out what's going into hardware and software.
This may involve some code reorganization. Subsequent steps include separating the design from the verification testbench and making the design synthesizable, which requires compliance with a SystemC synthesizable subset. But now there's a synthesizable design that might not meet quality-of-results, so the designer still needs to optimize for throughput, latency and area before pushing the button for synthesis.
Although not presented at the seminar, Mentor Graphics Corp.'s Catapult C synthesis tool competes with Cynthesizer and differs because it uses ANSI C input rather than SystemC. The fundamental steps a designer would employ are similar. The Bluespec compiler from Bluespec Inc. offers behavioral synthesis from SystemVerilog assertions for control-dominated designs.
The Forte, Mentor and Bluespec tools focus on ASIC design. Celoxica, in contrast, targets FPGA design with its DK Design Suite, based on the company's Handel-C language. The output is a completed netlist, and the company claims quality of results comparable to hand-crafted RTL. Celoxica's new Agility compiler uses SystemC.
But here again, there's still some work involved. "By far the most difficult part of the entire design flow is extracting a nice clean specification model that the designer wants to implement," said Stephen Chappell, director of applications engineering at Celoxica. Then comes hardware/software partitioning, which is "still very much a manual project," he said.
Then come more steps: creating a functional system model, an architectural and communications model, and an implementation model ready for synthesis. After FPGA programming, a final step is board-level integration into a hardware platform.
Impulse also targets FPGAs, using a compiler from Los Alamos Labs based on a "streaming" programming model.
Pellerin said it supports parallelism and is ideal for applications requiring repetitive communications at very high speeds, making it necessary to process streams of data in real-time.
The steps Pellerin identified include creating a C language model, converting floating-point math to fixed-point, identifying "hot spots" to move to hardware and interactively refining the design. "No tool today will take C code and give you a perfect implementation," he said.
Processor engine compilers
While conventional behavioral synthesis tools provide software for generic silicon, the processor engine compilers imply that there's some kind of specialized silicon under the hood. Synfora Inc., for example, bases its C language "application engine synthesis" on a piece of proprietary configurable IP called a pipelined processing array (PPA).
The idea, said Craig Gleason, director of hardware at Synfora, is to create fixed hardware accelerators that can take on compute-intensive tasks. These can use as much as 100 times less power and provide orders of magnitude better performance than a general-purpose processor executing the same function in software, he said.
Users of Synfora's Pico tool must convert a single-threaded C language description to multiple threads, understand their performance target and identify functions best handled in parallel hardware. They will then need to verify the synthesized RTL code against the original C language description. Synfora targets video, audio, wireless, imaging and security algorithms.
CriticalBlue's "co-processor synthesis" has a different focus: accelerating tasks in compiled, executable software by creating coprocessors that offload the main processor. It differs from other ESL solutions, said Glen Anderson, senior applications engineer at CriticalBlue, in aiming primarily at software, not hardware. But then, CriticalBlue doesn't promise orders-of-magnitude speedups.
Anderson said users will typically get a 5x to 10x jump over what they'd see on an ARM processor.
CriticalBlue's Cascade tool reads in executable code, identifies computationally intensive "hot spots" and generates RTL for VLIW-style coprocessors that include caches and buses and are ready to plug into an ARM core. The coprocessors run ARM instructions and look like an ARM pipeline with extensions, Anderson said.
How about designing your own processor? That's daunting, but CoWare Inc.'s LisaTek product claims to make that easier. As described by Achim Nohl, software development manager at CoWare, LisaTek lets a designer describe a processor instruction-set architecture in the proprietary Lisa language. It then automatically generates a software development tool chain along with an RTL model.
Unlike with a coprocessor solution, the designer is not limited by a base architecture's constraints, Nohl said. But LisaTek does require some expertise. It's mandatory, he said, to have a basic understanding of processor architectures and C programming. It would help to understand compilers and be familiar with RTL.
Configurable processors
Yet another do-it-yourself processor approach comes from Tensilica Inc., whose Xtensa processor architecture claims over 100 customers in such fields as networking and imaging. The company's XPRES C/C++ compiler offers a way to define instruction-set extensions in C.
Dror Mayden, director of software development, sees XPRES as an alternative to conventional RTL design and behavioral synthesis. He said behavioral synthesis has not had good quality of results, is inflexible after tapeout and poses verification issues.
"An alternative is the configurable processor, which can be extended by the designer with custom instructions, functional units, co-processors and datapaths," Mayden said. Claimed advantages are ease of use, post-tapeout programmability, and an ability to trade off flexibility for efficiency.
Tensilica offers a language, Tensilica Instruction Extension (TIE), that lets designers describe processor extensions manually. TIE supports SIMD, VLIW and fusion (pipeline parallelism) processor techniques. It generates software development tools and RTL for the hardware.
More recently, Tensilica brought out XPRES, which generates TIE files automatically from C/C++ code. While less flexible than writing directly in TIE, it provides a faster route to silicon.
Startup Stretch Inc. has licensed the Tensilica Xtensa RISC processor core and implemented it in a programmable fabric. The Stretch Instruction Set Extension Fabric (ISEF) adds a software-configurable datapath. Developers program and configure the Stretch processor chip entirely in C/C++. Compute-intensive loops can then be implemented on the array fabric.
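For readers who want a concrete picture of the "compute-intensive loops" and the floating-point-to-fixed-point conversion step mentioned by several presenters, here is a minimal sketch in plain C. It is an illustration only and is not taken from any vendor's tool or flow. The Q15 number format, the function names (float_to_q15, dot_q15, q15_to_float) and the dot-product kernel are all assumptions chosen for the example. The multiply-accumulate in the loop body is the sort of operation a designer might consider moving into hardware or a custom instruction.

```c
#include <stdint.h>

/* Assumption for this sketch: Q15 fixed point, where 32768 represents 1.0. */
#define Q15_ONE 32768

/* Hypothetical helper: convert a float in roughly [-1.0, 1.0) to Q15,
 * rounding to nearest and saturating at the 16-bit limits. */
static int16_t float_to_q15(float x)
{
    float scaled = x * (float)Q15_ONE;
    if (scaled >= 32767.0f) return 32767;
    if (scaled <= -32768.0f) return -32768;
    return (int16_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

/* Hypothetical "hot spot": a fixed-point dot product.
 * Each iteration is one 16x16-bit multiply plus an accumulate.
 * A 64-bit accumulator avoids overflow for any realistic n. */
static int32_t dot_q15(const int16_t *a, const int16_t *b, int n)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++) {
        acc += (int32_t)a[i] * b[i];   /* Q15 * Q15 gives a Q30 product */
    }
    return (int32_t)(acc / Q15_ONE);   /* scale the Q30 sum back to Q15 */
}

/* Hypothetical helper: convert a Q15 result back to float for checking. */
static float q15_to_float(int32_t x)
{
    return (float)x / (float)Q15_ONE;
}
```

As a usage example, converting 0.5 with float_to_q15 gives 16384, and calling dot_q15 on the two-element vector {16384, 16384} with itself returns 16384, which q15_to_float maps back to 0.5. The integer-only loop body is what makes a kernel like this a candidate for hardware: it needs no floating-point unit, and its iterations can be unrolled or pipelined.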