|
|||||||||||||||||||||||||||||
FPGA programming step by step
FPGA programming step by step FPGAs and microprocessors are more similar than you may think. Here's a primer on how to program an FPGA and some reasons why you'd want to. Small processors are, by far, the largest selling class of computers and form the basis of many embedded systems. The first single-chip microprocessors contained approximately 10,000 gates of logic and 10,000 bits of memory. Today, field programmable gate arrays (FPGAs) provide single chips approaching 10 million gates of logic and 10 million bits of memory. Figure 1 compares one of these microprocessors with an FPGA. Powerful tools exist to program these powerful chips. Unlike microprocessors, not only the memory bits, but also the logical gates are under your control as the programmer. This article will show the programming process used for FPGA design. As an embedded systems programmer, you're aware of the development processes used with microprocessors. The development process for FPGAs is similar enough that you'll have no problem understanding it but sufficiently different that you'll have to think differently to use it well. We'll use the similarities to understand the basics, then discuss the differences and how to think about them. Similarities Table 1: Step-by-step design process for microprocessors and FPGAs Architectural design Because you understand editing, compiling, assembling, linking, and loading in microprocessor programming, you can relate this to editing, compiling, synthesizing, placing, routing, and loading in FPGA programming. Editing Compiling The analogous operation in FPGA programming is the compilation of Verilog into register transfer logic (RTL) netlists. As the name implies, data is transferred into registers, subject to some clocking condition. At this stage FPGA programming departs from microprocessor programming in that an additional synthesis process is required to produce bits (or intermediate objects that can be converted to bits) that will control gates and fill registers and memories on an FPGA. This level is called gate-level logic, since it describes the logical gates of which the system will be composed. The output format is typically an Electronic Design Interchange Format (EDIF) file. There's a large difference between compiling and synthesizing, and you have to stretch some to encompass it. Whereas the compiler produces bits to control fixed-gate patterns (the microprocessor decoders, registers, arithmetic logic unit, and so on) the synthesizer defines gate patterns described by the logic of the program. That is, your program logic gets synthesized, or mapped into, logical gates, not into processor instructions that control multigate structures. This is absolutely amazing, and good FPGA programmers give thanks every day for living in the rare time in history (post 1990+) when you can design architectures with words and then synthesize your logic into (mostly silicon) gates that execute your logic. Not to get carried away, but it's absolutely wonderful. Linking The bit-based outputs of the microprocessor compilation process typically don't directly control gates but must be connected to other bit patterns. This is true because most programs run under the control of an operating system and must be connected, or linked, to the operating system. In fact, the location in memory of the actual compiled bits is usually unknown and not determined until linking and loading is completed. Further, there may be programs existing in a library that must also be linked to the compiled program before a useful product exists. The synthesis process, as we have discussed, produces bit patterns, in an intermediate format. We compile Verilog to RTL netlists, then synthesize Verilog to EDIF, then place and route EDIF to produce HEX or TTF files that can be loaded into an FPGA. These bit patterns will end up controlling logic gates and filling memory and registers. In the same way that C and other programs include objects defined in (possibly third-party) libraries, FPGA programs can include or import portions of systems from third-party intellectual property, in the form of FPGA-implementable programs or objects. Also, in the same way that the linking and loading process of embedded systems design connects various system objects, subsystems, or super systems like the operating system, including library objects (and loads or places them into specific memory locations), the place and route function in FPGA design places the synthesized subsystems into FPGA locations and makes connections (microprocessor links ~ FPGA routes) between these subsystems, enabling their operation as an integrated system. The actual linking and loading of compiled bits is essentially a process of fitting, in one dimension, the bit patterns distributed over a set of available linear memory addresses. The FPGA place and route process fits, in two dimensions, the bit patterns (logic subsystems) over a two dimensional array of available logic gates, and routes buses between these logic subsystems as necessary. The similarity in the processes is obvious. Loading Debugging programs Anyway, things don't work right out of the gate. We generally have to kick them and see what they do and where they expire. In embedded systems program development, we typically use debuggers, simulators, and emulators. These tools enable us to step through the program execution and observe the effects on flags, register contents, memory locations, and so on, and to try to match what we expect at a given place and time with what we see, in the simulator or emulator. Simulation vs. emulation: Debugging in FPGA design is largely based in simulation. You can see why. Emulation, in the context of embedded microprocessor programs, typically refers to executing programs on special in-circuit emulation (ICE) hardware designed to 1) run exactly like the target machine and 2) provide visibility, access, and control of the target machine in powerful ways, with emulation timing exactly equal to target timing. Thus, the essence of emulation is based on physical hardware that reproduces the microprocessor's gates while adding gates designed for debugging. This methodology works because the microprocessor consists of a fixed array of gates, implementing the CPU, among others. The FPGA does not have a "fixed" pattern of gates, in the sense of the CPU; in fact, the object of the FPGA programs is to define a pattern of gates, via the process described herein, that will implement our design. And our design will differ from every other design. This is why no FPGA emulators exist, in the sense of in-circuit emulators. Today the fastest processors cannot be emulated in hardware, since they already run as fast as possible, and the ICE circuitry adds additional gate levels, thereby slowing the ICE and preventing it from keeping up with the processor. For this reason, processor designs focus on built-in debugging support. This approach is somewhat analogous to FPGA design, where it's helpful to build in debug aids. Because you're used to building debug aids into programs, from PRINT statements on up, you'll understand the significance of this point, and be able to translate your understanding into the FPGA realm. Most microprocessor programming, particularly embedded programming, is based on the assumption that sophisticated operating systems and peripheral devices are "connected to" the program, through application interfaces. Thus debugging focuses on the inside of the machine and doesn't attempt to simulate the operating system or the peripherals but instead works with them. Testbenches in FPGA: Most FPGA systems are standalone systems connected to the real world, and functioning in interaction with the real world. Therefore a large part of debugging and test is concerned with simulating the real world to which the FPGA will attach. In reference to the way logic circuits were debugged historically, this is called the testbench and is considered an integral wholein other words, it can be compiled as a whole. Simulation typically steps through the operation of the testbench, which stimulates the FPGA. The simulation process observes the transformations and translations of signals as they propagate through the FPGA from the input pins and provides responses that eventually reach an output pin. Because there will likely be tens or hundreds of thousands of gates, you obviously don't watch all of them, but the most meaningful ones in the area of interest. For example, a FIFO buffer may be 48-bits wide and thousands of words long, but the gate output that most interests you at the moment may be the Buffer_Full signal, so this would be displayed on the simulation screen, along with other signals of interest. This process somewhat resembles focusing on flags or semaphores in microprocessor programming. You can group multiple signals. For example, eight wires may be given a name, Data_In_Bus, and the numeric values appearing on the bus may be displayed, as opposed to showing all eight lines constantly changing. State variables can be given ASCII names and displayed. In the same sense that you may want to focus on program execution within a subroutine, you may want to observe only the signal activity in a subsystem. Once you learn the tools, it becomes quite natural to debug an FPGA subsyste m you've designed. Bottom line: FPGA debugging is done by software simulation and by using actual hardware. Any debug use of the actual hardware must be designed by you. Netlist and Top ~ Main: Although it's not really a part of the process, it's worthwhile to understand that the output of the FPGA design process is a netlist or list of nets or wires that connect gate outputs to other gate inputs. Further, there is a top level from which everything descends. Think of the top as the Main point in a microprocessor program where the program starts. Although there may be 50 or more modules that are created independently in an FPGA design, when the process is finished, all will be linked in the netlist. Any module not in the list will have no effect. This is analogous to a subroutine that is never called. If there is no connection to a module, the module can't do anything. Documenting programs: Delivering product: The microprocessor hardware, boards, power supplies, connections, must be correctly designed, and the software must be burned in or downloaded as described above. The FPGA hardware, boards, power supplies, connections, etc., must be correctly designed, and the software must be burnt in or downloaded as described above. If the hardware is correct, the software can evolve. This allows bug fixes and feature addition. This is true for microprocessors and FPGAs. The Verilog or VHDL hardware description language (HDL)each is a high-level design languageprovides fast time-to-market using FPGAs. The HDL system allows design, debug, and verifyall within the same environment. As with microprocessor design, FPGA design can be command-line driven or IDE-based. Differences Unsynthesizable: Why would you even want to use a language containing nonsynthesizable constructs? There are several reasons. We'll look at two. First, you may want to represent a system that will later be partitioned into software and hardware subsystems. It's easier to design such a system using the full language and later restricting the hardware portion of the design to synthesizable constructs. A more general reason is the following. FPGA design should always include a testbench, which is the environment that provides inputs, including clock(s) and data, and accepts outputs from the FPGA. It's the software that describes the world as "seen by" FPGA pins. This world is compiled to be simulated but not synthesized. Think about this. The code in the FPGA must be mapped into real logical gates in the FPGA, therefore, by definition, it must be synthesizable, since synthesis is the process of converting RTL language into gate level language, and hence, into a field programmable gate array. But the code "outside of" the FPGA is not going to be put inside of an FPGA. It's going to be used by the designer to simulate the environment of the FPGA, while debugging. For this reason, it's useful to allow high-level programming constructs that simplify the construction of the testbench. If the testbench construction were limited to synthesizable constructs it would force the designer to use lower-level abstractio ns than is necessary. Figure 2 shows the synthesizable and nonsynthesizable portions of a design. This is a very important point, so we repeat it. The Verilog language, designed initially for simulating logic, offers powerful high-level constructs that are useful for simulating the "real world" to which the FPGA will be connected. The constructs will not be mapped directly into FPGA logic structures or be converted into a gate-level netlist. Since only a subset of the Verilog language is synthesizable, that testbench design is easier, but FPGA programming, per se, is more difficult. Newer versions of Verilog (coming on the market "real soon now") will change the "mix" of synthesizable to nonsynthesizable language abstractions, but this problem will probably always be with us to some degree. C and Verilog The situation is analogous to the assembly language vs. C programming history. When resources are scarce, it pays to design efficiently, therefore assembly language is best. When resources become free, the efficiency of the design process dominates, and C language programming is preferred. In the FPGA world, resources are not yet free, therefore Verilog is the language of choice. This will probably change over time. Note that a C interface to Verilog already exists, as shown in Figure 3. There are also variants of FPGAs that contain a microprocessor core on the silicon along with the FPGA circuitry. In such cases the microprocessors can be programmed using C while the FPGA gates would be programmed using Verilog or VHDL. The use of C in the functional testbench code makes sense, because it need not be synthesizable. For those embedded systems programming teams that would like to extend their capabilities into FPGA design, I recommend that those with the least understanding of hardware should focus on testbench design, while those more capable of hardware design focus on learning how to write synthesizable code. Optimization When the system designed in Verilog is compiled, the output is an RTL netlist. When input to a synthesizer, the Verilog is converted into a gate-level netlist, capable of being mapped into FPGA hardware (assuming successful synthesis.) Most synthesizers can produce a Verilog language description of this gate-level code. The beauty is that this gate-level Verilog can be compiled and simulated. Thus, we can debug at the actual gate level. The simulation of the RTL Verilog is called functional simulation, while the simulation of the synthesizer Verilog output is called gate-level simulation, as shown in Figure 4. What's the difference between functional and gate-level simulation? One difference is that, just as C compilers can optimize C code, synthesizers can optimize FPGA netlists. In fact, if you specify the goal, synthesizers can optimize to meet your goal. The goals are typically area vs. delay. Area optimization will attempt to use the fewest number of gates (silicon area) on an FPGA, at the expense of execution speed. Delay optimization attempts to maximize the execution speed, even if more FPGA area is required. The net result is that the functional code you wrote in Verilog at the RTL level may have different implementations, and signals that you used to debug the functional code may have been optimized out of existence. That is, they may disappear in the final gate level implementation. Thus, even though you've thoroughly tested and simulated the RTL code, you'll want to do the same at the gate level. Synthesizers typically allow constraints to be specified as part of the optimization process. One such constraint is b to prevent the synthesizer from doing whatever it wants to specific elements of the design. Another constraint is preserve_hierarchy as an alternative to flatten the design. Because hierarchical boundaries can prevent or limit optimization, synthesizers, which flatten the design will typically provide more optimal results. SDF and back-annotation When a synthesizer produces gate- level Verilog for an FPGA, it's strongly FPGA dependent; that is, the delays associated with vendor-specific FPGA structures are known and can be used to compute operation speeds. Thus simulation of gate-level code for a specific FPGA is realistic in this sense. But remember, we still have to place and route this gate-level netlist. This operation will add delay for longer routes, thus slowing the final execution speed. If your design must meet some real world spec, such as a 12MHz USB (48MHz clock) or 480MHz USB2.0 then you must run at this speed, or you haven't solved the problem. How can you tell whether the routed code will run fast enough? To solve this problem, place and route programs (supplied by the FPGA vendors) will also produce Verilog output and will produce SDF files, which are files in standard delay format, that capture the delays associated with the placed and routed netlist. Simulators can use this SDF information to back annotate the gate- level code, thus allowing simulation of the final FPGA design at its most accurate. Because the FPGA elements are well characterized, with typical setup and hold times, the simulator can detect failure to meet these specs. On graphical waveforms, the failures typically show up in red, while good timing is "in the green" when the desired clock frequency is used. When the FPGA runs in the green with the desired clock frequency used, and behaves in the testbench as is desired, you have an FPGA design that's ready to be downloaded to hardware for real-world testing. Perhaps we should point out here that the three versions of FPGA code simulation, RTL, functional, and gate-level will typically all use the same testbench code; that is, there are not three versions of the testbench. In review When gates were precious entities and tools 100% proprietary, it made ultimate sense to arrange these limited gates into universally used objects, such as CPU registers, ALUs, instruction decoders, and address decoders. You would then provide a set of instructions that linked and relinked these elements, so that, for example, two CPU register outputs could be connected (via buses) to an ALU input, then the ALU output connected to a destination register, and then the ALU input connected (via buses) to a specific memory address, and the ALU output connected to a different register, and so on and so on. It made perfect sense. When the scale and, therefore, the economics changes, everything changes. When gates are no longer precious but are commodities, the fixed elements approach no longer makes as much sense. The monstrous development in languages, tools, I/O devices, standards, and so on will keep CPU development and implementation alive for decades, if not centuries, but the economics are now and trending more so in the FPGA direction. This has recently been given another economic boost relative to ASICs. The cost of repairs in hardwarethat is, ASICsis increasing drastically with decreasing line width and increasing gate density, making FPGA technology even more relevant for embedded systems designers. Today over a million gates are available, tomorrow 10 million, accompanied on a single chip by millions of memory bits. You can make anything you want by describing it in a programming language, such as Verilog, and going through the process described above. FPGA elegance Programmers don't really see this, just as fish probably don't see water, because that's the nature of the process. It's less visible with high-level language and more visible with assembly language, but it's always there. In FPGA this "getting ready" doesn't really occur. Everything is where it belongs and happens all at once, in one clock cycle. This is not to say that you can't design registers, buses, and ALUs in FPGAs, but you'll find that you really don't spend much time "getting ready to do something." I won't push this point, because you have to design FPGAs before it hits you over the head. But remember: you read it here first. Another significant difference with FPGA design lies in the parallel nature of FPGA processes. Instead of a single program counter based "locus of control," an FPGA typically clocks all gates at once. Thus you can have many processes occur in parallel, instead of sequentially. This also takes some getting used to because it's so different from the way programmers think. Why would you want to program an FPGA in the first place? Well, if you're designing accounting programs, you don't. But many embedded systems are tightly coupled to the real world, and there are many problems that simply happen too quickly to be handled in software. In this case you can let your competitor have these problems (which tend to be expensive!) and you can stick with the slower, easier (low-profit) problems. Or you can program FPGA solutions. It's not mandatory. It's an opportunity. Ed Klingman worked as a research physicist at NASA for seven years, then founded Cybernetic Micro Systems, Inc, now celebrating it's 25th anniversary. He is author of the Prentice-Hall textbooks Microprocessor Systems Design, Vol I. and II. and numerous technical papers, and has been awarded 20 U.S. patents. You can reach him at klingman@geneman.com. Your article is one of the best I've seen in Embedded Systems in a while. Even though I've already learned a good bit of what you had to say, I could see how much easier it would have been to get started with a good introduction like that. I expect that I'll be recommending it to anyone I know who is getting started on FPGA design. You hint that you're working on future articles, so I'll include a few thoughts that occurred to me while reading the first article, based on my own experiences. You discuss the issue of unsynthesizable HDL, but bypass the potential issue of synthesis not doing what you "want." Though mostly a C programmer, before I ever saw Verilog I had seen quite a bit of PALASM and CUPL code, which is all written at the RTL or gate level. What surprised me sometimes was the number of iterations it took before the Verilog code I wrote actually produced what I wanted. Sometimes the synthesis tools themselves are limited in the "patterns" they can recognize, and the particular way you chose to implement your asynchronous reset or whatever function doesn't match their patterns. Further down the same path, I think you only barely touched on the huge difference between functional and gate level simulation. If you don't have experience and/or a good head for thinking about synchronous logic, it's very likely that you'll find that functions that looked perfect under functional simulation will have to be totally redesigned under gate level simulation, because all of a sudden you have to deal with propagation delays. I've even run into cases where the functional simulation FAILED but the gate level WORKED, because the propagation delays were necessary to ensure logic was in sync. You ask the question "why would you want to program an FPGA?" You had just hinted at the parallel nature of the FPGA, but didn't integrate the two. First of all, an FPGA can reduce the chip count, by serving as the glue logic as well as incorporating other pieces of the system. There is a wide range of available soft and hard IP cores, including microprocessors, that allows you to pull all these functions into a single chip. (You mention only a µC core on the silicon, but of course, you can pour soft IP into free gates and tailor a µC's size and functions to the application at hand.) In addition, FPGAs can implement DSP-like functions, including ones well beyond the scope of all but the most powerful DSPs. With things like built-in multiplier units, FIFOs and other memories, modern FPGAs are very well suited to some major data-crunching tasks. In any case, I just wanted to let you know how much I enjoyed the article. Greg Nelson The author replies: Thanks for the positive response. I agree with all of your comments and suggestions. Embedded Systems has a 'max length' for articles, and my article was as long as they allow. I hope to publish more FPGA articles, and focus on some of the issues you raise. Editor's note: plans are underway to increase the maximum length of articles. Copyright 2005 © CMP Media LLC |
Home | Feedback | Register | Site Map |
All material on this site Copyright © 2017 Design And Reuse S.A. All rights reserved. |