RISCy Business

By Jim Turley, Embedded Systems Programming
May 29, 2003 (4:21 p.m. EST)
URL: http://www.eetimes.com/story/OEG20030205S0025

Despite the hype over RISC, CISC processors live on. In fact, they dominate. Sometimes slow and steady really does win the race.

There's a tendency in our community to equate "RISC" with "good." RISC processors are considered more modern, efficient, and cost-effective than older CISC chips. They're also generously endowed with better performance, lower power consumption, and higher speeds. By tomorrow it seems RISC processors will probably rid the world of all known diseases, end war, and make your breath smell minty fresh.

It ain't necessarily so. RISC chips have their place—some of my best friends are RISC programmers—but they're not for everyone. To be blunt, RISC processors are chips that failed in the desktop computer market. They're overwhelmingly losers. They're used in embedded systems by default, not by design. Like Australia, the RISC community was populated by outcasts, failures, and undesirables. But like those early settlers of the Antipodes, RISC has transformed itself into a symbol of new hope and opportunity.

CISC: not dead yet
Another widespread belief is that RISC has overwhelmed the once-mighty CISC camp in design wins and chip sales and that CISC is trailing smoke as it spirals downward. Sorry, thank you for playing; we have some lovely parting gifts for you. Almost two-thirds of all the microprocessors and microcontrollers sold last year were 8-bitters, all of which were CISC architectures like the 8051 and 6805. Practically all 4-bit and 16-bit processors are also CISC designs, and they collectively made up another 30% of last year's sales. That leaves less than 10% of sales for 32-bit processors, and even a lot of those aren't RISC chips. CISC totally rules in high-volume (in other words, low-performance) systems and is still holding its own in high-end 32-bit chips.

To be fair, RISC has overtaken CISC in the 32-bit embedded world. Until 1999, Motorola's 68k was the best-selling 32-bit processor since the category was created. SPARC, MIPS, AMD's 29000, Intel's i960, ARM, and even Motorola's own 88000 challenged that business throughout the '90s, but the 68k stood firm. ARM shipments finally overtook the 68k in 1999, and the gap has yawned wider ever since. ARM licensees (the company makes no chips of its own) now collectively outsell Intel's Pentium line by a hefty 3:1 margin.

RISC processors are doing well at the sharp end of the market, but are they really appropriate for embedded systems? That all depends on what characteristics you're shopping for. RISC chips might not deliver the advantages you think, and they have some disadvantages not every programmer is aware of.

RISC giveth, RISC taketh away
First, the easy stuff. RISC, of course, stands for reduced instruction set computing. So, by definition, we're dealing with chips that have fewer assembly-level instructions compared to CISC chips; RISC chips do less. They have a smaller vocabulary, a limited repertoire. How is that a good thing?

The theory behind RISC is that reducing the number of features within the chip makes it go faster because it's "streamlined," unencumbered by the features and functions that CISC chips had accumulated over the years. This ignores the fact that those features had been added for a reason; CPU companies don't gratuitously add instructions just to waste transistors. Still, RISC's faster clock speeds are supposed to make up for their relative stupidity. They may not be able to do a lot, but they can do it faster.

That strategy doesn't always work. Right now the world's fastest microprocessor (in terms of clock frequency) is from Intel's Pentium line—an unrepentantly baroque CISC processor if ever there was one. Granted, Pentium 4 is a bizarre anomaly (in a lot of ways), but CISC chips have been kicking RISC butt for a while now.

The clock-speed story never really came true. More to the point, do you really want all that MHz? High frequencies radiate a lot of electromagnetic interference: radio waves that can interfere with your system. High-speed buses are a pain for hardware engineers to deal with, and fast processors require fast memories and big, fast caches. In short, your whole system gets more complex, error-prone, and expensive when you use high-speed "streamlined" processors.

From ADD to XOR
RISC is all about doing without. RISC chips do without many features, functions, and opcodes that CISC processors generally possess. If you're programming in a relatively high-level language such as C, Pascal, or FORTRAN, instruction sets may not matter to you. That's the compiler's problem. But the side effects are real; you can't simply remove half of a processor's instruction set and not have some downside.

First, there's the math. Early RISC chips had neither multiply nor divide instructions; some couldn't even subtract. Eventually multiply and subtract crept back into the feature set, but most RISCs still can't divide. If you, or your library, ever need to divide two integers, you have to hand-code routines to do it. Don't even think about floating-point arithmetic; most RISC chips don't offer that either. (Neither do a lot of CISC chips, but floating-point units are more common among them.)
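For a sense of what such a hand-coded routine involves, here is a minimal C sketch of unsigned shift-and-subtract division, a classic software fallback when the chip has no divide instruction. The names `soft_udiv32` and `udiv_result` are made up for illustration, and a real runtime library would likely use something more heavily tuned.

```c
#include <stdint.h>

/* Result of an unsigned division: quotient and remainder. */
typedef struct {
    uint32_t quot;
    uint32_t rem;
} udiv_result;

/*
 * Restoring (shift-and-subtract) division, one quotient bit per iteration.
 * Hypothetical helper; the caller must ensure divisor != 0.
 */
udiv_result soft_udiv32(uint32_t dividend, uint32_t divisor)
{
    udiv_result r = { 0u, 0u };
    uint64_t rem = 0; /* 64-bit so the shift below cannot overflow */

    for (int bit = 31; bit >= 0; --bit) {
        /* Bring down the next dividend bit, most significant first. */
        rem = (rem << 1) | ((dividend >> bit) & 1u);
        if (rem >= divisor) {
            rem -= divisor;
            r.quot |= (uint32_t)1 << bit;
        }
    }
    r.rem = (uint32_t)rem;
    return r;
}
```

The loop always runs 32 iterations, each with a shift, a compare, and possibly a subtract, which is roughly the work a single hardware divide instruction hides.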

Even simple arithmetic gets complicated if the variables are in memory. It's normal for programs to update hundreds of variables, mostly stored in RAM. Ah, but RISC chips can't modify memory; that's against the rules. So instead of simply incrementing a counter you have to copy it from memory, load a constant (probably 1) into another register, add the two together, check for overflow, and write it back. One simple instruction on any CISC processor becomes a minor routine on RISC platforms.
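The sequence can be spelled out in C, one statement per step. This is only a sketch: `increment_counter` is a hypothetical name, and a real compiler decides for itself how many instructions each statement becomes on a given chip.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Increment a counter held in memory, written as the separate steps a
 * load/store machine performs. Returns false if the counter wrapped to 0.
 */
bool increment_counter(volatile uint32_t *counter)
{
    uint32_t value = *counter;    /* 1. load from memory into a register */
    uint32_t sum = value + 1u;    /* 2. add the constant 1 */
    bool wrapped = (sum < value); /* 3. check for overflow (wraparound) */
    *counter = sum;               /* 4. store the result back to memory */
    return !wrapped;
}
```

On a processor with memory-operand arithmetic, the same source would typically collapse to a single increment instruction plus the overflow test.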

You also have to do without bit-twiddling. If your system massages I/O ports or other peripherals, it's a lot tougher when your processor can't handle anything smaller than a 32-bit word. To toggle a single bit in a status register, for example, you have to read the entire register (32 bits) into the processor, mask off the low-order and high-order bits you don't want, right-shift the result, check it against zero, exclusive-OR it with a mask, left-shift it back into position, and then write it (along with the 32 bits of memory or I/O around it) back out. Let's hope the memory or peripherals located at the other 31 bits don't get upset by all this reading and writing.
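In C, the core of that dance is a read-modify-write on the whole word. The sketch below shows only that core (read, exclusive-OR, write), not every step listed above; `toggle_bit` and its `reg` parameter are hypothetical, standing in for whatever register address your hardware defines.

```c
#include <stdint.h>

/*
 * Toggle one bit of a 32-bit memory-mapped register with an explicit
 * read-modify-write. `reg` is a hypothetical register address supplied by
 * the caller; `bit` must be in the range 0..31.
 */
void toggle_bit(volatile uint32_t *reg, unsigned bit)
{
    uint32_t value = *reg;       /* read the whole 32-bit word */
    value ^= (uint32_t)1 << bit; /* flip only the selected bit */
    *reg = value;                /* write all 32 bits back */
}
```

Note that the other 31 bits are read and rewritten too, which is exactly the side effect that can upset peripherals with read-sensitive or write-sensitive registers.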

Unaligned memory accesses are another casualty of RISC doctrine. RISC chips were designed for workstations, where data is conveniently aligned on word boundaries because the compiler put it there. Bytes and halfwords either didn't exist or were zero-padded to fit a 32-bit word, wasting storage. In embedded systems, quantities often wrap around word boundaries or lie on unaligned addresses. For many RISC chips, that data is inaccessible. They simply can't handle unaligned loads or stores. Likewise, anything smaller than a 32-bit word has to be sign-extended or padded with zeros, wasting RAM.
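The usual software workaround is to assemble the value a byte at a time. Here is a hedged sketch: `load_u32_le` is a made-up name, it assumes the chip can at least load individual bytes, and it assumes the data is stored in little-endian order.

```c
#include <stdint.h>

/*
 * Read a 32-bit little-endian value from any address using four byte
 * loads, so no single access crosses an alignment boundary.
 */
uint32_t load_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Four loads, three shifts, and three ORs stand in for what a processor with unaligned-access support does in one load.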

You get the idea. In contrast to this Spartan ethic, CISC chips are plushly endowed with all kinds of interesting and useful instructions. The x86 family of chips (8086, 386, and so on) has the very useful REP SCAS combination: a repeat prefix plus a string-scan instruction. Together they set up a hardware loop that searches an arbitrarily large memory array for a given byte (or word). With a mere 16 bits of code, the processor will perform the whole search. It even maintains its own hardware counter and leaves a pointer just past the matching element in a register. No compiler or assembly programmer could do better.
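For comparison, this is roughly the loop a programmer or compiler has to spell out when no such string instruction exists. It is a sketch of a single-byte scan, which is the operation the SCAS family performs as far as I know; `scan_byte` is a hypothetical name, and the standard C library's `memchr` does the same job.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Scan `len` bytes starting at `buf` for `target`.
 * Returns a pointer to the first match, or NULL if there is none.
 */
const uint8_t *scan_byte(const uint8_t *buf, size_t len, uint8_t target)
{
    while (len-- > 0) {
        if (*buf == target) {
            return buf;
        }
        ++buf;
    }
    return NULL;
}
```

Each pass through the loop costs a load, a compare, a pointer bump, a counter update, and a branch or two, all of which have to be fetched as separate instructions.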

Another example is the TBLS instruction found on several of the 68300 chips from Motorola. This is a table-lookup-and-interpolate instruction that's either totally useless or the key feature in an embedded system. TBLS allows you to create a complex geometric function from a sparse table of data points. You define a few (X,Y) data points, then give the processor some value of X. It generates Y, even if X isn't in your table. TBLS finds the two nearest X values and, using linear interpolation, calculates what the Y value in between those two data points would be. In essence, the processor "connects the dots" in an XY scatter graph.

Sure, it's strange. But if you're doing motion control it's absolutely vital. The alternative is to code something like 256 different if-then-else cases, with all kinds of exceptions and boundary conditions. TBLS is a single instruction that executes in about 30 clock cycles.
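To make the "connect the dots" idea concrete, here is a small C sketch of piecewise-linear table lookup. It illustrates the general technique only and is not a model of TBLS's exact operand format or rounding; the `point` type, the `table_interp` name, and the choice of 16-bit coordinates are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int16_t x;
    int16_t y;
} point;

/*
 * Piecewise-linear lookup over `n` points sorted by strictly increasing x.
 * Inputs outside the table are clamped to the first or last y value.
 * Requires n >= 1. Division truncates toward zero.
 */
int16_t table_interp(const point *table, size_t n, int16_t x)
{
    if (x <= table[0].x) {
        return table[0].y;
    }
    if (x >= table[n - 1].x) {
        return table[n - 1].y;
    }
    /* Find the segment [table[i-1], table[i]] that contains x. */
    size_t i = 1;
    while (table[i].x < x) {
        ++i;
    }
    const point *lo = &table[i - 1];
    const point *hi = &table[i];
    /* Wide intermediates: 16-bit spans multiplied together fit easily. */
    int64_t dy = (int64_t)hi->y - lo->y;
    int64_t dx = (int64_t)hi->x - lo->x;
    int64_t t = (int64_t)x - lo->x;
    return (int16_t)(lo->y + dy * t / dx);
}
```

With the table {(0,0), (10,100), (20,50)}, an input of 5 yields 50 and an input of 15 yields 75. Even this simplified version needs a search loop, a multiply, and a divide, which gives some idea of what the single instruction replaces.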

Memory footprint
The TBLS and REP SCAS examples show how specialized instructions can really help performance, but they also save tons of code space. Both TBLS and REP SCAS replace dozens of lines of C code, which compile into hundreds or thousands of bytes of instructions. RISC's lack of memory-modification instructions means even more code has to be spent for simply updating variables. The list goes on. All this adds to RISC's reputation for poor code density. Code density is simply a gauge of the memory footprint of a compiled program. The smaller the executable binary, the better the code density, and the less RAM and ROM your embedded system requires. Some compilers produce better code density (in other words, smaller binaries) than others, but no compiler can change the code density of the processor.

Tweak your compiler switches all you want; most RISC processors will still require binaries that are about twice as big as for CISC processors. Double the amount of code space for exactly the same compiled program. Reducing the instruction set forces the compiler to produce reams of code instead of a few bytes of assembly. You're going to spend the transistors either way: either in your processor (as more complex instructions) or in your memory (as additional code space). There's no free lunch.

Memory bandwidth and power
Code density doesn't just eat RAM, it eats performance. If code density isn't important to you (in other words, you're not paying for the memory), its effect on performance might be. Having a code footprint twice as big means fetching roughly twice as many instructions. That means twice as many bus transactions, twice the penalty for stalls, and twice as much opportunity to miss the cache. Most embedded memory subsystems are considerably slower than their processors, so fetching more code emphasizes the slowest part of your system.

It's no coincidence that RISC processors have bigger on-chip caches than CISC processors; they need them more. Combine fast clock speed with slow memory and poor code density and you have a recipe for a real performance bottleneck. Cache sizes aren't something you can change, either. You get what the processor company gives you.

Finally, there's the power angle. RISC chips have a reputation for being low-power devices able to run on batteries, bright sunlight, or two lemons and a penny. It's true that most RISC processors use less energy than, say, Pentium 4. (But so do most space heaters.) That's largely due to their more modern silicon manufacturing, not any inherent power-saving characteristic of RISC. MIPS, ARM, and PowerPC chips use less power than Pentium and Athlon chips because they're willing to give up speed for power. Low-power chips are made, not born.

Oddly, RISC processors can wind up using more power than CISC chips, not less. It comes down to code density again. If you're fetching a lot of code across a 32-bit bus, those bus transactions consume a lot of energy. Today most processors expend more energy on their bus interfaces than on all of their internal logic combined. It's common to see 32-bit CPU chips with separate power supplies for internal logic and external buses. The internal logic uses very little power; the external bus interface uses quite a bit. The more code you fetch and the more data you transfer, the more power you burn in the most power-hungry part of the chip.

Software, software everywhere
Power consumption, code density, and performance are all nice when you're choosing a chip, but what about programming it? Programming is one area where CISC processors shine. CISC chips are by nature "mature" architectures that have been in the market for a long time. That's largely because prevailing marketing theory frowns on any new development that doesn't have "RISC" in the name, so there aren't very many new CISC families. What the older ones have going for them is a long and distinguished list of software tools, operating systems, debuggers, compilers, and the like.

Motorola's 68k and Intel's x86 families are the two predominant 32-bit CISC architectures, and they both enjoy a huge software following. Nearly any tool, driver, or middleware you want to name is available for these chips, often for free. And all of the bugs, quirks, and idiosyncrasies were discovered long ago by the hundreds of programmers who came before. If you're looking for stable, solid, well-supported, well-documented processors, look no further than CISC.

Jim Turley is an independent analyst, columnist, and speaker specializing in microprocessors and semiconductor intellectual property. He is a past editor of The Microprocessor Report and Embedded Processor Watch. For a good time, write to jim@jimturley.com.

Reader Response

I've appreciated the addition of Jim Turley to the pages of Embedded Systems Programming; his reports on the microprocessor business have been informative and accurate. However, I must object to his March 2003 column ("RISCy Business," p.37); I am astonished at the depth and breadth of disinformation presented....