MIPS shows 20K 64-bit core

By Chris Edwards, EE Times UK
June 12, 2000 (9:54 a.m. EST)
URL: http://www.eetimes.com/story/OEG20000612S0003

LONDON — MIPS Technologies Inc. has developed a high-speed microprocessor core intended for game machines and other embedded systems that need 3-D graphics. It combines the 3-D extensions the company disclosed last year with a dual-pipeline processor and a new on-chip bus.

The first implementation of the 20K architecture will be a standalone processor that NEC Corp. and Toshiba Corp. will be making by the end of this year. It will be followed in mid-2001 by a hard intellectual property (IP) core tuned for system-on-chip designs. Produced with 0.18-micron technology, this version is expected to run at up to 600 MHz. A shrink to a 0.15-micron process should push clock speeds to 750 MHz.

The 20K processor and the 20Kc IP core both use a dual-issue MIPS64 integer core joined to a 64-bit-wide floating-point unit. Largely aimed at 3-D graphics geometry processing, the floating-point unit can be split in half to run two 32-bit operations in parallel. Most 3-D geometry code needs no more than 32-bit accuracy.
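To illustrate what splitting a 64-bit floating-point unit in half means, the sketch below treats one 64-bit word as two IEEE-754 single-precision lanes and adds each lane independently. The function names `pack_pair`, `unpack_pair` and `paired_add` are hypothetical; this is a software model of the idea, not the MIPS-3D instruction set.

```python
import struct

# Hypothetical model: one 64-bit floating-point register viewed as two
# 32-bit single-precision lanes. Illustrates the split-FPU idea only.

def pack_pair(hi: float, lo: float) -> int:
    """Pack two floats into one 64-bit word as two IEEE-754 singles."""
    raw = struct.pack(">ff", hi, lo)  # 8 bytes, big-endian, upper lane first
    return int.from_bytes(raw, "big")

def unpack_pair(word: int) -> tuple[float, float]:
    """Split a 64-bit word back into its two 32-bit float lanes."""
    return struct.unpack(">ff", word.to_bytes(8, "big"))

def paired_add(a: int, b: int) -> int:
    """Add upper lanes and lower lanes independently, in one 'operation'."""
    a_hi, a_lo = unpack_pair(a)
    b_hi, b_lo = unpack_pair(b)
    # Each lane result is rounded back to single precision on repacking.
    return pack_pair(a_hi + b_hi, a_lo + b_lo)

# Two vertex coordinates (x, y) offset with a single 64-bit operation.
v = pack_pair(1.5, -2.25)
offset = pack_pair(0.5, 4.0)
print(unpack_pair(paired_add(v, offset)))  # (2.0, 1.75)
```

Because each lane only carries single precision, this works well for geometry code that, as noted above, rarely needs more than 32-bit accuracy.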

Because the design is intended for systems that cannot afford a cooling fan, the designers have had to balance processing performance against power consumption. The design is separated into a number of domains that can be powered separately. At 300 MHz, the company reckons that the processor should consume about 900 mW.
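To put the 900 mW figure in perspective, dividing power by clock rate gives the average energy spent per cycle. This is plain arithmetic on the quoted numbers; it assumes the 900 mW is an average figure and says nothing about how power would scale at other clock speeds or voltages.

```python
def energy_per_cycle_nj(power_mw: float, clock_mhz: float) -> float:
    """Average energy per clock cycle, in nanojoules."""
    power_w = power_mw / 1e3      # milliwatts -> watts
    freq_hz = clock_mhz * 1e6     # megahertz -> hertz
    return power_w / freq_hz * 1e9  # joules -> nanojoules

# 900 mW at 300 MHz, as quoted by the company
print(round(energy_per_cycle_nj(900, 300), 3))  # 3.0
```

That is roughly 3 nJ per clock, or equivalently 3 mW per MHz.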

"We have a lot of clock-conditioning circuitry," said Mark Pittman, director of product marketing for MIPS Technologies (Mountain View, Calif.). "A clocked-down 20Kc could fit into handheld devices, running at 250-to-300 MHz."

The low-power design extends to the implementation of its various caches. As with other low-power RISC processors, the data caches are split into four sets that can be powered up one at a time. The processor attempts to predict which set holds the data it is looking for. If that prediction misses, the processor searches the other three sets in turn before going to main memory.
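The lookup order described above can be sketched as a toy model in which each probe powers up exactly one of the four sets. The class name, the per-set dictionaries and the "remember the set that last hit" predictor are all assumptions made for illustration; the article does not say how MIPS predicts the set.

```python
class WayPredictedCache:
    """Toy four-set cache that powers up one set per probe.

    Assumptions (not from the article): each set is a dict from index
    to tag, and the predictor remembers which set last hit per index.
    """

    SETS = 4

    def __init__(self):
        self.sets = [dict() for _ in range(self.SETS)]
        self.predicted = {}  # index -> set that hit last time

    def fill(self, index, tag, which):
        self.sets[which][index] = tag

    def lookup(self, index, tag):
        """Return (hit, probes); probes counts sets powered up in turn."""
        first = self.predicted.get(index, 0)
        order = [first] + [s for s in range(self.SETS) if s != first]
        for probes, s in enumerate(order, start=1):
            if self.sets[s].get(index) == tag:
                self.predicted[index] = s  # train the predictor on the hit
                return True, probes
        return False, self.SETS  # all four sets checked; go to main memory

cache = WayPredictedCache()
cache.fill(index=7, tag=0xABC, which=2)
print(cache.lookup(7, 0xABC))  # (True, 3): predicted set 0 missed
print(cache.lookup(7, 0xABC))  # (True, 1): predictor now points at set 2
print(cache.lookup(7, 0xDEF))  # (False, 4): miss in every set
```

A correct prediction costs one powered set; a misprediction costs extra probes, which is the latency penalty the article mentions.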

This approach lowers average power consumption at the cost of increasing latency if there is a miss in the cache. A similar approach is used in the translation lookaside buffer (TLB), which is used to convert the virtual addresses used by software into physical memory locations. The 20Kc has both a microTLB and a main TLB.

On each load or store, the four-entry microTLB is checked first, and the main TLB is powered up only if the microTLB does not contain an entry that corresponds to the virtual address it is given. The content-addressable memories (CAMs) used in most TLBs consume a large amount of power. Because programs typically make repeated accesses to the same memory page before moving on to another, this approach helps save overall power.
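A minimal sketch of the two-level scheme follows, counting how often the main TLB has to be powered up. The 4-Kbyte page size, LRU replacement in the microTLB and the `TwoLevelTLB` name are assumptions for illustration; only the four-entry microTLB size comes from the article.

```python
from collections import OrderedDict

class TwoLevelTLB:
    """Four-entry microTLB in front of a larger main TLB.

    Assumptions: 4-Kbyte pages, LRU replacement in the microTLB, and the
    main TLB modeled as a dict that already holds every mapping.
    """

    PAGE_SHIFT = 12  # 4-Kbyte pages (assumed)

    def __init__(self, page_table, micro_entries=4):
        self.main = dict(page_table)  # virtual page -> physical page
        self.micro = OrderedDict()    # small LRU cache of recent pages
        self.micro_entries = micro_entries
        self.main_lookups = 0         # proxy for main-TLB power-ups

    def translate(self, vaddr):
        vpn = vaddr >> self.PAGE_SHIFT
        offset = vaddr & ((1 << self.PAGE_SHIFT) - 1)
        if vpn in self.micro:
            self.micro.move_to_end(vpn)  # hit: refresh LRU position
        else:
            self.main_lookups += 1       # miss: power up the main TLB
            # A page absent from the main TLB raises KeyError; refill is not modeled.
            self.micro[vpn] = self.main[vpn]
            if len(self.micro) > self.micro_entries:
                self.micro.popitem(last=False)  # evict least recently used
        return (self.micro[vpn] << self.PAGE_SHIFT) | offset

tlb = TwoLevelTLB({0x10: 0x80, 0x11: 0x81})
for addr in range(0x10000, 0x10400, 8):  # 128 accesses within one page
    tlb.translate(addr)
print(tlb.main_lookups)  # 1
```

The 128 accesses to one page wake the main TLB only once, which is the locality argument made above.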

To try to improve performance in branch-intensive code without moving to fully speculative execution — the 20K executes instructions in program order — the designers used a combination of branch prediction and lookahead techniques.

Instructions waiting to be executed are stored in two decoupled queues. On each cycle, four instructions are fetched from the cache, bringing a total of eight into the queue. Logic in the instruction queue attempts to predict the outcome of up to two branches sitting in the queue. The unit can also predict the return address of subroutine calls, using a four-entry deep call stack.
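Return-address prediction with a four-entry call stack can be sketched as below: calls push their return address and returns pop a predicted target. The overflow behavior (discarding the oldest entry) and the class name are assumptions; the article gives only the depth.

```python
class ReturnAddressStack:
    """Four-entry return-address predictor (illustrative sketch).

    Assumption: when a fifth nested call arrives, the oldest entry is
    discarded, so the outermost return will later be mispredicted.
    """

    def __init__(self, depth=4):
        self.depth = depth
        self.entries = []

    def on_call(self, return_addr):
        if len(self.entries) == self.depth:
            self.entries.pop(0)  # drop the oldest entry
        self.entries.append(return_addr)

    def predict_return(self):
        """Pop the predicted return target, or None if the stack is empty."""
        return self.entries.pop() if self.entries else None

ras = ReturnAddressStack()
for addr in (0x100, 0x200, 0x300, 0x400, 0x500):  # five nested calls
    ras.on_call(addr)
print([ras.predict_return() for _ in range(5)])
# [1280, 1024, 768, 512, None], i.e. 0x500 down to 0x200; 0x100 was lost
```

A shallow stack is cheap in power and area, at the cost of mispredicting returns from call chains deeper than four.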

Better bus

For the 20K and 20Kc, the company has developed a follow-up to the SysAD bus used on MIPS' current processors. The MGBLink bus, intended for linking off-chip devices such as memories, uses series-terminated 1.5-V HSTL logic signals running at 150 MHz.

MGBLink is a split-transaction bus that allows the results of reads to arrive in a different order from the one in which they were issued. A credit-based scheme takes care of flow control across the bus to different peripherals. The input and output buses are split. In the 20K family, MIPS has chosen to use a 64-bit input and a 32-bit output bus. This reflects the fact that write bandwidths are typically lower in 3-D geometry processing than read bandwidths.
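The two mechanisms named here, tagged reads that may complete out of order and credit-based flow control, can be sketched together. This is a generic illustration, not the MGBLink specification: the credit counts, the tag scheme and returning a credit together with each response are all simplifying assumptions.

```python
class SplitTransactionPort:
    """Toy split-transaction read port with credit-based flow control.

    Assumptions (illustrative only): each peripheral advertises a fixed
    number of request-buffer credits, every read carries a tag, and this
    sketch returns a credit when the matching response arrives.
    """

    def __init__(self, credits):
        self.credits = dict(credits)  # peripheral -> free request slots
        self.outstanding = {}         # tag -> (peripheral, address)
        self.next_tag = 0

    def issue_read(self, peripheral, address):
        """Send a read if the target has a free slot; return its tag or None."""
        if self.credits[peripheral] == 0:
            return None  # stall: no credit, so the target's buffer cannot overflow
        self.credits[peripheral] -= 1
        tag = self.next_tag
        self.next_tag += 1
        self.outstanding[tag] = (peripheral, address)
        return tag

    def complete(self, tag, data):
        """Match a (possibly out-of-order) response to its request by tag."""
        peripheral, address = self.outstanding.pop(tag)
        self.credits[peripheral] += 1  # slot freed, credit returned
        return address, data

port = SplitTransactionPort({"dram": 2, "io": 1})
t0 = port.issue_read("dram", 0x1000)
t1 = port.issue_read("dram", 0x2000)
print(port.issue_read("dram", 0x3000))  # None: out of DRAM credits
print(port.complete(t1, "B"))           # (8192, 'B'): second read returns first
print(port.issue_read("dram", 0x3000))  # 2: a credit came back
```

Splitting the bus this way lets the processor keep issuing reads while earlier ones are still in flight, while the credits stop it from sending more requests than a peripheral can buffer.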

Both the input and output buses can be scaled to suit different application profiles, and a 32-bit input bus is possible.

"The input bus can be throttled back to 32-bit," Pittman said, "but with a high-performance processor you would run the risk of starving the processor."
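The trade-off Pittman describes can be quantified from the bus widths and the 150-MHz clock. The sketch assumes one transfer per clock per bus (the article does not say whether MGBLink transfers data on one or both clock edges), so the figures are peak values under that assumption.

```python
def peak_bandwidth_mb_s(width_bits, clock_mhz, transfers_per_clock=1):
    """Peak bus bandwidth in MB/s (1 MB = 10**6 bytes).

    transfers_per_clock=1 assumes single-data-rate signaling (an assumption).
    """
    return width_bits // 8 * clock_mhz * transfers_per_clock

print(peak_bandwidth_mb_s(64, 150))  # 1200: 64-bit input bus
print(peak_bandwidth_mb_s(32, 150))  # 600: 32-bit output bus, or a throttled input
```

Under that assumption, halving the input bus to 32 bits cuts peak read bandwidth from about 1.2 Gbytes/s to 600 Mbytes/s, which is why a fast processor could be starved.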

For networking designs such as switches and routers, the company is considering a stripped-down version of the 20Kc.

"A version without the 3-D extensions is a possibility," Pittman said.

— Chris Edwards is a contributing editor to Electronics Times, EE Times' sister newspaper in the United Kingdom.