Startup to configure VLIW core for DSP, control

By Peter Clarke, EE Times
October 27, 1999 (1:40 p.m. EST)
URL: http://www.eetimes.com/story/OEG19991025S0049

LONDON—Siroyan Technology Ltd., a silicon intellectual property startup here, plans to develop a high-performance, configurable 32-bit very long instruction word (VLIW) processor core to address applications that require a mix of control and DSP capability.

Using a concept called clustering, Siroyan aims for its core, code-named Rubicon, to displace the multiprocessor designs, now based on a mix of microprocessors and DSPs, in future versions of set-top boxes, voice-over-Internet Protocol gateways and other multimedia applications.

"Microprocessors are struggling to handle digital signal processing in real-time, and DSPs can't handle general microprocessor functions," said Adrian Wise, technical director of Siroyan. "Engineers are being forced to cobble together solutions which are inevitably a compromise." By contrast, he said, Siroyan "is starting with a clean sheet of paper. Our architecture will be optimized for applications that require microprocessor and DSP functionality."

Rubicon's principal architect is Nigel Topham, who before joining Siroyan was director of the Institute for Computing Systems Architecture at Edinburgh University. Before that, he worked as an architect on the French ACRI supercomputer. Wise plans to launch Rubicon as a soft licensable processor core (that is, one that licensees can synthesize from the register transfer level down to gates), together with an associated compiler, debugger and software application libraries. Introduction is planned by the third quarter of 2001.

The architecture combines two features that have recently become popular: configurability, to fit the processor implementation to its chosen application, and VLIW execution, which issues several operations per cycle to enhance performance. But it extends a conventional VLIW architecture with an innovative addition that Siroyan calls clustering.

Each cluster includes a number of execution units, together with a register file. A Rubicon processor comprises a series of clusters. Intercluster communication is provided through a register file interface rather than a conventional bus.

According to Wise, any processor that attempts to achieve a high degree of instruction-level parallelism is limited by the register file's ability to supply operands to a large number of execution units. Clustered VLIW overcomes that problem by limiting the execution units that each register file must service to those in its own cluster.

"One of the very desirable properties of clustered VLIW is that it does scale reasonably well since, as execution units are added, the number of register files increases so that each register file still has a sensible number of ports," Wise said. "This contrasts with other VLIW and superscalar approaches, where the number of register file ports becomes the primary limitation to adding further execution units.

"As to how many clusters make sense, this is a configuration option, so in many ways it is up to the user," he added. "We anticipate machines in the one-to-eight-cluster range being most effective."
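The port-count argument can be made concrete with a small back-of-the-envelope model. The sketch below is illustrative only and rests on assumptions not stated in the article: each execution unit is taken to need two read ports and one write port, each cluster holds three units, and any extra ports for intercluster transfers are ignored. Under those assumptions a single shared register file needs ports in proportion to the total number of units, while each clustered register file stays at a fixed size as clusters are added.

```python
# Hypothetical port model: each execution unit needs this many ports on the
# register file that feeds it. Intercluster transfer ports are ignored.
READS_PER_UNIT = 2
WRITES_PER_UNIT = 1
UNITS_PER_CLUSTER = 3  # assumption for illustration


def ports_per_register_file(total_units: int, clusters: int) -> int:
    """Ports each register file must provide when units are split evenly across clusters."""
    if clusters < 1 or total_units % clusters != 0:
        raise ValueError("execution units must divide evenly among clusters")
    units_per_cluster = total_units // clusters
    return units_per_cluster * (READS_PER_UNIT + WRITES_PER_UNIT)


if __name__ == "__main__":
    # Compare one shared register file against one register file per cluster,
    # across the one-to-eight-cluster range mentioned in the article.
    for clusters in (1, 2, 4, 8):
        units = UNITS_PER_CLUSTER * clusters
        shared = ports_per_register_file(units, 1)
        clustered = ports_per_register_file(units, clusters)
        print(f"{units:2d} units: shared RF needs {shared:2d} ports, "
              f"each clustered RF needs {clustered}")
```

With these assumed figures, 24 units would need a 72-port shared register file, while eight clustered register files would each need only 9 ports.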

However, the arrangement does make software compilation more difficult, since the instruction scheduler can place an operation only on execution units in the cluster whose register file holds that operation's operands. One of Siroyan's main technical achievements, the company claims, is development of compiler techniques that efficiently address those restrictions.
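The article does not describe Siroyan's compiler techniques, so the following is only a generic illustration of the constraint it names, not the company's method. It is a simple greedy heuristic, assumed for illustration: each operation is placed in the cluster that already holds most of its operands, and any operand living elsewhere is copied in, modeling a transfer through the intercluster register file interface. The `Op` type, the `home` map and the copy tuples are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Op:
    name: str
    srcs: list[str]  # values the operation reads
    dst: str         # value the operation writes


def assign_clusters(ops, home):
    """Greedy placement of operations onto clusters (hypothetical heuristic).

    `home` maps each value name to the set of clusters whose register files
    currently hold it; it is updated in place. Returns (placement, copies),
    where copies lists (value, from_cluster, to_cluster) transfers.
    """
    placement = {}
    copies = []
    for op in ops:
        # Count how many operands are already available in each cluster.
        votes = Counter(c for s in op.srcs for c in home[s])
        # Most operands wins; ties go to the lowest cluster index.
        cluster = min(votes, key=lambda c: (-votes[c], c)) if votes else 0
        for s in op.srcs:
            if cluster not in home[s]:
                copies.append((s, min(home[s]), cluster))
                home[s].add(cluster)  # the copy now also lives in this cluster
        placement[op.name] = cluster
        home[op.dst] = {cluster}  # result is written to this cluster's register file
    return placement, copies


if __name__ == "__main__":
    home = {"a": {0}, "b": {0}, "c": {1}}
    ops = [Op("mul", ["a", "b"], "t"), Op("add", ["t", "c"], "u"), Op("sub", ["c", "b"], "v")]
    print(assign_clusters(ops, home))
```

In this small example all three operations land in cluster 0 and only one transfer is needed, moving `c` from cluster 1. A production compiler would also have to weigh load balance across clusters and the latency of each transfer, which this sketch ignores.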

"The work to date has concentrated on a cluster that issues up to three operations per cycle: one address generation operation and two data-processing operations, often a multiply and an accumulate," said Wise. "The exact instruction encoding has not been finalized to date. What I can say is that we will be supporting multiple instruction sets."
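The issue rule Wise describes, with up to one address-generation operation and two data-processing operations per cluster per cycle, can be expressed as a simple slot check. The operation names below are hypothetical placeholders, since the instruction encoding had not been finalized; only the 1 + 2 slot limits come from the article.

```python
# Hypothetical operation classes; only the slot limits come from the article.
ADDRESS_OPS = {"agen", "load", "store"}
DATA_OPS = {"mul", "acc", "add", "sub", "shift"}

MAX_ADDRESS_PER_CYCLE = 1
MAX_DATA_PER_CYCLE = 2


def fits_in_cluster_bundle(ops: list[str]) -> bool:
    """Return True if `ops` can issue together in one cycle on one cluster."""
    addr = sum(op in ADDRESS_OPS for op in ops)
    data = sum(op in DATA_OPS for op in ops)
    if addr + data != len(ops):
        raise ValueError("unknown operation in bundle")
    return addr <= MAX_ADDRESS_PER_CYCLE and data <= MAX_DATA_PER_CYCLE


if __name__ == "__main__":
    print(fits_in_cluster_bundle(["load", "mul", "acc"]))  # address op plus multiply and accumulate
    print(fits_in_cluster_bundle(["load", "store"]))       # two address ops exceed one slot
```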

One is a wide VLIW-style instruction set "that will be used in loop code, for which the compiler is able to efficiently schedule instructions that make use of the wide-issue capability of the machine," Wise said. "A second will be used for the 'scalar' code between the loops, where it is notoriously difficult to make use of a large number of execution units.

"In this way we will achieve the performance that the hardware is able to support without paying too great a penalty in terms of program code size," he said.
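The code-size argument for two instruction sets can be illustrated with a rough count. Because Siroyan had not fixed its encodings, the widths here are assumptions: a wide bundle of three 32-bit operation slots (96 bits) and a compact 32-bit scalar instruction. The model also takes the worst case for wide-only encoding, in which each scalar operation occupies a whole bundle and the unused slots are filled with no-ops.

```python
# Assumed encodings; the article says the real formats were not yet finalized.
WIDE_BUNDLE_BITS = 96  # 3 slots x 32 bits for one cluster
SCALAR_OP_BITS = 32    # compact encoding for code between loops


def code_size_bits(loop_bundles: int, scalar_ops: int, dual_isa: bool = True) -> int:
    """Estimate program size in bits for loop bundles plus scalar code."""
    loop_bits = loop_bundles * WIDE_BUNDLE_BITS
    if dual_isa:
        # Scalar code uses the compact instruction set.
        scalar_bits = scalar_ops * SCALAR_OP_BITS
    else:
        # Worst case for wide-only encoding: one useful op per bundle, rest no-ops.
        scalar_bits = scalar_ops * WIDE_BUNDLE_BITS
    return loop_bits + scalar_bits


if __name__ == "__main__":
    print(code_size_bits(100, 400, dual_isa=True))
    print(code_size_bits(100, 400, dual_isa=False))
```

For a program with 100 loop bundles and 400 scalar operations, this model gives 22,400 bits with two instruction sets versus 48,000 bits with wide encoding only. Loop performance is unchanged; only the size of the scalar code shrinks.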

Wise said Rubicon will likely appear in 0.18-micron CMOS (which will be mature by the time the architecture is ready) before moving to 0.13 micron. He said sustained performance will be several hundred million to several billion operations per second, depending on the number of clusters.
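How that range scales with cluster count can be seen from a simple peak-rate formula: clusters times operations per cluster per cycle times clock rate. The three-operations-per-cycle figure comes from the cluster described above, but the 200-MHz clock is purely a hypothetical value chosen for illustration, and the result is a peak issue rate, not the sustained figure Wise quotes.

```python
import math

OPS_PER_CLUSTER_PER_CYCLE = 3  # 1 address + 2 data-processing ops, per the article
CLOCK_HZ = 200e6               # hypothetical clock rate, not from the article


def peak_ops_per_second(clusters: int, clock_hz: float = CLOCK_HZ) -> float:
    """Peak operations issued per second, assuming every slot is filled every cycle."""
    return clusters * OPS_PER_CLUSTER_PER_CYCLE * clock_hz


if __name__ == "__main__":
    for clusters in (1, 2, 4, 8):
        print(f"{clusters} cluster(s): {peak_ops_per_second(clusters) / 1e9:.1f} G ops/s peak")
```

Under these assumptions one cluster peaks at 0.6 billion operations per second and eight clusters at 4.8 billion, consistent in scale with the quoted range.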

"Counting the number of operations issued is not really meaningful as a means to compare processors," said Wise. "Instead we shall be using a range of the newer benchmarks that are relevant to our type of machine and market."

Those, he said, include benchmarks from the EDN Embedded Microprocessor Benchmark Consortium, and possibly some of the DSP-oriented benchmarks from Berkeley Design Technology Inc. (Berkeley, Calif.).