Threads provide speed and flexibility

The current system-on-chip paradigm doesn't work. In essence, SOCs are custom designs for the customer who commissioned them. Once the silicon has been designed and exists, what happens when the customer needs a variant of the chip to serve a different application, a new specification or a change in market conditions? With market cycles collapsing, especially for products in fast-moving consumer arenas, redesigns and respins of silicon become major issues.

A better solution: the Thread Processor. A collection of VLIW-like thread engines, this architecture employs an on-chip real-time operating system to manage the multiple independent processes that represent the function of the chip. Basically, each thread engine consists of an "inner" execution mechanism based on a data-flow model and an "outer" control mechanism, based on translation tables, that manages the data-flow engine. The inner execution mechanism provides extremely speedy processing, while the outer control mechanism gives the engine its configurability. As a result, the Thread Processor approach is easy to implement, offers scalable performance, and brings versatility and flexibility to configuring the SOC. It enables design reuse and preserves the huge investment in software.

As instructions are read from program memory, they are translated on the fly into "threads." These threads consist of a series of primitives, called "POPs" for "primitive operations"; the translation is done through ROM lookup tables. The ROMs control the personality of the machine: change the contents of the ROMs and you change the personality and the functions to be implemented. Once the instructions are converted into threads, they are scheduled and executed by the dynamic scheduler, which is itself controlled by a ROM. Again, change the contents of the ROM and the schedule and the order of instruction execution change, too.
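The translate-then-schedule flow described above can be sketched in a few lines of Python. This is only an illustrative model, not TeraGen's implementation: the opcodes (`INC`, `CLR`), the POP mnemonics (`LOAD`, `ADD1`, `ZERO`, `STORE`), the table layouts and the priority-based scheduler are all assumptions invented for the example. It shows how swapping the contents of a lookup table changes what an instruction does, and how swapping the scheduler table changes the order of execution.

```python
from typing import Dict, List, Tuple

# Hypothetical translation ROMs: opcode -> list of POP templates.
# Opcodes and POP mnemonics are invented for this sketch.
ROM_A: Dict[str, List[str]] = {
    "INC": ["LOAD {0}", "ADD1", "STORE {0}"],
    "CLR": ["ZERO", "STORE {0}"],
}
# Same opcodes, different "personality": INC now adds two.
ROM_B: Dict[str, List[str]] = {
    "INC": ["LOAD {0}", "ADD1", "ADD1", "STORE {0}"],
    "CLR": ["ZERO", "STORE {0}"],
}

# Hypothetical scheduler ROMs: opcode -> priority (lower runs first).
IN_ORDER: Dict[str, int] = {"INC": 0, "CLR": 0}
CLR_FIRST: Dict[str, int] = {"INC": 1, "CLR": 0}


def translate(rom: Dict[str, List[str]], instruction: str) -> List[str]:
    """Look the opcode up in the ROM and expand it into a thread of POPs."""
    opcode, *operands = instruction.split()
    return [template.format(*operands) for template in rom[opcode]]


def schedule(threads: List[Tuple[str, List[str]]],
             priority_rom: Dict[str, int]) -> List[Tuple[str, List[str]]]:
    """Order threads by the priority stored in the scheduler ROM."""
    # sorted() is stable, so equal-priority threads keep program order.
    return sorted(threads, key=lambda item: priority_rom[item[0]])


def run(program: List[str], rom: Dict[str, List[str]],
        priority_rom: Dict[str, int], memory: Dict[str, int]) -> Dict[str, int]:
    """Translate, schedule and execute a program; returns the updated memory."""
    threads = [(ins.split()[0], translate(rom, ins)) for ins in program]
    for _opcode, thread in schedule(threads, priority_rom):
        acc = 0  # accumulator local to one thread
        for pop in thread:
            op, *args = pop.split()
            if op == "LOAD":
                acc = memory[args[0]]
            elif op == "ADD1":
                acc += 1
            elif op == "ZERO":
                acc = 0
            elif op == "STORE":
                memory[args[0]] = acc
            else:
                raise ValueError(f"unknown POP: {pop}")
    return memory
```

With these tables, `run(["INC x"], ROM_A, IN_ORDER, {"x": 5})` leaves `x` at 6, while the same program under `ROM_B` leaves it at 7. Likewise, `["INC x", "CLR x"]` ends with `x` at 0 under `IN_ORDER` but at 1 under `CLR_FIRST`, because the scheduler table alone has reordered the two threads.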
In this simple way, several levels of machine configurability can be implemented. In simple cases, the contents of the translation ROMs represent instructions; in more-complex cases they represent complete functions such as counters/timers, UARTs and so forth. Changing the contents of the ROMs will change the set of instructions, their execution order and the peripheral mix. The bottom line is that we have translated hardware functions into software functions: Tinker with the software and you change the functionality of the IC. By going one step further, we can now extend this translation concept to make the contents of the ROMs represent complete cores such as RISC processors, DSPs and MPEG decoders, all modeled in software.

To see what the impact on SOC designs will be, let's drill down a bit deeper. As an example, a variety of different functions can be translated into threads. The translation process associated with function 0 can be made to run a RISC instruction set; the translation process associated with function 1 can be made to run a counter/timer; the translation process associated with function 2 can be made to run a UART; and the translation process associated with function 3 can be made to run an MPEG decoder. Everything comes together as an SOC, all contained in software and executed on the same hardware engine.

What makes this possible in the thread engine is the virtual register file (VRF), which is a multiport high-speed memory, together with the arithmetic logic unit (ALU) and its associated translation ROM. Again, change the contents of the ROM and you change the behavior of the VRF and the ALU. Together, these change the behavior of the execution logic and the functionality represented by the threads. Taken together, the ROMs provide yet another level of configurability. The combination of translation ROMs, VRF and ALU represents the inner core of the Thread Processor, and the combination of the translation ROMs and translation logic forms the outer core.
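The idea of several peripheral functions sharing one engine can be illustrated with a small Python sketch. Everything here is hypothetical: the function numbering follows the example above, but the state layouts, the round-robin stepping and the drastically simplified UART (data bits only, no start, stop or parity bits) are assumptions made for illustration, and the table of Python callables merely stands in for ROM contents.

```python
from typing import Callable, Dict, List


def counter_timer(state: dict) -> None:
    """Function 1: count up and record an overflow when the limit is hit."""
    state["count"] += 1
    if state["count"] == state["limit"]:
        state["count"] = 0
        state["overflows"] += 1


def uart_tx(state: dict) -> None:
    """Function 2: shift one data bit per step onto the line, LSB first."""
    if state["bits_left"] > 0:
        state["line"].append(state["shift"] & 1)
        state["shift"] >>= 1
        state["bits_left"] -= 1


# Stand-in for the ROM contents: function number -> behavior.
# Editing this table changes the peripheral mix of the modeled chip.
FUNCTION_TABLE: Dict[int, Callable[[dict], None]] = {
    1: counter_timer,
    2: uart_tx,
}


def run_engine(function_table: Dict[int, Callable[[dict], None]],
               states: Dict[int, dict], steps: int) -> Dict[int, dict]:
    """One shared engine steps every configured function in round-robin."""
    for _ in range(steps):
        for function_id in sorted(function_table):
            function_table[function_id](states[function_id])
    return states


def fresh_states() -> Dict[int, dict]:
    """Initial state for both functions (values chosen for the example)."""
    line: List[int] = []
    return {
        1: {"count": 0, "limit": 3, "overflows": 0},
        2: {"shift": 0b1011, "bits_left": 4, "line": line},
    }
```

Running `run_engine(FUNCTION_TABLE, fresh_states(), 7)` advances both functions on the same loop: the counter overflows twice and the UART state's `line` holds the bits `[1, 1, 0, 1]`. Passing a table that contains only entry 1 drops the UART from the mix without touching the engine.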
The inner core is that part of the machine that provides speedy execution, while the outer core provides configurability. In essence, the Thread Processor converts hardware implementations into software implementations.

But there is no reason why the ROMs must be hardwired structures; they could just as well be programmable memory such as flash, E2PROM or even RAM. With such programmable choices we wind up with SOCs that can be reconfigured through software downloads into the flash memory. Taking it one step further, with RAM structures the Thread Processor could be almost completely reconfigurable, perhaps even allowing the SOC to reconfigure itself, depending on situations that arise during power-up or under unusual operating circumstances.

The concept of thread processing is not new in the industry; it has been around for a while. What is new is the application of thread processing to silicon and its implementation on systems such as TeraGen's Thread Engine. Also new is the implementation of the VRF and its ability to switch context rapidly: We no longer load and reload registers; we simply change the value on the address lines into the VRF to perform an immediate context switch.

At TeraGen, we firmly believe that the 2000s will be the "decade of thread processing" and that the result will be low-cost, high-speed, intelligent and reconfigurable new products.
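The address-based context switch described for the VRF can be modeled roughly as follows. This is a minimal sketch under stated assumptions: the class name, the flat-memory layout and the fixed number of registers per context are all hypothetical and are not taken from TeraGen's design. The point it illustrates is that switching context changes only a base address; no register contents are saved or restored.

```python
from typing import List


class VirtualRegisterFile:
    """Hypothetical model: one flat memory, one register window per context."""

    def __init__(self, contexts: int, regs_per_context: int) -> None:
        self.contexts = contexts
        self.regs_per_context = regs_per_context
        self.memory: List[int] = [0] * (contexts * regs_per_context)
        self.base = 0  # models the upper address lines into the memory

    def switch(self, context_id: int) -> None:
        """Context switch: only the base address changes, nothing is copied."""
        if not 0 <= context_id < self.contexts:
            raise IndexError("no such context")
        self.base = context_id * self.regs_per_context

    def _address(self, reg: int) -> int:
        if not 0 <= reg < self.regs_per_context:
            raise IndexError("no such register")
        return self.base + reg

    def read(self, reg: int) -> int:
        return self.memory[self._address(reg)]

    def write(self, reg: int, value: int) -> None:
        self.memory[self._address(reg)] = value
```

For example, after writing 11 to register 0 in context 0, calling `switch(1)` exposes a separate register 0; switching back with `switch(0)` finds the value 11 still in place, with no load or store traffic in between.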
All material on this site Copyright © 2017 Design And Reuse S.A. All rights reserved.