A case for using a specialized language for NPU Design
By Axel Tillmann, Chairman and CEO, Novilit, Inc., Marlborough, Mass., EE Times
August 5, 2002 (10:27 a.m. EST)
URL: http://www.eetimes.com/story/OEG20020802S0042
In science, there are two different approaches to finding solutions: the outdated trial-and-error method and the now widely accepted methodological approach. When using a methodology to solve a problem, one first tries to describe the apparatus that needs to be built. Then a formal mathematical model is created and refined until a full functional understanding has been reached.
It is clear that general-purpose processes are less efficient than specialized processes, and the same holds true in the processor field. As an illustration, 10 years ago, the industry introduced the RISC processor architecture, which traded breadth of functionality for speed. This is a classic case of specialized design for a specialized application. Only the appropriate applications can take full advantage of the RISC processor. Deciding whether a given RISC processor or a general-purpose processor should be used requires a careful analysis of the task to be performed. There are myriad cases where programmers implemented code in C and compiled it for a RISC processor, simply because RISC processors had been touted as being faster than non-RISC processors, and later found out that the performance of the RISC processor didn't meet the expected benchmark.
It is no different in the design of the embedded network processors that are becoming commonplace in the network and communications infrastructure. Network Processing Units (NPUs) are task-specific chips that have been optimized for quick data-path handling, that is, identifying packets and making proper routing decisions. Most have an architecture adapted for matching bit patterns quickly, and this works quite nicely for fixed, offset-driven protocols.
Ideally, to utilize the special processing opportunities NPUs represent, an apparatus must be developed that includes the primary functional blocks for the complete description of protocol transactions, including data path logic, control path logic, quality of service assessments, billing structure identification, and security handling. From a scientific perspective, this apparatus is required because it solves many of the problems associated with implementation. After the basic research is accomplished for this new apparatus and its associated formal algorithms, the balance will consist of the details of implementation. But research is always harder than development, assuming that the proper methods of implementation are employed.
Every aspect of our lives is influenced more with each passing day by electronic information and communications, and the process has just begun. We need to invest in and invent the methodologies needed to create the next generation of communications systems that will provision new service options. Such network services will power the next upswing in the networking and telecom industries.
Today, protocols are being written in C, but developers achieve little beyond a low-level, complicated implementation. As a protocol language for network processing, however, C is too flexible to produce optimized output.
What is the path to better NPU implementations? The first step is to find a method to abstract and describe protocols and their behavior. We must reach a consensus upon the building blocks that are required for the next generation of network processors, and define the minimal functions required to build these protocol building blocks. Some of the functions may even need to be split up into smaller subfunctions in order to describe the optimal hardware implementation.
These functions are best described by establishing a new specification language and thereby a new methodology that can unambiguously describe the entire protocol stack. Ideally, this language will create a simulation of the model and define the building blocks, in addition to describing the device functionally and positioning the protocol for implementation on the target hardware. By implementing protocols with a formal specification language, developers will achieve the maximum benefits of the NPU or any processor architecture.
Conceptually, the "minimal building block" is analogous to the mathematical principle of the "lowest common denominator." It can be likened to building a house out of bricks. To build a brick house, you only need two types of bricks, the full brick and the half-brick. Since a half-brick can easily be made by splitting a full brick in half, the building industry manufactures only one type of brick. No manufacturer offers a special brick for windows and doorways, because no matter what kind of house an architect designs, it can be built using a single brick shape.
One specialized specification language that has been developed to address these issues in the network processing environment is the Communications Machine Description Language (CMDL). At its core, CMDL assumes that the two high-level bricks needed to build any processor, but especially a network processor, are the branching calculator and the field value calculator. Modern protocols have a hierarchical bitstream structure, which requires a sophisticated algorithm to deal with multi-branching logic trees.
In the past, designers adopted a binary tree structure, and most of the logic of the binary trees was used in their designs. Multi-branching trees are much more complex to handle and almost impossible to code into hardware. The branching calculator is further mandated by the fundamental requirement of fast protocol processing.
In order to optimize each functional block for the fastest speed, the branch must be subdivided into two or three different branching algorithms. This leads to different processor calls, and each must guarantee that the processing occurs within a single clock cycle.
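As a rough software illustration of a multi-branching decision, the sketch below walks a small protocol tree by looking up a selector field in a table at each node, rather than testing one bit at a time as a binary tree would. It is not CMDL and not a description of any NPU; the names `BRANCHES` and `classify` are hypothetical, and the offsets assume an untagged Ethernet II frame carrying an IPv4 header that starts at byte 14.

```python
# Hypothetical branch table: node -> (byte offset of selector,
# selector width in bytes, {selector value: next node}).
# Offsets assume an untagged Ethernet II frame; the IPv4 protocol
# field sits 9 bytes into the IPv4 header, i.e. at byte 14 + 9 = 23.
BRANCHES = {
    "ethernet": (12, 2, {0x0800: "ipv4", 0x0806: "arp", 0x86DD: "ipv6"}),
    "ipv4": (23, 1, {1: "icmp", 6: "tcp", 17: "udp"}),
}


def classify(frame):
    """Return the list of protocol nodes visited for one frame."""
    node = "ethernet"
    path = [node]
    # Each step is one multi-way branch: read the selector, then jump
    # directly to the matching child instead of testing values in turn.
    while node in BRANCHES:
        offset, width, targets = BRANCHES[node]
        selector = int.from_bytes(frame[offset:offset + width], "big")
        node = targets.get(selector, "unknown")
        path.append(node)
    return path


# Usage: an all-zero frame with EtherType 0x0800 and IPv4 protocol 17.
udp_frame = bytes(12) + b"\x08\x00" + bytes(9) + b"\x11" + bytes(10)
print(classify(udp_frame))
```

In software the lookup is a dictionary access; the article's point is that a hardware branching calculator has to deliver the same multi-way decision within a tightly bounded time.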
Spanning bytes
The second algorithm is the field value calculator, with its own set of requirements. We need to recognize that most protocol fields are no longer byte-aligned, and today they might span two, three, or more bytes. In addition, the bits of a field can be assembled onto the wire in as many as four different orders. This leads to four different bit-shift hardware blocks.
In CMDL these hardware blocks are called by different constructs:
- (Least significant bit first)
- (Least significant octet first)
- (Least significant bit last)
- (Least significant octet last)
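To make the four orderings concrete, here is a minimal software sketch of a field value calculation for a field that is not byte-aligned and may span several octets. It is not CMDL syntax; `reverse_bits` and `extract_field` are hypothetical names, and the conventions are assumptions chosen for illustration: "bit first/last" is modeled by mirroring the bits inside each octet, and "octet first/last" by the byte order used to assemble the octets into one integer.

```python
def reverse_bits(octet):
    """Mirror the eight bits of one octet (bit 0 swaps with bit 7, and so on)."""
    mirrored = 0
    for i in range(8):
        mirrored = (mirrored << 1) | ((octet >> i) & 1)
    return mirrored


def extract_field(data, bit_offset, width, lsb_first=False, lso_first=False):
    """Read a `width`-bit field that starts `bit_offset` bits into `data`.

    lsb_first: assumed to mean each octet arrives least significant bit first.
    lso_first: assumed to mean the least significant octet arrives first.
    """
    # Collect only the octets the field touches.
    first = bit_offset // 8
    last = (bit_offset + width - 1) // 8
    window = data[first:last + 1]
    if lsb_first:
        window = bytes(reverse_bits(b) for b in window)
    value = int.from_bytes(window, "little" if lso_first else "big")
    start = bit_offset % 8
    if lso_first:
        shift = start  # offset counted up from the low end
    else:
        shift = len(window) * 8 - start - width  # offset counted from the high end
    return (value >> shift) & ((1 << width) - 1)


# Usage: a 6-bit field starting at bit 5, so it spans two octets.
sample = bytes([0xB3, 0x5C])
print(extract_field(sample, 5, 6))
print(extract_field(sample, 5, 6, lso_first=True))
```

The same shift-and-mask step serves all four combinations; only the normalization in front of it changes, which is roughly why each ordering can map to its own shift block in hardware.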
A properly constructed specification language helps avoid the typical pitfalls in conventional programming. The language constructs can direct whole code segments to treat many field structures and substructures in one particular bit order.
Programming in C, on the other hand, requires the programmer to remember the correct bit order in every field to ensure that the appropriate subroutine is called. In fact, with some of the modern protocols, especially where telecom rides over datacom, such as SS7 over ATM and Voice over IP (VoIP), it becomes very complex for a C programmer to keep track of which bit order is being referenced.
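The contrast can be sketched in a few lines: if the bit order is declared once for a whole block of fields, the decoder applies it uniformly and the programmer never chooses an ordering per field. This is only an illustration of the idea, not CMDL; `Block` and `decode` are hypothetical names, and the example layout is the first two octets of an IPv4 header (version, header length, DSCP, ECN).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    name: str
    lsb_first: bool  # declared once; applies to every field in the block
    fields: tuple    # (field name, width in bits) pairs, in wire order


def decode(block, data):
    """Decode every field of `block` from `data` using the block's one bit order."""
    total_bits = len(data) * 8
    value = int.from_bytes(data, "little" if block.lsb_first else "big")
    result = {}
    position = 0
    for field_name, width in block.fields:
        # The ordering is read from the block, never passed per field.
        if block.lsb_first:
            shift = position
        else:
            shift = total_bits - position - width
        result[field_name] = (value >> shift) & ((1 << width) - 1)
        position += width
    return result


# Usage: first two octets of an IPv4 header, most significant bit first.
ipv4_start = Block(
    name="ipv4_start",
    lsb_first=False,
    fields=(("version", 4), ("ihl", 4), ("dscp", 6), ("ecn", 2)),
)
print(decode(ipv4_start, b"\x45\xb8"))
```

Changing the ordering for the whole block is a one-word edit to the declaration, whereas per-field subroutine calls would each need to be revisited.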
Once a field has been constructed, there are basically two ways to calculate the next necessary action, with two possible hardware implementations. You can either double up the four shift operations or keep them separate. For faster speed, one would build eight bit-shift hardware blocks, and for smaller size one would keep the four separate.
These building blocks are only useful when associated with a language developed specifically for the task of building a protocol. When the necessary functions of a protocol are described in a structured methodology, it guarantees that only the necessary building blocks will be used.
This achieves the fastest processing, short of implementing the entire protocol fully in hardware (which, incidentally, should also be possible via the same specification language).
The difference between processor design and a full protocol implementation is the amount of silicon real estate that is used. The full protocol is merely the implementation of the protocol by creating logical trees using the hardware functional blocks above. Putting the full protocol implementation into an FPGA if reprogrammability is an issue, or into an ASIC if a mature protocol is being implemented, leads to the fastest possible processing.
The definition of the protocol apparatus and the associated specification language involves a longer feedback process. Simple protocols such as Ethernet can be easily handled with fewer building blocks than protocols such as UMTS, RPR, or H.323, to name a few. As long as you target FPGAs, you can afford to have overlooked one or two building blocks, because you can easily add them whenever you need to expand on the functionality.
In the creation of a specialized processor such as an NPU, this cannot be taken lightly. Once you have found a specification language, you must test many complex protocols in order to verify that the solution is truly universal.
Proper solutions can be found if the proper effort is put forward. As an industry, we have to stop delivering quick-and-dirty solutions that can have a considerable long-term negative economic impact. Our industry can render new methodologies that lead to growth through innovative new technology based upon sound scientific principles.