Extreme partitioning
By Mitchell S. Alexander, Courtesy of Embedded Systems Programming
October 5, 2005
URL: http://www.embedded.com/showArticle.jhtml?articleID=171203272

Technical skill is mastery of complexity, while creativity is mastery of simplicity.
--Sir Erik Christopher Zeeman, Catastrophe Theory: Selected Papers 1972-1977, Reading, MA: Addison-Wesley, 1977.

The esteemed mathematician Sir Erik Christopher Zeeman is notable in part for his work on catastrophe theory, which later gave rise to the concept popularized in the motion picture The Butterfly Effect. This area of study is based on the idea that small changes in the initial conditions of a dynamic system can cause large changes in the long term, some of them unpredictable. In many ways catastrophe theory parallels the design of complex embedded systems: small changes in the initial design of a complex embedded system can have an unpredictable outcome and a significant impact on the final system's performance.

Most embedded system design relies on the technical skills of an engineering team to master the complexity of the design. In certain situations, however, designs naturally lend themselves to the simplicity of creative, elegant solutions. Sometimes these solutions embrace hardware/software partitioning tradeoffs that increase the chip count in order to reduce firmware and software development time.

Twenty-nine processors

Many of the designs my team works on are low-volume, fast-development projects with relatively high non-recurring engineering (NRE) costs, so the desire to keep risk to a minimum is very strong. One method that helps mitigate this risk is extreme partitioning. I define extreme partitioning as the use of extra hardware, often multiple processors dedicated to specific tasks, to greatly simplify the software-development effort. An example of this practice is the architecture of a third-generation power feed equipment (PFE) system for the telecommunications industry. This complex system, shown in Figure 1, powers transoceanic fiber-optic repeaters for voice and data lines. This single sophisticated system makes use of 29 processors: 25 are programmable and 4 are embedded in commercial off-the-shelf components. Twenty-nine processors in a single product!

Figure 1: Third-generation PFE product with twenty-nine processors
Some may call it overkill, others may call it wasteful; I call it extreme partitioning.
Design goals
Several goals pertaining to the embedded system were defined in the early stages of the development life cycle for the PFE:
• Deliver working prototypes within one year
• Reduce risks due to embedded architecture problems and firmware development as much as possible
• Use resources in parallel to reduce development time
Extreme partitioning is a perfect match for this complex product. Decomposing complex tasks into simpler ones helps to deliver products with shorter development times and reduces risk. By using multiple processors, an engineer or programmer can be assigned to each processor type, an arrangement that maximizes resource use in a parallel fashion. This division of labor is ideal for shops that have access to the necessary resources.
Once my project team decided to use extreme partitioning, we added two additional design goals:
• Use Linux as the graphical user interface operating system; Java as the graphical user interface programming language; C as the DSP programming language; C++ as the Ethernet processor programming language
• Use a "universal control board" approach to standardize the embedded architecture
Decomposing the concept
Engineering design projects start with a concept and usually some type of requirements document. Investing enough time in the early stages on system design, and on determining whether enough of the requirements have been provided, pays dividends in the low-level detailed design phase, when the work moves quite rapidly. This up-front investment of time is required when using extreme partitioning. Applying extreme partitioning to a product concept requires adequate system requirements and engineers with good system design capabilities.
The six stages of decomposition in the PFE example are shown in Figure 2. This drawing illustrates the process of taking the concept of a single-bay third-generation PFE and decomposing it into extreme partitioning components. The process is iterative and requires many cycles before a valid and producible system design emerges. The process alternates between physical and logical (or virtual) thinking; physical elements are shown with a yellow background and logical ones with a green background. Complexity increases as the design becomes more detailed, moving from the first stage through the last.
Figure 2: Decomposition of concept into extreme partitioning components (stage 4 skipped)
The first stage, which represents the concept of the product, is a combination of both physical and logical thinking. This stage is where the system engineers take the requirements and brainstorm about how to architect the high-level system. Requirements are converted into logical elements, which are then visualized into physical elements. Several cycles of physical/logical combinations are usually performed until the design team feels they have a handle on how the requirements can be implemented.
In the second stage, the design team partitions the requirements into logical units. These units are the highest level of partitioning and should be thought of as groups of functionality. The designers take the output from the first stage and separate the physical from the logical to form several of these logical units. In the PFE example, four groups of functionality are identified that contain firmware or software.
Next, in stage three the designers take the groups of functionality from the previous stage and assign them to physically partitioned units. This can be thought of as the module definition stage, and it usually concludes with the definition of the necessary interfaces between the modules. In the PFE example, five physical modules are identified; these amount to four distinct module designs, since the two power converters are identical.
In most system designs, a fourth stage that consists of logical partitioning will be required. In this stage, the designers take the physical modules and determine the logical functionality and partitioning of the embedded system hardware. In our case, we performed much of this beforehand when we envisioned using the universal control board (UCB) common architecture. This example demonstrates that the extreme partitioning process is flexible so long as all of the stages are completed.
The fifth stage takes the individual modules and identifies the types of embedded systems needed to implement the required functionality in each. The goal of this stage is to identify the hardware requirements at the module level and to partition the reactive (real-time) elements from the non-reactive. In the PFE example, we were guided by the emergent design goal of using a common architecture for the custom embedded systems, so the idea of using a UCB was part of the decision making. As a result, all but one module use the same embedded architecture. The local control unit does not fit the common architecture; for this module, a PC server with two Xeon processors, an LCD display with its own processor, and a 100-Mb Ethernet switch are used.
The sixth stage decomposes the embedded system architecture into the logical elements that are determined from the design requirements. In the PFE example, we were able to identify six major elements that needed to be covered by the embedded system: an Ethernet interface, a display controller, I/O and LED processing, real-time control, voltage and current monitoring, and a way for the power converters to have deterministic data sharing (Ethernet is nondeterministic).
The final stage of this process takes the logical elements identified in the sixth stage and maps them to physical hardware in the embedded system. This turns out to be a bit more complex than it seems. In this stage the problem of hardware/software partitioning comes into play: the designers must decide whether specialized hardware (such as coprocessors) is required or whether the functionality will be accomplished in software. In Figure 2, one-to-many and many-to-one relationships are mapped between the logical elements of the sixth stage and the physical elements of the seventh. For example, the Ethernet interface logical element is mapped in a one-to-many relationship: an Ethernet processor handles the Ethernet encoding and decoding, and the communications and display DSP handles the asynchronous serial interface to the Ethernet processor. Conversely, the seventh-stage physical element called the communications and display DSP is mapped in a many-to-one relationship with the Ethernet interface and display controller logical elements from the sixth stage.
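To make that mapping concrete, consider what crosses the asynchronous serial link between the Ethernet processor and the communications and display DSP. The C sketch below is not the PFE's actual protocol; the start byte, message-ID field, length, and checksum are assumptions chosen only to illustrate a framed exchange across that partition boundary.

#include <stdint.h>
#include <stddef.h>

/* Hypothetical frame layout for the DSP <-> Ethernet-processor serial link.
 * None of these values come from the article; they only illustrate the idea
 * of a framed, checksummed exchange across the partition boundary. */
#define FRAME_START   0x7EU
#define MAX_PAYLOAD   32U

typedef struct {
    uint8_t msg_id;                /* command or status identifier     */
    uint8_t len;                   /* number of valid payload bytes    */
    uint8_t payload[MAX_PAYLOAD];  /* raw data, e.g. a voltage reading */
} frame_t;

/* Simple additive checksum over id, length, and payload. */
static uint8_t frame_checksum(const frame_t *f)
{
    uint8_t sum = (uint8_t)(f->msg_id + f->len);
    for (size_t i = 0; i < f->len; i++)
        sum = (uint8_t)(sum + f->payload[i]);
    return sum;
}

/* Serialize one frame into a byte buffer destined for the UART that feeds
 * the Ethernet processor.  Returns the number of bytes written, or 0 if the
 * destination buffer is too small. */
size_t frame_encode(const frame_t *f, uint8_t *out, size_t out_size)
{
    size_t need = (size_t)f->len + 4U;  /* start + id + len + payload + checksum */
    if (f->len > MAX_PAYLOAD || out_size < need)
        return 0;

    out[0] = FRAME_START;
    out[1] = f->msg_id;
    out[2] = f->len;
    for (size_t i = 0; i < f->len; i++)
        out[3 + i] = f->payload[i];
    out[3 + f->len] = frame_checksum(f);
    return need;
}

The point of the split is visible even in this toy version: the Ethernet processor never touches real-time data directly, and the DSP never touches TCP/IP; each side only needs to understand a small serial frame.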
Generalized process model
Figure 3 shows the seven stages of the generalized extreme partitioning process. The process begins with a combined logical/physical concept and then alternates between logical and physical partitioning until a final physical outcome is generated. Each stage is usually iterative, requiring several cycles of consideration before the outcome is acceptable and the next stage can be entered. The model does not show any feedback mechanisms, although they certainly exist. For example, if a physical stage is entered and the designers determine that a logical element is missing, the process must revert to at least the previous stage; in some cases it must revert all the way to the concept stage.
Figure 3: Extreme partitioning generalized process model
Another interesting feature of this process is that it can be used recursively. For example, it can be used on the system level as shown, but can also be used on the module level, assembly level, board level, and even circuit level.
Results--system architecture
The internal architecture of the PFE is shown in Figure 4.
Figure 4: System architecture using twenty-nine processors, four programming languages, and three operating systems
The four UCBs are shown on the right side, while the three components of the local control unit module are shown on the left. Each UCB contains the following embedded hardware:
• A 32-bit Ethernet processor
• An Altera Cyclone field programmable gate array (FPGA)
• A Texas Instruments 2800-series DSP for communications and display control
• A Texas Instruments 2800-series DSP for real-time control
• A header to interface with a vacuum fluorescent display
• Two fiber-optic receivers to interface with precision voltage and current monitors that are floating with reference to ground
• RS-422 transceivers
• Buffered I/O circuitry
• Various digital and analog circuitry that interfaces with the system cabinet and the high-voltage assemblies
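With the FPGA buffering the I/O, the DSPs typically see that circuitry as a few memory-mapped registers. The fragment below is a sketch with invented addresses and bit assignments, not the UCB's actual register map; it shows the style of firmware this hardware partitioning allows.

#include <stdint.h>

/* Hypothetical FPGA register map as seen by the real-time DSP.  The base
 * address and bit positions are invented for illustration; they are not
 * the actual UCB register map. */
#define FPGA_BASE        0x00080000UL
#define REG_DIGITAL_IN   (*(volatile uint16_t *)(FPGA_BASE + 0x0))
#define REG_LED_OUT      (*(volatile uint16_t *)(FPGA_BASE + 0x2))

#define IN_HV_OK   (1U << 0)   /* buffered input: high-voltage stage healthy */
#define LED_FAULT  (1U << 3)   /* front-panel fault indicator                */

/* Mirror a fault condition from the buffered inputs onto a front-panel LED.
 * Because the FPGA latches and debounces the raw signals, the DSP firmware
 * reduces to a simple read-modify-write -- one payoff of the partitioning. */
void update_fault_led(void)
{
    uint16_t inputs = REG_DIGITAL_IN;

    if ((inputs & IN_HV_OK) == 0U)
        REG_LED_OUT |= LED_FAULT;                /* show fault  */
    else
        REG_LED_OUT &= (uint16_t)~LED_FAULT;     /* clear fault */
}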
Another benefit of this architecture is that none of the processors is over-taxed, either in processing power or in memory requirements. This speeds firmware development since engineers don't have to spend time optimizing code for speed, testing compiler optimizations, or optimizing memory usage.
This type of architecture is also easier to debug since there are more interfaces exposed to external test equipment, such as scopes, logic analyzers, JTAG-enabled debuggers, and protocol analyzers. Again, this helps to keep the development time, risk, and NRE costs down.
Results--parallel development
This architecture lends itself to parallel firmware and software development. The design goals of quickly delivering prototype units and of making use of parallel development efforts were realized through extreme partitioning. The development team on the PFE project was composed of the following developers:
• An engineer to develop the Ethernet processor to serial bridge code in C++ using the Net+Works operating system
• An engineer to develop the FPGA design using Verilog
• An engineer to develop the communications and display code in C using the TI DSP/BIOS real-time operating system
• An engineer to develop the real-time control code in C using the TI DSP/BIOS real-time operating system
• Two engineers to develop modules for the GUI application running on the PC server in Java using the Linux operating system
Results--code design and reuse
Limited firmware functionality was partitioned onto each processor, which results in tasks that are small and manageable for both the engineer and the project manager. The idea is to keep each firmware design reasonably simple so that it executes quickly on a modestly loaded processor. This approach also keeps integration testing at a manageable level, since each processor is responsible for only a limited number of tasks.
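As a rough illustration of how narrow each firmware product can be, a real-time control task might reduce to something like the loop below. This is a generic sketch, not the PFE's control code, and it is shown as a bare loop rather than a DSP/BIOS task for brevity; the voltage read, regulation step, and timer wait are stand-ins for whatever the DSP actually does.

#include <stdint.h>

/* Placeholder hardware-access routines; on the real DSP these would read the
 * voltage/current monitors and drive the converter's power stage. */
static uint16_t read_output_voltage(void) { return 6000U; }  /* volts, stubbed */
static void     set_pwm_duty(uint16_t duty) { (void)duty; }
static void     wait_for_tick(void) { /* block on a periodic timer */ }

/* One regulation step: the only job this processor has.  No display code,
 * no Ethernet parsing, no LED handling -- those live on other processors. */
static void regulate_once(uint16_t setpoint_volts)
{
    static uint16_t duty = 0U;
    uint16_t v = read_output_voltage();

    if (v < setpoint_volts && duty < 1000U)
        duty++;                 /* crude incremental regulator, illustration only */
    else if (v > setpoint_volts && duty > 0U)
        duty--;

    set_pwm_duty(duty);
}

int main(void)
{
    for (;;) {                  /* fixed-rate loop, paced by the timer tick */
        wait_for_tick();
        regulate_once(6000U);
    }
}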
My team was able to realize significant code reuse throughout the project as a direct benefit of the UCB's common architecture and the concept of extreme partitioning. Table 1 illustrates the extent of the code reuse. This level of reuse also reduced unit testing: the converter software products were fully unit tested, and for the other two modules' software products only the modified and new units needed testing.
Table 1: Code reuse test estimates for test load and output monitor UCBs
The converter firmware was developed and debugged first; the firmware for the other two modules was derived directly from the converter code. Table 1 shows that between 90% and 95% of the converter firmware was reused. The Ethernet processor firmware is essentially identical across all of the similar software products; the major differences are in the embedded Java applet for diagnostics. The communications DSP firmware is also essentially identical across these products; the major differences are in the parsed message IDs and the resultant responses.
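One way such reuse could be structured (a sketch, not the actual PFE source) is to keep the message parser identical in every build and confine the module-specific behavior to a table of message IDs and handlers; porting the converter firmware to the test-load or output-monitor module then mostly means editing the table. The IDs and handlers below are invented.

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Handler signature shared by every module's firmware. */
typedef void (*msg_handler_t)(const uint8_t *payload, uint8_t len);

typedef struct {
    uint8_t       msg_id;
    msg_handler_t handler;
} msg_entry_t;

/* Example handlers -- stand-ins for the real command processing. */
static void handle_status_request(const uint8_t *p, uint8_t len)
{ (void)p; (void)len; puts("status requested"); }

static void handle_setpoint(const uint8_t *p, uint8_t len)
{ if (len >= 2) printf("new setpoint: %u\n", (unsigned)(p[0] | (p[1] << 8))); }

/* The only part that differs between the converter, test-load, and
 * output-monitor builds: which IDs exist and what they do. */
static const msg_entry_t module_table[] = {
    { 0x01, handle_status_request },
    { 0x10, handle_setpoint       },
};

/* Shared dispatcher, identical in every build. */
static void dispatch(uint8_t id, const uint8_t *payload, uint8_t len)
{
    for (size_t i = 0; i < sizeof(module_table) / sizeof(module_table[0]); i++) {
        if (module_table[i].msg_id == id) {
            module_table[i].handler(payload, len);
            return;
        }
    }
    /* Unknown ID: real firmware would send a negative response here. */
}

int main(void)
{
    const uint8_t setpoint[] = { 0x70, 0x17 };  /* 0x1770 = 6000, little-endian */
    dispatch(0x10, setpoint, 2);
    dispatch(0x01, NULL, 0);
    return 0;
}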
Results--product description
The third-generation single-bay PFE shown in Figure 1 is a sophisticated DC-to-DC converter with 1+1 redundancy. The system runs on the standard telecommunications-supplied -48VDC and provides a voltage- or current-regulated output of 6,000V at 1,200mA (1.2 amps). Faults and status are communicated to a monitoring system over an external Ethernet connection. Five of the six blind-mated modules communicate with each other over an internal 100-Mb Ethernet local area network. Additionally, the two power converters communicate with each other over a point-to-point RS-422 link.
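The point-to-point RS-422 link gives the two converters the deterministic exchange that the shared Ethernet LAN cannot: one fixed-length frame to one known peer on a fixed schedule. The sketch below illustrates the idea only; the frame contents, field sizes, and checksum are assumptions, not the actual converter protocol.

#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-length status frame exchanged between the two power
 * converters every control tick.  A fixed size and fixed period are what
 * make the point-to-point link deterministic, unlike the shared Ethernet. */
typedef struct {
    uint16_t output_volts;       /* e.g. 6000                           */
    uint16_t output_milliamps;   /* e.g. 1200                           */
    uint8_t  fault_flags;
    uint8_t  checksum;           /* simple additive checksum, last byte */
} converter_status_t;

/* Stub for the RS-422 driver: send exactly 'len' bytes to the peer. */
static void rs422_send(const uint8_t *buf, uint16_t len) { (void)buf; (void)len; }

static uint8_t checksum8(const uint8_t *buf, uint16_t len)
{
    uint8_t sum = 0;
    while (len--) sum = (uint8_t)(sum + *buf++);
    return sum;
}

/* Called once per tick: pack the local converter's state and send it. */
void share_status(uint16_t volts, uint16_t milliamps, uint8_t faults)
{
    converter_status_t s;
    uint8_t raw[sizeof s];

    s.output_volts     = volts;
    s.output_milliamps = milliamps;
    s.fault_flags      = faults;
    s.checksum         = 0;
    memcpy(raw, &s, sizeof s);
    raw[sizeof s - 1] = checksum8(raw, (uint16_t)(sizeof s - 1));
    rs422_send(raw, (uint16_t)sizeof s);
}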
A graphical user interface (GUI) is provided by a Java application running on an industrial Linux PC server with dual 3.0GHz Xeon processors. A 17-inch LCD screen, keyboard, and pointing device complete the man-machine interface. The Java application polls each of the modules every 250ms and sends status information to the LCD, an external alarm interface, and the upstream monitoring system once per second.
The active test load (top module), the two power converters (second and third modules), and the output module (fifth module) each contain a UCB developed specifically for this application. Each control board is populated ("stuffed") according to its intended module destination. These control boards are not interchangeable, but they do provide a common architecture for each module.
Was it worth it?
All of the design goals were met. We created the world's first single-bay PFE system in about a year! This is a significant accomplishment since previous PFE systems consisted of at least six bays at approximately five times the cost. This example of complex engineering was in part possible through the use of extreme partitioning.
The benefits of extreme partitioning realized on this project include:
• Faster time to market
• Reduced non-recurring engineering costs
• Reduced risk from firmware development and under-sized embedded system designs
• Parallel development efforts
• Easier project management through the decomposition of work into more manageable tasks
• An elegant design that's easier to debug and maintain
• Easier expansion of firmware to meet customer requests
• Reduced integration and unit testing
• Significant code reuse through a common embedded system architecture
At first glance, it seems as though extreme partitioning increases the number of firmware products required to complete the job. That's true, but it's also true that developing those firmware products is easier with extreme partitioning, since each design addresses only a specific part of the solution. For example, the real-time DSP code handles only real-time events, not user-interface events. If you gathered all of the firmware tasks found in a system using extreme partitioning, it would be apparent that they could all run on a single powerful processor, but the firmware complexity would be much higher. One could argue that a single DSP could handle the user interface, real-time control, voltage and current monitoring, I/O and LED processing, and data sharing with another module. That may be true, but it would be much harder to implement and debug, and the product would reach market more slowly than one designed using extreme partitioning.
Extreme partitioning is not for everyone or for every design; its applicability is limited. It should be used only where faster development time is more important than cutting a few dollars from the bill of materials, and only on products that can accommodate an increased chip count, since additional processors are added to handle specific tasks. It can also reduce development time by allowing several developers to work on firmware designs simultaneously; in shops with limited resources, parallel development may not be possible and this benefit will not be realized. Because there are more parts, there are more interfaces to debug, so extreme partitioning should be used cautiously in organizations with modestly skilled engineers, for whom the increased number of interfaces and firmware products may be overwhelming.
Future research on extreme partitioning should explore the idea of collecting the building blocks of functionality that the process produces. My team has already enjoyed this benefit: we have used the Ethernet processor block in other designs, as well as parts of the FPGA design. Each firmware product linked with its physical circuitry can be considered a building block, and expected code reuse for some of these building blocks should be in the 95% to 99% range.
In this design, we created an elegant embedded system that is simple in design yet relies on the technical skills and creativity of an engineering team to master the complexity of the system's requirements. The end result is a reasonably complex system in which small changes in the initial design have had a significant impact on the final system's performance.
Mitchell S. Alexander is the manager of Digital/Software Design at Spellman High Voltage Electronics Corporation. He is also a part-time faculty member at Walden University's NTU School of Engineering and Applied Science. He holds an undergraduate degree in electrical computer engineering/technology from the NY Institute of Technology, a graduate degree in software engineering from the National Technological University, and is presently working on his PhD in engineering management at Walden University. He can be reached at malexander@spellmanhv.com.
Copyright 2005 © CMP Media LLC