Compilers Enable Embedded Comm Application Optimization

By Lerie Kane, Intel

As processor architectures evolve and the complexity of instructions increases, the role of the compiler in application development continually gains importance. The applications developed in the embedded industry are becoming progressively more intricate, placing even more importance on software tools. An optimal compiler and good optimization methods can not only increase the performance of an application, but can also decrease development cost and engineering cycle time, thereby accelerating time to market.

Performance Methodology

The ideal time to start thinking about optimization is during the application design phase, and time should be budgeted specifically for optimization activities. However, having time available to concentrate on optimizations is only part of the process. Ideally, code should be architected in "optimization friendly" ways, meaning that it is readable, reliable, portable, and maintainable.

It is a common misconception that performance-intensive sections of an application require hand-written assembly code. In fact, as processor architectures have grown exponentially in complexity, so have compiler optimization technologies. There are many benefits to letting the compiler do the optimizing. For example, as new processors are introduced, the computing industry is inundated with information on how the newest processor instructions will benefit existing programs. At this point, a developer must assess the available options and platform approaches: study the new instructions, learn how they work and where they would be most beneficial in the application, and then re-code, or simply use the latest compiler optimization flag.

Performance Cycle
Figure 1: After a program is stable, the Performance Cycle allows iterative analysis for optimization.

As illustrated in Figure 1, this is meant to be an iterative cycle. There is likely to be more than one performance-intensive area in an application, and it is best to concentrate on one area at a time. Focusing on a large portion of the code will only confuse the data and make it more difficult to see how the applied optimizations directly affect the program.

The first step is to gather performance data, which requires a method to measure the performance of the application. Simply using a clock on the wall or a stopwatch is a highly inaccurate way to time an application. Timing functions inserted into the code, as in the sketch at the end of this section, are a more effective way to gather performance data; although they add overhead, the timers will provide accurate results. One of the easiest ways to gather timing data is to purchase a good performance analyzer software package. Most performance analyzers will show the time spent in each function of the program and will provide an analysis based on this data.

Once performance data has been collected, find the routines that are taking the majority of the application time and decide if the numbers make sense. Is there a loop that appears to be taking more time than it should? These areas are known as "hot spots." This process is not always obvious, and a performance analysis tool that graphically displays the data will often help. Some tools will actually show the path in the code that is executed most frequently. However, in the absence of these tools, strong knowledge of the application, plus time, should yield the same results: identifying the sections of code that are taking more time than they should to process.

The next two steps require time and knowledge of optimization techniques. Enhancement modifications may be as simple as applying a compiler flag to targeted source files, or they may require redesigning sections of code. Perform a code inspection to diagnose why the application is slow and to see if any changes will increase the speed. Subsequent sections of this article describe various compiler optimizations and where they will be of most help. Success here delivers a lightning-fast application built on the combination of good code and a good compiler.

After the compiler optimization has been applied or the most efficient algorithm has been implemented, it is important to verify the results. Does the program still run correctly? If so, did it actually achieve a performance improvement? Both of these are very important questions. If either answer is "No," the head of the cycle has been reached and another iteration is in order.
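The sketch below illustrates the idea of inserted timing functions, using the POSIX clock_gettime interface; the workload routine is a made-up stand-in, not code from the article.

    #include <stdio.h>
    #include <time.h>

    /* Made-up stand-in for the routine being measured. */
    static void process_packets(void)
    {
        volatile unsigned long sum = 0;
        for (unsigned long i = 0; i < 10000000UL; i++)
            sum += i;
    }

    int main(void)
    {
        struct timespec start, end;

        clock_gettime(CLOCK_MONOTONIC, &start);
        process_packets();                    /* region being timed */
        clock_gettime(CLOCK_MONOTONIC, &end);

        double elapsed = (double)(end.tv_sec - start.tv_sec)
                       + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
        printf("process_packets: %.6f s\n", elapsed);
        return 0;
    }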
Compiler Overview

Once a compiler is selected, read the documentation. While this may seem like a trivial step, typical compiler documentation will outline the technologies implemented in the compiler as well as provide examples of how to use those features. It is important to show restraint when using compiler optimizations. It is tempting to find the most powerful optimization options and throw all of them at an application. However, no single optimization technique will work the same for every application. It is best to apply one optimization at a time, verify the results, and measure any performance improvements before moving on. A recommended way to proceed is first with general optimizations, then processor-specific optimizations, interprocedural optimizations, and finally profile-guided optimizations, as described below (Figure 2). The methods covered first are the easiest to use, whereas the later ones have the potential for greater performance improvements.
Figure 2: It is best to apply one optimization at a time, verify the results, and measure any performance improvements before moving on.

General Optimizations

Using aggressive optimizations will often reveal errors in code that might otherwise have gone undetected. Another common problem associated with higher optimization levels is precision control. When the compiler is given free rein to fully optimize a program, it will sacrifice precision for speed. In applications that rely heavily on floating-point consistency, it is important to tell the compiler to omit optimizations that affect these values. (Most compilers have precision control options.)

Processor-Specific Optimizations

Processors are more efficient when instructions are optimally scheduled for the architecture. Most often there is a separate optimization flag for instruction scheduling. This option takes into account specific processor instruction latencies as well as cache line sizes. For example, given the following code:
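(The original listing is not preserved in this copy of the article; the fragment below is a hypothetical stand-in: a routine containing a multiply by the constant 32.)

    /* Hypothetical stand-in for the article's missing listing. */
    int scale(int x)
    {
        return x * 32;   /* constant multiply: a strength-reduction candidate */
    }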
Depending on the target processor specified, the compiler could produce the following instruction sequence:
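(The article's original listing showed processor instructions; since it is missing here, the transformation is sketched in C instead, with each addition doubling the running value.)

    /* Equivalent of the compiler's output, expressed in C for illustration. */
    int scale(int x)
    {
        int t = x + x;   /* x * 2  */
        t = t + t;       /* x * 4  */
        t = t + t;       /* x * 8  */
        t = t + t;       /* x * 16 */
        t = t + t;       /* x * 32 */
        return t;
    }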
Here the compiler changed the multiply by 32 into a series of adds. The resulting code will execute faster on the targeted processor.

With a technology called automatic processor dispatch, it is even possible to target more than one processor. With this option, the compiler inserts processor ID checks and creates multiple code paths in the executable. At run time, the processor's ID is checked and the code path associated with that processor is executed, as sketched below. This results in slightly larger code due to the multiple code paths, but it is a good option if the end user plans to run the application on more than one target system.
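A conceptual sketch of the structure the compiler generates, written out by hand for illustration; the feature test and both code paths below are hypothetical stand-ins, not compiler output.

    /* Hypothetical sketch of compiler-generated processor dispatch. */
    static int cpu_has_new_isa(void)
    {
        return 0;   /* stand-in for the real processor ID check */
    }

    static long sum_new_isa(const int *buf, int n)
    {
        long s = 0;   /* real output would use the newer instructions here */
        for (int i = 0; i < n; i++) s += buf[i];
        return s;
    }

    static long sum_baseline(const int *buf, int n)
    {
        long s = 0;   /* generic path that runs on any supported processor */
        for (int i = 0; i < n; i++) s += buf[i];
        return s;
    }

    long sum_dispatch(const int *buf, int n)
    {
        return cpu_has_new_isa() ? sum_new_isa(buf, n)   /* chosen at run time */
                                 : sum_baseline(buf, n);
    }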
Interprocedural Optimizations

Figure 3: In the interprocedural optimization process, the compiler can coordinate actions and optimizations taken in one routine with those in another routine.

To use this technique, the source files first must be compiled with the IPO option enabled. The compiler creates object files containing the intermediate language (IL) used by the compiler. Upon linking, the compiler combines all of the IL information and analyzes it for optimization opportunities. Since the compiler has the whole program to analyze in this step, the linking phase will take much longer than normal.

One of the main optimization techniques enabled with IPO is inline function expansion, which applies when the compiler finds a function call that is frequently executed. Using internal heuristics, the compiler can decide to replace the function call with the actual code of the function. Minimizing jumps through the code creates a higher-performing application.

Another example of how IPO can benefit a program is through pointer disambiguation. The compiler's main goal is to ensure a correctly running program. If the compiler cannot predict the full impact of an optimization, it will take the safe route and not perform it. For instance, consider the following function:
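(The original function listing is missing from this copy; the version below is a hypothetical reconstruction built around two pointer parameters named a and b, matching the discussion that follows.)

    /* Hypothetical reconstruction of the article's missing listing. */
    void accumulate(int *a, int *b, int n)
    {
        for (int i = 0; i < n; i++)
            a[i] += b[i];   /* stores through a may alias loads through b,
                               so the compiler must stay conservative */
    }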
In this situation, the compiler will not optimize this routine without the help of IPO because it does not know what the variables a and b are pointing to. To ensure a correctly running program, the compiler will assume these pointers are aliased; aliasing occurs when two pointers refer to the same location in memory. If, in fact, the pointers are not aliased, IPO will feed this information to the compiler, which will then optimize the routine.
Profile-Guided Optimizations

Figure 4: Profile-guided optimization is a three-step process in which the compiler can use data collected during program execution to aid in optimization.

First, the compiler creates an instrumented executable that contains additional statements that write execution information to a file. Next, the instrumented application is run. This executable will perform slowly due to the added overhead of writing the dynamic information, such as basic block numbers with associated execution counts, to a file. Finally, the application is recompiled; the compiler reads in the dynamic information and optimizes the code paths recorded in the file.

When using PGO, ensure that the instrumented executable runs on a dataset that is representative of the typical workload processed by the application. If the dataset only hits the corner cases, then the compiler spends its time optimizing corner cases that the end user will likely never encounter.

PGO can affect a program in many ways. Since the compiler is aware of the order in which the code is executed, it can make more intelligent decisions about basic block ordering, register allocation, and vectorization. It can also perform enhanced switch statement optimizations. For example, given the following code:
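(The original listing is missing from this copy; the switch below is a hypothetical reconstruction, and the percentages in the comments are illustrative stand-ins for the execution counts PGO would actually collect. The handler routines and the message_type variable are made up.)

    static void handle_control(void) { /* rare path    */ }
    static void handle_data(void)    { /* common path  */ }
    static void handle_other(void)   { /* default path */ }

    void dispatch(int message_type)
    {
        switch (message_type) {
        case 3:                  /* executed  ~5% of the time */
            handle_control();
            break;
        case 10:                 /* executed ~90% of the time */
            handle_data();
            break;
        default:                 /* executed  ~5% of the time */
            handle_other();
            break;
        }
    }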
PGO will gather data on the percentage of the time each case statement is executed (shown in the comments above). To enable the highest performance, the compiler will order the code with the common case occurring first. In this example, case 10 is executed most frequently, and having that case evaluated first eliminates the overhead associated with evaluating case 3.
Profile-guided optimization is not suited for every application. It benefits most when an application has a consistent execution path and performs similarly across multiple datasets. If the application has a relatively flat profile, with code execution evenly distributed, then PGO will likely not provide a performance improvement. Before embarking on a PGO build, it is important to consider that it requires changing the build structure; adding the three-step process to some build systems can be difficult. This technique should generally be used at the end of the optimization process.

Increasing Demands

Compiler optimization technologies are continually advancing. As the demand placed on embedded applications increases and the required time to market decreases, the compiler plays an integral part in all application development. Armed with a good compiler and some knowledge of optimization techniques, every developer can create high-performance embedded applications.