How to calculate CPU utilization
Is your chip fast enough? Is it too fast? Systems engineers might be paying for more chip than they need, or they may be dangerously close to over-taxing their current processor. Take the guesswork out of measuring processor utilization levels.

Many theories and guidelines dictate how burdened a processor should be at its most loaded state, but which guideline is best for you? This article presents several ways to discern how much CPU throughput an embedded application is really consuming. You can use this information to verify the system software design against a maximum processor load.

Sizing a project

However the initial sizing was done, once system development has progressed it's in the team's best interest to examine the CPU utilization so you can make changes if the system is likely to run out of capacity. If a system is undersized, several options are available: upgrade the processor (if possible), reduce the available functionality, or optimize, optimize, optimize. This article doesn't focus on any of those solutions but illustrates some tools and techniques I've used to track actual CPU utilization. You can use these methods to determine how close to the "edge" a specific project is running.

Defining CPU utilization

For the purposes of this article, CPU utilization is the percentage of time the processor spends doing something other than executing its idle task. The idle task is the task with the absolute lowest priority in a multitasking system. This task is also sometimes called the background task or background loop, shown in Listing 1. This logic traditionally has a while(1) type of loop; in other words, an infinite loop spins the CPU waiting for an indication that critical work needs to be done.

Listing 1: Simple example of a background loop

    int main( void )
    {
        /* ... system initialization ... */

        while(1)    /* endless loop - spin in the background */
        {
            CheckCRC();
            /* ... other non-time-critical background work ... */
        }
    }

This depiction is actually an oversimplification, as some "real" work is often done in the background task. However, the logic coded for execution during the idle task must have no hard real-time requirements because there's no guarantee when this logic will complete. In fact, one technique you can use in an overloaded system is to move some of the logic with less strict timing requirements out of the hard real-time tasks and into the idle task.

Using a logic state analyzer

The first technique is external to the software and requires a logic state analyzer (LSA). The LSA watches the address and data buses and captures data, which you can then interpret. In this test, you configure the LSA to trigger on an instruction fetch from a specific address and measure the time between successive occurrences of that address. The address to watch for could be any address within the while(1) loop from Listing 1.

The task of identifying an appropriate address is tricky but not inordinately difficult. You can use the map file output by the linker to get close to a good address. Peruse the map file for the address of the main function, and then set up the LSA to look for the occurrence of any address within a limited range beyond the entry to main. This range is justified because, unless there's a large amount of logic between the entry to main and the start of the while(1) loop, the beginning of the loop should be easy to spot with a little iteration and some intelligent tweaking of the address range to inspect. If your LSA can correlate the disassembled machine code back to C source, this step is even more straightforward because you only have to capture the addresses within the range known to hold the main function (again, see the map file output from the linker) and then watch for the while(1) instruction.
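If pinpointing an address inside main proves awkward, the loop can be given a function of its own so that its entry address appears by name in the linker map file. The sketch below illustrates the idea; only the Background() name and the CheckCRC() call come from the article, and the rest is an illustrative assumption:

    extern void CheckCRC( void );    /* declared elsewhere in the application */

    /* Background loop factored into its own function. The function's entry
       address is listed by name in the linker map file, so the LSA can be
       told to trigger on an instruction fetch from that address directly. */
    void Background( void )
    {
        while(1)    /* endless loop - spin in the background */
        {
            CheckCRC();
            /* ... other non-time-critical background work ... */
        }
    }

    int main( void )
    {
        /* ... system initialization ... */
        Background();    /* never returns */
        return 0;
    }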
As the sketch above suggests, if the while(1) loop is moved to its own function, perhaps something like Background(), the location is much easier to find via the linker map file.

If the previous approach isn't appealing, you have other options. By inspecting Listing 1, you'll notice that the CheckCRC function is called every time through the background loop. If you could ensure that this is the only place CheckCRC is called, you could use the entry to this function as the marker for taking time measurements. Finally, you could set a dummy variable to a value every time through the background loop. The LSA could trigger on a write to this "special" variable, as shown in Listing 2. Of course, I'm supposed to be showing you how using the LSA means you don't have to modify code. However, the code change in Listing 2 is so minor that it should have a negligible effect on the system.

Listing 2: Background loop with an "observation" variable

    extern INT8U ping;

    int main( void )
    {
        /* ... system initialization ... */

        while(1)    /* endless loop - spin in the background */
        {
            ping = 1;       /* write that the LSA can trigger on */
            CheckCRC();
            /* ... other non-time-critical background work ... */
        }
    }

Regardless of the method you use to trigger the LSA, the next step is to collect the time measured from instance to instance. Obviously, the LSA must be able to time-stamp each datum collected. Some of the more sophisticated modern logic-analysis tools can also carry out software performance analysis on the data collected. One function that could help is one that mathematically averages the instance-to-instance timing variation. Even more helpful is a histogram distribution of the variation, since this shows the extent to which the background-loop execution time varies. If the LSA doesn't perform any kind of data analysis, you have to export the data and manipulate it with more labor-intensive tools, such as a spreadsheet. The spreadsheet is a good alternative to an LSA-based performance-analysis tool, as most spreadsheet applications have many statistical tools built in.

To accurately measure CPU utilization, the measurement of the average time to execute the background task must also be as accurate as possible. To get an accurate measurement with the LSA method, you must ensure that the background task is interrupted as little as possible (no interruptions at all is ideal, of course). Essentially two classes of interrupts can disrupt the background loop: event-based triggers and time-based triggers. Event-based triggers are usually instigated by devices, modules, and signals external to the microprocessor. When measuring the average background time, take all possible steps to prevent these sources from generating an interrupt that would artificially lengthen the time attributed to the background task. Time-based triggers typically come from the system's periodic timer; it may be possible to disable this timing interrupt using configuration options. If it is, the background measurement should be extremely accurate and the load test can proceed. However, if it's impossible to disable the time-based interrupts, you'll need to conduct a statistical analysis of the timing data. Specifically, a histogram analysis of the time variation can help you discern which data represent a background period that executed uninterrupted and which have been artificially extended by context switching.

Figure 1 shows a histogram of an example data set: the variation in the measured idle-task period. Analysis of the idle-task period histogram data requires that you know how background loops become interrupted. This knowledge can help you isolate which histogram data to discard and which to keep. Looking at the sample histogram, you might estimate that any data above a threshold of 280μs represent instances where the background task was interrupted. Using this threshold, you would discard all data above 280μs for the purpose of calculating an average idle-task period. For the sake of this example, let's assume that the average of the histogram data below the 280μs threshold is 180μs. Therefore, in all of the subsequent calculations, we'll use 180μs as the average execution time for one pass through the background loop in an "unloaded" system.
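If your analyzer can only export raw time stamps, this thresholding and averaging step is easy to automate offline rather than doing it by hand in a spreadsheet. The short program below is only a sketch of that postprocessing, not code from the article; the sample values are made-up placeholders, and the 280μs cutoff follows the example above.

    #include <stdio.h>

    /* Average only the samples below the cutoff; samples above it are assumed
       to be background-loop passes that were stretched by an interrupt. */
    double AverageBelowThreshold( const double samples_us[], int count, double cutoff_us )
    {
        double sum  = 0.0;
        int    kept = 0;

        for( int i = 0; i < count; i++ )
        {
            if( samples_us[i] < cutoff_us )
            {
                sum += samples_us[i];
                kept++;
            }
        }
        return ( kept > 0 ) ? ( sum / kept ) : 0.0;
    }

    int main( void )
    {
        /* Placeholder measurements in microseconds - not the article's data */
        double periods_us[] = { 178.0, 181.0, 179.5, 305.0, 180.2, 420.0, 181.3 };
        int    n = (int)( sizeof(periods_us) / sizeof(periods_us[0]) );

        printf( "Average unloaded idle-task period: %.1f us\n",
                AverageBelowThreshold( periods_us, n, 280.0 ) );
        return 0;
    }

The same arithmetic is trivial to reproduce in a spreadsheet with a conditional average.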
Once you know the average background-task execution time, you can measure the CPU utilization while the system is under various states of loading. Obviously there's no way (yet) to measure CPU utilization directly; you have to derive it from measured changes in the period of the background loop. Measure the average background-loop period under various system loads and graph the CPU utilization. For example, if you're measuring the CPU utilization of an engine-management system under different system loads, you might plot engine speed (revolutions per minute, or RPM) versus CPU utilization. Assume the average background-loop period is measured as shown in Table 1. Note that the background-loop period should only be collected after the system has been allowed to stabilize at each new load point.

Table 1: System load (RPM) vs. average background-loop period (T)

Now you've collected all the information you need to calculate CPU utilization under specific system loading. Recall from Equation 1 that CPU utilization is defined as the time not spent executing the idle task. The amount of time spent executing the idle task can be represented as a ratio of the period of the idle task in an unloaded CPU to the period of the idle task under some known load, as shown in Equations 1 and 2. Table 2 shows the results of applying Equations 1 and 2 to the data in Table 1, and Figure 2 shows the salient data in graphical form.

Of course, you'll want to reduce the amount of manual work in this process. With a little up-front work instrumenting the code, you can significantly reduce the labor necessary to derive CPU utilization.

Table 2: System load data and calculated utilization

Counting background loops

Let's say we use a 25ms periodic task to monitor the CPU utilization. We enhance the while(1) loop of Listing 2 so that a free-running counter is incremented every time through the loop, as shown in Listing 3. A free-running counter uses a variable that, when incremented, is allowed to overflow. No math protection is needed (or desired) because the math that looks for counter changes can comprehend an overflow situation; math protection would just add unnecessary overhead.

Listing 3: Background loop with a loop counter

    INT16U bg_loop_cnt = 0;    /* free-running counter - overflow is expected and harmless */

    int main( void )
    {
        /* ... system initialization ... */

        while(1)    /* endless loop - spin in the background */
        {
            bg_loop_cnt++;
            CheckCRC();
            /* ... other non-time-critical background work ... */
        }
    }

We still know the average unloaded background-loop period from the LSA measurements we collected and postprocessed. Recall that in the earlier example the average idle-task period was calculated as 180μs. Therefore, in a 25ms time frame, the idle task would execute 138 times if it were never interrupted. We must modify the 25ms task as shown in Listing 4 to use this count to calculate the CPU utilization, and we have to retain the previous loop count so that a delta can be calculated.
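The equation graphics from the original article aren't reproduced here, but the relationships just described can be restated as follows; the wording of the terms is mine, so treat this as a reconstruction rather than the article's exact notation.

    % idle            = 100% x (average background-loop period, unloaded) / (average background-loop period, loaded)
    % CPU utilization = 100% - % idle

    Unloaded loop count per 25ms frame = 25,000μs / 180μs = 138.9, or at most 138 complete passes

    % CPU utilization = 100% x (1 - delta_cnt / 138)

The last line corresponds to the calculation the 25ms task performs in Listing 4 (Equation 3 in the original text), with the delta loop count compared against the unloaded maximum of 138.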
The delta indicates how many times the background loop executed during the immediately previous 25ms time frame. Comparing this value to the maximum possible loop count indicates how much time was spent in the idle task versus doing other processing.

Listing 4: Additional logic added to a periodic task for the CPU-use calculation

    #define AVG_IDLE_PERIOD   180u                        /* predetermined average Idle Task Period (us) */
    #define MAX_BG_LOOPS      (25000u / AVG_IDLE_PERIOD)  /* Unloaded 'max' bg loops per 25ms task */

    extern INT16U bg_loop_cnt;
    static INT16U prev_bg_loop_cnt;
    INT16U delta_cnt;
    INT8U  CPU_util_pct;          /* 0 = 0% , 255 = 100% */

    void Task_25ms( void )        /* 25ms periodic task */
    {
        delta_cnt        = bg_loop_cnt - prev_bg_loop_cnt;   /* loops since the last pass; overflow-safe */
        prev_bg_loop_cnt = bg_loop_cnt;

        if( delta_cnt > MAX_BG_LOOPS )    /* guard against jitter pushing the count past the maximum */
        {
            delta_cnt = MAX_BG_LOOPS;
        }
        CPU_util_pct = (INT8U)( 255u - ( (255ul * delta_cnt) / MAX_BG_LOOPS ) );

        /* ..... do other 25 millisecond tasks ..... */
    }

One thing you'll notice when comparing the actual C code to Equation 3 is that the delta loop counter is multiplied by 255, not 100% as indicated in the equation. This is a common scaling trick used to maximize the resolution of a variable. Since the smallest variable that can hold the CPU-utilization number is an 8-bit value, 100% is scaled to the full-range value of that variable. To convert back to a real percentage, use Equation 4. The conversion from computer units back into engineering units can be done after you've collected the data, which allows the end result to retain as much resolution as possible. Of course, if you're using floating-point math, you can do the conversion in the actual C code.

Once these changes have been implemented, you must be able to retrieve the CPU-utilization value from the processor. You can do this through various instrumentation ports or through communications protocols such as UART, J1850, or CAN. You then pull the data into a spreadsheet and manipulate it to create the graph shown previously in Figure 2. Table 3 shows how the data would look and some of the intermediate calculations you can do. Some instrumentation solutions allow the scaled value to be converted from computer units to engineering units automatically.

Table 3: Scaling the output for human consumption
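Equation 4 is also not reproduced here, but given the 0-to-255 scaling described above, the conversion back to engineering units is simply:

    CPU utilization (%) = ( CPU_util_pct / 255 ) x 100%

For example, a reported CPU_util_pct of 128 corresponds to roughly 50% utilization. Doing this division after the data is collected (or in floating point on the target) preserves the full resolution of the raw 8-bit value.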
Automating the system

The next refinement is to have the software measure the average background-loop period itself; I'll call this the automated method. The automated method calculates, in real time, the average time spent in the background loop. There are two main advantages to having the software calculate the average time for the background loop to complete, unloaded: the system characterizes itself without any external lab equipment, and the measurement automatically tracks the code as it changes, so the LSA exercise doesn't have to be repeated. The method does require a free-running real-time clock with reasonable resolution (the examples below assume a 5μs count). Also, each interrupt service routine, exception handler, and preemption mechanism must indicate that a context switch has happened. A preemption-indicator flag facilitates this notice. The background loop can use the flag to discern that the time measured has been elongated by another task. See Listing 5 for an example of how a preemption indicator can be used. Notice that the PreemptionFlag variable is more than a Boolean value; you can use it to indicate which actual event executed since the last time the preemption flag was cleared. I won't explain this further here, but it may spark some ideas for expanding the method to measure the time spent in each individual task and not just in the background.

Listing 5: A task harnessed with a preemption indicator

    extern INT32U PreemptionFlag;

    void SomeEventISR( void )
    {
        PreemptionFlag |= 0x0001;   /* each preemption source sets its own bit */

        /* ... service the event ... */
    }

Once a flag exists that can indicate preemption, the background-loop logic can be enhanced to measure its own execution period. Listing 6 shows a completely modified background loop with the logic necessary to measure and calculate the average, uninterrupted, idle-task period.

Listing 6: Idle-task period measurement with preemption detection

    INT16U bg_loop_cnt = 0;
    INT16U IdlePeriod, prevRT_Clock, FiltIdlePeriod;   /* all in RT-clock counts */

    void MonitorIdlePeriod( void );

    int main( void )
    {
        /* ... system initialization ... */

        while(1)    /* endless loop - spin in the background */
        {
            bg_loop_cnt++;
            MonitorIdlePeriod();
            CheckCRC();
            /* ... other non-time-critical background work ... */
        }
    }

    void MonitorIdlePeriod( void )
    {
        DisableInterrupts();                      /* start atomic section */

        IdlePeriod   = RT_Clock - prevRT_Clock;   /* time for the latest pass through the loop */
        prevRT_Clock = RT_Clock;

        if( PreemptionFlag == 0 )                 /* keep only uninterrupted passes */
        {
            FiltIdlePeriod = Filter( FiltIdlePeriod, IdlePeriod );   /* Filter() is a placeholder */
        }
        PreemptionFlag = 0;

        EnableInterrupts();                       /* end atomic section */
    }

Notice that the average idle-period variable, IdlePeriod, is filtered in the source code shown in Listing 6. The definition of the filter is beyond the scope of this article; it could be as simple as a first-order lag filter or as complex as a ring buffer implementing a running average. The CPU-utilization calculation in the 25ms task must also be modified to exploit these changes. Listing 7 shows how you can modify this piece of code to use a filtered idle period (scaled in real-time clock counts).

Listing 7: Refined CPU measurement

    #define RT_CLOCKS_PER_25MS   (25000u / 5u)    /* How many RT clocks (5 us) happen each 25ms */

    INT8U CPU_util_pct, FiltCPU_Pct;              /* 0 = 0% , 255 = 100% */

    void Task_25ms( void )                        /* 25ms periodic task */
    {
        INT32U idle_time;

        PreemptionFlag = 0x0004;                  /* indicate preemption by 25mS task */

        delta_cnt        = bg_loop_cnt - prev_bg_loop_cnt;
        prev_bg_loop_cnt = bg_loop_cnt;

        idle_time = (INT32U)delta_cnt * FiltIdlePeriod;      /* idle time in RT-clock counts */
        if( idle_time > RT_CLOCKS_PER_25MS )                 /* guard against jitter */
        {
            idle_time = RT_CLOCKS_PER_25MS;
        }
        CPU_util_pct = (INT8U)( 255u - ( (255ul * idle_time) / RT_CLOCKS_PER_25MS ) );
        FiltCPU_Pct  = Filter( FiltCPU_Pct, CPU_util_pct );  /* smooth the raw value */

        /* ..... do other 25 millisecond tasks ..... */
    }

This logic now uses the filtered idle period instead of a constant to calculate the amount of time spent in the background loop. Of course, the logic must still know how much total time exists between measurements, but that constant is now expressed in real-time-clock counts instead of a hard-coded average idle period. Note that a filtered CPU-utilization value has also been added to help if the raw CPU-usage value contains noise.

How much is enough?

Rate-monotonic analysis (RMA) puts the theoretical bound for guaranteed schedulability of a fixed-priority task set at roughly 69% CPU utilization. A less scientific (and perhaps more heuristic) limit is the 70 to 80% range. This range makes sense when considered in the context of RMA, with the understanding that RMA is a fairly restrictive theory in that it assumes a fixed priority of tasks; many real-time implementations allow tasks to raise their priority to accomplish critical functions. However, opinions abound. One study I found on the Internet indicates that system designers should strive to keep CPU use below 50%: "A CPU with really high utilization will lead to unpredictable real-time behavior. Also, it is possible that the high priority tasks in the system will starve the low priority tasks of any CPU time. This can cause the low priority tasks to misbehave. . . . [K]eep the peak CPU utilization below 50%."2

Refining your tools

Although I've mentioned that some logic-analysis equipment contains software-performance tools, I didn't explain how to exploit them. Every software-performance tool is a little different, but if your project team has such a tool available, it's in your best interest to discover whether it can help you understand your system loading. Profiling tools can also help you understand where the system is spending the majority of its time; use them if they're available to you.

Michael Trader is a senior applied specialist with EDS' Engineering and Manufacturing Services business unit. A software engineer since 1989, he currently develops embedded powertrain control firmware in the automotive industry. He has a BSEE from the Milwaukee School of Engineering and an MS-CSE from Oakland University in Rochester, Michigan.

Further reading

Oshana, Robert, "Rate-monotonic Analysis Keeps Real-time Systems on Schedule," EDN Access Design Feature, September 1997, www.reed-electronics.com/ednmag/article/CA81193.

Krishna, C. M., and Kang G. Shin, Real-Time Systems, WCB/McGraw-Hill, 1997.

Copyright 2005 © CMP Media LLC