Scalable UHD H.264 Encoder - Ultra-High Throughput, Full Motion Estimation engine
Optimizing sort algorithms on DSPs
By Hazarathaiah Malepati and Yosi Stein, Analog Devices
dspdesignline.com (March 13, 2009)
In this article, we present the bubble sort and merge sort algorithms. We discuss the cycle counts of these algorithms on the Blackfin processor, and show how to reduce the cycle count of the bubble sort and merge sort algorithm. In addition we look at methods to accelerate the sorting of data that is wider than the register width of the processor.
Introduction
Standard algorithm cookbooks provide dozens of number-sorting algorithms [1]. Though the sorting algorithms are very simple from a mathematical point of view, they can be very time consuming to perform on DSPs due to their non-linear computational complexity and the presence of many control operations in their execution. Usually, we specify the complexity of an algorithm using the notation O(.), which indicate 'the order of'. For example, the complexity of sorting N numbers using the bubble sort method is O(N2) and using merge sort method is O(NlogN). This doesn't mean we consume N2 and NlogN processor clock cycles in executing those algorithms to perform sorting. However, we can say that we consume about xN2 and yNlogN cycles (excluding the overhead), where the efficiency factors x and y depend on the particular processor architecture, software code, algorithm flow, etc. used to execute the algorithm.
Since most of the operations in sort algorithms are of the control type, a processor compute unit with a deep pipeline will stall whenever it encounters a conditional operation such as a jump, a branch in a loop structure, etc. To avoid this situation we should restructure the algorithm to use fewer conditional operations. In addition to the processor architecture, the efficiency factors x and y depend on the kind of source code (C, assembly level, etc.) used to develop the algorithm. We usually develop software with the C language because of its ease of use. The C code is converted to native machine code using a compiler. The efficiency of compiled code depends entirely on the particular compiler used. If the C compiler is not efficient then the values x and y are much larger than can be obtained from hand-optimized assembly code.
In this article, we will estimate the values for x and y by implementing the bubble sort and merge sort algorithms on a Blackfin DSP processor [2]. We provide techniques to efficiently implement the merging of arrays on Blackfin. We also provide optimized assembly level code for the merge sort algorithm.
E-mail This Article | Printer-Friendly Page |
Related Articles
- Optimizing High Performance CPUs, GPUs and DSPs? Use logic and memory IP - Part II
- Image stabilizers: Utilizing DSP for more advanced, scalable stabilization algorithms
- Writing a modular Audio Post Processing DSP algorithm
- Optimizing high-performance CPUs, GPUs and DSPs? Use logic and memory IP - Part I
- Optimizing efficiency and flexibility in DSP systems
New Articles
- Quantum Readiness Considerations for Suppliers and Manufacturers
- A Rad Hard ASIC Design Approach: Triple Modular Redundancy (TMR)
- Early Interactive Short Isolation for Faster SoC Verification
- The Ideal Crypto Coprocessor with Root of Trust to Support Customer Complete Full Chip Evaluation: PUFcc gained SESIP and PSA Certified™ Level 3 RoT Component Certification
- Advanced Packaging and Chiplets Can Be for Everyone