Optimizing sort algorithms on DSPs

This article explains the bubble sort and merge sort algorithms, and shows how to optimize them on the Blackfin processor.

By Hazarathaiah Malepati and Yosi Stein, Analog Devices
dspdesignline.com (March 13, 2009)

In this article, we present the bubble sort and merge sort algorithms. We discuss the cycle counts of these algorithms on the Blackfin processor, and show how to reduce the cycle count of the bubble sort and merge sort algorithm. In addition we look at methods to accelerate the sorting of data that is wider than the register width of the processor.

Introduction

Standard algorithm cookbooks provide dozens of number-sorting algorithms [1]. Though the sorting algorithms are very simple from a mathematical point of view, they can be very time consuming to perform on DSPs due to their non-linear computational complexity and the presence of many control operations in their execution. Usually, we specify the complexity of an algorithm using the notation O(.), which indicate 'the order of'. For example, the complexity of sorting N numbers using the bubble sort method is O(N2) and using merge sort method is O(NlogN). This doesn't mean we consume N2 and NlogN processor clock cycles in executing those algorithms to perform sorting. However, we can say that we consume about xN2 and yNlogN cycles (excluding the overhead), where the efficiency factors x and y depend on the particular processor architecture, software code, algorithm flow, etc. used to execute the algorithm.

Since most of the operations in sort algorithms are of the control type, a processor compute unit with a deep pipeline will stall whenever it encounters a conditional operation such as a jump, a branch in a loop structure, etc. To avoid this situation we should restructure the algorithm to use fewer conditional operations. In addition to the processor architecture, the efficiency factors x and y depend on the kind of source code (C, assembly level, etc.) used to develop the algorithm. We usually develop software with the C language because of its ease of use. The C code is converted to native machine code using a compiler. The efficiency of compiled code depends entirely on the particular compiler used. If the C compiler is not efficient then the values x and y are much larger than can be obtained from hand-optimized assembly code.

In this article, we will estimate the values for x and y by implementing the bubble sort and merge sort algorithms on a Blackfin DSP processor [2]. We provide techniques to efficiently implement the merging of arrays on Blackfin. We also provide optimized assembly level code for the merge sort algorithm.

Click here to read more ...