W-CDMA RAKE Receiver Comes to Life in DSP

By Ahsan Aziz, Kim-Chyan Gan, and Imran Ahmed, Motorola, CommsDesign.com
November 19, 2003 (9:09 a.m. EST)
URL: http://www.eetimes.com/story/OEG20031119S0011

The computational requirements for UMTS wideband CDMA (W-CDMA) are substantially higher than those of second-generation GSM and CDMA systems. Because of this complexity, many designers are turning to ASICs to handle computationally intensive processing tasks. However, some tasks, such as chip- and symbol-rate processing, may be best handled in a digital signal processor.

In this article, we'll show how a single digital signal processor (DSP) with four arithmetic logic units (ALUs) can handle chip-rate and symbol-rate processing in the downlink of a W-CDMA handset. Along the way, we'll present baseband processing algorithms related to the RAKE receiver and outline methods to speed up descrambling, despreading, channel estimation, and other downlink RAKE receiver functions for practical implementation.

Understanding the Receiver
UMTS W-CDMA systems use direct-sequence code division multiple access (DS-CDMA) as their transmission method. DS-CDMA is well suited to transmission over multipath fading channels. The signal bandwidth of the W-CDMA system is set at 5 MHz, which allows the received signal to be resolved into distinct multipaths with high resolution.

In the first generation of UMTS handsets, the RAKE receiver will be the receiver of choice. In a RAKE receiver, one RAKE finger is assigned to each multipath, which maximizes the amount of received signal energy collected. The paths are combined to form a composite signal that is expected to be substantially better for demodulation than any single path. To combine the paths meaningfully, the RAKE receiver needs knowledge of the channel parameters: the number of paths, their locations in the delay domain, and their complex-valued attenuations.

Figure 1 shows a typical four-finger RAKE receiver, where r(t) is the received signal. Because r(t) consists of multipath components, it can be split into independent path signals r(t-τ_i), which are combined using the corresponding channel estimates g(t, τ_i).

Figure 1: Typical four-finger W-CDMA RAKE receiver.
In a W-CDMA receiver the following steps take place (excluding the error correction coding):

Descrambling: The received signal is multiplied by the scrambling code and by delayed versions of the scrambling code. The delays are determined by a path searcher before descrambling. Each delay corresponds to a separate multipath that will eventually be combined by the RAKE receiver.

Despreading: The descrambled data of each path are despread by multiplying them by the spreading code.

Integration and dump: The despread data is then integrated over one symbol period, giving one complex sample output per quadrature phase-shift keying (QPSK) symbol. This process is carried out for all the paths that will be combined by the RAKE receiver.

The same symbols obtained via different paths are then combined, using the corresponding channel information, with a combining scheme such as maximum ratio combining (MRC).

The combined outputs are then sent to a simple decision device to decide on the transmitted bits.
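The descrambling, despreading, and integrate-and-dump steps above can be sketched for one finger as follows. This is a minimal floating-point illustration, assuming one complex sample per chip, a known combined scrambling/spreading sequence, and an integer path delay in chips from the path searcher; the function name finger_symbols is hypothetical. Each output sample scales with SF times the chip power of the code.

```python
def finger_symbols(rx, code, delay, sf):
    """Descramble, despread and integrate-and-dump one RAKE finger.

    rx    -- received complex samples, assumed one sample per chip
    code  -- combined scrambling x spreading chips used by the transmitter
    delay -- path delay in whole chips, as reported by the path searcher
    sf    -- spreading factor (chips per symbol)
    """
    n_symbols = min(len(code), len(rx) - delay) // sf
    symbols = []
    for k in range(n_symbols):
        acc = 0j
        for n in range(k * sf, (k + 1) * sf):
            # Multiplying by the conjugate code chip descrambles and despreads in one step.
            acc += rx[n + delay] * code[n].conjugate()
        symbols.append(acc)  # dump: one complex sample per QPSK symbol
    return symbols


# Usage: a single path delayed by 2 chips, real +/-1 code, SF = 4.
code = [1, -1, 1, 1, -1, 1, 1, -1]
tx = [(1 + 1j, -1 + 1j)[n // 4] * code[n] for n in range(8)]
rx = [0j, 0j] + tx
print(finger_symbols(rx, code, delay=2, sf=4))  # [(4+4j), (-4+4j)]
```

In a full receiver this runs once per finger, each with its own delay, and the per-finger outputs feed the combiner.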

The objective of the channel estimation block is to estimate the channel phase and amplitude [denoted in Figure 1 as g(t, τ_i)] for each of the identified paths. Once this information is known, it can be used for combining each path of the received signal.
The operations above can be expressed mathematically by the following equations, starting from the transmitter end. In Equation 1, u(t) is the transmitted signal, a_k is the complex transmitted symbol, p_k is the combined complex spreading (OVSF) and scrambling code, N is the spreading factor, f(t) is the pulse-shaping filter (root-raised cosine with a roll-off factor of 0.22), and T is the chip duration.
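Written in a standard form consistent with these definitions, Equation 1 can be expressed as below. Indexing the combined code chip by chip as p_{kN+n} (the n-th chip of symbol k) is our notational assumption.

```latex
u(t) = \sum_{k} a_k \sum_{n=0}^{N-1} p_{kN+n}\, f\!\left(t - (kN+n)T\right)
```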

Equation 2 gives the received signal y(t) at the mobile unit. The mobile channel is modeled as a filter with complex taps c_j and delays d_j, where j = 0, ..., J-1 for J different paths, and g(t) is additive white Gaussian noise (AWGN) with single-sided power spectral density (p.s.d.) N₀.
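A matching form for Equation 2, assuming the taps c_j stay constant over the interval of interest, is:

```latex
y(t) = \sum_{j=0}^{J-1} c_j\, u(t - d_j) + g(t)
```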

At the receiver end, the received data are first passed through a filter matched to the transmit pulse-shaping filter. The matched filter maximizes the signal-to-noise ratio (SNR) at the receiver. This process is shown in Equation 3.
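One standard way to write the matched-filter output of Equation 3 is shown below. The filter's impulse response is taken as f*(-t), and λ is only the integration variable (our notation, chosen so it does not clash with the path delays τ_i).

```latex
r(t) = \int_{-\infty}^{\infty} y(\lambda)\, f^{*}(\lambda - t)\, d\lambda
```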

A path searcher estimates the delay of each path (τ_i) in the composite received signal r(t). The received signal is then delayed by the amount estimated by the path searcher and multiplied by the conjugate of the combined scrambling and spreading code used for transmission. The descrambled and despread data are then summed over one symbol period, as shown in Equation 4. For example, if there are four strong paths, four different estimates of the same symbol are generated using Equation 4.
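A sketch of Equation 4, assuming one matched-filter sample per chip, is shown below. Here z_j(k) is the despread estimate of symbol k from path j, p_{kN+n} is the n-th chip of the combined code for symbol k, and \hat{d}_j is the estimated delay of path j; these symbols are our assumptions.

```latex
z_j(k) = \sum_{n=0}^{N-1} r\!\left((kN+n)T + \hat{d}_j\right) p^{*}_{kN+n}, \qquad j = 0, \ldots, J-1
```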

The estimates generated by Equation 4 are combined by the RAKE receiver with the corresponding channel estimates, as shown in Equation 5.
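Equation 5 can be written as follows, assuming the usual MRC weighting by the conjugate channel estimate. The hats mark the estimates the text calls c_j (and the estimated delays d_j used to form them), and z_j(k) is the despread estimate of symbol k from path j; this notation is ours.

```latex
\hat{a}_k = \mathrm{det}\!\left( \sum_{j=0}^{J-1} \hat{c}_j^{*}\, z_j(k) \right)
```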

In Equation 5, c_j are the channel estimates, d_j are the estimated path delays, J is the estimated number of strong paths, det( ) is a simple decision device, and â is the estimated bit/symbol at the output of the RAKE receiver.
Channel Estimation
Channel estimation is typically done in one of three ways, or a combination of them: data aided (DA), decision directed (DD), and blind.
In a W-CDMA downlink traffic channel (DPDCH/DPCCH), pilot symbols (2 to 8 symbols) and control symbols are transmitted in every slot. There are 15 slots per W-CDMA frame. Each frame is 10 ms long and has 38400 chips (3.84 MChips/s).
Channel estimation can be performed using these pilot symbols. If the time-multiplexed pilot bits are used, the estimate for the data bits between two consecutive sets of pilot bits (two slots) can be obtained by interpolation. The DD channel estimation approach can then be used to improve performance. Figure 2 shows the layout of channel estimation using time-multiplexed pilot symbols.

Figure 2: Channel estimation using time-multiplexed pilot signals.
In addition, the downlink of the W-CDMA system carries a common pilot channel (CPICH), transmitted at a higher power than the dedicated traffic channels and received by all the mobiles in a given cell. The CPICH is transmitted with a constant spreading factor (SF) of 256 and a spreading code of all ones, which gives 10 symbols per slot and 150 symbols per frame. All CPICH symbols are 1+j.
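The slot and frame arithmetic behind these figures can be checked with a few lines (constant and function names are ours). The same functions show, for example, that SF = 512 gives 5 symbols per slot.

```python
CHIP_RATE = 3_840_000      # chips per second
FRAME_MS = 10              # frame length in milliseconds
SLOTS_PER_FRAME = 15

CHIPS_PER_FRAME = CHIP_RATE * FRAME_MS // 1000        # 38400
CHIPS_PER_SLOT = CHIPS_PER_FRAME // SLOTS_PER_FRAME   # 2560


def symbols_per_slot(sf):
    """Number of symbols in one slot for spreading factor sf."""
    return CHIPS_PER_SLOT // sf


def symbols_per_frame(sf):
    """Number of symbols in one 10 ms frame for spreading factor sf."""
    return CHIPS_PER_FRAME // sf


print(symbols_per_slot(256), symbols_per_frame(256))  # 10 150
```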
At the receiver end, the CPICH symbols serve as pilot symbols and can be used for channel estimation. The advantage of using the CPICH for channel estimation is that all the data in the frame can be used, as opposed to only a few symbols in the DPCCH/DPDCH. Also, because the CPICH is transmitted at a higher power than the traffic channel, it is received better at the handset.
While both channel estimation techniques are effective, we used the CPICH for channel estimation to show how the four-ALU DSP handles RAKE receiver tasks. Figure 3 outlines the channel estimation process that was implemented on the DSP.

Figure 3: Diagram illustrating the CPICH channel estimation technique used to show Rake receiver performance on a four-ALU DSP.
For each independent path the channel estimate is obtained as follows:

First, the CPICH modulation is removed by multiplying the CPICH data by the conjugate of the known CPICH symbol. Because all the CPICH symbols are 1+j, the conjugate is 1-j. The result is a channel estimate that is noisy because of AWGN and multiuser interference.

Next, the noisy channel estimate is passed through a variable-length moving average filter. In the DSP implementation, the filter length can be N = 8, 16, or 32, where N is the number of filter taps.

Finally, the filtered channel estimates obtained in the previous step are either decimated or interpolated to match the data rate of the CPICH to the data rate of the DPCCH/DPDCH. Interpolation is done by a simple zero-order hold (i.e., a simple repeater). This works well in most cases because the channel is assumed to be stable over the symbol duration.

DSP Implementation
Now that we've described the channel estimation technique, let's see how key RAKE receiver functions are implemented on the DSP, starting with the channel estimation block.
1. Handling Channel Estimation
In the example presented here, all the CPICH data are assumed to be scaled so that they are representable in 16 bits. The first step of the channel estimation algorithm removes the modulation by multiplying the received CPICH by the conjugate of the original CPICH symbols, i.e., by 1-j, symbol by symbol. Denoting the received CPICH symbols as a_i+jb_i, the complex multiplication can be written as in Equation 6.
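Expanding the product shows why this step needs only additions and subtractions, with no multiplications:

```latex
(a_i + j b_i)(1 - j) = (a_i + b_i) + j\,(b_i - a_i)
```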

Data are read from the memory holding the received CPICH symbols, the additions shown in Equation 6 are performed, and the results are written back to memory. The next step smooths the output with a moving average (MA) filter.
The MA filter is defined by Equation 7.
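Written out with the symbols defined in the next paragraph, Equation 7 and the running-sum form that equal coefficients allow are shown below. The second line follows by subtracting the sum for n-1 from the sum for n.

```latex
\hat{c}_j(n) = \sum_{i=0}^{N-1} a(i)\, \tilde{c}_j(n-i), \qquad a(i) = \frac{1}{N}

\hat{c}_j(n) = \hat{c}_j(n-1) + \frac{1}{N}\left[\tilde{c}_j(n) - \tilde{c}_j(n-N)\right]
```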

In Equation 7, c̃_j(n-i) are the noisy channel estimates from the CPICH, a(i) are the filter coefficients, and ĉ_j are the final channel estimates used by the MRC. All the filter coefficients are equal to 1/N.
Because all the coefficients in Equation 7 are equal, the number of memory reads/writes and multiply-accumulate (MAC) operations can be reduced by adding each new sample (scaled by 1/N) to a running sum and removing the oldest sample from it. This is how the DSP computes the MA filter output. The first value of the running sum, however, is computed directly by reading four complex numbers from the buffer at a time, performing four MAC operations per cycle, and repeating the loop N/4 times.
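A minimal sketch of the running-sum MA filter follows, written in floating point for readability rather than the 16-bit fixed point used on the DSP. The first output is accumulated four samples per pass over N/4 passes, echoing the four-MAC loop; each later output adds the newest scaled sample and removes the oldest. The function name moving_average is hypothetical.

```python
def moving_average(noisy, n_taps):
    """Equal-weight (1/N) MA filter over complex channel estimates, via a running sum.

    Returns one output per full window, i.e. len(noisy) - n_taps + 1 values.
    """
    if n_taps % 4 != 0 or not 0 < n_taps <= len(noisy):
        raise ValueError("n_taps must be a positive multiple of 4, at most len(noisy)")
    scale = 1.0 / n_taps
    # First output: four samples per pass, N/4 passes (mirrors the 4-MAC loop).
    acc = 0j
    for base in range(0, n_taps, 4):
        acc += scale * (noisy[base] + noisy[base + 1] + noisy[base + 2] + noisy[base + 3])
    outputs = [acc]
    # Every later output: add the newest scaled sample, remove the oldest.
    for n in range(n_taps, len(noisy)):
        acc += scale * (noisy[n] - noisy[n - n_taps])
        outputs.append(acc)
    return outputs


ramp = [complex(k, 0) for k in range(16)]
print(moving_average(ramp, 4)[:3])  # [(1.5+0j), (2.5+0j), (3.5+0j)]
```

Per output, the running sum costs one subtraction and one scaled add regardless of N, which is what makes N = 32 as cheap to run as N = 8 after the first window.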
Once computed, the output of the MA filter is written to memory. Before the results are written, "rate matching" must be performed between the CPICH and the traffic channel (DPCCH/DPDCH).
The CPICH is always transmitted with a spreading factor of 256, while the traffic channel can be transmitted with a spreading factor from 4 to 512, depending on the required data rate. Rate matching is performed by a simple zero-order hold (or by decimation for SF = 512). A true interpolation filter is not needed because the channel changes much more slowly than the symbol interval. Results for the channel estimation algorithm are presented below.
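The rate matching described above can be sketched as follows. The zero-order hold repeats each CPICH-rate estimate 256/SF times; for SF = 512 this sketch simply keeps every other estimate, which is one possible decimation choice (averaging each pair would be another). Names are hypothetical.

```python
CPICH_SF = 256
VALID_SF = (4, 8, 16, 32, 64, 128, 256, 512)


def rate_match(cpich_estimates, traffic_sf):
    """Map per-CPICH-symbol channel estimates to the traffic-channel symbol rate."""
    if traffic_sf not in VALID_SF:
        raise ValueError("traffic SF must be a power of two from 4 to 512")
    if traffic_sf <= CPICH_SF:
        repeat = CPICH_SF // traffic_sf  # 1 for SF=256 ... 64 for SF=4
        # Zero-order hold: reuse each estimate for every traffic symbol it covers.
        return [c for c in cpich_estimates for _ in range(repeat)]
    # SF = 512: one traffic symbol spans two CPICH symbols, so keep every other estimate.
    return cpich_estimates[:: traffic_sf // CPICH_SF]


print(rate_match([1, 2, 3], 64))      # [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
print(rate_match([1, 2, 3, 4], 512))  # [1, 3]
```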
2. Path Searcher
The path searcher identifies the location (in chips) of each strong independent path in the composite received signal r(t). This is done mainly by correlating the received CPICH signal (with its unique scrambling code) with the stored scrambling code. The near-orthogonal properties of the complex pseudorandom noise (PN) code are used to separate the paths. The following explains how the path searcher was implemented in the DSP to optimize performance and MIPS count.
The received signal at the mobile unit is correlated with the stored cell specific scrambling code. Equation 8 expresses the output of the correlation process.
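One way to write Equation 8 consistent with this description is shown below, with N, m, P, and n as defined in the following paragraphs. The notation r_n(k) for the k-th received sample in frame n and s(k) for the stored scrambling-code chip is our assumption.

```latex
y(n,m) = \frac{1}{N} \sum_{k=0}^{N-1} r_n(k+m)\, s^{*}(k), \qquad m = 0, 1, \ldots, P-1
```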

In Equation 8, N is the size of the autocorrelation window, which can be thought of as the number of taps of the autocorrelation. This number is chosen by a finger management routine.
In this example, the autocorrelation is performed to handle a delay spread of up to 20 μs, which corresponds to about 320 slides of the autocorrelation window. This index is denoted by m in Equation 8, and P is the length of the autocorrelation output (approximately 320 in this implementation). Because the path searcher output is generated frame by frame, n in Equation 8 is the frame index. When the stored and received sequences are perfectly aligned, the autocorrelation output y(n,m) at that index takes the form given in Equation 9.
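Under an assumed model r_n(k) = Σ_i c_i(k) s(k - d_i) + η(k), with s(k) the stored scrambling-code chip normalized so that |s(k)| = 1, path delays d_i in samples, and η(k) collecting MAI and thermal noise, evaluating the correlation y(n,m) = (1/N) Σ_k r_n(k+m) s*(k) at m = d_j splits it into the three terms discussed next:

```latex
y(n,d_j) =
\underbrace{\frac{1}{N}\sum_{k=0}^{N-1} c_j(k+d_j)}_{\text{local mean } \bar{c}_j}
+ \underbrace{\frac{1}{N}\sum_{k=0}^{N-1}\sum_{i \neq j} c_i(k+d_j)\, s(k+d_j-d_i)\, s^{*}(k)}_{\text{IPI}}
+ \underbrace{\frac{1}{N}\sum_{k=0}^{N-1} \eta(k+d_j)\, s^{*}(k)}_{\text{MAI + thermal noise}}
```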

The first term on the right-hand side is the average of the channel coefficients, which is complex Gaussian distributed. The summation and scaling applied to the first term in Equation 9 amount to taking the mean of the channel coefficient over the window, so this term turns out to be the local mean of the channel coefficient.
The second term on the right-hand side is interpath interference (IPI), caused by other paths that are not aligned with the scrambling code. Because the scrambling code yields a peak only when the two code sequences are perfectly aligned, the IPI is usually small compared with the third term, which is multiple-access interference (MAI) plus thermal noise.
Typically, the first term is larger than the last two terms. However, in deep fading when the local mean is close to zero, the magnitude of the first term can be significantly lower.
After the autocorrelation, non-coherent averaging is used to suppress spurious peaks in a fading channel. This technique averages the current and the previous M-1 power delay profiles, yielding a better estimate because the noise is averaged out. This is expressed by Equation 10.
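Equation 10 can be written as follows, with y(n,m) the correlation output of frame n at delay index m and M the number of averaged frames:

```latex
\bar{Y}(n,m) = \frac{1}{M} \sum_{l=0}^{M-1} \left| y(n-l,m) \right|^{2}
```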

Assuming that the channel statistics do not change much over M frames, we can take the expected value of the power delay profile, as in Equation 11.
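Assuming the interference terms are zero-mean and uncorrelated with the local mean, a form of Equation 11 is shown below. Here \bar{Y}(n,m) is the non-coherently averaged power delay profile, \bar{c}_j is the local-mean term of Equation 9, and σ_I² is the combined power of IPI, MAI, and thermal noise (our notation).

```latex
E\{\bar{Y}(n,m)\} \approx
\begin{cases}
E\{|\bar{c}_j|^{2}\} + \sigma_I^{2}, & m = d_j \\
\sigma_I^{2}, & \text{otherwise}
\end{cases}
```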

The first term contributes to the peak in the power delay profile, and the second term constitutes the noise floor. If the received and stored sequences do not line up perfectly, the first term disappears, leaving only the second term.
After non-coherent averaging, a local peak search is used to find all local peaks in the power delay profile. The search examines three consecutive points at a time: whenever the middle point is higher than both of its neighbors, a local peak is found.
A threshold is then computed that should be higher than the noise floor but lower than the true delay-index peaks. As noted above, the noise floor comes from interpath interference, multiple-access interference, and thermal noise. An adaptive threshold is used because it is more robust, accounting for variations in both interference and noise. The threshold is calculated with an interference-based rule.⁴

Once the threshold is computed, the final path searcher operation is local peak removal: all local peaks are compared against the threshold, peaks below it are removed, and those above it are retained. Table 1 lists the cycle count and code size of this path searcher.
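The path searcher chain above (non-coherent averaging, three-point local peak search, and threshold-based peak removal) can be sketched as follows. The threshold here is a hypothetical placeholder, a multiple of the mean profile level, not the interference-based rule of reference 4, and the function name is ours. Because the three-point test needs two neighbors, edge bins are skipped, so a real search window would start slightly before the earliest expected path.

```python
def path_search(profiles, threshold_factor=4.0):
    """Return candidate path delays (sample indices) from M correlation profiles.

    profiles         -- M lists of complex correlation outputs y(n-l, m), each of length P
    threshold_factor -- HYPOTHETICAL stand-in for the adaptive threshold rule
    """
    num_profiles, length = len(profiles), len(profiles[0])
    # Non-coherent averaging: mean of |y|^2 over the M most recent profiles.
    pdp = [sum(abs(p[m]) ** 2 for p in profiles) / num_profiles for m in range(length)]
    # Three-point local peak search (edge bins have only one neighbor and are skipped).
    peaks = [m for m in range(1, length - 1)
             if pdp[m] > pdp[m - 1] and pdp[m] > pdp[m + 1]]
    # Placeholder threshold: a multiple of the average profile level (NOT the cited rule).
    threshold = threshold_factor * sum(pdp) / length
    # Local peak removal: keep only peaks above the threshold.
    return [m for m in peaks if pdp[m] > threshold]


floor = [0.1 + 0j] * 20
profile = list(floor)
profile[3], profile[7], profile[12] = 1.0 + 0j, 0.7 + 0j, 0.2 + 0j
print(path_search([profile, profile]))  # [3, 7]
```

The small bump at index 12 passes the local-peak test but falls below the threshold, which is the case local peak removal is meant to handle.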

Table 1: Cycle Count and Code Size for the Presented Path Searcher.

3. Maximum Ratio Combining (MRC)
There are two ways to achieve combining: a) at the chip level and b) at the symbol level. As the name implies, symbol-level combining combines the signals from different paths at the symbol level. The descrambling and despreading are performed before combining in order to convert chip-level signals into symbol-level signals.
Figure 1 above shows the block diagram for combining at the symbol level. Chip-level combining performs the combining first, followed by descrambling and despreading. The performance of the two combining schemes is the same under perfect channel estimation and path search, assuming that the fading channel is constant over a symbol period.
Table 2 estimates the computational load of both combining schemes for one channel. Descrambling and despreading are merged and done in one step. It is assumed that the scrambling code and the spreading code do not change during the transmission.¹

Table 2: Chip-Level vs. Symbol-Level combining

In the implementation described in this article, symbol-rate combining was used because it requires fewer computations, especially as the spreading factor increases. Channel estimation is also harder with chip-level combining: the channel is usually estimated at the symbol level and interpolated to the chip rate, so chip-rate channel estimation takes more MIPS and memory than its symbol-rate counterpart.
Table 2 also compares symbol- and chip-rate combining in terms of memory usage. In chip-rate combining, only one path is stored (after the MRC), but the stored data are at the chip rate, which means as many as 38400 samples. In symbol-rate combining, the data of multiple paths are stored (before the MRC in Figure 1), but each path is at the symbol rate, so the total number of samples is (38400/SF) * L. The difference in memory usage between chip- and symbol-rate combining is given by Equation 13.
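Taking one frame of chip-rate samples (38400) against L symbol-rate streams of 38400/SF samples each, Equation 13 works out to:

```latex
\Delta = 38400 - \frac{38400}{SF}\, L
```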

where L is the number of fingers (2 to 8) and SF is the spreading factor (4 to 512). If Δ is positive, chip-rate combining requires more memory; if Δ is negative, symbol-rate combining requires more memory.
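A quick numerical check of Equation 13 for a few combinations (the function name is hypothetical):

```python
CHIPS_PER_FRAME = 38400


def memory_delta(sf, fingers):
    """Equation 13: chip-rate buffer minus total symbol-rate buffers, in samples per frame."""
    return CHIPS_PER_FRAME - (CHIPS_PER_FRAME // sf) * fingers


for sf, fingers in [(4, 8), (8, 8), (128, 4)]:
    delta = memory_delta(sf, fingers)
    if delta > 0:
        verdict = "chip-rate combining needs more memory"
    elif delta < 0:
        verdict = "symbol-rate combining needs more memory"
    else:
        verdict = "equal"
    print(f"SF={sf}, L={fingers}: delta={delta} ({verdict})")
```

Δ changes sign where SF equals L, so within the stated ranges symbol-rate combining needs more memory only when SF is smaller than the number of fingers.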
As noted earlier, we're considering the implementation of a four-finger RAKE receiver. MRC is just a complex multiplication of each finger's output by the conjugate of its channel coefficient, followed by addition of the results from the different fingers. These steps can be combined using MAC instructions.
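A minimal floating-point sketch of MRC followed by a hard QPSK decision is shown below. The bit mapping (bit 0 for a non-negative component) and the function names are assumptions for illustration; on the DSP, each product-and-sum would map onto MAC instructions.

```python
def mrc_combine(finger_outputs, channel_estimates):
    """Maximum ratio combining of per-finger despread symbols.

    finger_outputs    -- one list of despread symbols z_j(k) per finger
    channel_estimates -- one list of rate-matched channel estimates c_j(k) per finger
    """
    combined = []
    for k in range(len(finger_outputs[0])):
        acc = 0j
        for z, c in zip(finger_outputs, channel_estimates):
            acc += c[k].conjugate() * z[k]  # one complex multiply-accumulate per finger
        combined.append(acc)
    return combined


def qpsk_decide(symbol):
    """Hard decision on I and Q; bit 0 for a non-negative component (assumed mapping)."""
    return (0 if symbol.real >= 0 else 1, 0 if symbol.imag >= 0 else 1)


# Usage: two fingers with gains (1+1j) and 0.5j, transmitted symbol 1-1j, no noise.
gains = [1 + 1j, 0.5j]
sent = 1 - 1j
fingers = [[g * sent] for g in gains]
estimates = [[g] for g in gains]
combined = mrc_combine(fingers, estimates)
print(combined, qpsk_decide(combined[0]))  # [(2.25-2.25j)] (0, 1)
```

With perfect estimates, each finger contributes |c_j|² times the symbol, so the stronger path dominates the sum, which is the weighting MRC is designed to produce.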
The Results
Now let's look at some results from the DSP implementation. Figure 4 shows the in-phase (I) and quadrature (Q) parts of the channel taps estimated by the DSP, together with the actual (ideal) channel taps at one instant in time. In this example, the spreading factor is 32.

Figure 4: I and Q part of the ideal and estimated channel taps.
Figure 5 shows the tracking of the fading envelope for two frames of data for the weakest path of the following channel (no MAI is assumed).

Figure 5: I and Q of the channel estimate from DSP (-- blue), Matlab (solid red) and perfect channel estimate (solid blue) for N=16, SF = 32, and mobile speed = 120 km/h.
Reducing the MA filter from N = 16 taps to N = 8 taps makes the channel estimate noisier, as expected, but the shorter filter tracks fast-fading envelopes better. Figure 6 shows how the channel estimation algorithm tracks a fast-fading channel (the same channel condition, with the mobile traveling at twice the speed).
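A rough way to see this trade-off is to compare the time spanned by the MA window with the fading rate. The sketch below assumes a 2 GHz carrier (not stated in the article) and uses 1/f_D only as a loose time scale for the fading; names are ours. With these numbers, halving N while doubling the speed keeps the window at about the same fraction (roughly a quarter) of 1/f_D.

```python
CHIP_RATE = 3.84e6          # chips per second
CPICH_SF = 256
SPEED_OF_LIGHT = 3.0e8      # m/s
CARRIER_HZ = 2.0e9          # ASSUMPTION: ~2 GHz carrier; not stated in the article


def ma_span_ms(n_taps):
    """Time covered by an N-tap MA filter running on CPICH symbols, in ms."""
    return n_taps * CPICH_SF / CHIP_RATE * 1e3


def max_doppler_hz(speed_kmh, carrier_hz=CARRIER_HZ):
    """Maximum Doppler shift f_D = v * f_c / c."""
    return (speed_kmh / 3.6) * carrier_hz / SPEED_OF_LIGHT


for n_taps, speed in [(16, 120), (8, 240)]:
    fd = max_doppler_hz(speed)
    print(f"N={n_taps}: window {ma_span_ms(n_taps):.3f} ms, "
          f"f_D about {fd:.0f} Hz (1/f_D about {1e3 / fd:.1f} ms)")
```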

Figure 6: Q of the channel estimate from DSP (-- blue), Matlab (solid red) and perfect channel estimate (solid blue) for N = 8, SF = 32, mobile speed = 240 km/h.
The length of the MA filter is left to the designer: a shorter filter tracks a rapidly changing envelope better, while a longer one averages out more noise. The number of DSP cycles needed for channel estimation can be expressed by:
DSP_cycles_interp = (SF_MATCH*150) + (7N/4) + 780
DSP_cycles_decimation = (7N/4) + 780 (SF = 512)
where SF_MATCH = 256/SF, i.e., SF_MATCH = 1, 2, 4, 8, 16, 32, 64 for SF = 256, 128, 64, 32, 16, 8, 4.
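The cycle formulas can be wrapped in a small helper for quick budgeting (the name is hypothetical). Note that SF_MATCH*150 equals 38400/SF, the number of traffic-channel symbols per frame, which suggests the interpolation term costs about one cycle per rate-matched output symbol.

```python
SUPPORTED_SF = (4, 8, 16, 32, 64, 128, 256)


def channel_est_cycles(sf, n_taps):
    """Channel-estimation cycle count from the article's formulas.

    sf     -- traffic-channel spreading factor (4..256 interpolate, 512 decimate)
    n_taps -- MA filter length N (8, 16 or 32)
    """
    base = 7 * n_taps // 4 + 780
    if sf == 512:                      # decimation case
        return base
    if sf not in SUPPORTED_SF:
        raise ValueError("unsupported spreading factor")
    sf_match = 256 // sf               # 1 for SF=256 ... 64 for SF=4
    return sf_match * 150 + base


print(channel_est_cycles(32, 16), channel_est_cycles(512, 16))  # 2008 808
```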
Figure 7 shows the result of the path searcher for the following channel condition:
SNR = 5 dB; path delays = [0 260 521 781] ns; path powers = [0 -3 -6 -9] dB; mobile speed v = 120 km/h
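Converting these delays to chips at 3.84 Mchip/s confirms the one-chip spacing (variable names are ours):

```python
CHIP_NS = 1e9 / 3.84e6  # about 260.42 ns per chip

delays_ns = [0, 260, 521, 781]
delays_chips = [d / CHIP_NS for d in delays_ns]
print([round(d, 2) for d in delays_chips])  # [0.0, 1.0, 2.0, 3.0]
```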
As Figure 7 shows, the path searcher is able to identify the four different paths that are 1 chip apart.

Figure 7: Output from the path searcher.
Table 2 above shows the complexity and memory requirements of chip level and symbol level combining. The MIPS requirements for the chip-level and symbol-level combining are also shown in Figure 8 for different spreading factors.

Figure 8: Computational Complexity of Symbol-rate vs. chip-rate combining.
Finally, let's look at the bit-error-rate (BER) curves in Figure 9 for the W-CDMA receiver that was implemented in the DSP.

Figure 9: BER curves for W-CDMA RAKE receiver with path searcher and channel estimation.
The plot shows that the channel estimation and path searcher algorithms produce results very close to those obtained with ideal channel estimation and known paths. Averaging the power delay profile over a longer period was observed to give good path-delay estimates, but long averaging is a disadvantage when the delay spread changes quickly.
Wrap Up
In this article, we presented a practical approach to implementing a W-CDMA RAKE receiver on a four-ALU DSP. Based on the MIPS estimates above, approximately 171 MIPS are needed for RAKE processing of four channels, leaving 129 MIPS free for other processing on the proposed DSP.
References

1. H. Meyr, M. Moeneclaey, and S. A. Fechtel, Digital Communication Receivers: Synchronization, Channel Estimation, and Signal Processing, John Wiley & Sons, 1998.

2. H. Holma and A. Toskala, WCDMA for UMTS: Radio Access for Third Generation Mobile Communications, John Wiley & Sons, 2001.

3. E. Bejjani, J.-F. Bouquier, and B. de Cacqueray, "Adaptive Channel Delay Selection for WCDMA Mobile System," Proc. IEEE VTC 1999, pp. 203-207, Amsterdam, the Netherlands, 1999.

4. H. Elders-Boll, "Simplified Interference-Based Threshold Rule for Delay Selection in DS-CDMA Systems," Proc. 11th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC 2000), vol. 1, pp. 77-81, 2000.

About the Authors
Ahsan Aziz is a senior DSP engineer in Motorola's DSP Platforms group. He received his M.S. in electrical engineering from Texas A&M University and can be reached at ahsan.aziz@motorola.com.
Kim-Chyan Gan is a DSP software engineer in Motorola's DSP Platforms group. He received his M.S. in electrical engineering from Utah State University, Logan, and can be reached at kim-chyan.gan@motorola.com.
Imran Ahmed is a systems applications engineer in Motorola's DSP Platforms group. He received a B.S. in computer engineering from the University of Texas at Austin and can be reached at imranahmed@motorola.com.