Wavelet Theory and Its Applications

Since $x[k]_l$ takes the value of either 0 or 1, $\sum_{k=0}^{N-1} h[k]\,x[k]_l$ may have only $2^N$ possible values. That is, rather than computing the summation online at each iteration, it is possible to precompute these values and store them in a ROM indexed by $x[k]_l$. In other words, Eq. (6) simply realizes the sum-of-products computation with memory (LUT), adders, and a shift register.

### 2.3. Residue number system

The RNS is a non-weighted number system that performs parallel, carry-free addition and multiplication. In DSP applications, which require intensive computation, the absence of carry propagation allows concurrent computation in each residue channel. The RNS moduli set, $P = \{m_1, m_2, \dots, m_q\}$, consists of $q$ channels. Each $m_i$ is a positive integer, and the moduli are pairwise relatively prime: the greatest common divisor $\mathrm{GCD}(m_i, m_j) = 1$ for $i \neq j$.

Any number $X \in \mathbb{Z}_M = \{0, 1, \dots, M-1\}$ is uniquely represented in RNS by its residues $|X|_{m_i}$, where $|X|_{m_i}$ is the remainder of dividing $X$ by $m_i$, and $M$ is defined in Eq. (7) as

$$M = \prod_{i=1}^{q} m_i = m_1 \ast m_2 \ast \cdots \ast m_q \qquad (7)$$

$M$ determines the range of unsigned numbers, $[0, M-1]$, and should be greater than the largest result computed. In addition, signed numbers can also be represented uniquely within this range. The implementation of the RNS-based DWT obtained from Eq. (4) is given by Eq. (8):

$$y[n]_{m_i} = y_{m_i} = \left| \sum_{k=0}^{N-1} \Big|\, |h[k]|_{m_i} \cdot |x[n-k]|_{m_i} \Big|_{m_i} \right|_{m_i} \qquad (8)$$

for each $m_i \in P$. This implies that a $q$-channel DWT is implemented by $q$ FIR filters that work in parallel.

Mapping from the RNS back to integers is performed by the Chinese remainder theorem (CRT) [34, 41, 42]. The CRT states that the binary/decimal representation of a number can be obtained from its RNS representation if all elements of the moduli set are pairwise relatively prime.

Designing a robust RNS-based DWT requires selecting a moduli set and implementing the hardware for residue-to-binary conversion. The most widely studied moduli sets are built from powers of two because of their attractive arithmetic properties. For example, $\{2^n - 1, 2^n, 2^{n+1} - 1\}$ [43], $\{2^n - 1, 2^n, 2^n + 1\}$ [39], and $\{2^n, 2^{2n} - 1, 2^{2n} + 1\}$ [44] have been investigated.

For the purpose of illustration, the moduli set $P_n = \{2^n - 1, 2^n, 2^{n+1} - 1\}$ is used for three reasons. First, the modulo adder (MA) is simple and identical for $m_1 = 2^n - 1$ and $m_3 = 2^{n+1} - 1$. Second, even for a small $n$ ($n = 7$), the dynamic range of $P_7$ is large, $M = 127 \times 128 \times 255 = 4{,}145{,}280$, which can efficiently express real numbers in the range $[-2.5, 2.5]$ using a 16-bit fixed-point representation, provided scaling and rounding are done properly. We assume that this interval is sufficient to map the input values, which do not exceed $\pm 2$. Third, the reverse converter unit is simple and regular [42] because it is built from simple circuits.

## 3. DWT implementation methodology

### 3.1. DWT implementation using DA

DAA hides the explicit multiplications behind a ROM lookup table. The memory stores all possible values of the inner product of one bit position of the inputs with the DWT filter coefficients, that is, the sums of every possible combination of the coefficients. The input data, x[n], are signed fixed-point numbers 22 bits wide with 16 fractional bits (Q5.16). We assume that the memory contents have the same precision as the input, which gives high enough accuracy for the fixed-point implementation. As a consequence, 22 ROMs, each consisting of 16 words, are required. Each ROM stores every possible combination of the four DWT filter coefficients, and the final result is a 22-bit signed fixed-point number (Q5.16). To decrease the amount of memory, the word width must be reduced, which affects the output precision.
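The lookup-table idea can be illustrated with a small Python sketch. It assumes integer coefficients and inputs (for example, Q5.16 values multiplied by 2<sup>16</sup>), models the 16-word ROM as a Python list, and walks through the bit positions one at a time instead of instantiating 22 ROMs in parallel. The function names `build_da_rom` and `da_inner_product` are hypothetical.

```python
# Minimal distributed-arithmetic (DA) sketch for a 4-tap inner product.
# Assumptions: integer coefficients/samples, samples in `width`-bit two's
# complement, and a Python list standing in for the ROM.

def build_da_rom(coeffs):
    """Precompute all 2**N sums of coefficient subsets; address bit k selects h[k]."""
    rom = []
    for addr in range(1 << len(coeffs)):
        rom.append(sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1))
    return rom


def da_inner_product(coeffs, samples, width):
    """Compute sum(h[k] * x[k]) using only ROM lookups, shifts, and additions."""
    rom = build_da_rom(coeffs)
    mask = (1 << width) - 1
    acc = 0
    for l in range(width):
        # Bit l of every sample (two's complement view) forms one ROM address.
        addr = 0
        for k, x in enumerate(samples):
            addr |= (((x & mask) >> l) & 1) << k
        partial = rom[addr] << l
        # The sign bit carries negative weight in two's complement.
        acc += -partial if l == width - 1 else partial
    return acc


if __name__ == "__main__":
    h = [3, -1, 4, 2]
    x = [5, -7, 100, -128]
    print(da_inner_product(h, x, width=22))  # 166, same as a direct dot product
```

With four coefficients the ROM has exactly 16 entries, matching the 16-word ROMs described above; a hardware design evaluates all 22 bit positions concurrently and sums the shifted partial results with adders.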

Figure 3 shows the block diagram of the 1-bit DAA stage at bit position l. This block contains one ROM (4 × 22, i.e., a 4-bit address and 22-bit words) and one shift register. Because the word length w of the input x is 22 bits, the actual design contains 22 memory blocks and 21 adders to sum the partial results.

### 3.2. DWT implementation using RNS

The RNS-based DWT implementation has three main components: the forward converter, the modulo adders (MAs), and the reverse converter. The forward converter, also known as the binary-to-residue converter (BRC), converts a binary input number into residue numbers. Conversely, the reverse converter, or residue-to-binary converter (RBC), recovers the result in binary format from the residue numbers. We refer to the RNS system without the RBC as the forward converter and modulo adders (FCMA) system, as illustrated in Figure 4.
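The three components can be traced end to end in a short Python sketch for $P_7 = \{127, 128, 255\}$. It assumes pre-scaled integer coefficients and samples, uses Python integers in place of hardware channels, and uses the textbook CRT for the reverse conversion rather than the specific RBC circuit of the design; `to_rns`, `rns_fir_output`, and `from_rns` are hypothetical names.

```python
# Sketch of an RNS FIR datapath for the moduli set P_7 = {127, 128, 255}.
# Assumptions: pre-scaled integer inputs, Python ints instead of hardware
# channels, and the textbook CRT as the reverse converter.
from math import gcd, prod

MODULI = (127, 128, 255)     # 2^n - 1, 2^n, 2^(n+1) - 1 with n = 7
M = prod(MODULI)             # dynamic range: 4,145,280

# The CRT requires pairwise relatively prime moduli.
assert all(gcd(a, b) == 1 for i, a in enumerate(MODULI) for b in MODULI[i + 1:])


def to_rns(x):
    """Forward conversion (BRC): one residue per channel."""
    return tuple(x % m for m in MODULI)


def rns_fir_output(h, window):
    """Eq. (8): channel m computes |sum_k |h[k]|_m * |x[n-k]|_m|_m, with window[k] = x[n-k]."""
    out = []
    for m in MODULI:                     # channels are independent (parallel in hardware)
        acc = 0
        for hk, xk in zip(h, window):
            acc = (acc + (hk % m) * (xk % m)) % m
        out.append(acc)
    return tuple(out)


def from_rns(residues):
    """Reverse conversion (RBC) with the Chinese remainder theorem."""
    x = 0
    for r, m in zip(residues, MODULI):
        mi = M // m
        x += r * mi * pow(mi, -1, m)     # pow(a, -1, m) is the modular inverse (Python 3.8+)
    return x % M


if __name__ == "__main__":
    print(from_rns(rns_fir_output([3, 5, 7, 11], [100, 200, 300, 400])))  # 7800
```

The result is correct only while the true output stays inside $[0, M-1]$, which is why $M$ must exceed the largest value the filter can produce.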

#### 3.2.1. The forward converter

The forward converter is used to convert the result of multiplying an input number by a wavelet coefficient to q residue numbers via LUT, shift, and modulo adders, where q is the number of channels.

Figure 3. The block diagram of the DAA-based architecture of DB2. For simplicity, only one ROM and one shift register are shown; the actual design contains 22 ROMs and shift registers. >> denotes a shift by (16 − l) positions, where 16 is the number of fractional bits.

Figure 4. The block diagram of DB2 RNS-based architecture. BRC is an abbreviation for binary-to-residue converter, RBC for residue-to-binary converter, and MA for modulo adder.

A Comparative Performance of Discrete Wavelet Transform Implementations Using Multiplierless

http://dx.doi.org/10.5772/intechopen.76522

As with the DAA-based approach, if the input word length is 16 bits, a single ROM would need 2<sup>16</sup> locations. One way to reduce the memory size is to divide it into four ROMs of 4 × 22. Figure 5 shows the block diagram of the binary-to-residue converter with four ROMs, each indexed by four bits of x. However, the outputs of the ROMs must be combined so that the final result is correct, and this division therefore comes at a cost in terms of adders and registers.

With this memory partitioning, the RNS-based forward converter works as follows. The 16-bit input $X = (x_1, x_2, x_3, x_4)$ is divided into four 4-bit segments. Each segment is fed into one ROM, so that three outputs, corresponding to $|h_k \ast x|_{m_1}$, $|h_k \ast x|_{m_2}$, and $|h_k \ast x|_{m_3}$, are produced. Each 22-bit ROM word holds all three residues: the least significant 8 bits contain $|h_k \ast x|_{m_3}$, the next 7 bits contain $|h_k \ast x|_{m_2}$, and the most significant 7 bits contain $|h_k \ast x|_{m_1}$. The advantage of this method is that no extra hardware is required to separate the value of each modulus. The content of the ROM at location $j$ is

$$\mathrm{ROM}(j) = |h_k \ast j \ast 2^{11}|_{m_1} \ast 2^{2n+1} + |h_k \ast j \ast 2^{11}|_{m_2} \ast 2^{n+1} + |h_k \ast j \ast 2^{11}|_{m_3},$$

which is generalized for each channel $m_i$ by Eq. (9), where $x_l$ is the $w$-bit input segment that addresses location $j$:

$$\mathrm{ROM}(j) = |h_k \ast x_l \ast 2^{11}|_{m_i}, \quad j \in [0, 2^w - 1] \qquad (9)$$

Figure 5. The block diagram of the binary-to-residue converter for the three-channel RNS-based DWT, P<sub>7</sub> = {127, 128, 255}. Four identical memories are used for each tap. The upper corner shows the memory content at location j ∈ [0, 15].

To obtain the final multiplication result, the output for each m<sub>i</sub> must be shifted by l positions, where l is the index of the lowest input bit of the segment (4, 8, or 12; the lowest segment, l = 0, needs no shift). The modular multiplication and shift for (2<sup>n</sup> − 1) and (2<sup>n+1</sup> − 1) can be achieved by a left circular shift (left rotate) of l positions, whereas the modular multiplication and shift for 2<sup>n</sup> can be achieved by a left shift of l positions that discards the bits shifted beyond position n − 1 [17].

Finally, the modulo adder adds the corresponding outputs (Figure 6).

Figure 6. The block diagram of the (2<sup>n</sup> − 1) modulo adder.
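The packed-word ROM, the rotate-based shifts, and the end-around-carry modulo adder can be checked together in a Python sketch. It assumes an unsigned 16-bit input, an integer coefficient that has already been multiplied by 2<sup>11</sup>, and Python integers in place of the 4 × 22 ROMs, rotators, and MAs; `rom_word`, `unpack`, `rotl`, `add_mod_2k_minus_1`, and `forward_convert` are hypothetical names.

```python
# Sketch of the ROM-based forward converter (BRC) for P_7 = {127, 128, 255}.
# Assumptions: unsigned 16-bit input, integer coefficient already scaled by
# 2**11, and Python ints standing in for the ROMs, rotators, and MAs.
N = 7                                                     # parameter n of P_n
M1, M2, M3 = (1 << N) - 1, 1 << N, (1 << (N + 1)) - 1     # 127, 128, 255


def rom_word(coef_scaled, j):
    """Packed 22-bit word: |c*j|_m1 in bits 21..15, |c*j|_m2 in 14..8, |c*j|_m3 in 7..0."""
    p = coef_scaled * j
    return ((p % M1) << (2 * N + 1)) | ((p % M2) << (N + 1)) | (p % M3)


def unpack(word):
    """Split a ROM word into its three residues (fixed bit fields, i.e. wiring only)."""
    return (word >> (2 * N + 1)) & M1, (word >> (N + 1)) & (M2 - 1), word & M3


def rotl(value, l, bits):
    """Left-rotate a `bits`-wide word by l: multiplies by 2**l modulo 2**bits - 1."""
    l %= bits
    mask = (1 << bits) - 1
    return ((value << l) | (value >> (bits - l))) & mask


def add_mod_2k_minus_1(a, b, bits):
    """Modulo (2**bits - 1) adder with end-around carry."""
    mask = (1 << bits) - 1
    s = a + b
    s = (s & mask) + (s >> bits)      # feed the carry-out back into bit 0
    return 0 if s == mask else s      # all ones is a second encoding of zero


def forward_convert(coef_scaled, x):
    """Residues of coef_scaled * x for a 16-bit unsigned x, via four 4-bit ROM lookups."""
    acc1 = acc2 = acc3 = 0
    for l in (0, 4, 8, 12):           # index of the lowest bit of each segment
        r1, r2, r3 = unpack(rom_word(coef_scaled, (x >> l) & 0xF))
        acc1 = add_mod_2k_minus_1(acc1, rotl(r1, l, N), N)
        # Modulo 2**n: a plain left shift, dropping bits beyond position n - 1.
        acc2 = (acc2 + ((r2 << l) & (M2 - 1))) & (M2 - 1)
        acc3 = add_mod_2k_minus_1(acc3, rotl(r3, l, N + 1), N + 1)
    return acc1, acc2, acc3


if __name__ == "__main__":
    c, x = 989, 0xBEEF
    print(forward_convert(c, x) == (c * x % M1, c * x % M2, c * x % M3))  # True
```

Rotation works for the 2<sup>n</sup> − 1 and 2<sup>n+1</sup> − 1 channels because 2<sup>k</sup> ≡ 1 modulo 2<sup>k</sup> − 1, so bits leaving the top of the word re-enter at the bottom; the 2<sup>n</sup> channel simply truncates.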

Table 1. The DB2 low-pass filter coefficients as real numbers and their RNS-system equivalents, multiplied by 2<sup>11</sup>.

#### 3.2.2. RNS-system number conversion

The received samples and wavelet coefficients are real numbers and may take small values. One of the main drawbacks of the RNS representation is that it operates only on nonnegative integers in [0, M − 1]. The DWT coefficients generally lie between −1 and 1. As a possible solution, we divided the RNS range, [0, M − 1], so that it can represent both positive and negative numbers.
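One common way to divide the range is to let the lower half of [0, M − 1] hold nonnegative values and the upper half hold negative values as M − |x|. The Python sketch below assumes a split at M // 2, which the text does not specify; `encode_signed` and `decode_signed` are hypothetical names.

```python
# Sketch of a signed mapping onto the RNS range [0, M - 1] for P_7.
# Assumption: the range is split at M // 2 (a common convention); the text
# only states that [0, M - 1] is divided to handle signed values.
M = 127 * 128 * 255                    # 4,145,280
HALF = M // 2


def encode_signed(x):
    """Map x in [-HALF, HALF - 1] to [0, M - 1]; negatives land in the upper half."""
    if not -HALF <= x < HALF:
        raise ValueError("value outside the RNS dynamic range")
    return x % M                       # equals x + M for negative x


def decode_signed(u):
    """Inverse mapping: codes in [HALF, M - 1] stand for negative numbers."""
    return u - M if u >= HALF else u


if __name__ == "__main__":
    a, b = -265, 989
    print(encode_signed(a))                                          # 4145015
    print(decode_signed((encode_signed(a) + encode_signed(b)) % M))  # 724
```

Because the mapping is just reduction modulo M, additions and multiplications in the residue channels remain valid for signed operands, provided the true result stays within [−M/2, M/2 − 1].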

In addition, the received sample X[i] is scaled up by shifting it y positions to the left (multiplying by 2<sup>y</sup>), which turns X[i], a fixed-point value with y fractional bits, into an integer. In a similar manner, the wavelet coefficients are scaled by shifting them z positions to the left. In our design, we set the filter scaling factor z to 11. Table 1 presents the low-pass filter of DB2 before and after scaling.
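The coefficient scaling can be reproduced with a short Python sketch. It assumes the standard orthonormal DB2 (D4) low-pass coefficients, round-to-nearest, and the split-range signed mapping; the ordering or rounding used for Table 1 may differ, and `scale_to_rns` is a hypothetical name.

```python
# Sketch: scale the DB2 low-pass filter by 2**z (z = 11) and map it into RNS.
# Assumptions: standard orthonormal DB2 (D4) low-pass coefficients,
# round-to-nearest, and a split-range signed mapping.
from math import sqrt

Z = 11                                  # filter scaling factor z
MODULI = (127, 128, 255)                # P_7
M = 127 * 128 * 255

_s3, _d = sqrt(3.0), 4.0 * sqrt(2.0)
db2_low = [(1 + _s3) / _d, (3 + _s3) / _d, (3 - _s3) / _d, (1 - _s3) / _d]


def scale_to_rns(h, z=Z):
    """Return (integer after a z-bit left shift, code in [0, M - 1], residues)."""
    q = round(h * (1 << z))             # multiply by 2**z, round to nearest integer
    u = q % M                           # negative values wrap into the upper half
    return q, u, tuple(u % m for m in MODULI)


if __name__ == "__main__":
    for h in db2_low:
        q, u, res = scale_to_rns(h)
        print(f"{h:+.10f} -> {q:6d} -> {u:8d} -> {res}")
```

Under these assumptions the scaled integers are 989, 1713, 459, and −265; the negative tap maps to M − 265 = 4,145,015 before its residues are taken.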
