**3. Background**

#### **3.1 The lifting scheme**

Multiresolution analysis (MRA) techniques such as the lifting scheme are used to design different wavelets. This wavelet transform is known as the second-generation wavelet transform and was developed by Sweldens [32]. It uses simple lifting operations instead of convolution filters, which simplifies the mathematical operations [33, 34]. The lifting scheme splits the signal into even and odd samples and applies simple operations to them, as shown in **Figure 1**. The general version of the lifting scheme was presented in Sole and Salembier [36]. The lifting scheme steps are detailed in Trevino-Sánchez and Alarcón-Aquino [22]. The wavelet lifting scheme decomposes the 1D signal into two sets of coefficients: approximations (L: low frequency) and details (H: high frequency).

**Figures 1**–**3** show three wavelet functions for a 1D lifting scheme: Haar, Daubechies 4 (Db4), and Daubechies 6 (Db6), respectively, where (↓2) indicates that the signal is downsampled by half. Using *Z* means that a previous sample is used, while *Z*<sup>−1</sup> refers to the next one. Additionally, *Z*<sup>−1</sup> also indicates that the signal is split into even and odd components [35].
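As an illustration of the split, predict, and update steps described above, one level of the Haar lifting scheme fits in a few lines. The following is a minimal NumPy sketch, not the authors' implementation; the function name is illustrative and an even-length signal is assumed:

```python
import numpy as np

def haar_lifting_1d(signal):
    """One level of the 1D Haar wavelet lifting scheme (sketch).

    Split the signal into even and odd samples, predict the details (H)
    from the even samples, then update the approximations (L).
    Assumes an even-length input."""
    x = np.asarray(signal, dtype=float)
    even, odd = x[0::2], x[1::2]   # split step: the two (downsample by 2) branches
    detail = odd - even            # predict step -> high-frequency coefficients (H)
    approx = even + detail / 2     # update step  -> low-frequency coefficients (L)
    return approx, detail
```

For example, `haar_lifting_1d([2, 4, 6, 8])` yields approximations `[3.0, 7.0]` and details `[2.0, 2.0]`, halving the signal length as the (↓2) blocks in **Figure 1** indicate.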

*Random Wavelet Coefficients Pooling for Convolutional Neural Networks DOI: http://dx.doi.org/10.5772/intechopen.105162*

#### **Figure 1.**

*The Haar Wavelet Lifting Scheme for a 1D Signal [35].*

**Figure 2.**

*The Daubechies 4 Wavelet Lifting Scheme for a 1D Signal [35].*

**Figure 3.**

*The Daubechies 6 Wavelet Lifting Scheme for a 1D Signal [35].*

When the lifting scheme needs to be applied to an image, a 2D version is required, as shown in **Figure 4**. In this example a Haar wavelet function is used, but any other wavelet function can be applied. The 2D version simply repeats the same block two more times: one for the low-frequency and one for the high-frequency coefficients. The first block splits the signal in one direction (horizontal), while the other two blocks divide the signal in the other direction (vertical) [37]. At the end of the decomposition process, four sets of coefficients are obtained, also known as approximations (LL: low-low pass filters), verticals (HL: high-low pass filters), horizontals (LH: low-high pass filters), and diagonals (HH: high-high pass filters).

#### **Figure 4.**

*Haar Wavelet Lifting scheme for 2D decomposes an image into four subbands. The subband names are approximations (LL), horizontals (LH), verticals (HL), and diagonals (HH).*

#### **3.2 Pooling**

The pooling layer is a subsampling technique used to reduce the size of the information. This method shrinks a predefined region into one single value. The most used pooling methods are max-pooling [38] and average pooling [39]. Max-pooling takes the highest value of each region *R* of size *r*×*r* using a stride of *S*×*S* and no padding, until the entire image is reduced. In this study, both parameters *r* and *S* take the value of 2; these parameters need to be equal to avoid skipping any value, and thus ignoring that piece of information.

The equation for max-pooling is described in Eq. (1).

$$F\_{\max}(\mathbf{x}', \mathbf{y}') = \max\_{\substack{k = a, \ \dots, \ a+(r-1) \\ l = b, \ \dots, \ b+(r-1)}} \{ I(k, l) \} \tag{1}$$

with

$$i = \{ 0, \ 1, \ 2, \ \dots, \ n \}, \quad \mathbf{x}' = i + 1,$$

$$j = \{ 0, \ 1, \ 2, \ \dots, \ m \}, \quad \mathbf{y}' = j + 1,$$

$$a = ((S \cdot i) + 1), \quad n \mid a \le (x - (r - 1)),$$

$$b = ((S \cdot j) + 1), \quad m \mid b \le (y - (r - 1)),$$

$$\forall \ (k, l) \ \exists! \ (a, b), \quad \forall \ I(k, l) \in I(x, y)$$

where *I* is the input image and (*x*, *y*) its dimensions. *F*<sub>max</sub> is the obtained output and (*x*′, *y*′) are indexes that point to each element of the reduced information map. *I*(*k*, *l*) is the pooling region *R* and the indexes (*k*, *l*) indicate each value of the region. Additionally, (*a*, *b*) are variables used to slide the region indexes through the entire image at a pace of *S*, while (*n*, *m*) prevent overpassing the image size considering both *S* and *r*. Finally, (*i*, *j*) are the iterations required to cover all possible regions in the image.
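The sliding-region mechanics of Eq. (1) can be sketched with NumPy, using zero-based indexing in place of the equation's one-based indexing (the function and parameter names are illustrative, not from the original):

```python
import numpy as np

def max_pool(image, r=2, S=2):
    """Max-pooling of a 2D array with an r x r region and stride S, no padding
    (illustrative sketch of Eq. (1), zero-based indexing)."""
    x, y = image.shape
    # (n, m) in the equation: number of regions that fit without overpassing
    out = np.empty(((x - r) // S + 1, (y - r) // S + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            a, b = S * i, S * j                        # top-left corner of region R
            out[i, j] = image[a:a + r, b:b + r].max()  # highest value in R
    return out
```

With *r* = *S* = 2, a 4×4 input such as `np.arange(16).reshape(4, 4)` reduces to the 2×2 map `[[5, 7], [13, 15]]`, each output being the maximum of one non-overlapping region.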

Similarly to max-pooling, average pooling reduces the feature map by calculating the average value of each region. The equation for average pooling is shown in Eq. (2).

$$F\_{\text{avg}}(\mathbf{x}', \mathbf{y}') = \frac{1}{|I(k, l)|} \sum\_{l=b}^{b+(r-1)} \sum\_{k=a}^{a+(r-1)} I(k, l) \tag{2}$$

where *F*<sub>avg</sub>(*x*′, *y*′) is the output, and |*I*(*k*, *l*)| is the magnitude, or number of elements, of the pooling region. The elements of the pooling region are added by row and by column until all have been added, then the result is divided by the number of elements to obtain the average value of the entire region.
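Eq. (2) can be sketched the same way as max-pooling: sum each region and divide by its number of elements. Again a minimal illustrative version, not the original implementation:

```python
import numpy as np

def avg_pool(image, r=2, S=2):
    """Average pooling per Eq. (2): mean of each r x r region with stride S,
    no padding (illustrative sketch, zero-based indexing)."""
    x, y = image.shape
    out = np.empty(((x - r) // S + 1, (y - r) // S + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            region = image[S * i:S * i + r, S * j:S * j + r]
            out[i, j] = region.sum() / region.size  # sum divided by |I(k, l)|
    return out
```

On the same 4×4 example, `avg_pool(np.arange(16, dtype=float).reshape(4, 4))` yields `[[2.5, 4.5], [10.5, 12.5]]`, each output being the mean of one region.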


There are also two probabilistic pooling methods: mix pooling [40] and stochastic pooling [41]. Mix pooling randomly takes either the maximum or the average value during the training process. Mix pooling can select values within the same region or between channels. Mix pooling is described in Eq. (3).

$$F\_{\rm mix}(\mathbf{x}', \mathbf{y}') = \lambda F\_{\rm max}(\mathbf{x}', \mathbf{y}') + (1 - \lambda) F\_{\rm avg}(\mathbf{x}', \mathbf{y}') \tag{3}$$

where *λ* is a random value from the interval [0, 1].
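For a single region, Eq. (3) amounts to a random convex combination of the region's maximum and average. A minimal sketch, assuming *λ* is drawn fresh for each call (function and parameter names are illustrative):

```python
import numpy as np

def mix_pool(region, rng=None):
    """Mix pooling per Eq. (3): a random convex combination of the maximum
    and the average of one pooling region (illustrative sketch)."""
    rng = rng or np.random.default_rng()
    lam = rng.uniform(0.0, 1.0)  # lambda drawn from [0, 1]
    return lam * region.max() + (1.0 - lam) * region.mean()
```

Because *λ* ∈ [0, 1], the result always lies between the region's average and its maximum.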

For stochastic pooling, each value within the region is assigned a probability: the higher the value, the higher the probability it gets. All probability values from the same region add up to 100%. Then a value is selected based on the probabilities of the region. Eq. (4) normalizes the value of each element within a pooling region to calculate its probability *p*(*k*, *l*).

$$p(k,l) = \frac{I(k,l)}{\sum\_{l=b}^{b+(r-1)} \sum\_{k=a}^{a+(r-1)} I(k,l)} \tag{4}$$

$$F\_{stch}(\mathbf{x}', \mathbf{y}') = P(p(k, l)) \tag{5}$$

where *P*(·) is a function that selects a sample from the multinomial distribution created with the probabilities obtained by Eq. (4).
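Eqs. (4) and (5) can be sketched together for one region: normalize the values into probabilities, then draw one activation from the resulting multinomial distribution. This illustrative version assumes non-negative activations (e.g. after a ReLU) with a nonzero sum:

```python
import numpy as np

def stochastic_pool(region, rng=None):
    """Stochastic pooling per Eqs. (4)-(5): normalize the region values into
    probabilities, then sample one activation from that multinomial
    (illustrative sketch; assumes non-negative values with a nonzero sum)."""
    rng = rng or np.random.default_rng()
    values = region.ravel()
    p = values / values.sum()       # Eq. (4): probabilities add up to 1
    return rng.choice(values, p=p)  # Eq. (5): P(.) draws one sample
```

Unlike max-pooling, smaller activations still have a chance of being selected in proportion to their magnitude, which is the source of this method's regularizing effect.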
