**1. Introduction**

Detecting and classifying objects correctly is challenging, especially when they are cropped or occluded. Harsh weather conditions or poorly lit scenes are further obstacles to the task. The Convolutional Neural Network (CNN) is one of the highest-performing techniques in this research field. However, the drawbacks mentioned above are among the reasons researchers constantly seek to improve CNNs.

There is constant research exploring new techniques to improve CNN performance. Since deep learning was first proposed, there have been plenty of recommendations for increasing CNN accuracy, ranging from enlarging training datasets and reducing batch sizes [1] to pretrained networks, skip connections [2], deeper architectures [3], and complex connections [4]. CNNs can extract huge numbers of features, which is the main reason they are used in so many applications. The training stage is where all key patterns are learned: this capability is built by updating the CNN parameter values, and network accuracy improves as more relevant patterns are extracted during training.

Nevertheless, increasing the effectiveness of CNN architectures is not only about learning patterns, but also about extracting high-quality features and, moreover, keeping the right ones. Feature extraction therefore remains an open research field. To address this challenge, some approaches have focused on different feature extraction techniques to identify key patterns. For example, Farah Malik et al. [5] and Li et al. [6] applied spectral analysis to heart sounds: the first uses the wavelet transform to produce scalograms that train a CNN, while the second uses denoising autoencoders to extract features from spectrograms for the same purpose. In Haque and Mishu [7], Principal Component Analysis is used to extract a feature vector and reduce its dimensionality; the images are then classified by a multi-scale CNN. Statistical methods are also common: the work of Rao et al. [8] focused on detecting camouflaged objects. First, the image is divided into equal-sized pieces called subblocks, and each subblock is decomposed into two levels using the Discrete Wavelet Transform with the Daubechies Db2 wavelet. The resulting coefficients are then used to extract statistical features, which feed the final stage of the algorithm that detects and locates the objects. In general, the main idea is to do more with less.

Within a CNN there are several kinds of layers, and most of the size reduction takes place in the pooling layers [9, 10]. According to Liu et al. [11, 12], pooling enlarges the receptive field and decreases computational cost. But it is also during this dimension reduction that some attributes are lost. Traditional pooling methods such as average, maximum, mixed, and stochastic pooling are known to lose important details, specifically high-frequency patterns. Moreover, according to Fujieda et al. [13, 14], conventional CNNs lose most of the spectral information and can be considered a deficient form of Multi-Resolution Analysis (MRA). This problem can be overcome by incorporating a well-structured MRA such as the Wavelet Transform (WT). The WT is particularly useful for preserving details because it captures both frequency and location features, according to Daubechies [15, 16]. The CNN feature extraction process can thus be enriched directly from the inside to obtain a more useful set of patterns, discarding less relevant features while reducing the map dimensionality (see, e.g., [17–21]). However, unlike the traditional methods, it requires more computational resources.
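To make the contrast concrete, the following NumPy sketch (illustrative only, not taken from the chapter) applies 2×2 max-pooling and a one-level 2D Haar transform to the same map. Pooling keeps a single value per window, while the wavelet transform produces an approximation of the same reduced size plus three detail subbands that retain the high-frequency patterns pooling discards. The averaging normalization of the Haar filters is an assumption made for readability.

```python
import numpy as np

def max_pool_2x2(x):
    """Non-overlapping 2x2 max-pooling: halves each dimension,
    keeping only the largest value in each window."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def haar_dwt_2d(x):
    """One-level 2D Haar transform over the same 2x2 windows,
    decomposed into an approximation (LL) and three detail
    subbands (LH, HL, HH) holding the high-frequency content."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 4.0   # low-pass approximation
    lh = (a + b - c - d) / 4.0   # horizontal details
    hl = (a - b + c - d) / 4.0   # vertical details
    hh = (a - b - c + d) / 4.0   # diagonal details
    return ll, lh, hl, hh

x = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool_2x2(x))   # one value per window; details are gone
ll, lh, hl, hh = haar_dwt_2d(x)
print(ll)                # same resolution as the pooled map
print(lh)                # details that max-pooling throws away
```

Both operations halve the spatial resolution, but only the wavelet version keeps enough information to reconstruct the original map.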

One of the most popular size-reduction methods in CNNs is max-pooling, but it presents some unsolved issues that need to be addressed, such as its lack of formality. Even though an equation describes how to apply the max-pooling layer, it remains an empirical solution within the CNN. In contrast, the proposed random wavelet pooling can be presented as a formal mathematical model. It uses the lifting scheme, a second-generation wavelet transform, which meets all these requirements and fits inside the CNN as another layer. Formalizing the pooling method leads to a better understanding of its internal behavior and helps in designing improved models that optimize the process. In this sense, new models may reproduce and enhance the advantages of the process without repeating the disadvantages of existing pooling methods.
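The lifting scheme mentioned above can be illustrated with its simplest instance, the Haar wavelet. The NumPy sketch below (function names are illustrative, not from the chapter) splits the signal into even and odd samples, predicts the odd samples from the even ones, and updates the evens so the approximation preserves the pair averages. The inverse step highlights a property max-pooling lacks: the transform is exactly invertible, so no information is destroyed.

```python
import numpy as np

def haar_lifting_step(signal):
    """One Haar lifting step: split, predict, update."""
    even, odd = signal[0::2], signal[1::2]
    detail = odd - even            # predict: odd ~ even, keep the residual
    approx = even + detail / 2.0   # update: average of each sample pair
    return approx, detail

def haar_lifting_inverse(approx, detail):
    """Undo the lifting step exactly, recovering the original signal."""
    even = approx - detail / 2.0
    odd = detail + even
    out = np.empty(even.size + odd.size)
    out[0::2], out[1::2] = even, odd
    return out

x = np.array([1.0, 2.0, 3.0, 4.0])
approx, detail = haar_lifting_step(x)
print(approx, detail)                         # halved-length subbands
print(haar_lifting_inverse(approx, detail))   # original signal back
```

Because every operation is in-place arithmetic on the split samples, the lifting scheme is cheap to compute and slots naturally into a CNN as a layer with a well-defined forward and inverse mapping.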

Additionally, max-pooling behaves like a dynamic filter, but also like a random amplifier, both indirectly controlled by the signal. In many cases, this property helps prevent the network from overfitting and increases its accuracy; in others, it reduces its effectiveness and prevents the CNN from reaching high accuracy levels. Some studies have focused on this problem by creating new pooling techniques that blend MRA methods within the CNN layers to obtain useful patterns [22]. Following that idea, MRA can be seen as high- and


low-pass filters that decompose the signal into separate components, which can be used to reproduce the max-pooling behavior.

The main contribution of this study is a computational model that functions as a pooling layer within a CNN, randomly selecting signal components as a dynamic filter that is independent of the signal features, thereby preventing overfitting and improving accuracy.
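A minimal sketch of the general idea, assuming a one-level Haar decomposition and uniform random subband selection; this is illustrative only and may differ from the chapter's exact model. Each forward pass keeps one randomly chosen subband, so the selection acts as a dynamic filter that does not depend on the signal content.

```python
import numpy as np

def random_wavelet_pool(x, rng):
    """Illustrative sketch (not necessarily the authors' exact
    formulation): decompose the feature map into four Haar subbands
    and randomly keep one, halving each spatial dimension."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    subbands = [
        (a + b + c + d) / 4.0,  # LL: low-pass approximation
        (a + b - c - d) / 4.0,  # LH: horizontal details
        (a - b + c - d) / 4.0,  # HL: vertical details
        (a - b - c + d) / 4.0,  # HH: diagonal details
    ]
    # The choice is independent of the signal values, unlike max-pooling,
    # whose output is always driven by the largest activation.
    return subbands[rng.integers(len(subbands))]

rng = np.random.default_rng(0)
x = np.arange(36, dtype=float).reshape(6, 6)
pooled = random_wavelet_pool(x, rng)
print(pooled.shape)  # (3, 3): halved, like 2x2 max-pooling
```

The random choice plays a role similar to dropout: by varying which frequency band survives each pass, it discourages the network from over-relying on any single component of the signal.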

Other contributions are:


The organization of this paper is as follows. Section 2 presents related work combining the wavelet transform with CNN models. Section 3 briefly explains the lifting scheme and the most common pooling methods. Section 4 describes the methodology and the proposed model. Section 5 reports the characteristics of the experiments and their results. Finally, Section 6 draws some conclusions.
