**2.2 Transit signal identification**

In our datasets, the candidate transit signals have already been detected. To determine whether these detections are real, we used different machine learning models (i.e., artificial intelligence algorithms). In addition, we applied multiresolution analysis techniques to preprocess the light curves, and we compared the performance of the machine learning models with and without this preprocessing. Multiresolution analysis techniques decompose a signal into different levels of resolution, in order to "look at it from different perspectives." This process is similar to using a microscope to observe small objects: at different magnification levels, different details of these objects become visible. One example of such a technique is the wavelet transform. Wavelets are functions that grow and decay over a finite time interval (they are short waves, hence the name wavelet). By varying the translation and dilation parameters of the wavelet, it is possible to localize a function in both position and scale. The wavelets are convolved with the signal in order to determine how closely each section of the signal resembles the wavelet. The wavelet equation is shown in Eq. (5).

**Figure 8.**

*Ensemble empirical mode decomposition technique.*

$$
\psi_{\lambda,\tau}(u) = \frac{1}{\sqrt{\lambda}}\,\psi\left(\frac{u-\tau}{\lambda}\right) \tag{5}
$$

where *ψ*(·) is a function called the mother wavelet, used to create several wavelets by varying the dilation parameter *λ* > 0 and the translation parameter *τ*.
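As a minimal sketch of Eq. (5), the following Python code builds dilated and translated wavelets from a mother wavelet and measures how strongly a signal resembles them. The Mexican-hat mother wavelet, the function names, and the synthetic light curve are assumptions chosen for illustration; they are not taken from the models described in this chapter.

```python
import math

def mother_wavelet(u):
    """Mexican-hat (Ricker) shape, up to a constant factor; an illustrative choice only."""
    return (1.0 - u * u) * math.exp(-0.5 * u * u)

def wavelet(u, lam, tau):
    """Evaluate psi_{lam,tau}(u) = psi((u - tau) / lam) / sqrt(lam), following Eq. (5)."""
    if lam <= 0:
        raise ValueError("the dilation parameter lam must be positive")
    return mother_wavelet((u - tau) / lam) / math.sqrt(lam)

def wavelet_response(signal, times, lam, tau):
    """Riemann-sum inner product of a uniformly sampled signal with psi_{lam,tau}."""
    dt = times[1] - times[0]
    return dt * sum(s * wavelet(t, lam, tau) for s, t in zip(signal, times))

# Hypothetical light curve: flux deviation from the baseline, with a 1% dip centred at t = 5.
times = [0.01 * i for i in range(1001)]
flux_deviation = [-0.01 if 4.5 <= t <= 5.5 else 0.0 for t in times]

# Same dilation, two different translations: one on the dip, one far from it.
on_transit = wavelet_response(flux_deviation, times, lam=0.5, tau=5.0)
off_transit = wavelet_response(flux_deviation, times, lam=0.5, tau=2.0)
print(on_transit, off_transit)
```

The response is large in magnitude when the wavelet is translated onto the dip and close to zero when it is translated away from it, which is the sense in which the translation parameter localizes a feature in position. Repeating the computation for several values of `lam` would localize it in scale as well.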

We have also used the empirical mode decomposition and ensemble empirical mode decomposition techniques. These multiresolution analysis techniques adaptively obtain intrinsic mode functions by iterating a process called *sifting*. In this process, the signal is separated into its different components. These processes are described in the diagrams in **Figures 7** and **8**. For a more detailed explanation of these techniques, refer to [13].
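The sifting idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the procedure from [13]: the envelopes are piecewise linear (cubic splines are the usual choice), the signal endpoints are used as envelope knots, and a fixed number of sifting iterations replaces a proper stopping criterion. All function names are hypothetical.

```python
import math

def local_extrema(x):
    """Return the indices of the interior local maxima and minima of x."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] >= x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] <= x[i + 1]:
            minima.append(i)
    return maxima, minima

def linear_envelope(indices, x):
    """Piecewise-linear envelope through x at the given indices and at both endpoints."""
    knots = [0] + indices + [len(x) - 1]
    envelope = [0.0] * len(x)
    for a, b in zip(knots, knots[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            envelope[i] = x[a] + t * (x[b] - x[a])
    return envelope

def sift_once(x):
    """Subtract the mean of the upper and lower envelopes from x."""
    maxima, minima = local_extrema(x)
    if not maxima or not minima:
        return list(x)  # no oscillation left to extract
    upper = linear_envelope(maxima, x)
    lower = linear_envelope(minima, x)
    return [xi - 0.5 * (u + l) for xi, u, l in zip(x, upper, lower)]

def extract_imf(x, n_sifts=10):
    """Approximate one intrinsic mode function with a fixed number of sifting steps."""
    h = list(x)
    for _ in range(n_sifts):
        h = sift_once(h)
    return h

# Hypothetical signal: a fast oscillation riding on a slow one.
n = 400
signal = [math.sin(2 * math.pi * i / 20) + 0.5 * math.sin(2 * math.pi * i / 200)
          for i in range(n)]
imf = extract_imf(signal)
residue = [s - c for s, c in zip(signal, imf)]  # input for extracting the next component
```

Applying `extract_imf` again to `residue` would give the next, slower component. By construction, the extracted component and the residue always sum back to the original signal.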

### **3. Results**

Several machine learning models were tested using these techniques to preprocess the light curves. With the discrete wavelet transform, the models tested were a convolutional neural network (CNN), different multilayer perceptron (MLP) architectures, least squares (LS), random forests (RF), Naïve Bayes, and a support vector machine (SVM). With the empirical mode decomposition and ensemble empirical mode decomposition techniques, we used a CNN, RF, K-nearest neighbors (KNN), and a Ridge classifier. Refer to [1, 13] for more details concerning these models and their configuration. To measure the performance of each model, we compared the models in terms of their accuracy and execution time. The accuracy is computed from the number of correctly classified exoplanets (true positives, TP), correctly classified nonexoplanets (true negatives, TN), nonexoplanets classified as exoplanets (false positives, FP), and exoplanets classified as nonexoplanets (false negatives, FN). It measures the fraction of classifications in which the model was correct. The formula for this metric is presented in Eq. (6).

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \tag{6}$$
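Eq. (6) can be illustrated with a short Python sketch. The labels below are made up for the example (1 for exoplanet, 0 for nonexoplanet), and the function names are hypothetical; the code only shows how the four counts combine into the accuracy.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = exoplanet, 0 = nonexoplanet)."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Fraction of correct classifications, as in Eq. (6)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one classified sample is required")
    return (tp + tn) / total

# Made-up labels for eight light curves.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(counts, accuracy(*counts))  # (3, 3, 1, 1) 0.75
```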

The accuracies obtained by the models that used the discrete wavelet transform with both datasets are presented in **Figures 9** and **10**, where the blue bars represent the results obtained without the discrete wavelet transform and the orange bars represent the results obtained with it. In most cases, the accuracy increases, or at least does not decrease. The execution time results are presented in **Figures 11** and **12**. As can be seen, the execution times are always reduced. This is due to the downsampling property of the discrete wavelet transform: at each level of resolution, the length of the signal is reduced by half.

**Figure 9.** *Accuracy results using the discrete wavelet transform in the Real-LC dataset.*

## *Exoplanet Research Using Machine Learning and Multiresolution Analysis Techniques DOI: http://dx.doi.org/10.5772/intechopen.99973*

### **Figure 10.**

*Accuracy results using the discrete wavelet transform in the 3-median dataset.*

### **Figure 11.**

*Execution time results using the discrete wavelet transform in the Real-LC dataset.*
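The halving of the signal length at each resolution level of the discrete wavelet transform can be illustrated with a small Python sketch. The Haar wavelet is assumed here only because it is the simplest case; the text does not state which wavelet family was used, and the function names and the synthetic signal are hypothetical.

```python
import math

def haar_dwt_level(signal):
    """One Haar DWT level: approximation and detail coefficients, each of half length."""
    if len(signal) % 2:
        raise ValueError("the signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return approx, detail

def haar_dwt(signal, levels):
    """Apply the Haar DWT repeatedly to the approximation coefficients."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt_level(approx)
        details.append(detail)
    return approx, details

# Hypothetical light curve with 1024 samples.
signal = [math.sin(0.05 * i) for i in range(1024)]
approx, details = haar_dwt(signal, levels=3)
print(len(approx), [len(d) for d in details])  # 128 [512, 256, 128]
```

A model that is trained on the approximation coefficients after three levels sees inputs that are one eighth of the original length, which is consistent with the shorter execution times reported for the discrete wavelet transform.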

In **Figures 13** and **14**, the accuracy results of the empirical mode decomposition and its ensemble variant are presented. The blue bars, again, represent the results obtained without multiresolution preprocessing, the orange bars represent the results obtained using the empirical mode decomposition technique, and the gray bars represent the results obtained using the ensemble empirical mode decomposition technique. Finally, **Figures 15** and **16** show the execution times for these techniques. These figures show that, in most cases, using these techniques improves the performance of the identification models in both execution time and accuracy. The only case in which the execution time is severely affected by these techniques is the CNN model. We attribute this to the fact that the data values acquire several more decimal places after the sifting process.

### **Figure 13.**

*Accuracy results using the empirical mode decomposition and ensemble empirical mode decomposition techniques in the Real-LC dataset.*

### **Figure 14.**

*Accuracy results using the empirical mode decomposition and ensemble empirical mode decomposition techniques in the 3-median dataset.*

### **Figure 15.**

*Execution time results using the empirical mode decomposition and ensemble empirical mode decomposition techniques in the Real-LC dataset.*

### **Figure 16.**

*Execution time results using the empirical mode decomposition and ensemble empirical mode decomposition techniques in the 3-median dataset.*
