## **1. Introduction**

The term exoplanet is an abbreviation of extrasolar planet. Exoplanets are planets found outside our Solar System, whether orbiting a star or free-floating. Their study is important for several reasons, such as obtaining statistical information about planets, which in turn allows us to extend our understanding of how our Solar System was formed. Another reason to study exoplanets is to look for habitable planets outside the Solar System, which could lead to finding life beyond planet Earth (although no evidence of life has yet been found in exoplanet atmospheres) [1]. Several missions have been launched to search for exoplanets; the *Kepler* mission [2, 3], the Convection, Rotation and planetary Transits space observatory (*CoRoT*) [4], and the Transiting Exoplanet Survey Satellite (*TESS*) [5] are some examples.

To look for exoplanets, astronomers have developed different detection techniques; among the most used are the transit method, radial velocity, gravitational microlensing, and direct imaging. In this work, we focus on the transit method. This method looks for transits, which occur when an exoplanet passes between the observer and its host star. To look for transits, scientists use light curves, which are records of the light flux received from a star at different moments in time. When an exoplanet transits its star, a reduction of the light flux characterized by a "U" or "V" shape is observed. This technique has provided the greatest number of exoplanet discoveries. However, it is not infallible: it is sensitive to noise sources that may mimic transits or that hide the transit signal. To deal with these and other difficulties (see [1]), several artificial intelligence algorithms have been developed [6–12]. These approaches aim to improve the detection and identification accuracy of exoplanet transit signals within light curves.
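The idea behind the transit method can be illustrated with a toy example. The function names, the box-shaped dip, and the threshold below are our own illustrative choices, not part of the chapter's method:

```python
# Toy illustration of the transit method: a transit appears as a
# temporary drop ("U"- or "V"-shaped dip) in the star's light flux.
# All names and values here are illustrative placeholders.

def make_light_curve(n=100, depth=0.01, start=40, duration=10):
    """Flat flux of 1.0 with a box-shaped transit dip of the given depth."""
    flux = [1.0] * n
    for i in range(start, start + duration):
        flux[i] -= depth
    return flux

def find_dip(flux, threshold=0.995):
    """Return (first, last) indices of points whose flux falls below threshold."""
    below = [i for i, f in enumerate(flux) if f < threshold]
    if not below:
        return None
    return below[0], below[-1]

flux = make_light_curve()
print(find_dip(flux))  # (40, 49)
```

Real light curves are far noisier than this, which is what motivates the preprocessing and machine-learning steps described next.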

In this work, we summarize the work done in [1, 13], where simulated light curves are used to test the performance of artificial intelligence and multiresolution analysis techniques for exoplanet identification.

## **2. Methodology**

Automating the exoplanet discovery process requires a pipeline that gives clear instructions for the artificial intelligence algorithms to work with. We proposed a data pipeline in [1] that establishes the whole process of exoplanet discovery with artificial intelligence; it is shown in **Figure 1**. The data acquisition step refers to the process of obtaining the light curves to work with. These light curves may be obtained from real telescopes (such as the *Kepler* satellite) or by simulating them. The light curves contain different noise sources that complicate their analysis. For this reason, the next step is to preprocess the light curves in order to reduce the influence of noise. With the transit signals enhanced, the detection step may be performed by an artificial intelligence algorithm to search for periodic signals within the light curves that could be explained by an exoplanet. Finally, the periodic signals found must be analyzed to make sure that they belong to an exoplanet and not to an event of similar geometry. In the remainder of this section, we explain how we applied this pipeline to simulated light curves generated by us in order to identify exoplanet signals.
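The four stages of the pipeline can be sketched as composable functions. This is only a structural sketch; the stage names, signatures, and toy stand-ins are our own, not the implementation of [1]:

```python
# Hedged sketch of the Figure 1 pipeline as four pluggable stages:
# acquisition -> preprocessing -> detection -> vetting.

def run_pipeline(acquire, preprocess, detect, vet):
    """Run each light curve through the four stages and collect
    the candidate signals that survive vetting."""
    candidates = []
    for lc in acquire():
        clean = preprocess(lc)
        for signal in detect(clean):
            if vet(signal):
                candidates.append(signal)
    return candidates

# Toy stand-ins for each stage (placeholders, not real algorithms):
lcs = [[1.0, 1.0, 0.99, 1.0], [1.0, 1.0, 1.0, 1.0]]
found = run_pipeline(
    acquire=lambda: lcs,
    preprocess=lambda lc: lc,                        # no-op placeholder
    detect=lambda lc: [min(lc)] if min(lc) < 0.995 else [],
    vet=lambda s: True,                              # accept everything
)
print(found)  # [0.99]
```

Keeping the stages separate like this makes it easy to swap, for instance, a simulated data source for an archive query, or one detection algorithm for another.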

**Figure 1.** *Proposed pipeline for exoplanet discovery.*

*Exoplanet Research Using Machine Learning and Multiresolution Analysis Techniques DOI: http://dx.doi.org/10.5772/intechopen.99973*

### **2.1 Light curve dataset creation**

We generated two simulated datasets of 10,000 light curves each. In each dataset, half of the light curves contain simulated transits and the other half do not. Each light curve contains 15,000 datapoints. These datasets can be used to train and test machine learning algorithms for exoplanet identification with controlled, though realistic, noise sources. The presented work considers four different types of transit models. Furthermore, we explain the light curve preprocessing methodology that has been used by several works such as [6, 7, 14]. The first dataset, called the Real-LC dataset, was generated by taking real light curves from the Mikulski Archive for Space Telescopes (*MAST*<sup>1</sup>) whose periodic events are marked as nontransiting planets and adding simulated transits to them. The second one, called the 3-median dataset, was created by simulating the light curves and then adding the simulated transits. Next, we describe how the light curves were simulated.

There are several models that can be used to generate simulated transit light curves; some examples can be found in [15–19]. We used the BAsic Transit Model cAlculatioN (*BATMAN*) package proposed in [15], a Python package based on several models such as [16, 20]. We selected this package because it uses the model proposed by [8] and allows one to compute light curves very fast. Also, it can be parallelized with OpenMP (in case a greater number of samples is needed), and it includes a wide variety of limb darkening models, including the uniform, linear, quadratic, and nonlinear models, all of which we used. Moreover, it can generate secondary eclipses, which are useful for accounting for these astrophysical false positive phenomena. An example of a simulated transit is presented in **Figure 2**, generated using the BATMAN nonlinear model.
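To give a feel for what such a model computes, here is a minimal sketch of the simplest case mentioned above, a uniform source (no limb darkening), where the in-transit flux drop is just the ratio of disk areas, $(R_p/R_s)^2$. This toy box model is our own; it is not the BATMAN package, which also handles ingress/egress geometry and the limb darkening laws:

```python
import math

# Toy uniform-source transit model: relative flux is 1 outside transit
# and 1 - (Rp/Rs)^2 inside. Parameter values below are placeholders.

def box_transit(t, t0, period, duration, rp_over_rs):
    """Relative flux at time t for a box-shaped periodic transit."""
    # Time offset from the nearest mid-transit, wrapped into [-P/2, P/2).
    phase = (t - t0 + period / 2) % period - period / 2
    if abs(phase) <= duration / 2:
        return 1.0 - rp_over_rs ** 2
    return 1.0

times = [i * 0.01 for i in range(300)]
flux = [box_transit(t, t0=1.0, period=2.0, duration=0.2, rp_over_rs=0.1)
        for t in times]
print(min(flux))  # 0.99 at mid-transit (a 1% deep transit)
```

Limb darkening replaces the flat bottom of this box with the rounded "U" shape seen in **Figure 2**.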

In order to add noise, we used Eqs. (1)–(4) [7]. The generated noise adds quasiperiodic systematic trends to the simulated transit data.

$$t' = t - t_{\min} \tag{1}$$

**Figure 2.** *Example of a simulated transit light curve using the BATMAN nonlinear model.*

<sup>1</sup> Mikulski Archive for Space Telescopes (MAST): https://archive.stsci.edu/

$$A(t') = A + A \sin\left(\frac{2\pi t'}{P_A}\right) \tag{2}$$

$$\omega(t') = \omega + \omega \sin\left(\frac{2\pi t'}{P_\omega}\right) \tag{3}$$

$$F(t) = F_{transit}(t) \cdot N\left(1, \left(\frac{R_p}{R_s}\right)^2 / \sigma_{tol}\right) \cdot \left(1 + A(t')\sin\left(\frac{2\pi t'}{\omega(t')} + \phi\right)\right) \tag{4}$$

where $F_{transit}(t)$ is the simulated transit signal created using BATMAN, $t$ is time, $A$ is the amplitude of the stellar variability, $\omega$ is the period of oscillation, $\phi$ is the phase shift, $R_p$ is the planet radius, $R_s$ is the star radius, $\sigma_{tol}$ is the noise parameter, and $N$ is a Gaussian distribution used to generate random numbers with a mean of 1 and a standard deviation of $\left(R_p/R_s\right)^2/\sigma_{tol}$, as explained in [7]. Each dataset contains two types of light curves, namely light curves containing a transit and light curves that do not. Notice that for generating light curves without a transit signal, the $F_{transit}(t)$ factor of Eq. (4) is omitted.
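The noise model of Eqs. (1)–(4) can be sketched directly in code. The parameter values below are placeholders chosen only so the example runs; the values actually used are those of **Table 2**:

```python
import math
import random

# Sketch of the quasiperiodic noise model of Eqs. (1)-(4), following [7].
# All numeric parameter values here are illustrative placeholders.

def noisy_flux(t, f_transit, t_min, A, P_A, omega, P_omega, phi,
               rp_over_rs, sigma_tol, rng=random):
    tp = t - t_min                                                # Eq. (1)
    A_t = A + A * math.sin(2 * math.pi * tp / P_A)                # Eq. (2)
    w_t = omega + omega * math.sin(2 * math.pi * tp / P_omega)    # Eq. (3)
    noise = rng.gauss(1.0, rp_over_rs ** 2 / sigma_tol)           # N(1, (Rp/Rs)^2 / sigma_tol)
    trend = 1 + A_t * math.sin(2 * math.pi * tp / w_t + phi)
    return f_transit * noise * trend                              # Eq. (4)

random.seed(0)
# f_transit = 1.0 stands in for the BATMAN output (no transit here).
flux = [noisy_flux(t=i, f_transit=1.0, t_min=0.0, A=5e-4, P_A=50.0,
                   omega=30.0, P_omega=37.0, phi=0.0,
                   rp_over_rs=0.05, sigma_tol=10.0)
        for i in range(1, 1000)]
```

Because the oscillation period $\omega(t')$ itself varies with time, the resulting trend is quasiperiodic rather than a simple sinusoid.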

The parameters used to simulate the transits are presented in **Table 1**. These parameters were chosen from a list of 140 real exoplanets presented in the Q1-Q17 Kepler Data Release 24 [11], which were discovered using the transit method. In **Table 2**, the parameters used to simulate the noisy light curves are presented.

After simulating the light curves, they can be preprocessed in order to accentuate the transits and reduce the noise sources. We used the spline fitting method proposed in [6] to preprocess the Real-LC light curves, and a 3-median filter was applied to the 3-median dataset. This process, also called flattening, is performed to remove confusing data from the light curve. An example of a simulated light curve is presented in **Figure 3**. In this figure, each vertical blue line represents a transit, and the red line marks the mid-transit time of the first transit in the light curve. This light curve consists of 15,000 simulated datapoints, to which a transit signal simulated with the BATMAN model was added.

**Table 1.** *Simulated transit parameters.*

**Table 2.** *Noisy light curve simulation parameters.*

**Figure 3.** *Simulated light curve using synthetic noise and the BATMAN model.*
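As a minimal sketch, assuming "3-median filter" denotes a running median of window size three (edge points are kept unchanged here for simplicity), the filter can be written as:

```python
# 3-point running median filter: replaces each interior point by the
# median of itself and its two neighbors, suppressing single-point spikes.

def median3(a, b, c):
    return sorted((a, b, c))[1]

def median_filter3(flux):
    if len(flux) < 3:
        return list(flux)
    out = [flux[0]]
    for i in range(1, len(flux) - 1):
        out.append(median3(flux[i - 1], flux[i], flux[i + 1]))
    out.append(flux[-1])
    return out

print(median_filter3([1.0, 1.0, 5.0, 1.0, 1.0]))
# [1.0, 1.0, 1.0, 1.0, 1.0] -- the outlier spike is removed
```

The spline-fitting approach of [6] serves the same flattening purpose but models the low-frequency stellar trend explicitly and divides it out.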

The next step is to phase fold the light curve, overlapping all the points in the light curve using the transit event as the center. We used the *PyAstronomy* Python package.<sup>2</sup> An example of a folded light curve is presented in **Figure 4**. There is a major dip in the light flux in the middle of the light curve, which corresponds to the transit. There are also other dips that could belong to another transit within the same light curve; these are not centered because they do not correspond to the event being analyzed in this example.
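The folding itself is a simple modular-arithmetic operation. The chapter uses *PyAstronomy* for this step; the standard formula it implements can be sketched as:

```python
# Phase folding: map every timestamp onto its offset from the nearest
# mid-transit, so all transits of a periodic signal overlap at phase 0.

def phase_fold(times, period, t0):
    """Return phases in [-period/2, period/2) with mid-transit t0 at 0."""
    return [((t - t0 + period / 2) % period) - period / 2 for t in times]

times = [0.0, 1.0, 2.0, 2.5, 3.0]
print(phase_fold(times, period=2.0, t0=1.0))
# [-1.0, 0.0, -1.0, -0.5, 0.0] -- t = 1.0 and t = 3.0 both land at phase 0
```

Plotting flux against these phases (rather than time) produces figures like **Figure 4**, with the transit dip stacked at the center.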

Finally, the binning step reduces the dimensionality of the dataset by grouping the values into a limited number of bins. **Figure 5** illustrates the construction of one bin: each bin is created by calculating the mean of all the *n* points that fall inside it. We used 2048 bins; in other words, the length of the light curves is reduced from 15,000 to 2048 datapoints, and each bin is represented by the mean of all the values inside it. An example of a binned light curve can be seen in **Figure 6**; this is the same light curve as the one presented in **Figure 4**, now binned.

<sup>2</sup> PyAstronomy package: https://www.hs.uni-hamburg.de/DE/Ins/Per/Czesla/PyA/PyA/index.html

**Figure 4.** *Phase folded light curve.*

**Figure 5.** *Binning process.*
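The binning step can be sketched as follows, assuming equal-width bins over the index range (the helper name and integer bin-edge scheme are our own):

```python
# Binning: partition the light curve into n_bins contiguous chunks and
# replace each chunk by the mean of its points, e.g. 15,000 -> 2048.

def bin_light_curve(flux, n_bins):
    out = []
    n = len(flux)
    for b in range(n_bins):
        lo = b * n // n_bins          # integer bin edges cover all points
        hi = (b + 1) * n // n_bins
        chunk = flux[lo:hi]
        out.append(sum(chunk) / len(chunk))
    return out

binned = bin_light_curve([float(i) for i in range(15000)], 2048)
print(len(binned))  # 2048
```

Besides shrinking the input for the machine learning models, averaging within each bin also suppresses uncorrelated point-to-point noise.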


