Periodogram Analysis under the Popper-Bayes Approach

*George Caminha-Maciel*

*Real Perspective of Fourier Transforms and Current Developments in Superconductivity*

*DOI: http://dx.doi.org/10.5772/intechopen.93162*

### **Abstract**

In this chapter, we discuss the use of the Lomb-Scargle periodogram, its advantages, and its pitfalls from a geometrical rather than a statistical point of view. This means placing more emphasis on the transformation properties of the finite sampling (the available data) than on the ensemble properties of the statistical distributions assumed by the model. We also present a brief overview and critique of the recent literature on the subject and its new developments. The whole discussion adopts the point of view of geophysical inverse theory: Tarantola's combination of information, the so-called Popper-Bayes approach. This approach has been very successful in dealing with large, ill-conditioned, or under-determined complex problems. In periodogram analysis, it lets us handle more naturally the experimental data distributions and their anomalies (uncorrelated noise, sampling artifacts, windowing, aliasing, and spectral leakage, among others). Finally, we discuss the **Lomb-Scargle-Tarantola (LST) periodogram**, an estimator of the spectral content of irregularly sampled time series that implements these principles.

**Keywords:** Lomb-Scargle, periodogram, irregular sampling, inverse theory, spectral analysis, cyclostratigraphy, paleoclimatology, pattern recognition

### **1. Introduction**

Although old, periodogram analysis remains the main workhorse for studying irregularly sampled time series in the vast majority of scientific fields. Since A. Schuster introduced it more than a century ago, the periodogram has evolved and gained widespread use, sometimes without a complete understanding of its more subtle aspects. Its popularity comes from its relatively simple statistical behavior, its easy implementation, and the easy interpretation of its results. In summary, the Lomb-Scargle periodogram is an estimation method that emulates the power of Fourier decomposition for data series irregularly sampled in time, a case in which Fourier decomposition cannot be applied directly.

There is no need to advocate for Fourier analysis here, but it is worth recalling one of its main advantages: its *simplicity*. The Fourier basis functions, sines and cosines, are the most basic functions that exhibit periodic behavior. It is therefore very natural to use a Fourier basis to compare and detect periodic patterns in experimental data.

Although it is routinely applied in areas as diverse as astronomy, biology, meteorology, oceanography, and cyclostratigraphy [1–7], the Lomb-Scargle periodogram has no single, direct rule of use. We should always consider the subtle differences between the time series of each of these areas, both to better explore the spectral content of the data set and to improve our understanding of the results.

There are fundamental questions that still permeate the whole subject of periodogram analysis:

• What are the necessary conditions on the original continuous function for the periodogram to analyze the irregularly sampled time series?

• What is the relationship between the Lomb-Scargle periodogram [8–14] and the Discrete Fourier Transform (DFT)?

• What is the appropriate discrete domain of frequencies for which an irregularly sampled time series carries information? What is the minimum allowable frequency? Is there a maximum frequency (Nyquist limit) for the analysis of an unevenly sampled time series, and if so, what is it? What is the proper density of frequency points?

• What is the source of the several spurious peaks that arise in the periodogram besides the original peak at the proper periodicity frequencies?

• What is the uncertainty in the frequency of the periodicity found?

In this chapter, we discuss these questions and present a Popper-Bayes view of the periodogram, comparing it with the more traditional approach. See [15] for a comprehensive review (see also https://jakevdp.github.io/blog/2017/03/30/practical-lomb-scargle/). The traditional approach to the Lomb-Scargle method was developed mainly by astronomers and adapted to the characteristics of astronomical data. Here we point out that the techniques devised for astronomy are not unique and not necessarily appropriate in other areas. We also show examples from cyclostratigraphy, our subject of study, whose data have some typical sampling anomalies.

Then we present the **Lomb-Scargle-Tarantola (LST) periodogram** [16, 17], a more general technique for using the periodogram. The LST periodogram applies the Popper-Bayes perspective to the periodogram of irregularly sampled time series: it incorporates the *a priori* variance (the sampled data with all of its anomalies) directly into the *a posteriori* (periodogram) variance, and it analyzes the resulting ill-defined, possibly multi-modal, complex distribution.

### **2. The Popper-Bayes approach to inverse problems**

In geophysics, it is usual to deal with high-dimensional and ill-conditioned problems. This happens because geophysicists are always trying to understand and image subsurface structures with data generally obtained at the surface. Furthermore, the measurements we want to interpret are, in general, only indirectly related to the structure we want to model.

In gravimetry, for example, we measure the gravitational field on a spatial grid at the surface and try to figure out which subsurface density anomalies could produce the observed gravitational field anomaly. This problem is highly ill-conditioned, since an infinite number of configurations of subsurface bodies and density contrasts could produce the same set of field measurements at the surface. It cannot be solved without adding *a priori* information or some kind of regularization.

To solve this class of problems, geophysicists developed statistical techniques collectively called *inverse theory*. In inverse theory, we deal with two main difficulties: first, finding at least one model that satisfies the measurements; second, qualifying the set of obtained models. The usual way to attack this problem is the **best model** approach, in which we pick a model from a subset of possibilities by maximizing some measure over the whole set of possibilities. After that, we attach some uncertainty to the chosen best model.

This approach is favored by statisticians and can be formalized mathematically. It generally works well when the variables involved, data and noise, follow regular statistical distributions. However, for very irregularly sampled short time series, this assumption departs from reality.

### **2.1 Progressing by falsification**

A Bayesian attack on the problem would pose the following question: how does this newly obtained data set modify our previous knowledge about the system?

That is, we think about what we already know about the system, usually expressed as probability distributions on the dynamical variables, and about how to incorporate the new information through the constitutive equations of the dynamics. After that, we arrive at the *a posteriori* distributions (or models) "assimilating" the new data.

Here we present a more radical idea of physical inference, called the Popper-Bayes approach, which departs entirely from the idea of finding the best model, the mean model, or the maximum likelihood model. Professor Albert Tarantola originated this idea: observations should not be used to produce models; they should be used only to falsify models [18–20].

He proposed that physical inference could, in principle, be set up as follows:

1. Using the available *a priori* information, create all possible models of the system (potentially an infinite number of them);

2. For each model, solve the direct problem: assuming the model is true, calculate a measure (or probability) for it by comparison with the actual observations;

3. Use some criteria, based on these measures (or probabilities) and on the physical theory of the system, to define which models are acceptable. The unacceptable models should be dropped, or falsified;

4. The set of surviving models constitutes **the solution** of the physical inference. Uncertainties on these models should take into account the properties of these *a posteriori* distributions over the subspace of variables.

This gives us a natural interpretation of multi-modal probability distributions or ill-defined final models, which is very useful in the periodogram analysis of unevenly sampled time series.
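To make the falsification procedure concrete, the following is a minimal, hypothetical sketch of it applied to the simplest periodic model family; it is not the LST algorithm of [16, 17]. Assumptions made only for this illustration: each "model" is a single sinusoid at one trial frequency (amplitude and phase are profiled out by linear least squares at that frequency, to keep the example short), the noise level `sigma` is known, and a model is falsified when its reduced $\chi^2$ exceeds a chosen `threshold`. The function name `falsify_sinusoids` is likewise illustrative.

```python
import numpy as np

def falsify_sinusoids(t, y, freqs, sigma, threshold=1.5):
    """Hypothetical Popper-Bayes style screening of single-sinusoid models.

    Step 1: the a priori model space is a grid of trial frequencies.
    Step 2: for each model, solve the direct problem and compute chi^2
            (amplitude and phase fitted by linear least squares).
    Step 3: drop (falsify) models whose reduced chi^2 exceeds `threshold`.
    Step 4: return every surviving frequency with its reduced chi^2.
    """
    survivors = []
    dof = len(t) - 2  # two fitted parameters per model
    for f in freqs:
        w = 2 * np.pi * f
        # a*cos(wt) + b*sin(wt) is equivalent to A*sin(w(t - phi)).
        M = np.column_stack([np.cos(w * t), np.sin(w * t)])
        coef, *_ = np.linalg.lstsq(M, y, rcond=None)
        chi2 = np.sum(((y - M @ coef) / sigma) ** 2)
        if chi2 / dof <= threshold:
            survivors.append((f, chi2 / dof))
    return survivors

# Usage with synthetic, irregularly sampled data (assumed values).
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 100, 80))
y = np.sin(2 * np.pi * 0.1 * t) + 0.3 * rng.normal(size=t.size)
freqs = np.linspace(0.01, 0.5, 2000)
kept = falsify_sinusoids(t, y, freqs, sigma=0.3)
```

The result is a set of surviving models rather than a single best one; the spread of the surviving frequencies is itself the uncertainty statement. In this toy setup the survivors cluster around the injected frequency, whereas gappier sampling can leave several separated clusters alive, which is the multi-modal situation the approach is meant to handle.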

### **3. Periodogram analysis of irregularly sampled time series**

What assumptions are needed to properly analyze a continuous signal with the Fourier method through discrete sampling?

First of all, the signal has to be a *single-valued function* of the time variable $t$ ($t$ can also be a spatial variable). This means that the signal must be a unique sequence of pairs $(x(t), t)$ where, for each $t$, there is one and only one assigned value $x(t)$. Besides that, the values $x(t)$ cannot "explode" (as the exponential function does); they all have to be bounded by two real numbers, a maximum and a minimum. Furthermore, the function $x(t)$ cannot oscillate too fast; it has to be relatively smooth. This last condition means that the function cannot have "jumps"; in Fourier analysis, it means that the function has limited informational content, that is, it is "band-limited." Our discussion here uses real-valued functions, but the same ideas extend easily to complex-valued cases.

The Fourier transform is a mathematical tool that relates a real function $x(t)$ to another, complex, function $X(f)$. As a complex number, each value $X(f)$ can be described by a pair of real numbers: an amplitude $A(f)$ and a phase $\theta(f)$. The values $A(f)$ represent the relative importance of each time scale $T$, with $f = 1/T$. The squared values $A^2(f)$ are proportional to the *relative energy* of each frequency $f$ in the signal. The function that gives the relative energy of each of these frequency components is called the *power spectral density (PSD)* $P\_X(f)$, where $P\_X(f) = A^2(f)$. In the literature, these expressions are often written in terms of the *angular frequency* $\omega = 2\pi f$ instead of $f$.

### **3.1 The periodogram**

The Fourier transform has an analytical form applicable to continuous functions and a discrete form applicable to discrete functions, such as the sampled time series we intend to study. The latter is called the *Discrete Fourier Transform (DFT)*. Because the DFT is computationally intensive, particularly for large data sets, it has a fast implementation, the *Fast Fourier Transform (FFT)* algorithm. The FFT dramatically reduces the time and computational cost of calculating the Fourier transforms of real-valued time series. It is worth mentioning that the FFT algorithm cannot be applied if the time series is irregularly sampled.

From the DFT, we can obtain estimates of the PSD of experimental real-valued time series. The statistic that estimates the relative energy among the different frequencies present in a signal is called the *periodogram*.

The classical periodogram is simply the squared modulus of the DFT. In exponential form, it can be written as

$$P\_X(\omega) = \frac{1}{N\_0} \left| \sum\_j x\_j \, e^{-i\omega t\_j} \right|^2 \tag{1}$$

or, in the equivalent trigonometric form, as

$$P\_X(\omega) = \frac{1}{N\_0} \left[ \left( \sum\_j x\_j \cos \omega t\_j \right)^2 + \left( \sum\_j x\_j \sin \omega t\_j \right)^2 \right] \tag{2}$$

where $N\_0$ is the number of data points in the time series.

One main statistical property of the classical periodogram is that, for a time series consisting solely of evenly spaced Gaussian noise, the periodogram values are exponentially distributed. Unfortunately, this property no longer holds when the time series is irregularly sampled in time. This statistical behavior also applies only to observations of uncorrelated white noise.

Lomb [9] and Scargle [10] addressed the problem of finding a generalized form of the periodogram such that:

• It reduces to the classical form in the case of uniformly sampled time series;

• Its statistics are computable;

• It is invariant to global time-shifts in the series.

The classical periodogram is very noisy even for time series that are only slightly noisy. Scargle's modified periodogram, called the *Lomb-Scargle periodogram*, is much smoother and differs from the classical periodogram in at least two aspects:

• It adds a time-shift term $\tau$. This time-shift is calculated to remove the dependence between the two trigonometric basis functions $\sin \omega (t\_j - \tau)$ and $\cos \omega (t\_j - \tau)$; in other words, it makes the cross term $\sum\_j \sin \omega (t\_j - \tau) \cos \omega (t\_j - \tau)$ vanish. It is an attempt to improve the orthogonality of the equations.

• It adds the denominators $\sum\_j \cos^2 \omega (t\_j - \tau)$ and $\sum\_j \sin^2 \omega (t\_j - \tau)$ to the terms of the periodogram. These denominators differ from $N\_0/2$, which is their expected value in the limiting case of complete phase sampling at each frequency (as in uniformly sampled series).
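As a concrete check of Eqs. (1) and (2), the sketch below evaluates the classical periodogram directly from its definition, which works for arbitrary sample times (it is simply slow, costing one pass over the data per frequency). For evenly spaced samples at the Fourier frequencies $\omega\_k = 2\pi k/(N\_0\,\delta t)$, the result should coincide with the squared modulus of NumPy's FFT divided by $N\_0$, which is the shortcut that no longer exists once sampling is irregular. The function names are illustrative assumptions.

```python
import numpy as np

def classical_periodogram(t, x, omega):
    """Classical periodogram, trigonometric form of Eq. (2), at each angular frequency."""
    t, x, omega = (np.asarray(a, dtype=float) for a in (t, x, omega))
    phase = np.outer(omega, t)                    # shape (n_freq, N0)
    c = (x * np.cos(phase)).sum(axis=1)
    s = (x * np.sin(phase)).sum(axis=1)
    return (c**2 + s**2) / len(t)

def classical_periodogram_exp(t, x, omega):
    """Same quantity from the exponential form of Eq. (1)."""
    phase = np.outer(np.asarray(omega, dtype=float), np.asarray(t, dtype=float))
    return np.abs((np.asarray(x) * np.exp(-1j * phase)).sum(axis=1)) ** 2 / len(t)

# Evenly sampled case: compare with the FFT at the Fourier frequencies.
rng = np.random.default_rng(1)
N0, dt = 64, 0.5
t_even = dt * np.arange(N0)
x_even = rng.normal(size=N0)
omega_k = 2 * np.pi * np.arange(N0) / (N0 * dt)
p_direct = classical_periodogram(t_even, x_even, omega_k)
p_fft = np.abs(np.fft.fft(x_even)) ** 2 / N0
```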


The Lomb-Scargle periodogram is given by

$$P\_X(\omega) = \frac{1}{2} \left\{ \frac{\left[\sum\_j x\_j \cos \omega (t\_j - \tau)\right]^2}{\sum\_j \cos^2 \omega (t\_j - \tau)} + \frac{\left[\sum\_j x\_j \sin \omega (t\_j - \tau)\right]^2}{\sum\_j \sin^2 \omega (t\_j - \tau)} \right\} \tag{3}$$

where *τ* is given by

$$\tan(2\omega\tau) = \frac{\sum\_{j} \sin 2\omega t\_j}{\sum\_{j} \cos 2\omega t\_j} \tag{4}$$

With this formulation, the periodogram of uncorrelated irregularly spaced Gaussian noise is also exponentially distributed (sum of squares of two zero-mean Gaussian variables).
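A minimal NumPy sketch of Eqs. (3) and (4) follows. It computes $\tau$ with the two-argument arctangent, an implementation choice that selects one consistent solution of Eq. (4), and then evaluates the two separately normalized terms. It assumes mean-subtracted data (a common preprocessing convention, not stated in the text) and angular frequencies $\omega > 0$; the name `lomb_scargle` is illustrative. The checks exercise two of Lomb and Scargle's design goals: invariance to a global time shift, and reduction to the classical periodogram of Eq. (2) for evenly spaced samples at the Fourier frequencies.

```python
import numpy as np

def lomb_scargle(t, x, omega):
    """Lomb-Scargle periodogram, Eqs. (3)-(4), for angular frequencies omega > 0.

    Assumes x has (approximately) zero mean; subtract the mean beforehand.
    """
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    omega = np.atleast_1d(np.asarray(omega, dtype=float))
    power = np.empty_like(omega)
    for i, w in enumerate(omega):
        # Eq. (4): tau makes sum_j sin w(t_j - tau) cos w(t_j - tau) vanish.
        tau = np.arctan2(np.sum(np.sin(2 * w * t)), np.sum(np.cos(2 * w * t))) / (2 * w)
        c = np.cos(w * (t - tau))
        s = np.sin(w * (t - tau))
        # Eq. (3): each squared projection divided by its own normalization.
        power[i] = 0.5 * ((x @ c) ** 2 / (c @ c) + (x @ s) ** 2 / (s @ s))
    return power

# Usage on an irregularly sampled noisy sinusoid (assumed values).
rng = np.random.default_rng(2)
t = np.sort(rng.uniform(0, 50, 120))
x = np.sin(0.8 * t) + 0.5 * rng.normal(size=t.size)
x -= x.mean()
omega = np.linspace(0.05, 3.0, 600)
p = lomb_scargle(t, x, omega)
best = omega[np.argmax(p)]   # should lie near the injected 0.8
```

When the sums in Eq. (4) both vanish (as at the Fourier frequencies of an evenly sampled series), any $\tau$ satisfies the condition and the power does not depend on which one is picked, so the arctangent branch is harmless there.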

### *3.1.1 The least-squares periodogram and its extensions*

The Lomb-Scargle periodogram is equivalent to a least-squares fit of a sinusoidal model to the data at each frequency $\omega$: the Lomb-Scargle power at $\omega$ is related to the goodness-of-fit $\chi^2(\omega)$ at that frequency.

Let us consider a sinusoidal model at the frequency $\omega$,

$$y(t; \omega) = A\_{\omega} \sin \left( \omega (t - \phi\_{\omega}) \right) \tag{5}$$

The $\chi^2$ goodness-of-fit can be defined as

$$\chi^2(\omega) \equiv \sum\_{j} \left( y\_j - y(t\_j; \omega) \right)^2 \tag{6}$$

We can find the "best" model $\hat{y}(t; \omega)$ by minimizing $\chi^2$. Let $\hat{\chi}^2$ be the minimum and $(\hat{A}\_{\omega}, \hat{\phi}\_{\omega})$ the optimal parameter values; then we can write

$$P\_{LS}(\omega) \sim \hat{A}\_{\omega}^{2} \tag{7}$$
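The least-squares reading can be checked numerically. The sketch fits $a\cos\omega t + b\sin\omega t$, an equivalent linear parameterization of Eq. (5) with $A\_\omega^2 = a^2 + b^2$, by ordinary least squares. For mean-subtracted data and the normalization of Eq. (3), the Lomb-Scargle power should equal half the reduction in $\chi^2$ relative to the no-signal model, $P(\omega) = [\chi^2\_0 - \hat{\chi}^2(\omega)]/2$ with $\chi^2\_0 = \sum\_j y\_j^2$, because the shifted basis of Eq. (3) spans the same two-dimensional space and is orthogonal. The helper names are illustrative, and `ls_power` re-implements Eq. (3) so that the block runs on its own.

```python
import numpy as np

def ls_power(t, y, w):
    """Lomb-Scargle power of Eq. (3) at a single angular frequency w (mean-subtracted y)."""
    tau = np.arctan2(np.sin(2 * w * t).sum(), np.cos(2 * w * t).sum()) / (2 * w)
    c, s = np.cos(w * (t - tau)), np.sin(w * (t - tau))
    return 0.5 * ((y @ c) ** 2 / (c @ c) + (y @ s) ** 2 / (s @ s))

def least_squares_fit(t, y, w):
    """Fit y ~ a cos(wt) + b sin(wt); return (A_hat^2, chi2_hat) in the sense of Eqs. (5)-(6)."""
    M = np.column_stack([np.cos(w * t), np.sin(w * t)])
    (a, b), *_ = np.linalg.lstsq(M, y, rcond=None)
    chi2_hat = np.sum((y - M @ np.array([a, b])) ** 2)
    return a**2 + b**2, chi2_hat

# Usage with assumed synthetic data.
rng = np.random.default_rng(4)
t = np.sort(rng.uniform(0, 40, 90))
y = 1.3 * np.sin(1.1 * (t - 0.4)) + 0.4 * rng.normal(size=t.size)
y -= y.mean()
w = 1.1
A2_hat, chi2_hat = least_squares_fit(t, y, w)
chi2_0 = np.sum(y**2)                    # chi^2 of the "no sinusoid" model
lhs, rhs = ls_power(t, y, w), 0.5 * (chi2_0 - chi2_hat)   # equal up to rounding
```

For evenly spaced samples at a Fourier frequency, the denominators equal $N\_0/2$ and the relation of Eq. (7) becomes exact, $P(\omega) = (N\_0/4)\,\hat{A}\_\omega^2$; for irregular sampling it holds only approximately.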


For data sets with errors, we can consider introducing the errors into the periodogram. Vio and others [16, 17] studied a more general model that includes an $N \times N$ error covariance matrix $\Sigma$ for $N$ observations in time:

$$\chi^2(\omega) = \left(\vec{y} - \vec{y}\_{model}\right)^{T} \Sigma^{-1} \left(\vec{y} - \vec{y}\_{model}\right) \tag{8}$$

In the case of uncorrelated zero-mean Gaussian noise, $\Sigma$ is diagonal and this expression reduces to

$$\chi^2(\omega) = \sum\_{j} \frac{\left( y\_j - y(t\_j; \omega) \right)^2}{\sigma\_j^2} \tag{9}$$

where $\sigma\_j^2$ are the variances of the Gaussian errors.

For practical applications, there are additional issues to consider when introducing data errors into the periodogram calculations, such as unaccounted-for uncertainties in the error estimates, correlated noise, and the dependence on the signal slope. All of this makes the use of error estimates not very advisable [13].
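A sketch of Eq. (8) for a sinusoidal model: given a covariance $\Sigma$, the amplitude and phase can be obtained by generalized least squares and the resulting $\chi^2$ evaluated at each frequency. Everything here is an illustrative assumption rather than the formulation of [16, 17]: the helper name `gls_chi2`, the two-parameter model $a\cos\omega t + b\sin\omega t$, and the use of a Cholesky factor to whiten the residuals. With a diagonal $\Sigma$, the quadratic form is a sum of squared residuals each divided by its variance $\sigma\_j^2$, which the usage example checks.

```python
import numpy as np

def gls_chi2(t, y, w, cov):
    """Minimum chi^2 of Eq. (8) for y_model = a cos(wt) + b sin(wt), given covariance `cov`."""
    L = np.linalg.cholesky(cov)                 # cov = L @ L.T
    M = np.column_stack([np.cos(w * t), np.sin(w * t)])
    # Whitening: with z = L^-1 v, z.T z equals v.T cov^-1 v.
    Mw = np.linalg.solve(L, M)
    yw = np.linalg.solve(L, y)
    coef, *_ = np.linalg.lstsq(Mw, yw, rcond=None)
    r = y - M @ coef                            # residuals in the original units
    return r @ np.linalg.solve(cov, r), coef

# Usage with assumed per-point error bars.
rng = np.random.default_rng(7)
t = np.sort(rng.uniform(0, 30, 60))
sigma = rng.uniform(0.2, 0.6, t.size)
y = np.sin(0.9 * t) + sigma * rng.normal(size=t.size)
chi2_diag, coef = gls_chi2(t, y, 0.9, np.diag(sigma**2))
M = np.column_stack([np.cos(0.9 * t), np.sin(0.9 * t)])
chi2_sum = np.sum((y - M @ coef) ** 2 / sigma**2)   # diagonal case, term by term
```

In line with the caution above, this only helps when $\Sigma$ is trustworthy; a mis-specified covariance reweights the data in ways that can create or suppress peaks.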

### **3.2 Periodograms and significance**

The Lomb-Scargle periodogram keeps most of the optimal analytical properties of the Fourier transform and its power spectrum:

• Linearity;

• The periodogram of a pure sinusoid at $\omega\_0$ is a sum of Dirac delta functions at $\pm\omega\_0$;

• The periodogram, just like the power spectrum, is insensitive to translations in time (these only affect the phase spectrum);

• It is a real-valued even function (which is why we only calculate its positive part);

• The transform of a Gaussian is another (different) Gaussian;

• The "Heisenberg uncertainty principle" of Fourier transforms (familiar from quantum mechanics) applies: a narrow feature in time becomes a broad peak in frequency, and vice versa.

### *3.2.1 Spectral windows*

The observed signal is usually described as the pointwise product of the underlying infinite periodic signal with a rectangular window function whose length is the time series duration $T$.

In the Fourier transform, windowing replaces each Dirac delta function at some frequency $\omega_i$ with a *sinc* function centered at that same frequency $\omega_i$: by the convolution theorem, the transform of the product of the signal and the window is the convolution of their transforms. The width of this *sinc* follows directly from the inverse relationship between the width of the time window and the width of its Fourier transform.

An infinite $\sin(\omega_i t)$ function runs from $-\infty$ to $+\infty$, and its Fourier transform consists of two delta functions at $\pm\omega_i$. The finite, windowed version of the signal has a broader transform, the *sinc* function. Besides having a broader and lower *central peak* (a delta function has infinite height), the *sinc* function also has *side lobes*, which spread the power at the frequency $\omega_i$ to adjacent frequencies. This phenomenon is called *spectral leakage*, and it becomes more pronounced the shorter the duration of the time series (the narrower the window).

There is another essential aspect of spectral windows: their smoothness. The more abrupt (less smooth) the window, the more power leaks from the central peak into the side lobes of the Fourier transform. Instead of a rectangular function, we can use a smoother function, such as a sine-bell window, and the resulting spectrum will exhibit much less leakage, at the cost of a somewhat broader central peak.
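A quick numerical check of this trade-off, offered as a minimal sketch: it compares the highest side lobe of a rectangular window with that of a Hann (sine-squared) taper, used here as one example of a smooth sine-bell window. The window choice, its length, the zero-padding factor and the helper name `peak_sidelobe_db` are our assumptions, not values from the text; the spectra come from NumPy's zero-padded `numpy.fft.rfft`.

```python
import numpy as np

def peak_sidelobe_db(window, pad=64):
    """Height of the highest side lobe relative to the central peak, in dB.

    The window's power spectrum is sampled finely by zero-padding the FFT
    to `pad` times the window length.
    """
    n = len(window)
    spec = np.abs(np.fft.rfft(window, n * pad)) ** 2
    spec = spec / spec[0]                      # central peak (f = 0) normalized to 1
    # The central lobe ends at the first local minimum of the spectrum.
    first_min = int(np.argmax(np.diff(spec) > 0))
    return 10.0 * np.log10(spec[first_min:].max())

n = 256
rect_db = peak_sidelobe_db(np.ones(n))         # abrupt rectangular window
hann_db = peak_sidelobe_db(np.hanning(n))      # smooth sine-squared taper

print(f"rectangular: {rect_db:.1f} dB, Hann: {hann_db:.1f} dB")
```

The rectangular window should give a first side lobe near -13 dB, while the smooth taper pushes it down to roughly -31 dB, which is the reduced leakage described above.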

Although we have framed this discussion in terms of the power spectrum (the squared modulus of the Fourier transform), it applies equally to its actual estimate from data, the periodogram.
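The broadening of a peak by the finite window, with a width of order $1/T$, carries over to the periodogram of irregularly sampled data. The sketch below is a minimal NumPy implementation of the classical Lomb-Scargle formula (with Lomb's time offset $\tau$), not the LST periodogram discussed in this chapter; the helper names, the random uniform sampling times and the noise-free test signal are illustrative assumptions. It measures the full width at half maximum (FWHM) of the peak for two durations $T$.

```python
import numpy as np

def lomb_scargle(t, y, freqs):
    """Classical (unnormalized) Lomb-Scargle periodogram.

    `freqs` are in cycles per unit time; the data are centered first.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float) - np.mean(y)
    w = 2.0 * np.pi * np.asarray(freqs, dtype=float)[:, None]   # angular frequencies, shape (F, 1)
    # Time offset tau that decouples the sine and cosine terms: tan(2 w tau) = S / C.
    tau = np.arctan2(np.sin(2 * w * t).sum(axis=1),
                     np.cos(2 * w * t).sum(axis=1)) / (2 * w[:, 0])
    arg = w * (t - tau[:, None])
    c, s = np.cos(arg), np.sin(arg)
    return 0.5 * ((c @ y) ** 2 / (c ** 2).sum(axis=1)
                  + (s @ y) ** 2 / (s ** 2).sum(axis=1))

def peak_and_fwhm(T, f0=1.0, n=200, seed=0):
    """Peak frequency and FWHM for a sinusoid sampled at random times in [0, T]."""
    rng = np.random.default_rng(seed)
    t = np.sort(rng.uniform(0.0, T, n))            # irregular sampling times
    y = np.sin(2 * np.pi * f0 * t)
    freqs = f0 + np.linspace(-1.0 / T, 1.0 / T, 2001)
    p = lomb_scargle(t, y, freqs)
    above = freqs[p >= p.max() / 2]                # grid points above half maximum
    return freqs[np.argmax(p)], above.max() - above.min()

for T in (10.0, 40.0):
    peak, fwhm = peak_and_fwhm(T)
    print(f"T = {T:4.1f}: peak at {peak:.4f}, FWHM = {fwhm:.4f}, FWHM * T = {fwhm * T:.2f}")
```

Quadrupling the duration should shrink the peak width by roughly a factor of four, so that FWHM times $T$ stays close to the value for a *sinc*-squared main lobe, even though the sampling is irregular.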

### *3.2.2 Frequencies for periodogram analysis*

There are three parameters to consider when choosing the appropriate frequency grid for periodogram analysis of a particular time series:

1. The minimum frequency, $f_{min}$;

2. The maximum frequency, $f_{max}$;

3. The frequency spacing, $\Delta f$.

Recommendations for the choice of each of these parameters vary in the literature. Here we discuss some points to consider for both regularly and irregularly sampled time series.

**Minimum frequency:** The minimum frequency, $f_{min}$, is the easiest to define. It relates to the longest period of a wave that we can investigate in the time series. We usually set it to $1/T$, the inverse of the length of the time series, or to zero, in which case the lowest nonzero frequency of the grid effectively equals the frequency spacing, $\Delta f$.

**Maximum frequency:** The maximum frequency, $f_{max}$, corresponds to the shortest period of a wave that we can investigate in the time series. For evenly sampled time series, the *Nyquist theorem*, or *sampling theorem*, defines this maximum frequency, called the *Nyquist frequency*, $f_{Nyquist}$.

This theorem states that if a function is regularly sampled at a rate $f_\delta = 1/\delta t$, we can only recover its full frequency information if the signal is *band-limited* between frequencies $\pm f_\delta/2$.

Put another way, the theorem says that to fully represent the content of a band-limited signal whose Fourier transform is zero outside the range $\pm B$, we must sample the signal at a rate of at least $f_\delta = 2B$. Then, for evenly sampled time series, $f_{max} = f_{Nyquist} = f_\delta/2$.
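A minimal sketch of the aliasing implied by the sampling theorem, assuming an evenly sampled series with an arbitrary interval $\delta t = 0.1$ (so $f_\delta = 10$ and $f_{Nyquist} = 5$): a cosine at $f = 7$, above the Nyquist frequency, produces exactly the same samples as one at $|7 - f_\delta| = 3$, and an FFT-based periodogram places the peak at that alias. All numerical values are illustrative choices.

```python
import numpy as np

dt = 0.1                          # sampling interval (arbitrary time units)
f_delta = 1.0 / dt                # sampling rate, 10
f_nyquist = f_delta / 2.0         # Nyquist frequency, 5

t = np.arange(100) * dt           # 100 evenly spaced samples, duration T = 10
f_true = 7.0                      # true frequency, above the Nyquist frequency
f_alias = abs(f_true - f_delta)   # folded ("aliased") frequency, 3

y_true = np.cos(2 * np.pi * f_true * t)
y_alias = np.cos(2 * np.pi * f_alias * t)

# The two signals coincide at every sampling time.
same_samples = np.allclose(y_true, y_alias)

# Periodogram of the sampled f = 7 signal on the frequencies 0 ... f_nyquist.
freqs = np.fft.rfftfreq(len(t), d=dt)
power = np.abs(np.fft.rfft(y_true)) ** 2 / len(t)
f_peak = freqs[np.argmax(power)]

print(same_samples, round(f_peak, 6))   # True 3.0
```

Nothing in the samples distinguishes the true frequency from its alias; only prior knowledge that the signal is band-limited below $f_{Nyquist}$ resolves the ambiguity.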

**Frequency spacing:** There are only general guidelines for the frequency spacing, $\Delta f$. Too small a spacing leads to unnecessarily long computation times, which add up quickly for large data sets, while too coarse a spacing risks missing narrow peaks in the periodogram, which would fall between adjacent grid points. Moreover, there is controversy over whether these grid frequencies can be treated as independent points when applying statistical significance tests to the periodogram ordinates (testing for true periodicities).
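The three parameters can be combined into a uniform trial-frequency grid. The helper below is hypothetical and only a sketch: it assumes $f_{min} = 1/T$ by default and a spacing $\Delta f = 1/(n_0 T)$, that is, the natural peak width $1/T$ divided by an oversampling factor $n_0$. This oversampling rule is a common heuristic in the Lomb-Scargle literature, not a prescription from this chapter, and $f_{max}$ must be supplied by the user (for example, a Nyquist or Nyquist-like limit).

```python
import numpy as np

def frequency_grid(t, f_max, oversampling=5, f_min=None):
    """Uniform trial-frequency grid for periodogram analysis (hypothetical helper).

    t            : sampling times (any order, possibly irregular)
    f_max        : highest frequency to test, chosen by the user
    oversampling : number of grid points per natural peak width 1/T
    f_min        : lowest frequency; defaults to 1/T
    """
    t = np.asarray(t, dtype=float)
    T = t.max() - t.min()                  # duration of the time series
    if f_min is None:
        f_min = 1.0 / T
    df = 1.0 / (oversampling * T)          # frequency spacing
    # Small tolerance so that f_max is kept when (f_max - f_min) / df is an integer.
    n_freq = int(np.floor((f_max - f_min) / df + 1e-9)) + 1
    return f_min + df * np.arange(n_freq)

t = np.array([0.0, 3.0, 7.5, 20.0, 55.0, 100.0])   # irregular times, T = 100
grid = frequency_grid(t, f_max=0.5)
print(len(grid), grid[0], grid[1] - grid[0], grid[-1])
```

Raising `oversampling` reduces the risk of missing a narrow peak at the price of more evaluations; it does not make the grid points statistically independent.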

An evenly sampled time series is the pointwise product of the original continuous signal with a sequence of Dirac delta functions (a *Dirac comb*) at the sampling times. The Nyquist limit is a direct consequence of the regularity of this Dirac comb window: its Fourier transform is another Dirac comb, with spacing $f_\delta = 1/\delta t$, so the spectrum of the sampled series is a periodic repetition of the original spectrum. That is why the periodogram is unique only between the limits $\pm f_{Nyquist}$. The power that appears beyond the Nyquist limit is called *aliasing*, since these peaks are not real but "aliases" of the real power inside the Nyquist interval of the original signal.

For unevenly sampled time series:

For irregularly sampled time series, if there is a *periodic pattern in the gaps between observation times*, it can produce a periodogram peak that suggests a periodicity. For example, consider the daily pattern of measurements in astronomy: an observation at time $t_0$ is likely to be followed by other observations only at times $t_0 + np$ ($p$ is an integer number of days, and $n$ is an integer). Such sampling can therefore generate a peak at the frequency $1/p$ in the periodogram.

We find in the literature some proposals for a maximum-frequency (Nyquist-like) limit for irregular sampling [15, 21–26]. These estimates are easy to calculate and reduce to the Nyquist frequency limit in the evenly sampled case:

There are also, in the literature, some Nyquist-like limits based on not-so-simple statistics of the time intervals [15]:

continuous signals, around the sampling times $t_j$. This kind of sampling is typical in several applications, including cyclostratigraphy. In that case, again, the Fourier transform is the product of the original signal transform and the transform of the time window, which has a width proportional to $1/\delta t$. Then, $f_{max} = f_{Nyquist} = 1/(2\delta t)$. However, in this case, this frequency limit does not imply aliasing. Instead, it is a frequency limit beyond which all signal is *attenuated to zero*.

• Frequency limit based on *a priori* knowledge of the expected signals.

Finally, for irregularly sampled time series, the maximum frequency limit can be set by the *precision of time measurements*.

### *3.2.3 Statistics of the periodogram*

The classical periodogram has a fundamental statistical property for evenly sampled time series: when the signal consists solely of pure Gaussian noise, the values of the periodogram are exponentially distributed. For irregularly sampled times, this property no longer holds.

Scargle's generalized form of the periodogram restores this statistical simplicity for the irregularly sampled case: for time series consisting solely of pure Gaussian noise, the unnormalized periodogram has exponentially distributed ordinates.

This statistical property is used to test for what would be a "true" periodicity in the periodogram ordinates. The standard procedure, called the *Fisher criterion*, is to assume that the maximum ordinate of the periodogram represents a true periodicity and to test this value against all other ordinates, which supposedly arise from the background noise.

Scargle defined a *False Alarm Probability (FAP)* that, based on the assumed Gaussian noise distribution, simply measures the probability that a time series without any signal would produce, through stochastic fluctuations alone, an ordinate of the observed magnitude in the periodogram. Following Scargle [15], the *detection threshold*, $z_0$, is the magnitude level above which, if we claim that a peak is due to a real signal, we would be wrong only a small fraction $p_0$ (the FAP) of the time:

$$z_0 = -\ln\left[1 - \left(1 - p_0\right)^{1/N}\right] \tag{10}$$

where $p_0$ (FAP) is a small number, and $N$ is the number of *independent* frequencies tested.

It is worth noting that this statistical analysis answers the question: "What is the probability that a time series without any periodic component would produce a peak of that magnitude in the periodogram?" It *does not* answer the far more direct and physically significant question: *"What is the probability that this periodogram feature comes from a periodic phenomenon?"*

The ability to analytically quantify the relationship between peak height and the statistical significance of a feature in the periodogram has been one of the main reasons for the widespread use of the Lomb-Scargle periodogram [10–13, 23–25]. However, the independence of the tested frequencies remains an open issue.

Data quality (and quantity) generally affects the peak height relative to the background noise, which determines the peak *significance*, as discussed above. Neither the number of points in a time series nor the signal-to-noise ratio affects the peak frequency determination or its *precision*. The uncertainty in the frequency value of
