### **4. Results of calculations using the algorithm presented in Section 3**

We test the algorithms discussed above on real data obtained by the ONSA station, which is part of the IGS network [2]. These data are included in the distribution kit of the installation software package [3] and are available for use. To check the efficiency of the algorithm described in Section 3, we consider measurement data received from the global positioning system (GPS) satellite with system number PRN = 12 on day 207 of 2010. **Figure 2** plots the values of the Melbourne-Wübbena combination over a time interval of 89.5 min (*N* = 180). The horizontal axis shows the index numbers *j* of the time epochs, counted from the beginning of the 24-h period at a 30-second interval. The vertical axis shows the values $y_j$ of the combination, expressed in cycles of wavelength $\lambda_5 \approx 0.86$ m [3]. **Figure 3** shows the deviations from the mean of the data cleared of outliers using the algorithms described in Sections 2 and 3. In both cases, $\sigma_{\max} = 0.6$ and MINOBS = 10. The vertical axes show the values $y_j - z$ in cycles of wavelength $\lambda_5$, and the horizontal axes show the index numbers *j* of the epochs.

**Figure 2.** *Melbourne-Wübbena combination for ONSA station (GPS satellite, PRN = 12 for 2010, day 207).*

**Figure 3.**

*(a) Deviations of values of the Melbourne-Wübbena combination from the mean value after data cleaning from outliers using the algorithm described in Section 2. (b) Deviations of values of the Melbourne-Wübbena combination from the mean value after data cleaning from outliers using the developed algorithm (see Section 3).*

Epochs in which the data were rejected are marked by white circles. In the first case (see **Figure 3a**), 47 measurements were rejected, which is 26.1% of the total amount of data. In the second case (see **Figure 3b**), only 14 measurements were rejected (7.8%), which is about 18 percentage points fewer than in the first calculation.

We also provide similar results for data obtained by the TLSE station, which is also part of the IGS network. We consider measurement data from the GLONASS (Russia) satellite with system number PRN = 1 on day 207 of 2010. **Figure 4** shows the values of the Melbourne-Wübbena combination over a time interval of 65.5 min (*N* = 132). **Figure 5** plots the deviations from the mean value of the data cleared of outliers using the algorithms described in Sections 2 and 3, respectively. The parameters $\sigma_{\max}$ and MINOBS are the same as in the previous example. In the first case (see **Figure 5a**), 41 measurements were discarded, which is 31% of the total amount of data. In the second case (see **Figure 5b**), 8 measurements were rejected (6%), which is 25 percentage points fewer than in the first calculation.
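The figures quoted for both stations can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check and not part of the algorithms; the helper names `span_minutes` and `rejection_summary` are hypothetical, and it assumes that the time interval is measured between the first and the last of *N* epochs spaced 30 s apart.

```python
# Values reported in the text:
# (station, number of epochs N, rejected by Section 2 algorithm, rejected by Section 3 algorithm)
cases = [("ONSA", 180, 47, 14), ("TLSE", 132, 41, 8)]


def span_minutes(n_epochs, step_seconds=30):
    """Time between the first and last epoch, in minutes (assumed convention)."""
    return (n_epochs - 1) * step_seconds / 60.0


def rejection_summary(n_total, n_old, n_new):
    """Return both rejection rates in percent and their difference in percentage points."""
    rate_old = 100.0 * n_old / n_total
    rate_new = 100.0 * n_new / n_total
    return rate_old, rate_new, rate_old - rate_new


for station, n_total, n_old, n_new in cases:
    old, new, diff = rejection_summary(n_total, n_old, n_new)
    print(f"{station}: {span_minutes(n_total):.1f} min, "
          f"{old:.1f}% vs {new:.1f}% rejected, "
          f"difference {diff:.1f} percentage points")
```

Under these assumptions the sketch reproduces the intervals of 89.5 and 65.5 min and the rounded rejection rates given in the text.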

*Effective Algorithms for Detection Outliers and Cycle Slip Repair in GNSS Data Measurements DOI: http://dx.doi.org/10.5772/intechopen.92658*

**Figure 4.**

*Melbourne-Wübbena combination for TLSE station (GLONASS satellite, PRN = 1 for 2010, day 207).*

**Figure 5.**

*(a) Deviations of values of the Melbourne-Wübbena combination from the mean value after data cleaning from outliers using the algorithm described in Section 2. (b) Deviations of values of the Melbourne-Wübbena combination from the mean value after data cleaning from outliers using the developed algorithm (see Section 3).*

### **5. Mathematical prerequisites for modifying the existing algorithm**

Note that the number of arithmetic operations required to find the optimal solution with the algorithm described in Section 3 depends on $L_{\max}$, the length of the solution. As can be seen from Eq. (24), the shorter the solution found (i.e., the larger the number of detected outliers), the more arithmetic operations are required to find it. This number is estimated to be of order $N + (N - L_{\max})^2$, where the expression in parentheses equals the number of outliers detected. Thus, if the number of outliers in the original data series is comparable to $N$, it takes $\sim N^2$ arithmetic operations to find the optimal solution. In particular, $\sim N^2$ arithmetic operations are also required to make certain that there is no solution (e.g., when the data contain a trend). In Section 6, we modify the existing algorithm and describe a fast outlier search algorithm that requires $\sim N \log_2 N$ arithmetic operations.

The necessary preparations are given in this section. Note that in this and the next section we deal with the sequence $\{y_j\}_{j=1}^{N}$ arranged in ascending order.

**Assertion 2.** *Let* $\{y_j\}_{j=1}^{N}$ *be a monotonically increasing sequence. Then the following inequality holds:*

$$
\sigma^2(\mathbf{k}; \mathbf{L} + \mathbf{1}) \ge \min \left\{ \sigma^2(\mathbf{k}; \mathbf{L}), \ \sigma^2(\mathbf{k} + \mathbf{1}; \mathbf{L}) \right\}. \tag{25}
$$

**Proof.** From the monotonicity of the sequence $y_j$ and the definition of $z(k, L+1)$ [see Eq. (13)], it follows that

$$y_k \le z(k, L+1) \le y_{k+L}. \tag{26}$$

One of two cases is possible:

$$\begin{aligned} \text{a. } 2\mathbf{z}(\mathbf{k}, \mathbf{L} + \mathbf{1}) &\leq \mathbf{y}\_{\mathbf{k} + \mathbf{L}} + \mathbf{y}\_{\mathbf{k}}, \Rightarrow \mathbf{z}(\mathbf{k}, \mathbf{L} + \mathbf{1}) - \mathbf{y}\_{\mathbf{k}} \leq \mathbf{y}\_{\mathbf{k} + \mathbf{L}} - \mathbf{z}(\mathbf{k}, \mathbf{L} + \mathbf{1}), \\\\ \text{b. } 2\mathbf{z}(\mathbf{k}, \mathbf{L} + \mathbf{1}) &> \mathbf{y}\_{\mathbf{k} + \mathbf{L}} + \mathbf{y}\_{\mathbf{k}}, \Rightarrow \mathbf{z}(\mathbf{k}, \mathbf{L} + \mathbf{1}) - \mathbf{y}\_{\mathbf{k}} > \mathbf{y}\_{\mathbf{k} + \mathbf{L}} - \mathbf{z}(\mathbf{k}, \mathbf{L} + \mathbf{1}). \end{aligned}$$

Suppose, for example, the case (a) holds. Let us show that in this case

$$
\sigma^2(\mathbf{k}; \mathbf{L} + \mathbf{1}) \ge \sigma^2(\mathbf{k}; \mathbf{L}).\tag{27}
$$

At first, we will show that:

$$|\mathbf{y}\_{j} - \mathbf{z}(\mathbf{k}, \mathbf{L} + \mathbf{1})| \le \mathbf{y}\_{\mathbf{k} + \mathbf{L}} - \mathbf{z}(\mathbf{k}, \mathbf{L} + \mathbf{1}); j = k, \dots, k + L. \tag{28}$$

Indeed, the inequalities

$$y_k \le y_j \le y_{k+L}, \quad j = k, \dots, k+L,$$

and the inequality derived in case (a) imply:

$$z(k, L+1) - y_j \le z(k, L+1) - y_k \le y_{k+L} - z(k, L+1), \tag{29}$$


$$\mathbf{y}\_{\mathbf{j}} - \mathbf{z}(\mathbf{k}, \mathbf{L} + \mathbf{1}) \le \mathbf{y}\_{\mathbf{k} + \mathbf{L}} - \mathbf{z}(\mathbf{k}, \mathbf{L} + \mathbf{1}).\tag{30}$$

These inequalities, in turn, imply Eq. (28). Next let us prove (27). This inequality is expanded as follows:

$$\frac{1}{L} \sum\_{\mathbf{j=k}}^{k+L} \left(\mathbf{y}\_{\mathbf{j}} - \mathbf{z}(\mathbf{k}; \mathbf{L} + \mathbf{1})\right)^2 \geq \frac{\mathbf{1}}{L - \mathbf{1}} \sum\_{\mathbf{j=k}}^{k+L-1} \left(\mathbf{y}\_{\mathbf{j}} - \mathbf{z}(\mathbf{k}; \mathbf{L})\right)^2.$$

Substituting expression (17) for $z(k; L)$, that is, $z(k; L) = z + (z - y_{k+L})/L$, where for brevity we write $z$ instead of $z(k; L+1)$, we get the inequality:

$$(\mathbf{L} - \mathbf{1}) \sum\_{\mathbf{j=k}}^{\mathbf{k+L}} \left(\mathbf{y}\_{\mathbf{j}} - \mathbf{z}\right)^{2} \ge \mathbf{L} \sum\_{\mathbf{j=k}}^{\mathbf{k+L-1}} \left(\mathbf{y}\_{\mathbf{j}} - \mathbf{z} - \frac{\mathbf{z} - \mathbf{y}\_{\mathbf{k+L}}}{\mathbf{L}}\right)^{2}.\tag{31}$$

Transform the right-hand side of this inequality:

$$\begin{split} \text{RHS}(31) &= \text{L} \sum\_{j=\mathbf{k}}^{\mathbf{k}+\mathbf{L}} \left( \mathbf{y}\_{j} - \mathbf{z} + \frac{\mathbf{y}\_{\mathbf{k}+\mathbf{L}} - \mathbf{z}}{\mathbf{L}} \right)^{2} - \frac{(\mathbf{L}+\mathbf{1})^{2}}{\mathbf{L}} \left( \mathbf{y}\_{\mathbf{k}+\mathbf{L}} - \mathbf{z} \right)^{2} \\ &= \text{L} \sum\_{j=\mathbf{k}}^{\mathbf{k}+\mathbf{L}} \left( \mathbf{y}\_{j} - \mathbf{z} \right)^{2} + \frac{(\mathbf{L}+\mathbf{1})}{\mathbf{L}} \left( \mathbf{y}\_{\mathbf{k}+\mathbf{L}} - \mathbf{z} \right)^{2} - \frac{(\mathbf{L}+\mathbf{1})^{2}}{\mathbf{L}} \left( \mathbf{y}\_{\mathbf{k}+\mathbf{L}} - \mathbf{z} \right)^{2}. \end{split}$$

Here we take into account the equality $\sum_{j=k}^{k+L} (y_j - z) = 0$. Next, we have:

$$\text{RHS}(\mathbf{31}) = \mathbf{L} \sum\_{\mathbf{j=k}}^{\mathbf{k+L}} \left( \mathbf{y}\_{\mathbf{j}} - \mathbf{z} \right)^2 - (\mathbf{L} + \mathbf{1}) \left( \mathbf{y}\_{\mathbf{k+L}} - \mathbf{z} \right)^2.$$

Substituting this expression in Eq. (31), we get inequality

$$\sum\_{\mathbf{j=k}}^{\mathbf{k+L}} \left(\mathbf{y}\_{\mathbf{j}} - \mathbf{z}\right)^2 \le \left(\mathbf{L} + \mathbf{1}\right) \left(\mathbf{y}\_{\mathbf{k+L}} - \mathbf{z}\right)^2,$$

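which is true because, by Eq. (28), each of the $L + 1$ terms on the left-hand side does not exceed $(y_{k+L} - z)^2$. Thus, Eq. (27) is proved for case (a). Case (b) is treated analogously and gives $\sigma^2(k; L+1) \ge \sigma^2(k+1; L)$, which completes the proof of Eq. (25).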
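Inequality (25) is easy to probe numerically. The following sketch is an illustration, not a proof. It assumes that $\sigma^2(k; L)$ is the sample variance of $y_k, \dots, y_{k+L-1}$ with divisor $L - 1$, as in the expanded form of Eq. (27); the function names and the 0-based index `k` are choices made here, not notation from the text.

```python
import random


def sigma2(y, k, L):
    """Sample variance of the window y[k], ..., y[k+L-1] (0-based k), divisor L - 1."""
    window = y[k:k + L]
    z = sum(window) / L                      # mean of the window, z(k, L)
    return sum((v - z) ** 2 for v in window) / (L - 1)


def check_assertion2(y, tol=1e-9):
    """Return True if inequality (25) holds for every admissible k and L on sorted y."""
    n = len(y)
    for L in range(2, n):                    # a window of length L + 1 must fit into y
        for k in range(0, n - L):
            lhs = sigma2(y, k, L + 1)
            rhs = min(sigma2(y, k, L), sigma2(y, k + 1, L))
            if lhs < rhs - tol:              # tolerance absorbs rounding errors
                return False
    return True


random.seed(0)
y = sorted(random.gauss(0.0, 1.0) for _ in range(40))   # ascending order, as the text requires
print(check_assertion2(y))
```

The check is exhaustive over all windows of the sample, so its cost grows roughly as the cube of the sample size; it is meant for small test sequences only.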

We introduce the notation:

$$
\sigma_{\min}^2(L) = \min_{1 \le k \le N - L + 1} \sigma^2(k, L). \tag{32}
$$

**Assertion 3.** *The following inequalities hold:*

$$
\sigma\_{\min}^2(\mathbf{N}) \ge \sigma\_{\min}^2(\mathbf{N} - \mathbf{1}) \ge \dots \ge \sigma\_{\min}^2(\mathbf{MINOBS}).\tag{33}
$$

*That is, the sequence* $\sigma^2_{\min}(L)$ *is monotonically non-increasing as L decreases from N to MINOBS.*

**Proof.** Assertion 2 and the definition of $\sigma^2_{\min}(L)$ in Eq. (32) imply:

$$
\sigma^2(\mathbf{k}, \mathbf{L} + \mathbf{1}) \ge \min \left\{ \sigma^2(\mathbf{k}, \mathbf{L}), \sigma^2(\mathbf{k} + \mathbf{1}, \mathbf{L}) \right\} \ge \sigma^2\_{\text{min}}(\mathbf{L}).
$$

Since $k$ is arbitrary, the following inequalities hold for all $L$ = MINOBS, …, $N - 1$:

$$
\sigma\_{\min}^2(\mathbf{L} + \mathbf{1}) \ge \sigma\_{\min}^2(\mathbf{L}),
$$

which proves Assertion 3.

Assertion 3 implies the following corollary.

**Corollary 1.** *If the inequality*

$$
\sigma_{\min}^2(L_0) > \sigma_{\max}^2 \tag{34}
$$

*holds for some* $L_0$*, then for a solution* $Y_L = \{y_k, y_{k+1}, \dots, y_{k+L-1}\}$ *of the problem expressed in Eqs. (3)–(7) to exist, it is necessary that the length L of the set* $Y_L$ *satisfies the condition* $L < L_0$*.*

**Proof.** Assume that $L \ge L_0$. By the monotonicity of $\sigma^2_{\min}(\cdot)$ expressed in Eq. (33), Assertion 3 implies the inequalities $\sigma^2_{\min}(L) \ge \sigma^2_{\min}(L_0) > \sigma^2_{\max}$ for all $L \ge L_0$. It follows that for any set $Y_L = \{y_k, y_{k+1}, \dots, y_{k+L-1}\}$ we have $\sigma^2(k, L) \ge \sigma^2_{\min}(L) > \sigma^2_{\max}$. Thus, no set $Y_L$ of length $L \ge L_0$ satisfies the condition in Eq. (15), and therefore none can be a solution of the problem expressed in Eqs. (3)–(7).

In particular, we arrive at the following important result: if the inequality $\sigma^2_{\min}(\text{MINOBS}) > \sigma^2_{\max}$ holds, then the problem described in Eqs. (3)–(7) has no solution.
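Assertion 3 and Corollary 1 can be illustrated by tabulating $\sigma^2_{\min}(L)$ directly from Eq. (32). The sketch below uses brute force and synthetic Gaussian data; the helper names, the random sample, and the use of $\sigma_{\max} = 0.6$ and MINOBS = 10 on this sample are assumptions made for illustration. It checks only condition (15), not the remaining conditions of problem (3)–(7).

```python
import random


def sigma2(y, k, L):
    """Sample variance of y[k], ..., y[k+L-1] (0-based k), divisor L - 1."""
    window = y[k:k + L]
    z = sum(window) / L
    return sum((v - z) ** 2 for v in window) / (L - 1)


def sigma2_min(y, L):
    """Smallest window variance over all windows of length L, as in Eq. (32)."""
    return min(sigma2(y, k, L) for k in range(len(y) - L + 1))


def variance_profile(y, minobs):
    """Map every length L in [minobs, N] to sigma2_min(L)."""
    return {L: sigma2_min(y, L) for L in range(minobs, len(y) + 1)}


random.seed(1)
y = sorted(random.gauss(0.0, 1.0) for _ in range(60))   # ascending order is required
profile = variance_profile(y, minobs=10)
lengths = sorted(profile)

# Assertion 3: the profile does not decrease as L grows.
is_monotone = all(profile[L] <= profile[L + 1] + 1e-12 for L in lengths[:-1])
print(is_monotone)

# Corollary 1: lengths that satisfy condition (15) form one block starting at MINOBS.
sigma_max = 0.6
feasible = [L for L in lengths if profile[L] <= sigma_max ** 2]
print(feasible[0] if feasible else None, feasible[-1] if feasible else None)
```

Because the admissible lengths form a contiguous block, the largest of them can be located by bisection instead of a linear scan, which is the idea developed in Section 6.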

In the above-described procedure for solving problem (3)–(7), it takes $\sim N^2$ arithmetic operations to make certain that no solution exists. Taking into account Assertion 3 and Corollary 1, the search procedure may begin by checking the conditions

$$
\sigma_{\min}^2(N) = \sigma^2(1, N) \le \sigma_{\max}^2 \quad \text{and} \quad \sigma_{\min}^2(\text{MINOBS}) \le \sigma_{\max}^2.
$$

This requires $\sim N$ arithmetic operations. If neither of these conditions is fulfilled, the search must stop because no solution exists. As a result, only $\sim N$ arithmetic operations are required to establish that there is no solution.
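A minimal sketch of this preliminary check is given below. It assumes sorted input and obtains all window variances of a fixed length in one pass using prefix sums, which stand in here for the recurrent formulas (17)–(22) that are not reproduced in this section. The names `window_variances` and `precheck` and the returned labels are hypothetical.

```python
from itertools import accumulate


def window_variances(y, L):
    """Variances sigma^2(k, L) of all windows of length L, computed in O(N)."""
    s1 = [0.0] + list(accumulate(y))                  # prefix sums of y_j
    s2 = [0.0] + list(accumulate(v * v for v in y))   # prefix sums of y_j^2
    out = []
    for k in range(len(y) - L + 1):
        total = s1[k + L] - s1[k]
        total_sq = s2[k + L] - s2[k]
        # max(0.0, ...) guards against tiny negative values caused by rounding
        out.append(max(0.0, (total_sq - total * total / L) / (L - 1)))
    return out


def precheck(y, sigma_max, minobs):
    """Classify sorted data using only the two conditions on lengths N and MINOBS."""
    n = len(y)
    if window_variances(y, n)[0] <= sigma_max ** 2:
        return "full set satisfies (15)"
    if min(window_variances(y, minobs)) > sigma_max ** 2:
        return "no solution"
    return "search needed"


print(precheck([0.0, 10.0, 20.0, 30.0, 40.0], sigma_max=0.6, minobs=3))   # data with a trend
print(precheck([0.0, 0.1, 0.2, 0.3, 50.0], sigma_max=0.6, minobs=3))      # one gross outlier
```

Prefix sums of squares can lose precision when the values are large compared with their spread, so subtracting a reference value (for example the median) before calling the functions is a reasonable precaution.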

### **6. Fast outlier search algorithm**

The search procedure proposed above consists of calculating the values of $z(k, L)$ and $\sigma^2(k, L)$ using the recurrent formulas (17)–(22) and checking, for every $k$ and $L$, whether the inequalities (11) and (12) are fulfilled. The complexity of the algorithm is estimated as $\sim N + N_{\text{Outlier}}^2$, where $N_{\text{Outlier}}$ is the number of outliers found. If it is known a priori that the measurement data contain few outliers, then the search algorithm for the optimal solution described in Section 3 can be applied. If the number of outliers in the measurement data is comparable to $N$, the complexity of that algorithm is about $N^2$. It is for precisely this type of data that we describe below a modified outlier search algorithm with complexity of about $N \log_2 N$.

First of all, we note one property that is key to the construction of a fast outlier search algorithm: if the inequality (15) holds for some set of length $L + 1$, then there exists a set of length $L$ for which the inequality (15) holds too. Indeed, assume that for some $k$ the inequality $\sigma^2(k, L+1) \le \sigma^2_{\max}$ holds. This inequality and Eq. (25) imply


$$\min \left\{ \sigma^2(\mathbf{k}; \mathbf{L}), \ \sigma^2(\mathbf{k} + \mathbf{1}; \mathbf{L}) \right\} \le \sigma\_{\max}^2.$$

From this, it follows that

$$
\sigma^2(k; L) \le \sigma_{\max}^2 \quad \text{and/or} \quad \sigma^2(k+1; L) \le \sigma_{\max}^2. \tag{35}
$$

This means that at least one of the sets $\{y_k, \dots, y_{k+L-1}\}$ and $\{y_{k+1}, \dots, y_{k+L}\}$ of length $L$ satisfies the condition expressed in Eq. (15).

However, this property does not hold for the conditions expressed in Eq. (16). In other words, if these conditions are fulfilled for some set of length $L + 1$, it may happen that no set of length $L$ satisfies them. This fact is a significant obstacle to speeding up outlier detection, which is necessary when processing a large amount of data with a large number of rough measurements. To overcome this obstacle, we weaken the condition expressed in Eq. (16).

First of all, note that if both conditions expressed in Eq. (16) are fulfilled for some set $\{y_k, \dots, y_{k+L-1}\}$, then the following inequality holds:

$$\mathbf{y\_{k+L-1}} - \mathbf{y\_k} \le \mathbf{6} \sigma\_{\text{max}}.\tag{36}$$

Consider a problem with condition expressed in Eq. (36) instead of conditions expressed in Eq. (16).

**Remark.** *Recall that in this condition L means the length of the set under checking, and k is the index of the smallest number included in the set. Although "k" and "L" are also encountered as indexes in the sets we use hereinafter, we hope nevertheless that this will not lead to confusion.*

It is easily seen that if the condition expressed in Eq. (36) holds for an arbitrary set $\{y_k, \dots, y_{k+L}\}$ of length $L + 1$, then it also holds for each of the sets $\{y_k, \dots, y_{k+L-1}\}$ and $\{y_{k+1}, \dots, y_{k+L}\}$ of length $L$. Indeed, condition (36) for the set $\{y_k, \dots, y_{k+L}\}$ means that $y_{k+L} - y_k \le 6\sigma_{\max}$, from which, due to the monotonicity of $y_j$, the inequalities $y_{k+L} - y_{k+1} \le 6\sigma_{\max}$ and $y_{k+L-1} - y_k \le 6\sigma_{\max}$ follow.

Thus, we have established the validity of the following assertion.

**Assertion 4.** *If the set* $\{y_k, \dots, y_{k+L}\}$ *of length L + 1 satisfies conditions (15) and (36), then at least one of the two sets* $\{y_k, \dots, y_{k+L-1}\}$ *or* $\{y_{k+1}, \dots, y_{k+L}\}$ *of length L also satisfies conditions (15) and (36).*
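Assertion 4 means that the set of lengths for which an admissible window exists is closed downward: once a length fails, every greater length fails too. The brute-force sketch below illustrates this on synthetic data. The function names, the sample with four artificial gross outliers, and the parameter values are assumptions made for this example.

```python
import random


def satisfies_15_36(y, k, L, sigma_max):
    """Check conditions (15) and (36) for the window y[k], ..., y[k+L-1] of sorted y."""
    window = y[k:k + L]
    z = sum(window) / L
    var = sum((v - z) ** 2 for v in window) / (L - 1)
    return var <= sigma_max ** 2 and window[-1] - window[0] <= 6.0 * sigma_max


def exists_window(y, L, sigma_max):
    """True if at least one window of length L satisfies both conditions."""
    return any(satisfies_15_36(y, k, L, sigma_max) for k in range(len(y) - L + 1))


random.seed(2)
# Mostly clean values plus a few gross outliers, sorted ascending as the text requires.
y = sorted([random.gauss(0.0, 0.3) for _ in range(50)] + [5.0, 7.5, -6.0, 9.0])
flags = [exists_window(y, L, sigma_max=0.6) for L in range(10, len(y) + 1)]

# Downward closure: a "+" at length L + 1 never follows a "-" at length L.
downward_closed = all(flags[i] or not flags[i + 1] for i in range(len(flags) - 1))
print(downward_closed)
```

This closure property is what allows the search over the length $L$ to proceed by halving the interval of candidate lengths.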

Based on this statement, we can formulate the following:

**Assertion 5.** *A solution of the problem (15) + (36) can be found in* $\sim N \log_2 N$ *arithmetic operations.*

**Proof.** Let us consider the sequence of steps.

Step 0: Consider the segment $[N^{(0)}_{\text{Left}}, N^{(0)}_{\text{Right}}]$ of the numerical axis, where $N^{(0)}_{\text{Left}}$ = MINOBS and $N^{(0)}_{\text{Right}} = N$. In this step, there is one set $\{y_1, \dots, y_N\}$ of length $N$. We check whether it fulfills the conditions expressed in Eqs. (15) and (36) with $k = 1$, $L = N$. If it does, this set is the solution, and the search stops. Otherwise, we pass to the $(N - \text{MINOBS} + 1)$ sets of length MINOBS and check each of them against the conditions expressed in Eqs. (15) and (36). If none of them satisfies these conditions, we stop the search and conclude that no solution exists. Otherwise, once we find a set of length MINOBS satisfying conditions (15) and (36), we proceed to Step 1.

**Figure 6.**

*Three possible cases for the proposed search. In case (a) we go to the right-hand side range (range for length of sets) to find a solution, in case (c) we go to the left-hand side range to find a solution, in case (b) we look for a solution with length* $L = N^{(k-1)}_{\text{Mid}}$ *and the search ends.*

Step 1: Step 1 is the same as Step k described below for k = 1 …

Step k: At the *k*th step ($k \ge 1$), we consider the segment $[N^{(k-1)}_{\text{Left}}, N^{(k-1)}_{\text{Right}}]$ and halve it, obtaining the two segments $[N^{(k-1)}_{\text{Left}}, N^{(k-1)}_{\text{Mid}}]$ and $[N^{(k-1)}_{\text{Mid}} + 1, N^{(k-1)}_{\text{Right}}]$, where $N^{(k-1)}_{\text{Mid}} = N^{(k-1)}_{\text{Left}} + \left[\left(N^{(k-1)}_{\text{Right}} - N^{(k-1)}_{\text{Left}}\right)/2\right]$ and $[\cdot]$ denotes the integer part of a number. Next, we check the sets of length $N^{(k-1)}_{\text{Mid}}$ and $N^{(k-1)}_{\text{Mid}} + 1$ for fulfillment of conditions (15) and (36). Three cases are possible, shown schematically in **Figure 6**. The sign "−" above a point means that none of the sets of the corresponding length satisfies conditions (15) + (36); the sign "+" means that at least one set of the corresponding length satisfies them. By Assertion 4, a "+" at length $N^{(k-1)}_{\text{Mid}} + 1$ implies a "+" at length $N^{(k-1)}_{\text{Mid}}$, so no other combination of signs can occur. In case (a), we set $N^{(k)}_{\text{Left}} = N^{(k-1)}_{\text{Mid}} + 1$, $N^{(k)}_{\text{Right}} = N^{(k-1)}_{\text{Right}}$ and proceed to the ($k$ + 1)-th step; in case (c), we set $N^{(k)}_{\text{Left}} = N^{(k-1)}_{\text{Left}}$, $N^{(k)}_{\text{Right}} = N^{(k-1)}_{\text{Mid}}$ and proceed to the ($k$ + 1)-th step; in case (b), we search for the solution of (15) + (36) with the minimal value of $\sigma^2(k, L)$ among the sets of length $L = N^{(k-1)}_{\text{Mid}}$, and the algorithm terminates.

The search continues until either case (b) occurs or the length of the segment $[N^{(k)}_{\text{Left}}, N^{(k)}_{\text{Right}}]$ becomes less than or equal to 1. In either case, the total number of steps does not exceed $\log_2(N - \text{MINOBS})$. Since $\sim N$ operations are performed in each step, the search is guaranteed to terminate within $\sim N \log_2 N$ arithmetic operations.
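The steps above can be sketched in code as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses a plain bisection that keeps one admissible length (`left`) and one inadmissible length (`right`) instead of the three-case test on two adjacent lengths, and it evaluates window variances with prefix sums rather than the recurrent formulas (17)–(22). The function names and the return convention are hypothetical.

```python
from itertools import accumulate


def best_window(y, L, sigma_max, s1, s2):
    """Return (variance, k) of the lowest-variance window of length L that meets
    conditions (15) and (36), or None if no such window exists. O(N) per call."""
    best = None
    for k in range(len(y) - L + 1):
        if y[k + L - 1] - y[k] > 6.0 * sigma_max:            # condition (36) fails
            continue
        total = s1[k + L] - s1[k]
        var = max(0.0, (s2[k + L] - s2[k] - total * total / L) / (L - 1))
        if var <= sigma_max ** 2 and (best is None or var < best[0]):   # condition (15)
            best = (var, k)
    return best


def fast_outlier_search(values, sigma_max, minobs):
    """Return the retained values (longest admissible window) or None if none exists."""
    y = sorted(values)
    n = len(y)
    if minobs < 2 or n < minobs:
        return None
    s1 = [0.0] + list(accumulate(y))                   # prefix sums of y_j
    s2 = [0.0] + list(accumulate(v * v for v in y))    # prefix sums of y_j^2

    # Step 0: try the whole set, then make sure some window of length MINOBS works.
    if best_window(y, n, sigma_max, s1, s2) is not None:
        return y
    if best_window(y, minobs, sigma_max, s1, s2) is None:
        return None

    # Invariant: a window of length `left` exists, none of length `right` does.
    left, right = minobs, n
    while right - left > 1:
        mid = left + (right - left) // 2
        if best_window(y, mid, sigma_max, s1, s2) is not None:
            left = mid
        else:
            right = mid
    _, k = best_window(y, left, sigma_max, s1, s2)
    return y[k:k + left]


clean = [0.01 * i for i in range(30)]
data = clean + [10.0, 12.0, -9.0]                      # three artificial gross outliers
kept = fast_outlier_search(data, sigma_max=0.6, minobs=10)
print(len(kept))
```

Each call to `best_window` costs about $N$ operations and the loop runs at most about $\log_2(N - \text{MINOBS})$ times, which matches the estimate in Assertion 5; the initial sort adds a term of the same order when the input is not already arranged in ascending order.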
