**4. Machine learning based RSO behavior pattern classification**

#### **4.1 The scheme of machine learning**

This section describes the machine learning (ML) details. The main purpose of constructing this machine learning scheme is to detect the behaviors of a resident space object (RSO) by fusing sensor data from multiple sources, including the velocity, orbital energy, angular momentum, and the position of the RSO relative to the station. Like other machine learning models, this model is generally trained offline using generated data. The trained weights are then deployed in the application using TensorFlow deployment. In addition, to improve the robustness of the trained system, we propose a neural network scheme with the ability to learn newly added, unknown patterns online with only tiny modifications of the weights.

**Figure 8** shows the RSO pattern classification architecture. It consists of two separate parts: an offline part, *Modeling*, and an online part, *Monitoring*, which detects the RSO behavior pattern.

In the offline part, named *Modeling RSO Behavior Pattern*, our neural network is trained as a classifier using the collected simulated data, which represents the different behaviors of the RSO. As shown in the figure, useful features are obtained from the different sensors during feature extraction. The extracted features are then fed into the training model for neural network tuning and training to generate a classifier, which identifies the RSO behavior at a fine-grained level. In contrast, the online part, *Monitoring RSO Behavior Pattern*, acquires RSO patterns in real time to distinguish abnormal behaviors. A warning message is issued if any abnormal behavior is detected.
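The online monitoring loop can be sketched as follows. The feature extractor, classifier, label set, and confidence threshold here are all hypothetical stand-ins for the trained components, not the authors' implementation:

```python
# Illustrative sketch of the online "Monitoring RSO Behavior Pattern" loop.
# `extract_features` and `classifier` stand in for the trained components
# described above; the label set and confidence threshold are assumptions.

NORMAL_PATTERNS = {"station_keeping", "drift"}   # hypothetical label set

def monitor(track, classifier, extract_features, threshold=0.5):
    """Classify one observation track and flag abnormal behavior."""
    features = extract_features(track)
    label, confidence = classifier(features)
    if label not in NORMAL_PATTERNS and confidence >= threshold:
        return f"WARNING: abnormal behavior '{label}' detected"
    return "normal"

# Toy stand-ins to exercise the loop:
demo_classifier = lambda f: ("maneuver", 0.9)
demo_features = lambda t: t
print(monitor([1.0, 2.0], demo_classifier, demo_features))
```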

**Figure 8.**

*The scheme of RSO behavior classification.*

*Game Theoretic Training Enabled Deep Learning Solutions for Rapid Discovery of Satellite… DOI: http://dx.doi.org/10.5772/intechopen.92636*

Additionally, the classifier generated by the machine learning methods should have several properties: (a) the ability to fuse heterogeneous and complex input data, (b) scalability, (c) robustness to perturbations in the data, (d) high accuracy, and (e) explainability. However, it is hard to fulfill all of these requirements at once; therefore, some trade-offs must be considered in our model. Among the different neural network structures, convolutional neural networks fit our case well.

#### **4.2 Framework of convolutional neural networks (CNNs)**

The structures of convolutional neural networks (CNNs) and dense neural networks (DNNs) are shown in **Figure 9**. A DNN consists of several hidden layers of neurons. Every neuron in a layer is fully connected to the neurons in the previous layer. Each neuron computes a linear function followed by an activation function, independently of the other neurons. Finally, after passing through several layers, the input data is transformed into the output layer, which uses a softmax activation function to produce classification probabilities.

However, the DNN (top of **Figure 9**) has several drawbacks, such as difficulty scaling to large images. For instance, for an image of dimensions 18 × 18 × 3, the first DNN layer will have 18 × 18 × 3 = 972 input neurons. If the next layer has 30 neurons, there will be 972 × 30 + 30 = 29,190 weight parameters (including biases). With larger images, the number of weight parameters grows rapidly. Moreover, 30 neurons may not provide enough capacity for our classification model to generalize accurately.
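The parameter-count arithmetic above can be checked directly:

```python
# Fully connected parameter count for the example in the text:
# an 18 x 18 x 3 image feeding a 30-neuron dense layer.
h, w, c = 18, 18, 3
inputs = h * w * c                    # 972 input neurons
neurons = 30                          # first hidden layer
params = inputs * neurons + neurons   # weights + biases
print(inputs, params)                 # 972 29190
```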

On the other hand, the CNN (bottom of **Figure 9**) solves the aforementioned issues. A simple CNN [14] consists of several different filters forming a convolutional layer. In addition, a pooling layer can decrease the dimensions of the input data. Meanwhile, since adjacent pixels in an image have similar values,

**Figure 9.** *The framework of the convolutional neural network.*

the filters can extract features effectively with few weight parameters. Thus, using CNN filters can increase the accuracy of our neural network classifier.

With its development, the CNN has become the leading model structure for computer vision and image processing tasks, and it has also been used in the natural language processing (NLP) area. As small filters move across the input, each filter, with only a small number of parameters, is reused to recognize patterns across the whole image. Therefore, with similar classification ability, a CNN is faster to train and predict than a DNN. After several filters, the output is flattened and passed through several dense layers, ending in a softmax activation at the last layer with a sparse cross-entropy loss function for backpropagation to predict the RSO behavior pattern.
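The filter → flatten → dense → softmax pipeline just described can be expressed as a minimal Keras sketch. The layer sizes and input shape here are illustrative placeholders, not the chapter's architecture:

```python
import tensorflow as tf

# Minimal illustration of the pipeline described above:
# convolutional filters -> flatten -> dense layers -> softmax,
# trained with a sparse cross-entropy loss. Sizes are illustrative.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

With `sparse_categorical_crossentropy`, the labels can be fed as plain integer class indices rather than one-hot vectors.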

Our GTEL PE method utilizes the CNN architecture to classify the RSO pattern from observed data. Compared with other conventional methods, the convolutional neural network can process the RSO observations much faster. We use Python and TensorFlow with Keras [15] as the foundation of our implementation. Although each filter is computationally expensive to train, the overall CNN architecture is faster to train while providing similar classification accuracy.

A typical CNN structure is shown in **Figure 10**. After training our CNN-DNN with the training data, the test data (10–20% of the dataset) is employed to evaluate the generalization of our CNN-DNN model. Additionally, in order to solve the

**Figure 10.** *CNN architecture for RSO behavior classification.*


overfitting problem of our model, a dropout layer was added after each layer. This model outperforms other traditional methods with better accuracy and higher computational efficiency.

The revised deep learning neural network structure for the 143 space behaviors (we partition the pointing angles α and β into 15-degree cells, for a total of 143 cells) is shown in **Figure 11**. The raw data has three dimensions: 3 parameters, 15 time steps (each track consists of 15 observation measurements), and 1 channel. Therefore, the convolutional neural network (CNN) takes a 3 × 15 × 1 input, with 72,000 samples as the raw dataset. In order to distinguish the 143 labels of satellite behaviors (with a 15-degree separation between behaviors), 128 "2 × 2" filters with "same" padding are used in the first convolutional layer. Thus, there are 3 × 15 × 128 output dimensions

**Figure 11.** *Revised CNN structure for 143-label data.*


#### **Table 1.**

*The parameters of our 143 classification CNN-DNN model.*

after the first layer. The output then passes through another two convolutional layers with 128 and 256 "1 × 5" filters, again with "same" padding. After the first three convolutional layers, the data dimensions expand to 3 × 15 × 256.

**Figure 12.** *Model structure for the CNN\_DNN network.*


Afterwards, another two convolutional layers (with "same" padding) downsample the data to 3 × 3 × 256 for further processing. Both layers use 1 × 7 filters, with 512 and 256 output channels, respectively. Finally, the 3 × 3 × 256 data is flattened to one dimension and passed through three dense layers with 1024, 512, and 512 neurons, respectively, each followed by a 30% dropout layer to mitigate overfitting during training. To predict the 143 label classes, a 143-way softmax layer is attached at the end of the CNN-DNN model, and the cross-entropy loss is minimized with the Adam optimizer.
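The architecture described above can be reconstructed as the following Keras sketch. The text does not state how the 3 × 15 feature map is downsampled to 3 × 3; using `strides=(1, 5)` on the fourth convolution is our assumption to reproduce that shape. Notably, this reconstruction yields exactly the 5,303,695 trainable parameters reported in **Table 1**, which supports the layer sizes given in the text:

```python
import tensorflow as tf

def build_cnn_dnn(num_classes=143):
    """Sketch of the revised CNN-DNN; strides=(1, 5) below is an assumption."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(3, 15, 1)),
        tf.keras.layers.Conv2D(128, (2, 2), padding="same", activation="relu"),
        tf.keras.layers.Conv2D(128, (1, 5), padding="same", activation="relu"),
        tf.keras.layers.Conv2D(256, (1, 5), padding="same", activation="relu"),
        # -> 3 x 15 x 256 after the first three convolutional layers
        tf.keras.layers.Conv2D(512, (1, 7), strides=(1, 5),
                               padding="same", activation="relu"),
        tf.keras.layers.Conv2D(256, (1, 7), padding="same", activation="relu"),
        # -> 3 x 3 x 256, flattened into the dense stack
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(1024, activation="relu"),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(512, activation="relu"),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(512, activation="relu"),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_cnn_dnn()
print(model.count_params())   # 5303695, matching Table 1
```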

The details of the CNN parameters are shown in **Table 1** and **Figure 12**. There are 5,303,695 parameters that need to be trained in this large network for detecting the different behaviors. In order to train the GTEL CNN faster and more accurately, learning rate decay is used during the training process. For the first 75 epochs, the learning rate is 1e−4, which allows fast learning after weight initialization. After 75 epochs, the learning rate decays exponentially, helping the optimizer find a path that continues to decrease the loss of the CNN model.
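The two-phase schedule above can be sketched as a plain function; the exponential decay rate (0.05 per epoch) is an assumed value, as the text does not specify it:

```python
import math

# Learning-rate schedule described above: constant 1e-4 for the first
# 75 epochs, then exponential decay. decay=0.05 is an assumed rate.
def lr_schedule(epoch, base_lr=1e-4, warm_epochs=75, decay=0.05):
    if epoch < warm_epochs:
        return base_lr
    return base_lr * math.exp(-decay * (epoch - warm_epochs))

# With Keras this would plug in via a callback, e.g.:
# tf.keras.callbacks.LearningRateScheduler(lr_schedule)
print(lr_schedule(0), lr_schedule(80))
```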
