**4. Deep learning for human target analysis**

With the rapid emergence of new deep learning algorithms and architectures, progress in many domains, such as speech recognition, visual object recognition, object detection, and even drug discovery and genomics, has accelerated. A deep learning model is composed of multiple processing layers that learn high-level representations of data with multiple levels of abstraction, thus automating the process of feature extraction. Hence, compared with traditional machine learning techniques, deep models do not require heavy feature engineering or domain knowledge. Moreover, by stacking many nonlinear transformations, very complex functions can be learned, so harder classification and recognition problems can be solved. As a result, deep learning has made great contributions to overcoming long-standing difficulties in artificial intelligence and to advancing its development.

Next, we describe the deep learning models most widely used in the human target analysis field.

#### **4.1. Convolutional neural network**

The convolutional neural network (CNN) is inspired by the structure of the visual cortex, which is composed of simple cells and complex cells. It adopts four key ideas: local connections, parameter sharing, pooling, and the use of many layers. In this way, a CNN can fully exploit the compositional hierarchy present in raw signals, extracting higher-level features from lower-level ones. As a result, convolutional neural networks, as one of the representative algorithms of deep learning, have made remarkable progress in object detection and recognition, natural language processing (NLP), speech recognition, and medical image analysis in the past few years. In the human activity recognition field, the CNN is one of the most used deep learning models. For instance, Zhenyuan Zhang et al. adopted this network to realize continuous dynamic gesture recognition using a radar sensor [30], while Youngwook Kim et al. detected and classified human activities using deep convolutional neural networks [32].
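The structural ideas above can be illustrated with a minimal pure-Python sketch (the toy "spectrogram", kernel values, and function names here are illustrative, not taken from the cited works): one small kernel slides over the whole input, so the same weights (parameter sharing) see only a local patch (local connections), and max pooling condenses the resulting feature map.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2D convolution (cross-correlation, as in most CNN libraries):
    one shared kernel is slid over every local patch of the image."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out

def maxpool2x2(fmap):
    """2x2 max pooling: keeps the strongest local response, giving a small
    amount of translation invariance and halving each spatial dimension."""
    return [[max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, len(fmap[0]) - 1, 2)]
            for r in range(0, len(fmap) - 1, 2)]

# Toy 6x6 "spectrogram" with a vertical edge, and a 3x3 edge-detecting kernel.
image = [[1, 1, 1, 0, 0, 0] for _ in range(6)]
kernel = [[1, 0, -1] for _ in range(3)]
feature_map = conv2d_valid(image, kernel)  # 4x4 map, strongest at the edge
pooled = maxpool2x2(feature_map)           # 2x2 summary of the feature map
```

Stacking several such convolution-pooling stages, each feeding the next, is what lets the network build higher-level features from lower-level ones.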

#### **4.2. Recurrent neural network**

In **Figure 7**, the RDS of the two-people scenario shows that the backscattering of the two human targets is automatically separated in the 3D range-Doppler-time space. This indicates that the RDS is not only able to show the range-Doppler signature of a single extended target but is also able to separate (or even track) multiple targets in the range-Doppler video sequence. Additional processing to separate the multi-target reflections (e.g., the separating method proposed in [19]) is no longer required.

As an example, the RDS has been demonstrated for human target analysis using an S-/C-band UWB radar, but the RDS itself is in fact a generic tool that can be used in various applications.


**Figure 7.** Range-Doppler surface of two human targets (threshold = −23 dB).
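The range-Doppler processing behind such a map can be sketched as follows (a toy, idealized example with illustrative names and simulated data, not the processing chain of the actual system): for each range bin, a discrete Fourier transform across the slow-time pulses turns the per-pulse phase rotation of a moving scatterer into a peak at its Doppler bin.

```python
import cmath

def doppler_spectrum(slow_time):
    """Magnitude DFT across slow time (one sample per pulse) for one range bin."""
    n = len(slow_time)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * m / n)
                    for k, x in enumerate(slow_time)))
            for m in range(n)]

def range_doppler_map(echoes):
    """echoes[k][r]: complex echo of pulse k at range bin r.
    Returns rd[r][m]: Doppler magnitude spectrum for each range bin, so a
    moving target appears as a peak at (its range bin, its Doppler bin)."""
    n_pulses, n_bins = len(echoes), len(echoes[0])
    return [doppler_spectrum([echoes[k][r] for k in range(n_pulses)])
            for r in range(n_bins)]

# Simulated target at range bin 2 whose radial motion advances the echo
# phase by 3/8 of a cycle per pulse -> expect a peak at Doppler bin 3.
n = 8
echoes = [[0j] * 4 for _ in range(n)]
for k in range(n):
    echoes[k][2] = cmath.exp(2j * cmath.pi * 3 * k / n)
rd = range_doppler_map(echoes)
```

Stacking such range-Doppler frames over successive coherent processing intervals gives the range-Doppler video sequence from which the RDS is extracted; real systems use an FFT and windowing rather than this naive DFT.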

With its successful application in NLP, the recurrent neural network (RNN) has caught researchers' attention. RNNs excel at modeling temporal sequences such as text and speech because they can mine the timing and semantic information within them. From the perspective of network structure, an RNN can remember previous information and use it to influence the output of the following nodes. However, the conventional RNN has a well-known limitation: it struggles to learn long-term dependencies. To overcome this problem, the long short-term memory (LSTM) network came into being and performs better on many tasks. An LSTM cell owns three special gates: the input gate, the output gate, and the forget gate. By using these memory units, especially the forget gate, the LSTM can access long-range context in sequential data. Owing to these advantages, many human activity recognition systems adopt RNNs and their variants. Zhi Zhou et al. adopted multimodal signals, including HRRPs and Doppler profiles acquired by a terahertz radar system, to recognize dynamic gestures, and the recognition rate reached more than 91% [22].
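The gating mechanism described above can be sketched for a scalar toy cell (the weight names and values are illustrative, not from any cited system):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, W):
    """One LSTM time step with scalar input and state (toy weights W).
    Each gate combines the current input x with the previous hidden state."""
    i = sigmoid(W["wi"] * x + W["ui"] * h_prev + W["bi"])    # input gate
    f = sigmoid(W["wf"] * x + W["uf"] * h_prev + W["bf"])    # forget gate
    o = sigmoid(W["wo"] * x + W["uo"] * h_prev + W["bo"])    # output gate
    g = math.tanh(W["wg"] * x + W["ug"] * h_prev + W["bg"])  # candidate update
    c = f * c_prev + i * g  # forget gate controls how much old context survives
    h = o * math.tanh(c)    # output gate controls what is exposed downstream
    return h, c

# A forget gate biased open (bf large) and an input gate biased shut (bi very
# negative) carry the cell state through almost unchanged -- this is how the
# LSTM preserves long-range context that a plain RNN would lose.
W = {"wi": 0, "ui": 0, "bi": -20, "wf": 0, "uf": 0, "bf": 20,
     "wo": 0, "uo": 0, "bo": 0, "wg": 0, "ug": 0, "bg": 0}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=0.7, W=W)
```

In practice the gates are vector-valued and the weights are learned, so the network itself decides, at every time step, what to remember and what to forget.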

#### **4.3. Auto-encoder**

The auto-encoder is a high-performance deep learning network suited to one-dimensional data, from which it extracts optimized deep features. It learns a deep feature representation of the raw input via several rounds of encoding-decoding and applies the layer-wise greedy unsupervised pre-training principle to obtain an efficient deep network quickly.

The commonly used variants of the auto-encoder are mainly of the following kinds: (1) the sparse auto-encoder, which is able to reconstruct the input data well, and (2) the de-noising auto-encoder and the contractive auto-encoder, which make the model more generic by adding noise or a well-chosen penalty term.

The auto-encoder provides a powerful feature extraction approach for many tasks, which saves a great deal of labor. In this way, it can be combined with either conventional machine learning algorithms or other deep learning models to form a more robust system. Mehmet Saygin Seyfioğlu et al. [33] used a convolutional auto-encoder architecture to discriminate 12 indoor activity classes involving aided and unaided human motions by recognizing their 2D Doppler maps, and Branka Jokanovic et al. [34] applied three stacked auto-encoders to extract deep features separately and fused the results with a voting principle to classify activities.

#### **5.2. Notice phase information**

Common energy-based power spectrograms obtained by the FT or STFT always abandon the phase information in backscattered echoes. However, phase is an important attribute of any signal and carries a wealth of information, such as transmission duration and distance. Pavlo Molchanov et al. investigated frequency and phase coupling phenomena in radar backscattered signals and proposed novel bicoherence-based information features [31]. We think the phase information in radar backscattered signals should be considered more in future studies.

#### **5.3. Take orientation sensitivity into consideration**

The Doppler shift is caused by the radial velocity of a moving target, that is, the component of the target's velocity along the radar line of sight, which changes with the relative position of the target and the radar. For example, when the radar is above a pedestrian, the Doppler shift is partly induced by the vertical components of the motion, such as vertical arm and leg movements, and negative Doppler can appear. As a result, the radar backscattered signals produced by one subject performing a specified activity will differ greatly when the relative position differs. How to overcome the orientation sensitivity of radar-based HAR is one of the future research topics.

#### **5.4. Focus more on 1D and 3D domain radar echoes**

Our survey of the current research status shows that, compared with research in the 2D domain, there are few results on the 1D and 3D domains of human echo signals. Nevertheless, based on the discussion in the previous chapters, we have reason to believe that these two forms of echoes have ample development potential and room for exploration. Thus, more attention should be paid to this part of the human target analysis field.

**Acknowledgements**

I would like to express my gratitude to Pavlo Molchanov, Takuya Sakamoto, Pascal Aubry, Francois Le Chevalier, and Alexander Yarovoy for their contributions. I thank them for their assistance and support and for their advisable opinions.

**Author details**

Yuan He\*, Xinyu Li and Xiaojun Jing

\*Address all correspondence to: eric.yuanhe@gmail.com

Beijing University of Posts and Telecommunications, Beijing, China

Toward Deep Learning-Based Human Target Analysis. http://dx.doi.org/10.5772/intechopen.81592
