3.2.3 Results

#### Table 1. Performance rates of time- and frequency-domain CNNs for SMM recognition (in terms of F1-score) and Human Activity Recognition (HAR) (in terms of accuracy). Highest rates are in bold.

Table 1 summarizes the accuracy and F1-score results of CNNs (in both time and frequency domains) for the SMM recognition and HAR tasks. For SMM recognition, unlike the other subjects, Subject 6 of Study 2 had only one recorded session; thus, no CNN was trained for this subject. We observe that the frequency-domain CNNs (of all subjects in all studies) outperform the time-domain CNNs by 8.52% in terms of the mean F1-score. This suggests that the ST filters out noisy information and thus helps the CNN capture meaningful features. However, comparing time- and frequency-domain CNNs on the Human Activity Recognition (HAR) task shows the opposite: the time domain outperforms the frequency domain by 3.92% in terms of accuracy (as shown in Table 1). These contradictory results can be explained by the difference between the ST frequency range chosen for SMM recognition and that chosen for HAR. In SMM recognition, the ST frequency range was carefully chosen to cover almost all SMMs (0–3 Hz), yielding optimal frequency-domain samples (containing full and noise-free information) which produced better CNN parameters. Meanwhile, the ST frequency range for HAR (0–8 Hz) may be too narrow, generating frequency-domain samples that lost relevant information: human activity frequencies fall between 0 and 20 Hz, with 98% of the FFT amplitude contained below 10 Hz. Thus, in order to train CNNs on frequency-domain signals, it is necessary to analyze the raw time series beforehand to determine an ST frequency range that covers all the information needed for the recognition task.
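As a rough sketch of that pre-analysis, one could compute the frequency below which a given fraction (here the 98% figure mentioned above) of the FFT amplitude of a raw signal lies, and size the ST range accordingly. The signal, sampling rate, and helper name below are illustrative assumptions, not taken from the original study.

```python
import numpy as np

def freq_covering_fraction(x, fs, fraction=0.98):
    """Frequency (Hz) below which `fraction` of the total
    FFT amplitude of `x` (sampled at `fs` Hz) is contained."""
    amp = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    cum = np.cumsum(amp) / amp.sum()
    return freqs[np.searchsorted(cum, fraction)]

# Illustrative motion signal sampled at 90 Hz over a 10 s window:
# a dominant 2 Hz component plus a weaker 5 Hz harmonic.
fs = 90
t = np.arange(0, 10, 1.0 / fs)
x = np.sin(2 * np.pi * 2 * t) + 0.3 * np.sin(2 * np.pi * 5 * t)

f98 = freq_covering_fraction(x, fs)  # -> 5.0 Hz for this signal
# An ST range of [0, f98] Hz would retain ~98% of the amplitude,
# while a range cut off below f98 would discard relevant information.
```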

CNN Approaches for Time Series Classification DOI: http://dx.doi.org/10.5772/intechopen.81170

4. Algorithm-level approach

4.1 Background on convolutional neural networks

4.1.1 Definition

CNNs were developed with the idea of local connectivity. Each node is connected only to a local region of the input. This local connectivity is achieved by replacing the weighted sums of the neural network with convolutions: in each layer of the CNN, the input is convolved with a weight matrix (i.e., the filter) to create a feature map. As opposed to regular neural networks, all the values in an output feature map share the same weights, so that all nodes in the output detect exactly the same pattern. The local connectivity and shared weights of CNNs reduce the total number of learnable parameters, resulting in more efficient training and in learning, in each layer, a weight matrix capable of capturing the necessary, translation-invariant features from the input.

4.1.2 CNN structure

The input to a convolutional layer is usually taken to be three-dimensional: the height, width, and number of channels. In the first layer, this input is convolved with a set of $M^1$ three-dimensional filters applied over all the input channels. In our case, we consider a one-dimensional time series $x = (x_t)_{t=0}^{N-1}$. Given a classification task and a model with parameter values $w$, the task of the classifier is to output the predicted class $\hat{y}$ based on the input time series $x(0), \ldots, x(t)$. The output feature map of the first convolutional layer is then given by convolving each filter $w_h^1$, for $h = 1, \ldots, M^1$, with the input:

$$a_h^1(i) = \left(w_h^1 \ast x\right)(i) = \sum_{j=-\infty}^{\infty} w_h^1(j)\, x(i - j) \qquad (3)$$
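A minimal numeric sketch of Eq. (3): the toy 5-sample input and the two untrained length-3 filters ($M^1 = 2$) below are illustrative assumptions, and `np.convolve` is used because it implements exactly the $\sum_j w(j)\,x(i-j)$ sum (it flips the filter, unlike the cross-correlation used by most deep-learning libraries).

```python
import numpy as np

# Eq. (3): a^1_h(i) = (w^1_h * x)(i) = sum_j w^1_h(j) x(i - j).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])        # input time series
filters = np.array([[1.0, 0.0, -1.0],          # w^1_1 (illustrative)
                    [0.5, 0.5, 0.5]])          # w^1_2 (illustrative)

# Each feature map is the 1-D convolution of one filter with the input.
feature_maps = np.stack([np.convolve(w, x, mode="full") for w in filters])

# Check one entry against the sum in Eq. (3), e.g. i = 2 for filter 1:
i = 2
manual = sum(filters[0][j] * x[i - j] for j in range(3) if 0 <= i - j < len(x))
assert np.isclose(feature_maps[0][i], manual)
```

With `mode="full"`, each of the $M^1 = 2$ feature maps has length $N + 3 - 1 = 7$; a real CNN layer would typically follow this with a bias term and a nonlinearity.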