#### *6.3.2. Choosing optimal number of clusters*

Based on previous insights, we used the AIC measure to analyze the quality of the models with respect to the number of clusters. Results for each activity for one care recipient are shown in **Figure 7**. It can be seen from **Figure 7** that different activities have different "optimal" numbers of clusters. In this analysis the term "optimal" has to be considered very loosely because, in many cases, AIC performance is very similar for different numbers of clusters. This means that, for behavior analysis purposes, an adequate model can be selected from a range of models with good and similar AIC performance. Most often, the parsimonious solution is applied: the model with satisfactory performance and the smallest number of clusters is selected. On the other hand, when a global saddle point exists, model selection is a clearer process. Saddle points have a strict mathematical definition based on function derivatives, but in this case a saddle point can be descriptively defined as a point such that all points to its left (lower number of clusters) and all points to its right (higher number of clusters) have larger AIC values, that is, a global minimum of the AIC curve. In these situations, model selection is based on the minimal (optimal) value of AIC. A clear example of a saddle point in **Figure 7** is labeled with k = 4 for the physical\_activity\_calories activity measure.
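The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration and not the chapter's implementation: it assumes a k-state HMM with one-dimensional Gaussian emissions when counting parameters, the log-likelihood values are invented for the example, and the `tolerance` argument is a hypothetical way of expressing "good and similar AIC performance".

```python
from typing import Dict


def hmm_param_count(k: int) -> int:
    """Free parameters of a k-state HMM with 1-D Gaussian emissions (assumed model form).

    (k - 1) initial probabilities + k * (k - 1) transition probabilities
    + k means + k variances.
    """
    return (k - 1) + k * (k - 1) + 2 * k


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: lower values indicate a better trade-off."""
    return 2.0 * n_params - 2.0 * log_likelihood


def select_k(aic_by_k: Dict[int, float], tolerance: float = 2.0) -> int:
    """Parsimonious choice: the smallest k whose AIC is within `tolerance` of the minimum.

    `tolerance` is a hypothetical knob, not a value taken from the chapter.
    """
    best = min(aic_by_k.values())
    return min(k for k, value in aic_by_k.items() if value - best <= tolerance)


# Illustrative log-likelihoods per number of clusters (made-up numbers).
log_liks = {2: -1510.0, 3: -1480.0, 4: -1465.0, 5: -1459.0, 6: -1452.0}
aic_by_k = {k: aic(ll, hmm_param_count(k)) for k, ll in log_liks.items()}

chosen_k = select_k(aic_by_k)                      # strict: close to the AIC minimum
relaxed_k = select_k(aic_by_k, tolerance=12.0)     # looser: prefers fewer clusters
```

With these invented numbers the AIC curve has a single minimum at k = 4, larger on both sides, which corresponds to the clear-cut case in the text. Widening the tolerance lets the parsimonious rule fall back to k = 3.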

**Figure 7.** Selection of "optimal" number of behavioral patterns based on AIC values.

The BIC curve grows superlinearly beyond 2 or 3 clusters, meaning that BIC does not favor models with a higher number of clusters. Deeper inspection of the AIC, BIC and log-likelihood curves for each care recipient and each activity showed behavior consistent with that described in **Figure 6**. Based on this, we took AIC as the measure of choice for HMM model selection and for identification of the optimal number of behavioral states for each care recipient and each activity.

However, it is very important to emphasize that the insights presented above cannot be considered conclusive and cannot be generalized to all problems. This is because cluster performance depends on the data distribution, which is different for each dataset, and also on the context of the analysis.

#### *6.3.3. Behavior characterization*

**Figure 8** depicts the behavioral patterns identified by the HMM for the activity sleep\_light\_time for one care recipient. The X-axis represents the temporal dimension in day units, and the Y-axis represents the cumulative duration of sleep\_light\_time for each day.

It can be seen that the HMM model selected by the AIC criterion identified three different clusters (behavioral patterns) that can be characterized as follows:

**1.** Behavior (purple line): medium values of sleep\_light\_time (between 8000 and 13000 s) with low deviations,

**2.** Behavior (green line): high values of sleep\_light\_time (between 13000 and 20000 s) with low deviations and

**3.** Behavior (red line): low values of sleep\_light\_time (between 0 and 15000 s) with high deviations.

A normal sleeping process includes an interchange of light sleep and deep sleep. The first and second behaviors are considered desirable, and such amounts of light sleep lead to mitigation of frailty risk. On the other hand, a lack of light sleep time and high variations are considered negative behavior and could indicate an increase in stress and in the chance of MCI/frailty risk development. Based on these observations, behavioral patterns are quantified and ordered (e.g., 1: worst behavior, 2: medium behavior, and 3: good behavior) and passed on to the further process of risk quantification through derivation of numerical indicators and grading (described in the previous section).

Recent Applications in Data Clustering

Temporal Clustering for Behavior Variation and Anomaly Detection from Data Acquired… http://dx.doi.org/10.5772/intechopen.75203

#### *6.3.4. Behavior variation change and anomaly detection*

After characterization of behavioral patterns, we analyzed behavior (pattern) changes over time. Identification and characterization of behavior changes (transitions) over time is a crucial step for building proactive systems and providing timely, preventive interventions. **Figure 9** describes the transitions between the behaviors identified in the previous sub-section.

**Figure 8.** Behavioral patterns identified by HMM model.

Frequent pattern changes from green ("good" behavior) to red ("bad" behavior) lines can be observed in **Figure 9**. It can also be observed that the red behavior appears more frequently than the other two.
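Observations of this kind, together with the two anomaly rules discussed in this subsection, can be reproduced from the output of a fitted HMM. The sketch below is an illustration under stated assumptions and not the chapter's code. It takes a decoded state sequence and a matrix of per-day state posteriors as given, for example from any HMM library. It estimates transition frequencies empirically from the decoded sequence rather than reading the model's fitted transition matrix. The state encoding (0 = medium, 1 = good, 2 = bad) and all sample data are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def transition_matrix(states: Sequence[int], n_states: int) -> List[List[float]]:
    """Row-normalized frequencies of transitions between consecutive measurements."""
    counts = [[0] * n_states for _ in range(n_states)]
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def low_confidence_points(posteriors: Sequence[Sequence[float]],
                          threshold: float = 0.70) -> List[int]:
    """Indices whose maximum state probability is below the threshold (anomalous points)."""
    return [t for t, row in enumerate(posteriors) if max(row) < threshold]


def rare_states(states: Sequence[int], min_share: float = 0.03) -> List[int]:
    """States that hold less than `min_share` of all measurements (anomalous states)."""
    counts = Counter(states)
    n = len(states)
    return sorted(s for s, c in counts.items() if c / n < min_share)


# Hypothetical decoded sequence: 0 = medium (purple), 1 = good (green), 2 = bad (red).
decoded = [0, 1, 2, 2, 1, 2, 0, 1, 2, 2]
trans = transition_matrix(decoded, n_states=3)

# Hypothetical posteriors for three days; only the second day is ambiguous.
posteriors = [[0.90, 0.05, 0.05], [0.40, 0.35, 0.25], [0.10, 0.20, 0.70]]
anomalous_days = low_confidence_points(posteriors)

# Hypothetical series of 100 measurements in which state 2 occurs only twice.
anomalous_states = rare_states([0] * 60 + [1] * 38 + [2] * 2)
```

In this toy sequence every "good" day is followed by a "bad" one, which is the kind of pattern read off **Figure 9**. The default thresholds of 70% and 3% are taken from the ranges reported in this subsection and would need tuning per activity.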

Finally, in most cases "medium" behavior (purple line) transitions to "good" behavior (green line). Based on this analysis, it can be observed that after a behavior improvement (from "medium" to "good") care recipients often show a sudden worsening of behavior. Recognition of such transitional patterns enables a predictive and preventive approach to risk. Namely, HMM models identify, through their transition probability matrices, the probabilities of behavior transitions, and if the behaviors are characterized well, these probabilities can be used as early risk identification indicators.

Furthermore, based on the HMM model, anomalies can be identified automatically using user-defined thresholds. For example, manual labeling of the behavioral series presented in **Figure 9** identified the lowest point of bad behavior (red line between 2017-05 and 2017-06) as anomalous. This point is also captured as anomalous by a probability threshold of 70%, meaning that its maximum probability of belonging to any state is less than 70%. Experiments on all other activities showed that the optimal value of this threshold lies between 65 and 75%. Similarly, anomalous states (behaviors) can be identified by setting a threshold on the minimum number of instances (behavioral measurements) that should constitute a behavior (cluster). Since the number of behavior measurements varies across users, activities and even measurement periods, we define this threshold as a percentage of the total number of measurements in the selected period. In all our experiments the series consisted of 140 to 180 measurements. Experiments showed that good anomaly scoring is achieved by setting the threshold to 3–5%. **Figure 10** illustrates a situation where anomalous behavior is detected (last two measurements connected with yellow line).

**Figure 9.** Behavior variations (transitions) and anomalous point.

**Figure 10.** Anomalous state.

**7. Conclusion and future work**

In this chapter we addressed the problem of behavioral pattern recognition, behavior change detection and anomaly detection based on IoT data in a smart city environment. We proposed a framework for behavioral change detection that will be utilized in the context of mild cognitive impairment (MCI) and frailty risk assessment and detection in the City4Age project. Behavioral modeling and risk assessment for MCI and frailty are very challenging tasks because of the large variations between each specific personal case, and the practical lack
