**5.1 Time-series clustering**

Time-series clustering is often problematic [58], especially when risk factors must be analysed by matching patterns across time. The literature on time-series clustering and pattern discovery reports several such studies [59]. Some qualitative measures for clustering time-series data capture similar risk-factor patterns in dynamic temporal data, regardless of whether the correlation between them is linear [60]. However, they do not appear well suited to long time series of unequal length (e.g., T2DM data). For instance, the authors of [59] proposed an algorithm that clusters patients based on clinical data and uses the clustering information to identify distinct patterns. Altiparmak et al. [59] provided a slope-wise comparison (SWC) method to find the correlation between local distance vectors of patient visits, and used feature selection to group clinical test results into sub-groups based on the related risk factors. In their method, each cluster of patients was treated as transaction data that included a pattern indicating the cluster to which each patient belonged. The authors of [61] used a clustering method similar to [59], but they clustered fixed-length time series. Ceccon and co-authors [62] exploited a variation of the naive Bayes classifier with a hidden variable to segment patients into disease sub-types, with the aim of improving the classification of glaucoma patients based on visual field data. Nevertheless, they only used standard (static) BNs, rather than DBNs, to infer the parameters from a cross-sectional dataset, and they did not analyse the influence of multiple hidden variables on the prediction results.
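Why unequal lengths matter can be seen with dynamic time warping (DTW), a standard distance that aligns two sequences without requiring them to have the same number of points. DTW is a swapped-in illustration here: it is not the SWC method of [59] or the measures of [60], and the function name and HbA1c-like values below are hypothetical.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping distance between two series that may differ in length."""
    n, m = len(a), len(b)
    # cost[i][j] holds the cheapest cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # distance between two individual visits
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a, reuse the current visit of b
                cost[i][j - 1],      # advance in b, reuse the current visit of a
                cost[i - 1][j - 1],  # advance in both series
            )
    return cost[n][m]


# Hypothetical HbA1c-like trajectories recorded at different numbers of visits.
patient_a = [6.5, 7.0, 7.8, 8.1]
patient_b = [6.6, 7.9, 8.0]            # similar upward trend, one visit fewer
patient_c = [8.5, 7.2, 6.1, 6.0, 5.9]  # opposite trend, one visit more

print(dtw_distance(patient_a, patient_b))
print(dtw_distance(patient_a, patient_c))
```

Because DTW fills an n × m table, its cost grows with the product of the two lengths, which is one reason long clinical series are expensive to cluster pairwise.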

#### **5.2 Pattern discovery and association rules mining**

It has previously been observed that patients with T2DM are also at an increased risk of microvascular comorbidities, including nephropathy, neuropathy, and retinopathy [63]. The underlying pattern of T2DM complications, and how their co-occurrence follows from, is caused by, or is related to other complications of the disease, is known to be the major source of mortality and morbidity in T2DM [64]. This matters because predicting a target complication is difficult without considering the effects of its associated complications. As with patients with type 1 diabetes, although genetic factors contribute to the development of T2DM, neglecting the complications that develop is believed to harm patients' lives. Moreover, T2DM patients develop different profiles of complications and features, which change over time from one follow-up visit to the next. One of the most important reasons for the high number of dependencies among T2DM features and complications is the presence of unmeasured risk factors. Surprisingly, the effect of these unmeasured variables, which play an important role in disease prediction, does not appear to have been closely examined.

Understanding these associated patterns has considerable practical value and can be used to significant effect in the clinical domain [6]. It provides insight into the prediction, and the related prevention, of the associated complications that are expected to occur at patient follow-ups [7]. It can also reduce patients' suffering while saving time and cost for healthcare. However, this depends heavily on the stage of the disease and on the complications that have already occurred, which calls for time-series analysis. In time-series analysis, every disease risk factor and complication is determined by various features recorded in previous patient visits (time intervals). To better understand the complications of the disease and their effects, this chapter clusters patients and mines the association rules among their complications. It attempts to address this issue and to present informative rules and ordering patterns of patient behaviour, with the aim of capturing the complexity of the associated complications over time. The proposed descriptive strategy relies on a well-known tool, association rules (ARs), to detect interesting relationships among T2DM complications.

#### *Predicting Type 2 Diabetes Complications and Personalising Patient Using Artificial… DOI: http://dx.doi.org/10.5772/intechopen.94228*

Temporal Association Rules (TARs) [65] extend association rules [66], originally used to analyse market-basket data, with a temporal dimension that orders related items. Many algorithms for temporal rules work by dividing the temporal transaction database into partitions according to the required time granularity. For example, several mining algorithms were reformulated to handle the new general temporal association rules, including Progressive Partition Miner (PPM), Segmented Progressive Filter (SPF), and the TAR algorithm [65–67]. Various algorithms have also been proposed for the incremental mining of temporal association rules, especially for numerical attributes [68]. In [70], Allen's rules [69] were used to generalise abstracted time-series data into a relation (PRECEDES) in order to find TARs. Various other approaches to the problem of temporal association rule discovery have been proposed [71]. Nevertheless, previous studies discovered association rules on a given subset of the data specified by time [72], without considering the specific exhibition period of each item.
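The partitioning step shared by many of these algorithms can be sketched as follows. This is a minimal illustration assuming yearly visit timestamps and time blocks aligned to multiples of the granularity; it is not PPM, SPF, or the TAR algorithm, and the function names and visits are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Hypothetical transaction format: (visit year, items recorded at that visit).
Transaction = Tuple[int, FrozenSet[str]]


def partition_by_granularity(
    transactions: Iterable[Transaction], granularity: int
) -> Dict[int, List[FrozenSet[str]]]:
    """Split transactions into consecutive time blocks of `granularity` years."""
    partitions: Dict[int, List[FrozenSet[str]]] = defaultdict(list)
    for year, items in transactions:
        block_start = year - year % granularity  # blocks aligned to multiples of granularity
        partitions[block_start].append(items)
    return dict(partitions)


def support_per_partition(
    partitions: Dict[int, List[FrozenSet[str]]], itemset: FrozenSet[str]
) -> Dict[int, float]:
    """Support of `itemset` computed separately inside each time block."""
    return {
        start: sum(itemset <= items for items in block) / len(block)
        for start, block in sorted(partitions.items())
    }


visits = [
    (2009, frozenset({"HYP"})),
    (2010, frozenset({"HYP", "RET"})),
    (2011, frozenset({"HYP", "NEP"})),
    (2012, frozenset({"HYP", "RET", "NEP"})),
    (2012, frozenset({"RET"})),
    (2013, frozenset({"HYP", "RET"})),
]
blocks = partition_by_granularity(visits, granularity=2)
print(support_per_partition(blocks, frozenset({"HYP", "RET"})))
```

With a two-year granularity the support of {HYP, RET} is 0.0, 0.5 and about 0.67 in the three blocks, a trend that a single support value over the whole database would hide.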

Association Rule Mining (ARM) finds frequent patterns by mining ARs with two basic parameters, support and confidence [73]. As noted above, most previous temporal ARM algorithms work by partitioning the transaction database according to the required time granularity.
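The two basic parameters can be computed directly from their definitions: the support of an itemset is the fraction of transactions that contain it, and the confidence of a rule A → C is supp(A ∪ C) / supp(A). The sketch below assumes each patient visit is a transaction of complication codes; the visits, thresholds, and function names are illustrative only.

```python
from typing import FrozenSet, List


def support(transactions: List[FrozenSet[str]], itemset: FrozenSet[str]) -> float:
    """Fraction of transactions containing every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions: List[FrozenSet[str]],
               antecedent: FrozenSet[str], consequent: FrozenSet[str]) -> float:
    """Estimate of P(consequent | antecedent): supp(A | C) / supp(A)."""
    supp_a = support(transactions, antecedent)
    if supp_a == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    return support(transactions, antecedent | consequent) / supp_a


def is_frequent_rule(transactions: List[FrozenSet[str]],
                     antecedent: FrozenSet[str], consequent: FrozenSet[str],
                     min_support: float, min_confidence: float) -> bool:
    """Classic ARM filter: keep the rule only if both fixed thresholds are met."""
    return (support(transactions, antecedent | consequent) >= min_support
            and confidence(transactions, antecedent, consequent) >= min_confidence)


# Made-up visits; items are complication codes used in this chapter.
visits = [
    frozenset({"HYP", "RET"}),
    frozenset({"HYP", "NEP"}),
    frozenset({"HYP", "RET", "NEP"}),
    frozenset({"RET"}),
    frozenset({"HYP"}),
]
rule = (frozenset({"HYP"}), frozenset({"RET"}))  # HYP -> RET
print(support(visits, rule[0] | rule[1]))        # 2 of the 5 visits
print(confidence(visits, *rule))                 # 2 of the 4 visits with HYP
print(is_frequent_rule(visits, *rule, min_support=0.3, min_confidence=0.6))
```

Here the rule HYP → RET has support 0.4 and confidence 0.5, so it passes a 0.3 support threshold but fails a 0.6 confidence threshold; whether it is reported depends entirely on where the fixed thresholds are set.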

Difficulties arise with TARs when some rare rules are of particular interest [74]. Many studies have employed common filtering metrics other than support and confidence to detect interesting rules [75]. This practice is debated: one study in the literature argued that a conservative ARM methodology based only on a fixed, rigid threshold for the filtering metrics could be problematic. A few studies have attempted to mine the frequent underlying patterns of diabetic complications [76], and frequent pattern mining research has significantly influenced data mining techniques for longitudinal data. A post-processing approach in [77] attempted to extract interesting subsets of temporal rules from T2DM data; however, it only considered characteristic patterns of administrative data, without latent variables. Other researchers have applied association rule mining to clinical data [78, 79]. Lee et al. [67] addressed the issue of exhibition periods by proposing the concept of general TARs, in which items are allowed to have different exhibition periods and their support is computed accordingly. Plasse et al. [80] looked at finding homogeneous groups of variables, suggesting that a variable clustering method could be applied to the data to improve pattern discovery. However, their strategy for mining ARs differs from the one in this chapter, since the number of rules was reduced only by hierarchical clustering applied to items, not to multiple identical binary attributes. Among these, some methods uncovered temporal patterns and relationships among clinical variables, including causal information [81] and numeric time-series analysis [82]. Nevertheless, none of the above studies has clustered uneven time-series clinical data based on a hidden variable to extract temporal phenotypes and behaviours of patients.
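The general-TAR idea attributed to Lee et al. [67], in which each item has its own exhibition period, can be sketched as follows. This assumes that an itemset's support is computed only over the transactions that fall in the intersection of its items' exhibition periods; it is a simplified illustration rather than the authors' algorithm, and the periods, visits, and function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Period = Tuple[int, int]  # inclusive (first, last) year in which an item is recorded


def common_exhibition_period(itemset: FrozenSet[str],
                             exhibition: Dict[str, Period]) -> Optional[Period]:
    """Intersection of the items' exhibition periods, or None if they never overlap."""
    start = max(exhibition[item][0] for item in itemset)
    end = min(exhibition[item][1] for item in itemset)
    return (start, end) if start <= end else None


def general_temporal_support(transactions: List[Tuple[int, FrozenSet[str]]],
                             itemset: FrozenSet[str],
                             exhibition: Dict[str, Period]) -> float:
    """Support counted only over transactions inside the itemset's common period."""
    period = common_exhibition_period(itemset, exhibition)
    if period is None:
        return 0.0
    in_period = [items for year, items in transactions if period[0] <= year <= period[1]]
    if not in_period:
        return 0.0
    return sum(itemset <= items for items in in_period) / len(in_period)


visits = [
    (2009, frozenset({"HYP"})),
    (2010, frozenset({"HYP", "RET"})),
    (2011, frozenset({"HYP", "NEP"})),
    (2012, frozenset({"HYP", "RET", "NEP"})),
    (2012, frozenset({"RET"})),
    (2013, frozenset({"HYP", "RET"})),
]
# Hypothetical periods: NEP is assumed to be recorded only from 2011 onwards.
exhibition = {"HYP": (2009, 2013), "RET": (2009, 2013), "NEP": (2011, 2013)}
pair = frozenset({"HYP", "NEP"})
print(common_exhibition_period(pair, exhibition))
print(general_temporal_support(visits, pair, exhibition))
```

Counted over all six visits, {HYP, NEP} would have support 2/6; restricting the denominator to 2011–2013, when NEP is assumed to be recorded, gives 0.5, so an item is not penalised merely for not yet being recorded.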

#### **6. The suggested methodology**

So far, this chapter has described the research gap in modelling and explaining complex disease processes, and thus the motivation behind the suggested methodology. The methods discussed above have limitations in addressing data imbalance, complex temporal relationships between (sometimes unmeasured) factors, and the identification of different underlying disease characteristics in different subgroups of the population. There is considerable research on predicting T2DM complications; however, studies that explain unknown risk factors and identify temporal phenotypes using hybrid (descriptive and predictive) methods are rare in the literature. This gap motivated the author's earlier research [32, 33, 83–85]. In [32, 33, 84], the author addressed these issues by first describing the case-study data and then exploring the suggested methodology as a framework for modelling real time-series clinical data. In the more recent work [83, 85], the identification of informative hidden factors is investigated, followed by methods to cluster patients into meaningful subgroups, identify a latent temporal phenotype, and characterise these groups using temporal association rules (as illustrated in **Figure 2**).

*Type 2 Diabetes - From Pathophysiology to Cyber Systems*

#### **Figure 2.**

*The proposed hybrid methodology for finding explainable subgroups of patients to personalise the care of diabetic patients in precision medicine. This figure abstracts the methodology explained in Figures 1–4 of the previous work [83].*

**7. Type 2 Diabetes as a case study**

The World Health Organisation (WHO) reported that Type 2 Diabetes Mellitus (T2DM) accounts for at least 90% of all diabetes types. Another WHO study revealed that T2DM patients are at increased risk of long-term vascular comorbidities, which are known as the "underlying cause of death" and a severe phenotype of the disease [86]. It has previously been observed that patients with T2DM are also at an increased risk of microvascular comorbidities, including nephropathy, neuropathy, and retinopathy [86]. As with patients with type 1 diabetes, although genetic factors contribute to the development of T2DM, neglecting the complications that develop is believed to harm patients' lives, because each patient may develop a different profile of complications and features that changes from one follow-up visit to the next. Moreover, these life-threatening complications can remain undiagnosed for a long time because the patterns of their associated risk factors are hidden [11]. The underlying pattern of the complications, and how their co-occurrence follows from or is caused by other complications of the disease, is known to be the major source of mortality and morbidity in T2DM [64]. This matters because predicting a target complication is difficult without considering the effects of its associated complications.

**7.1 Data description**

The dataset analysed in this chapter is similar to the data used in the previous study [83] of pre-diagnosed T2DM patients aged twenty-five to sixty-five years (inclusive), who were recruited from clinical follow-ups at the IRCCS Istituti Clinici Scientifici (ICS) Maugeri of Pavia, Italy. The data come from the MOSAIC project, funded under the European Commission's Seventh Framework Programme, Theme ICT-2011.5.2 Virtual Physiological Human (600914), and cover 2009 to 2013. They consist of physical examinations and laboratory results for T2DM complications and risk factors (predictors), which were selected based on the existing literature on T2DM [76, 87–90] and on recommendations from the clinicians at ICS. The complications are Retinopathy (RET), Hypertension (HYP), Nephropathy (NEP), Neuropathy (NEU) and Liver disease (LIV) (see **Table 1**). The predictors selected from the dataset are Body Mass Index (BMI), Systolic Blood Pressure (SBP), High-density Lipoprotein (HDL), Glycated Haemoglobin (HbA1c or HBA), Diastolic Blood Pressure (DBP), Cholesterol (COL), Smoking habit (SMK) and Creatinine (CRT). **Table 2** classifies the control values of the T2DM risk factors into three clinical levels of risk: low (zero), medium (one) and high (two). In T2DM, worsening microvascular disease and HYP are known to be significant causes of death [91]. Even though microvascular complications such as RET, NEP and NEU are less frequent than HYP, an inadequate estimation of them causes long-term suffering
