**2. Precision medicine machine learning system design**

We have previously published work using spinal cord metrics generated by the Spinal Cord Toolbox [17] alongside simple linear and logistic regression models [16].

While that work found moderate success, our results suggested that complex aspects of spinal cord morphology are likely the key to an accurate model, with simple regression analyses alone appearing to be insufficient. That study also used only MRI-derived metrics, leaving our model unable to draw on non-imaging attributes to support its diagnostic conclusions, something which has been shown to improve model accuracy in other studies [18]. Finally, our prior models were static in nature and had to be rebuilt each time new data became available. While this may be tractable for simple models (which can be rebuilt very quickly), more complex models require far more computational investment, making repeated retraining increasingly difficult to manage as the dataset grows. As an additional concern, there is reason to believe that the trends in our collected metrics will change over time as societal, behavioral, and environmental shifts influence DCM epidemiology [19], rendering previously learned trends obsolete or less significant. An ideal model would therefore adapt to these changes as they arise, without the need for manual correction.

### **2.1 Data management**

As previously mentioned, a key consideration in the clinical use of machine learning is that clinical data does not remain fixed. As new patients arrive and have their data collected, and as current patients see their disease state change, the relevant data that can be leveraged will change and expand over time. One possible approach is to retrain our machine learning model from scratch each time we update our dataset; however, this would become prohibitively time and resource consuming as the dataset grows. Thankfully, advancements in *continual learning* over the last 5 years provide an elegant solution [20] (which we discuss in Section 3). To use these techniques effectively, we need to consider how best to optimize the way data is collected, stored, accessed, processed, and reported. Ideally, these data management systems should be malleable, extendable, and easy to use, so they remain useful long-term in an ever-changing clinical environment. This section details methodologies for achieving this, accounting for the challenges presented by ongoing clinical data collection.

### *2.1.1 Acquisition and storage*

Ideally, our clinical dataset would include any and all relevant features that can be reliably and cost-effectively obtained. In reality, the specific data elements (or "features") will vary both across patients and over time (as new diagnostic tests become available or as ethical rules and constraints are updated). As such, an ideal data management approach should be capable of adapting to variable feature collection over time, while still allowing new patients to be included. For ethical reasons, the storage system also needs to be designed so that data can be easily removed, should patients request that their data be purged or should privacy rules require it.

In our facility, we addressed these considerations by creating a non-relational document database system using MongoDB. This allows new features to be added and removed on the fly via a modular framework of 'forms', which specify sets of related features that should exist inside a single document 'type'. These documents can then be stored within a larger super-document (which we will refer to as a 'record') for each specific patient. This results in a large dataset containing all relevant features organized in an individual-specific manner. Each form acts as a 'schema', specifying what features can be expected to exist within each patient's record. With MongoDB, this allows features to be added and removed as needed without restructuring the entire database [21], which would risk data loss. If new features are desired, one can simply write a new form containing said features and add it to the system; previous entries without these new features can then be treated as containing "null" values in their place, thereby enabling them to still be included in any analyses performed on the data. Should features need to be removed, the form containing them can either be revised or deleted entirely. This results in the features effectively being masked from analysis access without deleting them from the database itself, allowing for their recovery in the future.
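To make this concrete, the snippet below sketches how such a form-and-record structure might be manipulated with MongoDB's Python driver (pymongo). The database, collection, form, and field names are purely illustrative and not drawn from our production system.

```python
# Minimal sketch of the 'form'/'record' mechanics described above (illustrative names only).
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
records = client["dcm_registry"]["patient_records"]  # one document ('record') per patient

# Because MongoDB documents are schemaless, a new 'form' can be introduced simply by
# writing it into new or existing records; no database-wide restructuring is needed.
records.update_one(
    {"patient_id": "P-0001"},
    {"$set": {"grip_strength": {"left_kg": 28.5, "right_kg": 31.0}}},  # hypothetical new form
    upsert=True,
)

# Records created before the form existed simply lack the field; treating the missing
# value as null keeps those older entries usable in downstream analyses.
for rec in records.find({}, {"patient_id": 1, "grip_strength": 1}):
    grip = rec.get("grip_strength")  # None for records predating the new form

# 'Removing' a form only masks it from analysis; the stored values remain recoverable.
active_forms = ["demographics", "diagnostics", "imaging"]  # grip_strength excluded
```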

Our system also has the added benefit of allowing for the creation of 'output' forms, which capture and store metrics generated from data analyses. This enables the same system that collects the data to also report these analytical results back to the original submitter via the same interface. These output forms can also be stored alongside forms containing the original values that were provided to the analysis, making both easily accessible when calculating error/loss.
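Sketched below is one way such an output form might look, continuing the illustrative pymongo collection from above; the analysis identifier and score fields are hypothetical.

```python
# Illustrative 'output' form: analysis results are written back into the same patient
# record, beside the inputs that produced them (all names are hypothetical).
from datetime import datetime, timezone
from pymongo import MongoClient

records = MongoClient("mongodb://localhost:27017")["dcm_registry"]["patient_records"]

records.update_one(
    {"patient_id": "P-0001"},
    {"$set": {"analysis_output": {
        "analysis_id": "severity-predictor-v3",       # hypothetical analysis identifier
        "predicted_score": 13.2,
        "generated_at": datetime.now(timezone.utc),
    }}},
)

# With inputs and outputs stored together, calculating error/loss is a single query away.
doc = records.find_one({"patient_id": "P-0001"},
                       {"observed_score": 1, "analysis_output.predicted_score": 1})
if doc and "observed_score" in doc and "analysis_output" in doc:
    error = doc["observed_score"] - doc["analysis_output"]["predicted_score"]
```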

In our DCM dataset, all features (including MRI sequences) were collected at the time of diagnosis and consent to participate in our longitudinal registry associated with the Canadian Spine Outcomes and Research Network [22]. This registry collects hundreds of metrics on each patient, including a number of common diagnostic tests, with each test stored in the database as a single form. Most notably, this includes the modified Japanese Orthopedic Association (mJOA) scale form [23], which is important for our study because we use this diagnostic assessment of DCM severity as the target metric for model training. The MRI sequence form (which contains our MRI sequences alongside metadata describing how they were obtained) and demographic information about the patient (including attributes such as name, age, and sex) are also each represented by one form within our system. A simplified visualization of this structure can be seen in **Figure 1**.
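As a rough illustration of the layout shown in **Figure 1**, a single patient record might resemble the following document; the field names and values are simplified examples, not the full registry form set.

```python
# Simplified example of one patient record, mirroring the structure in Figure 1.
# Field names and values are illustrative only.
patient_record = {
    "patient_id": "P-0001",
    "demographics": {              # demographic form
        "name": "REDACTED",
        "age": 64,
        "sex": "F",
    },
    "mjoa": {                      # modified Japanese Orthopedic Association form [23]
        "total_score": 14,         # DCM severity score used as the training target
    },
    "mri_t2w_sagittal": {          # MRI sequence form: image plus acquisition metadata
        "image_path": "sub-0001/anat/sub-0001_T2w.nii.gz",
        "metadata": {"field_strength_T": 3.0, "slice_thickness_mm": 3.0},
    },
}
```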

This system also allows pre-built structures to be re-created within it. For example, our MRI data is currently stored using the Brain Imaging Data Structure (BIDS) format [24]. This standardized data structure organizes files into directory hierarchies according to their contents, with metadata describing the contents of a directory "falling through" to the sub-directories and documents nested within it. These nested directories can in turn contain new metadata that overrides some or all of the previously set values, allowing for more granular metadata specification. Such a structure maps well onto our system, with the "nested" directories acting as features within forms, or forms within records; features can even consist of sets of sub-features (such as our MRI feature, which bundles the MRI image *and* its associated metadata together). The nested structure can then specify "override" values as they become needed.
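The snippet below sketches this "fall-through" behavior, assuming metadata is tracked per directory and merged from the dataset root down to each file, with deeper levels overriding shallower ones; the paths and metadata keys are illustrative.

```python
# Sketch of BIDS-style metadata "fall-through": metadata set high in the directory
# tree applies to everything below it, and deeper levels may override individual keys.
from pathlib import PurePosixPath

def resolve_metadata(metadata_by_dir: dict, file_path: str) -> dict:
    """Merge metadata from the dataset root down to the file's directory; deeper wins."""
    resolved = {}
    parts = PurePosixPath(file_path).parent.parts
    for depth in range(len(parts) + 1):
        prefix = "/".join(parts[:depth]) or "."
        resolved.update(metadata_by_dir.get(prefix, {}))
    return resolved

metadata_by_dir = {
    ".":        {"RepetitionTime": 2.0, "Manufacturer": "VendorA"},  # dataset-level defaults
    "sub-0001": {"Manufacturer": "VendorB"},                         # subject-level override
}
print(resolve_metadata(metadata_by_dir, "sub-0001/anat/sub-0001_T2w.nii.gz"))
# -> {'RepetitionTime': 2.0, 'Manufacturer': 'VendorB'}
```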

### **Figure 1.**

*A simplified example of how data is stored and managed in our theoretical system. Each feature tracked is first bundled into a 'model', which groups related features together alongside a descriptive label. These models act as a schema for any data analysis procedures to hook into, and can be modified, removed, and created as needed. Model instances are then stored in 'records', which represent one entry for any analysis system which requires it (in our case, that of one patient enrolled in our DCM study). A data structure like this can be easily achieved with any non-relational database system; in our case, we opted to use MongoDB.*

### **2.2 Cleaning and preparation**

The raw data collected in a clinical setting is almost never "analysis ready", as factors like human error and missing data fields must be contended with. Strategies for "cleaning" data vary from dataset to dataset, but for precision medicine models some common standards apply. First, such protocols should work on a per-record basis, not a full-dataset basis. This avoids the circumstance where adding entries with extreme values skews the dataset's distribution and compromises the model's prior training (as the input metrics are, in effect, re-scaled), resulting in an unrealistic drop in model accuracy. Per-record internal normalization, however, typically works well, so long as it remains consistent over the period of the model's use. Some exceptions to this exist; for example, exclusion methods may need to be aware of the entire dataset to identify erroneous records. Likewise, imputation methods will need to "tap into" other available data to fill in missing or incorrect data points within each record.
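As a simple sketch of the distinction, the function below normalizes one record at a time using fixed, pre-specified bounds, so that the arrival of a new extreme record never re-scales previously processed inputs; the feature names and ranges are illustrative.

```python
# Per-record scaling using fixed, pre-specified bounds: adding a new record with
# extreme values never changes how earlier records were normalized.
import numpy as np

FEATURE_RANGES = {"age": (18, 100), "symptom_duration_months": (0, 240)}  # illustrative

def normalize_record(record: dict) -> dict:
    """Scale each known numeric feature of a single record into [0, 1]."""
    out = {}
    for name, (lo, hi) in FEATURE_RANGES.items():
        value = record.get(name)
        if value is None:
            out[name] = None                      # missing feature: left for imputation
        else:
            out[name] = float(np.clip((value - lo) / (hi - lo), 0.0, 1.0))
    return out

# By contrast, dataset-level min-max scaling would shift every previously normalized
# value whenever a new record extends the observed range.
print(normalize_record({"age": 64, "symptom_duration_months": 18}))
```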

It is often the case that data is obtained from multiple different sources (e.g. different clinics, practitioners, hospitals, labs, and databases), which may have varying protocols and/or environmental differences that can systematically influence the resulting measurements. If the model could be retrained from scratch every time new data was obtained, these batch effects could be easily removed [25]. In iteratively trained systems, however, correcting them across the whole dataset would create the same issue as full-dataset normalization: new entries causing fall-through changes across the entire dataset. Under the assumption that batch effects influence the data less than the 'true' contributing effects, however, it has been shown that systems which learn iteratively can integrate batch effect compensation directly into their training for both numeric [26] and imaging [27] metrics, thereby resolving the issue.
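One simple illustration of this idea, offered as an assumption on our part rather than the specific methods of [26] or [27], is to expose the acquisition site to an iteratively trained model as an explicit input so it can learn to absorb site-specific offsets as new data arrives.

```python
# Illustrative batch-effect strategy: append a one-hot acquisition-site indicator to
# each record's feature vector (not necessarily the approach of [26] or [27]).
import numpy as np

KNOWN_SITES = ["site_a", "site_b", "site_c"]  # hypothetical site identifiers

def with_site_indicator(features: np.ndarray, site: str) -> np.ndarray:
    """Append a one-hot site code so the model can learn site-specific offsets."""
    onehot = np.zeros(len(KNOWN_SITES))
    if site in KNOWN_SITES:
        onehot[KNOWN_SITES.index(site)] = 1.0   # unknown sites get an all-zero code
    return np.concatenate([features, onehot])

# Example: a record's numeric features, tagged with the site that produced them.
augmented = with_site_indicator(np.array([0.42, 0.77, 0.13]), "site_b")
```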

Coming back to our DCM example, our data consists of demographic information (a mix of numerical and categorical data), diagnostic data (also numerical and categorical), and 3-dimensional MRI sequence data (which also contains metadata describing its acquisition method). For the numerical and categorical data, our processing procedures are minimal, consisting of a quick manual review to confirm that all required features are present. As our dataset was relatively large, we opted to simply drop entries which contained malformed or missing data. New patient entries with errors were either met with a request to the supplier for corrected data, or had values imputed in their place for prediction purposes [28]. Categorical data is then one-hot encoded, while numerical data with known minimums and maximums is scaled between 0 and 1. We had access to multiple different MRI sequencing methodologies as well, but focused on T2-weighted (T2w) sagittally oriented sequences based on our prior tests with the data [16]. MRI sequences are then resampled to a voxel size of 1 *mm*<sup>3</sup> and their signal values normalized to a 0 to 1 range. Unlike our numerical metrics, this was done using the per-image signal minimum and maximum, in an attempt to account for variation in signal intensity between acquisitions, aiding batch effect removal in the process.
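A condensed sketch of these preprocessing steps is shown below; the library choices (nibabel, scipy, scikit-learn) and parameters are illustrative rather than a description of our exact pipeline.

```python
# Condensed sketch of the preprocessing described above; library choices and
# parameters are illustrative, not a record of our exact pipeline.
import numpy as np
import nibabel as nib
from scipy.ndimage import zoom
from sklearn.preprocessing import OneHotEncoder

def scale_numeric(value: float, known_min: float, known_max: float) -> float:
    """Min-max scale a numeric feature with a known range into [0, 1]."""
    return (value - known_min) / (known_max - known_min)

def preprocess_t2w(path: str) -> np.ndarray:
    """Resample a T2w volume to 1 mm isotropic voxels, then per-image min-max normalize."""
    img = nib.load(path)
    data = img.get_fdata()
    factors = [z / 1.0 for z in img.header.get_zooms()[:3]]  # current voxel size (mm) -> 1 mm
    resampled = zoom(data, factors, order=1)                  # trilinear resampling
    lo, hi = resampled.min(), resampled.max()
    return (resampled - lo) / (hi - lo) if hi > lo else np.zeros_like(resampled)

# Categorical features are one-hot encoded; categories unseen at fit time are ignored
# rather than breaking the pipeline when new data arrives.
encoder = OneHotEncoder(handle_unknown="ignore", sparse_output=False)
encoded = encoder.fit_transform([["female"], ["male"], ["female"]])
```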
