**4. Protocol assessment**

### **4.1 Progressive learning ConvNet**

### *4.1.1 Methodology*

To confirm that our protocol functions effectively in practice, we replicated the CIFAR-100 analysis used in the original *Progressive Learning* paper [29], with a few major changes to emulate a precision medicine environment (i.e. the kind of clinical context in which data is typically collected). First, only the CIFAR-10 dataset [34] was used, rather than treating it as a model initialization dataset as in the original work. This better reflects the clinical environment, where a pre-established dataset known to be effective at preparing the model for the task is unlikely to exist. The 10 categories of CIFAR-10 also represent the granularity typically used to assess many illnesses. Second, our dataset was randomly split into 10 subsets of 6000 images each, with 5000 used for training and the remaining 1000 used for validation. The contents of these subsets were completely random, allowing for imbalance in the number of elements from each of the 10 categories, reflecting how data collected in a clinical setting may be distributed. Third, we skipped the curriculum stage, again to reflect the circumstances of clinical data collection (where the scale of collection is insufficient for a curriculum). Fourth, our framework was implemented in PyTorch [35] rather than TensorFlow [36], due to PyTorch's more robust network pruning support. Finally, data augmentation was applied to each image, both to discourage the model from memorizing data and to simulate human error/variation in clinically acquired data. The result is a problem slightly more difficult than the original setup devised by Fayek et al., though for parity's sake we continued to use the same convolutional network design.
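As a concrete sketch of the splitting step, assuming integer indices into the 60,000-image CIFAR-10 pool (function and parameter names here are our own illustration, not the original implementation):

```python
import random

def split_into_subsets(indices, n_subsets=10, subset_size=6000,
                       train_size=5000, seed=0):
    """Randomly partition dataset indices into fixed-size subsets.

    Labels are deliberately ignored, so per-class imbalance within a
    subset can occur naturally, mirroring opportunistic clinical data
    collection.
    """
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    subsets = []
    for i in range(n_subsets):
        chunk = shuffled[i * subset_size:(i + 1) * subset_size]
        subsets.append({"train": chunk[:train_size],
                        "val": chunk[train_size:]})
    return subsets

# CIFAR-10 contains 60,000 images in total.
subsets = split_into_subsets(range(60_000))
```

Data augmentation (random crops, flips, and similar perturbations) would then be applied per-image at load time on the training portion of each subset.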

We tested 6 procedures, formed by crossing two experimental factors. The first was whether the model was trained as an independent learning model, a progressive learning model with prior blocks frozen, or a progressive learning model with prior blocks freely pruned and updated. For independent learning procedures, the model was completely reset after each training cycle, whereas for progressive learning procedures the model persisted across cycles (allowing it to "apply" prior knowledge to new data). The second was whether data was provided in batches (as in a clinical setting) or submitted all at once (the "ideal" for machine learning analyses). In batched procedures, data was submitted one subset at a time, as described above. A strict maximum wall time of 8 hours was imposed on all procedures to simulate the limited resources (in both time and hardware) that clinical settings often have. All procedures were run on a single Tesla V100-PCIE-16GB GPU with 16GB of RAM and two Intel(R) Xeon(R) Gold 6148 CPUs running at 2.40GHz (the latter speeding up initial protocol setup).
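The six procedures form a 3 × 2 grid of learning mode and data regime. A minimal sketch of the driver loop under these assumptions (`train_cycle` is a hypothetical stand-in for the real per-cycle training routine, and all names are our own):

```python
import itertools
import time

LEARNING_MODES = ("independent", "progressive_frozen", "progressive_free")
DATA_REGIMES = ("batched", "all_at_once")
MAX_WALL_SECONDS = 8 * 60 * 60  # hard 8-hour cap per procedure

def run_procedure(mode, regime, subsets, train_cycle,
                  max_seconds=MAX_WALL_SECONDS):
    """Run one procedure, stopping early if the wall-clock budget expires."""
    start = time.monotonic()
    completed = 0
    for cycle, subset in enumerate(subsets):
        if time.monotonic() - start > max_seconds:
            break
        # Batched procedures see one subset per cycle; "all at once"
        # procedures see the full data pool every cycle.
        data = subset if regime == "batched" else subsets
        train_cycle(mode, cycle, data)
        completed += 1
    return completed

procedures = list(itertools.product(LEARNING_MODES, DATA_REGIMES))
```

In independent mode the model would additionally be reset at the top of each cycle, whereas progressive modes carry the model state (and, in the free variant, allow pruning and updating of prior blocks) across cycles.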

The initial architecture for all procedures is shown in **Table 1**. For progressive learning procedures, new blocks were added which were half the size of the original


*Delivering Precision Medicine to Patients with Spinal Cord Disorders; Insights into… DOI: http://dx.doi.org/10.5772/intechopen.98713*

### **Table 1.**

*The basic structure of the convolutional neural network being tested on the CIFAR-10 dataset. Based on the model used by Fayek et al. [29]. [Concatenation] indicates where the output of one set of blocks would be concatenated together before being fed into new blocks in the following layer, and can be ignored for independent learning tasks.*

blocks, set to receive the concatenated outputs of all blocks in the prior layer of each set of blocks. All parameters were initialized randomly using PyTorch version 1.8.1 default settings. We used an ADAM optimizer with a learning rate of 0.001, first moment *β*<sub>1</sub> of 0.99, second moment *β*<sub>2</sub> of 0.999, and weight decay *λ* of 0.001 during training. For progressive learning models, an identical optimizer with one tenth the learning rate was used for post-pruning model optimization. Each cycle consisted of 90 epochs of training. Progressive procedures were given 10 epochs per pruning round, with pruning repeated until the mean accuracy of the prior set of epochs exceeded that of the new set, at which point the model's state was restored to the prior state before continuing. The model's training and validation accuracy was evaluated and reported once per epoch. Protocol efficacy was measured via the model's maximum validation accuracy over all cycles and epochs and the mean best-accuracy-per-cycle (BAPC) across all cycles.
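The pruning stopping rule and the two efficacy metrics can be sketched as follows. Here `prune_and_finetune` is a hypothetical stand-in for the real prune-then-fine-tune routine: it prunes the model, runs one 10-epoch round, and returns the new model state plus that round's per-epoch validation accuracies.

```python
import statistics

def prune_until_regression(state, prune_and_finetune, epochs_per_round=10):
    """Repeat prune + fine-tune rounds until mean accuracy regresses.

    When a round's mean per-epoch accuracy falls below the prior
    round's, the state from before the losing round is restored.
    """
    prev_state, prev_accs = prune_and_finetune(state, epochs_per_round)
    while True:
        new_state, new_accs = prune_and_finetune(prev_state, epochs_per_round)
        if statistics.mean(prev_accs) > statistics.mean(new_accs):
            return prev_state  # restore the pre-regression state
        prev_state, prev_accs = new_state, new_accs

def max_accuracy(per_cycle_accs):
    """Maximum validation accuracy over all cycles and epochs."""
    return max(max(cycle) for cycle in per_cycle_accs)

def mean_bapc(per_cycle_accs):
    """Mean best-accuracy-per-cycle (BAPC) across all cycles."""
    return statistics.mean(max(cycle) for cycle in per_cycle_accs)
```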

### *4.1.2 Results*

For our full datasets, the model achieved classification accuracy values of 80-85% for most results. The simple model, without progression and with full access to the entire dataset, reached a max accuracy of 81.13%, with a mean BAPC of 80.71%. Adding progression to the process improved this, primarily through the pruning stage, yielding a max accuracy of 84.80%. However, the mean BAPC dropped to 77.44%, as prior frozen parameters appeared to make the model "stagnate". Allowing the model to update and prune these carried-over parameters improved things substantially, leading to a max accuracy of 90.66% and a mean BAPC of 84.83%.

When data was batched, a noticeable drop in accuracy was observed, as expected. Without progressive learning, our model's max observed accuracy was only 73.7% (a drop of about 7.4 percentage points), with a mean BAPC of 71.75%. The progressive model with frozen priors initially performed better, reaching its maximum accuracy of 75.9% in its first cycle, but rapidly fell off, with a mean BAPC of only 66.0%. Allowing the model to update its priors greatly improved the results, however, leading to a maximum accuracy of 82.4% and a mean BAPC of 79.02%, rivaling the static model trained on all data at once.

A plot of the accuracy of each model setup over the entire duration (all cycles and epochs), for both the training and validation assessments, is shown in **Figure 5**.

### **4.2 DenseNet**

### *4.2.1 Methodology*

To confirm that the success of the setup suggested by Fayek et al. was not due to random chance, we also applied the technique to another model known to be effective on the CIFAR-10 dataset: the DenseNet architecture [37]. DenseNets are characterized by their "blocks" of densely connected chains of convolution layers, allowing simpler features identified in early convolutions to directly inform later layers that, in a more linear design, would not be connected at all. These blocks are a perfect fit for our method, as they can be generated and added to our progressive learning network like any other set of layers. DenseNets have also been shown to achieve better accuracy than classical convolutional networks on CIFAR-10, reaching error rates of less than 10% in many cases [37]. However, the dense connections make the networks extremely complex, and they are generally highly over-parameterized, making them prone to over-fitting in some cases.
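To make the dense-connectivity bookkeeping concrete: inside a block with growth rate *k*, each layer receives the concatenation of the block's input and every earlier layer's *k*-channel output, so its input width grows linearly with depth. A small sketch of this arithmetic (our own illustration, not code from [37]):

```python
def dense_block_channels(in_channels, growth_rate, n_layers):
    """Input channel width of each layer in a densely connected block.

    Layer i consumes the block input plus i earlier outputs of
    `growth_rate` channels each; the block emits the full concatenation.
    """
    widths = [in_channels + i * growth_rate for i in range(n_layers)]
    out_channels = in_channels + n_layers * growth_rate
    return widths, out_channels

# First block of 'densenet-169': 64 input features, growth rate 32, 6 layers.
widths, out = dense_block_channels(64, 32, 6)
```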

Our testing methodology was largely identical to that of the convolutional network tested in the previous section. One change was the use of a Stochastic Gradient Descent (SGD) optimizer with an initial learning rate of 0.1. Training was done in batches of 64, for a total of 300 epochs per cycle. The learning rate was reduced by a factor of 10 when 50% and again when 75% of each cycle's epochs had passed. The SGD


### **Figure 5.**

*The training progression of the ConvNet model replicated from Fayek et al.'s study [29] in various forms. From left to right: the model on its own, reset after every cycle (static); a progressively learning model with prior traits frozen (progressive, frozen); and a progressively learning model with all traits open to training and pruning each cycle (progressive, free). Training accuracy is shown in blue, with validation accuracy shown in orange. The maximum observed accuracy for each is indicated via a horizontal dotted (training) or dashed (validation) line. The dotted vertical lines indicate where the training of the model for a given cycle was completed (not including the pruning of progressive models). Note that the total number of epochs between these cycles differs from cycle to cycle in progressive models, as the pruning stage repeats until a validation accuracy loss is observed.*

optimizer was set with a weight decay of 0.0001 and a momentum of 0.9. Dropout layers with a drop rate of 0.2 were added after each block as well. The initial architecture for the network was based on the 'densenet-169' architecture and is shown in **Table 2**: a growth rate of 32, an initial feature count of 64, and 4 blocks of densely connected convolution layers, with 6, 12, 32, and 32 convolution layers respectively. For progressive learning procedures, new blocks followed the same architecture with half the growth rate (16) and half the initial features (32). These choices were made to maintain parity with the original DenseNet CIFAR-10 test [37].
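The learning-rate schedule described above amounts to a simple step function over each cycle's epochs (a sketch of the schedule itself, not the original training code):

```python
def sgd_lr_at_epoch(epoch, total_epochs=300, base_lr=0.1):
    """Step schedule: divide the rate by 10 at 50% and again at 75%."""
    lr = base_lr
    if epoch >= total_epochs * 0.5:
        lr /= 10
    if epoch >= total_epochs * 0.75:
        lr /= 10
    return lr
```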

### *4.2.2 Results*

For our full datasets, we observed accuracy values of around 90%. The simple model, without progression, reached a max accuracy of exactly 90%, but was only able to run one cycle to completion before the 8 hour time limit was reached. Adding progression to the process improved this slightly, resulting in a max accuracy of 90.66%, though it only barely completed its first pruning cycle before the time limit was reached. As a result, the same accuracy was observed for both progressive models with and without

### *Machine Learning - Algorithms, Models and Applications*


### **Table 2.**

*The structure of the DenseNet model being tested on the CIFAR-10 dataset. Based on the model used by Huang et al. [37]. Dense block indicates a densely connected convolution block, and transition block indicates a transition layer; both are detailed in Huang et al.'s original paper. [Concatenation] indicates where the output of one set of blocks would be concatenated together before being fed into new blocks in the following layer, and can be ignored for independent learning tasks. Where it appears, r indicates the dropout rate for the associated block.*

priors being trainable, as no new prior blocks were added. Slight variations were still observed, however, due to how the model's initialization process differs.

When data was batched, a much more significant drop in accuracy occurred than for the convolutional network. Without progressive learning, our model's max observed accuracy was only 69.9% (a drop of roughly 20 percentage points), with a mean BAPC of 65.87%. However, it was able to run all 10 cycles within the allotted 8 hour time span. The progressive model with frozen priors performed even worse, reaching a maximum accuracy of 67.1% in its first cycle and completing only 5 cycles before the time limit, with a mean BAPC of 64.28%. Allowing the model to update its priors somewhat improved the results, leading to a maximum accuracy of 71.7% and a mean BAPC of 68.28%, showing slight recovery over the static model in the batched scenario. However, it too managed only 5 cycles before the time limit was reached.


### **Figure 6.**

*The training progression of the DenseNet model replicated from Huang et al.'s original 'densenet-169' model [37] in various forms. From left to right: the model on its own, reset after every cycle (static); a progressively learning model with prior traits frozen (progressive, frozen); and a progressively learning model with all traits open to training and pruning each cycle (progressive, free). Training accuracy is shown in blue, with validation accuracy shown in orange. The maximum observed accuracy for each is indicated via a horizontal dotted (training) or dashed (validation) line. The dotted vertical lines indicate where the training of the model for a given cycle was completed (not including the pruning of progressive models). Note that the total number of epochs between these cycles differs from cycle to cycle in progressive models, as the pruning stage repeats until a validation accuracy loss is observed.*

A plot of the accuracy of each model setup over the entire duration (all cycles and epochs), for both the training and validation assessments, is shown in **Figure 6**.
