*Artificial Intelligence - Applications in Medicine and Biology*

**3.3 How far are the reported results by the investigators correct?**

The prediction results reported by investigators [15–38, 41–47, 52–60, 63–67, 71, 72, 74–77, 79–83, 85–88, 89–95, 97–102] indicate the performance of these predictive models on the data that were used in modeling. However, these ML models can suffer from different data biases, which may lead to a lack of generalizability. A machine learning system trained only on local datasets may not be able to predict (reproduce) the needs of out-of-sample datasets (new datasets that are not present in the training data). External validation of models in cohorts that were acquired independently from the discovery cohort (e.g., from another institution) is considered the gold standard for true estimates of the performance and generalizability of prediction models [6]. The application of different algorithms to the same dataset may yield variable results for the predictors found to be significantly associated with the outcome of interest [6, 105]. However, this may […] little insight) for large datasets [6, 13]. The development of accurate and interpretable models using different ML architectures is an active area of research [6]. As with any algorithm that we use in radiation oncology today (e.g., dose calculation or deformable registration), ML algorithms will need acceptance, commissioning, and QA to ensure that the right algorithm or model is applied to the right application and that the model results make sense in a given clinical situation. Finally, the field of radiation oncology is highly algorithmic and data-centric, and while the road ahead is filled with potholes, the destination holds tremendous promise [14].

**Table 1.** *Strengths and weaknesses of the machine learning methods discussed here, as appearing in radiation oncology studies.*

| Method | Strengths | Weaknesses |
| --- | --- | --- |
| Decision tree | Interpretability (with a format consistent with many clinical pathways) | Overgrowing a tree with too few observations at leaf nodes |
| LASSO regression | Better interpretability (compared to the ridge regularization method) | Introduces a bias towards zero (may not be appropriate in some applications) |
| Deep learning | Very accurate, can be adapted to many types of problems, and the hidden layers reduce the need for feature engineering | Requires a very large amount of data, and computationally intensive to train |
| Logistic regression | Has a nice probabilistic interpretation, and can be updated easily with new data | Not flexible enough to naturally capture more complex relationships |
| Naive Bayes | Performs surprisingly well, easy to implement, and can scale with the dataset | Often beaten by models properly trained and tuned (algorithms listed) |
| Random forest | Often can produce very accurate predictions with little feature engineering | Not easily interpretable, and the number of trees is not optimized |
| Gradient boosting machines | Perform very well, robust to outliers, and scalable | More tuning parameters (compared to random forest), and overfitting |
| Support vector machines | Very accurate, few parameters that require tuning, and kernel options | Not readily interpretable, and the parameters are not optimized perfectly |
| Neural networks (more precisely, artificial neural networks) | Work even if one or a few units fail to respond to the network | Referred to as "black box" models that provide very little insight, and require a large diversity of training datasets |
| Ensembles (decision tree) | Generate very stable results (compared to random forest) | Unconstrained, and prone to overfitting |
| K-means | Fast, simple, and flexible | Must manually specify the number of clusters |
| Principal component analysis | Versatile, fast, and simple to implement | Not interpretable, and must manually set a threshold for cumulative variance |

**3.4 How would the reported results be improved?**

Although promising and steadily improving accuracy results for many ML-based predictive models in radiation oncology have been reported [18, 19, 21, 31–38, 41–43, 53–55, 74, 79–83, 85, 86, 89–95, 97–102], effective applications of these methods in day-to-day clinical practice are still very few. One example of a commercial product recently deployed into clinical use is Quick Match (Siris Medical, Redwood City, CA, USA) [68]. A private initiative, IBM's Watson, is already used in some institutions, such as the Memorial Sloan Kettering Cancer Center in New York [106–109]. Watson Oncology [108] is a cognitive AI computing system designed to support the broader oncology community of physicians as they consider treatment options with their patients. To improve the prediction accuracy of these reported results, more training and validation datasets from multiple institutions are required. Frameworks such as [50], which compare these methods on standard consensus data to establish benchmarks for evaluating different models, would definitely help improve these results and lead to robust toolkits/systems. ML and AI tools are expected to soon become settled in routine clinical practice, playing an indispensable role for the benefit of patients, society, and the profession.
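The external-validation workflow recommended here (train on a local discovery cohort, then score on an independently acquired cohort) can be illustrated with a minimal sketch. The cohort sizes, the three simulated features, the distribution shift, and the use of scikit-learn's `LogisticRegression` are illustrative assumptions, not the models or data of the studies cited above:

```python
# Sketch: internal vs. external validation of a prediction model.
# All data here are simulated; a real study would use two institutions' cohorts.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def make_cohort(n, shift=0.0):
    """Simulate one institution's cohort: 3 features, binary outcome."""
    X = rng.normal(loc=shift, scale=1.0, size=(n, 3))
    logits = 1.2 * X[:, 0] - 0.8 * X[:, 1]          # hypothetical true signal
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(int)
    return X, y

X_disc, y_disc = make_cohort(500)            # discovery cohort (local data)
X_ext, y_ext = make_cohort(300, shift=0.3)   # external cohort (shifted distribution)

model = LogisticRegression().fit(X_disc, y_disc)

# The external AUC, not the internal one, estimates generalizability.
auc_internal = roc_auc_score(y_disc, model.predict_proba(X_disc)[:, 1])
auc_external = roc_auc_score(y_ext, model.predict_proba(X_ext)[:, 1])
print(f"internal AUC: {auc_internal:.3f}, external AUC: {auc_external:.3f}")
```

Reporting both numbers makes any optimism of the internal estimate visible, which is the point of the "gold standard" external validation discussed above.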

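A benchmarking framework of the kind called for above, where several of the methods from Table 1 are scored on one shared dataset with identical cross-validation splits, might look like the following sketch. The synthetic dataset and the particular model settings are assumptions for illustration only:

```python
# Sketch: side-by-side cross-validated benchmark of several Table 1 methods
# on one common (here synthetic) dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Stand-in for a consensus benchmark dataset.
X, y = make_classification(n_samples=400, n_features=10, n_informative=4,
                           random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
    "naive Bayes": GaussianNB(),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Same 5-fold splits for every model, so the scores are directly comparable.
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:>20s}: mean CV accuracy {acc:.3f}")
```

Running every candidate through the same splits and metric is what turns a collection of separately reported results into a benchmark.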
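The principal component analysis caveat in Table 1, that a cumulative-variance threshold must be set manually, can be made concrete with a short sketch; the 95% cutoff and the synthetic data are arbitrary illustrative choices:

```python
# Sketch: choosing the number of PCA components by a manually set
# cumulative explained-variance threshold (here 0.95).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

X, _ = make_classification(n_samples=200, n_features=10, random_state=0)

pca = PCA().fit(X)                                   # keep all components first
cumvar = np.cumsum(pca.explained_variance_ratio_)    # cumulative variance curve
n_components = int(np.searchsorted(cumvar, 0.95) + 1)  # first k reaching 95%
print(f"components kept for 95% variance: {n_components}")
```

There is no data-driven "correct" threshold; 0.90, 0.95, or 0.99 each yield a different dimensionality, which is exactly the manual-tuning weakness the table records.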