(a) Expanding capacity to create new knowledge

(b) Helping with knowledge dissemination

(c) Translating personalized medicine into clinical practice with EHR data

(d) Allowing for a transformation of health care by transferring information to the patient [10]

This trend has been called a "big bang" for the adoption of, and research on, big data and machine learning in medicine. Machine learning in particular is widely used [4–6]. Radiotherapy treats cancer with radiation according to a treatment plan prepared for each patient and each radiotherapy machine. For every fraction of a single patient's treatment, the dose, volume, device settings, complications, tumor control probability, etc. are considered. These data, piled up over a long time and across numerous patient cases, are therefore well suited to producing optimal treatment and minimizing radiation toxicity and complications. Thus, in this chapter we describe various clinical cases and key machine learning algorithms in radiation oncology.

First, what is the big data for a single patient in a hospital? The data types and their sizes for each patient are summarized in **Table 1**. In radiation oncology, imaging and treatment planning information could be the major processable data [15].

| Data type | Format | Approx. size |
|---|---|---|
| Clinical features | Text | 10 MB |
| Blood tests | Numbers | 1 MB |
| Administrative | ICD-10 codes | 1 MB |
| Imaging data | DICOM | 450 MB |
| Radiation oncology data (planning and onboard imaging) | DICOM, RT-DICOM | 500 MB |
| Raw genomic data | BAM: position, base, quality | 6 GB |
| **Total** | | **7.9 GB** |

**Table 1.** Data type and its size for each patient. In radiation oncology, imaging and treatment planning information could be the major processable data [15].

Second, we explain radiation treatment planning and the decision support system in radiation oncology. The treatment planning parameters for radiotherapy are set in the radiation treatment planning (RTP) system. The clinical target volume (CTV) and planning target volume (PTV) must receive the maximum radiation dose, while critical organs must receive the minimum. The plan is established from the correlation between dose and volume, known as the dose-volume histogram (DVH). The parameters considered in this process are the prescription dose (PD), dose distribution, dose fractionation, dose constraints for normal tissue, target volume, treatment machine settings, etc. [2, 16]. Third, once treatment planning has been completed, the DVH is acquired. The dose-volume distribution is the basic information for deciding whether the plan can be used. However, this limited information does not give hot spots in the target volume, conformity, homogeneity,

**2. Clinical application using big data in radiation oncology**

#### **2.1. Prostate cancer**

Çınar et al. [25] describe prostate cancer as follows:


Thus, prostate cancer is a meaningful clinical application for machine learning on big data. Coates et al. [4] studied integrated big data research for prostate cancer in radiation oncology. The parameters include dose-volume metrics (EUD), clinical parameters [gastrointestinal (GI) toxicities or rectal bleeding and genitourinary (GU) toxicities or erectile dysfunction (ED)], spatial parameters (zDVH), and biological variables (genetic variables), and risk quantification modeling of tumor control probability (TCP) and normal tissue complication probability (NTCP) was performed. Various modeling methods exist, of which neural networks and kernel-based methods are widely used. **Figure 1** shows the toxicity prediction results using principal component analysis (PCA) [4].

**Figure 1.** The predicted NTCP via principal component analysis (PCA) (reproduced from James Coates et al. [4]).
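To make the dose-volume metric mentioned above more concrete, the following is a minimal sketch of an equivalent uniform dose calculation. It assumes the generalized EUD form, gEUD = (Σ v_i · D_i^a)^(1/a), computed over the bins of a differential DVH. The source does not state which EUD formulation Coates et al. [4] used, and the function name, the parameter `a`, and the sample values are hypothetical.

```python
def generalized_eud(dose_bins_gy, volume_fractions, a):
    """gEUD = (sum_i v_i * D_i**a) ** (1/a) over differential DVH bins."""
    if a == 0:
        raise ValueError("a must be non-zero")
    total = sum(volume_fractions)
    if total <= 0:
        raise ValueError("volume fractions must sum to a positive value")
    # Normalise so that the fractional volumes sum to 1.
    weights = [v / total for v in volume_fractions]
    return sum(w * d ** a for w, d in zip(weights, dose_bins_gy)) ** (1.0 / a)


# Hypothetical differential DVH (illustrative values, not from the source).
doses = [10.0, 30.0, 50.0, 70.0]   # representative dose of each bin in Gy
volumes = [0.4, 0.3, 0.2, 0.1]     # fraction of the organ volume in each bin

mean_dose = generalized_eud(doses, volumes, a=1)     # a = 1 reduces to the mean dose
serial_like = generalized_eud(doses, volumes, a=8)   # large a emphasises high-dose bins
```

With `a = 1` the metric reduces to the mean dose, while larger values of `a` move it toward the maximum dose, which is why a single parameter can describe organs with different volume effects.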

De Bari et al. [5] conducted a pilot study on predicting pelvic nodal status in prostate cancer using machine learning. A total of 1555 cN0 and 50 cN+ prostate cancer patients were enrolled, and a decision tree and a machine learning algorithm were used to study the performance of the Roach formula and the Partin table. The study showed accuracy, specificity, and sensitivity ranging between 48–86%, 35–91%, and 17–79%, respectively (**Figure 2**).

**Figure 2.** A decision tree example for prediction of pelvic nodal status in prostate cancer patients [5].
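The performance figures above can be illustrated with a short sketch. It assumes the commonly cited form of the Roach formula for nodal risk, (2/3) × PSA + (Gleason score − 6) × 10, and computes accuracy, sensitivity, and specificity from binary predictions. The patient values and the 15% cut-off are hypothetical and are not taken from De Bari et al. [5].

```python
def roach_nodal_risk(psa_ng_ml, gleason_score):
    """Roach formula: estimated risk of nodal involvement in percent."""
    return (2.0 * psa_ng_ml) / 3.0 + (gleason_score - 6) * 10.0


def diagnostic_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = cN+)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Hypothetical patients: (PSA in ng/mL, Gleason score, true nodal status).
patients = [(6.0, 6, 0), (9.0, 7, 0), (15.0, 8, 1),
            (30.0, 7, 0), (45.0, 9, 1), (12.0, 6, 1)]
THRESHOLD_PERCENT = 15.0  # assumed cut-off for illustration only

y_true = [status for _, _, status in patients]
y_pred = [1 if roach_nodal_risk(psa, gs) >= THRESHOLD_PERCENT else 0
          for psa, gs, _ in patients]
metrics = diagnostic_metrics(y_true, y_pred)
```

Because the cohort in [5] is strongly imbalanced (1555 cN0 versus 50 cN+), accuracy alone is dominated by the cN0 class, which is one reason sensitivity and specificity are reported alongside it.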

In addition, several analysis articles reporting index results for prostate cancer have been published; these could serve as examples to which the above machine learning algorithms are applied in a next step [30, 31].

#### **2.2. Lung cancer**

Das et al. [6] describe radiation-induced pneumonitis as a serious problem in the thorax, including the lung, as follows:

a. It is an important problem when radiation is incident on the adjacent or surrounding normal lung.

b. High-grade cases occur in 15–36% in retrospective studies.

Das et al. [6] conducted prediction modeling by decision tree analysis, based on 234 lung cancer patients and the Lyman normal tissue complication probability (LNTCP). **Table 2** shows injury prediction under various settings for a male patient.
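As an illustration of the Lyman NTCP named above, the commonly used probit form is NTCP = Φ((D_eff − TD50) / (m · TD50)), where Φ is the standard normal cumulative distribution, D_eff an effective (for example mean or equivalent uniform) dose, TD50 the dose giving a 50% complication probability, and m the slope parameter. The sketch below is a minimal version under that assumption. The function name and all parameter values are hypothetical and are not taken from Das et al. [6], and the decision tree stage is not shown.

```python
import math


def lyman_ntcp(effective_dose_gy, td50_gy, m):
    """Lyman probit NTCP for a single effective dose value."""
    if td50_gy <= 0 or m <= 0:
        raise ValueError("td50_gy and m must be positive")
    t = (effective_dose_gy - td50_gy) / (m * td50_gy)
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


# Hypothetical parameters chosen only to show the shape of the curve.
TD50_GY = 30.0
M_SLOPE = 0.3

ntcp_curve = [(dose, lyman_ntcp(dose, TD50_GY, M_SLOPE))
              for dose in (10.0, 20.0, 30.0, 40.0)]
```

The curve is sigmoidal in dose: it equals 0.5 at D_eff = TD50 and rises more steeply as m decreases.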

#### **2.3. Head and neck cancer**

Head and neck cancer patients undergo anatomical changes over the several weeks of radiotherapy. In current radiotherapy, kilovoltage cone-beam computed tomography (kV-CBCT) and megavoltage computed tomography (MVCT) combined with a linear accelerator (LINAC) therefore make it possible to monitor the patient's daily anatomical changes across treatment fractions [7]. Adaptive radiotherapy (ART) can compensate for the patient's anatomical variation by adjusting the dose distribution, which makes it possible to reduce unexpected toxicity. However, ART requires time and labor for the daily setup needed to correct the variation. When replanning has to be done daily or weekly for numerous patients, the process is laborious and time-consuming.


**Table 2.** Comparison table of injury prediction for combinations of radiotherapy plan and various settings for a male patient [6].

Guidi et al. [7] studied the prediction of replanning benefit using unsupervised machine learning on retrospective data, taking this process and patient characteristics into account. **Figure 3** shows the algorithm architecture of the study: starting from the DVH input, clustering classifies the data into groups, support vector machine (SVM) training analyzes the parotid gland, and a clinical acceptance level is applied, followed by test and output steps [7]. The results suggest that replanning is needed for 77% of patients, because significant morphodosimetric changes affect them by the start of the fourth week of treatment.

**Figure 3.** Algorithm architecture for prediction using clustering and support vector machine training [7].
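The first stage of this architecture, grouping patients by their DVHs, can be sketched as follows. This is a minimal illustration only: the text above does not name the clustering algorithm, so plain k-means is used here as a stand-in, the SVM stage is omitted, and the function names, dose thresholds, and voxel doses are all hypothetical.

```python
import math


def cumulative_dvh(voxel_doses_gy, thresholds_gy):
    """Fraction of voxels receiving at least each threshold dose."""
    n = len(voxel_doses_gy)
    return [sum(1 for d in voxel_doses_gy if d >= t) / n for t in thresholds_gy]


def kmeans(points, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: mean of the members; an empty cluster keeps its centroid.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical parotid voxel doses (Gy) for four patients.
parotid_doses = [
    [10, 12, 15, 18, 20, 22],
    [9, 13, 16, 21, 23, 26],
    [25, 30, 35, 40, 45, 50],
    [18, 31, 38, 42, 47, 55],
]
thresholds = [10, 20, 30, 40]
features = [cumulative_dvh(doses, thresholds) for doses in parotid_doses]
labels, centroids = kmeans(features, k=2)
```

In a fuller pipeline the cluster labels, or DVH features from successive weeks, would feed a supervised classifier such as an SVM; that step is left out here because its inputs and labels are not specified in the text above.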
