estimation on statistics. This unsupervised learning is beneficial for analyzing and explaining data characteristics. A typical example is clustering. Another is independent component analysis.

*3.3.1. Principal component analysis (PCA)*

Zaki and Wagner Meira defined PCA as follows:

a. PCA finds an r-dimensional basis that best captures the variance in the data.

b. The direction with the largest projected variance is called the first principal component.

c. The orthogonal direction with the next largest projected variance is the second principal component, and so forth.

In addition, the direction that maximizes the data variance also minimizes the mean squared error [12].

Principal component analysis (PCA) is applied to the normalized X to identify a set of principal components (PCs) [11]:

$$PC = U^{T} X = \Sigma V^{T} \tag{13}$$

where UΣV<sup>T</sup> is the singular value decomposition of X.

*3.3.2. Clustering*

Clustering is an unsupervised learning method that finds clusters without data labels. Classification requires both the data and their labels, so unlabeled data call for a different kind of method. There are several ways to define a cluster. One simple definition is that data points in the same cluster are close to each other, so each point can be assigned to the cluster at the closest distance. k-Means assumes that data in the same cluster are close together: each cluster has one center, and a cost can be defined as the distance between that center and each data point. Thus, k-means is an algorithm that minimizes this cost within the clusters.

Given a clustering C = {C<sub>1</sub>, C<sub>2</sub>, …, C<sub>k</sub>}, the scoring function evaluates its quality. This sum of squared errors (SSE) scoring function is defined as [12]

$$SSE(C) = \sum_{i=1}^{k} \sum_{X_j \in C_i} \lVert X_j - u_i \rVert^{2} \tag{14}$$

where u<sub>i</sub> is the center of cluster C<sub>i</sub>. The goal is to find the clustering that minimizes the SSE score, thus,

$$C^{*} = \arg\min_{C} \{SSE(C)\} \tag{15}$$

k-Means employs a greedy iterative approach to find a clustering that minimizes the SSE objective [12].

The advantages and disadvantages of various machine learning algorithms in radiation oncology are listed in **Table 4**.

**4. Conclusion**

We summarized clinical applications of machine learning algorithms in radiation oncology for head and neck, lung, and prostate cancer [13, 18, 19], and introduced those algorithms together with several definitions. For precision medicine in radiation oncology, radiation toxicity and complications are unavoidable factors to consider for patients after radiotherapy. The dose-volume distribution is the basic information, but on its own it does not give the tumor control probability (TCP), the normal tissue complication probability (NTCP), or the toxicity grade. Thus, a decision support system is needed to select the best treatment plan for personalized patient care. At present, however, such a system still needs additional functions that use machine learning, historical treatment results, and the previously mentioned big data to predict a patient's toxicity or complications after radiation treatment [29].

Another current big data trend is research on medical imaging, such as DICOM RT data in radiotherapy. The images carry a great deal of information about the patient's current status and can help predict future outcomes such as the patient's quality of life. Lung cancer and breast cancer are therefore good candidates for big data research in clinical applications, because simple chest X-rays or other low-cost imaging methods can be used.

**Figure 10.** An example of the big data based on patient-specific treatment prediction in radiation oncology (a), its block diagram (b), and overview (c).

We therefore illustrate a predictive solution for radiation toxicity, based on big data, as a treatment planning decision support system in **Figure 10**. In this block diagram, the input stage supplies treatment data (i.e., rival plans with DVHs) from a radiation treatment planning system. The program then analyzes the dosimetric and biological indices. A normal tissue complication probability (NTCP) model can be adapted at this stage; in the case of lung cancer, for example, it can take into account the central lung distance (CLD) and the maximal heart distance, which are two-dimensional radiation therapy indicators that can be measured in three-dimensional conformal radiation therapy. The dose-volume relationship and the tolerance dose of each organ at risk are analyzed by a machine learning algorithm in the decision support system. At this point, "big data" from numerous patient treatments can be used to evaluate the machine learning results and to predict toxicity and normal tissue complications, in comparison with a knowledge-based approach. This provides an evidence-based decision for finalizing the treatment plan for customized patient treatment [20–24].
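As a minimal sketch of what the dosimetric index step might compute, assume each rival plan is exported as a differential DVH: dose-bin centers in Gy, with the fraction of the organ volume falling in each bin. The Python code below derives two commonly used indices, the mean dose and V20 (the volume fraction receiving at least 20 Gy), and selects the plan with the lower mean dose. The plan names, the DVH values, and the use of mean dose as the ranking criterion are hypothetical and serve only as an illustration; they are not taken from the system shown in Figure 10.

```python
def mean_dose(dose_bins_gy, volume_fractions):
    """Mean organ dose (Gy) from a differential DVH."""
    total_volume = sum(volume_fractions)
    weighted = sum(d * v for d, v in zip(dose_bins_gy, volume_fractions))
    return weighted / total_volume


def volume_at_or_above(dose_bins_gy, volume_fractions, threshold_gy):
    """Fraction of the organ volume whose bin-center dose is >= threshold_gy.

    Assumes the threshold coincides with a bin edge, so that whole bins
    fall either above or below it.
    """
    total_volume = sum(volume_fractions)
    covered = sum(
        v for d, v in zip(dose_bins_gy, volume_fractions) if d >= threshold_gy
    )
    return covered / total_volume


# Hypothetical differential DVHs for the lung in two rival plans.
# Bin centers are in Gy (5 Gy wide bins); values are volume fractions.
dose_bins_gy = [2.5, 7.5, 12.5, 17.5, 22.5, 27.5]
rival_plans = {
    "plan_a": [0.40, 0.20, 0.15, 0.10, 0.10, 0.05],
    "plan_b": [0.30, 0.20, 0.20, 0.15, 0.10, 0.05],
}

# Dosimetric indices for each plan.
indices = {
    name: {
        "mean_dose_gy": mean_dose(dose_bins_gy, volumes),
        "v20": volume_at_or_above(dose_bins_gy, volumes, 20.0),
    }
    for name, volumes in rival_plans.items()
}

# Illustrative ranking rule only: prefer the plan with the lower mean dose.
best_plan = min(indices, key=lambda name: indices[name]["mean_dose_gy"])

for name, values in indices.items():
    print(name, round(values["mean_dose_gy"], 2), round(values["v20"], 2))
print("preferred plan:", best_plan)
```

In a full decision support system, indices of this kind would be only one input; the NTCP model and the machine learning analysis described above would use them alongside clinical data rather than ranking plans on a single number.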

Therefore, current decision support systems can be modified and extended to predict complications and toxicity after radiotherapy by adding not only dosimetric and biological index functions but also clinical big data analysis with various machine learning algorithms. This is a fusion solution for customized patient treatment in the big data era of radiation oncology.
