**2.3. Cross-validation**

GEBV accuracy is evaluated through cross-validation (CV). Before the prediction model is applied to the breeding population, its accuracy should be tested. CV entails splitting the data into a training set and a validation set. The ratio of observations in each set varies, but fivefold CV is often used: the data set is randomly divided into five subsets, four of which are combined to form the training set, while the remaining subset is designated as the validation set. Each subset is used as the validation set exactly once. In each round, most of the training population is used to build a prediction model, which is then used to estimate the genomic estimated breeding values (GEBVs) of the remaining individuals in the training population from their genotypic data only. Because phenotypes are also available for these held-out individuals, the estimates can be compared with the observed values. This permits researchers to "test" and refine the prediction model until the prediction accuracy is high enough for future predictions to be relied upon. Once validated, the model can be applied to a breeding population to calculate GEBVs of lines for which genotypic, but no phenotypic, information is available.
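
The fivefold procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any particular study: the marker matrix and phenotypes are simulated, ridge regression stands in for whichever genomic prediction model is actually used, and accuracy is taken to be the Pearson correlation between predicted GEBVs and observed phenotypes in the validation fold. The function names (`kfold_indices`, `ridge_fit`, `cross_validate`) and the penalty value `lam` are hypothetical choices made for the example.

```python
import numpy as np


def kfold_indices(n, k, rng):
    """Randomly partition the indices 0..n-1 into k nearly equal folds."""
    return np.array_split(rng.permutation(n), k)


def ridge_fit(X, y, lam):
    """Fit ridge regression on centered markers; return (intercept, effects)."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # Solve (Xc'Xc + lam * I) beta = Xc'(y - y_mean) for the marker effects.
    lhs = Xc.T @ Xc + lam * np.eye(X.shape[1])
    beta = np.linalg.solve(lhs, Xc.T @ (y - y_mean))
    return y_mean - x_mean @ beta, beta


def cross_validate(X, y, k=5, lam=1.0, seed=0):
    """Run k-fold CV; each fold serves as the validation set exactly once."""
    rng = np.random.default_rng(seed)
    folds = kfold_indices(len(y), k, rng)
    accuracies = []
    for i, val_idx in enumerate(folds):
        # The other k-1 folds are combined to form the training set.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        intercept, beta = ridge_fit(X[train_idx], y[train_idx], lam)
        # Predict the held-out lines from their genotypes only.
        gebv = intercept + X[val_idx] @ beta
        # Assumed accuracy measure: correlation with observed phenotypes.
        accuracies.append(np.corrcoef(gebv, y[val_idx])[0, 1])
    return folds, np.array(accuracies)


# Simulated data (assumption): 200 lines, 500 markers coded 0/1/2.
sim = np.random.default_rng(42)
n_lines, n_markers = 200, 500
X = sim.integers(0, 3, size=(n_lines, n_markers)).astype(float)
true_effects = sim.normal(0.0, 0.1, n_markers)
y = X @ true_effects + sim.normal(0.0, 1.0, n_lines)

folds, acc = cross_validate(X, y, k=5, lam=100.0)
print("per-fold accuracy:", np.round(acc, 2))
print("mean accuracy:", round(float(acc.mean()), 2))
```

Averaging the per-fold correlations gives a single accuracy estimate for the model. In practice the random split is often repeated with different seeds to reduce the dependence of that estimate on one particular partition.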
