**3. Functional mapping**

## **3.1. Statistical model**

the sample did exhibit dispersion phenomena. All of the samples were determined not to be

We measured the absorbance of the DNA samples at wavelengths of 260 and 280 nm and calculated the value of the optical density (OD) 260/OD280. Samples were determined to be of sufficient quality if this value was greater than 1.8 and less than 2.2. After testing, all samples

DNA from the sample was sequenced on the Illumina HiSeq 2000 platform to generate 125 bp paired-end reads at greater depth. All these RILs sequencing were performed at Total Genomics Solution (TGS) Institute. After sequencing quantity control, the data of 107 samples reached 117.68 G, with an average of 1.10 G per sample. The percentage of Q30 bases was more than 90%, the percentage of Q20 bases was more than 95%, and the distribution of GC was normal. Thus, the quantity and quality of the sequencing met the requirements for sub-

The per-base coverage depth across all contigs was calculated by mapping raw reads from each RIL against reference genomes. The results showed that the average mapping ratio and sequencing coverage of the samples were higher than 95 and 92%, respectively. The sequence depth of the samples was 9.91×. In addition to the line 110, the other lines are better for subse-

SNP detection was performed with the widely accepted mutation detection software GATK. SNP. The screening criteria were as follows: The depth of sequencing for each sample was greater than or equal to 4, otherwise the sample was marked as missing. Additionally, the quality value of the comparison must be greater than or equal to 20, and the variation detection quality value must be greater than or equal to 50. If the allele frequency was less than 5% or the absence rate of the sample was greater than 50%, the site was filtered out.

The RIL population contains 105 progeny and two parents; the parent number is "A518\_ LER" and "A518\_SHA." The reads of each sample were compared with the reference genome. The average ratio of the samples was above 95%, and the coverage was greater than 92%. The average sequencing depth was 9.19×. By processing, we obtained 107 samples and 1,023,325 whole genome SNP markers. According to the situation of this population, individuals with heterozygous genotype ratios over 25% and markers indicating that parents possessed heterozygous genotypes were eliminated. Finally, 609,427 SNPs and 80 individuals met the model requirements and could be used to detect the QTLs that affect

degraded and were sequenced.

80 Next Generation Plant Breeding

*2.3.2. Sequencing and mapping*

quent variation detection and analysis.

the growth height of *A. thaliana*.

sequent analysis.

**2.4. SNP calling**

were found to meet the requirements for sequencing.

Let *yi* <sup>=</sup> *<sup>c</sup>*(yi (1), <sup>⋯</sup> ,*yi* (T) ) denote the vector of trait values for RIL, *i* measured at *T* time-points. Consider a SNP with two alleles *Q* and *q*, generating two genotypes: *QQ* with *n*<sup>1</sup> RILs and *qq* with *n*<sup>2</sup> RILs. In this study, the likelihood for height growth data of *A. thaliana* is expressed as

$$L(\Phi \mid \mathbf{y}) = \prod\_{i=1}^{n\_{\mathbf{y}}} f\_i(\mathbf{y}\_i) \prod\_{i=1}^{n\_{\mathbf{y}}} f\_i(\mathbf{y}\_i) \tag{1}$$

where Φ indicates the unknown parameters including the time-dependent effects of different QTL genotypes and the time-dependent residual variance and correlations. *f* <sup>1</sup>(yi ) and *f* 2(yi ) are a multivariate normal distribution with a time-dependent mean vector for genotype *QQ* and *qq*,

ganwitype %% and  $\eta\mu$ 

$$\begin{cases} \mu\_1 = \{\mu\_1(1), \dots, \mu\_1(T)\} \text{for QQ} \\ \mu\_2 = \{\mu\_2(1), \dots, \mu\_2(T)\} \text{for } \eta\mu. \end{cases} \tag{2}$$

(T × T)-dimensional longitudinal covariance matrix is expressed as Σ, which can be modeled by using a statistical approach such as the first-order autoregressive [AR(1)] model or an autoregressive moving-average process (ARMA). The maximum-likelihood estimates (MLEs) of the unknown parameters are implemented with the simplex algorithm in R software [48].

## **3.2. Modeling the mean vector**

One of the most important equations for capturing time-specific change in growth is the logistic curve [49, 50], which we used to describe height growth of the QTL genotype according to the following expression:

$$\mathbf{g(t) = \ 1 + bv^{\*n}}\tag{3}$$

where *g*(*t*) represents the trait value at time point t, *a* indicates the asymptotic value of g when *t* → ∞, *b* is a parameter to position the curve on the time axis, and *r* indicates the relative growth rate. Consequently, any specific growth characteristics described by the logistic growth equation can be captured by parameter *a*, *b*, and *r,* and these can be used to determine the coordinates of biologically important benchmarks along the growth trajectory. The mean vector for the QTL genotypes *QQ* and *qq* from time *1* to *T* in the multivariate normal density function is expressed as:

$$\begin{cases} \mu\_1 = \left(\frac{a\_1}{1 + b\_1 e^{-r\_1 t}}, \dots, \frac{a\_1}{1 + b\_1 e^{-r\_1 T}}\right) \text{for QQ}\_t\\ \mu\_2 = \left(\frac{a\_2}{1 + b\_2 e^{-r\_2 t}}, \dots, \frac{a\_2}{1 + b\_2 e^{-r\_2 T}}\right) \text{for q} q. \end{cases} \tag{4}$$

## **3.3. Modeling the covariance structure**

The first-order autoregressive [AR(1)] model has been successfully applied to model the structure of the within-subject covariance matrix for functional mapping. The AR(1) model includes two basic assumptions: (1) variance stationarity (the residual variance, *σ*<sup>2</sup> ) is unchanged over time points and (2) covariance stationarity (the correlation between different time points) decreases proportionally (in *ρ*) with increased time interval. The AR(1) is described as:

$$
\Sigma = \sigma^2 \begin{bmatrix}
1 & \rho & \mathbf{L} & \rho^{T-1} \\
\rho & 1 & \mathbf{L} & \rho^{T-2} \\
\mathbf{L} & \mathbf{L} & \mathbf{L} & \mathbf{L} \\
\rho^{T-1} & \rho^{T-2} & \mathbf{L} & 1
\end{bmatrix} \tag{5}
$$

height for each RIL over 9 weeks. A least-squares approach was used to fit height growth with the logistic curve (Eq. (3)) for each RIL. Based on statistical tests, all genotypes can be fit by a

**Figure 3.** The growth trajectories of height for individual recombinant inbred lines (RILs) are shown in green, whereas yellow lines are the mean growth trajectories of all RILs fitted to a curve. The x- and y-axes of the plot denote time (in

Functional Mapping of Plant Growth in *Arabidopsis thaliana*

http://dx.doi.org/10.5772/intechopen.74424

83

According to the fit results, the relative growth rate of *A*. *thaliana* was *r* = 0.61, and the asymptotic value of plant height was calculated to be *a* = 10.08. We can see from **Figure 3** that the plant height of *A. thaliana* increases exponentially over a period of time, and as growth time increases, the trend of the curve begins to flatten, until the plant height does not change anymore. In addition, different genotypes showed different growth curves, suggesting the possibility of genetic control over the growth trajectories. The statistical model built upon the logistic growth curve

Functional mapping was implemented to analyze the mapping population. A Manhattan plot of LR values against the genome locations of SNPs distributed throughout the genome is shown in **Figure 4**. We found QTLs that affect the growth trajectory of plant height on the fourth chromosome of the RIL population (**Figure 4**). The genome-wide empirical estimate of the critical value is obtained from permutation tests. The profile of the LR value of the full and reduced model across the length of chromosome 4 has a clear peak from 7.9 to 22.5 kb. A total of 48 significant QTLs responsible for growth trajectories of plant heights were identified. The LR value at this peak is 437.78, which is beyond the empirical critical threshold at

Functional mapping can also be used to observe the dynamic expression of a QTL over time such as when a QTL starts to affect a growth process. The parameter combination of the *QQ* genotype was *a*<sup>1</sup> = 9.07, *b*<sup>1</sup> = 370.08, and *r*<sup>1</sup> = 0.92. The parameter combination of the *qq* genotype was *a*<sup>2</sup> = 39.62, *b*<sup>2</sup> = 114.61, and *r*<sup>2</sup> = 0.28. Different QTL genotypes corresponded to different parameter combinations, which indicate that the QTL controls the developmental

logistic curve containing parameters a, b, and r (P < 0.01, **Figure 3**).

the significance level *p* = 0.05.

weeks) and height (in cm).

model is used to map QTLs responsible for growth trajectories in plant height.

where 0 < *ρ* < 1 is the proportion parameter with which the correlation decays with a time point.

## **3.4. Hypothesis tests**

After the MLEs of the parameters are obtained, the hypothesis concerning the existence of a QTL affecting overall growth can be written as:

\*\*Q1\*\*: 2mecung overall gnìom can be written as:

$$\begin{cases} H0: a\_1 = a\_{2'}b\_1 = b\_{2'}r\_1 = r\_2\\ H1: \text{at least one of the equalities above does not hold} \end{cases} \tag{6}$$

where the null hypothesis H<sup>0</sup> corresponds to the reduced model and the alternative hypothesis H<sup>1</sup> corresponds to the full model. The test statistics for testing the hypotheses is the loglikelihood ratio (LR) of the full over reduced model. The critical threshold is determined from permutation tests.

## **3.5. Candidate gene function annotation**

Gene ontology (GO) annotation analysis was performed using Blast2GO [51]. Finally, R script language programming was used to translate the GO annotation results into charts.

## **4. Results**

## **4.1. QTL detection**

By plotting the total growth against growth week, it was observed that each of the 116 mapped genotypes followed the logistic growth curve. **Figure 3** illustrates the growth trajectories of

82 Next Generation Plant Breeding

⎧ ⎪ ⎨ ⎪ ⎩

**3.3. Modeling the covariance structure**

*Σ* = *σ*<sup>2</sup>

QTL affecting overall growth can be written as:

*H*0:*a*<sup>1</sup> = *a*<sup>2</sup>

**3.4. Hypothesis tests**

{

permutation tests.

**4. Results**

**4.1. QTL detection**

esis H<sup>1</sup>

where the null hypothesis H<sup>0</sup>

**3.5. Candidate gene function annotation**

*<sup>μ</sup>*<sup>1</sup> <sup>=</sup> (

*<sup>μ</sup>*<sup>2</sup> <sup>=</sup> (

two basic assumptions: (1) variance stationarity (the residual variance, *σ*<sup>2</sup>

⎡

⎢ ⎣

, *b*<sup>1</sup> = *b*<sup>2</sup>

*<sup>a</sup>* \_\_\_\_\_\_ <sup>1</sup> 1 + *b*<sup>1</sup> *e*<sup>−</sup>*r*<sup>1</sup>

*<sup>a</sup>* \_\_\_\_\_\_ <sup>2</sup> 1 + *b*<sup>2</sup> *e*<sup>−</sup>*r*<sup>1</sup>

, .…, *<sup>a</sup>* \_\_\_\_\_\_\_ <sup>1</sup> 1 + *b*<sup>1</sup> *e* <sup>−</sup>*r*<sup>1</sup>

, .…, *<sup>a</sup>* \_\_\_\_\_\_\_ <sup>2</sup> 1 + *b*<sup>2</sup> *e* <sup>−</sup>*r*<sup>2</sup>

The first-order autoregressive [AR(1)] model has been successfully applied to model the structure of the within-subject covariance matrix for functional mapping. The AR(1) model includes

time points and (2) covariance stationarity (the correlation between different time points) decreases proportionally (in *ρ*) with increased time interval. The AR(1) is described as:

> 1 *ρ* L *ρ<sup>T</sup>*−1 *<sup>ρ</sup>* <sup>1</sup> *<sup>L</sup> <sup>ρ</sup><sup>T</sup>*−<sup>2</sup> L L L L

*ρ <sup>T</sup>*−1 *ρ<sup>T</sup>*−<sup>2</sup> L 1

where 0 < *ρ* < 1 is the proportion parameter with which the correlation decays with a time point.

After the MLEs of the parameters are obtained, the hypothesis concerning the existence of a

, *r*<sup>1</sup> = *r*<sup>2</sup>

 corresponds to the full model. The test statistics for testing the hypotheses is the loglikelihood ratio (LR) of the full over reduced model. The critical threshold is determined from

Gene ontology (GO) annotation analysis was performed using Blast2GO [51]. Finally, R script

By plotting the total growth against growth week, it was observed that each of the 116 mapped genotypes followed the logistic growth curve. **Figure 3** illustrates the growth trajectories of

language programming was used to translate the GO annotation results into charts.

*<sup>T</sup>*)*for QQ*,

*<sup>T</sup>*)*for qq*.

⎥ ⎦

*<sup>H</sup>*1:*at least one of the equalities above does not hold*, (6)

corresponds to the reduced model and the alternative hypoth-

⎤

(4)

) is unchanged over

, (5)

**Figure 3.** The growth trajectories of height for individual recombinant inbred lines (RILs) are shown in green, whereas yellow lines are the mean growth trajectories of all RILs fitted to a curve. The x- and y-axes of the plot denote time (in weeks) and height (in cm).

height for each RIL over 9 weeks. A least-squares approach was used to fit height growth with the logistic curve (Eq. (3)) for each RIL. Based on statistical tests, all genotypes can be fit by a logistic curve containing parameters a, b, and r (P < 0.01, **Figure 3**).

According to the fit results, the relative growth rate of *A*. *thaliana* was *r* = 0.61, and the asymptotic value of plant height was calculated to be *a* = 10.08. We can see from **Figure 3** that the plant height of *A. thaliana* increases exponentially over a period of time, and as growth time increases, the trend of the curve begins to flatten, until the plant height does not change anymore. In addition, different genotypes showed different growth curves, suggesting the possibility of genetic control over the growth trajectories. The statistical model built upon the logistic growth curve model is used to map QTLs responsible for growth trajectories in plant height.

Functional mapping was implemented to analyze the mapping population. A Manhattan plot of LR values against the genome locations of SNPs distributed throughout the genome is shown in **Figure 4**. We found QTLs that affect the growth trajectory of plant height on the fourth chromosome of the RIL population (**Figure 4**). The genome-wide empirical estimate of the critical value is obtained from permutation tests. The profile of the LR value of the full and reduced model across the length of chromosome 4 has a clear peak from 7.9 to 22.5 kb. A total of 48 significant QTLs responsible for growth trajectories of plant heights were identified. The LR value at this peak is 437.78, which is beyond the empirical critical threshold at the significance level *p* = 0.05.

Functional mapping can also be used to observe the dynamic expression of a QTL over time such as when a QTL starts to affect a growth process. The parameter combination of the *QQ* genotype was *a*<sup>1</sup> = 9.07, *b*<sup>1</sup> = 370.08, and *r*<sup>1</sup> = 0.92. The parameter combination of the *qq* genotype was *a*<sup>2</sup> = 39.62, *b*<sup>2</sup> = 114.61, and *r*<sup>2</sup> = 0.28. Different QTL genotypes corresponded to different parameter combinations, which indicate that the QTL controls the developmental

process, to describe the characteristics of genes and gene products [52, 53]. To further summarize the data of GO classification, the cytological components include cell structure, tissue, protein complex, extracellular structure, and cell process. Biological processes include developmental processes, physiological processes, regulatory processes, and the processes of responding to stress. Molecular functions include binding, catalysis, activation, structural molecules, and tran-

Functional Mapping of Plant Growth in *Arabidopsis thaliana*

http://dx.doi.org/10.5772/intechopen.74424

85

Functional prediction and classification analysis of 48 loci were screened with the National Center for Biotechnology and the Joint Genome Institute (JGI) databases. There are 20 gene families in these loci, which include the F-box and calcium-binding EF-hand protein families. Pathway cluster analysis was used to compare the 48 genes with the known protein sequences

It was found that 7 of the 48 genes corresponded to functions in the JGI database. They were divided into nine groups, including carbon fixation pathways in prokaryotes, biosynthesis of

scriptional regulatory functions (**Figure 6**).

**Figure 6.** Gene ontology annotations of significant sites.

in the JGI database.

**Figure 4.** The log-likelihood ratio (LR) of height change of *Arabidopsis thaliana*. The profiles of the LR between the full model (there is a quantitative trait locus [QTL]) and reduced model (there is no QTL) for height growth trajectories throughout the *A. thaliana* genome. The critical thresholds for claiming the genome-wide existence of a QTL are obtained from permutation tests.

trajectory of the plant height. Using the estimates from the growth curves, we drew two different curves, each corresponding to a genotype at each of the detected QTLs on the fourth chromosome (**Figure 5**).

On the basis of the hypothesis test (6), this QTL is detected to be inactive until *A. thaliana* grew to 4 weeks, and its effect on height growth increased with time. At 9 weeks, the genotype *qq* exhibited height growth more than its alternative *QQ*. The effect of *qq* on plant height was more significant than that of *QQ*. If different genotypes at a given QTL correspond to different trajectories, the QTL must affect the differentiation of this trait. Apparently, this mapped QTL interacts significantly with time to affect the height growth of *A. thaliana*.

## **4.2. Candidate gene function annotation**

GO classification is widely used for gene classification and functional annotation, and GO provides three types of semantic terms, including cell component, molecular function, and biological

**Figure 5.** Growth trajectory for different genotypes at the QTLs detected on groups.

process, to describe the characteristics of genes and gene products [52, 53]. To further summarize the data of GO classification, the cytological components include cell structure, tissue, protein complex, extracellular structure, and cell process. Biological processes include developmental processes, physiological processes, regulatory processes, and the processes of responding to stress. Molecular functions include binding, catalysis, activation, structural molecules, and transcriptional regulatory functions (**Figure 6**).

Functional prediction and classification analysis of 48 loci were screened with the National Center for Biotechnology and the Joint Genome Institute (JGI) databases. There are 20 gene families in these loci, which include the F-box and calcium-binding EF-hand protein families. Pathway cluster analysis was used to compare the 48 genes with the known protein sequences in the JGI database.

It was found that 7 of the 48 genes corresponded to functions in the JGI database. They were divided into nine groups, including carbon fixation pathways in prokaryotes, biosynthesis of

**Figure 6.** Gene ontology annotations of significant sites.

**Figure 5.** Growth trajectory for different genotypes at the QTLs detected on groups.

trajectory of the plant height. Using the estimates from the growth curves, we drew two different curves, each corresponding to a genotype at each of the detected QTLs on the fourth

**Figure 4.** The log-likelihood ratio (LR) of height change of *Arabidopsis thaliana*. The profiles of the LR between the full model (there is a quantitative trait locus [QTL]) and reduced model (there is no QTL) for height growth trajectories throughout the *A. thaliana* genome. The critical thresholds for claiming the genome-wide existence of a QTL are obtained

On the basis of the hypothesis test (6), this QTL is detected to be inactive until *A. thaliana* grew to 4 weeks, and its effect on height growth increased with time. At 9 weeks, the genotype *qq* exhibited height growth more than its alternative *QQ*. The effect of *qq* on plant height was more significant than that of *QQ*. If different genotypes at a given QTL correspond to different trajectories, the QTL must affect the differentiation of this trait. Apparently, this mapped QTL

GO classification is widely used for gene classification and functional annotation, and GO provides three types of semantic terms, including cell component, molecular function, and biological

interacts significantly with time to affect the height growth of *A. thaliana*.

chromosome (**Figure 5**).

from permutation tests.

84 Next Generation Plant Breeding

**4.2. Candidate gene function annotation**

antibiotics, one-carbon pool by folate, starch, and sucrose metabolism, glycerolipid metabolism (synthase), glycerolipid metabolism (lipase), polyketide sugar unit biosynthesis, streptomycin biosynthesis, and pentose and glucuronate interconversions. We found that these nine pathways are mostly associated with carbohydrate, energy, and amino acid metabolism. These biological processes play an important role in the development of height in *Arabidopsis*.

dynamic traits, a new analytical model for genome-wide association analysis of dynamic data, called functional GWAS (fGWAS), has been derived [65]. There are two advantages to fGWAS over GWAS: (1) fGWAS is able to identify genes that determine the final form of the trait and (2) it provides the ability to study the temporal pattern of genetic control over a time course. The regulation network of plant height traits has been studied intensively in molecular biology. We already know that the development of plant height traits is regulated by growth hormones such as GA and IAA. Using functional mapping, we found 48 growth QTLs in *A*. *thaliana*. Through the GO annotation of QTLs, we found that there are many genes among the significant loci identified in this study that are related to the pathways for synthesis and conduction of growth hormones, such as AT4G00160.1, which encodes an F-box protein in the signal transduction pathway. It has been shown that the F-box is an auxin receptor. Thus, we can see that the QTLs identified in this study may not only be applicable to *A*. *thaliana*, but also to other plants. These results show that functional mapping can reveal more intricate details

Functional Mapping of Plant Growth in *Arabidopsis thaliana*

http://dx.doi.org/10.5772/intechopen.74424

87

Functional mapping is far from enough to fully study complex traits, and there are still many limitations in describing the developmental pathways leading to the final phenotype and revealing the underlying genetic mechanisms for the formation and development of these traits. It is too simple to draw a complete dynamic diagram of complex traits. Wu [66] extends functional mapping to system mapping. By identifying the dynamic formation process of complex traits as a system and decomposing it into several parts, the QTL that controls the interaction of each component during the development of complex characters is identified. From the point of view of ecology, the process of character formation is extremely complex. To draw a complete quantitative genetic structure, we need to study the characteristics of an organism affected by its own genes as well as the influence from the community partner genome. In nature, most organisms live in groups, and individuals compete with each other. Wu combined game theory with QTL mapping, which opens up new opportunities for

improving the accuracy and resolution of complex phenotype QTL recognition [67].

and grant 201404102 from the State Administration of Forestry of China.

, Lina Wang1

\*Address all correspondence to: libojiang@bjfu.edu.cn

This work is supported by National Natural Science Foundation of China (grant 31700576)

1 Center for Computational Biology, College of Biological Sciences and Technology, Beijing

2 Center for Statistical Genetics, Pennsylvania State University, Hershey, PA, USA

, Rongling Wu1,2 and Libo Jiang1

\*

of dynamic traits such as height growth and other phenotypes.

**Acknowledgements**

**Author details**

, Wenhao Bo1

Forestry University, Beijing, China

Kaiyue Liu<sup>1</sup>

We also found that AT4G00160.1 encodes an F-box protein in the signal transduction pathway. It has been shown that F-box is an auxin receptor [54]. Plant height development is regulated by gibberellin (GA) and auxin (indole-3-acetic acid [IAA]) [55], and GA20ox and GA30ox are encoded by multiple genes, and mutations in these loci can result in dwarfing of the plant in the later stage of GA biosynthesis [56]. The most significant site, AT4G01150.1, is related to protein curvature thylakoid chloroplastic. Protein curvature thylakoid chloroplastic tends to be located in leaves and stems, and plays an important role in plant photosynthesis, which affects plant growth [57].
