### **1. Introduction**

In recent years, high-throughput omics technologies (genomics, proteomics, transcriptomics and metabolomics) have provided unprecedented opportunities to discover genes, proteins, metabolites, pathways and molecular markers for various applications. The omics data produced by these technologies have facilitated the molecular and genetic improvement of rice varieties with higher yield, quality, nutrient content and resistance to biotic and abiotic stresses. However, the past 15 years have seen a significant increase in the volume and types of omics data, which has challenged researchers to extract and decipher the valuable information encoded in the data. This challenge can be addressed by using bioinformatics to analyze, integrate and interpret these massive omics datasets. Various bioinformatic tools and algorithms allow researchers to process, interpret and integrate data in a more efficient and reproducible way. However, the lack of connectivity between these tools and algorithms complicates the process of extracting and deciphering the data. Hence, there is a need for procedures that connect these tools into workflows that bring together different techniques able to exploit data at many levels.

Here we describe a computational workflow composed of different bioinformatic tools that exploits data from large-scale gene expression experiments and contextualizes them at many biological levels. To illustrate the relevance of our workflow, we applied it to datasets from rice varieties in search of potential SNPs associated with flavonoid biosynthetic genes. The workflow started with the identification of known flavonoid biosynthetic genes from published articles, searches of several genome and pathway databases (such as KEGG, PlantReactome and RiceCyc) and similarity search analysis.

These potential flavonoid biosynthetic genes were used as guide genes to screen for single nucleotide polymorphisms (SNPs) in the genomics and transcriptomics data. SNPs and co-expressed genes were integrated via network analysis: the transcriptomics data were used to construct a gene co-expression network, onto which the SNPs were then mapped. A pathway-network analysis was performed to interpret the biological information related to the flavonoid pathway-network. All information generated from these computational analyses is stored in *MyNutRice*Base (http://www.mynutricebase.org) for knowledge sharing and future use in functional genomics studies and the development of molecular markers.
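The network step described above can be illustrated with a minimal sketch. It assumes Pearson correlation with a fixed cut-off as the co-expression measure, which is one common choice and not necessarily the method used in this workflow. The gene names, SNP identifiers, expression values and the `threshold` parameter are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def build_coexpression_network(expression, threshold=0.9):
    """Return {gene: set(neighbours)} linking genes with |r| >= threshold."""
    network = {gene: set() for gene in expression}
    for g1, g2 in combinations(expression, 2):
        if abs(pearson(expression[g1], expression[g2])) >= threshold:
            network[g1].add(g2)
            network[g2].add(g1)
    return network


def map_snps(network, snps_by_gene):
    """Annotate each network node with its degree and the SNPs located in it."""
    return {
        gene: {"degree": len(neighbours), "snps": snps_by_gene.get(gene, [])}
        for gene, neighbours in network.items()
    }


# Toy, made-up expression values across four samples (not real data).
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 4.0, 6.2, 8.1],
    "geneC": [5.0, 1.0, 4.0, 2.0],
}
# Hypothetical SNP identifiers keyed by the gene they fall in.
snps_by_gene = {"geneA": ["snp_1", "snp_2"], "geneC": ["snp_3"]}

network = build_coexpression_network(expression, threshold=0.9)
annotated = map_snps(network, snps_by_gene)
```

In this toy input only `geneA` and `geneB` are linked, so a SNP in `geneA` is placed in the context of a co-expressed partner, whereas the SNP in `geneC` sits on an isolated node.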

### **2. Overview of flavonoids in omics and rice breeding improvement**

Several rice breeding traits, such as disease resistance, drought tolerance, salinity tolerance, grain quality and high nutritional content, are currently pursued by breeders and farmers in their effort to improve rice varieties. Breeding for improved rice varieties with high nutritional content, in particular, has attracted interest among breeders, geneticists and nutritionists. High protein [1], carotenoid [2], micronutrient [3] and antioxidant [4] contents are among the nutritional traits targeted for improvement in rice varieties. Rice breeding for high nutritional content has been carried out extensively by biofortification [5, 6] and/or genetic engineering [7]. Improving the nutritional content of rice is essential because of its benefits for human health; a lack of specific nutrients can lead to several diseases and malnutrition.

Flavonoids are secondary metabolites commonly produced in flowers, fruits, vegetables and pigmented rice. They are known as potent antioxidants beneficial for human health. Consumption of foods with high antioxidant content may lower the risk of cardiovascular disease, type II diabetes and colon cancer [8]. Additionally, flavonoids have been shown to improve plant resistance against abiotic and biotic stress [9].

#### *Computational Analysis of Rice Transcriptomic and Genomic Datasets in Search for SNPs… DOI: http://dx.doi.org/10.5772/intechopen.94876*

Flavonoid biosynthetic pathways are found in several crops such as maize, tomato and rice [10]. Genes involved in these pathways are categorized into general phenylpropanoid genes, early biosynthetic genes (EBGs), late biosynthetic genes (LBGs) and transcription factors [11]. The general phenylpropanoid category contains three major genes: phenylalanine ammonia-lyase (PAL), cinnamic acid 4-hydroxylase (C4H) and 4-coumarate CoA ligase (4CL). The EBGs include chalcone synthase (CHS), chalcone isomerase (CHI), flavanone 3-hydroxylase (F3H) and flavonoid 3′-hydroxylase (F3′H) [11]. Genes in the general phenylpropanoid category and the EBGs are upstream genes that initiate the flavonoid biosynthetic pathways and are responsible for the production of secondary metabolites [11]. The LBGs, such as dihydroflavonol reductase (DFR), leucoanthocyanidin reductase (LAR), UDP-glucose flavonoid 3-O-glucosyl transferase (UGT) and leucoanthocyanidin dioxygenase (LDOX), lead to the production of anthocyanin [11]. The EBGs and LBGs are also recognized as structural genes [11].
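For use as guide genes in a script, the categories listed in this section could be encoded as a small lookup table. The sketch below is only one possible representation; the dictionary layout and the helper names `categorize` and `is_structural` are illustrative assumptions, and the gene symbols are those given in the text.

```python
# Gene categories as named in the text; the data layout is an assumption.
FLAVONOID_GENE_CATEGORIES = {
    "general phenylpropanoid": ["PAL", "C4H", "4CL"],
    "EBG": ["CHS", "CHI", "F3H", "F3'H"],
    "LBG": ["DFR", "LAR", "UGT", "LDOX"],
    "transcription factor": ["Kala4", "Rc", "R2R3-MYB", "WD40"],
}

# EBGs and LBGs together are referred to as structural genes.
STRUCTURAL_CATEGORIES = {"EBG", "LBG"}


def categorize(gene):
    """Return the category of a gene symbol, or None if it is not listed."""
    for category, genes in FLAVONOID_GENE_CATEGORIES.items():
        if gene in genes:
            return category
    return None


def is_structural(gene):
    """True if the gene belongs to the EBG or LBG category."""
    return categorize(gene) in STRUCTURAL_CATEGORIES
```

With this table, `is_structural("DFR")` is true while `is_structural("PAL")` is false, which mirrors the distinction drawn between structural genes and the general phenylpropanoid genes.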

Four transcription factors are involved in the flavonoid biosynthetic pathways: *Kala4* (Os04g0557500), *Rc* (Os07g0211500), *R2R3-MYB* (Os06g0205100) and *WD40* (Os02g0682500). *Kala4* encodes a basic helix–loop–helix (bHLH) protein that activates late biosynthetic genes to produce anthocyanin, and thus black pigmentation, in the seed pericarp [12]. *Rc* also encodes a bHLH protein; it regulates the expression of *Rd* (Os01g0633500) to produce red pigmentation [13]. *Rc* expression without the participation of *Rd*, however, results in brown pigmentation [13]. Besides their role in flavonoid biosynthesis, *R2R3-MYB* and *WD40* are also responsible for regulating pigmentation in purple leaves [14, 15].

Anthocyanins and proanthocyanidins are two major classes of flavonoid compounds [11]. Pigmented rice (black and red) is enriched in antioxidants owing to the presence of anthocyanin and proanthocyanidin [16]. Understanding the genetic basis of pigmented rice varieties is essential for developing varieties with high antioxidant (flavonoid, anthocyanin, proanthocyanidin) content. One study [17] dissected the regulation of flavonoid biosynthesis in edible rice tissue using a metabolic engineering approach. A bioinformatics approach was used to identify key flavonoid structural genes in the rice genome by comparison with sequence homologs from maize and sorghum [17]. Six genes encoding enzymes of the flavonoid biosynthetic pathway (CHS, CHI, F3H, F3′H, DFR and ANS) were selected for tBLASTN analysis against the Nipponbare rice genome sequence, and the matches found in the rice genome shared at least 66% amino acid identity with the queries. The expression patterns of these six flavonoid genes were then analyzed to investigate the accumulation of flavonoids in rice seedlings.
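A similarity search of this kind is often post-processed by filtering hits on percent identity. The following sketch assumes the tBLASTN results were saved in BLAST's standard 12-column tabular format (`-outfmt 6`). The 66% value is borrowed from the identity reported above purely as an illustrative cut-off, and the query names, subject names and hit values are made up.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(handle, min_identity=66.0):
    """Keep hits whose percent identity is at least min_identity."""
    reader = csv.DictReader(handle, fieldnames=BLAST6_COLUMNS, delimiter="\t")
    return [row for row in reader if float(row["pident"]) >= min_identity]


# Two made-up hits: one above and one below the illustrative cut-off.
sample = (
    "CHS_query\tchr11\t82.5\t390\t68\t0\t1\t390\t1000\t2170\t1e-150\t520\n"
    "CHI_query\tchr03\t41.0\t200\t118\t2\t5\t204\t500\t1099\t2e-20\t95\n"
)
hits = filter_hits(io.StringIO(sample))
```

Here only the `CHS_query` hit is retained. In practice the same function could be given an open file handle instead of the in-memory sample.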

A previous study showed a correlation between sequence polymorphism and metabolite profiles that affects flavonoid accumulation between the *indica* and *japonica* rice sub-species [18]. The different accumulation of flavonoids in these two sub-species might be due to variation in flavonoid biosynthetic genes [19, 20]. A quantitative trait loci analysis was performed to develop molecular markers associated with antioxidant content in rice [21]. Such markers can be applied in marker-assisted breeding to improve the antioxidant content of rice varieties.

Understanding the gene-based SNPs underlying the flavonoid biosynthesis process is crucial for developing rice cultivars with higher flavonoid content. Integrating SNPs with co-expressed genes can be used to identify causal SNPs and to prioritize the functional SNPs associated with flavonoid biosynthetic genes.

However, studies on the molecular and genetic improvement of antioxidant content in rice via the integration of multi-omics data are still limited. Current data are still inadequate to be applied in rice breeding selection. Furthermore, this constraint limits detailed insight into the underlying mechanism and regulation of antioxidant content at the system level. Associating SNPs with the causative biosynthetic genes might help uncover key alleles that influence flavonoid accumulation. Linking the SNPs with their co-expressed genes involved in flavonoid biosynthesis could be a promising approach to prioritize functional SNPs and causal genes for experimental validation, towards the molecular and genetic improvement of rice varieties enriched in flavonoid content.

*Recent Advances in Rice Research*
