out by microarray technology. Analysis of gene expression during *Mycosphaerella pinodes* infection was carried out using a microarray [45] containing 16,470 different 70‐mer oligonucleotides from *M. truncatula* and only 25 did not show a detectable signal [46]. In another study, microarray transcriptome profiling based on known pea Expressed Sequence Tags (ESTs) revealed altered expression of genes associated with programmed cell death, oxidative stress and protein ubiquitylation during seed aging [47].

234 Applications of RNA-Seq and Omics Strategies - From Microorganisms to Human Health

In spite of the many advantages of microarrays, this technique is not effective for quantification of transcript splice variants and, furthermore, cannot provide information about novel genes not included in the array. The development of NGS technology made analysis of full transcriptome gene expression possible. To date, there have been several studies of pea gene expression based on RNA‐seq technology. Comparative analysis of transcriptional control of pea seed development conducted by RNA‐seq revealed significant differences in gene expression between vegetable and grain pea. Genes associated with sugar and starch biosynthesis were significantly activated during seed maturation. Analysis of differential expression of these genes revealed a negative correlation between soluble sugar and starch flux in vegetable and grain pea seeds [32]. Alves‐Carvalho et al. [39] developed the Pea RNA‐Seq gene atlas containing expression profiles of thousands of genes in different pea tissues harvested at distinct developmental stages [48]. Although RNA‐seq technology is indispensable for exhaustive transcriptome studies, it is not the most cost‐efficient tool for gene expression analysis because of the substantial sequencing depth required for rare transcript detection. There are RNA‐seq modifications, for example, Massive Analysis of cDNA Ends (MACE) developed by GenXPro GmbH (Frankfurt am Main, Germany) (http://genxpro.net/), that increase the sequencing depth (number of reads per transcript) by sequencing only a 50–500 bp fragment (adjacent to the 5' or 3' end of the transcript, depending on the version) [49]. As each read originates from a distinct copy of mRNA, MACE technology is free of duplications and similar artefacts, leading to much more accurate transcript quantification. Even though MACE data cannot be used to distinguish expression of splice variants of genes, it can be successfully applied in a number of scenarios even with species not possessing a high‐quality transcriptome.

In our opinion, 5'MACE is a technology with potential for simultaneous analysis of gene expression in prokaryotic and eukaryotic organisms; therefore, this technology is practically tailor‐made for the analysis of plant‐microbe interaction, particularly for studying the process of root nodule development in the plants of the Fabaceae family.

One of the many challenges in analysing the onset of nodule symbiosis is the small amount of tissue available. Enclosed environments of symbiotic compartments complicate direct measurements. Implementation of 5'MACE technology made it possible to analyse the gene expression patterns of both organisms simultaneously in a developing nodule and at a fraction of the cost of a full RNA‐seq study.

In our group, 5'MACE was implemented in a study investigating the expression changes in pea nodules caused by a mutation in the *Sym31* gene with unknown function. This gene is responsible for the unique fix− mutant phenotype (non‐nitrogen‐fixing nodules) with halted bacteroid development [50]. Two plant genotypes Sprint‐2Fix<sup>−</sup> (carrying a mutation in the

**4. Transcript‐based markers and their usage**

NGS is widely used for massive genetic polymorphism discovery because it is much more labour‐ and time‐efficient than earlier methods such as microarray hybridisation [54] or denaturing high‐performance liquid chromatography (HPLC) [55]. Originally, the main challenge in using NGS methods for massive polymorphism screening was obtaining sequences of a particular genomic locus for multiple lines, owing to the complexity of plant genomes and the relatively low throughput of the first‐generation NGS platforms. This led to the development of several methods for sequencing optimisation.

For example, the Restriction site Associated DNA sequencing method (RAD‐Seq) consists of genome cleavage and selection of fragments of appropriate size flanked by specific restriction sites (as with RFLP and AFLP analyses) [56]. RAD‐Seq yields fragments distributed randomly over a genome and is suitable for discovering indels (insertion‐deletion polymorphisms), SNVs (single nucleotide variations) and microsatellites (simple sequence repeats, SSRs). Using RAD‐Seq, Boutet et al. [57] discovered a total of 419,024 SNVs between at least two of the four pea lines analysed in their work. A pea genetic map constructed by genotyping a subset of 64,754 SNVs on a subpopulation of 48 RILs (recombinant inbred lines) was collinear with previous pea consensus maps and therefore with the *M. truncatula* genome. Yang et al. [58], using the Illumina HiSeq 2500 platform, uncovered 8899 putative SSR‐containing sequences. Reliable amplification of detectable polymorphic fragments among 24 genotypes of pea was obtained for about half of the randomly selected SSRs, 820 in total.
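The cleavage and size‐selection idea behind RAD‐Seq can be illustrated with a small in silico digest. The sketch below is only a toy model and not the protocol of the cited studies: the function names `digest` and `size_select`, the choice of EcoRI (recognition site GAATTC, cut after the first G) and the size window are assumptions made for illustration, and terminal fragments, which are flanked by a site on one side only, are kept for simplicity.

```python
import re


def digest(sequence, site, cut_offset):
    """Cut `sequence` at every occurrence of `site` and return the fragments.

    `cut_offset` is the position inside the recognition site where the
    enzyme cuts (1 for EcoRI, G^AATTC). Hypothetical helper for illustration.
    """
    sequence = sequence.upper()
    # A zero-width lookahead reports every site, including overlapping ones.
    cut_points = [m.start() + cut_offset
                  for m in re.finditer(f"(?={re.escape(site)})", sequence)]
    bounds = [0] + cut_points + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


def size_select(fragments, min_len, max_len):
    """Keep only fragments whose length falls inside the selection window."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


# Toy sequence with two EcoRI sites; the size window is arbitrary.
toy_genome = "AAAGAATTCCCCCCGAATTCTT"
fragments = digest(toy_genome, site="GAATTC", cut_offset=1)
selected = size_select(fragments, min_len=5, max_len=10)
```

On a real genome the same logic runs over each chromosome or scaffold, and the retained fragments approximate the loci that would be sequenced in every line.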

Another way to reduce data complexity is transcriptome sequencing. It makes it possible to discover polymorphic sites in the open reading frames (ORFs) and the 5′‐ and 3′‐untranslated regions (UTRs) of a gene. Moreover, polymorphic sites associated with individual genes may be of particular value for evolutionary studies and QTL analyses. Even though transcriptome sequencing omits introns and intergenic regions, it can successfully be used for SSR site detection.
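SSR site detection in assembled transcripts amounts to scanning each contig for short motifs repeated in tandem. A minimal sketch is given below; the function name `find_ssrs` and the thresholds (motifs of 2–6 nt, at least five tandem copies) are illustrative assumptions and are not the parameters used in the studies cited here.

```python
import re


def find_ssrs(sequence, min_unit=2, max_unit=6, min_repeats=5):
    """Return (start, unit, copies) for each perfect tandem repeat found.

    Thresholds are illustrative assumptions. Homopolymer runs can still be
    reported as repeats of a two-base unit such as "AA".
    """
    # A lazily matched unit followed by at least (min_repeats - 1) more copies.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


# Toy contig carrying an (AG)6 and an (ATC)5 microsatellite.
contig = "TTGC" + "AG" * 6 + "CCTG" + "ATC" * 5 + "GG"
ssrs = find_ssrs(contig)
```

Each reported position can then be used to design flanking primers, so that length differences between lines at that site can be tested by amplification.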

Several polymorphism‐screening studies aimed at discovering SNVs and SSR sites in transcriptomic data have been performed on pea (see **Table 1**). SNV detection may be carried out by mapping NGS reads to an existing reference transcriptome assembly [59] or by de novo assembly of those reads [33, 35, 60]. In the case of an existing assembly, additional data complexity reduction is achievable by limiting the sequenced mRNA regions. Since UTRs are generally more polymorphic than ORFs, using sequences from the 3' and 5' mRNA ends in SNV analysis should yield results comparable to those obtained with RNA‐seq. The 3'MACE protocol for cDNA library preparation was used by Zhernakov et al. [59] to discover SNVs distinguishing six pea lines. Mapping MACE reads to the reference nodule transcriptome assembly of the pea line SGE [36] resulted in characterisation of over 34,000 polymorphic sites in more than 9700 contigs. Several of these SNVs were located within recognition sites of restriction endonucleases, which allowed the design of co‐dominant Cleaved Amplified Polymorphic Sequences (CAPS) markers for the particular transcripts.

**Table 1.** Studies aimed at gene polymorphism detection in pea (*Pisum sativum* L.) using transcriptome NGS‐sequencing.

SNVs are now the markers of choice owing to their abundance and the availability of high‐throughput screening techniques. Several SNV genotyping systems are available, varying in the number of samples and markers to be genotyped, such as GoldenGate® and Infinium from Illumina Inc., SNPStream from Beckman Coulter and GeneChip from Affymetrix [61]. An Illumina GoldenGate® oligonucleotide pool assay (OPA) designed for transcriptome‐discovered SNVs was used in a search for pea salinity tolerance QTLs [60].

As the pea genome is not sequenced yet, the genetic linkage maps are still relevant, since determination of loci responsible for target traits requires their fine mapping and subsequent search for candidate genes in the already sequenced genome of the model legume plant *M. truncatula*. Transcriptome‐discovered SNVs and high‐throughput genotyping systems made the construction of several highly saturated genetic maps of pea possible (see **Table 1**) [33, 35, 60].

Transcriptomic Studies in Non-Model Plants: Case of *Pisum sativum* L. and *Medicago lupulina* L.

http://dx.doi.org/10.5772/intechopen.69057

**5. Conclusion**

Next‐generation sequencing techniques make the analysis of differential gene expression and molecular marker development by transcriptome sequencing possible even in species lacking genomic information. Further development of sequencing and bioinformatics should substantially promote investigation of the genetics of non‐model plants. It is worth noting that numerous traits, such as the effectiveness of symbiosis development [62] or specific resistance to pathogens, can only be studied in each particular cultivated plant species, most of which have limited genomic data available. In addition, the decline in biodiversity makes the investigation of unique secondary metabolites inherent to non‐model medicinal plants a pressing matter.

Leguminous plants, capable of improving soil quality through the formation of mutualistic symbioses with nodule bacteria and arbuscular mycorrhizal fungi, are an integral part of agricultural systems. The genetics of most crop legumes lags behind that of model plants, and some are even considered 'orphan' crops, excluded from intense genomic studies owing to a number of factors. Fortunately, the similarity of genome organisation, or 'genome synteny', characteristic of most related species, can help 'translate' the genomic data from the model legumes to their pulse crop relatives [63].

Using RNA‐seq technologies for de novo transcriptome assembly provides opportunities for finding novel genes and isoforms in non‐model species and for investigating their differential expression. Comparison with genomes and transcriptomes of closely related species can help determine the evolutionary distance between two species and reveal possible evolutionary pressures shaping contemporary species. Technologies for determining gene expression levels using transcript ends (like 3' and 5' MACE) can be used to conduct large‐scale gene expression studies on a smaller budget. 5' MACE, a technology for simultaneous analysis of prokaryotic and eukaryotic transcript abundances, is particularly useful for studying plant‐bacteria interactions. Using transcriptome‐sequencing data in genetic marker development streamlines the construction of high‐quality genomic maps, crucial for routine gene identification tasks as well as potentially for refining genome assemblies for non‐model organisms. All these methods are useful in the investigation of unique phenotypes not present in the model plants, for example, the *M. lupulina* MlS‐1 genetic line, uniquely dependent on AM formation. Adaptation of standardised RNA‐seq approaches and data analysis developed for model plants to the important crop *P. sativum* should facilitate the breeding of new cultivars that meet the requirements of present‐day agriculture and possess a complex of beneficial traits, including increased efficiency of interactions with nodule bacteria and arbuscular‐mycorrhizal fungi.
