based on premature stop codons detected in an open reading frame of the *NSP1* homologous sequence in two *sym34* allelic mutants (RisNod1 and RisNod23) and full co‐segregation of the alleles of the hypothetical pea *Nsp1* gene with the nodulation phenotype in the F2 generation.

Alves‐Carvalho et al. [39] sequenced transcriptomes of roots, nodules, shoots, leaves, flowers, seeds, tendrils and pods harvested at different developmental stages of pea cultivar 'Caméor'. Sequencing of 20 cDNA libraries produced one billion reads. After de novo assembly and several steps of redundancy reduction, 46,099 contigs were obtained. The main objective of their study was to obtain the most complete transcriptome possible and to filter out artefacts and chimeric contigs, so a rigorous filtration pipeline was developed and implemented. The accumulated transcriptome data were used to develop the Pea RNA‐Seq gene atlas, which contains expression profiles of thousands of genes in several organs, including symbiotic nodules. It is worth noting that the pipeline used in this work filtered out a large proportion of short protein‐coding transcripts, including a number of NCR peptide‐coding transcripts [40], making the Pea RNA‐Seq gene atlas less useful than tissue‐specific transcriptomes in some cases.

The Pea RNA‐Seq gene atlas also lacks information on mycorrhiza‐specific transcripts. The genetic framework of mycorrhizal symbiosis is not yet fully understood in either model or non‐model legumes [38]. To discover symbiotically active genes in both plant roots and the arbuscular‐mycorrhizal fungus, our workgroup assembled a transcriptome of roots of the pea cultivar Frisson colonised by *Rhizophagus irregularis* isolate BEG144. Sequencing was performed on an Illumina HiSeq2000 platform, yielding 120 million paired‐end reads. To separate the transcriptomes of the two organisms present in the samples, all the reads were mapped to the genome of *R. irregularis* [42] using the HISAT2 mapper [41]. Over 5 million successfully mapped reads were assembled by Trinity with default parameters, yielding 30,000 transcripts, in good agreement with the 28,000 genes known for the fungus [42, 43]. All the reads not mapped to the *R. irregularis* genome were then assembled with the Trinity pipeline using standard assembly and quality trimming parameters. This resulted in more than 200,000 contigs, of which more than 100,000 were similar to genes of pea and other plants of the Fabaceae family.

The completeness of assembly and annotation of all available pea transcriptomes was assessed with single‐copy orthologs using BUSCO v.2 software with the OrthoDB v9.1 'embryophyta' set as a reference [44]. The lowest number of present groups, found in the transcriptome published by Franssen et al. [30] (named 'Franssen'), is due to low transcriptome coverage. The high number of missing groups in the 'Kaspa', 'Parafield' and 'SGE' assemblies is most likely the result of limited tissue representation (see **Figure 2**). Deep sequencing of mycorrhized roots yielded a transcriptome of similar completeness to a combined transcriptome from 20 tissues, indicating that low‐copy transcripts were assembled owing to high transcriptome coverage.

**2.2. *M. lupulina* transcriptomics**

*M. lupulina* is a plant of the Fabaceae family and a close relative of *M. truncatula*; a unique genetic line, MlS‐1, characterised by an obligate mycotrophic lifestyle, was obtained for this species [28].

**3. Differential gene expression (DGE)**

Analysis of alterations in gene expression between conditions or genotypes is the most significant part of transcriptomic data analysis. The differences in expression levels can help determine the important genes and elucidate the processes taking place in the investigated samples.
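As a minimal illustration of what comparing expression levels between two samples involves, the sketch below normalises raw read counts to counts per million (CPM) and computes a log2 fold change per gene. The gene names, counts and the pseudocount are hypothetical values chosen for the example; this is not the statistical model of dedicated packages such as edgeR, which additionally estimate dispersion across replicates and test for significance.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts so that each library sums to one million."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def log2_fold_changes(control, treated, pseudocount=1.0):
    """Return log2(treated / control) per gene on CPM-normalised values.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes with no reads in one of the samples.
    """
    cpm_control = counts_per_million(control)
    cpm_treated = counts_per_million(treated)
    genes = sorted(set(cpm_control) | set(cpm_treated))
    return {
        gene: math.log2(
            (cpm_treated.get(gene, 0.0) + pseudocount)
            / (cpm_control.get(gene, 0.0) + pseudocount)
        )
        for gene in genes
    }


# Hypothetical raw counts for three genes in two samples.
wild_type = {"geneA": 500, "geneB": 300, "geneC": 200}
# Same proportions, twice the sequencing depth: no real expression change.
deeper_library = {"geneA": 1000, "geneB": 600, "geneC": 400}
# Same depth as wild type, but geneA is down and geneB is up.
mutant = {"geneA": 100, "geneB": 700, "geneC": 200}

unchanged = log2_fold_changes(wild_type, deeper_library)
changed = log2_fold_changes(wild_type, mutant)

for gene, lfc in changed.items():
    print(f"{gene}\t{lfc:+.2f}")
```

Normalising by library size first matters because a sample sequenced more deeply yields more reads for every gene; without it, the `deeper_library` sample would appear to have uniformly doubled expression.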

Extensive analysis of gene expression can be carried out by microarray analysis or RNA sequencing technology. Microarray technology requires prior knowledge of gene sequences and is more suitable for species with an available genome sequence. In the case of the model species *M. truncatula*, the combination of microarray data resulted in the development of an atlas of gene expression profiles, the *Medicago truncatula* Gene Expression Atlas (MtGEA) (https://mtgea.noble.org/v3/). MtGEA contains information about gene expression in roots, nodules, stems, petioles, leaves, vegetative buds, flowers, seeds and pods and is potentially helpful for studying other legumes. Although the pea genome has not been sequenced yet, several studies of pea gene expression have been carried out using microarray technology. Analysis of gene expression during *Mycosphaerella pinodes* infection was carried out using a microarray [45] containing 16,470 different 70‐mer oligonucleotides from *M. truncatula*, of which only 25 did not show a detectable signal [46]. In another study, microarray transcriptome profiling based on known pea Expressed Sequence Tags (ESTs) revealed altered expression of genes associated with programmed cell death, oxidative stress and protein ubiquitylation during seed aging [47].

In spite of the many advantages of microarrays, this technique is not effective for quantification of transcript splice variants and, furthermore, cannot provide information about novel genes not included in the array. The development of NGS technology made analysis of gene expression across the full transcriptome possible. To date, several studies of pea gene expression based on RNA‐seq technology have been published. A comparative analysis of the transcriptional control of pea seed development conducted by RNA‐seq revealed significant differences in gene expression between vegetable and grain pea. Genes associated with sugar and starch biosynthesis were significantly activated during seed maturation. Analysis of the differential expression of these genes revealed a negative correlation between soluble sugar and starch flux in vegetable and grain pea seeds [32]. Alves‐Carvalho et al. [39] developed the Pea RNA‐Seq gene atlas containing expression profiles of thousands of genes in different pea tissues harvested at distinct developmental stages [48].

Although RNA‐seq technology is indispensable for exhaustive transcriptome studies, it is not the most cost‐efficient tool for gene expression analysis because of the substantial sequencing depth required for rare transcript detection. There are RNA‐seq modifications, for example, Massive Analysis of cDNA Ends (MACE) developed by GenXPro GmbH (Frankfurt am Main, Germany) (http://genxpro.net/), that increase the sequencing depth (number of reads per transcript) by sequencing only a 50–500 bp fragment (adjacent to the 5' or 3' end of the transcript, depending on the version) [49]. As each read originates from a distinct copy of mRNA, MACE technology is free of duplications and similar artefacts, leading to much more accurate transcript quantification. Even though MACE data cannot be used to distinguish the expression of splice variants of genes, it can be successfully applied in a number of scenarios, even with species lacking a high‐quality transcriptome.

In our opinion, 5'MACE is a technology with potential for simultaneous analysis of gene expression in prokaryotic and eukaryotic organisms; therefore, this technology is practically tailor‐made for the analysis of plant‐microbe interactions, particularly for studying the process of root nodule development in plants of the Fabaceae family.

One of the many challenges in analysing the onset of nodule symbiosis is the small amount of tissue available. The enclosed environments of symbiotic compartments complicate direct measurements. Implementation of 5'MACE technology made it possible to analyse the gene expression patterns of both organisms simultaneously in a developing nodule, at a fraction of the cost of a full RNA‐seq study.

In our group, 5'MACE was implemented in a study investigating the expression changes in pea nodules caused by a mutation in the *Sym31* gene, the function of which is unknown. This gene is responsible for the unique fix− mutant phenotype (non‐nitrogen‐fixing nodules) with halted bacteroid development [50]. Two plant genotypes, Sprint‐2Fix<sup>−</sup> (carrying a mutation in the *Sym31* gene) and the parental wild‐type line Sprint‐2, were inoculated with an efficient strain, *Rhizobium leguminosarum* bv. *viciae* RCAM1026 [51]. All the obtained reads were sequentially mapped first to the RCAM1026 genome (about 8% mapped reads) and then to the pea transcriptome assembly from Alves‐Carvalho et al. [39] (about 60% mapped reads), resulting in two sets of differential transcriptome data. The transcript quantification was carried out using the edgeR package [52]. Differentially expressed genes were then visualised on a metabolic map using the KOBAS 2.0 annotation server [53]. The analysis resulted in the discovery of a coordinated shift in sulphur metabolism in both organisms. These preliminary data show the great potential of the 5'MACE technology in furthering our understanding of inter‐organismal gene regulatory networks in plant‐microbe interactions.

**4. Transcript‐based markers and their usage**

The application of NGS for massive genetic polymorphism discovery is widespread because it is much more labour‐ and time‐efficient than previously used methods such as microarray hybridisation [54] or denaturing high‐performance liquid chromatography (HPLC) [55]. Originally, the main challenge in using NGS methods for massive polymorphism screening was obtaining sequences of a particular genomic locus for multiple lines, owing to the complexity of plant genomes and the relatively low productivity of the first‐generation NGS platforms; this led to the development of several methods for sequencing optimisation.

For example, the Restriction site Associated DNA‐sequencing method (RAD‐Seq) consists of genome cleavage and selection of fragments of appropriate size flanked by specific restriction sites (as with RFLP and AFLP analyses) [56]. RAD‐Seq yields fragments distributed randomly over a genome and is suitable for discovering indels (insertion‐deletion polymorphisms), SNVs (single nucleotide variations) and microsatellites, or simple sequence repeats (SSRs). Using RAD‐Seq, Boutet et al. [57] discovered a total of 419,024 SNVs between at least two of the four pea lines analysed in their work. The pea genetic map constructed by genotyping a subset of 64,754 SNVs on a subpopulation of 48 RILs (recombinant inbred lines) was collinear with previous pea consensus maps and therefore with the *M. truncatula* genome. Yang et al. [58], using the Illumina HiSeq 2500 platform, uncovered 8899 putative SSR‐containing sequences. Reliable amplification of detectable polymorphic fragments among 24 genotypes of pea was obtained for about half of the randomly selected SSRs, 820 in total.

Another way of reducing data complexity is transcriptome sequencing. It makes possible the discovery of polymorphic sites in open reading frames (ORFs) and 5′‐ and 3′‐untranslated regions (UTRs) of a gene. Moreover, polymorphic sites associated with individual genes may be of special value for evolutionary studies and QTL analyses. Even though transcriptome sequencing omits introns and intergenic regions, it can successfully be used for SSR site detection.

Several polymorphism‐screening studies aimed at discovering SNVs and SSR sites in transcriptomic data have been performed on pea (see **Table 1**). SNV detection may be carried out by mapping NGS reads to an existing reference transcriptome assembly [59] or by de novo assembly of those reads [33, 35, 60]. In the case of an existing assembly, the additional data complexity
