Applications of RNA-Seq and Omics Strategies - From Microorganisms to Human Health

annotation off of the researcher. Trinotate combines the output of a number of annotation tools into an integrated database, which simplifies subsequent in‐depth analysis of the acquired data.

One example of an 'orphan' legume is garden pea (*Pisum sativum* L.), a valuable pulse crop capable of forming both nitrogen‐fixing symbiosis and arbuscular mycorrhiza. Global production of green pea in 2014 was 17.4 million tons, harvested from 2.3 million hectares, with an additional 11.2 million tons of dried pea from 6.9 million hectares [6]. The genome of the species is estimated at about 4300 Mb, with a high percentage of repetitive sequences [27]. Adapting RNA‐seq data analysis approaches standardised for model plants to *P. sativum* should facilitate both the study of pea molecular genetics and the breeding of new cultivars possessing agriculturally important traits.

Black medick (*Medicago lupulina* L.), a close relative of the model legume barrel medick (*M. truncatula* Gaertn.), is another example of an important non‐model legume that has hardly been studied in terms of genetics. It is valuable as a pasture legume component in complex grass mixtures and can also be used as an intermediate culture in crop rotation and as green manure. Black medick is characterised by high protein, vitamin and mineral content, a long growing season and the ability to improve soil fertility through nitrogen fixation, which makes it a perfect lawn plant [28]. Black medick is a very promising object for studying AM functioning and development, since a unique genetic line of *M. lupulina*, obligately dependent on the formation of arbuscular mycorrhizal symbiosis, has been selected from the spring landrace population VIK‐32 of *M. lupulina* var. *vulgaris* Koch originating from Kazakhstan [28, 29]. Plants of the line MlS‐1 (for *Medicago lupulina* Spring) [28] demonstrate dwarfism when grown in soil with a low level of Pi (inorganic phosphorus) without inoculation with AM fungi, but grow normally when inoculated with an AM fungus. The MlS‐1 line is therefore considered highly effective in AM symbiosis formation, since inoculation with the fungus dramatically increases plant biomass. Apparently, the MlS‐1 line can take up phosphorus from the soil only via the symbiotrophic pathway, supposedly owing to as yet unidentified mutation(s), and can consequently serve as a model object for the investigation of arbuscular‐mycorrhizal symbiosis. For instance, this line is suitable for mutagenesis aimed at selecting mutants with defects in arbuscular mycorrhiza development, since plants carrying mutations in genes related to AM formation can be easily identified by visual examination as demonstrating dwarfism under inoculation with AM fungi [29].

The high level of genome synteny and the similarity of gene sequences and developmental processes make it possible to apply the vast amount of data accumulated for *M. truncatula* to the genetics, genomics and transcriptomics of the non‐model legumes *M. lupulina* and *P. sativum*. In this chapter, we briefly describe current achievements in the transcriptomics of the non‐model legumes black medick (*M. lupulina*) and garden pea (*P. sativum*).

**2. Transcriptome assembly studies**

#### *2.1. P. sativum* **transcriptomics**

The genome of *P. sativum* has not yet been assembled owing to its comparatively large size and numerous repeats, which greatly reduces the number of research methods available. The pea transcriptome, unlike the genome, is close in size to the transcriptomes of other legumes, including the model plant *M. truncatula*, making it more amenable to analysis. Because gene expression is tissue‐specific, different plant tissues possess unique sets of transcripts, which makes the choice of tissue samples important for further research. Furthermore, transcriptome assemblies from distinct plant organs should be used as references for the analysis of tissue‐specific processes. A high‐quality transcriptome assembly with full tissue representation is therefore crucial for studies of gene interactions (differential gene expression, see section 3), gene polymorphism and the proteome.

In the last 5 years, several pea transcriptome assemblies of distinct organs and tissues have been presented by different research groups. The first pea transcriptome sequencing and assembly was published by Franssen et al. [30]. A total of 20 libraries from flowers, leaves, cotyledons, epicotyls and hypocotyls, as well as from etiolated and light‐treated etiolated seedlings, were sequenced using the Roche 454 sequencing platform. Several iterations of de novo assembly and merging yielded 81,449 unigenes. Sudheesh et al. [31] sequenced transcriptomes from different parts (leaf, stipule, stem, tendril tissues from multiple nodes, root‐tip tissues, flowers, stamens, pistils, immature pods, immature seeds and nodules) of two pea cultivars (Parafield and Kaspa) differing in both seed and plant morphological characteristics. Read assembly for the separate cultivars yielded 126,335 and 145,730 contigs, respectively, with 87% showing significant expression levels in both cultivars. Later, Liu et al. sequenced samples from pea seeds harvested 10 and 25 days after pollination and assembled 77,273 unigenes [32].

Several transcriptome assembly sets were generated for single nucleotide polymorphism (SNP) marker development and genetic mapping in pea (see section 4). Duarte et al. [33] sequenced libraries from eight pea cultivars (six spring‐sown, one winter‐sown field pea and one fodder pea cultivar) with Roche 454 technology. A total of 3,826,797 reads were assembled into 68,850 contigs by the MIRA transcriptome assembler [34]. Sindhu et al. sequenced 3′‐anchored libraries of eight diverse pea accessions with Roche 454 technology, generating 4,008,648 reads in total. The accessions comprised six *P. sativum* cultivars (CDC Bronco, Alfetta, Cooper, CDC Striker, Nitouche and Orb) and two wild accessions, P651 (*P. fulvum*) and PI 358610 (*P. sativum* ssp. *abyssinicum*). De novo assembly was performed by MIRA for 520,797 reads from CDC Bronco, resulting in a set of 29,725 reference contigs representing a significant proportion of the 3′ ends of genes in pea [35].

Since analysis of the interorganismal genetic network between pea and rhizobia is a poorly developed field, the assembly of a high‐quality transcriptome provided researchers with much‐needed data on nodule‐specific transcripts. Transcriptomes of pea nodules and root tips were obtained by Zhukov et al. [36]. Transcriptome sequencing using the Illumina Genome Analyzer IIx platform (Illumina Inc.) generated 52,021,865 reads from the 'Nodules' library and 17,684,604 reads from the 'Root Tips' library, which were assembled de novo by Trinity [37] into 58,397 and 37,287 contigs, respectively. A total of 13,000 nodule‐specific contigs were annotated by alignment to known plant protein‐coding sequences and by Gene Ontology search. Of these, 581 sequences were found to possess full coding DNA sequences (CDSs) and could thus be considered novel nodule‐specific transcripts of pea. Further investigation of those transcripts can potentially lead to the discovery of key regulators of nodule symbiosis, as illustrated by the identification of the pea gene homologous to the *Nodulation signaling pathway 1* (*NSP1*) gene of *M. truncatula* [38]. In that study, the pea gene *Sym34* was shown to be homologous to the *M. truncatula NSP1* gene. The evidence was the detection of premature stop codons in the open reading frame of the *NSP1* homologous sequence in two *sym34* allelic mutants (RisNod1 and RisNod23) and the full co‐segregation of the alleles of the hypothetical pea *Nsp1* gene with the nodulation phenotype in the F2 generation.
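The premature stop codon criterion mentioned above can be illustrated with a minimal Python sketch. It scans a coding sequence codon by codon and reports whether a stop codon occurs before the last codon of the reading frame. The function names and both toy sequences are hypothetical and were invented for illustration only; they are not the actual *Sym34* or *NSP1* sequences, and the cited study may have used a different procedure.

```python
# Standard stop codons of the nuclear genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(cds):
    """Return the 0-based codon index of the first in-frame stop codon, or None."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing incomplete codon
    for i in range(0, usable, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def has_premature_stop(cds):
    """True if a stop codon occurs before the last codon of the reading frame."""
    n_codons = len(cds) // 3
    stop = first_in_frame_stop(cds)
    return stop is not None and stop < n_codons - 1


# Invented toy sequences (NOT real pea sequences): the "mutant" differs from
# the "wild type" by a single C-to-T substitution that turns CAA into TAA.
WILD_TYPE = "ATGGCTCAAGGATGA"  # ATG GCT CAA GGA TGA
MUTANT = "ATGGCTTAAGGATGA"     # ATG GCT TAA GGA TGA

if __name__ == "__main__":
    print("wild type premature stop:", has_premature_stop(WILD_TYPE))
    print("mutant premature stop:", has_premature_stop(MUTANT))
```

In practice the reading frame would first have to be established, for example by alignment to the *M. truncatula* homolog, before such a check is meaningful.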

Alves‐Carvalho et al. [39] sequenced transcriptomes of roots, nodules, shoots, leaves, flowers, seeds, tendrils and pods harvested at different developmental stages of the pea cultivar 'Caméor'. Sequencing of 20 cDNA libraries produced one billion reads. After de novo assembly and several steps of redundancy reduction, 46,099 contigs were obtained. The main objective of the study was to obtain the most complete transcriptome possible and to filter out all artefacts and chimeric contigs, so a rigorous filtration pipeline was developed and implemented. The accumulated transcriptome data were used to develop the Pea RNA‐Seq gene atlas, which contains expression profiles of thousands of genes in several organs, including symbiotic nodules. It is worth noting that the pipeline used in this work filtered out a large proportion of short protein‐coding transcripts, including a number of NCR peptide‐coding transcripts [40], which makes the Pea RNA‐Seq gene atlas less useful than tissue‐specific transcriptomes in some cases.

The Pea RNA‐Seq gene atlas also lacks information regarding mycorrhiza‐specific transcripts. The genetic framework of mycorrhizal symbiosis is not yet fully understood in either model or non‐model legumes [38]. To discover symbiotically active genes in both plant roots and the arbuscular‐mycorrhizal fungus, our workgroup assembled a transcriptome of roots of the pea cultivar Frisson colonised by *Rhizophagus irregularis* isolate BEG144. Sequencing was performed on an Illumina HiSeq2000 platform, yielding 120 million paired‐end reads. To separate the transcriptomes of the two organisms present in the samples, all reads were mapped to the genome of *R. irregularis* [42] using the HISAT2 mapper [41]. Over 5 million successfully mapped reads were assembled by Trinity with default parameters, yielding 30,000 transcripts, which agrees well with the 28,000 genes known for the fungus [42, 43].
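The read separation step described above can be sketched in a few lines of Python. The sketch below is an assumption about how such a split might be done, not the workgroup's actual pipeline: it reads alignment records in SAM format (the format HISAT2 writes) and partitions read names by the "segment unmapped" flag bit (0x4), skipping secondary and supplementary records so that each read is counted once. The function name and the toy records are hypothetical.

```python
# SAM FLAG bits used below (defined in the SAM format specification).
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def partition_sam(sam_lines):
    """Split SAM records into names of mapped reads and names of unmapped reads."""
    mapped, unmapped = [], []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count every read only once, via its primary record
        if flag & UNMAPPED:
            unmapped.append(name)
        else:
            mapped.append(name)
    return mapped, unmapped


# Invented toy records (hypothetical read and scaffold names).
TOY_SAM = (
    "@HD\tVN:1.6\n"
    "r1\t0\tscaffold_1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII\n"
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGCA\tIIIII\n"
    "r1\t256\tscaffold_2\t50\t0\t5M\t*\t0\t0\tACGTA\tIIIII\n"
    "r3\t16\tscaffold_1\t300\t60\t5M\t*\t0\t0\tGGCAT\tIIIII\n"
)

if __name__ == "__main__":
    fungal, other = partition_sam(TOY_SAM.splitlines())
    print("mapped to the fungal genome:", fungal)
    print("not mapped:", other)
```

For real data sets the same split is usually done on BAM files with dedicated tools, for instance `samtools view -F 4` for mapped and `samtools view -f 4` for unmapped reads. How the two mates of a read pair should be treated when only one of them maps is a design decision that this sketch leaves open.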

All the reads not mapped to the *R. irregularis* genome were then assembled with the Trinity pipeline with standard assembly and quality‐trimming parameters. This resulted in more than 200,000 contigs, of which more than 100,000 were similar to genes of pea and other plants of the Fabaceae family.

The completeness of assembly and annotation was assessed with single‐copy orthologs for all available pea transcriptomes using BUSCO v.2 software with the OrthoDB v9.1 'embryophyta' dataset as a reference [44]. The transcriptome published by Franssen et al. [30] ('Franssen') has the lowest number of present groups, which is due to low transcriptome coverage. The high number of missing groups in the 'Kaspa', 'Parafield' and 'SGE' assemblies is most likely the result of limited tissue representation (see **Figure 2**). Deep sequencing of mycorrhized roots yielded transcriptome completeness similar to that of a combined transcriptome from 20 tissues, which indicates that low‐copy transcripts were assembled owing to high transcriptome coverage.

**Figure 2.** The results of BUSCO analysis of pea transcriptomes. Light‐blue: complete and single‐copy genes; dark‐blue: complete and duplicated genes; yellow: fragmented genes; red: missing genes.

#### *2.2. M. lupulina* **transcriptomics**

*M. lupulina* is a plant of the Fabaceae family and a close relative of *M. truncatula*. A unique genetic line of *M. lupulina*, MlS‐1, characterised by an obligate mycotrophic lifestyle, has been obtained [28].

This line may be extremely useful as a model for investigating the genetic foundations of mycorrhizal symbiosis. *M. lupulina* is a novel object for genomic studies, so to kick‐start its analysis, the transcriptome of mycorrhized roots of *M. lupulina* was sequenced using the Illumina 2500 platform. Plants of the MlS‐1 line were grown in soil under inoculation with *R. irregularis* strain RCAM00320, followed by total RNA extraction from the mycorrhized root system and preparation of cDNA libraries for Illumina sequencing. Using the Trinity assembly pipeline, 41 million paired reads were assembled, yielding over 138,000 contigs, of which 19,022 showed resemblance to genes of *R. irregularis*. Further analysis revealed over 70,000 contigs similar to known genes of *M. truncatula*. The assembled transcriptome can be used as a reference for differential gene expression analysis.

Transcriptomic Studies in Non-Model Plants: Case of *Pisum sativum* L. and *Medicago lupulina* L.

http://dx.doi.org/10.5772/intechopen.69057

**3. Differential gene expression (DGE)**

Analysis of alterations in gene expression between conditions or genotypes is the most significant part of transcriptomic data analysis. Differences in expression levels can help to identify important genes and elucidate the processes taking place in the investigated samples.

Extensive analysis of gene expression can be carried out by microarray analysis or RNA sequencing technology. Microarray technology requires prior knowledge of gene sequences and is more suitable for objects with an available genome sequence. In the case of the model object *M. truncatula*, the combination of microarray data resulted in the development of an atlas of gene expression profiles, the *Medicago truncatula* Gene Expression Atlas (MtGEA) (https://mtgea.noble.org/v3/). MtGEA contains information about gene expression in roots, nodules, stems, petioles, leaves, vegetative buds, flowers, seeds and pods and is potentially helpful for studying other legumes. Despite the fact that the pea genome has not been sequenced yet, several studies of pea gene expression have been carried
