growth and development rather than molting and metamorphosis. In contrast, most of the insects injected with TC001872/Cirl dsRNA entered the quiescent stage and died during this stage. About 40% of the insects injected with TC001872/Cirl dsRNA were able to molt to the pupal stage and eventually died during the early pupal stage. The majority of insects injected with TC012521/stan dsRNA were not able to complete adult eclosion and died during the pharate adult stage. Interestingly, TC014055/fz and TC009370/mthl RNAi caused an arrest in both larval-pupal and pupal-adult ecdysis, suggesting that they may play important roles in the regulation of ecdysis behavior. In contrast, insects injected with TC009370/mthl dsRNA were arrested at the late phase of larval-pupal and pupal-adult ecdysis. The majority of insects injected with TC005545/smo dsRNA died during the early pupal stages without

The GPCRs identified in this study [71] could serve as potential pesticide targets, which can be used in small-molecule screens or in the development of RNAi-based pesticides. Many of the identified GPCRs belong to classic GPCR families, e.g. biogenic amine receptors (TC007490/D2R and TC011960/5-HTR) and neuropeptide receptors (TC009127/glycoprotein hormone-like receptor). These GPCRs, which are activated by small molecules, can be used as potential targets for novel pesticide development. On the other hand, it may not be possible to apply small-molecule ligands for pest management by targeting the identified atypical GPCRs (e.g. TC014055/fz and TC005545/smo), whose ligands tend to be larger proteins. However, it should be possible to develop an RNAi-based pest control strategy through ingestion of specific dsRNA targeting atypical GPCRs as well as

**Table 3. Summary of RNAi for 12 GPCRs in** *T. castaneum*. Asterisk (\*) indicates that RNAi for TC012493 was not carried out at the second screen. \#: *E. coli malE* gene is used as a negative control.

| Official ID | First screen: larva mortality | First screen: pupa mortality | Second screen: larva mortality | Second screen: pupa mortality |
|---|---|---|---|---|
| *malE* # | 6.7% | 0.0% | 2.4% | 2.4% |
| TC007490 | 64.3% | 35.7% | 100.0% | 0.0% |
| TC008163 | 10.0% | 90.0% | 21.2% | 75.8% |
| TC009127 | 50.0% | 16.7% | 40.0% | 0.0% |
| TC006805 | 0.0% | 62.5% | 9.1% | 54.5% |
| TC013945 | 0.0% | 100.0% | 42.1% | 52.6% |
| TC012493 | 20.0% | 60.0% | \* | \* |
| TC004716 | 0.0% | 41.7% | 38.9% | 22.2% |
| TC001872 | 55.6% | 44.4% | 68.4% | 31.6% |
| TC009370 | 0.0% | 90.0% | 42.9% | 57.1% |
| TC012521 | 0.0% | 90.0% | 31.6% | 68.4% |
| TC014055 | 0.0% | 100.0% | 60.0% | 40.0% |
| TC005545 | 0.0% | 92.3% | 46.7% | 53.3% |

A genome-wide RNAi screen is a powerful technique for studying gene function, deciphering complex phenotypes, and identifying novel drug targets. It opens up a whole new field that allows researchers to explore new modulators in classical signaling pathways, new mechanisms underlying basic biological functions, and new drug targets for human diseases. An increasing number of genome-wide RNAi screens have been successfully conducted, leading to many kinds of novel discoveries. Although off-target effects and other false-discovery issues remain, RNAi screening will be greatly improved by the development of new RNAi libraries and image detection instruments. Most importantly, as our understanding of the RNAi pathway continues to grow, we will be able to design more specific and effective RNAi tools for genome-wide RNAi screens. There is no doubt that, through genome-wide RNAi screens, we will gain more insight into complex signaling networks and the molecular mechanisms of disease in the near future, which will eventually lead to the discovery of novel therapeutic drugs and crop protection reagents.
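As a rough illustration of how the figures in Table 3 can be read, the sketch below adds larval and pupal mortality for each screen to give a combined pre-adult mortality per gene. It assumes that both percentages refer to the same group of injected insects, which the table does not state explicitly; the names `TABLE3`, `total_mortality` and `summarize` are hypothetical and not part of the original study.

```python
# Mortality percentages transcribed from Table 3, in the order:
# (first-screen larva, first-screen pupa, second-screen larva, second-screen pupa).
# None marks the second screen that was not carried out for TC012493.
TABLE3 = {
    "malE": (6.7, 0.0, 2.4, 2.4),  # negative control
    "TC007490": (64.3, 35.7, 100.0, 0.0),
    "TC008163": (10.0, 90.0, 21.2, 75.8),
    "TC009127": (50.0, 16.7, 40.0, 0.0),
    "TC006805": (0.0, 62.5, 9.1, 54.5),
    "TC013945": (0.0, 100.0, 42.1, 52.6),
    "TC012493": (20.0, 60.0, None, None),
    "TC004716": (0.0, 41.7, 38.9, 22.2),
    "TC001872": (55.6, 44.4, 68.4, 31.6),
    "TC009370": (0.0, 90.0, 42.9, 57.1),
    "TC012521": (0.0, 90.0, 31.6, 68.4),
    "TC014055": (0.0, 100.0, 60.0, 40.0),
    "TC005545": (0.0, 92.3, 46.7, 53.3),
}


def total_mortality(larva, pupa):
    """Return larva + pupa mortality (%), or None if the screen was not run.

    Assumption: both percentages are fractions of the same injected cohort,
    so they can simply be added.
    """
    if larva is None or pupa is None:
        return None
    return round(larva + pupa, 1)


def summarize(table):
    """Map each gene to (first-screen total, second-screen total)."""
    return {
        gene: (total_mortality(l1, p1), total_mortality(l2, p2))
        for gene, (l1, p1, l2, p2) in table.items()
    }


if __name__ == "__main__":
    for gene, (first, second) in summarize(TABLE3).items():
        second_text = "not run" if second is None else f"{second}%"
        print(f"{gene}: first screen {first}%, second screen {second_text}")
```

Under this reading, a gene such as TC007490 reaches 100% combined mortality in both screens, while the *malE* control stays below 7%.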

#### **Author details**

Hua Bai *Department of Ecology and Evolutionary Biology, Brown University, USA* 

#### **Acknowledgement**

I'd like to thank Subba R. Palli and Ping Kang for valuable comments on the manuscript, and the Ellison Medical Foundation/AFAR postdoctoral fellowship for financial support.

#### **5. References**


[7] Bentwich I, Avniel A, Karov Y, Aharonov R, Gilad S, Barad O, et al. Identification of hundreds of conserved and nonconserved human microRNAs. Nat Genet. 2005 Jul;37(7):766-70.

[8] Reinhart BJ, Slack FJ, Basson M, Pasquinelli AE, Bettinger JC, Rougvie AE, et al. The 21 nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans. Nature. 2000 Feb 24;403(6772):901-6.

[9] Arasu P, Wightman B, Ruvkun G. Temporal regulation of lin-14 by the antagonistic action of two other heterochronic genes, lin-4 and lin-28. Genes Dev. 1991 Oct;5(10):1825-33.

[10] Esau C, Davis S, Murray SF, Yu XX, Pandey SK, Pear M, et al. miR-122 regulation of lipid metabolism revealed by in vivo antisense targeting. Cell Metab. 2006 Feb;3(2):87-98.

[11] Trajkovski M, Hausser J, Soutschek J, Bhat B, Akin A, Zavolan M, et al. MicroRNAs 103 and 107 regulate insulin sensitivity. Nature. 2011 Jun 30;474(7353):649-53.

[12] Mencia A, Modamio-Hoybjor S, Redshaw N, Morin M, Mayo-Merino F, Olavarrieta L, et al. Mutations in the seed region of human miR-96 are responsible for nonsyndromic progressive hearing loss. Nat Genet. 2009 May;41(5):609-13.

[13] He L, Thomson JM, Hemann MT, Hernando-Monge E, Mu D, Goodson S, et al. A microRNA polycistron as a potential human oncogene. Nature. 2005 Jun 9;435(7043):828-33.

[14] C. elegans Sequencing Consortium. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science. 1998 Dec 11;282(5396):2012-8.

[15] Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, Amanatides PG, et al. The genome sequence of Drosophila melanogaster. Science. 2000 Mar 24;287(5461):2185-95.

[16] Friedman A, Perrimon N. A functional RNAi screen for regulators of receptor tyrosine kinase and ERK signalling. Nature. 2006 Nov 9;444(7116):230-4.

[17] Luo B, Cheung HW, Subramanian A, Sharifnia T, Okamoto M, Yang X, et al. Highly parallel identification of essential genes in cancer cells. Proc Natl Acad Sci U S A. 2008 Dec 23;105(51):20380-5.

[18] Pospisilik JA, Schramek D, Schnidar H, Cronin SJ, Nehme NT, Zhang X, et al. Drosophila genome-wide obesity screen reveals hedgehog as a determinant of brown versus white adipose cell fate. Cell. 2010 Jan 8;140(1):148-60.

[19] Brass AL, Dykxhoorn DM, Benita Y, Yan N, Engelman A, Xavier RJ, et al. Identification of host proteins required for HIV infection through a functional genomic screen. Science. 2008 Feb 15;319(5865):921-6.

[20] Hamilton B, Dong Y, Shindo M, Liu W, Odell I, Ruvkun G, et al. A systematic RNAi screen for longevity genes in C. elegans. Genes Dev. 2005 Jul 1;19(13):1544-55.

[21] Hansen M, Hsu AL, Dillin A, Kenyon C. New genes tied to endocrine, metabolic, and dietary regulation of lifespan from a Caenorhabditis elegans genomic RNAi screen. PLoS Genet. 2005 Jul;1(1):119-28.

[22] Mohr S, Bakal C, Perrimon N. Genomic screening with RNAi: results and challenges. Annu Rev Biochem. 2010;79:37-64.

[23] Perrimon N, Ni JQ, Perkins L. In vivo RNAi: today and tomorrow. Cold Spring Harb Perspect Biol. 2010 Aug;2(8):a003640.

[24] Ramadan N, Flockhart I, Booker M, Perrimon N, Mathey-Prevot B. Design and implementation of high-throughput RNAi screens in cultured Drosophila cells. Nat Protoc. 2007;2(9):2245-64.

[25] Clemens JC, Worby CA, Simonson-Leff N, Muda M, Maehama T, Hemmings BA, et al. Use of double-stranded RNA interference in Drosophila cell lines to dissect signal transduction pathways. Proc Natl Acad Sci U S A. 2000 Jun 6;97(12):6499-503.

[26] Hammond SM, Bernstein E, Beach D, Hannon GJ. An RNA-directed nuclease mediates post-transcriptional gene silencing in Drosophila cells. Nature. 2000 Mar 16;404(6775):293-6.

[27] Kamentsky L, Jones TR, Fraser A, Bray MA, Logan DJ, Madden KL, et al. Improved structure, function and compatibility for CellProfiler: modular high-throughput image analysis software. Bioinformatics. 2011 Apr 15;27(8):1179-80.

[28] Kiger AA, Baum B, Jones S, Jones MR, Coulson A, Echeverri C, et al. A functional genomic analysis of cell morphology using RNA interference. J Biol. 2003;2(4):27.

[29] Bjorklund M, Taipale M, Varjosalo M, Saharinen J, Lahdenpera J, Taipale J. Identification of pathways regulating cell size and cell-cycle progression by RNAi. Nature. 2006 Feb 23;439(7079):1009-13.

[30] Sims D, Duchek P, Baum B. PDGF/VEGF signaling controls cell size in Drosophila. Genome Biol. 2009;10(2):R20.

[31] Gonsalves FC, Klein K, Carson BB, Katz S, Ekas LA, Evans S, et al. An RNAi-based chemical genetic screen identifies three small-molecule inhibitors of the Wnt/wingless signaling pathway. Proc Natl Acad Sci U S A. 2011 Apr 12;108(15):5954-63.

[32] Seyhan AA, Varadarajan U, Choe S, Liu Y, McGraw J, Woods M, et al. A genome-wide RNAi screen identifies novel targets of neratinib sensitivity leading to neratinib and paclitaxel combination drug treatments. Mol Biosyst. 2011 Jun;7(6):1974-89.

[33] Zhu YX, Tiedemann R, Shi CX, Yin H, Schmidt JE, Bruins LA, et al. RNAi screen of the druggable genome identifies modulators of proteasome inhibitor sensitivity in myeloma including CDK5. Blood. 2011 Apr 7;117(14):3847-57.

[34] Kulkarni MM, Booker M, Silver SJ, Friedman A, Hong P, Perrimon N, et al. Evidence of off-target effects associated with long dsRNAs in Drosophila melanogaster cell-based assays. Nat Methods. 2006 Oct;3(10):833-8.

[35] Booker M, Samsonova AA, Kwon Y, Flockhart I, Mohr SE, Perrimon N. False negative rates in Drosophila cell-based RNAi screens: a case study. BMC Genomics. 2011;12:50.

[36] Bakal C, Linding R, Llense F, Heffern E, Martin-Blanco E, Pawson T, et al. Phosphorylation networks regulating JNK activity in diverse genetic backgrounds. Science. 2008 Oct 17;322(5900):453-6.

[37] Tu Z, Argmann C, Wong KK, Mitnaul LJ, Edwards S, Sach IC, et al. Integrating siRNA and protein-protein interaction data to identify an expanded insulin signaling network. Genome Res. 2009 Jun;19(6):1057-67.

[38] Berns K, Hijmans EM, Mullenders J, Brummelkamp TR, Velds A, Heimerikx M, et al. A large-scale RNAi screen in human cells identifies new components of the p53 pathway. Nature. 2004 Mar 25;428(6981):431-7.

[39] Boehm JS, Zhao JJ, Yao J, Kim SY, Firestein R, Dunn IF, et al. Integrative genomic approaches identify IKBKE as a breast cancer oncogene. Cell. 2007 Jun 15;129(6):1065-79.

[40] Sessions OM, Barrows NJ, Souza-Neto JA, Robinson TJ, Hershey CL, Rodgers MA, et al. Discovery of insect and human dengue virus host factors. Nature. 2009 Apr 23;458(7241):1047-50.

[41] Konig R, Stertz S, Zhou Y, Inoue A, Hoffmann HH, Bhattacharyya S, et al. Human host factors required for influenza virus replication. Nature. 2010 Feb 11;463(7282):813-7.

[42] Meacham CE, Ho EE, Dubrovsky E, Gertler FB, Hemann MT. In vivo RNAi screening identifies regulators of actin dynamics as key determinants of lymphoma progression. Nat Genet. 2009 Oct;41(10):1133-7.

[43] Bric A, Miething C, Bialucha CU, Scuoppo C, Zender L, Krasnitz A, et al. Functional identification of tumor-suppressor genes through an in vivo RNA interference screen in a mouse lymphoma model. Cancer Cell. 2009 Oct 6;16(4):324-35.

[44] Fraser AG, Kamath RS, Zipperlen P, Martinez-Campos M, Sohrmann M, Ahringer J. Functional genomic analysis of C. elegans chromosome I by systematic RNA interference. Nature. 2000 Nov 16;408(6810):325-30.

[45] Dillin A, Hsu AL, Arantes-Oliveira N, Lehrer-Graiwer J, Hsin H, Fraser AG, et al. Rates of behavior and aging specified by mitochondrial function during development. Science. 2002 Dec 20;298(5602):2398-401.

[46] Lee SS, Lee RY, Fraser AG, Kamath RS, Ahringer J, Ruvkun G. A systematic RNAi screen identifies a critical role for mitochondria in C. elegans longevity. Nat Genet. 2003 Jan;33(1):40-8.

[47] Wang MC, O'Rourke EJ, Ruvkun G. Fat metabolism links germline stem cells and longevity in C. elegans. Science. 2008 Nov 7;322(5903):957-60.

[48] Parry DH, Xu J, Ruvkun G. A whole-genome RNAi Screen for C. elegans miRNA pathway genes. Curr Biol. 2007 Dec 4;17(23):2013-22.

[49] Maeda I, Kohara Y, Yamamoto M, Sugimoto A. Large-scale analysis of gene function in Caenorhabditis elegans by high-throughput RNAi. Curr Biol. 2001 Feb 6;11(3):171-6.

[50] Ni JQ, Markstein M, Binari R, Pfeiffer B, Liu LP, Villalta C, et al. Vector and parameters for targeted transgenic RNA interference in Drosophila melanogaster. Nat Methods. 2008 Jan;5(1):49-51.

[51] Saj A, Arziman Z, Stempfle D, van Belle W, Sauder U, Horn T, et al. A combined ex vivo and in vivo RNAi screen for notch regulators in Drosophila reveals an extensive notch interaction network. Dev Cell. 2010 May 18;18(5):862-76.

[52] Cronin SJ, Nehme NT, Limmer S, Liegeois S, Pospisilik JA, Schramek D, et al. Genome-wide RNAi screen identifies genes involved in intestinal pathogenic bacterial infection. Science. 2009 Jul 17;325(5938):340-3.

[53] Neely GG, Kuba K, Cammarato A, Isobe K, Amann S, Zhang L, et al. A global in vivo Drosophila RNAi screen identifies NOT3 as a conserved regulator of heart function. Cell. 2010 Apr 2;141(1):142-53.

[54] Neely GG, Hess A, Costigan M, Keene AC, Goulas S, Langeslag M, et al. A genome-wide Drosophila screen for heat nociception identifies alpha2delta3 as an evolutionarily conserved pain gene. Cell. 2010 Nov 12;143(4):628-38.

[55] Neumuller RA, Richter C, Fischer A, Novatchkova M, Neumuller KG, Knoblich JA. Genome-wide analysis of self-renewal in Drosophila neural stem cells by transgenic RNAi. Cell Stem Cell. 2011 May 6;8(5):580-93.

[56] Fernandes C, Rao Y. Genome-wide screen for modifiers of Parkinson's disease genes in Drosophila. Mol Brain. 2011;4:17.

[57] Lee SS. Whole genome RNAi screens for increased longevity: important new insights but not the whole story. Exp Gerontol. 2006 Oct;41(10):968-73.

[58] Kenyon C, Chang J, Gensch E, Rudner A, Tabtiang R. A C. elegans mutant that lives twice as long as wild type. Nature. 1993 Dec 2;366(6454):461-4.

[59] Kaeberlein M, Powers RW, 3rd, Steffen KK, Westman EA, Hu D, Dang N, et al. Regulation of yeast replicative life span by TOR and Sch9 in response to nutrients. Science. 2005 Nov 18;310(5751):1193-6.

[60] Apfeld J, O'Connor G, McDonagh T, DiStefano PS, Curtis R. The AMP-activated protein kinase AAK-2 links energy levels and insulin-like signals to lifespan in C. elegans. Genes Dev. 2004 Dec 15;18(24):3004-9.

[61] Paik D, Jang YG, Lee YE, Lee YN, Yamamoto R, Gee HY, et al. Misexpression screen delineates novel genes controlling Drosophila lifespan. Mech Ageing Dev. 2012 Feb 24.

[62] Whitehurst AW, Bodemann BO, Cardenas J, Ferguson D, Girard L, Peyton M, et al. Synthetic lethal screen identification of chemosensitizer loci in cancer cells. Nature. 2007 Apr 12;446(7137):815-9.

[63] Tomoyasu Y, Denell RE. Larval RNAi in Tribolium (Coleoptera) for analyzing adult development. Dev Genes Evol. 2004 Nov;214(11):575-8.

[64] Sanchez-Vargas I, Travanty EA, Keene KM, Franz AW, Beaty BJ, Blair CD, et al. RNA interference, arthropod-borne viruses, and mosquitoes. Virus Res. 2004 Jun 1;102(1):65-74.

[65] Nunes FM, Simoes ZL. A non-invasive method for silencing gene transcription in honeybees maintained under natural conditions. Insect Biochem Mol Biol. 2009 Feb;39(2):157-60.

[66] Shakesby AJ, Wallace IS, Isaacs HV, Pritchard J, Roberts DM, Douglas AE. A water-specific aquaporin involved in aphid osmoregulation. Insect Biochem Mol Biol. 2009 Jan;39(1):1-10.

[67] Zhou X, Wheeler MM, Oi FM, Scharf ME. RNA interference in the termite Reticulitermes flavipes through ingestion of double-stranded RNA. Insect Biochem Mol Biol. 2008 Aug;38(8):805-15.

[68] Baum JA, Bogaert T, Clinton W, Heck GR, Feldmann P, Ilagan O, et al. Control of coleopteran insect pests through RNA interference. Nat Biotechnol. 2007 Nov;25(11):1322-6.

[69] Mao YB, Cai WJ, Wang JW, Hong GJ, Tao XY, Wang LJ, et al. Silencing a cotton bollworm P450 monooxygenase gene by plant-mediated RNAi impairs larval tolerance of gossypol. Nat Biotechnol. 2007 Nov;25(11):1307-13.

[70] Richards S, Gibbs RA, Weinstock GM, Brown SJ, Denell R, Beeman RW, et al. The genome of the model beetle and pest Tribolium castaneum. Nature. 2008 Apr 24;452(7190):949-55.

[71] Bai H, Zhu F, Shah K, Palli SR. Large-scale RNAi screen of G protein-coupled receptors involved in larval growth, molting and metamorphosis in the red flour beetle. BMC Genomics. 2011;12:388.

[72] Bai H, Palli SR. Functional characterization of bursicon receptor and genome-wide analysis for identification of genes affected by bursicon receptor RNAi. Dev Biol. 2010 Aug 1;344(1):248-58.

[73] Price DR, Gatehouse JA. RNAi-mediated crop protection against insects. Trends Biotechnol. 2008 Jul;26(7):393-400.

**Chapter 6**

## **How RNA Interference Combat Viruses in Plants**

Bushra Tabassum, Idrees Ahmad Nasir, Usman Aslam and Tayyab Husnain

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772//51870

© 2012 Husnain et al., licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### **1. Introduction**

RNA-mediated silencing technology has become the tool of choice for inducing virus resistance in plants. A significant feature of this technology is the presence of double-stranded RNA (dsRNA), which is not only a product of RNA silencing but also a potent trigger of RNA interference (RNAi). Upon RNAi induction, these dsRNAs are diced into short RNA fragments termed small interfering RNAs (siRNAs), which are the hallmark of RNAi. Considerable resistance against viruses can be created in transgenic plants by exploiting RNAi. In this chapter, the generation of potato virus Y (*PVY*)-resistant potato and sugarcane mosaic virus (*SCMV*)-resistant sugarcane by CEMB is presented as an example.

We are at the dawn of a new age in functional genomics driven by RNAi methods. RNA interference (RNAi) refers to a post-transcriptional process triggered by the introduction of double-stranded RNA (dsRNA), which leads to gene silencing in a sequence-specific manner. It is one of the most exciting discoveries of the past decade in functional genomics. It is rapidly becoming an important method for analyzing gene function in eukaryotes, holds promise for the development of therapeutic gene silencing, and is currently the most widely used gene-silencing technique in functional genomics.
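The sequence-specific nature of this silencing can be illustrated with a toy model. The sketch below cuts the sense strand of a dsRNA trigger into fixed-length fragments, derives the complementary guide strands, and treats a transcript as targeted only if it contains a perfectly complementary site. Everything here is an illustrative assumption rather than a description of the biological machinery: the 21-nucleotide fragment length, the perfect-match rule, the example sequences, and the function names (`dice`, `guide_strands`, `is_targeted`) are all hypothetical.

```python
SIRNA_LENGTH = 21  # assumed fragment length, chosen only for illustration

# Watson-Crick pairing for RNA bases.
_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence (A/U/G/C only)."""
    return "".join(_PAIRS[base] for base in reversed(rna))


def dice(sense_strand, length=SIRNA_LENGTH):
    """Cut a strand into consecutive fragments of `length` nucleotides.

    A trailing piece shorter than `length` is discarded.
    """
    last_start = len(sense_strand) - length
    return [sense_strand[i:i + length] for i in range(0, last_start + 1, length)]


def guide_strands(sense_strand, length=SIRNA_LENGTH):
    """Return the antisense (guide) strand for each diced fragment."""
    return [reverse_complement(frag) for frag in dice(sense_strand, length)]


def is_targeted(mrna, guides):
    """True if the mRNA contains a site perfectly complementary to any guide."""
    return any(reverse_complement(guide) in mrna for guide in guides)


if __name__ == "__main__":
    # Made-up trigger sequence standing in for a viral dsRNA sense strand.
    trigger = "AUGGCUAGCUUAGCGAUCGAU" + "CGGAUCCGAUAGCUAGGCUAA"
    guides = guide_strands(trigger)

    viral_mrna = "GG" + trigger + "AAA"  # carries the trigger sequence
    host_mrna = "GGCCAAUU" * 4           # unrelated made-up transcript

    print("viral transcript targeted:", is_targeted(viral_mrna, guides))
    print("host transcript targeted:", is_targeted(host_mrna, guides))
```

In this simplified picture, only the transcript that shares sequence with the trigger is matched, which is the property that makes dsRNA derived from a viral gene a candidate for selective silencing of that virus.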

#### **2. Need for resistance**

The agriculture sector of any country strengthens the economy by contributing to its gross domestic product (GDP). In Pakistan, the major agricultural crops include cotton, wheat, rice, sugarcane, potato and tomato. All of these crops have great yield potential and contribute 24% to the GDP of Pakistan [1]. There is a major gap between the yield potential of each crop and its harvested yield; possible reasons include disease attack, environmental damage and, in some cases, lack of quality seed. Disease attack, the most common cause, includes infection by viruses and bacteria and attack by insects. Viruses can have the most devastating effects because of their systemic infections: they decrease crop productivity in the primary infection and reduce seed quality for subsequent use through persistent infection. Viral epidemics are often associated with the emergence of a new form of a viral strain or a new form of vector.

How RNA Interference Combat Viruses in Plants 115

Tomato (*Lycopersicum esculentum*) ranks among the most widely grown vegetables all over the world. In general, viral diseases are not a routine problem in most tomato plantings but incidence of some viruses including tomato spotted wilt virus, tomato leaf rolling, tobacco mosaic virus and single and double virus streak viruses has devastating impact on crop yield with losses of upto 100% have also been reported [8]. Tobacco mosaic virus is one of the most stable viruses known because it is able to survive in dried plant debris as long as

Various control measures have been taken to overcome losses caused by plant viruses which are expensive and also inadequate. Biotechnologists have developed and adopted several strategies for virus resistance in crop plants. These include cross protection, pathogen derived resistance and more recently RNA interference. Therefore, the development of virus resistant varieties seems the only economically feasible way to control viruses [9]. Today, use of resistant varieties has been advocated as the most promising and least expensive method of viral disease suppression. With the appealing results of RNAi in silencing target genes of attacking virus, RNAi seems to have potential for creating virus resistant crops.

A plant is said to be resistant if it has the ability to suppress viral disease symptoms by inhibiting its replication or by blocking the virus expression. Resistance mechanism in plants may be either protein mediated or RNA mediated, however final outcome of both is reduced accumulation of virus in the host plant. Acquired resistance could be either high almost reaching immunity with no disease symptoms or moderate to low where mild symptoms of particular viral disease can be seen. In contrast, when a viral infected plant shows normal growth rate with a good yield along with milder symptoms of disease, it is said to be a tolerant plants. In this case, the host plant supports multiplication of virus rather

When a plant encounters virus, it reacts naturally through hypersensitive response (HR) and extreme resistance response (ER) which induces the production of secondary metabolites termed as response elements in plants. These response elements include elevated levels of ethylene, jasmonic acid, salicylic acid, nitric oxide and increased rate of ion flux, in combination these factors block the virus entry and /or helps eliminate the virus (figure 1).

The acquired virus resistance mechanisms in plants are of two types: a) gene silencing independent virus resistance and b) gene silencing dependable virus resistance via Post Transcriptional Gene Silencing (PTGS). The first includes coat protein-mediated, movement protein-mediated and replicase protein-mediated resistance, while second includes pathogen-derived resistance, antisense RNA mediated resistance and RNA-mediated resistance. PTGS is an evolutionary conserved mechanism in plants against potential harms by viruses and transposons. In this process, a plant defends itself by exploiting the requirement of plant RNA viruses to replicate using a double-stranded, replicative intermediate (dsRNA). The double-stranded RNA produced is cleaved into approximately

100 years.

than blocking its replication [10,11].

**3. History of virus resistance in transgenic plants** 

Viruses are a major threat to agriculture all over the world. Up till now, more than 1200 plant viruses have been reported which include 250 of those viruses that cause signicant losses in crop yield [2]. In nature, viral particles exist as obligate parasites which consist of hereditary material packed in a thick layered coat and completely depend on host cell throughout their life cycle. Viruses utilizes host resources like nucleic acid, amino acids and certain proteins for their replication and survival, thus disturbing host plant metabolism to a considerable extent. Most of the infecting plant viruses are ssRNA viruses like sugarcane mosaic virus, potato virus Y etc. In an infected plant, virus accumulation goes higher with increased progeny rate through its replication. The spread of the virus in a plant is achieved through its movement from infected cell to healthy one via plasmodesmata while long distance movement occurs through phloem. Entry of virus in plant usually occurs through physical injury like wound etc or via certain viral vectors like aphid, fly etc.

Cotton (Gossypium hirsutum) as being commonly known as 'white gold' is an important cash crop in many developing countries including Pakistan. It is a natural fibre and has many uses in industries. It accounts for 8.2 percent of the value added in agriculture and about 2 percent to GDP of Pakistan. The yield of the crop is severely affected by the viruses including geminiviruses (leaf crumple and leaf curl) and tobacco streak virus etc. These viruses can cause severe losses when infections occur on young plants; some infect cotton yield while others affect lint quality as well [3].

Potato (*Solanum tuberosum*) is the world's major food crop and is one of the leading vegetables. Viruses are a serious problem, not only because of effects caused by primary infection, but also because the crop is vegetatively propagated and the viruses are transmitted through the tubers to subsequent generations. Potato virus Y (PVY) is probably the most damaging and widespread virus of potato and is found wherever potato crops are grown, where losses are reported upto 10 to 90% [4]. PVY is transmitted through aphids.

Sugarcane (*Saccharum* spp. hybrid) is among the top 10 food crops of the world, and yearly provides 60% to 70% of the sugar produced around the world [5]. Yield harvested by the farmers of Pakistan is very low whose main cause is mosaic disease of sugarcane which continues to be a potential threat to the sugarcane production. It is a very common disease in all the major sugarcane growing regions, because of the perpetuation of the disease virus through vegetative propagules. Sugarcane mosaic virus (*SCMV*) is reported to infect sugarcane naturally and can cause severe losses to the farmers and lesser production to the industry [6,7]. Aphids are the vector for transmission of the disease. Seed produced by infected cane can also transmit the disease.

Tomato (*Lycopersicum esculentum*) ranks among the most widely grown vegetables all over the world. In general, viral diseases are not a routine problem in most tomato plantings but incidence of some viruses including tomato spotted wilt virus, tomato leaf rolling, tobacco mosaic virus and single and double virus streak viruses has devastating impact on crop yield with losses of upto 100% have also been reported [8]. Tobacco mosaic virus is one of the most stable viruses known because it is able to survive in dried plant debris as long as 100 years.

Various control measures have been taken to overcome losses caused by plant viruses which are expensive and also inadequate. Biotechnologists have developed and adopted several strategies for virus resistance in crop plants. These include cross protection, pathogen derived resistance and more recently RNA interference. Therefore, the development of virus resistant varieties seems the only economically feasible way to control viruses [9]. Today, use of resistant varieties has been advocated as the most promising and least expensive method of viral disease suppression. With the appealing results of RNAi in silencing target genes of attacking virus, RNAi seems to have potential for creating virus resistant crops.

A plant is said to be resistant if it has the ability to suppress viral disease symptoms by inhibiting its replication or by blocking the virus expression. Resistance mechanism in plants may be either protein mediated or RNA mediated, however final outcome of both is reduced accumulation of virus in the host plant. Acquired resistance could be either high almost reaching immunity with no disease symptoms or moderate to low where mild symptoms of particular viral disease can be seen. In contrast, when a viral infected plant shows normal growth rate with a good yield along with milder symptoms of disease, it is said to be a tolerant plants. In this case, the host plant supports multiplication of virus rather than blocking its replication [10,11].

#### **3. History of virus resistance in transgenic plants**

pathogens like viruses, bacteria and insects. Viruses can cause the most devastating effects owing to their systemic infections: they decrease crop productivity through primary infection and reduce seed quality for subsequent use through persistent infection. Viral epidemics are often associated with the emergence of a new form of the viral strain or a new form of vector.

Viruses are a major threat to agriculture all over the world. To date, more than 1200 plant viruses have been reported, of which 250 cause significant losses in crop yield [2]. In nature, viral particles exist as obligate parasites that consist of hereditary material packed in a thick layered coat and depend completely on the host cell throughout their life cycle. Viruses utilize host resources such as nucleic acids, amino acids and certain proteins for their replication and survival, thus disturbing host plant metabolism to a considerable extent. Most plant-infecting viruses are ssRNA viruses, such as sugarcane mosaic virus and potato virus Y. In an infected plant, virus accumulation rises as replication produces more progeny. The virus spreads within a plant by moving from an infected cell to a healthy one via plasmodesmata, while long distance movement occurs through the phloem. Entry of a virus into a plant usually occurs through physical injury such as wounds or via certain vectors such as aphids and flies.

Cotton (*Gossypium hirsutum*), commonly known as 'white gold', is an important cash crop in many developing countries including Pakistan. It is a natural fibre with many industrial uses. It accounts for 8.2 percent of the value added in agriculture and about 2 percent of the GDP of Pakistan. The yield of the crop is severely affected by viruses, including geminiviruses (leaf crumple and leaf curl) and tobacco streak virus. These viruses can cause severe losses when infections occur on young plants; some affect cotton yield while others affect lint quality as well [3].

Potato (*Solanum tuberosum*) is one of the world's major food crops and one of the leading vegetables. Viruses are a serious problem, not only because of the effects of primary infection, but also because the crop is vegetatively propagated and the viruses are transmitted through the tubers to subsequent generations. Potato virus Y (PVY) is probably the most damaging and widespread virus of potato and is found wherever potato crops are grown, with reported losses of 10 to 90% [4]. PVY is transmitted through aphids.

Sugarcane (*Saccharum* spp. hybrid) is among the top 10 food crops of the world and yearly provides 60% to 70% of the sugar produced around the world [5]. The yield harvested by farmers in Pakistan is very low, mainly because of mosaic disease of sugarcane, which continues to be a potential threat to sugarcane production. It is a very common disease in all the major sugarcane growing regions because the virus is perpetuated through vegetative propagules. Sugarcane mosaic virus (*SCMV*) is reported to infect sugarcane naturally and can cause severe losses to farmers and lower production for the industry [6,7]. Aphids are the vector for transmission of the disease. Seed produced by infected cane can also transmit the disease.

When a plant encounters a virus, it reacts naturally through the hypersensitive response (HR) and the extreme resistance response (ER), which induce the production of secondary metabolites termed response elements. These response elements include elevated levels of ethylene, jasmonic acid, salicylic acid and nitric oxide, and an increased rate of ion flux; in combination, these factors block virus entry and/or help eliminate the virus (figure 1).

The acquired virus resistance mechanisms in plants are of two types: a) gene silencing independent virus resistance and b) gene silencing dependent virus resistance via Post Transcriptional Gene Silencing (PTGS). The first includes coat protein-mediated, movement protein-mediated and replicase protein-mediated resistance, while the second includes pathogen-derived resistance, antisense RNA-mediated resistance and RNA-mediated resistance. PTGS is an evolutionarily conserved mechanism in plants against potential harm by viruses and transposons. In this process, a plant defends itself by exploiting the requirement of plant RNA viruses to replicate through a double-stranded replicative intermediate (dsRNA). The double-stranded RNA produced is cleaved into fragments of approximately 21 nucleotides by the Dicer enzyme [12]. Evidence suggests that transgene loci and RNA viruses can generate double-stranded RNAs that are similar in sequence to the transcribed region of target genes; these undergo endonucleolytic cleavage to generate small interfering RNAs (siRNAs) that promote degradation of cognate RNAs.


**Figure 1.** Natural response of plants against viral attack, where production of secondary metabolites causes extreme or moderate resistance.

The first approach taken by plant agronomists was the inoculation of a susceptible plant with a milder strain of the target virus. This technique was named **cross protection** and was employed on crops like tomato, papaya and citrus [13-15]. Considerable resistance was achieved in the inoculated plants with this approach, but the success came with a major drawback: the milder strain of the virus that protects one crop may cause serious disease on varieties growing nearby.

To compensate for the drawback of cross protection, **pathogen derived resistance** (PDR) based strategies were employed. These are based on the insertion of resistance-conferring genes derived from the pathogen (virus) into the host plant. Resistance was achieved by expressing viral genes in plants, including the coat protein, movement protein and replicase protein genes, each of which targets a step crucial to virus replication. The coat protein gene is responsible for viral uncoating and is involved in virus replication [16], the movement protein is crucial for cell to cell movement of the infecting virus [17], and the Rep protein is involved in virus replication and genome integrity [18,19]. Resistance was due either to protein accumulation (coat protein mediated resistance, movement protein mediated resistance and replicase protein mediated resistance) or to accumulation of small RNA sequences (replicase mediated resistance).

To date, scientists have made considerable successful attempts to generate virus resistant transgenic plants by employing the PDR concept [20,21]. For example, virus-resistant potato varieties carrying *PVY* coat protein (CP) or P1 gene sequences have been reported in numerous studies [22-27]. Biotechnologists have employed various genes of *PVY* and have met with mixed success in engineering *PVY* resistant transgenic potato plants [22,24,25,28-32]. In another study, the presence of the movement protein (pr17 protein) was reported to create resistance in transgenic plants against the luteovirus Potato leaf roll virus [33].

Although pathogen derived resistance strategies hold promise for up to 90% resistance against the target virus and are still being employed, some notable potential threats are associated with the use of this technology. The major one is that expression of a virus-derived gene fragment in a transgenic plant confers resistance to the particular virus but also raises environmental safety concerns regarding the constitutive expression of viral genes. It is supposed that an infecting virus can interact with the expression product in transgenic plants, which could modify the biological properties of the existing virus and ultimately lead to the creation of new virus species with novel pathogenic properties, host range and altered transmission specificity. In the initial experiments, virus resistance was based on protein expression, but this resistance was neither as stable nor as effective as the resistance achieved through RNAi.

Among pathogen derived resistance strategies, **antisense RNA** complementary to part of the viral genome has potential utility for protecting plants from systemic virus infection [34]. Antisense RNAs are small untranslatable RNA molecules that pair with a target RNA sequence on the basis of homology and thereby exert a negative control on the interaction of the target RNA with other nucleic acids or protein factors. Further, RNase H causes an increase in the rate of degradation of double stranded RNA [35]. Antisense RNA technology was quickly adopted by plant researchers because other reverse genetics approaches, such as homologous recombination and gene-tagging mutagenesis, were either not applicable in plants or not well developed. This background made antisense RNA-mediated suppression a more powerful tool for transgenic research and also for the development of commercial products [36].

**Figure 2.** Major milestones in virus resistance strategies drawn to scale, starting from cross protection to RNA-mediated gene suppression.

#### **4. RNA silencing**


The development of the concept of pathogen derived resistance gave rise to strategies ranging from coat protein based interference with virus propagation to RNA mediated virus gene silencing. Virus resistance is usually achieved through the antiviral pathways of RNA silencing, a natural defense mechanism of plants against viruses. The experimental approach consists of isolating a segment of the viral genome and transferring it into the genome of a susceptible plant. Integrating a viral gene fragment into a host genome does not cause disease (the entire viral genome is needed to cause disease). Instead, it activates the plant's natural antiviral mechanism, which acts against a virus by degrading its genetic material in a nucleotide sequence specific manner via a cascade of events involving numerous proteins, including ribonucleases (enzymes that cleave RNA). This targeted degradation of the genome of an invading virus protects plants from virus infection.


#### **a. Transcriptional Gene Silencing (TGS)**

In plants, silencing is of two types: transcriptional and post transcriptional gene silencing. In both types, the inactivated genes act in trans, as the interacting homologous genes reside on opposite chromosomes. TGS and PTGS differ in their underlying mechanisms. TGS requires sequence homology between promoters, whereas PTGS requires homology between the coding regions of the interacting genes. In TGS, an inactive allele residing on one chromosome can render another allele silenced. DNA-DNA interaction has been suggested as the mechanism behind transcriptional gene silencing [37,38]. Other studies proposed that RNA molecules interact with DNA and subsequently induce DNA methylation, which then leads to gene silencing [39-42]; however, it is not clear whether DNA methylation alone is sufficient for silencing. It is proposed that DNA methylation in the promoter region has a strong negative effect on the interaction of certain transcription factors with the promoter. A possible mechanism of TGS is depicted in figure 3.

#### **b. Post Transcriptional Gene Silencing (PTGS)**

'RNA interference' is a conserved mechanism of post transcriptional gene silencing (PTGS). It has rapidly gained favor as a "reverse genetics" tool to knock down the expression of targeted genes in plants. The term RNAi was coined in 1998 by Fire and Mello to describe a gene-silencing phenomenon based on double-stranded RNA [43]. The PTGS mechanism controls processes including development, the maintenance of genome stability and defense against molecular parasites (transposons and viruses). Several reports have pointed out that PTGS in plants is strictly linked to the RNA virus resistance mechanism [44-46].

#### **5. Mechanism of RNAi/PTGS**

RNAi (RNA interference) is a natural defense pathway that evolved in plants against viruses and potential transposons. It is a cellular pathway in which target sequences are degraded at the mRNA level by small RNAs on the basis of homology, thereby preventing the translation of target RNAs. In plants, two functionally different RNAs, microRNA (miRNA) and small interfering RNA (siRNA), have been characterized. The miRNAs are small dsRNAs, 21-26 nt long, that are genome encoded and endogenous to every cell. Structurally, they comprise a stem region, which is double stranded, and a loop region, which is single stranded. The miRNAs are generated from endogenous hpRNA precursors and are mainly involved in the regulation of development [47]. siRNAs, on the other hand, are generated from long dsRNA and are involved in defense through RNA interference [48,49].


**Figure 3.** Mechanism of transcriptional gene silencing, active in chromatin modification.

RNAi is an immune system in plants that is directed against viruses [50]. Upon viral attack, long dsRNAs are produced from the replication intermediates of viral RNAs and act as substrates for an endonuclease termed Dicer, which is located in the cytosol [51]. Dicer recognizes these dsRNAs and cleaves them into duplex siRNAs (21-25 nt) [52]. The siRNA duplex comprises two strands: the strand complementary to the target mRNA is the guide strand and the other is the passenger strand. The guide strand of the siRNA duplex is incorporated into the RNA-induced silencing complex (RISC), and the siRNA-programmed RISC then degrades viral RNA. When RISC encounters a foreign mRNA, which could be of viral origin, there are two possible outcomes: 1) if the guide strand and the target mRNA are 100% complementary, perfect base pairing forms between them, resulting in mRNA cleavage and subsequent degradation; or 2) in the case of imperfect complementarity, where a few mismatches exist between the guide strand and the target mRNA, translation of the target mRNA is inhibited (figure 4).

The same mechanism operates in microRNA-triggered gene silencing. miRNAs are processed from stem-loop precursors (shRNA and/or hpRNA), which requires Dicer activity [53], followed by RISC assembly and subsequent degradation of homologous RNA in a sequence specific manner.
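The dicing step and the two RISC outcomes can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not a description of the actual enzymology: `dice` simply cuts the target into consecutive 21-nt windows (real Dicer products range over 21-25 nt and carry overhangs), the `max_mismatches` cutoff separating translational inhibition from no silencing is an arbitrary value chosen for illustration, and all function names and the example sequence are hypothetical.

```python
# Toy model of siRNA production and RISC target recognition.
# All names, the example sequence and the mismatch cutoff are illustrative assumptions.

RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (5'->3')."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna))


def dice(target_rna: str, size: int = 21) -> list[str]:
    """Cut the target into consecutive `size`-nt windows and return one guide
    strand (the strand complementary to the target) per window."""
    return [
        reverse_complement(target_rna[i:i + size])
        for i in range(0, len(target_rna) - size + 1, size)
    ]


def risc_outcome(guide: str, target_site: str, max_mismatches: int = 3) -> str:
    """Classify the fate of a target site bound by a guide-loaded RISC.

    Perfect complementarity -> cleavage; a few mismatches -> translational
    inhibition. `max_mismatches` is an assumed cutoff, not a measured value.
    """
    perfect_site = reverse_complement(guide)
    mismatches = sum(a != b for a, b in zip(perfect_site, target_site))
    if mismatches == 0:
        return "cleavage"
    if mismatches <= max_mismatches:
        return "translational inhibition"
    return "no silencing"


if __name__ == "__main__":
    # Hypothetical 42-nt stretch of viral RNA (two 21-nt windows).
    viral = "AUGGCUAGCUAGGAUCCGAUU" + "ACGGCAUCGAUCGGAUUCCGA"
    guides = dice(viral)
    site = viral[:21]
    variant = "C" + site[1:]  # same site with a single mismatch
    print(risc_outcome(guides[0], site))
    print(risc_outcome(guides[0], variant))
```

The point of the sketch is only that one guide strand can produce different outcomes depending on how well the target site matches it.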


**Figure 4.** Model for RNA silencing, an ordered biochemical pathway that is triggered by dsRNA of viral origin. The source of dsRNA is either synthetic siRNA or pre-microRNA. Genome encoded pri-miRNAs are processed by Drosha (an RNase III enzyme) into pre-miRNAs, which are exported to the cytosol. The dsRNA (siRNA or miRNA) subsequently joins Dicer, Ago and other accessory proteins located in the cytosol, forming RISC (RNA induced Silencing Complex). The degree of complementarity between the RNA silencing molecule and its cognate target determines the fate of the mRNA: blocked translation or mRNA cleavage/degradation.

RISC is a combination of Dicer (an endonuclease), accessory proteins, namely Argonaute (AGO1, 4, 6, 9; catalytic endonucleases) and RNA binding proteins (RBP), and some trans-acting RNA-binding proteins (TRBP) [54,55].

The stability of RNAi induced silencing is based on enzymatic methylation of siRNA. This reaction is catalyzed by the methyltransferase HEN1, which methylates the siRNA at the 3' end, thereby protecting it from oligouridylation and subsequent degradation [56].

#### **6. Systemic spread of RNAi**

When RNAi is induced at one site in an organism, including a plant, a mobile signal is generated that spreads from cell to cell and systemically throughout the organism [43,57-61], making the RNAi response evident in distant tissues of the plant. This silencing signal moves inside the plant either through the intercellular channels called plasmodesmata or through the phloem, as shown in figure 5 [59,60]. The presence of a mobile signal has been proposed to be an integral part of the systemic spread of silencing. The first evidence for a mobile silencing signal came from Agro-infiltration and particle bombardment assays in transgenic tobacco plants [59,60,62]. Subsequently, neither T-DNA nor Agrobacterium was detected in silenced tissues of Agro-infiltrated plants, which suggests that a mobile signal is responsible for the propagation of silencing from one tissue to another [59]; this signal can also cross a graft junction [59,60,62]. Candidates proposed for the mobile silencing signal include siRNAs, aberrant RNAs and dsRNA [63].

**Figure 5.** The mobile silencing signal passes from an infected cell to a healthy cell upon RNAi induction. Candidate silencing signals could be siRNA, aRNA or dsRNA, which travel through plasmodesmata and/or phloem.

In conclusion, the RNAi-mediated transgenic approach pre-programs an existing antiviral defense in plants [21,64-66]. Plant viruses are strong inducers of RNAi as well as its targets. The simplicity and specificity of RNAi have made it a routine tool for the generation of virus resistant crops.

#### **7. Effective RNAi inducers**


In general, gene silencing has proven fruitful with both sense and antisense transgenes in plant cells [67,68]. An RNA molecule that contains a fragment of a sense strand, an antisense strand and a short loop sequence between the fragments, making a tight hairpin turn, is termed short hairpin RNA (shRNA); it can suppress the expression of desired genes via RNA interference [69]. Silencing can be achieved more efficiently by utilizing shRNA cassettes [70-72], which usually include a specific plant promoter and terminator sequences to control the expression of the inversely repeated sequences of the dsRNA. Upon delivery of the shRNA cassette into plant cells, dsRNA molecules comprising a loop (single-stranded) and a stem region (double-stranded) are formed. The stem region is then used by Dicer as a substrate and triggers the RNAi mechanism [72-74]. RNA silencing mediated by an shRNA cassette enforces stable and heritable gene silencing [67], as the specific promoter ensures that the shRNA is always expressed. Silencing can also be more powerful with an shRNA cassette because the dsRNAs are fed into a later step of the silencing pathway, where they act as a substrate for Dicer (an RNase III-like enzyme), thereby bypassing the step in which dsRNA production requires plant encoded RdRps [75].
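As a rough illustration of the inverted-repeat layout described above, the following Python sketch assembles the DNA insert of a hairpin cassette as sense fragment, loop, then the reverse complement of the sense fragment. It is a minimal sketch under assumptions: the function names, the example target and the 9-nt loop are placeholders introduced here, and the promoter and terminator that flank the insert in a real cassette are omitted.

```python
# Minimal sketch of an shRNA/hpRNA insert: sense arm + loop + antisense arm.
# Function names, the loop and the example target are hypothetical placeholders.

_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return dna.translate(_DNA_COMPLEMENT)[::-1]


def hairpin_insert(sense: str, loop: str = "TTCAAGAGA") -> str:
    """Build the inverted-repeat insert that folds into a stem-loop when transcribed.

    `loop` is a placeholder spacer; any short sequence that does not pair
    with the arms could be substituted.
    """
    sense = sense.upper()
    return sense + loop.upper() + reverse_complement(sense)


def stem_is_self_complementary(insert: str, stem_len: int) -> bool:
    """Check that the last `stem_len` bases can pair with the first `stem_len`."""
    return insert[-stem_len:] == reverse_complement(insert[:stem_len])


if __name__ == "__main__":
    target = "ATGGCTAGCTAGGATCCGATT"  # hypothetical 21-nt target fragment
    insert = hairpin_insert(target)
    print(insert)
    print(stem_is_self_complementary(insert, len(target)))
```

Because the two arms are exact reverse complements, the transcript can fold back on itself to give the double-stranded stem that serves as the Dicer substrate.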

protein gene of target virus was analyzed by real-time PCR. In the case of the *PVY* capsid gene, one specific siRNA out of a total of six was found to be the most effective, knocking down the respective mRNA in transfected CHO cells by up to 80-90%. The data showed that all six siRNAs reduced the mRNA expression of the target gene to some extent, but only siRNA1 significantly reduced CP-*PVY* mRNA expression, by up to 12.25 fold; as shown in figure 6, expression was almost abolished or very faint in cells transfected with siRNA1 compared with the control, where scrambled siRNA was transfected. The remaining siRNA knockdown values were: siRNA2, 7-fold decrease; siRNA3, 8-fold decrease; siRNA4, 10.8-fold decrease; siRNA5, 9-fold decrease; and siRNA6, 10-fold decrease. These values were based on Ct values obtained from real-time PCR studies [78].

**Figure 6.** Relative measure of the knockdown of mRNA expression of CP-*PVY* gene in transient transfection assays. Knockdown values are based on relative Ct values obtained in realtime PCR assay;

Similar findings were met when knockdown in mRNA expression of CP-*SCMV* was studied *in-vitro* through transient transfection assays. As clear from figure 7, siRNA1 reduced the mRNA expression of target gene by upto 96%, while inhibition by siRNA2 was 46%, siRNA3 and siRNA4 inhibited target gene mRNA expression upto 50% and 77%

Subsequently, the screened siRNA for both viruses was used in shRNA cassette which is thought to synthesize target specific siRNAs that continuously guard the plant against respective viral attack. shRNA cassette cloned in pCAMBIA1301 vector and transformed in potato and sugarcane through Agrobacterium- and particle bombardment method respectively. Results were compared with control non-transgenic plants. Figure 8 and 9 depicts the results, clearly indicating that in transgenic potato having shRNA1 cassette integrated in them, mRNA knockdown was upto 96% whereas in transgenic potato plants

GAPDH was used as internal control to normalize the results.

respectively (figure 7).

In practice, specific hairpin RNA (hpRNA) expression constructs have been designed for transformation in the development of virus-resistant transgenic plants. In this strategy, small dsRNAs, which are a hallmark of PTGS, are produced from the transformed construct and ultimately induce silencing. An hpRNA construct has been used to silence a viral gene in potato, giving efficient silencing accompanied by siRNA production [76]. Similarly, a comparison of various constructs for their silencing potential confirmed that the most efficient and strongest silencing in tobacco is achieved by expressing an intron-containing construct, which triggers PTGS [77]. However, in another study in which *PVY*-resistant potato plants were obtained through CP gene expression, evidence for both protein- and RNA-mediated mechanisms was found [27].

Given the appealing outcomes of RNAi in the development of virus-resistant transgenic plants reviewed in this article, and the use of hairpin RNA for strong silencing, the production of transgenic potato resistant to potato virus Y and of sugarcane resistant to sugarcane mosaic virus, developed by [69] at the Centre of Excellence in Molecular Biology (CEMB), University of the Punjab, is presented as an example in the following section.

#### **8. Development of** *PVY* **and** *SCMV* **resistant transgenic plants**

Tabassum *et al*. [79] developed *PVY*-resistant potato and *SCMV*-resistant sugarcane plants through siRNA technology by targeting the capsid protein gene of the respective virus. In that study, each plant was equipped with an shRNA cassette that acts continuously and specifically against the invading virus, resulting in sequence-specific degradation of viral mRNA. The distinguishing feature of this shRNA cassette is that it contained a screened siRNA, the one that was most efficient in *in vitro* experiments among the candidates tested. This 22-nt siRNA was used as the core sequence of the shRNA cassette, while the loop and flanking sequences were taken from a highly active regulatory microRNA of the respective host plant.

Initially, to screen the candidate siRNAs, a strategy based on a transient transfection assay was optimized in a mammalian cell line (CHO). The mRNA knockdown efficiency for the capsid protein gene of the target virus was analyzed by real-time PCR. For the *PVY* capsid gene, one of six siRNAs was found to be the most effective, knocking down the respective mRNA in transfected CHO cells by up to 80-90%. All six siRNAs reduced mRNA expression of the target gene to some extent, but only siRNA1 significantly reduced CP-*PVY* mRNA expression, by up to 12.25-fold; as shown in figure 6, expression was almost abolished or very faint in cells transfected with siRNA1 compared with the control transfected with scrambled siRNA. The knockdown values for the remaining siRNAs were: siRNA2, 7-fold decrease; siRNA3, 8-fold; siRNA4, 10.8-fold; siRNA5, 9-fold; and siRNA6, 10-fold. These values were based on Ct values obtained from real-time PCR studies [78].


**Figure 6.** Relative measure of the knockdown of mRNA expression of CP-*PVY* gene in transient transfection assays. Knockdown values are based on relative Ct values obtained in real-time PCR assay; GAPDH was used as internal control to normalize the results.
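As a worked illustration of how Ct values normalized to an internal control can be turned into knockdown figures, the sketch below uses the common 2^(-ΔΔCt) calculation. That this particular method was used is an assumption; the chapter states only that values were derived from Ct values with GAPDH as the internal control. The Ct numbers in the example call are hypothetical, and the calculation assumes roughly 100% amplification efficiency. The fold decreases in `REPORTED_FOLD_DECREASE` are the ones quoted in the text for CP-*PVY*.

```python
def fold_change(ct_target_treated: float, ct_ref_treated: float,
                ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression (treated vs. control) by the 2^(-ddCt) calculation."""
    delta_treated = ct_target_treated - ct_ref_treated   # normalize to reference gene
    delta_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_treated - delta_control
    return 2.0 ** (-delta_delta_ct)


def fold_decrease_to_percent(fold_decrease: float) -> float:
    """Convert an N-fold decrease into percent reduction of expression."""
    return (1.0 - 1.0 / fold_decrease) * 100.0


# Fold decreases quoted in the text for the six CP-PVY siRNAs.
REPORTED_FOLD_DECREASE = {
    "siRNA1": 12.25, "siRNA2": 7.0, "siRNA3": 8.0,
    "siRNA4": 10.8, "siRNA5": 9.0, "siRNA6": 10.0,
}

for name, fold in REPORTED_FOLD_DECREASE.items():
    print(f"{name}: {fold:g}-fold decrease = "
          f"{fold_decrease_to_percent(fold):.1f}% reduction")

# Hypothetical Ct values: target and GAPDH in siRNA-treated vs. scrambled control.
relative_expression = fold_change(25.0, 18.0, 22.0, 18.0)
print(f"relative expression: {relative_expression:.3f}")
```

Read this way, every reported fold decrease from 7-fold to 12.25-fold corresponds to a reduction of roughly 86% to 92%, which is in line with the 80-90% knockdown stated for the best siRNA.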

Similar results were obtained when knockdown of CP-*SCMV* mRNA expression was studied *in vitro* through transient transfection assays. As shown in figure 7, siRNA1 reduced mRNA expression of the target gene by up to 96%, siRNA2 by 46%, and siRNA3 and siRNA4 by up to 50% and 77%, respectively.

Subsequently, the screened siRNA for each virus was used in an shRNA cassette, which is thought to generate target-specific siRNAs that continuously guard the plant against attack by the respective virus. The shRNA cassette was cloned into the pCAMBIA1301 vector and transformed into potato by the *Agrobacterium*-mediated method and into sugarcane by particle bombardment. Results were compared with non-transgenic control plants. Figures 8 and 9 show the results: in transgenic potato carrying the integrated shRNA1 cassette, mRNA knockdown was up to 96%, whereas in transgenic potato plants carrying the shRNA4 cassette, *PVY* knockdown was up to 57% compared with the control, in which *PVY* infection was maximal.


**Figure 7.** Relative measure of the knockdown of mRNA expression of CP-*SCMV* gene in transient transfection assays.

Similarly, in transgenic sugarcane plants, shRNA1 reduced *SCMV* mRNA expression to a lesser extent, by only 30%, while shRNA4 caused the maximum knockdown of 95% compared with the non-transgenic control sugarcane plant.

**Figure 8.** Percentage inhibition in mRNA expression of *PVY* rendered by integrated shRNA1 and shRNA4 cassette in potato plants. Transgenic plants were subjected to bioassay by *PVY* inoculation and RT-PCR was performed 30 days post *PVY* inoculation.

In conclusion, we have developed transgenic potato and sugarcane plants that were highly resistant to *PVY* and *SCMV* infection, respectively. This resistance is attributed to the integrated shRNA cassette, which targets the capsid protein gene of each virus. These shRNAs are expected to produce long-term targeted gene inhibition in cells and in the whole plant. Our shRNA construct design was based on the hypothesis that expressing a screened, potentially effective siRNA in hairpin form, combined with the most active regulatory microRNA of the respective plant, would make resistance far more effective. Applying this approach, we obtained transgenic potato and sugarcane plants in which resistance to the targeted virus approached immunity.

**Figure 9.** Percentage inhibition in mRNA expression of *SCMV* rendered by integrated shRNA1 and shRNA4 cassette in sugarcane plants. Transgenic plants were subjected to bioassay by *SCMV* infection and RT-PCR was performed 30 days post inoculation.

One important aspect of this strategy for engineering *PVY*-resistant plants is that the integrated shRNA sequence is not of viral origin, nor is it translated into a protein. Moreover, the RNA transcript itself is almost undetectable because it is quickly cleaved into small fragments through the RNAi pathway. These two features limit the environmental risks of this strategy, such as trans-encapsidation or recombination of the transgene with an incoming virus.

#### **Author details**


Bushra Tabassum, Idrees Ahmad Nasir, Usman Aslam and Tayyab Husnain

*National Centre of Excellence in Molecular Biology (CEMB), University of the Punjab, Lahore, Pakistan*

#### **Abbreviations**

RNAi (RNA interference); *PVY* (Potato virus Y); *SCMV* (Sugarcane mosaic virus); PTGS (Post Transcriptional Gene Silencing); siRNA (small interfering RNA); shRNA (short hairpin RNA); ssRNA (single stranded RNA); dsRNA (double stranded RNA); RdRp (RNA dependent RNA polymerase); hpRNA (hairpin RNA).

#### **9. References**

[1] Pakistan economic Survey (2010-2011) Government of Pakistan; Ministry of finance. http://www.finance.gov.pk/survey\_0910.html.

[1] Beachy RN (1997) Mechanisms and applications of pathogen-derived resistance in transgenic plants. Curr. Opin. Biotechnol. 8:215–220.

[2] Briddon RW, Markham PJ (2000) Cotton leaf curl virus disease. Virus Res. 71:151-159.

[3] Novy RG, Nasruddin A, Ragsdale DW, Radcliffe EB (2002) Genetic Resistances to Potato Leafroll Virus, Potato Virus Y, and Green Peach Aphid in Progeny of *Solanum etuberosum.* American Journal of Potato Research 79(1): 9-18.

[4] Anonymous (1997) Sugar and sweetener situation and outlook yearbook. US Department of Agriculture, Washington DC.

[5] Hema M, Joseph J, Gopinath K, Sreenivasulu P, Savithri HS (1999) Molecular characterization and interviral relationships of a flexuous filamentous virus causing mosaic disease of sugarcane (*Saccharum officinarum* L.) in India. Arch. Virol. 144: 479–490.

[6] Koike T, Martin DP, Johnson EM Jr (1989) Role of calcium channels in the ability of membrane depolarization to prevent neuronal death induced by trophic factor deprivation: evidence that levels of internal calcium determine NGF dependence of sympathetic ganglion cells. Proc. Natl. Acad. Sci. 86:6421-25.

[7] Akhtar KP, Ryu KH, Saleem MY, Asghar M, Jamil FF, Haq MA, Khan IA (2008) Occurrence of Cucumber mosaic virus Subgroup IA in tomato in Pakistan. J. Plant Dis. Protect. 115: 2-3.

[8] Solomon-Blackburn RM, Barker H (2001) Breeding virus resistant potatoes (*Solanum tuberosum*): a review of traditional and molecular approaches. Heredity 86:17-35.

[9] Cooper JI, Jones AT (1983) Responses of plants to viruses, proposals for the use of terms. Phytopathology 73:127-128.

[10] Walkey DGA (1991) Applied plant virology. 2nd edition. Chapman and Hall India.

[11] Wesley SV, Helliwell CA, Smith NA, Wang MB, Rouse DT, Liu Q, et al (2001) Construct design for efficient, effective and high throughput gene silencing in plants. The Plant Journal 27(6): 581-590.

[12] Beachy RN, Loesch-Fries S, Tumer NE (1990) Coat protein mediated resistance against virus infection. Annu. Rev. Phytopathol. 28:451-74.

[13] Gadani F, Mansky LM, Medici R, Miller WA, Hill JH (1990) Genetic engineering of plants for virus resistance. Arch. Virol. 115:1-21.

[14] Hull R, Davies JW (1992) Approaches to nonconventional control of plant virus diseases. Crit. Rev. Plant Sci. 11:17-33.

[15] Yusibov V, Loesch-Fries LS (1995) High-affinity RNA-binding domains of alfalfa mosaic virus coat protein are not required for coat protein-mediated resistance. Proc. Natl. Acad. Sci. USA 92:1–5.

[16] Carrington JC, Kasschau KD, Mahajan SK, Schaad MC (1996) Cell-to-cell and long-distance transport of viruses in plants. Plant Cell 8:1669-1681.

[17] Carr JP, Marsh LE, Lomonossoff GP, Sekiya ME, Zaitlin M (1992) Resistance to tobacco mosaic virus induced by the 54-kDa gene sequence requires expression of the 54-kDa protein. Mol. Plant–Microbe Interact. 5: 397–404.

[18] Golemboski DB, Lomonossoff GP, Zaitlin M (1990) Plants transformed with a tobacco mosaic virus nonstructural gene sequence are resistant to the virus. Proc. Natl. Acad. Sci. USA 87:6311–6315.

[19] Lomonossoff GP. Pathogen-derived resistance to plant viruses. Annu. Rev. Phytopathol. 33:323-343.

[20] Baulcombe DC. RNA as a target and an initiator of post-transcriptional gene silencing in transgenic plants. Plant Molecular Biology 32(1-2): 79-88.

[21] Farinelli L, Malnoe P, Collet GF (1992) Heterologous encapsidation of potato virus Y strain O (PVY-O) with the transgene protein of PVY strain N (PVY-N) in *Solanum tuberosum* cv. Bintje. Biotechnology 10:1020–1025.

[22] Kollar A, Thole V, Dalmay T, Salamon P, Balazs E (1993) Efficient pathogen-derived resistance induced by integrated potato virus Y coat protein gene in tobacco. Biochimie 75:623–629.

[23] Malnoe P, Farinelli L, Collet G, Reust W (1994) Small-scale field tests with transgenic potato, cv. Bintje, to test the resistance to primary and secondary infections with potato virus Y. Plant Mol. Biol. 25: 963–975.

[24] Smith HA, Powers H, Swaney S, Brown C, Dougherty WG (1995) Transgenic potato virus Y resistance in potato: evidence for an RNA-mediated cellular response. Phytopathology 85:864–870.

[25] Maki-Valkama T, Pehu T, Santala A, Valkonen JP, Koivu K, Lehto K, Pehu E (2000) High level of resistance to potato virus Y expressing P1 sequence in antisense orientation in transgenic potato. Molecular Breeding 6: 95–104.

[26] Gargouri-Bouzid R, Jaoua L, Mansour RB, Hathat Y, Ayadi M, Ellouz R (2005) PVY resistant transgenic potato plants (cv. Claustar) expressing the viral coat protein. J. Plant Biotechnol. 3:1–5.

[27] Hassairi A, Masmoudi K, Albouy J, Robaglia C, Jullien M, Ellouz R (1998) Transformation of two potato cultivars 'Spunta' and 'Claustar' (*Solanum tuberosum*) with lettuce mosaic virus coat protein gene and heterologous immunity to potato virus Y. Plant Sci. Limerick. 136: 31-42.

[28] Kaniewski W, Lawson G, Sammons B, Haley L, Hart J, Delannay X, Tumer NE (1990) Field resistance of transgenic Russet Burbank potato to effects of infection by potato virus X and potato virus Y. Biotechnology 8: 750–754.

[29] Lawson G, Kaniewski W, Haley L, Rozman R, Newell C, Sanders P, Tumer NE (1990) Engineering resistance to mixed virus infection in a commercial potato cultivar: resistance to potato virus X and potato virus Y in transgenic Russet Burbank. Biotechnology 8: 127–134.


[30] Okamoto D, Nielsen SVS, Albrechtsen M, Borkhardt B (1996) General resistance against potato virus Y introduced into a commercial potato cultivar by genetic transformation with PVYN coat protein gene. Potato Research 39:271-82.

[31] Pehu TM, Maki-Valkama TK, Valkonen JPT, Koivu KT, Lehto KM, Pehu EP (1995) Potato plants transformed with a potato virus Y P1 gene sequence are resistant to PVYo. American Potato Journal 72: 523-532.

[32] Tacke E, Salamini F, Rohde W (1996) Genetic engineering of potato for broad-spectrum protection against virus infection. Nature Biotechnology 19:1597-1601.

[33] Bejarano ER, Lichtenstein CP (1992) Prospects for engineering virus resistance in plants with antisense RNA. Trends Biotechnol. 10:383–388.

[34] Culver JN (1995) Molecular strategies to develop virus-resistant plants. CRC Press, Inc.

[35] Chi-Ham CL, Clark KL, Bennett AB (2010) The intellectual property landscape for gene suppression technologies in plants. Nat. Biotechnol. 28(1):32-36.

[36] Stam M, de Bruin R, Kenter S, van der Hoorn RAL, van Blokland R, Mol JNM, et al (1997) Post-transcriptional silencing of chalcone synthase in petunia by inverted transgene repeats. Plant J. 12:63–82.

[37] Luff B, Pawlowski L, Bender J (1999) An inverted repeat triggers cytosine methylation of identical sequences in Arabidopsis. Mol. Cell. 3:505-511.

[38] Mette MF, van der Winden J, Matzke MA, Matzke AJ (1999) Production of aberrant promoter transcripts contributes to methylation and silencing of unlinked homologous promoters in trans. EMBO J. 18: 241–248.

[39] Pelissier T, Thalmeir S, Kempe D, Sanger HL, Wassenegger M (1999) Heavy de novo methylation at symmetrical and non-symmetrical sites is a hallmark of RNA-directed DNA methylation. Nucleic Acids Res. 27:1625-1634.

[40] Wassenegger M (2000) RNA-directed DNA methylation. Plant Mol. Biol. 43: 203–220.

[41] Jones AL, Thomas CL, Maule AJ (1998) De novo methylation and co-suppression induced by a cytoplasmically replicating plant RNA virus. EMBO J. 17: 6385–6393.

[42] Fire A, Xu S, Montgomery M, Kostas S, Driver S, Mello C (1998) Potent and specific genetic interference by double-stranded RNA in *Caenorhabditis elegans*. Nature 391(6669):806–811.

[43] Kasschau KD, Carrington JC (1998) A counter defensive strategy of plant viruses: suppression of posttranscriptional gene silencing. Cell 95:461-470.

[44] Anandalakshmi R, Pruss GJ, Ge X, Marathe R, Mallory AC, Smith TH, et al (1998) A viral suppressor of gene silencing in plants. Proc. Natl. Acad. Sci. USA 95:13079-84.

[45] Brigneti G, Voinnet O, Li WX, Ji LH, Ding SW, Baulcombe DC (1998) Viral pathogenicity determinants are suppressors of transgene silencing in *Nicotiana benthamiana*. EMBO J. 17:6739–6746.

[46] Bartel DP (2004) MicroRNAs: genomics, biogenesis, mechanism, and function. Cell. 116:281–97.

[47] Lecellier CH, Voinnet O (2004) RNA silencing: no mercy for viruses? Immunol. Rev. 198:285–303.

[48] Vastenhouw NL, Plasterk RH (2004) RNAi protects the *Caenorhabditis elegans* germline against transposition. Trends Genet. 20:314–319.

[49] Baulcombe DC (2004) RNA silencing in plants. Nature. 431(7006):356-63.

[50] Tang G, Reinhart BJ, Bartel DP, Zamore PD (2003) A biochemical framework for RNA silencing in plants. Genes Dev. 17:49–63.

[51] Hamilton A, Voinnet O, Chappell L, Baulcombe D (2002) Two classes of short interfering RNA in RNA silencing. EMBO J. 21:4671–4679.

[52] Tijsterman M, Plasterk RH (2004) Dicers at RISC; the mechanism of RNAi. Cell. 117:1–3.

[53] Gregory RI, Chendrimada TP, Cooch N, Shiekhattar R (2005) Human RISC Couples MicroRNA Biogenesis and Posttranscriptional Gene Silencing. Cell. 123:631–640.

[54] Schwarz DS, Hutvagner G, Du T, Xu Z, Aronin N, Zamore PD (2003) Asymmetry in the assembly of the RNAi enzyme complex. Cell. 115:199–208.

[55] Li J, Yang Z, Yu B, Liu J, Chen X (2005) Methylation protects miRNAs and siRNAs from a 3'-end uridylation activity in Arabidopsis. Curr. Biol. 15:1501-1507.

[56] Fagard M, Vaucheret H (2000) Systemic silencing signal(s). Plant Mol. Biol. 43(2-3):285-93.

[57] Palauqui JC, Elmayan T, Pollien JM, Vaucheret H (1997) Systemic acquired silencing: transgene-specific post-transcriptional silencing is transmitted by grafting from silenced stocks to non-silenced scions. EMBO J. 15: 4738–4745.

[58] Voinnet O, Baulcombe DC (1997) Systemic signalling in gene silencing. Nature. 389(6651):553.

[59] Voinnet O, Vain P, Angell S, Baulcombe DC (1998) Systemic spread of sequence-specific transgene RNA degradation is initiated by localised introduction of ectopic promoterless DNA. Cell. 95:177–187.

[60] Winston WM, Molodowitch C, Hunter CP (2002) Systemic RNAi in *C. elegans* requires the putative transmembrane protein SID-1. Science. 295: 2456–2459.

[61] Palauqui JC, Balzergue S (1999) Activation of systemic acquired silencing by localised introduction of DNA. Curr. Biol. 9: 59–66.

[62] Mlotshwa S, Voinnet O, Mette MF, Matzke M, Vaucheret H, Ding SW, Pruss G, Vance VB (2002) RNA silencing and the mobile silencing signal. Plant Cell 14:289–301.

[63] Hamilton AJ, Baulcombe DC (1999) A species of small antisense RNA in post-transcriptional gene silencing in plants. Science 286:950-952.

[64] Hamilton A, Voinnet O, Chappell L, Baulcombe D (2002) Two classes of short interfering RNA in RNA silencing. EMBO J. 21:4671–4679.

[65] Plasterk RHA (2002) RNA silencing: the genomes immune system. Science 296:1263-1265.

[66] Bruening G (1998) Plant gene silencing regularized. Proc. Nat. Acad. Sci. 95:13349-13351.

[67] Waterhouse PM, Graham MW, Wang MB (1998) Virus resistance and gene silencing in plants can be induced by simultaneous expression of sense and antisense RNA. Proc. Natl. Acad. Sci. 95:13959–13964.

[68] Paddison PJ, Caudy AA, Bernstein E, Hannon GJ, Conklin DS (2002) Short hairpin RNAs (shRNAs) induce sequence-specific silencing in mammalian cells. Genes Dev. 16: 948–58.


[69] Horiguchi G (2004) RNA silencing in plants: a shortcut to functional analysis. Differentiation. 72:65-73.

[70] Watson JM, Fusaro AF, Wang M, Waterhouse PM (2005) RNA silencing platforms in plants. FEBS Lett. 579:5982–5987.

[71] Hirai S, Oka SI, Adachi E, Kodama H (2007) The effects of spacer sequences on silencing efficiency of plant RNAi vectors. Plant Cell Reports. 26(5):651-659.

[72] Meyer S, Nowak K, Sharma VK, Schulzer J, Mendel RR, Hansch R (2004) vector for RNAi technology in Poplar. Plant Biol. (Stuttg) 6:100-103.

[73] Miki D, Shimamoto K (2004) Simple RNAi Vectors for St in Rice. Plant Cell Physiol. 45(4):490–495.

[74] Prins M, Laimer M, Noris E, Schubert J, Wassenegger M, Tepfer M (2008) Strategies for antiviral resistance in transgenic plants. Molecular plant pathology. 9(1):73–83.

[75] Missiou A, Kalantidis K, Boutla A, Tzortzakaki S, Tabler M, Tsagris M (2004) Generation of transgenic potato plants highly resistant to potato virus Y (PVY) through RNA silencing. Molecular Breeding 14: 185–197.

[76] Smith HA, Powers H, Swaney S, Brown C, Dougherty WG (1995) Transgenic potato virus Y resistance in potato: evidence for an RNA-mediated cellular response. Phytopathology 85:864–870.

[77] Tabassum B, Nasir IA, Husnain T (2011) Potato virus Y mRNA expression knockdown mediated by siRNAs in cultured mammalian cell line. Virol Sin. 26(2):105-13.

[78] Tabassum B, Nasir IA, Aslam U, Husnain T (2012) PhD dissertation. CEMB, University of the Punjab Lahore, Pakistan.

**Chapter 7**

## *Medicago truncatula* **Functional Genomics – An Invaluable Resource for Studies on Agriculture Sustainability**

Francesco Panara, Ornella Calderini and Andrea Porceddu

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/51016

© 2012 Panara et al., licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### **1. Introduction**

Legume functional genomics has made considerable progress in the last two decades thanks to improvements in genomics technologies and to the efforts of the research community. Tools for functional genomics studies are now available in *Lotus japonicus*, *Medicago truncatula* and soybean. In this chapter we focus on *M. truncatula* as a model species for forage legumes, on the main achievements obtained with the reported resources, and on future perspectives for the study of gene function in this species.

#### **2. Why do we need a functional genomics tool for forage legumes?**

Legumes are widely grown for grain and forage production, and their worldwide economic importance is second only to that of grasses. Legume species are unique among cultivated plants in their ability to carry out endosymbiotic nitrogen fixation with rhizobial bacteria, a process that takes place in a specialized structure known as the nodule. Legumes also establish other types of interactions, such as arbuscular mycorrhizal symbiosis with several fungi. Because of these biological properties, legumes are considered among the most promising species for improving the sustainability of agricultural systems. For a farming system to remain productive and to be environmentally and economically sustainable in the long term, the reserves of nutrients that are removed or lost from the soil must be replenished. Nodulating legumes have the potential to provide all the nitrogen required for their growth, and in this way they influence the nitrogen balance of the soil and its availability for subsequent crops. In addition, by reducing fertilizer inputs, legumes reduce the risk of nitrogen contamination of water resources. Furthermore, probably owing to the wealth of their interactions with other organisms, legumes have evolved an intricate network of secondary metabolites that, as recent advances in the knowledge of their nutraceutical properties are showing, can be of great importance for livestock welfare and for the quality of livestock products.

© 2012 Panara et al., licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

*Medicago truncatula* Functional Genomics – An Invaluable Resource for Studies on Agriculture Sustainability 133


| Reference | Mutagenesis technique | Background | Notes |
|---|---|---|---|
| (5) | EMS | Jemalong | 1,500 seeds treated for 24 h with 0.2% EMS. 400 M1 plants obtained. 250,000 M2 seeds harvested as a single batch. |
| (6) | γ-ray | Jemalong line J5 | 462 M1 plants, screened as M2 families. |
| (7) | Ethyl-nitrosourea | population 2828 | |
| (8) | EMS | Jemalong A17 | 3-7,000 M1 plants in 10-20 lots. M2 seeds bulked from each lot. |
| (9) | T-DNA | R108-1 (c3) | Test populations with 3 different T-DNAs |
| (10) | Tnt1 | R108-1 (c3) | First test population with Tnt1 (~200 R0 plants) |
| (11) | FNB | Jemalong A17 | 80,000 M1, 460,000 M2. http://bioinfo4.noble.org/mutant/ |
| (12;13) | Tnt1 | R108-1 (c3) | 7,000 Tnt1 mutants, presently extended to 19,000 as reported in http://bioinfo4.noble.org/mutant/ |

**Table 1.** Mutant collections in *Medicago* spp. EMS = ethyl methanesulphonate.

Two legume species, *Medicago truncatula* and *Lotus japonicus*, are used as models to study legume genetics and genomics. *Medicago truncatula* is closely related to alfalfa, the most important forage legume in the world. It has a small, diploid genome, is self-fertile and is amenable to genetic transformation. In the present review we summarize the state of the art of *M. truncatula* genomics, with particular emphasis on the resources available for functional genomics studies, such as mutant collections.

#### **3.** *Medicago truncatula* **genome sequencing**

Functional genomics is greatly aided by knowledge of the genome sequence and transcriptome of the target species. A concerted effort in *M. truncatula* has made genome data available to the community.

Legumes are the plant family with the greatest amount of genomic data available. Three legume species, *Medicago truncatula*, *Lotus japonicus* and *Glycine max*, have been sequenced (1). The assembly of the *Medicago truncatula* genome is close to completion (2).

*M. truncatula* sequencing was initially carried out on Bacterial Artificial Chromosome (BAC) libraries, following a BAC-by-BAC approach focused on gene-rich BACs. To date the available sequence data consist of three main batches: i) 246 Mb of non-redundant sequences that could be organized in large scaffolds separated by gaps and anchored to the eight *M. truncatula* physical chromosomes, ii) 17.3 Mb of unanchored scaffolds and iii) 104.2 Mb of additional unique sequence obtained by next generation sequencing (NGS) with Illumina technology. In total, 367.5 Mb of the *M. truncatula* genome is available, representing 73.5% of the predicted genome size of ~500 Mb and about 94% of the expressed genes.
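The totals quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the batch sizes and the ~500 Mb genome size are taken from the text, while the dictionary keys and the function name `coverage_summary` are illustrative choices and not part of any published pipeline.

```python
# Sequence batches reported in the text, in megabases (Mb).
# The key names are illustrative labels only.
batches_mb = {
    "anchored_scaffolds": 246.0,
    "unanchored_scaffolds": 17.3,
    "illumina_unique": 104.2,
}

# Approximate predicted genome size quoted in the text (Mb).
PREDICTED_GENOME_MB = 500.0


def coverage_summary(batches, genome_mb):
    """Return the total sequenced length and its share of the genome in percent."""
    total = sum(batches.values())
    return total, 100.0 * total / genome_mb


total_mb, percent = coverage_summary(batches_mb, PREDICTED_GENOME_MB)
print(f"{total_mb:.1f} Mb sequenced, {percent:.1f}% of the predicted genome")
```

Because the genome size is itself an estimate, the percentage should be read as approximate.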

Taken together, BAC sequences and non-redundant Illumina assemblies contain 62,388 gene loci, with 14,322 gene predictions annotated as transposons. The average *M. truncatula* gene is 2,211 bp in length, contains 4.0 exons and has a coding sequence of 1,001 bp. Genome analysis and comparisons to other sequenced genomes allowed the identification of a whole genome duplication (WGD) that occurred about 58 million years ago and that has been associated with the evolution of rhizobial nodulation in *M. truncatula* and its relatives. Some nodulation-specific signalling components might have evolved through duplication and neo-functionalization from more ancient genes involved in host-mycorrhizal signalling (2).

Another interesting feature is the presence of many amplified and, to some extent, specialized gene families, such as nine leghaemoglobins, 563 Nodule Cysteine-Rich peptides (NCRs), 764 nucleotide-binding site and leucine-rich repeat (NBS-LRR) genes, and genes of the flavonoid pathway such as chalcone synthases (CHS), chalcone reductases and chalcone isomerases. Many gene duplications resulted in the creation of large gene clusters (2).

The availability of a first draft of the *Medicago truncatula* genome sequence has promoted several initiatives aimed at identifying molecular markers suitable for both evolutionary and genetic mapping studies.

The *Medicago* HapMap Project will resequence 384 inbred lines of *M. truncatula* at 5x coverage, and a subset of 30 lines at deep coverage (20x), through an Illumina-Solexa sequencing pipeline. The first report on the analysis of sequence data from 26 *M. truncatula* accessions with ~15x average genome coverage described 3,063,923 mapped single nucleotide polymorphisms (SNPs) and gave first estimates of nucleotide diversity (θw = 0.0063 and θπ = 0.0043 bp⁻¹), of the population-scaled recombination rate and of the rate of decay of linkage disequilibrium (3). More recently, the same material was used to estimate population recombination rates at the 1 kb scale. Interestingly, in the three chromosomes analysed, recombination was higher near centromeric regions, in stark contrast to what is observed in every non-plant system and in the majority of plants, which show a negative gradient of recombination from telomeric to centromeric regions (4).
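For readers unfamiliar with the two diversity statistics mentioned above, the sketch below shows how the per-site Watterson estimator (θw) and nucleotide diversity (θπ) are defined for an alignment of equal-length sequences. It is a minimal illustration only: the four toy sequences are invented, missing data and alignment gaps are not handled, and this is not the pipeline used in the cited HapMap analysis.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Count alignment columns that carry more than one allele."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def watterson_theta(seqs):
    """Per-site Watterson estimator: S / (a_n * L), with a_n = sum_{i=1}^{n-1} 1/i."""
    n, length = len(seqs), len(seqs[0])
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / (a_n * length)


def nucleotide_diversity(seqs):
    """Per-site pi: mean number of pairwise differences divided by L."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


# Hypothetical toy alignment (four accessions, ten sites), for illustration only.
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC", "ACGTACGTAC"]

print("segregating sites:", segregating_sites(toy))
print("theta_w per site:", round(watterson_theta(toy), 4))
print("theta_pi per site:", round(nucleotide_diversity(toy), 4))
```

θw depends only on the number of segregating sites, whereas θπ also depends on allele frequencies, which is why the two estimates can differ for the same data set.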

In parallel, phenotyping of the *Medicago* lines is ongoing in greenhouse experiments. The genetic and phenotypic data will be combined in a platform for genome-wide association studies (GWAS).

#### **4. Functional genomics in** *Medicago truncatula*


The discovery of gene function in model species is accomplished by exploiting resources such as mutant collections, by using plant genetic transformation and by analyzing gene transcription. In *M. truncatula* all three approaches can be applied.

Several strategies have been pursued in *M. truncatula* to produce mutant collections (Table 1); they are analysed in the following sections.



| Reference | Mutagenesis technique | New mutants | Phenotype | Gene/Mutant line |
|---|---|---|---|---|
| (5) | EMS | 1 | Nod+Fix- | TE7 = Mt*sym*1 |
| (6) | γ-ray | 2 | Nod-, Myc- | TR25, TR26 |
| (6) | γ-ray | 4 | Nod±, Myc+ | TR34, TR79, TR89, TRV9 |
| (6) | γ-ray | 9 | Nod+Fix-, Myc+ | TR3, TR9, TR13, TR36, TR62, TR69, TR74, TR183, TRV15 |
| (6) | γ-ray | 3 | Nod++Nts, Myc+ | TR122, TRV3, TRV8 |
| (8) | EMS | 1 | EIN, Nod++ | *sickle* = *skl1* |
| (8) | EMS | 1 | Nod- | C71 = *Domi* |
| (19) | γ-ray | 1 | Nod- | TRV25 |
| (20) | EMS | 3 | developmental | *mtapetala (tap)*, *palmyra (plm)*, *speckle (spk)* |
| (21) | EMS | 5 | Nod- | B85, B129, C54, P1, Y6. Identification of 4 complementation groups (DMI1, DMI2, DMI3, NSP): *dmi1-1* = C71 = *domi*; *dmi1-2* = B129; *dmi1-3* = Y6; *dmi2-1* = TR25; *dmi2-2* = TR26; *dmi2-3* = P1; *dmi3-1* = TRV25; *nsp1-1* = B85; *nsp1-2* = C54 |
| (22) | EMS | 7 | Calcium oxalate defective | *cod1*, *cod2*, *cod3*, *cod4*, *cod5*, *cod6*, *cod7* |
| (23) | EMS | 1 | Nod-, root hair deformation | *hcl* = B56 |
| (24) | EMS | 1 | Blocked in the formation of nodule primordia | *pdl1* |

**Table 2.** *Medicago truncatula* mutants (first part).

#### **5. Chemical-physical mutagenesis**

#### **5.1. Target Induced Local Lesion IN Genomes (TILLING)**


| Reference | Mutagenesis technique | Background | Notes |
|---|---|---|---|
| (14) | Activation tagging | R108-1 (c3) | ~100 mutant lines |
| (14) | Tnt1 | R108-1 (c3) | ~1000 R0 |
| (14) | EMS | Jemalong 2HA | 2500 M2 plants |
| (15) | EMS | *Medicago littoralis* 'Angel' | Development of new annual medics varieties (i.e. resistant to herbicides) |
| (16) | Tnt1 | Jemalong 2HA | Mutants produced in the frame of the European Grain Legumes Integrated Project (GLIP). The total number of mutants produced by 10 labs all around Europe should be several thousands (~6000). 2000 of them will integrate the Tnt1 collection at Noble. |
| (17) | FNB | Jemalong A17 | 31,200 M1 plants, 156,000 M2 plants |
| (18) | EMS | Jemalong A17 | http://195.220.91.17/legumbase/ 2 populations. The first (not using single seed descent, SSD): 500 M1 produced 4500 M2. The second (using SSD): 4350 M1 and 4350 M2. |

**Table 1 (continued).** Mutant collections in *Medicago* spp. EMS = ethyl methanesulphonate.

Alkylating agents such as ethyl methanesulphonate (EMS) have been used to develop mutant collections of *Medicago truncatula*. EMS mainly induces single base pair G/C to A/T substitutions. The mutagenized seeds are germinated to give the M1 plants, which are selfed, and a TILLING M2 collection is established by growing a few seeds from each M1 plant. Total genomic DNA is purified from each M2 plant and pooled. The mutant collections are usually screened with reverse genetics approaches. TILLING involves the identification of mismatches in heteroduplexes formed by single-stranded DNA from the wild type and mutant alleles of the target locus. The target sequences are generated by PCR amplification from the bulked DNA of individual plants, using labelled primers appropriate for the detection strategy employed. The amplicons are then heated to separate the strands, re-annealed to form heteroduplexes and cleaved at the mismatch point by an endonuclease active on single-stranded DNA (e.g. CEL I from celery), and the products are separated by electrophoresis. Several EMS mutant collections of *Medicago truncatula* are available. Within the framework of the European Grain Legume Integrated Project two mutant collections were established. The two collections contain the same number of M2 lines, which were however obtained from M1 populations of different sizes. Genetic analysis of the two collections showed that the number of meristematic cells that contribute to the seed (germ line) in *Medicago truncatula* is 3. The mutation density detected in the two EMS populations was one mutation every 485 kb. A pilot reverse genetics experiment with 56 target genes revealed an efficiency of 13 independent alleles per exon screened, 67% of which were missense and 5% nonsense mutations.

An Italian functional genomics initiative produced a small collection of TILLING mutants with about 2500 M2 lines and a reported efficiency of about 4 independent alleles per target sequence. Catalogues of mutant phenotypes have been developed, and services for reverse screening with a target sequence are available (http://inra.fr/legumbase). A list of *M. truncatula* mutants is reported in Table 2.
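The mutation density quoted above (one mutation every 485 kb) gives a rough idea of how many independent alleles a TILLING screen can be expected to yield for a given amplicon. The sketch below is a back-of-the-envelope calculation in Python, under the simplifying assumptions that mutations are uniformly distributed along the genome and that every mutation in the amplicon is detected. The population sizes and amplicon lengths passed to the functions are illustrative inputs, and the function names are hypothetical.

```python
# Mutation density reported in the text: one mutation every 485 kb.
MUTATION_SPACING_KB = 485.0


def expected_alleles(n_lines, amplicon_kb, spacing_kb=MUTATION_SPACING_KB):
    """Expected number of induced mutations within one amplicon,
    summed over all lines of the population (uniform-density assumption)."""
    return n_lines * amplicon_kb / spacing_kb


def lines_needed(target_alleles, amplicon_kb, spacing_kb=MUTATION_SPACING_KB):
    """Population size needed to expect a given number of alleles in one amplicon."""
    return target_alleles * spacing_kb / amplicon_kb


# Illustrative inputs: populations of 2500 and 4350 M2 lines, amplicons of 1 and 1.5 kb.
for n_lines in (2500, 4350):
    for amplicon_kb in (1.0, 1.5):
        alleles = expected_alleles(n_lines, amplicon_kb)
        print(f"{n_lines} lines, {amplicon_kb} kb amplicon: ~{alleles:.1f} expected alleles")
```

Real yields also depend on pooling depth, on the detection limit of the endonuclease assay and on the GC content of the target, so the figures produced by such a calculation are only indicative.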



| Reference | Mutagenesis technique | New mutants | Phenotype | Gene/Mutant line |
|---|---|---|---|---|
| (21) | EMS | 3 | Nod-, root hair deformations | *hcl-1* = B56; *hcl-2* = W1; *hcl-3* = AF3 |
| (24) | EMS | 1 | Blocked in the formation and/or maintenance of epidermal cell infection | *lin1* (first citation) |
| (25) | EMS | 6 | Oxalate crystal morphology defective | *cmd1*, *cmd2*, *cmd3*, *cmd4*, *cmd5*, *cmd6* |
| (26) | Fast neutron | 2 | Nod-, cortical cell division | *nsp2-1*, *nsp2-2* |
| (27) | EMS | 1 | Nod-, does not respond to Nod Factors by induction of root hair deformation | *nfp* = C31 |
| (28) | EMS | 1 | Nod++ | *Sunn* |
| (29) | EMS | 1 | Blocked in the formation and/or maintenance of epidermal cell infection | *lin1* (first description) |
| (30) | EMS | 1 | Numerous infections and polyphenolics | *Nip* |
| (31) | EMS | 1 | Nod-, defective in lateral root development | *Latd* |
| (32) | γ-ray | 2 | Nod+, Fix-, Myc+ | *Mtsym20* = TRV43, TRV54; *Mtsym21* = TRV49 |
| (32) | γ-ray | 1 | Nod-/+, Myc+ | *Mtsym15* = TRV48 |
| (32) | γ-ray | 1 | Nod-, Myc-/+ | *Mtsym16* = TRV58 |
| (33) | Fast neutron | 6 | Fix- | *dnf1-1* = 1D-1; *dnf1-2* = 4A-17; *dnf2* = 1B-5; *dnf3* = 2C-2; *dnf4* = 2E-1; *dnf5* = 2F-16; *dnf6* = 2H-8; *dnf7* = 4D-5 |
| (34) | Fast neutron | 1 | Nod- | *bit1* |
| (35) | Tnt1 | 1 | Single leaflet | *sgl1* |
| (36) | Fast neutron | 1 | Increased nodule number | *Efd* |
| (37) | EMS | 1 | Impaired in nodule primordium invasion | *Api* |
| (38) | EMS | 1 | Aberrant root hair curling and infection thread formation | *Rpg* |
| (39) | EMS | 1 | Myc++, Nod-/+ | B9 |
| (40) | Fast neutron | 1 | Leaf dissection | *palm1* |
| (41) | T-DNA | 1 | Compact roots | *cra1* (not tagged) |
| (42) | Tnt1 | various | Leaf epidermal morphology | Various |
| (43) | Tnt1 | 1 | Lack of lignin in the interfascicular region | *nst1* |
| (44) | Tnt1 | 1 | Secondary cell wall thickening in pith | *mtstp1* |
| (45) | Fast neutron | 1 | Compound leaf development | *fcl1* |
| (46) | Fast neutron | 1 | Root determined nodulation | *rdn1* |
| (47;48) | Fast neutron | 1 | Myc-, Nod- | *Vpy* |
| (49) | Activation tagging | 1 | Lack of hemolytic saponins | *Lha* |
| (50)(1) | Tnt1 | 1 | Stay green | *MtSGR* |
| (51) | Tnt1 | 1 | Smooth leaf margin | *slm1* |
| (52) | Tnt1 | 1 | Reduced leaf blade expansion | *Stf* |
| (53) | Tnt1 | 1 | Inhibition of rust germ tube differentiation | *irg1* |

**Table 2 (continued).** *Medicago truncatula* mutants. Nodulation phenotypes: Nod++ = hypernodulator, Nod+ = wild type nodulator, Nod± = reduced nodulation, Nod- = lack of nodules, Nod-/+ = late nodulation. Nitrogen fixation phenotypes: Fix+ = wild type, Fix- = no fixation. Mycorrhization phenotypes: Myc+ = wild type, Myc- = absent or reduced mycorrhizas, Myc-/+ = mix of normal colonization and events of formation only of appressoria with no intercellular hyphae developing from them, Myc++ = hyper responsive to mycorrhization. Nts = nitrate tolerant nodulation. EIN = ethylene insensitive.

#### **5.2. "Delete a gene" collections**

Irradiation of plant seeds with an appropriate dose of fast neutrons or γ-rays results in the deletion of DNA fragments of variable length, with, on average, a modest reduction of seed viability.

Large mutant collections have been created by seed irradiation for *Medicago truncatula* functional genomics studies. Although the first experiments were based on γ-ray irradiation of Jemalong J5 seeds (6), the main body of the collection was obtained by Fast Neutron Bombardment (FNB) of the genotype A17. Overall, the two larger collections, stored at the John Innes Centre and at the Noble Foundation, consist of about 616,000 M2 FNB families.

Both reverse and forward genetic approaches have been successfully applied to study mutants from these collections.

Reverse screening of FNB populations has been carried out with the DeTILLING strategy described by Rogers C *et al.* (17). This strategy allows the detection of mutants by PCR on bulks of DNA from FNB mutants. Amplification of the wild type target is suppressed by combining restriction enzyme digestion of the template with the use of poison primers. With this strategy a mutant recovery rate of 29% was obtained from a population of 156,000 M2 plants (4 genes out of 14 screened).
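Since the recovery rate above rests on only 14 screened genes, it may help to see how wide the associated uncertainty is. The following sketch computes the rate together with a 95% Wilson score interval; the choice of the Wilson interval is ours and is not taken from the cited study, and the function name is illustrative.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    p = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials ** 2)) / denom
    return centre - half, centre + half


# Figures from the text: mutants recovered for 4 of the 14 genes screened.
rate = 4 / 14
low, high = wilson_interval(4, 14)
print(f"recovery rate {rate:.0%}, 95% interval {low:.0%} to {high:.0%}")
```

With these inputs the interval spans roughly 12% to 55%, so the 29% figure is best read as an order-of-magnitude indication rather than a precise expectation for a new target gene.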

*Medicago truncatula* Functional Genomics – An Invaluable Resource for Studies on Agriculture Sustainability 139

The collection maintained at the Noble Foundation (http://bioinfo4.noble.org/mutant/) which includes also the first mutants generated by CNRS in France, is based on the genotype R108-c3. Another collection of about 1000 lines from the same R-108 line was

In the framework of the GLIP project 8000 Tnt1 mutants were produced from the Jemalong 2HA (2HA3-9-10-3) line. The GLIP collection is maintained by the various labs that participated to the project and a subset of plants were merged with the collection at the

Iantcheva and colleagues reported that Tnt1 transposition efficiency in Jemalong 2HA has a lower efficiency with only 10-15 new insertions per line and a variable percentage of regenerated plants without transposition (16). The adoption of 2HA line for mutagenesis instead of R108, was motivated by the highest DNA homology to the line used for genome sequencing (Jemalong A17), and for the presence of active and characterized endogenous

Tnt1 mutant collections have been screened with both forward and reverse genetic approaches. Forward approaches have been based on cloning of host sequence flanking the insertion sites and subsequent identification of events linked to the studied mutation. Based on the duplicated Tnt1 long terminal repeats (LTR) sequences several molecular approaches including thermal asymmetric interlaced (TAIL)-PCR, Inverse-PCR have been used to recover the host sequences flanking the insertion sites (60). Segregation analysis of each cloned insertion site can then be used to select the event linked to the mutation. In alternative the insertion sites associated with the mutations can be selected by segregation analysis prior to host sequence cloning by employing a sequence specific amplification

Confirmation of the identity of the mutation can be obtained by means of complementation tests based on the reintroduction of the wild type gene sequence in the mutated background. In alternative one could obtain independent alleles of the target gene and compare their similarity to the original mutant phenotype. This can be done using TILLING and Tnt1 mutant populations as demonstrated by many publications that report successful recovery of alleles by reverse screening (61) and Table 3. The power of the Tnt1 mutagenesis approach is also witnessed by the prevalence of publications reporting successful gene

cloning based on such strategy compared to the others since 2008 (Table 2 and 3).

(63) *dmi1* AY497771, possible membrane receptor

(64) *dmi3* Ca2 and Calmodulin dependent

protein kinase

**Reference Mutant Gene Approach**  (62) *dmi2* NORK Physical mapping

Physical mapping

mapping/Transcriptional

Physical

based cloning

produced by CNR-IGV in Italy.

Noble Foundation.

retroelements (59).

polymorphism (S-SAP) based protocol.

Nevertheless deletion size can hamper reverse genetics screening (Chen R., personal communication) leaving forward genetics as the main choice in case of FNB populations. However map-based cloning required to discover the mutation of interest is helped by strategies such as transcriptional cloning, originally devised by Mitra R M *et al.* (54), which has allowed the identification of FNB induced mutations (see Table 3). This approach relies on the identification of mutated genes through detailed genome-wide transcriptomic analyses. Also genome-wide analyses of FNB mutant are expected to benefit of the recent development of a *Medicago truncatula* genome-wide tiling array by Nimblegen. A list of *Medicago truncatula* FNB mutants characterized by forward genetics approaches is reported in Table 2.

#### **6. Insertional mutagenesis with DNA mobile elements**

#### **6.1. Tnt1**

T-DNA tagging has been the strategy of choice for many mutant collections in *Arabidopsis* and it has allowed fundamental discoveries in gene functions and advances in both basic and applied plant research (55). Unfortunately only *Arabidopsis* can be transformed easily by the floral-dip method which allows the generation of large numbers of mutants in a costeffective manner. Up to now transformation for the other plant species including *M. truncatula* can only be achieved by tissue culture-based protocols requiring great efforts to produce the number of mutants that would allow a significant genome coverage. An interesting strategy has been recently published in the legume *Lotus japonicus* based on the endogenous retrotransposon LORE1 (56;57). LORE1, originally activated via tissue culture, retained its activity for some regenerated plants in the subsequent generations. Based on such discovered germline activity, tagged M1 mutant collections were produced by seed propagation from activated starter lines (M0) (57;58).

Reverse screening of FNB populations has been carried out with the DeTILLING strategy described by Rogers *et al.* (17). This strategy detects mutants by PCR on bulked DNA from FNB mutants. Amplification of the wild-type target is avoided by combining restriction enzyme digestion of the template with the use of poison primers. With this strategy, a mutant recovery rate of 29% was obtained from a population of 156,000 M2 plants (4 genes out of 14 screened).

Nevertheless, deletion size can hamper reverse genetics screening (Chen R., personal communication), leaving forward genetics as the main choice for FNB populations. However, the map-based cloning required to identify the mutation of interest is helped by strategies such as transcriptional cloning, originally devised by Mitra *et al.* (54), which has allowed the identification of FNB-induced mutations (see Table 3). This approach relies on identifying mutated genes through detailed genome-wide transcriptomic analyses. Genome-wide analyses of FNB mutants are also expected to benefit from the recent development of a *Medicago truncatula* genome-wide tiling array by Nimblegen. A list of *Medicago truncatula* FNB mutants characterized by forward genetics approaches is reported in Table 2.

#### **6. Insertional mutagenesis with DNA mobile elements**

T-DNA tagging has been the strategy of choice for many mutant collections in *Arabidopsis*, and it has allowed fundamental discoveries in gene function and advances in both basic and applied plant research (55). Unfortunately, only *Arabidopsis* can be transformed easily by the floral-dip method, which allows large numbers of mutants to be generated in a cost-effective manner. Up to now, transformation of other plant species, including *M. truncatula*, can only be achieved through tissue culture-based protocols, which require great effort to produce the number of mutants needed for significant genome coverage. An interesting strategy based on the endogenous retrotransposon LORE1 has recently been published for the legume *Lotus japonicus* (56;57). LORE1, originally activated via tissue culture, retained its activity in some regenerated plants in subsequent generations. Based on this germline activity, tagged M1 mutant collections were produced by seed propagation from activated starter lines (M0) (57;58).

#### **6.1. Tnt1**

In *M. truncatula*, large-scale mutant collections have been constructed using the tobacco Tnt1 retrotransposon. d'Erfurth and colleagues demonstrated that, in the *Medicago truncatula* R108 genotype, this element transposes during the early steps of in vitro regeneration (10), with a high rate of insertion in transcribed genomic regions. Sequence analyses of insertion sites have shown a virtual absence of insertion site preference. The average number of new insertions per regenerated line was estimated at about 25. Based on these data, it was shown that a collection of 14,000-16,000 Tnt1 lines would store tagging events for about 90% of *M. truncatula* genes (13). This ambitious objective has been pursued by working on two *Medicago truncatula* lines.

The collection maintained at the Noble Foundation (http://bioinfo4.noble.org/mutant/), which also includes the first mutants generated by CNRS in France, is based on the genotype R108-c3. Another collection of about 1000 lines from the same R108 line was produced by CNR-IGV in Italy.
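The relationship between collection size, insertions per line and the fraction of genes tagged can be explored with a simple saturation model. The sketch below is not the calculation used in (13); it assumes insertions land uniformly at random (a Poisson approximation), and the gene target size and genome size are placeholder values chosen only for illustration. The function names are hypothetical.

```python
import math


def fraction_genes_tagged(n_lines, insertions_per_line, target_bp, genome_bp):
    """Expected fraction of genes carrying at least one insertion.

    Assumes insertions land uniformly at random, so the number of hits in a
    target of `target_bp` bases follows a Poisson distribution.
    """
    total_insertions = n_lines * insertions_per_line
    mean_hits = total_insertions * target_bp / genome_bp
    return 1.0 - math.exp(-mean_hits)


def lines_needed(target_fraction, insertions_per_line, target_bp, genome_bp):
    """Invert the same model: lines required to reach `target_fraction`."""
    mean_hits = -math.log(1.0 - target_fraction)
    return math.ceil(mean_hits * genome_bp / (target_bp * insertions_per_line))


# Placeholder values (assumptions, not from the text): 2 kb gene target and
# 500 Mb genome. Line count and insertions per line follow the text.
coverage = fraction_genes_tagged(15_000, 25, 2_000, 500_000_000)
print(f"{coverage:.2f}")
```

With these placeholder values the uniform model gives a tagged fraction of roughly 0.78, below the figure quoted in the text. A model that accounts for the reported tendency of Tnt1 to insert in transcribed regions would give a higher value, so the numbers here should be read as an illustration of the saturation behaviour only.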

In the framework of the GLIP project, 8000 Tnt1 mutants were produced from the Jemalong 2HA (2HA3-9-10-3) line. The GLIP collection is maintained by the various labs that participated in the project, and a subset of plants was merged with the collection at the Noble Foundation.

Iantcheva and colleagues reported that Tnt1 transposition is less efficient in Jemalong 2HA, with only 10-15 new insertions per line and a variable percentage of regenerated plants showing no transposition (16). The adoption of the 2HA line for mutagenesis instead of R108 was motivated by its higher DNA homology to the line used for genome sequencing (Jemalong A17) and by the presence of active and characterized endogenous retroelements (59).

Tnt1 mutant collections have been screened with both forward and reverse genetic approaches. Forward approaches have been based on cloning the host sequences flanking the insertion sites and then identifying the insertion events linked to the mutation under study. Starting from the duplicated Tnt1 long terminal repeat (LTR) sequences, several molecular approaches, including thermal asymmetric interlaced (TAIL)-PCR and inverse PCR, have been used to recover the host sequences flanking the insertion sites (60). Segregation analysis of each cloned insertion site can then be used to select the event linked to the mutation. Alternatively, the insertion sites associated with the mutation can be selected by segregation analysis before host sequence cloning, using a protocol based on sequence-specific amplification polymorphism (S-SAP).
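The segregation step can be illustrated with a small filter over genotype calls. This is a minimal sketch under stated assumptions, not a protocol from the cited work: it assumes a single, fully penetrant recessive mutation, and the data layout, the insertion and plant identifiers, and the function name are all hypothetical.

```python
def cosegregating_insertions(genotypes, phenotypes):
    """Return insertion sites compatible with a recessive causal insertion.

    genotypes:  {insertion_id: {plant_id: "hom" | "het" | "wt"}}
    phenotypes: {plant_id: "mutant" | "wild-type"}
    """
    linked = []
    for insertion, calls in genotypes.items():
        consistent = True
        for plant, phenotype in phenotypes.items():
            call = calls.get(plant)
            if call is None:
                continue  # plant not genotyped for this insertion
            # Every mutant plant must be homozygous for the insertion.
            if phenotype == "mutant" and call != "hom":
                consistent = False
            # No wild-type plant may be homozygous for the insertion.
            if phenotype == "wild-type" and call == "hom":
                consistent = False
        if consistent:
            linked.append(insertion)
    return linked


# Toy segregating population (invented data for illustration).
phenotypes = {"p1": "mutant", "p2": "wild-type", "p3": "wild-type", "p4": "mutant"}
genotypes = {
    "ins_A": {"p1": "hom", "p2": "het", "p3": "wt", "p4": "hom"},
    "ins_B": {"p1": "hom", "p2": "hom", "p3": "wt", "p4": "het"},
}
print(cosegregating_insertions(genotypes, phenotypes))
```

In this toy population only `ins_A` passes both checks. In practice the candidate list would shrink as more plants are scored, and a dominant or incompletely penetrant mutation would need different rules.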

Confirmation of the identity of the mutation can be obtained through complementation tests based on reintroducing the wild-type gene sequence into the mutant background. Alternatively, one can obtain independent alleles of the target gene and compare their phenotypes with that of the original mutant. This can be done using TILLING and Tnt1 mutant populations, as demonstrated by the many publications reporting successful recovery of alleles by reverse screening (61; Table 3). The power of the Tnt1 mutagenesis approach is also shown by the prevalence, since 2008, of publications reporting successful gene cloning based on this strategy compared with the others (Tables 2 and 3).



| Reference | Mutant | Gene | Approach |
|---|---|---|---|
| (65) | *nsp2* | GRAS transcriptional regulator | Physical mapping |
| (66) | *nsp1* | GRAS transcriptional regulator | Physical mapping |
| (46) | *sunn* | CLV1-like LRR receptor kinase | Physical mapping and gene homology |
| (67) | *mtpim* | MADS-box | Reverse screening on Tnt1 collection |
| (68) | *mtpt4* | Phosphate transporter | RNAi and TILLING (reverse) |
| (35) | *sgl1* | MtUNI (transcription factor) | Tnt1 forward |
| (71) | *mate1* | MATE | Tnt1 reverse |
| (72) | *ugt78g1* | Glucosyl transferase | Tnt1 reverse |
| (74) | *MtSYMREM1* | Remorin | Tnt1 reverse |
| (75) | *ccr1*, *ccr2* | Cinnamoyl CoA reductase | Tnt1 reverse |
| (76) | *ugt73f3* | Glucosyl transferase | Tnt1 reverse |
| (77) | *rdn1* | Uncharacterized plant family | Mapping |
| (44) | *dnf1* | Signal peptidase complex subunit | Fast neutron microarray based cloning |
| (43) | *nst1* | NAC transcription factor | Forward screening and Tnt1 flanking region cloning |
| (44) | *mtstp1* | WRKY transcription factor | Forward screening and Tnt1 flanking region cloning |
| (45) | *fcl1* | Class M KNOX | Fast neutron forward, map based cloning and Tnt1 reverse |
| (38) | *rpg* | Putative long coiled-coil protein | Map based cloning |
| (28) | *sickle* | MtEIN2, ethylene signaling gene | Map based cloning and gene homology |
| (70) | *srlk* | LRR kinase | TILLING reverse and RNAi |
| (34) | *bit1* | ERF transcription factor required for nodulation (ERN) | Transcriptional based cloning |
| (36) | *efd* | Ethylene responsive factor required for nodule differentiation | Fast neutron reverse |
| (69) | *lin* | E3 ubiquitin ligase containing a U-box and WD40 repeat domains | Positional cloning |
| (73) | *mtapetala* | MtPI, MADS-box transcription factor | RNAi and mutation segregation analysis |
| (40) | *palm1* | Cys(2)His(2) zinc finger transcription factor | Fast neutron forward and Tnt1 reverse |
| (48) | *vpy* | Vapyrin | Microarray based cloning, Tnt1 reverse and TILLING |
| (49) | *lha* | CYP716A12, cytochrome P450 | Flanking sequence tagging |
| (50) | *MtSGR* | Stay green gene | Tnt1 forward and flanking sequence cloning |
| (51) | *slm1* | Auxin efflux carrier protein | Tnt1 forward and flanking sequence cloning |
| (71) | *mate2* | MATE | Tnt1 reverse |
| (79) | *mtpar* | MYB transcription factor | Tnt1 reverse |
| (78) | *fta*, *ftc* | MtFTa, MtFTc, protein ligands | Tnt1 reverse |
| (52) | *stf* | Stenofolia, WUSCHEL-like homeobox transcription factor | Tnt1 forward and flanking sequence cloning |
| (53) | *irg1* | Cys(2)His(2) zinc finger transcription factor | Tnt1 forward and flanking sequence cloning |

**Table 3.** *Medicago* genes characterized using mutants.

#### **7. RNAi and VIGS**

Reverse genetics studies in *Medicago truncatula* have taken advantage not only of the many mutant populations available but also of techniques based on post-transcriptional gene silencing (PTGS). In this case, plants are transformed with a construct that produces double-stranded RNAs, which guide sequence-specific degradation of the target gene mRNA. The phenotype of the transformed plants can range gradually from wild type to knock-out, so many transformants are needed to obtain the desired effect. Mild effects can be beneficial in the case of essential genes, whose complete loss of function may be lethal. RNAi has been used extensively in *M. truncatula* to study gene function, but it has not been applied in a large-scale functional genomics approach as in *Arabidopsis* with the AGRIKOLA collection (80). Nevertheless, many gene functions have been characterized by exploiting RNAi. A list of gene function and *Medicago truncatula* physiology studies that used RNAi approaches is reported in Table 4.

| Reference | Silenced gene | Phenotype |
|---|---|---|
| (75) | Lyk4 | Effect on infection thread morphology |
| (75) | Lyk3 | Marked reduction of nodulation when inoculated with Sm 2011ΔNodFE-GFP |
| (81) | CDPK1 | Reduced root hair and root cell lengths. Diminution of both rhizobial and mycorrhizal symbiotic colonization. |
| (82) | DMI2 | Reduction of organelle-like symbiosomes in nodules |
| (76) | MtHAP2-1 | Alteration of nodule development |
| (84) | MtCPK3 | Increased average nodule number |
| (89) | HMGR1 | Dramatic decrease in nodulation. |
| (90) | IPD3 | No obvious phenotype observed |
| (93) | MtFNSII-1, MtFNSII-2 | Reduced nodulation |
| (73) | MtPI, MtNGL9 | Altered flower development |
| (85) | MtCRE1 | Cytokinin-insensitive roots, increased number of lateral roots, strong reduction in nodulation. |
| (87) | CHS | Reduced levels of flavonoids and subsequent inability to nodulate |
| (63) | MtPT4 | Premature death of mycorrhizal arbuscules. |
| (94) | MtCDD1 | Alteration of the arbuscular mycorrhizal-mediated accumulation of apocarotenoids |
| (95) | MtDXS2 | Reduction of AM-induced apocarotenoid accumulation. |
| (78) | MtSERF1 | Strong inhibition of somatic embryogenesis |
| (96) | MtWUS | Strong inhibition of somatic embryogenesis |
| (65) | Srlk | Transgenic root growth less inhibited by salt stress. |
| (97) | FLOT2, FLOT4 | Reduced nodulation and root development. |
| (98) | MtMSBP1 | Aberrant mycorrhizal phenotype with thick and septated appressoria, decreased number of arbuscules and distorted arbuscule morphology. |
| (99) | MtCDC16 | Decreased number of lateral roots and increased number of nodules. Reduced sensitivity to auxin. |
| (101) | MtSNARP2 | Aberrant early senescent nodules where differentiated bacteroids degenerate rapidly. |
| (74) | MtSYMREM1 | Reduced nodulation and abnormal nodule development |
| (100) | NPR1 | Acceleration of root hair curling at the beginning of symbiosis establishment |
| (91) | MtSNF4b | Reduced seed longevity, alteration in non reducing sugar content. |
| (83) | NFP | Nod- |
| (86) | MtPIN2, MtPIN3, MtPIN4 | Reduced number of nodules |
| (88) | PR10-1 (pathogenesis related) | Reduced colonization by the root pathogen *A. euteiches*. |
| (92) | ENOD40-1, ENOD40-2 | Reduced nodule number and altered symbiosome development. |
| (103) | Vapyrin | Impaired passage across epidermis by AM fungi. Abolition of arbuscule formation. |
| (105) | γECS | Lower homoglutathione content. Lower biological nitrogen fixation associated with a reduction in the expression of the leghemoglobin and thioredoxin S1 genes. Reduction in nodule size. |
| (106) | MtSAP1 | Lower level of storage globulin proteins, vicilin and legumin in seeds and germination deficiency. |
| (107) | MtNR1, MtNR2 | Reduced nitrate or nitrite reductase activity and NO level. |
| (108) | MtNoa/Rif1 | Decrease in NO production in roots but not in nodules. Reduction of nodule number and nitrogen fixation capacity. |
| (109) | MtROPGEF2 | Effect on cytosolic Ca2+ gradient and subcellular structure of root hairs. Reduced root hair growth. |
| (110) | MtROP9 | Reduced growth, no ROS generation after microbial infection. Promoted mycorrhizal and *A. euteiches* early hyphal root colonization. Impaired rhizobial colonization. |
| (111) | MtNAC969 | Improved growth under salt stress. |
| (104) | MtAOC1 | No nodulation phenotype observed |
| (102) | MtN5 | Reduced nodulation |

**Table 4.** Use of RNAi approaches in *Medicago truncatula*.

Virus-induced gene silencing (VIGS) is a PTGS technique that can be applied transiently, either by rubbing leaves with the viral vector or by introducing it into the plant by agro-infiltration. VIGS is being used for large-scale forward genetics screening, through inoculation of a cDNA library and subsequent identification of the gene involved in the process of interest (112). Viral vectors that work in *Medicago truncatula* have recently been described. Grønlund *et al.* successfully used a Pea Early Browning Virus (PEBV)-based vector both for transient expression of reporter genes and for silencing of the Phytoene Desaturase (PDS) gene, which causes a bleaching phenotype (113). Várallyay and colleagues constructed two VIGS vectors based on the Sunnhemp Mosaic Virus (SHMV) that can systemically infect *M. truncatula* without causing severe symptoms, and reported successful silencing of the Chlorata 42 gene (114). Large-scale screenings based on VIGS have not been reported for *M. truncatula* so far.

#### **8. Perspectives**

Functional genomics of forage legumes started with the aim of determining the molecular and genetic bases of nitrogen fixation, and from the beginning mutant collections have also been thoroughly screened for mycorrhizal symbiosis. These aspects are still being investigated, and we expect many more results to be published in the coming years. A better understanding of nitrogen fixation and symbiosis is fundamental for the development of a sustainable agriculture that aims to reduce inputs and maintain soil fertility. Nitrogen (N) is one of the crucial nutrients for all organisms, including plants. The doubling of world food production in the past four decades was accompanied by a sevenfold increase in N fertilization (115). Anthropogenic N, which is mostly lost to air, water and land, affects climate, the chemistry of the atmosphere, and the composition and function of terrestrial and aquatic ecosystems (116). Improving the ability of plants to exploit environmental nitrogen would decrease N fertilization and its negative consequences; a deep understanding of legume symbiosis with nitrogen-fixing bacteria could therefore help the long-term goal of transferring the associative ability of legume species to nonsymbiotic crops of agronomic relevance. As a consequence, functional genomics of nodulation will contribute to reducing intensive agricultural practices, with benefits for the preservation of the environment and the quality of human activities.

Another positive environmental role for legumes is played by species such as *Lotus* spp., whose strong adaptive characteristics make them good candidates for the restoration and phytoremediation of degraded environments (117). This is the case in the Flooding Pampa (Argentina), where the presence of proteinaceous forages was re-established by the introduction of *L. tenuis*, other legume species having been reduced by the harsh environmental conditions.

Pastures and feedstuffs that include forage legumes are of higher quality than those based only on grasses and provide an important input of protein in animal nutrition. More recently, public and scientific debate has reassessed the importance of forage legumes for the quality of livestock nutrition and welfare, as having relevant consequences for the quality of final products (meat, milk, etc.) and ultimately for human health. This followed the occurrence of bovine spongiform encephalopathy (BSE), related to the traditional use of offal in animal feed lots as a source of protein.

Functional genomics in *M. truncatula* has proved useful for studying and understanding many aspects of plant development and plant secondary metabolism that could not be discovered in earlier models such as *Arabidopsis*. The availability of genomics tools in an increasing number of species widens the possibility of new discoveries in plant biology. Worth mentioning are the recent advances in understanding compound leaf development and zygomorphic flower ontogeny, based on the analysis of several mutants in *M. truncatula*.

Living organisms, plants among them, can be considered an abundant and diverse set of biofactories able to synthesize an enormous variety of chemical compounds. Legumes contain chemicals that can prove useful for their antioxidant, antiviral, antimicrobial, antidiabetic, anti-allergenic and anti-inflammatory properties (118). These properties are related to secondary metabolites such as flavonoids and saponins.

Modest levels of proanthocyanidins (PAs) in forages reduce the occurrence of bloat and at the same time promote increased dietary protein nitrogen utilization in ruminant animals (119). The lack of PAs in the leaves of major forage legumes such as alfalfa has prompted studies aimed at understanding the molecular and cellular biology of PA polymerization, transport, and storage, helped by the functional genomics tools available for *M. truncatula*. Recent positive results were obtained with biotechnological strategies based on the overexpression of MYB transcription factors, which induced PA accumulation in both alfalfa and clover leaves (79).

In addition to the well-known beneficial properties of flavonoids (cit), recent evidence suggests that flavonoids themselves, particularly fractions rich in PAs, can significantly reduce cognitive deterioration in animal model systems (120-122), and may more generally promote improvements in memory acquisition, consolidation, storage, and retrieval under nondegenerative conditions.

In Chinese medicine, one of the oldest herbal medicines is obtained from the roots of the legume licorice (*Glycyrrhiza glabra*), which contain the triterpenoid saponin glycyrrhizin, a compound with a wide range of pharmacological activities. Cytochrome P450 monooxygenases were shown, on the basis of biochemical experiments, to be responsible for the synthesis of glycyrrhizin via oxidative steps (123).

In forage legumes, saponins can be toxic to monogastric animals and reduce forage palatability for ruminants. Mutant analysis in *M. truncatula* has unveiled the genetic control of key biosynthetic steps for saponins related to oxidation and glycosylation (49;124), opening possibilities for the biotechnological manipulation of saponins in alfalfa.

Both human and animal nutritional science are bound to profit from plant genetic analysis and nutritional genomics, opening possibilities for more personalized approaches to medicine and for improving the quality of life.

#### **Author details**

Francesco Panara and Ornella Calderini
*CNR (National Council of Research) – Institute of Plant Genetics, Perugia, Italy*

Andrea Porceddu\*
*University of Sassari, Italy*

<sup>\*</sup> Corresponding Author

#### **9. References**

[1] Sato S, Isobe S, Tabata S. Structural analyses of the genomes in legumes. Curr Opin Plant Biol 2010;13 (2):146-52.

[2] Young ND, Debellé F, Oldroyd GED, Geurts R, Cannon SB, Udvardi MK, et al. The Medicago genome provides insight into the evolution of rhizobial symbioses. Nature 2011;480 (7378):520-4.

[3] Branca A, Paape TD, Zhou P, Briskine R, Farmer AD, Mudge J, et al. Whole-genome nucleotide diversity, recombination, and linkage disequilibrium in the model legume Medicago truncatula. Proc Natl Acad Sci U S A 2011;108 (42):E864-E870.

[4] Paape T, Zhou P, Branca A, Briskine R, Young N, Tiffin P. Fine-Scale Population Recombination Rates, Hotspots, and Correlates of Recombination in the Medicago truncatula Genome. Genome Biol Evol 2012;4 (5):726-37.

[5] Benaben V, Duc G, Lefebvre V. TE7, An Inefficient Symbiotic Mutant of Medicago truncatula Gaertn. cv Jemalong. Plant Physiol 1995;107 (1):53-62.

[6] Sagan M, Morandi D, Tarenghi E, Duc G. Selection of nodulation and mycorrhizal mutants in the model plant Medicago truncatula (Gaertn.) after γ-ray mutagenesis. Plant Science 1995;111:63-71.

[7] Cook DR, VandenBosch K, de Bruijn FJ. Model legumes get the nod. The Plant Cell 1997;3:275.

[8] Penmetsa R, Cook DR. A Legume Ethylene-Insensitive Mutant Hyperinfected by Its Rhizobial Symbiont. Science 1997;275 (5299):527-30.

[9] Scholte M, d'Erfurth I, Rippa S, Mondy S, Cosson V, Durand P, et al. T-DNA tagging in the model legume Medicago truncatula allows efficient gene discovery. Molecular Breeding 2002;10 (4):203-15.

[10] d'Erfurth I, Cosson V, Eschstruth A, Lucas H, Kondorosi A, Ratet P. Efficient transposition of the Tnt1 tobacco retrotransposon in the model legume Medicago truncatula. Plant J 2003;34 (1):95-106.

[11] Wang H. Fast neutron bombardment (FNB) mutagenesis for forward and reverse genetic studies in plants. ed. Global Science Books, Isleworth, UK, pp 629-639; 2006.

[12] Tadege M, Ratet P, Mysore K. Insertional mutagenesis: a Swiss Army knife for functional genomics of Medicago truncatula. Trends Plant Sci 2005;10 (5):229-35.

[13] Tadege M, Wen J, He J, Tu H, Kwak Y, Eschstruth A, et al. Large-scale insertional mutagenesis using the Tnt1 retrotransposon in the model legume Medicago truncatula. Plant J 2008;54 (2):335-47.

[14] Porceddu A, Panara F, Calderini O, Molinari L, Taviani P, Lanfaloni L, et al. An Italian functional genomic resource for Medicago truncatula. BMC Res Notes 2008;1:129.

[15] Oldach KH, Peck DM, Cheong J, Williams KJ, Nair R. Identification of a Chemically Induced Point Mutation Mediating Herbicide Tolerance in Annual Medics (Medicago spp.). Annals of Botany 2008;101:997-1005.

[16] Iantcheva A, Chabaud M, Cosson V, Barascud M, Schutz B, Primard-Brisset C, et al. Osmotic shock improves Tnt1 transposition frequency in Medicago truncatula cv Jemalong during in vitro regeneration. Plant Cell Rep 2009;28 (10):1563-72.

[17] Rogers C, Wen J, Chen R, Oldroyd G. Deletion-Based Reverse Genetics in Medicago truncatula. Plant Physiology 2009;151 (3):1077-86.

[18] Le Signor C, Savois V, Aubert G, Verdier J, Nicolas M, Pagny G, et al. Optimizing TILLING populations for reverse genetics in Medicago truncatula. Plant Biotechnol J 2009;7 (5):430-41.

[19] Sagan M, deLarambergue H. Genetic analysis of symbiosis mutants in Medicago truncatula. ed. Kluwer Academic Publishers, Dordrecht, The Netherlands; 1998.

[20] Penmetsa RV. Production and characterization of diverse developmental mutants of Medicago truncatula. Plant Physiol 2000;123 (4):1387-98.

[21] Catoira R, Timmers AC, Maillet F, Galera C, Penmetsa RV, Cook D, et al. The HCL gene of Medicago truncatula controls Rhizobium-induced root hair curling. Development 2001;128 (9):1507-18.

[22] Nakata PA. Isolation of Medicago truncatula mutants defective in calcium oxalate crystal formation. Plant Physiol 2000;124 (3):1097-104.

[23] Wais RJ, Galera C, Oldroyd G, Catoira R, Penmetsa RV, Cook D, et al. Genetic analysis of calcium spiking responses in nodulation mutants of Medicago truncatula. Proc Natl Acad Sci U S A 2000;97 (24):13407-12.

[24] Cohn JR, Uhm T, Ramu S, Nam YW, Kim DJ, Penmetsa RV, et al. Differential regulation of a family of apyrase genes from Medicago truncatula. Plant Physiol 2001;125 (4):2104-19.

[25] McConn MM. Oxalate reduces calcium availability in the pads of the prickly pear cactus through formation of calcium oxalate crystals. J Agric Food Chem 2004;52 (5):1371-4.

[26] Oldroyd G. Identification and Characterization of Nodulation-Signaling Pathway 2, a Gene of Medicago truncatula Involved in Nod Factor Signaling. Plant Physiology 2003;131 (3):1027-32.

[27] Amor BB, Shaw SL, Oldroyd GED, Maillet F, Penmetsa RV, Cook D, et al. The NFP locus of Medicago truncatula controls an early step of Nod factor signal transduction upstream of a rapid calcium flux and root hair deformation. Plant J 2003;34 (4):495-506.

[28] Penmetsa RV, Uribe P, Anderson J, Lichtenzveig J, Gish JC, Nam YW, et al. The Medicago truncatula ortholog of Arabidopsis EIN2, sickle, is a negative regulator of symbiotic and pathogenic microbial associations. Plant J 2008;55 (4):580-95.

[29] Kuppusamy KT, Endre G, Prabhu R, Penmetsa RV, Veereshlingam H, Cook DR, et al. LIN, a Medicago truncatula gene required for nodule differentiation and persistence of rhizobial infections. Plant Physiol 2004;136 (3):3682-91.

[30] Veereshlingam H, Haynes JG, Penmetsa RV, Cook DR, Sherrier DJ. nip, a symbiotic Medicago truncatula mutant that forms root nodules with aberrant infection threads and plant defense-like response. Plant Physiol 2004;136 (3):3692-702.

[31] Bright LJ, Liang Y, Mitchell DM, Harris JM. The LATD Gene of Medicago truncatula Is Required for Both Nodule and Root Development. Molecular Plant-Microbe Interactions 2005;18 (6):521-32.

[32] Morandi D, Prado E, Sagan M, Duc G. Characterisation of new symbiotic Medicago truncatula (Gaertn.) mutants, and phenotypic or genotypic complementary information on previously described mutants. Mycorrhiza 2005;15 (4):283-9.

[33] Starker CG, Parra-Colmenares AL, Smith L, Mitra RM. Nitrogen fixation mutants of Medicago truncatula fail to support plant and bacterial symbiotic gene expression. Plant Physiol 2006;140 (2):671-80.

[34] Middleton PH, Jakab J, Penmetsa RV, Starker CG, Doll J, Kalò P, et al. An ERF transcription factor in Medicago truncatula that is essential for Nod factor signal transduction. Plant Cell 2007;19 (4):1221-34.


[35] Wang H, Chen J, Wen J, Tadege M, Li G, Liu Y, et al. Control of compound leaf development by FLORICAULA/LEAFY ortholog SINGLE LEAFLET1 in Medicago truncatula. Plant Physiol 2008;146 (4):1759-72.

*Medicago truncatula* Functional Genomics – An Invaluable Resource for Studies on Agriculture Sustainability 149

[49] Carelli M, Biazzi E, Panara F, Tava A, Scaramelli L, Porceddu A, et al. Medicago truncatula CYP716A12 is a multifunctional oxidase involved in the biosynthesis of

[50] Zhou C, Han L, Pislariu C, Nakashima J, Fu C, Jiang Q, et al. From model to crop: functional analysis of a STAY-GREEN gene in the model legume Medicago truncatula and effective use of the gene for alfalfa improvement. Plant Physiol 2011;157 (3):1483-96. [51] Zhou C, Han L, Hou C, Metelli A, Qi L, Tadege M, et al. Developmental analysis of a Medicago truncatula smooth leaf margin1 mutant reveals context-dependent effects on

[52] Tadege M&Mysore K. Tnt1 retrotransposon tagging of STF in Medicago truncatula reveals tight coordination of metabolic, hormonal and developmental signals during

[53] Uppalapati SR, Ishiga Y, Doraiswamy V, Bedair M, Mittal S, Chen J, et al. Loss of abaxial leaf epicuticular wax in Medicago truncatula irg1/palm1 mutants results in reduced spore differentiation of anthracnose and nonhost rust pathogens. Plant Cell

[54] Mitra RM, Gleason CA, Edwards A, Hadfield J, Downie JA, GED GO&SL. A Ca2+/calmodulin-dependent protein kinase required for symbiotic nodule development: Gene identification by transcript-based cloning. Science 2004;101(13 :4701. [55] O'Malley R and Ecker J. Linking genotype to phenotype using the Arabidopsis

[56] Fukai E, Umehara Y, Sato S, Endo M, Kouchi H, Hayashi M, et al. Derepression of the plant Chromovirus LORE1 induces germline transposition in regenerated plants. PLoS

[57] Fukai E, Soyano T, Umehara Y, Nakayama S, Hirakawa H, Tabata S, et al. Establishment of a Lotus japonicus gene tagging population using the exon-targeting

[58] Urbanski DF, Malolepszy A, Stougaard Jand S.Ugerrǿj. Genome-wide LORE1 retrotransposon mutagenesis and high-throughput insertion detection in Lotus

[59] Rakocevic A, Mondy S, Tirichine Ll, Cosson V, Brocard L, Iantcheva A, et al. MERE1, a low-copy-number copia-type retroelement in Medicago truncatula active during tissue

[61] Cheng X, Wen J, Tadege M, Ratet P. Reverse genetics in medicago truncatula using Tnt1

[62] Endre G, Kereszt A, Kevei Zn, Mihacea S, Kalò Pand Kiss G. A receptor kinase gene

[63] Ané JM, Kiss GrB, Riely BK, Penmetsa RV, Oldroyd G, Ayax C, et al. Medicago truncatula DMI1 required for bacterial and fungal symbioses in legumes. Science

hemolytic saponins. Plant Cell 2011;23 (8):3070-81.

compound leaf development. Plant Cell 2011;23 (6):2106-24.

leaf morphogenesis. Mob Genet Elements 2011;1 (4):301-3.

unimutant collection. The Plant Journal 2010;61 (6) :928-40.

endogenous retrotransposon LORE1. Plant J 2012;69 (4):720-30.

[60] Ratet P. Medicago truncatula handbook. ed. Noble Foundation; 2006.

insertion mutants. Methods in molecular biology 2011;678 :179-190.

regulating symbiotic nodule development. Nature 2002;417 (6892):962-6.

2012;24 (1):353-70.

Genet 2010;6 (3):e1000868.

2004;303 (5662):1364-7.

japonicus. Plant J 2012;69 (4):731-41.

culture. Plant Physiol 2009;151 (3):1250-63.


[49] Carelli M, Biazzi E, Panara F, Tava A, Scaramelli L, Porceddu A, et al. Medicago truncatula CYP716A12 is a multifunctional oxidase involved in the biosynthesis of hemolytic saponins. Plant Cell 2011;23 (8):3070-81.

148 Functional Genomics

[35] Wang H, Chen J, Wen J, Tadege M, Li G, Liu Y, et al. Control of compound leaf development by FLORICAULA/LEAFY ortholog SINGLE LEAFLET1 in Medicago

[36] Vernié T, Moreau S, de Billy Fo, Plet J, Combier JP, Rogers C, et al. EFD Is an ERF transcription factor involved in the control of nodule number and differentiation in

[37] Teillet A, Garcia J, de Billy Fo, Gherardi Ml, Huguet T, Barker DG, et al. api, A novel Medicago truncatula symbiotic mutant impaired in nodule primordium invasion. Mol

[38] Arrighi JFo, Godfroy O, de Billy Fo, Saurat O, Jauneau A&GC. The RPG gene of Medicago truncatula controls Rhizobium-directed polar growth during infection. Proc

[39] Morandi D, le Signor C, Gianinazzi-Pearson V and Duc G. A Medicago truncatula mutant hyper-responsive to mycorrhiza and defective for nodulation. Mycorrhiza

[40] Chen J, Yu J, Ge L, Wang H, Berbel A, Liu Y, et al. Control of dissected leaf morphology by a Cys(2)His(2) zinc finger transcription factor in the model legume Medicago

[41] Laffont C, Blanchet S, Lapierre C, Brocard L, Ratet P, Crespi M, et al. The compact root architecture1 gene regulates lignification, flavonoid production, and polar auxin

[42] Vassileva V, Zehirov G, Ugrinova M. Variable leaf epidermal leaf morphology in Tnt1 insertional mutants of the model legume Medicago truncatula LEGUME MEDICAGO

[43] Zhao Q, Gallego-Giraldo L, Wang H, Zeng Y, Ding SY, Chen F&DR. An NAC transcription factor orchestrates multiple features of cell wall development in Medicago

[44] Wang D, Griffitts J, Starker C, Fedorova E, Limpens E, Ivanov S, et al. A nodule-specific protein secretory pathway required for nitrogen-fixing symbiosis. Science 2010;327

[45] Peng J, Yu J, ans Yingqing Guo ans Guangming Li HW, Bai G and and Chen R. Regulation of Compound Leaf Development in Medicago truncatula by Fused

[46] Schnabel E, Journet EP, de Carvalho-Niebel F, Duc G. and Frugoli J. The Medicago truncatula SUNN gene encodes a CLV1-like leucine-rich repeat receptor kinase that

[47] Murray JD. Invasion by invitation: rhizobial infection in legumes. Mol Plant Microbe

[48] Murray JD, Muni RRD, Torres-Jerez I, Tang Y, Allen S, Andriankaja M, et al. Vapyrin, a gene essential for intracellular progression of arbuscular mycorrhizal symbiosis, is also essential for infection by rhizobia in the nodule symbiosis of Medicago truncatula. Plant

Compound Leaf1, a Class M KNOX Gene. The Plant Cell 2011;23 :3929-43.

regulates nodule number and root length. Plant Mol Biol 2005;58 (6):809-22.

truncatula. Plant Physiol 2008;146 (4):1759-72.

Plant Microbe Interact 2008;21 (5):535-46.

Natl Acad Sci U S A 2008;105 (28):9817-22.

truncatula. Plant J 2010;63 (1):100-14.

2009;19 :4635-441.

(5969):1126-9.

Interact 2011;24 (6):631-9.

J 2011;65 (2):244-52.

Medicago truncatula. Plant Cell 2008;20 (10):2696-713.

truncatula. Proc Natl Acad Sci U S A 2010;107 (23):10754-9.

transport in Medicago truncatula. Plant Physiol 2010;153 (4):1597-607.

TRUNCATULA. Biotechnol\&Biotechnol Eq 2010;24(4) :2060-5.


[64] Lèvy J, Bres C, Geurts R, Chalhoub B, Kulikova O, Duc G, et al. A putative Ca2+ and calmodulin-dependent protein kinase required for bacterial and fungal symbioses. Science 2004;303 (5662):1361-4.

*Medicago truncatula* Functional Genomics – An Invaluable Resource for Studies on Agriculture Sustainability 151

[78] Laurie ÌR, Diwadkar P, Jaudal M, Zhang L, Hecht V, Wen J, et al. The Medicago FLOWERING LOCUS T Homolog, MtFTa1, Is a Key Regulator of Flowering Time. Plant

[79] Verdier J, Zhao J, Torres-Jerez I, Ge S, Liu C, He X, et al. MtPAR MYB transcription factor acts as an on switch for proanthocyanidin biosynthesis in Medicago truncatula.

[80] Hilson Pierre, Small Ian, Kuiper Martin. European consortia building reference resources for Arabidopsis functional genomics. Curr Opin Plant Biol 2012;6:426-9. [81] Ivashuta S, Liu J, Liu J, Lohar DP, Haridas S, Bucciarelli B, et al. RNA interference identifies a calcium-dependent protein kinase involved in Medicago truncatula root

[82] Limpens E, Mirabella R, Fedorova E, Franken C, Franssen H, Bisseling T&GR. Formation of organelle-like N2-fixing symbiosomes in legume root nodules is

[83] Arrighi JF, Barre A, Amor BB, Bersoult A, Soriano LC, Mirabella R, et al. The Medicago truncatula lysin [corrected] motif-receptor-like kinase gene family includes NFP and

[84] Gargantini PR, Gonzalez-Rizzo S, Chinchilla D, Raices M, Giammaria V, Ulloa RM, et al. A CDPK isoform participates in the regulation of nodule number in Medicago

[85] Gonzalez-Rizzo S, Crespi M and Frugier F. The Medicago truncatula CRE1 cytokinin receptor regulates lateral root development and early symbiotic interaction with

[86] Huo X, Schnabel E, Hughes K and Frugoli J. RNAi Phenotypes and the Localization of a Protein::GUS Fusion Imply a Role for Medicago truncatula PIN Genes in Nodulation. J

[87] Wasson AP, Pellerone FI. Silencing the flavonoid pathway in Medicago truncatula inhibits root nodule formation and prevents auxin transport regulation by rhizobia.

[88] Colditz F, Niehaus K and Krajinski F. Silencing of PR-10-like proteins in Medicago truncatula results in an antagonistic induction of other PR proteins and in an increased tolerance upon infection with the oomycete Aphanomyces euteiches. Planta 2007;226

[89] Kevei Zn, Lougnon G, Mergaert P, Horvàth GbV, Kereszt A, Jayaraman D, et al. 3 hydroxy-3-methylglutaryl coenzyme a reductase 1 interacts with NORK and is crucial

[90] Messinese E, Mun JH, Yeun LH, Jayaraman D, Rougé P, Barre A, et al. A novel nuclear protein interacts with the symbiotic DMI3 calcium- and calmodulin-dependent protein

kinase of Medicago truncatula. Mol Plant Microbe Interact 2007;20 (8):912-21. [91] Rosnoblet C, Aubry C, Leprince O, Vu BL, Rogniaux H and Buitink J. The regulatory gamma subunit SNF4b of the sucrose non-fermenting-related kinase complex is involved in longevity and stachyose accumulation during maturation of Medicago

for nodulation in Medicago truncatula. Plant Cell 2007;19 (12):3974-89.

controlled by DMI2. Proc Natl Acad Sci U S A 2005;102 (29):10375-80.

new nodule-expressed genes. Plant Physiol 2006;142 (1):265-79.

Sinorhizobium meliloti. Plant Cell 2006;18 (10):2680-93.

Physiology 2011;156 :2207-24.

Proc Natl Acad Sci U S A 2012;109 (5):1766-71.

development. Plant Cell 2005;17 (11):2911-21.

truncatula. Plant J 2006;48 (6):843-56.

Plant Growth Regul 2006;25 (2):156-65.

truncatula seeds. Plant J 2007;51 (1):47-59.

Plant Cell 2006;18 (7):1617-29.

(1):57-71.


*Medicago truncatula* Functional Genomics – An Invaluable Resource for Studies on Agriculture Sustainability 151

[78] Laurie ÌR, Diwadkar P, Jaudal M, Zhang L, Hecht V, Wen J, et al. The Medicago FLOWERING LOCUS T Homolog, MtFTa1, Is a Key Regulator of Flowering Time. Plant Physiology 2011;156 :2207-24.

150 Functional Genomics

Science 2004;303 (5662):1361-4.

Science 2005;308 (5729):1786-9.

Science 2005;308 (5729):1789-91.

U S A 2007;104 (5):1720-5.

Plant Cell 2009;21 (8):2323-40.

Acad Sci U S A 2010;107 (5):2343-8.

Physiol 2011;157 (1):328-40.

2009;151 (3):1239-49.

[64] Lèvy J, Bres C, Geurts R, Chalhoub B, Kulikova O, Duc G, et al. A putative Ca2+ and calmodulin-dependent protein kinase required for bacterial and fungal symbioses.

[65] Kalò P, Gleason C, Edwards A, Marsh J, Mitra RM, Hirsch S, et al. Nodulation signaling in legumes requires NSP2, a member of the GRAS family of transcriptional regulators.

[66] Smit P, Raedts J, Portyanko V, Debellé Fdr, Gough C, Bisseling T and Geurts R. NSP1 of the GRAS protein family is essential for rhizobial Nod factor-induced transcription.

[67] Benlloch R, d'Erfurth I, Ferrandiz C, Cosson V, Beltràn JP, Canas LA, et al. Isolation of mtpim proves Tnt1 a useful reverse genetics tool in Medicago truncatula and uncovers

[69] Kiss E, Olàh Br, Kalò P, Morales M, Heckmann AB, Borbola A, et al. LIN, a novel type of U-box/WD40 protein, controls early infection by rhizobia in legumes. Plant Physiol

[70] de Lorenzo L, Merchan F, Laporte P, Thompson R, Clarke J, Sousa C and Crespi M. A novel plant leucine-rich repeat receptor kinase regulates the response of Medicago

[71] Zhao J and Dixon R. MATE transporters facilitate vacuolar uptake of epicatechin 3'-Oglucoside for proanthocyanidin biosynthesis in Medicago truncatula and Arabidopsis.

[72] Peel GJ, Pang Y, Modolo LV. The LAP1 MYB transcription factor orchestrates anthocyanidin biosynthesis and glycosylation in Medicago. Plant J 2009;59 (1):136-49. [73] Benlloch R, Roque En, Ferràndiz C, Cosson V, Caballero T, Penmetsa RV, et al. Analysis of B function in legumes: PISTILLATA proteins do not require the PI motif for floral

[74] Lefebvre B, Timmers T, Mbengue M, Moreau S, Hervé C, Tòth K, et al. A remorin protein interacts with symbiotic receptors and regulates bacterial infection. Proc Natl

[75] Zhou R, Jackson L, Shadle G, Nakashima J, Temple S, Chen F. Distinct cinnamoyl CoA reductases involved in parallel routes to lignin in Medicago truncatula. Proceedings of the National Academy of Sciences National Acad Sciences; 2010;107(41) :17803-17808. [76] Naoumkina M and Dixon R. Characterization of the mannan synthase promoter from

[77] Schnabel EL, Kassaw TK, Smith LS, Marsh JF, Oldroyd GE, Long SR. The ROOT DETERMINED NODULATION1 gene regulates nodule number in roots of Medicago truncatula and defines a highly conserved, uncharacterized plant gene family. Plant

organ development in Medicago truncatula. Plant J 2009;60 (1):102-11.

guar (Cyamopsis tetragonoloba). Plant Cell Rep 2011;30 (6):997-1006.

truncatula roots to salt stress. Plant Cell 2009;21 (2):668-80.

new aspects of AP1-like functions in legumes. Plant Physiol 2006;142 (3):972-83. [68] Javot H, Penmetsa RV, Terzaghi N, Cook DR. A Medicago truncatula phosphate transporter indispensable for the arbuscular mycorrhizal symbiosis. Proc Natl Acad Sci


[92] Wan X, Hontelez J, Lillo A, Guarnerio C, van de Peut D, Fedorova E, et al. Medicago truncatula ENOD40-1 and ENOD40-2 are both involved in nodule initiation and bacteroid development. J Exp Bot 2007;58 (8):2033-41.

*Medicago truncatula* Functional Genomics – An Invaluable Resource for Studies on Agriculture Sustainability 153

[106] Gimeno-Gilles C, Gervais ML, Planchet E, Satour P, Limami AM. A stress-associated protein containing A20/AN1 zing-finger domains expressed in Medicago truncatula

[107] Horchani F, Prèvot M, Boscari A, Evangelisti E, Meilhoc E, Bruand C, et al. Both plant and bacterial nitrate reductases contribute to nitric oxide production in Medicago

[108] Pauly N, Ferrari C, Andrio E, Marino D, Piardi Sp, Brouquisse R, et al. MtNOA1/RIF1 modulates Medicago truncatula-Sinorhizobium meliloti nodule development without

[109] Riely BK, He H, Venkateshwaran M, Sarma B, Schraiber J, Anè J-M&CD. Identification of legume RopGEF gene families and characterization of a Medicago truncatula

[110] Kiirika LM, Bergmann HF, Schikowsky C, Wimmer D, Korte J, Schmitz U, et al. Silencing of the Rac1 GTPase MtROP9 in Medicago truncatula Stimulates Early Mycorrhizal and Oomycete Root Colonizations But Negatively Affects Rhizobial

[111] de Zélicourt A, Diet A, Marion J, Laffont C, Ariel F, Moison Ml, et al. Dual involvement of a Medicago truncatula NAC transcription factor in root abiotic stress

[112] Senthil-Kumar M and Mysore K. New dimensions for VIGS in plant functional

[113] Grǿnlund M, Constantin G, Piednoir E, Kovacev J, Johansen IE. Virus-induced gene silencing in Medicago truncatula and Lathyrus odorata. Virus Res 2008;135 (2):345-9. [114] Vàrallyay E, Lichner Z, Sáfrány J, Havelda Z, Salamon P, Bisztray G&Bn. Development of a virus induced gene silencing vector from a legumes infecting

[115] Ollivier J, Töwe S, Bannert A, Hai B, Kastl EM, Meyer A, et al. Nitrogen turnover in

[116] Galloway JN, Townsend AR, Erisman JW, Bekunda M, Cai Z, Freney JR, et al. Transformation of the nitrogen cycle: recent trends, questions, and potential solutions.

[117] Escaray FJ, Menendez AB, Gárriz A, Pieckenstain FL, Estrella MJ, Castagno LN, et al. Ecological and agronomic importance of the plant genus Lotus. Its application in grassland sustainability and the amelioration of constrained and contaminated soils.

[119] Dixon RA. Flavonoids and isoflavonoids: from plant biology to agriculture and

[120] Ho L, Chen LH, Wang J, Zhao W, Talcott ST, Ono K, et al. Heterogeneity in red wine polyphenolic contents differentially influences Alzheimer's disease-type

neuropathology and cognitive deterioration. J Alzheimers Dis 2009;16 (1):59-72.

[118] Howieson JG. Nitrogen-fixing leguminous symbioses. %P ed. Springer; 2008.

truncatula nitrogen-fixing nodules. Plant Physiol 2011;155 (2):1023-36.

RopGEF mediating polar growth of root hairs. Plant J 2011;65 (2):230-43.

response and symbiotic nodule senescence. Plant J 2012;70 (2):220-30.

affecting its nitric oxide content. J Exp Bot 2011;62 (3):939-48.

seeds. Plant Physiol Biochem 2011;49 (3):303-10.

Infection. Plant Physiol 2012;159 (1):501-16.

genomics. Trends Plant Sci 2011;16 (12):656-65.

tobamovirus. Acta Biol Hung 2010;61 (4):457-69.

neuroscience. Plant Physiol 2010;154 (2):453-7.

Science 2008;320 (5878):889-92.

Plant Sci 2012;182 :121-33.

soil and global change. FEMS Microbiol Ecol 2011;78 (1):3-16.


[106] Gimeno-Gilles C, Gervais ML, Planchet E, Satour P, Limami AM. A stress-associated protein containing A20/AN1 zing-finger domains expressed in Medicago truncatula seeds. Plant Physiol Biochem 2011;49 (3):303-10.

152 Functional Genomics

(2):741-51.

(3):716-33.

(12):1577-87.

[92] Wan X, Hontelez J, Lillo A, Guarnerio C, van de Peut D, Fedorova E, et al. Medicago truncatula ENOD40-1 and ENOD40-2 are both involved in nodule initiation and

[93] Zhang J, Subramanian S, Zhang Y&Yu O. Flavone synthases from Medicago truncatula are flavanone-2-hydroxylases and are important for nodulation. Plant Physiol 2007;144

[94] Floss DS, Hause B, Lange PR, Kùster H, Strack D and Walter M. Knock-down of the MEP pathway isogene 1-deoxy-D-xylulose 5-phosphate synthase 2 inhibits formation of arbuscular mycorrhiza-induced apocarotenoids, and abolishes normal expression of

[95] Floss DS, Schliemann W, Schmidt JÃ, Strack D and Walter M. RNA interferencemediated repression of MtCCD1 in mycorrhizal roots of Medicago truncatula causes accumulation of C27 apocarotenoids, shedding light on the functional role of CCD1.

[96] Chen SK, Kurdyukov S, Kereszt A, Wang XD, Gresshoff PM. The association of homeobox gene expression with stem cell formation and morphogenesis in cultured

[97] Haney CH. Plant flotillins are required for infection by nitrogen-fixing bacteria. Proc

[98] Kuhn H, Kùster H&RN. Membrane steroid-binding protein 1 induced by a diffusible fungal signal is critical for mycorrhization in Medicago truncatula. New Phytol 2010;185

[99] Kuppusamy KT, Ivashuta S, Bucciarelli B, Vance CP, Gantt JS. Knockdown of CELL DIVISION CYCLE16 reveals an inverse relationship between lateral root and nodule numbers and a link to auxin in Medicago truncatula. Plant Physiol 2009;151 (3):1155-66. [100] Peleg-Grossman S, Golani Y, Kaye Y, Melamed-Book N and Levine A. NPR1 protein regulates pathogenic and symbiotic interactions between Rhizobium and legumes and

[101] Laporte P, Satiat-Jeunemaìtre B, Velasco I, Csorba T, Van de Velde W, Campalans A, et al. A novel RNA-binding peptide regulates the establishment of the Medicago truncatula-Sinorhizobium meliloti nitrogen-fixing symbiosis. Plant J 2010;62 (1):24-38. [102] Pii Y, Astegno A, Peroni E, Zaccardelli M, Pandolfini T and Crimi M. The Medicago truncatula N5 gene encoding a root-specific lipid transfer protein is required for the symbiotic interaction with Sinorhizobium meliloti. Mol Plant Microbe Interact 2009;22

[103] Pumplin N, Mondo SJ, Topp S, Starker CG, Gantt JS. Medicago truncatula Vapyrin is a novel protein required for arbuscular mycorrhizal symbiosis. Plant J 2010;61 (3):482-94. [104] Zdyb A, Demchenko K, Heumann J, Mrosk C, Grzeganek P, Gòbel C, et al. Jasmonate biosynthesis in legume and actinorhizal nodules. New Phytol 2011;189 (2):568-79. [105] Msehli SE, Lambert A, Baldacci-Cresp F, Hopkins J, Boncompagni E, Smiti SA, et al. Crucial role of (homo)glutathione in nitrogen fixation in Medicago truncatula nodules.

bacteroid development. J Exp Bot 2007;58 (8):2033-41.

Plant Physiol 2008;148 (3):1267-82.

Medicago truncatula. Planta 2009;230 (4):827-40.

Natl Acad Sci U S A 2010;107 (1):478-83.

non-legumes. PLoS One 2009;4 (12):e8399.

New Phytol 2011;192 (2):496-506.

mycorrhiza-specific plant marker genes. Plant J 2008;56 (1):86-100.


[121] Pasinetti GM, Zhao Z, Qin W, Ho L, Shrishailam Y, Macgrogan D, et al. Caloric intake and Alzheimer's disease. Experimental approaches and therapeutic implications. Interdiscip Top Gerontol 2007;35 :159-75.

© 2012 Pathak and Ali, licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

**Chapter 8** 

## **Repetitive DNA: A Tool to Explore Animal Genomes/Transcriptomes**

Deepali Pathak and Sher Ali

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/48259

#### **1. Introduction**

The analysis of genetic diversity and relatedness within and between species and populations has been a major theme of research for many biologists. With whole-genome sequences available for an increasing number of species, the focus has shifted to the development of molecular markers based on DNA or protein polymorphism. DNA sequences originate and undergo evolutionary change, and thus may be used as powerful genetic markers to characterize the genomes of a wide range of species. This type of analysis is called fingerprinting, profiling or genotyping. DNA profiling, based on typing individuals using highly variable minisatellites in the human genome, was first developed by Jeffreys et al (1985). They demonstrated that short repeat sequences are tandemly arranged within genes and that each individual has a unique pattern of arrangement of these minisatellites, the only exception being multiple individuals derived from a single zygote (e.g. identical twins). The DNA fingerprinting technique has notably been used to help solve crimes and determine paternity. In addition, with advances in molecular biology techniques, isolation of genes tagged with minisatellites has become one of the most powerful tools for genome analysis.

The term "repetitive sequences" (repeats, DNA repeats, repetitive DNA) refers to DNA fragments that are present in multiple copies in the genome. These sequences exhibit a high degree of polymorphism due to variation in the number of their repeat units, caused by mutations involving several mechanisms (Tautz, 1989). This hypervariability among related and unrelated organisms makes them excellent markers for mapping, genome characterization, genotype-phenotype correlation, marker-assisted selection of crop plants, molecular ecology and diversity-related studies. The nature of repeats offers considerable flexibility over other marker systems, because short tandem repeat (STR) sequences (i) are evenly distributed throughout the genome, (ii) are often conserved between closely related species, and (iii) are co-dominant. With these innate attributes, very small quantities of DNA can be used for simultaneous detection of the alleles tagged with STRs employing minisatellite associated sequence amplification (MASA).
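The idea that alleles at a tandem-repeat locus differ in the number of repeat units can be illustrated with a short sketch. The function name `max_tandem_copies`, the GATA motif and both allele sequences are hypothetical and chosen only for illustration; this is a string-level toy and not the MASA procedure used in the chapter.

```python
import re


def max_tandem_copies(sequence: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    # Match one or more consecutive copies of the unit and keep the longest run.
    pattern = re.compile(f"(?:{re.escape(unit.upper())})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


# Hypothetical alleles: identical flanking sequence, different GATA copy number.
allele_a = "CCTAG" + "GATA" * 7 + "TTGCA"
allele_b = "CCTAG" + "GATA" * 11 + "TTGCA"

for name, allele in (("allele_a", allele_a), ("allele_b", allele_b)):
    print(name, max_tandem_copies(allele, "GATA"))
```

Because the two alleles differ in length by a whole number of repeat units, they can be told apart by fragment size, which is what makes such loci useful as co-dominant markers.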

Repetitive DNA: A Tool to Explore Animal Genomes/Transcriptomes 157

2002). The later includes satellite, minisatellites and microsatellites. Satellite DNAs are predominantly associated with centromeric heterochromatin and the same is being increasingly utilized as a versatile tool for genome analysis, genetic mapping and for understanding chromosomal organization. On the other hand minisatellite and microsatellites are dispersed throughout the genome and are highly polymorphic in all populations studied. This arrangement has led to their extensive use as genetic markers for fingerprinting, genotyping, and for forensic analysis in human system. Based on their

These are are short sequences (5 to10 bp) amounting 10% of the genome and repeated a number of times, usually occurring as tandem repeats (present in approximately 106 copies per haploid genome). However, they are not interspersed with different non-repetitive sequences. Usually, the sequence of each repeating unit is conserved. Most of the sequences in this class are located in the heterochromatin regions of the centromeres or telomeres of the chromosomes. Highly repetitive sequences interacting with specific proteins are

These are represented by monomer sequences, usually less than 2000-bp long, tandemly reiterated up to 105 copies per haploid animals and located in the pericentromeric and or telomeric heterochromatic regions (Charlesworth et al 1994). Satellite DNA constitutes from 1 to 65% of the total DNA of numerous organisms, including that of animals, plants, and prokaryotes. The term "satellite" in the genetic sense was first coined by the Russian cytologist Sergius Navashin, in 1912, initially in Russian ("sputnik") and Latin (*satelle*), and was later translated to "satellite" (Battaglia, 1999). The more familiar usage of "satellite" relates to a small band of DNA with a density different (usually lower, because of a high AT-content) from the bulk of the genomic DNA, which are separated from the main band following CsCl centrifugation (Kit, 1961). Nucleotide changes and copy number variations fuel the process of their evolution within and across the species (Ugarkovic and Plohl, 2002). Satellite fraction(s), though not conserved evolutionarily (Bhatnagar et al 2004; Amor and Choo, 2002), are unique to a species and usually show similarity amongst related group of

These include short (150 to 300-bp) sequences or long ones (5-kbp) amounting about 40% and 1-2% of the total genome, respectively. These are dispersed throughout the euchromatin having 103-105 copies per haploid genome. These sequences are involved in the regulation of gene expression. In some cases, long dispersed repeats of 300 to 600-bp show homology

arrangements, repetitive DNA sequences are classed into two types **(**Figure 1).

involved in organizing chromosome pairing during meiosis and recombination.

animals (Pathak et al 2011; Henikoff et al 2001; Ali and Gangadharan, 2000).

**2.2. Moderately or dispersed repetitive sequences** 

**2.1. Highly repetitive sequences** 

*2.1.1. Satellite DNA* 

with the retro viruses.

Current data base has the information on the genomes of various livestock species, like cattle, sheep, goat, pig, horse, chicken, Silkworm, Honey Bee, Rabbit, Dog, cat and duck (Georges and Andersson, 1996). However complete sequence analysis of several important species such as Yak, Banteng, Zebu, Donkey, Goose, Turkey, Camel and Water buffalo are still underway. Efforts are required to characterize genes controlling important traits in order to produce genetically healthy breeds and segregate superior germplasm wherever possible. Water buffalo, *Bubalus bubalis* is important domestic animal worldwide having immense potential in agriculture, dairy and meat industries. We have studied several repeat loci in buffalo genome using Restriction Fragment Length Polymorphism and characterized a number of important genes employing Minisatellite Associated Sequence Amplification (MASA).

MASA forms a rich basis for functional and comparative genomics, contributing to the understanding of genome organization, gene expression and the development of molecular synteny. This approach also enables characterization of the same genes across individuals within a species and between species. Thus, information about the organization of a gene, its expressional, mutational and phylogenetic status, its chromosomal location and its genetic variation across genomes maximizes the chances of narrowing the search for possible genetic markers.

In this chapter, we discuss the overall organization of repetitive sequences, their origin, distribution, application in genome analysis and implications. In addition, the use of repetitive sequences in bubaline genome mining is highlighted to illustrate the potential of functional and comparative genomics. The organizational variation and expression profile of a single gene originating from a specific tissue may thus be studied in many ways to meet the varying requirements of biology.

#### **2. Organization of repetitive sequences**

Mammals have approximately 3 billion base pairs per haploid genome, harboring about 20,000-25,000 genes. A minor part of the genome (5-10%) consists of coding sequences (International Human Genome Sequencing Consortium, 2004; Hochgeschwender and Brennan, 1991) and the remaining part is non-coding, largely representing repetitive DNA (Bromham, 2002). Comparison of the genome sizes of different eukaryotes shows that the amount of non-coding DNA is highly variable and constitutes 30% to about 99% of the total genome (Elgar and Vavouri, 2008; Cavalier-Smith, 1985). The non-coding repetitive sequences are dynamic elements that reshape their host's genome by generating rearrangements, shuffling genes and modulating patterns of expression. This dynamism of repeats leads to evolutionary divergence that can be used in species identification, in phylogenetic inference and for studying the processes of sporadic mutation and natural selection. These repetitive sequences are mainly composed of interspersed and tandem repeats (Slamovits and Rossi, 2002). The latter include satellites, minisatellites and microsatellites. Satellite DNAs are predominantly associated with centromeric heterochromatin and are increasingly utilized as a versatile tool for genome analysis, genetic mapping and understanding chromosomal organization. Minisatellites and microsatellites, on the other hand, are dispersed throughout the genome and are highly polymorphic in all populations studied. This property has led to their extensive use as genetic markers for fingerprinting, genotyping and forensic analysis in humans. Based on their arrangement, repetitive DNA sequences are classed into two types (Figure 1).

#### **2.1. Highly repetitive sequences**

These are short sequences (5 to 10 bp) amounting to about 10% of the genome. They are repeated many times, usually as tandem repeats (approximately 10^6 copies per haploid genome), and are not interspersed with non-repetitive sequences. Usually, the sequence of each repeating unit is conserved. Most of the sequences in this class are located in the heterochromatic regions of the centromeres or telomeres of the chromosomes. Highly repetitive sequences that interact with specific proteins are involved in organizing chromosome pairing during meiosis and recombination.

#### *2.1.1. Satellite DNA*

These are represented by monomer sequences, usually less than 2000 bp long, tandemly reiterated up to 10^5 copies per haploid genome and located in the pericentromeric and/or telomeric heterochromatic regions (Charlesworth et al 1994). Satellite DNA constitutes from 1 to 65% of the total DNA of numerous organisms, including animals, plants and prokaryotes. The term "satellite" in the genetic sense was first coined by the Russian cytologist Sergius Navashin in 1912, initially in Russian ("sputnik") and Latin (*satelles*), and was later translated to "satellite" (Battaglia, 1999). The more familiar usage of "satellite" relates to a small band of DNA with a density different (usually lower, because of a high AT content) from that of the bulk of the genomic DNA, which separates from the main band following CsCl centrifugation (Kit, 1961). Nucleotide changes and copy number variations fuel the evolution of satellites within and across species (Ugarkovic and Plohl, 2002). Satellite fractions, though not conserved evolutionarily (Bhatnagar et al 2004; Amor and Choo, 2002), are unique to a species and usually show similarity amongst related groups of animals (Pathak et al 2011; Henikoff et al 2001; Ali and Gangadharan, 2000).
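The link between AT content and band position in a CsCl gradient can be illustrated with a short calculation. The sketch below assumes the commonly quoted empirical linear relation between GC fraction and buoyant density, rho = 1.660 + 0.098 x GC (g/cm^3), which is not given in this chapter; the two sequences are hypothetical and serve only to show the direction of the effect.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def buoyant_density(seq: str) -> float:
    """Approximate CsCl buoyant density in g/cm^3.

    Assumption: empirical linear relation rho = 1.660 + 0.098 * GC,
    brought in from outside the chapter for illustration only.
    """
    return 1.660 + 0.098 * gc_fraction(seq)


# Hypothetical sequences, not taken from any real genome.
bulk_dna = "ATGCGCTAGCCGATGCGTAC"        # 60% GC
at_rich_satellite = "AATATAATATTAATATAATA"  # 0% GC

for name, seq in [("bulk", bulk_dna), ("satellite", at_rich_satellite)]:
    print(f"{name}: GC = {gc_fraction(seq):.2f}, "
          f"density = {buoyant_density(seq):.3f} g/cm^3")
```

Under this assumption an AT-rich satellite has a lower predicted density than the bulk DNA, which is why it would resolve as a separate, lighter band.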

#### **2.2. Moderately or dispersed repetitive sequences**

These include short (150 to 300 bp) and long (5 kbp) sequences amounting to about 40% and 1-2% of the total genome, respectively. They are dispersed throughout the euchromatin, with 10^3 to 10^5 copies per haploid genome, and are involved in the regulation of gene expression. In some cases, long dispersed repeats of 300 to 600 bp show homology with retroviruses.

**Figure 1.** Schematic diagram showing biological categories of the different repetitive sequences.

On the basis of their mode of amplification, repetitive DNA sequences may be tandemly arranged or interspersed in the genome (Slamovits and Rossi, 2002).

#### *2.2.1. Interspersed repetitive DNA*

Interspersed repeat sequences, scattered throughout the genome, have arisen by transposition, that is, through their "ability to jump from one place to another in the genome" (Miller and Capy, 2004; Brown, 2002). Even though the individual units of interspersed repetitive non-coding DNA are not clustered, taken together they account for approximately 45% of the human genome. According to their mechanism of transposition, interspersed repeats are divided into two classes:

#### *2.2.1.1. RNA transposons*

RNA transposons, also known as retroelements, are found in eukaryotic genomes and require reverse transcription for their activity. Based on their structural relationships, RNA transposons are divided into two general categories:

#### *2.2.1.1.1. LTR elements*

LTR elements include the retroviruses, whose genomes are made up of RNA. They infect different types of vertebrates.

#### Endogenous retroviruses (ERVs)

These are retroviruses integrated into the vertebrate chromosomes and inherited from generation to generation as part of the host genome. Some are still active and might, at some stage in a cell's lifetime, direct synthesis of exogenous viruses. However, the majority are decayed relics that no longer have the capacity to form viruses (Patience et al 1997).

Retrotransposons are the biggest class of transposons. An important characteristic of this type of transposable element is that it usually contains sequences with potential regulatory activity. Retrotransposons are features of non-vertebrate eukaryotic genomes (i.e. plants, fungi, invertebrates and microbial eukaryotes). These elements code for an mRNA molecule that is processed and polyadenylated. Retrotransposons have very high copy numbers. In maize, these elements occupy half of the genome.

#### *2.2.1.1.2. Non LTR elements*

#### LINEs (Long Interspersed Nuclear Elements)

LINEs are several thousand base pairs long and make up about 17% of the human genome (Richard and Batzer, 2009). They contain a reverse-transcriptase-like gene involved in the retrotransposition process. Many LINEs also code for an endonuclease (e.g. RNase H). The most abundant LINE family is the 7-kbp L1 repeat element, which has >500,000 copies and accounts for approximately 15% of the human genome (Lander et al 2001). Despite its abundance, no function of the LINE-1 repeat is yet known. Initial studies in mouse have implicated L1s in shaping the structure and expression of the transcriptome (Han et al 2004; Han and Boeke, 2005).

#### SINEs (Short Interspersed Nuclear Elements)

SINEs are small elements, usually 100 to 500 bp in length, accounting for 11% of the human genome (Richard and Batzer, 2009). SINEs do not have a reverse transcriptase gene; instead, they borrow reverse transcriptase enzymes from other retroelements. A well-known example of a SINE in the human genome is the *Alu* family (Capy et al 1998), whose members are about 350 base pairs long, contain no coding sequences and are present in over 1 million copies (Roy-Engel et al 2001).
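The figures quoted for *Alu* can be checked against one another with simple arithmetic. The sketch below multiplies copy number by unit length and divides by the haploid genome size of about 3 billion base pairs given earlier in the chapter; the function name `genome_fraction` is illustrative. It treats every copy as full length, which is only a rough approximation.

```python
HAPLOID_GENOME_BP = 3_000_000_000  # approximate mammalian haploid genome size


def genome_fraction(copies: int, unit_bp: int,
                    genome_bp: int = HAPLOID_GENOME_BP) -> float:
    """Fraction of the genome occupied by `copies` elements of `unit_bp` each.

    Assumes every copy is full length, so this is an upper-bound style estimate.
    """
    return copies * unit_bp / genome_bp


# Alu: about 1 million copies of roughly 350 bp each.
alu = genome_fraction(1_000_000, 350)
print(f"Alu share of the genome: {alu:.1%}")
```

The result, a little under 12%, is in line with the roughly 11% attributed to SINEs as a whole. The same full-length assumption would not be appropriate for L1, since many L1 copies in the genome are truncated.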

#### *2.2.1.2. DNA Transposons*

DNA transposons do not require an RNA intermediate and transpose in a direct DNA-to-DNA manner. In eukaryotes, DNA transposons are less common than retrotransposons, but they have a special place in genetics because a family of plant DNA transposons, the Ac/Ds elements of maize, were the first transposable elements to be discovered. There are two types of DNA transposons, both of which require enzymes coded by genes within the transposons.

#### *2.2.2. Tandem repeats*

Tandem repeats consist of arrays of two to several thousand sequence units arranged in a head-to-tail fashion. They may be further classified according to the length and copy number of the basic repeat unit as well as their genomic localization.

#### *2.2.2.1. Mega satellite DNA*

These are characterized by tandemly repeated DNA in which the repeat unit is reiterated approximately 50-400 times, producing blocks that can be hundreds of kilobases long. Some mega satellites are composed of coding repeats, for example RNA genes and the deubiquitinating enzyme gene USP17.

#### *2.2.2.2. Minisatellite DNA*

This comprises tandem copies of repeats that are 6-100 nucleotides in length (Tautz, 1993). Alec Jeffreys first described minisatellites in 1985, from the non-coding (intron) regions of the human myoglobin gene. Since then, similar DNA structures have been reported in many organisms, including bacterial (Skuce et al 2002), avian (Reed et al 1996), higher plant (Sykorová et al 2006; Durward et al 1995), protozoan (Feng et al 2011; Bishop et al 1998) and yeast (Kelly et al 2011; Haber and Louis, 1998) genomes. Comparison of the repeat units in classical minisatellites led to the early notion of consensus or core sequences, which exhibit some behavioral similarities with the *Chi* sequence of λ phage (GCTGTGG). Also called variable number of tandem repeats (VNTR) (Brown, 2002), the majority of minisatellites are GC rich, with a strong strand asymmetry. Minisatellites often form families of related sequences that occur at many hundred loci in the nuclear genome. In the human genome, the number of minisatellite loci is estimated to be approximately 3000, and each locus contains a distinctive repeat unit with respect to size and sequence content. The degree of repetition ranges from two to several hundred. Repeat units within a minisatellite usually display small variations in sequence. Minisatellite mutations usually consist of gains or losses of one or more repeat units. Such mutations at hypervariable minisatellite loci are up to 1000 times more common than mutations in protein-coding genes (Debrauwere et al 1997).

In humans, the majority of minisatellites are clustered near the sub-telomeric ends of the chromosomes, limiting their usefulness for extensive gene mapping (Lopes et al 2006; Royle et al 1988), although there are examples of interstitial locations, such as the alpha globin gene cluster (Proudfoot et al 1982) and the type II collagen gene (Stoker et al 1985). Minisatellites of other species, such as mice or bovines (Georges et al 1991), are not always preferentially clustered at chromosomal termini as in the human genome, but are distributed along the entire length of the chromosomes. Unlike microsatellites, which usually alter during the DNA synthesis stage of the mitotic cell cycle, minisatellites alter during meiosis, undergoing changes in overall length and repeat composition (Jarman and Wells, 1989; Jeffreys et al 1998). Minisatellite tracts have proven very useful for genomic mapping (Legendre et al 2007; Jeffreys et al 1985) and linkage studies (Nakamura et al 1987). Examples of human minisatellites used for fingerprinting include the consensus sequences of the 33.6 and 33.15 repeat loci. Other minisatellite sequences, according to Ali and Wallace (1988), are listed in Table 1.

#### *2.2.2.2.1. Telomeric repeats*

These are composed of multiple repeats of short sequence elements (typically 5 to 8 bp in length, with a GT-rich strand oriented 5' to 3' toward the end of the chromosome) and range in length from a few repeat units to >10 kbp. Long simple-sequence tandem repeats of interstitial TTAGGG arrays form a three-dimensional nuclear network of poorly transcribed domains, which is involved in gene silencing by repositioning. This network, together with clusters of retroelements properly positioned in the nucleus, forms unique lineage-specific structures that affect gene expression (Tomilin, 2008). The repeated sequence (TTAGGG)n is found at telomeres in all vertebrates, certain slime molds and trypanosomes; (TTGGGG)n and (TTTGGGG)n are found in the ciliated protozoan *Tetrahymena* and *Oxytricha* species, respectively; and (TG1-3)n is found in the yeast *Saccharomyces cerevisiae*. In organisms whose telomeres have been examined in detail, the GT strand extends 12 to 16 nucleotides (two repeats) beyond the complementary C-rich strand. The unique structure of the telomere is involved in maintaining the integrity of the chromosome ends.

| S.No. | Probe name | Sequence 5'-3' | Total length | Length of repeat unit | No. of repeat units |
|---|---|---|---|---|---|
| 1 | 0-33.6-22 | (CCTCCAGCCCT)2 | 22 | 11 | 2 |
| 2 | 0-33.6-37 | GCCCTTCCTCCGGAGCCCTCCTCCAGCCCTTCCTCCA | 37 | 37 | 1 |
| 3 | 0-33.15-32 | (CACCTCTCCACCTGCC)2 | 32 | 16 | 2 |
| 4 | 0-33.15-80 | (CACCTCTCCACCTGCC)5 | 80 | 16 | 5 |
| 5 | 0-AY-29 | GAGGARYAGAAAGGYGRGYRVTGTGGGCGC | 29 | 37 | 1 |
| 6 | 0-YN-124 | TCCTGAACAACCCCACTGTACTTCCCA | 27 | 31 | 1 |
| 7 | 0-33.1 | GTGCCTGCTTCCCTTCCCTCTCTTGTC | 27 | 62 | 1 |
| 8 | 0-34BHI | CCTGCTCCGCTCACGTGGCCCACGCAC | 27 | ? | 1 |
| 9 | 0-CCR-26 | (CCR)8CC | 26 | 3 | 8.67 |
| 10 | 0-CCA-26 | (CCA)8CC | 26 | 3 | 8.67 |
| 11 | 0-H-Ras | CACTCCCCCTTCTCTCCAGGGGACGCCA | 28 | 28 | 1 |
| 12 | 0-GACA-16 | (GACA)4 | 16 | 4 | 4 |
| 13 | 0-GACA-24 | (GACA)6 | 24 | 4 | 6 |

**Table 1.** Sequences and hybridization characteristics of the oligonucleotide probes (Ali and Wallace, 1988)
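The compact probe notation used for oligonucleotide probes of this kind, in which a unit in parentheses is followed by its repeat count, can be expanded mechanically to obtain the full sequence and its total length. The sketch below is illustrative only: the helper name `expand_probe` and the small grammar it accepts (parenthesised unit plus count, optionally mixed with plain runs of letters) are assumptions, not part of the original probe description.

```python
import re

# One token is either "(UNIT)count" or a plain run of letters.
_TOKEN = re.compile(r"\(([A-Z]+)\)(\d+)|([A-Z]+)")


def expand_probe(notation: str) -> str:
    """Expand notation such as '(GACA)4' or '(CCA)8CC' into the full sequence."""
    parts = []
    pos = 0
    for match in _TOKEN.finditer(notation):
        if match.start() != pos:
            raise ValueError(f"unexpected text at position {pos} in {notation!r}")
        pos = match.end()
        if match.group(1) is not None:
            # Repeated unit: write the unit out 'count' times.
            parts.append(match.group(1) * int(match.group(2)))
        else:
            # Plain, non-repeated stretch of bases.
            parts.append(match.group(3))
    if pos != len(notation):
        raise ValueError(f"unexpected text at position {pos} in {notation!r}")
    return "".join(parts)


for probe in ["(CCTCCAGCCCT)2", "(CACCTCTCCACCTGCC)5", "(CCA)8CC", "(GACA)6"]:
    seq = expand_probe(probe)
    print(probe, len(seq), seq)
```

For a probe written as (CCA)8CC, the expanded length of 26 divided by the 3-bp unit gives about 8.67 repeat units, which is how a fractional repeat number can arise.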

#### *2.2.2.2.2. Subtelomeric repeats*

Sub-telomeric repeats are classes of repetitive sequences that are interspersed within the last 500,000 bases of non-repetitive DNA located adjacent to the telomere. Some sequences are chromosome specific, whereas others seem to be present near the ends of all human chromosomes (Norman, 2001).

#### *2.2.2.3. Microsatellite /Short Sequence Repeats (SSRs)*

Tandem repeats made up of short units (1-6 bp), usually di-, tri- or tetranucleotides, were earlier called simple sequences (Tautz and Renz, 1984). Later, this class of DNA was named microsatellites (Tautz, 1989). Microsatellites, or simple sequence repeats (SSRs), are ubiquitously interspersed in the coding and non-coding regions of eukaryotic and prokaryotic genomes (Gur-Arie et al 2000; Toth et al 2000). Taken together, SSRs occupy about 3% of the human genome, in which they are widely dispersed and associated with many genes (Subramanian et al 2003). The significance of specific microsatellites in different regions is not completely understood. However, some microsatellites occurring in the flanking regions of coding sequences are believed to play significant roles in the regulation of gene expression by forming various DNA secondary structures and offering a mechanism of unwinding (Catasti et al 1999). Variation in the length and unit type of simple repeats in upstream activation sequences might influence transcriptional activity (Kim and Mullet, 1995; Epplen et al 1996; Martienssen and Colot, 2001; Zhang et al 2004) and affect interaction with different regulatory proteins during translation (Lue et al 1989).
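A simple way to see what counts as a microsatellite under this definition is to scan a sequence for perfect tandem runs of a 1-6 bp unit. The following is a minimal sketch using a regular expression with a back-reference; the function name `find_ssrs`, the default threshold of three repeats and the test sequence are assumptions chosen for illustration, and real SSR-mining tools typically use motif-specific thresholds and tolerate imperfect repeats.

```python
import re


def find_ssrs(seq: str, min_repeats: int = 3, max_unit: int = 6):
    """Return (start, unit, repeat_count) for perfect tandem repeats.

    The lazy quantifier makes the regex try the shortest unit first, so
    'CACACACA' is reported with unit 'CA' rather than 'CACA'.
    """
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


# Hypothetical sequence containing a (GACA)4 tract flanked by other bases.
print(find_ssrs("TTGACAGACAGACAGACATT"))
```

The scan is non-overlapping and reports only perfect repeats, which keeps the example short at the cost of missing interrupted or compound microsatellites.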

#### **4. Functional significance of repetitive sequences**

With respect to the functional roles of these sequences, uncertainty persisted for a long time and it was largely believed that they represent the detritus of the genome (Ohno, 1972). However, recent studies have shown that repeat elements influence the structure, function and evolution of the chromosomes in host genomes (Sinden, 1999; Dey and Rath, 2005; Tang, 2011). Their association with the promoters and coding regions of genes has made them very attractive objects of study. Transcription, mRNA processing, translation, folding, stability and aggregation rates, as well as gross morphology, have been found to be incrementally affected by alterations in tracts of tandem repeats (Fondon and Garner, 2004; Vinces, 2009). The human genome provides many instances of regulatory regions embedded in the remnants of repeat elements (Jordan et al. 2003), and studies have documented the participation of repeat sequences in the regulation of gene expression (Boeva et al., 2006). This suggests that repeat elements play a major architectonic role in the higher-order physical structuring of the genome (Shapiro and Sternberg, 2005; Vermaak et al 2009). More studies on repeat sequences will lead to an increased understanding of the functions and dysfunctions of genomes.

#### **5. Significance of repetitive sequences as marker**

Primers based on VNTRs provide an unprecedented opportunity to develop potential molecular markers for a particular species. Where a complete genome sequence is available for an organism, repeats may be annotated with their physical position on the genome. Markers may then be selected either for their location within a specific region of interest or for their even distribution across regions. Where a full genome sequence is unavailable, location may be predicted through synteny using a sequenced genome or through previous mapping exercises. Alternatively, a genome whose sequence is not known can still be analyzed by employing primers from other species for gene amplification. A gene so amplified may then be localized onto the chromosomes employing FISH. A similar set of primers may be used to amplify cDNA of the species. This approach circumvents the need for screening the genomic library.

Furthermore, for species which exhibit low levels of polymorphism at repeat loci, candidate polymorphic loci may be predicted through mining large sequence datasets. The presence of short sequence repeat (SSR) polymorphisms within aligned sequences of different origin

Microsatellites are usually characterized by a low degree of repetition at a particular locus. However, elements containing identical motifs may be found at many thousands of genomic loci. When the occurrence of SSRs in different functional regions of the genome is considered, most of them turn out to show a much higher density in non-coding regions. Exceptions to this rule are trimers and hexamers, which are nearly two times more prevalent in exons than in introns and intergenic regions. Their high frequency in coding regions may be explained by the fact that changes in their repeat number do not alter the reading frame, so they are much better tolerated than other SSRs. Their positive selection in exons suggests some function for these repeats.
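The reading-frame argument for trimers and hexamers comes down to divisibility by three: gaining or losing whole repeat units changes the sequence length by a multiple of the unit length, and the frame is kept only when that change is a multiple of 3. The short sketch below makes this explicit; the function name `preserves_reading_frame` is illustrative.

```python
def preserves_reading_frame(unit_len: int, delta_repeats: int) -> bool:
    """True if gaining or losing `delta_repeats` units keeps the codon frame.

    The length change in bases is unit_len * delta_repeats; the frame is
    preserved only when that change is a multiple of 3.
    """
    return (unit_len * delta_repeats) % 3 == 0


# Effect of gaining a single repeat unit for motif lengths 1 to 6.
for unit_len in range(1, 7):
    status = "in frame" if preserves_reading_frame(unit_len, 1) else "frameshift"
    print(f"{unit_len}-bp unit, +1 repeat: {status}")
```

Only the 3-bp and 6-bp units stay in frame for every single-unit gain or loss, whereas a dinucleotide repeat, for instance, stays in frame only when it changes by a multiple of three units.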

The high mutation rate of these repeats and their frequent length polymorphism suggest that they may be involved in the regulation of gene expression, thereby exerting quantitative effects on the phenotype. A few examples of repeat units used for fingerprinting and transcriptome analysis include (GATA/GACA)n, (CA)n, (AT)n, (GAA)n, (TCC)n, (GGAT)n, (GGCA)n, and (TTAGGG)n.
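As a minimal sketch of how such repeats can be located computationally, the Python snippet below scans a sequence for tandem runs of 1-6 bp motifs and checks whether a change in copy number preserves the reading frame. The function names, the threshold of three copies and the example sequence are illustrative assumptions and are not taken from the studies cited here.

```python
import re

def find_ssrs(seq, min_unit=1, max_unit=6, min_repeats=3):
    """Return (start, motif, copies) for each tandem run of a short motif in seq."""
    seq = seq.upper()
    # Group 1 captures the whole run, group 2 the repeat unit.
    # The lazy quantifier makes the shortest possible unit win at each position.
    pattern = re.compile(
        r"(([ACGT]{%d,%d}?)\2{%d,})" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq):
        run, unit = m.group(1), m.group(2)
        hits.append((m.start(), unit, len(run) // len(unit)))
    return hits

def shifts_reading_frame(unit_len, copy_change):
    """True if gaining or losing copy_change units changes the reading frame."""
    return (unit_len * copy_change) % 3 != 0

# Hypothetical sequence carrying a (CA)5 microsatellite.
print(find_ssrs("GGCACACACACATT"))   # [(2, 'CA', 5)]
print(shifts_reading_frame(3, 1))    # False: trimers keep the frame
print(shifts_reading_frame(2, 1))    # True: dimers shift it
```

The frame check illustrates why trimers and hexamers are tolerated in exons: any change in their copy number adds or removes a multiple of three bases.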

#### **3. Evolution and inter-species variation of repeat sequences**

Several mechanisms have been proposed for their evolution, such as strand slippage during replication, base misalignment and unequal crossing over between homologous chromosomes during meiosis, sister chromatid exchanges, or even insertion of viral genomes (Barros, 2008; Jeffreys et al 1985; Tautz, 1989).

Microsatellites tend to be highly polymorphic, suggesting a 'stepwise mutation' model in which most variation is introduced by replication slippage, changing the array length by only one or two repeats at a time, with occasional larger 'jumps' in size at much lower frequency. Minisatellites evolve more readily by larger-scale mechanisms such as unequal exchange. For all classes, there appears to be a general bias towards an increase in array length through evolutionary time. Highly repetitive DNA tends to accumulate only in regions where recombination is suppressed, such as centromeres and telomeres, while repeats occurring in euchromatin are much more susceptible to crossing over and tend to be more variable in copy number relative to their array length.
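The stepwise mutation model described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the mutation rate, jump probability, jump size range and expansion bias are hypothetical values chosen for illustration, not measured parameters.

```python
import random

def simulate_stepwise(start_copies, generations, mutation_rate=0.01,
                      jump_prob=0.05, max_jump=10, expansion_bias=0.55, seed=0):
    """Track the copy number of one repeat array over successive generations."""
    rng = random.Random(seed)          # seeded so that runs are reproducible
    copies = start_copies
    history = [copies]
    for _ in range(generations):
        if rng.random() < mutation_rate:
            if rng.random() < jump_prob:
                step = rng.randint(3, max_jump)   # rare larger jump
            else:
                step = rng.choice((1, 2))         # usual slippage: one or two units
            # Slight bias towards expansion, as suggested for all repeat classes.
            direction = 1 if rng.random() < expansion_bias else -1
            copies = max(1, copies + direction * step)
        history.append(copies)
    return history

history = simulate_stepwise(start_copies=20, generations=10000)
print(history[0], history[-1], min(history), max(history))
```

Running the function with different seeds shows how arrays that start at the same length drift apart, mostly in steps of one or two units.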

As mentioned above, loss or gain of repeats by unequal crossing over and gene conversion can lead to molecular drive of any given variant in a sexually reproducing population. During the evolution of repetitive elements by unequal crossing over, some variants will be lost whereas others will increase in frequency, eventually replacing all others. These evolutionary changes lead to homogeneity among the repeats of an array within a species and heterogeneity among the units of the corresponding array in different species, giving rise to inter-species variation (Harris and Wright, 1995). This phenomenon, however, is affected by the overall male to female ratio, population size, the possibility of infusion of new genetic material into a given gene pool, and the evolutionary time required for allele fixation.

#### **4. Functional significance of repetitive sequences**


with many genes (Subramanian et al 2003). The significance of specific microsatellites in different regions has not been completely understood. However, some microsatellites occurring in flanking regions of coding sequences are believed to play significant roles in the regulation of gene expression by forming various DNA secondary structures and offering a mechanism of unwinding (Catasti et al 1999). Variation in the length and unit type of simple repeats in upstream activation sequences might influence transcriptional activity (Kim and Mullet, 1995; Epplen et al 1996; Martienssen and Colot, 2001; Zhang et al 2004), and affect interaction with different regulatory proteins during translation (Lue et al 1989).


The functional roles of these sequences remained uncertain for a long time, and they were largely believed to represent genomic detritus (Ohno, 1972). However, recent studies have shown that repeat elements influence the structure, function, and evolution of chromosomes in their host genomes (Sinden, 1999; Dey and Rath, 2005; Tang, 2011). Their association with the promoters and coding regions of genes has made them very attractive objects of study. Transcription, mRNA processing, translation, protein folding, stability and aggregation rates, as well as gross morphology, have been found to be incrementally affected by alterations in tracts of tandem repeats (Fondon and Garner, 2004; Vinces, 2009). The human genome provides many instances of regulatory regions embedded in the remnants of repeat elements (Jordan et al. 2003), and studies have documented the participation of repeat sequences in the regulation of gene expression (Boeva et al., 2006). This suggests that repeat elements play a major architectonic role in the higher-order physical structuring of the genome (Shapiro and Sternberg, 2005; Vermaak et al 2009). More studies on repeat sequences will lead to an increased understanding of the functions and dysfunctions of genomes.

#### **5. Significance of repetitive sequences as marker**

Primers based on VNTRs provide an unprecedented opportunity to develop potential molecular markers for a particular species. Where a complete genome sequence is available for an organism, repeats may be annotated with their physical position on the genome. Markers may then be selected either for their location within a specific region of interest or for their even distribution across the genome. Where a full genome sequence is unavailable, location may be predicted through synteny with a sequenced genome or from previous mapping exercises. Alternatively, a genome whose sequence is not known can still be analyzed by employing primers from other species for gene amplification. A gene so amplified may then be localized onto the chromosomes by FISH. A similar set of primers may be used to amplify cDNA of the species. This approach circumvents the need for screening a genomic library.
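The two selection strategies mentioned above (markers within a region of interest, or markers spread evenly) can be sketched in a few lines of Python. The marker names, chromosome labels and positions below are hypothetical, and the binning rule is just one simple way to approximate an even distribution.

```python
def markers_in_region(markers, chrom, start, end):
    """Markers (name, chrom, position) lying inside one region of interest."""
    return [m for m in markers if m[1] == chrom and start <= m[2] <= end]

def evenly_spaced(markers, chrom, chrom_len, n):
    """Pick at most n markers on chrom, one per equal-sized bin (nearest the bin centre)."""
    on_chrom = [m for m in markers if m[1] == chrom]
    bin_size = chrom_len / n
    picked = []
    for i in range(n):
        lo, hi = i * bin_size, (i + 1) * bin_size
        centre = (lo + hi) / 2
        in_bin = [m for m in on_chrom if lo <= m[2] < hi]
        if in_bin:  # bins without any annotated repeat are skipped
            picked.append(min(in_bin, key=lambda m: abs(m[2] - centre)))
    return picked

# Hypothetical annotated repeats: (name, chromosome, physical position in bp).
markers = [("ssr1", "chr1", 100), ("ssr2", "chr1", 450), ("ssr3", "chr1", 520),
           ("ssr4", "chr1", 900), ("ssr5", "chr2", 300)]

print(markers_in_region(markers, "chr1", 400, 600))
print(evenly_spaced(markers, "chr1", 1000, 2))
```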

Furthermore, for species which exhibit low levels of polymorphism at repeat loci, candidate polymorphic loci may be predicted by mining large sequence datasets. The presence of simple sequence repeat (SSR) polymorphisms within aligned sequences of different origin would be indicative of the level of polymorphism at that locus. These selection strategies could greatly reduce the time and cost associated with the development of repeat markers. Integration of this repetitive sequence data with genome databases would provide further benefits to genome researchers.
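A minimal sketch of this kind of data mining is shown below: for sequences of different origin that share the same flanking regions, the number of repeat units between the flanks is counted and the locus is flagged as polymorphic when the counts differ. The flanking sequences, sample names and alleles are invented for illustration.

```python
import re

def repeat_count(seq, left_flank, right_flank, motif):
    """Number of tandem motif copies between two flanks, or None if the locus is absent."""
    pattern = (re.escape(left_flank) + "((?:" + re.escape(motif) + ")+)"
               + re.escape(right_flank))
    m = re.search(pattern, seq.upper())
    return len(m.group(1)) // len(motif) if m else None

def locus_polymorphism(seqs, left_flank, right_flank, motif):
    """Return per-sample copy numbers and whether more than one allele length is seen."""
    counts = {name: repeat_count(s, left_flank, right_flank, motif)
              for name, s in seqs.items()}
    observed = {c for c in counts.values() if c is not None}
    return counts, len(observed) > 1

# Hypothetical sequences of different origin covering the same (CA)n locus.
seqs = {
    "sample_a": "ACGT" + "CA" * 5 + "GGTC",
    "sample_b": "ACGT" + "CA" * 7 + "GGTC",
    "sample_c": "ACGT" + "CA" * 5 + "GGTC",
}
counts, polymorphic = locus_polymorphism(seqs, "ACGT", "GGTC", "CA")
print(counts, polymorphic)  # {'sample_a': 5, 'sample_b': 7, 'sample_c': 5} True
```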


#### **6.** *Bubalus bubalis* **genome**

The world population of water buffalo (*Bubalus bubalis*) is about 168 million head, of which 161 million are found in Asia (95.83 percent), 3.717 million in Africa, including Egypt (2.24 percent), 3.3 million in South America (1.96 percent), 40 000 in Australia (0.02 percent) and 500 000 in Europe (0.30 percent). The Asian buffalo or water buffalo is classified under the genus *Bubalus*, species *bubalis*. It includes two subspecies known as the River and Swamp types, which differ in morphology, purpose and genetics. The River buffalo has 50 chromosomes, of which five pairs are sub-metacentric and 20 pairs are acrocentric; the Swamp buffalo has 48 chromosomes, of which 19 pairs are metacentric. Swamp buffaloes are stocky animals with marshy land habitats. They are primarily used for draught power in paddy fields and for haulage, but are also used for meat and milk production, with a milk yield of up to 600 kg per year. Swamp buffaloes are mostly found in South East Asian countries; a few animals can also be found in the northeastern states of India (Sethi, 2003). River buffaloes are generally large in size, with curled horns, and are mainly found in India, Pakistan and some countries of western Asia. They prefer to enter clear water, and are primarily used for milk, meat and draught purposes. Each subspecies includes several breeds. Buffaloes are known to be better at converting poor-quality roughage into milk and meat. They are reported to have a 5 percent higher digestibility of crude fiber than high-yielding cows, and a 4-5 percent higher efficiency of utilization of metabolic energy for milk production (Mudgal, 1988).

India has about 97 million animals, which on the figures above represents about 58% of the world buffalo population (97 of 168 million). India possesses the best River milk breeds in Asia, e.g. Murrah, Nili-Ravi, Surti, Jaffarabadi, Mehsana, Kundi, Bhadawari and Nagpuri, which originated from the north-western states of India (Sethi, 2003). However, despite the importance of buffalo to the economic and social fabric of the region, its population has been declining. There are many reasons for this decline, foremost of which are: increased agricultural mechanization; increased urbanization, industrialization, and reforestation limiting paddy areas for buffaloes; a growing buffalo slaughter rate to satisfy the meat demands of a fast-growing population; poor reproductive performance; and lack of proper attention by policy makers and researchers. The low reproductive efficiency of female buffalo can be attributed to delayed puberty, higher age at calving, a long postpartum anoestrus period, a long calving interval, lack of overt signs of heat, and a low conception rate. In addition, female buffaloes have few primordial follicles and a high rate of follicular atresia. Understanding potential quantitative trait loci associated with economically important traits will help in segregating genetically superior breeds.

#### **7. Repetitive sequences as molecular markers in bovid genome**

Based on repeat sequences, a number of probes of varying length and sequence complexity have been successfully used as genetic markers (Kapur et al 2003; Jobling and Tyler-Smith, 2003; Bashamboo and Ali, 2001; Amos et al 1991; Tourmente et al 1994; Ali et al 1986). Earlier, conventional protein and biochemical markers were used in breeding programs for bubaline species (Wilson and Strobeck, 1999). Subsequently, diallelic Restriction Fragment Length Polymorphisms (RFLPs) for loci homologous to cattle (Blott et al 1999) were used. However, due to the low levels of polymorphism detected with these markers, their application remained limited. RFLP technology was followed by Random Amplified Polymorphic DNA (RAPD) and then by Amplified Fragment Length Polymorphism (AFLP), besides minisatellite markers. A series of synthetic oligonucleotide probes were developed as markers for genetic analysis and molecular systematics of bubaline and related genomes. While probes based on repeat sequences are available, there is no clear-cut experimental approach that could assist in the identification and segregation of elite animals with superior QTLs. This is because most of the physical and physiological attributes recognized in elite animals are controlled by several genes, and it is extremely challenging to uncover all the genes implicated in superior germplasm. However, marker-based analysis could possibly bridge the gap and facilitate much-needed advanced research to segregate genetically superior germplasm, in the context of animal genetics in general and animal biotechnology in particular.

#### **8. Restriction Fragment Length Polymorphism (RFLP)**

The basic technique for detecting RFLPs involves the fragmentation of genomic DNA by a restriction enzyme. The resulting DNA fragments are then separated by length through agarose gel electrophoresis and transferred to a membrane by the Southern blot procedure. Hybridization of the membrane to a labeled DNA probe then determines the size of the fragments that are complementary to the probe. An RFLP occurs when the size of a detected fragment varies between individuals. Each fragment size is considered an allele and can be used in genetic analysis. RFLPs are a quick, simple and inexpensive way to assay DNA sequence differences. The RFLP was the first DNA polymorphism to be widely used for genomic characterization, and it detects variations ranging from gross rearrangements to single base changes. The polymorphisms are found by their effects on sites for restriction enzyme-mediated cleavage in preparations of high molecular weight DNA. In buffalo, the RFLP approach has been used to gain insight into the organization and allele length variation of satellite fractions (Chattopadhyay et al 2001; Bhatnagar et al 2004). In our laboratory, the *Bam*HI derived pDS5 and pDS4 and the *Rsa*I derived pDp1-pDp4 were found to be conserved only in buffalo, cattle, goat, and sheep (Pathak et al 2006; 2011).
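The principle that a single base change at a restriction site alters the fragment pattern can be shown with a small in-silico digest. This is a sketch, not the laboratory procedure: the two "alleles" are invented sequences, the molecule is treated as linear, and only the recognition sites of *Bam*HI (G^GATCC) and *Rsa*I (GT^AC) are encoded.

```python
# Recognition sequence and cut offset within the site (top strand).
SITES = {"BamHI": ("GGATCC", 1), "RsaI": ("GTAC", 2)}

def digest(seq, enzyme):
    """Fragment lengths (in order) after complete digestion of a linear molecule."""
    site, offset = SITES[enzyme]
    seq = seq.upper()
    cuts = []
    i = seq.find(site)
    while i != -1:
        cuts.append(i + offset)
        i = seq.find(site, i + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]

# Hypothetical alleles: allele_b carries one base change in the second BamHI site.
allele_a = "A" * 10 + "GGATCC" + "T" * 20 + "GGATCC" + "C" * 5
allele_b = "A" * 10 + "GGATCC" + "T" * 20 + "GGATTC" + "C" * 5

print(digest(allele_a, "BamHI"))  # [11, 26, 10]
print(digest(allele_b, "BamHI"))  # [11, 36]
```

The loss of one site merges two fragments into a longer one, which is the kind of size difference that would appear as different alleles on a Southern blot.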

#### **9. Minisatellite Associated Sequence Amplification (MASA)**

MASA involves random amplification of genomic DNA or cDNA by PCR with primers specific to minisatellites. MASA can be performed with a small quantity of target substrate. The novel part of the approach is that functional, structural and regulatory genes associated with minisatellites are accessed without screening a conventional cDNA library, which makes it highly useful for genome analysis where prior information is absent or inadequate. The expression profile of genes based on MASA under normal and abnormal conditions is envisaged to be of great relevance for the identification of event- or stage-specific mRNA transcripts. In the context of comparative genomics, mRNA transcripts commonly expressed in a large number of species may be segregated. Following this approach, genes with the highest levels of expression in a given tissue may be easily identified, and the corresponding information from different breeds of animals may be established. In addition, differential expression of genes accessed by MASA may be used to establish genotype-phenotype correlations in the context of genetic diseases, cancer biology, stem cell research, tissue engineering, organ transplantation, animal cloning, characterization of the genetic integrity of different cell lines, and translational research. The minisatellite sequences 33.6 and 33.15 have been widely used to explore the bubaline genome (Srivastava et al 2006; 2008; Pathak et al 2010). In addition, microsatellite probes (2-6 base pairs) such as (AT)n, (CA)n, (GAA)n, (TCC)n, (GACA)n, (GATA)n, (GGAT)n, (GGCA)n and (TTAGGG)n were used to analyze the buffalo genome (Rawal et al 2012; Kumar et al 2011). Following this approach, additional oligo primers based on VNTR loci may be used to undertake analysis of any desired species, cell lines and biopsied samples.


#### **10. Technical approaches and methodologies**

We describe some of our work on the characterization of the buffalo genome. Further, in the context of functional and comparative genomics, DNA from other species was also used. DNA was largely obtained from blood samples, though in some cases solid tissues were also used.

#### **10.1. Collection of blood samples and isolation of genomic DNA**

DNA was extracted from peripheral blood of buffalo *Bubalus bubalis*, goat *Capra hircus*, sheep *Ovis aries*, tiger *Panthera tigris*, lion *Panthera leo*, human *Homo sapiens*, langur *Presbytis entellus*, Indian rhinoceros *Rhinoceros unicornis*, fish *Heteropneustes fossilis*, bird *Columba livia*, baboon *Papio hamadryas*, pig *Sus scrofa*, rat *Rattus norvegicus*, jungle cat *Felis chaus*, bonnet monkey *Macaca radiata* and leopard *Panthera pardus*. The integrity of the DNA was checked on a 1% agarose gel, and the DNA was PCR amplified using bubaline-derived β-actin primers and visualized on a UV transilluminator.

#### **10.2. RNA isolation and synthesis of cDNA**

Using buffalo as the experimental animal, total RNA was extracted from testis, kidney, liver, spleen, lung, heart, ovary, brain and sperm using TRIzol (Molecular Research Center, Inc., Cincinnati, OH) following the manufacturer's instructions. To check for contaminating mRNA from cells other than spermatozoa, the sperm RNA preparations were tested by RT-PCR for both *CDH1* (E-cadherin) and *CD45* (tyrosine phosphatase). Similarly, the presence of DNA was ruled out by PCR using β-actin primers. Following this, approximately 10 μg of RNA from the different tissues and spermatozoa was reverse transcribed into cDNA using a commercially available high capacity cDNA RT kit (Applied Biosystems, USA). The success of cDNA synthesis was confirmed by PCR employing 35 cycles of amplification with buffalo-derived β-actin primers.

#### **10.3. Minisatellite Associated Sequence amplification (MASA)**


PCR amplifications were carried out using the oligo primer and cDNA from the different tissues and spermatozoa. The reaction conditions were an initial denaturation at 95°C for 5 min, followed by 35 cycles each consisting of denaturation at 95°C for 1 min, annealing at the optimal temperature for 1.5 min and primer extension at 72°C for 1 min, and a final extension at 72°C for 10 min. Approximately 25 μl of amplified product was resolved on a 20-cm-long, 3% (w/v) agarose gel in 1× TBE buffer at constant voltage. The distinct bands were sliced from the gel, purified and cloned into the pGEM-T Easy vector (Promega, USA). In water buffalo (*Bubalus bubalis*), using cDNA from spermatozoa and eight different somatic tissues and an oligo primer based on two units of the consensus of the 33.6 repeat locus (5' CCTCCAGCCCTCCTCCAGCCCT 3'), minisatellite-associated sequence amplification (MASA) identified 29 mRNA transcripts (Figure 2).
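The primer design and cycling conditions described above can be written down compactly, as in the Python sketch below. The 11-nt unit is inferred here simply by halving the 22-nt primer quoted in the text; the annealing temperature is left unset because the text gives only "the optimal temperature"; and the total time counts programmed holds only, ignoring ramping between temperatures.

```python
UNIT_33_6 = "CCTCCAGCCCT"          # assumed 11-nt unit (half of the quoted primer)
MASA_PRIMER = UNIT_33_6 * 2        # two consensus units, 5' -> 3'

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (the strand the primer anneals to)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    return sum(base in "GC" for base in seq) / len(seq)

# Cycling conditions as (temperature in deg C, minutes); None = not specified in the text.
PROGRAM = {
    "initial_denaturation": (95, 5.0),
    "cycles": 35,
    "cycle_steps": [("denaturation", 95, 1.0),
                    ("annealing", None, 1.5),
                    ("extension", 72, 1.0)],
    "final_extension": (72, 10.0),
}

def hold_minutes(program):
    """Total programmed hold time in minutes, excluding temperature ramps."""
    per_cycle = sum(minutes for _, _, minutes in program["cycle_steps"])
    return (program["initial_denaturation"][1]
            + program["cycles"] * per_cycle
            + program["final_extension"][1])

print(MASA_PRIMER)                       # CCTCCAGCCCTCCTCCAGCCCT
print(reverse_complement(MASA_PRIMER))   # AGGGCTGGAGGAGGGCTGGAGG
print(round(gc_fraction(MASA_PRIMER), 2), hold_minutes(PROGRAM))  # 0.73 137.5
```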

**Figure 2.** A representative agarose gel showing minisatellite associated sequence amplification (MASA) with cDNA from different somatic tissues and spermatozoa of buffalo, as shown on top of the lanes in panel **(A)**. β-actin was used as an internal control **(B)**. M is the molecular marker given in base pairs (bp) (for details, see Pathak et al, 2010).

#### **10.4. Restriction digestion of buffalo genomic DNA**

Approximately 4-5 μg of genomic DNA from buffalo, cattle, goat and sheep was subjected individually to restriction digestion using 4-5 units of *Bam*HI and *Rsa*I enzymes. The digested DNA fragments were resolved on a 0.8% agarose gel in 0.5× TBE for approximately 16-18 hours. In water buffalo, two distinct DNA bands of 1378 and 673 bp with *Bam*HI, and four bands of 1331, 651, 603 and 339 bp with *Rsa*I, were cut out and gel purified (Figure 3). The eluted fragments were cloned and sequenced following standard protocols. For Southern hybridization, DNA was transferred onto a nylon membrane and immobilized by exposure to UV. Membranes were rinsed in 2× SSC, dried and UV cross-linked. Blots were hybridized overnight at 60°C with [α-32P]dCTP-labeled recombinant plasmid (25 ng) prepared by the random priming method (Rediprime™ II kit, Amersham Pharmacia Biotech, USA). The membranes were washed using standard protocols, and signals were recorded by exposure of the blot to X-ray film (Pathak et al 2006; 2011).

**Figure 3.** Agarose gel showing restriction digestion of buffalo (*Bubalus bubalis*) genomic DNA with *Bam*HI **(A)** and *Rsa*I **(B)** enzymes. The two discernible bands of 673 bp and 1378 bp from the *Bam*HI digestion and the four bands of 1331, 651, 603 and 339 bp from the *Rsa*I digestion are highlighted. The molecular weight marker is given on the left in base pairs (bp). Since the patterns are not gender specific, the bands appear to originate from the autosomes (for details, see Pathak et al 2006; 2011).

#### **10.5. Copy number assessment and relative expression using Real Time PCR**

The copy number of each fragment was calculated by an absolute quantitation assay using SYBR Green dye and the Sequence Detection System 7500 (ABI, USA). Primers specific to the respective fragments were designed using Primer Express Software V2.0 (ABI). The standard curve was obtained using a 10-fold dilution series of the recombinant plasmids, ranging from 3,000,000 to 30 copies, as standards, taking 3.36 pg of DNA per haploid genome (assuming haploid genome of farm animals = 3.3 pg, weight per base pair = 1.096 × 10⁻²¹ g). The reactions were performed in triplicate in 96-well plates in a 25 μl reaction volume, each containing 0.5 ng of buffalo genomic DNA and 50 nM of the corresponding primers, with cycling conditions of 50°C for 2 min and 95°C for 10 min, followed by 40 cycles of 95°C for 10 sec and 60°C for 1 min. Real-time PCR analysis uncovered 1234 and 3420 copies of the pDS5 and pDS4 fragments, respectively, per haploid genome, and ~2 × 10⁴ copies of pDp1, ~3000 copies each of pDp2 and pDp3 and ~1000 copies of pDp4 in the buffalo, cattle, goat and sheep genomes (Figure 4) (Pathak et al 2006; 2011). Assessing the copy number of these repeats in different known and nondescript breeds of buffalo may help to establish a correlation, if any, that is useful for the delineation of different breeds.
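The absolute quantitation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' analysis pipeline: the Ct values, the sample Ct and the function names are hypothetical, while the constants (0.5 ng template, 3.3 pg per haploid genome, 1.096 × 10⁻²¹ g per base pair) are taken from the text. It fits Ct against log10 of the plasmid copy number, reads an unknown sample off the curve and divides by the number of haploid genomes in the reaction.

```python
import math

GRAMS_PER_BP = 1.096e-21   # weight per base pair quoted in the text
HAPLOID_GENOME_PG = 3.3    # haploid genome mass assumed in the text
TEMPLATE_NG = 0.5          # genomic DNA per reaction, from the text


def plasmid_mass_g(copies, plasmid_length_bp):
    """Mass (g) of plasmid DNA that contains the given number of copies."""
    return copies * plasmid_length_bp * GRAMS_PER_BP


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ct_values) / len(ct_values)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to get the copy number in the reaction."""
    return 10.0 ** ((ct - intercept) / slope)


def copies_per_haploid_genome(copies_in_reaction):
    """Normalize by the number of haploid genomes in the template."""
    genomes_in_reaction = TEMPLATE_NG * 1000.0 / HAPLOID_GENOME_PG  # ng -> pg
    return copies_in_reaction / genomes_in_reaction


# Hypothetical standards: 3,000,000 down to 30 copies with made-up Ct values.
standard_copies = [3e6, 3e5, 3e4, 3e3, 3e2, 3e1]
standard_ct = [14.1, 17.5, 20.8, 24.2, 27.4, 30.9]
slope, intercept = fit_standard_curve(standard_copies, standard_ct)

sample_ct = 21.5  # hypothetical mean Ct of a genomic DNA triplicate
in_reaction = copies_from_ct(sample_ct, slope, intercept)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"copies in reaction = {in_reaction:.0f}")
print(f"copies per haploid genome = {copies_per_haploid_genome(in_reaction):.0f}")
```

With 0.5 ng of template and 3.3 pg per haploid genome, each reaction holds roughly 150 haploid genomes, so the copy number read from the curve is divided by that figure. `plasmid_mass_g` shows how the quoted weight per base pair converts a target copy number into a plasmid mass when preparing the dilution series; the plasmid length must be supplied by the user.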

**Figure 4.** Standard curve based on a 10-fold dilution series of pDp1, pDp2, pDp3, pDp4 and genomic DNA from buffalo, cattle, goat and sheep, showing the amplification plots (a-d) in panel (A), the corresponding slopes of -3.3 to -3.5 in panel (B) and a single dissociation peak in panel (C), substantiating the maximum efficiency of the PCR reaction and the high specificity of the primers for the target DNA. Arrow indicates genomic DNA from buffalo, cattle, goat and sheep. Buffalo, cattle, goat and sheep showed approximately the same copy number for pDp1, pDp2, pDp3 and pDp4, indicating their conservation across the bovid species (Pathak et al 2011).
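The link between the standard-curve slopes quoted in the Figure 4 caption and PCR efficiency can be made explicit with the commonly used relation E = 10^(-1/slope) - 1. The sketch below is an illustration under that assumption; the function name is hypothetical and the source does not state which formula was applied. A slope of about -3.32 corresponds to perfect doubling per cycle (E = 1), and slopes of -3.3 to -3.5 give efficiencies of roughly 1.01 to 0.93.

```python
import math

# Slope at which the template exactly doubles every cycle (about -3.32).
IDEAL_SLOPE = -1.0 / math.log10(2.0)


def efficiency_from_slope(slope):
    """Amplification efficiency E from the slope of Ct versus log10(template)."""
    if slope >= 0:
        raise ValueError("a standard-curve slope must be negative")
    return 10.0 ** (-1.0 / slope) - 1.0


for slope in (-3.3, IDEAL_SLOPE, -3.5):
    print(f"slope {slope:.2f}: E = {efficiency_from_slope(slope):.2f}")
```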

Relative expression of the desired fragments was assessed by real-time PCR with the SYBR Green assay using cDNA from different tissues and spermatozoa. The primers were designed using Primer Express 2.0 software (Applied Biosystems). The cycling conditions were the same as those used for the copy number calculations, and the reaction was performed following a standard protocol (Srivastava et al 2006). The specificity of each primer pair and the efficiency of the amplification were tested by assaying serial dilutions of the cDNA hybridized with oligonucleotides specific for the target and the normalization control (*GAPDH*). The difference in Ct value between the target cDNA from different tissues and the control sample (the tissue showing the least expression) was used for the calculation. The expression level of the desired fragments was calculated using the formula expression = $(1+E)^{-\Delta Ct}$, where $E$ is the efficiency of the PCR and $\Delta Ct$ is the difference in threshold cycle value between the test sample and the endogenous control. To achieve the maximum efficiency ($E = 1$) of the real-time PCR, the amplicon size was kept small (70-150 bp), so that the expression level of the test gene reduces to $2^{-\Delta Ct}$. Each experiment was repeated three times to ensure consistency of the results. Maximum expression of pDS5 and pDS4 was seen in the spleen and liver, respectively. pDp1 showed maximum expression in lung, pDp2 and pDp3 both in kidney, and pDp4 in ovary. Nine of the 33.6 MASA-amplified transcripts showed the highest expression in spermatozoa, and one each in liver and lung (Figure 5).
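The expression formula above can be illustrated with a short Python sketch. It follows one reading of the text, which is an assumption rather than the authors' stated procedure: ΔCt is taken per tissue as Ct(target) minus Ct(*GAPDH*), the expression is (1+E)^(-ΔCt), and the result is then expressed in folds over the tissue with the least expression. All Ct values and function names are hypothetical.

```python
def relative_expression(ct_target, ct_reference, efficiency=1.0):
    """Expression = (1 + E) ** (-dCt), with dCt = Ct(target) - Ct(reference)."""
    delta_ct = ct_target - ct_reference
    return (1.0 + efficiency) ** (-delta_ct)


def fold_over_lowest(expression_by_tissue):
    """Scale every tissue to the tissue showing the least expression."""
    baseline = min(expression_by_tissue.values())
    return {tissue: value / baseline
            for tissue, value in expression_by_tissue.items()}


# Hypothetical mean Ct values: (target transcript, GAPDH) for each tissue.
ct_values = {
    "spermatozoa": (22.0, 18.0),
    "lung": (25.0, 18.0),
    "liver": (26.0, 18.0),
}

expression = {tissue: relative_expression(target, reference)
              for tissue, (target, reference) in ct_values.items()}
folds = fold_over_lowest(expression)

for tissue, fold in folds.items():
    print(f"{tissue}: {fold:.1f}-fold")
```

With E = 1 the calculation reduces to 2^(-ΔCt), so in this made-up data set each cycle of difference between tissues corresponds to a two-fold change in expression. Passing a measured `efficiency` below 1 shows how the fold change shrinks when amplification is less than perfect.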

**Figure 5.** Expression analyses of the representative 33.6-tagged mRNA transcripts. a-l represent different tissues, gonad and spermatozoa. Note the maximum expression of some representative mRNA transcripts in the spermatozoa, corresponding to Dp1, 4, 8, 10, 17, 19, 20 and 26 (shown in a, c, d, f, h, i, j and l, respectively), and the exclusive expression of Dp9 in liver (e). Bars represent relative expression of the transcript(s) in folds. Transcript IDs are given in the top left corner and tissues below the panels (Pathak et al 2010).

The highest expression of these nine transcripts in the spermatozoa suggests that they represent vital genes involved in sperm development and possibly in overall testicular function. In the context of animal biotechnology, such selective, tissue-specific expression profiles are very important for segregating genetically superior germplasm or any other physical and physiological attributes. This is particularly true in the case of buffalo, since this species has several breeds. The development of MASA-mediated, tissue-specific transcript profiles is therefore envisaged to go a long way in the molecular characterization of not only the genome of buffalo but also those of other economically important animals.

#### **10.6. Chromosome preparation and Fluorescence** *In Situ* **Hybridization (FISH)**

Once mRNA transcripts are available, it becomes feasible to fish out full-length cDNA clones employing 3' and 5' RACE. This approach enables the isolation of genes without screening a cDNA library, thus circumventing many arduous steps. When clones representing genes are obtained, they can be used for fluorescence *in situ* hybridization onto metaphase chromosomes. We have used some of the clones to successfully conduct FISH on buffalo chromosomes to localize the genes and also to uncover the distribution of several species of repeat elements.

For chromosome culture, 2 ml of blood was drawn with a sterile syringe into heparinized vacutainer tubes from buffalo, cattle, goat, sheep and human. To a sterile tissue culture flask containing 5 ml RPMI-1640, 20% fetal bovine serum, 2% phytohemagglutinin (PHA; 2 mg/ml), 5 μl concanavalin A (3 μg/ml), 2.5 μl mercaptoethanol (50 μM), 50 μl LPS (10 μg/ml) and 2.2 ml antibiotic/antimycotic (0.15 mg/ml), 500 μl of blood was added and the whole mixture was incubated for 72 hours in 5% CO2 at 37°C. After 70 hours, colcemid (10 μg/μl) was added to the culture flasks to arrest the cells at the metaphase stage, and the cells were incubated for a further 2 hours. The cells were then treated with 0.56% KCl for 30 min, followed by fixative treatment (methanol:glacial acetic acid, 3:1). A few drops of the cell suspension were dropped onto pre-cleaned, chilled slides and blow-dried. The slides were Giemsa stained (Gibco, BRL) for 20 minutes, washed with PBS/distilled water and observed under the microscope to record metaphases.

For probe preparation, plasmids containing the gene of interest were labeled with the desired fluorochromes using the Nick Translation Kit from Vysis (Illinois, USA) following the supplier's instructions. Hybridization was carried out in a 20 μl volume containing 50% formamide, 10% dextran sulphate, Cot 1 DNA and 2X SSC, pH 7, for 16 hours at 37°C in a moist chamber. Post-hybridization washes were done in 2X SSC at 37°C (low stringency) and then in 0.1X SSC at 60°C (high stringency). Slides were counterstained with DAPI and screened under an Olympus fluorescence microscope (BX51), and images were captured with an Olympus U-CMAD-2 CCD camera. Chromosome mapping was done following the International System for Chromosome Nomenclature. pDS5, representing the 1378-bp fragment, showed FISH signals in the centromeric region of acrocentric chromosomes only, whereas pDS4, corresponding to the 673-bp fragment, detected signals in the centromeric regions of all the chromosomes. The *Rsa*I-derived pDp1, pDp2 and pDp3 showed repeats distributed across all the buffalo chromosomes (Pathak et al 2006; 2011) (Figure 6).

Chromosomal mapping with a bovine *SARS2* BAC probe localized the *SARS2* gene to buffalo metaphase chromosome 18 (Figure 7) (Pathak et al 2010).

**Figure 6.** Chromosomal localization of pDS5 clone **(A)** on **(a)** buffalo and **(b)** cattle metaphase chromosomes. Note the absence of signal in all the bi-armed chromosomes. Fluorescence *in situ*  hybridization (FISH) of pDp1 **(B)** clone on buffalo metaphase chromosomes **(c)**. Note the dispersed signals over the metaphase chromosomes. (See Pathak et al 2006; 2011).

#### **11. Applications in animal biotechnology**

With the availability of the human genome sequence, much emphasis is being given to sequencing the genomes of all the potential farm animals. Despite its importance as a farm animal, research data on the water buffalo are limited. Water buffalo breeders and farmers have been facing many challenges and problems, such as poor reproductive efficiency, sub-optimal production potential, a higher than normal incidence of infertility and lower rates of calf survival. Genome research has created a broad basis for promoting and utilizing gene technologies in many fields of livestock production. Genome biotechnology will provide a major opportunity to advance sustainable animal production systems of higher productivity by manipulating the variation within and between breeds to realize more rapid and better-targeted gains in breeding value. This type of research will also make it possible to distinguish molecular phenotypes and thus improve the use of the genetic resources of domestic animals.

To date, researchers have identified several genes or DNA regions that are associated with traits of economic importance, including reproduction, growth, lean body, fat quantity, meat quality, physical traits and disease resistance. Examples include the fatty acid composition of dairy and beef food products and increased disease resistance (and thus improved animal welfare). Similarly, decreased methane emissions in cattle help to address the needs of consumers and society for sustainable and cost-effective food production. A number of gene and marker tests are now available commercially from genotyping service companies. Examples are CAST (meat quality), ESR and EPOR (litter size), FUT1 (*E. coli* disease resistance), HAL (halothane – meat quality, stress), IGF2 (carcase), MC4R (growth and fat), PRKAG3 (meat quality) and RN (meat quality) (Walters, 2011). Many scientists are using the genomic information embedded on the SNP50 BeadChip, a glass slide containing thousands of DNA markers, to identify disease-resistance genes in cattle, swine, sheep, poultry and fish and then selectively mating the animals in order to create disease-resistant animals. Understanding all the expressed genes, their organization and their mode of action in bubaline or any other farm animals will help bridge the gap and facilitate the much-needed growth of animal biotechnology.

**Figure 7.** Localization of the *SARS2* gene on representative interphase nuclei and metaphase chromosome 18 using FISH. The *SARS2* BAC probe shows signals on buffalo metaphase chromosome 18 (A) and on interphase nuclei (a-e) (B). Note the two signals in the interphase nuclei, corresponding to those on the homologous chromosomes (Pathak et al 2010).

#### **12. Concluding remarks**

Genetic improvement of animals warrants continuous and complex processes of sustained research employing cutting-edge tools and techniques of modern biology and recombinant DNA technology. A much deeper and more detailed understanding of a given species would eventually prove highly useful for the possible manipulation of a desired genome. Improvement of domestic animal traits has been the foremost task of animal breeding. In this pursuit, many techniques have been developed and tested. In recent years, advances in molecular genetics have introduced a new generation of molecular markers for the genetic improvement of animals. However, the utilization of marker-based information for genetic improvement depends on the choice and judicious use of an appropriate marker system for a given application. The selection of markers for different applications is influenced by the degree of polymorphism, the reproducibility of the technique, the speed of the experiments and the cost involved.

As the situation stands now, for a given biological phenomenon in which multiple genes are implicated, technical approaches need to be developed to segregate all the possible genes specific to that phenomenon. A good example is spermatogenesis, which putatively involves about 400 or more genes. However, their clear-cut involvement and characterization have not yet been achieved in any species. Once such information becomes available, it would provide the much-needed basis for functional and comparative genomics. Perhaps then, molecular delineation of the "so-called" elite animals or of specific breeds representing superior germplasm would become feasible.

#### **Author details**

*Deepali Pathak and Sher Ali*

*National Institute of Immunology, Aruna Asaf Ali Marg, New Delhi, India*

#### **13. References**

[1] Ali, S., & Gangadharan, S. (2000). Differential Evolution Of Coding And Non-Coding Sequences In Related Vertebrates: Implications In Probe Design. *Proc. Ind. Nat. Sci. Acad.* 664, 9-67.

[2] Ali, S., & Wallace, R. B. (1988). Intrinsic Polymorphism Of Variable Number Tandem Repeat Loci In The Human Genome. *Nucleic Acids Res.* 16, 8487-8496.

[3] Ali, S., Müller, C.R., & Epplen, J.T. (1986). DNA finger printing by oligonucleotide probes specific for simple repeats. *Hum Genet.* 74(3):239-243.

[4] Amor, D.J., & Choo, K. H. (2002). Neocentromeres: Role In Human Disease, Evolution, And Centromere Study. *Am. J. Hum. Genet.* 71, 695–714.

[5] Amos, W., & Hoelzel, A.R. (1991). Long-Term Preservation Of Whale Skin For DNA Analysis. *Rep. Int. Whal. Commn.* 13, 99–103.

[6] Barros, P., Blanco, M.G., Boán, F., & Gómez-Márquez, J. (2008). Evolution of a complex minisatellite DNA sequence. *Mol Phylogenet Evol.* 49(2):488-494.

[7] Bashamboo, A., & Ali, S. (2001). Minisatellite Associated Sequence Amplification (Masa) Of The Hypervariable Repeat Marker 33.15 Reveals A Male Specific Band In Humans. *Mol. Cell. Probes* 15, 89-92.

[8] Battaglia, E. (1999). The Chromosome Satellite (Navashin's "Sputnik" Or Satelles): A-345-Terminological Comment. *Acta Biologica Cracoviensia, Series Botanica* 41, 15-18.

[9] Bhatnagar, S., Bashamboo, A., Chattopadhyay, M., Gangadharan, S., & Ali, S. (2004). A 1.3 kb satellite DNA from *Bubalus bubalis* not conserved evolutionarily is transcribed. *Z Naturforsch C* 59(11-12):874-879.

[10] Bishop, R., Morzaria, S., & Gobright, E. (1998). Linkage Of Two Distinct At-Rich Minisatellites At Multiple Loci In The Genome Of *Theileria Parva*. *Gene* 216, 245-254.

[11] Blott, S.C., Williams, J.L., & Haley, C.S. (1999). Discriminating among cattle breeds using genetic markers. *Heredity (Edinb)* Pt 6:613-619.

[12] Boeva, V., Regnier, M., Papatsenko, D., & Makeev, V. (2006). Short fuzzy tandem repeats in genomic sequences, identification, and possible role in regulation of gene expression. *Bioinformatics* 2006 Mar 15; 22(6):676-684.

[13] Bromham, L. (2002). The Human Zoo: Endogenous Retroviruses in the Human Genome. *Trends Ecol. Evol.* 17, 91–97.

[14] Brown, T. A. (2002). The Repetitive DNA Content Of Genomes. *Genomes* 59-64.

[15] Capy, P. (1998). Evolutionary biology. A plastic genome. *Nature* 396(6711):522-3.

[16] Catasti, P., Chen, X., Mariappan, S.V., Bradbury, E.M., & Gupta, G. (1999). DNA Repeats In The Human Genome. *Genetica* 106, 15-36.

[17] Cavalier-Smith, T. (1985). Eukaryotic Gene Numbers, Non-Coding DNA, And Genome Size. *In* The Evolution Of Genome Size. *Edited By* T. Cavalier-Smith. *John Wiley And Sons, Chichester, U.K.* Pp. 69–103.

[18] Charlesworth, B., Sniegowski, P., & Stephan, W. (1994). The Evolutionary Dynamics Of Repetitive DNA In Eukaryotes. *Nature* 371, 215-220.

[19] Chattopadhyay, M., Prashant, S. G., Kapur, V., Azfer, Md. A., Prakash, B., & Ali, S. (2001). Satellite Tagged Transcribing Sequences In The Bubaline *Bubalus bubalis* Genome Undergo Programmed Modulation In The Meiocytes: Possible Implication In Transcriptional Inactivation. *DNA Cell Biol.* 20, 587-593.

[20] Debrauwere, H., Gendrel, G.C., Lechat, S., & Dutreix, M. (1997). Differences and similarities between various tandem repeat sequences: Minisatellites and microsatellites. *Biochimie.* 79:577–586.

[21] Dey, I., & Rath, P.C. (2005). A Novel Rat Genomic Simple Repeat DNA With RNA-Homology Shows Triplex (H-DNA)-Like Structure And Tissue-Specific RNA Expression. *Biochem. Biophys. Res. Commun.* 276, 286-228.

[22] Durward, E., Shiu, O.Y., Luczak, B., & Mitchelson, K. R. (1995). Identification Of Clones Carrying Minisatellite-Like Loci In An Arabidopsis Thaliana Yac Library. *Journal Of Experimental Botany* 46, 271-274.


[23] Elgar, G., & Vavouri, T. (2008). Tuning in to the signals: noncoding sequence conservation in vertebrate genomes. *Trends Genet.* **24** (7): 344–52.

[42] Jordan, I. K., Rogozin, I. B., Glazko, G.V. & Koonin, E.V.(2003). Origin of a substantial fraction of human regulatory sequences from transposable elements. Trends in Genetics

[43] Kapur, K., Prasanth, S. G., O'ryan, C., Azfer, Md, A., & Ali, S. (2003). Development Of A DNA Marker By Minisatellite Associated Sequence Amplification (Masa) From The

[44] Kelly, M.K., Alver, B., & Kirkpatrick, D.T. (2011). Minisatellite alterations in ZRT1 mutants occur via RAD52-dependent and RAD52-independent mechanisms in

[45] Kim, M. & Mullet, J.E. (1995) Identification Of A Sequence-Specific DNA Binding Factor Required For Transcription Of The Barley Chloroplast Blue Light-Responsive Psbd-Psbc

[46] Kit, S. (1961). Equilibrium Sedimentation In Density Gradients Of DNA Preparations

[47] Kumar, S., Gupta, R., Kumar, S., & Ali, S. (2011). Molecular mining of alleles in water buffalo *Bubalus bubalis* and characterization of the TSPY1 and COL6A1 genes. *PLoS One* 

[48] Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., Fitzhugh, W., Funke, R., Gage, D., et al. (2001). Initial Sequencing

[49] Legendre, M., Pochet, N., Pak, T., & , K.J. (2007). Sequence-based estimation of minisatellite and microsatellite repeat variability. *Genome Res.* 17(12):1787-1796. [50] Lopes, J., Ribeyre, C., & Nicolas, A. (2006). Complex minisatellite rearrangements generated in the total or partial absence of Rad27/hFEN1 activity occur in a single

generation and are Rad51 and Rad52 dependent. *Mol Cell Biol.* 26(17):6675-6689. [51] Lue, N. F., Buchman, A. R., & Kornberg, R. D. (1989) Activation Of Yeast RNA Polymerase Ii Transcription By A Thymidine-Rich Upstream Element In Vitro. *Proc.* 

[52] Martienssen, R. A., & Colot, V. (2001). DNA Methylation And Epigenetic Inheritance In

[53] Miller, W. J., & Capy, P., eds. (2004), Mobile genetic elements: protocols and genomic

[54] Mudgal, V.O. (1988). Comparative Efficiency For Milk Production Of Buffaloes And Cattle In The Tropics. *Proceedings Of Ii World Buffalo Congress.* New Delhi, India, Vol Ii,

[55] Nakamura, Y., Leppert, M., O'connell, P., Wolff, R., Holm, T., Culver, M., Martin, C., Fujimoto, E., Hoff, M., Kumlin, E. et al. (1987).Variable Number Of Tandem Repeat

[56] Norman, A., D. (2001). APPENDIX 1B Overview of Human Repetitive DNA Sequences.

[57] Ohno, S. (1972). So Much `Junk' In Our Genomes. *Brookhaven Symp. Biol.* 23, 366-370.

(Vntr) Markers For Human Gene Mapping. *Science* 235, 1616–1622.

Endangered Indian Rhino (*Rhinoceros Unicornis*). *Mol. Cell. Probes* 17, 1-4.

quiescent stationary phase yeast cells. DNA Repair (Amst). 10(6):556-566.

19, 68–72.

6(9):e24958.

Promoter. *Plant Cell* 7, 1445-1457.

*Natl. Acad. Sci. USA* 86, 486-490.

applications, *Humana Press*, 289.

Current Protocols in Human Genetics.

Part Ii, 454–462.

From Animal Tissues. *J. Mol. Biol.* 3,711-716.

and Analysis of the Human Genome. *Nature* 409, 860–921.

Plants And Filamentous Fungi. *Science* 293, 1070-1074.


[42] Jordan, I. K., Rogozin, I. B., Glazko, G.V. & Koonin, E.V.(2003). Origin of a substantial fraction of human regulatory sequences from transposable elements. Trends in Genetics 19, 68–72.

176 Functional Genomics

6,907-921.

48, 132-135.

In Bovidea. *Genomics* 11, 24–32.

Polymorphism. *Genome Res.* 10, 62–71.

[23] Elgar, G., & Vavouri, T. (2008). Tuning in to the signals: noncoding sequence

[24] Epplen, J.T., Kyas, A. & Maueler, W. (1996) Genomic Simple Repetitive DNAs Are

[25] Feng, Y., Yang, W., Ryan, U., Zhang, L., Kvác, M., Koudela, B., Modry, D., Li, N., Fayer, R., & Xiao, L. (2011). Development of a multilocus sequence tool for typing Cryptosporidium muris and Cryptosporidium andersoni. *J Clin Microbiol.* 49(1):34-41. [26] Fondon, J.W., & Garner, H.R. (2004) Molecular Origins Of Rapid And Continuous

[27] Furano, A. V. (2000). The Biological Properties And Evolutionary Dynamics Of

[28] Georges, M., & Andersson, L. (1996). Livestock Genomics Comes Of Age. Genome Res.

[29] Georges, M., Gunawardana, A., Threadgill, D.W., & Lathrop, M. (1991). Characterization Of A Set Of Variable Number Of Tandem Repeat Markers Conserved

[30] Gur-Arie, R., Cohen, C.J., Eitan, Y., Shelef, L., Hallerman, E.M. & Kashi, Y. (2000) Simple Sequence Repeats In Escherichia Coli: Abundance, Distribution, Composition, And

[31] Haber, J.E., & Louis, E.J. (1998). Minisatellite Origins In Yeast And Humans. *Genomics* 

[32] Han, J.S., Boeke, J.D. (2005). LINE-1 retrotransposons: modulators of quantity and

[33] Han, J.S., Szak, S.T., & Boeke, J.D. (2004). Transcriptional disruption by the L1 retrotransposon and implications for mammalian transcriptomes. *Nature* 429:268-274. [34] Harris, A.S., & Wright, J.M. (1995). Nucleotide sequence and genomic organization of

[35] Henikoff, S., Ahmad, K., & Malik, H. S. (2001). The Centromere Paradox: Stable

[36] Hochgeschwender, U., & Brennan, M.B. (1991) Identifying Genes Within The Genome:

[37] International Human Genome Sequencing Consortium. (2004) Finishing the

[38] Jarman, A.P., & Wells, R.A. (1989). Hypervariable Minisatellites: Recombinators Or

[39] Jeffreys, A. J., Neil, D. L., & Neumann, R. (1998). Repeat Instability At Human

[40] Jeffreys, A.J., & Wilson, V., & Thein, S.L. (1985). Hyper Variable 'Minisatellite' Regions

[41] Jobling, M.A., & Tyler-Smith, C. (2003). The human Y chromosome: an evolutionary

quality of mammalian gene expression? *Bioessays* **27:**775-784.

Inheritance With Rapidly Evolving DNA. *Science* 293, 1098–1102.

New Ways For Finding The Needle In A Haystack. *Bioessays* 13, 139-144.

euchromatic sequence of the human genome. Nature 431(7011):931-945.

Minisatellites Arising From Meiotic Recombination. *Embo J.* 17, 4147–4157.

cichlid fish minisatellites. *Genome* 38(1):177-184.

Innocent Bystanders? *Trends Genet.* 5,367–371.

marker comes of age. *Nat Rev Genet.* 4(8):598-612.

In Human DNA. *Nature* 314, 67-73.

Targets For Differential Binding Of Nuclear Proteins. *Febs Letters* 389, 92-95.

Morphological Evolution. *Proc. Natl. Acad. Sci. USA* 101, 18058–18063.

Mammalian Line-1 Retrotransposons Prog. *Nucleic Acid Res.* 64,255-294.

conservation in vertebrate genomes. *Trends Genet.* **24** (7): 344–52.


[58] Pathak, D., & Ali, S. (2011). RsaI repetitive DNA in Buffalo *Bubalus bubalis* representing retrotransposons, conserved in bovids, are part of the functional genes. *BMC Genomics* 12:338.

Repetitive DNA: A Tool to Explore Animal Genomes/Transcriptomes 179

[74] Srivastava, J., Premi, S., Pathak, D., Ahsan, Z., Tiwari, M., Garg, L.C., & Ali, S. (2006). Transcriptional status of known and novel genes tagged with consensus of 33.15 repeat loci employing minisatellite-associated sequence amplification (MASA) and real-time

[75] Stoker, N.G., Cheah, K.S.E., Griffin, J.R., Pope, F.M., & Solomon, E. (1985). A Highly Polymorphic Region 3′ To The Human Type Ii Collagen Gene. *Nucleic Acids Res.* 13,

[76] Subramanian, V.M., Madgula, R., George, R.K., Mishra, M.W., Pandit, C.S. Kumar, & L.

[77] Sykorová, E., Fajkus, J., Mezníková, M., Lim, K.Y., Neplechová, K., Blattner, F.R., Chase, M.W., & Leitch, A.R. (2006). Minisatellite telomeres occur in the family Alliaceae but are

[78] Tang, S.J. (2011). Chromatin Organization by Repetitive Elements (CORE): A Genomic Principle for the Higher-Order Structure of Chromosomes. *Genes* 2011, *2*(3), 502-515 [79] Tautz, D. (1989) Hypervariability of simple sequences as a general source for

[80] Tautz, D. (1993). Notes On The Definition And Nomenclature Of Tandemly Repetitive

[81] Tautz, D., & Renz, M. (1984). Simple Sequences Are Ubiquitous Repetitive Components

[82] Tomilin, N.V. (2008). Regulation Of Mammalian Gene Expression By Retroelements

[83] Toth, G., Gaspari, Z. & Jurka, J. (2000). Microsatellites In Different Eukaryotic Genomes:

[84] Tourmente, S., Deragon, J.M., Lafleuriel, J., Tutois, S., Pelissier, T., Cuvillier, C., Espagnol, M.C., & Picard, G. (1994). Characterization Of Minisatellites In Arabidopsis Thaliana With Sequence Similarity To The Human Minisatellite Core Sequence. Nucleic

[85] Ugarkovic, D., & Plohl, M. (2002). Variation In Satellite DNA Profiles-Causes And

[86] Vermaak, D., Bayes, J.J., & Malik, H.S. (2009). A surrogate approach to study the evolution of noncoding DNA elements that organize eukaryotic genomes. *J Hered.* 

[87] Vinces, M.D., Legendre, M., Caldara, M., Hagihara, M., & , K.J. (2009). Unstable tandem repeats in promoters confer transcriptional evolvability. *Science* 324(5931):1213-1216. [88] Walters, R. (2011). More commercial benefits on horizon as pig genome project nears completion. Information from pig genome already being used in the industry. *Pig* 

[89] Wilson, G.A., & Strobeck, C. (1999). The isolation and characterization of microsatellite loci in bison, and their usefulness in other artiodactyls. *Anim Genet.* 30(3):226-227.

PCR in water buffalo, Bubalus bubalis. *DNA Cell Biol.* 25 (1):31-48.

polymorphic DNA markers. *Nucleic Acids Res.* 17(16):6463-6471.

Of Eukaryotic Genomes. Nucleic Acids Res. 25, 4127-4138.

And Non-Coding Tandem Repeats. *Bioessays* 30, 338–348.

Survey And Analysis. *Genome Res.* 10, 967–981.

Acids Research 22, 3317–3321.

Effects. *EMBO* 21, 5955-5959.

100(5):624-636.

*International ,* 41, 2:16

DNA Sequences. In DNA Fingerprinting: State Of The Science. 21–28.

4613–4622.

Singh. (2003). *Bioinformatics* 19, 549–552.

lost in Allium. *Am J Bot.* 93(6):814-823.


[74] Srivastava, J., Premi, S., Pathak, D., Ahsan, Z., Tiwari, M., Garg, L.C., & Ali, S. (2006). Transcriptional status of known and novel genes tagged with consensus of 33.15 repeat loci employing minisatellite-associated sequence amplification (MASA) and real-time PCR in water buffalo, Bubalus bubalis. *DNA Cell Biol.* 25 (1):31-48.

178 Functional Genomics

12:338.

18(4):441-458.

*Genomics* 3, 352–360.

*Microbiology* 148, 519-528.

[58] Pathak, D., & Ali, S. (2011). RsaI repetitive DNA in Buffalo *Bubalus bubalis* representing retrotransposons, conserved in bovids, are part of the functional genes. *BMC Genomics*

[59] Pathak, D., Srivastava, J., Premi, S., Tiwari, M., Garg, L.C., Kumar, S., & Ali, S. (2006). Chromosomal localization, copy number assessment, and transcriptional status of *BamH*I repeat fractions in water buffalo Bubalus bubalis. *DNA Cell Biol.* 25(4):206-214. [60] Pathak, D., Srivastava, J., Samad, R., Parwez, I., Kumar, S., & Ali, S. (2010). Genomewide search of the genes tagged with the consensus of 33.6 repeat loci in buffalo *Bubalus bubalis* employing minisatellite-associated sequence amplification. *Chromosome Res.*

[61] Patience, C., Takeuchi, Y., & Weiss, R. A. (1997). Infection Of Human Cells By An

[62] Proudfoot, N.J., Gill, A., & Maniatis, T. (1982). The Structure Of The Human Zeta-Globin Gene And A Closely Linked, Nearly Identical Pseudogene. *Cell* 31, 553-563. [63] Rawal, L., Ali, S., & Ali, S. (2012). Molecular mining of GGAA tagged transcripts and

[64] Reed, J.M., Fleischer, R. C., Eberhard, J., & Oring, L.W. (1996). Minisatellite DNA Variability In Two Populations Of Spotted Sandpipers Actitis Macularia In Minnesota,

[65] Richard, C. & Mark, B. (2009). The impact of retrotransposons on human genome

[66] Roy-Engel A, M., Carroll, M.L., Vogel, E., et al. (2001). Alu insertion polymorphisms for

[67] Royle, J.R., Clarkson, R.E., Wong, Z., & Jeffreys, A.J. (1988). Clustering Of Hypervariable Minisatellites In The Proterminal Regions Of Human Autosomes.

[68] Sethi, R. K. (2003). Improving Rivering And Swamp Buffaloes Through Breeding. 4th

[69] Shapiro, J.A., & Von Sternberg, R.(2005). Why repetitive DNA is essential to genome

[70] Sinden, R.R. (1999). Biological Implications Of The DNA Structures Associated With

[71] Skuce, R. A., Mccorry, T. P., Mccarroll, J. F., Roring, S.M.M., Scott, A.N., Brittain, D., Hughes, S.L., Hewinson, R.G., Sydney, D., & Neil, L. (2002). Discrimination Of Mycobacterium Tuberculosis Complex Bacteria Using Novel Vntr-Pcr Targets.

[72] Slamovits, C.H., & Rossi, M.S. (2002). Satellite DNA: Agent Of Chromosomal Evolution

[73] Srivastava, J., Premi, S., Kumar, S., & Ali, S. (2008). Organization and differential expression of the GACA/GATA tagged somatic and spermatozoal transcriptomes in

their expression in water buffalo Bubalus bubalis. *Gene* 492(1):290-295.

the study of human genomic diversity. *Genetics* 159 (1): 279–290.

Disease-Causing Triplet Repeats. *Am. J. Hum. Genet.* 64, 346–353.

Endogenous Retrovirus Of Pigs. *Nat. Med.* 3, 276-282.

U.S.A. Wader Study Group Bull. 79, 115-117.

Asian Buffalo Congress Lead Papers, 50.

function. *Biol Rev Camb Philos Soc.*80(2):227-50

In Mammals. Mastozoología Neotropical 9, 297-308.

Buffalo Bubalus bubalis. *BMC Genomics* 9:132.

evolution. *Nature Reviews Genetics* 10 (10): 691–703.


[90] Zhang, L., Yuan, D., Yu, S., Li, Z., Cao, Y., Miao, Z., Qian, H., & Tang, K. (2004). Preference of simple sequence repeats in coding and non-coding regions of Arabidopsis thaliana. *Bioinformatics* 20(7):1081-1086.


#### **Chapter 9**

## **Dynamic Proteomics: Methodologies and Analysis**

Sara ten Have, Kelly Hodge and Angus I. Lamond

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/50786

#### **1. Introduction**


Proteins are dynamic, and any detailed description of the proteome must reflect the dynamic variations in protein properties. For example, most proteins form complexes with other protein partners, can undergo various post-translational modifications and can accumulate in different subcellular compartments. Spatial and temporal variations between proteins in different compartments and/or cell types mean that each experiment for mass spectrometric analysis must be carefully designed to optimise the data that can be obtained. Recent improvements in experimental methodologies, and in the resolution and sensitivity of mass spectrometers, have expanded the complexity of proteomic analysis that is now possible [1]. In this chapter we outline current workflows and methodologies that facilitate complex proteomic analyses, from the design and execution of experiments through to the analysis and interpretation of the resulting data.

SILAC labelling can be used to quantitate a wide range of biological experiments based upon differential comparisons of two or three cell states or conditions. For example, immuno-precipitation and protein-protein interaction analysis, cellular fractionation for localisation studies, and measurements of protein synthesis, degradation and turnover can all be quantitated using the SILAC approach [2-7]. The SILAC approach can also be used to carry out high-throughput analyses on entire proteomes and can help to identify subsets of proteins that respond to specific cellular perturbations.

Reliable interpretation of SILAC data requires computational analysis. Widely accessible spreadsheet applications such as Excel are commonly used for this task. The data comprise numerous peptide and protein identifications, with several isotope ratio and/or intensity values associated with each identification. The interpretation of these data is often the most complex part of the proteomics experiment. How to go about data quality assurance and culling, as well as modelling the data in such a way as to draw valid conclusions, will be discussed in this chapter.

© 2012 ten Have et al., licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


**Figure 1.** Structures of the common amino acids.

Mass Spectrometry-based proteomics is evolving into a multidimensional analysis world (e.g. identification, quantification, space and time), where not only do we identify and quantify the proteome but also characterize changes in protein properties (e.g. subcellular location) at different time points, and under different conditions (e.g. response to a drug treatment). These types of analyses help to provide a functional characterisation of the genome and may facilitate the application of proteomics for clinical studies.

### **2. Mass Spectrometry of proteins**

Mass spectrometry of proteins is based on several principles of chemistry and physics, namely mass and the generation of charged molecules, or ions. Given the known composition of amino acids (Figure 1) and the inferred knowledge of protein composition (protein products are predicted from the genetic code using data from the Human Genome Project [8]), we can compute in silico the predicted molecular weight of every protein. Additionally, the mass change resulting from any modification made in the laboratory (for example, reduction and alkylation of cysteine di-sulphide bonds) can be accurately predicted. Matches can therefore be made between these calculated values and the experimental ion masses measured in a mass spectrometer.
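The in silico calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the method used by any particular search engine: the function name `peptide_mass` is hypothetical, the residue masses are standard monoisotopic values, and every cysteine is assumed to carry a carbamidomethyl group (+57.02146 Da) after reduction and alkylation.

```python
# Monoisotopic residue masses (Da) of the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056            # one water per peptide (free N- and C-termini)
CARBAMIDOMETHYL = 57.02146  # assumed fixed modification on every cysteine


def peptide_mass(sequence, alkylated=True):
    """Return the predicted monoisotopic mass (Da) of a peptide or protein."""
    mass = WATER + sum(MONO[aa] for aa in sequence)
    if alkylated:
        # Add the predictable mass change of the laboratory modification.
        mass += CARBAMIDOMETHYL * sequence.count("C")
    return mass


if __name__ == "__main__":
    for seq in ("G", "AAK", "CC"):
        print(seq, round(peptide_mass(seq), 4))
```

The predicted value can then be compared with a measured ion mass within a chosen tolerance; the tolerance itself depends on the instrument and is not specified here.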

For complex protein mixtures (e.g. cellular lysates, immuno-precipitates, whole organism and/or tissue lysates), protein analysis is typically performed using the following methodology:

1. Protein solubilisation.
2. A separation step to fractionate the complex protein mixture (e.g. gel electrophoresis, isoelectric focussing or size exclusion chromatography).
3. Reduction and alkylation (to disrupt di-sulphide bonds in proteins and add a carbamidomethyl modification to cysteine residues, which inhibits di-sulphide re-formation).
4. Digestion using a proteolytic enzyme such as trypsin (bottom-up, or peptide-level, analysis is the most common form of protein analysis in mass spectrometry).
5. Reversed-phase HPLC (which reduces the complexity of peptide samples sufficiently for the instrument to measure the individual ions).
6. Electro-spray ionisation and tandem mass spectrometry (Figure 2).

Through the process of electro-spray ionisation, peptides become charged (mostly positively), and the charge is used to control their movement through the instrument. The mass spectrometer then performs a survey scan (characterising all of the peptide ion masses present in a given time window), followed by several sequencing scans, which isolate and fragment peptides, one ion mass at a time, by colliding each selected peptide ion with inert gas molecules, thereby generating fragment or daughter ions. These fragment ions characterise the amino acid sequence of the selected peptide. Using this analysis strategy, the current generation of mass spectrometers can generate ~30,000 or more spectra from a typical protein lysate, thereby identifying and quantifying hundreds to thousands of separate proteins, depending on the complexity of the sample.
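The digestion step can be mimicked computationally to predict which peptides a protein should yield. The sketch below assumes the commonly used rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline; the function name `tryptic_digest` and the `missed_cleavages` parameter are illustrative choices, not part of the workflow described above.

```python
import re


def tryptic_digest(protein, missed_cleavages=0):
    """Return the peptides expected from an in silico tryptic digest.

    Assumed rule: cleave C-terminal to K or R, except before P.
    Peptides spanning up to `missed_cleavages` uncut sites are included.
    """
    # Zero-width split: position preceded by K/R and not followed by P.
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


if __name__ == "__main__":
    # Toy sequence, not a real protein.
    print(tryptic_digest("AAKRPGGRCC"))
    print(tryptic_digest("AAKRPGGRCC", missed_cleavages=1))
```

Each predicted peptide can then be assigned a theoretical mass and compared with the ion masses recorded in the survey scan.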


**Figure 2.** The basic workflow of bottom up proteomics, and how peptides are measured in a mass spectrometer.

#### **3. SILAC**

Relative quantification of proteins in two or more samples is aided by isotope labelling techniques such as SILAC [9]. SILAC (Stable Isotope Labelling of Amino acids in Cell culture) is a quantitative method of analysis in which specific amino acids (typically arginine and/or lysine) undergo a forced enrichment of heavy carbon, nitrogen and hydrogen isotopes (namely ¹³C, ¹⁵N and deuterium, none of which is radioactive) in cell culture, through the use of amino acid-depleted media. After approximately 6 cell division cycles, the vast majority of the proteins are completely substituted with the heavy isotope-labelled amino acids (Figure 3) [9]. These basic amino acids are cleavage sites for the enzyme trypsin, ensuring every tryptic peptide measured contains a single SILAC isotope label. Modern mass spectrometers are extremely sensitive instruments that can detect the changes in mass caused by the presence of different isotopes. For example, 'medium' labelling of proteins is generally created by using L-arginine-¹³C₆ ¹⁴N₄ and L-lysine-²H₄ (R6K4). Thus the 'medium' labelled arginine and lysine will have an increased mass of 6 Da and 4 Da, respectively, relative to the normal 'light' isotopes in naturally occurring amino acids. The MS spectra display these differences as distinctive double (for 2 SILAC labels) or triple (for 3 SILAC labels) peaks at a given mass for the endogenous/light peptide

Using this technology, experimental scenarios have been established that allow the characterisation of the dynamic proteome [2-4, 6, 7].
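The spacing between the peaks of a SILAC doublet or triplet follows directly from the label masses and the charge state of the peptide ion. The following sketch uses nominal shifts only: 6 Da and 4 Da for R6K4 as stated above, and 10 Da and 8 Da for R10K8 as an assumption inferred from the label name. Exact isotopic mass differences deviate slightly from these integers, and the function name `silac_mz_shift` is hypothetical.

```python
# Nominal mass shifts (Da) per labelled residue; non-K/R residues are unlabelled.
LABELS = {
    "light": {"R": 0.0, "K": 0.0},    # R0K0
    "medium": {"R": 6.0, "K": 4.0},   # R6K4 (values from the text)
    "heavy": {"R": 10.0, "K": 8.0},   # R10K8 (assumed from the label name)
}


def silac_mz_shift(peptide, label, charge):
    """Return the m/z offset of a labelled peptide ion relative to its light form."""
    shifts = LABELS[label]
    delta_mass = sum(shifts.get(aa, 0.0) for aa in peptide)
    # The mass difference is divided by the charge state of the ion.
    return delta_mass / charge


if __name__ == "__main__":
    for label in ("light", "medium", "heavy"):
        print(label, silac_mz_shift("AAK", label, charge=2))
```

For a doubly charged peptide ending in lysine, the medium and heavy peaks therefore appear about 2 and 4 m/z units above the light peak under these nominal values.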

SILAC-specific instructions and some product information can be found in the Supplementary Methods section under 'SILAC'. Please note that other suppliers are available and can be found on the internet.

\*'Light labelled' proteins should be prepared from cells grown on media identical to the heavy labelled media (i.e. media to which amino acids with the normal 'light' isotopes and dialysed calf serum are added). Using 'normal' non-SILAC media is not sufficient, as this will have a different composition, due mostly to the non-dialysed calf serum used in typical media. Non-dialysed calf serum may contain more small molecules than dialysed serum, which could potentially change the growth rate of cells, therefore giving differential growth conditions between the control sample and the sample of interest.

**Figure 3.** The principle of SILAC. Cells which have been grown for >6 generations in SILAC media contain proteins completely substituted with heavy isotope-labelled amino acids. These different mass labels can be used for distinguishing and comparing proteins in a wide range of experimental conditions. When a sample is mixed with a control sample carrying a different SILAC label, the resulting spectra of each peptide allow accurate relative quantitation.

#### **4. Protein interaction proteomics**


As mentioned in the introduction, many proteins do not act in isolation but form complexes with partner proteins, and a major goal is to identify the detailed composition of these multi-protein complexes. However, the dynamic nature of the proteome means that there may not be a unique description of protein complexes. For example, at different cell cycle stages and/or under changed conditions (e.g. following drug treatment) the partner proteins in a complex might either change, or vary in abundance and/or modification state. Our aim, therefore, is to analyse both the composition and dynamic nature of protein complexes.


**Figure 4.** Immune-precipitation. The beads used in immune-precipitation experiments contribute a significant amount of non-specific binding proteins, and therefore need to be accounted for with the use of a bead control. This is done using cell lysate and the beads of choice, without the antibody, leading to a sample which contains only background proteins. With this information the genuine interactors can more easily be determined.

Using immune-precipitation (harnessing the specificity of antibodies to protein targets) for protein interaction experiments has long been the gold standard, particularly in combination with traditional western blot analysis. With the application of mass spectrometry to characterise immuno-precipitates, the analysis has now expanded to identifying hundreds of proteins in each IP. A large percentage of the proteins pulled down in an immune-precipitation experiment bind non-specifically, for example binding to the beads used as the solid substrate for the antibody rather than to the bait or target protein (Figure 4). The beads often have a high general binding affinity for protein [4, 10]. Without good controls these non-specifically binding proteins can obscure identification of the specific proteins of interest. To allow for this a bead control is often included as part of the experiment. A bead control is provided by applying equal amounts of the cell lysate of choice to the beads being used for the immune-precipitation, without antibody. This generates a sample that will predominantly contain non-specific binding proteins, which can be identified during the analysis and distinguished from the genuine protein interaction partners in the complex of interest. While label free MS analysis is effective for this protocol, differential SILAC labelling to distinguish the control and experimental conditions (e.g. R0K0 for the bead control, R6K4 for the protein of interest, R10K8 for the protein + drug) improves the accuracy of quantitation and efficiency for this kind of analysis.

Extensive analysis of hundreds of immune-precipitations, with lysates from various different cell lines and bead types, has been used to generate a database recording protein identification frequencies, i.e., recording the number of previous experiments where any given protein was identified [4]. The higher the number of times a protein is identified in different IP experiments, the more likely it represents a non-specific binder. This protein frequency library information is available as an online resource for comparing immune-precipitation results; see http://www.peptracker.com/datavisual/.
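The logic of a frequency library can be sketched in a few lines. This is an illustration only: the function `flag_background`, the protein names, the counts and the 30% threshold are all hypothetical and do not describe how the online resource is implemented.

```python
def flag_background(identified, frequency_counts, n_experiments, max_fraction=0.3):
    """Split identified proteins into likely specific and likely background.

    frequency_counts maps a protein to the number of previous IP experiments
    in which it was identified; max_fraction is an arbitrary, assumed cut-off.
    """
    specific, background = [], []
    for protein in identified:
        # Fraction of previous experiments in which this protein appeared.
        fraction = frequency_counts.get(protein, 0) / n_experiments
        if fraction > max_fraction:
            background.append(protein)
        else:
            specific.append(protein)
    return specific, background


# Made-up example: PROT_B was seen in 180 of 200 earlier IPs.
specific, background = flag_background(
    ["PROT_A", "PROT_B"], {"PROT_A": 3, "PROT_B": 180}, n_experiments=200
)
print(specific, background)
```

In practice such a frequency filter would be used alongside the bead control, since a frequently identified protein can still be a genuine interactor of a particular bait.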

A basic immune-precipitation protocol can be found in the Supplementary Methods section under 'Immuno-precipitation Protocol'.

#### **5. Dynamic proteomics: How it's done**


The use of SILAC labelling enables a wide range of assay formats to be designed for quantitative comparison of protein properties under different conditions. For example, using SILAC in conjunction with cellular fractionation, immune-precipitation and time course experiments it is possible to analyse the kinetics of protein transport, synthesis, degradation and interaction.

Using a combination of physical and chemical separation methods, including differential density centrifugation, it is possible to fractionate cells and isolate subcellular organelles and components such as cytoplasm, nucleoplasm and membranes. There are also commercial kits available that can be used to fractionate cells and that can be combined with MS analysis. The cellular fractionation most commonly used in our hands concentrates on distinguishing between cytoplasmic and nuclear localisation of proteins in eukaryotic cells, allowing analysis of compartmentalisation of protein function and nucleo-cytoplasmic transport under different cell growth conditions and responses [3, 11]. Figure 5 illustrates the procedure and the specifics of the methodology can be found in the Supplementary Methods under 'Cellular Fractionation'.

The principles of the fractionation strategy, as applied to mammalian cells grown in culture, are as follows. A hypotonic (low salt) buffer is applied to freshly trypsinised cells, followed by gentle mechanical disruption with a Dounce homogeniser. This causes the cells to swell, and hence disrupts the outer cell membrane. The resulting 'cellular' suspension is centrifuged such that larger organelles, including the nucleus (which at this stage is intact), will spin down into a pellet, whilst the soluble material and smaller cytoplasmic material will stay in the supernatant. Thereafter stronger mechanical disruption is employed (e.g. sonication) to lyse the nucleus, and one or more additional fractionation steps (e.g. density gradients) are used to separate organelles and other subcellular structures based on properties such as their size, density and/or shape.


This can be combined with MS-based approaches and SILAC to determine changes in the subcellular organisation of the proteome induced by stress or other perturbations (e.g. UV, drug treatment etc.). This is done by growing cells in media with different SILAC labels, using one of the labels as an untreated control sample (e.g. 'light') while exposing cells grown in a different label (e.g. 'medium' or 'heavy') to the perturbation, e.g. stress, drug treatment. After incubating for the desired time, which will vary depending on the treatment being performed, equal numbers of cells from each control and experimental sample can be mixed and the fractionation protocol carried out. Alternatively, the cell fractionation can be performed separately for the different samples and then mixed to combine equal amounts of protein from each (Figure 5). In this technique proteins remaining unchanged as a result of the perturbation will show a SILAC ratio for the control and experimental isotopic forms of ~1 (or if a log ratio is plotted, 0). In contrast, proteins which have been altered as a result of the experimental treatment (e.g. moved from cytoplasm to nucleus) will show either an increased or decreased SILAC ratio, according to the design of the experiment. This conveniently highlights a particular subset of proteins that may respond to a specific perturbation and provides in parallel a direct comparison with the bulk response of the large number of cell proteins sampled in high throughput.
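The interpretation of SILAC ratios described above (a ratio of ~1, or a log ratio of 0, for unchanged proteins) can be sketched as follows. This is a minimal example under stated assumptions: the function name `classify_by_ratio` and the two-fold cut-off are hypothetical, and a real analysis would derive the cut-off from the population statistics of the dataset rather than fix it in advance.

```python
import math


def classify_by_ratio(ratios, fold_cutoff=2.0):
    """Label each protein by its log2 SILAC ratio (treated / control).

    fold_cutoff is an assumed, illustrative threshold.
    """
    log_cutoff = math.log2(fold_cutoff)
    result = {}
    for protein, ratio in ratios.items():
        log_ratio = math.log2(ratio)  # 0 means no change
        if log_ratio >= log_cutoff:
            status = "increased"
        elif log_ratio <= -log_cutoff:
            status = "decreased"
        else:
            status = "unchanged"
        result[protein] = (log_ratio, status)
    return result


# Made-up ratios for one fraction (e.g. nuclear), treated over control.
for name, (log_ratio, status) in classify_by_ratio(
    {"PROT_A": 1.0, "PROT_B": 4.0, "PROT_C": 0.25}
).items():
    print(name, log_ratio, status)
```

Plotting the log ratio rather than the raw ratio makes increases and decreases symmetric around 0, which is why unchanged proteins cluster at 0 in such plots.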

This approach can also be used in combination with a Pulse SILAC experimental set-up, as discussed below.

The cellular fractionation protocol described above allows the characterisation of changes in the steady state localisation of proteins and of kinetics of protein movement, but this is not the full story. Although the location of a protein is fundamental to its function, the change induced by your experimental variable might also affect protein turnover, by changing rates of protein synthesis, degradation or both. So how do we characterise this? Pulse SILAC techniques have enabled an elegant experimental procedure to characterise the time dynamics of the proteome [3, 12, 13]. This involves generating a population of completely labelled cells in medium label (e.g. R6K4), and switching the media over to heavy (e.g. R10K8). Over time, the medium labelled protein is progressively replaced by heavy labelled protein. Collecting cells at various time points, and mixing these with light labelled cells (50:50 as per usual SILAC) as a control steady state of protein expression (Figure 6), gives samples which characterise protein synthesis and degradation [13].

The benefit of this kind of experimental set-up is evident in the downstream data analysis. The decrease in the medium to light ratio describes the degradation rate of a given protein, whilst the increase in the heavy to light ratio describes the synthesis of new protein. The time point at which these two curves intersect (assuming you have a sufficient number of time points for accurate measurements) describes the time required for turnover of 50% of the protein. Analysis of proteome turnover in the HeLa and HCT116 cell lines has been carried out and made publicly available at http://www.peptracker.com/turnoverInformation/. Bearing in mind that different cell lines have varying cell cycle lengths, the online Protein Turnover Viewer allows comparison of new results with this database, and can hence reveal differences in behaviour between cell lines. The Protein Turnover Viewer has an easily navigable interface, allowing UniProt identifiers to be used to look up the turnover data for a protein of interest.
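The intersection of the two curves can be estimated by linear interpolation between sampled time points. The sketch below is one simple way to do this and is not the method used by the Protein Turnover Viewer; the function name `crossing_time` and the example values are hypothetical.

```python
def crossing_time(times, medium_over_light, heavy_over_light):
    """Estimate when the falling M/L curve meets the rising H/L curve.

    Uses linear interpolation between adjacent time points and returns
    None if the curves do not cross within the sampled range.
    """
    diffs = [m - h for m, h in zip(medium_over_light, heavy_over_light)]
    for i in range(len(times) - 1):
        d0, d1 = diffs[i], diffs[i + 1]
        if d0 == 0:
            return times[i]  # curves meet exactly at a sampled point
        if d0 * d1 < 0:
            # Sign change: interpolate the zero of the difference.
            fraction = d0 / (d0 - d1)
            return times[i] + fraction * (times[i + 1] - times[i])
    if diffs and diffs[-1] == 0:
        return times[-1]
    return None


# Made-up time course (hours) for a single protein.
hours = [0, 4, 8, 16]
m_over_l = [1.0, 0.8, 0.6, 0.2]
h_over_l = [0.0, 0.2, 0.4, 0.8]
print(crossing_time(hours, m_over_l, h_over_l))
```

With sparse time points the interpolated value is only as good as the sampling around the crossing, which is why a sufficient number of time points matters for this estimate.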


**Figure 5.** Cellular fractionation, and SILAC cellular fractionation. The physiological properties of the cellular structure enable effective separation of parts of the cell, using combinations of chemistry, centrifugal properties and varying strengths of mechanical disruption. This method in combination with SILAC enables characterisation of different conditions in one experiment, describing quantitatively the regulation and location of proteins.


**Figure 6.** Pulse SILAC. Pulse SILAC uses an established labelled population of cells and when a media swap (i.e. from R6K4 to R10K8) is instigated, measurements of protein degradation and synthesis can be performed, when mixed with a control population (R0K0).

This technique is not only useful for steady state, or 'normal', protein turnover analysis. It is also well suited to studying drug treatment kinetics, microRNA effects, DNA damage (e.g. UV or chemically induced), or physiological perturbations (e.g. hypoxia or other forms of stress). Analysis of the resulting data is more complex than for a simpler SILAC experiment and the data set is larger, but it provides a wealth of information about protein dynamics.

#### **6. Data analysis**

Data analysis of SILAC experiments needs to be tailored to the specific question, but the beginnings of the analysis process are very similar and can follow this method:

MaxQuant → Data Culling → Population Statistics → Data Grouping

#### **6.1. MaxQuant**

MaxQuant is a comprehensive software package widely used for the analysis and quantitation of MS-based proteomic data, including SILAC, that was created by Jürgen Cox and Matthias Mann [14, 15]. It is made available as freeware and can be downloaded from http://maxquant.org/. MaxQuant takes raw MS data from the mass spectrometer and performs peak picking, mass recalibration, SILAC pair matching and quantification, label free quantification and database searching (using its Andromeda search engine), and outputs peptide and protein data in extensive detail [14, 15]. While other commercial and freeware software options are also available for analysis of MS data, we routinely use the MaxQuant package, which works very well specifically for the protocols described here.

#### **6.2. Data grouping**


Data grouping is a way of making large data sets easier to manage. In an ideal world having a database with experimental values linked to reliable meta data describing the experimental parameters is the best case scenario for proteomic data management [2-4, 15].

Online versions of proteomic databases are available which allow mass spectrometry based experimental data upload, and subsequent comparison to other datasets contained in the database, such as PRIDE (http://www.ebi.ac.uk/pride/) [16]. Several other MS data repositories (namely Tranche and PeptideAtlas) have combined with PRIDE to form ProteomeXchange (http://www.proteomeexchange.org), which enables submission from a single webpage and the combination of the data from all three repositories. In-depth analytics on these data have not been performed; comparisons are mainly based around protein identification and classification.

Quantitative comparisons of datasets in this forum are not possible, but grouping/result set selection according to numerous metadata and protein identifiers is possible. Absolute quantification comparisons with experimental datasets are possible through PaxDB (http://pax-db.org) [17], which not only contains data for most model organisms but has correlated absolute quantitation information from 28 datasets and computed the average parts per million value for thousands of proteins. These data can be searched 100 identifiers at a time.

Using the MaxQuant software for data processing allows the grouping and separation of data from individual MS analyses [14, 15]. MaxQuant can combine data from all the protein fractions from a sample (if it has been pre-fractionated before MS), and can keep samples from different conditions separate while still combining and outputting the results in one Excel sheet. This facilitates direct comparison between all samples with all ratio/intensity data present.
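As an illustration of the data culling step that follows MaxQuant processing, the sketch below drops flagged rows from a tab-separated protein table. It is a sketch under assumptions: the function `cull_protein_groups` is hypothetical, and the column names ('Reverse', 'Potential contaminant', 'Only identified by site', marked with '+') are those typically found in MaxQuant protein group tables but differ between versions, so they should be checked against the actual output file.

```python
import csv
import io

# Assumed flag columns; names vary between MaxQuant versions.
FLAG_COLUMNS = ("Reverse", "Potential contaminant", "Only identified by site")


def cull_protein_groups(handle, flag_columns=FLAG_COLUMNS):
    """Return the rows of a tab-separated table that carry no '+' flag."""
    kept = []
    for row in csv.DictReader(handle, delimiter="\t"):
        flagged = any((row.get(col) or "").strip() == "+" for col in flag_columns)
        if not flagged:
            kept.append(row)
    return kept


# Tiny made-up table standing in for a real output file.
example = (
    "Protein IDs\tRatio H/L normalized\tReverse\tPotential contaminant\t"
    "Only identified by site\n"
    "P1\t1.02\t\t\t\n"
    "REV__P2\t0.50\t+\t\t\n"
    "CON__P3\t3.10\t\t+\t\n"
)
rows = cull_protein_groups(io.StringIO(example))
print([row["Protein IDs"] for row in rows])
```

For a real file the same function can be called with an open file handle, for example `open(path, newline="")`, in place of the in-memory example.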

When the appropriate population statistical analyses have been performed and a statistically valid significance cut-off has been calculated, the candidates for up- or down-regulated proteins from each group can be identified. When performing analysis of proteome dynamics, these results can also be compared with other variables, for example a cell fractionation experiment performed in conjunction with a time course of a drug treatment. Time course data can also be analysed to determine trends. It is important to have a zero time point to describe the basal protein level, and to use this to normalise values from the later time points, followed by detection and grouping of trends. Most proteins will show little or no change over time, but specific groups may show trends, for example reflecting regulation as a result of the cell cycle, which appear as one or more peaks/troughs (figure 8). These trends can be identified by clustering analysis (here done with StatistiXL, http://www.statistixl.com/features/cluster.aspx) and further correlated with other data, such as GO terms or protein network information (network analysis was done with the STRING database, http://string-db.org/ [18]). In the example shown, network analysis of the proteins found to have similar expression trends indicated that the proteins identified were linkers between 2 or more functional networks, showing the transfer of effect through regulation over time. With any other kind of grouping, for example by GO term or subcellular location, this association between known networks would not be determined; it is only seen in the regulation trend association.
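The normalisation to the zero time point and the grouping of similar trends can be sketched as follows. Note that this is a deliberately simple correlation-threshold grouping written for illustration, not the hierarchical clustering performed with StatistiXL; the function names, the 0.9 correlation threshold and the example values are all hypothetical.

```python
import math


def normalise_to_t0(series):
    """Express each time point as a log2 ratio relative to the zero time point."""
    t0 = series[0]
    return [math.log2(value / t0) for value in series]


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is flat."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def group_by_trend(profiles, min_corr=0.9):
    """Greedily group proteins whose normalised profiles correlate with a seed.

    Flat (unchanging) profiles never correlate and end up as singletons.
    """
    groups = []
    for name, series in profiles.items():
        normalised = normalise_to_t0(series)
        for group in groups:
            if pearson(normalised, group["seed"]) >= min_corr:
                group["members"].append(name)
                break
        else:
            groups.append({"seed": normalised, "members": [name]})
    return [group["members"] for group in groups]


# Made-up abundance values at four time points, first value is time zero.
profiles = {
    "PROT_A": [1.0, 2.0, 4.0, 2.0],
    "PROT_B": [2.0, 4.0, 8.0, 4.0],
    "PROT_C": [1.0, 0.5, 0.25, 0.5],
}
print(group_by_trend(profiles))
```

Because each profile is normalised to its own zero time point, proteins with very different absolute abundance but the same relative behaviour (PROT_A and PROT_B here) fall into the same group.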


Order no: Sigma, L-Arginine (A8094, 25g), L-lysine (L8662, 25g), L-Methionoine

N/B: It is advisable to prepare 500µl aliquots of amino acids in PBS which can be stored at -

R6K4 and R10K8 amino acids- please note all amino acids are purchased via Cambridge Isotope Lab (CIL; North America, www.isotope.com) for UK see http://www.cgkas.com

unrelated experiments. We term this approach, 'Super Experiments'.

**8.1. SILAC-Stable Isotope Labelling of Amino acids in Culture** 

Media can be bought ready or be made by the user, prior to use.

1. DMEM or RPMI minus arginine, lysine and methionine.

20⁰C. Add 500µl aliquot of each when preparing SILAC media.

**7. Conclusions** 

wider scientific community.

www.lamondlab.com websites.

**8. Supplementary methods** 

growing labelled cells in tissue culture.

*Order no: contact your local sales rep* 

Stock concentrations: Arg0: 84mg/ml Lys0: 146mg/ml Met0: 30mg/ml

*Order no: Invitrogen, cat no S181D (500ml)*  3. Standard amino acids (ROKO media for control)

2. Dialyzed FBS (fetal calf serum)

(M5308, 25g)

For media:

fractionation experiment performed, in conjunction with a time course of a drug treatment. Time course data can also be analysed to determine trends. It is important to have a zero time point, to describe the basal protein level, and to use this to normalise values from the later time points, followed by detection and grouping of trends. Most proteins will show little or no change over time, but specific groups may show trends, for example reflecting regulation as a result of cell cycle, which appear as one or more peaks/troughs (figure 8). These can be identified by clustering analysis (done here with StatistiXL, http://www.statistixl.com/features/cluster.aspx) and further correlated with other data, such as GO terms or protein network information (network analysis was done with the STRING database, http://string-db.org/ [18]). In the example shown, network analysis of the proteins found to have similar expression trends indicated that the proteins identified were linkers between 2 or more functional networks, showing the transfer of effect through regulation, over time. With any other kind of grouping, such as GO term or subcellular location, this association between known networks would not be determined; it is only seen in the regulation trend association.

**Figure 7.** Hierarchical clustering of protein ratios over time, leading to effective grouping of expression trends. This kind of trend grouping and analysis was not possible by grouping according to GO terms, cellular location or network association. Network analysis of these clustered groups **after** hierarchical clustering is advisable, however, as interaction between known networks is often identified.

#### **7. Conclusions**

Life scientists working in the proteomics field have had the privilege of being at the cutting edge of an emerging technology that has opened up new possibilities for improving experimental design and data analysis. As proteomics can be "characterized more by its diversity than a common methodological or subject orientation"[1], the applications developed to accommodate this diversity should be made available and accessible to the wider scientific community.

The methods described here allow the description and measurement of protein-protein interactions, changes in proteome localisation and rates of synthesis and degradation. While the bench-top methodologies are relatively straightforward, the key to harnessing the biological value of the experiments often lies in the methods used to analyse the resulting data. We recommend systematic recording and management of all data, from all experiments. Systematically recorded, detailed metadata can be used to extract information and obtain new results through a comparison of data trends across many different and often unrelated experiments. We term this approach 'Super Experiments'.
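As an illustration of the time-course analysis described for trend grouping, the sketch below normalises each protein's values to its zero time point and then groups the resulting profiles by average-linkage hierarchical clustering. The protein names, the values, the use of Euclidean distance on log2 ratios and the number of clusters are all illustrative assumptions; this is not the StatistiXL procedure used for Figure 7.

```python
import math


def log2_ratios_to_t0(series):
    """Normalise a time series to its zero time point (first value), as log2 ratios."""
    t0 = series[0]
    return [math.log2(value / t0) for value in series]


def euclidean(a, b):
    """Euclidean distance between two equally long profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_trends(profiles, n_clusters):
    """Average-linkage agglomerative clustering of named profiles.

    Starts with one cluster per protein and repeatedly merges the two
    clusters with the smallest mean pairwise distance.
    """
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(
                    euclidean(profiles[a], profiles[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                mean_dist = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or mean_dist < best[0]:
                    best = (mean_dist, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical protein intensities at four time points (first value = time zero).
raw = {
    "protA": [1.0, 2.0, 4.0, 2.0],  # peak
    "protB": [2.0, 4.2, 7.8, 4.1],  # peak
    "protC": [3.0, 3.1, 2.9, 3.0],  # little or no change
    "protD": [1.5, 1.5, 1.6, 1.5],  # little or no change
    "protE": [4.0, 2.0, 1.0, 2.0],  # trough
}

profiles = {name: log2_ratios_to_t0(values) for name, values in raw.items()}
clusters = cluster_trends(profiles, n_clusters=3)

for members in clusters:
    print(members)
```

Working on log2 ratios relative to time zero means that proteins of very different abundance but similar regulation fall into the same group, which is the point of normalising to the basal level before clustering.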

All protocols discussed above can be found on the greproteomics.lifesci.dundee.ac.uk and www.lamondlab.com websites.

#### **8. Supplementary methods**

#### **8.1. SILAC-Stable Isotope Labelling of Amino acids in Culture**

The following protocol provides a step by step guideline for preparing SILAC media and growing labelled cells in tissue culture.

Media can be bought ready-made or prepared by the user prior to use.

For media:


N/B: It is advisable to prepare 500µl aliquots of amino acids in PBS, which can be stored at -20°C. Add one 500µl aliquot of each when preparing SILAC media.
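The effect of adding one 500µl aliquot of each stock can be checked with a short calculation. The sketch below uses the stock concentrations quoted in this protocol for the unlabelled amino acids (Arg0 84mg/ml, Lys0 146mg/ml, Met0 30mg/ml) and the volumes listed under "Preparing the SILAC media (500ml)". The variable and function names are illustrative, and the calculation assumes the volumes are simply additive.

```python
# Stock concentrations quoted in this protocol (mg/ml), unlabelled amino acids.
STOCK_MG_PER_ML = {"Arg0": 84.0, "Lys0": 146.0, "Met0": 30.0}

ALIQUOT_ML = 0.5      # one 500 µl aliquot of each amino acid stock
BASE_MEDIA_ML = 500.0  # DMEM/RPMI
FBS_ML = 50.0          # dialysed FBS
PEN_STREP_ML = 5.5     # Pen/Strep

# Total volume of the finished medium, assuming additive volumes.
total_ml = BASE_MEDIA_ML + FBS_ML + PEN_STREP_ML + ALIQUOT_ML * len(STOCK_MG_PER_ML)


def final_mg_per_litre(stock_mg_per_ml, aliquot_ml, volume_ml):
    """Final concentration (mg/L) after diluting one aliquot into volume_ml."""
    mass_mg = stock_mg_per_ml * aliquot_ml
    return mass_mg / volume_ml * 1000.0


final = {
    name: final_mg_per_litre(conc, ALIQUOT_ML, total_ml)
    for name, conc in STOCK_MG_PER_ML.items()
}

for name, mg_per_l in final.items():
    print(f"{name}: {mg_per_l:.1f} mg/L in {total_ml:.1f} ml of medium")
```

The same function can be reused for the heavy amino acid stocks once their concentrations are known; those values are not given here, so they are left out.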


R6K4 and R10K8 amino acids: please note that all amino acids are purchased via Cambridge Isotope Lab (CIL; North America, www.isotope.com); for the UK see http://www.cgkas.com

| Amino Acid | Symbol | Cat. No | Pack Size |
|---|---|---|---|
| L-arginine-HCL (U-13C6, 98%) | R8 | CLM-2265 | 0.5g |
| L-lysine-2HCL (U-13C6, 98%) | K6 | CLM-2247 | 0.5g |
| L-arginine-HCL (U-13C6, 98% : 15N4, 98%) | R10 | CNLM-539 | 0.5g |
| L-lysine-2HCL (U-13C6, 98% : 15N2, 98%) | K8 | CNLM-291 | 0.5g |

These will make enough SILAC media for approximately 12 bottles of media; however, you can buy smaller amounts of the amino acids if you only plan to do 1 or 2 experiments.

| Amino Acid | Symbol | Cat. No | Pack Size |
|---|---|---|---|
| L-arginine-HCL (U-13C6, 98%) | R8 | CLM-2265 | 0.1g |
| L-lysine-2HCL (U-13C6, 98%) | K6 | CLM-2247 | 0.1g |
| L-arginine-HCL (U-13C6, 98% : 15N4, 98%) | R10 | CNLM-539 | 0.1g |
| L-lysine-2HCL (U-13C6, 98% : 15N2, 98%) | K8 | CNLM-291 | 0.1g |

4. Cell Dissociation Buffer

Order no: Invitrogen, cat no. 13151-014 (100ml)

When passaging cells it is very important NOT to use trypsin, as this may provide a pool of unlabelled amino acids.

Preparing the SILAC media (500ml):

To 500ml DMEM/RPMI media add:

1. 500ml DMEM/RPMI media
2. 50ml dialysed FBS
3. 5.5ml Pen/Strep (and/or other antibiotics, if desired)
4. 0.5ml Met0 stock
5. 0.5ml Arg stock (R0, R6, R10)
6. 0.5ml Lys stock (K0, K4, K8)

Mix well then filter through 0.22µm sterile filter. Store at 4°C.

Cells should be grown for a minimum of 6 passages for complete labelling.

#### **8.2. Cellular fractionation protocol**

This protocol will provide an effective technique to fractionate a variety of different cell types into cytoplasmic, nucleoplasmic and nucleoli fractions. The exact recipes for the solutions required throughout the protocol are provided at the end.

We have also included a shortened version of the protocol that will give only cytoplasmic and nuclei fractions.

N/B: Normal fractionation requires 5-15 x 14cm circular dishes of completely confluent cells.

#### **Cytoplasmic, Nucleoplasmic and Nucleoli fractionation**

1. From confluent dishes, trypsinise cells and spin in a centrifuge for 4mins at 1000rpm. Wash pellet with PBS and spin again.
2. Re-suspend pellet in 5ml of ice-cold Buffer A (see Buffer A recipe). Incubate cells on ice for 5mins.
3. Transfer re-suspended pellet into a pre-chilled 7ml dounce homogeniser and break cells open using 10 strokes of a tight pestle.
4. Centrifuge dounced cells for 5mins at 4°C, 1000rpm. Retain supernatant as cytoplasmic fraction.
5. Re-suspend pellet in 3ml of S1 (0.25M Sucrose, 10mM MgCl2) and layer over a 3ml cushion of S2 (0.35M Sucrose, 0.5mM MgCl2) by slowly pipetting S1 solution on top of S2.
6. Centrifuge for 5mins at 4°C, 2500rpm.
7. Remove supernatant (retain if necessary) and re-suspend the pellet in 3ml of S2 (0.35M Sucrose, 0.5mM MgCl2) and sonicate for 6 x 10 secs (with a 10 sec rest on ice between each sonication) using a probe sonicator. (N/B: if a probe sonicator is not available, a bath sonicator can be used, provided samples are sonicated in an ice bath to prevent overheating.)
8. Layer the sonicated sample over 3ml S3 (0.88M Sucrose, 0.5mM MgCl2), again by pipetting the solution slowly on top of the S3 layer. Spin samples for 10mins at 4°C, 3500rpm.
9. Retain supernatant as nucleoplasmic fraction.
10. Wash pellet by re-suspending in 500µl of S2 (0.35M Sucrose, 0.5mM MgCl2) and spin for 5mins at 4°C, 3500rpm. This is the nucleoli fraction.

Nucleoli pellet can be stored in any volume of buffer at -80°C and can be spun out again using the same centrifugation parameters as step 10.

#### **Cytoplasm and Nuclei fractions only**

1. From confluent dishes, trypsinise cells and spin in a centrifuge for 4mins at 1000rpm. Wash pellet with PBS and spin again.
2. Re-suspend pellet in 5ml of ice-cold Buffer A (see Buffer A recipe). Incubate cells on ice for 5mins.
3. Transfer re-suspended pellet into a pre-chilled 7ml dounce homogeniser and break cells open using 10 strokes of a tight pestle. Centrifuge dounced cells for 5mins at 4°C, 1000rpm. Retain supernatant as cytoplasmic fraction.
4. Re-suspend nuclear pellet in 3ml of S1 (0.25M Sucrose, 10mM MgCl2) and layer over a 3ml cushion of S2 (0.35M Sucrose, 0.5mM MgCl2) by slowly pipetting S1 solution on top of S2.
5. Centrifuge for 10mins at 4°C, 3500rpm and retain pellet as nuclear fraction.

Nuclear pellet can be stored in any volume of buffer at -80°C and can be spun out again using the same centrifugation parameters as step 5.
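The spin speeds in these protocols are given in rpm, which only fixes the force applied once the rotor radius is known. The sketch below converts between rpm and relative centrifugal force (RCF, in x g) with the standard relation RCF = 1.118 x 10^-5 x r x rpm^2, where r is the rotor radius in cm. The 15cm radius is a placeholder assumption, not a value from this protocol; substitute the radius of the rotor actually used.

```python
import math

# Standard conversion constant for radius in cm and speed in rpm.
RCF_CONSTANT = 1.118e-5


def rpm_to_rcf(rpm, radius_cm):
    """Relative centrifugal force (x g) for a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rcf_to_rpm(rcf, radius_cm):
    """Speed (rpm) needed to reach a given RCF on a rotor of the given radius."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


# Placeholder rotor radius (assumption, not taken from the protocol).
ASSUMED_RADIUS_CM = 15.0

# Speeds that appear in the fractionation steps.
PROTOCOL_SPEEDS_RPM = [1000, 2500, 3500]

rcf_table = {rpm: rpm_to_rcf(rpm, ASSUMED_RADIUS_CM) for rpm in PROTOCOL_SPEEDS_RPM}

for rpm, rcf in rcf_table.items():
    print(f"{rpm} rpm on a {ASSUMED_RADIUS_CM:.0f} cm rotor is about {rcf:.0f} x g")
```

Recording the RCF alongside the rpm makes it possible to reproduce the same spins on a centrifuge with a different rotor by calling `rcf_to_rpm` with the new radius.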

#### *8.2.1. Solutions*

| Stock | Final concentration |
|---|---|
| 1M HEPES, pH 7.9 | 10 mM |
| 1M MgCl2 | 1.5 mM |
| 2.5M KCl | 10 mM |
| 1M DTT | 0.5 mM |
| Protease inhibitor | 1 mini EDTA-free COMPLETE tablet |
| dH2O | Up to 10ml |

**Table 1.** Buffer A (10ml stock) is a hypotonic buffer that causes the cells to swell so that they can be effectively broken open by dounce homogenizing.

| Stock | Final concentration |
|---|---|
| 2.5M Sucrose | 0.25 M |
| 1M MgCl2 | 10 mM |
| Protease inhibitor | 1 mini EDTA-free COMPLETE tablet |
| dH2O | Up to 20ml |

**Table 2.** S1 (0.25M Sucrose, 10mM MgCl2), 20ml

| Stock | Final concentration |
|---|---|
| 2.5M Sucrose | 0.35 M |
| 1M MgCl2 | 0.5 mM |
| Protease inhibitor | 2 mini EDTA-free COMPLETE tablets |
| dH2O | Up to 40ml |

**Table 3.** S2 (0.35M Sucrose, 0.5mM MgCl2), 40ml

| Stock | Final concentration |
|---|---|
| 2.5M Sucrose | 0.88 M |
| 1M MgCl2 | 0.5 mM |
| Protease inhibitor | 1 mini EDTA-free COMPLETE tablet |
| dH2O | Up to 20ml |

**Table 4.** S3 (0.88M Sucrose, 0.5mM MgCl2), 20ml

Up-scale volumes as necessary.

| Stock | Final concentration |
|---|---|
| 1M Tris, pH 7.5 | 50 mM |
| 5M NaCl | 150 mM |
| 10% NP-40 | 1% |
| 10% Deoxycholate | 0.5% |
| Protease inhibitor | 1 mini EDTA-free COMPLETE tablet |

**Table 5.** RIPA buffer, used frequently to prepare cellular lysate (10ml)
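The stock volumes needed for each solution follow from C1 x V1 = C2 x V2. The sketch below works this out for Buffer A (10ml) and S1 (20ml) from the stock and final concentrations listed in the tables. The function and dictionary names are illustrative, and the protease inhibitor tablet is ignored because it is added as a solid.

```python
def stock_volume_ul(stock_mM, final_mM, total_ml):
    """Volume of stock (µl) giving final_mM in total_ml, from C1*V1 = C2*V2."""
    return final_mM * total_ml / stock_mM * 1000.0


def recipe(components, total_ml):
    """Return the volume (µl) of each stock plus the dH2O needed to reach total_ml."""
    volumes = {
        name: stock_volume_ul(stock_mM, final_mM, total_ml)
        for name, (stock_mM, final_mM) in components.items()
    }
    volumes["dH2O"] = total_ml * 1000.0 - sum(volumes.values())
    return volumes


# (stock concentration in mM, final concentration in mM), taken from the tables.
BUFFER_A = {
    "HEPES pH 7.9": (1000.0, 10.0),
    "MgCl2": (1000.0, 1.5),
    "KCl": (2500.0, 10.0),
    "DTT": (1000.0, 0.5),
}
S1 = {
    "Sucrose": (2500.0, 250.0),  # 0.25 M final
    "MgCl2": (1000.0, 10.0),
}

buffer_a = recipe(BUFFER_A, total_ml=10.0)
s1 = recipe(S1, total_ml=20.0)

for label, volumes in (("Buffer A (10ml)", buffer_a), ("S1 (20ml)", s1)):
    print(label)
    for name, microlitres in volumes.items():
        print(f"  {name}: {microlitres:.0f} µl")
```

Because the final concentrations are fixed, up-scaling only requires changing `total_ml`; every stock volume scales in proportion.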

#### **8.3. Immuno-precipitation protocol**

This technique is very useful in the purification of a protein of interest. It works through the formation of an antigen-antibody complex which is attached to an agarose, sepharose or metallic bead. The bead coupled to an antibody provides a matrix to which the protein of interest can bind, allowing the other, undesired components of the whole cell extract to be washed away. The sample eluted from the beads can then be further processed by gel electrophoresis and MS.

The protocol that follows is a very generic standard procedure presented as an initial recommendation for those who have not performed or optimised an IP previously.

Reagents required:

IP buffer:

- 20mM Tris-HCl pH 7.5
- 150mM NaCl
- 1mM EDTA
- 0.05% Triton X-100
- 5% glycerol
- Protease Inhibitor cocktail tablets (Roche, cat. 11-873-580-001), 1 per 50ml buffer

Glycine elution buffer: 100mM Glycine pH 2.5 (adjusted with HCl)

Standard elution buffer: LDS sample buffer (Invitrogen, cat. NP0007). Dilute the 4x buffer 1:1 with milliQ water to obtain a 2x solution.

#### *8.3.1. Method*

N.B: All bead spin downs are done at 2000rpm for 2mins at 4°C.

1. Place whole cell extract aliquot in a round-bottomed vial to ensure good mixing. Add antibody at the dilution required for the antibody you are using (you may need to consult its information booklet for dilution guidelines). When using cell fractions, use 200µl of cytoplasmic protein solution and 50µl of nucleoplasmic protein solution (up-scaling as required).
2. Incubate between 0.5 hours and overnight at 4°C, rotating.

*NB. Perform all of the following steps on ice. Keep IP Buffer on ice also.*

3. Wash beads with 1ml of IP buffer and spin down. Repeat. Re-suspend the beads in a 1:1 ratio with IP buffer (i.e. if 25µl of beads then 25µl of IP buffer).
4. Add 50µl of bead slurry to each Ab-lysate sample and rotate for 1-3 hours at 4°C.
5. Spin down beads. Retain the supernatant as this contains the unbound proteins.
6. Wash beads 3x with 1ml IP buffer. Vortex for 1 min before spinning down the beads.
7. Completely remove all liquid from the beads using gel loading tips, then elute the bound proteins with either 2x 30µl aliquots of 2x LDS sample buffer (shaking for 5mins at 95°C each time, for running samples on gels) or 2x 30µl glycine buffer (shaking for 10mins at room temp each time, for doing in-solution digest).

   N.B: if glycine buffer is used then it will result in a sample with an acidic pH. This needs to be neutralised so that further analysis can be done. Neutralisation of the sample can be done by slow, drop-by-drop addition of 1M Tris-HCl, pH 7.5. pH strips or the colour of the LDS buffer (acidic pH will cause LDS buffer to turn yellow) can be used to check pH. In the case of in-solution digest the protein will need to be precipitated, in which case pH adjustment is not required.

8. Run both unbound and bound protein samples on a 1D 4-12% BisTris gel to provide a complete comparison. In Gel Digestion protocol can then be undertaken.

For further details on IPs and analysis with Mass Spec see the following:

- Boulon, S., Ahmad, Y., Trinkle-Mulcahy, L., Verheggen, C.*, et al.*, Establishment of a Protein Frequency Library and Its Application in the Reliable Identification of Specific Protein Interaction Partners. *Molecular & Cellular Proteomics* 2010, *9*, 861-879.
- Trinkle-Mulcahy, L., Boulon, S., Lam, Y. W., Urcia, R.*, et al.*, Identifying specific protein interaction partners using quantitative mass spectrometry and bead proteomes. *The Journal of Cell Biology* 2008, *183*, 223-239.
- Ten Have S, Boulon S, Ahmad Y, Lamond AI. Mass spectrometry-based immunoprecipitation proteomics - The user's guide. Proteomics. 2011 Mar;11(6):1153-9. doi: 10.1002/pmic.201000548. Epub 2011 Feb 16.

#### **Author details**

Sara ten Have, Kelly Hodge and Angus I. Lamond
*The Centre for Gene Regulation and Expression, College of Life Sciences, University of Dundee, Dundee, Scotland, UK*

#### **9. References**

[1] Lamond, A. I., Uhlen, M., Horning, S., Makarov, A.*, et al.*, Advancing cell biology through proteomics in space and time (PROSPECTS). *Mol Cell Proteomics* 2012, *11*, O112 017731.

[2] Ahmad, Y., Boisvert, F. M., Lundberg, E., Uhlen, M., Lamond, A. I., Systematic analysis of protein pools, isoforms, and modifications affecting turnover and subcellular localization. *Mol Cell Proteomics* 2012, *11*, M111 013680.

[3] Boisvert, F. M., Ahmad, Y., Gierlinski, M., Charriere, F.*, et al.*, A quantitative spatial proteomics analysis of proteome turnover in human cells. *Mol Cell Proteomics* 2012, *11*, M111 011429.

[4] Boulon, S., Ahmad, Y., Trinkle-Mulcahy, L., Verheggen, C.*, et al.*, Establishment of a protein frequency library and its application in the reliable identification of specific protein interaction partners. *Mol Cell Proteomics* 2010, *9*, 861-879.

[5] Boisvert, F. M., Lamond, A. I., p53-Dependent subcellular proteome localization following DNA damage. *PROTEOMICS* 2010, *10*, 4087-4097.

[6] Larance, M., Kirkwood, K. J., Xirodimas, D. P., Lundberg, E.*, et al.*, Characterization of MRFAP1 turnover and interactions downstream of the NEDD8 pathway. *Mol Cell Proteomics* 2012, *11*, M111 014407.

[7] Deeb, S. J., D'Souza, R., Cox, J., Schmidt-Supprian, M., Mann, M., Super-SILAC allows classification of diffuse large B-cell lymphoma subtypes by their protein expression profiles. *Mol Cell Proteomics* 2012.

[8] Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W.*, et al.*, The Sequence of the Human Genome. *Science* 2001, *291*, 1304-1351.

[9] Ong, S. E., Blagoev, B., Kratchmarova, I., Kristensen, D. B.*, et al.*, Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. *Mol Cell Proteomics* 2002, *1*, 376-386.

[10] ten Have, S., Boulon, S., Ahmad, Y., Lamond, A. I., Mass spectrometry-based immunoprecipitation proteomics - the user's guide. *PROTEOMICS* 2011, *11*, 1153-1159.

[11] Boisvert, F.-M., Lam, Y. W., Lamont, D., Lamond, A. I., A Quantitative Proteomics Analysis of Subcellular Proteome Localization and Changes Induced by DNA Damage. *Molecular & Cellular Proteomics* 2010, *9*, 457-470.

[12] Schwanhausser, B., Gossen, M., Dittmar, G., Selbach, M., Global analysis of cellular protein translation by pulsed SILAC. *PROTEOMICS* 2009, *9*, 205-209.

[13] Boisvert, F.-M., Ahmad, Y., Gierliński, M., Charrière, F.*, et al.*, A quantitative spatial proteomics analysis of proteome turnover in human cells. *Molecular & Cellular Proteomics* 2011.

[14] Cox, J., Mann, M., MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. *Nat Biotechnol* 2008, *26*, 1367-1372.

[15] Schaab, C., Geiger, T., Stoehr, G., Cox, J., Mann, M., Analysis of high accuracy, quantitative proteomics data in the MaxQB database. *Mol Cell Proteomics* 2012, *11*, M111 014068.

[16] Vizcaíno, J. A., Côté, R., Reisinger, F., Barsnes, H.*, et al.*, The Proteomics Identifications database: 2010 update. *Nucleic Acids Research* 2010, *38*, D736-D742.

[17] Wang, M., Weiss, M., Simonovic, M., Haertinger, G.*, et al.*, PaxDb, a database of protein abundance averages across all three domains of life. *Molecular & Cellular Proteomics* 2012.

[18] Szklarczyk, D., Franceschini, A., Kuhn, M., Simonovic, M.*, et al.*, The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. *Nucleic Acids Res* 2011, *39*, D561-568.

### *Edited by Germana Meroni and Francesca Petrera*

This book titled "Functional Genomics" contains a selection of chapters focused on crucial topics in functional genomics, from the analysis of the genetic code, to the understanding of the role of the different genes and to the proteomic implications. The book provides an overview on basic issues and some of the recent developments in medicinal science and technology. Covering all the aspects involved in such a broad theme as functional genomics and in all its applications would be impossible within the same book. The different chapters represent a brief introduction to the topic, connecting the most promising developments in functional genomics technologies, focusing on specific applications in biomedicine, agro-food technologies and zootechniques.
