From the analysis of the maximum genetic distance between the two most distant proteins in the mycobacterial multiple genome clusters, we suggest that the divergence of at least one of the duplicate gene copies from the ancestral gene increases with cluster size. These homologous gene families consist of both orthologs and paralogs. The average genetic distance of the duplicate gene copies in the multiple genome clusters is not strongly correlated with cluster size. This indicates that the homologous gene families containing proteins from different mycobacterial species have not undergone complete functional divergence, which is to be expected for orthologs, since they tend to maintain their functions.

The average genetic distance estimated for single genome paralogous gene clusters, on the other hand, decreases as cluster size increases, suggesting that, on average, smaller families tend to diverge more rapidly than larger families. The exception is a subset of members of the larger families, which have evidently diverged further, since they account for the increase in maximum genetic distance with cluster size. Although gene duplication is considered an important mechanism for acquiring new genes and creating evolutionary novelty (Torgerson and Singh, 2004), horizontal gene transfer (HGT) is also known to be a widespread phenomenon, and a significant proportion of bacterial genes are accepted to have been acquired by HGT (Price *et al.*, 2007). The genome of *M. tuberculosis* is known to contain 19 genes of eukaryotic origin, and it is speculated that the organism may also have acquired genes from other prokaryotes by HGT (Kinsella *et al.*, 2003). In addition, many intraspecies HGT events have been reported in the progenitor of *M. tuberculosis* (Rosas-Magallanes *et al.*, 2006). HGT is well recognized as a route by which a new gene homologous to an existing gene family member can be incorporated (Ochman, 2001; Kinsella *et al.*, 2003; Krzywinska *et al.*, 2004), and compared with the other family members, the newly introduced gene may be more divergent in sequence and function (Pushker *et al.*, 2004). Following duplication, such laterally transferred genes, whose functions have already diverged, may diversify further as they evolve new functions. This could increase the genetic distance between the laterally transferred duplicate and the members of its paralogous gene family. The chance of this should increase with the number of family members.

Phylogenetic analysis of the sigma factors in *M. tuberculosis* suggested that most of them have orthologs in other mycobacteria. However, we could not find orthologs in *M. leprae* for a few of the subfamilies, which may reflect the extensive loss of sigma factors during its reductive genome evolution. Agarwal *et al.* (2007) report that sigM controls only a small subset of genes and that its loss would not influence *M. tuberculosis* virulence. The difference in the divergence of sigM in *M. tuberculosis* compared with other mycobacteria, together with the absence of its ortholog in *M. bovis*, warrants further study of the importance of the sigM factor in *M. tuberculosis* virulence.

**5. Conclusions and future work**

The estimated duplicate gene percentages for *M. tuberculosis* from independent genome clustering (31%), InterPro signature methods (38%), across-genome clustering (49%) and the union of these methods (51%) were all relatively high, indicating the importance of gene duplication in *M. tuberculosis* genome evolution. The investigation of relationships between GC composition and the duplicate gene percentages identified from the sequence and InterPro domain data provides sufficient evidence to suggest that for the mycobacterial species, with the exception of *M. leprae*, the maintenance of high duplicate gene percentages

**7. References**


Apweiler, R.; Attwood TK, Bairoch A, Bateman A, Birney E, Biswas M, Bucher P, Cerutti L, Corpet F, Croning MD, Durbin R, Falquet L, Fleischmann W, Gouzy J, Hermjakob H, Hulo N, Jonassen I, Kahn D, Kanapin A, Karavidopoulou Y, Lopez R, Marx B, Mulder NJ, Oinn TM, Pagni M, Servant F, Sigrist CJ, Zdobnov EM. (2001). The InterPro database, an integrated documentation resource for protein families, domains and functional sites. *Nucleic Acids Research*, Vol. 29, No. 1, pp. 37–40.

Babu, MM. (2003). Did the loss of sigma factors initiate pseudogene accumulation in *M. leprae*? *Trends in Microbiology*, Vol. 11, No. 2, pp. 59–61.

Basak, S. & Ghosh, TC. (2005). On the origin of genomic adaptation at high temperature for prokaryotic organisms. *Biochemical and Biophysical Research Communications*, Vol. 330, No. 3, pp. 629–632.

Castresana, J. (2000). Selection of Conserved Blocks from Multiple Alignments for Their Use in Phylogenetic Analysis. *Molecular Biology and Evolution*, Vol. 17, No. 4, pp. 540–552.

Choi, I. & Kim, S. (2007). Global extent of horizontal gene transfer. *Proceedings of the National Academy of Sciences*, Vol. 104, No. 11, pp. 4489–4494.

Cordero, OX. & Hogeweg, P. (2009). The impact of long-distance horizontal gene transfer on prokaryotic genome size. *Proceedings of the National Academy of Sciences*, Vol. 106, No. 51, pp. 21748–21753.

Cun, Y. (2010). The Evolutionary Dynamics of Mutant Allele at Duplicate Loci. *arXiv*:1007.0333v1.

DeRose-Wilson, LJ. & Gaut, BS. (2007). Transcription-related mutations and GC content drive variation in nucleotide substitution rates across the genomes of *Arabidopsis thaliana* and *Arabidopsis lyrata*. *BMC Evolutionary Biology*, Vol. 7, No. 66.

Fontan, PA.; Aris V, Alvarez ME, Ghanny S, Cheng J, Soteropoulos P, Trevani A, Pine R, Smith I. (2008). *Mycobacterium tuberculosis* Sigma Factor E Regulon Modulates the Host Inflammatory Response. Vol. 198, pp. 877–885.

Fontan, PA.; Voskuil MI, Gomez M, Tan D, Pardini M, Manganelli R, Fattorini L, Schoolnik GK, Smith I. (2009). The *Mycobacterium tuberculosis* Sigma Factor B Is Required for Full Response to Cell Envelope Stress and Hypoxia In Vitro, but It Is Dispensable for In Vivo Growth. *Journal of Bacteriology*, Vol. 191, No. 18, pp. 5628–5633.

Force, A.; Lynch M, Pickett FB, Amores A, Yan YL, Postlethwait J. (1999). Preservation of duplicate genes by complementary, degenerative mutations. *Genetics*, Vol. 151, No. 4, pp. 1531–1545.

Fraser-Liggett, CM. (2005). Insights on biology and evolution from microbial genome sequencing. *Genome Research*, Vol. 15, No. 12, pp. 1603–1610.

Guindon, S. & Gascuel, O. (2003). A Simple, Fast, and Accurate Algorithm to Estimate Large Phylogenies by Maximum Likelihood. *Systematic Biology*, Vol. 52, No. 5, pp. 696–704.

Hamady, M.; Betterton, MD. & Knight, R. (2006). Using the nucleotide substitution rate matrix to detect horizontal gene transfer. *BMC Bioinformatics*, Vol. 7, No. 476.

He, X. & Zhang, J. (2005). Gene Complexity and Gene Duplicability. *Current Biology*, Vol. 15, No. 11, pp. 1016–1021.

Hooper, SD. & Berg, OG. (2003). On the Nature of Gene Innovation: Duplication Patterns in Microbial Genomes. *Molecular Biology and Evolution*, Vol. 20, No. 6, pp. 945–954.

Kersey, P.; Bower L, Morris L, Horne A, Petryszak R, Kanz C, Kanapin A, Das U, Michoud K, Phan I, Gattiker A, Kulikova T, Faruque N, Duggan K, Mclaren P, Reimholz B, Duret L, Penel S, Reuter I, Apweiler R. (2005). Integr8 and Genome Reviews: integrated views of complete genomes and proteomes. *Nucleic Acids Research*, Vol. 33, Database issue, pp. D297–D302.

Kinsella, RJ.; Fitzpatrick DA, Creevey CJ, McInerney JO. (2003). Fatty acid biosynthesis in *Mycobacterium tuberculosis*: Lateral gene transfer, adaptive evolution, and gene duplication. *Proceedings of the National Academy of Sciences*, Vol. 100, No. 18, pp. 10320–10325.

Kondrashov, FA.; Rogozin IB, Wolf YI, Koonin EV. (2002). Selection in the evolution of gene duplications. *Genome Biology*, Vol. 3, No. 2.

Koonin, EV. & Wolf, YI. (2008). Genomics of bacteria and archaea: the emerging dynamic view of the prokaryotic world. *Nucleic Acids Research*, Vol. 36, No. 21.

Koonin, EV. & Wolf, YI. (2009). Is evolution Darwinian or/and Lamarckian? *Biology Direct*, Vol. 4, No. 42.

Krzywinska, E.; Krzywinski J, Schorey JS. (2004). Naturally occurring horizontal gene transfer and homologous recombination in *Mycobacterium*. *Microbiology*, Vol. 150, No. 6, pp. 1707–1712.

Lagomarsino, MC. (2009). Universal features in the genome-level evolution of protein domains. *Genome Biology*, Vol. 10, No. 1.

Manganelli, R.; Provvedi R, Rodrigue S, Beaucher J, Gaudreau L, Smith I. (2004). σ Factors and Global Gene Regulation in *Mycobacterium tuberculosis*. *Journal of Bacteriology*, Vol. 186, No. 4, pp. 895–902.

Mann, S. & Chen, YP. (2010). Bacterial genomic G + C composition-eliciting environmental adaptation. *Genomics*, Vol. 95, No. 1, pp. 7–15.

Mann, S.; Li, J. & Chen, YP. (2010). Insights into Bacterial Genome Composition through Variable Target GC Content Profiling. *Journal of Computational Biology*, Vol. 17, No. 1, pp. 79–96.

Marri, PR.; Bannantine, JP. & Golding, GB. (2006). Comparative genomics of metabolic pathways in *Mycobacterium* species: gene duplication, gene decay and lateral gene transfer. *FEMS Microbiology Reviews*, Vol. 30, No. 6, pp. 906–925.

Mulder, NJ.; Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Buillard V, Cerutti L, Copley R, Courcelle E, Das U, Daugherty L, Dibley M, Finn R, Fleischmann W, Gough J, Haft D, Hulo N, Hunter S, Kahn D, Kanapin A, Kejariwal A, Labarga A, Langendijk-Genevaux PS, Lonsdale D, Lopez R, Letunic I, Madera M, Maslen J, McAnulla C, McDowall J, Mistry J, Mitchell A, Nikolskaya AN, Orchard S, Orengo C, Petryszak R, Selengut JD, Sigrist CJ, Thomas PD, Valentin F, Wilson D, Wu CH, Yeats C. (2007). New developments in the InterPro database. *Nucleic Acids Research*, Vol. 35, Database issue, pp. D224–D228.

Musto, H.; Naya H, Zavala A, Romero H, Alvarez-Valín F, Bernardi G. (2006). Genomic GC level, optimal growth temperature, and genome size in prokaryotes. *Biochemical and Biophysical Research Communications*, Vol. 347, No. 1, pp. 1–3.

Nadeau, JH. & Sankoff, D. (1997). Comparable Rates of Gene Loss and Functional Divergence After Genome Duplications Early in Vertebrate Evolution. *Genetics*, Vol. 147, No. 3, pp. 1259–1266.

**11**

**The Evolutionary History of CBF Transcription Factors: Gene Duplication of CCAAT-Binding Factors NF-Y in Plants**

Alexandro Cagliari, Andreia Carina Turchetto-Zolet, Felipe dos Santos Maraschin, Guilherme Loss, Rogério Margis and Marcia Margis-Pinheiro

*Universidade Federal do Rio Grande do Sul/UFRGS*

*Brazil*

**1. Introduction**

Eukaryotic gene expression is often controlled by complex and refined combinatorial transcription factor networks. These networks are composed of multiprotein complexes that derive their gene regulatory capacity both from intrinsic properties and from their *trans*-acting partners (Singh, 1998; Wolberger, 1998; Remenyi *et al.*, 2004). Participation in such higher-order complexes allows an organism to use single transcription factors to control multiple genes with different temporal and spatial expression patterns (Siefers *et al.*, 2009).

In this chapter, we provide a synopsis of the genetic and genomic mechanisms that might be responsible for the gene copy diversification observed in the eukaryotic NF-Y transcription factor family. We identify the genes coding for NF-Y transcription factors in eukaryotes, with an emphasis on the duplication of the NF-Y family in the plant lineage, and discuss the important consequences of its gene diversification.

**2. The CCAAT *cis*-element promoter**

Eukaryotic genes contain numerous *cis*-regulatory elements that mediate their induction, repression or basal transcription (Dynan and Tjian, 1985; Myers *et al.*, 1986; Maity and de Crombrugghe, 1998). These regulatory elements can be found close to transcribed genes, for example in the promoter region, and/or in distant regions of the genes, where they may act as enhancers (de Silvio *et al.*, 1999).

The transcriptional regulation of several eukaryotic genes is coordinated through sequence-specific binding of proteins to the promoter region located upstream of the gene. Many of these protein-binding sequences, which are found in a wide variety of organisms, have remained highly conserved during evolution (Edwards *et al.*, 1998).

The CCAAT box is one of the most common upstream elements, found in approximately 25–30% of eukaryotic promoters (Bucher, 1990; Mantovani, 1998). It is typically located 60–100 bp upstream of the transcription start site and can function in either the direct or the inverted orientation (Dorn *et al.*, 1987b; Bucher, 1990; Edwards *et al.*, 1998; Mantovani, 1998; Stephenson *et al.*, 2007). It may also interact cooperatively with other CCAAT boxes (Tasanen *et al.*, 1992) or with other conserved motifs (Muro *et al.*, 1992; Rieping and Schoffl, 1992;

