**5. Gene duplication of NF-Y in plants**

While the duplication of NF-Y genes is poorly understood in the plant lineage, many of the functional and mechanistic details are likely conserved across the plant, animal and fungal lineages. This inference comes from the strong cross-kingdom conservation of amino acid residues that are functionally important in mammalian and yeast NF-Ys (Maity and de Crombrugghe, 1992; Maity *et al.*, 1992; Sinha *et al.*, 1995; Coustry *et al.*, 1996; Kim *et al.*, 1996; Sinha *et al.*, 1996; Mantovani, 1998; Romier *et al.*, 2003). CCAAT-like motifs are found in several plant promoters, and binding activity to CCAAT sequences has been identified in plant nuclear extracts (Yazawa and Kamada, 2007). In addition, at least some plant NF-YA and NF-YB subunits have been shown to complement yeast mutant strains lacking the corresponding NF-Y subunit, and several groups have demonstrated that each of the three plant NF-Y proteins can substitute for its yeast counterpart in gene expression assays (Edwards *et al.*, 1998; Masiero *et al.*, 2002; Ben-Naim *et al.*, 2006; Siefers *et al.*, 2009). These observations indicate that plant NF-Y subunits might act as general transcription factors, as in mammals (Yamamoto *et al.*, 2009).

Although a complete functional plant NF-Y complex has not yet been described, the individual subunits are known to be involved in a number of important physiological processes, such as specific developmental processes and responses to environmental stimuli (Lotan *et al.*, 1998; Kusnetsov *et al.*, 1999; Miyoshi *et al.*, 2003; Ben-Naim *et al.*, 2006; Combier *et al.*, 2006; Wenkel *et al.*, 2006; Cai *et al.*, 2007; Nelson *et al.*, 2007; Warpeha *et al.*, 2007; Siefers *et al.*, 2009). A well-established example is the NF-YB subunit gene LEAFY COTYLEDON1 (LEC1), which specifically controls embryo development, especially the maturation phase. LEC1 plays specialized roles not only because of its developmentally regulated expression but also because of its distinct molecular activity: the *in vivo* function of LEC1 cannot be replaced by other NF-YB subunits, except for the most closely related one, LEC1-LIKE (L1L) (Kwong *et al.*, 2003; Lee *et al.*, 2003; Yamamoto *et al.*, 2009). In Arabidopsis, many NF-Y subunit genes are expressed ubiquitously, although some are differentially expressed. For example, while the AtNF-YC-4 transcript accumulates in seeds 7 days after germination, AtNF-YB-9 is only expressed in green siliques (Gusmaroli *et al.*, 2001).

Plant NF-Y function also appears to be important for responses to drought stress. Although the specific mechanism of action remains unclear, overexpression of the AtNF-YB1 subunit and of its orthologue in maize (*Zea mays*), ZmNF-YB2, leads to enhanced drought resistance (Nelson *et al.*, 2007). Another study showed that overexpression of maize NF-YA5 reduced drought susceptibility, anthocyanin production and stomatal aperture, while *nf-ya5* mutants had the expected opposite phenotype in each case (Li *et al.*, 2008). In addition, several publications strongly suggest that NF-Y transcription factors are also involved in photoperiod-regulated flowering (Ben-Naim *et al.*, 2006; Wenkel *et al.*, 2006; Siefers *et al.*, 2009).

The Evolutionary History of CBF Transcription Factors: Gene Duplication of CCAAT – Binding Factors NF-Y in Plants

Table 1. NF-Y genes identified in fully sequenced eukaryotic genomes.

| Phylo | Species | Code | Subunit A genes | Subunit B genes | Subunit C genes |
|---|---|---|---|---|---|
| **Metazoa** | *Homo sapiens* | Hsa | 1 | 1 | 1 |
| | *Mus musculus* | Mmu | 1 | 1 | 1 |
| | *Rattus norvegicus* | Rno | 1 | 1 | 1 |
| | *Canis familiaris* | Cfa | 1 | 1 | 1 |
| | *Monodelphis domestica* | Mdo | 1 | 1 | 1 |
| | *Gallus gallus* | Gga | 1 | 1 | 1 |
| | *Xenopus tropicalis* | Xtr | 1 | 1 | 1 |
| | *Gasterosteus aculeatus* | Gac | 1 | 1 | 1 |
| | *Oryzias latipes* | Ola | 1 | 1 | 1 |
| | *Takifugu rubripes* | Tru | 1 | 1 | 1 |
| | *Danio rerio* | Dre | 1 | 1 | 1 |
| | *Ciona savignyi* | Csa | 1 | 1 | 1 |
| | *Branchiostoma floridae* | Bfl | 2 | 2 | 2 |
| | *Strongylocentrotus purpuratus* | Spu | 1 | 1 | 1 |
| | *Drosophila melanogaster* | Dme | 1 | 1 | 1 |
| | *Anopheles gambiae* | Aga | 1 | 1 | 1 |
| | *Tribolium castaneum* | Tca | 1 | 1 | 1 |
| | *Caenorhabditis elegans* | Cel | 2 | 2 | 2 |
| | *Lottia gigantea* | Lgi | 2 | 2 | 2 |
| | *Nematostella vectensis* | Nve | 1 | 1 | 1 |
| **Fungi** | *Neurospora crassa* | Ncr | 1 | 1 | 1 |
| | *Candida tropicalis* | Ctr | 1 | 1 | 1 |
| | *Tuber melanosporum* | Tme | 1 | 1 | 1 |
| | *Pyrenophora teres* | Pte | 1 | 1 | 1 |
| | *Aspergillus nidulans* | Ani | 1 | 1 | 1 |
| | *Chaetomium globosum* | Cgl | 1 | 1 | 1 |
| | *Penicillium marneffei* | Pma | 1 | 1 | 1 |
| | *Talaromyces stipitatus* | Tst | 1 | 1 | 1 |
| | *Sordaria macrospora* | Sma | 1 | 1 | 1 |
| **Heterolobosea** | *Naegleria gruberi* | Ngr | 1 | 1 | 1 |
| **Metaphyta** | *Manihot esculenta* | Mes | 12 | 15 | 9 |
| | *Ricinus communis* | Rco | 6 | 12 | 7 |
| | *Populus trichocarpa* | Ptr | 8 | 17 | 9 |
| | *Medicago truncatula* | Mtr | 5 | 10 | 5 |
| | *Glycine max* | Gma | 21 | 25 | 11 |
| | *Cucumis sativus* | Csa | 6 | 11 | 3 |
| | *Prunus persica* | Ppe | 6 | 13 | 6 |
| | *Arabidopsis thaliana* | Ath | 10 | 13 | 13 |
| | *Carica papaya* | Cpa | 5 | 9 | 3 |
| | *Vitis vinifera* | Vvi | 7 | 12 | 5 |
| | *Sorghum bicolor* | Sbi | 9 | 10 | 7 |
| | *Zea mays* | Zma | 10 | 20 | 14 |
| | *Oryza sativa* | Osa | 11 | 10 | 8 |
| | *Brachypodium distachyon* | Bdi | 7 | 13 | 10 |
| | *Selaginella moellendorffii* | Smo | 1 | 5 | 3 |
| | *Physcomitrella patens* | Ppa | 1 | 6 | 6 |
| **Heterokonta** | *Phaeodactylum tricornutum* | Ptri | 1 | 1 | 1 |

We adopted a high-throughput comparative genomic approach to conduct a broad survey of fully sequenced genomes, including representatives of amoebozoa, yeasts, fungi, algae, mosses, plants, and vertebrate and invertebrate species, to identify homologous genes coding for each of the three subunits that form the NF-Y transcription factor (Table 1). NF-Y gene and protein sequences were obtained through BLAST searches (blastp, blastx and tblastx) with default parameters against the Protein and Genome databases at the NCBI (National Center for Biotechnology Information, http://www.ncbi.nlm.nih.gov) and against the completed genome project databases at the JGI (Joint Genome Institute, http://www.jgi.doe.gov).

The results point to a scenario in which all fungi and the majority of metazoa possess a single gene coding for each of the NF-Y subunits (Table 1). The metazoan exceptions are the amphioxus *Branchiostoma floridae*, the nematode *Caenorhabditis elegans* and the gastropod *Lottia gigantea*, all of which show a proportional duplication of the three subunits, with two genes for each subunit (Table 1).

In contrast, plants possess gene families coding for each NF-Y subunit (Table 1). For instance, in the model plant *Arabidopsis thaliana*, 10 genes coding for NF-YA, 13 for NF-YB and 13 for NF-YC were identified. Because of the heterotrimeric composition, the 36 Arabidopsis NF-Y subunits could theoretically combine to generate 1,690 unique transcription factors (Siefers *et al.*, 2009). This NF-Y expansion is not restricted to Arabidopsis but is a general feature of the plant lineage, including monocots and eudicots. In rice (*Oryza sativa*), for example, 11 genes were identified coding for the NF-YA subunit, 10 for NF-YB and 8 for NF-YC. Four of the rice NF-YB subunits have been characterized, and at least one of these genes is involved in chloroplast development (Miyoshi *et al.*, 2003; Yazawa and Kamada, 2007). Interestingly, the moss *Physcomitrella patens* and the lycophyte *Selaginella moellendorffii* possess a single gene coding for the NF-YA subunit, whereas the other subunits are encoded by multiple genes (Table 1).
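The theoretical number of complexes quoted above is simply the product of the three family sizes (10 × 13 × 13 for Arabidopsis). The following is a minimal Python sketch of that arithmetic, using gene counts copied from Table 1. It assumes that any NF-YA, NF-YB and NF-YC member can pair freely, which is an idealisation, and the names `GENE_COUNTS` and `theoretical_trimers` are illustrative only.

```python
from math import prod

# Gene counts per subunit (NF-YA, NF-YB, NF-YC), copied from Table 1.
GENE_COUNTS = {
    "Arabidopsis thaliana": (10, 13, 13),
    "Oryza sativa": (11, 10, 8),
    "Homo sapiens": (1, 1, 1),
}


def theoretical_trimers(counts):
    """Upper bound on distinct A/B/C heterotrimers, assuming free pairing."""
    return prod(counts)


for species, counts in GENE_COUNTS.items():
    total_genes = sum(counts)
    complexes = theoretical_trimers(counts)
    print(f"{species}: {total_genes} genes -> {complexes} possible complexes")
```

Under the same assumption, the rice counts give 880 possible complexes, whereas a genome with one gene per subunit can form only one. The number is an upper bound: expression patterns and binding preferences may exclude many combinations.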

Since evolutionary rates can be species dependent, the differences observed in the number of NF-Y subunit genes among eukaryotic groups (Table 1), especially in vascular plants, may result from recent duplication events that contributed to the establishment of the gene families coding for each NF-Y subunit. However, some duplicated genes might have diverged so extensively that they escaped identification in our analyses.

Representative plant genes (monocot and eudicot) were selected to perform phylogenetic analyses of the NF-Y subunits. The phylogeny was reconstructed from protein sequence alignments using a Bayesian approach in MrBayes 3.1.2 (Ronquist and Huelsenbeck, 2003). The mixed amino acid substitution model plus gamma and invariant sites was used in two independent runs of 5,000,000 generations each, with two Metropolis-coupled Markov chain Monte Carlo (MCMCMC) chains run in parallel (each starting from a random tree). Markov chains were sampled every 100 generations, and the first 25% of the trees were discarded as burn-in. The remaining trees were used to compute the majority rule consensus tree, the posterior probability of clades and branch lengths (Figures 4 to 6).
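The sampling scheme described above can be made concrete with a short calculation. The Python sketch below is not MrBayes code: it only counts how many trees each run contributes after burn-in, and shows how clade support is obtained as the fraction of retained trees containing a clade. Two points are assumptions: that the starting tree is counted among the samples, and that the burn-in count is rounded down. The toy trees and taxon names (`t1` to `t4`) are hypothetical.

```python
from collections import Counter


def retained_samples(generations, sample_every, burnin_fraction, include_start=True):
    """Return (sampled, discarded, kept) tree counts for a single run."""
    # Assumption: the starting tree is recorded as one extra sample.
    sampled = generations // sample_every + (1 if include_start else 0)
    # Assumption: the burn-in count is rounded down to a whole tree.
    discarded = int(sampled * burnin_fraction)
    return sampled, discarded, sampled - discarded


def clade_support(trees):
    """Fraction of trees containing each clade (a clade is a frozenset of taxa)."""
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade: n / len(trees) for clade, n in counts.items()}


# Settings quoted in the text: 5,000,000 generations, sampled every 100, 25% burn-in.
sampled, discarded, kept = retained_samples(5_000_000, 100, 0.25)
print(f"per run: {sampled} sampled, {discarded} discarded, {kept} kept")

# Hypothetical post-burn-in sample: each tree is the set of clades it contains.
toy_trees = [
    {frozenset({"t1", "t2"}), frozenset({"t1", "t2", "t3"})},
    {frozenset({"t1", "t2"}), frozenset({"t1", "t2", "t3"})},
    {frozenset({"t2", "t3"}), frozenset({"t1", "t2", "t3"})},
    {frozenset({"t1", "t2"}), frozenset({"t3", "t4"})},
]
support = clade_support(toy_trees)

# A majority rule consensus keeps only clades found in more than half of the trees.
majority = {clade for clade, p in support.items() if p > 0.5}
for clade in sorted(majority, key=len):
    print(sorted(clade), support[clade])
```

The support values computed this way correspond to the posterior probabilities of clades mentioned in the text, here on a toy sample rather than on the actual runs.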


Fig. 4. Phylogenetic tree of monocot and eudicot representatives of the NF-YA subunit. M: monocots; D: eudicots; Rco: *Ricinus communis*; Mes: *Manihot esculenta*; Ptr: *Populus trichocarpa*; Gma: *Glycine max*; Mtr: *Medicago truncatula*; Vvi: *Vitis vinifera*; Ath: *Arabidopsis thaliana*; Sbi: *Sorghum bicolor*; Bdi: *Brachypodium distachyon*; Osa: *Oryza sativa*; red square: duplication event within a species; green square: duplication event within the same plant family; black arrows: genes with an unresolved position in the phylogenetic tree; I to IV: independent phylogenetic gene clusters.

Fig. 5. Phylogenetic tree of monocot and eudicot representatives of the NF-YB subunit. For details see the legend of Figure 4.


Fig. 6. Phylogenetic tree of monocot and eudicot representatives of NF-YC subunit. For details see legend of Figure 4.


Phylogenetic analysis showed that the diversification of all NF-Y subunit genes likely resulted from several duplication events during the evolution and diversification of plants (Figures 4 to 6). We observed four independent, highly supported clusters for the NF-YA subunit (I to IV, Figure 4), three for NF-YB (I to III, Figure 5) and five for NF-YC (I to V, Figure 6). Based on these results, we suggest that each cluster might derive from an independent ancestral subunit from which the duplicated members of that group originated. However, independent duplication events have also occurred in many species after the divergence of monocots and eudicots. For example, the soybean and Arabidopsis genomes have experienced a series of recent duplication events (red squares in Figures 4 to 6) that could be the result of chromosome duplication or of polyploidization events (soybean is a good example of a recent polyploidization). These duplications help to explain the differences observed in the number of genes coding for the NF-Y subunits in plants (Table 1). Additionally, these duplications seem to be relatively recent and can provide the raw material for neofunctionalization (Figure 2b) and functional divergence of duplicated genes (Figure 3). With few exceptions (genes with an unresolved position in the phylogenetic tree are marked with black arrows, Figures 4 to 6), all clusters of a specific NF-Y subunit are formed by well-defined subclusters of monocot and eudicot representatives (Figures 4 to 6). Duplication events within a specific plant family were also observed between the two Fabaceae species *Glycine max* and *Medicago truncatula* (green square, Figure 4), which could indicate concerted evolution of duplicated genes between these related species (Figure 2e). This is similar to the clade-specific shifts in selective constraint following concerted duplication events observed for MADS box transcription factors in angiosperms (Shan *et al.*, 2009).

The duplication process is a prominent feature of plant genomic architecture (Figure 1). This has led many researchers to speculate that gene duplication may have played an important role in the evolution of phenotypic novelty within the plant lineage (Flagel and Wendel, 2009). As a result of pervasive and recurring small-scale duplications, which may be followed by functional divergence, many nuclear genes in plants are members of gene families and may exhibit copy number variation among lineages (Blanc and Wolfe, 2004; Schlueter *et al.*, 2004), as can be observed in Table 1.

Evidence for frequent gene duplication has also been observed in the evolutionary history of numerous gene families that have expanded during the diversification of the angiosperms (De Bodt *et al.*, 2005; Zahn *et al.*, 2005; Duarte *et al.*, 2010). In multigene families descended from a common ancestor, individual genes in the group exert similar functions and have similar DNA sequences (Conrad and Antonarakis, 2007). One concept, concerted evolution, applies particularly to localized and typically tandem copies of a gene. The concept posits that all genes in a given group evolve coordinately, and that homogenization is the result of gene conversion (Figure 2e) (Conrad and Antonarakis, 2007).

The emerging picture points to plant NF-Y complexes acting as essential regulatory hubs for many processes. Multiple NF-Y subunits in vascular plants may associate with each other in various combinations that regulate the expression of specific gene sets and might provide similar levels of combinatorial diversity for transcriptional fine-tuning (Siefers *et al.*, 2009). The amplification observed in the plant lineage (Table 1) raises the possibility that new and divergent functions of heterotrimeric complexes have evolved in plants (Nelson *et al.*, 2007), indicating a more complex regulatory role for the various NF-Y proteins in plants than in other organisms (Stephenson *et al.*, 2007).

The Evolutionary History of CBF Transcription Factors:

and Raes, 2004; Sterck *et al.*, 2007).

gene duplication (Zhang, 2003).

evolution (Flagel and Wendel, 2009).

**6. Conclusions** 

genes in plant metabolism.

**7. Acknowledgement** 

Gene Duplication of CCAAT – Binding Factors NF-Y in Plants 213

The most important contribution of gene duplication to evolution is to supply new genetic material for mutation, drift and selection to act upon. This leads to the creation of new genes and new gene functions (Hurley *et al.*, 2005; Woollard, 2005; Schmidt and Davies, 2007), two important factors in the origin of genomic and organismal complexity (Gu *et al.*, 2002; Taylor

The plasticity of a genome or species in adapting to environmental changes would be severely limited without gene duplication, because no more than two variants (alleles) exist at any locus within a diploid individual. A good example is the dozens of duplicated immunoglobulin genes that constitute the vertebrate adaptive immune system. It seems difficult to imagine how this system could have acquired this high complexity level without

Plant gene families are largely conserved even over evolutionary time scales that encompass the diversification of all angiosperms and nonflowering plants (Rensing *et al*., 2008). This property of plant genomes indicates that plants have not created new gene families, but have been endowed with a basic genetic toolkit of ancient origin. Despite the evolutionary conservation of gene families, lineage-specific fluctuations in gene family size are frequently observed among taxa (Velasco *et al*., 2007; Ming *et al*., 2008; Rensing *et al*., 2008), which suggests that the diversity and lineage-specific phenotypic variation observed in land plants may not be explained by an equally diverse set of entirely novel genes. Indeed, much of plant diversity may have arisen from the duplication and adaptive specialization processes of pre-existing genes (neofunctionalization and subfunctionalization, Figure 2b and c, respectively). This perspective assigns gene duplication a central role in plant diversification, being a key process that generates the raw material necessary for adaptive

Whereas various classes of structural and metabolic genes preferentially return to a single copy state following whole-genome duplication (Paterson *et al.*, 2010), transcription factors tend to be preferentially retained among the duplicated genes in *A. thaliana* (Flagel and Wendel, 2009). Our findings support the hypotheses that this preference seems to be true for all plant species, based on the number of genes identified for each NF-Y subunit. Certainly, further studies encompassing functional assays are required to ascertain the role of these

The number of interacting partners in a molecular network (connectivity) of a particular gene also influences the probability of duplication gene retention (Flagel and Wendel, 2009). In this scenario, the high number of genes coding for the three subunits of NF-Y transcription factor in higher plants leads to numerous interaction possibilities among different genes of each subunit and among these genes and other transcription factors what

This work was supported by a CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico), CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior), FAPERGS (Fundação de Amparo a Pesquisa do Estado do Rio Grande do Sul), FINEP (Financiadora de Projetos) and MCT (Ministério de Ciência e Tecnologia). A. Cagliari

could contribute to gene retention of the NF-Y transcription factor family in plants.

The existence of multiple genes for each subunit in the plant genome indicates that the specificity of subunit interaction may be determined by preferential protein–protein interaction, tissue or cell-specific expression of each gene or a combination of both (Yazawa and Kamada, 2007). The large number of possible combinations has hindered the analysis of plant NF-Y complexes and suggests that they might act in a more intricate system than in vertebrates and yeast, which have only one gene that encodes each HAP subunit (Yazawa and Kamada, 2007). Additionally, the multiple copies for each NF-Y subunit raises a question if a specific NF-Y subunit interacts with any other two NF-Y subunits or if the NF-Y subunit interacts with only specific member(s) of the other two subunits (Thirumurugan *et al.*, 2008).

Although the presence of many genes encoding NF-Y subunits suggests a high degree of genetic redundancy in plants, the analysis of mutants in single NF-Y genes in Arabidopsis has been associated with defects in development and enhanced stress sensitivity, suggesting a specialized function for each member (Lotan *et al.*, 1998; Kwong *et al.*, 2003; Lee *et al.*, 2003; Zanetti *et al.*, 2010). This could indicate that duplicated genes have passed through a neofunctionalization process (Figure 2b).

Some proteins may require several key substitutions before acquiring a new function, while others may be more mutationally labile. An example includes the terpene synthase gene family in Norway spruce (*Picea abies*). These genes appear to have undergone repeated rounds of neofunctionalization (Figure 2b) (Keeling *et al.*, 2008) and a small number of key amino acid substitutions among paralogs was sufficient to alter the substrate specificity and terpenoid product profiles (Flagel and Wendel, 2009). Another example of neofunctionalization (Figure 2b) in plants is observed in Arabidopsis, where a specific amino acid residue identified in LEC1 and LEC1-LIKE (L1L) is responsible for differentiating their functions (seed development) from those of other NF-YB members (Kwong *et al.*, 2003; Lee *et al.*, 2003; Yamamoto *et al.*, 2009). In addition, the analysis of amino acid substitution rates in plants has been appointed for the asymmetric evolution of certain duplicates of NF-YB and NF-YC subunits, which appears to be coupled with the asymmetric divergence in gene function (Yang *et al.*, 2005; Yamamoto *et al.*, 2009).

With respect to expression patterns, the Arabidopsis NF-Y gene family includes some members that are ubiquitously expressed and others that are tissue specific or induced only after the switch to reproductive growth, in flowers and siliques (Gusmaroli *et al.*, 2001; 2002; Yazawa and Kamada, 2007). The differences observed in the expression patterns of these genes could represent an example of *cis*-regulatory divergence (Figure 3c), in which the *cis*-element of one gene evolves independently from those of the other members of the gene family and becomes regulated by different stimuli and/or *trans*-activators.

Because genes that harbor NF-Y binding domains include genes that are constitutive, inducible, and cell-cycle-dependent, the regulation of their expression cannot be due exclusively to NF-Y binding to DNA. In this scenario, interaction with other transcription factors, whether functional or physical, contributes to NF-Y action (Matuoka and Chen, 2002). In addition, the independent evolution of protein-binding domains present in the architecture of duplicated genes can contribute to protein network divergence (Figure 3b2), increasing the number of possible interacting partners of NF-Y genes.

When compared with other forms of mutation, a notable feature of duplication is that it creates genetic redundancy. This redundancy fosters evolutionary innovation, creating the opportunity for duplicates to explore new evolutionary terrain (Flagel and Wendel, 2009). The most important contribution of gene duplication to evolution is to supply new genetic material for mutation, drift and selection to act upon. This leads to the creation of new genes and new gene functions (Hurley *et al.*, 2005; Woollard, 2005; Schmidt and Davies, 2007), two important factors in the origin of genomic and organismal complexity (Gu *et al.*, 2002; Taylor and Raes, 2004; Sterck *et al.*, 2007).

The plasticity of a genome or species in adapting to environmental changes would be severely limited without gene duplication, because no more than two variants (alleles) exist at any locus within a diploid individual. A good example is the dozens of duplicated immunoglobulin genes that constitute the vertebrate adaptive immune system. It is difficult to imagine how this system could have acquired such a high level of complexity without gene duplication (Zhang, 2003).

Plant gene families are largely conserved even over evolutionary time scales that encompass the diversification of all angiosperms and nonflowering plants (Rensing *et al.*, 2008). This property of plant genomes indicates that plants have not created new gene families, but have been endowed with a basic genetic toolkit of ancient origin. Despite the evolutionary conservation of gene families, lineage-specific fluctuations in gene family size are frequently observed among taxa (Velasco *et al.*, 2007; Ming *et al.*, 2008; Rensing *et al.*, 2008), which suggests that the diversity and lineage-specific phenotypic variation observed in land plants may not be explained by an equally diverse set of entirely novel genes. Indeed, much of plant diversity may have arisen from the duplication and adaptive specialization processes of pre-existing genes (neofunctionalization and subfunctionalization, Figure 2b and c, respectively). This perspective assigns gene duplication a central role in plant diversification, being a key process that generates the raw material necessary for adaptive evolution (Flagel and Wendel, 2009).
