**4. Gene duplication and evolution**

DNA duplication acts as one of the main forces driving the evolution of organisms by creating the raw genetic material that natural selection can subsequently modify. Gene duplications arise in eukaryotes at a rate of 0.01 paralogs per gene per million years (Lynch and Conery, 2000), that is, about 10⁻⁸ per gene per year, the same order of magnitude as the mutation rate per nucleotide per year (De Grassi *et al.*, 2008). Duplication of individual genes, chromosomal segments, or entire genomes represents the primary source of evolutionary novelties, including new gene functions and expression patterns (Holland *et al.*, 1994; Sidow, 1996; Lynch and Conery, 2000). However, it remains unclear how duplicated genes successfully evolve from an initial state of complete redundancy, wherein one copy is likely to be expendable, to a stable situation in which both copies are maintained by natural selection (Sidow, 1996; Lynch and Conery, 2000; Ober, 2010).

In the evolutionary history of plants, genome duplications have been relatively common, leading to the hypothesis that most angiosperms are to some extent polyploid (Soltis, 2005). The genome of Arabidopsis, for example, bears traces of at least three polyploidy events (Vision *et al.*, 2000; Simillion *et al.*, 2002), each followed by gene loss (Bowers *et al.*, 2003; Ober, 2010).

Like a point mutation, a duplication that occurs in an individual can be fixed or lost in the population. If the new duplicate is selectively neutral relative to pre-existing alleles, it has only a small probability (1/2N) of being fixed in a diploid population (where N is the effective population size). This suggests that the majority of duplicated genes will be lost. For those duplicated genes that do become fixed, the fixation time averages 4N generations (Kimura, 1989; Zhang, 2003).
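As a rough worked example of the two neutral-theory quantities above, the following Python sketch evaluates the fixation probability 1/(2N) and the mean fixation time of 4N generations. The function names and the population size of 10,000 are illustrative assumptions and are not taken from the cited studies.

```python
def neutral_fixation_probability(n_e: int) -> float:
    """Chance that one new neutral duplicate is fixed in a diploid population: 1/(2N)."""
    if n_e <= 0:
        raise ValueError("effective population size must be positive")
    return 1.0 / (2 * n_e)


def mean_fixation_time(n_e: int) -> int:
    """Average time to fixation, in generations, for duplicates that do fix: 4N."""
    if n_e <= 0:
        raise ValueError("effective population size must be positive")
    return 4 * n_e


if __name__ == "__main__":
    n_e = 10_000  # hypothetical effective population size, chosen for illustration
    p_fix = neutral_fixation_probability(n_e)
    print(f"fixation probability: {p_fix:.1e}")
    print(f"probability of loss:  {1 - p_fix:.5f}")
    print(f"mean fixation time:   {mean_fixation_time(n_e)} generations")
```

Under this assumed N of 10,000, a neutral duplicate is fixed with probability 1/20,000, so almost all new duplicates are expected to be lost, and the rare ones that do fix take 40,000 generations on average.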

The Evolutionary History of CBF Transcription Factors:


Gene Duplication of CCAAT – Binding Factors NF-Y in Plants 203


On an evolutionary scale, gene duplication may result in new functions via different scenarios. Although the most likely outcome is a loss of function in one of the two gene copies (nonfunctionalization, Figure 2a), in rare instances one copy may acquire a novel evolutionarily advantageous function and become preserved by natural selection (neofunctionalization, Figure 2b), while the other copy retains the original function. Alternatively, after duplication, mutations may occur in both genes leading to specialization to perform complementary functions (subfunctionalization, Figure 2c) (Lynch and Conery, 2000; Lynch and Force, 2000). This process produces novel genetic variants that drive genetic innovation (Lynch and Conery, 2000; Conrad and Antonarakis, 2007). Because gene duplication generates functional redundancy, it is often not advantageous to the organism to possess two identical genes. In nonfunctionalization (Figure 2a), the accumulation of deleterious mutations might lead to the loss of the original function of one paralogue. Alternatively, instead of being completely lost, many duplicated genes are silenced or become pseudogenes and are thus either unexpressed or functionless (Gallagher *et al.*, 2004; Nicole *et al.*, 2006; Yang *et al.*, 2006; Beisswanger and Stephan, 2008; Xiong *et al.*, 2009). Pseudogenization is the most frequent fate of duplicated genes. In *Caenorhabditis elegans*, for example, genomic analyses have identified 2168 pseudogenes or approximately one pseudogene for every eight functional genes (Harrison *et al.*, 2001). In humans, one pseudogene was identified for approximately every two functional genes (Harrison *et al.*, 2002). As pseudogenes generally do not confer a selective advantage, they have a low probability of being fixed in large populations (Ober, 2010).

Unless an extra amount of gene product is advantageous, two genes with the same function are unlikely to be stably maintained in the genome of the organism (Nowak *et al.*, 1997). In subfunctionalization (Figure 2c), both duplicated copies may become, through the accumulation of mutations, partially compromised to the point at which their total capacity is reduced to the level of the single-copy ancestral gene (Force *et al.*, 1999; Stoltzfus, 1999; Lynch and Force, 2000). Subfunctionalization can occur through mutations that modify regulatory elements (Force *et al.*, 1999; Hinman and Davidson, 2007) or through epigenetic silencing (Rodin and Riggs, 2003). On an evolutionary scale, one of the most important forms of subfunctionalization is the division of gene expression after duplication (Force *et al.*, 1999). For example, zebrafish ENGRAILED 1 and ENGRAILED 1-B, a pair of transcription factors generated by a chromosomal segmental duplication, arose in the lineage of ray-finned fish. While ENGRAILED 1 is expressed in the pectoral appendage bud, ENGRAILED 1-B is expressed in a specific set of neurons in the hindbrain/spinal cord (Force *et al.*, 1999). In yeast, more than 40% of gene pairs exhibit significant expression divergence (Gu *et al.*, 2002). Likewise, a comparison of 17 fungal genomes revealed that duplicated genes rarely diverge with respect to biochemical function, but typically diverge with respect to regulatory control (Wapinski *et al.*, 2007). On the other hand, if two redundant gene copies are retained in the genome without significant functional divergence, the organism may acquire increased genetic robustness against harmful mutations (Figure 2d) (Conrad and Antonarakis, 2007).
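The fates described in this section can be made concrete with a deliberately simplified Python sketch that treats each gene as a set of subfunctions (for instance, expression domains) and labels a duplicate pair by comparing the two copies with the ancestral set. This is a toy model rather than a method from the cited literature: the function name, the subfunction labels, and the premise that the ancestral engrailed gene covered both expression domains are assumptions made only for illustration.

```python
def classify_duplicate_pair(ancestral, copy_a, copy_b):
    """Label a duplicate pair by the subfunctions each copy performs (toy model)."""
    ancestral, copy_a, copy_b = map(frozenset, (ancestral, copy_a, copy_b))

    # One copy has lost every function: pseudogene or silenced gene.
    if not copy_a or not copy_b:
        return "nonfunctionalization"

    # Both copies still perform the full ancestral set of functions.
    if copy_a == ancestral and copy_b == ancestral:
        return "redundancy (genetic robustness)"

    # One copy gained something new while the other kept the ancestral role.
    novel_a, novel_b = copy_a - ancestral, copy_b - ancestral
    if (novel_a and copy_b == ancestral) or (novel_b and copy_a == ancestral):
        return "neofunctionalization"

    # Each copy kept only part of the ancestral role; together they cover it.
    if (not novel_a and not novel_b
            and copy_a < ancestral and copy_b < ancestral
            and copy_a | copy_b == ancestral):
        return "subfunctionalization"

    return "unclassified"


if __name__ == "__main__":
    # Assumed ancestral expression domains, for illustration only.
    ancestral = {"pectoral appendage bud", "hindbrain/spinal cord neurons"}
    engrailed_1 = {"pectoral appendage bud"}
    engrailed_1b = {"hindbrain/spinal cord neurons"}
    print(classify_duplicate_pair(ancestral, engrailed_1, engrailed_1b))
```

Representing subfunctions as sets keeps the comparison explicit. Real duplicates diverge quantitatively and often fall between these categories, which is why the sketch also returns an "unclassified" label.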

In neofunctionalization (Figure 2b), one copy keeps the ancestral function, while the other gains a new function under positive selection for advantageous mutations (De Grassi *et al.*, 2008). However, in many cases, rather than an entirely new function, a related function evolves after gene duplication. For example, the red- and green-sensitive opsin genes of humans were the result of a gene duplication that occurred in hominoids and Old World monkeys (Yokoyama and Yokoyama, 1989). After the duplication, functional divergence of the two opsins resulted in a 30-nanometer difference in their maximum absorption wavelength. This difference conferred sensitivity to a wide range of colors in humans and related primates (Zhang, 2003).

Fig. 2. Evolutionary fate of duplicated genes. Gene duplication may result in new functions via different scenarios. **(a)** nonfunctionalization; **(b)** neofunctionalization; **(c)** subfunctionalization; **(d)** genetic robustness and **(e)** gene conversion. Adapted from Conrad and Antonarakis (2007).

The fate of a duplicated gene seems to be the result of diverse and, in some cases, interdependent factors (Taylor *et al.*, 2001). These variables include its functional category (Papp *et al.*, 2003; Kondrashov and Koonin, 2004; Marland *et al.*, 2004), degree of conservation (Conant and Wagner, 2002; Davis and Petrov, 2004; Jordan *et al.*, 2004; Braybrook and Harada, 2008), sensitivity to dosage effects (Kondrashov and Koonin, 2004), as well as its regulatory and architectural complexity (He and Zhang, 2005). Some observations indicate that natural selection created a preferential association of duplications with certain gene categories. For example, genes encoding proteins that interact with the environment are more frequently retained after duplication than genes whose products interact in intracellular compartments (Li *et al.*, 2003; Marland *et al.*, 2004). In addition, genomes tend to retain duplicated genes involved in signal transduction and transcription, but to lose duplicated DNA repair genes (Blanc and Wolfe, 2004; Maere *et al.*, 2005; Paterson *et al.*, 2010).

Fig. 3. Functional divergence of duplicated genes. **(a)** Independent evolution of the *cis*-regulatory and protein-coding regions. **(b1)** Protein sequence divergence after duplication; **(b2)** protein network divergence with gain or loss of partners and **(c)** regulatory DNA sequence divergence after duplication (certain regulatory motifs are lost in one copy of the duplicated gene sequence). *t*, evolutionary time. Adapted from Conrad and Antonarakis (2007).

It has been shown that shortly after duplication the protein-coding sequence and *cis*-regulatory regions of some duplicated genes can evolve independently (Figure 3a) (Wagner, 2000). This independent evolution can generate protein sequence divergence of duplicated genes (Figure 3b1) or protein network divergence (Figure 3b2), where the protein interaction domains (*cis*-regulatory elements) of the original sequence evolve by maintenance, gain, or loss of interacting partners. Alternatively, the divergence of *cis*-regulatory motifs in the promoter-proximal region (Figure 3c) can generate expression divergence between the duplicated genes (Conrad and Antonarakis, 2007).

**5. Gene duplication of NF-Y in plants**

While duplication of NF-Y genes is poorly understood in the plant lineage, many of the functional and mechanistic details are likely conserved across plant, animal and fungal lineages. This inference comes from strong cross-kingdom conservation of functionally important amino acid residues in mammalian and yeast NF-Ys (Maity and de Crombrugghe, 1992; Maity *et al.*, 1992; Sinha *et al.*, 1995; Coustry *et al.*, 1996; Kim *et al.*, 1996; Sinha *et al.*, 1996; Mantovani, 1998; Romier *et al.*, 2003). CCAAT-like motifs are found in several plant promoters, and binding activity to CCAAT sequences has been identified in plant nuclear extracts (Yazawa and Kamada, 2007). In addition, at least some plant NF-YA and NF-YB subunits have been shown to complement yeast mutant strains lacking the corresponding NF-Y subunit, and several groups have demonstrated that each of the three plant NF-Y proteins can substitute for its yeast counterpart in gene expression assays (Edwards *et al.*, 1998; Masiero *et al.*, 2002; Ben-Naim *et al.*, 2006; Siefers *et al.*, 2009). These observations indicate that plant NF-Y subunits might act as general transcription factors, as in mammals (Yamamoto *et al.*, 2009).

Although a complete functional plant NF-Y complex has not yet been described, the individual subunits are known to be involved in a number of important physiological processes, such as specific developmental processes and responses to environmental stimuli (Lotan *et al.*, 1998; Kusnetsov *et al.*, 1999; Miyoshi *et al.*, 2003; Ben-Naim *et al.*, 2006; Combier *et al.*, 2006; Wenkel *et al.*, 2006; Cai *et al.*, 2007; Nelson *et al.*, 2007; Warpeha *et al.*, 2007; Siefers *et al.*, 2009). A well-established example is the NF-YB subunit gene called LEAFY COTYLEDON-1 (LEC1), which specifically controls embryo development, especially the maturation phase. LEC1 plays specialized roles not only because of its developmentally regulated expression but also because of its distinct molecular activity, as the *in vivo* function of LEC1 cannot be replaced by other NF-YB subunits, except for the most closely related Leafy Cotyledon 1 Like (L1L) (Kwong *et al.*, 2003; Lee *et al.*, 2003; Yamamoto *et al.*, 2009). In Arabidopsis, many NF-Y subunit genes are expressed ubiquitously, although some are differentially expressed. For example, while the AtNF-YC-4 transcript accumulates in seeds 7 days after germination, AtNF-YB-9 is only expressed in green siliques (Gusmaroli *et al.*, 2001).

Plant NF-Y function also appears to be important for responses to drought stress. Although a specific mechanism of action remains unclear, overexpression of the AtNF-YB1 subunit and its orthologue in maize (*Zea mays*), ZmNF-YB2, leads to enhanced drought resistance (Nelson *et al.*, 2007). Another study showed that overexpression of maize NF-YA5 reduced drought susceptibility, anthocyanin production and stomatal aperture, while *nf-ya5* mutants had the expected opposite phenotype in each situation (Li *et al.*, 2008). In addition, several
