**4. Characteristics of three** *Bcnt***-related gene products**

## **4.1 Ancestral Bcnt protein with highly conserved C-terminal region (Bcnt-C)**

It has been proposed that duplicated genes yield genetic redundancy, which should result in either the acquisition of a gene with a novel function or the degeneration of one of the duplicated genes. The two paralogs, *p97Bcnt/Cfdp2* and *p97Bcnt2*, were created via a process of tandem duplication followed by retrotransposon insertion. We expect that the three Bcnt-related proteins may play a role in more refined cellular signaling in ruminants. The vertebrate Bcnt/Cfdp1 protein includes a highly conserved 82-amino acid region at the C-terminus, termed Bcnt-C, which is not present in either p97Bcnt/Cfdp2 or p97Bcnt2 (Iwashita et al., 2003; 2009) (Fig. 2). Bcnt-C, known as the BCNT superfamily, is found in most eukaryotes, including yeast, and is classified as Pfam 07572 in the Pfam database (http://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?uid=pfam07572). Although the functions of the BCNT family members remain mostly unclear, a vertebrate Bcnt/Cfdp1 was recently identified as a centromere protein, CENP-29, in DT40 cells, a chicken B cell line transformed by avian leukosis virus (Ohta et al., 2010). Furthermore, a yeast ortholog, Swc5/YBR231C/AOR1, is a component of the chromatin-remodeling complex SWR1 in *Saccharomyces cerevisiae* (budding yeast) (Wu et al., 2009). The SWR1 complex mediates the ATP-dependent exchange of histone H2A for the H2A variant Htz1, and the Swc5 null mutant shows decreased resistance to macromolecule synthesis inhibitors such as hydroxyurea and cycloheximide, as well as increased heat sensitivity. These data indicate that the yeast *Bcnt* ortholog is not essential for survival, but contributes to maintaining physiological homeostasis at the transcriptional level.

Whereas Bcnt-C is highly conserved among almost all eukaryotes, the N-terminal regions are less conserved. For example, the amino acid sequence of *Drosophila* Bcnt (YETI) is ~50% identical to that of bovine Bcnt/Cfdp1 in the C-terminal region, whereas the N-terminal region shows only ~22% identity. Thus, although YETI is reported to bind to the microtubule-based motor kinesin-I (Wisniewski et al., 2003), whether vertebrate Bcnt functions in intracellular trafficking needs to be re-evaluated, because that interaction is mediated via the poorly conserved N-terminal region. One characteristic of the three Bcnt-related proteins is their different numbers of IR units: Bcnt/Cfdp1, p97Bcnt/Cfdp2, and p97Bcnt2 have one, two, and three IR units, respectively. Sequences homologous to the 40-amino acid IR unit are found in zebrafish (*Danio rerio*) and nematodes, but not in yeast. These IR units comprise intrinsically disordered regions that might present scaffolds for protein-protein interaction, as described later.
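The identity figures quoted above (~50% versus ~22%) are per-region percent identities. As a minimal sketch of how such a value can be computed from a pairwise alignment, the function below counts identical residues over ungapped columns. The function name, the gap convention (`-`), and the toy sequences are assumptions made for illustration; they are not the sequences or the exact definition used in the original comparison, and other denominators (for example, alignment length) are also in common use.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy aligned fragments, invented for illustration only (not Bcnt sequences).
toy_a = "MKT-LLVAG"
toy_b = "MRTALLV-G"
print(round(percent_identity(toy_a, toy_b), 1))  # 6 identical of 7 ungapped columns -> 85.7
```

Applying the same function separately to the N-terminal and C-terminal portions of an alignment gives region-specific values of the kind compared in the text.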

## **4.2 Intrinsic disorder of the three Bcnt-related proteins and cellular localization**

The three Bcnt-related proteins migrate more slowly than expected in sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE), resulting in apparent molecular masses higher than those calculated. For example, bovine brain Bcnt/Cfdp1 has 297 amino acids and a calculated molecular mass of 33.3 kDa, but appears at around 45 kDa in SDS-PAGE (Fig. 6). The same holds for the p97Bcnt/Cfdp2 and p97Bcnt2 proteins, which have calculated molecular masses of 66.3 and 70.8 kDa, respectively (Iwashita et al., 2003, 2009). This anomalous mobility might reflect a physical property of the three Bcnt-related proteins, namely that they are intrinsically disordered proteins (IDPs). Many biologically active proteins have been shown to lack a stable three-dimensional (3-D) structure; such proteins are referred to as IDPs (Dunker et al., 2008). IDPs are common to the three domains of life and, especially in multicellular eukaryotes, account for more than 70% of total proteins.

Fig. 6. Unique mobility of the Bcnt/Cfdp1 and p97Bcnt/Cfdp2 proteins in SDS-PAGE

Extracts of bovine brain (1), rat brain (2), and MDBK cells, a bovine kidney epithelial cell line (3), were separated in SDS-polyacrylamide gels and subjected to immunoblotting with anti-Bcnt-C peptide antibody in the presence (**a**) or absence (**b**) of antigen peptide at a final concentration of 100 µM, or with anti-p97Bcnt monoclonal antibodies (**c**). The two small black arrows indicate Bcnt/Cfdp1 with an apparent molecular mass of 45 kDa appearing as a doublet, probably due to phosphorylation (Iwashita et al., 2003); the large red arrow indicates Bcnt/Cfdp1 with an apparent molecular mass of 53 kDa, as described below.

IDPs are involved in the regulation of various signaling pathways through protein-protein interactions that are frequently triggered by posttranslational modifications within the regions of intrinsic disorder (Dunker et al., 2008). IDPs may function as hub proteins via the formation of complexes with cellular proteins, which are then modulated by protein modifications such as phosphorylation, acetylation, ubiquitination, or degradation. By computational prediction, Bcnt/Cfdp1, p97Bcnt/Cfdp2, and p97Bcnt2 are all suggested to comprise intrinsically disordered regions, except for the core of the RTE domains, which correspond to AP-endonuclease, in the two paralogs (Fig. 7). This computational prediction is partially supported by an NMR study of the N-terminal 40 amino acid residues of the Bcnt-C region, prepared in *Escherichia coli* using 15N-labeled amino acids: the spectrum revealed a lack of fixed tertiary structure (courtesy of Dr. T. Kohno). Furthermore, the Bcnt/Cfdp1 protein forms a tight complex with cellular proteins in bovine placenta even in the presence of a detergent, CHAPSO, when evaluated by gel filtration chromatography on Sephacryl S-300 HR followed by western blotting. Both Bcnt/Cfdp1 and p97Bcnt/Cfdp2 are phosphoproteins that are potentially phosphorylated on serine residues by casein kinase II *in vitro* (Iwashita et al., 1999). Recently, two phosphorylated serine residues in human BCNT, Ser116 in the N-terminal region and Ser250 in the C-terminal region, were identified by mass spectrometric analysis (Dephoure et al., 2008). This phosphorylation is cell cycle independent. It should be noted that these two phosphorylated serine residues reside in the amino acid sequences WASF and WESF, respectively, which implies a unique motif for specific phosphorylation. Phosphorylation of these motifs might be expected to act as a switch, for example of cation-π mediated protein-ligand interactions (Zacharias & Dougherty, 2002). These characteristics suggest that the three Bcnt-related family members are hub-like molecules.
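The two phosphorylated serines noted above sit in the sequences WASF and WESF, which can be written as the pattern W-[A/E]-S-F. The following is a minimal sketch of how a protein sequence could be scanned for that pattern and the serine positions reported. The regular expression, the function name, and the toy sequence are illustrative assumptions; the sequence is invented and is not human BCNT, and a match says nothing about whether a site is actually phosphorylated.

```python
import re

# W-[A/E]-S-F: the pattern shared by the WASF and WESF sites named in the text.
PHOSPHO_MOTIF = re.compile(r"W[AE]SF")


def find_motif_serines(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based serine position, matched motif) for each W[AE]SF hit."""
    hits = []
    for match in PHOSPHO_MOTIF.finditer(sequence.upper()):
        # match.start() is the 0-based index of W; the serine is two residues
        # later, and one more is added to convert to 1-based numbering.
        serine_position = match.start() + 3
        hits.append((serine_position, match.group()))
    return hits


# Invented toy sequence containing one WASF and one WESF site.
toy_sequence = "MGGWASFKDEEWESFPL"
print(find_motif_serines(toy_sequence))  # [(6, 'WASF'), (14, 'WESF')]
```

Run on a real sequence, the reported positions could be compared directly with the residue numbers from the mass spectrometric analysis.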

Bucentaur (Bcnt) Gene Family: Gene Duplication and Retrotransposon Insertion 393

Fig. 7. Characteristics of intrinsic disorder of three Bcnt-related proteins

The Bcnt/Cfdp1, p97Bcnt/Cfdp2, and p97Bcnt2 proteins are predicted to comprise intrinsically disordered regions. Amino acid sequences of the three Bcnt-related proteins were analyzed on the DisProt server (Sickmeier et al., 2007), and individual profiles were obtained. The data for p97Bcnt/Cfdp2 are not shown, but are quite similar to those for p97Bcnt2. Vertical axes indicate the disorder probability of each amino acid residue, and horizontal axes indicate the amino acid residue number. Schematic domain structures of Bcnt/Cfdp1 and p97Bcnt2 are shown alongside each profile for comparison. Similar results were obtained using another program, ANCHOR (Mészáros et al., 2009).

Fig. 8. Subcellular distribution of the three Bcnt-related proteins in MDBK cells

Cultured MDBK cells were fractionated sequentially using a Subcellular Protein Fractionation Kit from Pierce. Equal volumes of each fraction were assessed by immunoblotting. A: anti-Bcnt-C peptide antibody (Iwashita et al., 2003), B: anti-p97Bcnt monoclonal antibodies (Nobukuni et al., 1997), and C: anti-p97Bcnt2 peptide antibody (Iwashita et al., 2009). The right panel shows the Coomassie Brilliant Blue staining pattern. The subcellular fractions are identified at the top of the panels. The effectiveness of the fractionation was evaluated by immunoblotting using three antibodies: anti-p120GAP (a marker for the cytosolic fraction, Kobayashi et al., 1993), anti-Topoisomerase II (a marker for the nuclear fraction, Iwashita et al., 1999), and anti-actin (a marker for the cytoskeleton). The data are consistent with previously reported results (not shown).

We have found that the Bcnt/Cfdp1 protein from MDBK cells, a bovine kidney epithelial cell line (Madin & Darby, 1965; Iwashita et al., 1999), migrates at around 53 kDa in SDS-PAGE, significantly larger than the rat or bovine brain proteins (Fig. 6). The same shift is observed in many other ruminant organs, such as bovine placenta and testis and goat kidney, but not in any rat organ. Although we have not yet determined the cause, clarification of this anomaly could shed light on the role of Bcnt/Cfdp1, because the modification may be related to Bcnt/Cfdp1 function. Whereas the ~175-amino acid N-terminal regions of the three Bcnt-related proteins are acidic as a whole, they contain several arginine/lysine-rich elements, including a putative nuclear targeting motif, Arg-Lys-Arg-Lys (at around residues 61-64). We therefore examined the cellular distribution of the three Bcnt-related proteins in MDBK cells. All three were localized in both the cytosolic and nuclear fractions and, in addition, both p97Bcnt/Cfdp2 and p97Bcnt2 were found at significant levels in the chromatin fraction (Fig. 8). These results suggest that Bcnt family members have the potential to function as shuttle molecules between the cytosol and nuclei. The nuclear localization of p97Bcnt/Cfdp2 and p97Bcnt2 is consistent with their domain structures; the two paralogs include AP-endonuclease domains in the middle of the molecule, as described in more detail below. On the other hand, Bcnt/Cfdp1 is scarcely found in the chromatin fraction, whether in its 45 kDa form (all rat organs and bovine brain) or its 53 kDa form (MDBK cells), although chicken Bcnt/Cfdp1 has been reported as a centromere protein in a transformed cell line (Ohta et al., 2010).
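The size of the mobility anomaly can be expressed as the ratio of apparent to calculated molecular mass. The short sketch below does this arithmetic for bovine Bcnt/Cfdp1 using only the values quoted in the text (33.3 kDa calculated; about 45 kDa and 53 kDa apparent). The function name and dictionary labels are illustrative, and the apparent masses are approximate gel estimates, so the ratios should be read as rough indicators rather than measurements.

```python
CALCULATED_KDA = 33.3  # bovine Bcnt/Cfdp1, 297 amino acids (value from the text)

# Approximate apparent masses in SDS-PAGE, as quoted in the text.
APPARENT_KDA = {
    "bovine brain": 45.0,
    "MDBK cells": 53.0,
}


def mobility_ratio(apparent_kda: float, calculated_kda: float) -> float:
    """Ratio of apparent (gel) mass to sequence-calculated mass."""
    if calculated_kda <= 0:
        raise ValueError("calculated mass must be positive")
    return apparent_kda / calculated_kda


for source, apparent in APPARENT_KDA.items():
    ratio = mobility_ratio(apparent, CALCULATED_KDA)
    print(f"{source}: apparent {apparent:.0f} kDa, ratio {ratio:.2f}")
```

With these inputs the ratios come to roughly 1.35 and 1.59, which shows that the MDBK form carries an additional shift of about 8 kDa beyond the anomaly already seen in brain.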

## **4.3 RTE domains of p97Bcnt/Cfdp2 and p97Bcnt2**

AP-endonuclease is well known to function as an abasic endonuclease in the base excision repair pathway. It possesses multiple enzymatic activities, acting as a 3'-5' DNA exonuclease, 3'-phosphodiesterase, 3'-phosphatase, and RNase H (Barzilay et al., 1995). Many organisms possess two functional AP-endonucleases, which are thought to be important for cell viability. In contrast to non-vertebrate AP-endonuclease, vertebrate AP-endonuclease, which has an extra 6 kDa N-terminal region of intrinsic disorder, plays a role not only in repairing DNA damage but also in regulating the redox state of various proteins that modulate transcription factors such as AP-1 (Fos/Jun), NF-κB, HIF-1, and p53 (Tell et al., 2009; Busso et al., 2010); thus it is termed APE/Ref-1 (AP-endonuclease/Redox effector factor 1). This dual role is understandable, given that DNA damage is one of the most serious stresses faced by living organisms. The extra N-terminal region of human AP-endonuclease (APE1) contains multiple arginine/lysine-rich elements and provides a scaffold for protein-protein interaction with DNA repair proteins such as Pol β and XRCC, and with transcription factors including STAT3, YB-1, and nucleophosmin (NPM1) (Vascotto et al., 2009; Busso et al., 2010).

Although we have not yet found evidence that p97Bcnt/Cfdp2 and p97Bcnt2 possess any of these activities, they share several characteristics with mammalian AP-endonuclease, with intrinsically disordered regions at both the N- and C-termini. Amino acid sequences in a part of the RTE domains are well conserved in all ruminants examined so far, including the lesser Malay chevrotain (Iwashita et al., 2009). The central 239-amino acid region of the RTE domain (termed the core RTE domain) corresponds exactly to Endonuclease/Exonuclease/Phosphatase family members (http://www.ncbi.nlm.nih.gov/cdd?term=Pfam03372). The amino acid sequences of p97Bcnt/Cfdp2 and p97Bcnt2 were compared with those of three canonical AP-endonucleases: human APEX1, *Archaeoglobus* Af\_Exo, and *Neisseria* Nape (Fig. 9). Although the comparison revealed low overall identity (~20%) in the core RTE domains, eight amino acid residues involved in catalytic activity and at least six amino acids participating in substrate binding are conserved among the molecules. Furthermore, their 3-D structures could be modeled with high accuracy, revealing the characteristics of Exo III or AP-endonuclease.


Fig. 9. Highly conserved amino acid residues of the core RTE domains critical for AP-endonuclease activity

The amino acid sequences of the core RTE domains of p97Bcnt/Cfdp2 (residues 241-483) and p97Bcnt2 (residues 243-485), APEX1 (human APE, residues 61-318), Nape (*Neisseria*, residues 1-259, Carpenter et al., 2007), and Af\_Exo (*Archaeoglobus fulgidus*, residues 1-257, Schmiedel et al., 2009) were aligned by the ClustalW2 program of EMBL-EBI. Residues critical for the catalytic activity of canonical AP-endonucleases are shown in red bold, and amino acid substitutions in the core RTE domains between p97Bcnt/Cfdp2 and p97Bcnt2 are indicated in blue bold.

The 3-D structure of AP-endonuclease is evolutionarily well conserved and comprises two domains, each containing a six-stranded β-sheet decorated by helices on the concave side (Barzilay et al., 1995). The predicted 3-D structures of the RTE domains of p97Bcnt/Cfdp2 and p97Bcnt2 are quite similar to each other, and present possible DNA-binding sites between the α-helix domains and opposite both the N-terminal and C-terminal regions of the RTE domain. Next we superimposed the structure of p97Bcnt2 onto that of p97Bcnt/Cfdp2 to examine the structural relationship between the two domains (Fig. 10). Whereas the N-terminal ~60-amino acid region is variable between the two, there are only 12 amino acid differences in the 239-amino acid core RTE domain. It is noteworthy that 7 of the 12 differing residues are located in the neighborhood of the predicted active sites. A characteristic feature of AP-endonuclease is that its enzymatic properties change significantly with subtle changes in the neighborhood of the active cavity. For example, a single amino acid substitution restores AP-endonuclease activity to the Neisserial exonuclease (Carpenter et al., 2007), and a spontaneous substitution of Val to Gly in the C-terminal region of the *Archaeoglobus* AP-endonuclease, which participates in forming an abasic DNA binding pocket, is accompanied by an increase in non-specific endonuclease activity (Schmiedel et al., 2009). This is probably because AP-endonuclease possesses multiple enzymatic activities, as described above. Thus it could be expected that p97Bcnt/Cfdp2 and p97Bcnt2 would have different enzymatic properties, with each compensating for the function of the other.

Fig. 10. 3-D comparison of the two core RTE domains of p97Bcnt/Cfdp2 and p97Bcnt2

3-D structures of the two core RTE domains of p97Bcnt/Cfdp2 and p97Bcnt2 were modeled by the I-TASSER server (Roy et al., 2010), and the p97Bcnt2 model was superimposed on that of p97Bcnt/Cfdp2 by fitting the main-chain α-carbons. The RMSD (root mean square deviation) over 243 amino acid residues was 0.68 Å. The catalytic sites and DNA binding sites (red) were identified from the top templates, 2jc5A (Carpenter et al., 2007) and 2voaA (Schmiedel et al., 2009), and a loop involved in the targeting site (orange) was identified from the template 2v0rA (Repanas et al., 2007). Twelve amino acid substitutions between the two domains are shown in yellow. These drawings were obtained using PyMOL. Analysis was carried out courtesy of Dr. M. Tanio, National Institutes of Natural Sciences, Okazaki.
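The RMSD quoted in the legend of Fig. 10 is the root mean square of the distances between matched atoms after superposition. As a minimal sketch of that final step only, the function below computes the RMSD between two coordinate lists that are assumed to be already superimposed and listed in matching order. It does not perform the fitting itself, and the function name and toy coordinates are illustrative assumptions rather than the procedure or data used for the figure.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) points.

    Assumes the two structures are already superimposed and that point i in
    coords_a corresponds to point i in coords_b; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy coordinates (in angstroms), invented for illustration only.
model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
model_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
print(round(rmsd(model_a, model_b), 3))  # sqrt(1/3) -> 0.577
```

In practice the superposition and the RMSD are usually obtained together from a structure-analysis tool; the sketch only makes the definition of the reported number explicit.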

To explore further whether the nucleotide substitutions in p97Bcnt2 reflect natural selection in the two paralogs, we examined the *dN* (non-synonymous substitutions per site)/*dS* (synonymous substitutions per site) values for both the core RTE domain (residues 243-483) and the remaining regions (residues 177-242 and 484-500). The *dN*/*dS* values are 0.029/0.160 in the core region and 0.166/0.414 in the other regions. Although there are more non-synonymous and synonymous substitutions outside the core RTE domain of p97Bcnt2, the *dN*/*dS* values are < 1, providing no clear evidence of positive selection. On the other hand, the *dN*/*dS* value in the core RTE domain is much lower than that of the other regions, suggesting that selective constraints have been substantially stronger in the core RTE domain (Iwashita et al., 2009). These data suggest that the recruited RTE domains in both p97Bcnt/Cfdp2 and p97Bcnt2 have played a crucial role in the duplicated novel genes.
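The comparison above rests on simple arithmetic: dividing *dN* by *dS* for each region. The sketch below carries out that division for the two pairs of values reported in the text and attaches the conventional reading of the ratio (below 1 suggests purifying selection, above 1 suggests positive selection). The function names and region labels are illustrative, and the code uses only the published *dN* and *dS* values; it does not estimate them from sequence data.

```python
# (dN, dS) pairs as reported in the text.
REGIONS = {
    "core RTE domain": (0.029, 0.160),
    "remaining regions": (0.166, 0.414),
}


def omega(dn: float, ds: float) -> float:
    """Return the dN/dS ratio."""
    if ds <= 0:
        raise ValueError("dS must be positive to form a ratio")
    return dn / ds


def interpret(ratio: float) -> str:
    """Conventional reading of a dN/dS ratio."""
    if ratio < 1:
        return "purifying (negative) selection"
    if ratio > 1:
        return "positive selection"
    return "neutral evolution"


for region, (dn, ds) in REGIONS.items():
    ratio = omega(dn, ds)
    print(f"{region}: dN/dS = {ratio:.2f} ({interpret(ratio)})")
```

The ratios come to about 0.18 for the core RTE domain and about 0.40 for the remaining regions, consistent with the stronger constraint on the core domain described in the text.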

duplication of the ancestral *Bcnt/Cfdp1* gene followed by the insertion of an order-specific retrotransposon, Bov-B LINE. This type of combined process provides great potential to generate a novel gene, because a novel function can be acquired while the original gene function is retained. The ancestral Bcnt/Cfdp1 protein contains a highly conserved C-terminus of 82 amino acids (Bcnt-C) that is not present in either p97Bcnt/Cfdp2 or p97Bcnt2. Bcnt-C is found in almost all eukaryotes, where it is known as the BCNT superfamily. A chicken Bcnt/Cfdp1 is a centromere protein, while the yeast ortholog is a component of the chromatin-remodeling complex, suggesting that the ancestral Bcnt/Cfdp1 protein plays a role in the regulation of gene expression. The two paralogs, p97Bcnt/Cfdp2 and p97Bcnt2, recruited an AP-endonuclease domain of the retrotransposon during their generation process as a ~325-amino acid region (RTE domain) in the middle of the molecule. The three Bcnt-related proteins are distributed in both the cytosolic and nuclear fractions, and include intrinsically disordered regions other than the core of the RTE domains of the two paralogs. The 3-D structures of the core RTE domains can be modeled as canonical AP-endonucleases with identical catalytic amino acid residues. Although there is as yet no direct evidence for it, the two paralogs probably retain AP-endonuclease activity. Because AP-endonuclease/Redox effector factor 1 is one of the major regulators of cellular responses to various stresses, we propose that the recruited AP-endonuclease domains, which may have emerged in response to cellular stresses, may be utilized by the paralogs in cellular regulation. Therefore, the three Bcnt-related family members provide a good opportunity to examine dynamic changes in signaling networks that accompany novel genes.

**7. Acknowledgements**

We thank all our colleagues for their contributions to the study over the last 15 years, especially Drs. K. Hashimoto and S. Hattori for their indispensable help in the early stages. We are grateful to Dr. T. Kohno for providing unpublished data, to Drs. H. Ohmori, M. Tanio, S-Y. Song, K. Nakashima, S. Imajo-Ohmi, E. B. Kuettner, M.B. Gerstein, G. Tell, Y. Miyata, and Y. Ohno-Iwashita, and to The I-TASSER Server Team for useful discussion, and to Dr. D. Izumi for providing unpublished data. We are also grateful to Dr. Y. Nagai, the former president of Mitsubishi Kagaku Institute of Life Sciences, for continuous encouragement, and to Dr. M. Dooley-Ohto for patient editing. During the preparation of this article, one of the authors in northern Japan suffered the disasters of a major earthquake and resulting tsunami, which led to the Fukushima nuclear plant accident. In this, we recognize the power of nature, and realize the importance of observing it carefully and describing it correctly.

**8. References**

Barzilay, G., Walker, L.J., Robson, C.N., & Hickson, I.D. 1995. Site-directed mutagenesis of the human DNA repair enzyme HAP1: identification of residues important for AP endonuclease and RNase H activity. *Nucleic Acids Res.*, 23, pp. 1544-1550

Brosius, J. 2005. Echoes from the past—are we still in an RNP world? *Cytogenet. Genome Res.*, 110, pp. 8-24

Busso, C.S., Lake, M.W., & Izumi, T. 2010. Posttranslational modification of mammalian AP endonuclease (APE1). *Cell. Mol. Life Sci.*, 67, pp. 3609-3620
