**3. Tandem alignment of three** *Bcnt* **gene family members**

The draft bovine genome sequence was published in 2009 (The Bovine Genome Sequencing and Analysis Consortium, 2009). The initial analysis estimated that the bovine genome contains about 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species. About 3.1% of the bovine genome consists of recently duplicated sequences (defined as sequences ≥1 kb in length with ≥90% identity), and the majority (75-90%) of segmental duplications are organized into local tandem duplication clusters (Liu et al., 2009). Notably, cattle-specific evolutionary breakpoint regions in the chromosomes show a higher density of tandem duplications and an enrichment of repetitive elements. Furthermore, bovine tandem gene duplications have been reported to be significantly associated with species-specific biological functions such as immunity, digestion, lactation, and reproduction (Liu et al., 2009).
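
The duplication statistic above combines two filters (length ≥1 kb, identity ≥90%) with a genome-coverage calculation. The minimal Python sketch below shows that bookkeeping, assuming a hypothetical record format (one record per duplicated copy, 0-based half-open coordinates) and a toy genome size; it is an illustration, not the pipeline used by Liu et al. (2009). Overlapping intervals are merged before summing so that no base is counted twice.

```python
from typing import NamedTuple


class Dup(NamedTuple):
    """One duplicated copy (hypothetical record format)."""
    chrom: str
    start: int        # 0-based, inclusive
    end: int          # 0-based, exclusive
    identity: float   # percent identity to its partner copy


def recent_dup_fraction(dups: list[Dup], genome_size: int,
                        min_len: int = 1_000, min_identity: float = 90.0) -> float:
    """Fraction of the genome covered by duplications passing both thresholds."""
    # Keep only copies that are >= 1 kb long and >= 90% identical, sorted by position.
    kept = sorted((d.chrom, d.start, d.end) for d in dups
                  if d.end - d.start >= min_len and d.identity >= min_identity)
    covered = 0
    current = None  # interval currently being merged: (chrom, start, end)
    for chrom, start, end in kept:
        if current and chrom == current[0] and start <= current[2]:
            # Overlaps or abuts the current interval: extend it instead of double-counting.
            current = (chrom, current[1], max(current[2], end))
        else:
            if current:
                covered += current[2] - current[1]
            current = (chrom, start, end)
    if current:
        covered += current[2] - current[1]
    return covered / genome_size


# Toy input: coordinates and identities are invented for illustration.
dups = [
    Dup("chr18", 0, 5_000, 95.0),
    Dup("chr18", 4_000, 8_000, 92.0),     # overlaps the previous copy, merged
    Dup("chr18", 20_000, 20_500, 99.0),   # shorter than 1 kb, dropped
    Dup("chr26", 1_000, 3_000, 85.0),     # below 90% identity, dropped
    Dup("chr26", 10_000, 12_000, 90.0),   # exactly at the threshold, kept
]
fraction = recent_dup_fraction(dups, genome_size=100_000)
print(f"recently duplicated: {fraction:.1%}")
```

With these toy records the two overlapping chr18 copies merge into one 8-kb interval, the short and low-identity records are discarded, and the script prints `recently duplicated: 10.0%`.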

Bucentaur (Bcnt) Gene Family: Gene Duplication and Retrotransposon Insertion 389

Fig. 4. Bovine *Bcnt/Cfdp1* locus and its corresponding region in the human genome

The organization of the bovine *Bcnt/Cfdp1*-*p97Bcnt/Cfdp2*-*p97Bcnt-2* cluster is shown schematically. *Bcar1* (breast cancer anti-estrogen resistance 1) and *Tmem170A* (transmembrane protein 170A) are located proximal and distal, respectively, to the *Bcnt* gene cluster on bovine chromosome 18 (middle part) and to *BCNT* on human chromosome 16q23 (upper part). The *Bcnt/Cfdp1*, *p97Bcnt/Cfdp2*, and *p97Bcnt-2* genes comprise 7, 8, and 10 exons, respectively (lower part); each exon is indicated by a numbered vertical bar.

The three *Bcnt*-related genes are tandemly aligned on bovine chromosome 18, spanning more than 177 kb in a region syntenic with human chromosome 16q23 (Fig. 4) and mouse chromosome 8. In bovines, this gene cluster lies between the proximal breast cancer anti-estrogen resistance 1 gene (*Bcar1*) and the distal transmembrane protein 170A gene (*Tmem170A*), the same flanking genes that surround the single *BCNT/CFDP1* gene in humans and *Bcnt/Cfdp1* gene in mice. Therefore, the cluster region was generated by an order-specific segmental duplication. It has been suggested that Bov-B LINEs were introduced into ancient ruminants by horizontal transfer from Squamata (Zupunski et al., 2001) and expanded just after the divergence of Ruminantia and Camelidae (Jobse et al., 1995). Bov-B LINEs expanded further in different lineages during the diversification of ruminant species after their split from Tragulina, as confirmed by hybridization with DNA fragments of the RTE domain from the lesser Malay chevrotain (Iwashita et al., 2006). This is consistent with the expansion of bovine SINEs (Jobse et al., 1995). *Tragulus javanicus*, a living fossil of the basal ruminant stock, shares a *Bcnt/Cfdp1* and *p97Bcnt/Cfdp2* gene organization similar to that of bovines (Iwashita et al., 2006). Thus, the partial gene duplication of the ancestral *Bcnt/Cfdp1* followed by the Bov-B LINE insertion occurred sometime after the Ruminantia-Suina-Tylopoda split and before the Pecora-Tragulina divergence, ~50 million years ago. A phylogenetic tree has been constructed based on the N-terminal regions (~175 amino acids) encoded by the first four exons, which are shared among the three *Bcnt*-related members. Its topology suggests that *p97Bcnt-2* was created by duplication of the ancestral *p97Bcnt/Cfdp2* in an ancient ruminant prior to the Pecora-Tragulina divergence.
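
The dating argument above is a bracketing inference: a gene shared by Pecora (cattle) and Tragulina (chevrotain) but absent from other mammals is placed on the branch leading to their common ancestor. The Python sketch below makes that logic explicit under stated assumptions: a single origin with no later loss, a simplified species tree in which Ruminantia, Suina and Tylopoda are left unresolved, and hypothetical absence calls for pig and camel (implied by the text but not listed in it).

```python
from dataclasses import dataclass, field


@dataclass
class Clade:
    name: str
    children: list["Clade"] = field(default_factory=list)

    def leaves(self) -> set[str]:
        """Names of all terminal taxa below this clade."""
        if not self.children:
            return {self.name}
        return set().union(*(child.leaves() for child in self.children))


def origin_branch(root: Clade, carriers: set[str]) -> tuple[str, str]:
    """Place a gene's single origin on the branch leading to the smallest clade
    containing every carrier taxon; returns (parent clade, clade)."""
    parent, node = None, root
    while True:
        # Descend into the child clade that still contains all carriers, if any.
        inner = next((c for c in node.children if carriers <= c.leaves()), None)
        if inner is None:
            break
        parent, node = node, inner
    # Under the no-loss assumption, every member of that clade must carry the gene.
    if node.leaves() != carriers:
        raise ValueError("carriers do not form one clade; single origin without loss does not fit")
    return (parent.name if parent else "root", node.name)


# Simplified tree; Ruminantia, Suina and Tylopoda are left as an unresolved polytomy.
tree = Clade("Boreoeutheria", [
    Clade("Artiodactyla", [
        Clade("Ruminantia", [
            Clade("Pecora", [Clade("Bos taurus")]),
            Clade("Tragulina", [Clade("Tragulus javanicus")]),
        ]),
        Clade("Suina", [Clade("Sus scrofa")]),
        Clade("Tylopoda", [Clade("Camelus dromedarius")]),
    ]),
    Clade("Euarchontoglires", [Clade("Homo sapiens"), Clade("Mus musculus")]),
])

# Taxa assumed to carry p97Bcnt/Cfdp2 (pig and camel assumed absent, for illustration).
carriers = {"Bos taurus", "Tragulus javanicus"}
parent, clade = origin_branch(tree, carriers)
print(f"p97Bcnt/Cfdp2 origin: branch {parent} -> {clade}")
```

For these inputs the function returns `("Artiodactyla", "Ruminantia")`, i.e., an origin after the Ruminantia-Suina-Tylopoda split and before the Pecora-Tragulina divergence. If a non-ruminant were found to carry the gene, or a ruminant to lack it, the no-loss assumption would fail and the sketch raises an error rather than guessing.
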
Furthermore, analysis of a 120-bp sequence corresponding to 40 amino acid residues within the IR suggests that the IR unit in *p97Bcnt/Cfdp2* was duplicated before the creation of *p97Bcnt-2*, which has three IR units (Fig. 2). The two units in *p97Bcnt-2* (IR-II and IR-III) diverged from IR-II in *p97Bcnt/Cfdp2*. We propose a parsimonious five-step scenario for the creation of the three *Bcnt*-related genes, as shown in Fig. 5 (Iwashita et al., 2009). A self-BLAST search of the 120-kb region from *Tmem170A* to exon 5 of *Bcnt/Cfdp1* confirms two rounds of duplication in this gene cluster. Moreover, fragments homologous to the *Tmem170A* 3' UTR, which lies 6.8 kb distal to *p97Bcnt-2*, are found in the 3' regions of both *Bcnt/Cfdp1* and *p97Bcnt/Cfdp2*. These data support the scenario described above for the creation of the two paralogs, *p97Bcnt/Cfdp2* and *p97Bcnt-2*. In addition, both the processed pseudogene of *Bcnt/Cfdp1* and a 900-bp fragment encompassing the IR-II exon of *p97Bcnt-2* map to bovine chromosome 26 (Iwashita et al., 2009). It will be interesting to examine the relationship between the retrotransposon-mediated creation of novel genes and the occurrence of processed pseudogenes.
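
The IR-unit comparison above rests on measuring how similar each repeat unit is to the others. As a simplified stand-in for the authors' analysis (whose method is not detailed here), the Python sketch below computes uncorrected p-distances between aligned repeat units and reports, for each *p97Bcnt-2* unit, the closest *p97Bcnt/Cfdp2* unit. The 12-bp sequences are hypothetical toys, not the real 120-bp IR segments.

```python
def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance: proportion of differing sites in two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nearest_unit(query: str, references: dict[str, str]) -> tuple[str, float]:
    """Name of the reference unit closest to `query`, and its p-distance."""
    best = min(references, key=lambda name: p_distance(query, references[name]))
    return best, p_distance(query, references[best])


# Hypothetical 12-bp toy units (the real comparison used a 120-bp, 40-codon segment).
cfdp2_units = {"IR-I": "ATGGCTAAAGAT", "IR-II": "ATGGCAAAGGAC"}
bcnt2_units = {"IR-II": "ATGGCAAAGGAT", "IR-III": "ATGGCGAAGGAC"}

for unit, seq in bcnt2_units.items():
    assert len(seq) % 3 == 0, "a coding segment should contain whole codons"
    closest, dist = nearest_unit(seq, cfdp2_units)
    print(f"p97Bcnt-2 {unit}: closest to p97Bcnt/Cfdp2 {closest} (p = {dist:.3f})")
```

With these toy inputs both *p97Bcnt-2* units are closest to *p97Bcnt/Cfdp2* IR-II (p = 0.083), mirroring the pattern described in the text. A real analysis would use a proper alignment and a tree-building method rather than nearest-neighbor p-distances.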

Fig. 5. A scenario for the creation of the two paralogous genes, *p97Bcnt/Cfdp2* and *p97Bcnt-2*

A parsimonious scenario for the creation of the three *Bcnt*-related family genes comprises five steps: (1) partial duplication of the ancestral *Bcnt/Cfdp1* gene by segmental duplication, leaving the Bcnt-C region behind; (2) insertion of a Bov-B LINE into intron 5 of one of the duplicated copies, recruitment of the AP-endonuclease domain of the retrotransposon, and generation of the common ancestor of *p97Bcnt/Cfdp2* and *p97Bcnt-2*; (3) segmental duplication of the IR unit of the ancestral *p97Bcnt*; (4) further duplication of the ancestral *p97Bcnt* gene to generate the nascent *p97Bcnt-2*; and, finally, (5) segmental duplication of the IR unit of the nascent *p97Bcnt-2* to create the present-day *p97Bcnt-2*. Nucleotide regions corresponding to the acidic N-terminal regions, IR units, Bcnt-C, and *Tmem170A* are indicated by grey, orange, dark blue, and purple boxes, respectively. The Bov-B LINE has an AP-endonuclease domain (yellow) and a reverse transcriptase domain (green).


Fig. 6. Unique mobility of the Bcnt/Cfdp1 and p97Bcnt/Cfdp2 proteins in SDS-PAGE

Extracts of bovine brain (1), rat brain (2), and MDBK cells, a bovine kidney epithelial cell line (3), were separated on SDS polyacrylamide gels and immunoblotted with anti-Bcnt-C peptide antibody in the presence (**a**) or absence (**b**) of the antigen peptide at a final concentration of 100 µM, or with anti-p97Bcnt monoclonal antibodies (**c**). The two small black arrows indicate Bcnt/Cfdp1 with an apparent molecular mass of 45 kDa, which appears as a doublet, probably due to phosphorylation (Iwashita et al., 2003); the large red arrow indicates Bcnt/Cfdp1 with an apparent molecular mass of 53 kDa, as described below.

interactions that are frequently triggered by posttranslational modifications within the regions of intrinsic disorder (Dunker et al., 2008). IDPs may function as hub proteins via the formation of complexes with cellular proteins, which are then modulated by protein modifications such as phosphorylation, acetylation, ubiquitination, or degradation.

By computational prediction, Bcnt/Cfdp1, p97Bcnt/Cfdp2, and p97Bcnt-2 are all suggested to comprise intrinsically disordered regions, except for the cores of the RTE domains, which correspond to the AP-endonuclease in the two paralogs (Fig. 7). This prediction is partially supported by an NMR study of the N-terminal 40 amino acid residues of the Bcnt-C region, prepared in *Escherichia coli* with ¹⁵N-labeled amino acids: the spectrum revealed a lack of fixed tertiary structure (courtesy of Dr. T. Kohno). Furthermore, the Bcnt/Cfdp1 protein forms a tight complex with cellular proteins in bovine placenta even in the presence of the detergent CHAPSO, as shown by gel filtration chromatography on Sephacryl S-300 HR followed by western blotting. Both Bcnt/Cfdp1 and p97Bcnt/Cfdp2 are phosphoproteins that can be phosphorylated on serine residues by casein kinase II *in vitro* (Iwashita et al., 1999). Recently, two phosphorylated serine residues in human BCNT, Ser116 in the N-terminal region and Ser250 in the C-terminal region, were identified by mass spectrometric analysis (Dephoure et al., 2008); this phosphorylation is cell cycle independent. Notably, these two phosphoserines reside in the sequences WASF and WESF, respectively, which suggests a distinct motif for specific phosphorylation. Phosphorylation of these motifs might act as a switch, for example by modulating cation-mediated protein-ligand interactions (Zacharias & Dougherty, 2002). These characteristics suggest that the three Bcnt-related family members are hub-like molecules.
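
The two sequence-level arguments above, a shared phosphorylation motif (WASF/WESF) and predicted intrinsic disorder, can be illustrated with a short Python sketch. It scans a sequence with regular expressions for the W-[A/E]-S-F motif and for the minimal casein kinase II consensus (S/T-x-x-D/E), and applies a simple whole-sequence charge-hydropathy criterion (the boundary proposed by Uversky and colleagues) as a swapped-in disorder test; the predictor behind Fig. 7 is not specified here. The peptide is a hypothetical acidic toy sequence, not a real Bcnt protein.

```python
import re

# Kyte-Doolittle hydropathy values, rescaled below to 0..1 as (kd + 4.5) / 9.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

# Lookahead patterns report overlapping sites; group 1 marks the acceptor residue.
WXSF_MOTIF = r"(?=W[AE](S)F)"        # WASF / WESF, as noted for human BCNT
CK2_CONSENSUS = r"(?=([ST])..[DE])"  # minimal casein kinase II consensus S/T-x-x-D/E


def acceptor_sites(seq: str, pattern: str) -> list[int]:
    """1-based positions of the residue captured by group 1 of `pattern`."""
    return [m.start(1) + 1 for m in re.finditer(pattern, seq)]


def charge_hydropathy(seq: str) -> tuple[float, float, bool]:
    """Mean net charge <R>, mean scaled hydropathy <H>, and whether <H> falls
    below the boundary <H> = (<R> + 1.151) / 2.785 (i.e., predicted disordered)."""
    n = len(seq)
    net = sum(seq.count(a) for a in "KR") - sum(seq.count(a) for a in "DE")
    r = abs(net) / n
    h = sum((KD[a] + 4.5) / 9 for a in seq) / n
    return r, h, h < (r + 1.151) / 2.785


# Hypothetical acidic toy peptide, NOT a real Bcnt sequence.
toy = "MSEEDWASFEEDKPSDEEGWESFKDE"
print("WxSF serines:", acceptor_sites(toy, WXSF_MOTIF))
print("CK2 sites:", acceptor_sites(toy, CK2_CONSENSUS))
r, h, disordered = charge_hydropathy(toy)
print(f"<R>={r:.3f} <H>={h:.3f} predicted disordered: {disordered}")
```

For the toy peptide the motif scan reports serines 8 and 22, the CK2 scan reports serines 2, 8, 15, and 22, and the low mean hydropathy (0.297) combined with a high mean net charge (0.385) places it on the disordered side of the boundary. A whole-sequence criterion like this cannot localize ordered cores such as the RTE domains; region-level predictors are needed for that.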
