**5. Higher eukaryotes in the diploid state**

A higher eukaryote in the diploid state is characterized by pairs of homologous chromosomes, and its large-scale evolution includes the process of establishing homozygotes for new genes as well as the generation of those genes from gene duplication. Although the number of homologous chromosome pairs differs among diploid species, a single pair of homologous chromosomes *(xi, xk)* is considered first for simplicity, where the subscripts *i* and *k* denote different mutations on the respective chromosomes. In a population of organisms that exchange homologous chromosomes upon reproduction, the number *n(xi, xk; t)* of variants carrying such a pair *(xi, xk)* obeys the following rate equation.

$$\begin{split} \frac{d}{dt}n(\mathbf{x}\_{i},\mathbf{x}\_{k};t) &= \sum\_{j,l} Q(\mathbf{x}\_{i},\mathbf{x}\_{k};t)\_{ij\times kl} R(M;\mathbf{x}\_{i},\mathbf{x}\_{k})\_{ij\times kl} n(\mathbf{x}\_{i},\mathbf{x}\_{j};t) n(\mathbf{x}\_{k},\mathbf{x}\_{l};t) - D(\mathbf{x}\_{i},\mathbf{x}\_{k}) n(\mathbf{x}\_{i},\mathbf{x}\_{k};t) \\ &+ \sum\_{i',k'(\neq i,k)} \sum\_{j,l} q(\mathbf{x}\_{i},\mathbf{x}\_{k} \leftarrow \mathbf{x}\_{i'},\mathbf{x}\_{k'};t)\_{i'j\times k'l} R(M;\mathbf{x}\_{i'},\mathbf{x}\_{k'})\_{i'j\times k'l} n(\mathbf{x}\_{i'},\mathbf{x}\_{j};t) n(\mathbf{x}\_{k'},\mathbf{x}\_{l};t) \end{split} \tag{23}$$

A Theoretical Scheme


of the Large-Scale Evolution by Generating New Genes from Gene Duplication 15


where *R(M; xi, xk)ij×kl* is the rate of producing children *(xi, xk)* from the mating of a variant *(xi, xj)* with another variant *(xk, xl)* under a common material and energy source *M*, and *D(xi, xk)* is the death rate of the organism *(xi, xk)*. The apparent decrease factor *Q(xi, xk; t)ij×kl* is related to the mutation terms *q(xi′, xk′ ← xi, xk; t)ij×kl* as follows.

$$Q(\mathbf{x}\_{i}, \mathbf{x}\_{k}; t)\_{ij \times kl} = 1 - \sum\_{i',k' (\neq i,k)} q(\mathbf{x}\_{i'}, \mathbf{x}\_{k'} \leftarrow \mathbf{x}\_{i}, \mathbf{x}\_{k}; t)\_{ij \times kl} \tag{24}$$

For simplicity, Eq. (23) makes no distinction between males and females; introducing this distinction does not essentially alter the evolutionary process discussed below.

As for monoploid organisms, the population behavior of diploid organisms becomes transparent when Eq. (23) is transformed into an equation for the total number of organisms, *B(t) = Σi Σk n(xi, xk; t)*, and an equation for the fraction of variants *(xi, xk)*, defined by *f(xi, xk; t) = n(xi, xk; t)/B(t)*. These equations take the following forms, respectively.

$$\frac{d}{dt}B(t) = \overline{\mathcal{W}}(t)B(t) \tag{25}$$

$$\begin{split} \frac{d}{dt} f(\mathbf{x}\_{i}, \mathbf{x}\_{k}; t) &= \{ \mathcal{W}(\mathbf{x}\_{i}, \mathbf{x}\_{k}; t) - \overline{\mathcal{W}}(t) \} f(\mathbf{x}\_{i}, \mathbf{x}\_{k}; t) \\ &+ \sum\_{i',k'} \sum\_{j,l} q(\mathbf{x}\_{i}, \mathbf{x}\_{k} \leftarrow \mathbf{x}\_{i'}, \mathbf{x}\_{k'}; t)\_{i'j \times k'l} R(M; \mathbf{x}\_{i'}, \mathbf{x}\_{k'})\_{i'j \times k'l} f(\mathbf{x}\_{i'}, \mathbf{x}\_{j}; t) f(\mathbf{x}\_{k'}, \mathbf{x}\_{l}; t) B(t) \end{split} \tag{26}$$

where the increase rate *W(xi, xk; t)* of the variant *(xi, xk)* is defined by

$$\mathcal{W}(\mathbf{x}\_{i},\mathbf{x}\_{k};t) \equiv \sum\_{j} \sum\_{l} R(M;\mathbf{x}\_{i},\mathbf{x}\_{k})\_{ij \times kl} f(\mathbf{x}\_{i},\mathbf{x}\_{j};t) f(\mathbf{x}\_{k},\mathbf{x}\_{l};t) B(t) / f(\mathbf{x}\_{i},\mathbf{x}\_{k};t) - D(\mathbf{x}\_{i},\mathbf{x}\_{k}) \tag{27}$$

the average increase rate *W̄(t)* is given by

$$\overline{\mathcal{W}}(t) \equiv \sum\_{i} \sum\_{k} \mathcal{W}(\mathbf{x}\_{i}, \mathbf{x}\_{k}; t) f(\mathbf{x}\_{i}, \mathbf{x}\_{k}; t) \tag{28}$$

and the term *q(xi, xk ← xi, xk; t)ij×kl*, whose initial and final states coincide, is defined by *Q(xi, xk; t)ij×kl − 1*. If the subscripts *i, j, k* and *l* denote point mutations in existing genes, Eq. (26) represents Darwinian evolution, which gradually leads to organisms with an optimal increase rate, each characterized by *(xopt, xopt)*.
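As a numerical illustration of Eqs. (26) to (28), the minimal Python sketch below uses a toy model with two chromosome types. The rate functions `R`, `D` and `q_off` and all of their values are hypothetical. It also assumes that the mutation probabilities do not depend on the partners *j* and *l*. By Eq. (24), the self-term *q(xi, xk ← xi, xk; t)ij×kl = Q − 1* equals minus the total mutation outflow. As a result, the right-hand sides of Eq. (26) should sum to zero, which keeps the fractions *f(xi, xk; t)* normalized. The sketch checks only this property.

```python
from itertools import product

# Toy model (hypothetical): two chromosome types, labelled 0 and 1.
TYPES = range(2)
GENOTYPES = list(product(TYPES, TYPES))


def R(i, j, k, l):
    """Hypothetical rate R(M; x_i, x_k)_{ij x kl} of producing (x_i, x_k) from (x_i, x_j) x (x_k, x_l)."""
    return 1.0 - 0.2 * (i + k) + 0.05 * (j + l)


def D(i, k):
    """Hypothetical death rate D(x_i, x_k)."""
    return 0.1 + 0.02 * (i + k)


def q_off(i, k, ip, kp):
    """Hypothetical mutation probability q(x_i, x_k <- x_i', x_k') for (i, k) != (i', k')."""
    return 1e-3


def q(i, k, ip, kp):
    """Mutation term of Eq. (26); the self-term is Q - 1, i.e. minus the total outflow (Eq. (24))."""
    if (i, k) == (ip, kp):
        return -sum(q_off(a, b, ip, kp) for a, b in GENOTYPES if (a, b) != (ip, kp))
    return q_off(i, k, ip, kp)


def W(f, B, i, k):
    """Increase rate of the variant (x_i, x_k), Eq. (27)."""
    births = sum(R(i, j, k, l) * f[i, j] * f[k, l] for j, l in GENOTYPES)
    return births * B / f[i, k] - D(i, k)


def W_bar(f, B):
    """Average increase rate, Eq. (28)."""
    return sum(W(f, B, i, k) * f[i, k] for i, k in GENOTYPES)


def df_dt(f, B):
    """Right-hand side of Eq. (26) for every genotype (i, k); f maps (i, k) to a fraction."""
    wb = W_bar(f, B)
    rates = {}
    for i, k in GENOTYPES:
        inflow = sum(
            q(i, k, ip, kp) * R(ip, j, kp, l) * f[ip, j] * f[kp, l] * B
            for ip, kp, j, l in product(TYPES, repeat=4)
        )
        rates[i, k] = (W(f, B, i, k) - wb) * f[i, k] + inflow
    return rates


f = {(0, 0): 0.7, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.1}
rates = df_dt(f, B=2.0)
print(sum(rates.values()))  # close to zero: the fractions stay normalized
```

If the self-term were set to zero instead of *Q − 1*, a positive mutation inflow would remain in the sum and the check would fail. This is why the sum in Eq. (26) includes *(i′, k′) = (i, k)*.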

Because gene duplication occurs only rarely, it is natural to assume that large-scale evolution due to gene duplication starts after the organisms *(xopt, xopt)* have become dominant in the population. If the chromosome that has experienced gene duplication is newly denoted by *xi* and point mutations are neglected, the fraction *f(xi, xopt; t)* of variants *(xi, xopt)* obeys the following equation as a special case of Eq. (26).

$$\begin{split} \frac{d}{dt} f(\mathbf{x}\_i, \mathbf{x}\_{\mathrm{opt}}; t) &= \{ \mathcal{W}(\mathbf{x}\_i, \mathbf{x}\_{\mathrm{opt}}; t) - \overline{\mathcal{W}}(t) \} f(\mathbf{x}\_i, \mathbf{x}\_{\mathrm{opt}}; t) \\ &+ q(\mathbf{x}\_i, \mathbf{x}\_{\mathrm{opt}} \leftarrow \mathbf{x}\_{\mathrm{opt}}, \mathbf{x}\_{\mathrm{opt}}; t)\_{\mathrm{opt\ opt} \times \mathrm{opt\ opt}} R(M; \mathbf{x}\_{\mathrm{opt}}, \mathbf{x}\_{\mathrm{opt}})\_{\mathrm{opt\ opt} \times \mathrm{opt\ opt}} f^2(\mathbf{x}\_{\mathrm{opt}}, \mathbf{x}\_{\mathrm{opt}}; t) B(t) \end{split} \tag{29}$$

where the increase rate *W(xi,xopt;t)* of the variant *(xi, xopt)* is given by

$$\mathcal{W}(\mathbf{x}\_{i}, \mathbf{x}\_{\mathrm{opt}}; t) = R(M; \mathbf{x}\_{i}, \mathbf{x}\_{\mathrm{opt}})\_{i\ \mathrm{opt} \times \mathrm{opt\ opt}} f(\mathbf{x}\_{\mathrm{opt}}, \mathbf{x}\_{\mathrm{opt}}; t) B(t) - D(\mathbf{x}\_{i}, \mathbf{x}\_{\mathrm{opt}}) \tag{30}$$

and the average increase rate *W̄(t)* is given by

14 Gene Duplication


$$\overline{\mathcal{W}}(t) = \mathcal{W}(\mathbf{x}\_{i}, \mathbf{x}\_{\mathrm{opt}}; t) f(\mathbf{x}\_{i}, \mathbf{x}\_{\mathrm{opt}}; t) + \mathcal{W}(\mathbf{x}\_{\mathrm{opt}}, \mathbf{x}\_{\mathrm{opt}}; t) f(\mathbf{x}\_{\mathrm{opt}}, \mathbf{x}\_{\mathrm{opt}}; t) \tag{31}$$

The probability of generating a new style of organism carrying a new gene is derived from Eq. (29). In a population where the organisms *(xopt, xopt)* are dominant, *W̄(t)* is approximately equal to *W(xopt, xopt)*, and both *f(xopt, xopt; t)* and *B(t)* are nearly independent of time. Integrating Eq. (29) then gives the following relation between the fraction of variants *f(xi, xopt)* and that of the dominant organisms *f(xopt, xopt)*.

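$$f(\mathbf{x}\_i, \mathbf{x}\_{\mathrm{opt}}) = \frac{q(\mathbf{x}\_i, \mathbf{x}\_{\mathrm{opt}} \leftarrow \mathbf{x}\_{\mathrm{opt}}, \mathbf{x}\_{\mathrm{opt}})\_{\mathrm{opt\ opt} \times \mathrm{opt\ opt}} R(M; \mathbf{x}\_{\mathrm{opt}}, \mathbf{x}\_{\mathrm{opt}})\_{\mathrm{opt\ opt} \times \mathrm{opt\ opt}}}{\mathcal{W}(\mathbf{x}\_{\mathrm{opt}}, \mathbf{x}\_{\mathrm{opt}}) - \mathcal{W}(\mathbf{x}\_i, \mathbf{x}\_{\mathrm{opt}})} f^2(\mathbf{x}\_{\mathrm{opt}}, \mathbf{x}\_{\mathrm{opt}}) B \tag{32}$$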

where the rate of generating the gene duplication *i* is defined as its average over a sufficiently long time *t*:

$$q(\mathbf{x}\_i, \mathbf{x}\_{\mathrm{opt}} \leftarrow \mathbf{x}\_{\mathrm{opt}}, \mathbf{x}\_{\mathrm{opt}})\_{\mathrm{opt\ opt} \times \mathrm{opt\ opt}} \equiv \frac{1}{t} \int\_0^t q(\mathbf{x}\_i, \mathbf{x}\_{\mathrm{opt}} \leftarrow \mathbf{x}\_{\mathrm{opt}}, \mathbf{x}\_{\mathrm{opt}}; \tau)\_{\mathrm{opt\ opt} \times \mathrm{opt\ opt}} d\tau \tag{33}$$
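A forward-Euler integration shows how Eq. (29) approaches the stationary relation (32). This minimal Python sketch follows the text in assuming that *W̄(t) ≈ W(xopt, xopt)* and that *f(xopt, xopt)*, *B* and the averaged rate *q* are constant. The function names and parameter values are hypothetical. Starting from *f(xi, xopt) = 0*, the fraction saturates at the value of Eq. (32) on a time scale of about *1/{W(xopt, xopt) − W(xi, xopt)}*.

```python
def integrate_eq29(W_opt, W_i, q_bar, R_oo, f_opt, B, t_end, dt=1e-3):
    """Forward-Euler integration of Eq. (29) from f(x_i, x_opt) = 0.

    Assumes W_bar(t) ~ W(x_opt, x_opt) and constant f(x_opt, x_opt), B and q_bar.
    """
    source = q_bar * R_oo * f_opt**2 * B  # inflow of duplicated variants from (x_opt, x_opt) matings
    f_i = 0.0
    for _ in range(round(t_end / dt)):
        f_i += dt * ((W_i - W_opt) * f_i + source)
    return f_i


def steady_state_eq32(W_opt, W_i, q_bar, R_oo, f_opt, B):
    """Stationary fraction given by Eq. (32)."""
    if W_opt <= W_i:
        raise ValueError("Eq. (32) needs W(x_opt, x_opt) > W(x_i, x_opt)")
    return q_bar * R_oo * f_opt**2 * B / (W_opt - W_i)


# Hypothetical parameter values.
params = dict(W_opt=1.0, W_i=0.5, q_bar=1e-6, R_oo=2.0, f_opt=1.0, B=1.0)
print(integrate_eq29(**params, t_end=40.0))  # approaches the stationary value
print(steady_state_eq32(**params))           # 4e-06
```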

At first glance, relation (32) seems to differ from Eq. (10) for monoploid organisms because it contains the population size *B*. However, the denominator on the right-hand side of Eq. (32) also contains *B*, as seen from Eqs. (27) and (30). Therefore, if the population is large enough that the difference in death rate between the variant *(xi, xopt)* and the dominant organism *(xopt, xopt)* can be neglected, the difference in increase rate *W(xopt, xopt) − W(xi, xopt)* is approximately *{R(M; xopt, xopt)opt opt×opt opt − R(M; xi, xopt)i opt×opt opt} B f(xopt, xopt)*, and Eq. (32) reduces to

$$f(\mathbf{x}\_i, \mathbf{x}\_{\mathrm{opt}}) = \frac{q(\mathbf{x}\_i, \mathbf{x}\_{\mathrm{opt}} \leftarrow \mathbf{x}\_{\mathrm{opt}}, \mathbf{x}\_{\mathrm{opt}})\_{\mathrm{opt\ opt} \times \mathrm{opt\ opt}} R(M; \mathbf{x}\_{\mathrm{opt}}, \mathbf{x}\_{\mathrm{opt}})\_{\mathrm{opt\ opt} \times \mathrm{opt\ opt}}}{R(M; \mathbf{x}\_{\mathrm{opt}}, \mathbf{x}\_{\mathrm{opt}})\_{\mathrm{opt\ opt} \times \mathrm{opt\ opt}} - R(M; \mathbf{x}\_i, \mathbf{x}\_{\mathrm{opt}})\_{i\ \mathrm{opt} \times \mathrm{opt\ opt}}} f(\mathbf{x}\_{\mathrm{opt}}, \mathbf{x}\_{\mathrm{opt}}) \tag{34}$$

This has essentially the same form as Eq. (10) for monoploid organisms in the case where gene duplication hardly changes the death rate, i.e., *D(xi)* ≈ *D(xopt)*. Denote by *q(xI* ← *xi)* the probability of generating a new gene *I* from the duplicated part *i*. The probability *Pd1(xI, xo* ← *xo, xo)* that a new-style organism *(xI, xo)*, carrying the new gene *I* heterozygously, is generated from an original-style organism *(xo, xo)* is then expressed as

$$P\_{d1}(\mathbf{x}\_{I}, \mathbf{x}\_{o} \leftarrow \mathbf{x}\_{o}, \mathbf{x}\_{o}) = \frac{q(\mathbf{x}\_{I} \leftarrow \mathbf{x}\_{i}) q(\mathbf{x}\_{i}, \mathbf{x}\_{o} \leftarrow \mathbf{x}\_{o}, \mathbf{x}\_{o})\_{oo \times oo} R(M; \mathbf{x}\_{o}, \mathbf{x}\_{o})\_{oo \times oo}}{R(M; \mathbf{x}\_{o}, \mathbf{x}\_{o})\_{oo \times oo} - R(M; \mathbf{x}\_{i}, \mathbf{x}\_{o})\_{io \times oo}} \tag{35}$$

where *xopt* in Eq. (34) has been rewritten as *xo* to denote the original-type chromosome. Thus, just as in the case of monoploid organisms, a new-style diploid organism arises from minor members of the population.
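The chain from Eq. (32) to Eq. (35) can be checked numerically. This minimal Python sketch is an illustration only, not the author's computation. Following Eqs. (27) and (30), it writes the increase rates as *W ≈ R f(xopt, xopt) B − D*. All rate values are hypothetical: the duplicated variant reproduces at 40 % of the original rate and dies slightly faster. As *B* grows, the ratio of Eq. (32) to Eq. (34) approaches one. Writing *R(M; xi, xo)io×oo = R(1 − S)* turns Eq. (35) into *q(xI ← xi) q(xi, xo ← xo, xo)oo×oo / S*, the form used in Eq. (37).

```python
def fraction_eq32(q_bar, R_oo, R_io, D_oo, D_io, f_opt, B):
    """Eq. (32), with the increase rates approximated via Eqs. (27) and (30)."""
    W_opt = R_oo * f_opt * B - D_oo  # W(x_opt, x_opt) when (x_opt, x_opt) dominates
    W_i = R_io * f_opt * B - D_io    # W(x_i, x_opt), Eq. (30)
    return q_bar * R_oo * f_opt**2 * B / (W_opt - W_i)


def fraction_eq34(q_bar, R_oo, R_io, f_opt):
    """Large-population limit, Eq. (34)."""
    return q_bar * R_oo * f_opt / (R_oo - R_io)


def p_d1(q_new, q_dup, R_oo, R_io):
    """Eq. (35): probability that (x_I, x_o) arises from the original (x_o, x_o)."""
    if not R_io < R_oo:
        raise ValueError("Eq. (35) assumes R(M; x_i, x_o) < R(M; x_o, x_o)")
    return q_new * q_dup * R_oo / (R_oo - R_io)


# Hypothetical values: the variant reproduces at 40 % of the original rate and dies faster.
base = dict(q_bar=1e-6, R_oo=1.0, R_io=0.4, f_opt=0.99)
for B in (1e1, 1e3, 1e5):
    ratio = fraction_eq32(**base, D_oo=0.1, D_io=0.2, B=B) / fraction_eq34(**base)
    print(f"B = {B:g}: Eq. (32) / Eq. (34) = {ratio:.6f}")

# With R(M; x_i, x_o) = R (1 - S), Eq. (35) reduces to q_new * q_dup / S.
S = 0.6
print(p_d1(1e-3, 1e-4, R_oo=1.0, R_io=1.0 * (1 - S)), 1e-3 * 1e-4 / S)
```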

However, the content of the above probability for diploid organisms differs from that for monoploid organisms in the following points. First, the reproducing rate *R(M; xi, xo)io×oo* is only half of *R(M; xo, xo)oo×oo* even under random partition of homologous chromosomes, and the former may be further decreased by the lowered biological activity of the variant *(xi, xo)*. Second, further gene duplication producing two or more new genes is hardly expected within the homologous chromosomes *(xi, xo)*. The reason is that the fraction of variants that have experienced successive gene duplications becomes much lower. This happens not only because biological activity is lowered more severely, but also because the homologous chromosomes become more incompatible, or because chromosomes carrying duplicated genes of different origins are separated in the descendants. For example, if a further gene duplication *j* occurs on the chromosome *xi* to yield *xij*, the incompatibility of chromosomes *xij* and *xo* becomes more severe upon mitosis and/or meiosis. Conversely, if the gene duplication *j* occurs on the chromosome *xo* to yield *xj*, the chromosome *xj* is separated from the chromosome *xi* in the descendants.

Despite this conservative property, a diploid organism with multiple pairs of homologous chromosomes can give rise to a new style of organism that combines two or more new genes. This occurs through successive hybridization among satellite variants that have experienced gene duplication on different chromosomes. As a first example, consider the appearance, by this hybridization mechanism, of a new-style organism that has received two kinds of new genes, *I* and *J*. Here two pairs of homologous chromosomes *(xo, xo; yo, yo)* are considered. The heterozygote *(xI, xo; yJ, yo)* is generated from the original-style organisms *(xo, xo; yo, yo)* through hybridization of two types of variants, *(xi, xo; yo, yo)* and *(xo, xo; yj, yo)*. According to Eq. (35), the probability *Pd2(xI, xo; yJ, yo* ← *xo, xo; yo, yo)* of this event is given by

$$\begin{split} &P\_{d2}(\mathbf{x}\_{I},\mathbf{x}\_{o};\mathbf{y}\_{J},\mathbf{y}\_{o}\leftarrow\mathbf{x}\_{o},\mathbf{x}\_{o};\mathbf{y}\_{o},\mathbf{y}\_{o})\\ &=\frac{q(\mathbf{x}\_{I}\leftarrow\mathbf{x}\_{i})q(\mathbf{x}\_{i},\mathbf{x}\_{o}\leftarrow\mathbf{x}\_{o},\mathbf{x}\_{o})\_{oooo\times oooo}R(M;\mathbf{x}\_{o},\mathbf{x}\_{o};\mathbf{y}\_{o},\mathbf{y}\_{o})\_{oooo\times oooo}}{R(M;\mathbf{x}\_{o},\mathbf{x}\_{o};\mathbf{y}\_{o},\mathbf{y}\_{o})\_{oooo\times oooo}-R(M;\mathbf{x}\_{i},\mathbf{x}\_{o};\mathbf{y}\_{o},\mathbf{y}\_{o})\_{iooo\times oooo}}\\ &\cdot\frac{q(\mathbf{y}\_{J}\leftarrow\mathbf{y}\_{j})q(\mathbf{y}\_{j},\mathbf{y}\_{o}\leftarrow\mathbf{y}\_{o},\mathbf{y}\_{o})\_{oooo\times oooo}R(M;\mathbf{x}\_{o},\mathbf{x}\_{o};\mathbf{y}\_{o},\mathbf{y}\_{o})\_{oooo\times oooo}}{R(M;\mathbf{x}\_{o},\mathbf{x}\_{o};\mathbf{y}\_{o},\mathbf{y}\_{o})\_{oooo\times oooo}-R(M;\mathbf{x}\_{o},\mathbf{x}\_{o};\mathbf{y}\_{j},\mathbf{y}\_{o})\_{oojo\times oooo}} r\_{2}\end{split} \tag{36}$$

where *r2* is the ratio of children that receive both new genes *I* and *J*. It takes the value *(1/2)<sup>2</sup>* when homologous chromosomes are partitioned at random. To show the result of further hybridization, Eqs. (35) and (36) are simplified at this stage. The probabilities *q(xI* ← *xi)* and *q(yJ* ← *yj)* of generating the new genes *I* and *J* from the duplicated parts *i* and *j* in Eqs. (35) and (36) may be taken equal to the corresponding probabilities *qxI,xi* and *qxIJ,xIj* in Eqs. (11) and (16), respectively. This is because the nucleotide base substitution rate is almost the same in eukaryotes and prokaryotes (Kimura, 1980; Otsuka et al., 1997). The occurrence frequency of gene duplication is still difficult to estimate, but for simplicity it is also assumed to be common to monoploid and diploid organisms, i.e., *qxi,xo ~ q(xi, xo* ← *xo, xo)oooo×oooo* and *qxij,xi ~ q(yj, yo* ← *yo, yo)oooo×oooo*. The reproducing rates *R(M; xo, xo; yo, yo)oooo×oooo*, *R(M; xi, xo; yo, yo)iooo×oooo* and *R(M; xo, xo; yj, yo)oojo×oooo* are denoted simply by *R*, *R(1 − S1)* and *R(1 − S2)*, respectively. Here *S1* and *S2* are reduction factors, and both satisfy *1/2 < S1, S2 < 1*, as noted above. Eqs. (35) and (36) are then rewritten as

$$P\_{d1}(\mathbf{x}\_I, \mathbf{x}\_o \leftarrow \mathbf{x}\_o, \mathbf{x}\_o) = \frac{Q\_1}{S\_1} \tag{37}$$

and

$$P\_{d2}(\mathbf{x}\_I, \mathbf{x}\_o; \mathbf{y}\_J, \mathbf{y}\_o \leftarrow \mathbf{x}\_o, \mathbf{x}\_o; \mathbf{y}\_o, \mathbf{y}\_o) = \frac{Q\_2}{S\_1 S\_2} r\_2 \tag{38}$$


respectively. Here, *Q1* and *Q2* denote *q(xI* ← *xi) q(xi, xo* ← *xo, xo)oo×oo* and *q(xI* ← *xi) q(xi, xo* ← *xo, xo)oooo×oooo q(yJ* ← *yj) q(yj, yo* ← *yo, yo)oooo×oooo*, respectively. As an extension, consider the probability *Pdn* that a new-style diploid organism carrying *n* kinds of new genes heterozygously is generated by successive hybridization of variants. This probability is expressed in the following form.

$$P\_{dn} = \frac{Q\_n}{S\_1 S\_2 \cdots S\_n} r\_n \tag{39}$$

where *Si (i = 1, 2, …, n)* is the reduction factor in the reproducing rate of the variant carrying duplicated genes on chromosome *i*. *Qn* is the product of the probabilities of generating *n* kinds of new genes from gene duplication on the respective chromosomes, and *rn* is the ratio of children that receive all these new genes. The reduction factors *Si* in Eq. (39) independently take values in the range *1/2 < Si < 1*. To illustrate simply, in a figure, how *Pdn* depends on *n*, they are tentatively represented by a common variable *S*. The probability *Pdn* is then simply expressed by

$$P\_{dn} = \frac{Q\_n}{S^n} r\_n \tag{40}$$
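The quantities discussed in the rest of this section can be evaluated directly. This minimal Python sketch computes three things:

- *Pdn/(Qnrn)* from Eq. (39), for individual reduction factors.
- *Pdn/(Qnrn)* from Eq. (40), for a common *S*.
- The ratio *nCk·3<sup>n−k</sup>/4<sup>n</sup>* of children receiving *(n − k)* kinds of new genes. This ratio applies when two organisms, each carrying *n* new genes heterozygously, are mated under random partition of homologous chromosomes.

The function names are hypothetical. The restriction *1/2 < Si < 1* is taken from the text.

```python
from math import comb, prod


def pdn_over_qr(S_values):
    """Eq. (39) divided by Q_n r_n, i.e. 1 / (S_1 S_2 ... S_n)."""
    if not all(0.5 < s < 1 for s in S_values):
        raise ValueError("each reduction factor must satisfy 1/2 < S_i < 1")
    return 1.0 / prod(S_values)


def pdn_over_qr_common(S, n):
    """Eq. (40) divided by Q_n r_n, i.e. 1 / S**n for a common reduction factor S."""
    return pdn_over_qr([S] * n)


def child_ratios(n):
    """Ratios nCk * 3**(n - k) / 4**n of children receiving (n - k) new genes, for k = 0..n.

    Assumes both parents carry n new genes heterozygously and each homologous pair is
    partitioned at random, so each new gene reaches a child (in one or two copies)
    with probability 3/4, independently for each pair.
    """
    return [comb(n, k) * 3 ** (n - k) / 4 ** n for k in range(n + 1)]


for n in range(1, 6):
    print(n, pdn_over_qr_common(0.51, n), pdn_over_qr_common(0.99, n), child_ratios(n)[0])
```

For any *S* in *1/2 < S < 1*, `pdn_over_qr_common` lies between 1 and *2<sup>n</sup>*. This matches the bound *2<sup>n</sup>Qnrn > Pdn > Qnrn* stated in the text. The first entry of `child_ratios(n)` exceeds one half only for *n = 1* and *2*.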

These probabilities *Pdn*'s in Eq. (40) are plotted against the reduction factor *S* in Fig. 3 for several values of *n*. As noted already, the reduction factor *S* is restricted to the

Fig. 3. The probabilities of generating new genes from gene duplication and successive hybridization in diploid organisms. On the basis of Eq. (40), the values of *Pdn/Qnrn* are plotted against the twelve-fold reduction factor *12S* for *n = 1*, *2*, *3*, *4* and *5*. The value of *r1* is equal to one, and the curve of *Pd1 vs S* is consistent with the curve of *Pm1 vs s* in Fig. 1, but the range of reduction factor *S* is restricted to the range of *1/2 < S < 1*. For a larger value of *n*, however, the probability *Pdn* is still present in the range of *1/2 < S < 1*. Although *Pdn+1/Qn+1rn+1* is larger than *Pdn/Qnrn* in the figure, *Pdn+1* is smaller than *Pdn*. This is because *Qn+1rn+1* is smaller than *Qnrn*, as discussed in the text.

A Theoretical Scheme

(Eldredge & Gould, 1972).

to the diploid eukaryotes.

**6. Conclusions and discussion**

of the Large-Scale Evolution by Generating New Genes from Gene Duplication 19

The variants, which experienced gene duplication, first decline to be minor members in a population by the load of carrying extra gene(s), but some of them revives as a new style of organisms by the generation of new gene(s) from the counterpart of duplicated genes. After the new gene(s) appear, the new style organisms increase their fraction being further elaborated by Darwinian evolution. This course of the large-scale evolution is essentially the same in any type of organisms, and this is a necessary condition for the new style of organisms and the original style of organisms to be able to coexist utilizing different material and energy sources or to live in separate areas, showing a striking contrast to the survival of the fittest in Darwinian evolution. This evolutionary pattern also gives an explanation to the punctuated mode of evolution, which has been proposed from paleontology against the gradual accumulation of variants in Darwinian evolution

However, the detailed processes of this large-scale evolution are different depending on the types of genome constitution and transmission. The monoploid organism is suitable to generate one new gene step by step testing its biological function, but hardly generates many kinds of new genes simultaneously. The lower eukaryote, whose genome consists of the plural number of chromosomes, resolves this difficulty to produce a new style of organisms receiving many kinds of new genes by the conjugation of variants carrying different origins of new genes. The diploid organism can also produce a new character responsible for multiple kinds of new genes by the successive hybridization of different variants but its conservative property requires the succeeding process to establish the homozygote of these genes. This process becomes longer for a larger number of new genes to be established. During this long process, the further hybridization with other variants also occurs, occasionally yielding the explosive divergence of new characters depending on the combinatorial sets of new genes. This conclusion of the present study explains the recently revealed evolutionary patterns of prokaryotes and eukaryotes to a great extent, getting an insight into the problems how and why the monoploid eukaryotes have evolved

According to analyses of base-pair changes in ribosomal RNAs, the main lineages of present-day prokaryotes diverged $3.0 \times 10^{9}$ years ago, developing various chemical syntheses, O₂-releasing photosynthesis and O₂ respiration, respectively (Otsuka et al., 1999), after the earlier divergence of archaebacteria, eubacteria and eukaryotes (Sugaya & Otsuka, 2002). Several stages from simple electron transport pathways to O₂ respiration and O₂-releasing photosynthesis are still observed in present-day eubacteria, and the pathways have been elongated stepwise by gene duplication, as can be traced from the amino acid sequence similarities between their component proteins and the ubiquitous permeases (Otsuka, 2002; Otsuka & Kawai, 2006), although such similarity searches have not yet been carried out systematically for the chemical syntheses. However, the excellent abilities of O₂ respiration and O₂-releasing photosynthesis cannot be fully exhibited in the simple cell structure of prokaryotes (Otsuka, 2005), and the genome size of the eubacteria having these abilities is also limited to the order of $10^{6}$ bp, compactly encoding 3,000 to 4,000 genes like the other prokaryotes (Wheeler et al., 2004).

On the other hand, the eukaryotes have experienced many more evolutionary events before some of them established the diploid state. The ancestral eukaryote probably became a predator of eubacteria by developing the intracellular structure, endocytosis and exocytosis

range of $1/2 < S < 1$, and the probability $P_{dn}$ lies within the range $2^n Q_n r_n > P_{dn} > Q_n r_n$. If the homologous chromosomes are randomly partitioned into the children regardless of whether they carry a new gene, $r_n$ takes the value $(1/2)^n$ and the above relation for $P_{dn}$becomes $Q_n > P_{dn} > Q_n/2^n$. Moreover, $Q_n$ becomes smaller for larger $n$, so the probability $P_{dn}$ decreases as the number of new genes assembled by hybridization increases. A lower probability means a longer time, or more generations, before a new-style organism carrying more kinds of new genes appears. Thus, the diploid organism has a chance to acquire many kinds of new genes by hybridization, but it takes a longer time to realize this chance.
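The bounds on $P_{dn}$ can be tabulated with a short Python sketch. It assumes random partitioning ($r_n = (1/2)^n$) unless another $r_n$ is passed, and it holds $Q_n$ at a hypothetical value of $1/100$ for every $n$ purely for illustration, although in the model $Q_n$ itself falls as $n$ grows. The helper names `pdn_bounds` and `expected_generations` are hypothetical, and converting a probability into a waiting time of $1/P$ generations rests on the added assumption that each generation is an independent trial.

```python
from fractions import Fraction
from typing import Optional


def pdn_bounds(q_n: Fraction, n: int, r_n: Optional[Fraction] = None) -> tuple[Fraction, Fraction]:
    """Return the interval (Q_n * r_n, 2**n * Q_n * r_n) that brackets P_dn.

    With random partitioning of homologous chromosomes, r_n = (1/2)**n,
    so the interval reduces to (Q_n / 2**n, Q_n).
    """
    if r_n is None:
        r_n = Fraction(1, 2) ** n  # random partitioning assumed
    return q_n * r_n, 2 ** n * q_n * r_n


def expected_generations(p: Fraction) -> Fraction:
    """Mean waiting time 1/p, ASSUMING each generation is an independent trial
    that yields the new-style organism with probability p (geometric distribution)."""
    return 1 / p


# Hypothetical Q_n = 1/100, held fixed for all n purely for illustration.
q = Fraction(1, 100)
for n in range(1, 6):
    lower, upper = pdn_bounds(q, n)
    print(f"n={n}: {lower} < P_dn < {upper}; "
          f"about {expected_generations(upper)} to {expected_generations(lower)} generations")
```

Under these assumptions the upper bound stays at $Q_n$ while the lower bound halves with each additional gene, so the pessimistic waiting time doubles per gene.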

Moreover, the process of establishing the homozygote continues after the new-style organism carrying $n$ kinds of new genes heterozygously has been generated with probability $P_{dn}$. Although it is laborious to follow this process completely, its essence can be elucidated by examining the ratio of children that receive these new genes homozygously or heterozygously from a mating between two organisms, each carrying $n$ kinds of new genes heterozygously. If the chromosomes in each homologous pair are randomly partitioned into the children regardless of whether they carry a new gene, the ratio of children receiving $(n-k)$ kinds of new genes is calculated to be $\binom{n}{k} 3^{n-k}/4^n$ with the normalization factor $4^n$, where $k$ ranges from zero to $n$. This indicates that more than half of the children receive all the new genes $(k = 0)$ for $n = 1$ and $2$ (ratios $3/4$ and $9/16$). If one or two new genes express an excellent character, therefore, the descendants increase their fraction monotonically as a new style of organism. However, the ratio of children receiving the full set of new genes becomes smaller for larger $n$. In the case of $n = 5$, for example, the ratio of children receiving all five kinds of new genes $(k = 0)$ decreases to $(3/4)^5$ (about 24%), while five other types of children each appear with the ratio $(3/4)^4/4$, receiving four kinds of new genes $(k = 1)$ in different ways. When a biologically meaningful character is expressed by five kinds of new genes, therefore, only $(3/4)^5$ of the children succeed in expressing this character, while the other five types of children are reserved as carriers of 'hidden genes' for producing other characters through further hybridization with other types of variants. Such divergence of characters becomes more pronounced when a larger number of new genes is required to express a character.
This divergent property of the process of establishing many kinds of new genes as the homozygote explains the explosive divergence of body plans that has occasionally occurred in diploid organisms, because cell differentiation is a representative character expressed by many kinds of genes and its hierarchical evolution constructs body plans, as will be discussed in the next section. Until the new-style organisms are established as the homozygote, mating between heterozygous variants also regenerates the original style of organism (a fraction $(1/4)^n$ of the children, corresponding to $k = n$). The phenomenon called "reversion" or "atavism" in classical biology may be a vestige of this evolutionary process of establishing the homozygote.
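The offspring ratios $\binom{n}{k} 3^{n-k}/4^n$ can be checked with a minimal Python sketch, assuming, as in the text, that both parents carry each of the $n$ new genes heterozygously on $n$ different homologous pairs and that each parent passes one chromosome of every pair at random. The function names are hypothetical, and the Monte Carlo routine is only an independent check of the closed form, not part of the original scheme.

```python
import random
from fractions import Fraction
from math import comb


def offspring_ratios(n: int) -> list[Fraction]:
    """Exact ratio of children receiving (n - k) kinds of new genes, for k = 0..n.

    A child lacks a given new gene only if neither parent passes the chromosome
    carrying it (probability 1/4), so ratio(k) = C(n, k) * 3**(n - k) / 4**n.
    """
    return [Fraction(comb(n, k) * 3 ** (n - k), 4 ** n) for k in range(n + 1)]


def simulate_ratios(n: int, trials: int = 50_000, seed: int = 0) -> list[float]:
    """Monte Carlo estimate of the same ratios under random partitioning."""
    rng = random.Random(seed)
    counts = [0] * (n + 1)
    for _ in range(trials):
        missing = 0  # k: number of new genes the child fails to receive
        for _ in range(n):
            from_mother = rng.random() < 0.5  # True: passes the new-gene chromosome
            from_father = rng.random() < 0.5
            if not (from_mother or from_father):
                missing += 1  # child carries only the old gene at this locus
        counts[missing] += 1
    return [c / trials for c in counts]


for n in (1, 2, 3, 5):
    full_set = offspring_ratios(n)[0]
    print(f"n={n}: all new genes in {full_set} = {float(full_set):.3f} of the children")
```

For $n = 5$, `offspring_ratios(5)` gives $243/1024 = (3/4)^5$ for $k = 0$ and $405/1024$ for $k = 1$, i.e. $81/1024 = (3/4)^4/4$ for each of the five ways of missing one gene; the last entry, $1/4^n$, is the fraction of children that revert to the original style.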

If the influence of transposons is explicitly considered, the above process becomes more complicated, because duplicated genes may be transferred separately to different kinds of chromosomes. When duplicated or new genes of various origins are concentrated on one chromosome, however, descendants that receive such a chromosome may become extinct due to the incompatibility of this chromosome with its partner chromosome, which carries no new gene. Thus, many kinds of new genes for expressing a new character may be scattered over different kinds of chromosomes in the survivors, just as in the result of the present model scheme.
