**4. Persistence criterion for nuclear susceptibility genes for schizophrenia**

As shown in the previous section, putative pathogenic genes, if located in the mtDNA, are sustained by mutation-selection balance with heterozygote advantage. If they are located in the ncDNA, on the other hand, they should be sustained by mutation-selection balance without heterozygote advantage. In this section, we introduce our previous work (Doi et al., 2009), in which we carefully re-examined the necessary conditions for putative nuclear susceptibility genes for schizophrenia and deduced a criterion (the persistence criterion, or 'P-criterion') that every nuclear susceptibility gene should fulfill for the disease to persist. We then present its applications to association studies of schizophrenia.

#### **4.1 Three basic assumptions**

116 Epidemiology Insights


We first describe our three basic assumptions.

#### **4.1.1 An ideal human population**

We assume here a random-mating human population with a sufficiently large effective population size at equilibrium, where negative selection pressures on the susceptibility alleles for schizophrenia are predominant and the effect of genetic drift is negligibly small. The prevalence *p* of schizophrenia in this ideal human population is assumed to be stable across generations by mutation-selection balance. Therefore, the gene frequency in the general population (*mG*) is given in terms of the gene frequencies in the affected population (*mA*) and in the unaffected population (*mU*):

$$m_G = p m_A + (1 - p) m_U, \text{ or } m_A - m_G = (1 - p)d, \quad d \equiv m_A - m_U. \tag{1}$$

#### **4.1.2 Mutation-selection balance in each risk locus**

We assume here that the total of the population frequencies of the pathogenic alleles at *each risk locus* is preserved by mutation-selection balance. Therefore, $-\Delta m_G$, the cross-generational reduction of the frequency of a pathogenic allele, should not be more than the rate of mutations that produce pathogenic variants at the locus. On the other hand, since mutations at the locus occur in both directions, producing either pathogenic or non-pathogenic variants, the mutation rate at the locus ($\mu$) should be greater than the rate of mutations that produce pathogenic variants at the locus.

Thus we have:

$$
\mu > -\Delta m_G. \tag{2}
$$

#### **4.1.3 Multifactorial threshold model**

We assume the multifactorial threshold model, in which quantitative traits such as liability to the disease are determined by multiple genetic and non-genetic factors including a stochastic and/or an epigenetic effect. Under this assumption, the relative fitness as a quantitative trait in the affected population is determined by multiple factors and approximately follows a gamma distribution with a mean of $1 - s$. ($s$ is the selection coefficient of schizophrenia; the mean relative fitness in the normal population is 1.)

Impact of Epidemiology on Molecular Genetics of Schizophrenia 119

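Fig. 2. Distribution curve of the reproductive fitness in the affected population (horizontal axis: relative fitness, with $1 - s_M$ and $1 - s$ marked; vertical axis: probability density; curves: the affected population and the affected subpopulation with a non-protective allele M)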

The distribution curve of the reproductive fitness in the affected subpopulation with a schizophrenia-associated allele M never shifts to the right unless M has a strong protective effect (i.e. an effect of elevating the carrier's reproductive fitness by reducing severity of and liability to the disease). Therefore, we can assume that $s_M$, the selection coefficient in the affected subpopulation with a schizophrenia-associated allele M, is not smaller than $s$ ($s \le s_M \le 1$) for a susceptibility allele (**Fig. 2**). The inequality $s > s_M$ implies that M is a resistance gene that reduces severity and risk of the disease.

No other special assumptions are required regarding the allelic structure at each locus, the penetrance of each susceptibility gene, or possible interactions among the loci.

#### **4.2 Deduction of the P-criterion**

Now we proceed to deduce the P-criterion. From the assumptions, $m'_G$, the population frequency of the schizophrenia-associated allele M in the next generation, is given by:

$$m'_G = \frac{p \cdot m_A \cdot (1 - s_M) + (1 - p) \cdot m_U \cdot 1}{p \cdot (1 - s_M) + (1 - p) \cdot 1} = \frac{m_G - s_M p m_A}{1 - s_M p}.$$

Therefore the reduction of the population frequency of the schizophrenia-associated allele M per generation is:

$$-\Delta m_G = m_G - m'_G = \frac{s_M p (m_A - m_G)}{1 - s_M p} = p(1 - p)d \cdot \frac{s_M}{1 - s_M p}. \tag{3}$$
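As a quick numerical check of equation (3), the following Python sketch computes the next-generation frequency directly from the selection step and compares the resulting reduction with the closed form. The function names and all input values are illustrative assumptions, not data from the original study.

```python
def next_generation_frequency(p, m_a, m_u, s_m):
    """Frequency of allele M after one round of selection (no new mutations)."""
    numerator = p * m_a * (1 - s_m) + (1 - p) * m_u
    denominator = p * (1 - s_m) + (1 - p)
    return numerator / denominator


def frequency_reduction(p, d, s_m):
    """Closed form of equation (3): the per-generation reduction -Delta m_G."""
    return p * (1 - p) * d * s_m / (1 - s_m * p)


# Hypothetical inputs chosen only for illustration.
p, m_u, d, s_m = 0.0129, 0.10, 0.001, 0.654
m_a = m_u + d
m_g = p * m_a + (1 - p) * m_u  # equation (1)

direct = m_g - next_generation_frequency(p, m_a, m_u, s_m)
closed = frequency_reduction(p, d, s_m)
print(direct, closed)
```

The two printed values should agree up to floating-point rounding, which is what equation (3) asserts.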

From (2) and (3) we have:


$$\mu > p(1-p)d \cdot \frac{s_M}{1 - s_M p}. \tag{4}$$

Since $\frac{s_M}{1 - s_M p}$ is monotonically increasing in $s_M$ ($0 \le s_M \le 1$) and $s \le s_M \le 1$ holds for the susceptibility allele M, we have:

$$\mu > p(1-p)d \cdot \frac{s}{1-sp}, \text{ or } \frac{(1-sp)\mu}{(1-p)sp} > d. \tag{5}$$
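The monotonicity used in this step can be checked directly. A short derivation, using only the symbols already defined:

$$\frac{\partial}{\partial s_M}\left(\frac{s_M}{1 - s_M p}\right) = \frac{(1 - s_M p) + s_M p}{(1 - s_M p)^2} = \frac{1}{(1 - s_M p)^2} > 0 \quad (0 \le s_M \le 1,\ 0 < p < 1),$$

so replacing $s_M$ by the smaller value $s$ can only lower the right-hand side of (4), and the inequality is preserved.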

On the other hand, the principle of association studies demands: $d > 0$.

Thus we have the criterion for a susceptibility gene:

$$0 < d < \nu, \quad \text{where } \nu \text{ is defined as} \quad \nu = \frac{(1 - sp)\mu}{(1 - p)sp}. \tag{6}$$

From the observation (5), we can see that $d \ge \nu$ implies $s_M < s$ for any schizophrenia-associated variant M which is sustained by mutation-selection balance.

#### **4.3 Parameter estimate for schizophrenia**

Mutation rates on autosomes and the X chromosome almost always fall within the range between $10^{-6}$ and $10^{-4}$ per locus per generation (usually $< 10^{-5}$; one generation = 20 years) (Nachman & Crowell, 2000) and can be approximated by a linear function of the parental age, at least under 30 years for maternal age and under 40 years for paternal age (Risch et al., 1987). Large-sample cohort studies in Israel, Sweden and Denmark show that the mean age of parents in the general population is ~28 years for mothers and ~31 years for fathers; the mean age of both parents is < 29.6 years (Malaspina et al., 2001; El-Saadi et al., 2004). Therefore we assume here:

$$10^{-6} < \mu < \frac{29.6}{20} \times 10^{-4} = 1.48 \times 10^{-4}.$$

According to the epidemiological data of Haukka et al. (2003), the estimated values of $p$ and $s$ are $p = 1.29 \times 10^{-2}$ and $s = 6.54 \times 10^{-1}$. Therefore, we have $\nu = 1.76 \times 10^{-3}$ for the average mutation rate ($\mu = 1.48 \times 10^{-5}$), $\nu = 1.76 \times 10^{-2}$ for the highest mutation rate ($\mu = 1.48 \times 10^{-4}$), and $\nu = 1.76 \times 10^{-4}$ for a relatively low mutation rate ($\mu = 1.48 \times 10^{-6}$).
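The three values of the bound follow from the definition in (6). A minimal Python sketch, assuming the estimates $p = 1.29 \times 10^{-2}$ and $s = 6.54 \times 10^{-1}$ quoted here; the function name `persistence_bound` is a hypothetical helper introduced only for this illustration:

```python
def persistence_bound(mu, p, s):
    """Upper bound nu on d from criterion (6): (1 - s*p) * mu / ((1 - p) * s * p)."""
    return (1 - s * p) * mu / ((1 - p) * s * p)


P = 1.29e-2  # prevalence estimate (Haukka et al., 2003, as quoted in the text)
S = 6.54e-1  # selection coefficient estimate (same source)

scenarios = [
    ("relatively low", 1.48e-6),
    ("average", 1.48e-5),
    ("highest", 1.48e-4),
]
for label, mu in scenarios:
    nu = persistence_bound(mu, P, S)
    print(f"{label:>14} mutation rate: mu = {mu:.2e}, nu = {nu:.2e}")
```

Because the bound is linear in the mutation rate, a tenfold change in the mutation rate shifts it by the same factor.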

#### **4.4 Implications for association studies of schizophrenia**

We present here some applications of the P-criterion to association studies of schizophrenia. The results suggest that the common disease/common variant hypothesis is unlikely to fit schizophrenia and that an enormous sample size is required to detect a nuclear susceptibility gene for schizophrenia.

| Allele frequency in the unaffected population, $m_U$ | $\mu = 1.48 \times 10^{-6}$ | $\mu = 1.48 \times 10^{-5}$ | $\mu = 1.48 \times 10^{-4}$ |
|---|---|---|---|
| 0.01 | < 1.02 | < 1.18 | < 2.81 |
| 0.02 | < 1.009 | < 1.09 | < 1.92 |
| 0.05 | < 1.004 | < 1.04 | < 1.38 |
| 0.1 | < 1.002 | < 1.02 | < 1.20 |
| 0.3 | < 1.0009 | < 1.009 | < 1.09 |
| 0.5 | < 1.0008 | < 1.008 | < 1.08 |
| 0.7 | < 1.0009 | < 1.009 | < 1.09 |
| 0.9 | < 1.002 | < 1.02 | < 1.24 |
| 0.95 | < 1.004 | < 1.04 | < 1.58 |
| 0.98 | < 1.009 | < 1.10 | < 8.49 |

Table 1. Upper bounds of odds ratio for given allele frequencies in the unaffected population

#### **4.4.1 Calculation of an upper bound of the effect size of a putative susceptibility gene of a given frequency**

Using the P-criterion, we can calculate an upper bound of the effect size of a putative susceptibility gene of a given frequency.

The effect size of a susceptibility gene M is expressed by the odds ratio, defined as

$$OR = \frac{m_A(1 - m_U)}{(1 - m_A)m_U},$$

which is monotonically increasing in $m_A$ and monotonically decreasing in $m_U$. Since the criterion demands $m_U < m_A < m_U + \nu$, we have

$$\frac{m_U(1-m_U)}{(1-m_U)m_U} < OR < \frac{(m_U+\nu)(1-m_U)}{(1-m_U-\nu)m_U}, \text{ or } 1 < OR < 1 + \frac{\nu}{m_U(1-\nu-m_U)} \tag{7}$$

for $0 < m_U < 1 - \nu$.

And since the criterion demands $m_A - \nu < m_U < m_A$, we have

$$1 < OR < \frac{m_A (1 - m_A + \nu)}{(1 - m_A)(m_A - \nu)} = 1 + \frac{\nu}{(1 - m_A)(m_A - \nu)} \tag{8}$$

for $1 - \nu \le m_U < 1$.

Thus, we have an upper bound of the effect size for a given frequency.

From the above we can easily see that the common disease/common variant hypothesis, which proposes that common alleles at a handful of loci interact to cause a common disease, is unlikely to fit schizophrenia. No common allele with a population frequency between 0.05 and 0.95 can have a large effect on schizophrenia: the odds ratio of every common risk allele is less than 1.04 for the average mutation rate, less than 1.58 for the highest mutation rate, and less than 1.004 for a relatively low mutation rate (**Table 1**).
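The upper bound in (7) is straightforward to evaluate numerically. A minimal Python sketch, assuming the parameter estimates of section 4.3 as defaults; both function names are hypothetical helpers, and only the form (7) is implemented:

```python
def persistence_bound(mu, p=1.29e-2, s=6.54e-1):
    """nu from criterion (6): (1 - s*p) * mu / ((1 - p) * s * p)."""
    return (1 - s * p) * mu / ((1 - p) * s * p)


def or_upper_bound(m_u, nu):
    """Upper bound of the odds ratio from inequality (7)."""
    if not 0 < m_u < 1 - nu:
        raise ValueError("inequality (7) requires 0 < m_u < 1 - nu")
    return 1 + nu / (m_u * (1 - nu - m_u))


mutation_rates = (1.48e-6, 1.48e-5, 1.48e-4)  # low, average, highest
for m_u in (0.05, 0.5, 0.95):
    bounds = [or_upper_bound(m_u, persistence_bound(mu)) for mu in mutation_rates]
    print(m_u, [round(b, 4) for b in bounds])
```

The computed bounds should stay at or below the corresponding entries of Table 1, which appear to be rounded upward.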

#### **4.4.2 Calculation of range of the frequency of a putative susceptibility gene of a given effect size**

By solving the inequality (7) or (8), we can estimate the range of gene frequency for a given effect size. Thus, we can see that susceptibility genes of the average mutation rate and a moderate effect that meet the criterion are limited to 'very rare variants' or 'very common variants'. For example, suppose $\mu = 1.48 \times 10^{-5}$ and $OR = 5.0$; then we have $\nu = 1.76 \times 10^{-3}$ and

$$4 < \frac{\nu}{m_U(1 - \nu - m_U)}.$$

Solving this inequality, we get either $0 < m_U < 0.00044$ (that is, $0 < m_A < m_U + 0.00176 < 0.0022$) or $m_U > 0.9977$.
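The inequality is quadratic in $m_U$, so the admissible range can be obtained in closed form. A small Python sketch under that reading of (7); the function name and its return convention are assumptions made for this illustration:

```python
import math


def frequency_range(odds_ratio, nu):
    """Solve OR - 1 < nu / (m_U * (1 - nu - m_U)) for m_U.

    Returns (low, high): within 0 < m_U < 1 - nu, the inequality holds
    for m_U < low or m_U > high. Returns None if it holds on the whole interval.
    """
    if odds_ratio <= 1:
        raise ValueError("odds_ratio must exceed 1")
    k = nu / (odds_ratio - 1)  # m_U * (1 - nu - m_U) must stay below k
    b = 1 - nu
    discriminant = b * b - 4 * k
    if discriminant <= 0:
        return None
    root = math.sqrt(discriminant)
    return (b - root) / 2, (b + root) / 2


low, high = frequency_range(odds_ratio=5.0, nu=1.76e-3)
print(f"m_U < {low:.5f} or m_U > {high:.4f}")
```

For the example in the text, the lower root is about 0.00044 and the upper root is close to 0.9978, in line with the thresholds quoted above.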



#### **4.4.3 Calculation of the required sample size and the power of an association study**

Using the P-criterion we can calculate a lower bound of sample size required in an association study of a given power as well as an upper bound of the power of an association study of a given sample size.

Concerning the required sample size $2N$ ($N$ case-control pairs) and the power $1 - \beta$ of an association study, we have the well-established formulae (Ohashi & Tokunaga, 2002):

$$N \approx \frac{1}{2} \left( \frac{z^*_\alpha \sqrt{2x(1-x)} + z_\beta \gamma}{d} \right)^{2},$$

and


$$1 - \beta \approx \Phi\left(\frac{\sqrt{2N}d - z^*_\alpha\sqrt{2x(1-x)}}{\gamma}\right).$$

Here, $\Phi$, $z^*_\alpha$, and $z_\beta$ denote the cumulative distribution function of the standard normal curve, the two-sided $\alpha$ point ($\alpha$: significance level), and the upper $\beta$ point of the standard normal curve, respectively; $x$ (population frequency of the allele) and $\gamma^2$ are defined as follows.

$$x \equiv \frac{1}{2}(m_A + m_U), \qquad \gamma^2 \equiv m_A(1 - m_A) + m_U(1 - m_U) = 2x(1 - x) - \frac{1}{2}d^2.$$

For the average mutation rate $\mu = 1.48 \times 10^{-5}$, we have $\nu = 1.76 \times 10^{-3}$. Suppose $0.0005 < x < 0.9995$; then we have $2x(1-x) > 0.9995 \times 10^{-3}$. From the P-criterion, we have:

$$\frac{1}{2}d^2 < \frac{1}{2}\nu^2 < 1.6 \times 10^{-6} < 2x(1-x) \times 0.002.$$

Therefore, we have the following approximation with an error smaller than 0.2%:


$$
\gamma^2 = 2x(1-x) - \frac{1}{2}d^2 \approx 2x(1-x), \text{ or } \gamma \approx \sqrt{2x(1-x)}.
$$

Thus, we have:

$$N \approx \frac{1}{2} \left( \frac{z^*_\alpha \sqrt{2x(1-x)} + z_\beta \gamma}{d} \right)^2 \approx \left( \frac{z^*_\alpha + z_\beta}{d} \right)^2 x(1-x) > \left( \frac{z^*_\alpha + z_\beta}{\nu} \right)^2 x(1-x)$$

and

$$1 - \beta \approx \Phi \left( \frac{\sqrt{2N}d - z^*_\alpha \sqrt{2x(1-x)}}{\gamma} \right) \approx \Phi \left( \sqrt{\frac{N}{x(1-x)}}\,d - z^*_\alpha \right) < \Phi \left( \sqrt{\frac{N}{x(1-x)}}\,\nu - z^*_\alpha \right).$$

Let us calculate the required sample size in a genome-wide association study ($\alpha = 2.5 \times 10^{-7}$, $1 - \beta = 0.95$). Since we have $z^*_{0.00000025} + z_{0.05} = 6.79$,

$$N > \left(\frac{z^*_\alpha + z_\beta}{\nu}\right)^2 x(1 - x) = \left(\frac{6.79}{1.76 \times 10^{-3}}\right)^2 x(1 - x) = 3.72 \times 10^6$$

for $x = 0.5$. Therefore, more than 3.7 million case-control pairs are required in a genome-wide association study with a power of 0.95 to detect a susceptibility variant of the average mutation rate and a population frequency between 0.0005 and 0.9995.

Similarly, we can see that more than 37,000 case-control pairs are required in a genome-wide association study with a power of 0.95 to detect a susceptibility variant of the highest mutation rate ($\mu = 1.48 \times 10^{-4}$) and a population frequency between 0.005 and 0.995.

Finally, let us consider the case with a relatively low mutation rate $\mu = 1.48 \times 10^{-6}$, which corresponds to $\nu = 1.76 \times 10^{-4}$. In this case, more than 370 million case-control pairs are required in a genome-wide association study with a power of 0.95 to detect a susceptibility variant of a population frequency between 0.000005 and 0.999995. Therefore it would take more than several hundred years to gather the required number of samples even if all of the affected individuals in the world were to be recruited to the study.
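The sample-size and power bounds can be reproduced with the Python standard library. The sketch below assumes a two-sided significance level and $x = 0.5$, as in the calculation above; the function names are hypothetical. The normal quantiles are computed with `statistics.NormalDist` rather than taken from tables, so the sum of the two critical values comes out near 6.8 rather than exactly the 6.79 used in the text.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def min_pairs(nu, alpha=2.5e-7, power=0.95, x=0.5):
    """Lower bound on N (case-control pairs) implied by d < nu."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)  # two-sided alpha point
    z_beta = STD_NORMAL.inv_cdf(power)           # upper beta point, beta = 1 - power
    return ((z_alpha + z_beta) / nu) ** 2 * x * (1 - x)


def max_power(n_pairs, nu, alpha=2.5e-7, x=0.5):
    """Upper bound on the power of a study with n_pairs case-control pairs."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(math.sqrt(n_pairs / (x * (1 - x))) * nu - z_alpha)


for nu in (1.76e-2, 1.76e-3, 1.76e-4):
    print(f"nu = {nu:.2e}: N > {min_pairs(nu):.3g}")
```

Because the bound scales with $1/\nu^2$, each tenfold decrease in the mutation rate raises the required number of pairs a hundredfold, which matches the progression from tens of thousands to millions to hundreds of millions described above.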

#### **5. Mitochondrial DNA (mtDNA) hypothesis of schizophrenia**

In this final section, we discuss the nature of those schizophrenia-associated genes that do not meet the P-criterion, suggesting that these genes should be resistance genes that reduce the morbid risk and severity of the disease. We show that the results of association studies to date are compatible with the mitochondrial genome model but not with the nuclear genome model, and propose a new hypothesis which assumes that the risk loci are in the mtDNA. We present eight major predictions of this hypothesis and argue that these predictions seem to accord with the other epidemiological findings and the results of the genetic and the pathophysiological studies to date.

#### **5.1 Nature of schizophrenia-associated genes that do not meet the P-criterion**

Now, let us consider the nature of those schizophrenia-associated genes that do not meet the persistence criterion. The inequality $d \ge \nu$ implies $s > s_M$, where $s_M$ and $s$ denote the selection coefficient in the affected subpopulation with an allele M and in the affected population, respectively. Therefore, such genes, if sustained by mutation-selection balance, cannot be susceptibility genes but must be resistance genes that reduce severity and risk of the disease (see 4.2). If they were not resistance genes, their frequencies in the affected population would have been reduced to the same level as in the unaffected population.
