**2. Methods**


For clinic-based designs, on the other hand, families are ascertained on the basis of having multiple affected members in addition to the affected proband. Pedigrees with many cases are highly informative because they are more likely to carry the disease gene mutation, but they typically have not been ascertained in any population-based manner. Such families are often identified through high-risk disease clinics and provide substantial information for estimating disease risk (for example, Kopciuk et al., 2009). Multistage designs (Whittemore & Halpern, 1997; Siegmund et al., 1999) provide an alternative way to recruit high-risk families efficiently, often through disease family registries, by sampling families from more informative groups over several stages. Studies based on these high-risk families can be effective for characterizing the prevalence and penetrance of mutated genes, but it is well known that, without proper ascertainment correction, statistical inference yields biased estimates of population attributes such as allele frequency, disease risks, and penetrance of the mutated genes.

To allow population-based inference about the disease risks associated with mutated genes, family data can be analyzed using various likelihood-based methods (Thomas, 2004). In particular, ascertainment-corrected likelihood approaches have been developed by several authors (for example, Choi et al., 2008; Carayol & Bonaïti-Pellié, 2004; Kraft & Thomas, 2000; Le Bihan et al., 1995). Taking a survival analysis approach, Le Bihan et al. (1995) formulated a prospective likelihood that models the phenotype, namely age at onset and disease status, given genotype, and corrected the likelihood by the probability of a family being ascertained for the study. This approach is natural because it models phenotype as a function of genotype and covariates, but the ascertainment scheme must be clearly known and simple enough for a proper correction to be made. The retrospective likelihood, on the other hand, models genotypes conditional on the phenotypes of all family members (Carayol & Bonaïti-Pellié, 2004; Kraft & Thomas, 2000; Schaid et al., 2010). Although this approach provides the most robust way to obtain consistent estimates of relative risk, even when the ascertainment scheme is imprecisely defined or complex, it carries the computational burden of summing over the possible genotypes of all family members and loses efficiency because of the conditioning. Choi et al. (2008) adapted the retrospective likelihood to condition only on the phenotypes of the individuals involved in the ascertainment criteria: for families sampled under population-based designs, only the probands were used to correct for ascertainment, whereas for families from clinic-based designs, the probands together with their parents and siblings were used. Moreover, Schaid et al. (2010) used a composite likelihood approach to construct the retrospective likelihood from all possible pairs of individuals within families, thereby reducing the computational burden.
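As a schematic illustration of the distinction between the two formulations, consider the following sketch. The notation is illustrative and is not taken from the cited papers: $Y_f$ denotes the phenotypes (age at onset and disease status) of the members of family $f$, $G_f$ their genotypes, $A_f$ the event that family $f$ is ascertained, and $\theta$ the risk parameters. Assuming that ascertainment depends on the data only through the phenotypes, the ascertainment-corrected prospective likelihood and the retrospective likelihood can be written roughly as

$$
L_{\mathrm{pro}}(\theta) \;=\; \prod_{f} \frac{P(Y_f \mid G_f;\, \theta)}{P(A_f \mid G_f;\, \theta)},
\qquad
L_{\mathrm{retro}}(\theta) \;=\; \prod_{f} P(G_f \mid Y_f;\, \theta)
\;=\; \prod_{f} \frac{P(Y_f \mid G_f;\, \theta)\, P(G_f)}{\sum_{g} P(Y_f \mid g;\, \theta)\, P(g)} .
$$

Here $P(g)$ stands for the joint probability of a genotype configuration $g$ for the whole family, as implied by the allele frequency and Mendelian transmission. In the prospective form, the denominator requires an explicit model of the ascertainment scheme. In the retrospective form, under the stated assumption, any ascertainment probability that depends only on $Y_f$ appears in both numerator and denominator and cancels, which is why no explicit ascertainment model is needed; the sum over all configurations $g$ is where the computational cost arises.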

The main objectives of this article are, first, to examine the effects of misspecifying the study design, that is, when the appropriate design is ignored or incorrectly specified in the analysis; second, to provide simple, easy-to-apply adjustment schemes for estimating disease risks by combining family data from different study designs; and third, to develop an Expectation-Maximization algorithm to infer missing genotypes when estimating disease risks. We start by describing ascertainment-corrected likelihood methods that take the study design into account, and we propose a likelihood-based approach to estimating disease risks from combined family data collected under different study designs. The performance of these ascertainment-corrected likelihood methods is evaluated in terms of bias and efficiency. The effect of design misspecification is examined for estimating the disease risks associated with mutated genes. The bias and efficiency involved in estimating two disease risks are

