optimises its exploitation of and subsequent transmission from the human host. The parasite must persist and transmit to mosquitoes in the face of very different immune environments. The differential impact of human genetics according to the clinical outcome of infection will throw light on how the parasite manages its strategy for survival and reproduction (transmission).

Focussing on infection enables implementation of a family-based study design that controls for population sub-structure and admixture. Such an approach would be impractical for the study of severe disease because of its relative infrequency. Longitudinal family-based studies enable a more detailed real-time analysis of the human response to infection. Thus, as well as controlling for population sub-structure, they can (i) reveal how the same individual responds at different times in their life and thus generate insight into the acquisition of clinical and anti-parasite immunity; and (ii) enable incorporation of parasite genetics both with respect to the long-term co-evolutionary trajectory of the host-parasite duo and the short-term impact of intervention (Loucoubar et al., 2011a).

Evidence for a contribution of host genetic factors to mild clinical malaria and biological phenotypes, such as number of clinical episodes, parasite density and immune responses to *P. falciparum* antigens, has progressed with the development of increasingly sophisticated techniques. Population level differences in susceptibility to malaria have been observed between sympatric ethnic groups (Modiano et al., 1996) and, at a finer scale, differential phenotypic expression was observed in monozygotic and dizygotic twins; there was greater phenotypic similarity in monozygotic twins, strongly suggesting genetic control as such twins are genetically more similar than dizygotic twins (Jepson et al., 1995). Segregation studies that assess the extent of phenotypic similarity in families demonstrated cosegregation of parasite density and of prevalence of mild malaria in families (Rihet et al., 1998a). Microsatellite typing of family-based cohorts enabled this segregation to be narrowed down to chromosomal regions (Flori et al., 2003; Garcia et al., 1998; Rihet et al., 1998b; Sakuntabhai et al., 2008; Timmann et al., 2007). Candidate gene approaches have also shown association of specific genetic polymorphisms with mild clinical malaria (Kun et al., 2001; Walley et al., 2004; Williams et al., 1996). Emphasis has understandably been placed on clinical malaria; very few studies have considered asymptomatic malaria, and then only to a limited extent (Flori et al., 2003; Garcia et al., 1998; Mombo et al., 2003; Rihet et al., 1998a, 1998b; Timmann et al., 2007).

This chapter presents a summary of the achievements in the field of genetic analysis to date, the benefits of examining biological parasite phenotypes and the practical aspects of sampling and analysis. Firstly, we discuss in some detail issues concerning phenotype choice, the pros and cons of case-control vs. family-based methods, the importance of context-dependency and of taking into account covariates. We then expand upon the utility of heritability analyses and describe the novel methods that should be a requisite for performing a genetic analysis of quantitative malaria parasite phenotypes. We then discuss our own findings using heritability and genome-wide analyses that have led us to propose a novel hypothesis concerning the role of allergy in malaria. Finally, we outline the future direction that genetic studies should take, most notably concerning the need to develop tools to examine gene-gene and gene-environment interactions.

**2. Malaria parasite course and outcome of infection**

### **2.1 Quantitative malaria-related phenotypes**

The life cycles of the malaria parasite species will undoubtedly be known to readers or covered in associated chapters. Here, we place the life cycle within a perspective useful for human genetic studies. Although differing in the details, the different *Plasmodium* spp. broadly share three distinct life cycle parts within the human host: invasion and asexual proliferation within the liver; invasion and asexual proliferation within red blood cells; and the production of sexual stages, gametocytes, from a proportion of these asexually proliferating haploid parasites within the red blood cells. These gametocytes are essential for successful transmission to mosquitoes and subsequent infection of new human hosts. The parasite, as with any other sexually reproducing eukaryote, will, to the best of its capacity, have evolved to optimally exploit its host and maximise its reproductive rate through infection of new hosts. In turn, the human is expected to have evolved to minimise the damage caused by the parasite. The course of infection and the outcome of the human-parasite interaction are thus quantifiable by measurement of the density of asexual and sexual circulating parasites and the frequency of clinical episodes.

Placing the in-host biology of the parasite within the context of the clinical outcome of the infection is central to unravelling how human genetics impacts upon the pathophysiology of malaria. The clinical outcome of infection ranges from severe through mild disease to asymptomatic infection. Less well documented is the progression of clinical expression during the course of a single infection. Early treatment, thanks to considerable public health efforts, has now reduced the burden of disease. As a consequence, in study cohorts we do not know whether an individual with mild malaria would, if left untreated, have progressed to severe disease and/or would eventually have controlled but not eliminated the infection, leading to a chronic long-term asymptomatic infection. Our focus is therefore on parasite biological phenotypes in the context of symptomatic or asymptomatic infection outcomes, with no division into mild *vs*. severe disease. We ask why there is variation in the density of asexual parasite stages that individuals can withstand before becoming symptomatic and why, once symptomatic, only some infections attain very high densities. Transmission is a crucial part of the life cycle for the parasite and there is good evidence that the parasite has evolved to optimise its transmission to mosquitoes with respect to the host response to infection (Paul et al., 2003). We thus examine the human factors that influence gametocyte production and whether they differ in symptomatic and asymptomatic infections. Some biological phenotypes, most notably those pertaining to the exo-erythrocytic stages, are beyond our current ability to measure in sufficient detail but do warrant increased research effort. Preventing liver stage infection and eliminating latent hypnozoites in relapsing species are clear targets for reducing the prevalence of infection.

Major differences in certain life-cycle aspects do exist amongst the *Plasmodium* spp. infecting man and surprisingly little is known about the biology of, or the acquired immune response to, species other than *P. falciparum*. The major apparent differences include the capacity to form relapsing latent hypnozoite stages that reside in the liver (*P. vivax* and *Plasmodium ovale*), the rate of development of the asexually replicating erythrocytic stages (48 hours for *P. falciparum*, *P. vivax* and *P. ovale* *vs*. 72 hours for *P. malariae*), the capacity for asexual stage parasites to cytoadhere (*P. falciparum*) and the predilection for invading red blood cells of differing ages (broadly *P. vivax* and *P. ovale* preferentially invade reticulocytes, *P. malariae* mature red blood cells and *P. falciparum* has a more catholic taste). The duration of a single infection varies significantly: *P. malariae* seemingly lasts up to 30 years despite no evidence of the existence of exo-erythrocytic latent stages; *P. vivax* and most probably *P. ovale* have latent hypnozoite stages and thus, although a single blood stage infection may be short-lived (less than a year), relapse extends the duration of a single infection; *P. falciparum* infections can last up to 2 years. Thus, whilst the current emphasis on *P. falciparum* has led to the identification of genetic factors controlling certain clinical and biological phenotypes, their relevance to other species is uncertain and there is much to be done with respect to the three non-falciparum species infecting man (Louicharoen et al., 2009). *Plasmodium knowlesi*, although shown to infect man, has yet to be sufficiently studied to be amenable for human genetic study.

#### **2.2 The phenotype problem**

Just as the grouped nature of severe disease yields a poorly resolved phenotype, precise definition of mild clinical malaria and biological phenotypes is equally important and yet arguably more difficult. Defining what is a clinical episode in an endemic setting where malaria co-exists with sundry other infectious diseases is, for the most part, rather *ad hoc*. There are various statistical methods that attempt to define the proportion of fevers attributable to malaria (Smith et al., 1994), but at an individual level body temperature, symptoms associated with malaria and the presence of parasites define a clinical episode. In more studied populations a threshold of parasite density is used in an attempt to account for the high prevalence of asymptomatic infections and the similarity of malaria symptoms with those of other diseases. In practice, local clinicians tend to know when a clinical presentation is a clinical malaria episode, but attempts to translate this knowledge into measurable criteria lead to highly variable definitions. Indeed, within-site variation in symptoms and threshold densities will exist not only according to age, but also as a result of human genetics. Biological phenotypes are as complex to define. Although we can measure, for example, asexual and sexual stage parasite densities, the extent to which such data represent any meaningful measure of the host-parasite interaction needs to be considered. Asexual parasite density can alter rapidly and this is especially the case for *P. falciparum*, which has the capacity to sequester. Sexual stage parasite density will to some extent depend on the asexual parasite density, but the added significance of variation in gametocyte density rather than simply gametocyte positivity for transmission to mosquitoes is debatable (Paul et al., 2007).
Moreover, at each step (exo-erythrocytic, asexual erythrocytic and sexual erythrocytic), there will likely be variation among parasite clones in the timing and extent of progression through the life cycle. Specifically, the pre-patent and the asexual growth periods prior to the production of sexual stages will vary among clones. If timing varies, so will the densities of parasite stages. On top of this parasite-specific variation, there will be variation resulting from the host-parasite interaction. This will reflect the extent of parasite-specific immunity developed by the host, the general "condition" of the host and "fixed" host genetic factors. Implicit within such host influence is the notion that the phenotypic expression of human genetic variation impacting upon the parasite will vary for an individual depending on that individual's age and history. Independently of any exposure to the specific parasite species in question, an individual's immune response matures over time and can be shaped by exposure (or lack thereof) to non-infectious agents.


Allergy is the paradigm of such immune system maturation. The maturation of the immune system, both innate and acquired, will in turn impact upon the influence of genetic factors that potentially confer protection against malaria parasites. In short, we advocate taking repeated measures from the same individual, as a single snap-shot will not only miss any variability due to the parasite, but also provide no context within which to characterise the individual, beyond obvious factors such as age and gender.

Despite all this noise and natural variation, however, the genetic signal is seemingly strong enough to be detected for many malaria-related phenotypes. Fine-tuning of the phenotypes may generate more power in more detailed genetic analyses, and one of the simplest methods to assess the strength of the fine-tuning is to perform heritability analyses (see section 4).
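To make the working definition of a clinical episode discussed in this section more concrete, the following is a minimal sketch, assuming a hypothetical record of repeated visits per individual. The `Visit` fields, the fever cut-off and the density thresholds are illustrative placeholders and not values taken from any study; in practice such criteria are site-specific and, as argued above, vary with age and human genetics.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional

# Illustrative placeholder only: fever cut-offs are site- and protocol-specific.
FEVER_CUTOFF_C = 37.5


@dataclass
class Visit:
    """One observation of one individual (hypothetical record layout)."""
    person_id: str
    temperature_c: float
    malaria_symptoms: bool
    parasites_per_ul: float


def classify_visit(visit: Visit, density_threshold: Optional[float] = None) -> str:
    """Classify a visit from temperature, symptoms and parasite density.

    With no threshold, any parasitaemia in an unwell individual counts as an
    episode; with a threshold, the density must reach it.
    """
    if visit.parasites_per_ul <= 0:
        return "uninfected"
    unwell = visit.temperature_c >= FEVER_CUTOFF_C or visit.malaria_symptoms
    dense_enough = density_threshold is None or visit.parasites_per_ul >= density_threshold
    if unwell and dense_enough:
        return "clinical_episode"
    return "infected_no_episode"


def episodes_per_person(visits: Iterable[Visit],
                        density_threshold: Optional[float] = None) -> Counter:
    """Count clinical episodes per individual across repeated visits."""
    counts: Counter = Counter()
    for visit in visits:
        if classify_visit(visit, density_threshold) == "clinical_episode":
            counts[visit.person_id] += 1
    return counts


visits = [
    Visit("A", 38.2, True, 5000),
    Visit("A", 36.8, False, 200),
    Visit("B", 38.0, False, 0),
]
print(episodes_per_person(visits))
print(episodes_per_person(visits, density_threshold=10000))
```

Changing only the density threshold changes how many episodes the same individual is credited with, which illustrates why the phenotype definition needs to be fixed and reported before any genetic analysis.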

#### **2.3 Environmental influence and context-dependency**

Acquired immunity is a major factor determining the outcome of an infection. The epidemiology of *P. falciparum* is characterised by premunition and the slow development of acquired immunity. In areas of very intense transmission, there is a relatively rapid development of premunition, whereby the individual can tolerate the presence of the parasite without expressing symptoms. Such clinical immunity thus generates a subpopulation who are infected but asymptomatic. As the intensity of transmission decreases, so does the degree of exposure, and the age at which premunition develops becomes progressively later, until in areas with low transmission intensity every infection leads to symptoms. The acquisition of immunity that clears the parasite is rarely achieved and only in regions of very intense transmission in old age groups. Acquired clinical and anti-parasite immunity is short-lived and absence of an individual from an endemic area will decrease what little immunity had developed. Thus, human mobility at an individual level is an important confounding factor.

In addition, in many areas malaria transmission is seasonal. This seasonal absence/reduction of infectious bites is akin to a period of absence from exposure and can alter the state of the individual and how they respond to an infection in the transmission and "non"-transmission seasons. The epidemiology of *P. falciparum* malaria varies according to the number of infectious bites an individual receives per unit time; importantly, although obvious, the same number of infectious bites spread over two months *vs*. over a year will not yield the same epidemiological profile. The temporal heterogeneity in exposure is a key confounding factor for phenotypic analysis. This will of course be the case for the other species to some extent, but the long duration of *P. malariae* infection and the capacity to relapse for *P. vivax*/*P. ovale* will uncouple the tight link between infectious mosquito bites and the prevalence of infection observed for *P. falciparum*.
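The point about the same number of infectious bites spread over two months *vs*. over a year can be illustrated with a small arithmetic sketch. The annual total of 24 bites and the assumption that bites are spread evenly over the exposure months are purely hypothetical and chosen only to keep the numbers simple.

```python
def monthly_bites(total_bites, exposure_months, months_in_year=12):
    """Spread total_bites evenly over the first exposure_months of a year."""
    per_month = total_bites / exposure_months
    return [per_month] * exposure_months + [0.0] * (months_in_year - exposure_months)


ANNUAL_BITES = 24  # hypothetical annual number of infectious bites per person

seasonal = monthly_bites(ANNUAL_BITES, 2)    # all exposure within two months
perennial = monthly_bites(ANNUAL_BITES, 12)  # same total spread over the year

for label, profile in (("seasonal", seasonal), ("perennial", perennial)):
    print(label,
          "| annual total:", sum(profile),
          "| peak per month:", max(profile),
          "| months without exposure:", profile.count(0.0))
```

Both profiles share the same annual total, yet they differ in peak monthly exposure and in the length of the exposure-free period, which is the temporal heterogeneity that an annual summary alone would hide.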

A second highly important and often neglected context-dependency is the impact of other co-circulating infectious pathogens. It is widely recognised that multiple co-infecting *Plasmodium* spp. affect each other (Bruce et al., 2000) and the debate over the importance of helminth infections for malaria remains unresolved (Nacher et al., 2000; Spiegel et al., 2003). Whilst concomitant infections can be accounted for, the long term impact of *Plasmodium* spp. on each other is an entirely different problem. There is increasing evidence that there is cross-immunity among *Plasmodium* spp. and accounting for this requires longitudinal sampling. Whilst birth cohorts would be optimal, the investment is considerable. As a

Human Genetic Contribution to the Outcome of Infection with Malaria Parasites 273

Association studies allow identification of genes and their allelic variants involved in susceptibility to disease. They are indispensable for identifying susceptibility genes after candidate chromosomal regions have been revealed by genetic linkage study. The basic method of study compares the allele frequency of a genetic marker from affected (i.e. expressing phenotype) individuals and unaffected control individuals (case-control studies), chosen randomly from a population. The marker used may be a polymorphism without causal relationship to the phenotype or a mutation in a gene candidate. A positive result suggests that the marker studied is involved either directly or by virtue of being linked to the causal gene (i.e. the marker is in linkage disequilibrium with the causal gene whereby marker and causal alleles co-occur more frequently than they would by chance). The major problem with case-control studies is the possibility of false positive results due to differences in environmental factors that influence the development or the evolution of the phenotype being studied. The choice of the control population is one of the most important problems of case-control study: if the control group are not from the same population as the affected individuals, uncontrolled environmental factors or population stratification might induce false positive association. Family-based studies not only account for population stratification, but also increase environmental homogeneity. Classically the major advantage of case-control studies over family-based designs has concerned power. All individuals in case/control studies are unrelated and are thus independent data points, whereas families include individuals who are related and not independent. The non-independent nature of phenotypic data from related individuals can, in fact, be accounted for, as will be discussed in the next section. 
Moreover, not only is this received wisdom of contrasting power not likely to be as general as believed (Knight & Camp, 2011), but also improved sequencing technology will likely increase the power of family-based designs (Ott et al., 2011). The major limitation of family-based designs remains the identification of sufficient numbers of affected individuals *per se*. Although this may be a problem for extreme phenotypes (e.g. cerebral malaria), it is not for mild malaria and biological phenotypes. Finally, family-based designs offer the possibility of repeated measures, thereby providing a more detailed and complete picture of how an individual responds to an infection. The downside of such longitudinal studies, other than the cost, is the impact that increased access to treatment will

have and the potential bias of studying a well-treated population.

Heritability is an important parameter that indicates the genetic contribution underlying an observed phenotype and provides an indication of the power to detect the effect of individual genes when performing GWAS. A large heritability implies a strong correlation between phenotype and genotype, so that loci with an effect on the phenotype can be more easily detected (Visscher et al., 2008). Estimation of heritability in its broad sense in natural populations is not possible and hence narrow sense heritability, which estimates the additive genetic contribution, is calculated. Actual values of heritability are specific for a study population at a particular time and thus not strictly comparable among studies,

**4. Preliminary genetic analyses – Heritability** 

**4.1 Application to natural populations** 

although broad trends can be inferred.

**3.2 Family-based versus case control** 

proxy, the development of serological methods that could stratify populations according to level of exposure to all co-circulating would provide a useful tool to examine the long term effects of infection by the community of pathogens on the pathogen of interest.

#### **3. Study populations – Who and how many**

A major requisite in any epidemiological study design is defining the sample size that can give the power to detect the effect of interest. For genetic studies, the response traditionally given is "as many as possible" and generally Genome Wide Association studies (GWAS) aim for sample sizes in the thousands. Sample size requirements impose a huge burden and constraint on research and for genetic studies, it is customary that the identified candidate genes are confirmed in a replicate study. The cost of such an endeavour is prohibitive and available to very few laboratories world-wide. Moreover, such large numbers will necessarily include populations from different environments and thus be immediately confounded. For complex diseases, such as malaria, single large effect genes are few and far between. Detecting small effect genes will require a large sample size, but reducing the stringency of the acceptance threshold for candidate gene nomination should be considered. This is especially true if the emphasis shifts from finding the gene, as in monogenic diseases, to identifying important biological pathways. There are multiple solutions to increasing resistance to parasites and repetitive identification of genes involved in specific biological pathways offers convergence towards understanding what governs the outcome of infection.

#### **3.1 Replication**

Genetic studies require that candidate genes are confirmed in separate populations. Replication is, however, frequently difficult and causal polymorphisms often have a low effect, increasing, for example, the risk of developing the disease by less than 5-10% (Wu et al., 2010). Moreover, whilst the assumption that there are a few key genes resulting in pathology for non-infectious diseases may be justifiable, this may not be the case for infectious diseases. Malaria is a good example where selective pressure on different populations has occurred relatively recently and thus different ethnicities have evolved different protective mechanisms. Replicating single genetic candidate polymorphisms may not therefore be an entirely appropriate approach and when performed should consider the ethnicities of the study populations. The focus should therefore be placed on the functional consequences of a mutation during an infection and this with reference to the biology of the pathogen and the normal host response. Malaria, on the face of it, is a prime example of this. Sickle cell trait confers protection and yet there are numerous other haemoglobin mutations selected in different ethnicities that potentially offer the same protective solution but via different mechanisms. Whilst some mutations may well exert their protective effect through the same mechanism (e.g. HbC), others may not. The co-occurrence of multiple putatively protective mutations introduces considerable analytical complexity that demands more rigorous consideration than has hitherto been enacted. Furthermore, the emphasis is necessarily shifting away from a pure candidate gene approach to one that considers all the single nucleotide polymorphisms (SNPs) at a locus of interest and focuses on the biological consequences of the mutation.

#### **3.2 Family-based versus case control**

272 Malaria Parasites

proxy, the development of serological methods that could stratify populations according to level of exposure to all co-circulating would provide a useful tool to examine the long term

effects of infection by the community of pathogens on the pathogen of interest.

**3. Study populations – Who and how many**

A major requisite in any epidemiological study design is defining the sample size that gives sufficient power to detect the effect of interest. For genetic studies, the response traditionally given is "as many as possible", and genome-wide association studies (GWAS) generally aim for sample sizes in the thousands. Sample size requirements impose a huge burden and constraint on research, and for genetic studies it is customary that the identified candidate genes are confirmed in a replicate study. The cost of such an endeavour is prohibitive and can be met by very few laboratories world-wide. Moreover, such large numbers will necessarily include populations from different environments and thus be immediately confounded. For complex diseases, such as malaria, single large effect genes are few and far between. Detecting small effect genes will require a large sample size, but reducing the stringency of the acceptance threshold for candidate gene nomination should be considered. This is especially true if the emphasis shifts from finding the gene, as in monogenic diseases, to identifying important biological pathways. There are multiple solutions to increasing resistance to parasites, and repeated identification of genes involved in specific biological pathways offers convergence towards understanding what governs the outcome of infection.

**3.1 Replication**

Genetic studies require that candidate genes are confirmed in separate populations. Replication is, however, frequently difficult and causal polymorphisms often have a small effect, increasing, for example, the risk of developing the disease by less than 5-10% (Wu et al., 2010). Moreover, whilst the assumption that there are a few key genes resulting in pathology may be justifiable for non-infectious diseases, this may not be the case for infectious diseases. Malaria is a good example where selective pressure on different populations has occurred relatively recently and thus different ethnicities have evolved different protective mechanisms. Replicating single genetic candidate polymorphisms may not therefore be an entirely appropriate approach and, when performed, should consider the ethnicities of the study populations. The focus should therefore be placed on the functional consequences of a mutation during an infection, with reference to the biology of the pathogen and the normal host response. Malaria, on the face of it, is a prime example of this. Sickle cell trait confers protection and yet there are numerous other haemoglobin mutations selected in different ethnicities that potentially offer the same protective solution but via different mechanisms. Whilst some mutations may well exert their protective effect through the same mechanism (e.g. HbC), others may not. The co-occurrence of multiple putatively protective mutations introduces considerable analytical complexity that demands more rigorous consideration than it has hitherto received. Furthermore, the emphasis is necessarily shifting away from a pure candidate gene approach to one that considers all the single nucleotide polymorphisms (SNPs) at a locus of interest and focuses on the biological consequences of the mutation.

Association studies allow identification of genes and their allelic variants involved in susceptibility to disease. They are indispensable for identifying susceptibility genes after candidate chromosomal regions have been revealed by genetic linkage studies. The basic method compares the allele frequency of a genetic marker between affected individuals (i.e. those expressing the phenotype) and unaffected control individuals chosen randomly from a population (case-control studies). The marker used may be a polymorphism without causal relationship to the phenotype or a mutation in a candidate gene. A positive result suggests that the marker studied is involved either directly or by virtue of being linked to the causal gene (i.e. the marker is in linkage disequilibrium with the causal gene, whereby marker and causal alleles co-occur more frequently than they would by chance). The major problem with case-control studies is the possibility of false positive results due to differences in environmental factors that influence the development or the evolution of the phenotype being studied. The choice of the control population is one of the most important problems of case-control studies: if the control group is not from the same population as the affected individuals, uncontrolled environmental factors or population stratification might induce false positive association. Family-based studies not only account for population stratification, but also increase environmental homogeneity. Classically, the major advantage of case-control studies over family-based designs has concerned power. All individuals in case-control studies are unrelated and are thus independent data points, whereas families include individuals who are related and not independent. The non-independent nature of phenotypic data from related individuals can, in fact, be accounted for, as will be discussed in the next section.
Moreover, not only is this received wisdom of contrasting power unlikely to be as general as believed (Knight & Camp, 2011), but improved sequencing technology will also likely increase the power of family-based designs (Ott et al., 2011). The major limitation of family-based designs remains the identification of sufficient numbers of affected individuals *per se*. Although this may be a problem for extreme phenotypes (e.g. cerebral malaria), it is not for mild malaria and biological phenotypes. Finally, family-based designs offer the possibility of repeated measures, thereby providing a more detailed and complete picture of how an individual responds to an infection. The downside of such longitudinal studies, other than the cost, is the impact that increased access to treatment will have and the potential bias of studying a well-treated population.
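The basic case-control comparison of allele frequencies described in this section can be illustrated with a minimal Python sketch. It assumes a biallelic marker and applies a Pearson chi-square test (1 degree of freedom) to the 2x2 table of allele counts. The function name `allele_association` and the counts are hypothetical and chosen only for illustration; the test deliberately ignores the population stratification and environmental confounding discussed above, which is precisely why a significant result from it should be interpreted with caution.

```python
import math


def allele_association(case_counts, control_counts):
    """Pearson chi-square test on a 2x2 table of allele counts.

    case_counts, control_counts: (count of allele A, count of allele a).
    Returns (chi-square statistic, p-value for 1 df, odds ratio for allele A).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column of the table must be non-empty")
    chi2 = n * (a * d - b * c) ** 2 / margins
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return chi2, p_value, odds_ratio


# Hypothetical allele counts (two alleles per genotyped individual)
chi2, p_value, odds_ratio = allele_association((60, 140), (90, 110))
print(f"chi2={chi2:.2f}, p={p_value:.4f}, OR={odds_ratio:.2f}")
```

An odds ratio below 1 in this toy example would indicate that allele A is less frequent among cases than among controls.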

## **4. Preliminary genetic analyses – Heritability**

#### **4.1 Application to natural populations**

Heritability is an important parameter that indicates the genetic contribution underlying an observed phenotype and provides an indication of the power to detect the effect of individual genes when performing GWAS. A large heritability implies a strong correlation between phenotype and genotype, so that loci with an effect on the phenotype can be more easily detected (Visscher et al., 2008). Estimation of heritability in its broad sense in natural populations is not possible and hence narrow sense heritability, which estimates the additive genetic contribution, is calculated. Actual values of heritability are specific for a study population at a particular time and thus not strictly comparable among studies, although broad trends can be inferred.
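As a minimal sketch of the definition, narrow-sense heritability is the additive genetic variance divided by the total phenotypic variance. The Python below assumes that the variance components have already been estimated elsewhere; the function name, the component labels and the numerical values are hypothetical and serve only to show the arithmetic.

```python
def narrow_sense_heritability(variance_components):
    """Additive genetic variance as a fraction of total phenotypic variance.

    variance_components: dict mapping component name to estimated variance;
    it must contain the key "additive_genetic".
    """
    total = sum(variance_components.values())
    if total <= 0:
        raise ValueError("total phenotypic variance must be positive")
    return variance_components["additive_genetic"] / total


# Hypothetical variance components, for illustration only
components = {
    "additive_genetic": 2.0,
    "house": 1.0,
    "intra_individual": 1.0,
    "residual": 4.0,
}
h2 = narrow_sense_heritability(components)
print(h2)  # 0.25
```

Because the denominator is the total variance in the sampled population, the same additive variance yields a different heritability whenever the environmental components change, which is why estimates are specific to a study population and time.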

Human Genetic Contribution to the Outcome of Infection with Malaria Parasites 275


Heritability analyses have until recently remained the quantitative tool of animal and plant breeders. They have been relatively ignored by human geneticists and in the study of natural populations for several reasons. Firstly, to generate sufficient data, well-conducted longitudinal family-based epidemiological studies that take into account confounding environmental factors are required (Ntoumi et al., 2007). This requires a considerable investment. Secondly, because the genetic component is not measured directly but is inferred from the resemblance between relatives, and because relatives often live in the same house, differentiating genetic effects from those of the shared environment is problematic. Inadvertent exclusion of a key environmental factor would erroneously lead to substantial overestimates of heritability. Thirdly, the statistical methods that can manage the repeated measures inherent in longitudinal surveys for robust heritability analyses have only recently been developed. Finally, given the relative inaccuracy of heritability estimates and the increasing ease with which genome-wide analyses can be performed, the added value of calculating heritability has been considered questionable.

This view of the utility of heritability analyses has been largely coloured by its extensive historical use in breeding programs, where projections of selection experiments are invaluable. For the study of complex infectious diseases, the value of heritability lies elsewhere and goes beyond the simple question of evaluating the potential genetic contribution to a phenotype. This "novel" value of heritability is well exemplified by the recent observation that there is considerable missing heritability in GWAS of more complex diseases (Manolio et al., 2009); only a fraction of estimated heritability can be accounted for by the genes identified in GWAS. Without initial estimation of heritability, this anomaly would not have been identified. The potential causes include important roles of epistasis, gene-environment interaction and the confounding effect of population-specific genetic architecture (Eichler et al., 2010). In addition to genetic explanations, one potential source contributing to the missing heritability concerns the phenotype; poorly resolved phenotypes lower the power to detect genetic variants (van der Sluis et al., 2010). A common misunderstanding is that the absence of heritability of a phenotype implies no genetic contribution; this is not true. Narrow-sense heritability measures the proportion of the variance in the phenotype explained only by additive genetics, and there can be non-additive genetic effects. Furthermore, a causal variant that has an effect on a phenotype, but which is present at 100% allelic frequency, will have zero heritability. Conversely, a large heritability does not imply that only a few genes are involved.
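The point about a variant at 100% allelic frequency can be made concrete with a standard textbook result of quantitative genetics: for a biallelic locus under random mating, the additive variance contributed is 2p(1 - p)α², where p is the allele frequency and α the average effect of an allele substitution. The sketch below assumes this single-locus model; the function name and the numerical values are hypothetical.

```python
def additive_variance(p, alpha):
    """Additive genetic variance of one biallelic locus: 2 * p * (1 - p) * alpha**2.

    p: frequency of the causal allele (between 0 and 1).
    alpha: average effect of an allele substitution, in phenotype units.
    Assumes random mating (Hardy-Weinberg proportions).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    return 2.0 * p * (1.0 - p) * alpha ** 2


# The same (hypothetical) effect size at different allele frequencies
for freq in (0.5, 0.9, 1.0):
    print(freq, additive_variance(freq, alpha=1.0))
```

At a frequency of 1.0 the locus contributes no variance at all, however large its effect, so it cannot contribute to heritability even though it influences the phenotype of every individual.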

#### **4.2 Repeated measures, complex pedigrees and statistical analyses**

Historically, heritability analyses have used single measures of the phenotype or a summary variable when repeated measures were performed. Such summary measures tend to lead to inflated estimates of heritability and, with the advent of suitable statistical methods, should be avoided. Likewise, heritability analyses used to analyse the residual of the phenotype after having taken into account other covariates. Such an approach assumes that there is no interaction between the genetic factors and these other covariates, an assumption that is likely to be invalid. Statistical methods now exist whereby simultaneous analysis of the genetic and environmental contribution to a phenotype is possible.

Heritability analyses of phenotypes gathered through repeated measurements of individuals from a community with a complex pedigree structure must take into account the bias introduced by multiple measures from the same individual and the fact that individuals are related: the individual observations are not independent. Although taking a single measure for an individual can overcome the first issue, multiple measures from the same individual are informative as they provide a notion of the repeatability of the phenotype and enable calculation of the intra-individual or permanent environment effect. This intra-individual variation contains features that are particular to each individual, including house effects, maternal effects, individual behaviour and non-additive genetic effects. The house and maternal effects can be taken into account by using appropriate matrices, in which each pair of individuals either does (1) or does not (0) share the same house or mother.

Creating a genetic relatedness matrix of the study population is not only central for heritability calculations, but extremely useful to take into account the non-independence of individuals when performing classical regression analyses. The genetic covariance (the familial relationship) among all pairs of individuals in the study cohort can be simply calculated using the pedigree information as follows:

For A and B, a given pair in a pedigree, the genetic covariance is computed as $r(A,B) = 2 \times \mathrm{coancestry}(A,B)$, where the coancestry between A and B is calculated using the method presented in Falconer and Mackay (1996):

$$\mathrm{coancestry}(A,B) = \sum_{p} \left(\tfrac{1}{2}\right)^{n(p)} \left(1 + I_{\text{Common Ancestor}}\right)$$

where the sum runs over the paths $p$ in the pedigree linking A and B, $n(p)$ is the number of individuals (including A and B) on path $p$, and $I_X$ is the inbreeding coefficient of an individual $X$, which is equal to the coancestry between the two parents of $X$. $I_X$ is set to 0 if $X$ is a founder. The resulting pedigree-based genetic relatedness matrix has dimensions $K \times K$, where $K$ is the total number of individuals in the pedigree, including those with missing phenotypes. This matrix can be built using the INBREED procedure of SAS. A house matrix can also be constructed whereby a value of 1 is ascribed if the relative pair shares the same house and 0 otherwise. Likewise, to examine the extent to which there are maternal effects that are passed on to offspring, a maternal matrix can be established.
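For readers without access to SAS, the construction of a pedigree-based relatedness matrix can be sketched in plain Python. Note that this sketch does not implement path counting, nor does it reproduce the INBREED procedure: it swaps in the tabular (recursive) method, which yields the same coancestries by processing individuals in an order where parents precede their offspring. The function name `relatedness_matrix`, the tuple layout and the toy pedigree are all hypothetical.

```python
def relatedness_matrix(pedigree):
    """Return (ids, R) with R[i][j] = 2 * coancestry(i, j).

    pedigree: list of (individual, father, mother) tuples in which every
    parent is listed before its offspring; None marks an unknown parent,
    which is treated as an unrelated founder.
    """
    ids = [individual for individual, _, _ in pedigree]
    n = len(ids)
    index = {}  # filled as we go, so a parent listed too late is detected
    f = [[0.0] * n for _ in range(n)]  # coancestry matrix
    for j, (individual, father, mother) in enumerate(pedigree):
        parents = []
        for parent in (father, mother):
            if parent is None:
                continue
            if parent not in index:
                raise ValueError(f"{parent!r} must be listed before {individual!r}")
            parents.append(index[parent])
        # Coancestry with each earlier individual: mean over the two parents
        for i in range(j):
            f[i][j] = f[j][i] = sum(0.5 * f[i][k] for k in parents)
        # Inbreeding coefficient = coancestry between the two parents
        inbreeding = f[parents[0]][parents[1]] if len(parents) == 2 else 0.0
        f[j][j] = 0.5 * (1.0 + inbreeding)
        index[individual] = j
    relatedness = [[2.0 * value for value in row] for row in f]
    return ids, relatedness


# Toy pedigree: two founders, two full sibs, and a child of the sib mating
toy_pedigree = [
    ("F", None, None),
    ("M", None, None),
    ("C1", "F", "M"),
    ("C2", "F", "M"),
    ("G", "C1", "C2"),
]
ids, R = relatedness_matrix(toy_pedigree)
pos = {name: k for k, name in enumerate(ids)}
print(R[pos["C1"]][pos["C2"]], R[pos["G"]][pos["G"]])
```

In this toy pedigree the full sibs C1 and C2 have a relatedness of 0.5, and the inbred individual G has a diagonal value of 1.25, reflecting an inbreeding coefficient of 0.25.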

Repeated measures analyses are best handled using Generalized Linear Mixed Models (GLMM). Mixed models enable fitting of random effects. Random effects are assumed to be normally distributed, and conditional on these random effects, data can have any distribution in the exponential family (e.g. Gaussian, Binomial, or Poisson). For repeated measures of unrelated individuals the random variable would simply be the individual identity. For related individuals, the genetic relatedness matrix will take into account the individual repeated measures plus the bias introduced by the non-independence of observations from related individuals.
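As an illustration of this model structure (a sketch for exposition under stated assumptions, not the exact model fitted by the authors), consider a count outcome $y_{ij}$, the $j$-th measure on individual $i$, modelled with a Poisson distribution and a log link. The symbols $\mathbf{x}_{ij}$ (covariates), $\boldsymbol{\beta}$ (fixed effects), $a_i$ (additive genetic effect), $c_i$ (individual-level effect) and $\mathbf{R}$ (the pedigree-based genetic relatedness matrix) are notation introduced here:

$$
\begin{aligned}
y_{ij} \mid a_i, c_i &\sim \mathrm{Poisson}(\mu_{ij}), \qquad
\log \mu_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + a_i + c_i, \\
\mathbf{a} &\sim \mathcal{N}\!\left(\mathbf{0},\, \sigma_a^{2}\,\mathbf{R}\right), \qquad
\mathbf{c} \sim \mathcal{N}\!\left(\mathbf{0},\, \sigma_c^{2}\,\mathbf{I}\right)
\end{aligned}
$$

For unrelated individuals $\mathbf{R}$ reduces to the identity matrix, so $a_i$ and $c_i$ collapse into a single random effect for individual identity; with related individuals, the off-diagonal entries of $\mathbf{R}$ carry the covariance between relatives.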

Heritability analyses seek to decompose the total variance of the phenotype in question into components explained by additive genetic, intra-individual and house effects. Heritability is the proportion of the phenotypic variance that is due to additive genetics. Other covariates, such as age and gender, can also be taken into account. Although several programs are able to perform such analyses, we have found that SAS offers a complete and yet flexible library of procedures (version 9.1.3, SAS Institute Inc., Cary, NC, USA), notably GLIMMIX, MIXED and INBREED. For count outcome variables (e.g. parasite density, number of clinical episodes per unit time), a Poisson regression model is fitted, which explicitly takes into account the non-negative integer-valued nature of the dependent count variable. A GLMM with a Poisson distribution can therefore be fitted using GLIMMIX, with *log* as the link function for E(*variable | covariates*). For binary outcome variables (e.g. presence or absence of gametocytes), GLMM are fitted with a Binomial distribution and a *logit* link function. A maximal model with all covariates is fitted and a minimal adequate model including only significant covariates is obtained. The effect of each covariate on the outcome variable is estimated taking into account inbreeding (via the genetic relatedness matrix integrated in GLIMMIX using the LDATA option), repeated measures and house effects.

composition of this community is predominantly Karen (85%), with Thai (14%) and the remainder Mon and Burmese (1%). The full pedigree comprises 2,427 individuals, including absent or deceased relatives. There are 238 independent families containing 603 nuclear families; the majority are two-generation families, with family size ranging from 3 to 958. The epidemiology of malaria has been described elsewhere (Phimpraphi et al., 2008a). Briefly, the incidence of malaria is highly seasonal with annual peaks in May-June. Incidence was low, peaking at 141 episodes of *P. falciparum* per 1000 person-years and 70 for *P. vivax* over the 6-year intense study period. In this site, virtually all infections lead to febrile episodes and thus there is no information on asymptomatic infections. Peak incidence occurs in an earlier age group (5-9 years old) for *P. vivax* than for *P. falciparum* (10-15 years old). Parasite densities of either species peak in the <10 years old age group. Microsatellite genotyping again enabled the construction of a pedigree based on IBD (Phimpraphi et al., 2008b).

**5.2 Heritability of malaria-related phenotypes with differing transmission intensity**

[Figure 1. Labels recovered from the original figure: Senegal (Dielmo, Ndiop) and Thai cohorts; asymptomatic and clinical infections; *P. falciparum* and *P. vivax*; trophozoite and gametocyte; clinical malaria and non-malaria fever. Bars: prevalence, mean density, max density. Axis ticks: 25%, 50%.]

Fig. 1. Percentage of variance in malaria-related phenotypes explained by additive genetics (red), house (orange), hamlet (light blue), age (dark blue), date (turquoise), days in village (mauve), asexual parasite density (dark grey) and unknown (light grey). Malaria-related phenotypes include the prevalence of asymptomatic and clinical episodes and the mean and maximum asexual parasite density (trophozoite) during asymptomatic and clinical episodes.

Heritability analyses were conducted on several clinical and biological malaria-related phenotypes. The major non-genetic factors included age and variables concerned with differential extent of exposure on both a temporal (season, year) and spatial scale (hamlet, house). Figure 1 summarises the differential impact of the non-genetic and genetic factors on the number of clinical episodes (*P. falciparum* and *P. vivax*), the asexual parasite density
