day functioning, not be accounted for by general developmental delays, and be present from early childhood.

For large-scale genome analyses, DSM criteria have been considered insufficiently precise, and cases are often selected using scores from the Autism Diagnostic Interview (ADI-R) [2], Autism Diagnostic Observation Schedule (ADOS) [3], and/or Social Responsiveness Scale (SRS) [4]. These instruments offer a more robust psychometric platform, and cases defined as "autism" are required to meet strict threshold criteria (e.g. all sub-dimensions of the ADI-R and ADOS). Individuals not quite meeting these criteria may be subsumed under the "broader" autism phenotype, which also typically includes Asperger syndrome, childhood disintegrative disorder and pervasive developmental disorder not otherwise specified. A diagnosis of Rett syndrome, which has a reportedly distinct pathophysiology, clinical course, and diagnostic strategy (Levy, Mandell & Schultz, 2009) [5] and will likely be removed in the impending publication of DSM-V, is typically exclusionary. Intellectual impairment, which is often co-morbid with ASD (Dawson *et al*., 2007; Bölte *et al*., 2009) [6,7], is not an exclusionary criterion, but is co-varied in statistical analyses. Given the broad range of IQ tests and their associated psychometric properties, this requires considerable finesse.

Standardization of diagnostic criteria has facilitated the accumulation of large ASD sample sets, where institutions can share (de-identified) data. In this vein, initiatives such as the Autism Genome Project include data from several thousand ASD individuals, greatly increasing statistical power of relevant analyses.

**1.2. Heritability of ASD**

Although Skuse (2007) [8] cautions that heritability estimates of ASD may have been skewed by the co-inheritance of (low) intelligence or other variables, there is little doubt that genetic factors play a key role in autism. In the most widely cited twin study, Bailey *et al.* (1995) [9] report that monozygotic twins are 92% concordant on a broad spectrum of cognitive or social abnormalities, compared with only 10% for dizygotic twins. Parents and siblings of individuals with ASD often exhibit subsyndromal levels of impairment (Piven *et al.*, 1997) [10], and having an affected sibling is the single biggest risk factor for developing an ASD. In an analysis of 943,664 Danish children (Lauritsen *et al.*, 2005) [11], the strongest predictors of autism were siblings with ASD, who conferred a 22-fold increased risk, while Fombonne (2005) [12] suggested that this risk may be even greater.

**1.3. Early genetic studies – insights from Rett syndrome and Fragile X**

Early efforts to identify the genetic causes of ASD utilized linkage and association approaches. Linkage studies, more prominent in the 1980s and 1990s, typically focus on families or larger pedigrees and are well powered to identify rarer genetic variants. The most common linkage approach is the affected sib-pair design (see O'Roak & State, 2008) [13], which examines the transmission of genomic segments through generations. Linkage studies helped define the locus containing *FMR1*, which is mutated in fragile X syndrome (e.g. Richards *et al.*, 1991) [14]. Approximately 30% of children with fragile X syndrome meet cri‐

**2. Genome-wide association and common variants**

276 Recent Advances in Autism Spectrum Disorders - Volume I

Aside from notable successes with fragile X and Rett syndrome, early linkage and association studies have been inconsistent in resolving more complex genetic correlates of ASD, and candidate genes have often not been replicated between studies. These challenges may in part be accounted for by the relatively low resolution and coverage of these approaches, which made it difficult to detect candidate loci other than those of major effect. Moving beyond such challenges required a shift in technology, which came with the introduction of high-resolution single nucleotide polymorphism (SNP) arrays. SNP arrays provided coverage of many thousand (now several million) common SNPs, which could be examined at a relatively low cost across large sample sets.

Genome-wide association studies (GWAS) examine the frequency of SNPs in case *versus* control populations, and can adopt either a case-control or family-based design. The former allows researchers to avoid the often complex process of acquiring diagnostic/phenotype data from a patient's family, and can incorporate very large numbers of control datasets that may be more readily available. The latter controls for the often confounding phenomenon of population stratification, where variants more common to specific racial groups may either be erroneously identified as causal, or obscure actual causal variants. A major caveat with family-based designs is the often unfounded assumption that unaffected family members do not share causal variants.

GWAS test for common variants (>1% population frequency), with the assumption that ASD are at least in part caused by the coinheritance of multiple risk variants, each of small individual effect (odds ratios typically between ~1.1 and ~1.5). This assumption is known as the common disease-common variant (CDCV) model (Risch & Merikangas, 1996) [20].
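As a rough illustration of the case-control comparison described above, the following Python sketch computes an allelic odds ratio and a one-degree-of-freedom chi-square test from a 2×2 table of allele counts. The function name and the counts are hypothetical and chosen only for illustration. Published GWAS typically rely on dedicated software and on additional corrections (for example, for population stratification) that are not shown here.

```python
import math


def allelic_association(case_risk, case_other, ctrl_risk, ctrl_other):
    """Odds ratio and 1-df Pearson chi-square test for a 2x2 table of allele counts."""
    # Odds of carrying the risk allele in cases divided by the odds in controls
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)

    # Pearson chi-square for a 2x2 table, without continuity correction
    n = case_risk + case_other + ctrl_risk + ctrl_other
    numerator = n * (case_risk * ctrl_other - case_other * ctrl_risk) ** 2
    denominator = (
        (case_risk + case_other)
        * (ctrl_risk + ctrl_other)
        * (case_risk + ctrl_risk)
        * (case_other + ctrl_other)
    )
    chi2 = numerator / denominator

    # Upper-tail probability of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return odds_ratio, chi2, p_value


# Hypothetical allele counts (two alleles per individual), not taken from any study
odds_ratio, chi2, p_value = allelic_association(1300, 700, 6000, 4000)
print(f"OR = {odds_ratio:.2f}, chi-square = {chi2:.1f}, p = {p_value:.2e}")
```

A genome-wide scan repeats a test of this kind at every SNP, which is why a stringent significance threshold is applied to the resulting p values.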

#### **2.1. The 5p14.1 locus**

A 2009 paper from our laboratory (Wang *et al.*, 2009) [21] was the first to identify common variants for ASD on a genome-wide scale. Our group examined 780 families (3,101 individuals) with affected children, a second, independent group of 1,204 affected individuals, and 6,491 controls. All were of European ancestry. We identified six genetic markers on chromosome 5 in the 5p14.1 region that conferred susceptibility to ASD. This locus has been replicated in two additional independent cohort studies (Ma *et al.*, 2009; St Pourcain *et al.*, 2010) [22, 23], lending further support for 5p14 as associated with ASD risk.


Autism Spectrum Disorders: Insights from Genomics

http://dx.doi.org/10.5772/54357


**Figure 1.** Genome-wide association results at the 5p14.1 region. **a**, A Manhattan plot shows the -log10(*p* values) of SNPs from the combined association analysis. **b**, The 5p14.1 region in the UCSC Genome Browser, with conserved genomic elements in the PhastCons track. **c**, Genotyped (diamonds) and imputed (grey circles) SNPs are plotted with their combined *p* values. Genotyped SNPs are colored on the basis of their correlation with rs4307059 (red: *r*² ≥ 0.5; yellow: 0.2 ≤ *r*² < 0.5; white: *r*² < 0.2). Estimated recombination rates from HapMap are plotted to reflect local linkage disequilibrium. Adapted from Wang *et al.*, 2009 [21]. Reprinted with permission from Nature Publishing Group.


As shown in Figure 1, the region straddles two genes, *CDH9* and *CDH10*. Both genes encode type II classical cadherins. Cadherins represent a large family of transmembrane proteins that mediate calcium-dependent cell-cell adhesion, and have been shown to generate synaptic complexity in the developing brain (Redies, Hertel & Hübner, 2012) [24]. An association with cadherins is consistent with the cortical-disconnectivity model of autism (e.g. Gepner & Féron, 2009) [25], which postulates that ASD may result from an increase or decrease in functional connectivity and neuronal synchronization in relevant neural pathways. While this hypothesis may yet be confirmed, a recent study by Kerin *et al*. (2012) [26] suggests a more complex mechanism to explain the association between ASD and the 5p14.1 locus.

Basing their analyses on the genomic region surrounding the rs4307059 locus, the authors used a bioinformatics approach (i.e. Genome Browser) to examine relevant expressed sequence tags (ESTs) and RNA (by Tiling Array within the 100-kb linkage disequilibrium block at the GWAS peak). Only one functional element, a single noncoding RNA, was located. The 3.9-kb RNA corresponded to moesin pseudogene 1 (*MSNP1*), and has 94% sequence identity to the mature mRNA of the protein-coding gene *MSN*. Located on the X chromosome (Xq11.2), *MSN* spans 74 kb and contains 13 exons. It produces a 4-kb mRNA, and encodes the 577-amino-acid moesin protein. The noncoding RNA at 5p14.1 was encoded by the opposite (antisense) strand of *MSNP1*, and was therefore named moesin pseudogene 1, antisense (*MSNP1AS*).

Follow-up analyses by the group largely confirmed that *MSNP1AS* is expressed in the brain, providing important functional validation. Using custom TaqMan Gene Expression assays to target the region, they showed that while *MSN* was widely expressed in all tissues tested, *MSNP1AS* was expressed variably. Sites of greatest expression were the adult temporal cerebral cortex, adult peripheral blood, and fetal heart, as well as three immortalized cell lines. Moreover, postmortem analyses (qPCR on total RNA) of fresh-frozen superior temporal gyri of ASD-control pairs (n=10) found a 12.7-fold increase of *MSNP1AS* expression in the temporal cortex of individuals with ASD. Individuals with ASD also showed a 2.4-fold increase in *MSN* expression in the same region. Interestingly, there was no evidence of increased expression for either *CDH9* or *CDH10*.

The group next used genotype, determined from three associated SNPs from the original Wang *et al*. paper, as the independent variable in expression analyses. All three SNPs, rs7704909, rs12518194, and rs4307059, have a high degree of linkage disequilibrium (LD) (*r*² > 0.98). Using resequencing to compare relevant genotypes, they identified highly significant differences in *MSNP1AS* expression. The T/T genotype at rs7704909 corresponded to a 23.3-fold increase in *MSNP1AS* RNA compared to the C/C genotype. For rs4307059, the T/T *versus* C/C genotype corresponded to a 22.0-fold increase in *MSNP1AS* expression. For rs12518194, the A/A *versus* G/G genotype corresponded to a 10.8-fold increase in *MSNP1AS* expression. Again, there was no evidence of expression differences for *CDH9* or *CDH10* in relation to genotype or case/control status.
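For readers less familiar with the LD statistic quoted above, the following Python sketch shows how *r*² between two biallelic SNPs can be computed from haplotype counts. It assumes phased haplotypes are available, whereas in practice LD is usually estimated from unphased genotypes using dedicated software. The function name and the counts are hypothetical and are not taken from the studies discussed here.

```python
def ld_r_squared(hap_counts):
    """Return r^2 between two biallelic SNPs from phased haplotype counts.

    hap_counts maps the four haplotypes 'AB', 'Ab', 'aB', 'ab' to counts,
    where A/a are the alleles at the first SNP and B/b at the second.
    """
    total = sum(hap_counts.values())
    p_ab_hap = hap_counts["AB"] / total                     # frequency of the A-B haplotype
    p_a = (hap_counts["AB"] + hap_counts["Ab"]) / total     # frequency of allele A
    p_b = (hap_counts["AB"] + hap_counts["aB"]) / total     # frequency of allele B

    # D measures the departure of the haplotype frequency from independence
    d = p_ab_hap - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical counts in which the two SNPs are almost always inherited together
example = {"AB": 600, "Ab": 5, "aB": 5, "ab": 390}
print(f"r^2 = {ld_r_squared(example):.3f}")
```

Values of *r*² close to 1, as reported for the three SNPs above, mean that the genotype at one SNP almost fully predicts the genotype at the others, so the three markers effectively report on the same underlying signal.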


Although Western blot analyses did not identify significant differences in moesin protein levels between cases and controls, overexpressing *MSNP1AS* in human cell lines was shown to significantly reduce levels of the moesin protein. The authors speculated that relevant alterations in moesin may occur only during specific developmental landmarks, which may impact neurodevelopment. This would explain why moesin levels are not elevated in the ASD samples *per se*, in spite of the marked differences in *MSNP1AS* expression. Further work is needed to confirm this hypothesis, and quantification of moesin protein levels at key developmental stages would certainly contribute in this respect.


Taken as a whole, these results provide compelling support for 5p14.1 as a risk locus for ASD. Although sample sizes for some analyses were small (10 ASD-control pairs for postmortem studies), this quite rigorous series of experiments draws a clear path from GWAS result through functional validation. As such, these results help allay criticism of the GWAS approach as a means of candidate discovery. For example, a review by McClellan and King (2010) [27] singled out the 5p14.1 locus as an example of the "perils of cryptic population stratification". These comments seem somewhat misguided in light of the rigorous methodologies developed by the GWAS community for controlling stratification (e.g. EigenStrat) [28], independent replication [22, 23], and now functional validation by the Kerin *et al*. group [26].

Similarly, replication/validation of the 5p14.1 locus provides an important demonstration of the legitimacy of associations in intergenic regions. Again, McClellan and King had disputed the utility of such results, questioning how "genome-wide association studies come to be populated by risk variants with no known function?" It is important to emphasize that the GWAS approach typically does not tag the disease variant, but rather its approximate location; through linkage disequilibrium, this is typically within 100 kb or less. Moreover, as in the Kerin *et al.* paper, the significant SNP may be tagging an intergenic regulatory element, which has functional consequences far beyond the associated region, in this case the *MSN* locus on the X chromosome.

Finally, these expression analyses provide a reminder about the capabilities of different genomic technologies. In the past twelve months, a number of high-profile next-generation sequencing (NGS) studies have been able to examine genomic correlates of ASD with unprecedented resolution. These types of studies, reviewed in greater depth below, have been interpreted as the future of ASD genetics and, to a large extent, this may be true. However, we note that DNA sequencing in the 5p14.1 region would not have identified the noncoding RNA at this locus. Thus, although NGS platforms used for RNA-sequencing are becoming increasingly sophisticated (Ozsolak & Milos, 2011) [29], microarray studies retain a place in guiding genomic discovery.

#### **2.2. Other replicated common variants from candidate gene studies**

A number of other common variants from candidate gene studies have been proposed as ASD risk factors. These include Contactin Associated Protein 2 (*CNTNAP2*), located on chromosome 7q35, which has been identified as a candidate for the age at first word endophenotype (Alarcón *et al.*, 2002) [30]. A follow-up by the same group (Alarcón *et al.*, 2008) [31], using linkage, association, and gene-expression analyses, found *CNTNAP2* to be the only autism-susceptibility gene to reach significance across all approaches. An independent linkage analysis by Arking *et al.* (2008) [32] also highlighted *CNTNAP2* as a significant ASD candidate gene. *CNTNAP2* is part of the neurexin family, which has repeatedly been associated with autism (see below). Interestingly, Vernes *et al.* (2008) [33] showed that *CNTNAP2* is bound and regulated by FOXP2. *FOXP2* is a well-established correlate of language and speech disorders (Lai *et al.*, 2001) [34], which are commonly observed in ASD.

Another locus identified by the candidate gene approach is Engrailed 2 (*EN2*), a homeobox gene that is critical to the development of the midbrain and cerebellum. Like other homeobox genes, it regulates morphogenesis. *EN2* is a human homolog of the engrailed gene, which is found in *Drosophila*. *En2* mouse mutants have anatomic phenotypes in the cerebellum that resemble cerebellar abnormalities reported in autistic individuals (Cheng *et al.*, 2010) [35]. In three separate datasets, Benayed *et al.* (2005, 2009) [36, 37] have reported and replicated a significant association between *EN2* and both broad and narrow ASD phenotypes. Wang *et al.* (2008) [38] also found an association between *EN2* and ASD in a Chinese Han sample, although Zhong *et al.* (2003) [39] failed to find evidence of an underlying association.

The oncogene *MET* is also strongly linked to ASD etiology, having been supported by a number of studies in the past decade (e.g. IMGSAC, 2001; Campbell *et al.*, 2006, 2008; Sousa *et al.*, 2009) [40-43]. Recently, Eagleson *et al.* (2011) [44] reported a role for *Met* signaling in cortical interneuron development *in vitro* in a mouse model.

#### **2.3. Unexplained variance**


For the most significant discovery SNP identified in the Wang *et al.* study above (rs4307059), the risk allele frequency was 0.65 in cases, with an odds ratio of 1.19. This is comparable with common variant discoveries in other psychiatric disorders including schizophrenia (Glessner *et al.*, 2009) [45], bipolar disorder (Ferreira, 2008) [46], and attention-deficit/hyperactivity disorder (Arcos-Burgos *et al.*, 2010) [47]. While it is important not to understate the significance of these findings, it should be noted that the predictive value of such ratios is relatively low (Dickson *et al.*, 2010) [48], often explaining less than 5% of the total risk (review at http://www.genome.gov/gwastudies). However, it is also possible that these common SNPs may be tagging a rarer causative variant (i.e. synthetic association), in which case the effect size may be markedly underestimated by the GWAS variant, as we recently reported (Dickson *et al.*, 2010) [48]. In one example, Wang *et al.* (2010) [49] examined the *NOD2* locus as a cause of Crohn's disease. Using resequencing data, they found that three causal variants explain > 5% of the genetic risk, where GWAS had estimated the risk at ~1%. This finding has two potentially important implications. First, it highlights the need for careful phenotyping of cohorts, to ensure that the phenotypes produced by rare variants are not being "filtered out" and missed as a consequence. A long-range haplotype analysis of the GWAS data at respective loci is recommended in an attempt to enrich for individuals with rare causative variants, who can be selected from the cohort and sequenced for confirmation (Wang *et al.*, 2010) [49].

Second, the results of our Crohn's disease study suggest that in certain circumstances, there may be an explicit relationship between tagged variants and underlying rare variants. Thus, the distinction between loci harboring common *versus* rare variants is not necessarily concrete. Indeed, the same locus may harbor both common and rare variants (Anderson *et al*., 2011). In recent years, we have seen an increased emphasis on rare variants, which is reflected in an upsurge in the number of copy number variation (CNV) and NGS studies.
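To put the effect size quoted in this section for rs4307059 in concrete terms, the short Python sketch below converts a case allele frequency and an odds ratio into the implied control allele frequency. It assumes that the reported odds ratio of 1.19 is an allelic odds ratio comparing cases with controls, which is an assumption made here for illustration. The function name is hypothetical.

```python
def control_frequency_from_or(case_freq, odds_ratio):
    """Implied control allele frequency, given the case frequency and an allelic odds ratio."""
    case_odds = case_freq / (1.0 - case_freq)   # odds of the risk allele in cases
    ctrl_odds = case_odds / odds_ratio          # OR = case odds / control odds
    return ctrl_odds / (1.0 + ctrl_odds)        # convert odds back to a frequency


# Values quoted in the text for rs4307059; the allelic interpretation is assumed
ctrl_freq = control_frequency_from_or(0.65, 1.19)
print(f"Implied risk allele frequency in controls: {ctrl_freq:.3f}")
```

Under that assumption, a risk allele frequency of 0.65 in cases corresponds to roughly 0.61 in controls. A difference of about four percentage points is detectable in large cohorts, but it illustrates why a single common variant of this kind has limited predictive value on its own.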

system operates at pre- and post-synapses, and its functions include regulating neurotransmitter release, recycling synaptic vesicles in pre-synaptic terminals, and modulating changes in dendritic spines and post-synaptic density (Yi & Ehlers, 2005) [60]. As well as implicating an ASD-ubiquitination network, we also identified a second pathway involving *NRXN1*, *CNTN4*, *NLGN1*, and *ASTN2*. Genes in this group mediate neuronal cell adhesion, and contribute to neurodevelopment by facilitating axon guidance, synapse formation and plasticity, and neuron-glial interactions. We also note that ubiquitins are involved in recycling cell-adhesion molecules, which is a possible mechanism by which these two networks are cross-linked.

In a similar approach, Pinto *et al.* (2010) [50] further confirmed the importance of rare CNVs as causal factors for ASD. The group did not observe a significant difference between cases and controls in terms of raw number of CNVs or estimated CNV size. However, the number of CNVs in genic regions was significantly greater in ASD cases. Again, loci enriched for CNVs include a number of genes known to be important for neurodevelopment and synaptic plasticity, such as *SHANK2*, *SYNGAP1*, and *DLGAP2*. Between 5.5% and 5.7% of ASD cases have at least one *de novo* CNV, further confirming the significance of *de novo* genetic events as risk factors for autism. Similar to the Glessner study, the Pinto group mapped CNVs to a series of networks involved in the development and regulation of central nervous system function. Implicated networks include neuronal cell adhesion, GTPase regulation (important for signal transduction and biosynthesis), and GTPase/Ras signaling, which is also involved in ubiquitination.

Finally, Gai *et al.* (2011) [61] took a slightly different approach, focusing exclusively on inherited CNVs. While underlying loci were not necessarily common to those identified by the Glessner and Pinto groups, enrichment in pathways involving central nervous system development, synaptic functions and neuronal signaling processes was again confirmed. The Gai *et al.* study also emphasized the role of glutamate-mediated neuronal signals in ASD.

Collectively, these CNV studies suggest that certain hotspots in the genome are particularly prone to ASD-associated structural variation, including loci on chromosomes 1q21, 3p26, 15q11-q13, 16p11, and 22q11. These hotspots are part of large gene networks that are important to neural signaling and neurodevelopment and have additionally been associated with other neuropsychiatric disorders. In particular, a number of CNV studies in schizophrenia have highlighted structural mutations incorporating chromosomes 1q21, 15q13, and 22q11 (e.g. Glessner *et al.*, 2010) [62], which are significantly enriched in cases versus controls, with *NRXN1* being a standout in this regard. From a phenotype perspective, autism and schizophrenia seem very different, both in behavioral manifestation and age of onset, and it may seem counter-intuitive that associated loci should overlap. Some authors have addressed this peculiarity by proposing that schizophrenia and autism may in fact be different poles of the same spectrum. Thus, Crespi and Braddock (2008) [63] suggest that social cognition is underdeveloped in ASD and over-developed in the psychotic spectrum, with a similar polarization of language and behavioral phenotypes. Although speculative, this hypothesis has gained some traction. In the next several years, genomic, imaging, and model-systems approaches will
