**3. Rare variants – CNVs and next-generation sequencing**

#### **3.1. Copy number variation in ASD**

CNVs are insertions, deletions, or translocations in the human genome that are universal in the general population (e.g. Pinto *et al.*, 2010) [50]. CNVs can be detected by the same SNP arrays used in GWAS, and vary in length from many megabases to 1 kilobase or smaller. They are often not associated with any observable phenotype.

One of the most widely known CNV-related conditions is Down syndrome, which is characterized by an extra chromosome 21. Rett syndrome is also caused by a CNV, which includes a deletion in *MECP2*. CNVs can be inherited or occur *de novo*; the cause of *de novo* events is thus far unknown. Common disease-causing CNVs are infrequent, but rare CNVs (frequency below 1%) have been identified for a range of disorders including ADHD (e.g. Williams *et al.*, 2010) [51], schizophrenia (e.g. Glessner *et al.*, 2010; Levinson *et al.*, 2011) [52, 53], bipolar disorder (e.g. Chen *et al.*, 2010) [54] and many others.

Sebat *et al.* (2007) [55] provided some early insights into the genomic features of CNVs in ASD. Firstly, they noted that *de novo* CNVs were individually rare – from 118 ASD cases, none of the identified variants was observed more than twice, with the majority seen just once. This confirmed the widely held assumption that many different loci can contribute to the same ASD phenotype. The sheer volume of loci identified by this approach (multiple loci on 20 chromosomes) affirms the extraordinary complexity of ASD.

A number of subsequent studies have greatly expanded the number of candidate loci using the CNV approach. Our laboratory (Bucan *et al.*, 2009) [56] reported more than 150 CNVs in 912 ASD families that were not found in 1,488 controls. Critically, 27 of these loci were replicated in an independent cohort of 859 ASD cases and 1,051 controls. Some of the rare variants we identified had previously been associated with autism, including *NRXN1* and *UBE3A* (Guilmatre *et al.*, 2009) [57]. Samaco *et al.* (2005) [58] previously identified significant deficits in *ube3a* expression in *mecp2*-deficient mice, suggesting a shared pathological pathway with Rett syndrome (as well as Angelman syndrome and autism). Similarly, Kim *et al.* (2008) [59] associated *NRXN1* with a balanced chromosomal abnormality at chromosome 2p16.3 in two unrelated ASD individuals. Rare variants in the coding region included two missense changes.

Glessner *et al.* (2009) [52] identified CNVs in two major gene networks: neuronal cell adhesion molecules (such as *NRXN1*) and the ubiquitin gene family (such as *UBE3A*). Interestingly, four of the most prominent genes enriched by CNVs in ASD cases (*UBE3A*, *PARK2*, *RFWD2* and *FBXO40*) – all of which were uncovered independently – are part of the ubiquitin gene family. Ubiquitination can alter protein function after translation, and degrade target proteins in conjunction with proteasomes. The ubiquitin–proteasome system operates at pre- and post-synapses, where its functions include regulating neurotransmitter release, recycling synaptic vesicles in pre-synaptic terminals, and modulating changes in dendritic spines and post-synaptic density (Yi & Ehlers, 2005) [60]. As well as implicating an ASD-ubiquitination network, we also identified a second pathway involving *NRXN1*, *CNTN4*, *NLGN1*, and *ASTN2*. Genes in this group mediate neuronal cell adhesion, and contribute to neurodevelopment by facilitating axon guidance, synapse formation and plasticity, and neuron–glial interactions. We also note that ubiquitins are involved in recycling cell-adhesion molecules, which is a possible mechanism by which these two networks are cross-linked.

2011). In recent years, we have seen an increased emphasis on the former, which is reflected in an upsurge in the number of copy number variation (CNV) and NGS studies.


282 Recent Advances in Autism Spectrum Disorders - Volume I


In a similar approach, Pinto *et al.* (2010) [50] further confirmed the importance of rare CNVs as causal factors for ASD. The group did not observe a significant difference between cases and controls in terms of raw number of CNVs or estimated CNV size. However, the number of CNVs in genic regions was significantly greater in ASD cases. Again, loci enriched for CNVs include a number of genes known to be important for neurodevelopment and synaptic plasticity, such as *SHANK2*, *SYNGAP1*, and *DLGAP2*. Between 5.5% and 5.7% of ASD cases have at least one *de novo* CNV, further confirming the significance of *de novo* genetic events as risk factors for autism. Similar to the Glessner study, the Pinto group mapped CNVs to a series of networks involved in the development and regulation of central nervous system function. Implicated networks include neuronal cell adhesion, GTPase regulation (important for signal transduction and biosynthesis), and GTPase/Ras signaling, which is also involved in ubiquitination.
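The genic-burden comparison described above rests on deciding, for each CNV, whether it overlaps at least one gene. The following Python sketch illustrates that overlap test only; the coordinates, gene intervals, and helper names (`Interval`, `overlaps`, `genic_fraction`) are hypothetical and are not data or methods taken from Pinto *et al.*

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # inclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if both intervals lie on the same chromosome and share a base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def genic_fraction(cnvs: list[Interval], genes: list[Interval]) -> float:
    """Fraction of CNVs that overlap at least one gene interval."""
    if not cnvs:
        return 0.0
    genic = sum(any(overlaps(c, g) for g in genes) for c in cnvs)
    return genic / len(cnvs)


# Hypothetical toy data: made-up coordinates, for illustration only.
genes = [Interval("chr2", 50_000, 60_000), Interval("chr11", 70_000, 80_000)]
case_cnvs = [
    Interval("chr2", 55_000, 58_000),   # inside the first gene
    Interval("chr11", 79_500, 90_000),  # partially overlaps the second gene
    Interval("chr5", 1_000, 2_000),     # intergenic in this toy annotation
]
control_cnvs = [
    Interval("chr2", 10_000, 20_000),   # intergenic
    Interval("chr11", 75_000, 76_000),  # inside the second gene
]

print(genic_fraction(case_cnvs, genes))
print(genic_fraction(control_cnvs, genes))
```

A real analysis would use a full gene annotation and a formal statistical test on the case and control counts; the sketch only shows how "genic" can be operationalized as interval overlap.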

Finally, Gai *et al.* (2011) [61] took a slightly different approach, focusing exclusively on inherited CNVs. While the underlying loci were not necessarily shared with those identified by the Glessner and Pinto groups, enrichment in pathways involving central nervous system development, synaptic functions and neuronal signaling processes was again confirmed. The Gai *et al.* study also emphasized the role of glutamate-mediated neuronal signals in ASD.

Collectively, these CNV studies suggest that certain hotspots in the genome are particularly relevant to ASD risk, including loci on chromosomes 1q21, 3p26, 15q11-q13, 16p11, and 22q11. These hotspots are part of large gene networks that are important to neural signaling and neurodevelopment and have additionally been associated with other neuropsychiatric disorders. In particular, a number of CNV studies in schizophrenia have highlighted structural mutations incorporating chromosomes 1q21, 15q13, and 22q11 (e.g. Glessner *et al.*, 2010) [62], which are significantly enriched in cases versus controls, with *NRXN1* being a standout in this regard. From a phenotype perspective, autism and schizophrenia seem very different, both in behavioral manifestation and age of onset, and it may seem counter-intuitive that associated loci should overlap. Some authors have addressed this peculiarity by proposing that schizophrenia and autism may in fact be opposite poles of the same spectrum. Thus, Crespi and Badcock (2008) [63] suggest that social cognition is underdeveloped in ASD and over-developed in the psychotic spectrum, with a similar polarization of language and behavioral phenotypes. Although speculative, this hypothesis has gained some traction. In the next several years, genomic, imaging, and model-systems approaches will likely shed further light on the relationship between autism, schizophrenia and other neuropsychiatric disorders.


Autism Spectrum Disorders: Insights from Genomics

http://dx.doi.org/10.5772/54357


| Syndrome | Number of studies | Median rate (%) | Range (%) |
|---|---|---|---|
| Tuberous sclerosis | 11 | 1.1 | 0–3.8 |
| Fragile X | 9 | 0.0 | 0–8.1 |
| Down syndrome | 12 | 0.7 | 0–16.7 |
| Neurofibromatosis 1 | 6 | 0 | 0–1.4 |

**Table 1.** Associated disorders and their rate in autism (from Volkmar *et al.*, 2005 in Zafeiriou *et al.* 2007) [70,71]


#### **3.2. Sequencing familial forms of ASD**

To this point, we have focused primarily on the complex interactions of polygenic networks as the major cause of ASD. However, this is not exclusively the case. Paralleling the recent spate of CNV studies is a renewed focus on rare disorders. These include familial forms of complex diseases that are potentially monogenic or have less complex inheritance patterns. At the outset of this chapter, we emphasized the overlap with fragile X syndrome, where one third of cases are co-morbid for ASD. As mentioned, fragile X is caused by a failure to express the protein coded by *FMR1*. However, mutations in *FMR1* do not always result in fragile X and can result in a phenotype more representative of ASD. Thus, Muhle *et al.* (2004) [64] found that 7-8% of idiopathic ASD cases may have mutations at the *FMR1* locus. Likewise, although mutations in *MECP2* are the common cause of Rett syndrome, certain mutations at the same locus have been associated with idiopathic autism (Carney *et al.*, 2003).

The X-linked neuroligin genes *NLGN3* and *NLGN4*, together with *SHANK3* (which encodes a neuroligin binding partner), are other prominent examples of distinct rare genetic causes. A parallel can be drawn between these studies and studies of mental retardation and epilepsy, which include many rare syndromes that collectively account for a substantial proportion of the two disorders (Morrow *et al.*, 2008). Indeed it is perhaps more than coincidence that autism is heavily co-morbid with these two conditions, with ~40% of ASD cases meeting diagnostic criteria for mental retardation and epilepsy respectively (Bölte *et al.*, 2009; Danielsson *et al.*, 2005) [7,65]. It is also noteworthy that many of these monogenic-related genes are also major players in neurodevelopment and synapse activity. Other prominent examples include *TSC1*, *TSC2* (Osborne *et al.*, 1991; Franz, 1998) [66, 67], *NF1*, and *UBE3A* (see Morrow *et al.*, 2008) [68].

The identification of monogenic or possibly oligogenic autisms is likely to accelerate in the next several years as NGS becomes more widely available. In our group, we recently encountered a family of two parents, six healthy siblings, and two siblings with severe autism suggestive of autosomal recessive inheritance. Linkage and CNV approaches failed to identify a causal locus, but whole-exome sequencing at 20x coverage identified four candidate genes, including one with a non-synonymous SNP in the protocadherin alpha 4 isoform1 precursor (*PCDHA4*) gene, which is a strong candidate currently under validation. Protocadherins are part of the cadherin family that facilitates neuronal cell adhesion, and this discovery is consistent with the functional properties of the PCDH family.
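To make the autosomal recessive reasoning concrete, the sketch below filters variants for a family shaped like the one described (two parents, six healthy siblings, two affected siblings). It is an illustration under stated assumptions, not the pipeline used in our group: genotypes are coded as alternate-allele counts (0, 1, 2), only the simple homozygous model is checked (compound heterozygosity is ignored), and all variant names and genotype calls are hypothetical.

```python
# Hypothetical family layout mirroring the pedigree described in the text.
PARENTS = ("father", "mother")
AFFECTED = ("affected_1", "affected_2")
UNAFFECTED = tuple(f"healthy_{i}" for i in range(1, 7))


def family_genotypes(father, mother, affected, unaffected):
    """Build a member -> alternate-allele-count mapping for one variant."""
    calls = {"father": father, "mother": mother}
    calls.update(zip(AFFECTED, affected))
    calls.update(zip(UNAFFECTED, unaffected))
    return calls


def fits_recessive(calls):
    """Simple homozygous recessive model:
    both parents are carriers, every affected child is homozygous for the
    alternate allele, and no healthy sibling is homozygous for it."""
    return (
        all(calls[p] == 1 for p in PARENTS)
        and all(calls[a] == 2 for a in AFFECTED)
        and all(calls[u] < 2 for u in UNAFFECTED)
    )


# Made-up genotype calls, for illustration only.
variants = {
    "variant_A": family_genotypes(1, 1, (2, 2), (0, 1, 1, 0, 1, 1)),
    "variant_B": family_genotypes(1, 1, (2, 2), (2, 0, 1, 1, 0, 1)),  # healthy sib is hom-alt
    "variant_C": family_genotypes(1, 0, (1, 1), (0, 0, 1, 0, 1, 0)),  # affected not hom-alt
}

candidates = [name for name, calls in variants.items() if fits_recessive(calls)]
print(candidates)  # ['variant_A']
```

With six unaffected siblings available, the third condition is a strong filter, which is one reason large sibships are informative for this kind of analysis.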

Known syndromes with ASD features include fragile X, neurofibromatosis type 1, Down syndrome, tuberous sclerosis, neurofibromatosis (which confers a 100-fold increased risk for ASD; Li *et al.*, 2005) [69], Angelman, Prader-Willi and related 15q syndromes, and at least several dozen others (see Zafeiriou *et al.*, 2007, for a comprehensive review) [70]. Table 1 from Volkmar *et al.* (2005) [71] lists the most commonly associated syndromes with median rate and range. It is likely that many more unidentified rare syndromes with Mendelian causes have ASD phenotypes. As of September 2012, the Online Mendelian Inheritance in Man (OMIM) database listed over 7,000 known or suspected Mendelian diseases (MD), with ~3,500 (~50%) of these having an identified molecular basis (http://omim.org/statistics/entry). Since OMIM derives its data from the published literature, these figures likely under-represent rare disorders, which may go unreported. As such, there may be several times more Mendelian disorders that have no defined genetic etiology to date. Given the large representation of autism phenotypes in known syndromes, we can assume a similar trend in unreported ASD syndromes.

The proportion of ASD accounted for by rare variants remains to be determined. Regardless, as with many other aspects of scientific inquiry, the study of these events will continue to play an important role in explicating the pathogenesis of ASD. El-Fishawy and State (2010) [72] point to hypercholesterolemia and hypertension (Brown, 1974; Lifton *et al.*, 2001) [73,74] as examples where rare mutations have been successful in driving a molecular understanding of the disease as opposed to identifying risk factors in the general population. Rare mutations, particularly when they are Mendelian, carry large effects and are typically located in genic regions. These characteristics make the resolution of underlying networks distinctly less complex and, moreover, make such mutations amenable to modeling in other systems.

Recent groundbreaking studies by Marchetto *et al.* (2010) [75] and Muotri *et al.* (2010) [76], who created a cell culture model of Rett syndrome, are potentially exciting developments in this regard. Here, the researchers used skin biopsies from four Rett syndrome patients, each carrying a different *MECP2* mutation, to culture induced pluripotent stem (iPS) cells. Once the iPS cells developed into neurons, they showed a decreased number of neurons and dendritic spines, consistent with neurodevelopmental disruptions. Intervention with insulin-like growth factor 1 (*IGF1*), which is known to regulate neurodevelopment, was subsequently shown to reverse Rett-like symptoms in a mouse model of the disease. This innovative approach is an exciting model of how rare gene approaches can stimulate our understanding of the pathophysiology and potential reversibility of ASD.



#### **3.3. Large-scale next-generation sequencing**


In April 2012, *Nature* simultaneously published three papers that used exome sequencing to probe genomic correlates of ASD. This represented something of a landmark for both ASD and NGS research, as it demonstrated the viability of NGS on a large scale – the three studies combined examined 600 trios (parents and offspring), plus a further 935 ASD cases. Collectively, these papers suggest that several hundred or more genes may be considered autism candidates, and again highlight the staggering complexity of the phenotype.

(P=.01) higher proportion among the probands (125 total) *versus* their unaffected sibling (87 total). From simulations, the authors concluded that two or more *de novo* nonsense/splicesite mutations should be considered significant. The gene sodium channel, voltage-gated, type II, α subunit gene (*SCN2A*) was the only such gene – with two ASD individuals found to harbor relevant nonsense mutations. Mutations in *SCN2A* have been associated with epi‐ lepsy (Kamiya *et al*., 2004; Ogiwara *et al*., 2009) [82, 83] and idiopathic ASD in multiplex fam‐

Autism Spectrum Disorders: Insights from Genomics

http://dx.doi.org/10.5772/54357

287

Combining the exomes from their study with those from O'Roak *et al*. (n for probands = 414), the groups identified two additional genes that each contained two loss-of-function mutations: the katanin p60 subunit A-like 2 (*KATNAL2*) and chromodomain helicase DNA binding protein 8 (*CHD8*). O'Roak *et al*. also evaluated these three novel candidates using exome sequencing on 935 cases and 870 controls. Three additional loss-of-function muta‐ tions each were observed in *KATNAL2* and *CHD8* in individuals with ASD, while none were

It is important to note, however, that for *de novo* events in general, there was no evidence to support the hypothesis that multiple events in any individual conferred an increased risk of

In a fourth independent exome sequencing study involving 343 families from the Simons Simplex Collection Iossifov *et al*. (2012) [85] also reported a relatively equal distribution of *de novo* mutations in cases and controls. Again however, loss-of-function mutations—nonsense, splice site, and frame shifts—were more common in individuals with ASD (59 *versus* 28). Of the 59 "likely gene disruptions (LGD)" in ASD cases, none occurred more than once, al‐ though two—*NRXN1* and *PHF2*—had been identified in a previous CNV study by the same group (Gilman *et al*, 2011) [86]. Intriguingly, the 59-strong LGD shared considerable overlap with a set of 842 proteins that interact with the fragile X protein, FMRP. In total, 14 of the 59 appeared on the FMRP list (*P*=.006). Furthermore, 13 of 72 CNV candidates from the group's previous CNV paper were also on the list (*P*=.0004), meaning 26 of the combined 129 total

The authors subsequently screened for *de novo* mutations in upstream targets of *FMR1*. One was identified – a deletion in *GRM5* that removes a single amino acid and causes an addi‐ tional substitution at the same site. *GRM5* encodes the glutamate receptor mGluR5 (Bear *et al*, 2004) [87] and, as noted below, mGluR5 antagonists are currently in clinical trial (Jacque‐ mont *et al*., 2011) [88] having indicated success in mouse models (Dölen *et al*., 2007) [89]. Fur‐ ther elucidating the relationship between *FMR1*/FRMRP and these ASD candidates is clearly an important next step in maximizing the impact of these findings. These are discussed fur‐

Collectively, all four of these exome sequencing studies converge upon the conclusion that ASD is highly heterogeneous, with several hundred or more loci potential risk variants. Sim‐ ulations by the Neale *et al*. group confirm the statistical implausibility that hundreds of var‐ iants with high penetrance are possible, and a model where *de novo* variants in up to 20% of cases, confer ~10- to 20-fold increased risk is supported. The studies also converge on the

ilies (Weiss *et al*., 2003) [84]. Neither of the probands has a history of seizures.

ASD. As such, the 'two *de novo* hit' hypothesis is not supported.

identified in controls.

were FMRP-related (*P*<1x10-13).

ther in the section below.

O'Roak *et al*. (2012) [77] sequenced 677 individual exomes from 209 families – primarily from the Simons Simplex Collection [78]. In 189 new probands, they validated 120 severely disruptive *de novo* mutations, 39% of which occur in a highly interconnected b-catenin/chro‐ matin remodeling protein network. The group observed a strong paternal bias (41:10) in the rate of *de novo* mutations, which supports the hypothesis that the germline mutation rate in coding regions is markedly more prominent among males. These *de novo* events were more common in older fathers, marking paternal age as a significant risk factor for ASD.

Among the identified *de novo* loci, 62 were identified as top candidate mutations based on severity and/or supporting evidence from the literature. Interestingly, probands with these mutations were broadly distributed in terms of IQ score, with only a modest (non-signifi‐ cant) association with intellectual impairment. Recurrent protein-disruptive mutations were identified in two genes: netrin G1 (*NTNG1*) and chromodomain helicase DNA binding pro‐ tein 8 (*CHD8*). *NTNG1* is known to play a role in axon guidance and dendritic organization (Nishimura-Akiyoshi *et al*., 2007) [79]. *CHD8* regulates β-catenin and p53 signaling, and has not previously been associated with ASD. This gene was emphasized as particularly note‐ worthy, after follow-up protein-protein interaction (PPI) analyses, showed that β-catenin and p53 signaling may be features of an ASD-relevant network. In total 49 of proteins in the PPI network were highly interconnected, with a number of underlying genes also previous‐ ly associated with neurodevelopment.

Neale *et al*. (2012) [80] exome-sequenced 175 trios and also focused on *de novo* mutations. As per the O'Roak study, there was a correlation between paternal age and *de novo* events for offspring (*P*<0.0001), and also for maternal age (*P*=0.000365). Across the sample set, the group observed 161 point mutations, of which 101 were missense, 50 silent, and 10 non‐ sense. Two conserved splice site rare single nucleotide variants and six frameshift inser‐ tions/deletions (indels) were also observed. Three genes were found to harbor two *de novo* mutations: *BRCA2* (two missense), *FAT1* (two missense) and *KCNMA1* (one missense, one silent).

The group next performed PPI analyses to determine whether interactions between genes associated with *de novo* mutations, as well as existing ASD candidates, was of etiological im‐ portance. This pathway approach, which additionally incorporated data from Sanders *et al*. study (below) [81], found that the distribution of functional *de novo* mutations is not ran‐ dom. The average distance for non-synonymous variants was significantly larger for con‐ trols *versus* cases (3.78 *vs*. 3.66; *P*=.033). This suggests that a proportion of these *de novo* events contribute to autism. A model whereby *de novo* variants in up to 20% of cases, confer a 10- to 20-fold increased risk was supported.

lectively, these papers suggest that several hundred or more genes may be considered autism candidates, and again highlight the staggering complexity of the phenotype.

O'Roak *et al*. (2012) [77] sequenced 677 individual exomes from 209 families, primarily from the Simons Simplex Collection [78]. In 189 new probands, they validated 120 severely disruptive *de novo* mutations, 39% of which occur in a highly interconnected β-catenin/chromatin remodeling protein network. The group observed a strong paternal bias (41:10) in the rate of *de novo* mutations, which supports the hypothesis that the germline mutation rate in coding regions is markedly higher in males. These *de novo* events were more common in older fathers, marking paternal age as a significant risk factor for ASD.

Among the identified *de novo* loci, 62 were identified as top candidate mutations based on severity and/or supporting evidence from the literature. Interestingly, probands with these mutations were broadly distributed in terms of IQ score, with only a modest (non-significant) association with intellectual impairment. Recurrent protein-disruptive mutations were identified in two genes: netrin G1 (*NTNG1*) and chromodomain helicase DNA binding protein 8 (*CHD8*). *NTNG1* is known to play a role in axon guidance and dendritic organization (Nishimura-Akiyoshi *et al*., 2007) [79]. *CHD8* regulates β-catenin and p53 signaling, and has not previously been associated with ASD. This gene was emphasized as particularly noteworthy after follow-up protein-protein interaction (PPI) analyses showed that β-catenin and p53 signaling may be features of an ASD-relevant network. In total, 49 proteins in the PPI network were highly interconnected, with a number of underlying genes also previously associated with neurodevelopment.

Neale *et al*. (2012) [80] exome-sequenced 175 trios and also focused on *de novo* mutations. As per the O'Roak study, there was a correlation between paternal age and *de novo* events for offspring (*P*<0.0001), and also for maternal age (*P*=0.000365). Across the sample set, the group observed 161 point mutations, of which 101 were missense, 50 silent, and 10 nonsense. Two conserved splice site rare single nucleotide variants and six frameshift insertions/deletions (indels) were also observed. Three genes were found to harbor two *de novo* mutations: *BRCA2* (two missense), *FAT1* (two missense) and *KCNMA1* (one missense, one silent).

The group next performed PPI analyses to determine whether interactions between genes associated with *de novo* mutations, as well as existing ASD candidates, were of etiological importance. This pathway approach, which additionally incorporated data from the Sanders *et al*. study (below) [81], found that the distribution of functional *de novo* mutations is not random. The average distance in the PPI network for non-synonymous variants was significantly larger for controls *versus* cases (3.78 *vs*. 3.66; *P*=.033). This suggests that a proportion of these *de novo* events contribute to autism. A model whereby *de novo* variants in up to 20% of cases confer a 10- to 20-fold increased risk was supported.

In the third of these Nature papers, Sanders *et al*. (2012) [81] performed exome sequencing on 238 families, including 200 quartets (parents, 1 affected and 1 unaffected sibling) from the Simons Simplex Collection [78]. Comparing *de novo* non-synonymous single nucleotide variants (SNVs) between affected and unaffected siblings, the group observed a significantly (*P*=.01) higher number among the probands (125 total) *versus* their unaffected siblings (87 total). From simulations, the authors concluded that two or more *de novo* nonsense/splice-site mutations in the same gene should be considered significant. The sodium channel, voltage-gated, type II, α subunit gene (*SCN2A*) was the only such gene, with two ASD individuals found to harbor relevant nonsense mutations. Mutations in *SCN2A* have been associated with epilepsy (Kamiya *et al*., 2004; Ogiwara *et al*., 2009) [82, 83] and idiopathic ASD in multiplex families (Weiss *et al*., 2003) [84]. Neither of the probands has a history of seizures.

Combining the exomes from their study with those from O'Roak *et al*. (n for probands = 414), the groups identified two additional genes that each contained two loss-of-function mutations: the katanin p60 subunit A-like 2 (*KATNAL2*) and chromodomain helicase DNA binding protein 8 (*CHD8*). O'Roak *et al*. also evaluated these three novel candidates using exome sequencing on 935 cases and 870 controls. Three additional loss-of-function mutations each were observed in *KATNAL2* and *CHD8* in individuals with ASD, while none were identified in controls.

It is important to note, however, that for *de novo* events in general, there was no evidence to support the hypothesis that multiple events in any individual conferred an increased risk of ASD. As such, the 'two *de novo* hit' hypothesis is not supported.
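As a rough sanity check on the sibling comparison reported by Sanders *et al*. (125 *de novo* non-synonymous SNVs in probands *versus* 87 in unaffected siblings), the sketch below runs an exact two-sided binomial test against an even split. It assumes that probands and siblings were sequenced in equal numbers, as in the quartet design, and that each variant is independent. The helper name `binomial_two_sided_p` is hypothetical, and this is not necessarily the test the authors used.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial P value for k of n events under p = 0.5.

    With p = 0.5 the distribution is symmetric, so the two-sided value
    is twice the smaller tail, capped at 1.
    """
    upper = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    lower = sum(comb(n, i) for i in range(0, k + 1)) / 2 ** n
    return min(1.0, 2 * min(upper, lower))


# Counts quoted in the text: de novo non-synonymous SNVs per group.
probands, siblings = 125, 87
p_sanders = binomial_two_sided_p(probands, probands + siblings)
print(f"probands vs siblings: P = {p_sanders:.3f}")

# The same check applied to the 41:10 paternal:maternal split
# reported by O'Roak et al.
p_parent = binomial_two_sided_p(41, 41 + 10)
print(f"paternal vs maternal origin: P = {p_parent:.2e}")
```

Under these assumptions the first value lands close to the reported *P*=.01, while the parental-origin split is far more extreme. Neither figure should be read as a reproduction of the published analyses.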

In a fourth independent exome sequencing study involving 343 families from the Simons Simplex Collection, Iossifov *et al*. (2012) [85] also reported a relatively equal distribution of *de novo* mutations in cases and controls. Again, however, loss-of-function mutations (nonsense, splice site, and frameshift) were more common in individuals with ASD (59 *versus* 28). Of the 59 "likely gene disruptions (LGD)" in ASD cases, none occurred more than once, although two (*NRXN1* and *PHF2*) had been identified in a previous CNV study by the same group (Gilman *et al*., 2011) [86]. Intriguingly, the 59-strong LGD set shared considerable overlap with a set of 842 proteins that interact with the fragile X protein, FMRP. In total, 14 of the 59 appeared on the FMRP list (*P*=.006). Furthermore, 13 of 72 CNV candidates from the group's previous CNV paper were also on the list (*P*=.0004), meaning 26 of the combined 129 total were FMRP-related (*P*<1×10⁻¹³).
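The combined totals quoted above are smaller than the simple sums (59 + 72 = 131, 14 + 13 = 27) because genes found by both approaches are counted once. The sketch below shows that bookkeeping using only the counts given in the text. The inference that exactly one of the shared genes is on the FMRP list follows from those counts and is not stated by the source; the function name `union_size` is hypothetical.

```python
def union_size(size_a: int, size_b: int, shared: int) -> int:
    """Size of the union of two sets, by inclusion-exclusion."""
    return size_a + size_b - shared


lgd_genes, cnv_genes = 59, 72
shared_genes = 2  # NRXN1 and PHF2 appear in both candidate sets
combined = union_size(lgd_genes, cnv_genes, shared_genes)

fmrp_lgd, fmrp_cnv, fmrp_combined = 14, 13, 26
# Number of FMRP-related genes that must be shared for the totals to agree
# (an inference from the quoted counts, not a figure from the source).
implied_shared_fmrp = fmrp_lgd + fmrp_cnv - fmrp_combined

for label, hits, total in [
    ("LGD set", fmrp_lgd, lgd_genes),
    ("CNV candidates", fmrp_cnv, cnv_genes),
    ("combined", fmrp_combined, combined),
]:
    print(f"{label}: {hits}/{total} FMRP-related ({hits / total:.1%})")
print(f"implied shared FMRP-related genes: {implied_shared_fmrp}")
```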

The authors subsequently screened for *de novo* mutations in upstream targets of *FMR1*. One was identified: a deletion in *GRM5* that removes a single amino acid and causes an additional substitution at the same site. *GRM5* encodes the glutamate receptor mGluR5 (Bear *et al*., 2004) [87] and, as noted below, mGluR5 antagonists are currently in clinical trials (Jacquemont *et al*., 2011) [88], having shown success in mouse models (Dölen *et al*., 2007) [89]. Further elucidating the relationship between *FMR1*/FMRP and these ASD candidates is clearly an important next step in maximizing the impact of these findings. These points are discussed further in the section below.

Collectively, all four of these exome sequencing studies converge upon the conclusion that ASD is highly heterogeneous, with several hundred or more loci harboring potential risk variants. Simulations by the Neale *et al*. group confirm that a scenario involving hundreds of highly penetrant variants is statistically implausible, and a model where *de novo* variants in up to 20% of cases confer ~10- to 20-fold increased risk is supported. The studies also converge on the conclusion that paternal age (and possibly maternal age) is a significant ASD risk factor, but the frequency and size of *de novo* mutations *per se* are not. Evidence for three candidate genes (*CHD8*, *KATNAL2*, and *SCN2A*) would seem quite strong, though further functional studies are needed to help define pathogenesis. Perhaps most exciting is the association between *GRM5* and existing/novel candidates. As we have learned from GWAS, larger sample sets are clearly needed to fully harness the power of NGS in relation to such a complex phenotype. While these studies have been important in proposing novel candidates and confirming existing hypotheses of ASD, we await with anticipation results from the sequencing of all 2,648 families from the Simons Simplex Collection.

(n=300). It is likely that drugs targeting the mGluR5 pathway, if/when approved for fragile X syndrome, will lead to human clinical trials for ASD. This translational approach, which delineates a direct route from gene discovery through functional validation to treatment, is clearly the blueprint by which genome research can have tangible clinical impact.

**5. Conclusions**

ASDs are clearly highly heritable disorders, and advances in gene-finding technology in the past decade have rapidly accelerated gene discovery. As is typically the case, successive developments have made the problem more complex, such that there are huge numbers of candidate genes, most of which remain to be replicated. In spite of this complexity, we can observe a number of patterns beginning to unfold: 1) the relative scarcity of causal common variants, 2) the growing list of causal rare variants, and 3) the emergence of monogenic disorders with primary and secondary ASD phenotypes.

The monogenic autisms are particularly interesting from a treatment perspective, as they provide a mechanism for studying ASD phenotypes in model systems and are an obvious target for drug intervention. They are also amenable to clinical testing, and the decreasing cost of research technologies means that this capacity is more widely available to clinicians. In fact, as the resolution of clinical instruments becomes more sophisticated, it is likely that the clinic will become a primary workplace for syndromic discovery.

A key requirement in driving gene discovery is high-quality phenotype data. ASDs are notoriously heterogeneous, and are fractionated in terms of symptoms and trajectory. Mandy & Skuse (2008) [97] reviewed seven factor analysis studies of ASD symptoms, and found that all but one dissociated social and non-social factors. In a nonclinical sample of 3,000 twin pairs, Happé *et al.* (2006) [98] examined autistic-like traits and found consistently low correlations (r = 0.1-0.4) between each of the core deficits on the autism spectrum. Endophenotypes, sub-components or sub-processes of the broader phenotype, may provide a productive avenue to disentangling some of this complexity. By filtering out all but a few discrete measures, we can theoretically increase the signal-to-noise ratio in genotype-phenotype associations. A number of endophenotypes for ASD have been associated with disease genes, including head circumference (associated with the *HOXA1* A218G polymorphism, Conciatori *et al.*, 2004) [99]; age at first word (associated with a quantitative trait locus on 7q35, Alarcón *et al.*, 2005) [100]; delayed magnetoencephalography evoked responses to auditory stimuli (Roberts *et al.*, 2010) [101]; and enhanced perception (Mottron *et al.*, 2006) [102]. The endophenotype approach is arguably more consistent with rare-/mono-genic discovery, where a mutated network may not yield a diagnosis of autism *per se*, but may nevertheless cause associated abnormalities. Note that this approach does not diminish the pleiotropic effects of genes involved in neurodevelopment, and only serves to make the point that the relevant genotype may associate with some but not all ASD features.

The converse, of course, is also true, as a large number of candidate genes contribute to the majority of known ASD. With ~80% of genes expressed in the brain, it is likely that this num‐
