teaching, speech and language therapy and social skills therapy. When behavioral treatment fails, many medications are used to treat ASD symptoms.

Figure 1 demonstrates the interaction of research and studies on autism spectrum disorders.

**Figure 1.** A puzzle-like representation of the interaction process of research and studies on autism spectrum disorders.

The advancement of technologies in the field of genetics gives researchers and scientists the opportunity to explore biological information in depth and to convert it into meaningful biological knowledge through computational-based models.

In this chapter, we investigate the genetic origins of autism and demonstrate the latest techniques and technologies available for diagnosing this complex disorder. We also propose a robust approach for detecting and identifying the targeted disorder that builds on the advantages and strengths of the publicly available and commercial approaches while avoiding their weaknesses. The proposed approach is divided into two steps. The preprocessing step is a feature-extraction method used to clearly map and detect the genetic variations and structural rearrangements, followed by a statistical-based model for feature selection that evaluates and measures the statistical and biological significance of the predicted variations. The classification step arranges the tested samples into groups and/or subgroups in order to discover the relationships among them and to provide insight into the complex pattern of the genome.

The results suggest that autism is associated with an increased number of alterations in unstable segments of the genome. The experimental results also show that using high-resolu‐

**2. Genetic data**

342 Recent Advances in Autism Spectrum Disorders - Volume I

#### **2.1. Genomic structural variations and ASD susceptibility**

Genetic alterations in the form of chromosomal rearrangements are genomic structural variations that lead to changes in the DNA copy number, such as duplications and deletions of DNA segments. However, copy number changes do not include other genomic structural variations such as inversions, insertions and reciprocal translocations. Figure 2 demonstrates different types of chromosomal rearrangements.

**Figure 2.** A schematic representation of types of chromosomal rearrangements [67].



Discovering the Genetics of Autism http://dx.doi.org/10.5772/ 53797 345


A set of chromosomal regions and genes that are implicated in ASD is listed in Table 1. Some of the regions are associated with known Mendelian syndromes. In some individuals affected with these syndromes, ASD occurs as a secondary diagnosis. In other regions and genes, the genetic variations causing ASD include a wide range of possibilities, each with very low frequency among the cases (rare variants). In some cases the rare variants are found only once in the population. In contrast to rare variants, in other chromosomal regions and genes only a few common genetic variations (common alleles) account for ASD susceptibility.

| Chromosome region | Gene | Phenotype |
| --- | --- | --- |
| 1q21.1 | NBPF9 | ASD, ID, SCZ, ADHD, EPI |
| 1q42.2 | DISC1 | SCZ, BPAD |
| 2p16.3 | NRXN1 | ASD, ID, language delay, SCZ |
| 2q31.1 | SLC25A12 | ASD |
| 3p13 | FOXP1 | ID, ASD, SLI |
| 3p25.3 | OXTR | ASD |
| 6q16.3 | GRIK2 | Recessive ID |
| 6q23.3 | AHI1 | Joubert syndrome |
| 7q11.23 | FKBP6/CLIP2 | ASD, ID, language delay |
| 7q22.1 | RELN | ASD |
| 7q31.1 | FOXP2 | SLI |
| 7q31.2 | MET | ASD, Diabetes II |
| 7q35-q36.1 | CNTNAP2 | Recessive EPI syndrome, ASD, ADHD, TS, OCD |
| 7q36.3 | EN2 | ASD |
| 9q34.13 | TSC1 | Tuberous Sclerosis type I |
| 10q23.31 | PTEN | Cowden disease\* |
| 11q13.3-q13.4 | SHANK2 | ASD, ID |
| 11q13.4 | DHCR7 | Smith-Lemli-Opitz syndrome |
| 12p13.33 | CACNA1C | Timothy syndrome |
| 12q14.2 | AVPR1A | ASD |
| 15q11-q13 | MAGEL2/NDN | ASD, EPI, ID |
| 15q11.2 | UBE3A | Angelman syndrome |
| 16p11.2 | VPS35/ORC6 | ASD, ADHD, ID, EPI, SCZ |
| 16p13.3 | TSC2 | Tuberous Sclerosis type II |
| 16p13.3 | A2BP1 | ID, ASD, EPI, SCZ, ADHD |
| 17q11.2 | NF1 | Neurofibromatosis |
| 17q11.2 | SLC6A4 | ASD, OCD |
| 17q12 | ACCN1/PNMT | ASD, SCZ, EPI |
| 17q21.32 | ITGB3 | ASD |
| 22q11.21 | | DiGeorge syndrome, SCZ, ASD, ID, BPAD |
| 22q13.33 | SHANK3 | ASD, Phelan McDermid syndrome\*\* |
| Xp21.2 | DMD | Duchenne muscular dystrophy |
| Xp21.3 | ARX | LIS, XLID, EPI, ASD |
| Xp22.11 | PTCHD1 | ASD, ID |
| Xp22.13 | CDKL5 | X-linked infantile spasm syndrome |
| Xp22.32-p22.31 | NLGN4X | ASD, ID, TS, ADHD |
| Xq13.1 | NLGN3 | ASD |
| Xq27.3 | FMR1 | Fragile X syndrome |
| Xq28 | MECP2 | Rett syndrome |

The source table groups its rows under three headings: **Mendelian Syndromes**, **Rare Variants** and **Common Alleles**.

**Table 1.** Chromosomal regions and genes that are implicated in risk for ASD, and associated genetic disorders and syndromes [68, 69]. Abbreviations: LTD, long-term depression; LTP, long-term potentiation; PPI, prepulse inhibition; E/I, excitatory/inhibitory; PSD, postsynaptic density; ASD, autism spectrum disorders; SCZ, schizophrenia; ADHD, attention deficit hyperactivity disorder; ID, intellectual disability; XLID, X-linked intellectual disability; LIS, lissencephaly; EPI, epilepsy; OCD, obsessive compulsive disorder; TS, Tourette syndrome; SLI, speech and language impairment; USV, ultrasonic vocalization; TF, transcription factor; ECM, extracellular matrix; GPCR, G-protein-coupled receptor; BPAD, bipolar affective disorder. \*A rare autosomal dominant inherited disorder characterized by multiple tumor-like growths, increased risk of certain forms of cancer, and diverse clinical features including neurologic features such as autism and Lhermitte Duclos disease [39, 40]. \*\*A genetic syndrome caused by disruption of the SHANK3 gene, which codes for the shank3 protein. The protein's most important role is in the brain, where it is involved in processes crucial for learning and memory and has an important role in brain development. The syndrome is also known as 22q13.3 deletion syndrome and is highly associated with autism. Human (Homo sapiens) Genome Browser Gateway, http://genome.ucsc.edu/cgi-bin/hgGateway.

#### **2.2. Data Generating**

Figure 3 illustrates the process of generating DNA copy number data using microarray-based comparative genomic hybridization (array CGH) technology.

**Figure 3.** Principles of the aCGH technology. (a) DNA from the sample to be tested and reference DNA are labeled with a green fluorescent dye (Cy3) and a red fluorescent dye (Cy5), respectively, and competitively co-hybridized to an array containing genomic DNA targets that have been spotted on a glass slide. The resulting ratio of the fluorescence intensities is proportional to the ratio of the copy numbers of DNA sequences in the test and reference genomes, measured in a logarithmic scale. (b) The slides are scanned using a specific microarray scanner shown in (c). (d) The output of the scanning process is the ratio of the fluorescence intensities for each spot, represented as a point in the relative copy number profile [66].

#### **2.3. Data Modeling**

As illustrated in Figure 3, aCGH technology is an experimental approach for genome-wide scanning of differences in DNA copy number (DCN) between samples. It provides a high-resolution method to map and measure relative changes in DCN simultaneously at thousands of genomic loci. In a biological experiment, unknown (test) and reference (normal) DNA samples are labeled with the fluorescent dyes Cy3 and Cy5, respectively. They are then combined and competitively co-hybridized to an array containing genomic DNA targets that have been spotted on a glass slide. For a given genomic location, the resulting ratio of the fluorescence intensities is proportional to the ratio of the copy numbers of DNA sequences in the test and reference genomes, measured in a logarithmic scale. These intensity ratios are informative about DNA copy number changes: a positive log ratio indicates a duplication (gain), a negative log ratio indicates a deletion (loss), and a ratio near zero indicates the normal state. Because of the logarithmic scale and the performance of the probes, the data can be approximated as a piecewise function of short and long intervals, with different intensity levels, that are not equally spaced along the genome. Moreover, microarray experiments suffer from many sources of error due to human factors, array printer performance, labeling, and hybridization efficiency.
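The interpretation of the log ratios described above can be made concrete with a small simulation. The following is a minimal sketch, not the procedure used in this chapter: it assumes a diploid reference (so three copies give log2(3/2) and one copy gives log2(1/2) = -1), additive Gaussian measurement error, and an arbitrary calling threshold of 0.3. The function names `simulate_profile` and `call_state`, the segment lengths and the noise level are all hypothetical choices made for illustration.

```python
import numpy as np

def simulate_profile(segments, sigma, seed=0):
    """Build a piecewise-constant log2-ratio profile and add Gaussian noise.

    segments: list of (length, level) pairs, where level is the ideal
    log2(test/reference) value of that segment.
    """
    rng = np.random.default_rng(seed)
    x = np.concatenate([np.full(length, level) for length, level in segments])
    y = x + rng.normal(0.0, sigma, size=x.size)
    return x, y

def call_state(log_ratio, threshold=0.3):
    """Label each probe as gain, loss or normal from its log2 ratio.

    The threshold is an assumed value, not one taken from the chapter.
    """
    return np.where(log_ratio > threshold, "gain",
                    np.where(log_ratio < -threshold, "loss", "normal"))

# Idealized levels for a diploid reference (2 copies):
# 3 copies -> log2(3/2), 1 copy -> log2(1/2) = -1, 2 copies -> 0.
segments = [(200, 0.0), (50, np.log2(3 / 2)), (300, 0.0), (20, np.log2(1 / 2))]
x, y = simulate_profile(segments, sigma=0.2)

true_states = call_state(x)   # labels from the noise-free profile
noisy_states = call_state(y)  # labels from the noisy measurements
print("probes:", x.size)
print("fraction mislabeled by per-probe thresholding:",
      np.mean(true_states != noisy_states))
```

Thresholding each probe independently mislabels a share of the probes once noise is added, which is one way to see why the measurements need to be smoothed or segmented before copy number changes are called.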

According to the properties of the data generated by microarray technology, the DCN profile of a cell line can be approximated as a one-dimensional piecewise constant (PWC) discrete-time signal contaminated with some error. The genetic data generated by the aCGH technology can be modeled as follows:

$$y[n] = x[n] + \varepsilon_n, \qquad n = 1, 2, \dots, N, \tag{1}$$

where *y*[*n*] is the contaminated genetic signal and *x*[*n*] is the true value of the genetic variation to be estimated at genomic location *n*, for a signal of length *N*. The error *ε*<sub>*n*</sub> is modeled as additive white Gaussian noise with zero mean and some variance σ<sup>2</sup>.

Figure 4 illustrates genetic data of the form described in (1): DNA copy number data generated by aCGH technology, in which 4 variant segments are presented with different intensity levels.

**Figure 4.** Graphical representation of the data generated using aCGH technology. The red stars represent the raw data as described in (1). The grey solid line represents the true value of the 4 variant segments that need to be estimated, with intensity levels *A*<sub>*i*</sub> measured in log2(ratio) and bounded by the breakpoints *n*<sub>*i*-1</sub> and *n*<sub>*i*</sub>, respectively.

**3. Methods**

#### **3.1. Data Filtering**

Although recent advancements in microarray and sequencing technologies now make it easy to measure genetic variations at high resolution by scanning large numbers of samples, small changes, particularly at low copy repeat (LCR) regions, remain difficult to detect due to different noise conditions. Thus, the challenging problem is to differentiate between the true biological signal and the measurement noise.

Various methods have been proposed as preprocessing techniques to tackle this problem. These methods have been motivated by either well-known signal processing techniques or statistical-based models.

| Method | Computational complexity |
| --- | --- |
| **Smoothing techniques** | |
| Sigma filtering (Alqallaf et al., 2007) | *O*(*N*) |
| Smoothing and edge detection (Huang et al., 2004) | *O*(*N*) |
| Wavelets (Hsu et al., 2005) | *O*(*N* log *N*) |
| **Statistical-based models** | |
| Circular binary segmentation (Olshen et al., 2004) | *O*(*N*<sup>2</sup>) |
| Hidden Markov models (Fridlyand et al., 2004) | *O*(*C*<sup>2</sup>*N*) |
| Sparse Bayesian learning (Pique-Regi et al., 2008) | *O*(*N* log *N*) |

**Table 2.** Comparison of the computational complexity of the proposed denoising techniques.

In Table 2, we present a comparison based on the computational cost of the most recent and successful approaches. The smoothing techniques are well suited to processing very large amounts of data, such as genetic signals, compared to the statistical-based models. However, these techniques include important features, such as the boundaries of the variant regions, in the smoothing process.

Here we present our previously proposed method, the Sigma filter (SF) (Alqallaf et al., 2007). It is a nonlinear method used as a feature-extraction step to detect the edges of the variant segments and
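As a rough illustration of edge-preserving smoothing on data of the form (1), the sketch below implements the classic sigma filter from the image-processing literature (usually attributed to Lee). It is not necessarily the variant proposed in Alqallaf et al., 2007. It assumes the noise standard deviation `sigma` is known, and the window half-width `half_window` and the multiplier `k` are illustrative parameters chosen here, not values from the chapter.

```python
import numpy as np

def sigma_filter(y, sigma, half_window=10, k=2.0):
    """Classic (Lee-style) sigma filter for a 1-D signal.

    Each sample is replaced by the mean of the neighbours inside the window
    whose values lie within k * sigma of the sample itself, so samples on
    the other side of a large step are excluded from the average.
    """
    y = np.asarray(y, dtype=float)
    out = np.empty_like(y)
    for n in range(y.size):
        lo = max(0, n - half_window)
        hi = min(y.size, n + half_window + 1)
        window = y[lo:hi]
        keep = np.abs(window - y[n]) <= k * sigma
        out[n] = window[keep].mean()  # keep always contains y[n] itself
    return out

# Toy piecewise-constant signal with one gained segment, plus Gaussian noise.
rng = np.random.default_rng(1)
x = np.concatenate([np.zeros(100), np.full(40, 1.0), np.zeros(100)])
sigma = 0.1
y = x + rng.normal(0.0, sigma, size=x.size)
x_hat = sigma_filter(y, sigma)

print("RMS error before filtering:", np.sqrt(np.mean((y - x) ** 2)))
print("RMS error after filtering: ", np.sqrt(np.mean((x_hat - x) ** 2)))
```

With a fixed window size, the cost of this sketch grows linearly with the signal length *N*. Because neighbours that differ from the current sample by more than `k * sigma` are left out of the average, a step that is large relative to the noise is not blurred the way it would be by a plain moving average.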
