**5. An application of SeqAnt 2.0: Discovering new mutations from forward genetic screens in the mouse**

Forward genetic screens in *Mus musculus* have been very informative, revealing unsuspected mechanisms governing basic biological processes [27-32]. In this approach, a potent chemical mutagen, such as *N*-ethyl-*N*-nitrosourea (ENU), is used to randomly induce mutations in mice. The mice are then bred and phenotypically screened to identify lines in which a specific biological process of interest is disrupted. Although identifying the causative mutation using the rich resources of mouse genetics is straightforward, it is unfortunately neither fast nor cheap.

To solve this problem, we developed a methodology that combines multiplex chromosome-specific exome capture, next-generation sequencing, rapid mapping, sequence annotation, and variation filtering to detect newly induced causal variants far more quickly [33]. Rapid sequence annotation and variation filtering are critical to this approach. We used SeqAnt as part of this methodology to annotate, in a single experiment, the variations obtained from mutant, parental, and background strains. Using SeqAnt, we first annotated all the variants into different functional classes. Next, by comparing variants identified in mutant offspring to those found in dbSNP, the unmutagenized background strains, and parental lines, we could immediately distinguish the induced putative causative mutations from preexisting variations or experimental artifacts (Table 6).
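The comparison step described above amounts to a set subtraction: any variant in the mutant that also appears in dbSNP, a background strain, or a parental line is discarded. The following is a minimal sketch of that idea, not SeqAnt's implementation; the `Variant` record, the `filter_induced` function, and the toy coordinates are all hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not SeqAnt's format)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def filter_induced(mutant, *known_sets):
    """Return mutant variants that are absent from every known set."""
    # Pool all preexisting variation (e.g. dbSNP, background, parental).
    known = set().union(*known_sets)
    # Keep only variants unique to the mutant, in a stable sorted order.
    return sorted(v for v in set(mutant) if v not in known)


# Toy data: four calls in the mutant, three of them already known.
mutant = {
    Variant("chr11", 1000, "A", "G"),
    Variant("chr11", 2000, "C", "T"),
    Variant("chr11", 3000, "G", "A"),
    Variant("chr11", 4000, "T", "C"),
}
dbsnp = {Variant("chr11", 1000, "A", "G")}
background = {Variant("chr11", 2000, "C", "T")}
parental = {Variant("chr11", 3000, "G", "A")}

candidates = filter_induced(mutant, dbsnp, background, parental)
print(candidates)
```

Keying each variant on chromosome, position, and both alleles means that a different substitution at a known position is still retained as a candidate.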


sequence variants, one of which was at an evolutionarily conserved site. Subsequent functional testing suggested that the variant at the conserved site acts to influence the level of *AFF2* expression. Thus, for this experiment, SeqAnt allowed us to rapidly focus on those sites of greatest interest for both statistical analyses and direct functional testing.

**Figure 8. Summary of SNV and indel variation discovered at the** *AFF2* **locus in males with ASD.** The frequency of SNVs and indels (minor alleles) in cases is plotted against their level of evolutionary conservation. Most common variation has already been discovered and exists in public databases (blue; circles and diamonds). Most of the rare variation at *AFF2* was discovered in our study and not contained in public databases (red; circles and diamonds).


**Table 6.** Results of filtering homozygous variant sites for each mouse mutant line sequenced.

We demonstrated this approach by finding the causative mutations in four novel ENU lines identified in a recent ENU screen. In all four cases, after applying our method and combining it with the standard mapping data used to initially localize the variant to a chromosome, we found two or fewer putative mutations (and sometimes only a single one). Confirming that the variant was in fact causative was then easily achieved via standard segregation approaches. SeqAnt gave us the ability to rapidly annotate and screen out variants of lesser interest (silent, UTR, intronic, intergenic), so that we could focus our attention on the replacement variants, which were most likely to account for the mutant phenotype.
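The prioritization described above can be sketched as a simple filter over annotated calls: restrict to the chromosome implicated by mapping, keep homozygous sites, and drop the functional classes of lesser interest. This is an illustrative sketch only. The `AnnotatedVariant` record, the `prioritize` function, and the class labels are assumptions modeled on the terms used in the text, not SeqAnt's actual output format.

```python
from dataclasses import dataclass

# Functional classes treated as lower priority in the text above.
LOW_PRIORITY = {"silent", "UTR", "intronic", "intergenic"}


@dataclass(frozen=True)
class AnnotatedVariant:
    """Hypothetical annotated call (field names are assumptions)."""
    chrom: str
    pos: int
    functional_class: str
    homozygous: bool


def prioritize(variants, linked_chrom):
    """Keep homozygous, higher-priority variants on the mapped chromosome."""
    return [
        v for v in variants
        if v.chrom == linked_chrom
        and v.homozygous
        and v.functional_class not in LOW_PRIORITY
    ]


# Toy data: only the first call passes all three conditions.
calls = [
    AnnotatedVariant("chr11", 4000, "replacement", True),
    AnnotatedVariant("chr11", 5200, "intronic", True),
    AnnotatedVariant("chr11", 6100, "replacement", False),
    AnnotatedVariant("chr4", 7300, "replacement", True),
]

shortlist = prioritize(calls, linked_chrom="chr11")
print(shortlist)
```

Excluding the low-priority classes, rather than keeping only `"replacement"`, lets any class not listed in `LOW_PRIORITY` remain on the shortlist for manual review.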


| Gene | Location | Variants | Type | Position | Function | Frequency in VEO CD Patients | Frequency in Control Population |
|---|---|---|---|---|---|---|---|
| CSF2RA | chrX (p22.33) | 0 | - | - | GM-CSF signaling | - | - |
| CSF2RB | chr22 (q12.3) | 1 | SNP | 37331455 | GM-CSF signaling | 0.02 | 0.0024 |
| CYBB | chrX (p11.4) | 1 | SNP | 37663322 | oxidative burst | 0.02 | 0.0032 |
| DUOX1 | chr15 (q21.1) | 2 | SNP; SNP | 45448069; 45431655 | enterocyte, H2O2 | 0.02; 0.02 | 0.0003; 0.0002 |
| DUOX2 | chr15 (q21.1) | 1 | Indel | 45393428-30 | enterocyte, H2O2 | 0.02 | - |
| FCGR1A | chr1 (q21.2) | 0 | - | - | phagocytosis | - | - |
| FCGR2A | chr1 (q23.3) | 0 | - | - | phagocytosis | - | - |
| FCGR2B | chr1 (q23.3) | 0 | - | - | phagocytosis | - | - |
| FCGR3A | chr1 (q23.3) | 0 | - | - | phagocytosis | - | - |
| FCGR3B | chr1 (q23.3) | 0 | - | - | phagocytosis | - | - |
| IL27RA | chr19 (p13.12) | 1 | Indel | 14159807 | IL-27 signaling | 0.02 | - |
| JAK2 | chr9 (p24.1) | 0 | - | - | GM-CSF signaling | - | - |
| MPO | chr17 (q22) | 0 | - | - | bacterial killing | - | - |
| NCF1 | chr7 (q11.23) | 0 | - | - | oxidative burst | - | - |
| NCF2 | chr1 (q25.3) | 0 | - | - | oxidative burst | - | - |
| NCF4 | chr22 (q12.3) | 1 | SNP | 37273825 | oxidative burst | 0.02 | 0.0001 |
| NLRP12 | chr19 (q13.42) | 0 | - | - | chemotaxis | - | - |
| NOS2 | chr17 (q11.2-q12) | 3 | Indel; Indel; Indel | 26087106; 26096042; 26085975-76 | reactive nitrogen intermediates | 0.2; 0.2; 0.57 | -; -; - |
| NOX1 | chrX (q21.1) | 0 | - | - | oxidative burst | - | - |
| NOX3 | chr6 (q25.3) | 0 | - | - | oxidative burst | - | - |
| NOX4 | chr11 (q14.3) | 2 | SNP | 89088208 | oxidative burst | 0.02 | - |
