We used SeqAnt to annotate all sequence variations from the 45 exomes and identified a total of 60,682 variant sites of interest in coding regions (54,313 replacement SNPs and 2,953 indels covering 6,369 bases). For our exploratory genome-wide analysis of SNPs, we restricted the analysis to variants with phyloP scores greater than 2.0, which corresponds to the top 1% of conserved sites in the human genome. This left 12,575 variants, of which 51% (6,490) were not cataloged in dbSNP 132 and might constitute novel mutations contributing to early-onset IBD. We then restricted our analysis to 33 neutrophil genes. Table 7 lists these 33 neutrophil genes with the number of rare putative functional variants (replacement SNPs or exonic indels) found in each. These variants will be followed up with direct functional assays. Again, SeqAnt enabled us to rapidly annotate all variants, set aside those of lesser interest, and focus our attention on those most likely to contribute to VEO CD in our sequenced patients.

**Table 7.** Genetic variants found in genes that regulate neutrophil function.

| Gene | Location | Variants | Type | Position | Function | Frequency in VEO CD Patients | Frequency in Control Population |
|---|---|---|---|---|---|---|---|
| | | | SNP | 89182666 | | 0.02 | 0.0022 |
| NOX5 | chr15 (q23) | 0 | - | - | oxidative burst | - | - |
| PRAM1 | chr19 (p13.2) | 1 | Indel | 8564497-500 | adhesion | 0.02 | - |
| RAC1 | chr7 (p22.1) | 0 | - | - | oxidative burst | - | - |
| RAC2 | chr22 (q12.3) | 0 | - | - | oxidative burst | - | - |
| SELPLG | chr12 (q24.11) | 1 | SNP | 109017468 | adhesion | 0.11 | - |
| SLC11A1 | chr2 (q35) | 2 | Indel | 219254723 | bacterial killing | 0.02 | - |
| | | | SNP | 219247739 | | 0.02 | |
| STAT3 | chr17 (q21.2) | 2 | SNP | 40477064 | IL-27 signaling | 0.02 | - |
| | | | SNP | 40481429 | | 0.02 | |
| STAT5A | chr17 (q21.2) | 1 | SNP | 40461109 | GM-CSF signaling | 0.02 | - |
| STAT5B | chr17 (q21.2) | 0 | - | - | GM-CSF signaling | - | - |
| VAV1 | chr19 (p13.3) | 0 | - | - | oxidative burst | - | - |
| VAV2 | chr9 (q34.2) | 0 | - | - | oxidative burst | - | - |
| VAV3 | chr1 (p13.3) | 0 | - | - | oxidative burst | - | - |

**7. Future directions**

Genomic sequence annotation requires an up-to-date and comprehensive database of DNA sequence information for a given organism. Our first aim is therefore to keep adding organisms to our database so that their genomic information can be annotated. We plan to include several other mammals, other vertebrates, invertebrates, and ultimately bacterial strains in the near future. This will give researchers a web application they can use to speed up genetic studies of these organisms. We are also updating the dbSNP information contained in the SeqAnt database.

Another area of future focus is to broaden the types of input and output files that SeqAnt can work with, while embracing standards in wide use in the bioinformatics community. We intend to add the capability to annotate .vcf files directly as a standard input format. At present, all our output files are either text files or BED files; we also plan to offer annotation output in .vcf format. Furthermore, we intend to modify SeqAnt to generate .map and .ped files (PLINK formats) from the SNP variant file, which will be useful for substructure analysis and the other analyses that PLINK supports.
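To make the planned format support concrete, the following is a minimal sketch of how SNP calls in a .vcf file could be turned into PLINK .map and .ped records. It is not SeqAnt code: the function name `vcf_to_plink`, the `DEMO_VCF` toy data, and the choice to start from VCF text rather than SeqAnt's own SNP variant file are all assumptions made for illustration. The sketch keeps only single-base alleles, writes unknown values for genetic distance, parents, sex, and phenotype, and encodes missing genotypes as `0 0`.

```python
from typing import Dict, Iterable, List, Tuple

BASES = {"A", "C", "G", "T"}

# Toy input (hypothetical data, not taken from the study).
DEMO_VCF = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "1\t100\t.\tG\tA\t.\tPASS\t.\tGT\t0/1\t0/0",
    "1\t200\trs_demo\tC\tT\t.\tPASS\t.\tGT:DP\t1/1:20\t./.:0",
    "1\t300\t.\tGACT\tG\t.\tPASS\t.\tGT\t0/1\t0/0",  # indel, skipped
]


def vcf_to_plink(vcf_lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Convert SNP records in VCF text to PLINK .map and .ped lines."""
    samples: List[str] = []
    calls: Dict[str, List[str]] = {}
    map_lines: List[str] = []

    for raw in vcf_lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            calls = {s: [] for s in samples}
            continue

        chrom, pos, vid, ref, alt = fields[:5]
        alleles = [ref] + alt.split(",")
        if any(a not in BASES for a in alleles):
            continue  # keep single-base substitutions only

        if vid == ".":
            vid = f"{chrom}:{pos}"  # placeholder ID for unnamed variants
        chrom_code = chrom[3:] if chrom.startswith("chr") else chrom
        # .map columns: chromosome, variant ID, genetic distance, bp position
        map_lines.append(f"{chrom_code}\t{vid}\t0\t{pos}")

        gt_index = fields[8].split(":").index("GT")
        for sample, call in zip(samples, fields[9:]):
            gt = call.split(":")[gt_index].replace("|", "/").split("/")
            if len(gt) != 2 or "." in gt:
                calls[sample].append("0 0")  # missing genotype
            else:
                calls[sample].append(" ".join(alleles[int(i)] for i in gt))

    # .ped columns: FID, IID, father, mother, sex, phenotype, then alleles
    ped_lines = [
        " ".join([s, s, "0", "0", "0", "-9"] + calls[s]) for s in samples
    ]
    return map_lines, ped_lines


if __name__ == "__main__":
    map_out, ped_out = vcf_to_plink(DEMO_VCF)
    print("\n".join(map_out))
    print("\n".join(ped_out))
```

Writing the two returned lists to files that share a prefix (for example `study.map` and `study.ped`) gives the file pair that PLINK reads with `--file study`. A production version would also need to carry over sex and case/control status, which this sketch leaves as unknown.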

Including additional custom tracks from the UCSC browser to annotate conserved and putatively functional sites will also be a future area of SeqAnt development. Our hope is that this will improve the effectiveness of downstream functional analysis. We also plan to host the application in a cloud computing environment, side by side with other bioinformatics tools. This matters not only because of the wider accessibility it offers, but also because other tools in the same environment can then be used to generate and modify SeqAnt input and output files for further analysis.

SeqAnt was designed to be a dynamic application, and our improvements to the software make it possible to apply SeqAnt to a range of genomic variant analysis scenarios. Inevitable advances in sequencing technologies will spur continued demand for tools that can make sense of the enormous volumes of raw sequence data generated, and we will work continually to keep SeqAnt adaptable to these advances and to make it even more accessible to the wider public.
