**6. Sequence analysis of ancestral haplotypes**

The challenge in terms of sequence analysis is to compile a sufficient matrix to be able to recognize each haplotype and its extent. Assume access to multigenerational families with accurate, truly phased but unmolested raw sequences of at least 100,000 bases.


Given NGS, this approach is now feasible, even if daunting.

the first row. Nucleotides of AH 62.1, 7.1, 44.1\*, 8.1 and 18.2\* are given only where they differ from AH 44.2 and otherwise marked with a dot. Missing nucleotides are marked with a dash and shaded grey. The sequences are described by Horton et al. [24], whereas AH haplotypes have been assigned from the HLA allele types given by Horton, according to Cattley [35].

The degree of conservation of each ancestral haplotype is truly remarkable. For example, Smith et al. [32] found variation at only 11 of 3,600,000 positions between HLA-A and DR. Similar findings have been reported by others, including Aly et al. [31], see Figure 7. Mutation and recombination must be suppressed.

[Figure 7 image: SNP alleles along the MHC with HLA-A, HLA-B and DRB1 marked, telomere (left) to centromere (right). Upper group: individuals homozygous for AH 8.1. Lower group: individuals homozygous for HLA DR3 but not HLA B8.]

**Figure 7.** Remarkable conservation within 8.1 haplotypes. A total of 656 SNPs spanning 4.8 Mb in the MHC region are depicted. The lower frequency allele (row) for each SNP along each haplotype column is highlighted in yellow. The top group depicts SNP results from 8.1 AH haplotypes (*n* = 31), the lower group are HLA-DR3, non-B8 haplotypes (*n* = 13). The 29.9 Mb range between HLA and DRB1 was >99.9% conserved, with only 9 variant alleles of the 10,768 alleles identified for the 384 SNPs in the 31 8.1 AHs. Adapted from ref. [31].

Figure 7 illustrates the importance of interpreting nucleotide diversity according to the block structure of the genome. Thus, conservation in the intervening, essentially monomorphic regions, is of minor interest, whereas differences within PFB allow the discovery of evolution, function and disease susceptibility.

The inescapable conclusion is that some parts of the genome have *not* two or three but hundreds of alternative ancestral sequences.


Importantly, those regions which are complex because of duplications and indels should be included rather than "corrected" based on the assumption that there is a single reference or "wild" sequence. Some examples are shown in Figure 6.
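The difference-only display described in the haplotype comparison caption above (nucleotides shown only where they differ from the reference haplotype, a dot where they agree, a dash where a nucleotide is missing) can be sketched in a few lines of Python. This is a minimal illustration, not the method used to prepare the figure: the function name `dot_notation` and the short toy sequences are assumptions introduced here, and the inputs are assumed to be already aligned to equal length. Missing positions are passed through as dashes rather than filled in from a reference, in keeping with the point that complex regions should not be "corrected".

```python
def dot_notation(reference: str, other: str) -> str:
    """Render `other` relative to `reference`.

    '.' marks agreement with the reference, '-' marks a missing
    nucleotide, and any other character is a difference shown as-is.
    Both sequences are assumed to be pre-aligned (hypothetical input).
    """
    if len(reference) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    rendered = []
    for ref_base, base in zip(reference, other):
        if base == "-":
            rendered.append("-")   # keep missing data visible, do not impute
        elif base == ref_base:
            rendered.append(".")   # identical to the reference haplotype
        else:
            rendered.append(base)  # show the differing nucleotide
    return "".join(rendered)


# Toy sequences, invented purely for illustration.
print(dot_notation("ACGTACGT", "ACGAAC-T"))  # ...A..-.
```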

In designing better algorithms [36], the strategy for comparative analysis will be crucial. In many polymorphic regions, the density of differences can be as high as 1 per 10 bases when different haplotypes are compared but as low as 0 if the haplotypes are the same. It follows that analysis without haplotype assignment will be misleading.
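The effect of haplotype assignment on the apparent density of differences can be illustrated with a small Python sketch. It is only a toy example under stated assumptions: the function names (`difference_density`, `densities_by_assignment`), the sample names and the 20-base sequences are all invented for illustration, and the sequences are assumed to be phased and aligned. Only the haplotype labels 8.1 and 7.1 are borrowed from the text.

```python
from itertools import combinations


def difference_density(seq_a: str, seq_b: str) -> float:
    """Fraction of comparable positions at which two aligned sequences differ.

    Positions where either sequence has '-' (missing) are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return differences / compared


def densities_by_assignment(samples: dict) -> dict:
    """Group pairwise densities by whether the two haplotype labels match.

    `samples` maps a sample name to a (haplotype label, sequence) pair.
    """
    result = {"same": [], "different": []}
    for (_, (hap_a, seq_a)), (_, (hap_b, seq_b)) in combinations(samples.items(), 2):
        key = "same" if hap_a == hap_b else "different"
        result[key].append(difference_density(seq_a, seq_b))
    return result


# Hypothetical phased sequences; two differences in 20 bases is 1 per 10.
samples = {
    "sample1": ("8.1", "ACGTACGTACGTACGTACGT"),
    "sample2": ("8.1", "ACGTACGTACGTACGTACGT"),
    "sample3": ("7.1", "ACGTTCGTACGAACGTACGT"),
}
result = densities_by_assignment(samples)
print(result)
```

In this toy data the pairs sharing a haplotype differ at no position, while pairs of different haplotypes differ at 1 in 10. Averaging all pairs without regard to assignment would report an intermediate figure that describes neither group.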
