**5. Sequencing of critical genomic regions**

regulates expressed genes by *cis, trans* or *epistatic* interaction. The whole sequence is conserved. Linkage disequilibrium, when it occurs, is simply a reflection of this conservation, which includes haplotypes with alleles that are relatively common in one haplotype when compared with others. Each haplotype is ancestral, in the sense that it is shared by apparently unrelated families separated by hundreds or even thousands of generations. It follows that the polymorphisms are actively conserved and could not be a consequence of recent mutation.

Left panel: Population Genetics. Right panel: Quantal Genomics.

**Figure 4.** Importance of clustering functional genes. Colours represent loci and numbers represent alleles at those loci. On the left is the basis of the infinitesimal model used in population genetics. Loci are biallelic and can be homozygous or heterozygous. Free recombination occurs between loci and alleles segregate independently. On the right, loci are within polymorphic frozen blocks (PFB), shown by alignment of loci. Alleles within PFB segregate *en bloc,* forming haplotypes, which are inherited intact through many generations. Important genes are carried within PFB, conserving their *cis* interactions. Loci within PFB have multiple alleles, allowing for a greater degree of polymorphism clustered within the block. There can be hundreds of ancestral haplotypes for each PFB. *Trans* interactions between haplotypes increase the diversity expressed in the population. The loci shown in green and yellow are outside the PFB and follow a pattern of inheritance similar to that assumed in population genetics. *De novo* mutations are indicated by an asterisk. On the right, the mutations occur at loci outside of conserved PFB and will have little if any consequence because truly important differences are encoded within PFB. Monogenic diseases or traits are partial exceptions. On the left, mutations can occur at any locus but are generally assumed to occur at loci that were monoallelic. They may or may not be important, depending upon frequency, context, repair and heritability. Adapted with permission from ref. [22].

By 1987, it was clearly established that each ancestral haplotype has a specific content of genomic features such as duplications and indels. These too are actively conserved and can themselves be used as signatures for haplotypes of hundreds of kilobases and even megabases. These observations were very difficult to explain in terms of any form of neo-Darwinism, natural selection, random errors or population genetics as taught then and today. Rather, we realised, the genome is not actually homogeneous but partitioned into protected quanta or PFB [17, 22, 26, 29].

Some of the implications are illustrated in Figures 4 and 5.
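The contrast drawn in Figure 4 between independent segregation and *en bloc* inheritance can be made concrete with a small enumeration. The following is only a minimal sketch: the function names and the four-locus parent are hypothetical, and recombination within the block is assumed to be completely absent.

```python
from itertools import product


def gametes_free(parent):
    """Enumerate gametes when every locus segregates independently.

    `parent` is a list of (allele_1, allele_2) pairs, one pair per locus.
    """
    return sorted(set(product(*parent)))


def gametes_en_bloc(parent):
    """Enumerate gametes when all loci sit in one block inherited intact."""
    haplotype_1 = tuple(first for first, _ in parent)
    haplotype_2 = tuple(second for _, second in parent)
    return sorted({haplotype_1, haplotype_2})


# Hypothetical parent, heterozygous at four loci (allele numbers are arbitrary).
parent = [(1, 2), (3, 4), (5, 6), (7, 8)]

free = gametes_free(parent)
bloc = gametes_en_bloc(parent)
print(len(free), "gametes with free recombination")
print(len(bloc), "gametes when the block is frozen:", bloc)
```

With free recombination the number of possible gametes doubles with each heterozygous locus, whereas a frozen block transmits only the two parental haplotypes, whatever the number of loci it contains.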

352 Next Generation Sequencing - Advances, Applications and Challenges

By 1992, there was sufficient sequencing to confirm the earlier prediction that each ancestral haplotype is actually a frozen sequence.
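Table 3 writes geometric elements in a compact repeat notation such as (TC)14. As a rough illustration, the sketch below expands that notation into a nucleotide string so that its length can be checked against the tabulated value. The grammar is an assumption: "(UNIT)n" is read as n tandem copies of UNIT and bare letters as single copies. The function name `expand_element` is hypothetical.

```python
import re

# "(UNIT)n" or a bare run of nucleotides; assumed grammar, see lead-in.
TOKEN = re.compile(r"\(([ACGT]+)\)(\d+)|([ACGT]+)")


def expand_element(notation):
    """Expand compact repeat notation, e.g. "(TC)12(TG)6", into a sequence."""
    compact = notation.replace(" ", "")
    parts = []
    pos = 0
    while pos < len(compact):
        match = TOKEN.match(compact, pos)
        if match is None:
            raise ValueError(f"cannot parse {compact[pos:]!r}")
        unit, count, bare = match.groups()
        parts.append(unit * int(count) if unit else bare)
        pos = match.end()
    return "".join(parts)


# CL1 elements as listed in Table 3.
cl1 = {
    "57.1": "(TC)12(TG)6(TC)14(TG)3(TC)12",
    "18.2": "(TC)14",
    "8.1": "(TC)28",
}
lengths = {name: len(expand_element(element)) for name, element in cl1.items()}
print(lengths)
```

Under this reading the expanded CL1 elements come to 94, 28 and 56 bp, which agrees with the lengths tabulated for those haplotypes.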


| Haplotype | Geometric element at CL1 | Length | Geometric element at CL2 | Length |
|---|---|---|---|---|
| **57.1** | (TC)<sub>12</sub>(TG)<sub>6</sub>(TC)<sub>14</sub>(TG)<sub>3</sub>(TC)<sub>12</sub> | 94 | TA (TC)<sub>18</sub> TT (TC)<sub>6</sub> TG (TC)<sub>8</sub> TG (TC)<sub>5</sub> | 96 |
| **7.1** | (TC)<sub>12</sub>(TG)<sub>6</sub>(TC)<sub>14</sub>(TG)<sub>3</sub>(TC)<sub>12</sub> | 94 | (TC)<sub>14</sub> TG (TC)<sub>6</sub> TG (TC)<sub>8</sub> TG (TC)<sub>5</sub> | 94 |
| **18.2** | (TC)<sub>14</sub> | 28 | Deleted | |
| **8.1** | (TC)<sub>28</sub> | 56 | (TC)<sub>15</sub> TG (TC)<sub>9</sub> | 58 |

Adapted from ref. [30].

**Table 3.** Haplospecific geometric elements. Ancestral haplotypes have specific sequence signatures at each of the duplicons. Note that in 18.2 the duplication did not occur or has been deleted.

We now know that examples of the 8.1 ancestral haplotype are almost identical over megabases [31, 32].

We illustrate the differences between haplotype sequences in Figure 6. It can be seen that there are certain sites where haplotypes differ. Importantly, haplospecificity is conferred by the whole sequence rather than by single nucleotide polymorphisms. For example, reading from left to right, 8.1 and 18.2 differ in T/G but not A/G, etc. Note also that some of the differences are due to indels. Of critical importance is accurate, unmolested sequencing over kilobases, as is now possible through NGS. It is clear, however, that assembly is hazardous, especially in areas of duplication and polymorphism. Note also that there is no justification for regarding *one* particular sequence as the reference. Rather, it is necessary to compare each output with a library of known sequences within each PFB.

The number of differences depends on which haplotypes are compared (see Table 4). Two of the most common Caucasian haplotypes, 8.1 and 7.1, differ at about a hundred positions, representing approximately 1% nucleotide diversity. The most different haplotypes are 18.2 and 7.1, having 2.5% nucleotide diversity. Interestingly, these haplotypes also differ functionally; 18.2 permits insulin-dependent diabetes mellitus whereas 7.1 is protective.

| AH Haplotype | 44.2 | 62.1 | 7.1 | 44.1\* | 8.1 |
|---|---|---|---|---|---|
| **44.2** | | | | | |
| **62.1** | 187 | | | | |
| **7.1** | 249 | 221 | | | |
| **44.1\*** | 73 | 154 | 227 | | |
| **8.1** | 224 | 219 | 101 | 204 | |
| **18.2\*** | 184 | 130 | 250 | 137 | 245 |

**Table 4.** Pairwise differences between haplotypes. Total differences between each pair of haplotypes in the 9277 bp region at HLA-B.

AH 44.2: C A G A T G A G G A C C A G G G T G T G G T G T A G A G G C A G A G G G A T C T T G G T G G T T C T G T G G C C C A - A T A T A C A A C T T T A T G A A C T T G

**Figure 6.** Sequence differences between haplotypes. The sequence of AH 44.2 is given in the first row. Nucleotides of AH 62.1, 7.1, 44.1\*, 8.1 and 18.2\* are given only where they differ from AH 44.2 and otherwise marked with a dot. Missing nucleotides are marked with a dash and shaded grey. The sequences are described by Horton et al. [24], whereas AH haplotypes have been assigned from the HLA allele types given by Horton, according to Cattley [35].
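The relation between the pairwise counts of Table 4 and the nucleotide diversity quoted in the text can be sketched as a simple ratio of differing positions to region length. This is an illustration only: the function names are hypothetical, the toy sequences are invented, and the simple ratio need not reproduce the quoted percentages exactly, since the counting rules (for example for indels) are not stated here.

```python
REGION_LENGTH_BP = 9277  # HLA-B region length given in the Table 4 caption


def count_differences(seq_a, seq_b):
    """Count aligned positions where two sequences differ ("-" marks a gap)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def percent_diversity(n_differences, region_length=REGION_LENGTH_BP):
    """Differences expressed as a percentage of the region length."""
    return 100.0 * n_differences / region_length


# Counts taken from Table 4.
table4 = {("8.1", "7.1"): 101, ("18.2", "7.1"): 250}
for pair, n in table4.items():
    print(pair, n, f"{percent_diversity(n):.2f}%")

# Invented toy alignment: one substitution and one indel.
toy = count_differences("ACGT-A", "ACTTGA")
print("toy differences:", toy)
```

Here a gap counts as one difference per aligned position, which is only one of several possible conventions.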

The degree of conservation of each ancestral haplotype is truly remarkable. For example, Smith et al. [32] found variation at only 11 of 3,600,000 positions between HLA-A and DR. Similar findings have been reported by others, including Aly et al. [31] (see Figure 7). Mutation and recombination must be suppressed.

Figure 7 illustrates the importance of interpreting nucleotide diversity according to the block structure of the genome. Conservation in the intervening, essentially monomorphic regions is of minor interest, whereas it is the differences within PFB that inform the study of evolution, function and disease susceptibility.

Adapted from ref. [31].

**Figure 7.** Remarkable conservation within 8.1 haplotypes. A total of 656 SNPs spanning 4.8 Mb in the MHC region are depicted. The lower frequency allele (row) for each SNP along each haplotype column is highlighted in yellow. The top group depicts SNP results from 8.1 AH haplotypes (*n* = 31), the lower group are HLA-DR3, non-B8 haplotypes (*n* = 13). The 29.9 Mb range between HLA and DRB1 was >99.9% conserved, with only 9 variant alleles of the 10,768 alleles identified for the 384 SNPs in the 31 8.1 AHs.
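The conservation figures quoted for the 8.1 haplotype follow from simple proportions, as the short sketch below shows. The function name is hypothetical and the counts are those given in the text and in the Figure 7 caption.

```python
def percent_conserved(n_variant, n_total):
    """Percentage of observations that match the consensus haplotype."""
    return 100.0 * (1 - n_variant / n_total)


# 9 variant alleles among 10,768 allele calls (Figure 7 caption).
aly = percent_conserved(9, 10_768)
# 11 variable positions among 3,600,000 (Smith et al. [32]).
smith = percent_conserved(11, 3_600_000)

print(f"Figure 7 SNP calls: {aly:.3f}% conserved")
print(f"HLA-A to DR sequence: {smith:.5f}% conserved")
```

Both values exceed 99.9%, in line with the conservation stated in the caption.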

The inescapable conclusion is that some parts of the genome have *not* two or three but hundreds of alternative ancestral sequences.
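If each PFB has many alternative ancestral sequences rather than a single reference, a newly sequenced block has to be compared with a library of known haplotypes, as argued earlier. The following is a minimal sketch of such a comparison. The library entries are invented 10-bp strings, the names are hypothetical, and the sequences are assumed to be already aligned to a common length, with "-" marking missing nucleotides.

```python
def count_differences(seq_a, seq_b):
    """Count aligned positions where two sequences differ ("-" marks a gap)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def rank_haplotypes(query, library):
    """Return (name, differences) pairs, closest library haplotype first."""
    scores = [(name, count_differences(query, ref)) for name, ref in library.items()]
    return sorted(scores, key=lambda item: item[1])


# Invented toy library standing in for aligned ancestral haplotype sequences.
library = {
    "hap_A": "ACGTACGTAC",
    "hap_B": "ACGTTCGTAC",
    "hap_C": "TCGAACG-AC",
}
query = "ACGTTCGTAC"

ranking = rank_haplotypes(query, library)
print(ranking)
```

The score uses every aligned position, so the assignment rests on the whole sequence rather than on any single polymorphism. A real comparison would also need to handle alignment, sequencing error and haplotypes missing from the library.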
