362 Next Generation Sequencing - Advances, Applications and Challenges

**8.2. Comparing polymorphic sequences of well-characterised PFB**

Since there are numerous ancestral haplotypes within a PFB, it is essential to compare as many sequences as possible. An example is shown in Figure 6.

**Figure 10.** Tracing segregation through three-generation families. The alleles at MRIP, now known as myosin phosphatase Rho-interacting protein, are used to designate haplotypes within the 5.5 Mb region of bovine chromosome 19 from SREBF1 to TCAP. This region contains many genes involved in muscle development, growth and fatty acid synthesis. For further details, see Williamson et al. [38]. Reproduced with permission from ref. [22].

**9. Conclusion**

In analysing NGS databases, we recommend:

**1.** Screening for PFB.

**•** Indels are important: alignments can be misleading.

**•** Only a minority of sites are informative, and these must be selected from the remainder.

**•** Kilobases of sequence need to be examined and reduced 10- to 100-fold, retaining the informative sites.

**•** Different haplotypes are defined by specific combinations of bases at those informative sites.

**•** Very few single nucleotide polymorphisms are specific for a particular ancestral haplotype. Instead, specific combinations may be best defined by comparison with a library of reference sequences.

Thus, although identifying each of the many haplotypes remains challenging, the overall patterns of informative sites are helpful in screening for PFB and for localising haplospecific sequences.
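To make the screening recommendations more concrete, the following Python sketch selects informative sites from a set of aligned sequences, reduces each sequence to its pattern of bases at those sites, and matches a query against a library of reference patterns. It assumes the conventional parsimony-informative criterion (at least two states, each present in at least two sequences), treats the gap character as a state so that indels are scored, and ignores ambiguous `N` calls. The chapter does not specify its exact criterion, and the function names and toy sequences are hypothetical.

```python
from collections import Counter


def informative_sites(aligned, min_count=2):
    """Return column indices where at least two different states each occur
    in at least `min_count` sequences (parsimony-informative sites).

    Gap characters ('-') count as a state, so indels are scored; ambiguous
    'N' calls are ignored when counting states.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    sites = []
    for i in range(length):
        counts = Counter(seq[i] for seq in aligned)
        shared = [base for base, n in counts.items() if base != "N" and n >= min_count]
        if len(shared) >= 2:
            sites.append(i)
    return sites


def site_pattern(seq, sites):
    """Reduce a sequence to its bases at the informative sites."""
    return "".join(seq[i] for i in sites)


def closest_reference(query, library, sites):
    """Return (name, mismatches) for the library entry whose pattern at the
    informative sites differs least from the query's pattern."""
    query_pattern = site_pattern(query, sites)
    best = None
    for name, ref in library.items():
        mismatches = sum(a != b for a, b in zip(query_pattern, site_pattern(ref, sites)))
        if best is None or mismatches < best[1]:
            best = (name, mismatches)
    return best


# Hypothetical toy library of aligned reference sequences.
library = {
    "ref1": "ACGTTCGTAC",
    "ref2": "ACGTACGAAC",
    "ref3": "ACCTA-GAAG",
    "ref4": "ACCTA-GTAG",
}
sites = informative_sites(list(library.values()))
print(sites)                                             # [2, 5, 7, 9]
print(closest_reference("ACCTA-GTAC", library, sites))   # ('ref4', 1)
```

In the toy data, the singleton change at column 4 is discarded, while the shared gap at column 5 is retained, so ten columns reduce to four informative sites.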

**Figure 11.** Regions of high sequence diversity within 1000 Genomes data are similar to previously identified PFB. Imputed haplotypes in the 600 kb region surrounding HLA-B from 553 individuals were downloaded from the 1000 Genomes browser [41]. The population groups chosen were of African, European and Asian origin (ACB, ASW, BEB, CEU, CHB and YRI). Most variants recorded in the 1000 Genomes VCF files are SNPs, but some indels of up to 174 bp are also recorded. For each imputed haplotype, we counted the number of differences from the reference sequence in 10 kb sections. Indels were counted as one difference, irrespective of length. The black curve represents the maximum difference in each 10 kb section. The red lines, taken from ref. [42], show the nucleotide diversity between pairs of individual haplotypes, counted in 100 bp sections; the pairs compared were 44.1 with 62.1, 44.1 with 8.1, and 8.1 with 14.1. Squares show the number of LD\_link [41] "haplotypes", calculated from sets of adjacent variants in 500 bp intervals. LD\_link requires variants to be biallelic and uses only single nucleotide changes, not indels. Only variants with at least two examples in the CEU and YRI populations were included.
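The counting described in the Figure 11 caption can be sketched as follows. This minimal Python version assumes that variants are given as 1-based `(pos, ref, alt)` tuples, as in a VCF record, and that each haplotype is represented as the set of variant indices at which it carries the alternative allele. `window_differences` counts differences from the reference per 10 kb window, scoring each indel once, and keeps the maximum over haplotypes. `distinct_snv_haplotypes` is a hypothetical approximation of the LD\_link "haplotype" counts, not the tool's own code: it keeps only single-base substitutions carried by at least two haplotypes (a simplification of the per-population filter in the caption) and counts distinct allele combinations in each 500 bp interval.

```python
from collections import defaultdict


def window_differences(variants, haplotypes, window=10_000):
    """Maximum number of differences from the reference per window.

    variants: list of (pos, ref, alt) with 1-based positions.
    haplotypes: dict name -> set of variant indices carrying the alt allele.
    Each variant counts once, so an indel is one difference whatever its length.
    Windows in which no haplotype differs from the reference are omitted.
    """
    max_diff = defaultdict(int)
    for carried in haplotypes.values():
        per_window = defaultdict(int)
        for i in carried:
            per_window[(variants[i][0] - 1) // window] += 1
        for w, n in per_window.items():
            max_diff[w] = max(max_diff[w], n)
    return dict(max_diff)


def distinct_snv_haplotypes(variants, haplotypes, interval=500, min_carriers=2):
    """Count distinct allele combinations per interval, using only
    single-base substitutions carried by at least `min_carriers` haplotypes."""
    carriers = defaultdict(int)
    for carried in haplotypes.values():
        for i in carried:
            carriers[i] += 1
    by_interval = defaultdict(list)
    for i, (pos, ref, alt) in enumerate(variants):
        if len(ref) == 1 and len(alt) == 1 and carriers[i] >= min_carriers:
            by_interval[(pos - 1) // interval].append(i)
    return {
        w: len({tuple(i in carried for i in idx) for carried in haplotypes.values()})
        for w, idx in by_interval.items()
    }


# Hypothetical toy data: three SNVs, one insertion, one deletion.
variants = [(120, "A", "G"), (450, "C", "CTTA"), (480, "T", "C"),
            (12_000, "G", "A"), (12_100, "AT", "A")]
haplotypes = {"h1": {0, 1, 2}, "h2": {0, 2, 3}, "h3": {2, 3, 4}, "h4": set()}
print(window_differences(variants, haplotypes))       # {0: 3, 1: 2}
print(distinct_snv_haplotypes(variants, haplotypes))  # {0: 3, 23: 2}
```

Note how the 3 bp insertion adds only one difference to h1, and how both indels are excluded from the interval counts, mirroring the restriction to biallelic single nucleotide changes.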


**Figure 12.** Complex iterative element. Dotplot of a 10 kb region in the MHC between MICA and MICB showing a complex iterative element. Gaudieri [42] reported high nucleotide diversity for this region, which is not recorded in the 1000 Genomes data. Example sequences for AH 7.1 and AH 44.1 were downloaded from the UCSC Genome Browser. Dotplot generated with Gepard [43] using a word length of 10.
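A dotplot such as the one in Figure 12 marks every pair of positions at which two sequences share an identical word. The Python sketch below computes those points with a simple k-mer index, using the caption's word length of 10 as the default. It illustrates the principle only and is not how Gepard is implemented; the function name and toy sequence are hypothetical. Comparing a sequence with itself gives the main diagonal, and each tandem copy of an iterative element adds parallel off-diagonal lines.

```python
from collections import defaultdict


def dotplot_points(seq_x, seq_y, word=10):
    """Return (i, j) pairs where the `word`-long substring starting at seq_x[i]
    is identical to the one starting at seq_y[j] (exact matches only)."""
    index = defaultdict(list)
    for j in range(len(seq_y) - word + 1):
        index[seq_y[j:j + word]].append(j)
    points = []
    for i in range(len(seq_x) - word + 1):
        for j in index.get(seq_x[i:i + word], []):
            points.append((i, j))
    return points


# Toy self-comparison: two tandem copies of an 8 bp unit, word length 3.
seq = "ACGTTGCA" * 2
pts = dotplot_points(seq, seq, word=3)
off_diagonal = sorted((i, j) for i, j in pts if i > j)
print(off_diagonal)  # [(8, 0), (9, 1), (10, 2), (11, 3), (12, 4), (13, 5)]
```

The off-diagonal points lie on a line offset by the 8 bp repeat length; with real data, `seq_x` and `seq_y` would be the two haplotype sequences and the points would be passed to a plotting library.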
