**1.1. Short introduction into HLA nomenclature**

be associated with many traits and diseases [2]. Therefore, HLA genotyping from large datasets and finding further associations is an ever-ongoing effort.

The HLA genes are codominant (the alleles on both chromosomes are expressed) and are exceptionally polymorphic in the exons involved in antigen recognition (exons 2 and 3 for Class-I and exon 2 for Class-II loci). These highly variable peptide-binding regions are the focus of HLA typing: there are 13,412 allele sequences in the IMGT/HLA reference database at the time of writing [3], compared with the 1250+ alleles known in 2002 [4]. This polymorphism, together with the high homology between these loci, makes classical variant-calling NGS pipelines impractical. NGS-based HLA typing has to identify the whole exon or whole gene sequences that define alleles, not individual SNPs or indels.

Although sequence-based HLA typing (SBT) is relatively new, there are established methods that identify unique sequence patterns of HLA loci with sequence-specific oligonucleotides [5]. These methods are less precise, however, because probes cannot reveal the whole sequence of an allele. Furthermore, as SBT focuses primarily on the important exons mentioned above, the phasing problem known from whole-genome assembly can be the main source of ambiguity. During phasing, the individual base differences are assigned unambiguously to one of the chromosomes. Fortunately, phasing short reads is easier when the two alleles differ at many positions, which makes NGS-based HLA typing attractive. Unlike in Sanger traces, the signals from the two chromosomes can be reliably separated, because each read carries only one signal per base, which is called unequivocally as A, C, G, or T.

**Figure 1.** The figure illustrates how overlapping short reads can be used to phase exon 2 and exon 3 of HLA-A using the variants present in intron 2. Forward reads are colored pink/orange; reverse reads are yellow. Colored bars in reads depict nucleotide differences from the reference; the reference track is gray at homozygous positions, and only heterozygous bases are colored (A: red, C: blue, G: brown, T: green). Reads highlighted with black and yellow dashes show how step-by-step phasing works using reads that overlap consecutive heterozygous positions. Since all four marked reads overlap at the heterozygous position near the middle of intron 2, it is unambiguous which read belongs to which chromosome; therefore, the phase between the heterozygous positions in exon 2 and exon 3 can be resolved as well. Note that in practice, phase resolution relies on a large number of short reads for reliability. Alignment was created by the Omixon HLA Twin 1.1 software.
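The step-by-step phasing idea illustrated in Figure 1 can be sketched in a few lines. This is a minimal, hypothetical illustration and not the algorithm used by Omixon HLA Twin or any other tool. It assumes that heterozygous sites have already been called and that each aligned read has been reduced to a dictionary mapping reference position to observed base (`Read`, `HetSite` and `phase_het_sites` are names invented for this sketch). Consecutive sites are linked by a simple majority vote of the reads that span both. A phase block is closed whenever no read links two neighbouring sites.

```python
from typing import Dict, List, Tuple

Read = Dict[int, str]            # hypothetical: reference position -> observed base
HetSite = Tuple[int, str, str]   # hypothetical: (position, allele 1, allele 2)


def phase_het_sites(sites: List[HetSite], reads: List[Read]) -> List[Tuple[str, str]]:
    """Greedy pairwise phasing of consecutive heterozygous sites.

    Returns one (haplotype 1, haplotype 2) pair of strings per phase block.
    """
    blocks: List[Tuple[str, str]] = []
    if not sites:
        return blocks
    prev_pos, a0, b0 = sites[0]
    hap1, hap2 = [a0], [b0]
    prev_h1, prev_h2 = a0, b0  # alleles on hap1/hap2 at the previous site
    for pos, a, b in sites[1:]:
        cis = trans = 0
        for read in reads:
            if prev_pos in read and pos in read:  # read spans both sites
                pair = (read[prev_pos], read[pos])
                if pair in ((prev_h1, a), (prev_h2, b)):
                    cis += 1    # evidence that hap1 carries allele a here
                elif pair in ((prev_h1, b), (prev_h2, a)):
                    trans += 1  # evidence that hap1 carries allele b here
        if cis == 0 and trans == 0:
            # No read links the two sites: close the block, start a new one.
            blocks.append(("".join(hap1), "".join(hap2)))
            hap1, hap2 = [a], [b]
            prev_h1, prev_h2 = a, b
        else:
            # Majority vote; a tie is (arbitrarily) resolved as cis.
            h1, h2 = (a, b) if cis >= trans else (b, a)
            hap1.append(h1)
            hap2.append(h2)
            prev_h1, prev_h2 = h1, h2
        prev_pos = pos
    blocks.append(("".join(hap1), "".join(hap2)))
    return blocks


# Toy data loosely mirroring Figure 1: exon 2, intron 2 and exon 3 sites.
sites = [(100, "A", "C"), (300, "G", "T"), (500, "C", "T")]
reads = [{100: "A", 300: "G"}, {100: "C", 300: "T"},
         {300: "G", 500: "T"}, {300: "T", 500: "C"}]
print(phase_het_sites(sites, reads))  # [('AGT', 'CTC')]
```

The intron 2 site links the exon 2 and exon 3 variants, just as in the figure. Dropping the last two reads would split the result into two unlinked blocks. Real tools also weigh base qualities and mapping errors rather than counting raw votes.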

370 Next Generation Sequencing - Advances, Applications and Challenges

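The name of an HLA allele reflects how precisely its DNA sequence has been determined. Up to four fields, separated by colons, follow the locus name and an asterisk (e.g., HLA-A*02:01:01:01):

1. Field 1: the allele group, often corresponding to the serological antigen;
2. Field 2: the specific HLA protein, i.e., alleles differing here have different amino-acid sequences;
3. Field 3: synonymous DNA substitutions within the coding region;
4. Field 4: differences in non-coding regions (introns or untranslated regions).

An optional suffix (N, L, S, C, A or Q) after the last field marks alleles with altered expression, such as null (N) alleles.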
The authoritative source of HLA nomenclature is [7], maintained by the Anthony Nolan Research Institute. The most up-to-date HLA reference database can be downloaded from [8].
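The colon-separated naming scheme lends itself to simple programmatic handling, for example when comparing typing results at a common resolution. The sketch below is a hypothetical helper (`parse_allele` and `HlaAllele` are invented names, not part of any HLA library). It assumes names of the form `HLA-<locus>*<fields>[suffix]` with one to four numeric fields and one of the expression suffix letters N, L, S, C, A or Q. It does not validate names against the IMGT/HLA database.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Locus name, asterisk, 1-4 colon-separated numeric fields, optional suffix.
_ALLELE_RE = re.compile(
    r"(?P<locus>HLA-[A-Z0-9]+)\*(?P<fields>\d+(?::\d+){0,3})(?P<suffix>[NLSCAQ])?"
)


@dataclass
class HlaAllele:
    locus: str
    fields: List[str]
    suffix: Optional[str]

    def truncate(self, n: int) -> str:
        """Return the name reduced to its first n fields (suffix omitted in this sketch)."""
        return f"{self.locus}*{':'.join(self.fields[:n])}"


def parse_allele(name: str) -> HlaAllele:
    """Split an allele name into locus, fields and expression suffix."""
    m = _ALLELE_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"not a recognised HLA allele name: {name!r}")
    return HlaAllele(m.group("locus"), m.group("fields").split(":"), m.group("suffix"))


allele = parse_allele("HLA-A*02:01:01:01")
print(allele.fields)       # ['02', '01', '01', '01']
print(allele.truncate(2))  # HLA-A*02:01
```

Truncating to two fields, for instance, groups alleles that encode the same protein but differ in synonymous or non-coding positions.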
