*5.2.2. MHC genotyping using DNA samples of wild animals*

3.0. To discover novel Mafa-class I sequences, we convert the trimmed and MID-binned sequences to ace files for the Sequencher Ver. 5.01 DNA sequence assembly software (Gene Codes Co., Ann Arbor, MI) and perform a *de novo* assembly set to detect >85% matches. We then use the defined consensus sequence obtained from the *de novo* assembly as a reference sequence to identify and map the correct allele sequences. Using this process, we genotyped a set of 400 unrelated animals by the Sanger sequencing method and high-resolution pyrosequencing and identified 190 different alleles: 28 at Mafa-A, 54 at Mafa-B, 12 at Mafa-I, 11 at Mafa-E, 7 at Mafa-F, 34 at Mafa-DRB, 13 at Mafa-DQA1, 13 at Mafa-DQB1, 9 at Mafa-DPA1, and 9 at Mafa-DPB1 [35, 59].

**Figure 11.** A schematic workflow of the allele assignment process using the SeaBass software.

On the basis of our large-scale project to genotype the MHC of 5000 Filipino cynomolgus macaques by NGS, we have so far detected 15 different Mafa haplotypes (HT1–HT15) in 45 homozygous animals. These Mafa homozygous animals provided the basis for efficiently estimating other Mafa haplotypes. For example, we estimated a variety of Mafa-A, Mafa-B/I, Mafa-E, and Mafa-class II (Mafa-DRB, Mafa-DQA1, Mafa-DQB1, Mafa-DPA1, and Mafa-DPB1) haplotypes by comparing the homozygous animals with heterozygous animals that

98 Next Generation Sequencing - Advances, Applications and Challenges

**Figure 13.** Application of Mafa homozygous and heterozygous animals for nonclinical trials of induced pluripotent stem (iPS) cells.

At this stage in the development of MHC genotyping by NGS, it is difficult to apply the RNA-sequencing mapping method, which uses known allele sequences as references, to accurately genotype the MHC of wild animals, because the allele information currently available for most of them is relatively poor (Table 5). Therefore, MHC genotyping of wild animals or poorly studied species by NGS is based on *de novo* assembly of DNA sequences. In this case, the distinction between "real alleles" and "artifact alleles" is important because NGS errors such as monostretch sequences are frequently observed in the assembled consensus sequences. Published allele assignment approaches based on *de novo* assembly include the allele validation threshold (AVT) method [61], the clustering method [62–64], and the relative sequencing depth modeling method [65]. These methods assume that contigs with a sequence depth greater than a threshold level are the "real alleles," and the threshold is determined by statistical calculation from the sequence depth values of all contigs obtained in the *de novo* assembly. Therefore, the detection of exact or "real" alleles depends largely on the setting of the threshold level and the quality of the sequence reads [65]. To enable the correct setting of the threshold level, it is important to use primers that can amplify all alleles of the target locus or loci without allelic imbalance. Furthermore, additional measures, such as repeating independent NGS experiments at least three times and detecting identical allele sequences in at least two animals, are necessary to distinguish between real and artifactual alleles.
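The filtering logic described above can be illustrated with a minimal Python sketch. It is not an implementation of the AVT, clustering, or depth modeling methods cited here. As a simplifying assumption, the threshold is a fixed fraction of the total read depth in each run (`min_fraction`, a hypothetical parameter). The function names, the rule that an allele must pass the threshold in every run of an animal, and the toy depth values are also illustrative assumptions.

```python
from collections import defaultdict


def passing_contigs(run, min_fraction):
    """Return contigs whose share of the run's total depth reaches min_fraction.

    `run` maps a contig consensus sequence to its read depth. The fixed
    fraction is an assumed stand-in for a statistically derived threshold.
    """
    total = sum(run.values())
    if total == 0:
        return set()
    return {seq for seq, depth in run.items() if depth / total >= min_fraction}


def reproducible_in_animal(runs, min_fraction, min_runs):
    """Return contigs that pass the threshold in every independent run.

    Animals with fewer than `min_runs` independent runs yield no calls.
    """
    if len(runs) < min_runs:
        return set()
    per_run = [passing_contigs(run, min_fraction) for run in runs]
    return set.intersection(*per_run)


def real_alleles(animals, min_fraction=0.05, min_runs=3, min_animals=2):
    """Accept sequences reproducibly detected in at least `min_animals` animals."""
    support = defaultdict(set)
    for animal, runs in animals.items():
        for seq in reproducible_in_animal(runs, min_fraction, min_runs):
            support[seq].add(animal)
    return {seq for seq, who in support.items() if len(who) >= min_animals}


# Toy data (invented for illustration): animal -> list of runs -> {sequence: depth}
animals = {
    "animal_1": [
        {"ACGT": 480, "ACGA": 500, "ACGG": 20},
        {"ACGT": 510, "ACGA": 470, "ACGG": 5},
        {"ACGT": 450, "ACGA": 530},
    ],
    "animal_2": [
        {"ACGT": 900, "ACGC": 60},
        {"ACGT": 950},
        {"ACGT": 880, "ACGC": 10},
    ],
    "animal_3": [
        {"ACGA": 400, "TTTT": 420},
        {"ACGA": 390, "TTTT": 410},
        {"ACGA": 405, "TTTT": 395},
    ],
}

print(sorted(real_alleles(animals)))  # prints ['ACGA', 'ACGT']
```

In this toy example, the low-depth contigs `ACGG` and `ACGC` are rejected because they do not pass the threshold in all three runs, and `TTTT` is rejected because it is reproducible in only one animal. In practice the threshold would come from one of the published statistical approaches, and allelic imbalance in amplification would shift the depth fractions on which such a filter relies.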
