**5. NGS-based MHC genotyping methods in nonhuman species**

**Figure 5.** Detailed information concerning selection of allele candidates using the SeaBass computer program. (A) "Extraction of allele candidates" by Blat search. We select allele candidates that are extracted in all of the exons. (B-1) New allele detection. In this example, one allele was called B\*15:18:01, but the other allele was called B\*44:03:01 excluding exon 3. (B-2) Confirmation of the new allele by NGS. Mapping of the sequence reads with B\*44:03:01 as a reference suggested six nucleotide differences from B\*44:03:01 in exon 3. We confirmed the polymorphisms by Sanger sequencing and deposited the sequence in the DDBJ and IMGT-HLA databases. The formal allele name is now B\*44:184 [94].

90 Next Generation Sequencing - Advances, Applications and Challenges

NGS technology provides the opportunity to genotype MHC sequences either by PCR targeted DNA sequencing or by PCR targeted RNA sequencing, that is, by DNA sequencing after converting the RNA samples to cDNA by reverse transcriptase. Usually, one or other of the sequencing methods is chosen rather than using both methods on the same samples. In the following sections, we compare the use and limitations of targeted NGS sequencing using DNA or RNA samples for MHC genotyping of MHC class I and class II genes in nonhuman species such as the Filipino cynomolgus macaques.

#### **5.1. Advantage and disadvantage of using DNA and RNA samples for NGS**

Table 4 shows a summary of the advantages and disadvantages of using DNA and RNA samples for NGS-based MHC genotyping.



The advantages of using DNA samples instead of RNA samples are that (1) sampling and extraction of DNA are easier and cheaper than for RNA, (2) PCR amplification can be performed directly without an additional step such as the reverse transcriptase (RT) reaction, (3) primers can be designed in both exon and intron regions, and (4) fewer sequence reads are required for DNA than for RNA samples, provided that all alleles are amplified without allelic imbalance. Although many more sequence reads are necessary for RNA samples than for DNA samples to genotype all the MHC alleles that have different transcription levels, the advantages of using RNA samples for genotyping are that (1) they provide an opportunity to examine MHC gene expression, (2) the transcription level of each MHC allele can be estimated from the sequence read depth [56], and (3) only transcribed MHC genes are detected, without contamination by PCR products originating from pseudogenes, if the primer pair spans at least two homologous exons. Thus, RNA samples are thought to be more effective for precise genotyping of duplicated MHC genes that are highly similar to one another. Nevertheless, DNA and RNA samples each have their own advantages and disadvantages for NGS-based MHC genotyping, and together they widen the choices for experimentation and data collection.
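As a rough illustration of points (2) and the read-number requirement for RNA samples, the following minimal sketch estimates the relative transcription level of each allele from read counts and the total read number needed for the least-transcribed allele to reach a chosen depth. The allele labels, the read counts, and the function names `relative_expression` and `reads_needed` are assumptions made for this example and are not taken from the original study.

```python
from collections import Counter


def relative_expression(read_assignments):
    """Return each allele's share of the assigned reads (shares sum to 1)."""
    counts = Counter(read_assignments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in counts.items()}


def reads_needed(read_assignments, min_reads_per_allele):
    """Total reads needed so the rarest allele is expected to reach the minimum depth."""
    counts = Counter(read_assignments)
    if not counts:
        raise ValueError("no reads were assigned to any allele")
    total = sum(counts.values())
    rarest = min(counts.values())
    # Integer ceiling of min_reads_per_allele * total / rarest
    return -(-min_reads_per_allele * total // rarest)


# Hypothetical read-to-allele assignments for one cDNA sample
reads = ["allele_1"] * 60 + ["allele_2"] * 30 + ["allele_3"] * 10

for allele, share in sorted(relative_expression(reads).items()):
    print(f"{allele}: {share:.1%} of assigned reads")

# Reads needed for the weakest allele to be covered by about 20 reads
print(reads_needed(reads, 20))
```

With evenly amplified DNA templates the shares would be close to equal, so the same minimum depth is reached with fewer total reads, which is the contrast drawn in the paragraph above.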

| | Total | A | C | B | DRB345 | DRB1 | DQA1 | DQB1 | DPA1 | DPB1 |
|---|---|---|---|---|---|---|---|---|---|---|
| **Worldwide subject (1916 loci)** | | | | | | | | | | |
| Locus number | 1916 | 250 | 250 | 242 | 186 | 239 | 140 | 234 | 140 | 235 |
| Allele number | 3832 | 500 | 500 | 484 | 372 | 478 | 280 | 468 | 280 | 470 |
| Accuracy rate (%) | 99.8 | **100** | **100** | **100** | 99.2 | 99.6 | **100** | **100** | **100** | 99.6 |
| **Japanese subject (498 loci)** | | | | | | | | | | |
| Locus number | 498 | 86 | 80 | 77 | 50 | 68 | 4 | 65 | 4 | 64 |
| Allele number | 996 | 172 | 160 | 154 | 100 | 136 | 8 | 130 | 8 | 128 |
| Accuracy rate (%) | **100** | **100** | **100** | **100** | **100** | **100** | **100** | **100** | **100** | **100** |

**Table 3.** Evaluation of the SeaBass program

| | DNA | RNA |
|---|---|---|
| Difficulty of sampling | Easy | Difficult |
| Extraction cost of nucleic acid | Cheap | Expensive |
| Preparation before PCR | None | RT reaction |
| Primer location | Both exons and introns | Exons only |
| Required sequence read number | Few | Many |
| Exclusion of pseudogene | Difficult | Easy |
| Estimation of expression level | Impossible | Possible |

**Table 4.** Advantages and disadvantages of DNA and RNA samples for NGS-based MHC genotyping

#### **5.2. Methodology**

Table 5 lists publications on MHC genotyping by PCR-based NGS methods in different animal species, including the MHC species name, target gene, PCR method, degree of allele data accumulation, and the allele assignment method.

| | Species | MHC name | Animal model or nonmodel type | Template | Target gene | NGS platform | Degree of allele data accumulation | Allele assignment method |
|---|---|---|---|---|---|---|---|---|
| | Alpine marmots | *Mama* | Nonmodel | DNA | Class I and DRB | 454 | Poor | *De novo* assembly |
| | New Zealand sea lion | *Phho* | Nonmodel | DNA | DRB and DQB | 454 | Poor | *De novo* assembly |
| Avian | Collared flycatcher | *Fial* | Nonmodel | DNA | Class II | 454 | Poor | *De novo* assembly |
| | Great tit | *Pama* | Nonmodel | DNA | Class I | 454 | Poor | *De novo* assembly |
| | House sparrows | *Pado* | Nonmodel | DNA | Class I | 454 | Poor | *De novo* assembly |
| | Berthelot's pipit / tawny pipit | *Anbe* / *Anca* | Nonmodel | DNA | Class II | 454 | Poor | *De novo* assembly |
| | New Zealand passerine | *Peph* | Nonmodel | DNA | Class II | PGM | Poor | *De novo* assembly |
| | Eurasian coot | *Fuat* | Nonmodel | DNA | Class II | 454 | Poor | *De novo* assembly |
| Reptile | Ornate dragon lizard | *Ctor* | Nonmodel | DNA | Class I | 454 | Poor | *De novo* assembly |
| Fish | Stickleback fish | *Gaac* | Nonmodel | DNA | Class II | 454 | Poor | *De novo* assembly |

Ref.: [84]–[93]

**Table 5.** Publication list of MHC genotyping by PCR-based NGS methods in nonhuman species

As discussed previously, for humans, the HLA alleles obtained by next-generation sequencers are mainly assigned by mapping to known allele sequences that are used as the read references, because a large number of HLA allele sequences have already been collected in the IMGT-HLA database [7] (Table 2). In contrast, novel allele sequences are identified by *de novo* assembly of read sequences and subcloning of PCR products. Among nonhuman species, RNA samples tend to be used for MHC genotyping in experimental animals (model animals) such as macaque species and swine, whereas DNA samples are mainly used for MHC genotyping of wild (nonmodel) animals, because collecting RNA samples from them in their natural environment is more difficult than sampling captured or domesticated experimental animals (Table 5).

#### *5.2.1. MHC genotyping RNA samples collected from Filipino cynomolgus macaques*

MHC alleles in humans and experimental animals such as the macaque species and swine are mainly assigned by mapping methods because much more MHC allele information is already available for them than for most other species. This allele information is collected and released by the IPD-MHC database [57]. When novel alleles are detected, their sequences are identified by *de novo* assembly of the read sequences and subcloning of the PCR products.

We identified homozygous and heterozygous cynomolgus macaques (Mafa) that have specific Mafa MHC haplotypes by genotyping the MHC of more than 5000 Filipino animals, and we found that they have fewer different Mafa-class I and Mafa-class II alleles than the Indonesian and Vietnamese populations. In this section, we outline the MHC genotyping method using RNA samples and provide some results as an example of the method. Figure 7 shows a comparative genomic map of the MHC regions of human and the Filipino cynomolgus macaque.

**Figure 7.** Comparative genomic map of the human (HLA) and the Filipino cynomolgus macaque (Mafa) Class I and Class II transcribed genes.

The MHC class I genomic region contains many more Mafa-class I genes than HLA-class I genes because of gene duplication events, whereas the organization of the class II genes is well conserved between the two species. There are also many Mafa-class I pseudogenes in the Mafa-class I region. Therefore, we performed MHC genotyping by amplicon sequencing with the Roche GS Junior system using RNA samples from the Filipino cynomolgus macaques to prevent contamination by PCR products originating from the pseudogenes (Figure 8).

The workflow that we used is composed of five main steps: (1) RNA extraction and cDNA synthesis, (2) multiplex PCR amplification, (3) pooling of the PCR products, (4) amplicon NGS sequencing, and (5) allele assignment. In step 1, we usually extracted total RNA from peripheral white blood cell samples using the TRIzol reagent (Invitrogen/Life Technologies/Thermo Fisher Scientific, Carlsbad, CA), treated the isolated RNA with DNase I (Invitrogen/Life Technologies/Thermo Fisher Scientific, Carlsbad, CA), and then synthesized cDNA with an oligo d(T) primer using ReverTraAce (TOYOBO, Osaka, Japan) for the reverse transcriptase reaction. In step 2, we designed a single Mafa-class I-specific primer set in exon 2 and exon 4 (PCR product size: 514 bp or 517 bp) that could amplify all known Mafa-class I alleles, whereas the Mafa-class II locus-specific primer sets were designed to include the polymorphic exon 2 of Mafa-DRB (420 bp), Mafa-DQA1 (435 bp), Mafa-DQB1 (396 bp), Mafa-DPA1 (407 bp), and Mafa-DPB1 (333, 336, or 339 bp) for massively parallel pyrosequencing (Figure 9).

**Figure 8.** A schematic workflow of the successive steps of the MHC genotyping method by NGS amplicon sequencing for the Filipino cynomolgus macaques.

**Figure 9.** Location of primer sites to amplify Filipino cynomolgus macaque MHC genes. Yellow boxes and blue arrows indicate polymorphic exons and PCR regions, respectively. Numbers indicate exon numbers.

In addition to these primer sets, we also designed 50 different types of fusion primers that contained the 454 titanium adaptor (A in the forward and B in the reverse primer), a 10 bp MID (multiplex identifier), and the MHC-specific primer (Figure 8). Moreover, we established a multiplex PCR method using the primer sets by carefully optimizing the primer composition and PCR conditions and by comparing the sequence read data obtained by NGS (Figure 10).
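The fusion primer layout described above (adaptor, then MID, then MHC-specific primer) and the later binning of reads by MID can be sketched as follows. This is a minimal illustration only: every sequence below is an invented placeholder rather than a real 454 adaptor, Roche MID, or Mafa primer, the names `fusion_primers` and `demultiplex` are hypothetical, and the reads are assumed to start directly with the MID.

```python
# All sequences are invented placeholders, not real adaptors, MIDs, or primers.
ADAPTOR_A = "CCCCCAAAAA"  # stands in for the forward (A) adaptor
ADAPTOR_B = "GGGGGTTTTT"  # stands in for the reverse (B) adaptor
MIDS = {"animal_01": "ACGTACGTAC", "animal_02": "TGCATGCATG"}  # 10 bp tags
LOCUS_PRIMERS = {"class_I": ("GCTACGTGGA", "CAGGTCAGTG")}  # (forward, reverse)


def fusion_primers(sample, locus):
    """Build the forward and reverse fusion primers: adaptor + MID + specific primer."""
    mid = MIDS[sample]
    forward, reverse = LOCUS_PRIMERS[locus]
    return ADAPTOR_A + mid + forward, ADAPTOR_B + mid + reverse


def demultiplex(reads):
    """Bin reads by their leading MID and strip the MID from each binned read."""
    bins = {sample: [] for sample in MIDS}
    bins["unassigned"] = []
    for read in reads:
        for sample, mid in MIDS.items():
            if read.startswith(mid):
                bins[sample].append(read[len(mid):])
                break
        else:
            # No MID matched the start of this read
            bins["unassigned"].append(read)
    return bins


forward, reverse = fusion_primers("animal_01", "class_I")
print(forward, reverse)

reads = [
    "ACGTACGTACGCTACGTGGATTT",
    "TGCATGCATGGCTACGTGGAAAA",
    "NNNNNNNNNNGCTACG",
]
print(demultiplex(reads))
```

Exact prefix matching is the simplest possible rule; a production pipeline would normally tolerate sequencing errors in the MID, which this sketch does not attempt.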

MHC Genotyping in Human and Nonhuman Species by PCR-based Next-Generation Sequencing http://dx.doi.org/10.5772/61842 97

**Figure 10.** Ratio of read sequence numbers obtained by amplicon sequencing of multiplex PCR products.

As a result of these primer designs, 51.5%, 13.6%, and 8.6–8.9% of all sequence reads were assigned to Mafa-class I, Mafa-DRB, and each of the other Mafa-class II genes, respectively, and we confirmed that the genotypes obtained by the multiplex PCR method were consistent with those from our previous uniplex PCR methods. Therefore, the multiplex PCR method greatly simplified the preparation of the DNA samples for NGS by reducing the preparation time and the amount and cost of reagents. In the pooling step, we quantified the purified PCR products by the Picogreen assay (Invitrogen) with a Fluoroskan Ascent micro-plate fluorometer (Thermo Fisher Scientific, Waltham, MA), mixed the PCR products at equimolar concentrations, and then diluted them according to the manufacturer's recommendation. In the NGS amplicon sequencing step, we performed emulsion PCR (emPCR) and emulsion breaking according to the manufacturer's protocol (Roche, Basel, Switzerland). After the emulsion-breaking step, we enriched and counted the beads carrying the single-stranded DNA templates and deposited them into a PicoTiterPlate to obtain the sequence reads.
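The equimolar pooling step can be illustrated with a short calculation. The sketch below assumes the common approximation of 660 g/mol per base pair of double-stranded DNA to convert a fluorometric concentration (ng/µL) into a molar concentration. The sample names, concentrations, and mean product length are invented for the example, and `nanomolar` and `equimolar_volumes` are hypothetical helper names, not part of any vendor software.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (a common approximation)


def nanomolar(ng_per_ul, length_bp):
    """Convert a dsDNA concentration in ng/uL to nM for a product of length_bp."""
    return ng_per_ul * 1e6 / (AVG_BP_MASS * length_bp)


def equimolar_volumes(products, fmol_each):
    """Volume (uL) of each product that contains fmol_each femtomoles.

    products maps a sample name to (ng_per_ul, mean_length_bp).
    1 nM equals 1 fmol/uL, so volume = fmol / nM.
    """
    return {
        name: fmol_each / nanomolar(conc, length)
        for name, (conc, length) in products.items()
    }


# Hypothetical Picogreen readings and mean amplicon lengths
products = {
    "animal_01": (6.6, 500),
    "animal_02": (3.3, 500),
}

for name, volume in equimolar_volumes(products, fmol_each=100).items():
    print(f"{name}: pool {volume:.1f} uL")
```

The more dilute product contributes a proportionally larger volume, so each animal is represented by the same number of template molecules in the pool.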

Figure 11 shows a schematic workflow of the allele assignment process, which follows on from the workflow in Figure 8.

**Figure 11.** A schematic workflow of the allele assignment process using the SeaBass software.

After the sequencing run, image processing, signal correction, and base calling are performed by the GS Run Processor Ver. 3.0 (Roche) with full processing for shotgun or paired-end filter analysis. Quality-filtered sequence reads that pass the assembler software (a single sff file) are binned according to their MID labels into separate sff files using the sff file software (Roche). These files are further quality trimmed to remove poor sequence at the ends of the reads with quality values (QVs) of less than 20. After separating the trimmed and MID-labeled sequence reads into forward and reverse reads, we independently detect the Mafa-class I and Mafa-class II allele candidates from the forward and the reverse reads by using the BLAT program to match the reads at 99% and 100% identity, with the minimum overlap length set at 200 and the alignment identity score parameter set at 10, against all the known Mafa-class I and Mafa-class II allele sequences released in the IMGT/MHC-NHP database [58]. After extracting the allele candidates common to both sequencing sides, we finally assign the "real alleles" by confirming the nucleotide sequences of the allele candidates using the GS Reference Mapper Ver. 3.0.

To discover novel Mafa-class I sequences, we perform *de novo* assembly set to detect >85% matches using the trimmed and MID-binned sequences after converting the outputs to ace files for the Sequencher Ver. 5.01 DNA sequence assembly software (Gene Code Co., Ann Arbor, MI). We then use the consensus sequence obtained from the *de novo* assembly as a reference sequence to identify and map the correct allele sequences. Using this process, we genotyped a set of 400 unrelated animals by the Sanger sequencing method and high-resolution pyrosequencing and identified 190 different alleles: 28 at Mafa-A, 54 at Mafa-B, 12 at Mafa-I, 11 at Mafa-E, 7 at Mafa-F, 34 at Mafa-DRB, 13 at Mafa-DQA1, 13 at Mafa-DQB1, 9 at Mafa-DPA1, and 9 at Mafa-DPB1 [35, 59].
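Two parts of the allele assignment process lend themselves to a compact illustration: trimming low-quality bases from the read ends (QV below 20) and keeping only the allele candidates supported by both the forward and the reverse reads. The sketch below is not the BLAT or Roche software. It assumes that alignment hits have already been parsed into hypothetical `(allele, percent_identity, aligned_length)` tuples, and the allele labels and function names are invented for the example.

```python
def trim_low_quality_end(seq, quals, min_qv=20):
    """Drop trailing bases until the last remaining base has QV >= min_qv."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_qv:
        end -= 1
    return seq[:end], quals[:end]


def candidates(hits, min_identity=99.0, min_overlap=200):
    """Alleles with at least one hit passing the identity and overlap thresholds."""
    return {
        allele
        for allele, identity, length in hits
        if identity >= min_identity and length >= min_overlap
    }


def common_candidates(forward_hits, reverse_hits):
    """Allele candidates supported by both the forward and the reverse reads."""
    return candidates(forward_hits) & candidates(reverse_hits)


# Hypothetical read with a low-quality tail
print(trim_low_quality_end("ACGTAC", [30, 30, 30, 25, 10, 5]))

# Hypothetical parsed hits: (allele, percent identity, aligned length)
forward_hits = [("allele_1", 100.0, 250), ("allele_2", 99.2, 230), ("allele_3", 97.0, 260)]
reverse_hits = [("allele_1", 99.5, 240), ("allele_2", 99.0, 150), ("allele_4", 100.0, 220)]
print(common_candidates(forward_hits, reverse_hits))
```

In this toy data only `allele_1` passes both thresholds on both sides, so it is the only candidate carried forward for confirmation by reference mapping.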

On the basis of our large-scale project to genotype the MHC of 5000 Filipino cynomolgus macaques by NGS, we have so far detected 15 different Mafa haplotypes (HT1–HT15) in 45 homozygous animals. These Mafa homozygous animals provided the basis for efficiently estimating other Mafa haplotypes. For example, we estimated a variety of Mafa-A, Mafa-B/I, Mafa-E, and Mafa-class II (Mafa-DRB, Mafa-DQA1, Mafa-DQB1, Mafa-DPA1, and Mafa-DPB1) haplotypes by comparing the homozygous animals with heterozygous animals that carry the same Mafa-class I and Mafa-class II alleles as the homozygous animals. In addition, we estimated the Mafa haplotypes and haplotype frequencies with the PHASE 2.1.1 program [60] using the allele data obtained by amplicon sequencing. From these procedures, we estimated a total of 84 Mafa-class I and 18 Mafa-class II haplotypes. Of the 15 Mafa HT haplotypes, HT1, HT2, HT4, and HT8 had the highest frequencies. Of these, HT1 and HT8 have entirely different Mafa alleles, whereas HT2 and HT4 are thought to be recombinants of HT1 and HT8 (Figure 12).

**Figure 12.** Gene composition of representative Mafa MHC haplotypes HT1 and HT8 and their recombinants HT2 and HT4.

Namely, the Mafa-A allele in HT2 is identical to that in HT8, whereas the alleles at the other loci in HT2 are identical to those in HT1. Similarly, HT4 has alleles at the Mafa-class I loci that are identical to those in HT8 and alleles at the Mafa-class II loci that are identical to those in HT1. Therefore, Mafa homozygous animals with known haplotypes such as HT1 and HT2 are important for biomedical research, such as studies of the transplantation outcomes of induced pluripotent stem (iPS) cells (Figure 13), because such studies are undertaken on animals with a defined genetic background and relatively well-characterized MHC haplotypes that might regulate the adaptive immune system in different ways and with different efficiencies.
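The use of homozygous animals to estimate other haplotypes, and the description of HT2 as a recombinant, can be sketched as a simple set operation. This is a simplified illustration and not the PHASE algorithm: it assumes that a heterozygous animal carries one fully known haplotype, that subtracting its alleles leaves the second haplotype, and that a locus with nothing left over is shared by both haplotypes. The allele labels are invented, only three loci are shown, and the function names are hypothetical.

```python
def infer_second_haplotype(genotype, known_haplotype):
    """Subtract a known haplotype from a genotype to estimate the other haplotype.

    genotype: {locus: set of alleles observed in the animal}
    known_haplotype: {locus: set of alleles carried by the known haplotype}
    Returns {locus: set of alleles}, or None if the animal lacks any allele
    of the known haplotype.
    """
    second = {}
    for locus, known_alleles in known_haplotype.items():
        observed = genotype.get(locus, set())
        if not known_alleles <= observed:
            return None
        remainder = observed - known_alleles
        # Nothing left over: assume both haplotypes share the alleles at this locus
        second[locus] = remainder if remainder else set(known_alleles)
    return second


def parental_origin(haplotype, parents):
    """For each locus, list the parental haplotypes that carry identical alleles."""
    return {
        locus: [name for name, parent in parents.items() if parent.get(locus) == alleles]
        for locus, alleles in haplotype.items()
    }


# Hypothetical allele labels; they are not real Mafa allele names.
HT1 = {"Mafa-A": {"a1"}, "Mafa-B": {"b1"}, "Mafa-DRB": {"d1"}}
HT8 = {"Mafa-A": {"a8"}, "Mafa-B": {"b8"}, "Mafa-DRB": {"d8"}}

# A heterozygous animal that carries HT1 plus an unknown second haplotype
heterozygote = {"Mafa-A": {"a1", "a8"}, "Mafa-B": {"b1"}, "Mafa-DRB": {"d1"}}

second = infer_second_haplotype(heterozygote, HT1)
print(second)
print(parental_origin(second, {"HT1": HT1, "HT8": HT8}))
```

In this toy case the estimated second haplotype matches HT8 at Mafa-A and HT1 at the other loci, which mirrors the pattern described for HT2.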
