**Contents**

Chapter 7 **Functional Implications of MHC Associations in Autoimmune Diseases with Special Reference to Type 1 Diabetes, Vitiligo and Hypoparathyroidism**

Rajni Rani and Archana Singh

Chapter 8 **HLA in Gastrointestinal Inflammatory Disorders**

M.I. Torres, T. Palomeque and P. Lorite

Chapter 9 **Association Between HLA Gene Polymorphism and Antiepileptic Drugs-Induced Cutaneous Adverse Reactions**

Yuying Sun and Yongzhi Xi

**Section 3 HLA-Associated Important Infectious Diseases**

Chapter 10 **HLA and Infectious Diseases**

Daniela Maira Cardozo, Amanda Vansan Marangon, Ana Maria Sell, Jeane Eliete Laguila Visentainer and Carmino Antonio de Souza

Chapter 11 **Association Between HLA Gene Polymorphism and the Genetic Susceptibility of HIV Infection**

Fang Yuan and Yongzhi Xi

Chapter 12 **Association Between HLA Gene Polymorphism and the Genetic Susceptibility of SARS Infection**

Yuying Sun and Yongzhi Xi

Chapter 13 **Influence of Human Leukocyte Antigen on Susceptibility of Tropical Pulmonary Infectious Diseases and Clinical Implications**

Attapon Cheepsattayakorn

Preface

This year marks the 60th anniversary (1954) of the discovery of the human major histocompatibility complex (MHC), or the human leukocyte antigen (HLA) system, by the French Nobel laureate physician Jean Dausset, as well as the 55th anniversary (1958) of the identification and naming of the first human leukocyte antigen, MAC (equivalent to the HLA-A2 antigen). Sixty years ago, Dausset first discovered that sera from patients with leukopenia or from patients who had received multiple blood transfusions were capable of agglutinating 56–100% of leukocytes from normal donors. This discovery was the prelude to a massive international collaboration to investigate the HLA system. Also in 1958, independent studies by Rose Payne and Jon van Rood found that the sera of some parous women contained leukocyte antibodies. A subsequent study by van Rood used a computer-assisted statistical method to establish a cluster analysis method for HLA antisera and identified the HLA-Bw4/6 antigens. In the early and mid-1960s, a series of major breakthroughs by scientists in many international collaborating laboratories, combined with the discoveries outlined above, confirmed that humans and mice have equivalent major histocompatibility complex systems, called HLA and H-2, respectively, bringing the early journey of HLA discovery to a fitting close. The entire history of HLA research involves great international and interdisciplinary collaboration. World experts in HLA research collaborated closely; shared experience, discoveries, and valuable antisera; and unified experimental techniques and nomenclatures. Through frequent conferences, laboratories in different countries formed large international collaborations and thereby greatly accelerated the discovery of the HLA system and progress in the field.

It should be emphasized that although human HLA and mouse H-2 both belong to the MHC system, the discovery of one did not lead to or cause the discovery of the other. This is because the study of the mouse H-2 system relied mainly on serological analysis of mouse red blood cells, whereas the discovery of the human HLA system was based on the finding that sera from leukopenia patients contained leukoagglutinins. As Dausset wrote in *"The HLA Adventure"* in 1990, "It would not, however, be altogether accurate to claim that the HLA pioneers were stimulated by the H-2 model to seek a counterpart in man. In actuality, the effort to study men and mice progressed quite independently of each other for many years."

HLA is by far the most complex and polymorphic human gene system ever discovered. Since the first official naming by the HLA Nomenclature Committee of the World Health Organization in 1968, and with rapid advances in molecular biology techniques, many new HLA and HLA-related alleles have been discovered each year. As of January 2014, more than 10,000 HLA and HLA-related alleles had been identified. The numerous biomedical functions of HLA now recognized and confirmed contrast sharply with the original view that HLA molecules serve only as major transplantation antigens involved in transplant rejection and are relevant only to immunogenetics. It has been established that the MHC systems of various species share the following characteristics with the human HLA system: (1) the antigens are widely distributed on the surfaces of lymphocytes and other nucleated cells, are closely related to allogeneic and xenogeneic transplant rejection, and are the major transplantation antigens stimulating the mixed lymphocyte reaction and graft-versus-host reactions; (2) the system is directly involved in the processing of endogenous and exogenous antigens by antigen-presenting cells (APCs), a key factor determining whether specific immune responses can be induced; (3) when T-cell receptors (TCRs) specifically recognize peptide antigens presented by APCs, they must simultaneously recognize the MHC/HLA molecules bound to the peptide in order to generate a signal that triggers T-cell activation, thereby controlling an organism's immune response to an antigen as well as the interactions between immunocompetent cells; (4) the system encodes some components of the complement system; and (5) the frequency of certain antigens/alleles is associated with many human diseases involving all body systems, manifested as either susceptibility or resistance.


Ever since the discovery of HLA, scientists have been studying the role of HLA matching in organ transplant survival. Today, HLA is considered the most important factor affecting organ transplant survival rates. At the same time, scientists are also very interested in the relationship between HLA and disease. In 1973, Paul Terasaki and colleagues first reported that the HLA-B27 antigen is strongly associated with ankylosing spondylitis, a finding subsequently confirmed by many scientists in several countries. This discovery sparked extensive research on HLA-associated diseases worldwide, which peaked in the 1970s and 1980s. To date, more than 10,000 research papers on HLA-related diseases have been published, covering over 500 diseases, of which autoimmune diseases, infectious diseases, and cancer are the most prominent.

Compared with the human ABO blood group system discovered by the Austrian pathologist Karl Landsteiner, the history of MHC research in various species is short, and the history of human HLA exploration is even shorter. However, the achievements within such a short period are remarkable. The discovery of HLA is not only an adventure in the history of science but also a tale of the unremitting persistence of scientists and an epic of great scientific discovery. Other scientific discoveries generally result from hypotheses derived from scientific theory; through innovation and investigation, they often change a specific discipline but rarely revolutionize multiple disciplines, much less science as a whole. The discovery and confirmation of HLA, however, were entirely different. In addition to elucidating the gene structure, protein structure, and various biological functions of the HLA system, HLA research has shaped modern basic medicine, clinical medicine, and human sociology. It not only brought revolutionary changes to basic sciences, such as biology, immunology, genetics, genomics, and human sociology, but also brought unprecedented breakthroughs to many clinical disciplines, such as organ transplantation, oncology, transfusion science, forensic science, laboratory science, reproductive science, vaccinology, and internal medicine. It is no exaggeration to say that HLA has become one of the hottest and most active fields in modern basic medicine, clinical medicine, and other scientific disciplines.


Under such circumstances, both basic HLA research and its clinical application need a new monograph that comprehensively reflects the latest achievements in the field (although monographs covering earlier stages of HLA research have been published). Therefore, in early 2013, InTech Publisher, with great vision and a sense of responsibility, began work on a book about new achievements in HLA research. At the kind invitation of the former Publishing Process Manager Ana Pantar, Professor Xi gladly agreed to serve as Editor of the book. Thus, Professor Xi was fortunate to contribute both as an author and as Editor, to bring together international experts in HLA-related basic research and clinical application, to unite their knowledge in chapters covering various related topics, and finally to complete the book "HLA and Associated Important Diseases," which is dedicated to its readers.

The book consists of three sections. The first section, Chapters 1–5, mainly covers basic theoretical and technological developments, describing in separate chapters HLA data and statistical analysis, HLA class I polymorphism and Tapasin dependency, HLA-E, -F and -G (the non-classical side of the MHC cluster), and HLA matching strategy. The second section, Chapters 6–9, introduces research progress on several important HLA-associated autoimmune diseases. The third section, Chapters 10–13, outlines progress on several important HLA-associated infectious diseases. The contents of this book are relatively comprehensive and novel, covering HLA-related basic theory, advanced matching techniques, and important HLA-associated autoimmune and infectious diseases. Thus, this book is ideal for experts, clinicians, technicians, and graduate students working in HLA-related research and clinical areas. I hope this book will become a valuable reference for basic and clinical research.

Of note, progress in HLA research could be described as "dazzling"; that is, information and knowledge in this field have accumulated rapidly. The moment you put pen to paper, new studies and data emerge. Thus, it is always difficult to present an up-to-date and accurate account of the latest developments in this field. Therefore, this book can only describe the current state of the field and introduce developments in several main areas of HLA research. Some omissions are inevitable. Fortunately, from the beginning, this book has received strong support and positive responses from many experts in the field, which has helped establish it as an authoritative resource. In addition, the authors of this book were scrupulous in their writing, which ensures that the content is of good scientific quality and is novel, up-to-date, and practical. Inevitably, the chapters have some overlap in content, different writing styles, and different scientific points of view. However, to ensure the integrity and continuity of each chapter, I, as Editor of the book, with full respect for the authors, decided not to delete or modify content.

This book was written with tireless dedication by many experts in the HLA field, despite their busy work schedules. Their efforts contributed to the book's smooth publication. We therefore take this opportunity first to express our heartfelt thanks and deep respect to them. We also thank earlier researchers for their significant contributions to the HLA field, as well as colleagues who are still on the front lines of HLA research. Without their contributions, HLA research would not be so significant today. Lastly, we thank the former Publishing Process Manager, Ana Pantar, the current Publishing Process Manager, Iva Lipovic, and the staff at InTech Publisher for their valuable contributions to the editing and publication of this book.

#### **Xi Yongzhi, M.D.**

Department of Immunology and National Center for Biomedicine Analysis, Beijing 307 Hospital Affiliated to Academy of Medical Sciences, Beijing, P.R.China

#### **He Fuchu, Ph.D.**

Academician & Professor of Genetics and Cell Biology, Academy of Military Medical Sciences, P.R.China

**Section 1**

**Basic Aspects of Human Leukocyte Antigens**


**Chapter 1**

**Statistic and Analytical Strategies for HLA Data**


Fang Yuan and Yongzhi Xi

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/57493

### **1. Introduction**

To date, the HLA system is the most complex and polymorphic human gene system identified. Although the history of HLA research is not very long, our understanding of the HLA system has advanced rapidly during this short time. Research in the HLA field involves elucidating the structure and various biological functions of the genes and proteins of the HLA system; in addition, it can be directly applied in basic medicine, clinical medicine, anthropology, sociology, and other fields. HLA research has led not only to revolutionary changes in basic disciplines, such as biology, immunology, genetics, anthropology, and sociology, but also to unprecedented breakthroughs in many clinical specialties, including organ transplantation, oncology, transfusion science, forensic medicine, laboratory medicine, reproductive medicine, and vaccinology, as well as in disease-related fields of internal medicine. Therefore, it is critical to organize and process HLA study data using appropriate statistical analysis.

Undoubtedly, the proper use of statistics directly affects the scientific rigor, validity, and objectivity of HLA-related studies. Moreover, in addition to the principles and methods of biomedical statistics commonly used in other life sciences, the statistical analysis of HLA study data has its own specific requirements, which integrate the theories and methods of modern bioinformatics. Bioinformatics is a significant research frontier in biomedical statistics and an important field of biomedical research, spanning scales from the macroscopic to the microscopic. It integrates numerous methods from biotechnology, computer science, mathematics, and statistics, is increasingly becoming a major discipline for uncovering the secrets of biology, and thereby plays an irreplaceable role in organizing and processing HLA study data. However, these methods are beyond the scope of the basic statistical and analytical strategies used for evaluating HLA study data. Owing to limited space, this chapter will not discuss them. If appropriate, we will describe these methods in a dedicated chapter of a future monograph on the progress of basic HLA research.

© 2014 The Author(s). Licensee InTech. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

### **2. Basic concepts of HLA genetic statistics**

### **2.1. Genetics basis for statistical analysis of HLA data**

Hardy-Weinberg law: The Hardy-Weinberg law is also referred to as the law of hereditary equilibrium or the genetic equilibrium law. It states that in an infinite, randomly mating group with no migration, mutation, selection, or genetic drift, the genotype frequencies and gene frequencies at a locus remain unchanged from generation to generation; the group is then in a state of genetic equilibrium known as the Hardy-Weinberg equilibrium. This law was proposed independently by G.H. Hardy, a British mathematician, and W. Weinberg, a German physician, in 1908.
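The law can be checked numerically. The following is a minimal Python sketch, assuming an ideal group (infinite size, random mating, no migration, mutation, selection, or drift) and genotype frequencies that are the same in both sexes. The helper names `allele_freqs` and `random_mating` are hypothetical illustrations, not part of any HLA software, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction


def allele_freqs(P, Q, R):
    """Gene frequencies p (of A) and q (of A') from genotype frequencies of AA, AA', A'A'."""
    p = P + Q / 2  # each AA' heterozygote carries one copy of A
    q = R + Q / 2
    return p, q


def random_mating(P, Q, R):
    """Genotype frequencies after one generation of random mating in an ideal group."""
    p, q = allele_freqs(P, Q, R)
    return p * p, 2 * p * q, q * q


# Start far from equilibrium: half AA, no heterozygotes, half A'A'.
genotypes = (Fraction(1, 2), Fraction(0), Fraction(1, 2))
for generation in range(1, 4):
    genotypes = random_mating(*genotypes)
    # The gene frequency p stays fixed, and the genotype frequencies stop changing after one generation.
    print(generation, [str(g) for g in genotypes], "p =", allele_freqs(*genotypes)[0])
```

In this example the genotype frequencies move to 1/4, 1/2, and 1/4 after the first round of random mating and then stay there, while the gene frequency remains 1/2 throughout.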

The factors that influence the Hardy-Weinberg equilibrium are as follows:

**1.** Mutation: Under natural conditions, because of the repair activity of the DNA replication machinery, the rate of gene mutation in higher animals is only $1\times10^{-6}$ to $1\times10^{-8}$ per gamete per locus per generation, demonstrating that natural mutation is very rare.

**2.** Selection: a) Reproductive fitness: This is a measure of the ability to pass genes on to progeny, i.e., the relative capacity of a given genotype to survive and produce progeny compared with other genotypes; in HLA studies, a normal fitness of 1 is often used as the reference. b) Heterozygote advantage (heterozygote dominance): In some recessive hereditary diseases and under certain conditions, heterozygotes may be more favored in survival and reproduction than normal homozygotes.

**3.** Random genetic drift: The random fluctuation of gene frequencies in a small or isolated group is referred to as genetic drift.

**4.** Migration: Gene frequencies may vary among individuals of different races and nationalities. Migration causes different populations to intermate and foreign genes to be mutually introduced, which leads to gene flow and thus alters the gene frequencies of the original group.

**5.** Genetic heterogeneity: Individuals with consistent phenotypes or identical clinical symptoms of a specific disease may have different genotypes. If they are not strictly distinguished, analysis of the Hardy-Weinberg equilibrium is likely to become complicated.

**6.** Founder effect: This is a form of genetic drift in which a new group is established by a small number of individuals carrying only some of the alleles of the parent group. The population size of this new group may increase later; however, its genetic variance remains very small because there is no mating or gene exchange between this group and other groups. This situation typically occurs on an isolated island or in a closed, newly established village.

Generally, the circumstances meeting the criteria of an ideal population do not exist in practice. However, the Hardy-Weinberg equilibrium is still the basis for studies of gene distribution because it is impossible to model all of the factors influencing the investigated group, and various factors can counteract each other (e.g., mutation and selection).

Now we will explain this concept with an example.

Assume that there is an autosomal locus with two alleles, A and A'. If the frequencies of genes A and A' are *pm* and *qm* in males and *pf* and *qf* in females, then the frequencies of sperm carrying A and A' are *pm* and *qm*, respectively, and the frequencies of ova carrying A and A' are *pf* and *qf*, respectively. Obviously, *pm* + *qm* = 1 and *pf* + *qf* = 1. If mating is completely random, the genotype frequencies of the next generation will be as shown in Table 1.

| Ovum / Sperm | A (*pm*) | A' (*qm*) |
|---|---|---|
| **A (*pf*)** | AA (*pm* × *pf*) | AA' (*qm* × *pf*) |
| **A' (*qf*)** | AA' (*pm* × *qf*) | A'A' (*qm* × *qf*) |

**Table 1.** Genotype frequencies of progeny generated by random combinations of sperm and ovum

The investigated genes are autosomal and therefore independent of sex; thus, the frequencies of the three genotypes are identical in male and female progeny. Assume that the frequencies of the three genotypes AA, AA', and A'A' are *P*, *Q*, and *R*, respectively. From the table above, we obtain:

$$\begin{aligned} P &= p\_m \times p\_f \\ Q &= q\_m \times p\_f + p\_m \times q\_f \\ R &= q\_m \times q\_f \end{aligned}$$

If we assume that the frequencies of genes A and A' in the progeny are *p* and *q*, respectively, then *p* + *q* = 1 and

$$\begin{aligned} p &= P + \tfrac{1}{2}Q = p\_m \times p\_f + \tfrac{1}{2}(q\_m \times p\_f + p\_m \times q\_f) = \tfrac{1}{2}p\_m + \tfrac{1}{2}p\_f \\ q &= \tfrac{1}{2}q\_m + \tfrac{1}{2}q\_f \end{aligned}$$

That is to say, when gene frequencies differ between males and females, they are averaged in the next generation and thus become equal in both sexes. One further generation of random mating then yields genotype frequencies of *p*², 2*pq*, and *q*². Therefore, when mating is completely random, and selection, mutation, and migration are absent, the gene frequencies and the frequencies of the three genotypes then remain unchanged generation after generation.

By generalizing the results above, if we assume that the frequencies of *n* alleles A1, A2, …, An in a group are *p1*, *p2*, …, *pn* ($\sum\_{i=1}^{n} p\_i = 1$), it can be proved that the genotype frequency series in the progeny can be expressed as

$$(p\_1A\_1 + p\_2A\_2 + \cdots + p\_nA\_n)^2$$

In the two-allele case, if the frequency series of genes A and A' in gametes is expressed as (*p*A + *q*A'), then the genotype frequency series in the progeny is:

$$(pA + qA')^2 = p^2 AA + 2pq AA' + q^2 A'A'$$

This is the presentation formula of the Hardy-Weinberg equilibrium. From this formula, we can see that the frequency of homozygotes AA or A'A' is equal to the square of the corresponding gene frequency (*p*² or *q*²), whereas the frequency of heterozygotes AA' is 2*pq*.

If the frequencies of genes A and A' in the progeny are *p* and *q*, respectively, then *p* + *q* = 1 and

$$p = P + \tfrac{1}{2}Q = p\_m p\_f + \tfrac{1}{2}(q\_m p\_f + p\_m q\_f) = \tfrac{1}{2}p\_m + \tfrac{1}{2}p\_f$$

Similarly, *q* = ½*q*<sub>m</sub> + ½*q*<sub>f</sub>.

In other words, when gene frequencies differ between males and females, they are averaged in the next generation and thus become equal in both sexes. Therefore, when mating is completely random and selection, mutation, and migration are absent, the gene frequencies, and from the following generation onward the frequencies of the three genotypes, remain unchanged generation after generation. If the frequencies of genes A and A' in the gametes are written as the series

$$(p\,A + q\,A')$$

then the genotype frequency series in progeny is:

$$(p\,A + q\,A')^2 = p^2\,AA + 2pq\,AA' + q^2\,A'A'$$
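The averaging step and the return to equilibrium can be checked numerically. The Python sketch below applies the *P*, *Q*, *R* formulas above for one generation with sex-specific gene frequencies, and then for a second generation. The function names and the starting frequencies (0.8 in males, 0.4 in females) are hypothetical, chosen only for illustration.

```python
from __future__ import annotations


def progeny_genotypes(p_m: float, p_f: float) -> tuple[float, float, float]:
    """Frequencies (P, Q, R) of AA, AA', A'A' after one round of random mating.

    p_m and p_f are the frequencies of allele A in males and females;
    the frequencies of A' are q_m = 1 - p_m and q_f = 1 - p_f.
    """
    q_m, q_f = 1.0 - p_m, 1.0 - p_f
    P = p_m * p_f              # A sperm meets A ovum
    Q = q_m * p_f + p_m * q_f  # either gamete may carry A'
    R = q_m * q_f              # A' sperm meets A' ovum
    return P, Q, R


def allele_a_frequency(P: float, Q: float) -> float:
    """Frequency of A among progeny: all of AA plus half of AA'."""
    return P + 0.5 * Q


# Hypothetical starting frequencies that differ between the sexes.
P1, Q1, R1 = progeny_genotypes(0.8, 0.4)
p1 = allele_a_frequency(P1, Q1)          # (0.8 + 0.4) / 2 = 0.6 in both sexes
P2, Q2, R2 = progeny_genotypes(p1, p1)   # next generation: p^2, 2pq, q^2
print(round(p1, 6), round(P2, 6), round(Q2, 6), round(R2, 6))  # 0.6 0.36 0.48 0.16
```

After the first generation the gene frequency, 0.6, is the average of the two sexes; the second generation already shows *p*², 2*pq*, *q*² (0.36, 0.48, 0.16), and further rounds of random mating leave these values unchanged.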

Generalizing these results, suppose the frequencies of *n* alleles A<sub>1</sub>, A<sub>2</sub>, …, A<sub>n</sub> in a group are *p*<sub>1</sub>, *p*<sub>2</sub>, …, *p*<sub>n</sub>, where *p*<sub>1</sub> + *p*<sub>2</sub> + … + *p*<sub>n</sub> = 1. It can then be shown that the genotype frequency series in the progeny is

$$(p\_1A\_1 + p\_2A\_2 + \dots + p\_nA\_n)^2$$

This is the general form of the Hardy-Weinberg equilibrium. It shows that the frequency of a homozygote (e.g., AA or A'A') equals the square of the corresponding gene frequency, whereas the frequency of a heterozygote is twice the product of the two gene frequencies concerned. We will illustrate this with the ABO blood groups, which are controlled by three alleles, A, B, and O, at a single locus. Let their gene frequencies be *p*, *q*, and *r*, respectively. By the Hardy-Weinberg formula, the ABO genotype frequencies are given by the expansion of (*p*A + *q*B + *r*O)<sup>2</sup>, as listed in Table 2.
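The multi-allele expansion and the grouping of ABO genotypes into phenotypes can be sketched in Python; Table 2 lists the same genotype frequencies symbolically. The function names and the example frequencies (*p* = 0.3, *q* = 0.1, *r* = 0.6) are hypothetical, chosen only for illustration.

```python
from __future__ import annotations

from itertools import combinations_with_replacement


def hw_genotype_freqs(allele_freqs: dict[str, float]) -> dict[str, float]:
    """Expand (p1*A1 + ... + pn*An)^2: homozygote p_i^2, heterozygote 2*p_i*p_j."""
    if abs(sum(allele_freqs.values()) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    genotypes: dict[str, float] = {}
    for a, b in combinations_with_replacement(allele_freqs, 2):
        p_a, p_b = allele_freqs[a], allele_freqs[b]
        genotypes[a + b] = p_a * p_a if a == b else 2 * p_a * p_b
    return genotypes


def abo_phenotype_freqs(p: float, q: float, r: float) -> dict[str, float]:
    """Sum the genotype frequencies of Table 2 by phenotype (A and B dominant to O)."""
    g = hw_genotype_freqs({"A": p, "B": q, "O": r})
    return {
        "A": g["AA"] + g["AO"],
        "B": g["BB"] + g["BO"],
        "O": g["OO"],
        "AB": g["AB"],
    }


freqs = abo_phenotype_freqs(0.3, 0.1, 0.6)  # hypothetical p, q, r
print({k: round(v, 4) for k, v in freqs.items()})  # {'A': 0.45, 'B': 0.13, 'O': 0.36, 'AB': 0.06}
```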


Statistic and Analytical Strategies for HLA Data

http://dx.doi.org/10.5772/57493

7



**Table 2.** Genotype frequency of ABO blood groups

| Phenotype | Genotype | Genotype frequency |
|---|---|---|
| A | AA | *p*² |
| A | AO | 2*pr* |
| B | BB | *q*² |
| B | BO | 2*qr* |
| O | OO | *r*² |
| AB | AB | 2*pq* |

#### **2.2. Statistical basis of HLA data analysis**

#### *2.2.1. Population and sample*

The study subjects of HLA statistical data analysis are mostly specific groups, such as patients with a particular disease, members of the same ethnic group, or residents of the same region. Because it is usually impossible to investigate every individual in such a group, the features of the whole group must be inferred from a subset of its members. Two concepts therefore need to be defined: population and sample. The core issue of statistical data analysis is how to infer the population from a sample.

A population is the complete set of subjects in a study, and each member of a population is called an individual. Populations can be infinite or finite. For example, when investigating the distribution of a certain HLA phenotype in Asian individuals, the total number of Asians is difficult to determine, so the population can be treated as infinite; when studying the recombination characteristics of the HLA system in a specific family, the population is finite. In HLA data analysis, most populations are treated as infinite.

A sample is a part of the population, and the number of individuals in a sample is the sample size. Because the characteristics of a population are inferred from a sample, an appropriate sample size is the foundation for accurately estimating population parameters.

Many factors must be considered when determining the sample size, including the study objectives, required precision, confidence level, reliability of the statistical test, sampling method, available information about the population, study protocol, and funding. An appropriately chosen sample size reflects the principle of replication in statistical design. We now discuss how to determine the sample size in several common situations in HLA statistical analysis.

**1.** Determination of sample size when estimating population parameters

For example, if we want to estimate the frequency of HLA-B\*27 in the healthy residents of a certain region, or in patients with ankylosing spondylitis, how many individuals should be included in the sample? According to the principles of hypothesis testing, if the sample is too small, real differences may not be detected, so correct results are hard to obtain and the conclusions lack a sufficient basis. Conversely, an oversized sample increases the practical difficulty of the study and wastes labor, materials, money, and time; moreover, it can stretch resources and weaken quality control during the research process, introducing potential interference with the study results.

When determining the sample size, first define the test level (significance level) *α*, i.e., specify in advance the allowable probability of a false-positive (type I) error (usually *α* = 0.05), and decide whether a one-sided or two-sided test will be used. The smaller *α* is, the larger the sample size must be.

The test power must also be defined. Power equals 1 − *β*, where *β* is the probability of a type II error; the higher the power, the larger the sample size must be. In the design of scientific studies, the power should be no lower than 0.75; otherwise, the test may fail to reveal true differences in the population, yielding false-negative results.

In this example, the population is the healthy residents of the region, and the individuals investigated constitute the sample. The population frequency of HLA-B\*27 is inferred from its proportion in the sample. If the frequency of HLA-B\*27 is assumed to be *P*, the minimum sample size *n* that meets the statistical requirements is calculated as follows:

When *P* is close to 0.5:

$$n = \left(\frac{u\_{1-\alpha/2}}{\delta}\right)^2 P(1-P)$$

When *P* is close to 0 or 1:

$$n = \left[\frac{57.3\,u\_{1-\alpha/2}}{\sin^{-1}\left(\delta / \sqrt{P(1-P)}\right)}\right]^2$$

When *P* is unknown:

$$n = 0.25\left(\frac{u\_{1-\alpha/2}}{\delta}\right)^2$$

In the formulas above, *u*<sub>1−α/2</sub> is the critical value of the standard normal (*u*) distribution, and *δ* is the permissible error. In the arcsine formula, sin<sup>−1</sup> is expressed in degrees (57.3 ≈ 180/π). The formula for unknown *P* is the first formula with *P* = 0.5, the value that maximizes *P*(1 − *P*).

In this example, we want to estimate the frequency of HLA-B\*27 in the healthy residents of a certain region. Assume that a previous survey gave *P* = 10%, that the permissible error of this survey is 1%, and that *α* = 0.05 (two-sided). From a table of critical values of the *u* distribution (or the *u* distribution function), *u*<sub>1−0.05/2</sub> = 1.96, so *n* = (1.96/0.01)<sup>2</sup> × 0.10 × 0.90 ≈ 3457 individuals.
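The sample-size formulas of this subsection can be sketched in Python: the three single-proportion formulas above, plus the two-proportion formulas of item 2 that follows. This is a minimal sketch assuming Python's `statistics.NormalDist` quantile in place of a printed *u* table; the function names are hypothetical, results are rounded up to whole subjects, and the two-proportion example values (*p*<sub>1</sub> = 0.30, *p*<sub>2</sub> = 0.10, power 0.90) are illustrative, not taken from the text.

```python
from __future__ import annotations

import math
from statistics import NormalDist


def _round_up(x: float) -> int:
    """Round a sample size up to whole subjects, ignoring floating-point noise."""
    return math.ceil(x - 1e-9)


def u_crit(prob: float) -> float:
    """Quantile of the standard normal (u) distribution; u_crit(0.975) is about 1.96."""
    return NormalDist().inv_cdf(prob)


def n_single(P: float, delta: float, alpha: float = 0.05) -> int:
    """P not close to 0 or 1: n = (u_{1-alpha/2} / delta)^2 * P(1 - P)."""
    u = u_crit(1 - alpha / 2)
    return _round_up((u / delta) ** 2 * P * (1 - P))


def n_single_arcsine(P: float, delta: float, alpha: float = 0.05) -> int:
    """P close to 0 or 1: arcsine formula, with sin^-1 taken in degrees."""
    u = u_crit(1 - alpha / 2)
    angle_deg = math.degrees(math.asin(delta / math.sqrt(P * (1 - P))))
    return _round_up((57.3 * u / angle_deg) ** 2)


def n_single_unknown(delta: float, alpha: float = 0.05) -> int:
    """P unknown: n = 0.25 * (u_{1-alpha/2} / delta)^2, i.e. P = 0.5."""
    u = u_crit(1 - alpha / 2)
    return _round_up(0.25 * (u / delta) ** 2)


def n_two_proportions(p1: float, p2: float, alpha: float = 0.05,
                      beta: float = 0.10, k: float = 1.0) -> tuple[int, int]:
    """Sizes (n1, n2) for comparing two proportions, with n2 = k * n1.

    p_bar is the pooled rate (p1 + k * p2) / (1 + k); with k = 1 the
    expression reduces to the n1 = n2 formula.
    """
    u_a = u_crit(1 - alpha / 2)
    u_b = u_crit(1 - beta)
    p_bar = (p1 + k * p2) / (1 + k)
    term_a = u_a * math.sqrt(p_bar * (1 - p_bar) * (1 + k) / k)
    term_b = u_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2) / k)
    n1 = (term_a + term_b) ** 2 / (p1 - p2) ** 2
    return _round_up(n1), _round_up(k * n1)


print(n_single(0.10, 0.01))           # HLA-B*27 example: 3458
print(n_single_unknown(0.01))         # 9604
print(n_two_proportions(0.30, 0.10))  # hypothetical p1, p2: (82, 82)
```

With the exact quantile (1.95996 rather than 1.96), the HLA-B\*27 example gives about 3457.3, which rounds up to 3458 subjects, consistent with the figure of about 3457 above.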

**2.** Estimation of the sample size when comparing proportions between two populations

When a proportion is compared between two populations (for example, the morbidity of cardiovascular diseases in blue-collar versus white-collar workers in a city, or, as is common in HLA analysis, the frequency of a given gene in a patient group versus a control group), assume that two samples of sizes *n*<sub>1</sub> and *n*<sub>2</sub> are drawn, one from each population, and that the estimated proportions in the two samples are *p*<sub>1</sub> and *p*<sub>2</sub>, respectively.

If *n*<sub>1</sub> is equal to *n*<sub>2</sub>, then:

$$n\_1 = n\_2 = \frac{\left[u\_{1-\alpha/2}\sqrt{2\bar{p}(1-\bar{p})} + u\_{1-\beta}\sqrt{p\_1(1-p\_1) + p\_2(1-p\_2)}\right]^2}{(p\_1 - p\_2)^2}$$

If *n*<sub>1</sub> is not equal to *n*<sub>2</sub>, let *n*<sub>2</sub> = *k* × *n*<sub>1</sub>; then:

$$n\_1 = \frac{\left[u\_{1-\alpha/2}\sqrt{\bar{p}(1-\bar{p})(1+k)/k} + u\_{1-\beta}\sqrt{p\_1(1-p\_1) + p\_2(1-p\_2)/k}\right]^2}{(p\_1 - p\_2)^2}$$

where *u* denotes critical values of the *u* (standard normal) distribution, *β* is the probability of a type II error (so the test power is 1 − *β*), and *p̄* = (*n*<sub>1</sub>*p*<sub>1</sub> + *n*<sub>2</sub>*p*<sub>2</sub>)/(*n*<sub>1</sub> + *n*<sub>2</sub>) is the pooled (integrated) rate of both groups. With *k* = 1, the second formula reduces to the first.

#### *2.2.2. Sampling: The process of obtaining a sample from the population*

The purpose of sampling is to determine the characteristics of a population by studying a sample (subset) of it. For example, we may wish to determine the distribution of the HLA-B\*07 genotype in a marrow donor bank from the gene frequencies of 1000 individuals in the bank. The sample must therefore represent the population as closely as possible: every individual in the population should have the same chance of being sampled, and the sample should be free from bias. For instance, in a study of the relationship between a disease and a specific HLA phenotype, researchers must not deliberately exclude cases lacking that phenotype during sampling; the resulting sample would be biased and would not represent the population. The sample should be a miniature, accurate representation of the population, which is achieved by random sampling.

Many randomization methods are in common use. Drawing or casting lots and tossing coins were used initially; later, researchers adopted random number tables, random permutation tables, and computer-generated random numbers. For sampling in medical research and for allocating trial subjects to groups, random number tables and random permutation tables are relatively convenient. Both are compiled according to the equal-probability principle of mathematical statistics and randomize better than drawing lots or tossing coins. Study subjects should be randomly and evenly allocated to every treatment group (all control and trial groups) so that objective factors do not interfere with the study results. The larger the number of study subjects, the better the randomization works; however, the number of subjects need not be maximized, and an appropriate randomization method should be chosen according to the features of the trial. Some common randomization methods are detailed below.

**1.** Drawing lots: This method is easy to perform. For example, to divide 12 animals into two groups, number the animals 1, 2, 3, …, 12 and prepare 12 lots numbered 1 to 12. Mix the lots and draw 6 of them as specified in advance; the animals whose numbers are drawn are assigned to Group 1, and the remaining animals are assigned to Group 2.

**2.** Random number table: A random number table is compiled according to the principle of random sampling and can be used for both random assignment and random sampling. All numbers in the table are mutually independent and occur randomly whether read horizontally, vertically, or diagonally; therefore, random numbers can be read in sequence starting from any position and in any direction. Examples are given below.

**a.** Dividing into two groups: We plan to observe 20 patients with gastric ulcers (patients No. 1–20); one group receives ranitidine, an effective drug, as the control, and the other group receives a lily decoction. Twenty 2-digit random numbers are read from the random number table and ranked from smallest to largest, giving each patient a grouping order number R. Patients with R from 1 to 10 are assigned to Group A, and those with R from 11 to 20 are assigned to Group B. The grouping results are presented in Table 3 (reference: Tianhe Xu, Jiu Wang. Design of Medical Experiments: Lecture 2 – Rules of randomization and blinding method. *Chinese Medical Journal*, 2005, 40(8): p.54).

**Table 3.** Randomized grouping results of 20 patients

| **Patient No.** | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **Random number** | 93 | 22 | 53 | 64 | 39 | 7 | 10 | 63 | 76 | 35 | 87 | 3 | 4 | 79 | 88 | 8 | 13 | 85 | 51 | 34 |
| **Grouping order number (R)** | 20 | 7 | 12 | 14 | 10 | 3 | 5 | 13 | 15 | 9 | 18 | 1 | 2 | 16 | 19 | 4 | 6 | 17 | 11 | 8 |
| **Group** | B | A | B | B | A | A | A | B | B | A | B | A | A | B | B | A | A | B | B | A |




**b.** Randomized division into three or more groups: To randomly divide 15 animals into 3 groups, number the animals 1 to 15. Fifteen 2-digit random numbers are then read from the random number table and ranked from smallest to largest to obtain the grouping order number R. Animals with R from 1 to 5 are assigned to Group A, those with R from 6 to 10 to Group B, and those with R from 11 to 15 to Group C. The grouping results are presented in Table 4 (Tianhe Xu, Jiu Wang. Design of Medical Experiments: Lecture 2 – Rules of randomization and blinding method. *Chinese Medical Journal*, 2005, 40(8): p.54).
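The rank-and-cut procedure in items a and b can be sketched in Python. The random numbers below are those printed in Tables 3 and 4, so the output can be checked against the tables. The function names are hypothetical, and the tie-breaking rule (by subject order) is an assumption, since the text does not say how equal random numbers are handled. In a real study, the numbers would be read from a random number table or produced by a seeded generator.

```python
from __future__ import annotations


def grouping_order_numbers(random_numbers: list[int]) -> list[int]:
    """Rank random numbers from smallest (R = 1) to largest.

    Ties are broken by subject order; this is an assumption of the sketch.
    """
    order = sorted(range(len(random_numbers)), key=lambda i: (random_numbers[i], i))
    ranks = [0] * len(random_numbers)
    for rank, subject_index in enumerate(order, start=1):
        ranks[subject_index] = rank
    return ranks


def assign_groups(random_numbers: list[int], labels: str) -> list[str]:
    """Cut the ranks 1..N into consecutive equal blocks, one block per group label."""
    n, n_groups = len(random_numbers), len(labels)
    if n % n_groups != 0:
        raise ValueError("this sketch assumes equal group sizes")
    size = n // n_groups
    return [labels[(r - 1) // size] for r in grouping_order_numbers(random_numbers)]


table3 = [93, 22, 53, 64, 39, 7, 10, 63, 76, 35, 87, 3, 4, 79, 88, 8, 13, 85, 51, 34]
table4 = [33, 35, 72, 67, 47, 77, 34, 55, 45, 70, 8, 18, 27, 38, 90]
print("".join(assign_groups(table3, "AB")))   # BABBAAABBABAABBAABBA
print("".join(assign_groups(table4, "ABC")))  # ABCCBCABBCAAABC
```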

**Table 4.** Randomized grouping of 15 animals

| **Animal No.** | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **Random number** | 33 | 35 | 72 | 67 | 47 | 77 | 34 | 55 | 45 | 70 | 8 | 18 | 27 | 38 | 90 |
| **Grouping order number (R)** | 4 | 6 | 13 | 11 | 9 | 14 | 5 | 10 | 8 | 12 | 1 | 2 | 3 | 7 | 15 |
| **Group** | A | B | C | C | B | C | A | B | B | C | A | A | A | B | C |

#### **2.3. Definitions of related terms in HLA statistical data analysis**

#### *2.3.1. Definitions of HLA phenotype, haplotype, and genotype*

HLA antigens are encoded by alleles on the chromosome. Generally, an individual's HLA antigen specificities are detected with available typing reagents and cells; the antigen specificities obtained in this way constitute the phenotype. However, the phenotype does not reveal how the individual's alleles are combined on each chromosome. The combination of HLA alleles on one chromosome is called a haplotype; if the combination extends from class I and class II alleles to class III genes or adjacent loci, it is often called an extended haplotype. Two haplotypes form an individual's HLA genotype, i.e., the combination of HLA alleles on the two chromosomes (Table 5). Generally, haplotypes and genotypes can only be determined by phenotyping all members of a family or by special experimental methods, such as single-sperm (monospermal) analysis. Because a given phenotype may correspond to several possible genotypes, determining an individual's haplotypes and genotype is important in allogeneic organ transplantation, hematopoietic stem cell transplantation, and forensic identification.

**Table 5.** Differences in HLA phenotypes, haplotypes, and genotypes

| **Individual** | **Individual 1** | **Individual 2** | **Individual 3** |
|---|---|---|---|
| **Phenotype** | HLA-A11, A24, B7, B27 | HLA-A11, B7, B27 | HLA-A11, B27 |
| **Typing results** | A\*11:01, A\*24:01, B\*07:02, B\*27:04 | A\*11:01, B\*07:02, B\*27:04 | A\*11:01, B\*27:04 |
| **Genotype** | HLA-A\*11:01, A\*24:01; HLA-B\*07:02, B\*27:04 | HLA-A\*11:01, A\*11:01; HLA-B\*07:02, B\*27:04 | HLA-A\*11:01, A\*11:01; HLA-B\*27:04, B\*27:04 |
| **Haplotype** | HLA-A\*11:01, B\*07:02 & HLA-A\*24:01, B\*27:04, or HLA-A\*11:01, B\*27:04 & HLA-A\*24:01, B\*07:02 | HLA-A\*11:01, B\*07:02 & HLA-A\*11:01, B\*27:04 | HLA-A\*11:01, B\*27:04 & HLA-A\*11:01, B\*27:04 |

#### *2.3.2. Genetic features of HLA*

**1.** Haplotype inheritance: The HLA complex is a group of closely linked genes. Crossing-over between homologous chromosomes rarely occurs within it, so the alleles on the same chromosome form a haplotype. During reproduction, each HLA haplotype is passed from parent to progeny as a complete genetic unit: a child randomly receives one haplotype from each parent, and the two together form the child's genotype. Between two siblings, the probability of sharing both haplotypes is 25%, of sharing one haplotype 50%, and of sharing neither 25%. Therefore, when seeking a donor for allogeneic organ transplantation or hematopoietic stem cell transplantation, it is much easier to find an HLA match (particularly a matched HLA haplotype) within the patient's family than among non-sibling donors. Note, however, that crossing-over between the two homologous haplotypes can occur during transmission from parent to progeny (see details in the recombination section).

**2.** Codominant inheritance: The antigens encoded by both alleles of each pair are expressed on the cell membrane; neither allele is recessive, and allelic exclusion does not occur. For example, if an individual's two haplotypes are HLA-A\*11:01, B\*27:04 and HLA-A\*24:01, B\*07:02, four different HLA molecules, A11, A24, B27, and B7, are expressed on the surface of that individual's cells.

**3.** Linkage disequilibrium: HLA alleles at different loci occur in a population at characteristic frequencies. If two alleles at different loci occur together on the same chromosome more or less often than expected by chance, i.e., the observed haplotype frequency is significantly higher or lower than the theoretical value (the product of the two allele frequencies), this non-random association is called linkage disequilibrium. For example, A1 and B8 in Caucasians and A2 and B46 in southern Chinese individuals frequently occur together, so the haplotypes A1-B8 and A2-B46, respectively, exhibit linkage disequilibrium.


### **3. Estimation of HLA population genetic parameters**

#### **3.1. Genetic structure**

Studies of HLA genetic parameters usually start with the HLA-A and HLA-B loci, and these two loci are often used as examples in HLA data analysis. For simplicity, we generalize this case to an autosomal two-locus, multiple-allele genetic model and use the following notation.

Assume that there are two linked loci (I and J) on a human chromosome, each with multiple codominant alleles. The alleles at locus I are labeled *i1*, *i2*, *i3*, …, *in*, and *i0*, where "*i0*" represents the undetected blank gene at locus I; therefore, the number of alleles at locus I is *l* = *n* + 1. Similarly, the alleles at locus J are labeled *j1*, *j2*, *j3*, …, *jm*, and *j0*, where "*j0*" represents the undetected blank gene at locus J; therefore, the number of alleles at locus J is *k* = *m* + 1.


Because the alleles at loci I and J can combine randomly, the number of possible haplotypes is *l* × *k*. These haplotypes can form *lk*(*lk*+1)/2 different genotypes. A gene in the homozygous state has the same phenotype as that gene paired with the blank gene; for example, genotypes *i2i2* and *i2i0* both have the phenotype *i2* (+). The possible phenotypes at locus I are therefore the blank phenotype, the (*l*-1) single-antigen phenotypes, and the (*l*-1) × (*l*-2)/2 two-antigen phenotypes, giving [*l*+(*l*-1) × (*l*-2)/2] in total; similarly, the number of possible phenotypes at locus J is [*k*+(*k*-1) × (*k*-2)/2]. Thus, the number of all possible phenotypes at loci I and J is [*l*+(*l*-1) × (*l*-2)/2] × [*k*+(*k*-1) × (*k*-2)/2].
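The counting formulas above can be checked by brute force. The following Python sketch is illustrative only (the function names are hypothetical): it enumerates every unordered genotype at one locus, treats allele 0 as the blank gene, collapses each genotype to its set of detectable antigens, and compares the number of distinct phenotypes with *l* + (*l* - 1)(*l* - 2)/2.

```python
from itertools import combinations_with_replacement


def locus_phenotypes(l: int) -> int:
    """Closed-form count from the text: l + (l - 1)(l - 2)/2, where l includes the blank gene."""
    return l + (l - 1) * (l - 2) // 2


def brute_force_phenotypes(l: int) -> int:
    """Enumerate all l(l + 1)/2 unordered genotypes and collapse them to phenotypes.
    Allele 0 stands for the undetected blank gene."""
    phenotypes = set()
    for a, b in combinations_with_replacement(range(l), 2):
        # The blank gene is invisible and a homozygote shows a single antigen,
        # so genotypes such as "i2i2" and "i2i0" both collapse to {i2}.
        phenotypes.add(frozenset(x for x in (a, b) if x != 0))
    return len(phenotypes)


def two_locus_counts(l: int, k: int) -> tuple[int, int, int]:
    """Numbers of haplotypes, genotypes, and phenotypes for loci I (l alleles) and J (k alleles)."""
    haplotypes = l * k
    genotypes = haplotypes * (haplotypes + 1) // 2
    phenotypes = locus_phenotypes(l) * locus_phenotypes(k)
    return haplotypes, genotypes, phenotypes


for l in range(1, 12):
    assert locus_phenotypes(l) == brute_force_phenotypes(l)

print(two_locus_counts(6, 6))  # (36, 666, 256)
```

With *l* = *k* = 6 (five detectable alleles plus the blank gene at each locus, as in the HLA-C example of Table 8), there are 36 haplotypes, 666 genotypes, and 16 × 16 = 256 phenotypes.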

Because loci I and J are linked, the gene frequencies at each locus are related to the frequencies of the haplotypes formed by genes at both loci, and the population distributions of the antigens at the two loci are likewise related. These relationships can be summarized in a 2 × 2 four-space table. Unless otherwise specified, all populations discussed in this chapter are assumed to be Mendelian populations in Hardy-Weinberg equilibrium, i.e., mating is completely random and there are no effects of selection, mutation, or migration.

The population distribution of antigens at the two loci is presented in the table below. In a population of *N* individuals, "a" individuals have both antigens i and j, "b" individuals have antigen i but not antigen j, "c" individuals have antigen j but not antigen i, and "d" individuals have neither antigen. The marginal values A, B, C, and D are each the sum of the corresponding two cells, and *N* is the sum of all four cells.


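**Table 6.** Population distribution of antigens i and j

| Antigen i | Antigen j + | Antigen j − | Total |
|---|---|---|---|
| **+** | a | b | C = a + b |
| **−** | c | d | D = c + d |
| **Total** | A = a + c | B = b + d | *N* = a + b + c + d |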

Table 7 shows the relationship between the frequencies of genes i and j and the haplotype frequencies. The genes at loci I and J can form four haplotypes, ij, j0, i0, and 00, where "0" represents the blank gene; the frequencies of these four haplotypes are denoted *s*, *t*, *u*, and *v*, respectively. The frequency of gene i is denoted *p<sub>i</sub>*, and the summed frequency of all other alleles at locus I is denoted *q<sub>i</sub>*; obviously, *p<sub>i</sub>* + *q<sub>i</sub>* = 1. Similarly, the frequency of gene j and the summed frequency of the other alleles at locus J are denoted *p<sub>j</sub>* and *q<sub>j</sub>*, respectively. As the table below shows, the frequency of each gene equals the sum of the frequencies of the haplotypes that carry it.


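**Table 7.** Relation between the gene frequencies at loci I and J and the haplotype frequencies

| Gene i | Gene j + | Gene j − | Total |
|---|---|---|---|
| **+** | *s* | *u* | *p<sub>i</sub>* = *s* + *u* |
| **−** | *t* | *v* | *q<sub>i</sub>* = *t* + *v* |
| **Total** | *p<sub>j</sub>* = *s* + *t* | *q<sub>j</sub>* = *u* + *v* | 1 |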

#### **3.2. Hardy-Weinberg equilibrium test**


From the gene or haplotype frequencies, the expected frequencies of all genotypes and phenotypes can be obtained by combining them according to the Hardy-Weinberg equilibrium law; comparing these expected values with the corresponding observed values is referred to as the Hardy-Weinberg coincidence test. This test is mainly used in two situations. 1) To support or exclude a proposed mode of inheritance. For example, in an assumed Mendelian genetic system, the gene or haplotype frequencies are calculated on the basis of the assumed mode of inheritance and then recombined according to the Hardy-Weinberg equilibrium law to obtain the expected phenotype values. If the expected values agree with the observed values, the proposed mode may be correct; otherwise, it may be excluded. Conclusions drawn in this way are not confirmatory, because assuming additional loci can sometimes improve the agreement. 2) To assess the reliability of population survey data. For genetic systems with a well-established mode of inheritance, such as the HLA system discussed in this book, the population distribution should be in good Hardy-Weinberg equilibrium if mating is fully random and there are no effects of selection, mutation, or migration. Poor agreement between the expected and observed values then indicates that the survey data are unreliable and can help identify sources of error in sampling, typing technology, and so on.

#### *3.2.1. Measuring method for determination of the coincidence degree*

The degree of coincidence between a phenotype's expected and observed values is generally measured with *χ<sup>2</sup>*. A *χ<sup>2</sup>* value is calculated for every phenotype, and these values are summed to obtain the total *χ<sup>2</sup>*, whose P value is then read from a χ<sup>2</sup> distribution table. The *χ<sup>2</sup>* calculation formula is:

$$\chi^2 = \sum \frac{(\text{Expected value} - \text{Observed value})^2}{\text{Expected value}}$$

In the Hardy-Weinberg equilibrium test, the expected values of some phenotypes are often less than 5. In this case, some authors pool several phenotypes until the pooled expected value exceeds 5 and then calculate χ<sup>2</sup> again. However, this approach is subjective, and pooling reduces the calculated χ<sup>2</sup> value. In addition, because pooling methods vary, it is difficult to compare data between studies. We therefore consider it unnecessary to pool phenotypes with expected values of less than 5; χ<sup>2</sup> should be calculated as in other cases. Although this may increase the χ<sup>2</sup> value, the resulting conclusion about coincidence is more reliable.

Determination of the degrees of freedom in the χ<sup>2</sup> test: Assume that a genetic system consists of *n* alleles and *Φ* phenotypes, and the sample size is *N*. Because the gene frequencies satisfy *p<sub>1</sub>* + *p<sub>2</sub>* + … + *p<sub>n</sub>* = 1, the number of parameters estimated from the sample is (*n* - 1); in addition, one degree of freedom is lost because the expected values are constrained to sum to the sample size *N*. Therefore, the degrees of freedom remaining for the test are:

$$d\_f = \Phi - (n - 1) - 1 = \Phi - n$$

In the Hardy-Weinberg equilibrium test, *P* ≥ 0.05 is generally used as the criterion for concluding that there is no significant difference between the expected and observed values.
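Instead of consulting a table, the χ<sup>2</sup> statistic, the degrees of freedom, and the P value can be computed directly. This is a minimal, standard-library Python sketch with hypothetical helper names. The P value comes from the power series of the regularized incomplete gamma function, which is a numerical choice of this sketch rather than part of the method described here; skipping classes whose expected value is 0 is also an assumption, since such classes cannot enter the formula.

```python
import math


def chi_square(observed, expected):
    """Total chi-square: sum of (expected - observed)^2 / expected.
    Assumption: classes with an expected value of 0 are skipped."""
    return sum((e - o) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def hwe_degrees_of_freedom(n_phenotypes: int, n_alleles: int) -> int:
    """df = Phi - (n - 1) - 1 = Phi - n, with the blank gene counted as an allele."""
    return n_phenotypes - n_alleles


def chi_square_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with df
    degrees of freedom, computed as 1 - P(df/2, x/2), where P is the
    regularized lower incomplete gamma function in power-series form."""
    if x <= 0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    term = total = 1.0 / s
    k = 1
    while term > total * 1e-15 and k < 10_000:
        term *= z / (s + k)
        total += term
        k += 1
    lower = math.exp(s * math.log(z) - z - math.lgamma(s)) * total
    return max(0.0, 1.0 - lower)


# HLA-C example of Table 8: 16 phenotypes and 6 alleles (including the blank gene).
df = hwe_degrees_of_freedom(16, 6)
print(df, chi_square_sf(9.0, df))
```

The decision rule then compares the returned P value with the chosen criterion.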

#### *3.2.2. Hardy-Weinberg equilibrium test for individual loci*

In a genetic system containing one or more loci, the Hardy-Weinberg equilibrium test can be performed for each locus. According to the Hardy-Weinberg equilibrium law, the expected frequency of a homozygous genotype is the square of the corresponding gene frequency, and the expected frequency of a heterozygous genotype is twice the product of the two corresponding gene frequencies. The expected frequency of each phenotype is the sum of the expected frequencies of the corresponding genotypes. Multiplying the phenotype frequency by the sample size "*N*" gives the expected value of the phenotype, whose coincidence with the observed value can then be tested.

Table 8 below shows a Hardy-Weinberg coincidence test of the antigen phenotypes at the HLA-C locus in Han Chinese individuals. The expected and observed values agree well, indicating that the distribution of these alleles at the HLA-C locus is consistent with Hardy-Weinberg equilibrium. In this table, the HLA-Cw1 phenotype includes two genotypes, "HLA-C\*01/HLA-C\*01" and "HLA-C\*01/blank"; therefore, the expected value of the phenotype is calculated as

$$106 \times (0.1422 \times 0.1422 + 0.1422 \times 0.3793 \times 2) = 13.5779$$

The phenotype expected value of HLA-Cw1,2 is calculated as

$$106 \times 0.1422 \times 0.0143 \times 2 = 0.4311$$

**Table 8.** Hardy-Weinberg equilibrium test of the antigen phenotype at locus HLA-C in Chinese individuals with the Han nationality

| Phenotype | Genotype | Observed value | Expected value |
|---|---|---|---|
| HLA-Cw1 | HLA-C\*01/HLA-C\*01 or HLA-C\*01/blank | 10 | 13.5779 |
| HLA-Cw2 | HLA-C\*02/HLA-C\*02 or HLA-C\*02/blank | 1 | 1.1716 |
| HLA-Cw3 | HLA-C\*03/HLA-C\*03 or HLA-C\*03/blank | 43 | 46.788 |
| HLA-Cw4 | HLA-C\*04/HLA-C\*04 or HLA-C\*04/blank | 9 | 6.9655 |
| HLA-Cw5 | HLA-C\*05/HLA-C\*05 or HLA-C\*05/blank | 0 | 0 |
| HLA-Cw1,2 | HLA-C\*01/HLA-C\*02 | 0 | 0.4311 |
| HLA-Cw1,3 | HLA-C\*01/HLA-C\*03 | 17 | 11.6275 |
| HLA-Cw1,4 | HLA-C\*01/HLA-C\*04 | 1 | 2.3665 |
| HLA-Cw1,5 | HLA-C\*01/HLA-C\*05 | 0 | 0 |
| HLA-Cw2,3 | HLA-C\*02/HLA-C\*03 | 1 | 1.1693 |
| HLA-Cw2,4 | HLA-C\*02/HLA-C\*04 | 1 | 0.2380 |
| HLA-Cw2,5 | HLA-C\*02/HLA-C\*05 | 0 | 0 |
| HLA-Cw3,4 | HLA-C\*03/HLA-C\*04 | 5 | 6.4188 |
| HLA-Cw3,5 | HLA-C\*03/HLA-C\*05 | 0 | 0 |
| HLA-Cw4,5 | HLA-C\*04/HLA-C\*05 | 0 | 0 |
| Blank | Blank/blank | 18 | 15.2501 |
| **Total** | | 106 | 106.004 |

Gene frequencies: HLA-C\*01 = 0.1422; HLA-C\*02 = 0.0143; HLA-C\*03 = 0.3857; HLA-C\*04 = 0.0785; HLA-C\*05 = 0; HLA-C\*blank = 0.3793. *df* = 16 − 6 = 10; *P* > 0.5.
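The calculation behind Table 8 can be scripted end to end. The Python sketch below uses hypothetical names and data transcribed from Table 8; it builds each expected count as *N* times the sum of its genotype frequencies (*p*<sup>2</sup> + 2*p*·*p*<sub>blank</sub> for one antigen, 2*p<sub>a</sub>p<sub>b</sub>* for two antigens, *p*<sub>blank</sub><sup>2</sup> for the blank phenotype) and sums χ<sup>2</sup> over the phenotypes with nonzero expected values. Run as written, it reproduces the expected values in the table to within about 0.01, and the total χ<sup>2</sup> of roughly 8.8 on 10 degrees of freedom corresponds to a P value of about 0.55, in line with the *P* > 0.5 reported.

```python
import math
from itertools import combinations

# Data transcribed from Table 8 (HLA-C, Han Chinese, N = 106).
N = 106
FREQ = {"01": 0.1422, "02": 0.0143, "03": 0.3857, "04": 0.0785, "05": 0.0}
P_BLANK = 0.3793  # frequency of the undetected blank gene
OBSERVED = {
    ("01",): 10, ("02",): 1, ("03",): 43, ("04",): 9, ("05",): 0,
    ("01", "02"): 0, ("01", "03"): 17, ("01", "04"): 1, ("01", "05"): 0,
    ("02", "03"): 1, ("02", "04"): 1, ("02", "05"): 0,
    ("03", "04"): 5, ("03", "05"): 0, ("04", "05"): 0,
    (): 18,  # blank phenotype (blank/blank)
}


def expected_counts() -> dict:
    """Expected phenotype counts under Hardy-Weinberg equilibrium."""
    expected = {(): N * P_BLANK ** 2}
    for a, p in FREQ.items():
        # One antigen: homozygote a/a plus heterozygote a/blank.
        expected[(a,)] = N * (p * p + 2 * p * P_BLANK)
    for a, b in combinations(sorted(FREQ), 2):
        # Two antigens: heterozygote a/b.
        expected[(a, b)] = N * 2 * FREQ[a] * FREQ[b]
    return expected


def chi_square_sf(x: float, df: int) -> float:
    """P(X >= x) for a chi-square(df) variable, via the incomplete-gamma power series."""
    if x <= 0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    term = total = 1.0 / s
    k = 1
    while term > total * 1e-15 and k < 10_000:
        term *= z / (s + k)
        total += term
        k += 1
    return max(0.0, 1.0 - math.exp(s * math.log(z) - z - math.lgamma(s)) * total)


expected = expected_counts()
# Phenotypes with an expected value of 0 (all involving HLA-C*05) are skipped.
chi2 = sum((e - OBSERVED[ph]) ** 2 / e for ph, e in expected.items() if e > 0)
df = len(OBSERVED) - (len(FREQ) + 1)  # Phi - n = 16 - 6 = 10
p_value = chi_square_sf(chi2, df)
print(f"Cw1 expected = {expected[('01',)]:.4f}; chi2 = {chi2:.2f}, df = {df}, P = {p_value:.2f}")
```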

#### *3.2.3. Hardy-Weinberg equilibrium test of haplotypes*

The haplotype Hardy-Weinberg equilibrium test can be performed in a multiple-locus, multiple-allele genetic system, and it can also be used to test the allelic and linkage relationships of all the genes in the system. Ignoring recombination, haplotypes obey the same genetic rules as alleles. Therefore, under Hardy-Weinberg equilibrium, the expected frequency of a combination of two identical haplotypes equals the square of that haplotype's frequency, and the expected frequency of a combination of two different haplotypes equals twice the product of their frequencies. The expected value of each phenotype is obtained by assigning the haplotype combinations to the corresponding phenotypes and summing, and a coincidence test is then performed against the observed value of the phenotype. Table 9 shows how the expected frequency of phenotype "HLA-A2, B15" is calculated, where A\*blank represents all alleles at the HLA-A locus other than HLA-A\*02, and B\*blank represents all alleles at the HLA-B locus other than HLA-B\*15. The haplotype frequencies used are HLA-A\*02 B\*15 = 0.0113, HLA-A\*02 B\*blank = 0.0559, HLA-A\*blank B\*15 = 0.0098, and HLA-A\*blank B\*blank = 0.0053.

**Table 9.** The haplotype composition and expected frequency of phenotype "HLA-A2, B15"

| Haplotype combination mode | Phenotype frequency |
|---|---|
| A\*02 B\*blank / A\*blank B\*15 | 2 × 0.0559 × 0.0098 = 0.001096 |
| A\*02 B\*15 / A\*blank B\*blank | 2 × 0.0113 × 0.0053 = 0.000120 |
| A\*02 B\*15 / A\*blank B\*15 | 2 × 0.0113 × 0.0098 = 0.000221 |
| A\*02 B\*15 / A\*02 B\*blank | 2 × 0.0113 × 0.0559 = 0.001263 |
| A\*02 B\*15 / A\*02 B\*15 | 0.0113 × 0.0113 = 0.000128 |
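The haplotype-based calculation generalizes to any two-antigen phenotype. In the Python sketch below (hypothetical helper names), the four haplotype classes follow the notation of Table 7: *s* = ij, *t* = j0, *u* = i0, *v* = 00. The function sums Hardy-Weinberg frequencies over all ordered pairs of haplotypes that together carry both antigens, and the result is checked against the closed form *s*<sup>2</sup> + 2*s*(*t* + *u* + *v*) + 2*tu*, which is the sum of the five combination modes listed in Table 9. The frequencies in the usage line are hypothetical values chosen to sum to 1, not the Table 9 data.

```python
from itertools import product


def phenotype_freq_enumerated(s: float, t: float, u: float, v: float) -> float:
    """Frequency of phenotype 'i+, j+' (e.g. HLA-A2, B15) under Hardy-Weinberg
    equilibrium for haplotypes ij (s), j0 (t), i0 (u), and 00 (v)."""
    # Each haplotype is recorded as (carries i, carries j, frequency).
    haplotypes = [(True, True, s), (False, True, t), (True, False, u), (False, False, v)]
    total = 0.0
    # Ordered pairs: each heterozygous combination appears twice, giving the factor 2.
    for (i1, j1, f1), (i2, j2, f2) in product(haplotypes, repeat=2):
        if (i1 or i2) and (j1 or j2):
            total += f1 * f2
    return total


def phenotype_freq_closed_form(s: float, t: float, u: float, v: float) -> float:
    """Sum of the five combination modes in Table 9: s^2 + 2s(t+u+v) + 2tu."""
    return s * s + 2 * s * (t + u + v) + 2 * t * u


# Hypothetical haplotype frequencies (not the Table 9 data).
print(round(phenotype_freq_enumerated(0.1, 0.2, 0.3, 0.4), 6))  # 0.31
```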

#### **3.3. Estimation of genetic parameters**

Because genetic parameters are estimated from a sample of limited size, sampling error is unavoidable. The magnitude of the sampling error is expressed as the standard error *σ*, which is the square root of the variance *V* (*σ* = √*V*).


#### **3.4. Antigen frequency**

Antigen frequency is defined as the proportion or percentage of individuals in the population who carry the antigen phenotype. If *N* is the total number of individuals and *C<sub>i</sub>* is the number of individuals with antigen phenotype i, then the frequency of antigen i is calculated as:

$$f\_i = \text{C}\_i/N$$

The frequencies of antigens i and j can be easily obtained from the four-space form above:

$$f_i = (a+b)/N = C/N$$

$$f_j = (a+c)/N = A/N$$

and the standard error is calculated as:

$$\sigma_{f_i} = \frac{1}{N}\sqrt{\frac{CD}{N}} = \sqrt{\frac{f_i(1 - f_i)}{N}}$$

When the antigen frequency *fi* is fixed, the greater the sample size *N*, the lower the standard error.
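A minimal Python sketch of these formulas follows; the function names are hypothetical. As an illustration it uses the HLA-Cw1 antigen count from Table 8: the phenotypes carrying Cw1 (Cw1, Cw1,2, Cw1,3, Cw1,4, and Cw1,5) total 10 + 0 + 17 + 1 + 0 = 28 of *N* = 106 individuals. The four-space helper follows the layout of Table 6 (rows = antigen i, columns = antigen j).

```python
import math


def antigen_frequency(c: int, n: int) -> tuple[float, float]:
    """Antigen frequency f = C/N and its standard error sqrt(f(1 - f)/N)."""
    f = c / n
    se = math.sqrt(f * (1 - f) / n)
    return f, se


def antigen_frequencies_from_table(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """f_i = (a + b)/N and f_j = (a + c)/N from the four-space table."""
    n = a + b + c + d
    return (a + b) / n, (a + c) / n


f, se = antigen_frequency(28, 106)
print(f"f = {f:.4f}, SE = {se:.4f}")  # f = 0.2642, SE = 0.0428
# The equivalent form (1/N) * sqrt(C * D / N) gives the same standard error.
assert math.isclose(se, math.sqrt(28 * (106 - 28) / 106) / 106)
```

Doubling *N* at the same antigen frequency shrinks the standard error by a factor of √2, which is the sample-size effect described above.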

#### **3.5. Gene frequency**

Assuming that i1 is an allele at locus I, the proportion or percentage of gene i1 among all genes at this locus is referred to as the gene (allele) frequency of i1. The frequencies of all alleles at a single locus sum to 1. Gene frequencies can be obtained from family or population surveys.

#### *3.5.1. Calculation of gene frequency by direct genotype count*

If the genotypes of *N* individuals are known, the frequency of gene i in the population can be obtained by simple counting. If *X* is the number of copies of gene i among the 2*N* genes at the locus, the frequency of gene i is:

$$p_i = X / 2N$$

Assume that there are two alleles at autosomal locus I, i1 and i2. In diploids, these can form three genotypes: i1i1, i1i2, and i2i2. Table 10 shows a possible set of genotype counts after surveying 100 individuals. In total, there are 200 genes at locus I: the 36 i1i1 individuals carry 72 copies of gene i1; the 48 i1i2 individuals carry 48 copies each of i1 and i2; and the 16 i2i2 individuals carry 32 copies of gene i2. Therefore, the gene frequency of i1 is (72+48)/200 = 0.6, the gene frequency of i2 is (32+48)/200 = 0.4, and the two frequencies sum to 1.

In the HLA system, each locus usually has many alleles. To calculate the frequency of a particular allele at a locus, denote it i1 and let i2 stand for all non-i1 genes collectively. For example, to calculate the gene frequency of HLA-B\*27 at the HLA-B locus, HLA-B\*27 is denoted i1, and all alleles other than HLA-B\*27 together are denoted i2.


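**Table 10.** Distribution of genotypes at locus I in 100 random individuals

| **Genotype** | i1i1 | i1i2 | i2i2 |
|---|---|---|---|
| **Observed individuals** | 36 | 48 | 16 |
| **Count of gene i1** | 72 | 48 | 0 |
| **Count of gene i2** | 0 | 48 | 32 |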
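Direct counting is easy to automate. In this Python sketch (hypothetical function name), each genotype contributes its two genes to the tally, and the totals are divided by 2*N*; the input reproduces the counts of Table 10.

```python
from collections import Counter


def allele_frequencies(genotype_counts: dict[tuple[str, str], int]) -> dict[str, float]:
    """Direct counting: each diploid individual contributes two genes, so p = X / 2N."""
    gene_counts = Counter()
    n = 0
    for (a1, a2), count in genotype_counts.items():
        gene_counts[a1] += count
        gene_counts[a2] += count
        n += count
    return {allele: x / (2 * n) for allele, x in gene_counts.items()}


# Genotype counts from Table 10 (100 individuals).
table10 = {("i1", "i1"): 36, ("i1", "i2"): 48, ("i2", "i2"): 16}
freqs = allele_frequencies(table10)
print(freqs)  # {'i1': 0.6, 'i2': 0.4}
```

These counts also happen to match Hardy-Weinberg proportions exactly: 100 × 0.6² = 36, 100 × 2 × 0.6 × 0.4 = 48, and 100 × 0.4² = 16.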

#### *3.5.2. Estimation of gene frequency according to phenotype frequency*

HLA typing technology is currently developing rapidly. With the spread of high-throughput sequencing, HLA gene frequencies can now mostly be calculated by direct counting. However, because of limited technical conditions, some population surveys can obtain only phenotypes. How, then, should gene frequencies best be estimated? Two main methods are used in practice: the root method, which involves simple arithmetic and is easy to perform, and the maximum likelihood algorithm, which estimates gene frequencies efficiently but requires specialized computer software (see details in the next section). The following describes how to use the root method to estimate gene frequencies from phenotype results.

If the frequency of the dominant gene i is *p<sub>i</sub>*, then the summed frequency of all other alleles at this locus is

$$q_i = 1 - p_i$$


The relationship between the phenotype frequency and the corresponding genotype frequency can be obtained according to the Hardy-Weinberg law (see the table below).


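**Table 11.** Relationship between phenotype and genotype frequencies

| Phenotype | Phenotype frequency | Corresponding genotype | Frequency of corresponding genotype |
|---|---|---|---|
| I (+) | *f<sub>i</sub>* | i homozygote "ii", i heterozygote "i-" | *p<sub>i</sub>*<sup>2</sup> + 2*p<sub>i</sub>q<sub>i</sub>* |
| I (−) | 1 − *f<sub>i</sub>* | Non-i combination "-/-" | *q<sub>i</sub>*<sup>2</sup> |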

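We can deduce from the table that *f<sub>i</sub>* = *p<sub>i</sub>*<sup>2</sup> + 2*p<sub>i</sub>q<sub>i</sub>* = 1 − *q<sub>i</sub>*<sup>2</sup>, so that *q<sub>i</sub>* = √(1 − *f<sub>i</sub>*) and

$$p_i = 1 - \sqrt{1 - f_i}$$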

where *p<sub>i</sub>* is the frequency of gene i, and *f<sub>i</sub>* is the frequency of the phenotype (antigen) containing gene i. This formula (the root method) is often used to estimate HLA gene frequencies, and it can be rewritten in other forms.


After changing the formula, the gene frequency is expressed as:

**Table 13.** Expected values related to the distribution of antigen ij

Frequency of haplotype ij, *s* = *p <sup>j</sup>* - *qi* + *d* / *N*

Frequency of haplotype j0, *t* =*qi* - *d* / *N*

Frequency of haplotype i0, *u* =*q <sup>j</sup>* - *d* / *N*

Frequency of haplotype 00, *v* = *d* / *N*

$$p_i = 1 - \sqrt{1 - C/N}$$

or

$$p_i = 1 - \sqrt{D/N}$$

The definitions of *C*, *D*, and *N* in the formula are the same as above, and the standard error of *pi* is:

$$\sigma_{p_i} = \frac{1}{2}\sqrt{\frac{f_i}{N}} = \frac{\sqrt{N - D}}{2N}$$

When *pi* is small, it can be approximated by:

$$p\_i \approx \frac{f\_i}{2}$$
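The sketch below implements these estimators and assumes Hardy-Weinberg equilibrium. The counts (C = 19 antigen-positive individuals out of N = 100) are made up purely for illustration, and the function name is hypothetical. With these counts, *pi* = 1 − √0.81 = 0.1, close to the approximation *fi*/2 = 0.095.

```python
import math


def gene_frequency_from_phenotypes(C: int, N: int):
    """Estimate gene frequency p_i from C antigen-positive individuals out of N.

    Returns (p_i, standard error of p_i, small-p_i approximation f_i / 2).
    """
    D = N - C                        # individuals lacking antigen i
    f_i = C / N                      # phenotype (antigen) frequency
    p_i = 1 - math.sqrt(D / N)       # same as 1 - sqrt(1 - f_i)
    se = math.sqrt(N - D) / (2 * N)  # same as 0.5 * sqrt(f_i / N)
    return p_i, se, f_i / 2


p_i, se, approx = gene_frequency_from_phenotypes(C=19, N=100)
print(round(p_i, 4), round(se, 4), approx)
```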

### **3.6. Haplotype frequency**

In a two-locus, multi-allele genetic system, each chromosome carries one allele at each of the two loci. The combinations of these alleles form different haplotypes. The proportion of a given haplotype in the population is referred to as its haplotype frequency. The frequencies of all haplotypes sum to 1.

#### *3.6.1. Calculation of haplotype frequency by direct haplotype count*

When each individual's haplotypes are known, haplotype frequencies can be calculated by simple counting, exactly as for gene frequencies. However, haplotypes can usually be obtained only from family studies, and in some families the HLA haplotypes cannot be fully determined. Excluding these individuals from the analysis may bias the results. Instead, relative haplotype frequencies can be estimated from population survey data.

For example, in Table 12 below, family analysis cannot determine whether the mother's haplotypes are A9-B13/A2-B13 or A9-B13/A2-B-. The ambiguous haplotype can therefore only be assigned according to its relative frequency. Suppose that population survey data give a frequency of 0.0356 for haplotype A2-B13 and 0.0559 for haplotype A2-B-. The mother's second haplotype must be one of these two. The relative frequency of A2-B13 is therefore 0.0356/(0.0356 + 0.0559) = 0.39, and that of A2-B- is 0.0559/(0.0356 + 0.0559) = 0.61. During counting, this haplotype is counted as 0.39 A2-B13 and 0.61 A2-B-.

In practice, advances in HLA typing methods have made high-resolution HLA results the norm, especially through the widespread use of sequencing-based typing. When only one allele is detected at a locus, the individual is usually considered homozygous at that locus.


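**Table 12.** A family's HLA typing results

| | Phenotype | Haplotype combination mode |
|---|---|---|
| Father | A10,11; B5,15 | A11-B5 & A10-B15 |
| Mother | A2,9; B13 | A9-B13 & A2-B13 or A9-B13 & A2-B- |
| Child 1 | A9,11; B5,13 | A11-B5 & A9-B13 |
| Child 2 | A9,10; B13,15 | A10-B15 & A9-B13 |
| Child 3 | A9,11; B5,13 | A11-B5 & A9-B13 |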
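The fractional counting used for the mother's ambiguous haplotype in Table 12 can be sketched as below. The population frequencies 0.0356 and 0.0559 are the values assumed in the text, and the function name is a hypothetical illustration.

```python
from collections import defaultdict


def relative_weights(candidate_freqs: dict) -> dict:
    """Split one ambiguous haplotype among its candidates in proportion
    to their population frequencies."""
    total = sum(candidate_freqs.values())
    return {hap: freq / total for hap, freq in candidate_freqs.items()}


weights = relative_weights({"A2-B13": 0.0356, "A2-B-": 0.0559})
print({hap: round(w, 2) for hap, w in weights.items()})  # {'A2-B13': 0.39, 'A2-B-': 0.61}

# Fractional haplotype counts contributed by the mother
counts = defaultdict(float)
counts["A9-B13"] += 1.0   # unambiguous haplotype, transmitted to all three children
for hap, w in weights.items():
    counts[hap] += w      # ambiguous haplotype counted as 0.39 A2-B13 and 0.61 A2-B-
```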


HLA and Associated Important Diseases


#### *3.6.2. Estimation of haplotype frequency according to phenotype frequency*

Assume that codominant genes i and j are located at two different loci, and denote all other (blank) genes at these two loci by 0. Four haplotypes are then possible in the population:

- ij: carries both i and j
- j0: carries j but not i
- i0: carries i but not j
- 00: carries neither

Their frequencies are *s*, *t*, *u*, and *v*, respectively. The relationship between haplotype frequency and gene frequency is illustrated in Table 7. The observed distribution of antigens i and j in the population is presented in Table 6.

The expected distribution of antigens i and j in the population is shown in Table 13. The expected values are obtained under Hardy-Weinberg equilibrium by expanding (*s*+*t*+*u*+*v*)<sup>2</sup> and then combining the terms that give identical phenotypes. *N* is the sample size.


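**Table 13.** Expected values related to the distribution of antigen ij

| Antigen i | Antigen j + | Antigen j − | Total |
|---|---|---|---|
| + | (2*s* − *s*<sup>2</sup> + 2*tu*)*N* | (*u*<sup>2</sup> + 2*uv*)*N* | *N*[1 − (*t*+*v*)<sup>2</sup>] |
| − | (*t*<sup>2</sup> + 2*tv*)*N* | *v*<sup>2</sup>*N* | *N*(*t*+*v*)<sup>2</sup> |
| Total | *N*[1 − (*u*+*v*)<sup>2</sup>] | *N*(*u*+*v*)<sup>2</sup> | *N* |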
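The entries of Table 13 can be checked numerically. The sketch below expands (*s*+*t*+*u*+*v*)<sup>2</sup> term by term over ordered pairs of haplotypes and compares the result with the closed-form cells. The haplotype frequencies are arbitrary illustrative values, and the function names are hypothetical.

```python
from itertools import product

# Each haplotype is described by (carries i, carries j)
HAPLOTYPES = {"ij": (True, True), "j0": (False, True), "i0": (True, False), "00": (False, False)}


def expected_by_expansion(freqs: dict, N: float) -> dict:
    """Expected phenotype counts from expanding (s+t+u+v)^2 and merging identical phenotypes."""
    table = {}
    for h1, h2 in product(HAPLOTYPES, repeat=2):  # ordered pairs = terms of the square
        i_pos = HAPLOTYPES[h1][0] or HAPLOTYPES[h2][0]  # antigen i present on either chromosome
        j_pos = HAPLOTYPES[h1][1] or HAPLOTYPES[h2][1]
        key = ("+" if i_pos else "-", "+" if j_pos else "-")
        table[key] = table.get(key, 0.0) + freqs[h1] * freqs[h2] * N
    return table


def expected_closed_form(s: float, t: float, u: float, v: float, N: float) -> dict:
    """The cells of Table 13, keyed by (antigen i, antigen j)."""
    return {
        ("+", "+"): (2 * s - s**2 + 2 * t * u) * N,
        ("+", "-"): (u**2 + 2 * u * v) * N,
        ("-", "+"): (t**2 + 2 * t * v) * N,
        ("-", "-"): v**2 * N,
    }


s, t, u, v, N = 0.1, 0.2, 0.3, 0.4, 100
by_expansion = expected_by_expansion({"ij": s, "j0": t, "i0": u, "00": v}, N)
closed = expected_closed_form(s, t, u, v, N)
for cell in closed:
    assert abs(by_expansion[cell] - closed[cell]) < 1e-9
```

The closed form for the (+, +) cell relies on *s* + *t* + *u* + *v* = 1, which the chosen values satisfy.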

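Rearranging the expected values in Table 13 gives the haplotype frequencies below. Here *qi* = 1 − *pi*, *qj* = 1 − *pj*, and *d* is the number of individuals lacking both antigens i and j.

$$\begin{aligned} \text{haplotype } ij:\quad s &= p_j - q_i + \sqrt{d/N} \\ \text{haplotype } j0:\quad t &= q_i - \sqrt{d/N} \\ \text{haplotype } i0:\quad u &= q_j - \sqrt{d/N} \\ \text{haplotype } 00:\quad v &= \sqrt{d/N} \end{aligned}$$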

Expressed in terms of antigen frequencies:

$$\begin{aligned} s &= \sqrt{d/N} + 1 - \sqrt{1 - f_j} - \sqrt{1 - f_i} \\ t &= \sqrt{1 - f_i} - \sqrt{d/N} \\ u &= \sqrt{1 - f_j} - \sqrt{d/N} \\ v &= \sqrt{d/N} \end{aligned}$$

Expressed in terms of phenotype counts, where *D* and *B* are the numbers of individuals lacking antigen i and antigen j, respectively:

$$\begin{aligned} s &= \sqrt{d/N} + 1 - \sqrt{B/N} - \sqrt{D/N} \\ t &= \sqrt{D/N} - \sqrt{d/N} \\ u &= \sqrt{B/N} - \sqrt{d/N} \\ v &= \sqrt{d/N} \end{aligned}$$

The standard errors of these estimates can be calculated as:

$$\begin{aligned} \sigma_s &= \sqrt{\left[\left(1 - \sqrt{d/B}\right)\left(1 - \sqrt{d/D}\right) + s - s^2/2\right] \Big/ 2N} \\ \sigma_t &= \sqrt{\left[\left(1 - \sqrt{d/D}\right) - t^2/2\right] \Big/ 2N} \\ \sigma_u &= \sqrt{\left[\left(1 - \sqrt{d/B}\right) - u^2/2\right] \Big/ 2N} \\ \sigma_v &= \frac{1}{2}\sqrt{\left(1 - v^2\right)/N} \end{aligned}$$

The standard error of *s* can equivalently be expressed as:

$$\sigma_s = \sqrt{\left[\left(1 - \frac{v}{t+v}\right)\left(1 - \frac{v}{u+v}\right) + s - \frac{s^2}{2}\right] \Big/ 2N} = \sqrt{\left[\left(\frac{p_i - s}{1 - p_j}\right)\left(\frac{p_j - s}{1 - p_i}\right) + s - \frac{s^2}{2}\right] \Big/ 2N}$$

A haplotype may contain a blank gene, such as A1-B-. Its frequency equals the gene frequency of A1 minus the sum of the frequencies of the haplotypes that A1 forms with the identified alleles at locus B. Haplotype frequencies calculated from phenotype data with the formulas above may occasionally be negative; this is caused by inadequate sample size and sampling error.
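The estimators and standard errors above can be collected into one function. This is a minimal sketch with a hypothetical function name. It assumes *B* and *D* are positive, so that the ratios are defined, and it does not guard against the negative estimates mentioned above. The counts in the usage line are synthetic: they were generated from *s* = 0.1, *t* = 0.2, *u* = 0.3, *v* = 0.4 with *N* = 1000, so the estimates should recover those values.

```python
import math


def haplotype_estimates(N: int, B: int, D: int, d: int) -> dict:
    """Two-locus haplotype frequencies (s, t, u, v) and their standard errors.

    N: sample size; D: individuals lacking antigen i;
    B: individuals lacking antigen j; d: individuals lacking both.
    """
    v = math.sqrt(d / N)
    t = math.sqrt(D / N) - v
    u = math.sqrt(B / N) - v
    s = 1 - math.sqrt(B / N) - math.sqrt(D / N) + v
    se = {
        "s": math.sqrt(((1 - math.sqrt(d / B)) * (1 - math.sqrt(d / D)) + s - s**2 / 2) / (2 * N)),
        "t": math.sqrt(((1 - math.sqrt(d / D)) - t**2 / 2) / (2 * N)),
        "u": math.sqrt(((1 - math.sqrt(d / B)) - u**2 / 2) / (2 * N)),
        "v": 0.5 * math.sqrt((1 - v**2) / N),
    }
    return {"s": s, "t": t, "u": u, "v": v, "se": se}


est = haplotype_estimates(N=1000, B=490, D=360, d=160)
print({k: round(est[k], 4) for k in "stuv"})  # {'s': 0.1, 't': 0.2, 'u': 0.3, 'v': 0.4}
```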


### **3.7. Linkage disequilibrium parameter**

#### *3.7.1. Linkage disequilibrium*

Linkage disequilibrium refers to a discrepancy between the observed and expected frequencies with which antigens at two linked loci occur together on the same haplotype. Let i and j be genes at two linked loci. The linkage disequilibrium parameter, *Δ*, is the difference between the observed frequency of haplotype ij and the product of the gene frequencies of i and j. If the observed frequency of haplotype ij is *s*, then *Δij* = *s* − *pi pj*.

When the genotypes and haplotypes of every individual in the population are known, *Δ* can easily be obtained by counting. Usually, however, *Δ* must be estimated from population survey data. Because *pi* = *s* + *u*, *pj* = *s* + *t*, and *s* + *t* + *u* + *v* = 1, it follows that *Δij* = *s* − *pi pj* = *sv* − *tu*.

The following formulas are commonly used:

$$\begin{aligned} \Delta_{ij} &= \sqrt{d/N} - q_i q_j \\ &= \sqrt{d/N} - \sqrt{(1 - f_i)(1 - f_j)} \\ &= \sqrt{d/N} - \sqrt{BD}/N \end{aligned}$$

The standard error of Δ can be calculated as:

$$\sigma(\Delta) = \sqrt{\frac{\alpha}{4N^2} - \frac{\Delta}{N}\left(\frac{B + D}{2\sqrt{BD}} - \frac{\sqrt{BD}}{N}\right)}$$

Here α = *N* − *B* − *D* + *d* is the number of individuals carrying both antigens i and j. Equivalently, σ(*Δ*) can be written as:


$$\sigma(\Delta) = \sqrt{\frac{1}{4N} + \frac{B + D + d}{4N^2} - \frac{\sqrt{d}}{2N\sqrt{N}}\left(\sqrt{\frac{D}{B}} + \sqrt{\frac{B}{D}}\right) - \frac{BD}{N^3} + \frac{\sqrt{BDd}}{N^2\sqrt{N}}}$$

A ratio |*Δ*|/σ(*Δ*) ≥ 1.96 is generally taken to indicate significant linkage disequilibrium (*P* < 0.05, two-sided).
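The sketch below computes *Δ* and checks its significance. It uses a hypothetical function name and the same synthetic counts as the haplotype example, which give *Δ* = −0.02. Both standard-error expressions are computed so that their algebraic equality can be checked. It assumes *B* and *D* are positive.

```python
import math


def linkage_disequilibrium(N: int, B: int, D: int, d: int):
    """Return (Delta, sigma(Delta), z = |Delta| / sigma(Delta)) from phenotype counts."""
    delta = math.sqrt(d / N) - math.sqrt(B * D) / N
    alpha = N - B - D + d  # individuals carrying both antigens i and j
    var_short = alpha / (4 * N**2) - (delta / N) * (
        (B + D) / (2 * math.sqrt(B * D)) - math.sqrt(B * D) / N
    )
    var_long = (
        1 / (4 * N)
        + (B + D + d) / (4 * N**2)
        - math.sqrt(d) / (2 * N * math.sqrt(N)) * (math.sqrt(D / B) + math.sqrt(B / D))
        - B * D / N**3
        + math.sqrt(B * D * d) / (N**2 * math.sqrt(N))
    )
    assert math.isclose(var_short, var_long, rel_tol=1e-9)  # the two forms agree
    sigma = math.sqrt(var_short)
    return delta, sigma, abs(delta) / sigma


delta, sigma, z = linkage_disequilibrium(N=1000, B=490, D=360, d=160)
print(f"Delta = {delta:.4f}, sigma = {sigma:.5f}, significant: {z >= 1.96}")
```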

#### *3.7.2. Relative Δ value*

Absolute Δ values are not directly comparable, because the attainable range of Δ depends on the gene frequencies involved. Relative Δ values, Δ(r), are therefore generally calculated for comparison:

$$\Delta_{ij}(r) = \Delta_{ij} / \Delta_{ij}(\text{Max})$$

From the formula

$$\Delta_{ij} = s - p_i p_j = sv - tu$$

we can see:

If *tu* = 0, *Δij* reaches its maximal positive value. If *t* = 0, *Δij* = *pj qi*, and if *u* = 0, *Δij* = *pi qj*. The lower of *pi* and *pj* is therefore used to calculate *Δij*(Max):

$$\Delta_{ij}(\text{Max}) = \begin{cases} p_i(1 - p_j), & p_i < p_j \\ p_j(1 - p_i), & p_i > p_j \end{cases}$$

If *s* = 0, *Δij* reaches its maximal negative value:

$$\Delta_{ij}(\text{Max}) = -p_i p_j$$

### **3.8. Genetic distance**

The concept of genetic distance was introduced to describe, quantitatively, the genetic differences that arise between two populations through selection, mutation, migration, and random drift. Genetic distance measures the differences in gene frequencies between populations and is used to describe interpopulation variation.

In 1977, Cavalli-Sforza and Bodmer defined the genetic distance (*d*) as:

$$d = \sqrt{1 - \sum\_{i} \sqrt{p\_{i1} p\_{i2}}}$$

where *pi1* and *pi2* are the frequencies of gene i in populations 1 and 2, respectively.
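The distance formula above can be computed with the minimal sketch below, which uses a hypothetical function name. Alleles missing from one population are given frequency 0. The allele frequencies in the usage lines are invented for illustration.

```python
import math


def cavalli_sforza_bodmer_distance(freqs1: dict, freqs2: dict) -> float:
    """d = sqrt(1 - sum_i sqrt(p_i1 * p_i2)) over all alleles of one locus."""
    alleles = set(freqs1) | set(freqs2)
    overlap = sum(math.sqrt(freqs1.get(a, 0.0) * freqs2.get(a, 0.0)) for a in alleles)
    return math.sqrt(max(0.0, 1.0 - overlap))  # clamp tiny negative rounding errors


pop1 = {"A1": 0.20, "A2": 0.30, "A24": 0.50}
pop2 = {"A1": 0.05, "A2": 0.25, "A24": 0.60, "A11": 0.10}
print(round(cavalli_sforza_bodmer_distance(pop1, pop2), 4))
```

Identical frequency distributions give *d* = 0, and populations sharing no alleles give *d* = 1.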

### **4. Software analysis of HLA data**

Computer software is usually required to estimate haplotype frequencies, linkage disequilibrium, Hardy-Weinberg equilibrium, pairwise genetic distances, and other statistics from HLA data. Several professional statistical and genetic analysis programs are available. This chapter introduces some common problems encountered when using software for HLA data analysis.


### **4.1. The processing method of HLA data analysis using** *Arlequin* **software**

*Arlequin* is the French translation of "Arlecchino," a famous Italian character from the "Commedia dell'Arte." Arlecchino is a multifaceted character who can switch easily among his various character traits as his needs require. This polymorphic ability is symbolized by his colorful costume, from which the *Arlequin* icon was designed (Figure 1).

The goal of *Arlequin* is to provide the average user in population genetics with a large set of basic methods and statistical tests to extract information on genetic and demographic features of a collection of population samples.

*Arlequin* can handle several types of data in either haplotypic or genotypic form:

**•** DNA sequences

**•** Microsatellite data

**•** Allele frequency data

**•** Standard data

**•** RFLP data

HLA data belong to "Standard data." In standard data, the molecular basis of a polymorphism is not specifically defined, or the different alleles are considered mutationally equidistant from each other. Standard data haplotypes are therefore compared for their content at each locus without regard to the nature of the alleles, which are treated simply as identical or different.

**Figure 1.** *Arlequin* software

#### *4.1.1. Structure of an Arlequin input file*

##### *4.1.1.1. Input data file*

The first step in analyzing your data is to prepare an input data file (project file) for *Arlequin*. *Arlequin* is a versatile program that can analyze several data types, so the project file must describe the properties of your data as well as contain the raw data. A text editor can be used to define your data with reserved keywords.

*Arlequin* project files contain a description of the data properties as well as the raw data themselves. The project file may also refer to one or more external data files.

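Input files are structured into two main sections, with additional subsections that must appear in the following order (Figure 2):

1) Profile section (mandatory)

2) Data section (mandatory)

2a) Haplotype list (optional)

2b) Distance matrices (optional)

2c) Samples (mandatory)

2d) Genetic structure (optional)

2e) Mantel tests (optional)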

##### *4.1.1.2. Profile section*

The data properties must be described in the profile section. The beginning of the profile section is indicated by the keyword [Profile] (within brackets).


The user must specify the following parameters:

**•** The title of the current project (used to describe the current analysis)

	- **◦** Notation: Title=
	- **◦** Possible value: Any string of characters within double quotation marks
	- **◦** Example: Title="An analysis of haplotype frequencies in two populations"

**•** The number of samples or populations present in the current project

	- **◦** Notation: NbSamples =
	- **◦** Example: NbSamples =3

**•** The type of data to be analyzed. Only one type of data is allowed per project.

	- **◦** Notation: DataType =
	- **◦** Possible values: DNA, RFLP, MICROSAT, STANDARD, and FREQUENCY
	- **◦** Example: DataType = DNA
	- **◦** The parameter "STANDARD" is used here when dealing with HLA data.

**•** The type of data that the project addresses

	- **◦** Notation: GenotypicData =
	- **◦** Possible values: 0 (haplotypic data), 1 (genotypic data)
	- **◦** Example: GenotypicData = 0

This parameter indicates whether haplotypic or genotypic data are used in the HLA data analysis. Unless otherwise specified, "1" is usually used here.

Additionally, the user has the option to specify the following parameters:

**•** The character used to separate the alleles at different loci (the locus separator)

	- **◦** Notation: LocusSeparator =
	- **◦** Possible values: WHITESPACE, TAB, NONE, any character other than "#", or a character specifying missing data
	- **◦** Example: LocusSeparator = TAB
	- **◦** Default value: WHITESPACE

**•** The gametic phase of the genotype

	- **◦** Notation: GameticPhase =
	- **◦** Possible values: 0 (unknown gametic phase), 1 (known gametic phase)
	- **◦** Example: GameticPhase = 1
	- **◦** Default value: 1

For general HLA data analysis, this parameter is set to "0." If the HLA haplotypes of each individual have been determined, for example by pedigree analysis, "1" is used. In that case, the alleles of each haplotype should be entered on a single row of the data input.


**•** Indication of a recessive allele

	- **◦** Notation: RecessiveData =
	- **◦** Possible values: 0 (co-dominant data), 1 (recessive data)
	- **◦** Example: RecessiveData =1
	- **◦** Default value: 0

Because HLA loci are codominant, "0" is normally used for HLA data. "1" is needed only when untyped (blank) alleles are coded as a recessive allele, as in the first example input file below.

**•** The code for the recessive allele

	- **◦** Notation: RecessiveAllele =
	- **◦** Possible values: Any string of characters within double quotation marks. This character string can be used explicitly in the input file to indicate the occurrence of a recessive homozygote at one or several loci.
	- **◦** Example: RecessiveAllele ="xxx"
	- **◦** Default value: "null"

**•** The character used to code for missing data

	- **◦** Notation: MissingData =
	- **◦** Possible values: A character used to specify the code for missing data, which can be entered between single or double quotes
	- **◦** Example: MissingData ='\$'
	- **◦** Default value: '?'

**•** The absolute or relative values of haplotype or phenotype frequencies

	- **◦** Notation: Frequency =
	- **◦** Possible values: ABS (absolute values), REL (relative values: absolute values will be found by multiplying the relative frequencies by the sample sizes)
	- **◦** Example: Frequency = ABS
	- **◦** Default value: ABS


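**•** The number of significant digits for haplotype frequency outputs

	- **◦** Notation: FrequencyThreshold =
	- **◦** Possible values: A real number between 1e-2 and 1e-7
	- **◦** Example: FrequencyThreshold = 0.00001
	- **◦** Default value: 1e-5

**•** The convergence criterion for the EM algorithm used to estimate haplotype frequencies and linkage disequilibrium from genotypic data

	- **◦** Notation: EpsilonValue =
	- **◦** Possible values: A real number between 1e-7 and 1e-12
	- **◦** Example: EpsilonValue = 1e-10
	- **◦** Default value: 1e-7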

##### *4.1.1.3. Data section*

In this obligatory subsection, the user defines the haplotypic or genotypic content of the different samples to be analyzed. Each sample definition begins with the keyword SampleName and ends after SampleData has been defined.

The user must specify the following parameters:

**•** A name for each sample

	- **◦** Notation: SampleName =
	- **◦** Possible values: Any string of characters within quotation marks
	- **◦** Example: SampleName= "A first example of a sample name"

**•** The sample size

	- **◦** Notation: SampleSize =
	- **◦** Possible values: Any integer value
	- **◦** Example: SampleSize=732

Note: For haplotypic data, the sample size is equal to the haploid sample size. For genotypic data, the sample size should be equal to the number of diploid individuals present in the sample.

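**•** The data

	- **◦** Notation: SampleData =
	- **◦** Possible values: A list of haplotypes or genotypes and their frequencies contained in the sample, entered within braces
	- **◦** Example: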



```
SampleData={
MAN0102 1 A33 Cw10 B70 #pseudo-haplotypes
 A33 Cw10 B7801 #the second pseudo-haplotype
}
```

If the gametic phase is known, the pseudo-haplotypes are treated as true haplotypes. If the gametic phase is unknown, only the allelic content of each locus is known, not which alleles lie on the same chromosome.

#### *4.1.1.4. Examples of input files*

(1) Example of standard data (genotypic data, unknown gametic phase, recessive alleles)

In this example, each individual's genotype at 5 HLA loci is entered on two separate lines. The gametic phase between loci is unknown, and the data contain a recessive allele, which is coded explicitly as "xxx". Note that with recessive data, every single-locus homozygote is considered a potential heterozygote carrying a null allele.

```
[Profile]
Title="Genotypic Data, Phase Unknown, 5 HLA loci"
NbSamples=1
GenotypicData=1
DataType=STANDARD
LocusSeparator=WHITESPACE
MissingData='?'
GameticPhase=0
RecessiveData=1
RecessiveAllele="xxx"
[Data]
[[Samples]]
SampleName="Population 1" 
SampleSize=63
SampleData={
MAN0102 12 A33 Cw10 B70 DR1304 DQ0301
 A33 Cw10 B7801 DR1304 DQ0302
MAN0103 22 A33 Cw10 B70 DR1301 DQ0301
 A33 Cw10 B7801 DR1302 DQ0501
MAN0108 23 A23 Cw6 B35 DR1102 DQ0301
 A29 Cw7 B57 DR1104 DQ0602
MAN0109 6 A30 Cw4 B35 DR0801 xxx
 A68 Cw4 B35 DR0801 xxx
}
```
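The record layout above can also be read programmatically. The Python sketch below is an illustration only, not part of *Arlequin*; `parse_sample_data` and `possible_genotypes` are hypothetical names. It assumes each individual occupies two lines (label, count and first set of alleles, then the second set of alleles), that `#` starts a comment, and that the second field is the number of individuals sharing that genotype. Under that reading, the counts in the excerpt (12 + 22 + 23 + 6) sum to 63, matching `SampleSize=63`. The second function lists, for a single-locus homozygote, the alternative heterozygote with the recessive allele described above.

```python
# Hypothetical helpers for illustration; they are not part of Arlequin.


def parse_sample_data(text):
    """Parse phase-unknown SampleData records written on two lines per individual.

    Returns a list of (label, count, loci), where loci is a list of
    (allele_on_line_1, allele_on_line_2) pairs, one per locus.
    """
    rows = []
    for line in text.strip().splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if fields:
            rows.append(fields)
    individuals = []
    for first, second in zip(rows[0::2], rows[1::2]):
        label, count, alleles = first[0], int(first[1]), first[2:]
        individuals.append((label, count, list(zip(alleles, second))))
    return individuals


def possible_genotypes(locus, recessive="xxx"):
    """With recessive data, an apparent homozygote a/a may also be a/null."""
    a, b = locus
    if a == b and a != recessive:
        return [(a, a), (a, recessive)]
    return [(a, b)]


records = """
MAN0102 12 A33 Cw10 B70 DR1304 DQ0301
 A33 Cw10 B7801 DR1304 DQ0302
MAN0103 22 A33 Cw10 B70 DR1301 DQ0301
 A33 Cw10 B7801 DR1302 DQ0501
MAN0108 23 A23 Cw6 B35 DR1102 DQ0301
 A29 Cw7 B57 DR1104 DQ0602
MAN0109 6 A30 Cw4 B35 DR0801 xxx
 A68 Cw4 B35 DR0801 xxx
"""

sample = parse_sample_data(records)
print(sum(count for _, count, _ in sample))  # 63
print(possible_genotypes(sample[3][2][1]))   # [('Cw4', 'Cw4'), ('Cw4', 'xxx')]
```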
(2) Example of standard data (genotypic data, known gametic phase)

In this example, three samples of standard multilocus data with a known gametic phase are defined (NbSamples=3), although only part of the first sample is shown. Because the phase is known, the alleles listed on the same line constitute a haplotype on one chromosome. For example, genotype G1 consists of two haplotypes: A23-Cw6 on one chromosome and A29-Cw7 on the other.


```
[Profile]
Title="An example of genotypic data with known gametic phase"
NbSamples=3
GenotypicData=1
GameticPhase=1
RecessiveData=0
DataType=STANDARD
LocusSeparator=WHITESPACE
[Data]
[[Samples]]
SampleName="standard_pop1"
SampleSize=20
SampleData= {
G1 4 A23 Cw6
 A29 Cw7
G2 5 A30 Cw4 
 A68 Cw4
}
```
#### *4.1.2. The calculation of Hardy-Weinberg equilibrium and genetic parameters*

#### *4.1.2.1. The calculation of Hardy-Weinberg equilibrium*

The exact test of Hardy-Weinberg equilibrium (HWE) tests the hypothesis that the observed diploid genotypes are the product of a random union of gametes. The test is only possible for genotypic data. It is analogous to Fisher's exact test on a two-by-two contingency table, but extended to a contingency table of arbitrary size. If the gametic phase is unknown, the test can only be performed locus by locus, with a separate test at each locus. For data with a known gametic phase, the association at the haplotypic level within individuals can also be tested.
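A minimal Python sketch of the idea behind the single-locus exact test follows. Instead of the Markov-chain approach of Guo and Thompson [16] on which *Arlequin*'s test is based, it uses a simpler Monte Carlo permutation: conditional on the observed allele counts, random union of gametes means genotypes behave like random pairings of the pooled alleles. Each configuration is scored by its probability given the allele counts (Levene's formula), and the p-value is the share of permuted samples that are no more probable than the observed one. The function names, permutation count and seed are illustrative assumptions.

```python
import math
import random
from collections import Counter


def log_conditional_prob(genotypes):
    """Log P(genotype counts | allele counts) under HWE (Levene's formula):
    n! * prod(n_i!) * 2**H / ((2n)! * prod(n_ij!)), H = number of heterozygotes."""
    n = len(genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    genotype_counts = Counter(tuple(sorted(g)) for g in genotypes)
    heterozygotes = sum(1 for a, b in genotypes if a != b)
    lp = math.lgamma(n + 1) - math.lgamma(2 * n + 1) + heterozygotes * math.log(2)
    lp += sum(math.lgamma(c + 1) for c in allele_counts.values())
    lp -= sum(math.lgamma(c + 1) for c in genotype_counts.values())
    return lp


def hwe_monte_carlo(genotypes, n_perm=10000, seed=1):
    """Monte Carlo p-value: share of random allele pairings that are no more
    probable than the observed genotype configuration."""
    rng = random.Random(seed)
    observed = log_conditional_prob(genotypes)
    pool = [a for g in genotypes for a in g]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pool)
        permuted = list(zip(pool[0::2], pool[1::2]))
        if log_conditional_prob(permuted) <= observed + 1e-9:  # tolerance for ties
            hits += 1
    return (hits + 1) / (n_perm + 1)


excess_homozygotes = [("A2", "A2")] * 10 + [("A24", "A24")] * 10
near_expectation = [("A2", "A2")] * 5 + [("A2", "A24")] * 10 + [("A24", "A24")] * 5
print(hwe_monte_carlo(excess_homozygotes, n_perm=2000))  # very small p-value
print(hwe_monte_carlo(near_expectation, n_perm=2000))    # large p-value
```

The first sample has no heterozygotes although both alleles are common, so almost no random pairing is as improbable; the second matches the HWE expectation closely and is not rejected.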

The settings for the Hardy-Weinberg equilibrium test are displayed in Figure 2, and the output results are provided in Figure 3:

**Locus by locus**: Perform a separate HWE test for each locus.


**Whole haplotype**: Perform an HWE test at the haplotype level (if the gametic phase is available).

**Locus by locus and whole haplotype**: Perform both types of tests (if the gametic phase is available).


**Figure 2.** Settings for the Hardy-Weinberg equilibrium test

#### *4.1.2.2. The calculation of genetic parameters*

(1) Allele frequency, genotype frequency, and haplotype frequency

When genotypic data with an unknown gametic phase are processed, two methods can be used to infer haplotypes: the Expectation-Maximization (EM) algorithm, a maximum-likelihood (ML) method and the most commonly used, or the ELB algorithm (Bayesian).

When the gametic phase is unknown or recessive alleles are present, the ML haplotype frequencies are estimated from the observed multilocus genotypic data using an EM algorithm. The settings are shown in Figure 4, and the results in Figures 5.1, 5.2, and 5.3.
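The EM iteration for phase-unknown data can be sketched in a few lines of Python. This is a minimal illustration of the general approach of Excoffier and Slatkin [14], assuming codominant alleles with no missing or recessive data; the function names, the uniform starting values and the allele labels are illustrative choices, not *Arlequin*'s code. In the E-step, each individual is shared among its phase-compatible haplotype pairs in proportion to the product of the current haplotype frequencies; in the M-step, the frequencies are re-estimated from these expected haplotype counts. Iteration stops when no frequency changes by more than `eps`, a tolerance playing the role of the convergence criterion set in the input file.

```python
from collections import defaultdict
from itertools import product


def resolutions(genotype):
    """All distinct haplotype pairs compatible with an unphased multilocus genotype.

    genotype: list of (allele1, allele2) tuples, one per locus.
    """
    pairs = set()
    for flips in product((False, True), repeat=len(genotype)):
        h1 = tuple(b if flip else a for (a, b), flip in zip(genotype, flips))
        h2 = tuple(a if flip else b for (a, b), flip in zip(genotype, flips))
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)


def em_haplotypes(genotypes, eps=1e-7, max_iter=1000):
    """EM estimates of haplotype frequencies from phase-unknown genotypes."""
    res = [resolutions(g) for g in genotypes]
    haps = sorted({h for r in res for pair in r for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting values
    n_chrom = 2 * len(genotypes)
    for _ in range(max_iter):
        # E-step: weight each compatible pair by p(h1) * p(h2); the factor 2 for
        # heterozygous pairs is shared by all of an individual's pairs and cancels.
        counts = defaultdict(float)
        for r in res:
            weights = [freq[h1] * freq[h2] for h1, h2 in r]
            total = sum(weights)
            for (h1, h2), w in zip(r, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by the number of chromosomes
        new = {h: counts[h] / n_chrom for h in haps}
        delta = max(abs(new[h] - freq[h]) for h in haps)
        freq = new
        if delta < eps:
            break
    return freq


genotypes = (
    [[("A1", "A1"), ("B1", "B1")]] * 3
    + [[("A2", "A2"), ("B2", "B2")]] * 3
    + [[("A1", "A2"), ("B1", "B2")]] * 2  # double heterozygotes: phase ambiguous
)
for hap, f in em_haplotypes(genotypes).items():
    print("-".join(hap), round(f, 4))
```

In this toy sample the double heterozygotes are resolved almost entirely as A1-B1/A2-B2, because only those haplotypes are supported by the unambiguous individuals.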

EM algorithms can be performed at the following levels:

**Haplotype level**: Estimate haplotype frequencies for haplotypes defined by alleles at all loci.


**Locus level**: Estimate allele frequencies for each locus.

**Haplotype and locus levels**: The previous two options are performed in succession.

**Figure 3.** Results of the Hardy-Weinberg equilibrium test

**Figure 4.** Settings for the EM algorithm with unknown gametic phase

**Figure 5.** Results of allele frequency, genotype frequency and haplotype frequency



The settings for processing haplotypic data, or genotypic (diploid) data with a known gametic phase, are displayed in Figure 6, and the output is shown in Figures 5.1, 5.2, and 5.3. The following options can be used:


**Use original definition**: Haplotypes are identified according to their original identifier without considering that haplotype molecular definitions could be identical.

**Infer from distance matrix**: Similar haplotypes will be identified by computing a molecular distance matrix between haplotypes.

Haplotype frequency estimation:

**Estimate haplotype frequencies by mere counting**: Estimate the ML haplotype frequencies from the observed data using a mere gene counting procedure.

**Estimate allele frequencies at all loci**: Estimate allele frequencies at all loci separately.


**Figure 6.** Settings for haplotype inference with a known gametic phase
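With a known gametic phase no iteration is needed: haplotype and allele frequencies follow from direct gene counting. The Python sketch below, with a hypothetical function name, uses only the two phased genotypes listed in the example input file (G1 with 4 copies, G2 with 5 copies), that is, 9 individuals and 18 chromosomes; it illustrates the counting idea rather than *Arlequin*'s implementation.

```python
from collections import Counter

# Phased genotypes from the example input file: (label, copies, haplotype 1, haplotype 2)
sample = [
    ("G1", 4, ("A23", "Cw6"), ("A29", "Cw7")),
    ("G2", 5, ("A30", "Cw4"), ("A68", "Cw4")),
]


def count_frequencies(sample):
    """Haplotype and per-locus allele frequencies by direct gene counting."""
    hap_counts = Counter()
    for _, copies, h1, h2 in sample:
        hap_counts[h1] += copies
        hap_counts[h2] += copies
    total = sum(hap_counts.values())  # number of chromosomes
    hap_freq = {h: c / total for h, c in hap_counts.items()}
    n_loci = len(next(iter(hap_counts)))
    allele_freq = []
    for locus in range(n_loci):
        counts = Counter()
        for hap, c in hap_counts.items():
            counts[hap[locus]] += c
        allele_freq.append({a: c / total for a, c in counts.items()})
    return hap_freq, allele_freq


hap_freq, allele_freq = count_frequencies(sample)
print(round(hap_freq[("A23", "Cw6")], 4))  # 4/18 -> 0.2222
print(round(allele_freq[1]["Cw4"], 4))     # 10/18 -> 0.5556
```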

(2) The estimation of linkage disequilibrium parameters


The settings for data with a known gametic phase are shown in Figure 7, and the results of the calculation in Figures 8.1 and 8.2.

**Linkage disequilibrium between all pairs of loci**: The user can test for a significant association between pairs of loci using an exact test of linkage disequilibrium. The number of loci is arbitrary, but the test is not applicable if there are fewer than two polymorphic loci. The test procedure is analogous to Fisher's exact test on a two-by-two contingency table, extended to a contingency table of arbitrary size.

**LD coefficients between pairs of alleles at different loci**: With this option, the *D*, *D'*, and *r*² coefficients are calculated for all pairs of alleles. *D'* is the most commonly used coefficient and corresponds to the relative *Δ* value described earlier in this chapter.
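For one allele at each locus, the three coefficients follow from the haplotype frequency p_AB and the allele frequencies p_A and p_B: D = p_AB − p_A·p_B; D' = D / D_max, with D_max = min(p_A(1 − p_B), (1 − p_A)p_B) when D > 0 and min(p_A·p_B, (1 − p_A)(1 − p_B)) when D < 0 (Lewontin's normalization); and r² = D² / (p_A(1 − p_A)p_B(1 − p_B)). The Python sketch below applies these standard definitions to the haplotype frequencies obtained by counting the two phased genotypes of the example input file; it is an illustration, not *Arlequin*'s implementation, and the function name is hypothetical.

```python
def ld_coefficients(hap_freq, allele_a, allele_b):
    """D, signed D' (Lewontin) and r^2 for allele_a (locus 1) and allele_b (locus 2).

    hap_freq maps two-locus haplotypes (a, b) to frequencies summing to 1.
    """
    p_a = sum(f for (a, _), f in hap_freq.items() if a == allele_a)
    p_b = sum(f for (_, b), f in hap_freq.items() if b == allele_b)
    p_ab = hap_freq.get((allele_a, allele_b), 0.0)
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Haplotype frequencies from counting the phased example (4 + 4 + 5 + 5 = 18 chromosomes)
hap_freq = {
    ("A23", "Cw6"): 4 / 18,
    ("A29", "Cw7"): 4 / 18,
    ("A30", "Cw4"): 5 / 18,
    ("A68", "Cw4"): 5 / 18,
}
print(ld_coefficients(hap_freq, "A23", "Cw6"))  # D' and r^2 both 1 (up to rounding)
print(ld_coefficients(hap_freq, "A30", "Cw4"))  # D' = 1, but r^2 = 4/13 (about 0.31)
```

The second pair shows why the coefficients answer different questions: every A30 haplotype carries Cw4, so *D'* is maximal, yet Cw4 also occurs with A68, so *r*² stays well below 1.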


**Figure 7.** Settings for linkage disequilibrium


**Figure 9.** Settings for linkage disequilibrium with unkown phase

**Figure 10.** Results oflinkage disequilibrium with unkown phase

*Arlequin* provides several calculation methods to determine genetic distance, including *Reynolds'* distance, *Slatkin's* linearized coefficient, *Nei's* genetic distance, etc. *Nei's* genetic distance and the *Cavalli-Sforza* genetic distance calculating methods are the most commonly

Statistic and Analytical Strategies for HLA Data

http://dx.doi.org/10.5772/57493

35

(3) The calculation of genetic distance

**Figure 8.** Results of linkage disequilibrium

When the gametic phase is unknown, a different procedure for testing the significance of the association between pairs of loci is used. The procedure is based on a likelihood ratio test, where the likelihood of the sample evaluated under the hypothesis of no association between loci (linkage equilibrium) is compared with the likelihood of the sample when association is allowed. The significance of the observed likelihood ratio is found by computing the null distribution of this ratio under the hypothesis of linkage equilibrium, using a permutation procedure. The settings for this procedure are shown in Figure 9, and the output results are provided in Figure 10.
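The likelihood-ratio procedure can be sketched for two loci in Python. The sketch assumes codominant data without missing alleles. It takes the linkage-equilibrium likelihood from products of the observed allele frequencies, takes the likelihood with association allowed from an EM estimate started at those products (EM never lowers the likelihood, so the statistic is not negative), and builds the null distribution by shuffling the second-locus genotypes among individuals, one way of imposing linkage equilibrium while keeping each locus intact. The permutation scheme, iteration limits, allele labels and function names are assumptions for illustration, not *Arlequin*'s exact settings.

```python
import math
import random
from collections import Counter, defaultdict
from itertools import product


def resolutions(g):
    """Distinct haplotype pairs compatible with a two-locus genotype ((a1, a2), (b1, b2))."""
    (a1, a2), (b1, b2) = g
    candidates = (((a1, b1), (a2, b2)), ((a1, b2), (a2, b1)))
    return sorted({tuple(sorted(pair)) for pair in candidates})


def log_likelihood(genotypes, freq):
    """Log-likelihood of unphased genotypes given haplotype frequencies."""
    total = 0.0
    for g in genotypes:
        total += math.log(sum(freq[h1] * freq[h2] * (1 if h1 == h2 else 2)
                              for h1, h2 in resolutions(g)))
    return total


def equilibrium_freqs(genotypes):
    """Haplotype frequencies under linkage equilibrium: products of allele frequencies."""
    n = 2 * len(genotypes)
    count_a = Counter(a for ga, _ in genotypes for a in ga)
    count_b = Counter(b for _, gb in genotypes for b in gb)
    return {(a, b): (count_a[a] / n) * (count_b[b] / n) for a, b in product(count_a, count_b)}


def em_freqs(genotypes, start, eps=1e-7, max_iter=200):
    """EM haplotype frequencies, started from `start`."""
    res = [resolutions(g) for g in genotypes]
    freq = dict(start)
    for _ in range(max_iter):
        counts = defaultdict(float)
        for r in res:
            weights = [freq[h1] * freq[h2] for h1, h2 in r]
            total = sum(weights)
            for (h1, h2), w in zip(r, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        new = {h: counts[h] / (2 * len(genotypes)) for h in freq}
        converged = max(abs(new[h] - freq[h]) for h in freq) < eps
        freq = new
        if converged:
            break
    return freq


def lr_statistic(genotypes):
    """2 * (log L with association allowed - log L under linkage equilibrium)."""
    le = equilibrium_freqs(genotypes)
    ld = em_freqs(genotypes, le)
    return 2 * (log_likelihood(genotypes, ld) - log_likelihood(genotypes, le))


def ld_lr_test(genotypes, n_perm=200, seed=1):
    """Observed statistic and permutation p-value (second-locus genotypes shuffled)."""
    rng = random.Random(seed)
    observed = lr_statistic(genotypes)
    locus_a = [g[0] for g in genotypes]
    locus_b = [g[1] for g in genotypes]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(locus_b)
        if lr_statistic(list(zip(locus_a, locus_b))) >= observed - 1e-9:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


data = (
    [(("A1", "A1"), ("B1", "B1"))] * 10
    + [(("A2", "A2"), ("B2", "B2"))] * 10
    + [(("A1", "A2"), ("B1", "B2"))] * 5
)
statistic, p_value = ld_lr_test(data, n_perm=100)
print(round(statistic, 2), p_value)
```

In this toy sample the alleles are perfectly associated, so the statistic is large and shuffled samples essentially never reach it.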


**Figure 9.** Settings for linkage disequilibrium with unknown phase


**Figure 10.** Results of linkage disequilibrium with unknown phase

(3) The calculation of genetic distance


*Arlequin* provides several methods for calculating genetic distance, including *Reynolds'* distance, *Slatkin's* linearized coefficient, and *Nei's* genetic distance. *Nei's* genetic distance and the *Cavalli-Sforza* genetic distance are the most commonly used and tend to give similar results. The settings for calculating genetic distance are shown in Figure 11, and the output in Figure 12.
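The two distances named above can be sketched from allele frequencies in Python. The sketch uses the standard textbook forms, Nei's standard distance D = −ln(J_XY / √(J_X·J_Y)) and the Cavalli-Sforza and Edwards chord distance averaged over loci; it is not necessarily how *Arlequin* computes its distances, and any sample-size corrections are omitted. The function names and allele frequencies are hypothetical.

```python
import math


def nei_distance(pop_x, pop_y):
    """Nei's standard genetic distance D = -ln(J_XY / sqrt(J_X * J_Y)).

    pop_x, pop_y: lists with one {allele: frequency} dict per locus.
    The J terms are averages over loci; the common 1/r factor cancels in the ratio.
    """
    jx = jy = jxy = 0.0
    for fx, fy in zip(pop_x, pop_y):
        alleles = set(fx) | set(fy)
        jx += sum(fx.get(a, 0.0) ** 2 for a in alleles)
        jy += sum(fy.get(a, 0.0) ** 2 for a in alleles)
        jxy += sum(fx.get(a, 0.0) * fy.get(a, 0.0) for a in alleles)
    identity = jxy / math.sqrt(jx * jy)
    return math.inf if identity == 0 else -math.log(identity)


def chord_distance(pop_x, pop_y):
    """Cavalli-Sforza and Edwards chord distance over r loci:
    (2 / (pi * r)) * sum_j sqrt(2 * (1 - sum_i sqrt(x_ij * y_ij)))."""
    total = 0.0
    for fx, fy in zip(pop_x, pop_y):
        alleles = set(fx) | set(fy)
        cos_theta = sum(math.sqrt(fx.get(a, 0.0) * fy.get(a, 0.0)) for a in alleles)
        total += math.sqrt(max(0.0, 2 * (1 - cos_theta)))  # guard tiny negative rounding
    return 2 * total / (math.pi * len(pop_x))


pop1 = [{"A2": 0.5, "A24": 0.5}]  # hypothetical single-locus frequencies
pop2 = [{"A2": 1.0}]
print(nei_distance(pop1, pop2))    # ln(2) / 2, about 0.347
print(chord_distance(pop1, pop2))
```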


The calculation parameters are as follows:

**Compute pairwise differences**: Computes *Nei's* average number of pairwise differences within and between populations.
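The average numbers of pairwise differences can be illustrated in Python. The sketch assumes that the difference between two standard-data haplotypes is the number of loci at which their alleles differ, and that each population is given as one entry per chromosome; it also computes the between-population mean corrected for within-population diversity, π_XY − (π_X + π_Y)/2, a common companion quantity. The function names and example haplotypes are hypothetical.

```python
from itertools import combinations, product


def n_differences(h1, h2):
    """Number of loci at which two haplotypes carry different alleles (assumed metric)."""
    return sum(a != b for a, b in zip(h1, h2))


def mean_within(haps):
    """Average number of differences over all pairs of haplotypes in one population."""
    pairs = list(combinations(haps, 2))
    return sum(n_differences(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0


def mean_between(haps_x, haps_y):
    """Average number of differences between haplotypes drawn from two populations."""
    total = sum(n_differences(a, b) for a, b in product(haps_x, haps_y))
    return total / (len(haps_x) * len(haps_y))


def net_between(haps_x, haps_y):
    """Between-population mean corrected for within-population diversity."""
    return mean_between(haps_x, haps_y) - (mean_within(haps_x) + mean_within(haps_y)) / 2


pop_x = [("A23", "Cw6"), ("A29", "Cw7"), ("A30", "Cw4")]
pop_y = [("A30", "Cw4"), ("A68", "Cw4")]
print(mean_within(pop_x), mean_within(pop_y))  # 2.0 1.0
print(mean_between(pop_x, pop_y))              # 1.5
print(net_between(pop_x, pop_y))               # 0.0
```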


**Figure 11.** Settings for genetic distance


**Figure 12.** Results of genetic distance

### **4.2. The requirements of data analysis on new-generation HLA typing techniques**

HLA data analysis methods have always been closely tied to the development of HLA typing techniques. In the 1980s and 1990s, HLA serotyping was the preferred technique: HLA phenotypes were determined first, and the square-root method was commonly used to estimate HLA gene frequencies from phenotype frequencies. Today, HLA genotyping techniques are more prevalent, and researchers tend to use direct counting to calculate genotype frequencies. In earlier HLA haplotype analyses, haplotypes were inferred at the population level and haplotype frequencies were then estimated. However, as the cost of genotyping has fallen considerably, more pedigree data have become available, as in the *HapMap* project, where haplotypes can be studied directly by pedigree analysis. Moreover, with the development of next-generation sequencing and the optimization of long-fragment high-throughput sequencing and read-assembly algorithms, individual haplotypes can increasingly be determined directly. These methods greatly simplify the data analysis process.
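As a worked illustration of the square-root method (the classical estimator, stated here under the usual assumptions of a codominantly expressed antigen and Hardy-Weinberg proportions rather than taken from this chapter): if an antigen is carried by a fraction $f$ of individuals and its gene has frequency $p$, non-carriers occur with frequency $(1-p)^2 = 1 - f$, so

$$p = 1 - \sqrt{1 - f}$$

For example, an antigen found in 36% of individuals gives $p = 1 - \sqrt{0.64} = 0.2$.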

Currently, HLA data are no longer limited to allele data. Other types of data, such as SNPs, microsatellite markers and other short sequence repeats, can be analyzed jointly with HLA data. This chapter has only introduced the application of *Arlequin* software to classic HLA data analysis. However, many other outstanding tools are available: *Phase* is more commonly used for haplotype reconstruction from population genotype data and for modeling recombination hot spots, *Haploview* is more prominent for graphical linkage disequilibrium (LD) and haplotype studies, and the professional statistical software *SAS* is also commonly used for HLA data analysis. As HLA typing and analysis techniques develop, data processing methods will also become more in-depth and detailed.

### **Acknowledgements**


Supported by grants from the State Key Development Program for Basic Research of China (No. 2003CB515509 and 2009CB522401) and from the National Natural Scientific Foundation of China (No. 81070450 and 30470751) to Dr. X.-Y.Z.

### **Author details**

Fang Yuan and Yongzhi Xi\*

\*Address all correspondence to: xiyz@yahoo.com

Department of Immunology and National Center for Biomedicine Analysis, Beijing Hospital Affiliated to Academy of Medical Sciences, Beijing, PRC

#### **References**

[1] Edwards AW. Foundations of Mathematical Genetics, 2nd edition. Cambridge University Press. Cambridge. 2000.

[2] Crow JF. Hardy-Weinberg and Language Impediments. Genetics. 1999, 152: 821.

[3] Masel J. Rethinking Hardy–Weinberg and Genetic Drift in Undergraduate Biology. BioEssays. 2012, 34: 701.

[4] Cox DR. Principles of Statistical Inference. Cambridge University Press. Cambridge. 2006.

[5] Hu LP. Medical Statistics. People's Military Medical Press. Beijing. 2010.

[6] Xu TH, Wang J. Design of Medical Experiments: Lecture 2, Rules of Randomization and Blinding Method. Chinese Medical Journal. 2005, 40: 54.

[7] Marsh SG, Albert ED, Bodmer WF, et al. Nomenclature for Factors of the HLA System. 2010.

[8] Robinson J, Waller MJ, Fail SC, et al. The IMGT/HLA Database. Nucleic Acids Research. 2009, 37: 1013.

[9] Sharon L. Sampling: Design and Analysis, 2nd edition. Cengage Learning. 2009.

[10] Tan JM. Tissue Typing Technique and Clinical Application, 1st edition. People's Medical Publishing House. Beijing. 2002.

[11] Wang XZ. Principles of Population Genetics. Sichuan University Press. Chengdu. 1994.

[12] Guo J, Hu LP. Medical Genetics Statistics and SAS Application. People's Medical Publishing House. Beijing. 2012.

[13] Weir BS. Genetic Data Analysis II: Methods for Discrete Population Genetic Data. Sinauer Associates Inc. USA. 1996.

[14] Excoffier L, Slatkin M. Maximum-Likelihood Estimation of Molecular Haplotype Frequencies in a Diploid Population. Molecular Biology and Evolution. 1995, 12: 921.

[15] Excoffier L, Slatkin M. Incorporating Genotypes of Relatives into a Test of Linkage Disequilibrium. The American Journal of Human Genetics. 1998, 171-180.

[16] Guo S, Thompson E. Performing the Exact Test of Hardy-Weinberg Proportion for Multiple Alleles. Biometrics. 1992, 48: 361.

[17] Raymond M, Rousset F. An Exact Test for Population Differentiation. Evolution. 1995, 49: 1280.

[18] Gaggiotti O, Excoffier L. A Simple Method of Removing the Effect of a Bottleneck and Unequal Population Sizes on Pairwise Genetic Distances. Proceedings of the Royal Society London. 2000, 267: 81.

[19] Dempster A, Laird N, Rubin D. Maximum Likelihood Estimation from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society. 1977, 39: 1.

[20] Cavalli-Sforza LL. Population Structure and Human Evolution. Proceedings of the Royal Society London. 1966, 164: 362.

[21] Cavalli-Sforza LL, Bodmer WF. The Genetics of Human Populations. W.H. Freeman Publishers. San Francisco. 1971.

[22] Lange K. Mathematical and Statistical Methods for Genetic Analysis. Springer. New York. 1997.

[23] Excoffier L, Lischer H. Arlequin Suite ver 3.5: A New Series of Programs to Perform Population Genetics Analyses under Linux and Windows. Molecular Ecology Resources. 2010, 10: 564.

[24] Stephens M, Scheet P. Accounting for Decay of Linkage Disequilibrium in Haplotype Inference and Missing-Data Imputation. American Journal of Human Genetics. 2005, 76: 449-462.

[25] Scheet P, Stephens M. A Fast and Flexible Statistical Model for Large-Scale Population Genotype Data: Applications to Inferring Missing Genotypes and Haplotypic Phase. American Journal of Human Genetics. 2006, 78: 629.


**Chapter 2**


> © 2014 The Author(s). Licensee InTech. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


## **HLA Class I Polymorphism and Tapasin Dependency**

Soumya Badrinath, Trevor Huyton, Rainer Blasczyk and Christina Bade-Doeding

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/57495

### **1. Introduction**

Human leucocyte antigens (HLA) are highly specialized proteins, expressed on all nucleated cells and platelets, that form stable complexes with peptides of self or pathogenic proteins generated by proteasomal degradation. These peptide-HLA (pHLA) complexes are presented at the cell surface, where they are recognized by T cells. Thus, T cells continuously scan the array of pHLA complexes presented on the cell surface with their specific antigen receptor (TCR) [1]. A distinctive property of TCRs is that they can recognize an antigen only when it is associated with a host or "self"-encoded HLA molecule. This property of T cells, discovered by Zinkernagel and Doherty in 1974, is called 'MHC restriction' [2]. The recognition of pHLA complexes by TCRs regulates many immunological responses, such as antiviral cytolysis, graft or tumor rejection and the regulation of B cell function. The genes encoding HLA molecules are known to be the most polymorphic genes in the whole genome. To date, more than 9,000 HLA alleles have been identified, of which ~7,300 are HLA class I and ~2,200 are class II alleles (Figure 1) [3].

These polymorphisms do not occur throughout the HLA molecule, but are confined to specific AA positions in the peptide-binding region (PBR) [4, 5, 6]. They can cause alterations in the conformation of the PBR, thus changing the peptide binding specificities of these molecules [7]. The frequency of HLA alleles varies greatly among different ethnic groups. It has been postulated that evolutionary pressures such as those exerted by epidemics of infectious diseases might lead to the evolution of new HLA alleles having distinct peptide binding properties.

Following hematopoietic stem cell transplantation (HSCT), polymorphic differences between donor and recipient HLA class I molecules can lead to transplant rejection or graft-versus-host disease (GvHD). Extensive clinical data have demonstrated that the risk of GvHD strongly

© 2014 The Author(s). Licensee InTech. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

even without assistance from the PLC but are then sub-optimally loaded. Due to the crucial role of TPN in selecting peptides, its indirect role in the immunorecognition of pathogens becomes obvious. This makes TPN an ideal target for viruses to interfere with the presentation of viral peptides to CTLs. For instance, US3 protein of HCMV binds to TPN and acts as a TPN inhibitor. This affects the antigen presentation by TPN-dependent HLA class I molecules. However, TPN-independent molecules can selectively escape the US3 mediated class I

HLA Class I Polymorphism and Tapasin Dependency

http://dx.doi.org/10.5772/57495

43


There are currently 9,719 HLA and related alleles described by the HLA nomenclature and included in the IMGT/HLA Database. Red bars represent class I alleles and yellow bars represent class II alleles. As of 2013, around 7,353 class I alleles and 2,202 class II alleles have been identified [3].

**Figure 1.** Number of HLA alleles that have been identified from the year 1987 until 2013

correlates with the number and kind of HLA mismatches and that both the type of amino acid (AA) substitution and its location within the HLA molecule can directly influence the transplantation outcome. Polymorphisms within the PBR of HLA class I molecules determine which peptides are selectively bound and subsequently recognized as self or non-self by the effector T lymphocytes that survey pHLA complexes on antigen-presenting cells. Assembly of the MHC class I heavy chain (hc) and β2-microglobulin (β2m) subunits with peptides is assisted by the peptide loading complex (PLC). Proteasomally digested peptides are first transported into the endoplasmic reticulum (ER) by the transporter associated with antigen processing (TAP) and are then loaded onto HLA class I molecules with the assistance of the PLC. Within this multimeric PLC, the transmembrane glycoprotein tapasin (TPN) functions as a disulfide-linked heterodimer with the thiol oxidoreductase ERp57 to stabilize the empty class I molecule and promote the selection of high-affinity peptides. In addition to bridging HLA class I molecules to TAP, TPN was found to stabilize the peptide-receptive state of HLA class I molecules and to increase the steady-state levels of TAP heterodimers. Certain HLA class I polymorphisms within the PBR appear not only to influence the repertoire of bound peptides but also to determine whether the given HLA class I molecule requires the PLC for the acquisition and optimal loading of peptides. Whereas most class I allotypes associate strongly with the PLC and are highly dependent upon TPN for effective presentation of high-affinity peptides and for cell surface expression, others can acquire peptides even without assistance from the PLC but are then sub-optimally loaded. Given the crucial role of TPN in selecting peptides, its indirect role in the immune recognition of pathogens becomes obvious. This makes TPN an ideal target for viruses that interfere with the presentation of viral peptides to cytotoxic T lymphocytes (CTLs). For instance, the US3 protein of human cytomegalovirus (HCMV) binds to TPN and acts as a TPN inhibitor, which impairs antigen presentation by TPN-dependent HLA class I molecules. However, TPN-independent molecules can selectively escape US3-mediated class I retention and continue to present viral peptides.

HLA class I polymorphisms therefore affect the generic antigen processing pathway and determine the dependency of class I molecules on TPN for antigen presentation. TPN independence occurs very rarely and might have arisen as an evolutionary trade-off to combat viral infections. However, presentation of unusual ligands by certain HLA class I alleles could be a risk factor in stem cell transplantation and needs to be considered during donor selection. The future of HSCT relies on understanding how successful clinical outcomes can be achieved despite patient-donor allelic mismatches.

### **2. HLA polymorphisms and transplantation**


42 HLA and Associated Important Diseases


The HLA system is one of the major barriers in hematopoietic stem cell transplantation (HSCT), and the degree of HLA matching is reflected in the outcome following transplantation. The best results are achieved when an identical twin or a genotypically HLA-identical sibling is used as the donor. However, only about 30 % of patients requiring HSCT have an HLA-identical sibling donor available [8]. In most situations, therefore, HLA-haploidentical related donors, matched unrelated donors, or partially matched related or unrelated donors are considered for transplantation. These transplants are associated with a high risk of post-transplant complications such as graft failure/rejection or graft-versus-host disease (GvHD), mainly because of HLA incompatibilities, some of which remain undefined. Many studies have demonstrated the negative impact of HLA mismatches on the outcome following HSCT [9-11]. To understand the magnitude of distinct mismatches between HLA variants, several studies have analyzed allele-specific peptide motifs with the aim of rating incompatibility [12-21] [22] [23, 24]. Knowledge of the peptide binding motifs of individual alleles, and their comparison within allelic groups, is the basis for understanding the impact of a given mismatch and is fundamental in predicting the relevance of allelic differences. Since allelic mismatches at critical residues within the class I heavy chain may cause allorecognition, high-resolution matching of patients and unrelated donors has been found to significantly improve post-transplant survival [25], lower the incidence and severity of GvHD [26, 27] and improve engraftment [28, 29]. Whether a mismatch is permissive or not is therefore critical in deciding which individual is the best-matched donor. This question could be addressed by a systematic study of the effect of AA sequence polymorphisms on the function of a particular HLA molecule and on immune responses post-transplantation.
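The contrast between antigen-level and high-resolution (allele-level) matching can be illustrated with a short Python sketch. It assumes typings written in the two-field form used in this chapter (for example `B*44:02`). As one simple convention, it counts the distinct patient alleles that are absent from the donor, that is, mismatches in the graft-versus-host direction. The function names and example typings are hypothetical, and this is not a clinical matching algorithm.

```python
def truncate(allele: str, fields: int) -> str:
    """Keep the locus and the first `fields` colon-separated fields,
    e.g. truncate("B*44:02", 1) -> "B*44" (antigen-level group)."""
    locus, _, digits = allele.partition("*")
    return f"{locus}*{':'.join(digits.split(':')[:fields])}"


def gvh_mismatches(patient: list[str], donor: list[str], fields: int) -> int:
    """Count distinct patient alleles absent from the donor (graft-versus-host
    direction). Sets are used, so a homozygous patient whose allele the donor
    carries scores zero in this direction."""
    donor_types = {truncate(a, fields) for a in donor}
    patient_types = {truncate(a, fields) for a in patient}
    return len(patient_types - donor_types)


# Hypothetical typings that differ only at HLA-B*44:02 versus B*44:05
patient = ["A*02:01", "A*24:02", "B*44:02", "B*08:01"]
donor = ["A*02:01", "A*24:02", "B*44:05", "B*08:01"]

print(gvh_mismatches(patient, donor, fields=1))  # antigen level
print(gvh_mismatches(patient, donor, fields=2))  # allele level
```

With the first field only, patient and donor look identical (0 mismatches). Comparing two fields reveals one mismatch: the B\*44:02/B\*44:05 difference, which is a single residue at position 116.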

### **3. HLA class I molecules and the peptide loading complex**

HLA class I molecules loaded with high-affinity peptides are essential for efficient immune surveillance and the elimination of virally infected cells by CTLs. Newly synthesized class I hc and β2m are translocated into the ER by their amino-terminal signal sequences. Following translocation, HLA class I hcs are glycosylated and folded through the formation of two intra-chain disulphide bonds. Calnexin (CNX) stabilizes the class I hc and facilitates its association with β2m. Once the class I hc-β2m heterodimer has formed, CNX is replaced by calreticulin (CRT). Peptides are loaded onto the class I heterodimer by a complex machinery of chaperones known as the peptide loading complex (PLC). The PLC consists of TAP heterodimers, the transmembrane glycoprotein TPN, the lectin-like chaperone CRT, and the thiol oxidoreductase ERp57, which is non-covalently associated with CRT and disulphide-linked to TPN (Figure 2).


HLA Class I Polymorphism and Tapasin Dependency

http://dx.doi.org/10.5772/57495

45


Peptides are loaded onto MHC class I molecules with the assistance of the peptide loading complex (PLC). Processed peptides are transported into the ER via TAP and trimmed at their N-terminus by the ER aminopeptidases ERAP1 and ERAP2. TPN functions within the multimeric PLC as a disulfide-linked, stable heterodimer with the thiol oxidoreductase ERp57.

**Figure 2.** Peptide loading complex

Peptides presented by HLA class I molecules originate mostly from cytosolic or nuclear proteins and are processed by the proteasome, a multicatalytic protease complex. TAP translocates the peptides from the cytosol into the ER lumen. TPN bridges the class I-β2m heterodimer to TAP and acts as a peptide editor, facilitating the loading of high-affinity peptides onto HLA class I molecules. Stable HLA class I molecules dissociate from TAP heterodimers and are transported through the Golgi complex to the cell surface, where they present peptides to CD8+ T cells.
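The order of events described above can be written down as a simple ordered pathway. The Python sketch below is a deliberately simplified, hypothetical model. Each `Stage` name is an illustrative label rather than established nomenclature. The real process also involves cycling, such as the recycling of suboptimally loaded molecules, which a linear sequence does not capture.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical, linearised stages of HLA class I maturation."""
    HC_FOLDED_IN_ER = "heavy chain translocated, glycosylated and disulphide-folded"
    CNX_BOUND = "calnexin stabilises the heavy chain and its association with β2m"
    HC_B2M_WITH_CRT = "hc-β2m heterodimer formed; calnexin replaced by calreticulin"
    IN_PLC = "recruited to the PLC (TAP, TPN, CRT, ERp57)"
    PEPTIDE_LOADED = "high-affinity peptide loaded with TPN assistance"
    GOLGI = "dissociated from TAP and transported through the Golgi"
    CELL_SURFACE = "presents peptide to CD8+ T cells"


PATHWAY = list(Stage)  # Enum members iterate in definition order


def next_stage(stage: Stage) -> Stage:
    """Return the step that follows `stage`; the cell surface is the end point."""
    i = PATHWAY.index(stage)
    if i == len(PATHWAY) - 1:
        raise ValueError("CELL_SURFACE is the final stage")
    return PATHWAY[i + 1]


# Walk the pathway from the ER to the cell surface
stage = Stage.HC_FOLDED_IN_ER
while stage is not Stage.CELL_SURFACE:
    print(f"{stage.name}: {stage.value}")
    stage = next_stage(stage)
print(f"{stage.name}: {stage.value}")
```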

### **4. Tapasin**


TPN is a 428-amino-acid type I transmembrane glycoprotein consisting of three parts: an N-terminal ER-luminal region made up of two domains, a transmembrane domain and a short cytoplasmic tail. TPN has multiple roles within the PLC, all of which are directed towards peptide presentation by class I molecules. The transmembrane domain of TPN interacts with TAP and bridges TAP to class I molecules. TPN stabilizes TAP and promotes the binding and translocation of peptides by TAP. In the absence of TPN, binding of class I molecules to TAP is abrogated. It was also shown that certain class I molecules expressed at the cell surface in the absence of TPN were very unstable and dissociated rapidly. The double lysine motif at the C-terminus of TPN mediates its interaction with coat protein type I (COPI) vesicles and facilitates the recycling of class I molecules that have not been loaded with optimal peptides. Mutational analysis identified a conserved region in the ER-luminal domain of TPN that interacts with HLA class I molecules and was found to be critical for peptide loading and for the editing function of TPN. Polymorphisms in HLA class I molecules affect the dependency of these molecules on TPN for antigen presentation and cell surface expression.
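The double lysine motif can be checked with a regular expression. This Python sketch assumes the two C-terminal dilysine ER-retrieval consensus patterns commonly described for membrane proteins, KKXX and KXKXX, where X is any residue. It also assumes the input is a cytoplasmic tail in one-letter amino acid code. The example tails are made up and are not the actual TPN sequence.

```python
import re

# Dilysine motifs anchored at the C-terminus (end of string):
# KKXX  -> Lys at positions -3 and -4
# KXKXX -> Lys at positions -3 and -5
DILYSINE_MOTIFS = (re.compile(r"KK..$"), re.compile(r"K.K..$"))


def has_dilysine_motif(cytoplasmic_tail: str) -> bool:
    """True if the C-terminal end of the sequence matches KKXX or KXKXX."""
    tail = cytoplasmic_tail.upper()
    return any(motif.search(tail) for motif in DILYSINE_MOTIFS)


# Hypothetical tails, not the real TPN sequence
print(has_dilysine_motif("SRRKKAE"))  # ends ...KKAE  (KKXX)
print(has_dilysine_motif("SKSKAE"))   # ends ...KSKAE (KXKXX)
print(has_dilysine_motif("SRRLLAE"))  # no motif
```

The patterns are anchored with `$` because position relative to the C-terminus, not mere presence of two lysines, is what defines these retrieval signals.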

### **5. Inhibitors of TPN**

TPN is a critical component of the PLC and plays an important role in the optimization and selection of peptides subsequently presented on the cell surface by HLA class I molecules [30, 31]. The transmembrane glycoprotein US3, expressed during the immediate early phase of HCMV infection, binds to TPN and inhibits its ability to load kinetically stable peptides onto HLA class I molecules, thus retaining class I molecules in the ER [32]. However, not all HLA class I molecules are equally affected by US3, which highlights that they are not all equally dependent upon TPN for maturation in the ER [33]. US3 and TPN associate through their ER-luminal domains, but the transmembrane domains are also required for the inhibition of TPN [32]. Another transmembrane glycoprotein, E3-19K from adenovirus, also inhibits a crucial function of TPN by blocking its ability to bridge TAP to HLA class I molecules. E3-19K associates with TAP and impairs both the formation of the TAP-TPN complex and the inclusion of TAP in the PLC [34]. This competitive inhibition by E3-19K delays the maturation and assembly of the TPN-dependent HLA class I loading complex [34].

Certain other viral proteins prevent surface expression of HLA class I molecules by retaining them in the ER or by targeting them for degradation. For example, cowpox virus protein 203 (CPXV203) causes retention of HLA class I molecules in the ER [35]. The US2 and US11 proteins of HCMV and the mK3 protein of mouse herpesvirus direct HLA class I molecules towards proteasomal degradation [36] [37] [38] [39]. Viral "sorters" such as the HIV-1 protein Nef and the murine CMV protein gp48 divert HLA class I molecules from the Golgi to the lysosomal compartment, where they are subsequently degraded [40] [41] [42].
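The viral proteins named in the two paragraphs above can be collected into a small lookup structure, which makes their different points of attack easy to compare. The mechanism labels are a simplified, hypothetical grouping of the text, not a formal classification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Immunoevasin:
    protein: str
    virus: str
    mechanism: str  # simplified label for how surface HLA class I is reduced


# Transcribed from the text; the mechanism labels are a hypothetical grouping
IMMUNOEVASINS = [
    Immunoevasin("US3", "HCMV", "inhibits TPN"),
    Immunoevasin("E3-19K", "adenovirus", "inhibits TPN"),
    Immunoevasin("CPXV203", "cowpox virus", "ER retention"),
    Immunoevasin("US2", "HCMV", "proteasomal degradation"),
    Immunoevasin("US11", "HCMV", "proteasomal degradation"),
    Immunoevasin("mK3", "mouse herpesvirus", "proteasomal degradation"),
    Immunoevasin("Nef", "HIV-1", "diversion to lysosomes"),
    Immunoevasin("gp48", "murine CMV", "diversion to lysosomes"),
]


def by_mechanism(mechanism: str) -> list[str]:
    """Names of the viral proteins that act through the given mechanism."""
    return [e.protein for e in IMMUNOEVASINS if e.mechanism == mechanism]


def mechanisms_of(virus: str) -> set[str]:
    """All mechanisms recorded for one virus."""
    return {e.mechanism for e in IMMUNOEVASINS if e.virus == virus}


print(by_mechanism("inhibits TPN"))
print(sorted(mechanisms_of("HCMV")))
```

Querying the table this way shows, for example, that HCMV interferes with class I presentation at more than one step (TPN inhibition by US3, degradation by US2 and US11).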


### **6. Interactions between HLA class I molecules and TPN**

In addition to bridging HLA class I molecules to TAP, TPN was found to stabilize the peptide-receptive state of class I molecules [43] and to increase the steady-state levels of TAP heterodimers [44]. TPN also facilitated the retention of empty class I molecules in the ER of insect cells [30]. Barnden *et al* demonstrated that TPN prevented the premature exit of HLA class I molecules from the ER of mammalian cells, suggesting a role for TPN in retaining suboptimally loaded class I molecules in the ER [45]. HLA class I molecules expressed in TPN-deficient cells were found to be unstable and were loaded with a significant proportion of suboptimal ligands [33, 46, 47]. Yu *et al* demonstrated that mutations at residues 128-136 in the α2 helix of the class I heavy chain affected the interaction of HLA class I molecules with TPN. These residues lie in the loop connecting the β-pleated sheets below the peptide binding groove with an α-helix reaching above the groove. This region of the heavy chain, a potential TPN-interaction site, was found to be sensitive to peptide binding and to undergo conformational changes, implying that TPN can distinguish between empty and peptide-bound HLA class I molecules [48, 49]. Some groups have speculated that TPN interacts with the α3 domain of the HLA class I heavy chain [50]. However, given that residues 128–136 in the α2 domain and residues 227–229 in the α3 domain lie on the same plane along the side of the class I heavy chain, TPN might be able to interact with both determinants. Mutations in the HLA class I heavy chain were found to affect the interaction of these molecules with components of the loading complex [49, 51-53]. However, the precise interfaces between class I molecules and TPN are yet to be defined.

To identify the potential interaction surfaces of TPN with HLA class I molecules, Dong *et al* first compared TPN sequences across different species and identified a highly conserved region in the N-terminal domain of TPN [54]. Residues in this region and in other parts of the TPN molecule were mutated, and the effect of these mutations on PLC function was tested. Eight different TPN mutants conjugated with ERp57 were incubated with extracts from LCL 721.220/B\*08:01 cells, which are enriched in empty HLA class I molecules, and co-immunoprecipitation experiments were performed to determine the interaction of HLA class I molecules with each mutant. HLA class I molecules associated at normal levels with wild-type TPN, with two mutants in which amino-terminal residues located farthest from the conserved patch were mutated (TN1 and TN2), and with a mutant carrying a single substitution in the carboxy-terminal region (TC1). Mutating the residues in the central, conserved patch of TPN (TN6: Glu185Lys, Arg187Glu, Gln189Ser and Gln261Ser) completely abolished its binding to the class I heavy chain, and only small amounts of heavy chain interacted with the mutants located in or around the conserved patch. The ability of wild-type and mutant TPN molecules to bind HLA class I reflected their relative capacities to mediate peptide loading: in .220.B\*08:01 cells, the TN6 mutant mediated only 8 % of the peptide loading activity of wild-type TPN. Likewise, transduction of the TN6 mutant into TPN-deficient .220.B\*44:02 cells did not restore surface expression of these molecules. The conserved, functionally important central region of TPN was suggested to stabilize the α2-helix of the PBR, maintaining the peptide binding groove in an open, peptide-receptive conformation until an optimal peptide binds [54]. These findings agree with previous studies in which a Thr134Lys mutation in HLA-A\*02:01 was found to disrupt its interaction with TPN [47]. Co-immunoprecipitation experiments performed by Lehner *et al* demonstrated that deleting the transmembrane region of TPN had no effect on its interaction with HLA class I molecules. However, HLA class I molecules failed to co-precipitate with TPN molecules truncated at the N terminus, suggesting that residues in this region are important for the interaction of HLA class I molecules with TPN [43].
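Substitutions in this section are written in three-letter notation. For example, Glu185Lys means the glutamate at position 185 is replaced by lysine. A short Python sketch can convert such names to the compact one-letter form and apply them to a sequence, checking the wild-type residue first. The helper names are hypothetical, and the toy sequence is not a real TPN or HLA sequence.

```python
import re

# Standard three-letter to one-letter amino acid codes
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

SUBSTITUTION = re.compile(r"^([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def parse_substitution(name: str) -> tuple[str, int, str]:
    """Split e.g. 'Glu185Lys' into ('E', 185, 'K'): wild-type residue,
    1-based position, replacement residue."""
    match = SUBSTITUTION.match(name)
    if match is None or match[1] not in THREE_TO_ONE or match[3] not in THREE_TO_ONE:
        raise ValueError(f"not a recognised substitution: {name!r}")
    return THREE_TO_ONE[match[1]], int(match[2]), THREE_TO_ONE[match[3]]


def apply_substitution(sequence: str, name: str) -> str:
    """Apply one substitution to a one-letter sequence, checking the
    wild-type residue first so that numbering errors are caught early."""
    wild_type, position, mutant = parse_substitution(name)
    found = sequence[position - 1]
    if found != wild_type:
        raise ValueError(f"expected {wild_type} at {position}, found {found}")
    return sequence[: position - 1] + mutant + sequence[position:]


# The four substitutions that make up the TN6 mutant, in one-letter form
tn6 = ["Glu185Lys", "Arg187Glu", "Gln189Ser", "Gln261Ser"]
print(["{0}{1}{2}".format(*parse_substitution(s)) for s in tn6])
# ['E185K', 'R187E', 'Q189S', 'Q261S']

print(apply_substitution("ATKV", "Thr2Lys"))  # toy sequence -> AKKV
```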

### **7. Peptide editing function of TPN**


Many studies have highlighted the role of TPN in stabilizing HLA class I molecules [30, 31, 45, 55] and maintaining them in a peptide-receptive conformation [56]. It has also been shown that TPN facilitates peptide optimization, a process in which bound low-affinity peptides are exchanged for high-affinity ones [45, 57-61]. These functions were attributed to TPN because class I-peptide complexes expressed on the surface of TPN-deficient cells were less stable than those expressed on normal, TPN-expressing cells. However, analysis of peptides eluted from HLA class I complexes expressed on the cell surface in the presence and absence of TPN showed no correlation between the decreased stability of HLA class I-peptide complexes and the binding of low-affinity peptides in TPN-deficient cells [62]. The authors suggested that the ability of TPN to stabilize immature HLA class I molecules in the ER may instead broaden the bound peptide repertoire, both in the complexity of bound peptides and in their binding affinities [62]. A more recent study by Howarth *et al* demonstrated a key function of TPN in shaping the repertoire of peptides presented at the cell surface according to their intrinsic half-lives [63]. They investigated the effect of TPN on the presentation of a hierarchy of peptides derived from the H2-Kb-binding peptide SIINFEKL by varying its anchor residues to produce peptides with a wide range of binding affinities. These peptides were stably expressed as minigenes in the cytosol of the TPN-deficient cell line .220.Kb and of a TPN-transfected .220.Kb cell line. In the presence of TPN, all the peptides were presented at high levels, and their relative expression levels matched their peptide half-lives. In the absence of TPN, however, this hierarchy was disrupted, and a peptide with an intermediate half-life was presented more dominantly than the others. Since all the peptides used in this study had affinities similar to that of the H2-Kb-binding peptide SIINFEKL, the editing function of TPN was suggested to be governed primarily by the peptide off-rate rather than by peptide affinity per se [63].
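Off-rate and affinity can be separated because of standard first-order binding kinetics, which are general background rather than results of the cited study. The equilibrium dissociation constant is K_D = k_off / k_on, whereas the half-life of a peptide-class I complex depends only on k_off (t1/2 = ln 2 / k_off). The Python sketch below uses two hypothetical peptides with the same K_D but tenfold different off-rates. The rate constants are invented for illustration.

```python
import math


def half_life(k_off: float) -> float:
    """Half-life of a complex undergoing first-order dissociation:
    t1/2 = ln(2) / k_off (k_off in 1/h gives hours)."""
    return math.log(2) / k_off


def dissociation_constant(k_on: float, k_off: float) -> float:
    """Equilibrium dissociation constant K_D = k_off / k_on (molar)."""
    return k_off / k_on


def fraction_remaining(k_off: float, hours: float) -> float:
    """Fraction of complexes still peptide-occupied after `hours`."""
    return math.exp(-k_off * hours)


# Hypothetical peptides: k_on in M^-1 h^-1, k_off in h^-1
peptides = {
    "slow-dissociating": {"k_on": 1e5, "k_off": 0.1},
    "fast-dissociating": {"k_on": 1e6, "k_off": 1.0},
}

for name, p in peptides.items():
    print(
        name,
        f"K_D={dissociation_constant(p['k_on'], p['k_off']):.1e} M",
        f"t1/2={half_life(p['k_off']):.2f} h",
        f"occupied after 4 h={fraction_remaining(p['k_off'], 4):.2f}",
    )
```

Both peptides have K_D = 1e-6 M, yet the slowly dissociating complex has a half-life ten times longer. A selection mechanism that acts on off-rate would therefore rank them differently, even though an equilibrium affinity measurement would not.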

to a partially open disulphide bond in the α2 domain [66] and that TPN facilitates the conversion of this disordered conformation into a stable, peptide-receptive conformation [55].

HLA Class I Polymorphism and Tapasin Dependency

http://dx.doi.org/10.5772/57495



Many groups have established *in vitro* assays to gain a molecular understanding of the mechanisms of peptide editing by TPN. However, the weak intrinsic interaction between TPN and HLA class I molecules makes it difficult to assess TPN function *in vitro* using recombinant TPN. To overcome this problem, Chen *et al* used leucine zippers to tether soluble TPN to HLA class I molecules [56]. They selected HLA-B\*08:01 because this allele had previously been shown to depend on TPN [33] for normal levels of cell surface expression. The results indicated that TPN acts as a chaperone by increasing the ratio of active to inactive, peptide-deficient HLA class I molecules. In addition to stabilizing HLA class I molecules, TPN increased the association and dissociation rates of peptides with class I molecules owing to its ability to widen the peptide binding groove, thereby enabling a diverse set of peptides to bind the groove initially. This TPN-assisted mechanism of peptide selection was suggested to be mediated by disruption of the conserved hydrogen bonds at the C terminus of the binding groove [56]. In another approach to determine the peptide-editing mechanism of TPN, Wearsch *et al* reconstituted the PLC subcomplex *in vitro* by co-incubating a recombinant soluble TPN-ERp57 conjugate with cell extracts containing CRT and peptide-receptive heavy chain-β2m complexes [59]. The TPN-ERp57 conjugate promoted rapid exchange of suboptimal low- and intermediate-affinity peptides for high-affinity ones [59]. Praveen *et al* explored TPN-mediated peptide editing in the lumen of ER microsomes [60], where the components of the loading complex can interact with each other at their native affinities.
For their experiments, they used wild-type (WT) H2-Kb and the Kb mutant T134K, in which the Thr134Lys replacement abolishes the interaction of these molecules with TPN. When these allomorphs were incubated either with a mixture of high-affinity peptides and a 100-fold excess of a low-affinity peptide, or with low-affinity peptides and a 100-fold reduced concentration of a high-affinity peptide, wild-type Kb predominantly bound the high-affinity peptide, whereas the T134K mutant mostly bound the low-affinity ones [60].
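The competition design used by Praveen *et al* [60] can be read against a deliberately simplified equilibrium model of two peptides competing for one binding site. Assuming, as an interpretive simplification for this illustration only, that TPN mainly accelerates peptide exchange so that occupancy approaches equilibrium, the share of molecules loaded with each peptide depends on both concentration and K_D. The Python sketch below uses hypothetical K_D values; none are taken from the cited experiments.

```python
def equilibrium_occupancy(conc_high, kd_high, conc_low, kd_low):
    """Fractional occupancy of one binding site by two competing peptides.

    Simple competitive-binding model at equilibrium; assumes free peptide
    concentrations are not depleted by binding. Concentrations and K_D in M.
    Returns (theta_high, theta_low, theta_empty), which sum to 1.
    """
    w_high = conc_high / kd_high  # statistical weight of the high-affinity complex
    w_low = conc_low / kd_low     # statistical weight of the low-affinity complex
    z = 1.0 + w_high + w_low      # partition function (empty site contributes 1)
    return w_high / z, w_low / z, 1.0 / z


# Hypothetical parameters: 1000-fold affinity gap, low-affinity peptide in 100-fold excess.
kd_high, kd_low = 10e-9, 10e-6
conc_high = 1e-6
conc_low = 100 * conc_high

th, tl, te = equilibrium_occupancy(conc_high, kd_high, conc_low, kd_low)
print(f"high-affinity {th:.3f}, low-affinity {tl:.3f}, empty {te:.4f}")
```

In this model the occupancy ratio is (1/100) x 1000 = 10 in favour of the high-affinity peptide; with an affinity gap smaller than 100-fold, the excess competitor would dominate even at equilibrium.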

### **8. TPN dependence/ independence of HLA class I molecules**

Polymorphisms at specific AA positions within the HLA class I hc influence the dependency of these molecules on TPN for efficient cell surface expression and peptide presentation. It has been hypothesized that the nature of the AAs at the bottom of the F pocket influences the conformational flexibility of empty class I molecules [55, 64], which could in turn determine the ability of a particular allotype to bind peptides in the presence or absence of TPN [65]. In TPN-dependent allotypes, the region around the F pocket of the peptide binding groove has been shown to adopt a disordered conformation owing to a partially open disulphide bond in the α2 domain [66], and TPN facilitates the conversion of this disordered conformation into a stable, peptide-receptive conformation [55].


48 HLA and Associated Important Diseases


Studies using the TPN-deficient cell line LCL 721.220 transfected with various HLA-A and -B allotypes demonstrated that these class I variants differ in their dependency on TPN for cell surface expression [18, 33, 67, 68]. HLA-B\*27:05 molecules showed high levels of surface expression and were able to present specific viral peptides even in the absence of TPN. In contrast, HLA-B\*44:02 molecules were highly dependent on TPN for these functions, and HLA-B8 molecules showed intermediate dependency on TPN [33]. Similarly, while HLA-A1 molecules fail to present antigens in the absence of TPN [31], HLA-A2 molecules present peptides very efficiently on the surface of these cells [69].

Many studies have highlighted the importance of the AA at position 114 of the class I hc in determining TPN dependency for efficient antigen processing and presentation [33, 55, 64, 70, 71]. Park *et al* demonstrated that class I molecules with an acidic AA at position 114, such as HLA-B\*44:02 (Asp114) or HLA-A\*30:01 (Glu114), are highly dependent on TPN for their cell surface expression; alleles with neutral AAs, such as HLA-B\*08:01 (Asn114) or HLA-B\*54:01 (Asn114), are weakly dependent; and alleles with basic AAs, such as HLA-B\*27:02 (His114), HLA-B\*27:05 (His114), HLA-A\*02:10 (His114) or HLA-A\*24:01 (His114), are independent of TPN for their surface expression [71]. However, HLA-B\*44:02 and HLA-B\*44:05 both have an acidic AA at position 114 and yet lie at opposite ends of the TPN dependency spectrum. In the absence of TPN, HLA-B\*44:02 fails to bind high-affinity peptides and is prone to intracellular degradation, whereas HLA-B\*44:05 shows efficient cell surface expression both in the presence and in the absence of TPN [58]. These two allotypes differ exclusively at AA position 116, which is located in the F pocket of the peptide binding groove and contacts the C terminus of the bound peptide: HLA-B\*44:02 has an Asp and B\*44:05 a Tyr at this position. While HLA-B\*44:02 binds very efficiently to TAP and undergoes significant optimization of its peptide cargo, B\*44:05 molecules are not incorporated into the PLC and undergo only partial optimization of their peptide cargo in the presence of TPN [58]. An Asp116His mutation in HLA-B\*44:02 resulted in a TPN-independent molecule [55].
Sieker *et al* hypothesized that the two acidic residues at positions 114 and 116 in the HLA-B\*44:02 hc disrupt the F-pocket conformation through excessive hydration [65], and that the ability of HLA-B\*44:05 to acquire a limited set of peptides without being incorporated into the PLC is due to the Asp-to-Tyr exchange at residue 116, which decreases the electronegativity and increases the hydrophobicity around the F pocket [64]. Experiments by Neisig *et al* demonstrated that, among the HLA-B allotypes investigated, those with an aromatic AA at position 116 bound efficiently to TAP while the others did not [70]. HLA-B\*35:02 and B\*35:03, carrying the aromatic AAs Tyr and Phe respectively at position 116, associated strongly with TAP, whereas B\*35:01 and B\*35:08, both with Ser at position 116, showed no significant association with TAP [70]. Similarly, among the HLA-B\*15 allotypes, B\*15:10, with a Tyr at position 116, associated more strongly not only with TAP but also with TPN and CRT than B\*15:18 or B\*15:01, which have Ser at this position [72]. HLA-B\*68:07 (His116) also associated much more strongly with TAP than B\*68:03 (Asp116) [73]. The authors pointed out that residue 116, which points upwards from the F pocket into the binding groove, might be involved in the association of TPN with the class I heavy chain, which in turn regulates the differential binding of position-116 variants to TAP. However, HLA-B\*44:02 and B\*44:03 both have an Asp at position 116 and yet associate differently with TAP. These two alleles differ by a single AA at position 156: HLA-B\*44:02 binds efficiently to TAP, whereas B\*44:03 is a weak TAP binder.
The authors suggested that the AA at position 156, located at the centre of the α2 helix of the class I heavy chain, might determine the strong and weak TAP binding of HLA-B\*44:02 and B\*44:03, respectively [70].
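The residue-114 rule reported by Park *et al* [71], together with the B\*44:05 exception to it, can be tabulated in a few lines. The Python sketch below encodes only the alleles and residues named above; the labels "high", "weak" and "independent" paraphrase the text, and grouping His as basic follows the text's own classification. It is an illustration of the reported pattern, not a predictive tool.

```python
# Side-chain class of residue 114, grouped as in the text (His counted as basic).
RESIDUE_CLASS = {"Asp": "acidic", "Glu": "acidic", "Asn": "neutral", "His": "basic"}

# TPN dependency associated with each residue-114 class by Park et al. [71].
DEPENDENCY_BY_CLASS = {"acidic": "high", "neutral": "weak", "basic": "independent"}

# Residue 114 of the alleles named in the text.
RESIDUE_114 = {
    "B*44:02": "Asp", "A*30:01": "Glu",
    "B*08:01": "Asn", "B*54:01": "Asn",
    "B*27:02": "His", "B*27:05": "His", "A*02:10": "His", "A*24:01": "His",
    "B*44:05": "Asp",  # shares residue 114 with B*44:02; differs only at 116 (Tyr)
}

# Reported behaviour of B*44:05: efficient surface expression without TPN [58].
OBSERVED = {"B*44:05": "independent"}


def predict_from_114(allele: str) -> str:
    """TPN dependency expected from residue 114 alone."""
    return DEPENDENCY_BY_CLASS[RESIDUE_CLASS[RESIDUE_114[allele]]]


for allele in RESIDUE_114:
    predicted = predict_from_114(allele)
    note = ""
    if allele in OBSERVED and OBSERVED[allele] != predicted:
        note = f"  <- observed: {OBSERVED[allele]} (residue 116 matters)"
    print(f"{allele:8} {RESIDUE_114[allele]}114 -> {predicted}{note}")
```

The single flagged row is the point of the exercise: residue 114 alone cannot separate B\*44:02 from B\*44:05.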


More recent studies have shed light on how HLA class I polymorphisms modulate the presented peptide repertoire. The TPN-dependent variant B\*44:02 (Asp116) and the TPN-independent variant B\*44:05 (Tyr116) were shown to differ in their preference at the PΩ anchor residue [64]: the binding preference of HLA-B\*44:05 at P9 was restricted to Phe, whereas B\*44:02 accepted both Phe and Tyr at this position, largely owing to the ability of Asp116 in B\*44:02 to form a hydrogen bond with Tyr at P9 [64]. Another study demonstrated that, although the surface expression of HLA-B\*27:05 was similar in the presence and absence of TPN, the cytotoxic lysis of B\*27:05 targets infected with recombinant vaccinia viruses differed between these two conditions. At four hours post infection, lysis of B\*27:05/TPN-negative targets was only half that observed for B\*27:05/TPN-positive target cells; at 12 hours post infection, lysis of the two target types was similar. The study nevertheless pointed to an impairment in the presentation of specific viral peptides by B\*27:05 in the absence of TPN. Although the peptides presented with and without TPN overlapped to some extent, B\*27:05 selected and presented a unique set of peptides under each condition. B\*27:05 molecules on the surface of TPN-deficient cells are less stable, probably owing to the nature of the peptides they select and bind in the absence of TPN [33].
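The PΩ anchor difference between B\*44:02 and B\*44:05 amounts to two different sets of accepted C-terminal residues. The Python sketch below filters peptides by that single criterion; the anchor sets come from the text, while the peptide sequences are hypothetical placeholders, not ligands from the cited study. Real binding motifs involve further positions that this check deliberately ignores.

```python
# PΩ (P9) anchor residues accepted by each allotype, as described in the text [64].
PO_ANCHORS = {
    "B*44:02": {"F", "Y"},  # Asp116 can hydrogen-bond Tyr at P9
    "B*44:05": {"F"},       # Tyr116: preference restricted to Phe
}


def matches_po_anchor(peptide: str, allele: str) -> bool:
    """True if the C-terminal (PΩ) residue is among the allotype's preferred anchors.

    Only the C-terminal anchor is checked; every other position is ignored.
    """
    return peptide[-1] in PO_ANCHORS[allele]


# Hypothetical 9-mers in one-letter code (not ligands from the cited study).
peptides = ["AEIPARLGF", "KEMPSYLNY", "SELGTRIAW"]

for allele in PO_ANCHORS:
    print(allele, [p for p in peptides if matches_po_anchor(p, allele)])
```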

Studies examining single AA exchanges within the hc of naturally occurring HLA class I alleles have identified residues in the α2 domain that are critical for the interaction of class I molecules with the PLC components. Elliott *et al* demonstrated that replacing Thr with Lys at residue 134 (T134K) in HLA-A2 disrupted the interaction between class I and the PLC components [47, 52]. In contrast, replacing Ser with Cys at residue 132 (S132C) in these molecules resulted in prolonged association of the class I molecule with the PLC components, slower maturation of the complex and binding of optimal high-affinity ligands [52]. Yu *et al* have shown that residues 128-136 in the α2 domain play an important role in peptide loading and in formation of the class I loading complex [49]. These studies have led to the identification of a putative PLC binding surface on the α2 domain of the class I heterodimer. The surface to which these regions contribute defines a pronounced groove that might serve as a docking site for one or more components of the PLC. The conserved disulphide bond between Cys101 and Cys164 is also located within this region of the α2 domain. This disulphide bond links the α2 helix to the floor of the peptide binding groove, and its isomerisation has been implicated in peptide binding [74].
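The substitutions discussed in this paragraph use one-letter notation (T134K, S132C). The Python sketch below parses that notation and flags whether a substitution falls in the residue 128-136 segment reported by Yu *et al* [49], hits one of the disulphide cysteines, or introduces a new Cys. The function and field names are hypothetical, chosen for this illustration; the sketch checks sequence positions only and says nothing about structure or PLC binding.

```python
import re
from typing import NamedTuple

# Residues 128-136 of the α2 domain, reported to matter for peptide loading and
# PLC assembly [49]; range() excludes its stop value, hence 137.
PLC_SEGMENT = range(128, 137)

# The two cysteines of the conserved α2 disulphide bond.
DISULPHIDE_CYS = (101, 164)

_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


class Substitution(NamedTuple):
    wild_type: str  # one-letter code of the original residue
    position: int   # residue number as used in the text
    mutant: str     # one-letter code of the replacement residue


def parse_substitution(code: str) -> Substitution:
    """Parse one-letter notation such as 'T134K' (Thr134 replaced by Lys)."""
    match = _SUBSTITUTION.match(code)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {code!r}")
    wild_type, position, mutant = match.groups()
    return Substitution(wild_type, int(position), mutant)


def annotate(code: str) -> dict:
    """Flag where a substitution falls relative to the regions named in the text."""
    s = parse_substitution(code)
    return {
        "substitution": code,
        "in_128_136_segment": s.position in PLC_SEGMENT,
        "at_disulphide_cys": s.position in DISULPHIDE_CYS,
        "introduces_cys": s.mutant == "C" and s.wild_type != "C",
    }


# T134K and S132C are the HLA-A2 mutants discussed above; D116H corresponds to
# the Asp116His exchange in HLA-B*44:02 mentioned earlier in the chapter.
for code in ["T134K", "S132C", "D116H"]:
    print(annotate(code))
```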

Previous studies have demonstrated that the nature of the AAs at residues 114 and 116 determines the interaction of different class I allotypes with the PLC components [64]. These two residues are located in the F pocket of the PBR, which interacts with the C-terminal residue of the peptide and thus determines the nature of the bound peptide. Certain AA polymorphisms at these positions allow peptides to be loaded and presented independently of the loading complex (TAP/TPN) [64, 71], via a non-classical pathway that yields pHLA complexes which might be poorly tolerated by the host immune system.

### **9. Impact of mismatch at residue 156 in B\*44 allotypes**


Recently, the association of TPN with HLA subtypes featuring micropolymorphisms at AA position 156 was discovered [18]. This position is part of pockets D and E within the peptide binding region and contacts peptides of canonical length at positions p3 and p7 [75], which explains its distinct structural role in influencing the conformation of the pHLA complex [76, 77]. Polymorphisms at residue 156 represent one of the most non-permissive transplantation scenarios and are associated with acute GvHD for HLA-A, -B and -C alleles [58, 76-80].

The HLA-B\*44 allelic group, which occurs in approximately 25% of the Caucasian population, has four naturally occurring variants that differ exclusively by a single AA at residue 156: B\*44:02 (Asp156), B\*44:03 (Leu156), B\*44:28 (Arg156) and B\*44:35 (Glu156). The B\*44:02 (Asp156)/B\*44:03 (Leu156) mismatch, for example, has been described as a non-permissive transplantation scenario: distinct structural differences between the B\*44:02 and B\*44:03 pHLA complexes elicit strong alloreactive T-cell responses that lead to acute GvHD [79]. This mismatch has also been shown to produce disparate peptide repertoires, which explains why cytotoxic T lymphocytes recognize different pHLA landscapes on B\*44:02 and B\*44:03. The structural involvement of position 156 in shaping the conformation of the PBR was demonstrated by comparing the crystal structures of HLA-B\*44:02 and HLA-B\*44:03 complexed with the same natural high-affinity ligand [18]. To determine whether position 156 is also involved in the PLC/HLA association, and whether polymorphism at this position affects TPN dependency by altering the structure and properties of the PBR and thereby the peptide repertoire, we investigated the mode of peptide loading for the B\*44/156 mismatched variants.

Our data demonstrate that only the HLA-B\*44:28 (Arg156) variant can acquire peptides independently of TPN, and that AA position 156 is unambiguously responsible for the HLA/TPN interaction within the B\*44 subtypes. Based on its position and orientation, residue 156 is unlikely to contact TPN directly. Likewise, residue 116, the micropolymorphic difference between TPN-dependent B\*44:02 and TPN-independent B\*44:05, appears unlikely to contact TPN directly. Although residue 156 is not part of the first segment of the α2 helix, it probably influences the strand/loop region with which TPN interacts and, like residue 116, affects the stability/dynamics of the unloaded HLA molecule.
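Because the four B\*44 variants differ only at residue 156, every pairwise mismatch among them is a single residue-156 exchange. The minimal Python sketch below lists those pairs and flags where TPN dependency differs, encoding only the residues and the TPN independence of B\*44:28 stated above. It does not model alloreactivity, peptide repertoires or GvHD risk, and the pair labels (for example Asp156Leu) are simply a notation borrowed from the text's Asp116His style.

```python
from itertools import combinations

# Residue 156 of the four naturally occurring B*44 variants named in the text.
RESIDUE_156 = {"B*44:02": "Asp", "B*44:03": "Leu", "B*44:28": "Arg", "B*44:35": "Glu"}

# Per the data summarised above, only B*44:28 (Arg156) loads peptides without TPN.
TPN_INDEPENDENT = {"B*44:28"}


def mismatch_pairs(table):
    """All allele pairs, each labelled with its residue-156 exchange."""
    return [
        (a, b, f"{table[a]}156{table[b]}")
        for a, b in combinations(sorted(table), 2)
        if table[a] != table[b]
    ]


for a, b, exchange in mismatch_pairs(RESIDUE_156):
    differs_in_tpn = (a in TPN_INDEPENDENT) != (b in TPN_INDEPENDENT)
    print(f"{a} vs {b}: {exchange}, TPN dependency differs: {differs_in_tpn}")
```

Only the three pairs involving B\*44:28 are flagged, which mirrors the statement that residue 156 alone accounts for the difference in TPN dependency within this group.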

By systematically analysing B\*44 variants, the influence of residue 156 on their interaction with TPN could clearly be demonstrated. Using mass spectrometry, we sequenced peptides derived from B\*44:02, acquired with the assistance of TPN and hence through the optimization machinery of the PLC, and compared them with peptides bound to B\*44:28, acquired in a TPN-independent manner. Significant differences between these peptide sets were observed, both in their attributed binding affinity and in peptide length. The peptide repertoires of sHLA-B\*44:02 and sHLA-B\*44:28 display subtle differences suggesting an alternative antigen presentation pathway, although the core binding motifs are strongly retained [18]. Structural insight from computational analysis indicated a role for Arg156 in increasing the stability of the pHLA complex through contacts to both Asp114 and the peptide backbone at P5 (Figure 3).

**Figure 3.** B\*44/156 substitution model. Based on the B\*44:02 structure (1M6O) [76], all 20 AAs were modelled at position 156, fitting the best side-chain rotamer. Arg156 shows increased hydrogen bonding both to residue Asp114 and to the peptide backbone. This is likely to increase the stability of the HLA-peptide complex.
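Comparing two mass-spectrometry ligand lists by length, as described above, reduces to tallying sequence lengths per data set and checking their overlap. The Python sketch below does this for two short peptide lists labelled after B\*44:02 and B\*44:28; the sequences are invented placeholders and imply nothing about the direction or size of the differences reported in [18].

```python
from collections import Counter


def length_distribution(peptides):
    """Fraction of peptides at each sequence length, keyed by length."""
    counts = Counter(len(p) for p in peptides)
    total = sum(counts.values())
    return {length: counts[length] / total for length in sorted(counts)}


# Invented placeholder sequences standing in for the two eluted-ligand data sets.
ligands = {
    "B*44:02 (TPN-assisted)": ["AEIPARLGF", "KEMPSYLNY", "SEVGTRIAY", "AEHLKPGLF"],
    "B*44:28 (TPN-independent)": ["AEIPARLGF", "REEGSLLKPF", "KEAPTNLHSY", "SEVGTRIAY"],
}

for label, peptides in ligands.items():
    print(label, length_distribution(peptides))

sets = [set(p) for p in ligands.values()]
shared = sets[0] & sets[1]
print("shared ligands:", sorted(shared))
```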

Our results indicate that the HLA-B\*44:28 (Arg156) variant stabilises the binding groove in its empty state, thus negating the contribution of the PLC and allowing independent loading of high-affinity peptides. The interaction between Arg156 and Asp114 on the floor of the peptide binding groove appears to generate a stable peptide-receptive state.

### **10. Conclusion**

TPN independency offers flexibility on the one hand, because it provides an effective countermeasure against pathogen immune evasion; on the other hand, peptides are loaded suboptimally, which might influence the immunogenicity and half-life of pHLA complexes and could result in autoimmunity.

Whether TPN dependency or TPN independency is advantageous is likely to depend on the combination of HLA-A, -B and -C alleles in an individual. An appreciation of the interaction between TPN, HLA class I molecules and peptide loading may therefore be important not only during viral infections but also when considering transplantation scenarios.

**Nomenclature**

**•** T cell receptor (TCR)

**•** amino acid (AA)

**•** heavy chain (hc)

**•** tapasin (TPN)

**•** calnexin (CNX)

**•** calreticulin (CRT)

**•** β2 microglobulin (β2m)

**•** peptide loading complex (PLC)

**•** endoplasmic reticulum (ER)

**•** cytotoxic T lymphocyte (CTL)

**•** human cytomegalovirus (HCMV)

**•** human leukocyte antigen (HLA)

**•** peptide-HLA complexes (pHLA)

**•** graft-versus-host disease (GvHD)

**•** peptide-binding region (PBR)

**•** major histocompatibility complex (MHC)

**•** hematopoietic stem cell transplantation (HSCT)

**•** transporter associated with antigen processing (TAP)

By systematically analysing the influence of residue 156 in B\*44 variants and their interaction with TPN could clearly be demonstrated. Using mass spectrometry we sequenced those peptides derived from B\*44:02 aquired with the assistance of TPN and hence through the optimization machinery of the PLC and compared those with peptides bound to B\*44:28 aquired in a TPN-independent manner. Significant differences between these sets of peptides could be observed, both in their attributed binding affinity and in the length of the derived peptides. The peptide repertoires of sHLA-B\*44:02 and sHLA-B\*44:28 display subtle differen‐ ces, suggesting an alternate antigen presentation pathway, the core binding motifs are strongly retained [18]. The results from the structural insight through computational analysis indicated a role for 156Arg in increasing the stability of the pHLA complex through contacts to both

**Based on the B\*44:02 structure (1M6O) (76) all 20 AAs were modelled at position 156 fitting the best side chain rotamer. Arg156 shows increased hydrogen bonding both to residue Asp114 and to peptide backbone. This is likely** 

Arg156

Our results indicate that the HLA-B\*44:28156Arg variant stabilises the binding groove in its empty state, thus negating the contribution of the PLC and allowing independent loading of high affinity peptides. The interaction between Arg156 and

Asp114 on the floor of the peptide binding groove seems to be able to generate a stable peptide receptive state.

binding groove seems to be able to generate a stable peptide receptive state.

Our results indicate that the HLA-B\*44:28156Arg variant stabilises the binding groove in its empty state, thus negating the contribution of the PLC and allowing independent loading of high affinity peptides. The interaction between Arg156 and Asp114 on the floor of the peptide

Based on the B\*44:02 structure (1M6O) [76] all 20 AAs were modelled at position 156 fitting the best side chain ro‐ tamer. Arg156 shows increased hydrogen bonding both to residue Asp114 and to peptide backbone. This is likely to

Asp114 and to peptide backbone at P5 (Figure 3).

**to increase stability of the HLA-peptide complex.** 

increase stability of the HLA-peptide complex.

**Figure 3.** B\*44/156 substitution model

Arg97

Asp114

**Figure 3. B\*44/156 substitution model** 

52 HLA and Associated Important Diseases

**Figure 3.** 

TPN independency offers flexibility on one hand, because it provides an effective pathogen evasion, however peptides are loaded suboptimally and that might influence the immunoge‐ nicity and half-life time of pHLA complexes and this might result in autoimmunity.

The question whether TPN-dependency or TPN-independency is advantageous or not is likely to depend on the combination of HLA-A, -B, and C- alleles of an individual. An appreciation of the interaction between TPN, HLA class I molecules and peptide loading may therefore be important not only during viral infections, but also while considering transplantation scenar‐ ios.

### **Nomenclature**


10

**•** Calreticulin (CRT)

### **Acknowledgements**

The authors would like to thank Heike Kunze-Schumacher for excellent technical assistance.

### **Author details**

Soumya Badrinath, Trevor Huyton, Rainer Blasczyk and Christina Bade-Doeding

\*Address all correspondence to: bade-doeding.christina@mh-hannover.de

Institute for Transfusion Medicine, Hannover Medical School, Hannover, Germany

### **References**

[1] Germain RN, Margulies DH. The biochemistry and cell biology of antigen processing and presentation. Annu Rev Immunol. 1993;11:403-50.

[2] Zinkernagel RM, Doherty PC. Restriction of in vitro T cell-mediated cytotoxicity in lymphocytic choriomeningitis within a syngeneic or semiallogeneic system. Nature. 1974 Apr 19;248(450):701-2.

[3] Marsh SG, Albert ED, Bodmer WF, Bontrop RE, Dupont B, Erlich HA, et al. An update to HLA nomenclature, 2010. Bone Marrow Transplant. May;45(5):846-8.

[4] Bjorkman PJ, Saper MA, Samraoui B, Bennett WS, Strominger JL, Wiley DC. Structure of the human class I histocompatibility antigen, HLA-A2. Nature. 1987 Oct 8-14;329(6139):506-12.

[5] Bjorkman PJ, Parham P. Structure, function, and diversity of class I major histocompatibility complex molecules. Annu Rev Biochem. 1990;59:253-88.

[6] Klein J, Sato A. The HLA system. First of two parts. N Engl J Med. 2000 Sep 7;343(10):702-9.

[7] Falk K, Rotzschke O, Stevanovic S, Jung G, Rammensee HG. Allele-specific motifs revealed by sequencing of self-peptides eluted from MHC molecules. Nature. 1991 May 23;351(6324):290-6.

[8] Zuckerman T, Rowe JM. Alternative donor transplantation in acute myeloid leukemia: which source and when? Curr Opin Hematol. 2007 Mar;14(2):152-61.

[9] Mickelson EM, Petersdorf E, Anasetti C, Martin P, Woolfrey A, Hansen JA. HLA matching in hematopoietic cell transplantation. Hum Immunol. 2000 Feb;61(2):92-100.

[10] Ottinger HD, Ferencik S, Beelen DW, Lindemann M, Peceny R, Elmaagacli AH, et al. Hematopoietic stem cell transplantation: contrasting the outcome of transplantations from HLA-identical siblings, partially HLA-mismatched related donors, and HLA-matched unrelated donors. Blood. 2003 Aug 1;102(3):1131-7.

[11] Schaffer M, Aldener-Cannava A, Remberger M, Ringden O, Olerup O. Roles of HLA-B, HLA-C and HLA-DPA1 incompatibilities in the outcome of unrelated stem-cell transplantation. Tissue Antigens. 2003 Sep;62(3):243-50.

[12] Bade-Doeding C, Cano P, Huyton T, Badrinath S, Eiz-Vesper B, Hiller O, et al. Mismatches outside exons 2 and 3 do not alter the peptide motif of the allele group B\*44:02P. Hum Immunol. Nov;72(11):1039-44.

[13] Bade-Doeding C, DeLuca DS, Seltsam A, Blasczyk R, Eiz-Vesper B. Amino acid 95 causes strong alteration of peptide position Pomega in HLA-B\*41 variants. Immunogenetics. 2007 Apr;59(4):253-9.

[14] Bade-Doeding C, Eiz-Vesper B, Figueiredo C, Seltsam A, Elsner HA, Blasczyk R. Peptide-binding motif of HLA-A\*6603. Immunogenetics. 2005 Jan;56(10):769-72.

[15] Bade-Doeding C, Elsner HA, Eiz-Vesper B, Seltsam A, Holtkamp U, Blasczyk R. A single amino-acid polymorphism in pocket A of HLA-A\*6602 alters the auxiliary anchors compared with HLA-A\*6601 ligands. Immunogenetics. 2004 May;56(2):83-8.

[16] Bade-Doeding C, Huyton T, Eiz-Vesper B, Blasczyk R. The composition of the F pocket in HLA-A\*74 generates C-terminal promiscuity among its bound peptides. Tissue Antigens. Nov;78(5):378-81.

[17] Badrinath S, Huyton T, Schumacher H, Blasczyk R, Bade-Doeding C. Position 45 influences the peptide binding motif of HLA-B\*44:08. Immunogenetics. Mar;64(3):245-9.

[18] Badrinath S, Saunders P, Huyton T, Aufderbeck S, Hiller O, Blasczyk R, et al. Position 156 influences the peptide repertoire and tapasin dependency of human leukocyte antigen B\*44 allotypes. Haematologica. Jan;97(1):98-106.

[19] Elamin NE, Bade-Doeding C, Blasczyk R, Eiz-Vesper B. Polymorphism between HLA-A\*0301 and A\*0302 located outside the pocket F alters the POmega peptide motif. Tissue Antigens. Dec;76(6):487-90.

[20] Huyton T, Ladas N, Schumacher H, Blasczyk R, Bade-Doeding C. Pocketcheck: updating the HLA class I peptide specificity roadmap. Tissue Antigens. Sep;80(3):239-48.


[21] Huyton T, Schumacher H, Blasczyk R, Bade-Doeding C. Residue 81 confers a restricted C-terminal peptide binding motif in HLA-B\*44:09. Immunogenetics. Sep;64(9):663-8.

[22] Krausa P, Munz C, Keilholz W, Stevanovic S, Jones EY, Browning M, et al. Definition of peptide binding motifs amongst the HLA-A\*30 allelic group. Tissue Antigens. 2000 Jul;56(1):10-8.

[23] Prilliman K, Lindsey M, Zuo Y, Jackson KW, Zhang Y, Hildebrand W. Large-scale production of class I bound peptides: assigning a signature to HLA-B\*1501. Immunogenetics. 1997;45(6):379-85.

[24] Prilliman KR, Jackson KW, Lindsey M, Wang J, Crawford D, Hildebrand WH. HLA-B15 peptide ligands are preferentially anchored at their C termini. J Immunol. 1999 Jun 15;162(12):7277-84.

[25] Bray RA, Hurley CK, Kamani NR, Woolfrey A, Muller C, Spellman S, et al. National marrow donor program HLA matching guidelines for unrelated adult donor hematopoietic cell transplants. Biol Blood Marrow Transplant. 2008 Sep;14(9 Suppl):45-53.

[26] Morishima Y, Kawase T, Malkki M, Petersdorf EW. Effect of HLA-A2 allele disparity on clinical outcome in hematopoietic cell transplantation from unrelated donors. Tissue Antigens. 2007 Apr;69 Suppl 1:31-5.

[27] Morishima Y, Sasazuki T, Inoko H, Juji T, Akaza T, Yamamoto K, et al. The clinical significance of human leukocyte antigen (HLA) allele compatibility in patients receiving a marrow transplant from serologically HLA-A, HLA-B, and HLA-DR matched unrelated donors. Blood. 2002 Jun 1;99(11):4200-6.

[28] Petersdorf EW. Optimal HLA matching in hematopoietic cell transplantation. Curr Opin Immunol. 2008 Oct;20(5):588-93.

[29] Petersdorf EW, Hansen JA, Martin PJ, Woolfrey A, Malkki M, Gooley T, et al. Major-histocompatibility-complex class I alleles and antigens in hematopoietic-cell transplantation. N Engl J Med. 2001 Dec 20;345(25):1794-800.

[30] Schoenhals GJ, Krishna RM, Grandea AG, 3rd, Spies T, Peterson PA, Yang Y, et al. Retention of empty MHC class I molecules by tapasin is essential to reconstitute antigen presentation in invertebrate cells. Embo J. 1999 Feb 1;18(3):743-53.

[31] Ortmann B, Copeman J, Lehner PJ, Sadasivan B, Herberg JA, Grandea AG, et al. A critical role for tapasin in the assembly and function of multimeric MHC class I-TAP complexes. Science. 1997 Aug 29;277(5330):1306-9.

[32] Lee S, Yoon J, Park B, Jun Y, Jin M, Sung HC, et al. Structural and functional dissection of human cytomegalovirus US3 in binding major histocompatibility complex class I molecules. J Virol. 2000 Dec;74(23):11262-9.

[33] Peh CA, Burrows SR, Barnden M, Khanna R, Cresswell P, Moss DJ, et al. HLA-B27 restricted antigen presentation in the absence of tapasin reveals polymorphism in mechanisms of HLA class I peptide loading. Immunity. 1998 May;8(5):531-42.

[34] Bennett EM, Bennink JR, Yewdell JW, Brodsky FM. Cutting edge: adenovirus E19 has two mechanisms for affecting class I MHC expression. J Immunol. 1999 May 1;162(9):5049-52.

[35] Byun MW, Kim JH, Kim DH, Kim HJ, Jo C. Effects of irradiation and sodium hypochlorite on the micro-organisms attached to a commercial food container. Food Microbiol. 2007 Aug;24(5):544-8.

[36] Lilley BN, Ploegh HL. Viral modulation of antigen presentation: manipulation of cellular targets in the ER and beyond. Immunol Rev. 2005 Oct;207:126-44.

[37] Lybarger L, Wang X, Harris M, Hansen TH. Viral immune evasion molecules attack the ER peptide-loading complex and exploit ER-associated degradation pathways. Curr Opin Immunol. 2005 Feb;17(1):71-8.

[38] Wiertz EJ, Jones TR, Sun L, Bogyo M, Geuze HJ, Ploegh HL. The human cytomegalovirus US11 gene product dislocates MHC class I heavy chains from the endoplasmic reticulum to the cytosol. Cell. 1996 Mar 8;84(5):769-79.

[39] Wiertz EJ, Tortorella D, Bogyo M, Yu J, Mothes W, Jones TR, et al. Sec61-mediated transfer of a membrane protein from the endoplasmic reticulum to the proteasome for destruction. Nature. 1996 Dec 5;384(6608):432-8.

[40] Schwartz O, Marechal V, Le Gall S, Lemonnier F, Heard JM. Endocytosis of major histocompatibility complex class I molecules is induced by the HIV-1 Nef protein. Nat Med. 1996 Mar;2(3):338-42.

[41] Collins KL, Chen BK, Kalams SA, Walker BD, Baltimore D. HIV-1 Nef protein protects infected primary cells against killing by cytotoxic T lymphocytes. Nature. 1998 Jan 22;391(6665):397-401.

[42] Reusch U, Muranyi W, Lucin P, Burgert HG, Hengel H, Koszinowski UH. A cytomegalovirus glycoprotein re-routes MHC class I complexes to lysosomes for degradation. Embo J. 1999 Feb 15;18(4):1081-91.

[43] Lehner PJ, Surman MJ, Cresswell P. Soluble tapasin restores MHC class I expression and function in the tapasin-negative cell line .220. Immunity. 1998 Feb;8(2):221-31.

[44] Li S, Paulsson KM, Chen S, Sjogren HO, Wang P. Tapasin is required for efficient peptide binding to transporter associated with antigen processing. J Biol Chem. 2000 Jan 21;275(3):1581-6.

[45] Barnden MJ, Purcell AW, Gorman JJ, McCluskey J. Tapasin-mediated retention and optimization of peptide ligands during the assembly of class I molecules. J Immunol. 2000 Jul 1;165(1):322-30.


[46] Purcell AW, Kelly AJ, Peh CA, Dudek NL, McCluskey J. Endogenous and exogenous factors contributing to the surface expression of HLA B27 on mutant APC. Hum Immunol. 2000 Feb;61(2):120-30.

[47] Lewis JW, Elliott T. Evidence for successive peptide binding and quality control stages during MHC class I assembly. Curr Biol. 1998 Jun 4;8(12):717-20.

[48] Yu YY, Myers NB, Hilbert CM, Harris MR, Balendiran GK, Hansen TH. Definition and transfer of a serological epitope specific for peptide-empty forms of MHC class I. Int Immunol. 1999 Dec;11(12):1897-906.

[49] Yu YY, Turnquist HR, Myers NB, Balendiran GK, Hansen TH, Solheim JC. An extensive region of an MHC class I alpha 2 domain loop influences interaction with the assembly complex. J Immunol. 1999 Oct 15;163(8):4427-33.

[50] Suh WK, Derby MA, Cohen-Doyle MF, Schoenhals GJ, Fruh K, Berzofsky JA, et al. Interaction of murine MHC class I molecules with tapasin and TAP enhances peptide loading and involves the heavy chain alpha3 domain. J Immunol. 1999 Feb 1;162(3):1530-40.

[51] Carreno BM, Schreiber KL, McKean DJ, Stroynowski I, Hansen TH. Aglycosylated and phosphatidylinositol-anchored MHC class I molecules are associated with calnexin. Evidence implicating the class I-connecting peptide segment in calnexin association. J Immunol. 1995 May 15;154(10):5173-80.

[52] Lewis JW, Neisig A, Neefjes J, Elliott T. Point mutations in the alpha 2 domain of HLA-A2.1 define a functionally relevant interaction with TAP. Curr Biol. 1996 Jul 1;6(7):873-83.

[53] Peace-Brewer AL, Tussey LG, Matsui M, Li G, Quinn DG, Frelinger JA. A point mutation in HLA-A\*0201 results in failure to bind the TAP complex and to present virus-derived peptides to CTL. Immunity. 1996 May;4(5):505-14.

[54] Dong G, Wearsch PA, Peaper DR, Cresswell P, Reinisch KM. Insights into MHC class I peptide loading from the structure of the tapasin-ERp57 thiol oxidoreductase heterodimer. Immunity. 2009 Jan 16;30(1):21-32.

[55] Garstka MA, Fritzsche S, Lenart I, Hein Z, Jankevicius G, Boyle LH, et al. Tapasin dependence of major histocompatibility complex class I molecules correlates with their conformational flexibility. Faseb J. Nov;25(11):3989-98.

[56] Chen M, Bouvier M. Analysis of interactions in a tapasin/class I complex provides a mechanism for peptide selection. EMBO J. 2007 Mar 21;26(6):1681-90.

[57] Garbi N, Tan P, Diehl AD, Chambers BJ, Ljunggren HG, Momburg F, et al. Impaired immune responses and altered peptide repertoire in tapasin-deficient mice. Nat Immunol. 2000 Sep;1(3):234-8.

[58] Williams AP, Peh CA, Purcell AW, McCluskey J, Elliott T. Optimization of the MHC class I peptide cargo is dependent on tapasin. Immunity. 2002 Apr;16(4):509-20.

[59] Wearsch PA, Cresswell P. Selective loading of high-affinity peptides onto major histocompatibility complex class I molecules by the tapasin-ERp57 heterodimer. Nat Immunol. 2007 Aug;8(8):873-81.

[60] Praveen PV, Yaneva R, Kalbacher H, Springer S. Tapasin edits peptides on MHC class I molecules by accelerating peptide exchange. Eur J Immunol. Jan;40(1):214-24.

[61] Barber LD, Howarth M, Bowness P, Elliott T. The quantity of naturally processed peptides stably bound by HLA-A\*0201 is significantly reduced in the absence of tapasin. Tissue Antigens. 2001 Dec;58(6):363-8.

[62] Zarling AL, Luckey CJ, Marto JA, White FM, Brame CJ, Evans AM, et al. Tapasin is a facilitator, not an editor, of class I MHC peptide binding. J Immunol. 2003 Nov 15;171(10):5287-95.

[63] Howarth M, Williams A, Tolstrup AB, Elliott T. Tapasin enhances MHC class I peptide presentation according to peptide half-life. Proc Natl Acad Sci U S A. 2004 Aug 10;101(32):11737-42.

[64] Zernich D, Purcell AW, Macdonald WA, Kjer-Nielsen L, Ely LK, Laham N, et al. Natural HLA class I polymorphism controls the pathway of antigen presentation and susceptibility to viral evasion. J Exp Med. 2004 Jul 5;200(1):13-24.

[65] Sieker F, Straatsma TP, Springer S, Zacharias M. Differential tapasin dependence of MHC class I molecules correlates with conformational changes upon peptide dissociation: a molecular dynamics simulation study. Mol Immunol. 2008 Aug;45(14):3714-22.

[66] Fussell HE, Kunkel LE, Lewy CS, McFarland BH, McCarty D. Using a standardized patient walk-through to improve implementation of clinical trials. J Subst Abuse Treat. 2008 Dec;35(4):470-5.

[67] Greenwood R, Shimizu Y, Sekhon GS, DeMars R. Novel allele-specific, post-translational reduction in HLA class I surface expression in a mutant human B cell line. J Immunol. 1994 Dec 15;153(12):5525-36.

[68] Grandea AG, 3rd, Androlewicz MJ, Athwal RS, Geraghty DE, Spies T. Dependence of peptide binding by MHC class I molecules on their interaction with TAP. Science. 1995 Oct 6;270(5233):105-8.

[69] Lewis JW, Sewell A, Price D, Elliott T. HLA-A\*0201 presents TAP-dependent peptide epitopes to cytotoxic T lymphocytes in the absence of tapasin. Eur J Immunol. 1998 Oct;28(10):3214-20.


[70] Neisig A, Wubbolts R, Zang X, Melief C, Neefjes J. Allele-specific differences in the interaction of MHC class I molecules with transporters associated with antigen processing. J Immunol. 1996 May 1;156(9):3196-206.

**Chapter 3**

**HLA-E, HLA-F and HLA-G — The Non-Classical Side of the MHC Cluster**

Iris Foroni, Ana Rita Couto, Bruno Filipe Bettencourt, Margarida Santos, Manuela Lima and Jácome Bruges-Armas

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/57507

> © 2014 The Author(s). Licensee InTech. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

**1. Introduction**

Traditionally, the MHC is divided into classes containing groups of genes with related functions; the MHC class I and II genes encode human leukocyte antigens (HLA), proteins that are displayed on the cell surface. In humans, MHC class I molecules comprise the classical (class I-a) HLA-A, -B and -C molecules and the non-classical (class I-b) HLA-E, -F, -G and -H (HFE) molecules (Pietra et al., 2009). Both categories are similar in their mechanisms of peptide binding and presentation and in the T-cell responses they induce (Rodgers and Cook, 2005). The most remarkable feature of MHC class I-b molecules is their highly conserved nature (van Hall et al., 2010). In contrast with class I-a molecules, they have been under a distinct selective pressure and exhibit very low levels of allelic polymorphism (Strong et al., 2003). Classical MHC class I gene transcription is mediated by several cis-acting regulatory elements in the proximal promoter region (Gobin and van den Elsen, 2000). These elements determine the constitutive and cytokine-induced expression levels of the molecule (Gobin and van den Elsen, 2000).

The literature on the different roles played by class I-b molecules is expanding rapidly and focuses on pathogen recognition, virus-induced immunopathology, tumor immunosurveillance and regulation of autoimmunity (Hofstetter et al., 2011). The HLA-G, HLA-E and HLA-F genes encode molecules that have been shown to be involved in the regulation of autoimmune disease (Donadi et al., 2011; Kim et al., 2008). HLA-G biological features include restricted tissue expression, the presence of membrane-bound and soluble isoforms generated by alternative splicing, limited protein variability, a unique molecular structure with a reduced cytoplasmic

