**Meet the editor**

Mohammad Saber Fallah Nezhad is an associate professor at Yazd University, Iran. He received his BS, MS, and PhD degrees, all in Industrial Engineering, from Sharif University of Technology, Tehran, Iran, under the supervision of Professor Akhavan Niaki. He was a visiting researcher at Karlsruhe University under the supervision of Professor Juergen Branke. He was also awarded a silver medal in the 16th National Mathematics Olympiad in Iran. Dr. Fallah Nezhad was ranked first and eighth in the graduate national university comprehensive exam in System Management and Industrial Engineering in Iran, respectively. He was ranked 47th among all high school graduates in Iran. His areas of interest include dynamic programming, quality control, Bayesian inference, and operations research.

Contents

**Preface VII**

Chapter 1 **Introductory Chapter: Bayesian Thinking 1**
Mohammad Saber Fallah Nezhad

Chapter 2 **Bayesian Modeling Approaches for Temporal Dynamics in RNA-seq Data 7**
Sunghee Oh and Seongho Song

Chapter 3 **Bayesian Analysis for Hidden Markov Factor Analysis Models 21**
Yemao Xia, Xiaoqian Zeng and Niansheng Tang

Chapter 4 **Dynamic Process Model Parameter Estimation by Global System Analysis 41**
Shigeru Kashiwaya

Chapter 5 **Preventing Disparities: Bayesian and Frequentist Methods for Assessing Fairness in Machine-Learning Decision-Support Models 71**
Douglas S. McNair

Chapter 6 **Using Bayesian Inference to Investigate the Influence of Environmental Factors on a Phytoplasma Disease 97**
Bernd Panassiti

Chapter 7 **A Bayesian Hau-Kashyap Approach for Hepatitis Disease Detection 109**
Andino Maseleno, Rohmah Zahroh Hidayati, Marini Othman, Alicia Y.C. Tang and Moamin A. Mahmoud


## Preface

This book is an introduction to the mathematical analysis of Bayesian decision-making when the state of the problem is unknown but further data about it can be obtained. The objective of such analysis is to determine the optimal decision or solution that is logically consistent with the preferences of the decision-maker. Such decisions can be analyzed using numerical utilities or criteria, with probabilities assigned to the possible states of the problem and updated as new information is gathered.

It is seldom possible to take findings and methodologies from a basic conceptual framework and apply them directly to a real problem. Often, the findings of another field must be adapted to the problem. Thus, we need a methodology that can refine the results and data from different fields within an analytical framework and lead these data to an appropriate solution for the problem at hand. Since every kind of information, whether qualitative, quantitative, or fuzzy, can be used in Bayesian thinking, this technique can be applied to most real problems.

The book focuses on the different models of Bayesian inference. It provides a description of methodologies along with the different approaches to Bayesian thinking. Throughout, the text concentrates on results rather than on mathematical detail, but every effort has been made to elaborate the theoretical concepts and methodologies. A somewhat more detailed description of the topics covered in individual chapters may be needed by those who want to investigate certain methodologies and basic formulations. The book arose from materials contributed by scholars in the field of Bayesian inference. We thank all who have contributed to these materials.

> **Mohammad Saber Fallah Nezhad** Yazd University, Yazd, Iran

**Chapter 1**

**Introductory Chapter: Bayesian Thinking**

Mohammad Saber Fallah Nezhad

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/intechopen.75053

#### 1. Introduction

Bayesian inference is built upon the simple Bayes' rule of probability theory, yet this way of thinking is one of the most important findings in the history of science. It means that we can revise our beliefs about nature by gathering data from phenomena, by analyzing the behavior of the people around us, or by investigating historical events.


We start with an estimate of the probability that a claim, belief, or hypothesis is true, then look at any new data and update that probability given the new data. Bayes' theorem is a method for assessing the correctness of beliefs (hypotheses, claims, and propositions) based on the best available evidence (observations, data, information). Here is the basic description: initial belief plus new evidence = improved belief.

We change our beliefs with objective information: initial beliefs + new objective data = posterior belief. Each time the system is updated, the posterior becomes the prior of the next stage. It is an evolving system; every bit of new information brings us closer to the correct solution. This technique can be expressed both mathematically and philosophically as a statement of how we learn about the universe: we learn through approximation, getting closer and closer to the truth as we gather more evidence [1].
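The update cycle just described, where each posterior becomes the prior of the next stage, can be sketched with a simple conjugate model. The Beta prior and the stream of observations below are illustrative assumptions, not data from this chapter:

```python
# Sequential Bayesian updating: the posterior of one stage is the prior of the next.
# Sketch with a Beta prior on the success probability of a Bernoulli process.

def update(alpha, beta, observation):
    """One Bayesian update step: Beta(alpha, beta) prior plus a 0/1 observation."""
    return alpha + observation, beta + (1 - observation)

# Start from a uniform prior Beta(1, 1): total ignorance about the success rate.
alpha, beta = 1.0, 1.0

# Hypothetical stream of observations (1 = success, 0 = failure).
data = [1, 1, 0, 1, 1, 1, 0, 1]

for x in data:
    alpha, beta = update(alpha, beta, x)  # yesterday's posterior is today's prior

posterior_mean = alpha / (alpha + beta)
print(f"Posterior: Beta({alpha:.0f}, {beta:.0f}), mean = {posterior_mean:.3f}")
```

Each pass through the loop is one cycle of "initial belief plus new evidence = improved belief"; the final Beta(7, 3) posterior concentrates around the observed success rate as evidence accumulates.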

Bayesian thinking is based on the idea that we can obtain more information about a physical situation than is contained in the data from a single experiment. Bayesian methods can, for example, be applied to analyze the data from several different experiments. In other situations, there may be sound reasons to restrict the allowable values that can be assigned to a parameter. But often the data are scarce, noisy, biased, or all of these. Experimental results are compared with predicted values, and the predictions are modified by arbitrarily subtracting off the discrepancy. When new data are collected, once again the values disagree with the predictions, and another "correction" is applied, leading to an aggregate of ad hoc tweaks: certainly not best practice, however common. Bayesian methods can be used here to avoid these heuristics [2].
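As a hedged sketch of what replaces the ad hoc subtraction: with a Gaussian prior prediction and Gaussian measurement noise (all numbers below are invented for illustration), the posterior is a precision-weighted average of prediction and data, so each experiment pulls the estimate in proportion to how trustworthy it is:

```python
# Combining a prior prediction with noisy experimental data the Bayesian way:
# for Gaussian prior and likelihood, the posterior mean is a precision-weighted
# average, not an arbitrary "correction" of the prediction.

def gaussian_update(prior_mean, prior_var, meas_mean, meas_var):
    """Posterior of a Gaussian prior combined with one Gaussian measurement."""
    precision = 1.0 / prior_var + 1.0 / meas_var
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + meas_mean / meas_var)
    return post_mean, post_var

# Hypothetical predicted value and two independent experiments.
mean, var = 10.0, 4.0                      # prior prediction: 10 with variance 4
for m, v in [(12.0, 1.0), (11.0, 1.0)]:    # experiments with lower noise
    mean, var = gaussian_update(mean, var, m, v)

print(f"Posterior estimate: {mean:.4f} (variance {var:.4f})")
```

Because the experiments are more precise than the prior, the posterior ends up much closer to the measurements, and its variance shrinks with each update, which is exactly the pooling of evidence across experiments that the paragraph describes.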

© 2016 The Author(s). Licensee InTech. © 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

In fact, we all employ Bayesian inference when making decisions about real problems in life. When we want to select a solution to a problem from among several candidates, we usually analyze the past data on each solution, or we try to predict the outcome of each solution, and then the solution with the highest success probability is selected. This process is very similar to Bayesian thinking, in which decisions are selected based on posterior probabilities updated by the Bayesian formula.
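This everyday selection rule can be mirrored in a few lines: score each candidate solution by its posterior success probability given its past record and pick the best. The records below are hypothetical, and a uniform Beta(1, 1) prior is assumed, which gives Laplace's rule of succession:

```python
# Pick the solution with the highest posterior success probability.
# With a uniform Beta(1, 1) prior, the posterior mean success rate of a
# solution with s successes in n trials is (s + 1) / (n + 2).

past_data = {                 # hypothetical records: (successes, trials)
    "solution A": (8, 10),
    "solution B": (1, 2),
    "solution C": (40, 60),
}

def posterior_success(successes, trials):
    return (successes + 1) / (trials + 2)

scores = {name: posterior_success(s, n) for name, (s, n) in past_data.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

Note that the prior keeps a solution with a tiny sample (like B's 1-of-2 record) from looking artificially certain, which is the practical benefit of scoring by posterior probability rather than raw success rate.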


It is seldom possible to take findings and methodologies from a basic conceptual framework and apply them directly to a real problem. Often, the findings of another field must be adapted to the problem. Thus, we need a methodology that can refine the results and data from different fields within an analytical framework such that these data lead to an appropriate solution for the problem at hand. Since every kind of information, whether qualitative, quantitative, or fuzzy, can be used in Bayesian thinking, this technique can be applied to most real problems.

When selecting the right choice from several available options, we need to analyze the observations from each of the options. Now, if there is some kind of competition between the options, we need to balance them somehow so that each option can produce observations close to its actual performance. This is the reason for combining game theory and Bayesian inference in real cases.

Also, mathematical methodologies such as game theory, dynamic programming, and others should be redefined using Bayesian thinking. For example, in dynamic programming, since decisions are selected stage by stage, after observing the information in each stage we can update the probability functions of the random variables and re-solve the problem for the remaining stages, which may lead to a change in the optimal policy. In game theory, it can be assumed that after one stage of the game, having observed the rewards of each decision, it is possible to replay the game, treating the rewards as random variables whose probability functions are determined from past data, so that the optimal decision may change after each stage. The author thinks that the three concepts of dynamic programming, game theory, and Bayesian inference should all be considered in the formulation of each real problem: the interests of the decision-makers differ from problem to problem, decisions affect the future stages, and new data become available at each new stage, so a new mathematical formulation is needed to address these issues.
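One standard way to combine stage-by-stage decision-making with Bayesian updating in this spirit is Thompson sampling. The sketch below, with two invented options and made-up true success rates, samples each option's success probability from its current posterior at every stage, plays the most promising option, and updates that option's posterior with the observed reward, so the policy can change from stage to stage as the paragraph describes:

```python
import random

random.seed(0)

# True (unknown to the decision-maker) success probabilities of two options.
TRUE_P = [0.3, 0.7]

# Beta(1, 1) posterior parameters for each option: [alpha, beta].
posts = [[1, 1], [1, 1]]

for stage in range(500):
    # Sample a plausible success rate for each option from its posterior ...
    samples = [random.betavariate(a, b) for a, b in posts]
    # ... choose the option whose sampled rate is best ...
    choice = samples.index(max(samples))
    # ... observe the (random) reward and update that option's posterior.
    reward = 1 if random.random() < TRUE_P[choice] else 0
    posts[choice][0] += reward
    posts[choice][1] += 1 - reward

plays = [a + b - 2 for a, b in posts]
print("plays per option:", plays)
```

Early stages explore both options; as the posteriors sharpen, play concentrates on the better option, which is the interaction between sequential decisions and updated beliefs that the text argues every real formulation needs.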

The main disadvantage of Bayesian thinking is the existence of false information, which cripples the mechanism of this method and leads to disastrous outcomes in real settings. It should be noted that our view of life is based on the information we receive, and any deviation in that information will lead to wrong decisions. We then end up treating a person as mad, when his only sin was to receive false information. Maybe this is the reason for the idea of paradise for the poor.

When there is a lot of false information around us, we can only hope for God to provide a small light in this darkness. In the case of sacred books, it should be noted that the prophets knew that these books would also be read by their enemies, so the facts are usually veiled, and insight into and understanding of those facts require consideration of many complications.

In fact, understanding the secret of mankind is still the most important challenge for humans: why the creature is so complex, and why some people reach a point that can only be called absolute destruction. Perhaps this is because teaching the right way to think, the most important duty of wise people, has been missing. Or perhaps this should stay secret for unknown reasons.


If we cannot teach humans the right way of thinking, or if we teach wrong beliefs for economic benefits and the like, then we have unknowingly created monsters that will one day come back to their sources. If scientists could teach right thinking to all human beings, there would never be the imaginary thoughts and superstitions whose illusions are the root of the misery of the human race. The most dangerous illusion in the present world is to think that our beliefs and words are based on religion and divine law and are in the interest of God and the law of nature, while even the foundations of our thinking verge on actual sacrilege.

Even if one can analyze the events of human societies on the basis of Bayesian thinking, these methods cannot be implemented, because the assumptions of human societies are based on concepts that cannot be explained by classical science, and this may be the main reason for the disasters of human history. The happiness of man depends on being able to think right.

But we should note that if a wrong thought gains a great deal of power, any wrong decision will cause great harm, and it is better to try to survive in this situation. If we look carefully at the present situation of the world and the leaders of the great powers, the equilibrium point of the power struggle, judging from historical data, will probably be a difficult one unless wise men take over the world's affairs. If wrong information is given even to a good person gradually over time, the soul of that individual will lose the ability to understand the truth at all, because he becomes accustomed to the wrong information, and ultimately he will be like a human with an evil spirit. Usually, when a great mistake or sin is committed, most people fall asleep and false data are generated.

Almost all ethnicities in the world with every religion have proven that their brutality is not limited; therefore, if these brutal thoughts are to be strengthened and combined simultaneously in all parts of the world, it will be a matter of concern. Maybe that is why the talk of the global village can have a reverse effect. When we cannot understand the alarming data in the present world, it may be the time to destroy all the ethical issues of the human race. Without ethical issues, there would be no reason for human interaction.

Thus, it is possible that many wars in human history have occurred because of wrong information, while this information was generated by the eventual winners before the wars began.

When a human runs into troubles so great that he cannot tolerate them, he will unconsciously think of a supernatural power and ask for help. Perhaps understanding this vague relationship is the most important point about intervention in human fate: God is not responsible for providing man with a life of prosperity and happiness, but He is surely responsible for ensuring man a decent life.

### 2. Bayesian thinking at the time of chaos in society

In common usage, "chaos" means "a state of disorder." In chaos theory, however, the term is defined more precisely. Chaos theory concerns deterministic systems whose behavior can in principle be predicted: chaotic systems are predictable for a while and then appear to become random. Chaos can also occur in societies and can affect human history, and these effects must be analyzed in order to prevent harm in the future.
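The point that a chaotic system is deterministic yet predictable only for a while can be illustrated with the classic logistic map; the parameter r = 4 and the starting points below are the usual textbook choices, not taken from this chapter:

```python
# Logistic map x -> r*x*(1-x): fully deterministic, yet at r = 4 two nearly
# identical starting points diverge until their trajectories look unrelated.

def trajectory(x0, r=4.0, steps=40):
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

a = trajectory(0.2)
b = trajectory(0.2 + 1e-9)   # perturb the initial state by one part in a billion

# Early on the two runs agree almost exactly; after enough steps they differ wildly.
print("step 5 difference: ", abs(a[5] - b[5]))
print("step 40 difference:", abs(a[40] - b[40]))
```

The rule is known exactly, so the system is predictable in principle; but the tiny initial difference grows roughly exponentially, so after a few dozen steps the two trajectories are effectively unrelated, which is why chaotic systems appear to become random.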

The basis of all sciences and knowledge is the order of the universe, and the purpose of knowledge is the discovery of this order. One important method of discovering how the world's order rules is the empirical method: based on the data collected, we identify patterns in the behavior of phenomena in order to anticipate future patterns and relationships, or even to design and optimize according to these detected relations.

But we know that under some conditions there will be chaos which, according to chaos theory, leads to disorder out of which a new order is created. Chaos theory is a complicated and disputed mathematical theory that seeks to explain the effect of seemingly insignificant factors; its name originates from the idea that the theory can give an explanation for chaotic or random occurrences. When data are analyzed at a time of disorder, no specific conception or result can be observed, since the data are not produced according to a specific pattern and do not have a specific behavior.

But it can be said that the greater the disorder, the more important the new order will be. Also, when disorder occurs in a community, most of its members will experience a psychological breakdown in a variety of ways, which can be stressful. In any case, the greater the degree of disorder, the greater the damage done to these individuals.

Such people usually have very contradictory behaviors and are quite isolated and alone, but it can be said that they have a stronger understanding of their surroundings; when they cannot adapt the conditions of society to their logical criteria, they experience a kind of paradox and contradiction. These attitudes toward the conditions of society, and the resulting stress, will lead to some kind of psychological breakdown.

Such people usually have to live in a fantasy world, and this imagination leads to a departure from the real situation. These people are in some sense victims of the new order, but the data that they produce will be quite close to the features of the new order, because their damaged psyche grows and reconstructs itself in accordance with the new order, and the insinuations of these people take form within it.

Therefore, it can be said that the existential essence of these people actually represents the new order that is being created. So, if the psychosocial vulnerability of these people is analyzed accurately, then, from the data they produce, one can identify the problems of the new order.

But the feature of new orders in the present world is their complexity. In fact, understanding and analyzing the behavior of such people is not simple, because the truth is never told and what is seen in public is more of a show. Perhaps the root cause of mental problems in current societies is the generation of false beliefs: every piece of information is transmitted easily, which is recognized as an advantage of the current world, but false and malicious information is just as easily exchanged.

Perhaps the main reason for the misconduct of human beliefs and thoughts is that the received information is wrong, leading to wrong decisions and behaviors. Repeating these mistakes produces stress that slowly leads to mental problems, so these people will also be victimized by the complex order prevailing in the world. The major problem is that such people usually do not know about their own deterioration, and behaviors that are legitimate to them may be very hideous on the basis of divine law.

Therefore, living in a time of chaos in the community will be accompanied by many troubles and problems, because most people will be far from humanity and there will be plenty of psychological harm to the people of the society, whose effects will remain for the next generations. But any chaos ultimately leads to order. If we can analyze the data on the effects of chaos on humans, and analyze the causes of the destruction of social, religious, and cultural foundations, then the principles of the new order can be effectively designed so that better people can be trained: people who remain loyal to ethical issues under the most difficult conditions. Perhaps this ideal goal cannot be achieved, but it seems to be the only way for humans in the near future.

Given the current state of the world, it can be said that the situation is rapidly descending into intense chaos. In fact, what is worrying is that this disorder is being systematically created by world powers; they are certainly aware of the disadvantages and effects of this chaos, and there is no justifiable reason to create it at this very high level. It seems that key and critical problems are being solved on the basis of instantaneous and emotional decisions. Mistaken decisions, under the butterfly effect, may lead to global disruptions that no wise person can resolve.

What is mentioned in the foregoing is only my own perception of Bayesian thinking, based on limited experience and study.

#### Author details

Mohammad Saber Fallah Nezhad

Department of Industrial Engineering, Yazd University, Yazd, Iran

Address all correspondence to: fallahnezhad@yazd.ac.ir

#### References

[1] https://gainweightjournal.com/bayesian-thinking-if-you-want-to-be-a-critical-thinker-you-need-to-understand-this-concept/

[2] https://www.statisticalengineering.com/bayes_thinking.htm

### Author details

2. Bayesian thinking at the time of Chaos in society

4 New Insights into Bayesian Inference

must be analyzed in order to prevent any harm in the future.

In common usage, "chaos" means "a state of disorder." However, in chaos theory, the term is defined more precisely. Chaos theory concerns deterministic systems whose behavior can in principle be predicted. Chaotic systems are predictable for a while and then appear to become random. Chaos can occur in societies that can effect on the human history and these effects

The basis of all the sciences and knowledge is the order of the universe that the purpose of the knowledge is the discovery of this order. One of the important methods for discovering how the world order is ruling is to employ empirical method that, based on the data collected, we can identify the pattern in the behavior of phenomena in order to anticipate patterns and future relationships, or even to design and optimize, according to these detected relations.

But we know that there will be a chaos in some conditions, which, based on the concept of chaos theory, it will lead to disorder in order to create a new order. The chaos theory is a complicated and disputed mathematical theory that seeks to explain the effect of seemingly insignificant factors. The chaos theory name originates from the idea that the theory can give an explanation for chaotic or random occurrences. When the data are analyzed at the time of the disorder, no specific conception and result can be observed, since the data are not pro-

But it can be said that the more the disorder is, the new order will be more important. Also, when a disorder occurs in a community, most people in the community will experience a psychological breakdown in a variety of ways, which can be stressful. Anyway, the greater the degree of

Such people usually have very contradictory behaviors and are quite isolated and alone, but it can be said that they have a stronger understanding of their surroundings and, when they cannot adapt the conditions of society to their logical criteria, they experience a kind of paradox and contradiction. Thus these Attitudes with respect to the conditions of society and their stress will lead to some

Such people usually have to live in a fantasy world, and this imagination will lead to a departure from the real situation. These people are somehow the victim of this new order, but the data that they produce will be quite close to the features of the new order because their damaged psyche grows and reconstructs in accordance with the new order, and the insinua-

Therefore, it can be said that the existential essence of these people actually represents the new order that is being created. So, if the psychosocial vulnerability of these people is analyzed accurately, according to the data produced by these individuals, one can identify problems of

But the feature of new orders in the present world is their complexity. In fact, understanding the behavior of such people and analysis is not simply possible, because the truth is never told and what is seen in public is more of a show. Perhaps the root cause of the mental problems in the current societies is generation of false beliefs. Because every information is transmitted

duced based on a specific pattern and they do not have a specific behavior.

disorder, the greater the damage done to these individuals.

kind of psychological breakdown.

the new order.

tion of these people takes form in the new order.

Mohammad Saber Fallah Nezhad

Address all correspondence to: fallahnezhad@yazd.ac.ir

Department of Industrial Engineering, Yazd University, Yazd, Iran


**Chapter 2**

#### **Bayesian Modeling Approaches for Temporal Dynamics in RNA-seq Data**

Sunghee Oh and Seongho Song

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/intechopen.73062

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### **Abstract**

Analysis of differential expression has played a central role in addressing a variety of biological questions by characterizing abnormal patterns of cellular and molecular function over the last decades. To date, identification of differentially expressed genes and isoforms has focused increasingly on temporal dynamics over a series of time points. Bayesian strategies have been successfully employed to uncover complex biological phenomena, both methodologically and analytically, across various platforms of high-throughput data: for instance, methods for differential expression analysis and network modules in transcriptome data, peak callers in ChIP-seq data, target prediction in microRNA data, and meta-methods that bridge different platforms. In this chapter, we discuss how our methodological work based on Bayesian models addresses important questions arising in the architecture of temporal dynamics in RNA-seq data.

**Keywords:** hierarchical Dirichlet Bayesian mixture model, Poisson Gamma autoregressive model, temporal dynamics, RNA-seq

#### **1. Introduction**

Differential expression analysis across external conditions (e.g., drug treatments, or between- or within-cell/tissue types) in stimuli-response data has long been a crucial part of clinical applications [1–11]. The primary goal of these studies is to target therapeutic effects on genes and their pathways that are highly associated with alterations between different conditions, the corresponding underlying biological mechanisms, and condition-specific molecular processes, from microarrays up to the recent RNA-seq platform [1–3, 5, 6, 9–43]. Such substantial effects on transcriptome data, involved in various types of human diseases, have

significantly addressed fundamental issues characterized by biological phenomena in transcriptional regulation: for instance, classification of subtypes of hereditary breast and ovarian cancer [44–46], reciprocal phylogenetic conservation and heterogeneity between the closest animal models of aging and depression in brain tissues [47–49], identification of differential enzyme-related expression in Gaucher's disease across three distinct tissues [14, 50], and developmental transient patterns in mouse embryonic stem cells in pre-frontal cortex and limb tissues [13, 51].


Characterization based on Bayesian strategies for genome-wide transcriptome data and other types of sequencing data has successfully addressed a variety of questions arising in the biomedical community [8, 9, 52]. As a naïve Bayesian method, the baySeq framework was proposed for differential expression analysis between different groups with replicates [52]. ShrinkBayes was developed to identify differential expression in static data on the basis of a zero-inflated Poisson Gamma model, using Integrated Nested Laplace Approximation to estimate shrunken parameters [53]. Another Bayesian technique for ChIP-seq, BayesPeak, was developed to detect significantly enriched regions in transcription-factor binding site and histone modification datasets; it is based on a Bayesian hidden Markov model with MCMC simulations and a Poisson Gamma distribution to account for over-dispersion in read-count abundance across regions, and it compares favorably with existing ChIP-seq peak callers such as MACS, PeakSeq, and ChIPseq Peak Finder [54]. Additionally, Bayesian approaches to differential expression of alternative splicing in RNA-seq have been proposed in the MISO (mixture of isoforms) model and MATS (multivariate analysis of transcript splicing) [9, 18, 55]. The MISO method estimates differential expression of alternatively spliced exons and isoforms based on a Bayes factor quantifying the odds of differential regulation of a given isoform in terms of its inclusion and exclusion levels. Similarly, MATS (and rMATS) tests differential alternative splicing patterns by estimating exon inclusion levels between two samples without (and with) replicates, respectively.

In addition, Bayesian methods have been proposed for predicting the targets of microRNAs, which play a key regulatory role in gene regulation across a variety of biological processes in human diseases, especially cancer development [56, 57]. Bayesian network methods have also been proposed to predict the functions of long non-coding RNAs as well as coding genes in RNA-seq [58, 59]. Thus, RNA-seq has become the alternative of choice in transcriptome studies, with advantages over arrays in the dynamic range of expression signals, higher reproducibility and sample quality, and improved annotation without requiring prior knowledge [3, 16, 19, 20, 24, 26–28, 32].
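As a toy illustration of the Bayes-factor criterion used in MISO-style analyses, the sketch below contrasts "sample-specific inclusion levels" against a "single shared inclusion level" for one exon, using Beta-Binomial marginal likelihoods with uniform priors. This is a simplified stand-in, not MISO's actual model, and the read counts and priors are hypothetical.

```python
from math import comb, exp, lgamma, log

def log_beta(a, b):
    """log of the Beta function B(a, b), via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal(k, n, a=1.0, b=1.0):
    """Log marginal likelihood of k inclusion reads out of n informative reads,
    integrating the binomial likelihood against a Beta(a, b) prior on the
    exon inclusion level psi."""
    return log(comb(n, k)) + log_beta(k + a, n - k + b) - log_beta(a, b)

def bayes_factor(k1, n1, k2, n2, a=1.0, b=1.0):
    """Bayes factor for sample-specific inclusion levels (M1) versus a
    single shared inclusion level (M0)."""
    log_m1 = log_marginal(k1, n1, a, b) + log_marginal(k2, n2, a, b)
    # Under M0 one psi drives both samples; the binomial coefficients remain.
    log_m0 = (log(comb(n1, k1)) + log(comb(n2, k2))
              + log_beta(k1 + k2 + a, n1 + n2 - k1 - k2 + b) - log_beta(a, b))
    return exp(log_m1 - log_m0)

# Clearly different inclusion levels -> Bayes factor far above 1
print(bayes_factor(90, 100, 30, 100))
# Nearly identical inclusion levels -> Bayes factor below 1
print(bayes_factor(50, 100, 52, 100))
```

A large Bayes factor is read as evidence of differential inclusion; working in log space keeps the Beta-function ratios numerically stable for deep-coverage counts.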

With continuing improvements in technology and steadily declining sequencing costs, RNA-seq makes it feasible to carry out denser experimental designs, such as time course data rich in information about dynamic gene regulation. Furthermore, transcriptome data and other types of meta-framed data across different platforms will be investigated more widely in the years to come. Indeed, next-generation sequencing technologies have steadily improved, with higher throughput, longer reads, deeper sequencing, larger replicate sample sizes, and fewer biases in the data. Such advances allow investigators to conduct more complex experimental studies and various types of time course experiments, such as single-series, longitudinally measured time courses and transient dynamic patterns in developmental stages; multi-series factorial time courses with multiple external conditions at each time point; and cell-cyclic periodical data with (or without) external stimuli [7, 8, 22, 23, 29, 30, 60].

To date, despite the substantial potential and significance of exploring the temporal dynamics of genes and other genomic features in various human disease progression models and therapeutic settings, the lack of analytical methodologies for precisely characterizing temporal dynamics has remained an important challenge for better understanding the biological mechanisms associated with both time-specific changes and changes responsive to a given external stimulus.

In this chapter, we propose Bayesian approaches to better infer differential expression in temporally and spatially dynamic regulation, approaches that can be widely adopted in biomedical research communities such as those studying pediatric disease progression models, age-related neurodegenerative diseases, and other types of longitudinal and multi-series time course data.

#### **2. Methods**


#### **2.1. Data types of time course experiments**

Dynamic gene regulation within a time window in transcriptome data is generally subcategorized into (1) within-subject, longitudinally and repeatedly measured stimuli-response data in a single-series time course experiment; (2) between-subject, factorial multi-series time course data with different conditions at each time point; and (3) periodical data with cell-cycle or circadian rhythmic patterns, with or without external conditions. For the first type of temporal dynamics, we initially proposed a Bayesian Poisson Gamma (negative binomial) strategy to identify temporally differentially expressed genes in a previous study [23]. In this model, each gene is statistically tested for equal versus differential expression by an auto-regressive (AR) model. The detailed notation is given in the following.

**Model I**: suppose a gene expression profile is observed across a series of multiple time points. The count *ygrt* is independently distributed as Poisson Gamma (a negative binomial model, to account for the variability of biological replicates within a group), where *g* = 1, 2, …, *G* (gene or other genomic feature), *t* = 1, 2, …, *T* (time point), and *r* = 1, 2, …, *R* (biological replicate at each time point):

$$\begin{aligned} y_{grt} &\sim \mathrm{POI}\left(\mu_{grt}\right) \\ \log\left(\mu_{grt}\right) &= w_{grt} + \beta_{g} \\ w_{gr1} &\sim N\left(0, \frac{\sigma^2}{1-\phi_{gr1}^2}\right) \\ w_{grt} \mid w_{gr,1,2,\dots,t-1} &\sim N\left(\phi_{gr1} \ast w_{gr,t-1}, \sigma^2\right), \quad t \ge 2 \end{aligned}$$

In this proposed model, *β<sup>g</sup>* is assumed to have a non-informative prior, and a time series random effects model is assumed for the sequentially measured single series of time course data. To update the defined auto-regressive model and estimate the posterior probabilities of the parameter set, we employ Markov Chain Monte Carlo simulation with *N* = 10,000 iterations and 8000 burn-ins. To decide whether or not a given gene is temporally differentially expressed, we further examine the parameter of greatest interest, the auto-coefficient representing the sequential time series random effect in this model. To classify between equal and differential expression, we compute Bayesian credible interval estimates. The proposed model is implemented with OpenBUGS (WinBUGS) in R (submitted paper). It allows us to include the major factor of time together with the variability of replicates at each time point, and this simple linear auto-regressive (AR) model extends straightforwardly to identify other temporally differentially expressed genomic features, for instance, quantified transcript abundances. Despite the much-improved sample quality of RNA-seq data compared with the early days of the technology, preprocessing and normalization procedures are still required to infer temporally differentially expressed genes and isoforms precisely and to reduce misleading results in subsequent analyses. Some samples are discarded in the preprocessing step, for example because of sample preparation problems in the experiments, and the corresponding samples for a given missing sample must then also be deleted, as is standard for longitudinal data with repeated measurements across a series of time points. More importantly, RNA-seq expression data are highly skewed toward zero and low expression levels rather than high expression levels.
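The generative structure of Model I can be sketched directly. The parameter values below (σ, ϕ, β) are illustrative only, and this is a forward simulation of the model, not the authors' OpenBUGS implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_model_one(G=5, R=3, T=10, sigma=0.3, phi=0.6, beta=2.0):
    """Simulate counts y[g, r, t] from Model I: a latent Gaussian AR(1)
    process w on the log scale, plus a gene-level baseline beta,
    driving Poisson-distributed read counts."""
    y = np.empty((G, R, T), dtype=np.int64)
    for g in range(G):
        for r in range(R):
            w = np.empty(T)
            # stationary start: w_gr1 ~ N(0, sigma^2 / (1 - phi^2))
            w[0] = rng.normal(0.0, sigma / np.sqrt(1.0 - phi**2))
            for t in range(1, T):
                # w_grt | w_gr,t-1 ~ N(phi * w_gr,t-1, sigma^2)
                w[t] = rng.normal(phi * w[t - 1], sigma)
            mu = np.exp(w + beta)       # log(mu_grt) = w_grt + beta_g
            y[g, r] = rng.poisson(mu)   # y_grt ~ POI(mu_grt)
    return y

counts = simulate_model_one()
print(counts.shape)  # (5, 3, 10)
```

Simulated data of this form is a convenient check on any MCMC implementation: a sampler fit to it should recover ϕ and β within the reported credible intervals.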
Collectively, for longitudinally measured experimental designs in the temporal dynamics of RNA-seq data, as a further improved strategy we are currently developing a zero-inflated Poisson Gamma model with missing observations in longitudinal data, to improve the detection of temporal changes in highly skewed count data [61, 62]. The detailed descriptions and notation are given in the following; we consider the conditional distribution of *ygrt* ∣ *Egrt*:

$$y_{grt} \mid E_{grt} \sim \mathrm{POI}\left(\mu_{grt}\right) \text{ if } E_{grt} = 1$$

and

$$P\left(y_{grt} = 0 \mid E_{grt} = 0\right) = 1 \text{ if } E_{grt} = 0$$

where *Egrt* is the binary indicator of whether a gene expression profile is present for gene *g*, time *t*, and replicate *r*. We also assume that, conditioned on *pgrt*, the *Egrt*'s are independent Bernoulli random variables with *P*(*Egrt* = 1) = *pgrt* for *g* = 1, 2, …, *G*, *t* = 1, 2, …, *T*, and *r* = 1, 2, …, *R*. Here, given *Egrt* = 1, the *ygrt*'s are assumed to be conditionally independent.

Compared to arrays, the major strength of RNA-seq transcriptome data is that it enables quantification and identification of spliced isoforms as well as individual exon-level expression, which had not previously been possible because of low resolution [8–11, 13, 14, 25, 28, 30, 33, 35, 60, 63–68]. In addition to gene-level analyses, it is well established that alternative splicing is a prevalent mechanism across a variety of organisms. It involves multiple selective schemes of splice sites that construct diverse functional pathways and protein structures in gene regulation. A single mRNA may code for different forms of a protein (isoforms) as a result of alternative splicing, which increases the complexity of the mammalian transcriptome.

In the previous literature, deep-sequencing-based transcriptome data predict that more than ~95% of human genes contain multiple exons and transcript variants that undergo alternative splicing events [8, 9, 25, 28, 33, 35, 63–65, 67]. The aberrant alternative splicing events occurring in post-transcriptional and translational processes are highly associated with specific tissues, developmental stages, or environmental conditions, and malformation and dysfunctional mechanisms have been linked to the majority of abnormal alternative splicing events in human brain diseases. Aberrant splicing patterns in neurodegenerative Alzheimer's patients and in pediatric cancer progression could be significant contributors to targeted therapies in disease progression models and to developmental evolutionary processes in temporally dynamic transcriptional activity [28, 63–66]. Despite the importance of alternative splicing events in recent technology, characterization of dynamic processes has been largely limited to gene-level and static-data approaches.

From the methodological point of view, several bioinformatics tools for the quantification and identification of isoforms, including IQSeq, rSeq, and MapSplice, have recently been developed [15, 17, 31]. Identification of differentially expressed isoforms in static data has been pursued by MATS (focused on a specific experimental design comparing one sample with another at a fixed time point) [9], DEXSeq (flexibly allowing various types of experimental and biological conditions in a generalized linear model built on multiple comparisons at the exon level) [5], and cufflinks and cuffdiff (among the most popular versatile tools for quantification and identification of differential expression at the isoform level, but restricted to simple pairwise comparisons with replicates) [10, 11, 30, 68].

To the best of our knowledge, none of the current static or dynamic methods can identify temporally differential expression of alternative splicing while explicitly accounting for the data-driven nature of various time course experimental settings.

For the first type of longitudinal time course experiments, quantified expression levels of isoforms and other types of genomic features can be applied directly to our proposed dynamic AR model.

For the second type, between-subject factorial multi-series time course data, another of our previous studies [8] proposed a hierarchical Bayesian modeling approach to define differential expression at the isoform level when there are multiple conditions at each time point, such as different tissues, drug treatments, stress, and trauma in temporal dynamics (see **Figure 1**).

**Model II**: for consistency with the notation of Model I, we pursue a hierarchical Dirichlet Bayesian mixture model of temporal dynamics in multi-series time course data, using the same notation for transcriptomic expression levels. Suppose that a gene (or other genomic feature) expression profile across a series of multiple time points is tested, and that each time point contains different external conditions, such as different types of tissues, cell lines, drug treatments, trauma, and stress, where we denote the condition factor by *c* = 1, 2, …, *C* at each time point.

given in the following, we consider the conditional distribution of *ygrt* <sup>∣</sup> *Egrt*,

<sup>|</sup>*Egrt* <sup>=</sup> <sup>0</sup>) <sup>=</sup> <sup>1</sup> if *Egrt* <sup>=</sup> <sup>0</sup>

where *Egrt* is the binary indicator of whether a gene expression profile across a series of multiple time points is present for a gene *g*, time *t* and replication *r*. We also assume that conditioned on *pgrt*, the *Egrt*'s are independent Bernoulli random variables with *P*(*Egrt* <sup>=</sup> <sup>1</sup>) <sup>=</sup> *pgrt* for *<sup>g</sup>* <sup>=</sup> <sup>1</sup>, <sup>2</sup>, …,*G*, *<sup>t</sup>* <sup>=</sup> <sup>1</sup>, <sup>2</sup>, …,*T*, and *<sup>r</sup>* <sup>=</sup> <sup>1</sup>, <sup>2</sup>, …,*R*. Here, given *Egrt* <sup>=</sup> <sup>1</sup>, assume that the *ygrt*'s are condition-

Compared to arrays, the major strength of RNA-seq transcriptome data enables to quantify and identify spliced isoforms as well as individual exon-level expression which had not been previously done due to low resolution [8–11, 13, 14, 25, 28, 30, 33, 35, 60, 63–68]. In addition to gene level analyses, it is well established that alternative splicing is a prevalent mechanism on the variety of organisms. It involves multiple selective schemes of splice sites to construct diverse functional pathways and protein structures in gene regulation. A single mRNA may

*ygrt* ∣ *Egrt* ∼ *POI*(*μgrt*), if *Egrt* = 1

P( *ygrt* = 0
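To make the two-part likelihood of the zero-inflated Poisson specification above concrete, the following short Python sketch evaluates the pmf and simulates counts. The parameter values are hypothetical illustrations, not part of the implementation described in this chapter:

```python
import math
import random

def zip_pmf(k, p, mu):
    """P(y = k) under the zero-inflated Poisson: y ~ Poisson(mu) when the
    presence indicator E = 1 (probability p), and y = 0 when E = 0."""
    poisson_k = math.exp(-mu) * mu ** k / math.factorial(k)
    return (1.0 - p) * (1.0 if k == 0 else 0.0) + p * poisson_k

def simulate_zip(p, mu, n, seed=0):
    """Draw n counts y_grt: first E ~ Bernoulli(p), then the Poisson count."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        if rng.random() >= p:          # E = 0: structural zero
            draws.append(0)
            continue
        # E = 1: inverse-CDF draw from Poisson(mu)
        u, k = rng.random(), 0
        term = math.exp(-mu)
        cdf = term
        while u > cdf:
            k += 1
            term *= mu / k
            cdf += term
        draws.append(k)
    return draws
```

With a small presence probability *p*, the simulated counts reproduce the excess of zeros that motivates the zero-inflated model for RNA-seq data.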

In the previous literature, deep-sequencing-based transcriptome data predict that more than ~95% of human genes contain multiple exons and transcript variants that undergo alternative splicing events [8, 9, 25, 28, 33, 35, 63–65, 67]. Aberrant alternative splicing events occurring in post-transcriptional and translational processes are highly associated with tissue-, developmental stage-, or environmental condition-specific manners. Malformation and dysfunctional mechanisms caused by the majority of abnormal alternative splicing events have been investigated in human brain diseases. Aberrant splicing patterns in neurodegenerative Alzheimer's patients and in types of pediatric cancer progression could be significant contributors to targeted therapies in disease progression models and to developmental evolutionary processes of transcriptional activity in temporal dynamics [28, 63–66]. Despite the importance of alternative splicing events in recent technology, characterization of their dynamic processes has been largely limited to gene-level and static data approaches.

From the methodological point of view, for quantification and identification of isoforms, several bioinformatics tools, including IQSeq, rSeq, and MapSplice, have recently been developed [15, 17, 31]. Identification of differentially expressed isoforms for static data types has been pursued by MATS (focused on a specified experimental design comparing one sample versus another single sample at a fixed time point) [9], DEXSeq (flexibly allowing various types of experimental and biological conditions in a generalized linear model on the basis of multiple comparisons at the exon level) [5], and Cufflinks/Cuffdiff (known as one of the most popular versatile tools for quantification and identification of differential expression at the isoform level, but restricted to simple pairwise comparisons with replicates) [10, 11, 30, 68].

To the best of our knowledge, none of the current static or dynamic methods can identify temporally differential expression of alternative splicing while explicitly accounting for the data-driven nature of various time course experimental settings.

For the first type of longitudinal time course experiments, quantified expression levels of isoforms and other types of genomic features can be directly applied to our proposed dynamic AR model.

For the second type, between-subject factorial multi-series time course data, another of our previous studies [8] proposed a hierarchical Bayesian modeling approach to differential expression analysis at the isoform level when there are multiple conditions at each time point, such as different tissues, drug treatments, stress, and trauma, in temporal dynamics (see **Figure 1**).

**Model II**: The detailed description of the notation in the proposed model is given in the following. For the sake of consistency in notation across the proposed models, we pursue a hierarchical Dirichlet Bayesian mixture model of temporal dynamics in multi-series time course data, making use of the same notation for transcriptomic expression levels as in Model I. Suppose that a gene (or other genomic feature) expression profile across a series of multiple time points is tested, and each time point contains different external conditions, such as different types of tissues, cell lines, drug treatments, trauma, and stress, where we denote a condition factor *c* = 1, 2, …, *C* at each time point.


Bayesian Modeling Approaches for Temporal Dynamics in RNA-seq Data

http://dx.doi.org/10.5772/intechopen.73062


**Figure 1.** The latent variable *γg* to be estimated in the hierarchical Dirichlet Bayesian mixture model.

For a particular time point *T* = *t* and condition *C* = *c*, the expression level is given by *ygrtc*, with *g* = 1, 2, …, *G* (gene), *r* = 1, 2, …, *R* (replicate at each time point and condition), *t* = 1, 2, …, *T* (time point), and *c* = 1, 2, …, *C* (external condition at each time point). Let *γg* be a latent variable, estimated in the hierarchical mixture model, indicating whether equal expression (EEX) or temporally differential expression (TDEX) holds at a given gene of interest. At a given time point *T* = *t*, *τg* denotes the *g*th gene effect and *βc* denotes the effect of the *c*th condition. The latent variable *γg* is thus an indicator of whether the given *g*th gene is temporally differentially expressed, that is,

$$\gamma_g = \begin{cases} 1, & \text{when TDEX} \\ 0, & \text{when EEX.} \end{cases}$$

$$y_{grtc} \mid \gamma_g = d \sim \mathrm{POI}(\mu_{grtc}),$$

$$\log(\mu_{grtc}) = \tau_g + \beta_c + F(\beta) + w_{grtc},$$

$$\beta_c \mid l_g = l \sim F(\beta_l),$$
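As an illustration of this generative structure, the following Python sketch simulates counts from the log-linear Poisson mean with hand-picked values of *τg*, *βc*, *F*(*β*), and *wgrtc*. In the full model these terms carry Dirichlet process priors; here they are simply fixed or Gaussian stand-ins, so this is a toy simulation rather than the authors' implementation:

```python
import math
import random

rng = random.Random(42)

G, R, T, C = 3, 2, 4, 2              # genes, replicates, time points, conditions
tau = [0.5, 1.0, 1.5]                # hypothetical gene effects tau_g
beta = [0.0, 0.8]                    # hypothetical condition effects beta_c
F_beta = 0.2                         # fixed stand-in for F(beta)

def draw_poisson(mu):
    """Inverse-CDF draw from Poisson(mu)."""
    u, k = rng.random(), 0
    term = math.exp(-mu)
    cdf = term
    while u > cdf:
        k += 1
        term *= mu / k
        cdf += term
    return k

# y_grtc ~ POI(mu_grtc) with log(mu_grtc) = tau_g + beta_c + F(beta) + w_grtc
y = {}
for g in range(G):
    for r in range(R):
        for t in range(T):
            for c in range(C):
                w = rng.gauss(0.0, 0.1)   # Gaussian stand-in for the DP residual
                mu = math.exp(tau[g] + beta[c] + F_beta + w)
                y[(g, r, t, c)] = draw_poisson(mu)
```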

where *l* represents distinct patterns between conditions across time points; namely, $l_g = l$ iff the *g*th gene belongs to cluster *l*. Let *L* denote the number of clusters on the expression data, and let $n_l = |\{g;\ l_g = l\}|$ denote the size of the *l*th cluster. Then

$$F(\beta) \sim DP(F_0(F(\beta));\ \eta_0),$$

$$w_{grtc} \mid (\gamma_g = d,\ l_g = l) \sim DP(G_d^{*};\ M),$$

where *DP*(H; *α*) stands for the Dirichlet process having baseline distribution H(∙) and mass parameter *α*. Here $F_0$ represents the mixture format of *β* with $p(\beta) = N_{G-1}(0, \psi_0)$, such that

$$G_0^{*} \sim N(0, \sigma^2)$$

for equal expression (EEX) and

$$G_1^{*} \sim \tfrac{1}{2} N(-\delta, \sigma^2) + \tfrac{1}{2} N(\delta, \sigma^2)$$

for differential expression (DEX), while $\eta_0$, $\psi_0$, and *M* are fixed hyper-parameters based on prior information [6, 69]. Therefore, the posterior probability of interest is $P(\gamma_g = d \mid l_g = l,\ y_{g111}, \ldots, y_{g,r=R,t=T,c=C})$ for the temporally differentially expressed gene.
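The Dirichlet process pieces can be illustrated with a truncated stick-breaking (Sethuraman) construction. The sketch below is generic Python using the EEX and DEX baseline measures from the text with hypothetical values of *δ*, *σ*, and the mass parameter; it is not the OpenBUGS implementation used in this chapter:

```python
import random

def stick_breaking(alpha, base_draw, n_atoms, seed=0):
    """Truncated draw from DP(H; alpha): stick-breaking weights from
    Beta(1, alpha) sticks, atoms drawn i.i.d. from the baseline H."""
    rng = random.Random(seed)
    atoms, weights, remaining = [], [], 1.0
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
        atoms.append(base_draw(rng))
    return atoms, weights

# Baseline measures from the text (delta and sigma are hypothetical here):
def g0_star(rng, sigma=1.0):
    # EEX baseline: N(0, sigma^2)
    return rng.gauss(0.0, sigma)

def g1_star(rng, delta=2.0, sigma=1.0):
    # DEX baseline: 0.5 N(-delta, sigma^2) + 0.5 N(delta, sigma^2)
    mean = -delta if rng.random() < 0.5 else delta
    return rng.gauss(mean, sigma)

atoms, w = stick_breaking(alpha=1.0, base_draw=g1_star, n_atoms=50, seed=3)
```

The weights decay geometrically in expectation, so a modest truncation captures essentially all of the mass of the random measure.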

Based on our proposed Bayesian approach for multi-series factorial time course data, we are currently implementing the model with OpenBUGS (WinBUGS) in R to perform differential expression analysis. To validate the proposed model, we need to compare it with maSigPro for RNA-seq data and with the Gaussian process modeling approach, in terms of temporally differentially expressed genes, on multiple datasets after a variance-stabilizing transformation of the counts data.
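Once posterior draws are available from the sampler, both classification rules reduce to simple chain summaries: a credible interval on the auto-coefficient for Model I, and a posterior probability for *γg* in Model II. The following sketch uses simulated draws as stand-ins for real MCMC output, with an illustrative 95% level and 0.5 cutoff:

```python
import random

rng = random.Random(7)

def credible_interval(draws, level=0.95):
    """Equal-tailed credible interval from posterior draws."""
    s = sorted(draws)
    lo = s[int((1.0 - level) / 2.0 * (len(s) - 1))]
    hi = s[int((1.0 + level) / 2.0 * (len(s) - 1))]
    return lo, hi

# Model I style: call a gene TDEX if the interval for the auto-coefficient excludes 0.
phi_draws = [rng.gauss(0.6, 0.1) for _ in range(2000)]   # stand-in MCMC chain
lo, hi = credible_interval(phi_draws)
is_tdex = not (lo <= 0.0 <= hi)

# Model II style: classify via the posterior probability P(gamma_g = 1 | data).
gamma_draws = [1 if rng.random() < 0.8 else 0 for _ in range(2000)]
post_prob = sum(gamma_draws) / len(gamma_draws)
is_dex = post_prob > 0.5
```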

#### **3. Closing remarks and future directions**


In earlier sections, we discussed Bayesian techniques that address different types of experimental (clinical) settings in temporal dynamics, focusing on the Poisson Gamma autoregressive model for longitudinally measured single-series time course data and the hierarchical Dirichlet Bayesian mixture model framework for multi-series factorial time course data, respectively. Thus, as a continuation of these modeling efforts, in this chapter we propose differential expression analytical frameworks that precisely characterize temporal dynamics for each type of stimuli-response data.

The novel features of the proposed hierarchical Dirichlet Bayesian mixture model enable us to identify significant temporal changes of expression levels between at least two external condition factors over a series of time points, based on a Bayesian strategy that groups clusters by patterns of differential or equal expression [6, 69]. The identified temporal changes are treated as putative biomarkers that could be linked with various dynamic genetic mechanisms of molecular and physiological processes. Additionally, our proposed method allows more than two genetic and environmental factors within a time point and addresses how the intra-factor of multiple conditions and the time factor affect altered expression patterns, both independently and interactively. Furthermore, the proposed model is straightforwardly extended to detect temporal changes at other genomic levels, such as transcript and exon levels.

As an extension of this study, we are currently developing methods to measure the relationship of a parent gene to its multiple child isoforms in temporal dynamic patterns. For this task, we carry out a directional gene-to-isoform comparison of differential expression based on similarity and discrepancy in the magnitude and pattern of expression. The proposed model enables connectivity visualization of splicing maps over the variety of structural formations produced by switchable exon usage. Moreover, we are currently developing a differential expression method for cell-cyclic periodical data with or without external conditions in stimuli-response data [70]. Thus, this proposed work is crucial for defining temporal dynamics of alternative splicing diversity related to disease progression, by discovering which splicing events are condition- and time-specifically observed and how their abnormal spliced patterns and splicing maps are eventually associated with biological functions. It is also essential to develop strategies to correct aberrant splicing, as well as gene-level approaches, in temporal and spatial dynamics across the variety of disease progression and in evolutionary comparative studies between human diseases and other closely related species.

### **Author details**

Sunghee Oh<sup>1</sup>\* and Seongho Song<sup>2</sup>

\*Address all correspondence to: sshshshoh1105@gmail.com

1 Department of Computer Science and Statistics, Jeju National University, Jeju City, South Korea

2 Department of Mathematical Sciences, University of Cincinnati, Cincinnati, USA

#### **Conflict of interest**

The authors have no conflicts of interest to disclose.

#### **Contributors' statements**

SO wrote the manuscript; SO and SS conceived this study.

Regarding disclosure of any prior publications or submissions with overlapping information, including studies and patients: there are no prior publications or submissions with any overlapping information.

The manuscript has not been and will not be submitted to any other journal while it is under consideration for this book chapter on Bayesian inference.

All authors approved the final manuscript as submitted and agree to be accountable for all aspects of the work.

#### **Funding source**

This study is supported by an internal grant from Jeju National University to Dr. Sunghee Oh.

#### **Financial disclosure**

The authors have no financial relationships relevant to this article to disclose.

#### **References**

[1] Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biology. 2010;**11**:R106. DOI: 10.1186/gb-2010-11-10-r106

[2] Bullard JH, Purdom E, Hansen KD, Dudoit S. Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics. 2010;**11**:94. DOI: 10.1186/1471-2105-11-94

[3] Ritchie ME et al. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research. 2015;**43**:e47. DOI: 10.1093/nar/gkv007

[4] Robinson MD, McCarthy DJ, Smyth GK. edgeR: A bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;**26**:139-140. DOI: 10.1093/bioinformatics/btp616

[5] Anders S, Reyes A, Huber W. Detecting differential usage of exons from RNA-seq data. Genome Research. 2012;**22**:2008-2017. DOI: 10.1101/gr.133744.111

[6] Guindani M, Sepulveda N, Paulino CD, Muller P. A Bayesian semi-parametric approach for the differential analysis of sequence counts data. Journal of the Royal Statistical Society. Series C, Applied Statistics. 2014;**63**:385-404. DOI: 10.1111/rssc.12041

[7] Heinonen M et al. Detecting time periods of differential gene expression using Gaussian processes: An application to endothelial cells exposed to radiotherapy dose fraction. Bioinformatics. 2015;**31**:728-735. DOI: 10.1093/bioinformatics/btu699

[8] Oh S, Song S. Differential gene expression (DEX) and alternative splicing events (ASE) for temporal dynamic processes using HMMs and hierarchical Bayesian modeling approaches. Methods in Molecular Biology. 2017;**1552**:165-176. DOI: 10.1007/978-1-4939-6753-7_12

[9] Shen S et al. MATS: A Bayesian framework for flexible detection of differential alternative splicing from RNA-Seq data. Nucleic Acids Research. 2012;**40**:e61. DOI: 10.1093/nar/gkr1291

[10] Trapnell C et al. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nature Biotechnology. 2013;**31**:46-53. DOI: 10.1038/nbt.2450

[11] Trapnell C et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature Protocols. 2012;**7**:562-578. DOI: 10.1038/nprot.2012.016

[12] Anders S, Pyl PT, Huber W. HTSeq—A Python framework to work with high-throughput sequencing data. Bioinformatics. 2015;**31**:166-169. DOI: 10.1093/bioinformatics/btu638

[13] Ayoub AE et al. Transcriptional programs in transient embryonic zones of the cerebral cortex defined by high-resolution mRNA sequencing. Proceedings of the National Academy of Sciences of the United States of America. 2011;**108**:14950-14955. DOI: 10.1073/pnas.1112213108

[14] Dasgupta N et al. Gaucher disease: Transcriptome analyses using microarray or mRNA sequencing in a Gba1 mutant mouse model treated with velaglucerase alfa or imiglucerase. PLoS One. 2013;**8**:e74912. DOI: 10.1371/journal.pone.0074912

[15] Du J et al. IQSeq: Integrated isoform quantification analysis based on next-generation sequencing. PLoS One. 2012;**7**:e29175. DOI: 10.1371/journal.pone.0029175

[16] Hansen KD, Wu Z, Irizarry RA, Leek JT. Sequencing technology does not eliminate biological variability. Nature Biotechnology. 2011;**29**:572-573. DOI: 10.1038/nbt.1910

[17] Jiang H, Wong WH. Statistical inferences for isoform expression in RNA-Seq. Bioinformatics. 2009;**25**:1026-1032. DOI: 10.1093/bioinformatics/btp113

[18] Katz Y, Wang ET, Airoldi EM, Burge CB. Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nature Methods. 2010;**7**:1009-1015. DOI: 10.1038/nmeth.1528

[19] Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y. RNA-seq: An assessment of technical reproducibility and comparison with gene expression arrays. Genome Research. 2008;**18**:1509-1517. DOI: 10.1101/gr.079558.108

[20] Metzker ML. Sequencing technologies—The next generation. Nature Reviews. Genetics. 2010;**11**:31-46. DOI: 10.1038/nrg2626

[21] Mills JD et al. RNA-Seq analysis of the parietal cortex in Alzheimer's disease reveals alternatively spliced isoforms related to lipid metabolism. Neuroscience Letters. 2013;**536**:90-95. DOI: 10.1016/j.neulet.2012.12.042

[22] Nueda MJ, Tarazona S, Conesa A. Next maSigPro: Updating maSigPro bioconductor package for RNA-seq time series. Bioinformatics. 2014;**30**:2598-2602. DOI: 10.1093/bioinformatics/btu333

[23] Oh S, Song S, Grabowski G, Zhao H, Noonan JP. Time series expression analyses using RNA-seq: A statistical approach. BioMed Research International. 2013;**2013**:203681. DOI: 10.1155/2013/203681

[24] Reis-Filho JS. Next-generation sequencing. Breast Cancer Research. 2009;**11**(Suppl 3):S12. DOI: 10.1186/bcr2431

[25] Richard H et al. Prediction of alternative isoforms from exon expression levels in RNA-Seq experiments. Nucleic Acids Research. 2010;**38**:e112. DOI: 10.1093/nar/gkq041

[26] Risso D, Ngai J, Speed TP, Dudoit S. Normalization of RNA-seq data using factor analysis of control genes or samples. Nature Biotechnology. 2014;**32**:896-902. DOI: 10.1038/nbt.2931

[27] Roberts A, Trapnell C, Donaghey J, Rinn JL, Pachter L. Improving RNA-Seq expression estimates by correcting for fragment bias. Genome Biology. 2011;**12**:R22. DOI: 10.1186/gb-2011-12-3-r22

[28] Sultan M et al. A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome. Science. 2008;**321**:956-960. DOI: 10.1126/science.1160342

[29] Sun X et al. Statistical inference for time course RNA-Seq data using a negative binomial mixed-effect model. BMC Bioinformatics. 2016;**17**:324. DOI: 10.1186/s12859-016-1180-9

[30] Trapnell C et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology. 2010;**28**:511-515. DOI: 10.1038/nbt.1621

[31] Wang K et al. MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Research. 2010;**38**:e178. DOI: 10.1093/nar/gkq622

[32] Young MD, Wakefield MJ, Smyth GK, Oshlack A. Gene ontology analysis for RNA-seq: Accounting for selection bias. Genome Biology. 2010;**11**:R14. DOI: 10.1186/gb-2010-11-2-r14

[33] Zhao K, Lu ZX, Park JW, Zhou Q, Xing Y. GLiMMPS: Robust statistical model for regulatory variation of alternative splicing using RNA-seq data. Genome Biology. 2013;**14**:R74. DOI: 10.1186/gb-2013-14-7-r74

[34] Zhou YH, Xia K, Wright FA. A powerful and flexible approach to the analysis of RNA sequence count data. Bioinformatics. 2011;**27**:2672-2678. DOI: 10.1093/bioinformatics/btr449

[35] Ezkurdia I et al. Comparative proteomics reveals a significant bias toward alternative protein isoforms with conserved structure and function. Molecular Biology and Evolution. 2012;**29**:2265-2283. DOI: 10.1093/molbev/mss100

[36] Margolin AA et al. ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics. 2006;**7**(Suppl 1):S7. DOI: 10.1186/1471-2105-7-S1-S7

[37] Ronen M, Rosenberg R, Shraiman BI, Alon U. Assigning numbers to the arrows: Parameterizing a gene regulation network by using accurate expression kinetics. Proceedings of the National Academy of Sciences of the United States of America. 2002;**99**:10555-10560. DOI: 10.1073/pnas.152046799

[38] Xia J, Gill EE, Hancock RE. NetworkAnalyst for statistical, visual and network-based meta-analysis of gene expression data. Nature Protocols. 2015;**10**:823-844. DOI: 10.1038/nprot.2015.052


[39] Zoppoli P, Morganella S, Ceccarelli M. TimeDelay-ARACNE: Reverse engineering of gene networks from time-course data by an information theoretic approach. BMC Bioinformatics. 2010;**11**:154. DOI: 10.1186/1471-2105-11-154

[53] van de Wiel MA, Neerincx M, Buffart TE, Sie D, Verheul HM. ShrinkBayes: A versatile R-package for analysis of count-based sequencing data in complex study designs. BMC Bioinformatics. 2014;**15**:116. DOI: 10.1186/1471-2105-15-116

[54] Spyrou C, Stark R, Lynch AG, Tavare S. BayesPeak: Bayesian analysis of ChIP-seq data. BMC Bioinformatics. 2009;**10**:299. DOI: 10.1186/1471-2105-10-299

[55] Shen S et al. rMATS: Robust and flexible detection of differential alternative splicing from replicate RNA-Seq data. Proceedings of the National Academy of Sciences of the United States of America. 2014;**111**:E5593-E5601. DOI: 10.1073/pnas.1419161111

[56] Liu H et al. A Bayesian approach for identifying miRNA targets by combining sequence prediction and gene expression profiling. BMC Genomics. 2010;**11**(Suppl 3):S12. DOI: 10.1186/1471-2164-11-S3-S12

[57] Wang Z, Xu W, Zhu H, Liu Y. A Bayesian framework to improve microRNA target prediction by incorporating external information. Cancer Informatics. 2014;**13**:19-25. DOI: 10.4137/CIN.S16348

[58] Xiao Y et al. Predicting the functions of long noncoding RNAs using RNA-seq based on Bayesian network. BioMed Research International. 2015;**2015**:839590. DOI: 10.1155/2015/839590

[59] van Dam S, Vosa U, van der Graaf A, Franke L, de Magalhaes JP. Gene co-expression analysis for functional classification and gene-disease predictions. Briefings in Bioinformatics. 2017. DOI: 10.1093/bib/bbw139

[60] Oh S, Song S, Dasgupta N, Grabowski G. The analytical landscape of static and temporal dynamics in transcriptome data. Frontiers in Genetics. 2014;**5**:35. DOI: 10.3389/fgene.2014.00035

[61] Neelon BH, O'Malley AJ, Normand SL. A Bayesian model for repeated measures zero-inflated count data with application to outpatient psychiatric service use. Statistical Modelling. 2010;**10**:421-439. DOI: 10.1177/1471082X0901000404

[62] Wang X, Chen MH, Kuo RC, Dey DK. Bayesian spatial-temporal modeling of ecological zero-inflated count data. Statistica Sinica. 2015;**25**:189-204. DOI: 10.5705/ss.2013.212w

[63] Garcia-Blanco MA, Baraniak AP, Lasda EL. Alternative splicing in disease and therapy. Nature Biotechnology. 2004;**22**:535-546. DOI: 10.1038/nbt964

[64] Mills JD, Janitz M. Alternative splicing of mRNA in the molecular pathology of neurodegenerative diseases. Neurobiology of Aging. 2012;**33**(1012):e1011-e1024. DOI: 10.1016/j.neurobiolaging.2011.10.030

[65] Sanford JR, Ellis JD, Cazalla D, Caceres JF. Reversible phosphorylation differentially affects nuclear and cytoplasmic functions of splicing factor 2/alternative splicing factor. Proceedings of the National Academy of Sciences of the United States of America. 2005;**102**:15042-15047. DOI: 10.1073/pnas.0507827102


[53] van de Wiel MA, Neerincx M, Buffart TE, Sie D, Verheul HM. ShrinkBayes: A versatile R-package for analysis of count-based sequencing data in complex study designs. BMC Bioinformatics. 2014;**15**:116. DOI: 10.1186/1471-2105-15-116

[39] Zoppoli P, Morganella S, Ceccarelli M. TimeDelay-ARACNE: Reverse engineering of gene networks from time-course data by an information theoretic approach. BMC

[40] Chen R et al. Personal omics profiling reveals dynamic molecular and medical pheno-

[41] Loven J et al. Revisiting global gene expression analysis. Cell. 2012;**151**:476-482. DOI:

[42] Shah SP et al. The clonal and mutational evolution spectrum of primary triple-negative

[43] Martin CL et al. Cytogenetic and molecular characterization of A2BP1/FOX1 as a candidate gene for autism. American Journal of Medical Genetics. Part B, Neuropsychiatric

[44] Guirguis A et al. Use of gene expression profiles to stage concurrent endometrioid tumors of the endometrium and ovary. Gynecologic Oncology. 2008;**108**:370-376. DOI:

[45] Kalyana-Sundaram S et al. Gene fusions associated with recurrent amplicons represent a

[46] McElwee JL et al. Identification of PADI2 as a potential breast cancer biomarker and therapeutic target. BMC Cancer. 2012;**12**:500. DOI: 10.1186/1471-2407-12-500

[47] Oh S, Tseng GC, Sibille E. Reciprocal phylogenetic conservation of molecular aging in mouse and human brain. Neurobiology of Aging. 2011;**32**:1331-1335. DOI: 10.1016/j.

[48] Sibille E et al. A molecular signature of depression in the amygdala. The American Journal of Psychiatry. 2009;**166**:1011-1024. DOI: 10.1176/appi.ajp.2009.08121760

[49] Chiu IM et al. A neurodegeneration-specific gene-expression signature of acutely isolated microglia from an amyotrophic lateral sclerosis mouse model. Cell Reports.

[50] Xu YH, Sun Y, Barnes S, Grabowski GA. Comparative therapeutic effects of velaglucerase alfa and imiglucerase in a Gaucher disease mouse model. PLoS One. 2010;**5**:e10750. DOI:

[51] Cotney J et al. Chromatin state signatures associated with tissue-specific gene expression and enhancer activity in the embryonic limb. Genome Research. 2012;**22**:1069-1080. DOI:

[52] Hardcastle TJ, Kelly KA. baySeq: Empirical Bayesian methods for identifying differential expression in sequence count data. BMC Bioinformatics. 2010;**11**:422. DOI: 10.1186/

class of passenger aberrations in breast cancer. Neoplasia. 2012;**14**:702-708

Bioinformatics. 2010;**11**:154. DOI: 10.1186/1471-2105-11-154

types. Cell. 2012;**148**:1293-1307. DOI: 10.1016/j.cell.2012.02.009

breast cancers. Nature. 2012;**486**:395-399. DOI: 10.1038/nature10933

Genetics. 2007;**144B**:869-876. DOI: 10.1002/ajmg.b.30530

10.1016/j.cell.2012.10.012

18 New Insights into Bayesian Inference

10.1016/j.ygyno.2007.10.008

neurobiolaging.2009.08.004

10.1371/journal.pone.0010750

10.1101/gr.129817.111

1471-2105-11-422

2013;**4**:385-401. DOI: 10.1016/j.celrep.2013.06.018


[66] Stower H. Splicing: Waiting to be spliced. Nature Reviews. Genetics. 2012;**13**:599. DOI: 10.1038/nrg3310

**Chapter 3**

#### **Bayesian Analysis for Hidden Markov Factor Analysis Models**

Yemao Xia, Xiaoqian Zeng and Niansheng Tang

DOI: 10.5772/intechopen.72837

Additional information is available at the end of the chapter

> © 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract


The purpose of this chapter is to provide an introduction to the Bayesian approach within a general framework and to develop a Bayesian procedure for analyzing multivariate longitudinal data within the hidden Markov factor analysis framework.

Keywords: hidden Markov factor analysis model, Markov chain Monte Carlo sampling, cocaine use

#### 1. Introduction

The Bayesian approach is now well recognized in the statistics literature as an attractive way of analyzing a wide variety of models [1], and there is a rich literature on the subject. We do not attempt full coverage of general Bayesian theory here; readers may refer to excellent books, for example [2, 3], for details on this general statistical method. This chapter provides an introduction to the Bayesian approach within a general framework and develops a specific Bayesian procedure for analyzing multivariate longitudinal data within the hidden Markov factor analysis framework. We begin with the basic ideas of the Bayesian approach and then describe the model under consideration in the second section. The following section considers Bayesian inference, including parameter estimation, model selection, and posterior density estimation. The final section demonstrates the practical value of the proposed methodology by applying it to cocaine use data. Some technical details are given in the Appendix.

Consider a data set $Y$ with probability model $p(Y\mid\theta)$, where $\theta$ is a univariate or multivariate vector of population parameters that quantifies the uncertainty in the data. In the statistical literature,


$p(Y\mid\theta)$ is called the likelihood or sampling distribution and is often written $L(\theta)$. From the frequentist point of view, statistical inferences are based on $L(\theta)$, and $\theta$, though unknown, is treated as fixed. In contrast, the Bayesian approach to data analysis assumes that $\theta$ is random with distribution $\pi(\theta)$. This distribution, which represents the knowledge about $\theta$, is referred to as the prior distribution, or prior. When data are available, the information on $\theta$ is summarized in the posterior distribution, or posterior, the conditional distribution of $\theta$ given the data, i.e.,

$$p(\theta|Y) = \frac{p(Y|\theta)\pi(\theta)}{p(Y)} \propto p(Y|\theta)\pi(\theta) \tag{1}$$
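Relation (1) can be illustrated numerically with a grid approximation. The following sketch uses a toy setting that is not from this chapter: a Bernoulli success probability $\theta$ with 7 successes in 10 trials and a flat prior; the grid and data values are assumptions chosen purely for illustration.

```python
import numpy as np

# Grid approximation of Eq. (1): posterior is proportional to likelihood x prior.
theta = np.linspace(0.001, 0.999, 999)          # grid of parameter values
prior = np.full_like(theta, 1.0 / theta.size)   # flat prior pi(theta)
likelihood = theta**7 * (1.0 - theta)**3        # p(Y | theta), up to a constant

unnormalized = likelihood * prior               # p(Y|theta) * pi(theta)
posterior = unnormalized / unnormalized.sum()   # dividing by p(Y) normalizes

print(posterior.sum())               # 1.0: a proper distribution over the grid
print(theta[np.argmax(posterior)])   # 0.7: with a flat prior, the mode is the MLE
```

Note that the normalizing constant $p(Y)$ never needs to be computed in closed form; the division by the sum plays its role on the grid.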


where $p(Y) = \int p(Y\mid\theta)\,\pi(\theta)\,d\theta$ is the marginal distribution of $Y$. The right-hand side of (1) omits the factor $p(Y)$ because, given $Y$, it is a known constant. In the Bayesian literature, $p(Y\mid\theta)\pi(\theta)$ is also termed the unnormalized posterior. Analogous to the role of the likelihood in frequentist inference, the posterior is the starting point of Bayesian inference.

Selecting proper priors for the parameters is fundamental to Bayesian analysis. Broadly, there are two kinds of prior distributions: noninformative and informative. Noninformative priors are associated with situations in which the prior has no population basis; they are used when we have little prior information on $\theta$ and want the prior to play a minimal role in the posterior. An informative prior represents the distribution of possible parameter values from which the parameter $\theta$ has been drawn; we may have knowledge of this distribution either from closely related data or from the subjective judgment of experts. A commonly used informative prior in the general Bayesian approach to statistical problems is the conjugate prior, which ensures that the posterior distribution follows the same parametric form as the prior [1, 3].
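Conjugacy can be made concrete with the standard Beta-Binomial pair (a textbook example, not taken from this chapter): a Beta prior combined with binomial data yields a Beta posterior of the same parametric form, so the update reduces to adding counts to the prior parameters. The prior values below are illustrative assumptions.

```python
def beta_binomial_update(a, b, successes, failures):
    """Conjugate update: Beta(a, b) prior + binomial data -> Beta posterior.

    The posterior keeps the prior's parametric form, which is exactly
    what makes the Beta prior conjugate for the binomial likelihood.
    """
    return a + successes, b + failures

# Weakly informative prior Beta(2, 2); observe 7 successes and 3 failures.
a_post, b_post = beta_binomial_update(2.0, 2.0, 7, 3)
print(a_post, b_post)              # 9.0 5.0 -> posterior is Beta(9, 5)
print(a_post / (a_post + b_post))  # posterior mean 9/14, about 0.643
```

The same closed-form pattern (prior hyperparameters plus sufficient statistics) appears for the Dirichlet-multinomial and normal-inverse-gamma pairs used later for transition matrices and variance parameters.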

A potential difficulty underlying Bayesian inference is statistical computation when the posterior takes a complicated form. This is particularly true when latent variables or other unobservable quantities are involved in the model, as in this chapter. In such cases, statistical inference usually resorts to simulation-based methods. Among the various sampling methods, Markov chain Monte Carlo (MCMC) methods provide powerful tools for simulating observations from the posterior. The key to Markov chain simulation is to create a Markov sequence whose stationary distribution is the specified posterior $p(\theta\mid Y)$; posterior inferences are then carried out based on the simulated observations. There are many ways of constructing such Markov chains, but all of them, including the Gibbs sampler [4, 5], are special cases of the general framework of Metropolis et al. [6] and Hastings [7]. We do not pursue this issue further here; details on simulation-based methods can be found in [2, 3, 8, 9].
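A minimal random-walk Metropolis sketch illustrates the idea; this is a generic illustration, not one of the samplers developed later in the chapter, and the Bernoulli toy target and tuning constants are assumptions. The target density is known only up to its normalizing constant, which is all the algorithm requires.

```python
import math
import random

def log_unnormalized_posterior(theta):
    # Toy target: Bernoulli data with 7 successes, 3 failures, flat prior,
    # so the exact posterior is Beta(8, 4) with mean 8/12.
    if not 0.0 < theta < 1.0:
        return float("-inf")
    return 7 * math.log(theta) + 3 * math.log(1.0 - theta)

def metropolis(n_draws, theta0=0.5, step=0.1, seed=1):
    """Random-walk Metropolis: propose, accept with probability min(1, ratio)."""
    rng = random.Random(seed)
    theta, draws = theta0, []
    for _ in range(n_draws):
        proposal = theta + rng.gauss(0.0, step)
        log_ratio = (log_unnormalized_posterior(proposal)
                     - log_unnormalized_posterior(theta))
        if math.log(rng.random()) < log_ratio:
            theta = proposal        # accept the proposal
        draws.append(theta)         # otherwise keep the current state
    return draws

draws = metropolis(20000)
posterior_mean = sum(draws[2000:]) / len(draws[2000:])  # discard burn-in
print(posterior_mean)  # close to the exact posterior mean 8/12
```

The stationary distribution of the chain is the posterior, so averages of the retained draws approximate posterior expectations without ever evaluating $p(Y)$.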

In what follows, as an illustration, we develop a Bayesian analysis procedure for multivariate data in a longitudinal setting. Multivariate longitudinal or clustered data occur when multiple items are measured repeatedly over time or across occasions. In such a setting, the primary interest is inference about the dependence among the multiple measurements and the temporal correlation resulting from repeated measures on the same items. Often, particular interest also focuses on exploring the potential heterogeneity of the data and investigating its transition pattern over time. In these cases, the hidden Markov latent variable model (HMLVM) [10–13] provides a feasible and unified framework for addressing these issues. The HMLVM assumes that the overall model comprises an observed process and an underlying hidden state process. The state process, following the convention of the classic HMM (see, for example, [14–17]), is a univariate discrete process that follows a first-order Markov chain, while the observed process, conditional on the state sequence, is an independent process with emission distributions specified via latent variable models (LVMs) [18]. In this regard, the HMLVM provides a unified way of simultaneously describing the correlation of multiple items, temporal dependence, and heterogeneity in the data. However, the existing developments cited above focus on maximum likelihood (ML) analysis, in which statistical inferences depend heavily on asymptotic properties. As an illustration of Bayesian inference on a practical problem, in this chapter we develop a Bayesian procedure to analyze cocaine use data within the hidden Markov factor analysis model framework. Compared with ML, a basic attractive feature of the Bayesian approach is its flexibility in utilizing useful prior information to achieve better results. Additionally, simulation-based Bayesian methods depend less on asymptotic theory and hence have the potential to produce more reliable results even with small samples.

#### 2. Model description


#### 2.1. Hidden Markov factor analysis model

Consider a set of multivariate longitudinal observations formed by $p$-dimensional observed vectors $\mathbf{y}_{it} = (y_{it1}, \ldots, y_{itp})^\top$, recorded on $p$ items over $T$ occasions ($t = 1, \cdots, T$) across $N$ subjects ($i = 1, \cdots, N$). In the field of multivariate analysis, interest mainly focuses on exploring item dependence, since measurements may be highly correlated owing to multicollinearity. More often, interest also concentrates on the heterogeneity arising when the population of $\mathbf{y}_{it}$ comprises more than one component. This is particularly true when the data exhibit extreme behaviors such as multimodal and/or skewed characteristics. In these cases, a finite mixture factor analysis model (FMFAM) provides a powerful tool for addressing these issues. Typically, the FMFAM assumes that, conditional on a univariate discrete-valued state variable $z_{it}$ and an $m$-dimensional ($m < p$) continuous latent factor vector $\boldsymbol{\omega}_{it}$, the $\mathbf{y}_{it}$ are independent and follow a $p$-dimensional multivariate normal distribution, while, given $z_{it}$, $\boldsymbol{\omega}_{it}$ also follows an $m$-dimensional normal distribution; that is,

$$\begin{cases} (\mathbf{y}_{it}\mid\boldsymbol{\omega}_{it},\, z_{it}=r) \sim \mathcal{N}_p\big(\boldsymbol{\mu}_r + \boldsymbol{\Lambda}_r\boldsymbol{\omega}_{it},\ \boldsymbol{\Psi}_{\varepsilon r}\big) \\ (\boldsymbol{\omega}_{it}\mid z_{it}=r) \sim \mathcal{N}_m(\mathbf{0},\ \boldsymbol{\Phi}_r) \end{cases} \tag{2}$$

where $\boldsymbol{\mu}_r = (\mu_{r1}, \ldots, \mu_{rp})^\top$ is a $p$-dimensional intercept vector representing the baseline level of $\mathbf{y}_{it}$, $\boldsymbol{\Lambda}_r = (\boldsymbol{\Lambda}_{r1}^\top, \ldots, \boldsymbol{\Lambda}_{rp}^\top)^\top$ is a $p \times m$ factor loading matrix, $\boldsymbol{\Psi}_{\varepsilon r} = \mathrm{diag}(\Psi_{\varepsilon r1}, \cdots, \Psi_{\varepsilon rp})$ is a $p \times p$ diagonal matrix with $j$th diagonal element $\Psi_{\varepsilon rj} > 0$, and $\boldsymbol{\Phi}_r$ is an $m \times m$ positive definite matrix.
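Integrating $\boldsymbol{\omega}_{it}$ out of (2) gives the within-state marginal covariance $\boldsymbol{\Lambda}_r\boldsymbol{\Phi}_r\boldsymbol{\Lambda}_r^\top + \boldsymbol{\Psi}_{\varepsilon r}$. The following sketch checks this identity by simulation; the parameter values ($p = 4$, $m = 2$) are arbitrary illustrative assumptions, not estimates from the chapter's data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative state-r parameters: p = 4 items, m = 2 factors.
mu = np.array([0.5, -0.2, 1.0, 0.0])      # intercept vector mu_r
Lam = np.array([[1.0, 0.0],
                [0.8, 0.0],
                [0.0, 1.0],
                [0.0, 0.6]])              # p x m factor loading matrix Lambda_r
Phi = np.array([[1.0, 0.3],
                [0.3, 1.0]])              # m x m factor covariance Phi_r
Psi = np.diag([0.4, 0.4, 0.4, 0.4])       # p x p diagonal error covariance Psi_er

# Simulate from Eq. (2): omega | z = r, then y | omega, z = r.
n = 200_000
omega = rng.multivariate_normal(np.zeros(2), Phi, size=n)
eps = rng.multivariate_normal(np.zeros(4), Psi, size=n)
y = mu + omega @ Lam.T + eps

implied = Lam @ Phi @ Lam.T + Psi         # marginal Cov(y_it | z_it = r)
empirical = np.cov(y, rowvar=False)
print(np.max(np.abs(empirical - implied)))  # small Monte Carlo error
```

The off-diagonal entries of `implied` come entirely from the shared factors, which is the mechanism behind the correlation formula (3).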

Formulation given in (2) has two basic features: one is to characterize heterogeneity of population of yit at the occasion level and the other is to establish the dependence among the multiple measurements. The heterogeneous population is specified via state-specific parameters contained in the model while the dependence between different measurements is identified via sharing the common factors in the manner of liner combinations. In particular, apart from explaining the idiosyncratic part of measurements, latent factors also characterize the association between any two measurements. As a matter of fact, one can show that the correlation coefficient between yitj and yitk at state zit is given by

P<sup>S</sup>

<sup>s</sup>¼<sup>1</sup> Qrs <sup>¼</sup> <sup>1</sup>:0 for <sup>r</sup> <sup>¼</sup> <sup>1</sup>, <sup>⋯</sup>, S. Modeling state sequences into (5) allows us to explore the transition pattern of individuals across occasions exactly. For example, in the cocaine use data analysis, zit is often identified with the latent state of patient i at time t, then Qrs specifies how individual i being in state r transfers to state s on two successive occasions. Surely, we can relax the time-homogeneous assumption of transition probabilities by including relevant covariates to interpret the inhomogeneous transition behavior among observation data (see, for example,

The current model defined in (2)–(5) provides a comprehensive framework for modeling the multivariate longitudinal data with the latent variables. It accommodates the dynamic behavior of observed sequences, heterogeneity of observed data at the occasion level, and dependence among the multiple items simultaneously. In particular, it makes sense to measure

Let Y be the collection of all observations, and Ω be the set of corresponding factors. Denote Z ¼ zit f g : 1 ≤ i ≤ N; 1 ≤ t ≤ T be set of state variables. It follows from Eqs. (2), (4) and (5) that the

t¼2

� �

trΨ<sup>e</sup>zit <sup>y</sup>it � <sup>μ</sup>zit � <sup>Λ</sup><sup>⊺</sup>

� <sup>Y</sup> N

i¼1

YT t¼1

2

where <sup>θ</sup> is formed by free parameters in <sup>μ</sup>r, <sup>Λ</sup>r, <sup>Ψ</sup>r, and <sup>Φ</sup>r. Here, we write <sup>a</sup> <sup>⊗</sup> <sup>2</sup> <sup>¼</sup> aa<sup>⊺</sup> and denote I Að Þ the indicator function of a set A. The observed data likelihood is then achieved by taking integration of pð Þ Y; Ω;Zjθ; δ; Q over Ω and Z, which involves high-dimensional inte-

� �, <sup>Λ</sup> <sup>¼</sup> f g <sup>Λ</sup><sup>r</sup> , <sup>Ψ</sup><sup>e</sup> <sup>¼</sup> f g <sup>Ψ</sup><sup>e</sup><sup>r</sup> , and <sup>Φ</sup> <sup>¼</sup> f g <sup>Φ</sup>kr . For the Bayesian analysis, we need to

pð Þ¼ θ; δ; Q pð Þ μ pð Þ Λ; Ψ<sup>e</sup> pð Þ Φ pð Þ δ pð Þ Q : (7)

assign priors to the unknown parameters involved for completing model specification. Since θ, δ, and Q are involved in different submodels, it is natural to assume that θ, δ, and Q are mutually independent and the components contained in θ are also mutually independent, that is,

<sup>p</sup> <sup>y</sup>it; <sup>ω</sup>itjzit; <sup>θ</sup> � �p zit ð Þ <sup>j</sup>zi,t�<sup>1</sup>; <sup>Q</sup>

zit <sup>ω</sup>it � � <sup>⊗</sup> <sup>2</sup>

Y S

δI zf g <sup>i</sup>1¼<sup>r</sup> r

Y S

Bayesian Analysis for Hidden Markov Factor Analysis Models

http://dx.doi.org/10.5772/intechopen.72837

rs !

<sup>Q</sup>I z f g i,t�1¼r;zit¼<sup>s</sup>

(6)

25

s¼1

r¼1

[12, 13, 16]) but at the expense of computational burden.

effects of latent factors on the manifest variables quantitatively.

<sup>p</sup> <sup>y</sup>i1; <sup>ω</sup>itjzi1; <sup>θ</sup> � �p zð Þ <sup>i</sup>1j<sup>δ</sup> <sup>Y</sup><sup>T</sup>

�<sup>1</sup>=<sup>2</sup> exp � <sup>1</sup>

2 trΦ�<sup>1</sup> zit <sup>ω</sup> <sup>⊗</sup> <sup>2</sup> it � �!

1 Ψ<sup>e</sup>zit � � �

<sup>1</sup>=<sup>2</sup> exp � <sup>1</sup>

joint sampling distribution of Y, Ω, and Z is given by

N

i¼1

YT t¼1

i¼1

� <sup>1</sup> Φzit � � � �

∝ Y N

<sup>p</sup>ðY; <sup>Ω</sup>; <sup>Z</sup>jθ; <sup>δ</sup>; <sup>Q</sup>Þ ¼ <sup>Y</sup>

3. Posterior inferences

3.1. Prior specifications

grations.

Let μ ¼ μ<sup>r</sup>

$$\text{Corr}\left(y\_{t\circ j}, y\_{tk}|z\_{lt} = r\right) = \frac{\sum\_{t=1}^{m} \sum\_{h=1}^{m} \lambda\_{r\circ l} \lambda\_{rkh} \Phi\_{rlh}}{\sqrt{\sum\_{t=1}^{m} \sum\_{h=1}^{m} \lambda\_{r\circ l} \lambda\_{r\circ h} \Phi\_{rlh} + \Psi\_{erj}} \sqrt{\sum\_{t=1}^{m} \sum\_{h=1}^{m} \lambda\_{rkt} \lambda\_{rkh} \Phi\_{rlh} + \Psi\_{erk}}} \tag{3}$$

in which λrjk is the ð Þ j; k th element of Λ<sup>r</sup> and Φr, hk is the ð Þ h; k th element in Φ, respectively. The strength of correlation is identified by the factor loadings and covariance of factors together. In the case when ωit degenerates to zero (i.e., Φ ¼ 0) or Λ ¼ 0, the association among items disappears and model (2) reduces to p-independent mean-variance models within cluster r. Hence, latent factors play a dominant role in characterizing association of multiple items. Note that, in actual applications, latent factors, though unobservable, often have their own physical interpretations. In psychology, for example, latent factors are often used to identify concepts such as treatment, temper, and anxiety, which are important within the framework of theoretical models. The measurements are just proxies for these unobserved concepts of interest. We will provide further interpretations in the real example.

The primary reason for collecting information on multiple occasions for each subject is that it allows investigation of change and/or temporal dependence over time within the subject. There exist various constructs for characterizing dynamic characteristics. A commonly used method is to construct proper dynamic structures for latent factors and establish dynamic factor models, see for example, [19–21]. An alternative choice we adopt here is specifying the joint distribution for state sequences. Following the common routine (see, for example, [22, 23]), we assume that each individual state sequence z<sup>i</sup> ¼ ð Þ zi1; ⋯; ziT satisfies the following first-order hidden Markov model

$$p(z\_i) = p(z\_{i1}) \prod\_{t=2}^{T} p(z\_{it}|z\_{i,t-1}) \tag{4}$$

where p zð Þ <sup>i</sup><sup>1</sup> and p zit ð Þ jzi,t�<sup>1</sup> are, respectively, the initial distribution and transition probability given by

$$P(z_{i1} = r) = \delta_r, \quad P(z_{it} = s \mid z_{i,t-1} = r) = Q_{rs} \quad (r, s = 1, \cdots, S) \tag{5}$$

where S is a positive integer, δ = (δ1, ⋯, δS) is an S × 1 vector satisfying δr ≥ 0 and ∑_{r=1}^{S} δr = 1.0, and Q = (Qrs) is an S × S transition matrix with (r, s)th entry Qrs, that is, Qrs ≥ 0 and ∑_{s=1}^{S} Qrs = 1.0 for r = 1, ⋯, S. Modeling the state sequences as in (5) allows us to explore the transition pattern of individuals across occasions exactly. For example, in the cocaine use data analysis, zit is identified with the latent state of patient i at time t, and Qrs then specifies how an individual in state r transfers to state s on two successive occasions. The time-homogeneity assumption on the transition probabilities can be relaxed by including relevant covariates to capture inhomogeneous transition behavior among the observed data (see, for example, [12, 13, 16]), but at the expense of computational burden.
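As a concrete check of Eqs. (4)–(5), the probability of a full state sequence is the product of the initial probability and the successive transition probabilities. A small sketch with an assumed two-state example (δ and Q are toy values of our own):

```python
import numpy as np

def hmm_sequence_prob(z, delta, Q):
    """p(z_i) under the first-order hidden Markov model of Eqs. (4)-(5):
    initial distribution delta, row-stochastic transition matrix Q.
    States in z are 0-based indices."""
    assert np.all(delta >= 0) and np.isclose(delta.sum(), 1.0)
    assert np.allclose(Q.sum(axis=1), 1.0)  # each row of Q sums to 1
    prob = delta[z[0]]
    for t in range(1, len(z)):
        prob *= Q[z[t - 1], z[t]]
    return prob

delta = np.array([0.6, 0.4])          # assumed S = 2 states
Q = np.array([[0.9, 0.1],
              [0.2, 0.8]])
p = hmm_sequence_prob([0, 0, 1, 1], delta, Q)  # 0.6 * 0.9 * 0.1 * 0.8
```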

The model defined in (2)–(5) provides a comprehensive framework for modeling multivariate longitudinal data with latent variables. It simultaneously accommodates the dynamic behavior of the observed sequences, the heterogeneity of the observed data at the occasion level, and the dependence among the multiple items. In particular, it makes it possible to quantify the effects of the latent factors on the manifest variables.

Let Y be the collection of all observations and Ω the set of corresponding factors. Denote by Z = {zit : 1 ≤ i ≤ N, 1 ≤ t ≤ T} the set of state variables. It follows from Eqs. (2), (4), and (5) that the joint sampling distribution of Y, Ω, and Z is given by

$$p(\mathbf{Y}, \boldsymbol{\Omega}, \mathbf{Z} \mid \boldsymbol{\theta}, \boldsymbol{\delta}, \mathbf{Q}) = \prod_{i=1}^{N} p(\mathbf{y}_{i1}, \boldsymbol{\omega}_{i1} \mid z_{i1}, \boldsymbol{\theta})\, p(z_{i1} \mid \boldsymbol{\delta}) \prod_{t=2}^{T} p(\mathbf{y}_{it}, \boldsymbol{\omega}_{it} \mid z_{it}, \boldsymbol{\theta})\, p(z_{it} \mid z_{i,t-1}, \mathbf{Q})$$

$$\propto \prod_{i=1}^{N} \prod_{t=1}^{T} \left( |\boldsymbol{\Psi}_{\epsilon z_{it}}|^{-1/2} \exp\left\{-\frac{1}{2} \operatorname{tr} \boldsymbol{\Psi}_{\epsilon z_{it}}^{-1} \left(\mathbf{y}_{it} - \boldsymbol{\mu}_{z_{it}} - \boldsymbol{\Lambda}_{z_{it}} \boldsymbol{\omega}_{it}\right)^{\otimes 2}\right\} |\boldsymbol{\Phi}_{z_{it}}|^{-1/2} \exp\left\{-\frac{1}{2} \operatorname{tr} \boldsymbol{\Phi}_{z_{it}}^{-1} \boldsymbol{\omega}_{it}^{\otimes 2}\right\} \right) \times \prod_{i=1}^{N} \left( \prod_{r=1}^{S} \delta_{r}^{I\{z_{i1} = r\}} \prod_{t=2}^{T} \prod_{r=1}^{S} \prod_{s=1}^{S} Q_{rs}^{I\{z_{i,t-1} = r,\, z_{it} = s\}} \right) \tag{6}$$

where θ comprises the free parameters in μr, Λr, Ψεr, and Φr. Here, we write a⊗2 = aa⊺ and denote by I(A) the indicator function of a set A. The observed-data likelihood is then obtained by integrating p(Y, Ω, Z | θ, δ, Q) over Ω and Z, which involves high-dimensional integrations.
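Under the assumptions above, the complete-data log-likelihood in (6) decomposes into measurement, factor, and Markov-chain terms, so it is easy to evaluate pointwise. A NumPy sketch; the array shapes, helper names, and toy inputs are our own assumptions for illustration:

```python
import numpy as np

def mvn_logpdf(x, mean, cov):
    # log density of a multivariate normal via slogdet/solve
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(cov, d))

def complete_data_loglik(Y, Omega, Z, mu, Lam, Psi, Phi, delta, Q):
    """log p(Y, Omega, Z | theta, delta, Q) as in Eq. (6).
    Y: (N, T, p), Omega: (N, T, m), Z: (N, T) integer states;
    mu, Lam, Psi, Phi are lists indexed by state r (Psi[r] is a diagonal vector)."""
    N, T, _ = Y.shape
    ll = 0.0
    for i in range(N):
        ll += np.log(delta[Z[i, 0]])                  # initial-state term
        for t in range(T):
            r = Z[i, t]
            if t > 0:
                ll += np.log(Q[Z[i, t - 1], r])       # transition term
            # measurement part: y_it | omega_it, z_it = r
            ll += mvn_logpdf(Y[i, t], mu[r] + Lam[r] @ Omega[i, t], np.diag(Psi[r]))
            # factor part: omega_it | z_it = r
            ll += mvn_logpdf(Omega[i, t], np.zeros(Phi[r].shape[0]), Phi[r])
    return ll

# Toy check: one state, one subject, two occasions (p = 2 items, m = 1 factor).
Y = np.zeros((1, 2, 2)); Omega = np.zeros((1, 2, 1)); Z = np.zeros((1, 2), dtype=int)
mu = [np.zeros(2)]; Lam = [np.ones((2, 1))]; Psi = [np.ones(2)]; Phi = [np.eye(1)]
ll = complete_data_loglik(Y, Omega, Z, mu, Lam, Psi, Phi, np.array([1.0]), np.array([[1.0]]))
```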

#### 3. Posterior inferences

#### 3.1. Prior specifications


24 New Insights into Bayesian Inference


Let μ = {μr}, Λ = {Λr}, Ψε = {Ψεr}, and Φ = {Φr}. For the Bayesian analysis, we need to assign priors to the unknown parameters to complete the model specification. Since θ, δ, and Q are involved in different submodels, it is natural to assume that θ, δ, and Q are mutually independent and that the components contained in θ are also mutually independent, that is,

$$p(\boldsymbol{\theta}, \boldsymbol{\delta}, \mathbf{Q}) = p(\boldsymbol{\mu}) p(\boldsymbol{\Lambda}, \boldsymbol{\Psi}\_{\mathbf{e}}) p(\boldsymbol{\Phi}) p(\boldsymbol{\delta}) p(\mathbf{Q}).\tag{7}$$

For the convenience of conjugacy, we assume that the parameters are drawn from the following commonly used conjugate prior distributions (see, for example, [24]).


Bayesian Analysis for Hidden Markov Factor Analysis Models

http://dx.doi.org/10.5772/intechopen.72837


$$p(\boldsymbol{\mu}) = \prod_{r=1}^{S} p(\boldsymbol{\mu}_r) \stackrel{\mathcal{D}}{=} \prod_{r=1}^{S} \mathcal{N}_p(\boldsymbol{\mu}_{0r}, \boldsymbol{\Sigma}_{0r}), \quad p(\boldsymbol{\Phi}) = \prod_{r=1}^{S} p(\boldsymbol{\Phi}_r) \stackrel{\mathcal{D}}{=} \prod_{r=1}^{S} \mathcal{W}_{m}^{-1}\left(\rho_{0r}, \mathbf{R}_{0r}^{-1}\right),$$

$$p(\boldsymbol{\Lambda}, \boldsymbol{\Psi}_{\epsilon}) = \prod_{r=1}^{S} p(\boldsymbol{\Lambda}_r \mid \boldsymbol{\Psi}_{\epsilon r})\, p(\boldsymbol{\Psi}_{\epsilon r}) \stackrel{\mathcal{D}}{=} \prod_{r=1}^{S} \prod_{j=1}^{p} \mathcal{N}_{m}\left(\boldsymbol{\Lambda}_{0rj}, \psi_{\epsilon rj} \mathbf{H}_{\epsilon 0rj}\right) \cdot \mathcal{G}a^{-1}\left(\alpha_{\epsilon 0rj}, \beta_{\epsilon 0rj}\right), \tag{8}$$

$$\boldsymbol{\delta} \sim \mathcal{D}ir_{S}(\gamma_0, \cdots, \gamma_0), \quad p(\mathbf{Q}) = \prod_{r=1}^{S} p(\mathbf{Q}_r) \stackrel{\mathcal{D}}{=} \prod_{r=1}^{S} \mathcal{D}ir_{S}(\nu_0, \cdots, \nu_0)$$

where Ga−1(a, b) denotes the inverse Gamma distribution with shape a > 0 and scale b > 0, and Wm−1(ρ0r, R0r−1) represents the m-dimensional inverse Wishart distribution with ρ0r degrees of freedom and m × m scale matrix R0r; Qr is the rth row vector of Q. The scalars αε0rj, βε0rj, ρ0r, γ0, ν0, the vectors μ0r and Λ0rj, and the matrices R0r and Hε0rj are assumed known. Thus, standard conjugate priors are specified for all parametric components of the model. The conjugate prior distributions are sufficiently flexible in most applications, and when a reasonable amount of data is available, the hyperparameters scarcely affect the analysis. It should be noted that although Eq. (8) allows different hyperparameters for different latent states, in practice we choose identical priors for all states. Details of the hyperparameter choices are discussed when we present the empirical results.
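The priors in (8) are all standard distributions and are straightforward to sample. A NumPy-only sketch of one prior draw; the hyperparameter values and the dimensions S, p, m are assumed for illustration, and the inverse-Wishart draw is obtained as the inverse of a Wishart draw:

```python
import numpy as np

rng = np.random.default_rng(0)
S, p, m = 2, 3, 2                   # states, items, factors -- illustrative sizes
# Hyperparameters (assumed values; in practice chosen as discussed in the text).
mu0, Sigma0 = np.zeros(p), np.eye(p)
rho0, R0inv = m + 2, np.eye(m)      # inverse-Wishart: rho0 d.f., inverse scale R0^{-1}
alpha0, beta0 = 9.0, 4.0            # inverse-Gamma shape and scale
gamma0, nu0 = 1.0, 1.0              # Dirichlet concentrations

def rinvwishart(df, scale_inv):
    # IW draw as the inverse of a Wishart draw (sum of df Gaussian outer products)
    X = rng.multivariate_normal(np.zeros(scale_inv.shape[0]), scale_inv, size=df)
    return np.linalg.inv(X.T @ X)

def rinvgamma(shape, scale, size):
    # 1/Gamma(shape, rate=1) scaled by `scale` is inverse-Gamma(shape, scale)
    return scale / rng.gamma(shape, 1.0, size)

mu = [rng.multivariate_normal(mu0, Sigma0) for _ in range(S)]      # mu_r ~ N_p
Phi = [rinvwishart(rho0, R0inv) for _ in range(S)]                 # Phi_r ~ W_m^{-1}
psi = [rinvgamma(alpha0, beta0, p) for _ in range(S)]              # psi_erj ~ Ga^{-1}
Lam = [np.stack([rng.multivariate_normal(np.zeros(m), psi[r][j] * np.eye(m))
                 for j in range(p)]) for r in range(S)]            # Lambda_rj rows
delta = rng.dirichlet([gamma0] * S)                                # delta ~ Dir_S
Q = np.stack([rng.dirichlet([nu0] * S) for _ in range(S)])         # rows of Q ~ Dir_S
```

Here `Hε0rj` is taken as the identity matrix purely for brevity.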

#### 3.2. Gibbs sampling scheme and posterior analysis

Combining the sampling distribution of the observables yit with the prior distribution specified in (8) yields the joint posterior distribution of {θ, δ, Q}:

$$p(\theta,\,\delta,\mathbf{Q}|\mathbf{Y}) \propto p(\mathbf{Y}|\theta,\delta,\mathbf{Q})p(\theta)p(\delta)p(\mathbf{Q})\tag{9}$$

where we ignore the normalization constant p(Y). However, due to the latent factors and state variables, the computation of p(Y | θ, δ, Q) is intractable since it involves high-dimensional integrals. Consequently, no closed form is available for the posterior p(θ, δ, Q | Y). This problem can be addressed via the data augmentation idea of Tanner and Wong [25]. The data augmentation technique treats the latent quantities {Ω, Z} as hypothetical missing data and augments them with the observed data to form the complete data. The posterior analysis is then carried out based on the joint distribution p(Ω, Z, θ, δ, Q | Y), which is proportional to p(Y, Ω, Z | θ, δ, Q) p(θ, δ, Q), the product of the complete-data likelihood and the priors. Compared to the intractable observed-data likelihood, the complete-data likelihood has a nice hierarchical structure based on the conditional independence assumptions in (2) and (4) and hence is relatively easy to analyze. However, p(Ω, Z, θ, δ, Q | Y) is still not of closed form and is thus difficult to deal with analytically. In this regard, simulation-based methods can be used to generate observations for the posterior analysis. In view of the multiple components involved, the usual independent sampling methods are not feasible. Note that, on the basis of the complete data, the full conditional distributions of Ω, Z, θ, δ, and Q have closed forms. This provides a solid foundation for Markov chain Monte Carlo methods. Markov chain Monte Carlo sampling does not draw observations from p(Ω, Z, θ, δ, Q | Y) directly; instead, it generates observations from the full conditionals of each component alternately, thus forming a dependent sample, i.e., a Markov chain. Specifically, as pointed out in the introduction, we use the Gibbs sampler [4, 5] to draw observations from this target distribution.

The sampling scheme in the Gibbs sampler thus includes two types of moves: updating the components involved in the factor analysis model and updating the components related to the hidden Markov model. We propose the following Gibbs sampler, which iteratively simulates from the conditional distributions, where variables are removed from the conditioning set either by explicit integration or by conditional independence. The steps involved in the Gibbs sampler are

Step a: Generate Z from p(Z | θ, δ, Q, Ω, Y)

Step b: Generate Ω from p(Ω | Z, θ, Y)

Step c: Generate {μ, Λ, Ψε} from p(μ, Λ, Ψε | Z, Ω, Y)

Step d: Generate Φ from p(Φ | Z, Ω)

Step e: Generate δ from p(δ | Z)


Step f: Generate Q from p(Q | Z)
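The six steps above can be organized as one sweep of a generic Gibbs loop. The sketch below fixes only the control flow (burn-in B0 followed by B kept draws); the block-specific samplers are dummy placeholders of our own, not the chapter's actual full conditionals:

```python
import numpy as np

def gibbs_sampler(init, full_conditionals, burn_in, n_keep, rng):
    """Generic Gibbs scheme for steps a-f: each sweep redraws every block of
    unknowns from its full conditional given current values of the others."""
    blocks = ["Z", "Omega", "mu_Lam_Psi", "Phi", "delta", "Q"]  # steps a-f
    state = dict(init)
    draws = []
    for b in range(burn_in + n_keep):
        for name in blocks:
            state[name] = full_conditionals[name](state, rng)
        if b >= burn_in:                 # discard the first B0 sweeps
            draws.append(dict(state))
    return draws

# Dummy full conditionals (standard-normal draws) just to exercise the loop.
names = ["Z", "Omega", "mu_Lam_Psi", "Phi", "delta", "Q"]
dummy = {name: (lambda state, g: g.normal()) for name in names}
init = {name: 0.0 for name in names}
draws = gibbs_sampler(init, dummy, burn_in=50, n_keep=200, rng=np.random.default_rng(0))
```

In a real implementation, each entry of `full_conditionals` would sample from the corresponding distribution in steps a–f.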

Under mild conditions and similarly to [4] (see also, for example, [26]), one can show that for sufficiently large b, say b > B0, the joint distribution of {Ω(b), Z(b), θ(b), δ(b), Q(b)} converges at an exponential rate to the desired posterior distribution p(Ω, Z, θ, δ, Q | Y). Hence, p(Ω, Z, θ, δ, Q | Y) can be approximated by the empirical distribution of {(Ω(b), Z(b), θ(b), δ(b), Q(b)) : b = B0 + 1, ⋯, B0 + B}, where B is chosen to give sufficient precision to the empirical distribution. The convergence of the Gibbs sampler can be monitored by the 'estimated potential scale reduction' (EPSR) values suggested by Gelman and Rubin [27] or by plotting the traces of the estimates against iterations under different starting values.
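The EPSR diagnostic compares a pooled variance estimate with the average within-chain variance over several chains started from different values; values near 1 indicate convergence. A sketch for a scalar parameter (the function name and the toy chains are our own):

```python
import numpy as np

def epsr(chains):
    """Estimated potential scale reduction (Gelman-Rubin) for a scalar
    parameter from several chains of equal length; shape (n_chains, n_iter)."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)          # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()    # within-chain variance
    var_hat = (n - 1) / n * W + B / n        # pooled posterior-variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(2)
mixed = rng.normal(size=(4, 1000))           # four well-mixed chains: EPSR near 1
```

Chains stuck around different values produce EPSR well above 1, signaling that more iterations (or better mixing) are needed.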

Simulated observations obtained from the posterior can be used for statistical inference via straightforward analysis procedures. For brevity, let {θ(b), δ(b), Q(b), Ω(b), Z(b)} be the random observations generated by the Gibbs sampler from p(θ, δ, Q, Ω, Z | Y). The joint Bayesian estimates of θ and Ω can be obtained easily via the corresponding sample means of the generated observations as follows:

$$\widehat{\boldsymbol{\theta}} = (B-1)^{-1} \sum_{b=1}^{B} \boldsymbol{\theta}^{(b)}, \quad \widehat{\boldsymbol{\Omega}} = (B-1)^{-1} \sum_{b=1}^{B} \boldsymbol{\Omega}^{(b)}, \quad \widehat{\mathbf{Z}} = (B-1)^{-1} \sum_{b=1}^{B} \mathbf{Z}^{(b)}. \tag{10}$$

Clearly, these Bayesian estimates are consistent estimates of the corresponding posterior means; see [26]. Consistent estimates of the covariance matrices of the estimates can be obtained as follows:

$$\widehat{\text{Cov}}(\boldsymbol{\theta} \mid \mathbf{Y}) = (B-1)^{-1} \sum_{b=1}^{B} \left(\boldsymbol{\theta}^{(b)} - \widehat{\boldsymbol{\theta}}\right) \left(\boldsymbol{\theta}^{(b)} - \widehat{\boldsymbol{\theta}}\right)^{\mathsf{T}} \tag{11}$$

$$\widehat{\text{Cov}}(\boldsymbol{\Omega} \mid \mathbf{Y}) = (B-1)^{-1} \sum_{b=1}^{B} \left(\boldsymbol{\Omega}^{(b)} - \widehat{\boldsymbol{\Omega}}\right) \left(\boldsymbol{\Omega}^{(b)} - \widehat{\boldsymbol{\Omega}}\right)^{\mathsf{T}} \tag{12}$$

Hence, standard error estimates can be obtained conveniently from the Gibbs sampler output. Other statistical inferences about θ and Ω, such as deriving confidence intervals and statistics for hypothesis testing, can be based on the simulated observations as well (see, for example, [28, 29]).

One important statistical inference beyond estimation is the testing of various hypotheses about the model. In the field of hidden Markov modeling, determining the proper number of states may be the first step of data analysis. Too many states may overfit the observations, meaning that the model fits the training data accurately but may not describe the underlying data-generating process well. On the other hand, too few states may not be flexible enough to approximate the underlying model. In the context of Bayesian model selection, the Bayes factor (BF, [30]) is a popular choice for model comparison. The BF is defined as the ratio of the marginal likelihoods of the data under two competing models. However, the computation of the BF is difficult since it often involves high-dimensional integrations. It has also been shown that the BF is sensitive to the choice of priors and becomes infeasible when improper priors are used. A simple and more convenient alternative is the Lν-measure [31–34], which is based on the posterior predictive density. It has been shown [34] that this approach is conceptually and computationally simple and is useful for model checking in a wide variety of complicated situations. Moreover, the required computation is a by-product of common Bayesian simulation procedures such as the Gibbs sampler and related algorithms. Specifically, let Yrep denote future values of Y in a replicate experiment, that is, Yrep has the same sampling density as Y. The posterior predictive distribution p(Yrep | Y) is defined as

$$p(\mathbf{Y}^{\text{rep}}|\mathbf{Y}) = \int p(\mathbf{Y}^{\text{rep}}|\boldsymbol{\theta}, \boldsymbol{\delta}, \mathbf{Q})\, p(\boldsymbol{\theta}, \boldsymbol{\delta}, \mathbf{Q}|\mathbf{Y})\, d\boldsymbol{\theta}\, d\boldsymbol{\delta}\, d\mathbf{Q} \tag{13}$$

Naturally, if the posited model is the true model, in the sense that the data are generated from it, then Yrep should behave like the data Y, and its squared biases and covariances should be small. With this notion in mind, Ibrahim, Chen, and Sinha [34] proposed an L statistic to assess the fit of posited models to the data by weighting the squared biases and the covariance, which can be interpreted as a trade-off between them. Here, we extend it to the multivariate longitudinal setting. Let Yrep = (y1rep⊺, ⋯, yNrep⊺)⊺ be the collection of future responses. For some 0 ≤ ν < 1, we consider the following multivariate version of the Lν-measure:

$$L_{\nu}(\mathbf{Y}) = \sum_{i=1}^{N} \text{tr}\left[\text{Cov}\left(\mathbf{y}_i^{\text{rep}} \mid \mathbf{Y}\right)\right] + \nu \sum_{i=1}^{N} \left\{\text{E}\left(\mathbf{y}_i^{\text{rep}} \mid \mathbf{Y}\right) - \mathbf{y}_i\right\}^{\mathsf{T}} \left\{\text{E}\left(\mathbf{y}_i^{\text{rep}} \mid \mathbf{Y}\right) - \mathbf{y}_i\right\} \tag{14}$$

where the expectation is taken with respect to the posterior predictive distribution. Clearly, small values of the Lν-measure indicate that the model gives predictions close to the observed values and that the variability of the predictions is low. Hence, the model with the smallest Lν-measure is selected from a collection of competing models. It has been shown that the Lν-measure with ν = 0.5 has nice theoretical properties [34]; thus, this value of ν is used in our empirical illustrations.

#### 4. Cocaine use data analysis

In this section, a small portion of the cocaine use data is analyzed to illustrate the practical value of the proposed methodology. The original data were collected from 321 cocaine use patients admitted in 1988–1989 to the West Los Angeles Veterans Affairs Medical Center. The whole data set comprises 68 measurements of 17 items, recorded at four time points: at baseline, 1 year after treatment, 2 years after treatment, and 12 years after treatment in 2002–2003. These measurements cover information on cocaine use, treatment received, psychological problems, social status, employment, and so on. As an illustration, three variables are selected for the data analysis: 'y1: days of cocaine use per month at intake (CC)', 'y2: times per month in formal treatment (FT)', and 'y3: months in formal treatment (MFT)', which represent, respectively, the severity of cocaine use and the levels of treatment received by a patient. Since these variables were measured on 0–120 point scales, to unify the scales we take logarithms and standardize them. Some measurements are missing; the missing proportion is about 8.4%. For brevity, we assume that the missing data are missing at random [35]. A distinct characteristic of the data is that they are nonnormal and heavy-tailed. Figure 1 gives the histograms and the posterior predictive density estimates (see below) of the logarithms of CC, FT, and MFT (with missing data removed) on the four occasions. The histograms illustrate that the distributions of the selected variables deviate from normality in terms of multimodality and skewness. The skewness and kurtosis of CC on the four occasions are {−1.631, 5.031}, {−0.847, 3.354}, {0.328, 1.476}, and {−0.473, 2.467}, respectively. The data set also exhibits dynamic characteristics. The distribution of CC, for instance, is skewed to the left at baseline, moves gradually to the right over the following two occasions, and eventually becomes right-skewed. This implies that a single factor analysis model may not be appropriate for fitting the data at each time point.

In this analysis, one of the objectives is to explore the effects of the latent factors on the observed variables and to assess the dependence among the latent factors. Based on the nature of the problem under consideration, it is natural to let the single variable 'CC' reflect one latent factor, 'cocaine use' (η), and to let 'FT' and 'MFT' represent another latent factor, 'treatment' (ξ). Let yit = (yit1, yit2, yit3)⊺ and ωit = (ηit, ξit)⊺. For convenience of interpretation and computation, Φr and Λr are restricted to be invariant across states, while the baseline level μr is allowed to vary with r. Further, the following non-overlapped structure for the factor loading matrix is considered


future responses in our proposal. For some 0 ≤ ν < 1, we consider the following multivariate

tr E yrep

<sup>i</sup> <sup>j</sup><sup>Y</sup> � � � <sup>y</sup><sup>i</sup> � � E yrep

X N

i¼1

ð

<sup>p</sup> <sup>Y</sup>rep ð Þ¼ <sup>j</sup><sup>Y</sup>

it to the multivariate longitudinal setting. Let <sup>Y</sup>rep <sup>¼</sup> <sup>y</sup>rep<sup>⊺</sup>

tr Cov yrep

<sup>i</sup> <sup>j</sup><sup>Y</sup> � � � � <sup>þ</sup> <sup>ν</sup>

version of Lν-measure:

<sup>L</sup>νð Þ¼ <sup>Y</sup> <sup>X</sup>

N

i¼1

In this section, a small portion of cocaine use data is analyzed to illustrate the practical value of the proposed methodology. The original data were collected from 321 cocaine use patients admitted in 1988–1989 to the West Los Angeles Veterans Affairs Medical Center. The whole data set constitutes 68 measurements of 17 items, recorded at four time points: at baseline, 1 year after the treatment, 2 years after the treatment, and 12 years after the treatment in 2002–2003. These measurements cover information on cocaine use, treatment received, psychological problems, social status, employment, and so on. As an illustration, three variables are selected for the data analysis: 'y1: days of cocaine use per month at intake (CC)', 'y2: times per month in formal treatment (FT)', and 'y3: months in formal treatment (MFT)', which, respectively, represent the severity of cocaine use and the levels of treatment received by a patient. Since these variables were measured on a 0–120 point scale, to unify the scales, we take logarithms and standardize them. Some measurements are missing; the missing proportion is about 8.4%. For brevity, we assume the data are missing at random [35]. A distinct characteristic of the data is that they are nonnormal and heavy tailed. Figure 1 gives the plots of histograms and the posterior predictive density estimates (see below) of the logarithms of CC, FT, and MFT (with missing data removed) on the four occasions. The histograms illustrate that the distributions of the selected variables deviate from normality in terms of multimodality and skewness. The skewness and kurtosis of CC on the four occasions are {−1.631, 5.031}, {−0.847, 3.354}, {0.328, 1.476}, and {−0.473, 2.467}, respectively. The data set also exhibits dynamic characteristics. The distribution of CC, for instance, is skewed to the left at baseline, moves to the right gradually over the following two occasions, and eventually becomes right-skewed. This implies that a single factor analysis model may not be appropriate for the data at each time point.
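As a concrete illustration of the preprocessing described above, the log-transform-and-standardize step and the reported sample skewness and (non-excess) kurtosis can be sketched in plain Python. This is only a sketch under our own naming, assuming strictly positive measurements; it is not the authors' code:

```python
import math

def log_standardize(xs):
    """Log-transform, then standardize to mean 0 and unit variance."""
    logs = [math.log(x) for x in xs]
    n = len(logs)
    mean = sum(logs) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in logs) / n)
    return [(v - mean) / sd for v in logs]

def skewness(xs):
    """Sample skewness: third standardized moment."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in xs) / n)
    return sum(((v - mean) / sd) ** 3 for v in xs) / n

def kurtosis(xs):
    """Raw (non-excess) kurtosis, so a normal sample is near 3."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in xs) / n)
    return sum(((v - mean) / sd) ** 4 for v in xs) / n
```

A symmetric sample gives skewness 0, matching the interpretation of the signed values quoted in the text (negative = left-skewed).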

In this analysis, one objective is to explore the effects of the latent factors on the observed variables and to assess the dependence among the latent factors. Based on the nature of the problem under consideration, it is natural to let the single variable 'CC' reflect one latent factor, 'cocaine use' (η), and to group 'FT' and 'MFT' to represent another latent factor, 'treatment' (ξ). Let $\mathbf{y}\_{it} = (y\_{it1}, y\_{it2}, y\_{it3})^{\top}$ and $\boldsymbol{\omega}\_{it} = (\eta\_{it}, \xi\_{it})^{\top}$. For convenience of interpretation and computation, Φr and Λr are restricted to be invariant across states, while the baseline level μr is left to vary with r. Further, the following non-overlapping structure for the factor loading matrix is considered

Figure 1. Plots of histograms and posterior predictive density estimates of 'CC', 'FT' and 'MFT' under FA model and hidden Markov CFA model with seven states in the cocaine use data analysis: the dashed lines denote CFA and the solid lines represent the hidden Markov FA.

$$
\Lambda^{\top} = \begin{pmatrix} 1^\* & 0^\* & 0^\* \\ 0^\* & 1^\* & \Lambda\_{32} \end{pmatrix} \tag{15}
$$

| Model | L0.5 | Model | L0.5 |
|-------|----------|--------|----------|
| S = 1 | 2322.447 | S = 6  | 590.448  |
| S = 2 | 2107.514 | S = 7  | 572.172  |
| S = 3 | 1030.264 | S = 8  | 597.843  |
| S = 4 | 941.230  | S = 9  | 932.763  |
| S = 5 | 839.726  | S = 10 | 1030.264 |

Table 1. Summary of L0.5 under competing models in the analysis of cocaine use data.
Bayesian Analysis for Hidden Markov Factor Analysis Models

http://dx.doi.org/10.5772/intechopen.72837


where parameters with an asterisk are treated as fixed for identification. Note that fixing Λ11 = 1 indicates that η is identified with CC; similarly for Λ22. Hence, in this case, Φ12 in Φ measures the magnitude of the dependence of ξ on η.

The data set is fitted to the proposed models with 10 different transition models: S = 1, ⋯, 10. Although these state spaces are nested, the corresponding models are not, since one cannot be reduced to another by constraining parameters in the interior of the parameter space. This indicates that the chi-square distribution may not be suitable for the classic likelihood ratio test statistic. We therefore use the L-measure to implement model selection. Obviously, if S = 1 is taken, the proposed model reduces to the common factor analysis model (CFA, [18]).

The following inputs are taken for the hyper-parameters involved in the prior distributions (8): for r = 1, ⋯, S, $\mu\_{0rj} = \min\{y\_{itj}\} + r/S$ and $\Sigma\_{0r} = \mathbf{S}\_{yy}/S$, where $\mathbf{S}\_{yy}$ is the sample covariance matrix of the data. The entries in $\Lambda\_0$ are set to zero, $r\_0 = 10.0$, $\mathbf{R}\_0^{-1} = 7.0 \times \mathbf{I}\_2$, which leads to the mean of Φ equal to $\mathbf{I}\_2$, $\mathbf{H}\_{\epsilon 0} = \mathbf{I}\_3$, $\alpha\_{\epsilon 0j} = 9.0$, $\beta\_{\epsilon 0j} = 8.0$, and $\nu\_0 = \gamma\_0 = 0.1$. Note that these values are standard inputs in latent variable analysis (see [24]). We also tried other values for these inputs and found that the resulting estimates are scarcely affected.

We implement the proposed algorithm given in Section 3 to conduct the Bayesian analysis. Let Yobs be the collection of observed data and Ymis the set of missing data. Due to the missing data, we need to draw Ymis from p(Ymis | Ω, Z, θ, Yobs) in the MCMC sampling. This can be implemented easily since, conditional on Ω, Z, and θ, the distribution p(Ymis | Ω, Z, θ, Yobs) is independent of Yobs and normal. Hence, drawing Ymis is straightforward and fast. To get some idea of the number of Gibbs sampler iterations needed for convergence, we conducted a few test runs as a pilot study and found that in all these runs the Gibbs sampler converged within about 1000–2000 iterations, with EPSR values [27] less than 1.2. So, for all cases under consideration, we collect 3000 random observations for posterior analysis after discarding the initial 2000 iterations.
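The EPSR criterion used above is the estimated potential scale reduction of Gelman and Rubin: values near 1 indicate that parallel chains have mixed. A minimal sketch for a scalar parameter, assuming several equal-length chains (variable names are ours, not the chapter's):

```python
import math

def epsr(chains):
    """Estimated potential scale reduction (Gelman-Rubin R-hat)
    for a scalar parameter traced by several parallel chains."""
    m = len(chains)          # number of chains
    n = len(chains[0])       # draws per chain
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # between-chain variance B and mean within-chain variance W
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    w = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    # pooled posterior-variance estimate, then the reduction factor
    var_plus = (n - 1) / n * w + b / n
    return math.sqrt(var_plus / w)
```

Chains that agree give an EPSR at or below 1; chains stuck in different regions give a value well above the 1.2 threshold cited in the text.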

We calculate the value of L0.5 under each fitting. For the computation, we use a simulation-based method, drawing predictive values $\mathbf{Y}^{\text{rep}}\_{\text{obs}}$ from $p(\mathbf{Y}^{\text{rep}}\_{\text{obs}} \mid \mathbf{Y}\_{\text{obs}})$, where $\mathbf{Y}^{\text{rep}}\_{\text{obs}}$ is the hypothetical replication of Yobs. Note that $p(\mathbf{Y}^{\text{rep}}\_{\text{obs}} \mid \mathbf{Y}\_{\text{obs}}) = \int p(\mathbf{Y}^{\text{rep}}\_{\text{obs}} \mid \boldsymbol{\Omega}, \mathbf{Z}, \boldsymbol{\theta})\, p(\boldsymbol{\Omega}, \mathbf{Z}, \boldsymbol{\theta} \mid \mathbf{Y}\_{\text{obs}})\, d\boldsymbol{\Omega}\, d\mathbf{Z}\, d\boldsymbol{\theta}$. Hence, drawing $\mathbf{Y}^{\text{rep}}\_{\text{obs}}$ is easy once Ω, Z, and θ are available. Given M simulations from the posterior of (Ω, Z, θ) via the MCMC sampling discussed before, we simply draw one $\mathbf{Y}^{\text{rep}}\_{\text{obs}}$ from $p(\mathbf{Y}^{\text{rep}}\_{\text{obs}} \mid \boldsymbol{\Omega}, \mathbf{Z}, \boldsymbol{\theta})$ for each (Ω, Z, θ), obtaining M simulations of $\mathbf{Y}^{\text{rep}}\_{\text{obs}}$ in the end. Based on these simulated observations, Lν-measures can be estimated consistently via sample means. We draw 3000 observations after convergence of the MCMC algorithm to calculate L0.5; the results are reported in Table 1.
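Per observed component, the Monte Carlo estimate just described reduces to a predictive variance plus ν times a squared bias, each computed from the M replicate draws. A scalar sketch under our own naming (the chapter's computation is the multivariate trace version of Eq. (14)):

```python
def l_measure(y_obs, y_rep, nu=0.5):
    """Monte Carlo estimate of the L_nu measure.
    y_obs: list of N observed scalars.
    y_rep: list of N lists, each holding the M posterior-predictive
           draws for the corresponding observation.
    Smaller values indicate a better-fitting model."""
    total = 0.0
    for yi, draws in zip(y_obs, y_rep):
        m = len(draws)
        mean = sum(draws) / m
        # predictive variance plus nu * squared bias
        var = sum((d - mean) ** 2 for d in draws) / (m - 1)
        total += var + nu * (mean - yi) ** 2
    return total
```

With ν = 0.5, draws {1, 3} against an observation of 0 contribute a variance of 2 plus 0.5 × 2², i.e., 4 in total.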

Examination of Table 1 indicates that the proposed model with six to eight latent states gives better fits to the data. Furthermore, we calculate the posterior predictive density estimates of $y\_{tj}$ (t = 1, ⋯, 4; j = 1, ⋯, 3) under one state and seven states, respectively (see Figure 1). It can be seen clearly that our proposed method succeeds in capturing the skewness and modes of the data, while the factor analysis model fails. For the computation, we choose 60–100 equally spaced grid points in the interval $\left[ \min\{y\_{\text{obs},itj}\} - 1.0,\ \max\{y\_{\text{obs},itj}\} + 1.0 \right]$ and collect 3000 simulated observations from the Gibbs sampler at each point after removing the initial 2000 iterations as burn-in.

Table 2 presents the summary of Bayesian estimates of the unknown parameters and their standard errors using the formula given in (11), with S = 1 (denoted by FA) and S = 7 (denoted by HMFA). For comparison, maximum likelihood estimates of the unknown parameters with their standard deviations under HMFA are also presented in Table 2.






The maximum likelihood analysis is conducted via the MCECM algorithm [36], and the standard error estimates are calculated via the Louis formula [37].

Based on Table 2, we find the following. First, all three estimates of Λ32 give positive effects of the latent factor ξ on 'MFT'. This is not surprising, since ξ is related to the treatment level a patient received. But there are obvious differences in magnitude among the three methods. FA gives Λ̂32 = 0.001 with standard deviation 0.014, while HMFA gives Λ̂32 = 0.752 with standard deviation 0.045. This reflects that the heterogeneity of the data seriously affects the estimate Λ̂32. Compared to these two methods, the ML method produces Λ̂32 = 0.196 with SD = 0.029, which lies between them. Second, the estimates of the variance parameters Ψεj under S = 1 are larger than those under S = 7. This indicates that the factor analysis model accommodates the heavy tails of the data at the expense of variance inflation. Further investigation of the estimates of Φjj under FA and HMFA reveals the same phenomenon as for Ψεj. However, we observe that the ML estimate of Ψε3, the unique variance corresponding to the third item, is 0.008 with SD = NAN, an illogical number, which is very close to an improper Heywood case. As pointed out by Lee [18], Heywood cases in ML estimation can be avoided by imposing an inequality constraint on Ψε3 with a penalty function. In the Bayesian approach, the conjugate prior distribution of Ψε3⁻¹ confines Ψε3 to a region of positive values and hence has an effect similar to adding a penalty function. Hence, no Heywood cases are found in the Bayesian solution because of the penalty induced by the prior distribution on Ψε3⁻¹.
Third, all three estimates give a negative correlation between η and ξ, which is consistent with the fact that an improvement in treatment decreases the intensity of cocaine use, thus reducing the days of cocaine use. The ML estimates of Φjk are very close to those under HMFA. However, the estimate of Φ12 under S = 1 is −0.018, which is quite different from −0.182 for S = 7. Furthermore, the coefficients of correlation between ξ and η under S = 1 and S = 7 are −0.0204 and −0.6612, respectively. The former suggests that ξ and η are approximately independent, while the latter implies a stronger dependence between them.
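The correlation coefficients quoted above follow directly from the Table 2 entries, since corr(η, ξ) = Φ12 / √(Φ11 Φ22):

```python
import math

def latent_corr(phi12, phi11, phi22):
    """Correlation of the latent factors eta and xi implied by
    their 2 x 2 covariance matrix Phi."""
    return phi12 / math.sqrt(phi11 * phi22)

# Table 2 entries: FA (S = 1) and HMFA (S = 7)
corr_fa = latent_corr(-0.018, 0.770, 1.007)    # ≈ -0.0204
corr_hmfa = latent_corr(-0.182, 0.346, 0.219)  # ≈ -0.6612
```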

Moreover, we computed the posterior probabilities P(z_t = r | Yobs) for r = 1, ⋯, 7 and t = 1, ⋯, 4 under S = 7, based on 10,000 simulated observations drawn from p(Z | Yobs), and found that the transition path corresponding to the maximum posterior probability is 7 → 1 → 1 → 1. This implies that the latent state of the patient is extremely serious at baseline and becomes moderate over the subsequent treatments, reflecting a positive effect of the intervention on the patient's latent state. Note that, unlike the common Viterbi algorithm for exploring the optimal transition path of states in ML analysis, calculating the posterior probability P(z_t = r | Yobs) within the Bayesian framework is a by-product of the estimation procedure. This avoids the complex computation of the marginal likelihood of the observed data and hence is very fast.
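Estimating P(z_t = r | Yobs) from the MCMC output is indeed a by-product: it is simply the relative frequency of state r at occasion t among the sampled paths. A sketch under our own naming; for simplicity, the modal path below is the per-occasion argmax, a marginal surrogate for the joint maximum-posterior path reported above:

```python
from collections import Counter

def state_probabilities(z_draws, n_states):
    """z_draws: list of sampled state paths, each a list of length T
    with states coded 1..n_states. Returns a T x n_states table of
    posterior probabilities P(z_t = r | Y_obs) by relative frequency."""
    m = len(z_draws)
    t_len = len(z_draws[0])
    probs = []
    for t in range(t_len):
        counts = Counter(path[t] for path in z_draws)
        probs.append([counts[r] / m for r in range(1, n_states + 1)])
    return probs

def modal_path(probs):
    """Per-occasion most probable state (1-based)."""
    return [row.index(max(row)) + 1 for row in probs]
```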

| Para. | FA Est. | FA SD | ML Est. | ML SD | HMFA Est. | HMFA SD |
|-------|---------|-------|---------|-------|-----------|---------|
| Λ32   | 0.001   | 0.014 | 0.196   | 0.029 | 0.752     | 0.045   |
| Ψε1   | 1.443   | 0.315 | 0.559   | 0.297 | 0.432     | 0.049   |
| Ψε2   | 0.439   | 0.056 | 0.204   | 0.039 | 0.339     | 0.034   |
| Ψε3   | 0.305   | 0.030 | 0.008   | NAN   | 0.025     | 0.001   |
| Φ11   | 0.770   | 0.315 | 0.510   | 0.132 | 0.346     | 0.049   |
| Φ12   | −0.018  | 0.018 | −0.053  | 0.041 | −0.182    | 0.052   |
| Φ22   | 1.007   | 0.080 | 0.312   | 0.053 | 0.219     | 0.033   |

Table 2. Summary statistics for Bayesian and ML estimates in the cocaine use data analysis.

#### 5. Discussion

This chapter reviews Bayesian inference within a general framework and proposes a Bayesian procedure for analyzing the hidden Markov factor analysis model in a multivariate longitudinal setting. Compared to the ML method, the pragmatic advantage of the Bayesian framework is its flexibility and generality for coping with very complex problems. When good prior information is available, results obtained from the Bayesian method are more reliable and accurate than those under ML. With increased access to computational advances in simulation-based approaches, in particular MCMC methodology, Bayesian inference provides enormous scope for realistic statistical modeling.

Although we concentrate our attention on applications of the hidden Markov factor analysis model, the methodology developed in this chapter can be extended to the case where the LVM is nonlinear. Another possible extension is to consider a dynamic LVM, wherein model parameters vary over time. These extensions will raise theoretical and computational challenges and certainly require further investigation.

#### Acknowledgements

The authors are grateful for the editor's valuable suggestions and comments, which have greatly improved the manuscript. Xia's work was fully supported by a grant from the National Natural Science Foundation of China (No. 11471161), and Tang's work was supported by the National Science Fund for Distinguished Young Scholars of China (No. 11225103).

#### A. Appendix. Full conditionals

#### (a) p(Z | θ, δ, Q, Ω, Y)

Let ω_i denote the sequence of latent factors across the T occasions for individual i. To draw the state variables Z from p(Z | δ, Q, θ, Ω, Y), we first notice that

$$p(\mathbf{Z}|\boldsymbol{\delta}, \mathbf{Q}, \boldsymbol{\theta}, \boldsymbol{\Omega}, \mathbf{Y}) = \prod\_{i=1}^{N} p\left(\mathbf{z}\_{i}|\boldsymbol{\omega}\_{i}, \boldsymbol{\delta}, \mathbf{Q}, \boldsymbol{\theta}, \mathbf{y}\_{i}\right) \tag{16}$$

p zitjzi,tþ1:<sup>T</sup>; ωi,1:<sup>T</sup>; yi, <sup>1</sup>:<sup>T</sup>

¼ p zit; ωi,1:<sup>t</sup>; yi, <sup>1</sup>:<sup>t</sup>

¼ p zit; ωi,1:<sup>t</sup>; yi, <sup>1</sup>:<sup>t</sup>

¼ p zit; ωi,1:<sup>t</sup>; yi, <sup>1</sup>:<sup>t</sup>

Algorithm:

(b) pð Þ ΩjZ; θ;Y

in which

To draw Ω, we first note that

<sup>p</sup> <sup>ω</sup>itjzit; <sup>θ</sup>; <sup>y</sup>it � �<sup>∝</sup> exp � <sup>1</sup>

� � <sup>∝</sup>p zit; <sup>z</sup>i,tþ1:<sup>T</sup>; <sup>ω</sup>i, <sup>1</sup>:<sup>T</sup>; <sup>y</sup>i,1:<sup>T</sup>

� �<sup>p</sup> <sup>z</sup>i,tþ1:<sup>T</sup>jzit; <sup>ω</sup>i, <sup>1</sup>:<sup>t</sup>; <sup>y</sup>i, <sup>1</sup>:<sup>t</sup>

� �<sup>p</sup> <sup>z</sup>i,tþ1:<sup>T</sup>; <sup>ω</sup>i,tþ1:<sup>T</sup>; <sup>y</sup>i,tþ1:<sup>T</sup>jzit; <sup>ω</sup>i,1:<sup>t</sup>; <sup>y</sup>i,1:<sup>t</sup>

� �<sup>p</sup> <sup>z</sup>i,tþ1:<sup>T</sup>jzit ð Þ; <sup>p</sup> <sup>ω</sup>i,tþ1:<sup>T</sup>; <sup>y</sup>i,tþ1:<sup>T</sup>jzi,tþ1:<sup>T</sup>

values due to the Markov Chain characteristics of <sup>y</sup>it; <sup>ω</sup>it; <sup>z</sup>it � �. This leads to

� � <sup>¼</sup> <sup>α</sup>i,t∣<sup>t</sup>ð Þ<sup>r</sup> qrzi,tþ<sup>1</sup>

i. running the recursion αit and stored the conditional probabilities αi,t∣<sup>t</sup> for t ¼ 1,…,T;

P zit ¼ rjωi,1:<sup>T</sup>; yi,1:<sup>T</sup>; zi,tþ1:<sup>T</sup>

N

Y T

t¼1

Ψ�<sup>1</sup>

with r ¼ zit. Hence, similar to that in drawing Z, updating Ω can be achieved by drawing ωit independently from <sup>p</sup> <sup>ω</sup>itjzit; <sup>θ</sup>; <sup>y</sup>it � � for <sup>i</sup> <sup>¼</sup> <sup>1</sup>, <sup>⋯</sup>, N and <sup>t</sup> <sup>¼</sup> <sup>1</sup>, <sup>⋯</sup>, T. It can be shown that

<sup>r</sup> <sup>ω</sup>it � �

<sup>D</sup><sup>N</sup> <sup>m</sup> <sup>m</sup><sup>b</sup> it; <sup>Σ</sup><sup>b</sup> <sup>r</sup>

i¼1

The last equation holds since given zit, yi,t:<sup>T</sup>; ωi,t:<sup>T</sup>; zi,tþ1:<sup>T</sup>

P zit ¼ rjzi,tþ1:<sup>T</sup>; yi,1:<sup>T</sup>; ωi,1:<sup>T</sup>;

Hence, FFBS algorithm for drawing z<sup>i</sup> is implemented by

ii. sampling ziT from the filtered conditional probability αi,T∣T;

iii. for t ¼ T � 1, ⋯, 1, sampling zit from the conditional probability

<sup>p</sup>ð Þ¼ <sup>Ω</sup>jZ; <sup>θ</sup>;<sup>Y</sup> <sup>Y</sup>

<sup>2</sup> <sup>y</sup>it � <sup>μ</sup><sup>r</sup> � <sup>Λ</sup><sup>r</sup> <sup>ω</sup>it � �<sup>⊺</sup>

<sup>p</sup> <sup>ω</sup>itjzit <sup>¼</sup> <sup>r</sup>; <sup>θ</sup>; <sup>y</sup>it � �<sup>¼</sup>

� �

� �

P S s¼1

� �<sup>p</sup> <sup>ω</sup>i,tþ1:<sup>T</sup>; <sup>y</sup>i,tþ1:<sup>T</sup>jzi,tþ1:<sup>T</sup>; zit; <sup>ω</sup>i, <sup>1</sup>:<sup>t</sup>; <sup>y</sup>i, <sup>1</sup>:<sup>t</sup>

<sup>α</sup>i,t∣<sup>t</sup>ð Þ<sup>s</sup> qszi,tþ<sup>1</sup>

� �

Bayesian Analysis for Hidden Markov Factor Analysis Models

http://dx.doi.org/10.5772/intechopen.72837

n o does not depend on the previous

� �: (24)

<sup>e</sup><sup>r</sup> <sup>y</sup>it � <sup>μ</sup>kr � <sup>Λ</sup>rωit � �� <sup>1</sup>

<sup>p</sup> <sup>ω</sup>itjzit; <sup>θ</sup>; <sup>y</sup>it � � (25)

2 ω⊺ itΦ�<sup>1</sup>

� �: (27)

(26)

t ¼ T � 1, ⋯, 1: (23)

(22)

35

� �

Hence, drawing Z can be accomplished via single-component method by drawing z<sup>i</sup> independently from p zijωi; δ; Q; θ; y<sup>i</sup> � �. Furthermore, notice that the sequences <sup>y</sup><sup>i</sup> ; ωi; z<sup>i</sup> � � are still the one-order Markov sequences. Hence, we can simulate z<sup>i</sup> through a well-known forward filtering-backward sampling algorithm (see, for example, [38]). For notation clarity, we suppress θ, δ, and Q in the following derivations.

Forward filtering-backward sampling (FFBS) consists of first forward filtering (FF) and then backward sampling (BS). The forward filtering step recursively updates

$$\alpha\_{i,t|t} = p\left(z\_{it}|\,\omega\_{i,1:t}, \mathbf{y}\_{i,1:t}\right) \quad t = 1, \ldots, T. \tag{17}$$

Here yi, <sup>1</sup>:<sup>t</sup> represents the set of observations of subject i up to time t and so are ωi,1:<sup>t</sup> and zi, <sup>1</sup>:<sup>t</sup>. The backward sampling is to draw z<sup>i</sup> from the joint distribution of the states given the data using

$$p\left(\mathbf{z}\_{i,1:T}|\,\boldsymbol{\omega}\_{i,1:T},\mathbf{y}\_{i,1:T}\right) = p\left(z\_{iT}|\,\boldsymbol{\omega}\_{i,1:T},\mathbf{y}\_{i,1:T}\right)\,\dots\,p\left(z\_{i1}|\mathbf{z}\_{i,2:T},\boldsymbol{\omega}\_{i,1:T},\mathbf{y}\_{i,1:T}\right).\tag{18}$$

That is, we first draw the last state given all the data and then work backwards in time drawing each state conditional on all the subsequent ones.

To implement forward filtering, let

$$\alpha\_{it}(r) = \mathbb{P}\left(\mathbf{y}\_{i,1:t}, \omega\_{i,1:t}, z\_{it} = r\right), \quad t = 1, \cdots, T \tag{19}$$

Obviously, $\alpha_{i1}(r) = \delta_r\, p\left(\mathbf{y}_{i1}, \boldsymbol{\omega}_{i1}|z_{i1} = r\right)$. Moreover, it can be shown that

$$\alpha\_{it}(r) = \left(\sum\_{s=1}^{S} \alpha\_{i,t-1}(s)q\_{sr}\right) p(\mathbf{y}\_{it}, \boldsymbol{\omega}\_{it}|z\_{it} = r), \quad t = 2, \cdots, T \tag{20}$$

The outputs $\{\alpha_{it}\}_{t=1}^{T}$ from the recursive Eq. (20) can be used to calculate the posterior probability

$$\alpha\_{i,t|t}(r) = \mathbb{P}\left(z\_{it} = r \mid \boldsymbol{\omega}\_{i,1:t}, \mathbf{y}\_{i,1:t}\right) = \frac{\alpha\_{it}(r)}{\sum\_{s=1}^{S} \alpha\_{it}(s)}\tag{21}$$

which leads to the forward filtering (FF) iteration.

The backward sampling step depends on the observation that

$$\begin{split} p\left(z\_{it}|\mathbf{z}\_{i,t+1:T},\,\boldsymbol{\omega}\_{i,1:T},\,\mathbf{y}\_{i,1:T}\right) &\propto p\left(z\_{it},\mathbf{z}\_{i,t+1:T},\,\boldsymbol{\omega}\_{i,1:T},\,\mathbf{y}\_{i,1:T}\right) \\ &= p\left(z\_{it},\,\boldsymbol{\omega}\_{i,1:t},\,\mathbf{y}\_{i,1:t}\right) p\left(\mathbf{z}\_{i,t+1:T},\,\boldsymbol{\omega}\_{i,t+1:T},\,\mathbf{y}\_{i,t+1:T}|z\_{it},\,\boldsymbol{\omega}\_{i,1:t},\,\mathbf{y}\_{i,1:t}\right) \\ &= p\left(z\_{it},\,\boldsymbol{\omega}\_{i,1:t},\,\mathbf{y}\_{i,1:t}\right) p\left(\mathbf{z}\_{i,t+1:T}|z\_{it},\,\boldsymbol{\omega}\_{i,1:t},\,\mathbf{y}\_{i,1:t}\right) p\left(\boldsymbol{\omega}\_{i,t+1:T},\,\mathbf{y}\_{i,t+1:T}|\,\mathbf{z}\_{i,t+1:T},\,z\_{it},\,\boldsymbol{\omega}\_{i,1:t},\,\mathbf{y}\_{i,1:t}\right) \\ &= p\left(z\_{it},\,\boldsymbol{\omega}\_{i,1:t},\,\mathbf{y}\_{i,1:t}\right) p\left(\mathbf{z}\_{i,t+1:T}|z\_{it}\right) p\left(\boldsymbol{\omega}\_{i,t+1:T},\,\mathbf{y}\_{i,t+1:T}|\,\mathbf{z}\_{i,t+1:T}\right) \end{split} \tag{22}$$

The last equation holds since, given $z_{it}$, $\{\mathbf{y}_{i,t:T}, \boldsymbol{\omega}_{i,t:T}, \mathbf{z}_{i,t+1:T}\}$ does not depend on the previous values due to the Markov chain characteristics of $\{\mathbf{y}_{it}, \boldsymbol{\omega}_{it}, z_{it}\}$. This leads to

$$\mathbb{P}\left(z\_{it} = r | \mathbf{z}\_{i, t+1:T}, \mathbf{y}\_{i, 1:T}, \omega\_{i, 1:T}\right) = \frac{\alpha\_{i, t \mid t}(r) q\_{r z\_{i, t+1}}}{\sum\_{s=1}^{S} \alpha\_{i, t \mid t}(s) q\_{s z\_{i, t+1}}} \qquad t = T - 1, \cdots, 1. \tag{23}$$

Hence, the FFBS algorithm for drawing $\mathbf{z}_i$ is implemented as follows.

**Algorithm:** Run the forward filtering recursion (19)–(21) for $t = 1, \ldots, T$; draw $z_{iT}$ from $\alpha_{i,T|T}(\cdot)$; then, for $t = T-1, \cdots, 1$, draw $z_{it}$ from


34 New Insights into Bayesian Inference

$$\mathbb{P}\left(z\_{it} = r \mid \boldsymbol{\omega}\_{i,1:T}, \mathbf{y}\_{i,1:T}, \mathbf{z}\_{i,t+1:T}\right). \tag{24}$$
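The FF recursion (20)–(21) and the backward draws (23) admit a compact sketch. The following is a minimal NumPy illustration, not the authors' code: `delta`, `Q` (with `Q[r, s]` $= q_{rs}$), and `emis` — the emission terms $p(\mathbf{y}_{it}, \boldsymbol{\omega}_{it}|z_{it}=r)$, assumed precomputed for one subject — are placeholder inputs.

```python
import numpy as np

def ffbs(delta, Q, emis, rng):
    """Forward filtering-backward sampling for one subject.

    delta : (S,) initial state probabilities
    Q     : (S, S) transition matrix, Q[r, s] = q_rs
    emis  : (T, S) emission likelihoods p(y_it, w_it | z_it = r)
    Returns a sampled state path z of length T.
    """
    T, S = emis.shape
    # Forward filtering: alpha[t, r] = P(z_it = r | data up to t), Eqs. (20)-(21)
    alpha = np.zeros((T, S))
    alpha[0] = delta * emis[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ Q) * emis[t]   # recursion, Eq. (20)
        alpha[t] /= alpha[t].sum()                # normalisation, Eq. (21)
    # Backward sampling: draw z_T, then z_t | z_{t+1}, Eq. (23)
    z = np.zeros(T, dtype=int)
    z[T - 1] = rng.choice(S, p=alpha[T - 1])
    for t in range(T - 2, -1, -1):
        w = alpha[t] * Q[:, z[t + 1]]             # alpha_{t|t}(r) * q_{r, z_{t+1}}
        z[t] = rng.choice(S, p=w / w.sum())
    return z
```

In practice the forward pass is usually run in log space to avoid underflow for long sequences; the per-step normalisation above is the simpler alternative.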

#### (b) $p(\boldsymbol{\Omega}|\mathbf{Z}, \boldsymbol{\theta}, \mathbf{Y})$

To draw Ω, we first note that

$$p(\mathbf{\Omega}|\mathbf{Z}, \mathbf{\theta}, \mathbf{Y}) = \prod\_{i=1}^{N} \prod\_{t=1}^{T} p(\omega\_{it}|z\_{it}, \mathbf{\theta}, \mathbf{y}\_{it}) \tag{25}$$

in which

$$p\left(\boldsymbol{\omega}\_{it}|z\_{it},\boldsymbol{\theta},\mathbf{y}\_{it}\right) \propto \exp\left\{-\frac{1}{2}\left(\mathbf{y}\_{it}-\boldsymbol{\mu}\_{r}-\boldsymbol{\Lambda}\_{r}\boldsymbol{\omega}\_{it}\right)^{\mathsf{T}}\boldsymbol{\Psi}\_{er}^{-1}\left(\mathbf{y}\_{it}-\boldsymbol{\mu}\_{r}-\boldsymbol{\Lambda}\_{r}\boldsymbol{\omega}\_{it}\right) - \frac{1}{2}\boldsymbol{\omega}\_{it}^{\mathsf{T}}\boldsymbol{\Phi}\_{r}^{-1}\boldsymbol{\omega}\_{it}\right\}\tag{26}$$

with $r = z_{it}$. Hence, similar to that in drawing $\mathbf{Z}$, updating $\boldsymbol{\Omega}$ can be achieved by drawing $\boldsymbol{\omega}_{it}$ independently from $p(\boldsymbol{\omega}_{it}|z_{it}, \boldsymbol{\theta}, \mathbf{y}_{it})$ for $i = 1, \cdots, N$ and $t = 1, \cdots, T$. It can be shown that

$$p\left(\boldsymbol{\omega}\_{it}|z\_{it}=r,\boldsymbol{\theta},\mathbf{y}\_{it}\right) \stackrel{\mathcal{D}}{=} \mathcal{N}\_{m}\left(\widehat{\mathbf{m}}\_{it},\widehat{\boldsymbol{\Sigma}}\_{r}\right).\tag{27}$$

in which

$$
\widehat{\mathbf{m}}\_{it} = \widehat{\boldsymbol{\Sigma}}\_r \boldsymbol{\Lambda}\_r^{\mathsf{T}} \boldsymbol{\Psi}\_{er}^{-1} (\mathbf{y}\_{it} - \boldsymbol{\mu}\_r), \quad \widehat{\boldsymbol{\Sigma}}\_r = \left(\boldsymbol{\Lambda}\_r^{\mathsf{T}} \boldsymbol{\Psi}\_{er}^{-1} \boldsymbol{\Lambda}\_r + \boldsymbol{\Phi}\_r^{-1}\right)^{-1}. \tag{28}
$$
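Eqs. (27)–(28) are a standard Gaussian conditional, so a draw of $\boldsymbol{\omega}_{it}$ reduces to two matrix products and one multivariate normal sample. A minimal sketch follows; the argument names are illustrative, and the precision matrices $\boldsymbol{\Psi}_{er}^{-1}$, $\boldsymbol{\Phi}_r^{-1}$ are assumed to be passed directly.

```python
import numpy as np

def draw_omega(y_it, mu_r, Lambda_r, Psi_er_inv, Phi_r_inv, rng):
    """Draw omega_it from the Gaussian posterior in Eqs. (27)-(28)."""
    # Sigma_hat_r = (Lambda_r' Psi_er^{-1} Lambda_r + Phi_r^{-1})^{-1}
    Sigma_hat = np.linalg.inv(Lambda_r.T @ Psi_er_inv @ Lambda_r + Phi_r_inv)
    # m_hat_it = Sigma_hat_r Lambda_r' Psi_er^{-1} (y_it - mu_r)
    m_hat = Sigma_hat @ Lambda_r.T @ Psi_er_inv @ (y_it - mu_r)
    return rng.multivariate_normal(m_hat, Sigma_hat)
```

Since $\widehat{\boldsymbol{\Sigma}}_r$ depends only on the regime $r$, it can be precomputed once per regime per Gibbs sweep rather than per observation.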




#### (c) $p(\boldsymbol{\mu}, \boldsymbol{\Lambda}, \boldsymbol{\Psi}_e|\mathbf{Z}, \boldsymbol{\Omega}, \mathbf{Y})$

To draw $\{\boldsymbol{\mu}, \boldsymbol{\Lambda}, \boldsymbol{\Psi}_e\}$, we can first draw $\boldsymbol{\mu}$ from $p(\boldsymbol{\mu}|\boldsymbol{\Lambda}, \boldsymbol{\Psi}_e, \mathbf{Z}, \boldsymbol{\Omega}, \mathbf{Y})$ and then draw $\{\boldsymbol{\Lambda}, \boldsymbol{\Psi}_e\}$ from $p(\boldsymbol{\Lambda}, \boldsymbol{\Psi}_e|\boldsymbol{\mu}, \mathbf{Z}, \boldsymbol{\Omega}, \mathbf{Y})$. To this end, let $\widehat{n}^{(r)} = \#\{z_{it} = r\}$ be the size of cluster $r$, and let $w_{itr} = I\{z_{it} = r\}$. Denote

$$\begin{aligned} \overline{\mathbf{Y}}^{(r)} &= \sum\_{i=1}^{N} \sum\_{t=1}^{T} w\_{itr} \mathbf{y}\_{it} / \widehat{n}^{(r)}, \quad \overline{\boldsymbol{\Omega}}^{(r)} = \sum\_{i=1}^{N} \sum\_{t=1}^{T} w\_{itr} \boldsymbol{\omega}\_{it} / \widehat{n}^{(r)}, \\ \mathbf{S}^{(r)}\_{yy} &= \sum\_{i=1}^{N} \sum\_{t=1}^{T} w\_{itr} \mathbf{y}\_{it} \mathbf{y}\_{it}^{\mathsf{T}} / \widehat{n}^{(r)}, \quad \mathbf{S}^{(r)}\_{\omega y} = \sum\_{i=1}^{N} \sum\_{t=1}^{T} w\_{itr} \boldsymbol{\omega}\_{it} \mathbf{y}\_{it}^{\mathsf{T}} / \widehat{n}^{(r)}, \\ \mathbf{S}^{(r)}\_{\omega\omega} &= \sum\_{i=1}^{N} \sum\_{t=1}^{T} w\_{itr} \boldsymbol{\omega}\_{it} \boldsymbol{\omega}\_{it}^{\mathsf{T}} / \widehat{n}^{(r)} \end{aligned} \tag{29}$$

be the sample means and covariance matrices of $\mathbf{Y}$ and $\boldsymbol{\Omega}$ within the $r$th cluster, respectively. By some algebraic calculation, it can be shown that

$$\begin{aligned} p(\boldsymbol{\mu}|\boldsymbol{\Lambda},\boldsymbol{\Psi}\_{e},\mathbf{Z},\boldsymbol{\Omega},\mathbf{Y}) &= \prod\_{r=1}^{S} p(\boldsymbol{\mu}\_{r}|\boldsymbol{\Lambda}\_{r},\boldsymbol{\Psi}\_{er},\mathbf{Z},\boldsymbol{\Omega},\mathbf{Y}), \quad \text{and} \\ p(\boldsymbol{\Lambda},\boldsymbol{\Psi}\_{e}|\boldsymbol{\mu},\mathbf{Z},\boldsymbol{\Omega},\mathbf{Y}) &= \prod\_{r=1}^{S} p(\boldsymbol{\Lambda}\_{r},\boldsymbol{\Psi}\_{er}|\boldsymbol{\mu}\_{r},\mathbf{Z},\boldsymbol{\Omega},\mathbf{Y}), \end{aligned} \tag{30}$$

where

$$\begin{split} p\left(\boldsymbol{\mu}\_{r}|\boldsymbol{\Lambda}\_{r},\boldsymbol{\Psi}\_{er},\mathbf{Z},\boldsymbol{\Omega},\mathbf{Y}\right) &= \mathcal{N}\_{p}\left(\widehat{\mathbf{a}}\_{\mu r},\widehat{\boldsymbol{\Sigma}}\_{\mu r}\right), \\ p\left(\boldsymbol{\Lambda}\_{r},\boldsymbol{\Psi}\_{er}|\boldsymbol{\mu}\_{r},\mathbf{Z},\boldsymbol{\Omega},\mathbf{Y}\right) &= \prod\_{j=1}^{p} p\left(\Psi\_{erj}|\boldsymbol{\mu}\_{r},\mathbf{Z},\boldsymbol{\Omega},\mathbf{Y}\right) p\left(\boldsymbol{\Lambda}\_{rj}|\Psi\_{erj},\boldsymbol{\mu}\_{r},\mathbf{Z},\boldsymbol{\Omega},\mathbf{Y}\right) \\ &\stackrel{\mathcal{D}}{=} \prod\_{j=1}^{p} \mathcal{G}a^{-1}\left(\widehat{\alpha}\_{erj},\widehat{\beta}\_{erj}\right) \mathcal{N}\_{m}\left(\widehat{\boldsymbol{\Lambda}}\_{rj},\Psi\_{erj}\widehat{\mathbf{H}}\_{rj}\right), \end{split} \tag{31}$$

with

$$\begin{aligned} \widehat{\mathbf{a}}\_{\mu r} &= \widehat{\boldsymbol{\Sigma}}\_{\mu r}\left(\boldsymbol{\Sigma}\_{0r}^{-1}\boldsymbol{\mu}\_{0r} + \widehat{n}^{(r)}\boldsymbol{\Psi}\_{er}^{-1}\left[\overline{\mathbf{Y}}^{(r)} - \boldsymbol{\Lambda}\_{r}\overline{\boldsymbol{\Omega}}^{(r)}\right]\right), \quad \widehat{\boldsymbol{\Sigma}}\_{\mu r} = \left(\boldsymbol{\Sigma}\_{0r}^{-1} + \widehat{n}^{(r)}\boldsymbol{\Psi}\_{er}^{-1}\right)^{-1}, \\ \widehat{\boldsymbol{\Lambda}}\_{rj} &= \widehat{\mathbf{H}}\_{rj}\left(\mathbf{H}\_{0rj}^{-1}\boldsymbol{\Lambda}\_{0rj} + \widehat{n}^{(r)}\left[\mathbf{S}^{(r)}\_{\omega y(j)} - \mu\_{rj}\overline{\boldsymbol{\Omega}}^{(r)}\right]\right), \quad \widehat{\mathbf{H}}\_{rj}^{-1} = \mathbf{H}\_{0rj}^{-1} + \widehat{n}^{(r)}\mathbf{S}^{(r)}\_{\omega\omega}, \\ \widehat{\alpha}\_{erj} &= \alpha\_{e0rj} + \widehat{n}^{(r)}/2, \\ \widehat{\beta}\_{erj} &= \beta\_{e0rj} + \left\{\boldsymbol{\Lambda}\_{0rj}^{\mathsf{T}}\mathbf{H}\_{0rj}^{-1}\boldsymbol{\Lambda}\_{0rj} + \widehat{n}^{(r)}\left(\mathbf{S}^{(r)}\_{yy(j),j} - 2\mu\_{rj}\overline{y}^{(r)}\_{(j)} + \mu\_{rj}^{2}\right) - \widehat{\boldsymbol{\Lambda}}\_{rj}^{\mathsf{T}}\widehat{\mathbf{H}}\_{rj}^{-1}\widehat{\boldsymbol{\Lambda}}\_{rj}\right\}/2, \end{aligned} \tag{32}$$

in which $\overline{y}^{(r)}_{(j)}$ is the $j$th element in $\overline{\mathbf{Y}}^{(r)}$, $\mathbf{S}^{(r)}_{yy(j),j}$ is the $j$th main diagonal element of $\mathbf{S}^{(r)}_{yy}$, and $\mathbf{S}^{(r)}_{\omega y(j)}$ is the $j$th column vector of $\mathbf{S}^{(r)}_{\omega y}$.
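The cluster statistics in Eq. (29) and the draw of $\boldsymbol{\mu}_r$ in Eqs. (31)–(32) can be sketched as follows; the array layouts ($N \times T \times p$ data, integer state labels) and variable names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def cluster_stats(Y, Omega, Z, r):
    """Sample means and cross-moments within cluster r, Eq. (29).

    Y     : (N, T, p) observations
    Omega : (N, T, m) factor scores
    Z     : (N, T) integer state labels
    """
    w = (Z == r)                  # indicator w_itr
    n_r = int(w.sum())            # cluster size n_hat^(r)
    Yr, Wr = Y[w], Omega[w]       # stacked y_it, omega_it with z_it = r
    Ybar = Yr.mean(axis=0)        # Y-bar^(r)
    Obar = Wr.mean(axis=0)        # Omega-bar^(r)
    Syy = Yr.T @ Yr / n_r         # S_yy^(r)
    Swy = Wr.T @ Yr / n_r         # S_wy^(r)
    Sww = Wr.T @ Wr / n_r         # S_ww^(r)
    return n_r, Ybar, Obar, Syy, Swy, Sww

def draw_mu_r(stats, Lambda_r, Psi_er_inv, mu_0r, Sigma_0r_inv, rng):
    """Draw mu_r from the normal posterior in Eqs. (31)-(32)."""
    n_r, Ybar, Obar, *_ = stats
    Sigma_hat = np.linalg.inv(Sigma_0r_inv + n_r * Psi_er_inv)
    a_hat = Sigma_hat @ (Sigma_0r_inv @ mu_0r
                         + n_r * Psi_er_inv @ (Ybar - Lambda_r @ Obar))
    return rng.multivariate_normal(a_hat, Sigma_hat)
```

The inverse-gamma/normal draw for $(\Psi_{erj}, \boldsymbol{\Lambda}_{rj})$ follows the same pattern, row by row of $\boldsymbol{\Lambda}_r$.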

#### (d) $p(\boldsymbol{\Phi}|\boldsymbol{\Omega}, \mathbf{Z})$


From the prior distribution of $\boldsymbol{\Phi}_r^{-1}$ and the distribution of $\boldsymbol{\Omega}$, it can be shown that

$$p(\boldsymbol{\Phi}\_r|\boldsymbol{\Omega}, \mathbf{Z}) \propto |\boldsymbol{\Phi}\_r|^{-\left(\widehat{n}^{(r)} + \rho\_{0r} + m + 1\right)/2} \exp\left\{-\frac{1}{2}\mathrm{tr}\,\boldsymbol{\Phi}\_r^{-1}\left(\widehat{n}^{(r)}\mathbf{S}\_{\omega\omega}^{(r)} + \mathbf{R}\_0^{-1}\right)\right\} \tag{33}$$

where $\widehat{n}^{(r)}$ and $\mathbf{S}^{(r)}_{\omega\omega}$ are given in (c). Hence, $p(\boldsymbol{\Phi}_r|\boldsymbol{\Omega}, \mathbf{Z})$ is the $m$-dimensional inverse Wishart distribution $\mathcal{W}^{-1}_m\left(\widehat{n}^{(r)} + \rho_{0r}, \widehat{n}^{(r)}\mathbf{S}^{(r)}_{\omega\omega} + \mathbf{R}_0^{-1}\right)$. It can be shown from exactly the same reasoning as before that drawing $\boldsymbol{\Phi}$ can be achieved by drawing $\boldsymbol{\Phi}_r$ from $p(\boldsymbol{\Phi}_r|\boldsymbol{\Omega}, \mathbf{Z})$ independently.
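Assuming SciPy's `invwishart` parameterisation (degrees of freedom and scale matrix), the draw of $\boldsymbol{\Phi}_r$ from Eq. (33) can be sketched in one line; the argument names are illustrative.

```python
import numpy as np
from scipy.stats import invwishart

def draw_phi_r(n_r, S_ww_r, rho_0r, R0_inv, rng):
    """Draw Phi_r ~ W^{-1}_m(n_r + rho_0r, n_r * S_ww^(r) + R0^{-1}), Eq. (33)."""
    return invwishart.rvs(df=n_r + rho_0r,
                          scale=n_r * S_ww_r + R0_inv,
                          random_state=rng)
```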

#### (e) $p(\boldsymbol{\delta}|\mathbf{Z})$ and (f) $p(\mathbf{Q}|\mathbf{Z})$

It can be verified directly that

$$p(\boldsymbol{\delta} | \mathbf{Z}) \stackrel{\mathcal{D}}{=} \mathcal{D}ir\_S\left(\gamma\_0 + \widehat{n}\_{11}, \dots, \gamma\_0 + \widehat{n}\_{1S}\right) \tag{34}$$

in which $\widehat{n}_{1r} = \sum_{i=1}^{N} I\{z_{i1} = r\}$. Similarly, it can be shown that

$$\begin{aligned} p(\mathbf{Q}|\mathbf{Z}) &= \prod\_{r=1}^{S} p(\mathbf{Q}\_r|\mathbf{Z}), \\ p(\mathbf{Q}\_r|\mathbf{Z}) &\stackrel{\mathcal{D}}{=} \mathcal{D}ir\_S\left(\nu\_0 + \widehat{n}\_{r1}, \dots, \nu\_0 + \widehat{n}\_{rS}\right), \end{aligned} \tag{35}$$

in which $\widehat{n}_{rs} = \sum_{i=1}^{N}\sum_{t=2}^{T} I\{z_{i,t-1} = r, z_{it} = s\}$.
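The Dirichlet draws in Eqs. (34)–(35) reduce to counting initial states and transitions. A sketch with illustrative names (`gamma0` and `nu0` are the prior hyperparameters $\gamma_0$, $\nu_0$; states are coded $0, \ldots, S-1$):

```python
import numpy as np

def draw_delta_Q(Z, S, gamma0, nu0, rng):
    """Draw delta and Q from the Dirichlet posteriors in Eqs. (34)-(35).

    Z : (N, T) sampled state paths, values in {0, ..., S-1}
    """
    # n_hat_{1r}: counts of initial states z_{i1}, Eq. (34)
    n1 = np.bincount(Z[:, 0], minlength=S)
    delta = rng.dirichlet(gamma0 + n1)
    # n_hat_{rs}: transition counts over t = 2, ..., T, Eq. (35)
    n = np.zeros((S, S))
    np.add.at(n, (Z[:, :-1].ravel(), Z[:, 1:].ravel()), 1)
    Q = np.vstack([rng.dirichlet(nu0 + n[r]) for r in range(S)])
    return delta, Q
```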

#### Author details

Yemao Xia<sup>1</sup>\*, Xiaoqian Zeng<sup>2</sup> and Niansheng Tang<sup>3</sup>

\*Address all correspondence to: ymxia@njfu.edu.cn

1 Department of Applied Mathematics, Nanjing Forestry University, Nanjing, China

2 School of Economics, Lanzhou University of Finance and Economics, Lanzhou, China

3 School of Mathematics and Statistics, Yunnan University, Kunming, China


#### References

[1] Berger JO. Statistical Decision Theory and Bayesian Analysis. New York: Springer-Verlag; 1985. DOI: 10.1007/978-1-4757-4286-2

[2] Box GEP, Tiao GC. Bayesian Inference in Statistical Analysis. Reading, MA: Addison-Wesley; 1973. DOI: 10.1002/9781118033197

[3] Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis. London: Chapman & Hall Ltd; 1995

[4] Geman S, Geman D. Stochastic relaxation, Gibbs distribution, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1984;(6):721-741. DOI: 10.1109/TPAMI.1984.4767596

[5] Gelfand AE, Smith AFM. Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association. 1990;85:398-409. DOI: 10.1080/01621459.1990.10476213

[6] Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E. Equations of state calculations by fast computing machine. Journal of Chemical Physics. 1953;21:1087-1091

[7] Hastings WK. Monte Carlo sampling methods using Markov chains and their applications. Biometrika. 1970;57(1):97-109. DOI: 10.1093/biomet/57.1.97

[8] Robert CR, Casella G. Monte Carlo Statistical Methods. New York: Springer-Verlag, Inc.; 1999. DOI: 10.1007/978-1-4757-3071-5

[9] Ross SM. Simulations. Amsterdam: Academic Press/Elsevier, Inc.; 2013. DOI: 10.1016/B978-0-12-375686-2.00001-7

[10] Schmittmann VD, Dolan CV, van der Maas HLJ, Neale MC. Discrete latent Markov models for normally distributed response data. Multivariate Behavioral Research. 2005;40(4):461-488. DOI: 10.1207/s15327906mbr4004_4

[11] Xia YM, Gou JW, Liu YA. Semi-parametric Bayesian analysis for factor analysis model mixed with hidden Markov model. Applied Mathematics A Journal of Chinese Universities, Series A. 2015;30(1):17-30

[12] Song XY, Xia YM, Zhu HT. Hidden Markov latent variable models with multivariate longitudinal data. Biometrics. 2017;73(1):313-323. DOI: 10.1111/biom.12536

[13] Xia YM, Tang NS, Gou JW. Generalized linear latent model for multivariate longitudinal measurements mixed with hidden Markov model. Journal of Multivariate Analysis. 2017;152:259-275. DOI: 10.1016/j.jmva.2016.09.001

[14] Wiggins LM. Panel Analysis: Latent Probability Models for Attitude and Behavior Processes. San Francisco, CA: Elsevier Scientific; 1973

[15] Rabiner LR. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE. 1989;77(2):257-284. DOI: 10.1109/5.18626

[16] Altman RM. Mixed hidden Markov models: An extension of the hidden Markov model to the longitudinal data setting. Journal of the American Statistical Association. 2007;102(477):201-210. DOI: 10.1198/016214506000001086

[17] Maruotti A. Mixed hidden Markov models for longitudinal data: An overview. International Statistical Review. 2011;79(3):427-454. DOI: 10.1111/j.1751-5823.2011.00160.x

[18] Lee SY. Structural Equation Modelling: A Bayesian Approach. New York: John Wiley & Sons; 2007

[19] Dunson DB. Dynamic latent trait models for multidimensional longitudinal data. Journal of the American Statistical Association. 2003;98(463):555-563. DOI: 10.1198/016214503000000387

[20] Zhang ZY, Hamaker EL, Nesselroade JR. Comparisons of four methods for estimating a dynamic factor model. Structural Equation Modeling: A Multidisciplinary Journal. 2008;15(3):377-402. DOI: 10.1080/10705510802154281

[21] Chow SY, Tang NS, Yuan Y, Song XY, Zhu HT. Bayesian estimation of semiparametric nonlinear dynamic factor analysis model using the Dirichlet prior. British Journal of Mathematical and Statistical Psychology. 2011;64:69-106

[22] Ebbes P, Grewal R, DeSarbo WS. Modeling strategic group dynamics: A hidden Markov approach. Quantitative Marketing and Economics. 2010;8:241-274

[23] Maruotti A. Robust fitting of hidden Markov regression models under a longitudinal setting. Journal of Statistical Computation and Simulation. 2014;84:1728-1747

[24] Zhu HT, Lee SY. A Bayesian analysis of finite mixtures in the LISREL model. Psychometrika. 2001;66(1):133-152. DOI: 10.1007/BF02295737

[25] Tanner MA, Wong WH. The calculation of posterior distributions by data augmentation (with discussion). Journal of the American Statistical Association. 1987;82(398):528-550. DOI: 10.1080/01621459.1987.10478458

[26] Geyer CJ. Practical Markov chain Monte Carlo. Statistical Science. 1992;7(4):473-511. DOI: 10.1214/ss/1177011137

[27] Gelman A, Rubin DB. Inference from iterative simulation using multiple sequences (with discussion). Statistical Science. 1992;7(4):457-511. DOI: 10.1214/ss/1177011136

[28] Besag J, Green P, Higdon D, Mengersen K. Bayesian computation and stochastic systems. Statistical Science. 1995;10(1):3-66. DOI: 10.1214/ss/1177010123

[29] Gelman A. Inference and monitoring convergence. In: Gilks WR, Richardson S, Spiegelhalter DJ, editors. Markov Chain Monte Carlo in Practice. London: Chapman and Hall; 1996. pp. 131-140

[30] Kass RE, Raftery AE. Bayes factors (with discussion). Journal of the American Statistical Association. 1995;90(430):773-795. DOI: 10.1080/01621459.1995.10476572

[31] Geisser S, Eddy W. A predictive approach to model selection. Journal of the American Statistical Association. 1979;74(365):153-160. DOI: 10.1080/01621459.1979.10481632

[32] Laud PW, Ibrahim JG. Predictive model selection. Journal of the Royal Statistical Society, Series B. 1995;57(1):247-262. DOI: 10.2307/2346098

[33] Gelfand AE, Ghosh SK. Model choice: A minimum posterior predictive loss approach. Biometrika. 1998;85(1):1-13. DOI: 10.1093/biomet/85.1.1

[34] Ibrahim JG, Chen MH, Sinha D. Criterion based methods for Bayesian model assessment. Statistica Sinica. 2001;11:419-443

[35] Little RJA, Rubin DB. Statistical Analysis with Missing Data. New York: Wiley; 1987. DOI: 10.1002/9781119013563

[36] Wei GCG, Tanner MA. A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms. Journal of the American Statistical Association. 1990;85:699-704

[37] Louis TA. Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society, Series B. 1982;44:226-233

[38] Cappé O, Moulines E, Rydén T. Inference in Hidden Markov Models. New York: Springer-Verlag; 2005

**Chapter 4**

#### **Dynamic Process Model Parameter Estimation by Global System Analysis**

Shigeru Kashiwaya

DOI: 10.5772/intechopen.74635


Additional information is available at the end of the chapter


#### Abstract

Global system analysis (GSA) was applied to parameter estimation of dynamic process models. First, the posterior distribution of the model parameters was estimated by quasi-Monte Carlo (QMC) simulations, or uncertainty analysis. The expected variance of the parameters estimated by GSA was in general smaller than that obtained by a local search for the maximum likelihood. Second, sensitivity analysis was performed as an alternative application of GSA for the same mathematical models and testing data. The total effect index should serve as a quantitative measure of the robustness of each estimated parameter. Two process models were studied to demonstrate the effectiveness of the proposed methodology based on GSA: a bio-reactor and a catalytic reactor. Parallelised computation allowed for sampling as many as 500,000 combinations of the model parameters in a reasonable amount of time.

Keywords: parameter estimation, global system analysis, quasi-Monte Carlo, sensitivity analysis, high-performance computation

#### 1. Introduction

We try to interpret, understand, and extract knowledge by analysing the data obtained from measurements and experiments and by recording the behaviour of the various processes that we encounter in industry and real life.

Parameter estimation of dynamic process models is investigated in this study. To validate a mathematical process model, we need to match the model prediction with the measurement or experimental data.

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

We assume that we can construct a certain mathematical or a first principle model to represent a chemical or biochemical process by a set of algebraic (either linear or nonlinear) and differential (either ordinary or partial) equations, that is, differential algebraic equations (DAEs).

Dynamic Process Model Parameter Estimation by Global System Analysis
http://dx.doi.org/10.5772/intechopen.74635

Bayesian statistics provides an ideal framework for the estimation of the parameters of predictive models, a problem that is inherently stochastic in that all models make imperfect predictions because of process variance, observation errors, model selection uncertainties, and so on.

The uncertainty of the estimated parameters must be evaluated in terms of accuracy or posterior distribution in the Bayesian statistics context.

Although maximum likelihood estimation (MLE) has been the standard tool for solving parameter estimation problems, some shortcomings have been recognised in terms of its properties and the conventional solution algorithms for the MLE (see [1]):

1. Gradient-based methods (GBM) may miss the global optimum or true MLE because of the nature of such solvers, which are based on local search mechanisms.

2. The confidence region for the parameters can be very large when the curvature of the likelihood function at an optimal point is extrapolated without knowing the global distribution of the likelihood function around the point estimate.

3. If the model equations contain singularities or structural redundancy, correlations among the estimated parameters may be observed.


Global system analysis (GSA) is designed to provide advanced features compared with conventional analytical methods such as GBM and MLE. GSA is not precisely the same theoretical framework as the Bayesian method; however, we can apply GSA as a version of computational Bayesian methodology to estimate model parameters.

First, GSA is a global search procedure, as opposed to GBM such as Newton's method, where the search must start from some initial guess and can therefore be trapped at a local optimum that is not necessarily the global optimum.

Second, the Bayesian Cramer-Rao (BCR) bounds offer tighter lower bounds than the conventional Cramer-Rao lower bound, which is based on Fisher information. GSA can calculate the BCR under the assumption of a prior that can be uniform (non-informative) or any probability distribution such as a Gaussian.

Third, the posterior distribution of the estimated model parameters obtained by GSA correctly reflects the non-linearity of the process. Some process kinetics equations, such as the Arrhenius equation, present structural correlation, for example, between pre-exponential factors and activation energies. Even in such a case, GSA can correctly capture the interdependency among the parameters, so that the accuracy of the posterior distribution of the parameters is evaluated properly.

Since the GSA algorithm is deterministic, one can decide the required number of samples prior to the numerical simulation, depending on the required accuracy or convergence. The computation can also be parallelised, so that the actual CPU time can be significantly reduced. This is a clear advantage over Markov chain Monte Carlo (MCMC) sampling, where we need to monitor the convergence to make sure the sample distribution has approached the intended convergence criterion.

GSA is applied to parameter estimation taking the likelihood function as the response to the factors, which are in fact model parameters that are considered stochastic variables in the Bayesian statistics framework. The parameter search space may be either non-informative or informative in GSA: low discrepancy sequence (LDS) such as Sobol sequence is considered non-informative; or normal (Gaussian) distribution may also be assumed to perform sampling in the multidimensional parameter space (more distribution functions such as gamma or inverse-gamma will be available in the GSA tool that was used for this study).

The posterior of the model parameters can be calculated simply by integrating the sampled likelihood (response) over the entire parameter space (factors). We can then calculate the posterior mean and variance as well as the maximum a posteriori (MAP) estimate.
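As a concrete illustration of this step, the sketch below samples a one-dimensional parameter space on a regular grid, records the likelihood as the response, and recovers the posterior mean, variance, and MAP estimate by normalising the sampled likelihood under a flat prior. The data, the known noise level, and the search interval are all hypothetical; they are not taken from the chapter's case studies.

```python
import math

# Hypothetical data: five noisy observations of an unknown mean theta* (sigma known).
y = [1.9, 2.2, 2.0, 2.1, 1.8]
sigma = 0.2

def likelihood(theta):
    """L(y|theta) for i.i.d. Gaussian errors (an Eq. 4-style product)."""
    L = 1.0
    for yi in y:
        L *= math.exp(-(yi - theta) ** 2 / (2 * sigma ** 2)) \
             / math.sqrt(2 * math.pi * sigma ** 2)
    return L

# Flat (non-informative) prior over the search interval [1, 3]:
# sample the factor space and record the response (the likelihood).
n = 2001
thetas = [1.0 + 2.0 * k / (n - 1) for k in range(n)]
w = [likelihood(t) for t in thetas]

Z = sum(w)  # normalising constant (integral of likelihood x prior)
post_mean = sum(t * wi for t, wi in zip(thetas, w)) / Z
post_var = sum((t - post_mean) ** 2 * wi for t, wi in zip(thetas, w)) / Z
theta_map = thetas[max(range(n), key=w.__getitem__)]
```

With a flat prior the MAP estimate coincides with the MLE, so for this Gaussian toy problem both land on the sample mean of the data.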

Sensitivity analysis can also be performed by sampling in the hyper-dimensional space just like in the case of uncertainty analysis. The factor sensitivity table gives information about the influence of each parameter on the response, which is the likelihood function of the model postulated for the investigation.

If the process model contains singularities or correlations among the parameters, parameter estimation becomes very difficult, if not theoretically impossible. Although a number of different procedures have been proposed for re-parameterisation of process models, such manipulations obscure the physical significance of the processes of interest, so the reformulated parameter estimates would not be very useful for understanding the fundamental principles of the actual processes.

Since GSA is computational-resource intensive, parallelisation should be considered to make the analysis tractable. Recent advances in software technology have made it possible to run relatively large GSA, with as many as 500,000 samples, even on a common laptop PC with a multi-core CPU.

### 2. Parameter estimation by global system analysis

GSA helps to investigate the global behaviour of any system model. In contrast to simulation, in which specific values are assigned to system inputs and the values of key outputs are reported, GSA makes it possible to specify a range of input values of interest and therefore obtain a range of outputs.

GSA can perform three types of analysis:

• Scenario or parametric
• Uncertainty
• Sensitivity

The purpose of each type of analysis differs widely, as follows:

• Scenario or parametric sensitivity analysis is a series of simulations in which the values of one or more factors are varied over a grid to investigate the resulting changes in one or more responses. It allows an assessment of the impact of different factors on each response.

• Uncertainty analysis propagates the uncertainty in the factors to the responses. The values returned represent the actual uncertainty associated with the responses.

• Sensitivity analysis provides metrics which indicate how factors and their uncertainty influence responses. However, the values obtained do not represent actual sensitivities.


New Insights into Bayesian Inference


When the likelihood function is taken as the objective or the only output from the system whereas the model parameters are given as inputs, GSA can be utilised as a global search and uncertainty evaluation for the model parameter estimation. The uncertainty analysis was used to identify the parameter point estimates and confidence intervals. In addition, the sensitivity analysis was performed to quantify the influence of each parameter on the output or prediction of the process models that are investigated.

#### 2.1. Parameter estimation

The Bayesian approach to model parameter estimation can be presented as Eq. 1.

$$
\pi(\boldsymbol{\Theta}|\mathbf{y}) \propto L(\mathbf{y}|\boldsymbol{\Theta})p(\boldsymbol{\Theta}) \tag{1}
$$


where $p(\boldsymbol{\Theta})$ designates a prior distribution of the model parameters, $L(\mathbf{y}\mid\boldsymbol{\Theta})$ is the likelihood with which one observes the response $\mathbf{y} = \left\{\widehat{y}_{ijk}\right\}$, and $\pi(\boldsymbol{\Theta}\mid\mathbf{y})$ is the posterior.

When certain a priori knowledge is available about the process, it can be incorporated in the Bayesian framework as a prior. In this study, however, we assumed a flat or homogeneous prior over the parameter search space.

The MAP estimate is defined by the parameter that maximises the posterior probability given as Eq. 2.

$$\Theta\_{MAP} = \arg\max\_{\Theta} \left( \pi(\Theta|\mathbf{y}) \right) \tag{2}$$

The system responses are parameterised by a set of parameters in the following equation, where $\boldsymbol{\Theta}^{*}$ denotes the true parameters that characterise the system. We assume additive errors that are i.i.d., that is, independent and identically distributed following a normal distribution.

$$\widehat{y}_{ijk} = g_{j}\left(\mathbf{x}_{i}; \boldsymbol{\Theta}^{*}\right) + \varepsilon_{ijk} \quad (i = 1, 2, \ldots, NE) \tag{3}$$

The probability $f$ with which one observes measurement $\widehat{y}_{ijk}$ is formulated as Eq. 4.

$$f\left(\widehat{y}_{ijk}\mid\boldsymbol{\Theta},\sigma_{ijk}^{2}\right) = \left(2\pi\sigma_{ijk}^{2}\right)^{-1/2}\exp\left(-\frac{\left(\widehat{y}_{ijk}-g_{j}\left(\mathbf{x}_{i};\boldsymbol{\Theta}\right)\right)^{2}}{2\sigma_{ijk}^{2}}\right) \tag{4}$$

The log-likelihood $LL(\boldsymbol{\Theta}) \equiv -\log L(\boldsymbol{\Theta})$ is defined from the likelihood $L(\boldsymbol{\Theta}) \equiv L(\mathbf{y}\mid\boldsymbol{\Theta}) = \prod_{ijk} f\left(\widehat{y}_{ijk}\mid\boldsymbol{\Theta},\sigma_{ijk}^{2}\right)$, which is a function of the parameters $\boldsymbol{\Theta} = \left(\theta_{1},\cdots,\theta_{q}\right)$.

$$\begin{split} LL(\boldsymbol{\Theta}) &= -\log L(\boldsymbol{\Theta}) = -\log\left[\prod_{i=1}^{NE}\prod_{j=1}^{NV_{i}}\prod_{k=1}^{NM_{ij}} f\left(\widehat{y}_{ijk}\mid\boldsymbol{\Theta},\sigma_{ijk}^{2}\right)\right] \\ &= \frac{n}{2}\log\left(2\pi\right) + \frac{1}{2}\sum_{i=1}^{NE}\sum_{j=1}^{NV_{i}}\sum_{k=1}^{NM_{ij}}\left[\log\left(\sigma_{ijk}^{2}\right)+\frac{\left(\widehat{y}_{ijk}-g_{j}\left(\mathbf{x}_{i};\boldsymbol{\Theta}\right)\right)^{2}}{\sigma_{ijk}^{2}}\right]\end{split} \tag{5}$$

where:

• $f\left(\widehat{y}_{ijk}\mid\boldsymbol{\Theta},\sigma_{ijk}^{2}\right)$, likelihood of measurement $\widehat{y}_{ijk}$ under normal (Gaussian) error with mean $g_{j}\left(\mathbf{x}_{i};\boldsymbol{\Theta}\right)$ and variance $\sigma_{ijk}^{2}$;
• $g_{j}\left(\mathbf{x}_{i}\right)$, simulated model response of variable $j$ in experiment $i$;
• $LL(\boldsymbol{\Theta}) \equiv -\log L(\mathbf{y}\mid\boldsymbol{\Theta})$, logarithmic likelihood function;
• $n$, number of measurements taken during all experiments;
• $NE$, number of experiments performed;
• $NM_{ij}$, number of measurements of the $j$-th variable in experiment $i$;
• $NV_{i}$, number of variables measured in the $i$-th experiment;
• $q$, number of model parameters;
• $\mathbf{x}_{i}$, operating conditions of the $i$-th experiment;
• $\widehat{y}_{ijk}$, $k$-th measured value of variable $j$ in experiment $i$;
• $\varepsilon_{ijk}$, error of the $k$-th measurement of variable $j$ in experiment $i$;
• $\boldsymbol{\Theta} = \left(\theta_{1},\cdots,\theta_{q}\right)$, model parameters to be estimated; and
• $\sigma_{ijk}^{2}$, variance of the $k$-th measurement of variable $j$ in experiment $i$.

The MLE is defined by the parameter that minimises the log-likelihood function:

$$\Theta\_{MLE} = \arg\min\_{\Theta} \left( LL(\Theta) \right) \tag{6}$$
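To make Eqs. (5) and (6) concrete, the sketch below evaluates the negative log-likelihood for a hypothetical one-parameter linear model $g(x_i;\theta)=\theta x_i$ with a single experiment, equal measurement variances, and invented data, and then locates $\Theta_{MLE}$ by a dense scan of the parameter interval, in the spirit of GSA's global search rather than a gradient-based local search. All names and numbers are illustrative assumptions.

```python
import math

# Hypothetical single-experiment, single-variable data set: y ~ theta* x + noise,
# with a known measurement variance sigma2 (all sigma_ijk equal here).
x = [1.0, 2.0, 3.0, 4.0]
y_hat = [2.1, 3.9, 6.2, 7.8]   # measurements
sigma2 = 0.04

def g(xi, theta):
    return theta * xi          # model response g_j(x_i; theta)

def LL(theta):
    """Negative log-likelihood, Eq. (5), for n = len(y_hat) measurements."""
    n = len(y_hat)
    s = (n / 2) * math.log(2 * math.pi)
    for xi, yi in zip(x, y_hat):
        s += 0.5 * (math.log(sigma2) + (yi - g(xi, theta)) ** 2 / sigma2)
    return s

# Global scan of the parameter space, Eq. (6): theta_MLE = argmin LL(theta)
thetas = [1.0 + 2.0 * k / 4000 for k in range(4001)]   # theta in [1, 3]
theta_mle = min(thetas, key=LL)
```

For this linear model with equal variances the scan lands on the least-squares value, sum(x*y)/sum(x*x).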

#### 2.2. Global system analysis


GSA is based on drawing samples from the system under study, that is, taking different values for each factor and calculating the corresponding response values. One of the features and basic principles underlying GSA is that extracting random samples from a system permits the estimation of its mean value.

Consider the surface shown in Figure 1; Figure 2 indicates what its average value will be for a number of random sample points. The dispersion of the values is due to the fact that this process is repeated 100 times for each number of samples, each time generating a different random set of sample points. This is indicative of how accurate the results will be for different numbers of samples.


Figure 1. Response surface for two inputs.

Figure 2. Spread of averages for different number of samples.

It is clear that by increasing the number of points, values gradually become closer to the true mean value of the surface. Increasing the number of sample points will generally lead to more accurate results.
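The behaviour in Figures 1 and 2 can be reproduced in a few lines of pure Python. The response surface below is hypothetical (it is not the surface of Figure 1); the point is only that the spread of repeated random-sampling averages shrinks roughly as $1/\sqrt{n}$.

```python
import random
import statistics

def f(x1, x2):
    # hypothetical smooth response surface over [0, 10] x [0, 10]
    return x1 ** 2 + 3 * x2

def mc_average(n, rng):
    """Average of f over n uniform random sample points."""
    return sum(f(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(n)) / n

rng = random.Random(0)
# Repeat the estimate 100 times for each sample size and look at the spread.
spread = {}
for n in (10, 100, 1000):
    estimates = [mc_average(n, rng) for _ in range(100)]
    spread[n] = statistics.stdev(estimates)
```

Each tenfold increase in the number of samples shrinks the spread of the averages by roughly a factor of three, mirroring Figure 2.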

#### 2.3. Uncertainty analysis

An uncertainty analysis can be used to determine what effect the uncertainty in the factors has on the uncertainty in the responses. This type of analysis can be accomplished using the Monte Carlo method. The Monte Carlo method performs multiple model evaluations (simulations) with deterministic and/or probabilistic factors. The results of these evaluations are then used to determine the uncertainty in the responses. This method is appropriate for any type of model, regardless of complexity.

Pseudo-random sampling generates a more realistic random sample, that is, resembling a sample drawn from a distribution. This method should be used to represent realistic sampling, for example, a clinical trial in which 200 people are sampled from the general population.

Quasi-random (Sobol) sampling (Figure 3) allows for better coverage of space compared to pseudo-random sampling (Figure 4). Sobol sampling is recommended unless a more realistic random behaviour is required.

Figure 3. Quasi-random (Sobol) sampling.

Figure 4. Pseudo-random sampling.
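The coverage difference between quasi-random and pseudo-random sampling can be seen even in one dimension. The sketch below uses the base-2 van der Corput sequence as a simple stand-in for the Sobol sequence (the Sobol sequence is its multidimensional generalisation) and compares the largest uncovered gap each sample leaves in [0, 1]; the seed and sample count are arbitrary choices.

```python
import random

def van_der_corput(i):
    """Base-2 radical inverse: a simple 1-D low-discrepancy sequence."""
    x, denom = 0.0, 1.0
    while i:
        denom *= 2.0
        x += (i % 2) / denom
        i //= 2
    return x

n = 256
quasi = sorted(van_der_corput(i) for i in range(n))
pseudo = sorted(random.Random(1).random() for _ in range(n))

def max_gap(pts):
    # largest hole left uncovered in [0, 1]: a crude coverage measure
    return max(b - a for a, b in zip([0.0] + pts, pts + [1.0]))
```

The first 256 van der Corput points tile the interval with a uniform gap of 1/256, whereas the pseudo-random sample leaves much larger holes, which is exactly the contrast between Figures 3 and 4.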


#### 2.4. Sensitivity analysis

Sensitivity analysis can be used to determine which factor contributes the most to the uncertainty in a response. In GSA, the calculated sensitivity measures are sensitivity indices rather than mathematical sensitivities (derivatives). For example, consider the following equation

$$y = x_{1}^{2} + 3x_{2} \tag{7}$$


where all variables are algebraic and both x1 and x2 are assumed to have values between 0 and 10. Their mathematical sensitivities are given by:

$$\frac{\partial y}{\partial x\_1} = 2x\_1 \tag{8}$$

$$\frac{\partial y}{\partial x\_2} = 3 \tag{9}$$

where $y$ and $x_{i}$, algebraic variables; and $\partial y/\partial x_{i}$, partial derivative of $y$ with respect to $x_{i}$.

These mathematical sensitivities are informative at the point where they are computed, but are limited when the factors are uncertain, and the model is of unknown linearity, as they do not provide for an exploration of the rest of the input factor space. In contrast, GSA calculates sensitivity measures that are averages over the entire space of interest. Two different indices are available: elementary effects and variance-based indices, which provide relative metrics.

In both cases, indices for the first-order and total effects are calculated. According to Sobol [2, 3], any function can be expressed either as follows:

$$y = f\left(u\_1, \ldots, u\_{n\_f}\right) \tag{10}$$

where $f$, function; $n_{f}$, number of factors; $u_{i}$, factor $i$; and $y$, response.

or as a combination of different functions:

$$y = f_{0} + \sum_{i=1}^{n_{f}} f_{i}(u_{i}) + \sum_{i<j}^{n_{f}} f_{ij}\left(u_{i}, u_{j}\right) + \cdots + f_{1,2,\ldots,n_{f}}\left(u_{1}, u_{2}, \ldots, u_{n_{f}}\right) \tag{11}$$

where $f_{i}$, first-order function; $f_{ij}$, second-order function; and $f_{1,2,\ldots,n_{f}}$, $n_{f}$-th order function.

The first-order effect of factor $u_{i}$ is $f_{i}$. In contrast, the total effect is the sum of all functions of all orders in which $u_{i}$ appears as an argument.

Elementary effects are an efficient method for larger, more complex models with a large number of factors. This method requires a limited number of samples. However, it is not as accurate as the variance-based method and should only be used to identify the most important factors.

For elementary effects, the sensitivity indices are calculated based on the following approximation.

$$EE_{i,j} = \frac{y_{j}\left(u_{1}, \ldots, u_{i} + \Delta, \ldots, u_{n_{f}}\right) - y_{j}\left(u_{1}, \ldots, u_{i}, \ldots, u_{n_{f}}\right)}{\Delta} \quad \forall i \in \left\{1, \ldots, n_{f}\right\} \tag{12}$$

where $EE_{i,j}$, elementary effect for factor $i$ for sample $j$; $u_{i}$, factor $i$; $y_{j}$, response of sample $j$; and $\Delta$, difference.

The mean values are calculated as follows:


$$\mu\_i = \frac{1}{n} \sum\_{j=1}^n EE\_{i,j} \tag{13}$$

where n, number of samples; and μi, mean elementary effect for factor i.

Although this appears to be an approximation of the mean partial derivative, it is typically not an accurate estimate of the actual mean sensitivity. Among other differences, the elementary effect also considers the variance in each factor. For example, considering the equation y = x1 + x2, the value of the mathematical sensitivity for both x1 and x2 is one, but if the variance in x1 is larger than that in x2, then μ1 > μ2.

The elementary effects method also calculates the standard deviation as follows.

$$\sigma\_{i} = \sqrt{\frac{1}{n-1} \sum\_{j=1}^{n} \left(EE\_{i,j} - \mu\_{i}\right)^{2}}\tag{14}$$

where σi is the standard deviation of the elementary effect for factor i.

The advantage of elementary effects analysis is that it is a numerically efficient way to screen many factors at the expense of accuracy. It has been shown in practice that the relative value of the sensitivity is fairly accurate, so values sorted from high to low do typically follow the order of the most influential to the least influential factor.
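Eqs. (12)-(14) can be made concrete with a short Python sketch applied to the toy response of Eq. (7). The sample count, the step Δ, and the use of raw (unscaled) factors are illustrative assumptions; screening tools often rescale factors to the unit interval first, which multiplies the effects by each factor's range.

```python
import numpy as np

def model(u):
    # Toy response from Eq. (7): y = x1^2 + 3*x2
    return u[0] ** 2 + 3.0 * u[1]

def elementary_effects(f, lo, hi, n_samples=1000, delta=1e-2, seed=0):
    """One-at-a-time elementary effects: Eq. (12) per sample,
    then the mean (Eq. 13) and standard deviation (Eq. 14) per factor."""
    rng = np.random.default_rng(seed)
    n_factors = len(lo)
    ee = np.empty((n_samples, n_factors))
    for j in range(n_samples):
        u = rng.uniform(lo, hi)          # random base point in the factor space
        y0 = f(u)
        for i in range(n_factors):
            up = u.copy()
            up[i] += delta               # perturb factor i only
            ee[j, i] = (f(up) - y0) / delta
    return ee.mean(axis=0), ee.std(axis=0, ddof=1)

mu, sigma = elementary_effects(model,
                               lo=np.array([0.0, 0.0]),
                               hi=np.array([10.0, 10.0]))
# mu[0] is near the mean of 2*x1 over [0, 10]; mu[1] equals 3 for the linear term
```

Sorting the |μi| from high to low then gives the screening order discussed above.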

Variance-based methods are more accurate but more computationally demanding and require (many) more samples than the elementary effects method. The variance-based method was applied to the following numerical examples.

The metrics used are based on variances and are indicative of what part of the variance, in the response, is attributed to the variance in each factor. Note that the relative contribution of the factors to the variance in the response can be sorted differently for each response.

This method distinguishes between first-order and total effect indices.

Variance-based sensitivity indices measure the influence of individual factors on responses. The variance-based sensitivity analysis method is based on Saltelli's method [4], and the formulae used to estimate the sensitivity indices are those proposed in [5]. There are two types of variance-based sensitivity indices (see [2, 3]):

• The first-order effect, that is, the direct effect of factor i on response y:

$$S\_i = \frac{\sigma\_{i,y}^2}{\sigma\_y^2} \tag{15}$$


Dynamic Process Model Parameter Estimation by Global System Analysis

http://dx.doi.org/10.5772/intechopen.74635


where Si, first-order effect index for factor i; σ²i,y, variance attributed to factor i on response y; and σ²y, variance observed for response y.

• The total effect, that is, the total effect of factor i on response, y:

$$S_{i,T} = \frac{\sigma_{i,T,y}^2}{\sigma_y^2} \tag{16}$$

where Si,T, total effect index for factor i; and σ²i,T,y, variance attributed to factor i on response y through all factors.

In this case, the influence of factor i through other variables is taken into consideration.

In variance-based sensitivity analysis, the sensitivity indices are based on variances. Note that the relationships below can be inferred from the definitions of the factors:

$$\sum\_{i \in \text{factors}} S\_{j,i} \quad \le \quad 1 \tag{17}$$

$$S_{j,i,T} \quad \ge \quad S_{j,i} \tag{18}$$

$$\sum\_{i \in \text{factors}} S\_{j,i,T} \quad \ge \quad 1 \tag{19}$$

where Sj,i, first-order effect index for factor i in response j; and Sj,i,T, total effect index for factor i in response j.

It is clear that mathematical sensitivities and sensitivity indices have very different meanings. For simplicity, consider the sensitivity index as a normalised metric which indicates what part of the variance in the response can be attributed to which factor.
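The first-order and total indices of Eqs. (15) and (16) can be estimated by Monte Carlo sampling in the A/B/AB_i style associated with Saltelli's method; the estimator formulas below are a common choice and an assumption on my part, not necessarily those of [5]. Applied to the toy function of Eq. (7) over the range 0-10, the first-order index of x1 comes out near 0.92, since the variance of x1² dominates the response variance.

```python
import numpy as np

def model(u):
    # Toy response from Eq. (7), vectorised over rows of u
    return u[:, 0] ** 2 + 3.0 * u[:, 1]

def sobol_indices(f, lo, hi, n=100_000, seed=0):
    """Monte Carlo estimates of first-order (Eq. 15) and total (Eq. 16)
    indices using two independent sample matrices A and B."""
    rng = np.random.default_rng(seed)
    k = len(lo)
    A = rng.uniform(lo, hi, size=(n, k))
    B = rng.uniform(lo, hi, size=(n, k))
    fA, fB = f(A), f(B)
    V = np.var(np.concatenate([fA, fB]), ddof=1)    # total response variance
    S, ST = np.empty(k), np.empty(k)
    for i in range(k):
        AB = A.copy()
        AB[:, i] = B[:, i]                          # resample factor i only
        fAB = f(AB)
        S[i] = np.mean(fB * (fAB - fA)) / V         # first-order effect S_i
        ST[i] = 0.5 * np.mean((fA - fAB) ** 2) / V  # total effect S_i,T
    return S, ST

S, ST = sobol_indices(model, lo=[0.0, 0.0], hi=[10.0, 10.0])
# For this additive model, S_i and S_i,T nearly coincide and S_1 + S_2 is close to 1
```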

#### 3. Numerical case studies

gPROMS®, a general-purpose DAE solver for process modelling, was used to compute the likelihood function values based on the measurement data, with some artificial errors added for demonstration purposes. It was also used to compute the MLE (by the gradient-based method, GBM) to set a reference for comparison with the results of the proposed method, GSA-based inference.

#### 3.1. Case study 'A': bioreactor

A benchmark problem in genome research [6] is reproduced. The authors applied bootstrapping to minimise the expected variance of the estimated parameters.

#### 3.1.1. Problem statement


In silico experimental data for an organism in a chemostat, as shown in Figure 5, are presented. A mathematical model is set up based on a fictive network structure.

After reaching a steady state, the flow rates qin, qout as well as the concentration of the substrate in the feed cin are changed. Measurements are available for three metabolites, M1, M2, and M3 (the latter functioning as an enzyme E), representing a small biochemical network of the organism, and for biomass B and substrate S (Table 1).

Those parameters listed in Table 1 are assumed known prior to the experiment.

Since the number of such experiments that can be performed is rather limited in reality, GSA or a Bayesian approach is anticipated to perform better than MLE or minimum mean square error (MMSE) methods, in that the latter tend to overfit the parameters to the observed or measured data, whereas the former should produce a more robust estimate of the parameters, integrated over the posterior distribution.

Figure 5. Bioreactor.

Table 1. Additional information – 'known' parameters.

| 'Known' kinetics | Parameter | Value | Unit |
| --- | --- | --- | --- |
| Synthesis of M1 | r1max | 2.4 × 10⁴ | μmol/g/h |
|  | KS | 0.4437 | μmol/g |
| Affinity M1 – E | KM1 | 12.2 | μmol/g |
| Degradation of M2 | r3max | 3.0 × 10⁶ | μmol/g/h |
|  | KM2 | 10 | μmol/g |

Table 2. Experimental procedure.

| Time [hour] | cin [g/litre] | qin, qout [litre/h] |
| --- | --- | --- |
| 0–20 | 2 | 0.25 |
| 20–30 | 2 | 0.35 |
| 30–60 | 0.5 | 0.35 |

$$
\dot{V} = q_{in} - q_{out} \tag{20}
$$


$$
\dot{B} = \left(\mu - \frac{q\_{in}}{V}\right)B\tag{21}
$$

$$\dot{S} = \frac{q\_{in}}{V}(c\_{in} - S) - r\_1 M wB \tag{22}$$

$$r\_1 = r\_{1\,\mathrm{max}} \frac{\mathrm{S}}{K\_{\mathrm{S}} + \mathrm{S}} \tag{23}$$

$$r\_2 = k\_2 E \frac{M\_1}{K\_{M1} + M\_1} \tag{24}$$

$$r\_3 = r\_{3\max} \frac{M\_2}{K\_{M2} + M\_2} \tag{25}$$

$$r\_{\text{syn}} = k\_{\text{syn max}} \frac{K\_{\text{IB}}}{K\_{\text{IB}} + M\_2} \tag{26}$$

$$
\dot{M}\_1 = r\_1 - r\_2 - \mu M\_1 \tag{27}
$$

$$
\dot{M}\_2 = r\_2 - r\_3 - \mu M\_2 \tag{28}
$$

$$
\dot{E} = r\_{syn} - \mu E \tag{29}
$$

$$
\mu = Y\_{X/S} r\_1 \tag{30}
$$

where B, biomass concentration, (g/litre); cin, substrate concentration in feed, (g/litre); E, enzyme (= M3) concentration, (μmol/g); k2, turnover number to produce M2, (litre/h); KIB, inhibition of enzyme synthesis by M2, (μmol/g); KM1, Michaelis constant - M1 uptake, (μmol/g); KM2, Michaelis constant - M2 uptake, (μmol/g); KS, Michaelis constant - substrate uptake, (μmol/g); ksynmax, maximum enzyme synthesis rate, (μmol/g/h); Mi, metabolite i (i = 1, 2, 3) concentration, (μmol/g); Mw, molar mass of substrate (243.3 × 10⁻⁶), (g/μmol); qin, feed flow rate, (litre/h); qout, product flow rate, (litre/h); r1max, maximum rate constant - reaction 1, (μmol/g/h); r3max, maximum rate constant - reaction 3, (μmol/g/h); ri, uptake rate to produce Mi (i = 1, 2, 3), (μmol/g/h); rsyn, enzyme synthesis rate, (μmol/g/h); S, substrate concentration, (g/litre); V, reactor holdup volume, (litre); YX/S, biomass yield coefficient, (g/μmol); and μ, biomass specific growth rate, (h⁻¹).

The feed flow rate and the substrate concentration in the feed are changed as shown in Table 2 and Figure 6.
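The chemostat model of Eqs. (20)-(30) with the step changes of the experimental procedure can be sketched as follows with SciPy. The initial state and the values of the four estimated parameters (YX/S, k2, ksynmax, KIB) are placeholders chosen for illustration, not the chapter's estimates.

```python
import numpy as np
from scipy.integrate import solve_ivp

# 'Known' constants from Table 1; Mw from the nomenclature
r1max, KS, KM1, r3max, KM2 = 2.4e4, 0.4437, 12.2, 3.0e6, 10.0
Mw = 243.3e-6
# The four parameters estimated in the text; these values are placeholders
YXS, k2, ksynmax, KIB = 7.0e-5, 6.5e6, 6.3e-3, 1.0

def feed(t):
    # Step changes of Table 2: (c_in, q) with q = q_in = q_out
    if t < 20.0:
        return 2.0, 0.25
    if t < 30.0:
        return 2.0, 0.35
    return 0.5, 0.35

def rhs(t, x):
    V, B, S, M1, M2, E = x
    cin, q = feed(t)
    r1 = r1max * S / (KS + S)                 # Eq. (23)
    r2 = k2 * E * M1 / (KM1 + M1)             # Eq. (24)
    r3 = r3max * M2 / (KM2 + M2)              # Eq. (25)
    rsyn = ksynmax * KIB / (KIB + M2)         # Eq. (26)
    mu = YXS * r1                             # Eq. (30)
    return [0.0,                              # Eq. (20), q_in = q_out here
            (mu - q / V) * B,                 # Eq. (21)
            q / V * (cin - S) - r1 * Mw * B,  # Eq. (22)
            r1 - r2 - mu * M1,                # Eq. (27)
            r2 - r3 - mu * M2,                # Eq. (28)
            rsyn - mu * E]                    # Eq. (29)

x0 = [1.0, 0.1, 0.5, 0.0, 0.0, 1e-4]          # assumed initial state
sol = solve_ivp(rhs, (0.0, 60.0), x0, method="LSODA", rtol=1e-6)
```

The stiff LSODA method is used because the degradation of M2 is much faster than the growth dynamics under these placeholder values.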

#### 3.1.2. Numerical study

Figure 6. Operating conditions of the experiments.

Figure 7. Overlay plot for trajectory of substrate S.

Table 3. Case study 'A' estimated parameters.

|  | YX/S | k2 | ksynmax | KIB |
| --- | --- | --- | --- | --- |
| **Maximum likelihood estimators** (Likelihood = 485) |  |  |  |  |
| MLE | 7.00E-05 | 6.43E+06 | 6.80E-03 | 0.2 |
| Std. Dev. | 6.24E-07 | 1.85E+05 | 5.70E-04 | 0.2 |
| **Empirical Bayes estimators, GSA** (Likelihood = 476) |  |  |  |  |
| MAP | 6.91E-05 | 6.70E+06 | 6.26E-03 | 2.7 |
| Posterior Mean | 6.94E-05 | 6.65E+06 | 6.31E-03 | 5.3 |
| Posterior Variance | 5.70E-07 | 1.21E+05 | 1.30E-04 | 4.8 |
| **Factor sensitivity** |  |  |  |  |
| First order effect | 0.021 | 0.02 | 0.054 | 0.003 |
| Total effect | 0.768 | 0.697 | 0.906 | 0.006 |

GSA-based parameter estimation was attempted, as well as the gradient-based method (GBM), or local search, as the benchmark. In Figures 7–11, each dot in blue shows an individual measurement in dynamic experiments. The error bar on each dot signifies the standard deviation of the measurement data (linear variance model). The solid lines in red are the MLE estimation by the GBM solver (gPROMS®).

A constant linear variance model is assumed for each point of the measurement data where ω = 0.02.

$$
\sigma^2 = \omega^2 \overline{z}^2 \tag{31}
$$

GSA sampling was performed by Sobol sequence involving four parameters YX/S, k2, ksynmax, KIB. A total of 500,000 sample points were generated in terms of combinations of these four parameters. Marginal posterior for each parameter is plotted in Figures 12–15.
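The sampling-based inference just described can be sketched in Python: draw parameter samples from a Sobol sequence, evaluate a Gaussian likelihood under the linear variance model of Eq. (31), and form the MAP, posterior mean, and posterior variance from the normalised weights. The scalar model g, the prior box, and all numbers below are hypothetical stand-ins for the dynamic simulation.

```python
import numpy as np
from scipy.stats import qmc

rng = np.random.default_rng(1)

def g(theta):
    # Placeholder response standing in for the dynamic process model
    a, b = theta
    return a * np.exp(-b * np.linspace(0.0, 1.0, 20))

true_theta = np.array([2.0, 3.0])
omega = 0.02
z = g(true_theta)
data = z * (1.0 + omega * rng.standard_normal(z.size))  # sigma = omega * z, Eq. (31)

# Sobol low-discrepancy design over an assumed prior box [1, 4] x [1, 5]
sampler = qmc.Sobol(d=2, scramble=True, seed=7)
theta_s = qmc.scale(sampler.random_base2(m=12), [1.0, 1.0], [4.0, 5.0])  # 4096 samples

sigma = omega * np.abs(z)
loglik = np.array([-0.5 * np.sum(((data - g(t)) / sigma) ** 2) for t in theta_s])
w = np.exp(loglik - loglik.max())   # unnormalised posterior under a flat prior
w /= w.sum()

theta_map = theta_s[np.argmax(loglik)]       # MAP over the sampled points
theta_mean = w @ theta_s                     # posterior mean
theta_var = w @ (theta_s - theta_mean) ** 2  # posterior variance
```

The chapter's study uses the same ingredients at a much larger scale (500,000 sample points over four parameters, with the likelihood computed by the process simulator).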

Figures 16–21 are 3D presentations of the posterior in terms of the likelihood function.

Both YX/S and k2 are seen to follow rather smooth distributions, as shown in Figures 12 and 13, respectively. ksynmax also seems to follow a smooth distribution, but with a few outliers scattered here and there, as shown in Figure 14.

Note that the variances, or confidence intervals, for YX/S, k2, and ksynmax obtained by GSA are smaller than those obtained by local search or the conventional GBM.

KIB, on the other hand, presents a totally different picture (Figure 15): the model is almost insensitive to variation in this parameter. We can confirm this by running a sensitivity analysis separately, as summarised in the factor sensitivity rows of Table 3.

An important finding here is that the local search was trapped in a local minimum, and its estimate is not necessarily robust because of the spread shown in Figure 15.

Figure 8. Overlay plot for trajectory of biomass B.

Figure 9. Overlay plot for trajectory of enzyme E (M3).

Figure 10. Overlay plot for trajectory of metabolite M1.

Figure 11. Overlay plot for trajectory of metabolite M2.

Figure 12. Marginal posterior (YX/S).

Figure 13. Marginal posterior (k2).

Figure 14. Marginal posterior (ksynmax).

#### 3.2. Case study 'B': catalytic synthesis of methanol from CO and H2

A laboratory integral reactor with a fixed bed catalyst was studied. Model parameter estimation was attempted by the conventional maximum likelihood approach as well as the newly proposed GSA methodology. Pseudo experiment data were prepared by introducing artificial measurement errors.

#### 3.2.1. Problem statement

The following main reaction can be accompanied by several side and consecutive reactions (see [7]). A large number of kinetic expressions have been proposed in the literature [8]. We assumed the rate-determining step (29) as the dominant reaction mechanism.

$$\text{CO}^* + 2\text{H}_2{}^* = \text{CH}_3\text{OH}^* + 2\text{S} \tag{32}$$

where * denotes a component in the adsorbed state, and S is an adsorbed site.

Figure 15. Marginal posterior (KIB).

Figure 16. 2D marginal posterior (YX/S vs. k2).

A Langmuir-Hinshelwood type kinetics, as described by Eq. (33), was assumed for the study:

$$r_{\text{Methanol}} = \frac{k \left(\phi_{\text{CO}} \phi_{H_2}{}^2 - \phi_{\text{Methanol}} / K_{eq}\right)}{\left(1 + A\phi_{\text{CO}} + B\phi_{H_2} + C\phi_{\text{Methanol}}\right)^2} \tag{33}$$

$$k = k\_r \exp\left[\frac{-E\_r}{RT}\right] \tag{34}$$

$$A = k\_{\rm CO} \exp\left[\frac{-E\_{\rm CO}}{RT}\right] \tag{35}$$

$$B = k\_{H\_2} \exp\left[\frac{-E\_{H\_2}}{RT}\right] \tag{36}$$

Figure 17. 3D posterior plot (YX/S vs. k2).

Figure 18. 2D marginal posterior (YX/S vs. ksynmax).

$$C = k_{\text{Methanol}} \exp\left[\frac{-E_{\text{Methanol}}}{RT}\right] \tag{37}$$

where Ci, molar concentration of component i, (mol/m³); Ej, activation energy of the jth reaction, (J/mol); kj, pre-exponent factor of the jth reaction, (1/sec); rj, reaction rate of the jth reaction, (mol/sec/m³); R, universal gas constant, (J/K/mol); t, time, (sec); and T, reaction (operating) temperature, (K).

Figure 19. 3D posterior plot (YX/S vs. ksynmax).

Figure 20. 2D marginal posterior (k2 vs. ksynmax).

The molar concentrations of the species are measured with Gaussian errors with mean 0 and variance σ².
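A minimal implementation of the rate law of Eqs. (33)-(37) might look as follows; the kinetic parameter values and the equilibrium constant Keq are placeholders chosen for illustration only, not the chapter's estimates.

```python
import numpy as np

R = 8.314  # universal gas constant, J/K/mol

# Placeholder kinetic parameters (orders of magnitude only)
kr, Er = 1.0e9, 1.2e5
kCO, ECO = 7.0e5, 6.7e4
kH2, EH2 = 3.4e3, 5.1e4
kMe, EMe = 2.0e6, 7.2e4

def r_methanol(phi_CO, phi_H2, phi_Me, T, Keq=1.0e2):
    """Langmuir-Hinshelwood rate of Eq. (33) with the Arrhenius
    coefficients of Eqs. (34)-(37); Keq is an assumed equilibrium constant."""
    k = kr * np.exp(-Er / (R * T))    # Eq. (34)
    A = kCO * np.exp(-ECO / (R * T))  # Eq. (35)
    B = kH2 * np.exp(-EH2 / (R * T))  # Eq. (36)
    C = kMe * np.exp(-EMe / (R * T))  # Eq. (37)
    drive = phi_CO * phi_H2 ** 2 - phi_Me / Keq
    return k * drive / (1.0 + A * phi_CO + B * phi_H2 + C * phi_Me) ** 2

# With Er dominating the adsorption energies, the rate grows with temperature
r_low = r_methanol(0.3, 0.6, 0.05, 475.0)
r_high = r_methanol(0.3, 0.6, 0.05, 525.0)
```

The shared exponential structure of k, A, B, and C is exactly what makes the activation energies and pre-exponent factors correlate in the estimation, as discussed below.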

Figure 21. 3D posterior plot (k2 vs. ksynmax).


Table 4. Case study 'B' estimated parameters.

|  | ECO | EH2 | EMethanol | Er | kCO | kH2 | kMethanol | kr |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **Maximum likelihood estimators** (Likelihood = −3348) |  |  |  |  |  |  |  |  |
| MLE | 6.72E+04 | 5.13E+04 | 7.16E+04 | 1.22E+05 | 7.05E+05 | 3.41E+03 | 1.97E+06 | 1.01E+09 |
| Std. Dev. | 1.65E+04 | 1.69E+04 | 1.61E+04 | 4.92E+04 | 2.95E+06 | 1.46E+04 | 8.07E+06 | 1.26E+10 |
| **Empirical Bayes estimators, GSA-based** (Likelihood = −3397) |  |  |  |  |  |  |  |  |
| MAP | 6.56E+04 | 4.86E+04 | 7.29E+04 | 1.21E+05 | 8.35E+05 | 3.59E+03 | 4.47E+06 | 4.58E+09 |
| Posterior Mean | 6.55E+04 | 4.86E+04 | 7.29E+04 | 1.21E+05 | 8.35E+05 | 3.60E+03 | 4.47E+06 | 4.58E+09 |
| Posterior Variance | 6.48E+02 | 4.49E+02 | 6.93E+02 | 1.67E+03 | 1.21E+04 | 1.20E+02 | 8.04E+04 | 2.26E+07 |
| **Factor sensitivity** |  |  |  |  |  |  |  |  |
| First order effect | 0.004 | −0.011 | −0.031 | 0 | −0.043 | −0.019 | 0.02 | −0.042 |
| Total effect | 0.607 | 0.256 | 0.69 | 0.885 | 0.179 | 0.078 | 0.271 | 0.209 |

$$
\sigma^2 = \omega^2 z^2 \tag{38}
$$


where z is the measured data and ω (= 0.02) is the standard deviation of the linear variance model.
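Under this noise model, pseudo-experimental data can be generated as in the following sketch; the yield values are hypothetical.

```python
import numpy as np

# Pseudo-experimental data under the linear variance model of Eq. (38):
# each measurement gets Gaussian noise with standard deviation omega * z.
rng = np.random.default_rng(0)
omega = 0.02
z_true = np.array([0.10, 0.25, 0.40, 0.55])   # hypothetical yield trajectory
z_meas = z_true + omega * z_true * rng.standard_normal(z_true.size)
```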

The reactor with a fixed catalyst bed was operated under four different temperatures (475, 500, 525, 550 K). Two different initial conditions were assumed, either being filled with the feed gas compositions or an empty/inert state. There were eight sets of experiment data for the study. Feed rates were controlled as shown in Figure 22.

We can observe that the GSA or Bayesian approach correctly captures the process kinetics described by the Arrhenius equations, which contain some interdependency between the activation energies and the pre-exponent factors.

#### 3.2.2. Numerical study

Figures 23 and 24 show measurement data in a dynamic experiment (designated by the blue dots in the plots). Notice that the vertical bar attached to each dot denotes the measurement error that we assumed. The red dots are the simulation results of the final iteration for the MLE estimation by the GBM.

Figure 22. Feed rates profile.


Figure 23. Trajectory of carbon monoxide yield.

Figure 24. Trajectory of methanol yield.

The model parameters were estimated first by gPROMS®, a conventional solution engine, using the GBM to maximise the likelihood function (i.e. minimising the negative log-likelihood). Confidence regions or intervals for the estimated parameters are evaluated based on Cramer-Rao lower bounds.
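The Cramer-Rao reference can be sketched for a one-parameter Gaussian model: the bound follows from the Fisher information built out of the model sensitivities. The model g, the noise level, and the assumed MLE below are all hypothetical.

```python
import numpy as np

# For Gaussian measurements y_k ~ N(g_k(theta), sigma^2), the Fisher information
# is J = sum_k (dg_k/dtheta)^2 / sigma^2, and the Cramer-Rao inequality gives
# Var(theta_hat) >= 1/J for any unbiased estimator.
def g(theta, t):
    return theta * np.exp(-t)   # hypothetical one-parameter response

t = np.linspace(0.0, 1.0, 20)
sigma = 0.05
theta_hat = 2.0                 # assume this is the MLE

eps = 1e-6
dg = (g(theta_hat + eps, t) - g(theta_hat - eps, t)) / (2.0 * eps)  # sensitivities
J = np.sum(dg ** 2) / sigma ** 2
cr_std = 1.0 / np.sqrt(J)       # lower bound on the estimator's std. deviation
```

When parameters are strongly correlated, the multi-parameter version of this bound (the inverse Fisher information matrix) becomes nearly singular, which is why the local-search confidence regions in this case study are so wide.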

Second, posterior likelihood of the model parameters is computed by GSA. Figures 25–30 are 3D plots of likelihood for a few pairs of parameters to be identified. The projection of those 3D points to 2D planes in the former plots are 2D marginal distributions as shown in Figures 31–36, respectively.

The parameters are correlated in one way or another. Nonetheless, pairs of parameters such as ECO and Er are still identifiable based on the posterior distribution of the likelihood function (Figure 31). The pair of EH2 and Er presents a slightly different profile, but they can also be estimated by taking the MAP or averaging over the entire sampling space (Figure 32).

One noticeable finding was that the posterior variances of the GSA in this particular case were much smaller than those estimated by local search or GBM (Table 4). Since the pre-exponent factors and the activation energies in the Arrhenius-type kinetics are highly correlated, as can be seen in Figures 34–36, parameter estimation is inherently difficult in its mathematical formulation.

Figure 25. 3D posterior plot (ECO vs. Er).

Figure 26. 3D posterior plot (EH2 vs. Er).

Figure 27. 3D posterior plot (EMethanol vs. Er).

Figure 28. 3D posterior plot (ECO vs. kCO).

Figure 29. 3D posterior plot (EH2 vs. kH2).

Figure 30. 3D posterior plot (EMethanol vs. kMethanol).

Figure 31. 2D marginal posterior (ECO vs. Er).

Figure 32. 2D marginal posterior (EH2 vs. Er).

Figure 33. 2D marginal posterior (EMethanol vs. Er).

Figure 34. 2D marginal posterior (ECO vs. kCO).

#### 4. Conclusions

We applied GSA to parameter estimation for dynamic process models. Numerical case studies were performed for a couple of different processes with sets of simulated or artificial data. We demonstrated the advantage of the proposed methodology, which is based on global search, over the conventional ones that are based on local search.

GSA calculates the posterior distribution of the model parameters based on any measurement data available for the study. The estimated confidence regions or intervals are in general narrower than those calculated by the local search.

In cases where some parameter does not affect the system response significantly, GSA will still correctly evaluate the distribution, or the probable spread, of the model parameters. On the contrary, local search or GBM might find an optimal point for the same problem that can, however, be elusive, in that the estimated variance around the solution is erroneously small due to the lack of global information for the point estimate.

When a set of parameters is structurally correlated, as in the case of the activation energy and the pre-exponential coefficient, the confidence region or confidence interval given by conventional local search methods can be very large, thus smearing out the point estimate. Even under such circumstances, GSA can provide legitimate information about the accuracy of the estimated parameters. The posterior variance by GSA can be significantly smaller than the Cramer-Rao lower bounds.

Therefore, we can conclude that GSA can provide more robust location parameters compared to the local search methods. To be fair, because GBM can still find the optimal solution more precisely, it is advised to use both methods complementarily: local search or GBM can initially be used to search for the MLE, and GSA can then provide a global view of the posterior probability around the point estimate to make sure the solution is stable and robust.

Figure 35. 2D marginal posterior (EH2 vs. kH2).

Figure 36. 2D marginal posterior (EMethanol vs. kMethanol).


Table 5. Elementary effects sensitivity analysis.


Table 6. Variance-based sensitivity analysis.

#### 4. Conclusions


We applied GSA to parameter estimation for dynamic process models. Numerical case studies were performed for a couple of different processes with a set of simulated or artificial data. We demonstrated the advantage of the proposed methodology, which is based on global search, over the conventional ones that are based on local search.

GSA calculates the posterior distribution of the model parameters based on whatever measurement data are available for the study. The estimated confidence regions or intervals are in general narrower than those calculated by the local search.

In cases where a parameter does not significantly affect the system response, GSA still correctly evaluates the distribution, or the probable spread, of the model parameters. On the contrary, local search or GBM might find an optimal point for the same problem that can, however, be elusive in that the estimated variance around the solution is erroneously small due to the lack of global information for the point estimate.

When a set of parameters is structurally correlated, as is the case for the activation energy and the pre-exponential coefficient, the confidence region or confidence interval given by conventional local search methods can be very large, thus smearing out the point estimate. Even under such circumstances, GSA can provide legitimate information about the accuracy of the estimated parameters. The posterior variance by GSA can be significantly smaller than the Cramer-Rao lower bounds.

Therefore, we can conclude that GSA provides more robust location parameters compared to the local search methods. But to be fair, because GBM can still find the optimal solution more precisely, it is advised to use both methods complementarily: local search or GBM can initially be used to search for the MLE, and GSA can then provide a global view of the posterior probability of the point estimate to make sure the solution is stable and robust. GSA can also be used for sensitivity studies to make sure the estimated parameters, or their variances, are reasonable.


Since GSA is computation-intensive, parallel computation must be considered to make such a method practical for studies of real-life problems. We made use of parallel computation on a single PC with multiple cores (employing four workers). GSA with 500,000 sampled points took CPU times of 9530 s (2.6 h) and 128,752 s (35.8 h) for the abovementioned example cases 'A' and 'B', respectively (performed on a PC with an Intel i7-5600U @ 2.6 GHz).

## A. Appendix

### A.1. Numerical implementation of Bayesian analysis: review of the incumbent method

Although we proposed a new way of calculating posterior probability by GSA as described in Section 2.2, it is worthwhile to review the incumbent method.

#### A.1.1. Markov chain Monte Carlo

Markov Chain Monte Carlo (MCMC) is a sampling method for searching a parameter space. It is particularly important in Bayesian analysis for surveying a space endowed with an arbitrary probability measure. When the posterior distribution cannot be obtained in closed form, we need to rely on numerical sampling in the parametric space.

MCMC draws random samples from the marginal posterior distributions. With the introduction of a prior distribution, we can compute the posterior distribution based on Bayes' theorem.
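When a conjugate prior is available, this update has a closed form and no numerical sampling is needed. A minimal, hypothetical Beta-Bernoulli illustration of Bayes' theorem at work (not an example from this chapter):

```python
# Conjugate Beta-Bernoulli update: prior Beta(a0, b0); observing k successes
# in n Bernoulli trials yields the posterior Beta(a0 + k, b0 + n - k).
def posterior_params(a0, b0, k, n):
    return a0 + k, b0 + n - k

a, b = posterior_params(1, 1, 7, 10)  # uniform prior, 7 successes in 10 trials
posterior_mean = a / (a + b)          # posterior mean = (a0 + k) / (a0 + b0 + n)
```

It is precisely when no such conjugate structure exists that the numerical sampling schemes reviewed next become necessary.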

#### A.1.2. Sampling from the posterior distribution

Metropolis-Hastings is an accept-reject sampling method. Designing a good proposal distribution is the key to the success of such a sampling method. The concept is illustrated in the following pseudo steps:

i. Propose a value θ<sup>(\*)</sup> for the parameter.

ii. Compute the posterior probability ratio for the proposed value θ<sup>(\*)</sup> against the previous value θ<sup>(k-1)</sup>.

iii. Accept the proposed value with the probability in the previous step (i.e. θ<sup>(k)</sup> = θ<sup>(\*)</sup>). In case the proposal is not accepted, then retain the previous value (i.e. θ<sup>(k)</sup> = θ<sup>(k-1)</sup>).

Where full conditionals are conjugate, Gibbs sampling can be used. A sufficient number of repetitions converges on the posterior distributions much more rapidly than Metropolis-Hastings sampling. When conditional distributions are not conjugate, we use Metropolis-Hastings.

#### A.1.3. Shortcomings of MCMC

Although MCMC has become a standard tool for Bayesian analysis, several problems have also become obvious, particularly in the numerical implementation of the method [9]. While MCMC provides an almost automatic way of sampling the posterior distribution, it often converges too slowly or gets stuck within one mode of a multi-modal parametric space. Moreover, it is oftentimes difficult to determine whether a chain has reached stationarity or is still hovering in a certain region of the parameter space.
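The accept-reject steps of A.1.2 can be sketched in a few lines. The standard-normal target density below is a stand-in for a real model's posterior, and the step size and chain length are illustrative, not tuned values from this chapter:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_posterior(theta):
    # Stand-in target: log-density of a standard normal (up to a constant)
    return -0.5 * theta**2

def metropolis_hastings(log_post, theta0, n_steps=10_000, step=1.0):
    theta = theta0
    samples = np.empty(n_steps)
    for k in range(n_steps):
        proposal = theta + step * rng.normal()            # i. propose theta*
        log_ratio = log_post(proposal) - log_post(theta)  # ii. posterior ratio
        if np.log(rng.random()) < log_ratio:              # iii. accept w.p. min(1, ratio)
            theta = proposal                              #     ... else keep old value
        samples[k] = theta
    return samples

samples = metropolis_hastings(log_posterior, theta0=5.0)
```

Starting far from the mode (θ₀ = 5) illustrates the burn-in issue discussed above: the early part of the chain must be discarded before the samples represent the target.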

The computation of the likelihood function can be complicated, as in physical process systems that are described by a set of mathematical equations, which can also be dynamic (time-variant) rather than stationary (time-invariant). Therefore, the sampling must be efficient for the numerical implementation of MCMC to be applicable to the analysis of the chemical, biological, or physical systems that we encounter in real life.

#### A.2. Global sensitivity analysis: example calculations

A couple of example calculations of GSA sensitivity analysis are shown below (refer to Section 2.4).

#### A.2.1. Elementary effects sensitivity analysis

Applying elementary effects sensitivity analysis on equation y = x1<sup>2</sup> + 3x2 returns the results in Table 5.
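A minimal numerical sketch of this calculation (a simple one-at-a-time estimator with an assumed perturbation of 10% of the factor range; the chapter's exact sampling scheme may differ):

```python
import numpy as np

rng = np.random.default_rng(1)

def model(x1, x2):
    # Example function from the text: y = x1^2 + 3*x2
    return x1**2 + 3 * x2

def elementary_effects(f, lo, hi, n=2000):
    # One-at-a-time finite differences from random base points (Morris-style screening).
    delta = 0.1 * (hi - lo)                       # assumed perturbation size
    base = rng.uniform(lo, hi - delta, size=(n, 2))
    y0 = f(base[:, 0], base[:, 1])
    stats = []
    for i in range(2):
        pert = base.copy()
        pert[:, i] += delta
        ee = (f(pert[:, 0], pert[:, 1]) - y0) / delta
        # mu: mean absolute effect; sigma: spread (nonlinearity/interaction indicator)
        stats.append((np.abs(ee).mean(), ee.std()))
    return stats

wide = elementary_effects(model, 0.0, 10.0)   # 0 < xi < 10
narrow = elementary_effects(model, 0.0, 1.0)  # 0 < xi < 1
```

Consistent with Table 5, the quadratic factor x1 dominates on the wide range, while the linear factor x2 (whose effect is exactly 3 everywhere, with zero spread) dominates on the narrow range.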

#### A.2.2. Variance-based sensitivity analysis

Applying variance-based sensitivity analysis on equation y = x1<sup>2</sup> + 3x2 returns the results in Table 6. While the variance-based values do not have an exact meaning, the most influential factors are correctly identified.
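A sketch of such a variance-based calculation using a generic Saltelli/Jansen pick-freeze estimator (not necessarily the estimator used to produce Table 6):

```python
import numpy as np

rng = np.random.default_rng(2)

def model(x):
    # y = x1^2 + 3*x2, evaluated on an (n, 2) array of samples
    return x[:, 0]**2 + 3 * x[:, 1]

def sobol_indices(f, lo, hi, n=200_000):
    # Pick-freeze Monte Carlo estimates of first-order (Si) and total (SiT) indices.
    A = rng.uniform(lo, hi, size=(n, 2))
    B = rng.uniform(lo, hi, size=(n, 2))
    yA, yB = f(A), f(B)
    var = np.var(np.concatenate([yA, yB]))
    out = []
    for i in range(2):
        ABi = A.copy()
        ABi[:, i] = B[:, i]                        # column i taken from the second sample
        yABi = f(ABi)
        Si = np.mean(yB * (yABi - yA)) / var       # Saltelli first-order estimator
        SiT = 0.5 * np.mean((yA - yABi)**2) / var  # Jansen total-index estimator
        out.append((Si, SiT))
    return out

wide = sobol_indices(model, 0.0, 10.0)    # 0 < xi < 10
narrow = sobol_indices(model, 0.0, 1.0)   # 0 < xi < 1
```

For this additive model the indices have analytic values: on (0, 10), S1 = Var(x1²)/Var(y) ≈ 0.92, matching the x1 entry of Table 6; on (0, 1) the linear term dominates and S2 ≈ 0.89.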

#### Author details

Shigeru Kashiwaya

Address all correspondence to: s.kashiwaya@psenterprise.com

Process Systems Enterprise Limited, Yokohama, Japan

#### References

[1] Kashiwaya S. A Quasi-Monte Carlo approach to Bayesian parameter estimation for nonlinear dynamic process models. Journal of Chemical Engineering of Japan. 2013;46(7):467-479. DOI: 10.1252/jcej.12we202

[2] Sobol' IM. Sensitivity estimates for nonlinear mathematical models. Mathematical Modelling and Computational Experiments. 1993;1(4):407-414

[3] Sobol' IM. Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Mathematical Modelling and Computational Experiments. 2001;55(1):271-280

[4] Andrea S et al. Global Sensitivity Analysis: The Primer. London: John Wiley & Sons; 2008

[5] Andrea S et al. Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index. Computer Physics Communications. 2010;181(2):259-270

[6] Andreas K et al. A benchmark for methods in reverse engineering and model discrimination: Problem formulation and solutions. Genome Research. 2004;14(9):1773-1785

[7] Kuczynski M et al. Reaction kinetics for the synthesis of methanol from CO and H2 on a copper catalyst. Chemical Engineering and Processing. 1987;21(4):179-191

[8] Falbe J, editor. New Synthesis with Carbon Monoxide. Berlin: Springer; 1980. pp. 309-320

[9] Guan Y et al. Markov chain Monte Carlo in small worlds. Statistics and Computing. 2006;16(2):193-202

**Chapter 5**

#### **Preventing Disparities: Bayesian and Frequentist Methods for Assessing Fairness in Machine-Learning Decision-Support Models**

Douglas S. McNair

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/intechopen.73176

#### **Abstract**

Machine-learning (ML) methods are finding increasing application to guide human decision-making in many fields. Such guidance can have important consequences, including treatments and outcomes in health care. Recently, growing attention has focused on the potential that machine-learning might automatically learn unjust or discriminatory, but unrecognized or undisclosed, patterns that are manifested in available observational data and the human processes that gave rise to them, and thereby inadvertently perpetuate and propagate injustices that are embodied in the historical data. We applied two frequentist methods that have long been utilized in the courts and elsewhere for the purpose of ascertaining fairness (Cochran-Mantel-Haenszel test and beta regression) and one Bayesian method (Bayesian model averaging). These methods revealed that our ML model for guiding physicians' prescribing discharge beta-blocker medication for post-coronary artery bypass patients does not manifest significant untoward race-associated disparity. The methods also showed that our ML model for directing repeat performance of MRI imaging in children with medulloblastoma did manifest racial disparities that are likely associated with ethnic differences in informed consent and desire for information in the context of serious malignancies. The relevance of these methods to ascertaining and assuring fairness in other ML-based decision-support model-development and -curation contexts is discussed.

**Keywords:** fairness, machine-learning, Bayesian model averaging, bias, variables selection

#### **1. Introduction**

With regard to cognitive computing and machine-learning (ML)-based decision-support tools, there is an emerging need for ethical reasoning about Big Data beyond privacy [1–3].

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Recent definitions of 'algorithmic fairness' [4–7] assert that similar individuals should be treated similarly. Such metrics comport with conventional lay-persons' sense of the meaning of fairness. Algorithmic fairness definitions presuppose the existence of a use-case-specific metric on individuals and propose that fair algorithms should satisfy a Lipschitz condition with respect to this metric. However, such definitions for algorithms and artificial intelligence tools have not yet been aligned with existing statistical methods that have been established in the legal and regulatory communities. Furthermore, no generally accepted standards yet exist for ascertaining the presence or absence of disparities in machine-learning (ML) models that have been learned from historical observational data. There is a serious concern among policy-makers and members of the public that the rapid growth of ML may lead to the systematic promulgation of "bad models" that inculcate past injustices in subsequent decisionmaking going forward [8–31].

new framework for statistically ascertaining ML model fairness. The purpose of this chapter is to introduce the new three-method framework to the machine-learning community and illustrate its use with two practical examples from clinical medicine specialties (namely cardiology and oncology). Our method involves joint application of the following methods to ML model-training and -test data, where the test data may be either (a) data arising from natural decision-making unaided by the ML model or (b) data arising from decision-making where

Preventing Disparities: Bayesian and Frequentist Methods for Assessing Fairness in Machine-Learning…

http://dx.doi.org/10.5772/intechopen.73176

73

If the p-values for all three methods are non-significant, then the ML model is declared to be provisionally fair. However, if any of these methods show that statistically significant (p < 0.05) bias exists depending on one or more stratification variables, then the model is declared to have failed fairness checking, the model is placed into a "hold" status and not released, and further investigation is initiated into the nature of the detected bias and its pos-

To date, a majority of the more than 50 predictive mathematical models that have been developed and deployed by the author's team are ML models. The discovery, development, and validation of the models have primarily been performed using a HIPAA-compliant, deidentified and PHI-free, epsilon differential privacy-protected, secondary-use-assented, EHR-derived, ontology-mapped, longitudinally electronic master person identifier (eMPI) linked repository of the serial care-episode health records associated with 100% of patients cared for at 814 U.S. health institutions who have established HIPAA business-associate agreements and data-rights agreements with our corporation. This data warehouse currently comprises more than 153 million distinct persons' longitudinal records and more than 400 million episodes of care from January 1, 2000 to the present time. New case material accrues into the data warehouse from each of the contributing health networks' and institutions' systems on a daily basis, encrypted end-to-end, and auto-mapped to a standard ontology and pre-cleaned upon arrival. The data warehouse is not a "claims" dataset but instead encompasses a majority of the content of the patients' EHR records, from flowsheet and monitoring data and waveforms, to all medications dispenses and prescriptions, all lab results, all procedures, all problem list entries and diagnoses, and all claims—with each data element or item or transaction date-timestamped with minute-level time precision and with successive episodes for a given person longitudinally linked via a key that is encrypted from the eMPI. A typical ML project for us begins with a cohort extracted from the data warehouse. Cohorts for studies we undertake tend to comprise from 20,000 to several million cases and a comparable number of controls, all meeting inclusion-exclusion criteria for the project and governed by a project specification and written, version-controlled protocol. 
The datasets comprised of these cohorts of historical, outcomes-labeled, de-identified cases and controls are separated into randomized, independent "training" and "test"

human users are assisted by the ML model:

**1.** The Cochran-Mantel-Haenszel test;

**3.** Bayesian model averaging (BMA).

**2.** Beta regression; and

sible causes.

Such concerns are heightened in the context of life-critical medical and surgical treatment. In one illustrative medical example, beta-blocker medications have been found to be important in the treatment of myocardial infarction and in coronary artery bypass (CAB) surgery in that they have been shown to decrease mortality. Their benefit is derived not only from improving the myocardial oxygen supply-demand balance but also from their ability to inhibit subsequent cardiac ventricular remodeling, mitigation of platelet activation, decrease in peripheral vascular resistance (PVR) and decrease in hemodynamic stress on the arterial wall, increase in diastolic coronary artery flow, membrane stabilization and shortening of the heart ratecorrected QT interval (QT<sup>c</sup> ) [32], prevention of atrial fibrillation and other arrhythmias, and other mechanisms.

In another typical example in clinical medicine, serial repeated MRI scans of the head and spinal cord have been found to be relevant in the ongoing management of medulloblastoma [33]. As with any cancer, early detection and ongoing follow-up monitoring are essential to achieving a positive outcome. With its multi-planar capability and excellent high spatial resolution, MRI is the preferred imaging modality in the follow-up to assess response to treatment. The efficacy of repeated MRI scans is presently uncertain as regards improvement of survival or other outcomes. However, there can be considerable psychological value that attaches to finding that a repeat scan is negative for recurrence, progression, or metastasis of the cancer, and repeat scans are routinely performed at regular intervals on this empirical basis, motivated by the wish to provide knowledge and reassurance. Conversely, the MRI-informed discovery of recurrence, progression, or metastasis is a much-feared possibility for parents of children with medulloblastoma, insofar as this finding portends shortened life-expectancy for the child and diminution of hope. In certain contexts, then, there is a disinclination to perform exams that could lead to bad news, for which there may be no effective mitigations or treatment options.

#### **2. Background and methodology**

Avoiding Type II (false-negative) errors is paramount in machine-learning model quality assurance and fairness determinations. Following this spirit, we have recently developed a new framework for statistically ascertaining ML model fairness. The purpose of this chapter is to introduce the new three-method framework to the machine-learning community and illustrate its use with two practical examples from clinical medicine specialties (namely cardiology and oncology). Our method involves joint application of the following methods to ML model-training and -test data, where the test data may be either (a) data arising from natural decision-making unaided by the ML model or (b) data arising from decision-making where human users are assisted by the ML model:


Recent definitions of 'algorithmic fairness' [4–7] assert that similar individuals should be treated similarly. Such metrics comport with conventional lay-persons' sense of the meaning of fairness. Algorithmic fairness definitions presuppose the existence of a use-case-specific metric on individuals and propose that fair algorithms should satisfy a Lipschitz condition with respect to this metric. However, such definitions for algorithms and artificial intelligence tools have not yet been aligned with existing statistical methods that have been established in the legal and regulatory communities. Furthermore, no generally accepted standards yet exist for ascertaining the presence or absence of disparities in machine-learning (ML) models that have been learned from historical observational data. There is a serious concern among policy-makers and members of the public that the rapid growth of ML may lead to the systematic promulgation of "bad models" that inculcate past injustices in subsequent decision-

Such concerns are heightened in the context of life-critical medical and surgical treatment. In one illustrative medical example, beta-blocker medications have been found to be important in the treatment of myocardial infarction and in coronary artery bypass (CAB) surgery in that they have been shown to decrease mortality. Their benefit is derived not only from improving the myocardial oxygen supply-demand balance but also from their ability to inhibit subsequent cardiac ventricular remodeling, mitigation of platelet activation, decrease in peripheral vascular resistance (PVR) and decrease in hemodynamic stress on the arterial wall, increase in diastolic coronary artery flow, membrane stabilization and shortening of the heart rate-

In another typical example in clinical medicine, serial repeated MRI scans of the head and spinal cord have been found to be relevant in the ongoing management of medulloblastoma [33]. As with any cancer, early detection and ongoing follow-up monitoring are essential to achieving a positive outcome. With its multi-planar capability and excellent high spatial resolution, MRI is the preferred imaging modality in the follow-up to assess response to treatment. The efficacy of repeated MRI scans is presently uncertain as regards improvement of survival or other outcomes. However, there can be considerable psychological value that attaches to finding that a repeat scan is negative for recurrence, progression, or metastasis of the cancer, and repeat scans are routinely performed at regular intervals on this empirical basis, motivated by the wish to provide knowledge and reassurance. Conversely, the MRI-informed discovery of recurrence, progression, or metastasis is a much-feared possibility for parents of children with medulloblastoma, insofar as this finding portends shortened life-expectancy for the child and diminution of hope. In certain contexts, then, there is a disinclination to perform exams that could lead to bad news, for which there may be no effective mitigations or treatment options.

Avoiding Type II (false-negative) errors is paramount in machine-learning model quality assurance and fairness determinations. Following this spirit, we have recently developed a

) [32], prevention of atrial fibrillation and other arrhythmias, and

making going forward [8–31].

72 New Insights into Bayesian Inference

corrected QT interval (QT<sup>c</sup>

**2. Background and methodology**

other mechanisms.

**3.** Bayesian model averaging (BMA).

If the p-values for all three methods are non-significant, then the ML model is declared to be provisionally fair. However, if any of these methods show that statistically significant (p < 0.05) bias exists depending on one or more stratification variables, then the model is declared to have failed fairness checking, the model is placed into a "hold" status and not released, and further investigation is initiated into the nature of the detected bias and its possible causes.

To date, a majority of the more than 50 predictive mathematical models that have been developed and deployed by the author's team are ML models. The discovery, development, and validation of the models have primarily been performed using a HIPAA-compliant, deidentified and PHI-free, epsilon differential privacy-protected, secondary-use-assented, EHR-derived, ontology-mapped, longitudinally electronic master person identifier (eMPI) linked repository of the serial care-episode health records associated with 100% of patients cared for at 814 U.S. health institutions who have established HIPAA business-associate agreements and data-rights agreements with our corporation. This data warehouse currently comprises more than 153 million distinct persons' longitudinal records and more than 400 million episodes of care from January 1, 2000 to the present time. New case material accrues into the data warehouse from each of the contributing health networks' and institutions' systems on a daily basis, encrypted end-to-end, and auto-mapped to a standard ontology and pre-cleaned upon arrival. The data warehouse is not a "claims" dataset but instead encompasses a majority of the content of the patients' EHR records, from flowsheet and monitoring data and waveforms, to all medications dispenses and prescriptions, all lab results, all procedures, all problem list entries and diagnoses, and all claims—with each data element or item or transaction date-timestamped with minute-level time precision and with successive episodes for a given person longitudinally linked via a key that is encrypted from the eMPI. A typical ML project for us begins with a cohort extracted from the data warehouse. Cohorts for studies we undertake tend to comprise from 20,000 to several million cases and a comparable number of controls, all meeting inclusion-exclusion criteria for the project and governed by a project specification and written, version-controlled protocol. 
The datasets comprising these cohorts of historical, outcomes-labeled, de-identified cases and controls are separated into randomized, independent "training" and "test" subsets. A typical ML project for us begins with several hundred input data variables or document types selected from the EHR data model, which includes more than 10,000 data type categories.


Preventing Disparities: Bayesian and Frequentist Methods for Assessing Fairness in Machine-Learning…

http://dx.doi.org/10.5772/intechopen.73176


#### **2.1. Cohort selection**

Two representative examples serve to illustrate the application of Bayesian and frequentist methods for assessing fairness in ML models, one involving a very large cohort (beta-blocker usage in coronary artery disease post-coronary artery bypass (CAB)) and one involving a comparatively small cohort (MRI in pediatric medulloblastoma (brain cancer)).

A post-CAB cohort included those cases who were discharged alive with hospital LOS between 3 and 28 days, black or white race only, between January 1, 2012 and December 31, 2016, aged between 40 and 69 years at the time of CAB surgery, with no known prior use of beta-blocker within 1 year prior to CAB. Excluded were patients receiving percutaneous and MIDCAB procedures (usage rates for which might be, or are, confounded by geography, operative risk and preoperative comorbidities, and other factors); in-hospital percutaneous coronary intervention (PCI), PCI-to-CAB conversion, urgent-emergent CAB; known prior AMI, prior PCI, or prior CAB; patients with heart rate <45 bpm or AV block (ICD-10-CM diagnosis codes I44.x, I45.x; ICD-9-CM diagnosis codes 426.x); patients with implanted pacemaker; patients having eGFR <50 mL/min/1.73m<sup>2</sup>; persons with previously diagnosed heart failure, asthma, or active malignancy; patients who were transferred to other medical institutions without a discharge prescription; and patients at institutions having fewer than 100 open CAB cases annually meeting the criteria above during 2012–2016. Patients treated at a total of 14 out of 814 institutions participating in this data warehouse met the criteria for inclusion in the ML model development and analysis.

A medulloblastoma cohort included cases who were discharged alive, black or white race only, between January 1, 2000 and December 31, 2016, aged between 0 and 21 at the time of resection of the brain tumor. Patients treated at a total of 33 out of 814 institutions participating in this data warehouse met the criteria for inclusion in the ML model development and analysis.

#### **2.2. Data extraction**

Exploratory analyses to characterize available data often require full table scans, which, in conventional RDBMS tables having billions of rows, may entail runtimes of many hours, even with bitmapped indexes and careful query optimization. Laboratory tests and vital signs and flowsheet items in our data warehouse are each multi-billion-row tables. Premature dimensionality or cardinality reduction may interfere with discovering the best ML model. Therefore, a 64-node Hewlett Packard Vertica® system was the means whereby the data warehouse was physically stored for the present work. Extracts were performed using standard SQL queries on this massively parallel vertical database. Although many racial and ethnic categories were represented in the data warehouse, for the present work racial categories were restricted to black and white, for reasons of adequacy of sample size.

A total of 30,116 complete post-CAB cases were retained, and no imputation was used. Median age was 64 years and 13.6% were black, with M:F ratio 2.57. From this extract, males were retained for analysis (median age 64 years, 11.1% black). Matching was performed on a per-hospital basis by race in a 1:9 ratio (Black:White), to minimize bias arising from regional differences in the prevalence of Black individuals, on U.S. census division (nine geographic regions), on age with 5-year binning, and additionally on diabetic status. This resulted in 11,358 actual cases used for subsequent training dataset modeling and analysis. The remainder of the data was used as an independent test dataset.
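As an illustration of this matching scheme (per-hospital 1:9 Black:White matching on census division, 5-year age bin, and diabetic status), a minimal exact-matching sketch follows. The field names and the exact-matching approach are assumptions for illustration, not the authors' implementation:

```python
import random

def match_1_to_9(cases, controls, ratio=9, seed=0):
    """Exact matching of controls to cases on hospital, U.S. census
    division, 5-year age bin, and diabetic status (field names are
    hypothetical). Cases without enough matching controls are dropped."""
    def key(r):
        return (r["hospital"], r["division"], r["age"] // 5, r["diabetic"])
    # Index the control pool by matching key.
    pool = {}
    for c in controls:
        pool.setdefault(key(c), []).append(c)
    rng = random.Random(seed)
    matched = []
    for case in cases:
        candidates = pool.get(key(case), [])
        if len(candidates) >= ratio:
            picks = rng.sample(candidates, ratio)
            for p in picks:
                candidates.remove(p)  # without replacement across cases
            matched.append((case, picks))
    return matched
```

In practice one would also track the unmatched remainder, which in the chapter's design flows into the independent test dataset.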

A total of 1207 medulloblastoma cases were retained. Median age was 5.8 years and 15.2% were black, with M:F ratio 1.71.

#### **2.3. Feature selection**


74 New Insights into Bayesian Inference


In our two examples, exploratory machine-learning, including logistic regression, was performed using raw data comprising 326 data elements from the de-identified EHR-derived extracts, supplemented by transformed derived variables. The LASSO procedure [34] was used for dimensionality reduction. Predictor variables with a category-wise Wald test p-value ≤0.05 were retained in the models.

In the post-CAB beta-blocker example, transformed continuous-variable features (6) in the model included: ln(inter-beat interval), RMSSD(HR), ln(nbr\_dx), ln(nbr\_meds), ln(LOS\_days), and ln(AST). Transformed binomial-variable features included the following: AST/ALT <1.1, max(HR) < 110, range(HR 48 hr prior to discharge) > 30 bpm, range(RR 48 hr prior to discharge) > 18 bpm, range(MAP 48 hr prior to discharge) > 22 mmHg, max(SBP during hospital stay) < 150 mmHg; diabetes; concomitant calcium channel blocker; perioperative inotrope or mechanical circulatory assist; concomitant CYP2D6 substrate or inhibitor (esp. antidepressants, antipsychotics, COX-2 inhibitors, amiodarone, or cimetidine); history of substance abuse; and history of syncope, vertigo, postural hypotension, or falling.

In the medulloblastoma repeat MRI example, binomial-variable features included the following: clinical trial enrollment, prior evidence of recurrence or metastasis of tumor, renal impairment such as would be a safety contraindication for MRI contrast, high-risk histology, SHH or WNT genomics, tumor extent at resection, PFS duration, recent <sup>99m</sup>Tc scan, recent <sup>123</sup>I-mIBG scan, and public payor (Medicaid).

### **3. Comparing model-guided and natural decision-making**

Personalized patient care decisions require considering numerous clinical information items and weighing and combining them according to patient-specific risks and likely benefits. Additionally, considerations of disease etiology and progression, as well as of comorbid conditions and concomitant medications or prior treatments that may affect the underlying biological processes or constrain subsequent therapeutic options, are required. Yet further, guidelines regarding treatment modalities, risk factors, complications, patient caregiver support, living situation, and costs also influence care decisions. Natural, model-unassisted decision-making yields therapeutic treatment allocations that are the basis of the initial ML models. However, once one or more ML models are deployed and integrated with the users' workflow and decision-making, the guidance and evidence that the models present to the users tend to alter their decision-making and change the rates of allocating specific treatments or diagnostic procedures to individual patients. It is important to assess the fairness of ML models not only prior to their initial commissioning and deployment but also to reassess model fairness in a periodic and ongoing manner post-deployment. Depending on the degree to which an ML model influences users' decision-making, it is possible that differences between strata may increase during deployment, and the model-guided data that accrue during the post-deployment period may cause later versions of the ML model to manifest statistically significant biases that were not present in the initial ML model version that was based on purely natural decisional data.


In the post-CAB beta-blocker example, the ML score output would later be consumed by prescribers in computerized physician order-entry (CPOE) apps used to guide the implementation of care in perioperative CAB patients. Markov Chain Monte Carlo sampling of 11,358 cases in the "training" dataset was performed to determine the rate of historical discharge beta-blocker usage in each decile of ML-model-generated score values. In the serial MRI medulloblastoma follow-up example, the ML score output would likewise be consumed by prescribers in CPOE apps used to guide the implementation of care in pediatric medulloblastoma patients. Markov Chain Monte Carlo sampling of 1207 cases in the "training" dataset was performed to determine the rate of historical serial MRI usage in each decile of ML-model-generated score values.
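The quantity being estimated, the historical treatment rate within each score decile, can be illustrated with a plain empirical tabulation. This is a stand-in sketch; the chapter estimated these rates via Markov Chain Monte Carlo sampling:

```python
import numpy as np

def usage_rate_by_decile(scores, received, n_bins=10):
    """Empirical historical treatment rate within each bin (deciles by
    default) of ML-model-generated score values."""
    scores = np.asarray(scores, dtype=float)
    received = np.asarray(received, dtype=float)
    # Decile edges from the empirical score distribution.
    edges = np.quantile(scores, np.linspace(0, 1, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, scores, side="right") - 1,
                   0, n_bins - 1)
    # Mean of the 0/1 treatment indicator within each bin = usage rate.
    return np.array([received[bins == d].mean() for d in range(n_bins)])
```

Applied to the "training" cohort, this yields, for each score decile, the rate of historical discharge beta-blocker (or serial MRI) prescribing.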

#### **4. Evaluation approach**

The purpose of fairness auditing in our two examples was to examine the questions (1) whether black patients were less likely to receive beneficial therapy or diagnostic procedures when compared with white patients and (2) whether, in connection with ML model-training on observational data from a large, representative collection of hospitals, an ML decision-support model would manifest a statistically significant untoward disparity of therapy or diagnostic procedure prescribing based on race. It was first necessary to determine whether the ML models were adequately calibrated in 'test' cohorts different from the ML model-discovery 'training' cohorts. Controlling for age distribution, geographic differences, gender, common contraindications for the treatment-of-interest (discharge beta-blocker post-CAB), and other factors [34–42] is important, to ensure adequate statistical power for these assessments and to mitigate confounding [27, 43]. Establishing that the ML model was adequately well-calibrated for each racial group, prior to performing procedures to evaluate the presence of discrimination or disparities, was done using the Hosmer-Lemeshow test by model score deciles. For black subjects, the model's HL was χ<sup>2</sup> = 10.9, df = 8, p-value = 0.21, while for white subjects, HL χ<sup>2</sup> = 10.1 and p-value = 0.26, confirming that the ML model scores showed good calibration across the deciles of score values providing the recommendations for discharge beta-blocker prescribing. The distribution of discharge beta-blocker medications in the subset of the cohort who received them was as follows: metoprolol, 68.2%; carvedilol, 14.1%; labetalol, 11.5%; atenolol, 4.7%; propranolol, 0.87%; nebivolol, 0.28%; bisoprolol, 0.17%; nadolol, 0.08%; acebutolol, 0.04%; and pindolol, 0.02%. This distribution is consistent with recently published guidelines [44–47].

Kruskal-Wallis non-parametric ANOVA revealed no statistically significant racial group-associated differences in the proportions of these categories of beta-blockers. Corresponding control for age distribution and other factors was performed for the medulloblastoma example. Hosmer-Lemeshow evidence of model calibration was confirmed for the medulloblastoma ML model. With calibration determined to be adequate, we then proceeded to evaluate potential ML model biases using three methods: the Cochran-Mantel-Haenszel test; beta regression; and Bayesian Model Averaging.
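The decile-based Hosmer-Lemeshow check used above can be computed with a short generic sketch (synthetic data; this is a textbook implementation, not the authors' code):

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y, p, g=10):
    """Hosmer-Lemeshow goodness-of-fit test over g score-ordered groups
    (deciles by default). Returns (chi-square statistic, p-value, df)."""
    y, p = np.asarray(y, dtype=float), np.asarray(p, dtype=float)
    order = np.argsort(p)
    y, p = y[order], p[order]
    stat = 0.0
    for idx in np.array_split(np.arange(len(p)), g):
        obs, exp, n = y[idx].sum(), p[idx].sum(), len(idx)
        # (observed - expected)^2 / (expected * (1 - mean predicted risk))
        stat += (obs - exp) ** 2 / (exp * (1 - exp / n))
    df = g - 2
    return stat, chi2.sf(stat, df), df

rng = np.random.default_rng(1)
p = rng.random(5000)
y_calibrated = (rng.random(5000) < p).astype(int)   # well-calibrated scores
y_biased = (rng.random(5000) < p ** 3).astype(int)  # badly miscalibrated
```

For the calibrated scores the HL p-value is unremarkable; for the miscalibrated ones it collapses toward zero, which is the signal that would block the subsequent disparity analysis.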

Linear regression with normally distributed errors is probably the most commonly used analysis tool in applied statistics. The pervasiveness of linear regression is based on the fact that random variations in observed data can frequently be well-approximated by a normal distribution with constant variance. If the response variable in a regression model is a rate or percentage, however, the assumption of normally distributed errors is not valid. Because the analysis of rates and proportions is an important issue for many applications, establishing statistically valid analysis tools for dependent variables whose values are on the bounded interval [0,1) has high importance. This is particularly so in applications that assess the fairness and equitability of proportions of allocated services or resources, including allocations that are mediated by decision-support tools and artificial intelligence (AI) models originating in ML from existing data. Such models aim to represent the relationship between a binary exposure (exposed vs. unexposed) and a binary outcome (success vs. failure). Sometimes the relationship between the two binary variables is influenced by another variable (or variables). One way to adjust for such influence is to stratify on that variable and perform stratified analysis.

#### **4.1. Cochran-Mantel-Haenszel test**

The Cochran-Mantel-Haenszel test (CMH) is a test of the similarity of the mean rank (across the outcome scale) for groups in stratified 2 × 2 tables with possibly unbalanced stratum sizes and unbalanced group sizes within each stratum. The CMH test has the advantage of only moderate assumptions for calculating the p-value, namely, that the conditional odds ratios of the strata are in the same direction and similar in magnitude.

The Cochran-Mantel-Haenszel (CMH) procedure tests the homogeneity of population proportions after taking into account other factors. The CMH test has been utilized for many years in the courts and by regulatory agencies [48–52]. The "training" and "test" data were arranged as 2 × 2 × N arrays, where race and beta-blocker status comprised the first two dimensions and hospital was the third dimension. In this manner CMH examines one factor (race) and one outcome (discharge beta-blocker) across N subgroups (hospitals). The CMH chi-square tests whether there is an interaction or association between the 2 × 2 rows and columns across the N categories. The null hypothesis is that the pooled odds ratio is equal to 1.0, i.e., that there is no interaction between rows and columns. Rejection of H0 indicates that interaction exists. Calculation of the CMH test may be performed via the cmh.test() function in the R package 'lawstat' (https://cran.r-project.org/package=lawstat) or by other conventional means.
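The same stratified test can be reproduced outside R; the sketch below uses Python's statsmodels `StratifiedTable` on hypothetical per-hospital 2 × 2 counts (the numbers are invented for illustration, not the chapter's data):

```python
import numpy as np
from statsmodels.stats.contingency_tables import StratifiedTable

# One 2 x 2 table per hospital stratum: rows = race (black, white),
# columns = discharge beta-blocker (yes, no). Counts are invented.
tables = [
    np.array([[20, 10], [150, 90]]),
    np.array([[15, 12], [160, 140]]),
    np.array([[25, 9], [200, 80]]),
]

st = StratifiedTable(tables)
cmh = st.test_null_odds(correction=True)    # CMH test of pooled OR = 1
mh_or = st.oddsratio_pooled                 # Mantel-Haenszel pooled odds ratio
tarone = st.test_equal_odds(adjust=True)    # Tarone homogeneity check
```

If `tarone.pvalue` falls below 0.05, the homogeneity assumption behind CMH is suspect, mirroring the caveat discussed below for the R workflow.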

In the post-CAB beta-blocker analysis (**Table 1**), the CMH statistic = 5.84, df = 1, p-value = 0.016, MH estimate = 1.23, pooled odds ratio = 1.35, such that, rather than representing a disadvantage, black race in this male cohort conferred a slight advantage, with a modest increase in the likelihood of receiving discharge beta-blocker post-CAB compared to men who were white.

In the medulloblastoma repeat MRI analysis (**Table 2**), the CMH statistic = 39.8, df = 1, p-value <0.0001, MH estimate = 0.33, pooled odds ratio = 0.35, such that children of black race in this cohort have a statistically lower likelihood of receiving serial MRI exams compared to children who were white.

between these), which we denote by *μx*

**Race MRI + MRI - Prevalence (MRI+)**

**Table 2.** Prevalence of serial MRI utilization, post-medulloblastoma resection.

Black 112 71 61.3 63.4 White 837 187 81.7 80.9

eter to depend on covariates. We have:

use the scale link to ensure that *ψ* > 0.

*g*(*μx*) = *x*, or, equivalently, *μx* = *g*<sup>−</sup><sup>1</sup>

where *g*<sup>−</sup>1(∙) is the inverse function of *g*(∙). Here the default logit link implies that

statistical literature. The conditional variance of the beta distribution is:

ensure that *μx*

. Because *y* is on the open interval (0, 1), we must

**Actual (Training) (%) Model-guided (Test) (%)**

http://dx.doi.org/10.5772/intechopen.73176

79

(*x*) (1)

is also in [0, 1). We do this by using a link function for the conditional mean,

Preventing Disparities: Bayesian and Frequentist Methods for Assessing Fairness in Machine-Learning…

denoted *g*(∙). This is necessary because linear combinations of the covariates are not otherwise restricted to [0, 1). Beta regression is widely used because of its flexibility for modeling variables whose values are constrained to lie between 0 and 1 and because its predictions are confined to the same range [53, 54]. Beta regression models were proposed by Ferrari and Cribari-Neto [55, 56] and extended by Smithson and Verkuilen [54] to allow the scale param-

*ln*{*μx* /(1 − *μx*)} = *x*, and that *μx* = *exp*(*x*)/{1 + *exp*(*x*)}. (2)

Using a link function to keep the conditional-mean model inside an interval is common in the

*var*(*y* | *x*) = { *μx* (1 − *μx* ) }/(1 + *ψ*). (3)

The parameter *ψ* is known as the scale factor because it rescales the conditional variance. We

Beta regression models have applications in a variety of disciplines, such as economics, the social sciences, and health science. For example, in political science and in the law, beta regression has been utilized in determining noncompliance with antidiscrimination laws [52]. In psychology, Smithson [57] used beta regression to evaluate jurors' assessments of the probability of a defendant's guilt and their verdicts in trial courts. Beta regression has also been



between rows and columns. Rejection of H0 indicates that interaction exists. Calculation of the CMH test may be performed via the cmh.test() function in the R package 'lawstat' (https://cran.r-project.org/package=lawstat) or by other conventional means.

| **Race** | **Beta-blocker +** | **Beta-blocker −** | **Prevalence (BB+): Actual (Training) (%)** | **Prevalence (BB+): Model-guided (Test) (%)** |
|---|---|---|---|---|
| Black | 953 | 307 | 75.6 | 73.3 |
| White | 7039 | 3059 | 69.7 | 67.8 |

**Table 1.** Prevalence of discharge beta-blocker utilization, post-CAB.

**Table 2.** Prevalence of serial MRI utilization, post-medulloblastoma resection.

In the post-CAB beta-blocker analysis (**Table 1**), the CMH statistic = 5.84, df = 1, p-value = 0.016, MH estimate = 1.23, pooled odds ratio = 1.35, such that, rather than representing a disadvantage, black race in this male cohort conferred a slight advantage, with a modest increase in the likelihood of receiving discharge beta-blocker post-CAB compared to men who were white. In the medulloblastoma repeat MRI analysis (**Table 2**), the CMH statistic = 39.8, df = 1, p-value <0.0001, MH estimate = 0.33, pooled odds ratio = 0.35, such that children of black race in this cohort have a statistically lower likelihood of receiving serial MRI exams compared to children who were white.

Note that if we see very different odds ratios across the strata, this suggests that the variable used to separate the data into strata (race, in these examples) is a confounder and, if so, the Mantel-Haenszel odds ratio is not a valid measure of significance. To test whether the odds ratios in the different strata differ, we calculate Tarone's test of homogeneity using the rma.mh() function from the R package 'metafor'. If some odds ratios are <1 and other odds ratios are >1, or if the Tarone test p-value is <0.05, then the CMH test is not valid or appropriate. Thus, a disadvantage of CMH is that violation of its assumptions occurs comparatively often (for example, when the stratifying factor confers protection for one value and excess risk for another). Therefore, we sought additional methods that do not have this limitation.

#### **4.2. Beta regression**

One such alternative method that is able to model rates and proportions is beta regression. Beta regression is based on the assumption that the response is beta-distributed on the open unit interval (0, 1). The beta density can assume a number of different shapes depending on the combination of parameter values, including left- and right-skewed shapes and the flat shape of the uniform density. Beta regression models can allow for heteroskedasticity and can accommodate both variable dispersion and asymmetrical distributions. An additional advantage is that the regression parameters are interpretable in terms of the mean of the outcome variable.

The measure of association between the predictor variables and the outcome from the beta regression is expressed as a relative proportion ratio [53–56]. Beta regression is a model of the mean of the dependent variable *y* (likelihood of discharge beta-blocker) conditioned on covariates *x* (race, ML model-guided recommendation for beta-blocker, and the interaction between these), which we denote by *μ<sub>x</sub>*. Because *y* is on the open interval (0, 1), we must ensure that *μ<sub>x</sub>* is also in (0, 1). We do this by using a link function for the conditional mean, denoted *g*(∙). This is necessary because linear combinations of the covariates are not otherwise restricted to (0, 1). Beta regression is widely used because of its flexibility for modeling variables whose values are constrained to lie between 0 and 1 and because its predictions are confined to the same range [53, 54]. Beta regression models were proposed by Ferrari and Cribari-Neto [55, 56] and extended by Smithson and Verkuilen [54] to allow the scale parameter to depend on covariates. We have:

$$g(\mu_x) = \mathbf{x}\boldsymbol{\beta} \text{ or, equivalently, } \mu_x = g^{-1}(\mathbf{x}\boldsymbol{\beta}) \tag{1}$$

where *g*<sup>−1</sup>(∙) is the inverse function of *g*(∙). Here the default logit link implies that

$$\ln\left\{\mu_x/(1-\mu_x)\right\} = \mathbf{x}\boldsymbol{\beta} \text{ and that } \mu_x = \exp(\mathbf{x}\boldsymbol{\beta})/\left\{1 + \exp(\mathbf{x}\boldsymbol{\beta})\right\}. \tag{2}$$

Using a link function to keep the conditional-mean model inside an interval is common in the statistical literature. The conditional variance of the beta distribution is:

$$\operatorname{var}(y \mid \mathbf{x}) = \left\{\mu_x\left(1-\mu_x\right)\right\}/(1+\psi). \tag{3}$$

The parameter *ψ* is known as the scale factor because it rescales the conditional variance. We use the scale link to ensure that *ψ* > 0.
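As a concrete illustration, the link and variance relations of Eqs. (1)–(3) can be sketched in a few lines of pure Python. The function names here are our own, not part of any statistics package:

```python
import math

# Logit link for the conditional mean (Eqs. (1)-(2)) and the conditional
# variance of the beta-distributed response (Eq. (3)). Here x_beta stands
# for the linear predictor x*beta, and psi is the scale parameter.

def inv_logit(x_beta):
    """mu_x = exp(x*beta) / (1 + exp(x*beta)), always inside (0, 1)."""
    return math.exp(x_beta) / (1.0 + math.exp(x_beta))

def logit(mu):
    """g(mu) = ln(mu / (1 - mu)), the default link."""
    return math.log(mu / (1.0 - mu))

def beta_conditional_variance(mu, psi):
    """var(y | x) = mu*(1 - mu) / (1 + psi); larger psi -> smaller variance."""
    return mu * (1.0 - mu) / (1.0 + psi)

mu = inv_logit(0.0)
print(mu)                                  # 0.5
print(beta_conditional_variance(mu, 4.0))  # 0.05
print(round(logit(inv_logit(1.7)), 6))     # 1.7 (link and inverse agree)
```

Because the inverse logit maps any real-valued linear predictor into (0, 1), the predicted proportions can never escape the unit interval, which is exactly the property motivating the link function.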

Beta regression models have applications in a variety of disciplines, such as economics, the social sciences, and health science. For example, in political science and in the law, beta regression has been utilized in determining noncompliance with antidiscrimination laws [52]. In psychology, Smithson [57] used beta regression to evaluate jurors' assessments of the probability of a defendant's guilt and their verdicts in trial courts. Beta regression has also been used to model quality-adjusted life years in health cost-effectiveness studies [58, 59].
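Stepping back briefly to the stratified CMH analysis above, the pooled odds ratio and test statistic it reports can also be sketched in pure Python. The two 2×2 strata below are invented purely for illustration (they are not the cohort data), and implementations such as cmh.test() in R may additionally apply a continuity correction, so exact values can differ:

```python
# Cochran-Mantel-Haenszel statistic and Mantel-Haenszel pooled odds ratio
# for K stratified 2x2 tables, each given as counts (a, b, c, d).

def mh_pooled_odds_ratio(strata):
    """Mantel-Haenszel pooled odds ratio over a list of 2x2 strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

def cmh_statistic(strata):
    """CMH chi-square statistic (1 df, no continuity correction)."""
    diff, var = 0.0, 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        diff += a - (a + b) * (a + c) / n               # observed - expected
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    return diff * diff / var

strata = [(10, 10, 10, 10),   # stratum 1: within-stratum odds ratio 1.0
          (20, 10, 10, 20)]   # stratum 2: within-stratum odds ratio 4.0
print(round(mh_pooled_odds_ratio(strata), 2))  # 2.2
print(round(cmh_statistic(strata), 2))         # 3.92
```

The very different within-stratum odds ratios in this toy example (1.0 versus 4.0) are precisely the situation in which Tarone's homogeneity test would flag the CMH result as suspect.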

Where necessary, outcome observations (the proportion of cases receiving discharge beta-blocker post-CAB) were transformed to the open unit interval (0, 1), adding a very small amount (0.001) to the zero-valued observations and subtracting the same amount from the one-valued observations. Beta regression was performed via the betareg() function in the R package 'betareg' (https://cran.r-project.org/package=betareg) but may also be accomplished by similar algorithms in other statistics packages. Beta regression (**Table 3**) produces estimated coefficients of the covariates and an estimated scale parameter. The coefficient of the factor variable for race = Black is significant at the p < 0.05 level and positive. Thus we conclude that, rather than posing a hazard, Black race in these 14 institutions during this 5-year period actually conferred a slight advantage in terms of the likelihood of a male patient's receiving standard-of-care discharge on a beta-blocker, status-post open coronary artery bypass.

| **Covariate** | **Estimate** | **Std error** | **p-Value** |
|---|---|---|---|
| (Intercept) | −1.136 | 0.067 | <0.0001 |
| Black | 0.201 | 0.067 | 0.0026 |
| Score\_percentile | 4.875 | 0.119 | <0.0001 |
| Black:Score\_percentile | −0.033# | 0.118 | 0.7780 |
| (phi)\_(Intercept) | 2.434 | 0.160 | <0.0001 |
| (phi)\_Black | −0.093 | 0.099 | 0.3466 |
| (phi)\_Score\_percentile | 2.394 | 0.255 | <0.0001 |

This shows that the discharge beta-blocker rate increases with Score\_percentile and is slightly higher for blacks, and there is no significant interaction (annotated as #) between Score\_percentile and black race. This evidence corroborates that from the Cochran-Mantel-Haenszel test regarding the absence of disadvantage under the ML model for black men compared to white men for post-CAB discharge beta-blocker recommendation. Precision is asymmetric and heteroskedastic: precision (phi) increases with Score\_percentile.

**Table 3.** Beta regression of discharge beta-blocker utilization, post-CAB.

Corresponding beta regression (**Table 4**) was performed for predictive recommendations from our second example ML model, derived from repeat MRI pediatric medulloblastoma data from 33 institutions.

| **Covariate** | **Estimate** | **Std error** | **p-Value** |
|---|---|---|---|
| (Intercept) | −2.399 | 0.220 | <0.0001 |
| Black | 0.036 | 0.219 | 0.7813 |
| Score\_percentile | 5.030 | 0.293 | <0.0001 |
| Black:Score\_percentile | −0.387# | 0.290 | 0.1821 |
| (phi)\_(Intercept) | 2.232 | 0.289 | <0.0001 |
| (phi)\_Black | 0.061 | 0.122 | 0.6162 |
| (phi)\_Score\_percentile | −0.023 | 0.410 | 0.9562 |

This shows that the rate of serial MRI exams increases with Score\_percentile, and in the mean equation there is potentially a weak, mildly negative interaction (annotated as #) between Score\_percentile and black race. This is weak evidence consistent with the hypothesis that a disparity may exist under our initial, empirically discovered ML model between black and white children with medulloblastoma with regard to recommendation of serial MRI scans in treatment follow-up. Precision (phi) is not significantly asymmetric or heteroskedastic in this example dataset.

**Table 4.** Beta regression of serial MRI utilization, post-medulloblastoma resection.

#### **4.3. Bayesian model averaging**

In our experience, beta regression and CMH are sufficient for ascertaining the fairness of ML-derived models in many situations. However, if the strata are markedly unbalanced or if the data are not satisfactorily fitted by a beta distribution, these methods may give either false-positive or false-negative results. Also, percentage outcomes that are based on the binomial model are often overdispersed, meaning that they show a larger variability than expected by the binomial distribution. Beta regression models usually account for overdispersion by including the precision parameter phi to adjust the conditional variance of the percentage outcome, but this fixed parameterization involves an *ad hoc* choice by the analyst and may be unstable or yield poor goodness-of-fit when the data are heteroskedastic. Yet further, beta regression tends to require relatively large sample sizes to power interpretations of statistical significance. Therefore, we seek additional methods that are robust against these conditions. In that regard, Bayesian model averaging (BMA) offers particular advantages.

BMA is a relatively recently developed method that addresses model uncertainty in the canonical regression variable selection problem [60–64]. If we assume a linear model structure, where *y* is the dependent variable to be predicted, *α<sub>i</sub>* are constants, *β<sub>i</sub>* are coefficients, and *ε* is a normal IID error term with variance *σ*², then we have:

$$y = \alpha_i + X_i \beta_i + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2 I) \tag{4}$$

High dimensionality interferes with stable variable selection. Small cohort size or collinearity of potential explanatory variables in matrix *X* may increase the risk of over-fitting and retention of some variables *X<sub>i</sub>* ∈ { *X* } which should not be included in the model. Stepwise variable elimination starting from the full linear model that includes all variables may be statistically unsupportable if the cohort size is small.

BMA addresses the problem by estimating models for all, or a very large number of, possible combinations of { *X* } and constructing a weighted average over all of them. If there are *K* potential variables, this means estimating 2<sup>K</sup> variable combinations and therefore 2<sup>K</sup> models. The model weights for model averaging arise from posterior model probabilities which, in turn, are given by Bayes' theorem:

$$p(M_i \mid y, X) = \frac{p(y \mid M_i, X)\, p(M_i)}{p(y \mid X)} = \frac{p(y \mid M_i, X)\, p(M_i)}{\sum_{j=1}^{2^K} p(y \mid M_j, X)\, p(M_j)} \tag{5}$$

Here, *p*(*y* | *X*) is the integrated likelihood, which is constant over all models. Therefore, the posterior model probability (PMP) *p*(*M<sub>i</sub>* | *y*, *X*) is proportional to the marginal likelihood of the model *p*(*y* | *M<sub>i</sub>*, *X*) (the likelihood of the observed data, given model *M<sub>i</sub>*) times a prior model probability *p*(*M<sub>i</sub>*); that is, how probable the machine-learning analyst believes model *M<sub>i</sub>* to be before looking at the data. Renormalization then leads to the PMPs and thus the model-weighted posterior distribution for any statistic *θ* (for example, the coefficients *β<sub>i</sub>*):

$$p(\theta \mid y, X) = \sum_{i=1}^{2^K} p(\theta \mid M_i, y, X)\, p(M_i \mid X, y) \tag{6}$$
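For a toy model space the renormalization in Eqs. (5) and (6) can be carried out by direct enumeration. The marginal likelihood values below are invented for illustration; in practice they come from the Bayesian linear model described below:

```python
# Enumerative BMA over K = 2 candidate variables, i.e. 2**2 = 4 models.
# The marginal likelihoods are invented toy values, not fitted quantities.
marginal_likelihood = {
    (): 1.0, ("x1",): 3.0, ("x2",): 1.0, ("x1", "x2"): 2.0,
}

def posterior_model_probs(ml):
    """PMPs under a uniform model prior p(M_i) = 1/2**K, per Eq. (5)."""
    prior = 1.0 / len(ml)
    z = sum(v * prior for v in ml.values())   # integrated likelihood p(y | X)
    return {m: v * prior / z for m, v in ml.items()}

pmp = posterior_model_probs(marginal_likelihood)
# Posterior inclusion probability of a variable: the sum of the PMPs of the
# models that contain it (the quantity reported as 'Incl. Prob.' in Table 5).
incl_x1 = sum(w for m, w in pmp.items() if "x1" in m)
print(round(incl_x1, 4))  # 0.7143
```

The same weights, applied to each model's coefficient posterior, give the model-averaged posterior of Eq. (6).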


Preventing Disparities: Bayesian and Frequentist Methods for Assessing Fairness in Machine-Learning…

http://dx.doi.org/10.5772/intechopen.73176


| **Variable** | **Inclusion probability** |
|---|---|
| Severe CHF | 1.00 |
| Asthma | 1.00 |
| Bradycardia | 1.00 |
| Heart block | 1.00 |
| Pacemaker | 0.91 |
| Hypotension (SBP < 105 mm) | 1.00 |
| Polypharmacy (Nmeds > 10) | 0.88 |
| Pressors or inotrope | 0.79 |
| IABP or VAD | 0.62 |
| Urgent-emergent | 0.54 |
| Age | 0.21 |
| Race | 0.13 |

Posterior model probabilities of the ten top-ranked models M1–M10: 0.021, 0.012, 0.011, 0.010, and <0.01 for each of M5–M10.

**Table 5.** BMA of discharge beta-blocker utilization, post-CAB.

The model prior *p*(*M<sub>i</sub>*) is elicited by the machine-learning researcher and reflects prior beliefs [65, 66], some of which may come from published research literature [67]. In the absence of other guidance or historical knowledge, a routine option is to assume a uniform prior probability for all models, *p*(*M<sub>i</sub>*) ∝ 1, to represent the absence of a well-established prior.

The expressions for posterior distributions *p*(*θ* | *M<sub>i</sub>*, *y*, *X*) and for marginal likelihoods *p*(*M<sub>i</sub>* | *y*, *X*) depend on the model estimation framework. Routine practice is to use a Bayesian linear model with a prior structure called Zellner's *g* prior [68, 69]. For each candidate model *M<sub>i</sub>*, a normal-distributed error structure is assumed, as in Eq. (4). The need to determine posterior distributions requires that one specify priors on the model parameters. In practice, one sets provisional priors on the constants and on the error variance, typically distributed as *p*(*α<sub>i</sub>*) ∝ 1, meaning complete prior uncertainty about the prior mean, and *p*(*σ*) ∝ *σ*<sup>−1</sup>.

The most influential prior is the one on the coefficients *β<sub>i</sub>*. Before analyzing the data (*y*, *X*), the analyst proposes priors on the coefficients *β<sub>i</sub>*, typically normally distributed with a specified mean and variance. In the context of ML model fairness evaluations, we assume a prior mean of zero for the coefficients to assert that not much is known about them. In our work, their variance structure is defined by Zellner's *g*:

$$\beta_i \mid g \sim N\left(0, \sigma^2 \left(\frac{1}{g} X_i' X_i\right)^{-1}\right) \tag{7}$$

The hyperparameter *g* embodies how certain the analyst is that coefficients are zero: A small *g* means small prior coefficient variance and therefore implies the analyst is quite certain that the coefficients are indeed approximately zero. By contrast, a large *g* means that the analyst is very uncertain about whether the variables' coefficients are statistically significant, as in the case of our work on ML model fairness evaluations with regard to racial bias.
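The scaling effect of *g* on the prior covariance in Eq. (7) can be verified numerically. The 3×2 design matrix and the helper functions below are our own toy illustration, not part of any library:

```python
# Prior covariance of the coefficients under Zellner's g prior, Eq. (7):
#   cov(beta_i) = sigma^2 * ((1/g) X'X)^(-1) = sigma^2 * g * (X'X)^(-1),
# so a small g concentrates the prior near zero; a large g makes it diffuse.

def transpose_times(X):
    """Compute X'X for a matrix given as a list of rows."""
    k = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(k)]
            for i in range(k)]

def inv2x2(M):
    """Inverse of a 2x2 matrix (sufficient for this toy example)."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

def zellner_prior_cov(X, g, sigma2=1.0):
    return [[sigma2 * g * v for v in row]
            for row in inv2x2(transpose_times(X))]

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # X'X = [[2, 1], [1, 2]]
print(zellner_prior_cov(X, g=9.0))          # ≈ [[6.0, -3.0], [-3.0, 6.0]]
print(zellner_prior_cov(X, g=0.09)[0][0])   # ≈ 0.06: smaller g, tighter prior
```

The choice of *g* itself (for example, the unit-information choice *g* = *n*) is a further modeling decision in the BMA literature; the point of the sketch is only the scaling behavior the text describes.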

In general, the more complicated the distribution of marginal likelihoods, the more difficulty a Bayesian (Gibbs, Markov chain Monte Carlo) sampler will encounter before converging to a good approximation of the posterior model probabilities (PMPs). The quality of the approximation may be inferred from the number of times each model was drawn versus its actual marginal likelihood. Partly for this reason, BMA retains a pre-specified number of models with the highest PMPs encountered during MCMC sampling, for which PMPs and draw counts are stored. Their respective distributions and their correlation indicate how well the sampler has converged. While BMA should usually compare as many models as possible, some considerations might dictate restriction to a subspace of the 2<sup>K</sup> models. By far the most common setting is to keep some regressors fixed in the model and apply Bayesian model uncertainty only to a subset of the regressors. However, due to physical RAM limits, the sampling chain can retain fewer than 1,000,000 of these models. Instead, BMA computes aggregate statistics on-the-fly, usually using iteration counts as surrogate model weights. For model convergence and some posterior statistics BMA retains only the 'top' (highest-PMP) models it encounters during the iterations executed. Since the time for updating the iteration counts for the 'top' models grows linearly with their number, the sampler becomes considerably slower the more 'top' models are retained. Still, if they are sufficiently numerous, those best models can accurately represent most of the posterior model cumulative probability. In this case, it is defensible to base posterior statistics on analytical likelihoods instead of MCMC frequencies.
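The sampling behavior described above can be caricatured with a tiny Metropolis-style sampler over a 3-variable model space. The model scores below stand in for marginal likelihoods and are invented; this is a didactic sketch, not the retention logic of any particular BMA package:

```python
import random

# A deliberately tiny caricature of MCMC model sampling with 'top'-model
# retention. K = 3 candidate variables give 2**3 = 8 models, encoded as
# bitmasks over the variables.

random.seed(1)
K = 3
ml = {m: 1.0 + 5.0 * bin(m).count("1") for m in range(2 ** K)}  # toy scores

def mcmc_model_sampler(n_iter, n_top=4):
    counts = {}
    current = 0                                  # start at the null model
    for _ in range(n_iter):
        proposal = current ^ (1 << random.randrange(K))  # add/drop a variable
        if random.random() < min(1.0, ml[proposal] / ml[current]):
            current = proposal                   # Metropolis acceptance step
        counts[current] = counts.get(current, 0) + 1
    top = sorted(counts, key=counts.get, reverse=True)[:n_top]  # 'top' models
    weights = {m: c / n_iter for m, c in counts.items()}  # iteration-count PMPs
    return top, weights

top, weights = mcmc_model_sampler(20000)
print(len(top), round(sum(weights.values()), 10))  # 4 1.0
```

Here iteration counts serve as the surrogate model weights; a production sampler would additionally store analytical likelihoods for the retained top models, as described above.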


For the post-CAB beta-blocker at discharge example, **Table 5** shows features of the 10 top-ranked models generated by BMA MCMC sampling, together with the cumulative inclusion probability for each feature summed over the models evaluated.

With regard to prescribing of beta-blocker medication at discharge from hospital post-CAB coronary revascularization, Bayesian model averaging yields evidence that models omitting race have higher Posterior Model Probability (PMP) than models that retain race as a feature, and race exhibits low inclusion probability. These findings are compatible with the results of CMH and beta regression and support the hypothesis that no untoward racial bias is present in this ML model. Were this ML decision-support model put into production use to guide prescribing, it is unlikely that it would manifest racially discriminatory or unjust recommendations.

For the medulloblastoma follow-up example, **Table 6** likewise shows features of the 10 top-ranked models generated by BMA MCMC sampling, together with the cumulative inclusion probability for each feature summed over the models evaluated.




perpetuate or propagate injustices that are manifested in the historical data that are utilized to train the machine-learning models [2, 7]. Despite the increasing attention to this issue, as yet it is unclear whether the goals of fairness and accuracy in ML are conflicting goals [5, 19, 24]. In that connection, the impact of race/ethnicity on health services access, long-term risk factor control, and cardiovascular outcomes among patients has been the subject of intensive study for decades [75–83]. However, significant disparities in cardiovascular management have received less attention [84–89]. Similarly, the current literature has directed scant attention to disparities in cancer care subsequent to diagnosis. In the present era of artificial intelligence, Big Data, and machine-learning, it is a priority that ML-based decision-support tools not manifest untoward disparities. Sensitive methods having statistical power adequate to detect disparity are essential to achieving this goal. Moreover, it is important that such methods be aligned with generally-accepted governance practices in the courts and regulatory agencies. The present work sets forth a three-pronged approach for ascertaining the presence or absence of disparity in ML models, by race, age, gender, or other attributes. We sought to discover strengths and limitations of methods for detecting unfairness in ML-model-guided decisionsupport and, when unfairness is identified, discovering the sources and magnitudes of the disparate effects. In health services, numerous clinical contexts and models and treatment use-cases merit such analyses. 
However, for simplicity we selected two contexts in which strong consensus does exist regarding what the preferred treatment should be, and in which the consensus has prevailed and remained constant for a sufficient period of time, such that observational data are available for analysis and such that minimal change in the consensus has occurred during the time period for which data are available for analysis. We selected cardiac care and cancer care contexts in which disparities with respect to race are feasible

Preventing Disparities: Bayesian and Frequentist Methods for Assessing Fairness in Machine-Learning…

http://dx.doi.org/10.5772/intechopen.73176


#### **5. Discussion**

Machine-learning methods are finding increasing application to guide decision-making, including decisions that arise in health care. ML decision-support guidance can have important consequences on outcomes, including employment [50, 51, 70, 71], banking [10], predictive policing and law enforcement [2, 11, 12, 21, 49, 72–74], and criminal sentencing [48]. Recently, growing attention has focused on the potential that machine-learning might learn unjust, unfair, or discriminatory representations from observational data and inadvertently perpetuate or propagate injustices that are manifested in the historical data that are utilized to train the machine-learning models [2, 7]. Despite the increasing attention to this issue, as yet it is unclear whether the goals of fairness and accuracy in ML are conflicting goals [5, 19, 24]. In that connection, the impact of race/ethnicity on health services access, long-term risk factor control, and cardiovascular outcomes among patients has been the subject of intensive study for decades [75–83]. However, significant disparities in cardiovascular management have received less attention [84–89]. Similarly, the current literature has directed scant attention to disparities in cancer care subsequent to diagnosis. In the present era of artificial intelligence, Big Data, and machine-learning, it is a priority that ML-based decision-support tools not manifest untoward disparities. Sensitive methods having statistical power adequate to detect disparity are essential to achieving this goal. Moreover, it is important that such methods be aligned with generally-accepted governance practices in the courts and regulatory agencies.

The present work sets forth a three-pronged approach for ascertaining the presence or absence of disparity in ML models, by race, age, gender, or other attributes. We sought to discover the strengths and limitations of methods for detecting unfairness in ML-model-guided decision-support and, when unfairness is identified, to discover the sources and magnitudes of the disparate effects. In health services, numerous clinical contexts, models, and treatment use-cases merit such analyses. However, for simplicity we selected two contexts in which strong consensus exists regarding the preferred treatment, and in which that consensus has prevailed and remained essentially constant for a long enough period that observational data are available for analysis. We selected cardiac care and cancer care contexts in which disparities with respect to race are feasible to evaluate.

Other factors such as socioeconomic status and health services access patterns remain to be studied. The frequency and tenure of accessing the health system are confounded by race and socioeconomic factors. Patients' frequency and tenure also influence what medications patients have been prescribed previously [76], including some medications that may have been discontinued or substituted due to side-effects or non-efficacy, events that influence subsequent considerations when devising or adjusting the patients' medication regimen as new circumstances arise. Nonetheless, we examined one example intervention that has been regarded as 'standard of care' for a long time and whose marginal cost in the U.S. context is so small as to be negligible (beta-blocker medications at hospital discharge post-CAB) and another intervention whose marginal cost in the U.S. is significant (serial MRI exams of head and spinal cord).

With regard to repeat MRI in follow-up of pediatric medulloblastoma, Bayesian model averaging yields evidence that some models that include race have higher posterior model probability (PMP) than models that omit race as a feature, and race exhibits relatively high inclusion probability among the 1000 top-ranked models. These findings are consistent with the results of CMH and beta regression. The evidence suggests that the ML model manifests biases which, if put into production use to guide prescribing, may reproduce or exaggerate disparities that were present in the historical observational data from which the ML model was learned.

| Variable | Models (of M1–M10) including variable | Incl. Prob. |
|---|---|---|
| Clinical trial | 10 | 1.00 |
| Age | 9 | 0.99 |
| Prior recurrence | 8 | 0.97 |
| Renal impairment | 7 | 0.96 |
| High-risk histology | 7 | 0.94 |
| Tumor extent | 8 | 0.85 |
| SHH/WNT genomics | 8 | 0.70 |
| Race | 8 | 0.58 |
| PFS | 7 | 0.46 |
| 99mTc scan | 5 | 0.32 |
| mIBG scan | 7 | 0.29 |
| Public payor | 4 | 0.17 |

Posterior model probability: M1 = 0.032, M2 = 0.024, M3 = 0.017, M4 = 0.011, M5 = 0.010, M6–M10 each <0.01.

**Table 6.** BMA of serial MRI utilization, post-medulloblastoma resection.


Beta-blockers have been found to be important in the treatment of myocardial infarction and in coronary artery bypass surgery in that they have been shown to decrease mortality. Their use post-CAB has been standard care for many years [47], conditioned on the absence of significant clinical contraindications in a particular patient. However, prescribing a beta-blocker at the time of discharge from hospital post-CAB remains less consistent than it should be. Of note, most beta-blocker medications are extensively metabolized by the liver (esp. CYP2D6) and are affected by liver function. Indeed, the concomitant use of CYP2D6-inhibiting medications or the presence of liver disease may contraindicate or restrict the use of beta-blockers. Our ML modeling process determined the statistical significance and retention of AST and the AST/ALT ratio in the ML predictive model, consistent with this anticipated relevance of liver function to prescribers' decision-making, recapitulated in the model. However, alcohol use, hepatitis, non-alcoholic steatosis, cirrhosis, and other liver conditions are known to exhibit racial imbalance. Slightly elevated prevalence of cirrhosis has been reported in the U.S. black population (viz., QT<sup>c</sup> prolongation and the risk of ventricular arrhythmias, see [40]). At the outset of the present study, we were concerned that such imbalances might confound the ML modeling process or give rise to an ML model that could exacerbate under-prescribing of beta-blockers to black individuals.

By using the Cochran-Mantel-Haenszel test, beta regression, and Bayesian Model Averaging, not only was no untoward racial disparity found in the post-CAB cohort, but, with regard to the likelihood of receiving standard-of-care discharge beta-blocker after CAB, there was unexpectedly a slight benefit associated with black race.

By using the Cochran-Mantel-Haenszel test, a statistically significant and unexpected racial disparity was detected in the medulloblastoma ML model in regard to serial repeat MRI exams following initial cancer treatment. This was corroborated by beta regression and confirmed by Bayesian model averaging analysis, wherein many BMA-generated models retained race as a statistically significant predictor of serial MRI utilization. Potential reasons for the disparity are the subject of ongoing study.

Presently, we explicitly exclude race as an input variable from both of the ML models discussed as examples above, as a matter of assuring that the models will not perpetuate clinical differences in utilization rates associated with race or ethnicity manifested in the observational data used to train and validate the models. Naturally, race is only one factor that might be considered as a basis for potential unfairness. Attention should be directed also to other attributes that are candidate predictors in ML models, such as age, gender, chronicity/tenure or survival phase, payor class, or previous exposure to treatments or procedures that themselves might be subject to disparities, inequitable rationing, or unjust differential access or provisioning rates between groups. Confounding may arise from other factors [90–109], such as the vigor or effectiveness with which informed consent is sought by the treating physicians. Such confounding may be affected by racial or cultural differences between the family and the person performing the consenting process. This merits ongoing evaluation by model developers and model users, to ensure that good and fair ML models are not erroneously disparaged or rejected for invalid reasons.

As revealed in the example of medulloblastoma treatment follow-up, quantitative Bayesian and frequentist surveillance for potential model unfairness may detect phenomena that are not evidence of injustice per se but instead reflect cultural, educational, religious/spiritual, coping style, family structure, economic, comorbid anxiety/depression rates, rurality, inability to leave work, or other underlying sociodemographic differences. Such differences in decision-making are worthy foci of bioethical, epidemiological, and other evaluations, but are not necessarily differences that merit sanction or suppression. Autonomy of patients' and families' decision-making must be respected. Thus, fair, equitable, nondiscriminatory offering of options and access to services to all does not compel equal utilization of services by all [110]. Nonetheless, financial barriers to care may prevent minority and underserved populations from accessing follow-up care at rates commensurate with other groups. Enhancing insurance coverage or addressing out-of-pocket costs may help address financial barriers to follow-up care, including repeat screening to detect recurrence or progression.

Compared to CMH and beta regression, BMA is able to achieve adequate power with smaller sample sizes. Moreover, BMA does not have the odds-ratio homogeneity, parametric distributional, or other disadvantages of CMH or beta regression. Our BMA analytical approach meets the primary goals for defining a statistical approach to assessing fairness of ML models. Specifically:

**1.** It captures information from the endpoint scale on the interval [0,1);

**2.** It provides an interpretation that is readily understood;

**3.** It has power at least equivalent to the CMH test or beta regression;

**4.** It avoids assumptions in the calculation of the significance of treatment differences; and

**5.** The interpretation is based on the same foundation as the calculation of the p-value.
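The core BMA audit arithmetic described above, posterior model probabilities and per-feature inclusion probabilities, can be sketched with the standard library alone. This is an illustrative sketch, not the authors' implementation: the model names, feature sets, and BIC values below are invented, and the BIC-weighting PMP<sub>m</sub> ∝ exp(−BIC<sub>m</sub>/2) is the usual approximation under uniform model priors.

```python
import math

# Hypothetical BMA audit: each candidate model is a feature set plus the BIC
# obtained when that model was fit. All names and numbers are illustrative.
models = {
    "M1": ({"clinical_trial", "age", "tumor_extent", "race"}, 1210.4),
    "M2": ({"clinical_trial", "age", "tumor_extent"},         1211.0),
    "M3": ({"clinical_trial", "age", "race"},                 1212.2),
    "M4": ({"clinical_trial", "age"},                         1214.9),
}

def posterior_model_probs(models):
    # PMP_m proportional to exp(-BIC_m / 2); subtract the minimum BIC
    # before exponentiating to avoid numerical underflow.
    bmin = min(bic for _, bic in models.values())
    w = {name: math.exp(-(bic - bmin) / 2) for name, (_, bic) in models.items()}
    z = sum(w.values())
    return {name: wi / z for name, wi in w.items()}

def inclusion_probability(models, pmp, feature):
    # P(feature is in the model | data) = sum of PMP over models containing it.
    return sum(pmp[name] for name, (feats, _) in models.items() if feature in feats)

pmp = posterior_model_probs(models)
race_incl = inclusion_probability(models, pmp, "race")
print({name: round(p, 3) for name, p in pmp.items()})
print("inclusion probability of race:", round(race_incl, 3))
```

A high inclusion probability for a sensitive attribute, relative to the clinical covariates, is the disparity signal discussed in the text; it triggers follow-up review rather than an automatic verdict of unfairness.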



We suggest that this approach is superior to the stratified dichotomous approach because it captures the entire spectrum of the outcome scale and is therefore generally more powerful. While it remains valuable to use a combination of two or more methods (including frequentist methods, such as CMH and beta regression) in a correlative manner to ensure consistent determination of fairness of ML models, BMA has become for us a preferred component of fairness testing owing to its modestly greater statistical power when some strata are small or there is marked imbalance among strata. Bayesian methods, including BMA, are essential components of auditing and policy-setting processes for ML decision-support models, and are valuable adjuncts to conventional frequentist methods, which are less well-suited to the combinatorial challenges of high-dimensional model-variable selection in the Big Data era.
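The frequentist half of such a combination of methods can be made concrete with a small, self-contained version of the Cochran-Mantel-Haenszel statistic for K strata of 2×2 tables (group × outcome). The stratified counts below are fabricated for illustration; they are not the chapter's cohort data.

```python
import math

def cmh_test(strata):
    """Cochran-Mantel-Haenszel chi-square (1 df, continuity-corrected) for a
    list of 2x2 tables [[a, b], [c, d]]: rows = group, columns = outcome."""
    num = 0.0   # running sum of (a_k - E[a_k])
    var = 0.0   # running sum of Var(a_k) under the null of no association
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        r1, c1 = a + b, a + c                      # row-1 and column-1 totals
        num += a - r1 * c1 / n
        var += r1 * (n - r1) * c1 * (n - c1) / (n * n * (n - 1))
    chi2 = (abs(num) - 0.5) ** 2 / var
    # Survival function of chi-square with 1 df: P(X >= x) = erfc(sqrt(x/2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Fabricated example: beta-blocker prescribed at discharge (yes/no) by group,
# stratified by a liver-function category as a confounder proxy.
strata = [
    [[90, 10], [85, 15]],   # stratum 1: normal AST/ALT
    [[40, 20], [35, 25]],   # stratum 2: elevated AST/ALT
]
chi2, p = cmh_test(strata)
print(f"CMH chi-square = {chi2:.3f}, p = {p:.3f}")
```

Because the statistic pools a common odds ratio across strata, a separate homogeneity check (e.g., Breslow-Day) is still advisable; the text notes that BMA avoids this odds-ratio-homogeneity requirement entirely.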

In summary, we propose that these frequentist and Bayesian methods, including BMA, may be valuable for other outcome types and other contexts and use-cases, to detect disparities in a fashion similar to how statistical tests have historically been used in the courts and in public policy-making and regulation [52, 111]. Based on our success with the present example use-cases, we particularly recommend BMA for other use-cases in health-services-related ML modeling: to determine the covariate sources of ML-model-based decision-support disparities that are discovered, to measure the magnitudes of such effects, and to perform model-curation quality assurance so that such disparities can be eliminated [18] or kept to minimum levels. Such methods may help to promote and quantify algorithmic fairness [8–31], assist in proper governance of ML-based decision-support tools, and ensure that ML modeling does not inadvertently learn and replicate unfair practices that are extant in the observational datasets that are mined, thereby avoiding perpetuation of injustice by artificial intelligence or cognitive computing. These methods appear to be adequately sensitive and effective in terms of statistical power for cohort sizes such as are practically available. The frequentist methods have the advantage of general acceptance in the public sector and a long history of use in the courts and in regulatory settings. However, they are not well-suited to Big Data with high dimensionality and significant missingness rates for individual predictor variables. By contrast, BMA does not yet have a history of use in the courts or in other public-policy or regulatory settings. Nonetheless, confirmation by BMA of the statistically negligible role of race in our post-CAB cohort and the likely significant role of race in our medulloblastoma cohort suggests that BMA should be an important addition to the toolbox supporting fairness-assurance of ML models in these and similar contexts and can also help courts and regulators ascertain fairness of decision-support models in actual application. Correspondingly, BMA can help model developers to defend against allegations of unfairness as they arise.
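For the endpoint on [0,1) emphasized above, the beta-regression idea of [53, 54] can also be sketched with nothing beyond the standard library: a beta log-likelihood parameterized by mean μ and precision φ, with a crude grid search standing in for the proper maximum-likelihood step. The utilization fractions and grid below are invented for illustration; a real analysis would include covariates and a genuine optimizer.

```python
import math

def beta_loglik(y, mu, phi):
    """Log-density of a Beta(mu*phi, (1-mu)*phi) observation y in (0, 1)."""
    a, b = mu * phi, (1 - mu) * phi
    return (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
            + (a - 1) * math.log(y) + (b - 1) * math.log(1 - y))

def fit(y0, y1, grid=201):
    """Grid-search MLE for group means mu0, mu1 (logit-spaced grid, as in a
    logit link) with a shared precision phi drawn from a coarse candidate list."""
    mus = [1 / (1 + math.exp(-eta))
           for eta in [-4 + 8 * i / (grid - 1) for i in range(grid)]]
    best = (-math.inf, None)
    for phi in (2.0, 5.0, 10.0, 20.0, 50.0):
        # Tuples compare by log-likelihood first, so max() picks the best mu.
        ll0 = max((sum(beta_loglik(y, mu, phi) for y in y0), mu) for mu in mus)
        ll1 = max((sum(beta_loglik(y, mu, phi) for y in y1), mu) for mu in mus)
        if ll0[0] + ll1[0] > best[0]:
            best = (ll0[0] + ll1[0], (ll0[1], ll1[1], phi))
    return best[1]

# Illustrative data: fraction of guideline-recommended serial MRIs received.
y_group0 = [0.82, 0.75, 0.90, 0.70, 0.85]
y_group1 = [0.55, 0.60, 0.48, 0.66, 0.52]
mu0, mu1, phi = fit(y_group0, y_group1)
print(f"mu0 = {mu0:.2f}, mu1 = {mu1:.2f}, phi = {phi}")
```

A large gap between the fitted group means is the beta-regression disparity signal referred to in the text; its significance would then be assessed alongside CMH and BMA.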

#### **Author details**

Douglas S. McNair

Address all correspondence to: dmcnair@cerner.com

Cerner Corporation, Kansas City, USA

#### **References**

[1] Collmann J, Matei S, editors. Ethical Reasoning in Big Data: An Exploratory Analysis. Berlin: Springer Verlag; 2016. p. 192

[2] Coglianese C, Lehr D. Regulating by robot: Administrative decision-making in the machine-learning era. Georgetown Law Journal. 28-Feb-2017. Available: http://scholarship.law.upenn.edu/faculty\_scholarship/1734/

[3] Fox M. Technology is a marvel – Now let's make it moral. The Guardian, April 10, 2017. Available: https://www.theguardian.com/commentisfree/2017/apr/10/ethical-technology-women-britain-internet

[4] Dwork C, Hardt M, Pitassi T, Reingold O, Zemel R. Fairness through awareness. Proceedings of the 3rd ACM Innovations in Theoretical Computer Science Conference. 2012;**2012**:214-226

[5] Joseph M, Kearns M, Morgenstern J, Roth A. Fairness in learning: Classic and contextual bandits. arXiv preprint, 2016. Available: https://arxiv.org/pdf/1605.07139

[6] Zemel R, Wu Y, Swersky K, Pitassi T, Dwork C. Learning fair representations. Proceedings of the 30th International Conference on Machine Learning. 2013;**2013**:325-333

[7] Barocas S, Selbst A. Big Data's disparate impact. California Law Review. 2016;**104**:671-733. Available: http://ssrn.com/abstract=2477899

[8] Bechavod Y, Ligett K. Learning fair classifiers: A regularization-inspired approach. KDD '17, Halifax 2017, ACM. Available: https://arxiv.org/pdf/1707.00044.pdf

[9] Bunnik A, Cawley A, Mulqueen M, Zwitter A, editors. Big Data Challenges: Society, Security, Innovation and Ethics. London: Palgrave Macmillan; 2016. p. 140

[10] Byrnes N. Artificial intolerance. MIT Technology Review, March 28, 2016. Available: https://www.technologyreview.com/s/600996/artificial-intolerance/

[11] Chouldechova A. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. arXiv preprint, 2016. Available: https://arxiv.org/pdf/1610.07524

[12] Chouldechova A, G'Sell M. Fairer and more accurate, but for whom? KDD '17, Halifax 2017, ACM. Available: https://arxiv.org/pdf/1707.00046.pdf

[13] Corbett-Davies S, Pierson E, Feller A, Goel S, Huq A. Algorithmic decision-making and the cost of fairness. Stanford Working Paper, 18-FEB-2017. Available: https://arxiv.org/pdf/1701.08230

[14] Béranger J. Big Data and Ethics: The Medical Datasphere. New York: ISTE Press/Elsevier; 2016. p. 300

[15] Davis K. Ethics of Big Data: Balancing Risk and Innovation. Sebastopol, CA: O'Reilly Media; 2012. p. 82

[16] Berk R, Heidari H, Jabbari S, Joseph M, Kearns M, Morgenstern J, Neel S, Roth A. A convex framework for fair regression. KDD '17, ACM. Available: https://arxiv.org/pdf/1706.02409.pdf

[17] FAT/ML (Fairness, Accountability, and Transparency in Machine Learning). Available: http://www.fatml.org

[18] Feldman M, Friedler S, Moeller J, Scheidegger C, Venkatasubramanian S. Certifying and removing disparate impact. arXiv preprint, 2015. Available: https://arxiv.org/abs/1412.3756

[19] Fish B, Kun J, Lelkes A. A confidence-based approach for balancing fairness and accuracy. SIAM International Conference on Data Mining, 2016. Available: http://homepages.math.uic.edu/~bfish3/sdm\_2016.pdf

[20] Francez N. Fairness. Berlin: Springer Verlag; 1986. p. 298

[21] Guinn C. Big data algorithms can discriminate, and it's not clear what to do about it. The Conversation blog, Aug 13, 2015. Available: http://theconversation.com/big-data-algorithms-can-discriminate-and-its-not-clear-what-to-do-about-it-45849

[22] Hajian S, Bonchi F, Castillo C. Algorithmic bias: From discrimination discovery to fairness-aware data mining. KDD '16, August 2016. San Francisco: ACM. ISBN 978-1-4503-4232-2/16/08. DOI: 10.1145/2939672.2945386

[23] Hodson H. No one in control: The algorithms that run our lives. New Scientist, 05-FEB-2015

[24] Jabbari S, Joseph M, Kearns M, Morgenstern J, Roth A. Fair learning in Markovian environments. arXiv preprint, 2016. Available: https://arxiv.org/pdf/1611.03071

[25] Mittelstadt B, Floridi L, editors. The Ethics of Biomedical Big Data. Berlin: Springer Verlag; 2016. p. 480

[26] O'Neil C. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. New York: Crown; 2016. p. 300

[27] Robertson J, Webb W. Cake-Cutting Algorithms: Be Fair If You Can. Boca Raton: CRC Press; 1998. p. 300

[28] Simoiu C, Corbett-Davies S, Goel S. The problem of infra-marginality in outcome tests for discrimination. arXiv preprint, 2017. Available: https://arxiv.org/pdf/1607.05376

[29] Skirpan M, Gorelick M. The authority of 'fair' in machine learning. KDD '17, Halifax 2017, ACM. Available: https://arxiv.org/pdf/1706.09976.pdf

[30] Veal M. Logics and practices of transparency and opacity in real-world applications of public sector machine learning. KDD '17, Halifax 2017, ACM. Available: https://arxiv.org/pdf/1706.09249.pdf

[31] Zhang Z, Neill D. Identifying significant predictive bias in classifiers. KDD '17, Halifax 2017, ACM. Available: https://arxiv.org/pdf/1611.08292.pdf

[32] Steinberg C, Padfield GJ, Al-Sabeq B, Adler A, Yeung-Lai-Wah JA, Kerr CR, Deyell MW, Andrade JG, Bennett MT, Yee R, Klein GJ, Green M, Laksman ZW, Krahn AD, Chakrabarti S. Experience with bisoprolol in long-QT1 and long-QT2 syndrome. Journal of Interventional Cardiac Electrophysiology. 2016;**47**(2):163-170

[33] Servaes S, Epelman M, Pollock A, Shekdar K. Pediatric malignancies: Synopsis of current imaging techniques. In: Blake M, Kalra M, editors. Imaging in Oncology. New York: Springer; 2008. pp. 469-492

[34] Tibshirani R. Regression shrinkage and selection via the LASSO. Journal of the Royal Statistical Society, Series B. 1996;**58**(1):267-288

[35] Delker E, Brown Q, Hasin DS. Alcohol consumption in demographic subpopulations: An epidemiologic overview. Alcohol Research: Current Reviews. 2016;**38**:7-15

[36] Hughson MD, Puelles VG, Hoy WE, Douglas-Denton RN, Mott SA, Bertram JF. Hypertension, glomerular hypertrophy and nephrosclerosis: The effect of race. Nephrology

[37] Klous S, Wielaard N. We Are Big Data: The Future of the Information Society. New York: Atlantis Press; 2016. p. 300

[38] Na L et al. Disparities in receipt of recommended care among younger versus older Medicare beneficiaries: A cohort study. BMC Health Services Research. 2017;**17**:241-253

[39] Sajja K, Mohan DP, Rockey DC. Age and ethnicity in cirrhosis. Journal of Investigational Medicine. 2014;**62**:920-926

[40] Tuttolomondo A, Buttà C, Casuccio A, Di Raimondo D, Serio A, D'Aguanno G, Pecoraro R, Renda C, Giarrusso L, Miceli G, Cirrincione A, Pinto A. QT indexes in cirrhotic patients: Relationship with clinical variables and potential diagnostic predictive value. Archives of Medical Research. 2015;**46**:207-213

[41] Valles S. Heterogeneity of risk within racial groups: A challenge for public health programs. Preventive Medicine. 2012;**55**:405-408

[42] Yu Q, Fan Y, Wu X. General multiple mediation analysis with an application to explore racial disparity in breast cancer survival. Journal of Biometrics & Biostatistics. 2014;**5**:189-196

[43] Yu Q, Scribner RA, Leonardi C, Zhang L, Park C, Chen L, Simonsen NR. Exploring racial disparity in obesity: A mediation analysis considering geo-coded environmental factors. Spatial & Spatio-temporal Epidemiology. 2017;**21**:13-23

[44] Schonberger R, Gilbertsen T, Dai F. The problem of controlling for imperfectly measured confounders on dissimilar populations: A database simulation study. Journal of Cardiothoracic and Vascular Anesthesia. 2014;**28**:247-254

[45] Amsterdam EA, Wenger NK, Brindis RG, Casey DE Jr, Ganiats TG, Holmes DR Jr, Jaffe AS, Jneid H, Kelly RF, Kontos MC, Levine GN, Liebson PR, Mukherjee D, Peterson ED, Sabatine MS, Smalling RW, Zieman SJ. 2014 AHA/ACC guideline for the management of patients with non–ST-elevation acute coronary syndromes. Journal of the American College of Cardiology. 2014;**64**:e139-e228

[46] Boudonas G. β-blockers in coronary artery disease management. Hippokratia. 2010;**14**:231-235

[47] Khan M. Cardiac Drug Therapy. 7th ed. Totowa, NJ: Humana Press; 2007. p. 420

[48] Barry-Jester A. The new science of sentencing. The Marshall Project, 2015. Available: https://www.themarshallproject.org/2015/08/04/the-new-science-of-sentencing

[49] Benforado A. Unfair: The New Science of Criminal Injustice. New York: Crown; 2015. p. 300

[50] Gastwirth J. Statistical methods for analyzing claims of employment discrimination. Industrial & Labor Relations Review. 1984;**38**:75-86

[51] Gastwirth J et al. Statistical methods for assessing the fairness of the allocation of shares in initial public offerings. Law Probability & Risk. 2005. DOI: 10.1093/lpr/mgi012

[52] Kadane J. Statistics in the Law: A Practitioner's Guide, Cases, and Materials. Oxford: Oxford University Press; 2008. p. 472

[53] Paolino P. Maximum likelihood estimation of models with beta-distributed dependent variables. Political Analysis. 2001;**9**:325-346

[54] Smithson M, Verkuilen J. A better lemon squeezer? Maximum-likelihood regression with beta-distributed dependent variables. Psychological Methods. 2006;**11**:54-71

ogy, Dialysis, Transplantation. 2014;**29**:1399-1409


[55] Ferrari S, Cribari-Neto F. Beta regression for modelling rates and proportions. Journal of Applied Statistics. 2004;**31**:799-815

[71] Miller C. Can an algorithm hire better than a human? The New York Times, 25-Jun-2015. Available: http://www.nytimes.com/2015/06/26/upshot/can-an-algorithm-hire-better-

Preventing Disparities: Bayesian and Frequentist Methods for Assessing Fairness in Machine-Learning…

http://dx.doi.org/10.5772/intechopen.73176

93

[72] Goel S, Rao J, Shroff R. Personalized risk assessments in the criminal justice system. The

[73] Goel S, Rao J, Shroff. Precinct or prejudice? Understanding racial disparities in New York

[75] Amini A, Yeh N, Jones BL, Bedrick E, Vinogradskiy Y, Rusthoven CG, Amini A, Purcell WT, Karam SD, Kavanagh BD, Guntupalli SR, Fisher CM. Perioperative mortality in nonelderly adult patients with cancer: A population-based study evaluating health care disparities in the united states according to insurance status. American Journal of Clinical

[76] Beohar N et al. Race/ethnic disparities in risk factor control and survival in the bypass angioplasty revascularization investigation 2 diabetes (BARI-2D) trial. American Journal

[77] Buja A, Boemo DG, Furlan P, Bertoncello C, Casale P, Baldovin T, Marcolongo A, Baldo V. Tackling inequalities: Are secondary prevention therapies for reducing post-infarction mortality used without disparities? European Journal of Preventive Cardiology.

[78] Butwick A, Blumenfeld YJ, Brookfield KF, Nelson LM, Weiniger CF. Racial and ethnic disparities in mode of anesthesia for cesarean delivery. Anesthesia & Analgesia.

[79] Cheng E, Declercq ER, Belanoff C, Iverson RE, McCloskey L. Racial and ethnic differences in the likelihood of vaginal birth after caesarean delivery. Birth. 2015;**42**:249-253

[80] Efird J, Griffin WF, Sarpong DF, Davies SW, Vann I, Koutlas NT, Anderson EJ, Crane PB, Landrine H, Kindell L, Iqbal ZJ, Ferguson TB, Chitwood WR, Kypson AP. Increased long-term mortality among black CABG patients receiving preoperative inotropic agents. International Journal of Environmental Research and Public Health. 2015;**12**:

[81] Efird J, Gudimella P, O'Neal WT, Griffin WF, Landrine H, Kindell LC, Davies SW, Sarpong DF, O'Neal JB, Crane P, Nelson M, Ferguson TB, Chitwood WR, Kypson AP, Anderson EJ. Comparison of risk of atrial fibrillation in black versus white patients after coronary

artery bypass grafting. The American Journal of Cardiology. 2016;**117**:1095-1100

[82] Efird J, Kiser AC, Crane PB, Landrine H, Kindell LC, Nelson MA, Jindal C, Sarpong DF, Griffin WF, Ferguson TB, Chitwood WR, Davies SW, Kypson AP, Gudimella P, Anderson EJ. Perioperative inotrope therapy and atrial fibrillation following coronary artery bypass graft surgery: Evidence of a racial disparity. Pharmacotherapy. 2017;**37**:

City's stop-and-frisk policy. Annals of Applied Statistics. 2016;**10**:365-394

[74] Lum K, Isaac W. To predict and serve? Significance. 2016;**13**:14-19

Oncology. 2016 Jun 8. DOI: 10.1097/COC.0000000000000306

than-a-human.html

American Economic Review. 2016;**106**:119-123

of Cardiology. 2013;**112**:1298-1305

2014;**21**:222-230

2016;**122**:472-479

7478-7490

297-304


[55] Ferrari S, Cribari-Neto F. Beta regression for modelling rates and proportions. Journal of

[56] Cribari-Neto F, Zeileis A. Beta regression in R. Journal of Statistical Software. 2010;**34**:1-24 [57] Smithson M, Deady S, Gracik L. Guilty, not guilty, or…?: Multiple options in jury verdict

[58] Hubben G, Bishai D, Pechlivanoglou P, Cattelan AM, Grisetti R, Facchin C, Compostella FA, Bos JM, Postma MJ, Tramarin A. The societal burden of HIV/AIDS in northern Italy: An analysis of costs and quality of life. AIDS Care: Psychological and Socio-Medical Aspects

[59] Basu A, Manca A. Regression estimators for generic health-related quality of life and

[60] Ando T. Bayesian Model Selection and Statistical Modeling. Boca Raton: CRC Press;

[61] Bayarri MJ, Berger JO, Forte A, Garcia-Donato G. Criteria for Bayesian model choice

[62] Claeskens G, Hjort N. Model Selection and Model Averaging. Cambridge: Cambridge

[63] Eicher T, Papageorgiou C, Raftery A. Determining growth determinants: Default priors and predictive performance in Bayesian model averaging. Journal of Applied Econo-

[64] Garcia-Donato G, Forte A. R package BayesVarSel, Available: https://cran.r-project.org/

[65] Feldkircher M, Zeugner S. Benchmark Priors Revisited: On Adaptive Shrinkage and the Supermodel Effect in Bayesian Model Averaging. IMF Working Paper, WP/09/202, 2009.

[66] Fernandez C, Ley E, Steel MF. Benchmark priors for Bayesian model averaging. Journal

[67] Hoeting JA, Madigan D, Raftery AE, Volinsky CT. Bayesian model averaging: A tutorial.

[68] Zellner A. On assessing prior distributions and Bayesian regression analysis with g-prior distributions. In: Zellner A, editor. Bayesian Inference and Decision Techniques: Essays in Honor of Bruno de Finetti. Edward Elgar Publishing: London; 1986. pp. 389-399 [69] Liang F, Paulo R, Molina G, Clyde MA, Berger JO. Mixtures of g priors for Bayesian variable selection. Journal of the American Statistical Association. 2008;**103**:410-423

[70] Bertrand M, Mullainathan S. Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on labor market discrimination. The American Economic

choices. Journal of Behavioral Decision Making. 2007;**20**:481-498

quality-adjusted life years. Medical Decision Making. 2012;**32**:56-69

with application to variable selection. Ann. Statist. 2012;**40**:1550-1577

Applied Statistics. 2004;**31**:799-815

of AIDS/HIV. 2008;**20**:449-455

University Press; 2008. p. 332

DOI:10.5089/9781451873498.001

of Econometrics. 2001;**100**:381-427

Statistical Science. 1999;**14**(4):382-417

Review. 2004;**94**:991-1013

metrics. 2011;**26**(1):30-55

package=BayesVarSel

2010. p. 300

92 New Insights into Bayesian Inference


[83] Shiels M, Chernyavskiy P, Anderson WF, Best AF, Haozous EA, Hartge P, Rosenberg PS, Thomas D, Freedman ND, Berrington de Gonzalez A. Trends in premature mortality in the USA by sex, race, and ethnicity from 1999 to 2014: An analysis of death certificate data. Lancet. 2017;**389**:1043-1054

[94] Glickman SW, Anstrom KJ, Lin L, Chandra A, Laskowitz DT, Woods CW, Freeman DH, Kraft M, Beskow LM, Weinfurt KP, Schulman KA, Cairns CB. Challenges in enrollment of minority, pediatric, and geriatric patients in emergency and acute care clinical research. Annals of Emergency Medicine. 2008;**51**:775-780. DOI: 10.1016/j.annemergmed.

Preventing Disparities: Bayesian and Frequentist Methods for Assessing Fairness in Machine-Learning…

http://dx.doi.org/10.5772/intechopen.73176

95

[95] Gourlay ML, Lewis CL, Preisser JS, Mitchell CM, Sloane PD. Perceptions of informed decision making about cancer screening in a diverse primary care population. Family

[96] Hamilton JB, Best NC, Galbraith KV, Worthy VC, Moore LT. Strategies African-American cancer survivors use to overcome fears and fatalistic attitudes. Journal of

[97] Koch C, Li L, Kaplan GA, Wachterman J, Shishehbor MH, Sabik J, Blackstone EH. Socioeconomic position, not race, is linked to death after cardiac surgery. Circulation.

[98] Miller SJ, Iztkowitz SH, Redd WH, Thompson HS, Valdimarsdottir HB, Jandorf L. Colonoscopy-specific fears in African Americans and Hispanics. Behavioral Medicine.

[99] Nagelhout E, Comarell K, Samadder NJ, Wu YP. Barriers to colorectal cancer screening in a racially diverse population served by a safety-net clinic. Journal of Community

[100] Owens OL, Jackson DD, Thomas TL, Friedman DB, Hébert JR. African American men's and women's perceptions of clinical trials research: Focusing on prostate cancer among a high-risk population in the south. Journal of Health Care for the Poor and

[101] Palmer NR, Weaver KE, Hauser SP, Lawrence JA, Talton J, Case LD, Geiger AM. Disparities in barriers to follow-up care between African American and white breast cancer survivors. Supportive Care in Cancer. 2015;**23**(11):3201-3209. DOI: 10.1007/

[102] Pandya D, Patel S, Ketchum NS, Pollock BH, Padmanabhan S. A comparison of races and leukemia subtypes among patients in different cancer survivorship phases. Clinical Lymphoma, Myeloma & Leukemia. 2011;**11**(Suppl 1):S114-S118. DOI: 10.1016/j.

[103] Pittman LJ.A thirteenth amendment challenge to both racial disparities in medical treatment and improper physicians' informed consent disclosures. Saint Louis University

[104] Shaw MG, Morrell DS, Corbie-Smith GM, Goldsmith LA. Perceptions of pediatric clinical research among African American and Caucasian parents. Journal of the National

Cancer Education. 2015;**30**(4):629-635. DOI: 10.1007/s13187-014-0738-3

Cardiovascular Quality and Outcomes. 2010;**3**:267-276

2015;**41**(2):41-48. DOI: 10.1080/08964289.2014.897930

Health. 2017;**42**(4):791-796. DOI: 10.1007/s10900-017-0319-6

Underserved. 2013;**24**(4):1784-1800. DOI: 10.1353/hpu.2013.0187

2007.11.002

Medicine. 2010;**42**(6):421-427

s00520-015-2706-9

clml.2011.05.036

School of Law. 2003;**48**(1):131-189

Medical Association. 2009;**101**(9):900-907


[94] Glickman SW, Anstrom KJ, Lin L, Chandra A, Laskowitz DT, Woods CW, Freeman DH, Kraft M, Beskow LM, Weinfurt KP, Schulman KA, Cairns CB. Challenges in enrollment of minority, pediatric, and geriatric patients in emergency and acute care clinical research. Annals of Emergency Medicine. 2008;**51**:775-780. DOI: 10.1016/j.annemergmed. 2007.11.002

[83] Shiels M, Chernyavskiy P, Anderson WF, Best AF, Haozous EA, Hartge P, Rosenberg PS, Thomas D, Freedman ND, Berrington de Gonzalez A. Trends in premature mortality in the USA by sex, race, and ethnicity from 1999 to 2014: An analysis of death certificate

[84] Brown C, Ross L, Lopez I, Thornton A, Kiros GE. Disparities in the receipt of cardiac revascularization procedures between blacks and whites: An analysis of secular trends.

[85] Dimick J, Ruhter J, Sarrazin MV, Birkmeyer JD. Black patients more likely than whites to undergo surgery at low-quality hospitals in segregated regions. Health Affairs

[86] Mehta RH, Shahian DM, Sheng S, O'Brien SM, Edwards FH, Jacobs JP, Peterson ED. Association of hospital and physician characteristics and care processes with racial disparities in procedural outcomes among contemporary patients undergoing coronary

[87] Nallamothu B, Lu X, Vaughan-Sarrazin MS, Cram P. Coronary revascularization at specialty cardiac hospitals and peer general hospitals in black Medicare beneficiaries.

[88] O'Neal W, Efird JT, Davies SW, O'Neal JB, Griffin WF, Ferguson TB, Chitwood WR, Kypson AP. Discharge β-blocker use and race after coronary artery bypass grafting.

[89] Rangrass G, Ghaferi AA, Dimick JB. Explaining racial disparities in outcomes after car-

[90] Best AL, Alcaraz KI, McQueen A, Cooper DL, Warren RC, Stein K. Examining the mediating role of cancer-related problems on spirituality and self-rated health among African American cancer survivors: A report from the American Cancer Society's studies of can-

[91] Bromley EG, May FP, Federer L, Spiegel BM, van Oijen MG. Explaining persistent under-use of colonoscopic cancer screening in African Americans: A systematic review.

[92] Christman LK, Abernethy AD, Gorsuch RL, Brown A. Intrinsic religiousness as a mediator between fatalism and cancer-specific fear: Clarifying the role of fear in prostate cancer screening. Journal of Religion and Health. 2014;**53**(3):760-772. DOI: 10.1007/

[93] Davis JL, Bynum SA, Katz RV, Buchanan K, Green BL. Sociodemographic differences in fears and mistrust contributing to unwillingness to participate in cancer screenings. Journal of Health Care for the Poor and Underserved. 2012;**23**(4 Suppl):67-76. DOI:

diac surgery: The role of hospital quality. JAMA Surgery. 2014;**149**:223-227

cer survivors-II. Psycho-Oncology. 2015;**24**:1051-1059. DOI: 10.1002/pon.3720

Preventive Medicine 2015;**71**:40-48. doi: 10.1016/j.ypmed.2014.11.022

data. Lancet. 2017;**389**:1043-1054

94 New Insights into Bayesian Inference

(Millwood). 2013;**32**:1046-1053

Frontiers in Public Health. 2014;**2**:94-99

s10943-012-9670-1

10.1353/hpu.2012.0148

Ethnic Disparities. 2008;**18**(2 Suppl 2):112-117

artery bypass grafting surgery. Circulation. 2016;**133**:124-130

Circulation. Cardiovascular Quality and Outcomes. 2008;**1**:116-122


[105] Shepperd JA, Howell JL, Logan H. A survey of barriers to screening for oral cancer among rural black Americans. Psycho-Oncology. 2014;**23**(3):276-282. DOI: 10.1002/ pon.3415

**Chapter 6**

**Using Bayesian Inference to Investigate the Influence of Environmental Factors on a Phytoplasma Disease**

Bernd Panassiti

DOI: 10.5772/intechopen.74637

Additional information is available at the end of the chapter


#### **Abstract**

Phytoplasma diseases cause major economic damage to crops worldwide. Drawing inferences about such a system requires joint estimation of dependencies and high flexibility in the model structure. Using Bayesian inference, the aim of this chapter was to infer the epidemiology of the apple proliferation (AP) disease in South Tyrol, Italy. The data consisted of (1) presence/absence of the AP vector *Cacopsylla picta* collected in 44 orchards in 2014; (2) prevalence of the AP pathogen "*Candidatus* Phytoplasma mali" in the vector population; and (3) AP symptomatic trees visually assessed in 2015. Generalized linear mixed models evaluated in a Bayesian framework were used to test species-environment relationships. The model results indicated that the occurrence of the AP vector and of symptomatic plants is positively influenced by elevation and temperature and negatively by management. Vector and pathogen predictions in the disease symptoms model correlated negatively or not at all with the prevalence of AP symptoms. In conclusion, the model results suggest that the presence/absence of the AP vector alone may not be the only cause of disease occurrence. Considering factors such as phytoplasma transmission via root bridges and specific management strategies may help to improve inference and, ultimately, to optimize the existing pest management.

**Keywords:** apple proliferation, Bayesian inference, habitat modeling, imperfect detection, latent infections, occupancy model, phytoplasma disease, pest insect, psyllid vector

#### **1. Introduction**

Phytoplasma-induced diseases occur in a range of economically important crops and are therefore major threats in agriculture worldwide [1]. Phytoplasma are cell wall-less plant pathogenic bacteria vectored by insects belonging to the order Hemiptera [2].

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

From an ecological perspective, phytoplasma diseases are complex biological systems. Complexity is linked to many sources of uncertainty which in most cases are difficult to measure. Among others, these sources of uncertainty include vector-pathogen-plant interactions, but also the presence of unknown vectors (e.g., *Reptalus panzeri* was recently confirmed as a new vector in the grapevine phytoplasma disease "bois noir" [3]) or the time (i.e., latent period) between pathogen infection and symptom expression (in case of plants) or the ability to retransmit the pathogen (in case of vectors).


Besides complexity, the statistical treatment of the inherent dependencies of such biological systems represents another challenge in the modeling process. Traditional statistical methods [such as generalized linear models (GLM) and generalized linear mixed models (GLMM)] could be used in a step-wise approach. In the first step, the vector-environment relationship is identified (vector model). Second, using the results of the vector model, the pathogen-environment relationship is established (pathogen model). Finally, the results of both previous models are used to fit the plant disease model. However, this approach does not consider the dependencies between the responses simultaneously. In contrast, methods that allow for combined dependencies, such as structural equation modeling, lack flexibility in model specification [4]. One solution is Bayesian inference, which allows the model parameters to be estimated jointly while offering high flexibility in defining the model structure.
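The dependency chain behind the step-wise approach can be sketched as a generative simulation. The chapter fits the corresponding models jointly with R/Stan; the Python code, logistic forms, and all coefficients below are hypothetical, chosen only to illustrate how the vector, pathogen, and disease responses cascade from the environmental predictors:

```python
import numpy as np

rng = np.random.default_rng(42)
n_orchards = 44

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

# Standardized environmental predictors (hypothetical values)
elevation = rng.normal(size=n_orchards)
temperature = rng.normal(size=n_orchards)
managed = rng.integers(0, 2, size=n_orchards)   # integrated management yes/no

# Step 1 (vector model): vector occurrence depends on the environment.
# Coefficient signs follow the chapter's findings (positive for elevation
# and temperature, negative for management); the magnitudes are made up.
psi = logistic(-0.5 + 0.8 * elevation + 0.6 * temperature - 0.7 * managed)
vector_present = rng.binomial(1, psi)

# Step 2 (pathogen model): pathogen prevalence in the vector population,
# which is zero wherever the vector is absent.
prevalence = logistic(-1.0 + 0.4 * temperature) * vector_present

# Step 3 (disease model): symptomatic trees depend on the previous two steps.
p_symptoms = logistic(-2.0 + 1.5 * prevalence)
n_trees = rng.integers(20, 201, size=n_orchards)   # 20-200 trees per orchard
symptomatic = rng.binomial(n_trees, p_symptoms)
```

Estimating each step separately ignores the uncertainty that propagates from `vector_present` into `prevalence` and onward; joint Bayesian estimation carries it through.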

#### **1.1. Case study: apple proliferation**

In this study, the phytoplasma disease apple proliferation (AP) was chosen as a modeling system. AP-specific disease symptoms on apple trees are the proliferation of axillary shoots (formation of witches' brooms) and enlarged stipules. AP-nonspecific disease symptoms include early leaf reddening; small, tasteless and colorless fruits; chlorosis; and premature bud break. The causal AP agent is '*Candidatus* Phytoplasma mali' [5]. In infected apple trees, the phytoplasma resides in the phloem tubes and is transmitted by phloem-sucking insect vectors during feeding activity. In South Tyrol, Northern Italy, the most efficient AP vector is *Cacopsylla picta* (Hemiptera, Psyllidae) [6].

The aim of this chapter was to jointly infer the AP disease epidemiology in South Tyrol, Italy, using Bayesian inference. Imperfect detection was accounted for in both the vector and the symptomatic plant models. The AP insect vector was modeled using an occupancy model. To account for detection bias during vector sampling, information on sampling effort was used as a predictor in an additional Bernoulli process conditional on the AP vector's true presence or absence. Based on molecular analyses of AP prevalences in apple trees, I estimated the proportion of latently infected trees to account for imperfect detection of truly phytoplasma-infected apple trees.

#### **2. Materials and methods**

#### **2.1. Data**

The AP vector *C. picta* and AP symptomatic apple trees were surveyed at 44 and 26 orchards in South Tyrol, Northern Italy, respectively. Prevalences of the AP phytoplasma '*Ca*. Phytoplasma mali' within the vectors were available from 28 orchards (**Figure 1**). Insect vectors were caught using the "beating tray" method [7, 8]. Depending on orchard size, between 20 and 200 apple trees were randomly selected for vector sampling, which was carried out in 2014. Collected vectors were identified according to Ossiannilsson [9] and then molecularly analyzed for the presence of the pathogen '*Ca*. Phytoplasma mali'. Phytoplasma detection, based on a SYBR Green real-time PCR, was carried out as described in [6]. AP infection status of apple trees was assessed by trained and experienced professionals using visual inspection. Apple cultivars were Golden Delicious and Gala. The monitoring of AP symptoms started in 2013, and each year new AP symptomatic trees were recorded. In most cases, disease symptoms appear 1 year after infection with '*Ca*. Phytoplasma mali' [10]. To take this latent period into account, monitoring data of AP symptoms from 2015 were used. Given the inspectors' skills in detecting AP symptoms and independent molecular analysis of AP symptomatic trees, the false-positive rate can be assumed to be approximately zero. Symptomatic means that at least one specific AP symptom or a combination of at least two unspecific AP symptoms was present.

**Figure 1.** Sampling sites of vector, phytoplasma and plant disease symptoms of the apple proliferation epidemiology in South Tyrol, Northern Italy.
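The sampling-effort covariate used later for the detection process is simply the fraction of trees sampled per orchard, and detection is conditional on the vector's true presence. A minimal numpy sketch (the intercept, slope, and occupancy values are hypothetical; the chapter estimates the actual logistic coefficients in Stan):

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(7)

# Sampling effort: trees sampled for vectors relative to all surveyed trees.
trees_sampled = np.array([20, 50, 120, 200])
trees_total = np.array([100, 100, 200, 200])
effort = trees_sampled / trees_total           # 0.2, 0.5, 0.6, 1.0

# Detection probability as a logistic function of effort
# (hypothetical intercept and slope, for illustration only).
p_detect = logistic(-1.0 + 3.0 * effort)

# Observed detection is conditional on the true occupancy state z:
z = np.array([1, 1, 0, 1])                     # hypothetical true presence
y = rng.binomial(1, z * p_detect)              # an absent vector is never detected
```

The `z * p_detect` product encodes the occupancy logic: sites where the vector is truly absent yield nondetections with probability one, regardless of effort.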


Using Bayesian Inference to Investigate the Influence of Environmental Factors...

http://dx.doi.org/10.5772/intechopen.74637



A summary of the final data set including the AP vector, the AP phytoplasma and AP symptoms of trees is provided in **Table 1**. Metric environmental predictors included elevation (m a.s.l.) and annual mean temperature (°C). Orchards were classified into integrated/not integrated management to account for different pest management strategies.

|  | Min | Q1 | Median | Q3 | Max | Mean | sd | n | Missing |
|---|---|---|---|---|---|---|---|---|---|
| Observed vector | 0 | 0 | 1 | 1 | 1 | 0.659 | 0.479 | 44 | 0 |
| Vectors analyzed | 1 | 1 | 2 | 3.25 | 35 | 4.29 | 7.91 | 28 | 0 |
| Vectors inf | 0 | 0 | 0 | 1 | 8 | 0.929 | 1.98 | 28 | 0 |
| Tree\_total | 390 | 950 | 1171 | 1764 | 3065 | 1365 | 613 | 26 | 0 |
| Tree\_inf | 0 | 0 | 2.5 | 5 | 111 | 9.46 | 24.5 | 26 | 0 |

**Table 1.** Summary of vector, pathogen and disease symptoms of plants used to fit the apple proliferation (AP) joint model.

#### **2.2. Modeling approach**

Bayesian inference was used to jointly estimate the dependencies of all responses (AP vector, AP phytoplasma prevalences of the vector, AP symptoms of apple trees) and the environment. To fit the model, all environmental predictors (except vector and phytoplasma predictions) were scaled and centered (i.e., the mean subtracted and the result divided by the standard deviation) to allow faster convergence of the model-fitting algorithm. To decide whether to account for unimodal response curves, in a pre-step, I fitted multivariate GLMs including quadratic terms of elevation and temperature [11]. As the unimodal response curves were not found to be ecologically sensible, only linear relationships were considered in the subsequent analysis. Generalized linear mixed models (GLMMs) were developed using a binomial error distribution and a logit link function [12, 13]. The GLMMs were then evaluated in a Bayesian framework. As prior distributions for the fixed effects, zero-centered normal distributions were used. Except for the intercept, the priors were defined to be mildly informative, which results in a shrinkage effect similar to ridge regression [14].

The vector data set, as is common for ecological data, contained many zero values due to the rarity and detectability of the species. To account for imperfect detection, I used a site-occupancy model [15, 16]. These models rely on the "closure assumption" stating that the occupancy state remains unchanged between survey times. The occupancy model combines (1) an ecological process and (2) an observation process. The ecological process of the true occupancy state z (a latent, i.e., unobserved, variable) can be described using a Bernoulli distribution with the occupancy probability *Ψ* for each surveyed site (indexed with i):

$$z_i \sim \mathrm{Bernoulli}(\Psi_i)$$

In the observation process, real observations (detections/nondetections) for each survey time (indexed by j) follow a Bernoulli distribution conditional on the true occupancy state z:

$$y_{ij} \sim \mathrm{Bernoulli}(z_i\, p_{ij})$$

where p is defined as the detection probability at site i and survey time j, given that the site was actually occupied. The detection probability was modeled using a logistic regression with sampling effort as the explanatory variable. Sampling effort was defined as the number of sampled trees in proportion to the total number of surveyed trees for AP symptoms.
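The two-stage occupancy model above is straightforward to simulate. The sketch below (Python/NumPy rather than the chapter's R/Stan code; the survey count, occupancy probability and logistic coefficients are invented values for illustration) generates latent occupancy states and conditional detections:

```python
import numpy as np

rng = np.random.default_rng(42)
n_sites, n_surveys = 44, 3       # 44 sites as in Table 1; survey count is assumed
psi = 0.66                       # occupancy probability (illustrative value)

# Ecological process: latent true occupancy state z_i ~ Bernoulli(psi_i)
z = rng.binomial(1, psi, size=n_sites)

# Detection probability on the logit scale, with sampling effort as the
# explanatory variable (intercept and slope are invented values)
effort = rng.uniform(0.1, 1.0, size=(n_sites, n_surveys))
p = 1.0 / (1.0 + np.exp(-(-1.0 + 2.0 * effort)))

# Observation process: y_ij ~ Bernoulli(z_i * p_ij)
y = rng.binomial(1, z[:, None] * p)

# Detections are only possible at truly occupied sites
assert (y[z == 0] == 0).all()
```

Because the detection probability is multiplied by z, a non-detection can arise either from true absence or from a missed detection at an occupied site, which is exactly the ambiguity the site-occupancy model resolves.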

Field surveys on the prevalence of plant diseases caused by plant-pathogenic bacteria are often based on visual diagnosis of disease symptoms [17–19]. Given trained and experienced plant inspectors, the false-positive rate can be assumed to be close to zero. The false-negative rate is also often considered very small because latent infections are mostly ignored. Based on molecular analyses, latent infections for the AP disease were found to be 2.32% and 10.48%, depending on the age of the apple trees [20]. To account for imperfect detection caused by latent infections, an informative beta prior was used for the detection probability p, with parameters *a* = 2 and *b* = 80. The specified prior distribution has a mean value of approximately 0.02 and a 95% quantile of approximately 0.06. Hence, in the observation process of the AP disease symptoms model, AP symptom detections/nondetections were drawn from a binomial distribution as follows:

$$y_i \sim \mathrm{Binomial}\big(N_i,\; z_i\,(1 - p_i)\big)$$

where N is the total number of surveyed trees for each site.
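The properties of the informative Beta(*a* = 2, *b* = 80) prior quoted above can be verified numerically (a quick check in Python/SciPy, not part of the chapter's R code):

```python
from scipy.stats import beta

a, b = 2, 80                 # prior parameters for the detection probability p
prior = beta(a, b)

prior_mean = prior.mean()    # a / (a + b) = 2/82 ~ 0.024, i.e. roughly 0.02
q95 = prior.ppf(0.95)        # ~ 0.06, the reported 95% quantile

# The prior therefore concentrates p on small values, encoding the belief
# that only a few percent of infections remain latent (undetected).
```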

MCMC sampling was carried out with the Stan software (RStan version 2.12.1), which uses the No-U-Turn sampler (NUTS) [21, 22]. Model specifications included three chains with 3000 iterations each, and a chain was considered converged when the potential scale reduction statistic satisfied Ȓ < 1.05 [23]. To assess model fit, posterior predictive checks were applied to each model separately using the DHARMa package [24]. The DHARMa package calculates scaled residuals (Bayesian p-values) by comparing observations simulated from the fitted model with the observed values. All statistical analyses were carried out in the R statistical environment (version 3.2.2; [25]).
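The convergence criterion can be illustrated with a minimal Gelman–Rubin computation. The function below is a basic sketch of the potential scale reduction statistic (in practice RStan reports Ȓ directly), applied to three simulated, well-mixed chains:

```python
import numpy as np

def gelman_rubin(chains):
    """Basic potential scale reduction statistic (R-hat) for an
    (m, n) array of m chains with n draws each."""
    m, n = chains.shape
    B = n * chains.mean(axis=1).var(ddof=1)   # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()     # within-chain variance
    var_hat = (n - 1) / n * W + B / n         # pooled variance estimate
    return np.sqrt(var_hat / W)

# Three chains with 3000 draws each, as in the model specification;
# the draws share one distribution, so R-hat should be close to 1.
rng = np.random.default_rng(0)
chains = rng.normal(size=(3, 3000))
r_hat = gelman_rubin(chains)
assert r_hat < 1.05                           # the chapter's convergence threshold
```

Chains stuck in different regions of the posterior inflate the between-chain variance B relative to W, pushing Ȓ above the threshold.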

The RStan code for the joint model is available in Appendix A.

#### **3. Results**

The marginal posterior distributions of the parameters of interest of the AP joint model are shown in **Figure 2**. For the AP vector *C. picta*, I found that the occurrence of the AP vector was positively correlated with elevation and temperature. The opposite was found for integrated pest management measures, which negatively affected the vector occurrence probabilities. The sampling effort, represented by the number of sampled trees, seemingly did not affect the detection of the AP vector. While elevation, temperature and pest management did not affect the prevalences of the pathogen '*Ca*. Phytoplasma mali' within the AP vector, **Figure 2** indicates a presumably positive relationship between the AP vector and its phytoplasma infection rates. The 80% credible interval (CrI), however, also indicates that a high uncertainty is associated with the true value of this parameter.

As in the vector model, AP symptom occurrences on apple trees were positively correlated with elevation and temperature and negatively with integrated pest management measures. Moreover, the model estimated a negative correlation between the AP vector and AP symptoms. No relationship between phytoplasma infection rates within the AP vector and AP symptoms was found.

Regarding the model performance, the potential scale reduction statistic, Ȓ, for each parameter was close to 1 (not shown). Hence, I found no indication of non-convergence of the three chains.

**Figure 2.** Marginal posterior distributions of all environmental predictors for the vector, pathogen and disease symptoms of plants models. Black intervals denote the 95% credible intervals, and red bands indicate the 80% credible intervals. For each predictor, black bullets denote the medians of the three MCMC chains.

**Figure 3** shows the results of the residual diagnosis. The plots show no serious violations of distributional assumptions. To confirm the overall uniformity of the scaled residuals, I applied one-sample Kolmogorov–Smirnov tests, which were not significant for any of the three models.

**Figure 3.** Residual diagnosis for the vector (a), pathogen (b) and disease symptoms of plants (c) models. Residual diagnosis is based on scaled residuals. For each model, qq-plots are shown on the left, and scaled residuals *versus* predicted values are plotted on the right.

#### **4. Discussion**


The modeling case study presented in this chapter illustrated the use of Bayesian inference to jointly investigate the influence of environment on the occurrence of the AP vector *C. picta*, the prevalences of the AP pathogen ('*Ca*. Phytoplasma mali') within the vector and the occurrence of AP disease symptoms on apple trees.

#### **4.1. Influence of environment on apple proliferation epidemiology**

Using the 80% credible interval, I found that AP vector and AP symptoms on apple trees were positively associated with elevation and temperature and negatively with integrated pest management. While having similar ecological requirements, the joint model indicated a negative relationship between vector and symptoms. Elevation, temperature and integrated pest management did not affect AP phytoplasma prevalences within the vector. No correlation was found between prevalences of AP phytoplasma and symptoms.

'*Ca*. Phytoplasma mali' infection rates of *C. picta* in South Tyrol are usually higher than those of *C. melanoneura,* another AP vector (0.6% compared to 11.6%, [26]). Moreover, *C. picta* is assumed to be the more effective AP vector because it was shown to be able to vertically transmit the pathogen to its offspring [6]. Therefore, the finding that the vector is not correlated with AP symptoms is unexpected, but it agrees with the vector-symptomatic plant relationship of "bois noir," a phytoplasma disease of grapevines [17]. The authors argued that this discrepancy may be explained by acknowledging that the vector's presence alone is not responsible for disease occurrence; rather, it is important to determine the pathogen prevalence in the vector population. In this study, however, no correlation was found between pathogen predictions and AP symptoms either. Intuitively, one would expect more AP symptoms given a high infection rate of the vectors, but the marginal posterior distribution of the phytoplasma prevalence in the vector population is associated with a large credible interval and does not allow an interpretation of the true parameter value. The lack of a positive correlation between infected vectors and AP symptoms hints at other infection sources not considered in this study. For example, a phytoplasma transmission via root-bridges between apple trees was recently hypothesized [27].


Even though the joint model did not identify a clear correlation between the predictor variable integrated pest management and pathogen occurrences, overall, it seems that integrated pest management is an important environmental driver, negatively influencing vector and disease symptom occurrences. It is, however, also possible that the AP responses are influenced by different management measures. For example, the presence/absence of the vectors may be influenced by the application time, quantity and type of insecticides, while new disease incidences in plants may also relate to different levels of effort in uprooting AP symptomatic trees, thereby eliminating sources of new vector infections or root transmissions to adjacent trees [28]. Hence, in a follow-up study, it would be worthwhile to further investigate which specific management measures lead to a decrease in the responses, in order to optimize insect pest management strategies.

#### **4.2. Advantages of Bayesian inference**

Besides jointly estimating the disease system, Bayesian inference allows high flexibility in the model specification. Models can be easily extended to include detection probabilities, overdispersion or zero-inflation [29–31]. The present joint model could be further extended by including AP symptom detection probabilities depending on the cultivar and the observed symptoms. The high flexibility is also important when data are collected for purposes different from statistical inference and prediction. For example, if vector data were collected to determine the first appearance of the vector in the orchard (to optimize the timing of insecticide applications), vector prediction probabilities need to be constrained by the probabilities of the true flight period of the pest insect.

Some parameter estimates in this study were associated with large credible intervals, that is, high uncertainty. One solution would be to use a higher number of observations, which is not always feasible in ecological studies. Another possibility is to include informative priors derived from the literature or previous analyses, as illustrated by the informative beta prior used to account for imperfect AP symptom detection due to latent infections. Priors play an essential role in every Bayesian analysis. For the environmental parameter estimates included in this chapter, no prior information from previous analyses was available. However, the identified relationships could be used to define prior distributions in future studies.

Finally, the results of a Bayesian inference (posterior distributions) can be summarized using, for example, credible intervals which allow an intuitive interpretation of the parameter estimates associated with well-defined uncertainties. Given chain convergence and successful posterior predictive checks, Bayesian credible intervals are also appropriate for small data sets [32]. This is especially true in observational studies on animal and plant populations where data collection is often time- and cost-consuming.
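Summarizing posterior draws as credible intervals, as in Figure 2, amounts to taking quantiles of the MCMC samples. A NumPy sketch with simulated stand-in draws (the distribution and its parameters are invented, not the chapter's posterior):

```python
import numpy as np

rng = np.random.default_rng(7)
draws = rng.normal(0.8, 0.4, size=9000)     # stand-in posterior, 3 chains x 3000 draws

median = np.median(draws)
ci95 = np.percentile(draws, [2.5, 97.5])    # 95% equal-tailed credible interval
ci80 = np.percentile(draws, [10.0, 90.0])   # 80% credible interval

# An effect is considered "clear" at the 80% level when the interval excludes zero
supported = ci80[0] > 0 or ci80[1] < 0
```

Unlike a frequentist confidence interval, such an interval can be read directly as "the parameter lies in this range with 95% (or 80%) posterior probability."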

#### **5. Conclusion**


In summary, the results of the AP joint model suggested that the presence of the AP vector is not necessarily positively correlated with disease occurrence. Instead, other factors such as phytoplasma transmission via root-bridges or specific management strategies should additionally be considered in future studies. In the case of the AP disease system, Bayesian inference made it possible to jointly fit the combined dependencies which are common in phytoplasma disease epidemiology. Unlike with maximum likelihood methods, posterior distributions for all quantities of interest are obtained, which can be further summarized using credible intervals and allow an intuitive interpretation of the results. The provided example of a joint Bayesian modeling framework can be used as a basis to infer species-environment relationships of phytoplasma disease systems.

#### **Acknowledgements**

The work was performed as part of the project APPLClust and was funded by the Autonomous Province of Bozen/Bolzano (Italy) and the South Tyrolean Apple Consortium. The author would like to thank Stefanie Fischnaller, Martin Parth, Manuel Messner, Robert Stocker, Christine Kerschbamer and Katrin Janik for providing data on insect vectors, phytoplasma prevalences and occurrences of disease symptoms of apple trees.

### **Appendix A. Supplementary data**

Supplementary data associated with this chapter is available online: https://doi.org/10.6084/m9.figshare.5896789

### **Author details**

Bernd Panassiti

Address all correspondence to: bernd.panassiti@gmail.com

Laimburg Research Center, Auer, Italy

#### **References**

[1] Bertaccini A, Duduk B, Paltrinieri S, Contaldo N. Phytoplasmas and phytoplasma diseases: A severe threat to agriculture. American Journal of Plant Sciences. 2014;**5**:1763-1788. DOI: 10.4236/ajps.2014.512191

[2] Alma A, Tedeschi R, Lessio F, Picciau L, Gonella E, Ferracini C. Insect vectors of plant pathogenic Mollicutes in the euro-Mediterranean region. Phytopathogenic Mollicutes. 2015;**5**:53-73

[3] Cvrković T, Jović J, Mitrović M, Krstić O, Toševski I. Experimental and molecular evidence of *Reptalus panzeri* as a natural vector of bois noir. Plant Pathology. 2013;**63**:42-53. DOI: 10.1111/ppa.12080

[4] Austin M. Species distribution models and ecological theory: A critical assessment and some possible new approaches. Ecological Modelling. 2007;**200**:1-19

[5] Seemüller E, Schneider B. Taxonomic description of '*Candidatus* Phytoplasma mali' sp. nov., '*Candidatus* Phytoplasma pyri' sp. nov. and '*Candidatus* Phytoplasma prunorum' sp. nov., the causal agents of apple proliferation, pear decline and European stone fruit yellows, respectively. International Journal of Systematic and Evolutionary Microbiology. 2004;**54**:1231-1240

[6] Mittelberger C, Obkircher L, Oettl S, Oppedisano T, Pedrazzoli F, Panassiti B, Kerschbamer C, Anfora G, Janik K. The insect vector *Cacopsylla picta* vertically transmits the bacterium '*Candidatus* Phytoplasma mali' to its progeny. Plant Pathology. 2016;**66**:1015-1021. DOI: 10.1111/ppa.12653

[7] Horton DR. Monitoring of pear psylla for pest management decisions and research. Integrated Pest Management Reviews. 1999;**4**:1-20

[8] Muther J, Vogt H. Sampling methods in orchard trials: A comparison between beating and inventory sampling. IOBC WPRS Bulletin. 2003;**26**:67-72

[9] Ossiannilsson F. The Psylloidea (Homoptera) of Fennoscandia and Denmark. Leiden; New York: E.J. Brill; 1992

[10] Unterthurner M, Baric S. Sechs Jahre Erfahrungen in einer Modellanlage. Obst- und Weinbau. 2011;**3**:77-78

[11] Austin MP. Spatial prediction of species distribution: An interface between ecological theory and statistical modelling. Ecological Modelling. 2002;**157**:101-118

[12] Gelman A, Hill J. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge; New York: Cambridge University Press; 2007

[13] Bolker BM, Brooks ME, Clark CJ, Geange SW, Poulsen JR, Stevens MH, White JS. Generalized linear mixed models: A practical guide for ecology and evolution. Trends in Ecology & Evolution. 2009;**24**:127-135

[14] Park T, Casella G. The Bayesian lasso. Journal of the American Statistical Association. 2008;**103**:681-686

[15] MacKenzie DI, Nichols JD, Lachman GB, Droege S, Royle JA, Langtimm CA. Estimating site occupancy rates when detection probabilities are less than 1. Ecology. 2002;**83**:2248-2255

[16] MacKenzie DI, Nichols JD, Royle JA, Pollock KH, Bailey LL, Hines JE. Occupancy Estimation and Modelling. Inferring Patterns and Dynamics of Species Occurrence. Boston: Elsevier; 2006

[17] Panassiti B, Hartig F, Breuer M, Biedermann R. Bayesian inference of environmental and biotic factors determining the occurrence of the grapevine disease 'bois noir'. Ecosphere. 2015;**6**:art143, 1-13. DOI: 10.1890/ES14-00439.1

[18] Parry M, Gibson GJ, Parnell S, Gottwald TR, Irey MS, Gast TC, Gilligan CA. Bayesian inference for an emerging arboreal epidemic in the presence of control. Proceedings of the National Academy of Sciences. 2014;**111**:6258-6262. DOI: 10.1073/pnas.1310997111

[19] Thébaud G, Sauvion N, Chadœuf J, Dufils A, Labonne G. Identifying risk factors for European stone fruit yellows from a survey. Phytopathology. 2006;**96**:890-899

[20] Baric S, Kerschbamer C, Dalla Via J. Detection of latent apple proliferation infection in two differently aged apple orchards in South Tyrol (northern Italy). Bulletin of Insectology. 2007;**60**:265-266

[21] Hoffman MD, Gelman A. The No-U-Turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research. 2014;**15**:1593-1623

[22] Stan Development Team. Stan Modeling Language Users Guide and Reference Manual; 2017

[23] Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis. London: Chapman & Hall; 2014

[24] Hartig F. DHARMa: Residual Diagnostics for Hierarchical (Multi-Level/Mixed) Regression Models; 2016

[25] R Development Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2011

[26] Baric S, Öttl S, Dalla Via J. Infection rates of natural psyllid populations with '*Candidatus* Phytoplasma mali' in South Tyrol (Northern Italy). In: 21st International Conference on Virus and other Graft Transmissible Diseases of Fruit Crops. Neustadt, Germany: Julius Kühn-Institut; 2009. pp. 189-192

[27] Baric S. Molecular tools applied to the advancement of fruit growing in South Tyrol: A review. Erwerbs-Obstbau. 2012;**54**:125-135. DOI: 10.1007/s10341-012-0170-y

[28] Baric S, Kerschbamer C, Vigl J, Dalla Via J. Translocation of apple proliferation phytoplasma via natural root grafts – A case study. European Journal of Plant Pathology. 2007;**12**:207-211

[29] Wikle CK. Hierarchical Bayesian models for predicting the spread of ecological processes. Ecology. 2003;**84**:1382-1394. DOI: 10.1890/0012-9658(2003)084[1382:HBMFPT]2.0.CO;2

**Chapter 7**

#### **A Bayesian Hau-Kashyap Approach for Hepatitis Disease Detection**

DOI: 10.5772/intechopen.74638

Andino Maseleno, Rohmah Zahroh Hidayati, Marini Othman, Alicia Y.C. Tang and Moamin A. Mahmoud

Additional information is available at the end of the chapter


#### Abstract


The World Health Organization reported that viral hepatitis affects 400 million people globally, and every year 610 million people are newly infected. In this research, we integrate Bayesian theory and the Hau-Kashyap approach for detecting hepatitis and displaying the results of the calculation process. The basic idea of Bayesian theory is to use the known prior probability and the conditional probability density, based on Bayes' theorem, to calculate the corresponding posterior probability, and then to use the posterior probability for inference and decision making. Bayesian methods combine present knowledge (prior probabilities) with additional knowledge derived from new data (the likelihood function). Hau and Kashyap presented an alternative to the Dempster-Shafer combination rule in which the conflicting intersection mass is put into the union. In this chapter, we obtain basic possibility assignment values from Bayesian probabilities. The result reveals that the Bayesian Hau-Kashyap approach has successfully identified the existence of hepatitis.

Keywords: hepatitis, disease diagnosis, Bayesian, Dempster-Shafer theory, Hau-Kashyap approach

#### 1. Introduction

Hepatitis is a medical condition defined by inflammation of the liver and characterized by the presence of inflammatory cells in the tissue of the organ. The word "hepatitis" comes from the ancient Greek word "hepar," root "hepat," meaning liver [1]. Hepatitis may occur with limited or no symptoms. Hepatitis is acute when it lasts less than 6 months and chronic when it persists longer. In medical terms, hepatitis means injury to the liver with inflammation of the liver cells. The liver is the largest glandular organ of the body [2]. It weighs about 1.36 kg, is reddish brown in color, and is divided into four lobes of unequal size and shape. There are six main hepatitis viruses, referred to as types A, B, C, D, E, and G. Hepatitis A and E are typically contracted by eating contaminated food or drinking contaminated water. Hepatitis B, C, and D are typically contracted through parenteral contact with infected body fluids, and hepatitis B can also be transmitted through sexual contact. Hepatitis B is primarily found in the liver. Various methods have been studied for the diagnosis of hepatitis [3, 4, 5]. Bayesian approaches have been successfully applied to a variety of problems [6, 7, 8]; recently, several studies have focused on medical diagnosis. These studies have applied different approaches and have achieved various classification accuracies. Neshat et al. [9] studied an adaptive neural fuzzy system for diagnosing the hepatitis B intensity rate. Neshat et al. [10] describe a combination of particle swarm optimization and case-based reasoning for diagnosing hepatitis. Mahesh et al. [5] proposed a generalized regression neural network-based expert system for the diagnosis of the hepatitis B virus disease; the system classifies each patient as infected or noninfected and, if infected, grades the severity in terms of intensity rate. Panchal et al. [11] described an artificial intelligence-based expert system for hepatitis B diagnosis. The main reason for using a Bayesian approach to hepatitis detection is that it handles the uncertainties related to models and parameter values. It gives a characteristic and principled method of combining prior information with data within a solid decision-theoretic framework.

We can fuse past data about a parameter to form a prior distribution for future analysis. When new observations become available, the previous posterior distribution can be used as a prior. All inferences logically follow from the Bayesian Hau-Kashyap approach. The structure of the chapter is as follows. Section 2 presents the Bayesian Hau-Kashyap approach. Section 3 presents the implementation of the Bayesian approach. Bayesian approach results are presented in Section 4. Section 5 presents the Bayesian Hau-Kashyap approach for hepatitis disease detection. Results and discussion are presented in Section 6. Finally, Section 7 presents some concluding remarks.

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


#### 2. A Bayesian Hau-Kashyap approach

#### 2.1. A Bayesian approach

Let the events $A\_1, A\_2, \ldots, A\_n$ form a partition of the sample space $S$ with $P(A\_i) > 0$, $i = 1, \ldots, n$. Then, for any event $B \subset S$ with $P(B) > 0$, Eq. (1) holds:

$$P(A\_i|B) = \frac{P(A\_i)P(B|A\_i)}{\sum\_{i=1}^n P(A\_i)P(B|A\_i)}, \quad i = 1, \ldots, n. \tag{1}$$

We may rationalize this result as follows. Given $B \subset S = \bigcup\_{i=1}^n A\_i$, it follows that $B = \bigcup\_{i=1}^n (B \cap A\_i)$. If the $A\_i$'s are mutually exclusive, then so are the events $B \cap A\_i$, $i = 1, \ldots, n$, and thus, as shown in Eq. (2),

$$P(B) = P\left(\bigcup\_{i=1}^{n} (B \cap A\_i)\right) = \sum\_{i=1}^{n} P(B \cap A\_i) \tag{2}$$

From the multiplication rule, since $P(A \cap B)$ appears in the numerator of each of these conditional probabilities, it follows that, as shown in Eqs. (3)–(5),

$$P(A \cap B) = P(A|B).P(B) = P(B|A).P(A) \tag{3}$$

$$P(B \cap A\_i) = P(A\_i)P(B|A\_i), \quad i = 1, \ldots, n \tag{4}$$

Then [12]


$$P(A\_i|B) = \frac{P(B \cap A\_i)}{P(B)} = \frac{P(B \cap A\_i)}{\sum\_{i=1}^n P(B \cap A\_i)} = \frac{P(A\_i)P(B|A\_i)}{\sum\_{i=1}^n P(A\_i)P(B|A\_i)}, \quad i = 1, \ldots, n. \tag{5}$$
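As an aside, Eq. (5) maps directly onto a few lines of code. The sketch below is an illustrative Python helper (the function and hypothesis names are ours, not the chapter's) that normalizes the prior-times-likelihood products to obtain the posteriors:

```python
def posteriors(priors, likelihoods):
    """Bayes' theorem, Eq. (5): P(A_i | B) is proportional to P(A_i) * P(B | A_i)."""
    # Joint terms P(A_i) * P(B | A_i) for every hypothesis A_i
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    # Normalizing constant P(B) = sum_i P(A_i) * P(B | A_i)
    p_b = sum(joint.values())
    return {h: p / p_b for h, p in joint.items()}

# Two-hypothesis example: P(A1) = 0.3, P(A2) = 0.7, P(B|A1) = 0.9, P(B|A2) = 0.2
post = posteriors({"A1": 0.3, "A2": 0.7}, {"A1": 0.9, "A2": 0.2})
# P(A1|B) = 0.27 / (0.27 + 0.14), and the posteriors sum to 1 by construction
```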

#### 2.2. Dempster-Shafer theory

Belief functions offer a non-Bayesian method for quantifying subjective evaluations by using probability. The theory originated with Dempster's work on upper and lower probabilities and was further developed in the 1970s by Shafer, whose book A Mathematical Theory of Evidence [13] remains a classic in belief functions, the so-called theory of evidence. The theory has therefore also been called the Dempster-Shafer mathematical theory of evidence. In the 1980s, the Artificial Intelligence community became involved in applying the theory of evidence. The Dempster-Shafer theory, or theory of belief functions, is a mathematical theory of evidence that can be interpreted as a generalization of probability theory [13, 14] in which the elements of the sample space to which nonzero probability mass is attributed are not single points but sets. The sets that receive nonzero mass are called focal elements [13]. The sum of these probability masses is 1; however, the basic difference from traditional probability theory is that the focal elements of a Dempster-Shafer structure may overlap one another. The Dempster-Shafer theory also provides methods to represent and combine weights of evidence.

The Dempster-Shafer theory assumes that there is a fixed set of mutually exclusive and exhaustive elements, called hypotheses or propositions, symbolized by the Greek letter $\Theta$ and represented as $\Theta = \{h\_1, h\_2, \ldots, h\_n\}$, where $h\_i$ is called a hypothesis or proposition. A hypothesis can be any subset of the frame, for example a singleton or a combination of elements of the frame. $\Theta$ is also called the frame of discernment. A basic probability assignment (bpa) is represented by a mass function $m : 2^\Theta \to [0, 1]$, where $2^\Theta$ is the power set of $\Theta$.
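For concreteness, a bpa can be sketched as a dictionary mapping subsets of $\Theta$ (Python frozensets) to masses; the frame and the mass values below are invented for illustration:

```python
# Frame of discernment Theta (a hypothetical three-disease frame)
THETA = frozenset({"hepatitis", "malaria", "influenza"})

# A bpa m : 2^Theta -> [0, 1]; the keys with nonzero mass are the focal elements
m = {
    frozenset({"hepatitis"}): 0.6,             # mass committed to a singleton
    frozenset({"hepatitis", "malaria"}): 0.3,  # mass on a composite hypothesis
    THETA: 0.1,                                # mass on the whole frame (ignorance)
}

assert all(focal <= THETA for focal in m)      # every focal element is a subset of Theta
assert abs(sum(m.values()) - 1.0) < 1e-12      # the masses sum to 1
```

Note that, unlike an ordinary probability distribution, the focal elements here may overlap.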

#### 2.3. Integrating Bayesian and Hau-Kashyap approach

Hau and Kashyap [15] presented an alternative Dempster-Shafer rule of combination, denoted by ⊙. The method for integrating Bayesian theory with the Hau-Kashyap approach is as follows:

1. Step 1: Assume $m\_1$ and $m\_2$ are two mass functions on the frame of discernment $\Theta$.

$$\text{From Eq. (5), } P(A\_i|B) = \frac{P(B \cap A\_i)}{P(B)} = \frac{P(B \cap A\_i)}{\sum\_{i=1}^n P(B \cap A\_i)} = \frac{P(A\_i)P(B|A\_i)}{\sum\_{i=1}^n P(A\_i)P(B|A\_i)}, \quad i = 1, \dots, n.$$

We can get $m$ from the result of Eq. (5); $m(P)$ is called the basic possibility assignment value, which represents the level of trust in proposition $P$. Let $R\_i$, $Z\_j$ be the sets of focal elements of $m\_1$ and $m\_2$, respectively, and set $(m\_1 \odot m\_2)(\varnothing) = 0$.

2. Step 2:

$$\text{If } R\_i \cap Z\_j \neq \varnothing \text{, let } X = R\_i \cap Z\_j \text{ and } (m\_1 \odot m\_2)(X) = \sum\_{R\_i \cap Z\_j = X} m\_1(R\_i)\, m\_2(Z\_j) \tag{6}$$

3. Step 3:

$$\text{If } R\_i \cap Z\_j = \varnothing \text{, let } X = R\_i \cup Z\_j \text{ and } (m\_1 \odot m\_2)(X) = \sum\_{R\_i \cup Z\_j = X} m\_1(R\_i)\, m\_2(Z\_j) \tag{7}$$
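Steps 1–3 can be sketched in code. The helper below is an illustrative implementation of the ⊙ rule of Eqs. (6) and (7), assuming mass functions are given as dictionaries keyed by frozensets:

```python
from itertools import product

def hau_kashyap(m1, m2):
    """Hau-Kashyap combination (Eqs. (6)-(7)): conflicting mass goes to the union."""
    combined = {}
    for (r, mr), (z, mz) in product(m1.items(), m2.items()):
        inter = r & z
        # Eq. (6): a nonempty intersection keeps the product mass;
        # Eq. (7): an empty intersection redirects it to the union R_i ∪ Z_j.
        target = inter if inter else (r | z)
        combined[target] = combined.get(target, 0.0) + mr * mz
    return combined

# Two sources of evidence over hypothetical singletons {H} and {M}
m1 = {frozenset({"H"}): 0.7, frozenset({"M"}): 0.3}
m2 = {frozenset({"H"}): 0.6, frozenset({"M"}): 0.4}
combined = hau_kashyap(m1, m2)
# {H}: 0.7*0.6 = 0.42, {M}: 0.3*0.4 = 0.12, conflict to {H, M}: 0.7*0.4 + 0.3*0.6 = 0.46
```

Because conflict is retained in the union rather than discarded, the combined masses still sum to 1 without the renormalization step of Dempster's rule.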

The fundamental distinction between the Dempster-Shafer combination rule and the Hau-Kashyap combination rule is that, under the Hau-Kashyap rule, the conflicting mass $m\_1(R\_i)\,m\_2(Z\_j)$ for $R\_i \cap Z\_j = \varnothing$ is put into the union $R\_i \cup Z\_j$.

#### 3. A Bayesian approach for hepatitis disease detection

Everyday medical practice contains many examples of probability. Medical doctors often use words such as probably, unlikely, certainly, or almost certainly in conversations with patients. They only rarely attach numbers to these terms, but computerized systems must use some numerical representation of likelihood in order to combine statements into conclusions. Probability is represented numerically by a number between 0 and 1. This study conducts experiments on a hepatitis dataset. The main goal of the dataset is to forecast the presence or absence of the hepatitis virus. The dataset contains probabilities for the initial symptoms of hepatitis, which are often similar to those of other diseases.

The initial symptoms of hepatitis include malaise, fever, and headache. For each of these symptoms, the dataset gives its probability given the presence of hepatitis, malaria, influenza, and gastroenteritis. The probabilities were obtained by studying a series of patients with proven hepatitis, identified through diagnosis codes in the medical records department, and computing the percentage of these patients who presented with malaise, fever, and headache.

#### 3.1. Probability of hepatitis given the symptom of malaise

Malaise is a feeling of general discomfort, uneasiness, or pain, often the first indication of an infection. Table 1 shows the probability of malaise (Ma) given the presence of hepatitis (H), malaria (M), influenza (I), and gastroenteritis (G).

| Action | Condition 1 | Condition 2 | Condition 3 | Condition 4 | Condition 5 |
|---|---|---|---|---|---|
| Malaise ∣ Hepatitis | 0.85 | 0.70 | 0.80 | 0.75 | 0.60 |
| Hepatitis | 0.45 | 0.30 | 0.35 | 0.40 | 0.50 |
| Malaise ∣ Malaria | 0.65 | 0.55 | 0.75 | 0.45 | 0.85 |
| Malaria | 0.55 | 0.40 | 0.50 | 0.35 | 0.45 |
| Malaise ∣ Influenza | 0.20 | 0.25 | 0.30 | 0.35 | 0.40 |
| Influenza | 0.50 | 0.30 | 0.45 | 0.40 | 0.35 |
| Malaise ∣ Gastroenteritis | 0.60 | 0.50 | 0.65 | 0.70 | 0.75 |
| Gastroenteritis | 0.30 | 0.35 | 0.40 | 0.50 | 0.60 |

Table 1. Hepatitis ∣ malaise.

P(Hepatitis ∣ Malaise) is read as the probability of hepatitis given the symptom of malaise, and P(Malaise ∣ Hepatitis) is the probability of malaise given the presence of hepatitis. Bayes' rule allows us to compute the probability we really want, P(Hepatitis ∣ Malaise), from the more readily available quantity P(Malaise ∣ Hepatitis). Bayes' theorem is a formula involving conditional probabilities. Calculating the probability of hepatitis given the symptom of malaise, using the Condition 1 values of Table 1:

$$P(\text{Hepatitis}|\text{Malaise}) = \frac{0.85 \times 0.45}{(0.85 \times 0.45) + (0.65 \times 0.55) + (0.20 \times 0.50) + (0.60 \times 0.30)} = 0.375$$

Given the symptom of malaise, there is about a 37.5% chance that the patient actually has hepatitis.

Calculating the probability of malaria given the symptom of malaise:

$$P(\text{Malaria}|\text{Malaise}) = \frac{0.65 \times 0.55}{(0.85 \times 0.45) + (0.65 \times 0.55) + (0.20 \times 0.50) + (0.60 \times 0.30)} = 0.350$$

Given the symptom of malaise, there is about a 35% chance that the patient actually has malaria.

Calculating the probability of influenza given the symptom of malaise:

$$P(\text{Influenza}|\text{Malaise}) = \frac{0.20 \times 0.50}{(0.85 \times 0.45) + (0.65 \times 0.55) + (0.20 \times 0.50) + (0.60 \times 0.30)} = 0.098$$

Given the symptom of malaise, there is about a 9.8% chance that the patient actually has influenza.

Calculating the probability of gastroenteritis given the symptom of malaise:

$$P(\text{Gastroenteritis}|\text{Malaise}) = \frac{0.60 \times 0.30}{(0.85 \times 0.45) + (0.65 \times 0.55) + (0.20 \times 0.50) + (0.60 \times 0.30)} = 0.177$$

Given the symptom of malaise, there is about a 17.7% chance that the patient actually has gastroenteritis.
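The four malaise posteriors can be checked numerically. A quick sketch using the Condition 1 column of Table 1 (variable names are ours):

```python
# Condition 1 values from Table 1: (P(Malaise | disease), P(disease))
table1 = {
    "Hepatitis":       (0.85, 0.45),
    "Malaria":         (0.65, 0.55),
    "Influenza":       (0.20, 0.50),
    "Gastroenteritis": (0.60, 0.30),
}
# The denominator shared by all four calculations
evidence = sum(lik * prior for lik, prior in table1.values())
post = {d: lik * prior / evidence for d, (lik, prior) in table1.items()}
# Reproduces the values in the text: 0.375, 0.350, 0.098, 0.177 (up to rounding)
```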

A Bayesian Hau-Kashyap Approach for Hepatitis Disease Detection

http://dx.doi.org/10.5772/intechopen.74638

#### 3.2. Probability of hepatitis given the symptom of fever

Fever is defined as having a temperature above the normal range due to an increase in the body's temperature set point. Table 2 shows the probability of fever (Fe) given the presence of hepatitis (H), malaria (M), influenza (I), and gastroenteritis (G).

The probability of hepatitis given the symptom of fever is calculated as follows:

$$P(\text{Hepatitis}|\text{Fever}) = \frac{0.75 \times 0.40}{(0.75 \times 0.40) + (0.60 \times 0.50) + (0.65 \times 0.45) + (0.50 \times 0.30)} = 0.288$$

There is about a 28.8% chance that the patient has hepatitis given the symptom of fever.

The probability of malaria given the symptom of fever is calculated as follows:

$$P(\text{Malaria}|\text{Fever}) = \frac{0.60 \times 0.50}{(0.75 \times 0.40) + (0.60 \times 0.50) + (0.65 \times 0.45) + (0.50 \times 0.30)} = 0.288$$

There is about a 28.8% chance that the patient has malaria given the symptom of fever.

The probability of influenza given the symptom of fever is calculated as follows:

$$P(\text{Influenza}|\text{Fever}) = \frac{0.65 \times 0.45}{(0.75 \times 0.40) + (0.60 \times 0.50) + (0.65 \times 0.45) + (0.50 \times 0.30)} = 0.280$$

There is about a 28% chance that the patient has influenza given the symptom of fever.


| Action | Condition 1 | Condition 2 | Condition 3 | Condition 4 | Condition 5 |
|---|---|---|---|---|---|
| Fever ∣ Hepatitis | 0.75 | 0.70 | 0.80 | 0.60 | 0.65 |
| Hepatitis | 0.40 | 0.45 | 0.50 | 0.55 | 0.60 |
| Fever ∣ Malaria | 0.60 | 0.80 | 0.70 | 0.75 | 0.65 |
| Malaria | 0.50 | 0.40 | 0.45 | 0.55 | 0.35 |
| Fever ∣ Influenza | 0.65 | 0.70 | 0.75 | 0.60 | 0.80 |
| Influenza | 0.45 | 0.50 | 0.35 | 0.55 | 0.40 |
| Fever ∣ Gastroenteritis | 0.50 | 0.40 | 0.45 | 0.55 | 0.35 |
| Gastroenteritis | 0.30 | 0.45 | 0.30 | 0.35 | 0.30 |

Table 2. Hepatitis ∣ fever.

The probability of gastroenteritis given the symptom of fever is calculated as follows:

$$P(\text{Gastroenteritis}|\text{Fever}) = \frac{0.50 \times 0.30}{(0.75 \times 0.40) + (0.60 \times 0.50) + (0.65 \times 0.45) + (0.50 \times 0.30)} = 0.144$$

There is about a 14.4% chance that the patient has gastroenteritis given the symptom of fever.

#### 3.3. Probability of hepatitis given the symptom of headache

New Insights into Bayesian Inference

Headache is pain in any region of the head. Headaches may occur on one or both sides of the head, be isolated to a certain location, radiate across the head from one point, or have a viselike quality. Table 3 shows the probability of headache (He) given the presence of hepatitis (H), malaria (M), influenza (I), and gastroenteritis (G).

The probability of hepatitis given the symptom of headache is calculated as follows:

$$P(\text{Hepatitis}|\text{Headache}) = \frac{0.80 \times 0.45}{(0.80 \times 0.45) + (0.75 \times 0.30) + (0.55 \times 0.50) + (0.60 \times 0.45)} = 0.318$$

There is about a 31.8% chance that the patient has hepatitis given the symptom of headache.

The probability of malaria given the symptom of headache is calculated as follows:

$$P(\text{Malaria}|\text{Headache}) = \frac{0.75 \times 0.30}{(0.80 \times 0.45) + (0.75 \times 0.30) + (0.55 \times 0.50) + (0.60 \times 0.45)} = 0.199$$

There is about a 19.9% chance that the patient has malaria given the symptom of headache.


| Action | Condition 1 | Condition 2 | Condition 3 | Condition 4 | Condition 5 |
|---|---|---|---|---|---|
| Headache ∣ Hepatitis | 0.80 | 0.75 | 0.70 | 0.65 | 0.60 |
| Hepatitis | 0.45 | 0.35 | 0.40 | 0.50 | 0.55 |
| Headache ∣ Malaria | 0.75 | 0.70 | 0.60 | 0.80 | 0.65 |
| Malaria | 0.30 | 0.40 | 0.45 | 0.35 | 0.50 |
| Headache ∣ Influenza | 0.55 | 0.50 | 0.40 | 0.45 | 0.60 |
| Influenza | 0.50 | 0.55 | 0.45 | 0.60 | 0.65 |
| Headache ∣ Gastroenteritis | 0.60 | 0.65 | 0.55 | 0.40 | 0.45 |
| Gastroenteritis | 0.45 | 0.50 | 0.40 | 0.55 | 0.60 |

Table 3. Hepatitis ∣ headache.

The probability of influenza given the symptom of headache is calculated as follows:

$$P(\text{Influenza}|\text{Headache}) = \frac{0.55 \times 0.50}{(0.80 \times 0.45) + (0.75 \times 0.30) + (0.55 \times 0.50) + (0.60 \times 0.45)} = 0.243$$

There is about a 24.3% chance that the patient has influenza given the symptom of headache.

The probability of gastroenteritis given the symptom of headache is calculated as follows:

$$P(\text{Gastroenteritis}|\text{Headache}) = \frac{0.60 \times 0.45}{(0.80 \times 0.45) + (0.75 \times 0.30) + (0.55 \times 0.50) + (0.60 \times 0.45)} = 0.240$$

There is about a 24% chance that the patient has gastroenteritis given the symptom of headache.


#### 4. A Bayesian approach for hepatitis disease detection results

Table 4 shows the probability of each disease given the symptom of malaise: the probability of hepatitis, malaria, influenza, and gastroenteritis given the symptom of malaise.

Figure 1 shows the probability of each disease given the symptom of malaise. The probability of hepatitis given malaise is 0.375 for condition 1, 0.310 for condition 2, 0.267 for condition 3, 0.317 for condition 4, and 0.236 for condition 5. The probability of malaria given malaise is 0.350, 0.323, 0.357, 0.166, and 0.300 for conditions 1 through 5, respectively. The probability of influenza given malaise is 0.098, 0.110, 0.128, 0.148, and 0.110, and the probability of gastroenteritis given malaise is 0.177, 0.257, 0.248, 0.369, and 0.354 for conditions 1 through 5.


| Action | Condition 1 | Condition 2 | Condition 3 | Condition 4 | Condition 5 |
|---|---|---|---|---|---|
| Hepatitis ∣ Malaise | 0.375 | 0.310 | 0.267 | 0.317 | 0.236 |
| Malaria ∣ Malaise | 0.350 | 0.323 | 0.357 | 0.166 | 0.300 |
| Influenza ∣ Malaise | 0.098 | 0.110 | 0.128 | 0.148 | 0.110 |
| Gastroenteritis ∣ Malaise | 0.177 | 0.257 | 0.248 | 0.369 | 0.354 |

Table 4. Hepatitis ∣ malaise.

Figure 1. Graphic of probability of disease given the symptom of malaise.


Table 5 shows the probability of each disease given the symptom of fever: the probability of hepatitis, malaria, influenza, and gastroenteritis given the symptom of fever.

Figure 2 shows the probability of each disease given the symptom of fever. The probability of hepatitis given fever is 0.288 for condition 1, 0.270 for condition 2, 0.360 for condition 3, 0.261 for condition 4, and 0.351 for condition 5. The probability of malaria given fever is 0.288, 0.275, 0.283, 0.326, and 0.204 for conditions 1 through 5, respectively. The probability of influenza given fever is 0.280, 0.300, 0.236, 0.261, and 0.288, and the probability of gastroenteritis given fever is 0.144, 0.155, 0.121, 0.152, and 0.157 for conditions 1 through 5.

Table 6 shows the probability of each disease given the symptom of headache: the probability of hepatitis, malaria, influenza, and gastroenteritis given the symptom of headache.


| Action | Condition 1 | Condition 2 | Condition 3 | Condition 4 | Condition 5 |
|---|---|---|---|---|---|
| Hepatitis ∣ Fever | 0.288 | 0.270 | 0.360 | 0.261 | 0.351 |
| Malaria ∣ Fever | 0.288 | 0.275 | 0.283 | 0.326 | 0.204 |
| Influenza ∣ Fever | 0.280 | 0.300 | 0.236 | 0.261 | 0.288 |
| Gastroenteritis ∣ Fever | 0.144 | 0.155 | 0.121 | 0.152 | 0.157 |

Table 5. Hepatitis ∣ fever.

Figure 2. Graphic of probability of disease given the symptom of fever.


| Action | Condition 1 | Condition 2 | Condition 3 | Condition 4 | Condition 5 |
|---|---|---|---|---|---|
| Hepatitis ∣ Headache | 0.318 | 0.230 | 0.295 | 0.296 | 0.251 |
| Malaria ∣ Headache | 0.199 | 0.245 | 0.284 | 0.256 | 0.247 |
| Influenza ∣ Headache | 0.243 | 0.241 | 0.189 | 0.247 | 0.297 |
| Gastroenteritis ∣ Headache | 0.240 | 0.284 | 0.232 | 0.201 | 0.205 |

Table 6. Hepatitis ∣ headache.

Figure 3 shows the probability of each disease given the symptom of headache. The probability of hepatitis given headache is 0.318 for condition 1, 0.230 for condition 2, 0.295 for condition 3, 0.296 for condition 4, and 0.251 for condition 5. The probability of malaria given headache is 0.199, 0.245, 0.284, 0.256, and 0.247 for conditions 1 through 5, respectively. The probability of influenza given headache is 0.243, 0.241, 0.189, 0.247, and 0.297, and the probability of gastroenteritis given headache is 0.240, 0.284, 0.232, 0.201, and 0.205 for conditions 1 through 5.

Figure 3. Graphic of probability of disease given the symptom of headache.

Figure 4 shows the overall malaria disease diagnosis. For condition 1, the probability of malaria is 35% given malaise, 28.8% given fever, and 19.9% given headache. For condition 2, it is 32.3% given malaise, 27.5% given fever, and 24.5% given headache. For condition 3, it is 35.7% given malaise, 28.3% given fever, and 28.4% given headache. For condition 4, it is 16.6% given malaise, 32.6% given fever, and 25.6% given headache. For condition 5, it is 30% given malaise, 20.4% given fever, and 24.7% given headache.

Figure 4. Malaria disease diagnosis.

Figure 5 shows the overall influenza disease diagnosis. For condition 1, the probability of influenza is 9.8% given malaise, 28% given fever, and 24.3% given headache. For condition 2, it is 11% given malaise, 30% given fever, and 24.1% given headache. For condition 3, it is 12.8% given malaise, 23.6% given fever, and 18.9% given headache. For condition 4, it is 14.8% given malaise, 26.1% given fever, and 24.7% given headache. For condition 5, it is 11% given malaise, 28.8% given fever, and 29.7% given headache.

Figure 5. Influenza disease diagnosis.

Figure 6 shows the overall gastroenteritis disease diagnosis. For condition 1, the probability of gastroenteritis is 17.7% given malaise, 14.4% given fever, and 24% given headache. For condition 2, it is 25.7% given malaise, 15.5% given fever, and 28.4% given headache. For condition 3, it is 24.8% given malaise, 12.1% given fever, and 23.2% given headache. For condition 4, it is 36.9% given malaise, 15.2% given fever, and 20.1% given headache. For condition 5, it is 35.4% given malaise, 15.7% given fever, and 20.5% given headache.

Figure 6. Gastroenteritis disease diagnosis.

Figure 7 shows the overall hepatitis diagnosis. For condition 1, the probability of hepatitis is 37.5% given malaise, 28.8% given fever, and 31.8% given headache. For condition 2, it is 31% given malaise, 27% given fever, and 23% given headache. For condition 3, it is 26.7% given malaise, 36% given fever, and 29.5% given headache. For condition 4, it is 31.7% given malaise, 26.1% given fever, and 29.6% given headache. For condition 5, it is 23.6% given malaise, 35.1% given fever, and 25.1% given headache.

Figure 7. Overall hepatitis disease detection.
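Given the per-symptom posterior tables (Tables 4-6), picking the most probable disease for a symptom under each condition is an argmax over the corresponding column. A small illustrative sketch: the dictionary copies the Table 4 malaise posteriors, and the helper name is our own:

```python
# Posterior P(disease | malaise) for conditions 1-5, copied from Table 4.
malaise_posteriors = {
    "Hepatitis":       [0.375, 0.310, 0.267, 0.317, 0.236],
    "Malaria":         [0.350, 0.323, 0.357, 0.166, 0.300],
    "Influenza":       [0.098, 0.110, 0.128, 0.148, 0.110],
    "Gastroenteritis": [0.177, 0.257, 0.248, 0.369, 0.354],
}

def most_likely(posteriors, condition):
    """Return the disease with the highest posterior for a condition (1-based)."""
    return max(posteriors, key=lambda d: posteriors[d][condition - 1])

for c in range(1, 6):
    print(f"Condition {c}: {most_likely(malaise_posteriors, c)}")
```

Running this on Table 4 selects hepatitis for condition 1, malaria for conditions 2 and 3, and gastroenteritis for conditions 4 and 5, which matches a visual reading of Figure 1.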

#### 5. A Bayesian Hau-Kashyap approach for hepatitis disease detection

#### 5.1. Probability of hepatitis given the symptom of malaise

1. There is about 37.5% chance that the probability of hepatitis given the symptom of malaise

From Table 8, we get:

their individual belief. From Table 9, we get:

m5f g¼ M; H; I 0:013, m5f g¼ M; H 0:118, m5f g¼ H; I 0:024, m5f g¼ H 0:220,

A Bayesian Hau-Kashyap Approach for Hepatitis Disease Detection

http://dx.doi.org/10.5772/intechopen.74638

123


$$m_1\{H\} = 0.375, \quad m_1\{\theta\} = 1 - 0.375 = 0.625$$

2. There is about a 35% chance of malaria given the symptom of malaise:

$$m_2\{M\} = 0.35, \quad m_2\{\theta\} = 1 - 0.35 = 0.65$$

The calculation of the combined m<sub>1</sub> and m<sub>2</sub> is shown in Table 7. Each cell of the table contains the intersection of the corresponding propositions from m<sub>1</sub> and m<sub>2</sub> along with the product of their individual beliefs.

From Table 7, we get:

$$m_3\{M,H\} = 0.131, \quad m_3\{H\} = 0.244, \quad m_3\{M\} = 0.219, \quad m_3\{\theta\} = 0.406.$$
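The pairwise combination used throughout this section can be sketched in code. This is a minimal illustrative sketch, not the authors' implementation: as Tables 7–9 apply the rule, a proposition paired with θ keeps its own label, two labelled propositions are merged by set union, and the masses are multiplied. The helper name `combine` and the use of an empty frozenset as a stand-in for θ are assumptions for illustration.

```python
from itertools import product

THETA = frozenset()  # stand-in for the frame theta

def combine(ma, mb):
    """Combine two mass assignments the way the tables do:
    theta is neutral, labelled propositions merge by union,
    and the paired masses are multiplied and accumulated."""
    out = {}
    for (a, wa), (b, wb) in product(ma.items(), mb.items()):
        if a == THETA and b == THETA:
            key = THETA
        elif a == THETA:
            key = b
        elif b == THETA:
            key = a
        else:
            key = a | b  # union of the two labelled propositions
        out[key] = out.get(key, 0.0) + wa * wb
    return out

# m1 and m2 for the symptom of malaise (Section 5.1)
m1 = {frozenset("H"): 0.375, THETA: 0.625}
m2 = {frozenset("M"): 0.35, THETA: 0.65}
m3 = combine(m1, m2)
print(round(m3[frozenset("MH")], 3))  # {M,H}: 0.375 * 0.35, prints 0.131
```

Rounded to three decimals, the four resulting masses match the m<sub>3</sub> values above and the cells of Table 7.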

3. There is about a 9.8% chance of influenza given the symptom of malaise:

$$m_4\{I\} = 0.098, \quad m_4\{\theta\} = 1 - 0.098 = 0.902$$

The calculation of the combined m<sub>3</sub> and m<sub>4</sub> is shown in Table 8. Each cell of the table contains the intersection of the corresponding propositions from m<sub>3</sub> and m<sub>4</sub> along with the product of their individual beliefs.


| m<sub>1</sub> \ m<sub>2</sub> | {M} 0.35 | θ 0.65 |
|---|---|---|
| {H} 0.375 | {M,H} 0.131 | {H} 0.244 |
| θ 0.625 | {M} 0.219 | θ 0.406 |

Table 7. The first combination of probability of hepatitis given the symptom of malaise.


| m<sub>3</sub> \ m<sub>4</sub> | {I} 0.098 | θ 0.902 |
|---|---|---|
| {M,H} 0.131 | {M,H,I} 0.013 | {M,H} 0.118 |
| {H} 0.244 | {H,I} 0.024 | {H} 0.220 |
| {M} 0.219 | {M,I} 0.021 | {M} 0.197 |
| θ 0.406 | {I} 0.04 | θ 0.366 |

Table 8. The second combination of probability of hepatitis given the symptom of malaise.

From Table 8, we get:


$$\begin{aligned} m_5\{M,H,I\} &= 0.013, \; m_5\{M,H\} = 0.118, \; m_5\{H,I\} = 0.024, \; m_5\{H\} = 0.220,\\ m_5\{M,I\} &= 0.021, \; m_5\{M\} = 0.197, \; m_5\{I\} = 0.04, \; m_5\{\theta\} = 0.366. \end{aligned}$$

4. There is about a 17.7% chance of gastroenteritis given the symptom of malaise:

$$m_6\{G\} = 0.177, \quad m_6\{\theta\} = 1 - 0.177 = 0.823$$

The calculation of the combined m<sub>5</sub> and m<sub>6</sub> is shown in Table 9. Each cell of the table contains the intersection of the corresponding propositions from m<sub>5</sub> and m<sub>6</sub> along with the product of their individual beliefs.

From Table 9, we get:

$$\begin{aligned} m_7\{M,H,I,G\} &= 0.02, \; m_7\{M,H,I\} = 0.01, \; m_7\{M,H,G\} = 0.021, \; m_7\{M,H\} = 0.097,\\ m_7\{H,I,G\} &= 0.004, \; m_7\{H,I\} = 0.02, \; m_7\{H,G\} = 0.039, \; m_7\{H\} = 0.181,\\ m_7\{M,I,G\} &= 0.004, \; m_7\{M,I\} = 0.017, \; m_7\{M,G\} = 0.035, \; m_7\{M\} = 0.102,\\ m_7\{I,G\} &= 0.007, \; m_7\{I\} = 0.033, \; m_7\{G\} = 0.06, \; m_7\{\theta\} = 0.301. \end{aligned}$$
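The whole chain for the symptom of malaise can be reproduced by folding the four evidence sources together. This is a sketch under the same assumptions as before (θ represented by an empty frozenset, labelled sets merged by union); the helper names `combine` and `mass` are hypothetical.

```python
from functools import reduce
from itertools import product

THETA = frozenset()  # stand-in for the frame theta

def combine(ma, mb):
    # Pairwise combination as in Tables 7-9: theta is neutral,
    # labelled propositions merge by union, masses multiply.
    out = {}
    for (a, wa), (b, wb) in product(ma.items(), mb.items()):
        key = b if a == THETA else a if b == THETA else a | b
        out[key] = out.get(key, 0.0) + wa * wb
    return out

def mass(symbol, p):
    # Simple support function: p on the proposition, 1 - p on theta.
    return {frozenset(symbol): p, THETA: 1.0 - p}

# Evidence for malaise, condition 1: hepatitis, malaria, influenza, gastroenteritis
m7 = reduce(combine, [mass("H", 0.375), mass("M", 0.35),
                      mass("I", 0.098), mass("G", 0.177)])
print(round(m7[frozenset("H")], 3))   # prints 0.181
print(round(m7[frozenset("MH")], 3))  # prints 0.097
```

The singleton masses agree with the m<sub>7</sub> values above to within the chapter's stage-by-stage rounding (e.g. m<sub>7</sub>{θ} comes out 0.302 without intermediate rounding versus 0.301 in the text).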

#### 5.2. Probability of hepatitis given the symptom of fever

1. There is about a 28.8% chance of hepatitis given the symptom of fever:

$$m_1\{H\} = 0.288, \quad m_1\{\theta\} = 1 - 0.288 = 0.712$$

2. There is about a 28.8% chance of malaria given the symptom of fever:

$$m_2\{M\} = 0.288, \quad m_2\{\theta\} = 1 - 0.288 = 0.712$$


| m<sub>5</sub> \ m<sub>6</sub> | {G} 0.177 | θ 0.823 |
|---|---|---|
| {M,H,I} 0.013 | {M,H,I,G} 0.02 | {M,H,I} 0.01 |
| {M,H} 0.118 | {M,H,G} 0.021 | {M,H} 0.097 |
| {H,I} 0.024 | {H,I,G} 0.004 | {H,I} 0.02 |
| {H} 0.220 | {H,G} 0.039 | {H} 0.181 |
| {M,I} 0.021 | {M,I,G} 0.004 | {M,I} 0.017 |
| {M} 0.197 | {M,G} 0.035 | {M} 0.102 |
| {I} 0.04 | {I,G} 0.007 | {I} 0.033 |
| θ 0.366 | {G} 0.06 | θ 0.301 |

Table 9. The third combination of probability of hepatitis given the symptom of malaise.

The calculation of the combined m<sub>1</sub> and m<sub>2</sub> is shown in Table 10. Each cell of the table contains the intersection of the corresponding propositions from m<sub>1</sub> and m<sub>2</sub> along with the product of their individual beliefs.

From Table 10, we get:

$$m_3\{M,H\} = 0.083, \quad m_3\{H\} = 0.205, \quad m_3\{M\} = 0.205, \quad m_3\{\theta\} = 0.507.$$

3. There is about a 28% chance of influenza given the symptom of fever:

$$m_4\{I\} = 0.28, \quad m_4\{\theta\} = 1 - 0.28 = 0.72$$

The calculation of the combined m<sub>3</sub> and m<sub>4</sub> is shown in Table 11. Each cell of the table contains the intersection of the corresponding propositions from m<sub>3</sub> and m<sub>4</sub> along with the product of their individual beliefs.

From Table 11, we get:

$$\begin{aligned} m_5\{M,H,I\} &= 0.023, \; m_5\{M,H\} = 0.06, \; m_5\{H,I\} = 0.057,\\ m_5\{H\} &= 0.148, \; m_5\{M,I\} = 0.057, \; m_5\{M\} = 0.148, \; m_5\{I\} = 0.142, \; m_5\{\theta\} = 0.365. \end{aligned}$$




4. There is about a 14.4% chance of gastroenteritis given the symptom of fever:

$$m_6\{G\} = 0.144, \quad m_6\{\theta\} = 1 - 0.144 = 0.856$$

The calculation of the combined m<sub>5</sub> and m<sub>6</sub> is shown in Table 12. Each cell of the table contains the intersection of the corresponding propositions from m<sub>5</sub> and m<sub>6</sub> along with the product of their individual beliefs.


| m<sub>1</sub> \ m<sub>2</sub> | {M} 0.288 | θ 0.712 |
|---|---|---|
| {H} 0.288 | {M,H} 0.083 | {H} 0.205 |
| θ 0.712 | {M} 0.205 | θ 0.507 |

Table 10. The first combination of probability of hepatitis given the symptom of fever.


| m<sub>3</sub> \ m<sub>4</sub> | {I} 0.28 | θ 0.72 |
|---|---|---|
| {M,H} 0.083 | {M,H,I} 0.023 | {M,H} 0.06 |
| {H} 0.205 | {H,I} 0.057 | {H} 0.148 |
| {M} 0.205 | {M,I} 0.057 | {M} 0.148 |
| θ 0.507 | {I} 0.142 | θ 0.365 |

Table 11. The second combination of probability of hepatitis given the symptom of fever.


| m<sub>5</sub> \ m<sub>6</sub> | {G} 0.144 | θ 0.856 |
|---|---|---|
| {M,H,I} 0.023 | {M,H,I,G} 0.003 | {M,H,I} 0.02 |
| {M,H} 0.06 | {M,H,G} 0.009 | {M,H} 0.05 |
| {H,I} 0.057 | {H,I,G} 0.008 | {H,I} 0.049 |
| {H} 0.148 | {H,G} 0.02 | {H} 0.127 |
| {M,I} 0.057 | {M,I,G} 0.008 | {M,I} 0.049 |
| {M} 0.148 | {M,G} 0.02 | {M} 0.127 |
| {I} 0.142 | {I,G} 0.02 | {I} 0.121 |
| θ 0.365 | {G} 0.052 | θ 0.312 |

Table 12. The third combination of probability of hepatitis given the symptom of fever.

From Table 12, we get:


$$\begin{aligned} m_7\{M,H,I,G\} &= 0.003, \; m_7\{M,H,I\} = 0.02, \; m_7\{M,H,G\} = 0.009, \; m_7\{M,H\} = 0.05,\\ m_7\{H,I,G\} &= 0.008, \; m_7\{H,I\} = 0.049, \; m_7\{H,G\} = 0.02, \; m_7\{H\} = 0.127,\\ m_7\{M,I,G\} &= 0.008, \; m_7\{M,I\} = 0.049, \; m_7\{M,G\} = 0.02, \; m_7\{M\} = 0.127,\\ m_7\{I,G\} &= 0.02, \; m_7\{I\} = 0.121, \; m_7\{G\} = 0.052, \; m_7\{\theta\} = 0.312. \end{aligned}$$

#### 5.3. Probability of hepatitis given the symptom of headache

1. There is about a 31.8% chance of hepatitis given the symptom of headache:

$$m_1\{H\} = 0.318, \quad m_1\{\theta\} = 1 - 0.318 = 0.682$$

2. There is about a 19.9% chance of malaria given the symptom of headache:

$$m_2\{M\} = 0.199, \quad m_2\{\theta\} = 1 - 0.199 = 0.801$$

The calculation of the combined m<sub>1</sub> and m<sub>2</sub> is shown in Table 13. Each cell of the table contains the intersection of the corresponding propositions from m<sub>1</sub> and m<sub>2</sub> along with the product of their individual beliefs.


| m<sub>1</sub> \ m<sub>2</sub> | {M} 0.199 | θ 0.801 |
|---|---|---|
| {H} 0.318 | {M,H} 0.063 | {H} 0.255 |
| θ 0.682 | {M} 0.136 | θ 0.546 |

Table 13. The first combination of probability of hepatitis given the symptom of headache.

From Table 13, we get:

$$m_3\{M,H\} = 0.063, \quad m_3\{H\} = 0.255, \quad m_3\{M\} = 0.136, \quad m_3\{\theta\} = 0.546.$$

3. There is about a 24.3% chance of influenza given the symptom of headache:

$$m_4\{I\} = 0.243, \quad m_4\{\theta\} = 1 - 0.243 = 0.757$$

The calculation of the combined m<sub>3</sub> and m<sub>4</sub> is shown in Table 14. Each cell of the table contains the intersection of the corresponding propositions from m<sub>3</sub> and m<sub>4</sub> along with the product of their individual beliefs.


From Table 14, we get:

$$\begin{aligned} m_5\{M,H,I\} &= 0.015, \; m_5\{M,H\} = 0.047, \; m_5\{H,I\} = 0.062,\\ m_5\{H\} &= 0.193, \; m_5\{M,I\} = 0.033, \; m_5\{M\} = 0.103, \; m_5\{I\} = 0.133, \; m_5\{\theta\} = 0.413. \end{aligned}$$

4. There is about a 24% chance of gastroenteritis given the symptom of headache:

$$m_6\{G\} = 0.24, \quad m_6\{\theta\} = 1 - 0.24 = 0.76$$

The calculation of the combined m<sub>5</sub> and m<sub>6</sub> is shown in Table 15. Each cell of the table contains the intersection of the corresponding propositions from m<sub>5</sub> and m<sub>6</sub> along with the product of their individual beliefs.

From Table 15, we get:

$$\begin{aligned} m_7\{M,H,I,G\} &= 0.004, \; m_7\{M,H,I\} = 0.011, \; m_7\{M,H,G\} = 0.011, \; m_7\{M,H\} = 0.036,\\ m_7\{H,I,G\} &= 0.015, \; m_7\{H,I\} = 0.047, \; m_7\{H,G\} = 0.046, \; m_7\{H\} = 0.147,\\ m_7\{M,I,G\} &= 0.008, \; m_7\{M,I\} = 0.025, \; m_7\{M,G\} = 0.025, \; m_7\{M\} = 0.078,\\ m_7\{I,G\} &= 0.032, \; m_7\{I\} = 0.101, \; m_7\{G\} = 0.099, \; m_7\{\theta\} = 0.314. \end{aligned}$$


| m<sub>3</sub> \ m<sub>4</sub> | {I} 0.243 | θ 0.757 |
|---|---|---|
| {M,H} 0.063 | {M,H,I} 0.015 | {M,H} 0.047 |
| {H} 0.255 | {H,I} 0.062 | {H} 0.193 |
| {M} 0.136 | {M,I} 0.033 | {M} 0.103 |
| θ 0.546 | {I} 0.133 | θ 0.413 |

Table 14. The second combination of probability of hepatitis given the symptom of headache.



| m<sub>5</sub> \ m<sub>6</sub> | {G} 0.24 | θ 0.76 |
|---|---|---|
| {M,H,I} 0.015 | {M,H,I,G} 0.004 | {M,H,I} 0.011 |
| {M,H} 0.047 | {M,H,G} 0.011 | {M,H} 0.036 |
| {H,I} 0.062 | {H,I,G} 0.015 | {H,I} 0.047 |
| {H} 0.193 | {H,G} 0.046 | {H} 0.147 |
| {M,I} 0.033 | {M,I,G} 0.008 | {M,I} 0.025 |
| {M} 0.103 | {M,G} 0.025 | {M} 0.078 |
| {I} 0.133 | {I,G} 0.032 | {I} 0.101 |
| θ 0.413 | {G} 0.099 | θ 0.314 |

Table 15. The third combination of probability of hepatitis given the symptom of headache.

#### 6. Results and discussions


Figure 8 shows the probability of hepatitis given the symptom of malaise using the Bayesian Hau-Kashyap approach: 0.181 for condition 1, 0.139 for condition 2, 0.113 for condition 3, 0.142 for condition 4, and 0.095 for condition 5.

Figure 9 shows the probability of hepatitis given the symptom of fever using the Bayesian Hau-Kashyap approach: 0.127 for condition 1, 0.116 for condition 2, 0.173 for condition 3, 0.110 for condition 4, and 0.168 for condition 5.

Figure 8. Probability of hepatitis given the symptom of malaise using the Bayesian Hau-Kashyap approach.

Figure 9. Probability of hepatitis given the symptom of fever using the Bayesian Hau-Kashyap approach.

Figure 10 shows the probability of hepatitis given the symptom of headache using the Bayesian Hau-Kashyap approach: 0.147 for condition 1, 0.094 for condition 2, 0.131 for condition 3, 0.133 for condition 4, and 0.106 for condition 5.
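The per-symptom hepatitis values plotted in Figures 8–10 can be checked with a short calculation. Because the singleton {H} keeps its label only when every other evidence source contributes θ, the chained combination collapses to a simple product. The helper below is a hypothetical sketch under that assumption; note that the chapter rounds at each combination stage, so third-decimal values can differ by about 0.001 (e.g. fever in condition 1 comes out 0.126 without intermediate rounding versus 0.127 in the text).

```python
def m7_hepatitis(h, malaria, influenza, gastro):
    # {H} survives the union-style combination only when the other
    # three sources contribute theta, so its final mass is a product.
    return h * (1 - malaria) * (1 - influenza) * (1 - gastro)

# Condition 1 inputs from Sections 5.1-5.3
print(round(m7_hepatitis(0.375, 0.35, 0.098, 0.177), 3))  # malaise, prints 0.181
print(round(m7_hepatitis(0.318, 0.199, 0.243, 0.24), 3))  # headache, prints 0.147
```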



We compare the Bayesian approach and the Bayesian Hau-Kashyap approach; the comparison results are shown in Table 16. As Table 16 shows, the Bayesian Hau-Kashyap approach yields the lower probabilities in every case, so it can minimize the assessed hepatitis disease level.

Figure 10. Probability of hepatitis given the symptom of headache using the Bayesian Hau-Kashyap approach.


| Approach | Symptom | Condition 1 | Condition 2 | Condition 3 | Condition 4 | Condition 5 |
|---|---|---|---|---|---|---|
| Bayesian | Malaise | 0.375 | 0.310 | 0.267 | 0.317 | 0.236 |
| Bayesian | Fever | 0.288 | 0.270 | 0.36 | 0.261 | 0.351 |
| Bayesian | Headache | 0.318 | 0.230 | 0.295 | 0.296 | 0.251 |
| Bayesian Hau-Kashyap | Malaise | 0.181 | 0.139 | 0.113 | 0.142 | 0.095 |
| Bayesian Hau-Kashyap | Fever | 0.127 | 0.116 | 0.173 | 0.110 | 0.168 |
| Bayesian Hau-Kashyap | Headache | 0.147 | 0.094 | 0.131 | 0.133 | 0.106 |

Table 16. Probability of hepatitis comparison between the Bayesian approach and Bayesian Hau-Kashyap approach.

#### 7. Conclusion

The initial symptoms of hepatitis are often similar to those of other diseases. A Bayesian approach has been proposed and implemented in order to diagnose hepatitis. Hepatitis is a serious disease, its treatment is expensive, and severe side effects appear very often. It is therefore important to reach a correct diagnosis and to identify those patients who most probably have hepatitis, which is why such a system can support the decisions of medical doctors. The highest probabilities of hepatitis given the presence of a symptom in this work were as follows: condition 1 of the hepatitis diagnosis obtained 37.5% for the probability of hepatitis given the presence of malaise, condition 2 obtained 31% given the presence of malaise, condition 3 obtained 36% given the presence of fever, condition 4 obtained 31.7% given the presence of malaise, and condition 5 obtained 35.1% given the presence of fever. Using the Bayesian Hau-Kashyap approach, the highest probability of hepatitis given the presence of malaise was 14.2% in condition 4, the highest given the presence of fever was 17.3% in condition 3, and the highest given the presence of headache was 14.7% in condition 1. A numerical example illustrated that the Bayesian Hau-Kashyap approach is efficient and feasible.

#### Acknowledgements


This work is supported by Institute of Informatics and Computing Energy, Universiti Tenaga Nasional, Malaysia. Reference: Geran Penyelidikan Dalaman J510050730. We gratefully appreciate this support.

#### Author details

Andino Maseleno<sup>1</sup>\*, Rohmah Zahroh Hidayati<sup>2</sup>, Marini Othman<sup>1</sup>, Alicia Y.C. Tang<sup>1</sup> and Moamin A. Mahmoud<sup>1</sup>

\*Address all correspondence to: andimaseleno@gmail.com

1 Institute of Informatics and Computing Energy, Universiti Tenaga Nasional, Malaysia

2 Moyudan Public Health Centre, Yogyakarta, Indonesia


#### References


[1] Karthikeyan T. Analysis of classification algorithms applied to hepatitis patients. International Journal of Computer Applications. 2013;62(15):25-30. DOI: 10.5120/10157-5032

[2] Rajeswari P. Analysis of liver disorder using data mining algorithm. Global Journal of Computer Science and Technology. 2010;10(14):48-52

[3] Bascil MS, Temurtas F. A study on hepatitis disease diagnosis using multilayer neural network with Levenberg Marquardt training algorithm. Journal of Medical Systems. 2011;35(3):433-436. DOI: 10.1007/s10916-009-9378-2

[4] Sarwar A, Sharma V. Intelligent Naive Bayes approach to diagnose diabetes type-2. Special Issue of International Journal of Computer Applications (0975-8887) on Issues and Challenges in Networking, Intelligence and Computing Technologies ICNICT 2012, November 2012. pp. 14-16

[5] Mahesh C, Kannan E, Saravanan MS. Generalized regression neural network based expert system for hepatitis B diagnosis. Journal of Computer Science. 2014;10(4):563-569. DOI: 10.3844/jcssp.2014.563.569

[6] Saat NZM, Ibrahim K, Jemmain AA. Bayesian methods for ranking the severity of apnea among patients. American Journal of Applied Sciences. 2010;7(2):167-170

[7] Sharma A, Paliwal KK. A gene selection algorithm using Bayesian classification approach. American Journal of Applied Sciences. 2012;9(1):127-131

[8] Elsayad A, Fakhr M. Diagnosis of cardiovascular diseases with Bayesian classifiers. Journal of Computer Sciences. 2015;11(2):274-282

[9] Neshat M, Yaghobi M. FESHDD: Fuzzy expert system for hepatitis B diseases diagnosis. In: Proceedings of the 5th International Conference on Soft Computing, Computing with Words and Perceptions in System Analysis. 2009

[10] Neshat M, Sargolzaei M, Toosi AN, Masoumi A. Hepatitis disease diagnosis using hybrid case-based reasoning and particle swarm optimization. ISRN Artificial Intelligence. 2012;2012:1-6. DOI: 10.5402/2012/609718

[11] Panchal D, Shah S. Artificial intelligence based expert system for hepatitis B diagnosis. International Journal of Modeling and Optimization. 2011;1(4):362-366. DOI: 10.7763/IJMO

[12] Bayes T, Price R. An essay towards solving a problem in the doctrine of chances. By the late Rev. Mr. Bayes, communicated by Mr. Price, in a letter to John Canton, M. A. and F. R. S. Philosophical Transactions of the Royal Society of London. 1763;53(0):370-418

[13] Shafer G. A Mathematical Theory of Evidence. New Jersey: Princeton University Press; 1976

[14] Dempster AP. A generalization of Bayesian inference. Journal of the Royal Statistical Society. 1968;B(30):205-247

[15] Hau HY, Kashyap RL. Belief combination and propagation in a lattice-structured inference network. IEEE Transactions on Systems, Man, and Cybernetics. 1990;20(1):45-57


## *Edited by Mohammad Saber Fallah Nezhad*

This book is an introduction to the mathematical analysis of Bayesian decision-making when the state of the problem is unknown but further data about it can be obtained. The objective of such analysis is to determine the optimal decision or solution, logically consistent with the decision-maker's preferences, which can be evaluated using numerical utilities or criteria, with probabilities assigned to the possible states of the problem and updated as new information is gathered.

Published in London, UK © 2018 IntechOpen © Patrick Hendry / unsplash

New Insights into Bayesian Inference
