**Applications**

**Chapter 6**

**Use of Artificial Intelligence in Healthcare Delivery**

DOI: 10.5772/intechopen.74714



Sandeep Reddy


Additional information is available at the end of the chapter


#### **Abstract**

In recent years, there has been increasing focus on the use of artificial intelligence (AI) in various domains to resolve complex issues. Likewise, the adoption of AI in healthcare is growing and radically changing the face of healthcare delivery. AI is being employed in a myriad of settings including hospitals, clinical laboratories and research facilities. AI approaches employing machines to sense and comprehend data like humans have opened up previously unavailable or unrecognised opportunities for clinical practitioners and health service organisations. Some examples include utilising AI approaches to analyse unstructured data such as photos, videos and physician notes to enable clinical decision making; use of intelligent interfaces to enhance patient engagement and compliance with treatment; and predictive modelling to manage patient flow and hospital capacity/resource allocation. Yet understanding of AI remains incomplete, and there is even confusion as to what it is. It is also not entirely clear what the implications of using AI are, in general and for clinicians in particular. This chapter aims to cover these topics and to introduce the reader to the concept of AI, the theories behind AI programming and the various applications of AI in the medical domain.

**Keywords:** artificial intelligence, healthcare delivery, medicine, machine learning, deep learning, intelligent agent and neural networks

#### **1. Introduction**

There has been an immense amount of discussion in recent years about the advent of artificial intelligence (AI) and the implications of its application in various domains. The concept of AI is not new, however: it can be traced back to Ramon Llull's theory of a reasoning machine in around 1300 CE and even to Aristotle's syllogisms in the fourth century BCE [1, 2]. It is only since the 1950s, though, that clearer definitions and practical applications have been formulated [3, 4]. While there was a lull in the development of AI in the 1970s and 1980s because of a loss of interest and funding, the most recent period has seen a dramatic revival in the research and development of AI programs. Countries like China have prioritised AI development by investing billions of dollars into AI industrial hubs [5]. Other nations and global corporations have also invested in AI programming and the creation of innovative AI applications [6–8]. Building on this trend, institutions are now increasingly paying attention to the application of AI in healthcare. AI is being used to improve the efficiency of healthcare delivery and to address previously intractable health problems [1, 9, 10]. The hundreds of AI-based healthcare applications introduced into the market in recent years are a testament to this focus. Commentators have noted that the application of AI in healthcare is at an early stage and that there is more to come [1, 4, 6]. However, is AI just hype, and are entities investing in a bubble? To answer this, we first need to understand what AI is and what its approaches and tools are. This chapter covers these issues, how they specifically apply to healthcare, and what is next for the use of AI in healthcare.

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### **2. Development and application of AI**

#### **2.1. Definition**

So, what is AI? Because of the complexity involved in developing synthetic intelligence comparable to human intelligence, there are varying interpretations of what AI is and what goes into developing it. Some authors even frown upon the term 'AI' and prefer the term 'Computational Intelligence' [11]. However, if we consider what the objective of AI is and what resources go into achieving that objective, an acceptable definition encompassing these components can be fashioned. The end objective of AI is to create systems that think and act rationally like humans [2, 4, 12]. These systems can also be termed 'intelligent agents' [2, 4]. If the goal of the system is to demonstrate intelligence, and developing such systems requires computer programming, a formal definition of AI would read as '*a field of science concerned with the computational understanding of what is commonly called intelligent behaviour, and with the creation of intelligent agents that exhibit such behaviour*' [13]. Simpler definitions describe AI as 'machines assuming human-like capabilities', 'extension of human intelligence through computers' and 'making computers do things which currently humans do', but a more accurate description would be 'the science of making intelligent machines' [1, 2, 4, 14].

#### **2.2. Intelligent agent**

AI theory can be best understood through the *intelligent agent* concept [11]. An intelligent agent incorporates the skills required to pass the Turing Test, which assesses whether a machine can think like a human [2, 3]. An intelligent agent should therefore be skilled in perception and practical reasoning, and have the ability to take action to achieve its goals. The agent utilises the environment it operates within to both receive input and take action (**Figure 1**). Some key inputs that feed into an agent, and which it can potentially draw on itself, are current observations about the environment, prior knowledge about the environment, past experiences that it can learn from and the objectives it needs to achieve. The agent perceives the environment through sensors and acts on the environment through effectors. When an intelligent agent comprises a computational core with physical actuators and sensors, it is termed a 'robot' [11]. When an agent is a program acting in a pure computational environment, it is an 'infobot', and when an advice-providing program is coupled with a human expert, it is a 'decision support system'.

**Figure 1.** Concept of an intelligent agent.
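The perceive-reason-act loop described above can be made concrete with a short sketch. The thermostat scenario, class names and thresholds below are illustrative assumptions rather than examples from the chapter; the sketch only shows how sensors, effectors, goals and past experience fit together in an intelligent agent.

```python
"""Minimal sketch of an intelligent agent acting on its environment:
it perceives through a sensor, reasons over the percept and its goal,
and acts through an effector. All names and values are illustrative."""

class Environment:
    def __init__(self, temperature):
        self.temperature = temperature

    def sense(self):
        # Sensor: the part of the world the agent can perceive.
        return self.temperature

    def apply(self, action):
        # Effector: how the agent's actions change the environment.
        if action == 'heat':
            self.temperature += 1.0
        elif action == 'cool':
            self.temperature -= 1.0

class ThermostatAgent:
    def __init__(self, goal_temperature):
        self.goal = goal_temperature   # the objective it needs to achieve
        self.history = []              # past experiences it could learn from

    def decide(self, percept):
        # Practical reasoning: compare the percept against the goal.
        self.history.append(percept)
        if percept < self.goal - 0.5:
            return 'heat'
        if percept > self.goal + 0.5:
            return 'cool'
        return 'idle'

env = Environment(temperature=17.0)
agent = ThermostatAgent(goal_temperature=21.0)
for _ in range(10):                    # perceive -> decide -> act loop
    env.apply(agent.decide(env.sense()))
print(env.temperature)                 # -> 21.0 (settles at the goal)
```

Because the agent here is a program acting in a purely computational environment, it is an 'infobot' in the chapter's terminology; wiring the same loop to physical sensors and actuators would make it a robot.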

#### **2.3. What makes up AI?**

In the past, researchers aimed for AI to replicate human intelligence [2]. This approach is called 'Classical AI'. However, it was a limiting approach, as it assumed human intelligence is the only form of intelligence, and also that human intelligence is the most intelligence can be. Intelligence mainly comprises learning and reasoning [3, 13]. Constructing intelligence does not have to be bound by the limitations of human intelligence. An apt analogy here is flight: while bird flight may be a source of inspiration for constructing aeroplanes, the structure of an aeroplane does not replicate the anatomy of a bird. So in constructing AI, it is more important to incorporate the vital characteristics of intelligence than to merely replicate human intelligence.

Learning is an essential characteristic of intelligence [2, 4, 11]. Learning involves acquiring new knowledge, developing new skills through instruction or practice, knowledge representation and experimentation. If AI comprises learning, it has to demonstrate all of the aforementioned features. A very common process through which AI systems achieve learning objectives is *Machine Learning*. Machine learning is the modelling of different aspects of the learning process by computers [15]. Key goals of machine learning are for algorithms<sup>1</sup> to self-learn and improve through experience. Machine learning algorithms typically fall into two categories: supervised and unsupervised [16]. Supervised learning involves an algorithm working with labelled training data; categorisation of data and modelling of the relationship between input and output data occur in supervised learning. Unsupervised learning, on the other hand, allows the algorithm to identify hidden patterns in a stack of data; here the algorithm is run to check what patterns can be identified in the data and what outcomes may occur.
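A minimal sketch may help distinguish the two categories. The temperature readings, labels and function names below are illustrative assumptions: a nearest-centroid classifier stands in for supervised learning (learning from labelled training data), and a two-cluster k-means stands in for unsupervised learning (finding a hidden pattern with no labels supplied).

```python
"""Sketch of the two machine-learning categories described above,
using illustrative data: supervised (nearest centroid on labelled
temperatures) versus unsupervised (1-D two-cluster k-means)."""

def train_nearest_centroid(samples, labels):
    """Supervised: learn one centroid per label from labelled data."""
    centroids = {}
    for label in set(labels):
        points = [s for s, l in zip(samples, labels) if l == label]
        centroids[label] = sum(points) / len(points)
    return centroids

def predict(centroids, sample):
    """Assign the label whose learned centroid is closest."""
    return min(centroids, key=lambda label: abs(centroids[label] - sample))

def kmeans_1d(samples, iterations=10):
    """Unsupervised: discover two hidden clusters without labels."""
    c1, c2 = min(samples), max(samples)       # initial guesses
    for _ in range(iterations):
        group1 = [s for s in samples if abs(s - c1) <= abs(s - c2)]
        group2 = [s for s in samples if abs(s - c1) > abs(s - c2)]
        c1 = sum(group1) / len(group1)
        c2 = sum(group2) / len(group2)
    return sorted([c1, c2])

# Supervised: temperatures labelled 'normal' or 'fever' (illustrative).
temps = [36.5, 36.8, 37.0, 38.5, 39.1, 38.9]
labels = ['normal', 'normal', 'normal', 'fever', 'fever', 'fever']
model = train_nearest_centroid(temps, labels)
print(predict(model, 38.7))   # -> fever

# Unsupervised: the same readings, without labels, still split in two.
print(kmeans_1d(temps))       # two cluster centres emerge from the data
```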


Reasoning and knowledge representation are the other aspects of AI [11]. In AI, reasoning involves the manipulation of data to produce actions. Unlike traditional programming, the emphasis in AI is on what is to be computed rather than how it is to be computed. Structuring of this computation happens through design-time reasoning, offline computation and online computation. Earlier forms of AI involved algorithms based on the step-by-step reasoning model used to address predicated problems [2]. However, these models were not useful in uncertain situations or when there was incomplete information. AI reasoning models have now evolved to respond to these situations by drawing upon concepts from probability and economic theories. To resolve problems, certain or uncertain, AI systems require widespread knowledge about the relevant environment and must then be able to represent this knowledge in a computable form [11]. For this to occur, AI uses a *Representation and Reasoning System* (RRS). An RRS comprises a programming language to communicate with a computer, a method to allocate meaning to the language and, after input, a process to figure out the answers. Knowledge is represented in different forms, but the most widely used method is Frames [2]. Frames are files in the computer where information is stored in slots. Programming languages and computational resources are two important enablers of AI knowledge representation and reasoning. Different programming languages are used in AI, but the most popular are high-level languages such as Lisp, Python, C++ and Fortran. In the past, stand-alone computers and their limited processing power restricted the advancement of AI. In recent years, AI reasoning and knowledge representation have benefited immensely from rapid technological advances in computing power and wireless technology. These advances have helped in the deployment of sophisticated algorithms designed to resolve problems that could not have been addressed by AI applications in the past.
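A short sketch of the frame idea described above: knowledge about a concept is stored in named slots, and a simple reasoner works over those slots. The 'influenza' frame, its slot contents and the matching function are illustrative assumptions, not clinical knowledge from the chapter.

```python
"""Sketch of frame-based knowledge representation: a frame holds
knowledge about a concept in named slots, and a simple reasoner
matches observed findings against the expected-findings slot.
All frame contents here are illustrative assumptions."""

# A frame is a named collection of slots holding knowledge about a concept.
flu_frame = {
    'name': 'influenza',
    'is_a': 'viral infection',                  # taxonomic slot
    'findings': {'fever', 'cough', 'myalgia'},  # expected-observation slot
}

def match_score(frame, observed):
    """Reason over the frame: fraction of expected findings observed."""
    expected = frame['findings']
    return len(expected & observed) / len(expected)

observed = {'fever', 'cough', 'rash'}
score = match_score(flu_frame, observed)
print(f"{flu_frame['name']}: {score:.2f}")   # -> influenza: 0.67
```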

#### **2.4. AI tools**

AI systems employ several tools to automate problem-solving tasks. These tools are based on AI principles, some of which were discussed in the previous sections. The tools are used to create AI applications to resolve issues across various disciplines and industries. Some commonly utilised tools are discussed in this section.

<sup>1</sup> In computer science, an algorithm is an explicit description of how to solve a class of problems [2–4].


Search in AI systems mirrors real-life problem solving but draws upon computing power to resolve the problems [17]. Search problems are classified based on the amount of information available to the search process; this information may relate to the whole of the problem area or to a specific component of the problem. Through an independent search planning process, AI analyses multiple options and identifies an optimal solution. AI offers a faster and better approach to search and optimisation than conventional techniques [17, 18]. What separates AI search from conventional techniques is that the process remembers past results, learns and refines its performance in relation to past searches, plans its path forward and answers search queries in a manner akin to human intelligence. One example of an AI search and optimisation tool is *Evolutionary Computation*. Evolutionary Computation is the umbrella term for algorithms based on natural evolutionary processes that incorporate mechanisms of natural selection and the survival-of-the-fittest principle [1, 10]. Foremost among the evolutionary computation algorithms are *Genetic Algorithms*. Genetic Algorithms are a category of stochastic search and optimisation algorithms based on Darwinian biological evolution. These algorithms use a population-based search process to create random solutions for the problem at hand. These solutions are termed chromosomes, which comprise random values derived from various control values. The variations in the values are utilised for the search process. The population of chromosomes is assessed against an objective function, and the population of solutions then evolves from one generation to another to arrive at an acceptable solution. The ideal solutions are retained and the mediocre ones disposed of. Through a process of repetition, improvements and generation of new solutions occur.
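The genetic-algorithm loop described above (random chromosomes, assessment against an objective function, retention of the fittest, and generation of new solutions) can be sketched briefly. The objective function, population size, crossover scheme and mutation settings below are illustrative assumptions.

```python
"""Sketch of a genetic algorithm: a population of random candidate
solutions is scored by an objective (fitness) function; the fittest
half is retained, the mediocre half disposed of, and new solutions
are generated by crossover plus mutation. All parameters are
illustrative assumptions."""
import random

random.seed(0)   # deterministic run for illustration

def fitness(x):
    # Objective: maximise f(x) = -(x - 3)^2, whose optimum is x = 3.
    return -(x - 3.0) ** 2

def evolve(generations=60, pop_size=20):
    # Initial population: random 'chromosomes' (here, single values).
    population = [random.uniform(-10, 10) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: retain the fitter half, dispose of the mediocre half.
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        # Crossover + mutation: children average two parents, plus noise.
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            children.append((a + b) / 2 + random.gauss(0, 0.1))
        population = survivors + children
    return max(population, key=fitness)

best = evolve()
print(round(best, 1))   # converges close to the optimum at 3
```

Keeping the best survivors unchanged each generation (elitism) guarantees the best solution found never worsens from one generation to the next.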

In their quest to replicate biological intelligence, AI researchers inspired by the biological nervous system have developed *Artificial Neural Networks (ANNs)* [1, 19]. Artificial Neural Networks attempt to simulate the nerve cell (neuron) networks of the brain. This approach of copying biological neuronal networks so they function independently differs from conventional computing, which primarily seeks to support human brain computation. A very simple base algorithm structure (see **Figure 2**) lies behind artificial neural networks, but it can be adapted to a range of problems. The artificial neurons, which are computer processors, are interconnected with each other and are capable of performing parallel computations for data processing and knowledge representation [19]. These neural networks are capable of learning from historical examples, examining non-linear data and managing imprecise information. ANNs fall into two main categories: *Feedforward Neural Networks* and *Recurrent Neural Networks*. In feedforward networks the signal passes in only one direction, while in recurrent neural networks feedback and short-term memories of previous inputs are enabled. In both categories, application of deep learning, a class of machine learning that uses a cascade of multiple layers of non-linear processing units, enhances the problem-solving capabilities of the neural networks. Deep feedforward and deep recurrent neural networks are thus increasingly being used to resolve real-world problems through language modelling, analysis of unstructured data and strategy formulation.
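A tiny feedforward network can illustrate the one-way flow of signals through layered artificial neurons. The weights below are set by hand purely for illustration (they happen to reproduce the XOR function); a real network would learn its weights from historical examples rather than have them fixed like this.

```python
"""Sketch of a feedforward neural network as described above: layers
of artificial neurons, each computing a weighted sum of its inputs
followed by a non-linear activation, with signals flowing in one
direction only. Hand-set weights are an illustrative assumption."""
import math

def sigmoid(z):
    # Non-linear activation squashing any input into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One layer: each neuron applies a weighted sum plus activation."""
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def forward(x):
    # 2 inputs -> 2 hidden neurons -> 1 output neuron, signal one-way only.
    hidden = layer(x, weights=[[6.0, 6.0], [-6.0, -6.0]], biases=[-3.0, 9.0])
    return layer(hidden, weights=[[6.0, 6.0]], biases=[-9.0])[0]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, round(forward([a, b])))   # reproduces XOR: 0, 1, 1, 0
```

XOR is a classic example here because no single neuron can compute it: the hidden layer and non-linear activations are what give the network its extra problem-solving capability.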

for experiments. Part-of-speech identification and word sense disambiguation have become standard processes in NLP. Other current applications of NLP include information retrieval,

Use of Artificial Intelligence in Healthcare Delivery http://dx.doi.org/10.5772/intechopen.74714 87


**Figure 2.** Schematic representation of an artificial neural network.

Logic is important to reasoning, which in turn is a key component of intelligence. Classical logic is based on the assumption that only two truth-values (false and true) exist [2]. This assumption is called *bivalence*. On the other hand, *Fuzzy Logic* reflects real-world phenomena, where everything is a matter of degree [1, 2, 10]. A fuzzy logic can be viewed as a fuzzy extension of a multi-valued logic: instead of treating everything as black and white, it recognises shades of grey. Fuzzy logic uses continuous set membership from 0 to 1, as opposed to Boolean logic, which relies on sharp distinctions such as 0 for false and 1 for true. Fuzzy applications utilise a series of 'if-then' rules for modelling. This approach permits ambiguity and can be used in AI systems for indeterminate reasoning.
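The contrast between bivalent and fuzzy membership can be sketched in a few lines of Python. This is an illustrative toy, not from the chapter: the 'fever' concept and its 37–39°C ramp are invented thresholds, and `min` is just one common choice of fuzzy conjunction.

```python
# Illustrative sketch (invented thresholds): Boolean vs fuzzy membership
# for a hypothetical "fever" concept.

def boolean_fever(temp_c: float) -> int:
    """Classical bivalent logic: a patient either has fever (1) or not (0)."""
    return 1 if temp_c >= 38.0 else 0

def fuzzy_fever(temp_c: float) -> float:
    """Fuzzy membership: degree of fever rises linearly from 37.0 to 39.0 C."""
    if temp_c <= 37.0:
        return 0.0
    if temp_c >= 39.0:
        return 1.0
    return (temp_c - 37.0) / 2.0

def fuzzy_and(a: float, b: float) -> float:
    """A common fuzzy conjunction: take the minimum of the two degrees."""
    return min(a, b)

def concern(temp_c: float, pulse_membership: float) -> float:
    """An 'if fever AND high pulse then concern' rule fires to a *degree*,
    rather than being simply true or false."""
    return fuzzy_and(fuzzy_fever(temp_c), pulse_membership)
```

Note how a borderline temperature of 38.0°C yields a fever degree of 0.5 rather than a hard yes/no, which is exactly the shade-of-grey behaviour described above.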

Another important AI technique is *Natural Language Processing* (NLP). NLP is concerned with the use of software to understand and manipulate natural language text or speech for practical purposes [20]. With NLP, the process of language analysis is decomposed into stages mirroring theoretical linguistic distinctions such as syntax, semantics and pragmatics [4, 20]. NLP enables machines to read and understand human language, and can be used to gather and analyse unstructured data such as free text. In recent years, progress in NLP, specifically in the field of syntax, has led to the development of effective grammar characterisation and chart parsing. The development of numerous conceptual tools has led to the formation of systems and interface subsystems for use in experiments. Part-of-speech identification and word sense disambiguation have become standard processes in NLP. Other current applications of NLP include information retrieval, machine translation and text mining.
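As a toy illustration of one standard NLP stage mentioned above, the sketch below performs naive part-of-speech tagging with a tiny hand-made lexicon and a suffix fallback. Real systems use trained statistical models over large corpora; the lexicon and suffix rules here are invented for illustration only.

```python
# Toy part-of-speech tagger: dictionary lookup with a naive suffix fallback.
# Lexicon entries are invented examples, not a real tagset resource.

LEXICON = {
    "patient": "NOUN", "reports": "VERB", "severe": "ADJ",
    "chest": "NOUN", "pain": "NOUN", "the": "DET",
}

def tag(token: str) -> str:
    """Look the word up; otherwise guess from a common English suffix."""
    word = token.lower()
    if word in LEXICON:
        return LEXICON[word]
    if word.endswith("ing") or word.endswith("ed"):
        return "VERB"
    return "NOUN"  # default open-class guess

def pos_tag(sentence: str):
    """Tag each whitespace-separated token in the sentence."""
    return [(tok, tag(tok)) for tok in sentence.split()]
```

Even this crude sketch shows why tagging matters for downstream stages: distinguishing 'reports' as a verb rather than a noun changes the syntactic analysis of a clinical note.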

Use of *Hybrid Artificial Intelligent Systems* (HAIS), which combine multiple AI techniques, is becoming popular because of their capability to address real-world complex problems that individual AI techniques cannot [21]. By combining different AI learning and adaptation techniques, HAIS overcome the limitations associated with any single technique. HAIS may involve a combination of agents and multi-agent systems, fuzzy systems, artificial neural networks, optimisation models and so forth. By combining symbolic and sub-symbolic techniques, HAIS can resolve complex issues involving indistinctness, ambiguity and vagueness. This synergy also allows them to incorporate common sense, mine knowledge from raw data, use human-like reasoning, and learn to adapt to a changing environment.
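The hybrid idea can be sketched as a symbolic rule layer and a sub-symbolic (learned) layer scoring the same case, with the system blending the two. Everything below — the features, the rules, the weights and the blending factor — is an invented illustration of the symbolic/sub-symbolic combination, not a published HAIS design.

```python
# Invented sketch of the hybrid (HAIS) idea: blend an explicit rule-based
# score with a learned numeric score for the same case.

def rule_score(features: dict) -> float:
    """Symbolic component: explicit if-then knowledge (invented rules)."""
    score = 0.0
    if features.get("age", 0) > 65:
        score += 0.4
    if features.get("smoker"):
        score += 0.3
    return min(score, 1.0)

def learned_score(features: dict, weights: dict) -> float:
    """Sub-symbolic component: a weighted sum standing in for a trained model."""
    total = sum(weights.get(k, 0.0) * float(v) for k, v in features.items())
    return max(0.0, min(total, 1.0))

def hybrid_score(features: dict, weights: dict, alpha: float = 0.5) -> float:
    """Blend the two components; alpha sets the symbolic/learned balance."""
    return alpha * rule_score(features) + (1 - alpha) * learned_score(features, weights)
```

The design point is the one made above: the rule layer stays inspectable by domain experts, while the learned layer adapts to data, and the blend covers cases where either alone falls short.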

## **3. AI in healthcare**


86 eHealth - Making Health Care Smarter


AI lends itself to healthcare delivery very well. In fact, in recent years there has been an exponential increase in the use of AI in clinical environments [1, 6, 21–24]. With modern medicine facing the significant challenge of acquiring, analysing and applying structured and unstructured data to treat or manage diseases, AI systems with their data-mining and pattern recognition capabilities come in handy. Medical AI is mainly concerned with the development of AI programs that help with the prediction, diagnosis and treatment or management of diseases. In contrast to non-AI medical software applications, which rely on purely statistical and probabilistic approaches, medical AI applications utilise symbolic models of diseases and analyse their relationship to patient signs and symptoms [1, 25–27]. For example, diagnostic AI applications gather and synthesise clinical data and compare the information with predefined categories such as diseases to help with diagnosis and treatment. Medical AI applications have been used not just to support diagnosis but also treatment protocol development, drug development and patient monitoring [1].

#### **3.1. History of use of AI in healthcare**

Discussion of the use of AI in medicine coincides with the advent of AI in the modern era. This is not surprising, as AI systems initially intended to replicate the functioning of the human brain [2]. In 1970, William B Schwartz, a physician interested in the use of computing science in medicine, published an influential paper in the *New England Journal of Medicine* titled '*Medicine and the computer: the promise and problems of change*' [28]. In the paper he argued, '*Computing science will probably exert its major effects by augmenting and, in some cases, largely replacing the intellectual functions of the physician*'. By the 1970s there was a realisation that conventional computing techniques were unsuitable for solving complex medical phenomena [2, 4]. A more sophisticated computational model that simulated human cognitive processes, that is an AI model, was required for clinical problem solving. Early efforts to apply AI in medicine consisted of setting up rules-based systems to help with medical reasoning. However, serious clinical problems are too complex to lend themselves to simple rules-based problem-solving techniques. Problem solving in medicine then progressed to the construction of computer programs based on models of diseases. It was not just in general medicine that AI was being explored to assist with problem solving. In 1976, the Scottish surgeon Gunn used computational analysis to diagnose acute abdominal pain [1]. This was achieved through computerised clinical audits of structured case notes, whereby diagnosis through this route proved to be about 10% more accurate than the conventional route. By the 1980s, AI research communities were well established across the world, especially in learning centres in the US [1, 2, 4, 13]. This development helped expand the use of novel and innovative AI approaches to medical diagnosis, in large part because medicine was an ideal testing ground for these AI applications. A significant number of AI applications in medicine at this stage were based on the *expert system* methodology [1, 25, 29–31]. By the end of the 1990s, research in medical AI had started to use new techniques like machine learning and artificial neural networks to aid clinical decision-making. The next section explores current applications of AI in various aspects of healthcare.

#### **3.2. Application of AI techniques in healthcare**

The wide acceptance of AI in healthcare relates to the complexities of modern medicine, which involves the acquisition and analysis of copious amounts of information, and the limited capacity of clinicians to address these needs with human intelligence alone. Medical AI applications, with their advanced computing ability, are overcoming this limitation and use several techniques to assist clinicians in medical care.


**Figure 3.** Medical diagnostic-therapeutic cycle.

AI is being used for all three classical medical tasks: diagnosis, prognosis and therapy, but mostly in the area of medical diagnosis [9, 32]. Generally, the medical diagnosis cycle (**Figure 3**) involves observation and examination of the patient, collection of patient data, interpretation of the data using the clinician's knowledge and experience, and then formulation of a diagnosis and a therapeutic plan by the physician. If we compare the medical diagnostic cycle (**Figure 3**) to the concept of an intelligent agent system, the physician is the intelligent agent, the patient data is the input and the diagnosis is the output. There are several methods through which AI systems can replicate this diagnostic cycle and assist clinicians with medical diagnosis. One such approach is the use of *Expert Systems*. Expert systems are based on rules clearly outlining the steps involved in progressing from inputs to outputs [2]. The progression occurs through the construction of a number of IF-THEN type rules. These rules are constructed with the help of subject experts, such as clinicians, who have interest and experience in the particular domain. The success of the expert system relies on the explicit representation of the knowledge area in the form of rules. The core of the expert system is the inference engine, which transforms the inputs into actionable outputs.
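The IF-THEN progression and inference engine described above can be sketched as a tiny forward-chaining loop: rules fire whenever their conditions are satisfied by the known facts, and any conclusions become new facts. The facts and rules below are invented toy examples, not clinical guidance.

```python
# Minimal forward-chaining inference engine. Rules and facts are invented
# toy examples for illustration only.

RULES = [
    # (IF all of these facts hold, THEN conclude this new fact)
    ({"fever", "cough"}, "respiratory_infection_suspected"),
    ({"respiratory_infection_suspected", "chest_xray_abnormal"}, "pneumonia_suspected"),
]

def infer(facts: set) -> set:
    """Repeatedly fire rules whose conditions are met until nothing new is added."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in RULES:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts
```

The loop is the inference engine: the rule base is the explicit knowledge representation, the input facts are the patient data, and the derived facts are the actionable outputs.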

Commonly, the application of the expert system approach in medical software programming is seen in *Clinical Decision Support Systems* (CDSS). Simply put, CDSS are software programs that help clinicians make clinical decisions [33, 34]. A CDSS provides customised assessments or advice based on analysis of patient data sets. An early version of a CDSS was the MYCIN program developed in the 1970s. MYCIN was a CDSS focusing on the management


of infectious disease patients. Infectious disease knowledge was represented in the form of production rules, which are conditional statements as to how observations can be inferred appropriately. However, MYCIN placed less emphasis on diagnosis and more on the management of patients with infectious diseases. In a later evaluation of the MYCIN system, it was found to compare favourably with the advice provided by infectious disease experts. MYCIN paved the way for the development of knowledge-based systems and the commercialisation of rule-based approaches in medicine and other fields. Another CDSS that was initially developed around the same time as MYCIN but continues to be used is the QMR system [35]. The QMR system utilises a customised algorithm modelled on the clinical reasoning of a single University of Pittsburgh internist; hence the system was initially called INTERNIST-I. By considering historical and physical findings, and utilising a large database that categorises disease findings into 'evoking strength', 'importance' and 'frequency' domains, the QMR system generates a differential diagnosis. Heuristic rules drive the system to produce a list of ranked diagnoses founded on the disease knowledge domains built into the system. Where the system is unable to reach a definite diagnosis, it probes the user with further questions or provides advice about further tests until a determination of the condition is made. While the MYCIN and QMR systems offered diagnostic support, other forms of CDSS can provide alerts, reminders and advice about patient treatment and management. These systems operate by creating predictive models and a multi-dimensional patient view through aggregation of data from multiple sources, including knowledge and patient information databases. As treatment and management of diseases have evolved, CDSS architecture is now utilising multi-agent systems [26]. Each of the multiple agents performs distinct tasks and operations in various capacities or different locations but transmits data to a central repository so the aggregated data can be used for knowledge discovery.
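A QMR-style ranking can be sketched as scoring each disease profile by the summed weights of the observed findings and sorting the result into a differential. The diseases, findings and 'evoking strength' numbers below are invented for illustration; the real QMR knowledge base is far larger and also uses importance and frequency domains.

```python
# Invented sketch of evoking-strength-style ranking: each disease profile
# maps findings to hypothetical weights; observed findings yield a ranked
# differential diagnosis.

DISEASE_PROFILES = {
    "disease_a": {"fever": 3, "rash": 4, "joint_pain": 2},
    "disease_b": {"fever": 2, "cough": 4},
}

def differential(findings: set):
    """Score each disease by the summed weights of observed findings,
    then return diseases ranked from most to least strongly evoked."""
    scores = {
        disease: sum(w for f, w in profile.items() if f in findings)
        for disease, profile in DISEASE_PROFILES.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

A fuller sketch in the spirit of QMR would also loop back to the user, asking about unobserved high-weight findings to discriminate between closely ranked diagnoses.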

Unlike expert systems, where a serial or sequential data processing approach is utilised, ANN processing utilises a parallel form of data processing analogous to the brain [19]. In ANNs, the processing elements, also called neurons, process data simultaneously while communicating with each other. The processing elements are arranged in layers and the layers, in turn, are connected to each other. The links between the processing elements are associated with numerical weights. The memory and adaptation of ANNs are adjusted by changing the weights, which amplifies the effects of the afferent connections to each processing element. As a result of this architecture, ANNs can be trained to learn from experience, analyse non-linear data and manage inexact information. These abilities have made ANN techniques among the most popular AI techniques in medicine [1]. In addition to medical diagnosis, ANNs have been used for radiology and histopathology analysis. In radiology, gamma cameras, CT, ultrasound and MRI all create digital images, which can be manipulated by ANNs and used as inputs. The digitised inputs are then transmitted through the hidden and output layers to produce the desired outputs (see **Figure 2**). Using backpropagation, a learning algorithm, ANNs have successfully identified orthopaedic trauma from radiographs [36]. When ANNs and radiologists interpret the same radiological images separately, research has identified good diagnostic agreement [1, 36]. ANNs have also been used for analysis of cytological and histological specimens [1, 25]. For example, ANNs have been used to screen for abnormal cells in slide images for haematology and cervical cytology. Further, ANNs have been used to interpret ECGs and EEGs through waveform analysis. For this to occur, a multi-layered neural network is trained with waveform data from both people with the disease and without [1]. Evaluation of the waveform interpretations by ANNs has identified excellent pattern approximation and classification abilities, with interpretations comparable to those of clinicians.
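The core mechanism just described — weighted connections adjusted from errors by gradient descent — can be illustrated at its smallest scale: a single sigmoid neuron learning logical AND. This is a didactic stand-in for the multi-layer networks discussed above, not a medical model; the learning rate and epoch count are arbitrary choices.

```python
# Smallest possible illustration of error-driven weight adjustment:
# one sigmoid neuron trained by gradient descent to compute logical AND.

import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def train_and_gate(epochs: int = 5000, lr: float = 0.5):
    """Adjust two weights and a bias to fit the AND truth table."""
    data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    w1 = w2 = b = 0.0  # start from zero weights
    for _ in range(epochs):
        for (x1, x2), target in data:
            out = sigmoid(w1 * x1 + w2 * x2 + b)
            # gradient of the squared error through the sigmoid
            grad = (out - target) * out * (1 - out)
            w1 -= lr * grad * x1
            w2 -= lr * grad * x2
            b -= lr * grad
    return w1, w2, b

def predict(weights, x1, x2) -> int:
    """Threshold the trained neuron's output at 0.5."""
    w1, w2, b = weights
    return 1 if sigmoid(w1 * x1 + w2 * x2 + b) >= 0.5 else 0
```

In a multi-layer network the same error gradient is propagated backwards through the hidden layers — hence 'backpropagation' — but the weight-update principle is the one shown here.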


**Figure 4.** Data mining process. Adapted from Huang et al. [39].

*Data Mining* acts as the foundation for machine learning. Data mining is the process of identifying previously unknown patterns and trends in large databases and then utilising them to create predictive models [37, 38]. Data mining involves multiple iterative steps (**Figure 4**) that include retrieval of data sets from data warehouses or operational databases, cleaning of data to remove discrepancies, analysis of data sets to identify patterns that represent relationships amongst the data, validation of the patterns with new data sets, and culminating in knowledge extraction [39]. Data mining has become hugely popular in healthcare largely because healthcare generates data too voluminous and complex to be processed by conventional computational techniques. The potential applications of data mining in healthcare are huge, but in practice data mining has been used to evaluate the effectiveness of medical treatments, analyse epidemiological data to identify disease outbreaks and act as an early warning system, analyse hospital records to identify acute medical conditions and help with interventions, assess the quality of medical interventions, and predict survival


time for chronic disease and cancer patients [8, 38–40]. Mining medical data faces two main issues: heterogeneity of data, sometimes with incomplete recording or filing, and the complexity of the requested outputs [27]. Fuzzy logic, which we discussed in an earlier section, with its ability to represent assorted data, its strength in adapting to changes in the user environment and its distinctive expressiveness, can support data mining in addressing these issues. Thus data mining utilising fuzzy logic has been used for a range of situations in healthcare, including predicting the prognosis of cancer and assessing clinicians' satisfaction with patient information management systems.
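The iterative steps of Figure 4 can be sketched as a small pipeline: clean the retrieved records, mine a simple pattern (here just a co-occurrence rate), then validate it against a held-out set. The records, conditions and tolerance below are invented toy data, and real mining would use far richer pattern discovery than one frequency count.

```python
# Toy sketch of the data mining steps: clean -> mine pattern -> validate.
# Records and condition names are invented illustrations.

def clean(records):
    """The cleaning step: discard records with missing (None) fields."""
    return [r for r in records if all(v is not None for v in r.values())]

def mine_cooccurrence(records, a: str, b: str) -> float:
    """The analysis step: fraction of records where conditions a and b co-occur."""
    if not records:
        return 0.0
    hits = sum(1 for r in records if r.get(a) and r.get(b))
    return hits / len(records)

def validate(pattern_rate: float, holdout, a: str, b: str, tol: float = 0.2) -> bool:
    """The validation step: accept the pattern if a held-out set shows a similar rate."""
    return abs(mine_cooccurrence(holdout, a, b) - pattern_rate) <= tol
```

The point of the validation step is the one the chapter makes: a pattern found in one data set only becomes extracted knowledge once it holds on data it was not mined from.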

There are an estimated 5 billion mobile phone subscriptions in the world [41]. Many mobile phones now have memory and processing power equivalent to the capacity of mini-computers [42]. So it is natural to see mobile communication devices being harnessed to deliver healthcare. The use of wireless communication devices to support the delivery of healthcare is called *Mobile Health* or, in popular terminology, *mHealth* [41]. Mobile health applications are being used in many areas of healthcare delivery, including education and awareness, point-of-care support and diagnostics, patient monitoring, disease surveillance, emergency medical response and patient information management [41, 43–46]. The rapid development of mHealth has coincided with the increase in AI research and the development of AI techniques. Consequently, there has been increased application of AI techniques in mHealth. The move has worked well, as the characteristics of an intelligent agent system lend themselves to the objectives of mHealth. The intelligent agent perceives the environment and autonomously acts upon it. In the case of multi-agent systems, the agents can communicate between themselves, dynamically manage data and resources, and handle the complexity of solutions through decomposition, modelling and reorganisation of relationships. These abilities mean agent-based mobile applications can be used to remotely monitor patients, especially elderly and chronic disease patients, support clinical decision-making and provide remote training for health workers. The application of AI has not been restricted to mobile communication devices but has been extended to other smart devices. When these smart devices are connected to each other to create a cyber-physical smart pervasive network, this is termed the *Internet of Things* (IoT) [47, 48]. IoT is being used for many purposes, including prediction of natural disasters, water scarcity monitoring and intelligent transport systems, but in healthcare the concept is being used to design smart homes that assist senior citizens to accomplish their daily living activities while preserving their privacy, and to remotely monitor their health conditions and medicine intake [48]. An IoT powered by AI and set up to address the healthcare needs of senior and incapacitated patients is called *Ambient Assisted Living* (AAL) [49]. As the main aim of AAL is to extend the independent living of elderly individuals in their homes, automation, security, control and communication are key aspects of the AAL modular architecture. The system also includes sensors, actuators and cameras to collect different types of data about the individual and the home. Together these constituents set up a smart home environment where the activities and health condition of the resident are not only tracked but also predicted [50].
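An AAL-style monitoring agent can be sketched as a loop that checks sensor readings against safe bands and raises alerts. The sensor names and thresholds below are invented; a real deployment would read from networked devices and escalate alerts to carers rather than return strings.

```python
# Invented sketch of an ambient-monitoring check: compare simulated sensor
# readings with safe bands and collect alerts.

THRESHOLDS = {
    "heart_rate": (50, 110),    # beats per minute (illustrative band)
    "room_temp": (16.0, 28.0),  # degrees Celsius (illustrative band)
}

def check_reading(sensor: str, value: float):
    """Return an alert string if the reading falls outside its safe band."""
    low, high = THRESHOLDS[sensor]
    if value < low:
        return f"ALERT: {sensor} low ({value})"
    if value > high:
        return f"ALERT: {sensor} high ({value})"
    return None

def monitor(readings):
    """Process a batch of (sensor, value) readings and collect any alerts."""
    return [a for a in (check_reading(s, v) for s, v in readings) if a]
```

The prediction aspect mentioned above would sit on top of such a loop, learning the resident's normal patterns so that deviations, not just fixed thresholds, trigger intervention.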


In addition to the examples discussed above, AI techniques have been successfully used in other areas of medicine. Genetic algorithm techniques have been used to predict outcomes in acutely ill and cancer patients and to analyse mammograms and MRI images, while fuzzy logic techniques have been used to diagnose various cancers, characterise ultrasound and CT scan images, predict survival in cancer patients and administer medication and anaesthetics [1, 6].

Of all the AI applications developed over the past several decades, IBM's Watson is one of the best recognised. IBM Watson is a cognitive computing technology that brings together the competencies of reading, reasoning and learning to answer questions or investigate novel connections [40]. IBM Watson aggregates huge volumes of structured and unstructured data from multiple sources into a single repository called the Watson corpus, and incorporates machine learning and NLP techniques to process and analyse the data for problem solving. The technology has been extended to the medical domain to assist medical scientists and clinicians in improving patient care [31, 51–53]. Published examples of the use of IBM Watson in health care include automated problem list generation from electronic medical records, drug target identification and drug repurposing, interpretation of genetic testing results, oncological decision-making support, and support for the roll-out of government healthcare programs.
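To make the "aggregate a corpus, then retrieve and rank" idea concrete, here is a deliberately tiny sketch of corpus-based question answering by word overlap. This is emphatically not IBM Watson's pipeline or API; the corpus, tokeniser and scoring are invented for illustration.

```python
def tokenize(text: str) -> set:
    # crude tokenisation: lowercase words with surrounding punctuation stripped
    return {word.strip(".,?!") for word in text.lower().split()}

# a tiny stand-in for a document corpus (invented example sentences)
corpus = [
    "metformin is a first line drug for type 2 diabetes",
    "aspirin is used for pain relief and to prevent blood clots",
    "statins lower cholesterol levels in the blood",
]

def best_passage(question: str) -> str:
    q = tokenize(question)
    # rank every passage by how many question words it shares
    return max(corpus, key=lambda passage: len(q & tokenize(passage)))

print(best_passage("Which drug is first line for type 2 diabetes?"))
```

Production question-answering systems replace the word-overlap score with learned language models and evidence-scoring stages, but the retrieve-and-rank skeleton is the same.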

#### **3.3. Future trends and application of AI in healthcare**

As more AI research is undertaken and AI systems become better trained and consequently more intelligent, it is foreseeable that these agents will take over some, if not all, of the human elements of clinical care [6]. While leaving the communication of serious matters and final decision making to human clinicians, AI systems can take responsibility for routine and less risky diagnostic and treatment processes. The intention here is not to replace human clinicians but to enable a streamlined, high-quality healthcare delivery process.

Of all the promising medical AI innovations being explored, robotics driven by AI will have an important role in the medical automation process. Robots embody AI and give it a form, while AI algorithms/programming provide intelligence to the robots [2]. Robotic assistants have already been employed to conduct surgeries, deliver medication and monitor hospital patients, but the most promising area for their use is in elderly care [31]. Mobile robotic assistants are already being used to assist elderly people in their day-to-day activities, either in their homes or in aged care settings [51]. These robotic assistants mainly undertake tasks such as reminding people of routine activities, including medication intake, or guiding them around their environments. With advances in AI and robotics, the employment of robotic assistants in elderly care is bound to grow.

While the conventional thinking is that robots act as a vessel for a silicon-based artificial brain, a school of thought is emerging that imagines the use of biological brains in robots [2]. With advances in science now allowing the culture of biological neurons, the potential use of a biological brain in a robotic frame, through which it can sense the world and move around, is not inconceivable. This *Cyborg* model presents a true blurring of the boundaries between human and artificial intelligence and the imaginable development of a hybrid human-artificial intelligence health worker that could revolutionise healthcare delivery.

#### **3.4. Challenges**


While the application of AI in the delivery of healthcare has very promising potential, challenges, both technical and ethical, exist. AI research is largely led and driven by computer scientists without medical training, and it has been commented that this has led to a very technologically focused and problem-oriented approach to the application of AI in healthcare delivery [24]. Contemporary healthcare delivery models are very dependent on human reasoning, patient-clinician communication and establishing professional relationships with patients to ensure compliance. These are aspects that AI cannot easily replace. The use of robotic assistants in healthcare has raised issues about the mechanisation of care in vulnerable situations where human interaction and intervention is probably more appealing [6]. There is also reluctance among clinicians to adopt AI technologies that they envisage will eventually replace them, yet they have no qualms about using technologies that automate and speed up laboratory diagnostic processes [1]. This has led some to suggest a model of co-habitation [6]: a model that accommodates both the AI and human elements in healthcare delivery, anticipating the inevitable automation of significant components of medical processes while preserving the human aspects of clinical care such as communication, procedures and decision-making.

#### **4. Conclusion**

Healthcare delivery has over the years become complex and challenging. A large part of the complexity in delivering healthcare arises from the voluminous data generated in the process of healthcare, which has to be interpreted in an intelligent fashion. AI systems, with their problem-solving approach, can address this need. Their intelligent architecture, which incorporates learning and reasoning and the ability to act autonomously without requiring constant human attention, is alluring. Thus the medical domain has provided fertile ground for AI researchers to test their techniques, and in many instances AI applications have successfully solved problems with outcomes comparable to those of human clinicians. As healthcare delivery becomes more expensive, stakeholders will increasingly look to solutions that can replace the expensive elements in patient care, and AI solutions will be sought after in these situations. However, cold technology cannot totally replace the human elements in patient care, and a model that incorporates both technological innovations and human care has to be investigated.

#### **Notice**

The chapter was submitted to a double blind review and it is in line with COPE Ethical Guidelines.

#### **Author details**

Sandeep Reddy

Address all correspondence to: sandeep.reddy@deakin.edu.au

School of Medicine, Faculty of Health, Deakin University, Australia

#### **References**

[1] Ramesh AN, Kambhampati C, Monson JRT, Drew PJ. Artificial intelligence in medicine. Annals of the Royal College of Surgeons of England. 2004;**86**(5):334-338

[2] Warwick K. Artificial Intelligence: The Basics. Abingdon: Routledge; 2012

[3] Simmons AB, Chappell SG. Artificial intelligence—Definition and practice. IEEE Journal of Oceanic Engineering. 1988;**13**(2):14-42

[4] Kok JN, Boers EJW, Kosters WA, Van Der Putten P, Poel M. Artificial Intelligence: Definition, Trends, Techniques, and Cases. Oxford, UK: Encyclopedia of Life Support Systems; 2013

[5] Cyranoski D. China enters the battle for AI talent. Nature [Online]. 2018. Available from: https://www.nature.com/articles/d41586-018-00604-6 [Accessed: Jan 12, 2018]

[6] Diprose W, Buist N. Artificial intelligence in medicine: Humans need not apply? The New Zealand Medical Journal. 2016;**129**(1434):73-76

[7] Baker S. Final Jeopardy: The Story of Watson, the Computer That Will Transform Our World. New York: Houghton Mifflin Harcourt Publishing Company, Mariner Books; 2012

[8] Chen HCH, Zeng D. AI for global disease surveillance. IEEE Intelligent Systems. 2009;**24**(6):66-82

[9] Szolovitz P. Artificial intelligence in medical diagnosis. Annals of Internal Medicine. 1988;**108**(1):80

[10] Gambhir S, Malik SK, Kumar Y. Role of soft computing approaches in healthcare domain: A mini review. Journal of Medical Systems. 2016;**40**(12):2-20

[11] Poole DL, Mackworth A, Goebel RG. Computational intelligence and knowledge. In: Computational Intelligence: A Logical Approach. New York: Oxford University Press; 1998. pp. 1-22

[12] Poole DL, Mackworth AK. Artificial Intelligence: Foundations of Computational Agents. 2nd ed. Cambridge, UK: Cambridge University Press; 2017

[13] Shapiro SC. Encyclopedia of Artificial Intelligence. 2nd ed. New York: Wiley-Interscience; 1992

[14] Szolovits P, Stephen PG. Categorical and probabilistic reasoning in medical diagnosis. Artificial Intelligence. 1978;**11**:115-144

[15] Bell J. Machine learning. In: Machine Learning: Hands-on for Developers and Technical Professionals. Indianapolis, USA: John Wiley & Sons; 2015

[16] Michalski RS, Carbonell JG, Mitchell TM. An overview of machine learning. In: Michalski RS, Carbonell JG, Mitchell TM, editors. Machine Learning: An Artificial Intelligence Approach. Palo Alto: Springer; 1983. p. 23

[17] Chandel A, Sood M. Searching and optimization techniques in artificial intelligence: A comparative study and complexity analysis. International Journal of Advanced Research in Computer Engineering & Technology. 2014;**3**(3):866-871

[18] Badar A, Umre BS, Junghare AS. Study of artificial intelligence optimization techniques applied to active power loss minimization. IOSR Journal of Electrical and Electronics Engineering. 2014;**2014**(1989):39-45

[19] Priddy KL, Keller PE. Artificial Neural Networks: An Introduction. Bellingham: SPIE Press; 2005

[20] Spyns P. Natural language processing in medicine: An overview. Methods of Information in Medicine. 1996;**35**(4-5):285-301

[21] Abraham A. Hybrid artificial intelligent systems. In: Corchado E, Corchado J, Abraham A, editors. Innovations in Hybrid Intelligent Systems—Advances in Soft Computing. Berlin: Springer Berlin Heidelberg; 2007

[22] Kim E-Y. Patient will see you now: The future of medicine is in your hands. Healthcare Informatics Research. 2015;**21**(4):321-323

[23] Mishra S, Takke A, Auti S, Suryavanshi S, Oza M. Role of artificial intelligence in health care. BioChemistry: An Indian Journal. 2017;**11**(5):1-14

[24] Coiera EW. Artificial intelligence in medicine: The challenges ahead. Journal of the American Medical Informatics Association. 1996;**3**(6):363-366

[25] Scott R. Artificial intelligence: Its use in medical diagnosis. Journal of Nuclear Medicine. 1993;**34**(3):510-514

[26] Wimmer H, Yoon VY, Sugumaran V. A multi-agent system to support evidence based medicine and clinical decision making via data sharing and data privacy. Decision Support Systems. 2016;**88**:51-66

[27] Koh HC, Tan G. Data mining applications in healthcare. Journal of Healthcare Information Management. 2005;**19**(2):64-72

[28] Schwartz WB. Medicine and the computer: The promise and problems of change. The New England Journal of Medicine. 1970;**283**(23):1257-1264

[29] Sim I, Gorman P, Greenes RA, Haynes RB, Kaplan B, Lehmann H, Tang PC. Clinical decision support systems for the practice of evidence-based medicine. Journal of the American Medical Informatics Association. 2001;**8**(6):527-534

[30] Khanna S, Sattar A, Hansen D. Advances in artificial intelligence research in health. The Australasian Medical Journal. 2012;**5**(9):475-477

[31] Topol E. The Patient Will See You Now: The Future of Medicine Is in Your Hands. New York: Basic Books; 2015

[32] Farrugia A, Al-Jumeily D, Al-Jumaily M, Hussain A, Lamb D. Medical diagnosis: Are artificial intelligence systems able to diagnose the underlying causes of specific headaches? In: Proceedings of 2013 6th International Conference on Developments in eSystems Engineering (DeSE); 2013. pp. 376-382

[33] Littenberg B, MacLean C, Gagnon M. Clinical decision support system. US Pat. App. 11/640,103; 2006. pp. 1-10

[34] Pusic M, Ansermino JM. Clinical decision support systems. The British Columbia Medical Journal. 2004;**46**:236-239

[35] Miller RA, McNeil MA, Challinor SM, Masarie FE Jr, Myers JD. INTERNIST-1: An experimental computer-based diagnostic consultant for general internal medicine. The New England Journal of Medicine. 1982;**307**:468-476

[36] Olczak J, Fahlberg N, Maki A, Razavian AS, Jilert A, Stark A, Sköldenberg O, Gordon M. Artificial intelligence for analyzing orthopedic trauma radiographs: Deep learning algorithms—Are they on par with humans for diagnosing fractures? Acta Orthopaedica. 2017;**88**(6):581-586

[37] Milovic B, Milovic M. Prediction and decision making in health care using data mining. International Journal of Public Health Science. 2012;**1**(2):69-76

[38] Zhang Y, Guo SL, Han LN, Li TL. Application and exploration of big data mining in clinical medicine. Chinese Medical Journal. 2016;**129**(6):731-738

[39] Huang MJ, Chen MY, Lee SC. Integrating data mining with case-based reasoning for chronic diseases prognosis and diagnosis. Expert Systems with Applications. 2007;**32**(3):856-867

[40] Kelly JE III, Hamm S. Smart Machines: IBM's Watson and the Era of Cognitive Computing. New York: Columbia University Press; 2013

[41] Barton AJ. The regulation of mobile health applications. BMC Medicine. 2012;**10**:2-5

[42] Dagon D, Martin T, Starner T. Mobile phones as computing devices: The viruses are coming! IEEE Pervasive Computing. 2004;**3**(4):11-15

[43] Mohammadzadeh N, Safdari R. Patient monitoring in mobile health: Opportunities and challenges. Medical Archives. 2014;**68**(1):57

[44] Alnosayan N, Lee E, Alluhaidan A, Chatterjee S, Houston-Feenstra L, Kagoda M, Dysinger W. MyHeart: An intelligent mHealth home monitoring system supporting heart failure self-care. In: 2014 IEEE 16th International Conference on e-Health Networking, Application & Services (HealthCom); 2015. pp. 311-316

[45] Minutolo A, Sannino G, Esposito M, De Pietro G. A rule-based mHealth system for cardiac monitoring. In: Proceedings of 2010 IEEE EMBS Conference on Biomedical Engineering and Sciences (IECBES); 2010. pp. 144-149

[46] Boulos MNK, Wheeler S, Tavares C, Jones R. How smartphones are changing the face of mobile and participatory healthcare: An overview, with example from eCAALYX. Biomedical Engineering Online. 2011;**10**:1-14

[47] Islam SMR, Kwak D, Kabir H, Hossain M, Kwak K-S. The Internet of things for health care: A comprehensive survey. IEEE Access. 2015;**3**:678-708

[48] Da Xu L, He W, Li S. Internet of things in industries: A survey. IEEE Transactions on Industrial Informatics. 2014;**10**(4):2233-2243

[49] Costa R, Carneiro D, Novais P, Lima L, Machado J, Marques A, Neves J. Ambient assisted living. Applied Soft Computing. 2008;**51**:86-94

[50] Amiribesheli M, Benmansour A, Bouchachia A. A review of smart homes in healthcare. Journal of Ambient Intelligence and Humanized Computing. 2015;**6**(4):495-517

[51] Pollack ME, Engberg S, Matthews JT, Dunbar-Jacob J, McCarthy CE, Thrun S. Pearl: A mobile robotic assistant for the elderly. Architecture. 2002;**2002**:85-91

[52] Chen Y, Argentinis E, Weber G. IBM Watson: How cognitive computing can be applied to big data challenges in life sciences research. Clinical Therapeutics. 2016;**38**(4):688-701

[53] Devarakonda M, Tsou C-H. Automated problem list generation from electronic medical records in IBM Watson. In: 29th AAAI Conference on Artificial Intelligence (AAAI) 2015 and the 27th Innovative Applications of Artificial Intelligence Conference, IAAI 2015; vol. 5; 2015. pp. 3942-3947


**Chapter 7**

**Phoebe Framework and Experimental Results for Estimating Fetal Age and Weight**

Loc Nguyen, Truong-Duyet Phan and Thu-Hang T. Ho

DOI: 10.5772/intechopen.74883

Additional information is available at the end of the chapter

> © 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Fetal age and weight estimation plays an important role in the treatment of pregnant women. Many estimation formulas have been created by combining statistics and obstetrics. However, such formulas give optimal estimates only when they are applied to the specific community in which they were derived. This research proposes the Phoebe framework, which supports physicians and scientists in finding the most accurate formulas for the community where they do their research. The built-in algorithm of the Phoebe framework uses statistical regression to estimate fetal age and weight from fetal ultrasound measures such as bi-parietal diameter, head circumference, abdominal circumference, fetal length, arm volume and thigh volume. This algorithm is based on heuristic assumptions that aim to produce good estimation formulas as fast as possible. Experimental results show that the framework produces optimal formulas with high adequacy and accuracy. Moreover, the framework gives physicians and scientists facilities for exploiting useful statistical information in pregnancy data. Phoebe framework is a computer software available at http://phoebe.locnguyen.net.

Keywords: fetal age estimation, fetal weight estimation, ultrasound measures, regression model, estimation formula

#### 1. Introduction

Fetal age and weight estimation predicts the birth age or birth weight before delivery. It is very important for doctors to diagnose abnormal or diseased cases so that they can decide on appropriate treatment. Because this research covers both age estimation and weight estimation, for convenience the term "birth estimation" refers to both. There are two methods for birth estimation:

Because the second method reflects features of the population from statistical data, the regression model is chosen for birth estimation in this research. Note that terminologies such as function, regression function, estimation function, regression model, estimation model, formula, regression formula and estimation formula have the same meaning here.

There are many estimation formulas resulting from gestational research, such as [1–9]. Some of them gain high accuracy, but they are only appropriate to the population, community or ethnic group where such research was done. If we apply these formulas to another community, such as Vietnam, they are no longer accurate. Moreover, it is difficult to find a new and effective estimation formula, and the cost in time and computing resources of formula discovery is high. Therefore, the first goal of this research is to propose an effective built-in algorithm that produces highly accurate formulas which are easy to tune to a specified population, and the process of producing formulas by this algorithm is as fast as possible. In addition, physicians and researchers always want to discover useful statistical information from measure samples and regression models. Thus, the second goal of this research is to give facilities to physicians and researchers through a framework called the Phoebe framework or Phoebe system. The Phoebe framework implements the built-in algorithm of the first goal and provides a tool allowing physicians and researchers to exploit and take advantage of useful information in gestational samples. This tool is programmed as computer software. Moreover, the Phoebe framework allows software developers to modify its modules; for example, developers can improve the built-in algorithm by adding heuristic constraints.

This chapter is the improved collection of our two articles "A framework of fetal age and weight estimation" [10] and "Experimental Results of Phoebe Framework: Optimal Formulas for Estimating Fetus Weight and Age" [11]. Section 2 gives an overview of the architecture of the Phoebe framework. Section 3 describes the built-in algorithm that produces optimal formulas appropriate to a concrete population like Vietnam; this algorithm is the core of the Phoebe framework. Section 4 discusses the main use cases of the framework with respect to gestational samples. As experimental results, some interesting estimation formulas produced by the framework are described in Section 5. A proposal for early weight estimation is made in Section 6. The conclusion is given in Section 7. Note that the Phoebe framework uses the statistical software package "Java Scientific Library" of Michael Thomas Flanagan [12] and the parsing package "A Java expression parser" of Jos de Jong [13]. The package "Java Scientific Library" is the most important one in the framework. The framework is implemented in the Java language [14].

#### 2. General architecture of Phoebe framework

Based on clinical data input, which includes fetal ultrasound measures such as bpd, hc, ac and fl, the framework produces optimal formulas for estimating fetal weight and fetal age with the highest precision. Moreover, statistical information about the fetus and gestation is also described in detail in two forms: numerical format and graph format. Therefore, the framework consists of four components as follows:

• Dataset component is responsible for managing information about fetal ultrasound measures such as bpd, hc, ac, fl and extra gestational information in a reasonable and intelligent manner. This component allows other components to retrieve such information. Gestational information is organised into an abstract structure, for example, a matrix where each row represents a sample of bpd, hc, ac, fl measures. Table 1 is an example of this abstract structure.

• Regression component represents the estimation formula or regression function. This component reads ultrasound information from the Dataset component and builds up the optimal estimation formula from such information. The built-in algorithm used to discover and construct the estimation formula is discussed in Section 3. This component is the most important one because it implements the discovery algorithm.

• Statistical Manifest component describes statistical information of both ultrasound measures and the regression function, for example, mean and standard deviation of bpd samples, sum of residuals, correlation coefficient of the regression function, and percentile graph of …

| hc | fl | ac | Fetal age (week) | Fetal weight (gram) |
|----|----|----|------------------|---------------------|
| 262 | 51 | 255 | 28 | 900 |
| 260 | 51 | 232 | 28 | 900 |
| 260 | 50 | 229 | 28 | 900 |
| 275 | 52 | 240 | 28 | 900 |
| 274 | 52 | 240 | 28 | 950 |
| 253 | 50 | 235 | 28 | 950 |
| 257 | 52 | 239 | 28 | 950 |
| 255 | 53 | 236 | 28 | 950 |
| 264 | 52 | 246 | 28 | 950 |

Table 1. An example of gestational sample matrix.
This chapter is the improved collection of our two articles "A framework of fetal age and weight estimation" [10] and "Experimental Results of Phoebe Framework: Optimal Formulas for Estimating Fetus Weight and Age" [11]. Section 2 gives an overview of the architecture of Phoebe framework. Section 3 is a description of the built-in algorithm to produce optimal formulas which are appropriated to a concrete population like Vietnam. Such algorithm is the core of Phoebe framework. Section 4 discusses main use cases of the framework with respect to gestational sample. As experimental results, some interesting estimation formulas produced by the framework are described in Section 5. A proposal of early weight estimation is proposed in Section 6. Conclusion is given in Section 7. Note that Phoebe framework used statistic software package "Java Scientific Library" of Michael Thomas Flanagan [12] and parsing package "A Java expression parser" of Jos de Jong [13]. The package "Java Scientific Library" is the most important one in the framework. The framework is implemented by Java language [14].

#### 2. General architecture of Phoebe framework

estimation, for convenience, the term "birth estimation" implicates both of them. There are two

• Determining volume of fetal inside mother womb and then calculating fetal weight based on such volume and mass density of flesh and bone. By the other way, fetal age and

• Applying statistical regression model: Fetal ultrasound measures such as bi-parietal diameter (bpd), head circumference (hc), abdominal circumference (ac), fetal length (fl), arm volume (arm\_vol), and thigh volume (thigh\_vol) are recorded and considered as input sample for regression analysis which results in a regression function. This function is formula for estimating fetal age and weight according to ultrasound measures such as bpd, hc, ac, fl, arm\_vol, and thigh\_vol. Data that are composed of these ultrasound measures are called gestational sample or pregnant sample. Terms: "sample" and "data" have the same meaning in this

Because the second method reflects features of population from statistical data, the regression model is chosen for birth estimation in this research. Note, some terminologies such as function, regression function, estimation function, regression model, estimation model, formula, regression for-

There are many estimation formulas resulted from gestational researches such as [1–9]. Some of them gain high accuracy, but they are only appropriate to population, community or ethnic group, where such researches are done. If we apply these formulas into other community such as Vietnam, they are no longer accurate. Moreover, it is difficult to find out a new and effective estimation formula or the cost of time and (computer) resources of formula discovery is expensive. Therefore, the first goal of this research is to propose an effective built-in algorithm, which produces highly accurate formulas that are easy to tune with specified population. The process of producing formulas by such algorithm is as fast as possible. In addition, physicians and researchers always want to discover useful statistical information from measure sample and regression model. Thus, the second goal of this research is to give facilities to physicians and researchers by introducing them a framework that is called Phoebe framework or Phoebe system. Phoebe framework implements such built-in algorithm in the first goal and provides a tool allowing physicians and researchers to exploit and take advantage of useful information under gestational sample. This tool is programmed as computer software. Moreover, Phoebe framework allows software developers to modify its modules. For example, developers can

This chapter is the improved collection of our two articles "A framework of fetal age and weight estimation" [10] and "Experimental Results of Phoebe Framework: Optimal Formulas for Estimating Fetus Weight and Age" [11]. Section 2 gives an overview of the architecture of Phoebe framework. Section 3 is a description of the built-in algorithm to produce optimal formulas which are appropriated to a concrete population like Vietnam. Such algorithm is the core of Phoebe framework. Section 4 discusses main use cases of the framework with respect to gestational sample. As experimental results, some interesting estimation formulas produced by the framework are described in Section 5. A proposal of early weight estimation is proposed in Section 6. Conclusion is given in Section 7. Note that Phoebe framework used statistic software

research. Sample is representation of population where research takes place.

weight can be estimated according to size of mother womb.

mula, and estimation formula have the same meaning.

improve the built-in algorithm by adding heuristic constraints.

methods for birth estimation:

100 eHealth - Making Health Care Smarter


A regression function will be good if it meets two conditions as follows:

• The correlation between Y-estimated and Y-real is large.

• The sum of residuals is small. Note that the residual is defined as the square of the deviation between Y-estimated and Y-real:

$$residual = \left(Y_{estimated} - Y_{real}\right)^{2}$$

These two conditions are called the pair of optimal conditions. A regression function is optimal (best) if it satisfies the pair of optimal conditions at most, where the correlation between Y-estimated and Y-real is largest and the sum of residuals is smallest. Given a set of regression variables Xi (where i = 1, 2, …, n), we recognize that a regression function is a combination of k variables Xi (k ≤ n) such that the combination achieves the pair of optimal conditions. Given a set of possible regression variables VAR = {X1, X2, …, Xn} being ultrasound measures, a brute-force algorithm can be used to find the optimal function, which includes the three following steps:

1. Let the indicator number k be initialized to 1, which corresponds to k-combinations having k regression variables.

2. All combinations of n variables taken k at a time are created. For each k-combination, the function built up from the k variables in this k-combination is evaluated on the pair of optimal conditions; if such a function satisfies these conditions at most, then it is the optimal function.

3. The indicator k is increased by 1. If k = n then the algorithm stops; otherwise, go back to step 2.

The number of combinations which the brute-force algorithm browses is:

$$\sum_{k=1}^{n} \frac{n!}{k!(n-k)!}$$

where n is the number of regression variables and the notation "k!" denotes the factorial of k. If n is large enough, there is a huge number of combinations, which causes the brute-force algorithm to run for an impractically long time, making it infeasible to find the best function. Moreover, there are many kinds of regression function, such as linear, quadric, cubic, logarithm, exponent, and product. Therefore, we propose an algorithm which overcomes this drawback and always finds the optimal function. In other words, the termination of the proposed algorithm is guaranteed, and the time cost is decreased significantly because the search space is reduced as much as possible. The proposed algorithm is called the seed germination (SG) algorithm. SG is the built-in algorithm of Phoebe framework and the core of the framework. It is a heuristic algorithm based on the pair of heuristic assumptions as follows:

• First assumption: regression variables Xi (s) tend to be mutually independent. It means that any pair of Xi and Xj with i ≠ j in an optimal function are mutually independent. The independence is reduced to the looser condition "the correlation coefficient of any pair of Xi and Xj is less than a threshold δ." This is the minimum assumption.

• Second assumption: each variable Xi contributes to the quality of the optimal function. The contribution rate of a variable Xi is defined as the correlation coefficient between such variable and Y-real. The higher the contribution rate is, the more important the respective variable is. Variables with a high contribution rate are called contributive variables. Therefore, an optimal function includes only contributive regression variables. The second assumption is stated as "the correlation coefficient of any regression variable Xi and the real response value Y-real is greater than a threshold ε." This is the maximum assumption.

Phoebe Framework and Experimental Results for Estimating Fetal Age and Weight

http://dx.doi.org/10.5772/intechopen.74883
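As a quick check of the combinatorial growth described above, this small sketch (illustrative only, not Phoebe code) counts the variable subsets the brute-force search would browse; the total equals 2^n − 1:

```java
// Counts the subsets visited by the brute-force search:
// sum over k = 1..n of C(n, k), which equals 2^n - 1.
public class BruteForceCount {
    // Binomial coefficient C(n, k), exact at every step of the loop.
    static long choose(int n, int k) {
        long c = 1;
        for (int i = 1; i <= k; i++) c = c * (n - i + 1) / i;
        return c;
    }

    static long combinations(int n) {
        long total = 0;
        for (int k = 1; k <= n; k++) total += choose(n, k);
        return total;
    }

    public static void main(String[] args) {
        for (int n : new int[] {4, 10, 20}) {
            System.out.println(n + " variables -> " + combinations(n) + " combinations");
        }
    }
}
```

Even with only 20 candidate measures the search space already exceeds a million combinations, which motivates the reduced search space of the SG algorithm.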

• User Interface (UI) component is responsible for providing interaction between the system and users such as physicians and researchers. A popular use case is that users enter ultrasound measures and ask the system to print out both the optimal estimation formula and statistical information about those ultrasound measures; moreover, users can retrieve other information in the Dataset component. The UI component links to all other components so as to give users as many facilities as possible.

Three components: Dataset, Regression and Statistical Manifest are basic components. The fourth component User Interface is the bridge among them. Figure 1 shows a general architecture of Phoebe framework.

#### 3. Built-in algorithm of Phoebe framework

Phoebe framework uses a regression model for estimating fetal weight and age. Suppose a linear regression function Y = α0 + α1X1 + α2X2 + … + αnXn, where Y is fetal weight or age, whereas the Xi (s) are gestational ultrasound measures such as bpd, hc, ac, and fl. Variable Y is called the response variable or dependent variable. Each Xi is called a regression variable, regressor, or independent variable. Each αi is called a regression coefficient. Given a set of measured values of the Xi (s), the value of Y calculated from this regression function, called Y-estimated, is the estimated fetal weight (or age), which is compared with the real value of Y measured by the ultrasonic machine. The real value of Y, called Y-real, is the fetal weight (or age) available in the sample. In this research, the notation Y refers implicitly to Y-estimated if there is no explanation. The deviation between Y-estimated and Y-real is a criterion used to assess the quality, or precision, of the regression function; this deviation is also called the estimation error. The smaller the deviation is, the better the regression function is. The goal of this research is to find the optimal regression function or estimation formula whose precision is highest.
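To make the regression notation concrete, here is a minimal sketch (illustrative only: a single regressor and synthetic numbers rather than real ultrasound data) of fitting Y = α0 + α1X by least squares and summing the squared residuals between Y-estimated and Y-real:

```java
// Illustrative one-variable least-squares fit; not Phoebe code.
public class SimpleRegression {
    // Returns {a0, a1} minimizing the sum of squared residuals
    // for the model y = a0 + a1 * x.
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx += x[i]; sy += y[i];
            sxx += x[i] * x[i]; sxy += x[i] * y[i];
        }
        double a1 = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        double a0 = (sy - a1 * sx) / n;
        return new double[] {a0, a1};
    }

    // Sum of squared deviations between Y-estimated and Y-real.
    static double residualSum(double[] x, double[] y, double[] coef) {
        double s = 0;
        for (int i = 0; i < x.length; i++) {
            double e = coef[0] + coef[1] * x[i] - y[i];
            s += e * e;
        }
        return s;
    }
}
```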

Figure 1. General architecture of Phoebe framework.





SG algorithm tries to find a combination of regression variables Xi (s) such that the combination satisfies the pair of heuristic assumptions. In other words, it is expected that this combination constitutes an optimal regression function that satisfies the pair of heuristic conditions, as follows ([10] p. 22):

• The correlation coefficient of any pair of Xi and Xj is less than the minimum threshold δ > 0. This condition corresponds to the minimum assumption and is called the minimum condition or independence condition.

• The correlation coefficient of any Xi and Y-real is greater than the maximum threshold ε > 0. This condition corresponds to the maximum assumption and is called the maximum condition or contribution condition.

Given a set of possible regression variables VAR = {X1, X2, …, Xn} being ultrasound measures, let f = α0 + α1X1 + α2X2 + … + αkXk (k ≤ n) be the estimation function and let Re(f) = {X1, X2, …, Xk} be its regression variables. Note that the value of f is fetal age or fetal weight. Re(f) is considered the representation of f. Let OPTIMAL be the output of SG algorithm, a set of optimal functions, initialized as the empty set. Let Re(OPTIMAL) be the set of regression variables contained in all optimal functions f ∈ OPTIMAL. SG algorithm has the four following steps ([10] p. 22):

1. Let C be the complement set of VAR with regard to OPTIMAL; we have C = VAR\Re(OPTIMAL), where the backslash "\" denotes the complement operator in set theory. It means that C is in VAR but not in Re(OPTIMAL).

2. Let G ⊂ C be a list of regression variables satisfying the pair of heuristic conditions. Note that G is a subset of C. If G is empty, the algorithm terminates; otherwise, go to step 3.

3. We iterate over G in order to find the candidate list of good functions. For each regression variable X ∈ G, let L be the union of the optimal regression variables and X: L = Re(f) ∪ {X} where f ∈ OPTIMAL. Suppose CANDIDATE is a candidate list of good functions, initialized as the empty set. Let g be the new function created from L; in other words, the regression variables of g belong to L, Re(g) = L. If function g meets the pair of heuristic conditions, it is added into CANDIDATE: CANDIDATE = CANDIDATE ∪ {g}.

4. Let BEST be a set of best functions taken from CANDIDATE. In other words, these functions belong to CANDIDATE and satisfy the pair of heuristic conditions at most, where the correlation is the largest and the sum of residuals is the smallest. If BEST equals OPTIMAL, the algorithm stops; otherwise, assign BEST to OPTIMAL and go back to step 1. Note that two sets are equal if their elements are the same.

Figure 2 shows the flow chart of SG algorithm.

Figure 2. Flow chart of SG algorithm.

SG algorithm was described in the article "A framework of fetal age and weight estimation" ([10] pp. 21–23). The essence of SG algorithm is to reduce the search space by choosing regression variables satisfying the heuristic assumptions as "seeds"; optimal functions are composed of these seeds. The algorithm always delivers the best functions but can miss other good ones. The length of a function is defined as the number of its regression variables. The termination condition is that no more optimal functions can be found or the possible variables have been browsed exhaustively. Therefore, the resulting function is the longest and best one, but some other shorter functions may still be significantly good.
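A greatly simplified sketch of the seed idea follows. This is my own illustration, not the authors' implementation: it greedily grows a single variable set under the two thresholds, whereas the real SG algorithm maintains a set of candidate functions and compares their correlations and residual sums:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration of seed selection under the two heuristic
// conditions: |corr(Xi, Y)| > epsilon and |corr(Xi, Xj)| < delta.
public class SeedGermination {
    // Pearson correlation coefficient of two equal-length samples.
    static double corr(double[] a, double[] b) {
        int n = a.length;
        double ma = 0, mb = 0;
        for (int i = 0; i < n; i++) { ma += a[i]; mb += b[i]; }
        ma /= n; mb /= n;
        double num = 0, da = 0, db = 0;
        for (int i = 0; i < n; i++) {
            num += (a[i] - ma) * (b[i] - mb);
            da += (a[i] - ma) * (a[i] - ma);
            db += (b[i] - mb) * (b[i] - mb);
        }
        return num / Math.sqrt(da * db);
    }

    // X[j] is the sample of variable j; y is Y-real. Returns indices of
    // chosen "seed" variables, grown greedily one at a time.
    static List<Integer> select(double[][] X, double[] y, double delta, double eps) {
        List<Integer> chosen = new ArrayList<>();
        boolean grown = true;
        while (grown) {
            grown = false;
            int best = -1;
            double bestCorr = eps;              // contribution condition
            for (int j = 0; j < X.length; j++) {
                if (chosen.contains(j)) continue;
                boolean independent = true;     // independence condition
                for (int k : chosen) {
                    if (Math.abs(corr(X[j], X[k])) >= delta) { independent = false; break; }
                }
                double c = Math.abs(corr(X[j], y));
                if (independent && c > bestCorr) { bestCorr = c; best = j; }
            }
            if (best >= 0) { chosen.add(best); grown = true; }
        }
        return chosen;
    }
}
```

Once the seeds are chosen, a regression function would be fitted over them and scored by the pair of optimal conditions, which is the part this sketch omits.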

The current implementation of SG algorithm lets the minimum threshold δ be set arbitrarily. It also supports the nonlinear regression models shown in Table 2.



Phoebe framework has three basic use cases, realized by the three components dataset, regression model, and statistical manifest as discussed in Section 2. The three basic use cases are:

1. Discovering optimal formulas with high accuracy. Optimal formulas are results of SG algorithm described in Section 3.

2. Providing statistical information from the gestational sample. Statistical information is in numeric format and graph format.

3. Comparison among different formulas.

The notations "exp" and "log" denote the exponential function and the natural logarithm, respectively. Most nonlinear regression models can be transformed into linear regression models. For example, given the product model, the following is an example of linear transformation.

$$\log(Y) = \log(\alpha\_0) + \alpha\_1 \log(X\_1) + \alpha\_2 \log(X\_2) + \dots + \alpha\_n \log(X\_n)$$

Let

$$U = \log(Y), \quad Z_i = \log(X_i), \quad \beta_0 = \log(\alpha_0), \quad \beta_{i \ge 1} = \alpha_i$$

The product model becomes the linear model with regard to variables U, Zi and coefficients βi as follows:

$$U = \beta_0 + \beta_1 Z_1 + \beta_2 Z_2 + \dots + \beta_n Z_n$$

Table 3 shows how to transform nonlinear models into linear models.

With the built-in SG algorithm, Phoebe framework can be used for any regression application beyond birth estimation.
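The product-model transformation above can be checked numerically; this sketch (with illustrative constants of my own choosing) confirms that log(Y) equals the linear form in the log(Xi):

```java
// Numerical check of the product-model linearization:
// Y = a0 * X1^a1 * X2^a2  =>  log(Y) = log(a0) + a1*log(X1) + a2*log(X2).
public class ProductTransform {
    public static void main(String[] args) {
        double a0 = 0.5, a1 = 2.0, a2 = 0.3;
        double x1 = 3.0, x2 = 7.0;

        double y = a0 * Math.pow(x1, a1) * Math.pow(x2, a2);

        // U = b0 + b1*Z1 + b2*Z2 with U = log(Y), Zi = log(Xi), b0 = log(a0)
        double u = Math.log(a0) + a1 * Math.log(x1) + a2 * Math.log(x2);

        System.out.println(Math.log(y) + " == " + u);
    }
}
```

This is why a linear least-squares solver is enough to fit the product model: it is fitted in log space and the coefficients are mapped back.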

#### 4. Use cases of Phoebe framework

| Model | Form |
| --- | --- |
| Logarithm | Y = α0 + α1log(X1) + α2log(X2) + … + αnlog(Xn) |
| Logarithm | Y = α0 + α1log(X1 + X2 + … + Xn) |
| Exponent | Y = exp(α0 + α1X1 + α2X2 + … + αnXn) |
| Exponent | Y = exp(α0 + α1(X1 + X2 + … + Xn)) |
| Polynomial | Y = α0 + α1(X1 + X2 + … + Xn)^k |
| Product | Y = α0X1^α1 X2^α2 … Xn^αn |

Table 2. Nonlinear regression models.

| Transformation | Original model | Linear form |
| --- | --- | --- |
| Logarithm transformation | Y = α0 + α1log(X1) + α2log(X2) + … + αnlog(Xn) | Y = α0 + α1Z1 + α2Z2 + … + αnZn, where Zi = log(Xi) |
| Logarithm transformation | Y = α0 + α1log(X1 + X2 + … + Xn) | Y = α0 + α1Z1, where Z1 = log(X1 + X2 + … + Xn) |
| Exponent transformation | Y = exp(α0 + α1X1 + α2X2 + … + αnXn) | U = α0 + α1X1 + α2X2 + … + αnXn, where U = log(Y) |
| Exponent transformation | Y = exp(α0 + α1(X1 + X2 + … + Xn)) | U = α0 + α1Z1, where U = log(Y) and Z1 = X1 + X2 + … + Xn |
| Polynomial transformation | Y = α0 + α1(X1 + X2 + … + Xn)^k | Y = α0 + α1Z1, where Z1 = (X1 + X2 + … + Xn)^k |
| Product transformation | Y = α0X1^α1 X2^α2 … Xn^αn | U = β0 + β1Z1 + β2Z2 + … + βnZn, where U = log(Y), Zi = log(Xi), β0 = log(α0), βi≥1 = αi |

Table 3. Transformation of nonlinear models into linear models.




Figure 3. Gestational sample.

#### Use case 1: Discovering optimal formulas

The gestational data [15] are composed of two-dimensional ultrasound measures of pregnant women. These measures were taken at Vinh Long General Hospital, Vietnam, and include bi-parietal diameter (bpd), head circumference (hc), abdominal circumference (ac), and fetal length (fl). Fetal age ranges from 28 to 42 weeks. Fetal weight is measured in grams. The gestational sample is shown in Figure 3.

Statistical information is classified into two groups: gestational information and estimation information.

• Gestational information contains statistical attributes about the fetal ultrasound measures, for example, mean, median, and standard deviation of bpd.

• Estimation information contains attributes about the estimation model, for example, the correlation coefficient, sum of residuals, and estimation error of the estimation formula.

After specifying the maximum threshold ε (fitness value) and which measures are regression variables and response variable, user presses button "Estimate" to retrieve optimal formulas as results of SG algorithm. Such optimal formulas are shown in Figure 4. Note, in Figure 4, regression variables are bpd, hc, ac, and fl, whereas response variable is fetal weight. The threshold ε is 0.6.


Figure 4. Optimal weight estimation formulas.

An estimation formula with one or two regressors (ultrasound measures) can be represented as a graph. In the illustrative Figure 5, the horizontal axis indicates the measure bpd in millimeter, and the right vertical axis indicates the measure ac in millimeter. The left vertical axis shows the estimated weight.

The graph in Figure 5 has 11 estimation lines, shown as internal (red) lines. Each estimation line corresponds to a small interval of ac. Fetal weight on each estimation line ranges from 900 to 4800 g. This is a way to show a three-dimensional function as a two-dimensional graph. For example, given bpd = 90 and ac = 300, we need to estimate fetal weight. Because ac is 300 mm, we look at the sixth estimation line from bottom to top. The intersection point between bpd = 90 and the sixth estimation line is projected onto the left vertical axis, which yields a fetal weight of approximately (4800 − 900)/2 + 900 ≈ 2850 g, because the intersection point is near the midpoint of the weight range on the sixth estimation line.
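The projection described above is simply a linear interpolation along one estimation line; the sketch below (a hypothetical helper, not Phoebe code) maps the relative position of the intersection point to a weight:

```java
// Linear interpolation along one estimation line of the weight graph.
public class GraphReading {
    // t in [0, 1]: relative position of the intersection point along the
    // line; wMin and wMax bound the weights on that line.
    static double weightAt(double t, double wMin, double wMax) {
        return wMin + t * (wMax - wMin);
    }

    public static void main(String[] args) {
        // Midpoint of the 900-4800 g range, as in the example above.
        System.out.println(GraphReading.weightAt(0.5, 900, 4800) + " g");
    }
}
```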

#### Use case 2: Providing statistical information

> Statistical information is classified into two groups: gestational information and estimation information.


Figure 5. Estimation graph for estimating fetal weight.

In representation, statistical information is described in two forms: numeric format and graph format. Figure 6 shows statistical attributes (mean, median, standard deviation, histogram, etc.) of fetal age and ultrasound measures bpd, hc, ac, fl.
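Reading a weight off the estimation graph of Figure 5 amounts to linear interpolation along one estimation line. The sketch below illustrates this; the bpd span of a line is not given in the text, so the endpoint values used here are hypothetical, while the 900–4800 g weight range comes from the description above.

```python
# Sketch of reading a fetal weight off one estimation line of the graph
# in Figure 5. The bpd endpoints (bpd_min, bpd_max) of a line are
# hypothetical; the weight range 900-4800 g is taken from the text.

def weight_on_line(bpd, bpd_min, bpd_max, w_min=900.0, w_max=4800.0):
    """Linearly interpolate fetal weight (grams) along one estimation line."""
    t = (bpd - bpd_min) / (bpd_max - bpd_min)  # relative position on the line
    return w_min + t * (w_max - w_min)

# If bpd = 90 sits at the midpoint of the line's bpd span, the reading is
# (4800 - 900)/2 + 900 = 2850 g, matching the worked example in the text.
print(weight_on_line(90, 60, 120))  # -> 2850.0
```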

Figure 7 shows a full description of a weight estimation formula: weight = 0.000043 \* (bpd^1.948640) \* (hc^0.263745) \* (fl^0.601972) \* (ac^0.905524). For instance, its sum of residuals (SS) is 46412446.0047 and its estimation error is 7.4655 ± 212.5571. Note that the sign "^" denotes the exponent function, for example, 2^3 = 8.

Figure 6. Gestational statistical information.

Figure 7. Statistical estimation information.

Use case 3: Comparison among different formulas

There are many criteria to evaluate the efficiency and accuracy of estimation formulas. These criteria are called evaluation criteria, for example, correlation coefficient, sum of residuals and estimation error. Each formula has individual strong points and drawbacks: a formula may be better than another in terms of some criteria but worse in terms of others. An optimal formula is one that has more strong points than drawbacks in most criteria. Hence, the Phoebe framework supports the comparison among different formulas via the evaluation matrix represented in Figure 8. Each row in the evaluation matrix represents a formula, whereas each column indicates a criterion. For example, the first, second and third rows represent three formulas in the form of a logarithm function, an exponent function and a linear function, respectively. Four criteria, namely multivariate correlation, estimation correlation, error range and ratio error range, are arranged in four respective columns.

Figure 8. Comparison among different formulas.

Phoebe Framework and Experimental Results for Estimating Fetal Age and Weight. http://dx.doi.org/10.5772/intechopen.74883

Tables 4–8 in the section "Experimental results" are numeric interpretations of the evaluation matrix in Figure 8.

#### 5. Experimental results

We make experiments based on the Phoebe framework in order to find optimal formulas for estimating fetal weight and age, with the note that such formulas are most appropriate to our gestational samples. We use two samples: the first includes two-dimensional (2D) ultrasound measures of 1027 cases, and the second includes three-dimensional (3D) ultrasound measures of 506 cases. Ho and Phan [15, 16] collected these samples of pregnant women at Vinh Long General Hospital, Vietnam, strictly obeying all medical ethical criteria. These women and their husbands are Vietnamese. Their periods are regular, and their last periods are determined. Each of them has only one alive fetus. Fetal age is from 28 to 42 weeks. Delivery time is not over 48 h after the ultrasound scan. Measures in the 2D sample are bpd, hc, ac, and fl; measures in the 3D sample are bpd, hc, ac, fl, thigh\_vol and arm\_vol. The unit of bpd, hc, ac and fl is the millimeter, and the unit of thigh\_vol and arm\_vol is cm³. The units of fetal age and fetal weight are the week and the gram, respectively. The experimental results mentioned in this section were also published in our article "Experimental Results of Phoebe Framework: Optimal Formulas for Estimating Fetus Weight and Age" [11].

The proposed framework can produce remarkable formulas. We compare our optimal formulas with the others according to metrics such as estimation correlation and estimation error range, given the two gestational samples. Let Y = {y1, y2,…, yn} and Z = {z1, z2,…, zn} be the sample fetal ages/weights and the estimated fetal ages/weights, respectively. The estimation correlation, denoted R, is the correlation coefficient of the sample response values and the estimated response values, according to Eq. (1). The correlation R reflects the adequacy of a given formula: the larger R is, the better the formula is:

$$R = \frac{\sum\_{i=1}^{n} (y\_i - \bar{y})(z\_i - \bar{z})}{\sqrt{\sum\_{i=1}^{n} (y\_i - \bar{y})^2} \sqrt{\sum\_{i=1}^{n} (z\_i - \bar{z})^2}} \tag{1}$$

$$\bar{y} = \frac{1}{n} \sum\_{i=1}^{n} y\_i$$

$$\bar{z} = \frac{1}{n} \sum\_{i=1}^{n} z\_i$$

An estimation error, denoted di, is the deviation between zi and yi. The estimation error mean, denoted μ, reflects the accuracy of a given formula: the smaller the absolute value of μ is, the more accurate the formula is. If μ is positive, the formula leans toward overestimation; if μ is negative, it leans toward underestimation. The standard deviation σ of the estimation errors reflects the stability of a given formula: the smaller σ is, the more stable the formula is. The combination of the error mean μ and the standard deviation σ yields a so-called error range. Eq. (2) explains how to calculate μ, σ, and the error range.

$$\begin{aligned} d\_i &= z\_i - y\_i \\ \mu &= \frac{1}{n} \sum\_{i=1}^n d\_i \\ \sigma &= \sqrt{\frac{1}{n-1} \sum\_{i=1}^n \left( d\_i - \mu \right)^2} \\ \text{error\\_range} &= \left[ \mu - \sigma, \mu + \sigma \right] = \mu \pm \sigma \end{aligned} \tag{2}$$

For example, if μ = −0.0292 and σ = 1.45, then the error range is −0.0292 ± 1.45, which means that the total average error ranges from −1.4792 = −0.0292 − 1.45 to 1.4208 = −0.0292 + 1.45. The error range reflects both the adequacy and the accuracy of a given formula.
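The two metrics can be computed directly from Eqs. (1) and (2). The sketch below does so with plain Python; the sample values y and estimates z are made up purely for illustration.

```python
import math

# Evaluation metrics of Eqs. (1) and (2): estimation correlation R,
# error mean mu, error standard deviation sigma, and the error range
# mu +/- sigma. The y (sample weights) and z (estimates) below are
# hypothetical illustration data, not the chapter's samples.

def evaluate(y, z):
    n = len(y)
    y_bar = sum(y) / n
    z_bar = sum(z) / n
    # Eq. (1): correlation of sample values and estimated values.
    num = sum((yi - y_bar) * (zi - z_bar) for yi, zi in zip(y, z))
    den = math.sqrt(sum((yi - y_bar) ** 2 for yi in y)) * \
          math.sqrt(sum((zi - z_bar) ** 2 for zi in z))
    r = num / den
    # Eq. (2): errors d_i = z_i - y_i, their mean mu and deviation sigma.
    d = [zi - yi for yi, zi in zip(y, z)]
    mu = sum(d) / n
    sigma = math.sqrt(sum((di - mu) ** 2 for di in d) / (n - 1))
    return r, mu, (mu - sigma, mu + sigma)  # error range = mu +/- sigma

r, mu, rng = evaluate([3000, 3200, 2800], [2950, 3300, 2750])
```

A formula with R close to 1 and a tight error range around zero is both adequate and accurate in the sense defined above.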


Table 4 shows a comparison between our best age formula and the others with the 2D sample. As a convention, the name of each formula is the name of the respective author listed in the references section; for example, formula "Ho 1" is the first formula of the author Ho [4]. As seen in Table 4, our formula is the best, with R = 0.9303 and error range −0.0292 ± 1.4500 week(s). As a convention, our formulas have names with the prefix "NH".

| Formula | Expression | R | Error range |
|---|---|---|---|
| NH 1 | log(age) = 2.419638 + 0.002012 \* bpd + 0.000934 \* hc + 0.00547 \* fl + 0.001042 \* ac | 0.9303 | −0.0292 ± 1.4500 |
| NH 2 | age = −3.364759 + 0.056285 \* bpd + 0.034697 \* hc + 0.188156 \* fl + 0.035304 \* ac | 0.9285 | 0 ± 1.4682 |
| Ho 1 | age = 331.022308 − 1.611774 \* (hc + ac) + 0.00278 \* ((hc + ac)^2) − 0.000002 \* ((hc + ac)^3) | 0.9212 | 0 ± 1.5384 |
| Varol 6 | age = 11.769 + 1.275 \* fl/10 + 0.449 \* ((fl/10)^2) − 0.02 \* ((fl/10)^3) | 0.8949 | −1.6807 ± 1.8525 |
| Varol 1 | age = 5.596 + 0.941 \* ac/10 | 0.8941 | −0.5683 ± 1.7711 |
| Varol 5 | age = 1.863 + 6.280 \* fl/10 − 0.211 \* ((fl/10)^2) | 0.8934 | −1.5182 ± 2.1150 |

Table 4. Comparison of age estimation with 2D sample.

The sign "^" denotes the exponent operator. The formula templates aim at flexibility, so that they can be input to any computational tool. Table 5 shows a comparison between our best weight formula and the others with the 2D sample. As seen in Table 5, our formula is the best, with R = 0.9636 and error range −7.4656 ± 212.5573 g.

| Formula | Expression | R | Error range |
|---|---|---|---|
| NH 3 | log(weight) = −10.047381 + 1.94864 \* log(bpd) + 0.263745 \* log(hc) + 0.601972 \* log(fl) + 0.905524 \* log(ac) | 0.9636 | −7.4656 ± 212.5573 |
| NH 4 | log(weight) = 3.957543 + 0.02373 \* bpd + 0.000802 \* hc + 0.009403 \* fl + 0.003157 \* ac | 0.9635 | 6.0901 ± 214.1153 |
| Sherpard | weight = 10^(1.2508 + 0.166 \* bpd/10 + 0.046 \* ac/10 − 0.002646 \* ac \* bpd/100) | 0.9619 | 65.8121 ± 219.0392 |
| Ho 2 | weight = 10^(1.746 + 0.0124 \* bpd + 0.001906 \* ac) | 0.9602 | 11.5576 ± 223.5124 |
| Hadlock | weight = 10^(1.304 + 0.05281 \* ac/10 + 0.1938 \* fl/10 − 0.004 \* ac \* fl/100) | 0.9395 | 76.4960 ± 272.9474 |
| Campbell and Wilkin | weight = 1000 \* exp(−4.564 + 0.282 \* ac/10 − 0.00331 \* ac \* ac/100) | 0.9215 | 68.1261 ± 308.5728 |

Table 5. Comparison of weight estimation with 2D sample.

Table 6 shows a comparison between our best age formula and the others with the 3D sample. As seen in Table 6, our formula is the best, with R = 0.9970 and error range 0 ± 0.2696 week(s).

| Formula | Expression | R | Error range |
|---|---|---|---|
| NH 5 | age = 20.759763 + 0.170859 \* (thigh\_vol + arm\_vol) − 0.000545 \* ((thigh\_vol + arm\_vol)^2) + 0.000001 \* ((thigh\_vol + arm\_vol)^3) | 0.9970 | 0 ± 0.2696 |
| NH 6 | age = 21.816252 + 0.137531 \* (thigh\_vol + arm\_vol) − 0.000228 \* ((thigh\_vol + arm\_vol)^2) | 0.9969 | 0 ± 0.2752 |
| Ho 3 | age = 21.1148 + 0.2381 \* thigh\_vol − 0.001 \* (thigh\_vol^2) + 0.000002 \* (thigh\_vol^3) | 0.9960 | 0.0150 ± 0.3173 |
| Ho 4 | age = 167.079079 − 1.553705 \* ac + 0.005559 \* (ac^2) − 0.000006 \* (ac^3) | 0.8482 | 0.3723 ± 1.8985 |

Table 6. Comparison of age estimation with 3D sample.

Table 7 shows a comparison between our best weight formula and the others with the 3D sample. As seen in Table 7, our formula is the best, with R = 0.9708 and error range 0.0001 ± 180.9803 g.

| Formula | Expression | R | Error range |
|---|---|---|---|
| NH 7 | weight = −3617.936175 + 0.513171 \* hc + 1.960176 \* ac + 39.804645 \* bpd + 17.016936 \* fl + 8.366404 \* thigh\_vol + 5.828808 \* arm\_vol | 0.9708 | 0.0001 ± 180.9803 |
| NH 8 | weight = −3626.314419 + 43.426744 \* bpd + 23.645338 \* fl + 11.414273 \* thigh\_vol | 0.9698 | 0 ± 184.0439 |
| Ho 5 | weight = −3306 + 55.477 \* bpd + 13.483 \* thigh\_vol | 0.9663 | 0.0072 ± 194.0956 |
| Lee 3 | weight = exp(0.5046 + 1.9665 \* log(bpd/10) − 0.3040 \* (log(bpd/10)^2) + 0.9675 \* log(ac/10) + 0.3557 \* log(arm\_vol)) | 0.9620 | 247.8761 ± 206.1607 |
| Lee 5 | weight = exp(2.1264 + 1.1461 \* log(ac/10) + 0.4314 \* log(thigh\_vol)) | 0.9514 | 289.2660 ± 234.0763 |
| Lee 2 | weight = exp(3.6138 + 4.6761 \* log(ac/10) − 0.4959 \* (log(ac/10)^2) + 0.3795 \* log(arm\_vol)) | 0.9472 | 316.4974 ± 242.7964 |
| Ho 6 | weight = 882.7049 + 73.9955 \* thigh\_vol − 0.497 \* (thigh\_vol^2) + 0.0014 \* (thigh\_vol^3) | 0.9385 | 7.5001 ± 260.4596 |
| Lee 4 | weight = exp(4.7806 + 0.7596 \* log(thigh\_vol)) | 0.9298 | 737.4932 ± 344.1904 |
| Lee 1 | weight = exp(4.9588 + 1.0721 \* log(arm\_vol) − 0.0526 \* (log(arm\_vol)^2)) | 0.9281 | 867.0836 ± 309.5779 |
| Chang | weight = 1080.8735 + 22.44701 \* thigh\_vol | 0.9229 | 456.5168 ± 298.2517 |

Table 7. Comparison of weight estimation with 3D sample.

Within the context of this research, from the section on 3D ultrasound in the PhD dissertation of Ho [4], I recognize that fetal weight and fetal age are mutually dependent; for instance, when fetal age increases, fetal weight increases too. As a result, weight estimation is improved significantly if fetal age is known beforehand. If fetal age is added into the regression model of fetal weight as a regression variable (regressor), the resulting weight estimation formula, called a dual formula, is even better than the most optimal ones shown in Tables 5 and 7. Such a dual formula is not only precise but also practical, because many pregnant women know their gestational age before taking an ultrasound examination. Given the 2D sample and the 3D sample, Table 8 shows the dual formulas in comparison with the most optimal ones shown in Tables 5 and 7 with regard to R and error range. As a convention, our dual formulas have names with the prefix "NHD". The notation "log10" denotes the logarithm function with base 10.

| Formula | Expression | R | Error range |
|---|---|---|---|
| NHD 1 (2D sample) | log10(weight) = −3.715073 + 1.873457 \* log10(bpd) + 0.363783 \* log10(fl) + 0.691683 \* log10(ac) + 0.722245 \* log10(age) | 0.9674 | 5.6422 ± 202.0395 |
| NHD 2 (2D sample) | log10(weight) = −3.761798 + 2.001731 \* log10(bpd) + 0.811078 \* log10(ac) + 0.826279 \* log10(age) | 0.9667 | 5.6111 ± 204.1477 |
| NHD 3 (3D sample) | weight = −4988.000528 + 66.374156 \* age + 0.370084 \* hc + 1.943247 \* ac + 39.464816 \* bpd + 13.215505 \* fl + 3.658463 \* thigh\_vol | 0.9715 | 0 ± 178.8091 |
| NHD 4 (3D sample) | weight = −4982.099978 + 68.089354 \* age + 2.001675 \* ac + 39.85375 \* bpd + 13.229377 \* fl + 3.619405 \* thigh\_vol | 0.9714 | 0 ± 178.9114 |
| NH 3 (2D sample) | log(weight) = −10.047381 + 1.94864 \* log(bpd) + 0.263745 \* log(hc) + 0.601972 \* log(fl) + 0.905524 \* log(ac) | 0.9636 | −7.4656 ± 212.5573 |
| NH 4 (2D sample) | log(weight) = 3.957543 + 0.02373 \* bpd + 0.000802 \* hc + 0.009403 \* fl + 0.003157 \* ac | 0.9635 | 6.0901 ± 214.1153 |
| NH 7 (3D sample) | weight = −3617.936175 + 0.513171 \* hc + 1.960176 \* ac + 39.804645 \* bpd + 17.016936 \* fl + 8.366404 \* thigh\_vol + 5.828808 \* arm\_vol | 0.9708 | 0.0001 ± 180.9803 |
| NH 8 (3D sample) | weight = −3626.314419 + 43.426744 \* bpd + 23.645338 \* fl + 11.414273 \* thigh\_vol | 0.9698 | 0 ± 184.0439 |

Table 8. Weight estimation dual formulas.

In Table 8, all dual formulas NHD \* are better than the normal formulas NH \* with regard to R and error range. Moreover, the NHD \* formulas do not need too many regressors. Given the 2D sample, NHD 1 and NHD 2 use 4 and 3 regressors (including the age regressor), respectively, whereas both NH 3 and NH 4 use 4 regressors. Given the 3D sample, NHD 3 and NHD 4 use 6 and 5 regressors (including the age regressor), respectively, whereas NH 7 and NH 8 use 5 and 3 regressors, respectively.
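Every tabulated formula is directly computable from the raw ultrasound measures. As a small sketch (not part of the chapter's tooling), the NH 4 weight formula from Table 5 can be evaluated as follows; it uses the natural logarithm, and the input measures below are hypothetical values in millimeters.

```python
import math

# Sketch: applying the NH 4 weight formula from Table 5.
# log() here is the natural logarithm, so weight = exp(log_w).
# The measures passed in below are hypothetical, in millimeters.

def nh4_weight(bpd, hc, fl, ac):
    log_w = (3.957543 + 0.02373 * bpd + 0.000802 * hc
             + 0.009403 * fl + 0.003157 * ac)
    return math.exp(log_w)  # estimated fetal weight in grams

w = nh4_weight(bpd=90, hc=320, fl=70, ac=300)
```

For these inputs the estimate falls near 2850 g, a plausible third-trimester weight, which is a quick sanity check that the exponentiated linear template behaves sensibly.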

Although our formulas are better than all the remaining ones, with high adequacy (large R) and high accuracy (small error range), other studies remain significant because their formulas are very simple and practical. Moreover, our formulas are not global: if they are applied to other samples collected in other communities, their accuracy may decrease, and they may no longer be better than traditional formulas such as Sherpard and Hadlock. However, it is easy to conclude from our experimental results that if the Phoebe framework is applied to the same samples as other studies, it will always produce preeminent formulas. In order to achieve global optimality with the Phoebe framework, the following are two essential suggestions:

• Experimenting on the Phoebe framework with many samples.

• Adding more knowledge of pregnancy study, ultrasound technique, and obstetrics into the Phoebe framework. In other words, the additional knowledge will be modeled as constraints of the SG algorithm.


These suggestions go beyond this research. In my opinion, we cannot absolutely reach global optimality because the Phoebe framework focuses on local optimality within specific communities. Essentially, the suggestions only alleviate the weak point of the built-in SG algorithm with respect to global optimality.

#### 6. A proposal of early weight estimation

The ultrasound samples used were collected at fetal ages from 28 to 42 weeks, because delivery time is not over 48 h after the last ultrasound scan. Hence, the accuracy of weight estimation is only ensured when ultrasound examinations are performed after a fetal age of 28 weeks. This section proposes an early weight estimation, in which ultrasound measures can be taken before a fetal age of 28 weeks. We cannot yet promise improved estimation accuracy, because we have not yet experimented with this proposal, but the gestational sample can be collected at any appropriate time points in the gestational period. In other words, the sample can lack fetal weights. This is a convenience for practitioners, because they do not need to be concerned with fetal weights when taking ultrasound examinations. Consequently, early weight estimation is achieved. As a convention, vectors are column vectors if no additional information is given.

Without loss of generality, the regression models are linear, such as Y = α0 + α1X1 + α2X2 + … + αnXn and Z = β0 + β1X1 + β2X2 + … + βnXn, where Y is fetal age and Z is fetal weight, whereas the Xi are gestational ultrasound measures such as bpd, hc, ac, and fl. Suppose both Y and Z conform to the normal distribution, according to Eq. (3) ([17] pp. 8–9).

$$\begin{split}P(Y|X,\alpha) &= \frac{1}{\sqrt{2\pi\sigma\_1^2}} \exp\left(-\frac{\left(Y - \alpha^T X\right)^2}{2\sigma\_1^2}\right) \\ P(Z|X,\beta) &= \frac{1}{\sqrt{2\pi\sigma\_2^2}} \exp\left(-\frac{\left(Z - \beta^T X\right)^2}{2\sigma\_2^2}\right) \end{split} \tag{3}$$

where α = (α0, α1,…, αn)^T and β = (β0, β1,…, βn)^T are parameter vectors and X = (1, X1, X2,…, Xn)^T is the data vector. The means of Y and Z are α^T X and β^T X, respectively, whereas the variances of Y and Z are σ1^2 and σ2^2, respectively. Note that the superscript "T" denotes the transposition operator for vectors and matrices. Let D = (X, y, z) be the collected sample, in which X is a set of sample measures, y is a set of sample fetal ages, and z is a set of fetal weights, with the note that z is missing (empty) or incomplete. If z is empty, there is no zi in z. If z is incomplete, z has some values but also some missing values. However, the constraint is that y must be complete, which means that all pregnant women within the research knew their gestational age. Now we focus on estimating α and β based on D. As a convention, let α\* and β\* be the estimates of α and β, respectively ([17] p. 8).

Although our formulas are better than all remaining ones with high adequacy (large R) and high accuracy (small error range), other researches are always significant because their formulas are very simple and practical. Moreover, our formulas are not global. If they are applied into other samples collected in other communities, their accuracy may be decreased and they may not be still better than traditional formulas such as Sherpard and Hadlock. However, it is easy to draw from our experimental results that if Phoebe framework is used for the same samples with other researches, it will always produce preeminent formulas. In order to achieve global optimality

• Adding more knowledge of pregnancy study, ultrasound technique, and obstetrics into Phoebe framework. In other words, the additional knowledge will be modeled as con-

These suggestions go beyond this research. In my opinion, we cannot reach absolutely the global optimality because Phoebe framework focuses on local optimality with specific communities. Essentially, the suggestions only alleviate the weak point of the built-in SG algorithm in global

The used ultrasound samples are collected in fetal age from 28 to 42 weeks because delivery time is not over 48 h since last ultrasound scan. Hence, accuracy of weight estimation is only ensured when ultrasound examinations are performed after 28-week old fetal age. This section proposes an early weight estimation, in which ultrasound measures can be taken before 28 week old fetal age. We do not ensure improvement of estimation accuracy yet because we do not make experiments on the proposal yet, but the gestational sample can be totally collected at any appropriate time points in gestational period. In other words, the sample can lack fetal weights. This is a convenience for practitioners because they do not need to concern fetal weights when taking ultrasound examinations. Consequently, early weight estimation is achieved. As a convention, vectors are column vectors if there is no additional information. Without loss of generality, regression models are linear such as Y = α<sup>0</sup> + α1X<sup>1</sup> + α2X<sup>2</sup> + … + αnXn and Z = β<sup>0</sup> + β1X<sup>1</sup> + β2X<sup>2</sup> + … + βnXn where Y is fetal age and Z is fetal weight, whereas Xi (s) are gestational ultrasound measures such as bpd, hc, ac, and fl. Suppose both Y and Z conform

with Phoebe framework, the following are two essential suggestions:

• Experimenting on Phoebe framework with many samples.

6. A proposal of early weight estimation

normal distribution, according to Eq. (3) ([17] pp. 8–9).

$$P(Y|X, \alpha) = \frac{1}{\sqrt{2\pi\sigma\_1^2}} \exp\left(\frac{-\left(Y - \alpha^T X\right)^2}{2\sigma\_1^2}\right), \quad P(Z|X, \beta) = \frac{1}{\sqrt{2\pi\sigma\_2^2}} \exp\left(\frac{-\left(Z - \beta^T X\right)^2}{2\sigma\_2^2}\right) \tag{3}$$

where α = (α0, α1, …, αn)<sup>T</sup> and β = (β0, β1, …, βn)<sup>T</sup> are parameter vectors and X = (1, X1, X2, …, Xn)<sup>T</sup> is the data vector. The means of Y and Z are α<sup>T</sup>X and β<sup>T</sup>X, respectively, whereas the variances are σ1<sup>2</sup> and σ2<sup>2</sup>.

116 eHealth - Making Health Care Smarter


$$\mathbf{X} = \begin{pmatrix} \mathbf{x}\_1^T \\ \mathbf{x}\_2^T \\ \vdots \\ \mathbf{x}\_N^T \end{pmatrix} = \begin{pmatrix} 1 & \mathbf{x}\_{11} & \mathbf{x}\_{12} & \cdots & \mathbf{x}\_{1n} \\ 1 & \mathbf{x}\_{21} & \mathbf{x}\_{22} & \cdots & \mathbf{x}\_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \mathbf{x}\_{N1} & \mathbf{x}\_{N2} & \cdots & \mathbf{x}\_{Nn} \end{pmatrix},$$

$$\mathbf{x}\_i = \begin{pmatrix} 1 \\ \mathbf{x}\_{11} \\ \mathbf{x}\_{12} \\ \vdots \\ \mathbf{x}\_{1n} \end{pmatrix}, \mathbf{y} = \begin{pmatrix} y\_1 \\ y\_2 \\ \vdots \\ y\_N \end{pmatrix}, \mathbf{z} = \begin{pmatrix} z\_1 \\ z\_2 \\ \vdots \\ \vdots \\ z\_N \end{pmatrix}$$
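The matrices above translate directly into code. The following Python sketch (the helper name `design_matrix` and the measurement values are our own illustrative choices, not part of the framework) builds the design matrix X with the leading column of ones:

```python
import numpy as np

def design_matrix(measures):
    """Prepend the constant 1 to each row, giving x_i^T = (1, x_i1, ..., x_in)."""
    measures = np.asarray(measures, dtype=float)        # shape (N, n)
    return np.hstack([np.ones((measures.shape[0], 1)), measures])

# Three hypothetical scans; columns are (bpd, hc, ac, fl) in centimetres.
X = design_matrix([
    [7.1, 26.0, 24.5, 5.2],
    [8.0, 29.3, 28.1, 6.0],
    [9.2, 33.5, 33.0, 7.1],
])
```

Each row of the resulting X is one x<sub>i</sub><sup>T</sup>, so `X @ alpha` evaluates every α<sup>T</sup>x<sub>i</sub> at once.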

Given X, the joint probability of Y and Z is the product of the probability of Y given X and the probability of Z given X, because Y and Z are conditionally independent given X, according to Eq. (4).

$$P(Y, Z|X, \alpha, \beta) = P(Y|X, \alpha)P(Z|X, \beta) = \frac{1}{2\pi\sqrt{\sigma\_1^2\sigma\_2^2}} \exp\left(\frac{-\left(Y - \alpha^TX\right)^2}{2\sigma\_1^2} - \frac{\left(Z - \beta^TX\right)^2}{2\sigma\_2^2}\right) \tag{4}$$
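The factorization in Eq. (4) can be checked numerically. In this hypothetical sketch, `mean_y` and `mean_z` stand in for α<sup>T</sup>X and β<sup>T</sup>X, and all numeric values are illustrative assumptions:

```python
import math

def normal_pdf(u, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(u - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Eq. (4): Y and Z are conditionally independent given X, so the joint density
# is the product of the two normal densities of Eq. (3).
mean_y, var1 = 30.0, 4.0          # hypothetical fetal-age mean (weeks) and sigma_1^2
mean_z, var2 = 1500.0, 90000.0    # hypothetical fetal-weight mean (grams) and sigma_2^2
joint = normal_pdf(29.0, mean_y, var1) * normal_pdf(1400.0, mean_z, var2)
```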

The conditional expectation of the sufficient statistic Z given X with regard to P(Z | X, β) is specified by Eq. (5).

$$E(Z|X) = \beta^T X \tag{5}$$

When Z is a hidden variable, there is a latent dependence between Y and Z, which is specified by the joint probability of Y and Z.

$$P(Y, Z) = P(Y)P(Z|Y)$$

Variables Y and Z have different measures. For instance, the unit of Y is the week, whereas the unit of Z is the gram. Suppose Y is considered a discrete variable whose values range from 1 to K, where K can be up to 42, for example. Then P(Y) becomes the parameter θY, which is the probability of Y, where Y runs from 1 to K.

$$P(Y, Z) = \theta\_Y P(Z | Y)$$

For each value of Y, suppose the conditional probability P(Z | Y) is normally distributed with mean μY and variance σY<sup>2</sup>. Eq. (6) specifies the joint probability P(Y, Z).

$$P(Y, Z) = \frac{\theta\_{Y}}{\sqrt{2\pi\sigma\_{Y}^{2}}} \exp\left(\frac{-\left(Z-\mu\_{Y}\right)^{2}}{2\sigma\_{Y}^{2}}\right) \tag{6}$$

The conditional expectation of the sufficient statistic Z given Y with regard to P(Z | Y, μY, σY<sup>2</sup>) is specified by Eq. (7).

$$E(Z|Y) = \mu\_Y \tag{7}$$


Phoebe Framework and Experimental Results for Estimating Fetal Age and Weight


http://dx.doi.org/10.5772/intechopen.74883


Please pay attention to Eq. (7), because Z will later be estimated by this expectation. Eq. (8) specifies the expectation of the sufficient statistic Z with regard to P(Y, Z | θY, μY, σY<sup>2</sup>).

$$E(Z) = \sum\_{Y=1}^{K} \theta\_Y \mu\_Y \tag{8}$$

Due to:

$$E\left(Z|\theta\_Y, \mu\_Y, \sigma\_Y^2\right) = \sum\_{Y=1}^{K} \int\_{Z} Z\,P\left(Y, Z|\theta\_Y, \mu\_Y, \sigma\_Y^2\right) dZ = \sum\_{Y=1}^{K} \theta\_Y E\left(Z|Y, \mu\_Y, \sigma\_Y^2\right) = \sum\_{Y=1}^{K} \theta\_Y \mu\_Y$$

The full joint probability of Y and Z given X and the parameters α, β, θY, μY, and σY<sup>2</sup> is the product specified by Eq. (9).

$$\begin{aligned} P\left(Y, Z|X, \alpha, \beta, \theta\_Y, \mu\_Y, \sigma\_Y^2\right) &= P\left(Y, Z|\theta\_Y, \mu\_Y, \sigma\_Y^2\right)P\left(Y, Z|X, \alpha, \beta\right) \\ &= P\left(Y, Z|\theta\_Y, \mu\_Y, \sigma\_Y^2\right)P\left(Y|X, \alpha\right)P\left(Z|X, \beta\right) \end{aligned} \tag{9}$$

where P(Y, Z | X, α, β) and P(Y, Z | θY, μY, σY<sup>2</sup>) are specified by Eqs. (4) and (6), respectively. Eq. (9) indicates both explicit dependence via P(Y, Z | X, α, β) and implicit dependence via P(Y, Z | θY, μY, σY<sup>2</sup>) between Y and Z. Explicit dependence and implicit dependence share equal influence on Z if E(Z | X) specified by Eq. (5) is equal to E(Z) specified by Eq. (8), according to Eq. (10).

$$\sum\_{Y=1}^{K} \theta\_Y \mu\_Y = \boldsymbol{\beta}^T \mathbf{X} \tag{10}$$

Given the sample D, all θY become constants and are determined by Eq. (11).

$$\theta\_Y = \frac{\text{The number of } y\_i = Y}{N} \tag{11}$$
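Eq. (11) is simply an empirical frequency, which can be computed as follows (the observed ages in the example are hypothetical):

```python
import numpy as np

def theta(y, K):
    """Eq. (11): theta_Y = (number of y_i equal to Y) / N, for Y = 1..K."""
    y = np.asarray(y)
    return np.array([(y == Y).mean() for Y in range(1, K + 1)])

# Hypothetical observed fetal ages in weeks; t[Y - 1] holds theta_Y.
t = theta([28, 30, 30, 32], K=42)
```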

For convenience, let Θ = (α, β, μY)<sup>T</sup> be the compound parameter. The full joint probability specified by Eq. (9) is rewritten as follows:

$$P\left(y, z | \mathbf{X}, \Theta\right) = P\left(y, z | \mu\_Y, \sigma\_Y^2\right) P\left(y | \mathbf{X}, \alpha\right) P\left(z | \mathbf{X}, \beta\right) = \prod\_{i=1}^{N} P\left(y\_i, z\_i | \mu\_Y, \sigma\_Y^2\right) P\left(y\_i | \mathbf{x}\_i, \alpha\right) P\left(z\_i | \mathbf{x}\_i, \beta\right)$$

(because all observations are independently and identically distributed)

$$= \left(\frac{1}{2\pi\sqrt{\sigma\_1^2\sigma\_2^2}}\right)^N \exp\left(\frac{-1}{2}\left(\sum\_{i=1}^N \frac{\left(y\_i - \alpha^T\mathbf{x}\_i\right)^2}{\sigma\_1^2} + \sum\_{i=1}^N \frac{\left(z\_i - \beta^T\mathbf{x}\_i\right)^2}{\sigma\_2^2}\right)\right) \prod\_{i=1}^N \prod\_{Y=1}^K \frac{\delta\left(y\_i, Y\right)\theta\_Y}{\sqrt{2\pi\sigma\_Y^2}} \exp\left(\frac{-\left(z\_i - \mu\_Y\right)^2}{2\sigma\_Y^2}\right)$$

where


$$\delta(y\_i, Y) = \begin{cases} 1 \text{ if } y\_i = Y \\ 0 \text{ if } y\_i \neq Y \end{cases}$$

By convention, if δ(yi, Y) = 0, the respective probability P(yi, zi | μY, σY<sup>2</sup>) is removed from the product. The log-likelihood function is the logarithm of the full joint probability, as follows:

$$\begin{split} L(\Theta) &= \log(P(y, z | \mathbf{X}, \Theta)) = -N \log(2\pi) - \frac{N \log(\sigma\_1^2)}{2} - \frac{N \log(\sigma\_2^2)}{2} \\ &- \frac{1}{2\sigma\_1^2} \sum\_{i=1}^N \left( y\_i - \alpha^T \mathbf{x}\_i \right)^2 - \frac{1}{2\sigma\_2^2} \sum\_{i=1}^N \left( z\_i - \beta^T \mathbf{x}\_i \right)^2 \\ &+ \sum\_{i=1}^N \sum\_{Y=1}^K \delta(y\_i, Y) \left( \log(\theta\_Y) - \frac{\log(2\pi)}{2} - \frac{\log(\sigma\_Y^2)}{2} - \frac{\left( z\_i - \mu\_Y \right)^2}{2\sigma\_Y^2} \right) \end{split}$$

Because log(2π) and θY are constants, the reduced log-likelihood function is derived from the log-likelihood, as seen in Eq. (12).

$$\begin{split} l(\Theta) &= -\frac{N}{2} \log(\sigma\_1^2) - \frac{N}{2} \log(\sigma\_2^2) - \frac{1}{2\sigma\_1^2} \sum\_{i=1}^N \left( y\_i - \alpha^T \mathbf{x}\_i \right)^2 - \frac{1}{2\sigma\_2^2} \sum\_{i=1}^N \left( z\_i - \beta^T \mathbf{x}\_i \right)^2 \\ &- \frac{1}{2} \sum\_{i=1}^N \sum\_{Y=1}^K \delta(y\_i, Y) \left( \log(\sigma\_Y^2) + \frac{\left( z\_i - \mu\_Y \right)^2}{\sigma\_Y^2} \right) \end{split} \tag{12}$$
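As a sanity check, the reduced log-likelihood of Eq. (12) can be evaluated directly. The function below is an illustrative sketch only: keying μY and σY² by the observed ages is our own convention, not the authors' implementation, and the toy values are hypothetical.

```python
import numpy as np

def reduced_log_likelihood(X, y, z, alpha, beta, mu, s1, s2, sY):
    """Evaluate l(Theta) of Eq. (12). mu[Y] and sY[Y] hold mu_Y and sigma_Y^2."""
    N = len(y)
    ry = y - X @ alpha                              # residuals y_i - alpha^T x_i
    rz = z - X @ beta                               # residuals z_i - beta^T x_i
    l = -N / 2 * np.log(s1) - N / 2 * np.log(s2)
    l -= (ry @ ry) / (2 * s1) + (rz @ rz) / (2 * s2)
    # The double sum over i and Y collapses: delta(y_i, Y) selects Y = y_i only.
    for yi, zi in zip(y, z):
        l -= 0.5 * (np.log(sY[yi]) + (zi - mu[yi]) ** 2 / sY[yi])
    return l

# Perfectly fitted toy data with all variances 1 gives l(Theta) = 0.
val = reduced_log_likelihood(
    np.array([[1.0, 0.0], [1.0, 1.0]]), np.array([0, 1]), np.array([0.0, 1.0]),
    np.array([0.0, 1.0]), np.array([0.0, 1.0]),
    mu={0: 0.0, 1: 1.0}, s1=1.0, s2=1.0, sY={0: 1.0, 1: 1.0})
```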

The optimal estimate Θ\* is a maximizer of l(Θ), according to Eq. (13) ([17] p. 9).

$$\Theta^\* = \underset{\Theta}{\text{argmax}} \, L(\Theta) = \underset{\Theta}{\text{argmax}} \, l(\Theta) \tag{13}$$

By taking first-order partial derivatives of l(Θ) with regard to Θ ([18] p. 34), we obtain:

$$\begin{aligned} \frac{\partial l(\Theta)}{\partial \alpha} &= \frac{1}{\sigma\_1^2} \sum\_{i=1}^N \left( y\_i - \alpha^T \mathbf{x}\_i \right) (\mathbf{x}\_i)^T \\ \frac{\partial l(\Theta)}{\partial \beta} &= \frac{1}{\sigma\_2^2} \sum\_{i=1}^N \left( z\_i - \beta^T \mathbf{x}\_i \right) (\mathbf{x}\_i)^T \\ \frac{\partial l(\Theta)}{\partial \mu\_Y} &= \frac{1}{\sigma\_Y^2} \sum\_{i=1}^N \delta\left( y\_i, Y \right) \left( z\_i - \mu\_Y \right) \end{aligned}$$

When the first-order partial derivatives of l(Θ) are equal to zero, l(Θ) is locally maximal. In other words, Θ\* is the solution of equation system 14, which results from setting such derivatives to zero and setting E(Z | X) = E(Z).

$$\begin{cases} \sum\_{i=1}^{N} \left( y\_i - \alpha^T \mathbf{x}\_i \right) \left( \mathbf{x}\_i \right)^T = \mathbf{0}^T \\ \sum\_{i=1}^{N} \left( z\_i - \beta^T \mathbf{x}\_i \right) \left( \mathbf{x}\_i \right)^T = \mathbf{0}^T \\ \sum\_{i=1}^{N} \delta(y\_i, Y) \left( z\_i - \mu\_Y \right) = 0 \\ \sum\_{j=1}^{K} \theta\_j \mu\_j = \beta^T \mathbf{x}\_i \text{ for some } i \end{cases} \tag{14}$$


where

$$\delta(y\_i, Y) = \begin{cases} 1 \text{ if } y\_i = Y \\ 0 \text{ if } y\_i \neq Y \end{cases}$$

The notation 0 = (0, 0, …, 0)<sup>T</sup> denotes the zero vector. All equations in system 14 are linear, and their unknowns are Θ = (α, β, μY)<sup>T</sup>. The last equation in system 14 is Eq. (10) with the heuristic assumption that explicit dependence and implicit dependence share equal influence on Z. This last equation is only used to adjust the μY values when the heuristic assumption is adopted; otherwise it is ignored.

We apply the expectation maximization (EM) algorithm to estimate Θ = (α, β, μY)<sup>T</sup> when fetal weights are lacking. Note that the full joint probability P(Y, Z | X, α, β, μY) specified by Eq. (9) is a product of regular exponential distributions. The EM algorithm has many iterations, and each iteration has an expectation step (E-step) and a maximization step (M-step) for estimating parameters. Given the current parameter Θ<sup>t</sup> = (α<sup>t</sup>, β<sup>t</sup>, μY<sup>t</sup>)<sup>T</sup> at the t-th iteration, the two steps are shown in Table 9 ([19] p. 4).

Equation system 14 is solvable because the missing values zi were estimated in the E-step. The EM algorithm stops if, at some t-th iteration, we have Θ<sup>t</sup> = Θ<sup>t + 1</sup> = Θ\*. At that time, Θ\* = (α\*, β\*, μY\*)<sup>T</sup> is the optimal estimate of the EM algorithm, and hence the linear regression functions of Y and Z are determined with α\*, β\*.


1. E-step: Estimate each missing value zi as its expectation under the current mean, according to Eq. (7). Note that each missing value zi is always associated with an observation yi:
$$z\_i = E\left(z\_i | y\_i\right) = \mu\_{y\_i}^t$$
2. M-step: The next parameter Θ<sup>t + 1</sup> is a maximizer of l(Θ), which is the solution of equation system 14. Θ<sup>t + 1</sup> then becomes the current parameter for the next iteration.

Table 9. E-step and M-step of EM algorithm.
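The E-step/M-step loop of Table 9 can be sketched as follows. This is a simplified illustration under our own assumptions, not the authors' implementation: β is refit by ordinary least squares on the filled-in sample, and the coupling constraint in the last equation of system 14 is omitted.

```python
import numpy as np

def em_fill_weights(X, y, z, iters=20):
    """EM-style loop for the case where some fetal weights are missing (np.nan).
    E-step: replace each missing z_i by the current class mean mu_{y_i} (Eq. 7).
    M-step: refit beta by least squares and recompute the class means mu_Y."""
    z = np.array(z, dtype=float)
    missing = np.isnan(z)
    classes = np.unique(y)
    mu = {Y: np.nanmean(z[y == Y]) for Y in classes}   # sample means, as in Eq. (16)
    for _ in range(iters):
        z[missing] = [mu[yi] for yi in y[missing]]     # E-step
        beta, *_ = np.linalg.lstsq(X, z, rcond=None)   # M-step: regression part
        mu = {Y: z[y == Y].mean() for Y in classes}    # M-step: class means
    return beta, mu, z

# Hypothetical sample: one missing weight at fetal age 30 weeks.
X = np.array([[1.0, 7.1], [1.0, 8.0], [1.0, 8.1], [1.0, 9.2]])
y = np.array([28, 30, 30, 32])
z = [1100.0, 1450.0, np.nan, 1900.0]
beta, mu, z_filled = em_fill_weights(X, y, z)
```

Note that a fetal-age class with no observed weight at all would leave its mean undefined; the chapter's initialization via Eq. (16) assumes at least some observed weights per class.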


As usual, all parameters are changed after every iteration of the EM algorithm, but fortunately, α\* is determined as a partial solution of equation system 14 at the first iteration of the EM process, because both X and y are complete. In other words, α\* is fixed, whereas β and μY are changed in the EM process. Eq. (15) ([20] p. 417) specifies α\*.

$$\alpha^\* = \alpha^1 = \left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T\mathbf{y} \tag{15}$$

where the superscript "−1" denotes matrix inversion.
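Eq. (15) is the ordinary least-squares (normal equations) estimate, which can be computed as below; the measurement rows and ages are hypothetical:

```python
import numpy as np

# Eq. (15): alpha* = (X^T X)^{-1} X^T y. Solving the normal equations with
# np.linalg.solve (or using np.linalg.lstsq) avoids forming the inverse explicitly.
X = np.array([[1.0, 7.1], [1.0, 8.0], [1.0, 9.2]])   # hypothetical (1, bpd) rows
y = np.array([29.0, 33.0, 38.0])                     # hypothetical fetal ages in weeks
alpha_star = np.linalg.solve(X.T @ X, X.T @ y)
alpha_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)  # numerically safer equivalent
```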

At the first iteration, Θ<sup>1</sup> is usually initialized arbitrarily, but we can improve the convergence of the EM algorithm by initializing μY<sup>1</sup> as a sample mean. Without loss of generality, suppose practitioners obtained n < N fetal weights z1, z2, …, zn from n ultrasound scans. Moreover, the fetal age of all pregnant women over such n scans is the same, namely Y. Thus, μY<sup>1</sup> is initialized by Eq. (16).

$$
\mu\_Y^1 = \frac{1}{n} \sum\_{i=1}^n z\_i \tag{16}
$$
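Eq. (16) amounts to a plain sample mean over the n observed weights at the same fetal age, for instance (hypothetical gram values):

```python
import numpy as np

# Eq. (16): initialize mu_Y^1 as the mean of the n observed weights taken at
# the same fetal age Y.
z_observed = np.array([1450.0, 1520.0, 1490.0])
mu_Y_1 = z_observed.mean()
```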

The parameter β<sup>1</sup> at the first iteration is initialized according to previous studies in the literature.

#### 7. Conclusions

According to the experimental results, the Phoebe framework produces optimal formulas with high adequacy and accuracy; please see Tables 4–8 for more details. However, we also recognize that a weak point of our research is that the built-in SG algorithm can lose some good formulas due to its heuristic conditions. A suggested solution is to add more constraints to such conditions; please read the article "A framework of fetal age and weight estimation" ([10] pp. 24–25) for more details. The proposal of early weight estimation actually uses an additional constraint, which is the latent relationship between fetal age and fetal weight. Such a latent relationship, represented by the joint probability of fetal age and weight, is a knowledge aspect of pregnancy study. For further research, we will experiment on the proposal and try our best to discover other knowledge aspects.

Another weak point of our research is that our complex formulas are difficult to apply in fast mental calculation; this is the price we pay for their high accuracy. In the future, we will embed these formulas into the software or hardware of medical ultrasound machines so that users can easily read the estimated values produced by the machine.

#### Acknowledgements

We express our deep gratitude to the author Michael Thomas Flanagan (University College London) and the author Jos de Jong for giving us helpful software packages that helped us to implement the framework.

#### Author details

Loc Nguyen<sup>1</sup>\*, Truong-Duyet Phan<sup>2</sup> and Thu-Hang T. Ho<sup>3</sup>

\*Address all correspondence to: ng\_phloc@yahoo.com

1 Sunflower Soft Company, Ho Chi Minh, Vietnam

2 Hanoi Medical University, Hanoi, Vietnam

3 Vinh Long General Hospital, Vinh Long, Vietnam

#### References

[1] Hadlock FP, Harrist RB, Sharman RS, Deter RL, Park SK. Estimation of fetal weight with use of head, body and femur measurements: A prospective study. American Journal of Obstetrics and Gynecology. 1 February 1985;151(3):333-337

[2] Phan DT. Ứng dụng siêu âm để chẩn đoán tuổi thai và cân nặng thai trong tử cung [Applying ultrasound to diagnose fetal age and intrauterine fetal weight]. Hanoi: Hanoi University of Medicine; 1985

[3] Phạm TNT. Ước lượng cân nặng thai nhi qua các số đo của thai trên siêu âm [Estimating fetal weight from ultrasound measures of the fetus]. Ho Chi Minh: Ho Chi Minh University of Medicine and Pharmacy; 2000

[4] Ho THT. Nghiên Cứu Phương Pháp Ước Lượng Trọng Lượng Thai, Tuổi Thai Bằng Siêu Âm Hai và Ba Chiều [Research on methods of estimating fetal weight and fetal age by two- and three-dimensional ultrasound]. Hanoi: Hanoi University of Medicine; 2011

[5] Shepard JM, Richards AV, Berkowitz LR, Warsof LS, Hobbins CJ. An evaluation of two equations for predicting fetal weight by ultrasound. American Journal of Obstetrics and Gynecology. 1 January 1982;142(1):47-54

[6] Campbell S, Wilkin D. Ultrasonic measurement of fetal abdomen circumference in the estimation of fetal weight. BJOG: An International Journal of Obstetrics & Gynecology. September 1975;82(9):689-697

[7] Lee W, Balasubramaniam M, Deter RL, Yeo L, Hassan SS, Gotsch F, Kusanovic JP, Gonçalves LF, Romero R. New fetal weight estimation models using fractional limb volume. Ultrasound in Obstetrics & Gynecology. 1 November 2009;34(5):556-565

[8] Chang F-M, Liang R-I, Ko H-C, Yao B-L, Chang C-H, Yu C-H. Three-dimensional ultrasound-assessed fetal thigh volumetry in predicting birth weight. Obstetrics & Gynecology. September 1997;90(3):331-339

[9] Varol F, Saltik A, Kaplan PB, Kilic T, Yardim T. Evaluation of gestational age based on ultrasound fetal growth measurements. Yonsei Medical Journal. June 2001;42(3):299-303

[10] Flanagan MT. In: Flanagan MT, editor. Java Scientific Library. London, England: University College London; 2004

[11] Jong Jd. A Java Expression Parser. Rotterdam: SpeQ Mathematics; 2010

[12] Oracle. Java language [Online]. Oracle Corporation. Available: https://www.oracle.com/java [Accessed 25 December 2014]

[13] Nguyen L, Ho H. A framework of fetal age and weight estimation. Journal of Gynecology and Obstetrics (JGO). 30 March 2014;2(2):20-25

[14] Ho THT, Phan DT. Ước lượng cân nặng của thai từ 37–42 tuần bằng siêu âm 2 chiều [Estimating the weight of fetuses from 37 to 42 weeks by 2D ultrasound]. Journal of Practical Medicine. December 2011;12(797):8-9

[15] Ho T-HT, Phan DT. Ước lượng tuổi thai qua các số đo thể tích cánh tay bằng siêu âm 3 chiều và các số đo bằng siêu âm 2 chiều [Estimating fetal age from arm volume measures by 3D ultrasound and measures by 2D ultrasound]. Journal of Practical Medicine. December 2011;12(798):12-15

[16] Nguyen L, Ho T-HT. Experimental results of Phoebe framework: optimal formulas for estimating fetus weight and age. Journal of Community & Public Health Nursing. March 2017;3(2):1-5

[17] Lindsten F, Schön TB, Svensson A, Wahlström N. Probabilistic Modeling – Linear Regression & Gaussian Processes. Uppsala: Uppsala University; 2017

[18] Nguyen L. In: Evans C, editor. Matrix Analysis and Calculus. 1st ed. Hanoi: Lambert Academic Publishing; 2015. p. 72

[19] Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological). 1977;39(1):1-38

[20] Montgomery DC, Runger GC. Applied Statistics and Probability for Engineers. 3rd ed. New York, NY: John Wiley & Sons, Inc.; 2003. p. 706


**Chapter 8**


#### **Using Patient Registries to Identify Triggers of Rare Diseases**

DOI: 10.5772/intechopen.76449

Feras M. Ghazawi, Steven J. Glassman, Denis Sasseville and Ivan V. Litvinov

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/intechopen.76449

#### **Abstract**

Mapping the distribution of patients and analyzing disease clusters is an effective method in epidemiology, where the non-random aggregation of patients is carefully investigated. This can aid in the search for clues to the etiology of diseases, particularly the rare ones. Indeed, with the increased incidence of rare diseases in certain populations and/or geographic areas and with proper analysis of common exposures, it is possible to identify the likely promoters/triggers of these diseases at a given time. In this chapter, we will highlight the appropriate methodology and demonstrate several examples of cluster analyses that lead to the recognition of environmental, occupational and communicable preventable triggers of several rare diseases.

**Keywords:** cluster investigation, epidemiology, rare diseases, exposures, disease triggers, patient registries

#### **1. Introduction**

Many diseases are preventable with lifestyle modifications and by minimizing exposures to harmful substances. In fact, it was recently reported that nearly half of all cancer-related deaths in the United States were attributable to modifiable and preventable risk factors [1]. Through epidemiological studies and careful examination of public health data such as disease registries, and by studying disease distribution, incidence, prevalence and mortality trends, the occurrence of diseases in defined populations can be estimated and related to different external factors. Disease clusters are aggregates of patients with a particular disease in a specified time period and at a defined geographical level, occurring at a rate markedly higher than expected. Analyzing and mapping the incidence rates of diseases can help identify non-random distributions of patient clusters, while a proper assessment of the population demographics and the surrounding environment can implicate occupational, communicable and environmental exposures as potential causes of a given disease. For instance, in the 1800s, despite limited knowledge of the etiology of many diseases such as cholera, clustering analysis enabled physicians and scientists to establish a definite link between disease outbreaks and causative or potentiating agents in the surrounding environment. In the case of cholera, water from a pump contaminated with the *Vibrio cholerae* bacterium was clearly identified as a disease source in London, England. As the examples in Section 2.2 highlight, geographic clustering analyses of patient populations can shed light on the triggers of many rare diseases.

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### **2. Cluster investigation analysis**

#### **2.1. Cluster investigations**

In epidemiology, trends and causes of diseases, together with their progression and regression rates, can be monitored over time, and the occurrence of diseases within defined populations can be estimated. There are several types of epidemiologic studies, including cohort, case–control, cross-sectional, ecologic and cluster studies. Epidemiological studies have a significant impact on public health outcomes: by identifying increased disease incidence or prevalence rates, they shape health policies, including preventative measures and resource allocation planning. Spatial epidemiology is the description and geographical analysis of health data, taking into account patients' demographics and risk factors, including socioeconomic, genetic, environmental, behavioral, infectious and noninfectious exposures [2]. Detection of disease clusters is an integral component of spatial epidemiology, as it identifies disproportionately high rates of a disease in a given population, which ultimately generates hypotheses that can help elucidate disease triggers/promoters.

Clustering analyses can be characterized as either general (non-focused) or specific (focused). In a general clustering analysis, the precise location of disease clusters is not studied; rather, the clustering tendency of the disease and its overall distribution are examined [3, 4]. A specific (focused) clustering analysis, on the other hand, carefully describes an unusual, non-random accumulation of cases and the precise location of clusters, in time or space, that is unlikely to be due to chance alone [3, 4]. These investigations can be used to formulate hypotheses about potential causes of diseases. Further, clustering patterns have many applications beyond identifying disease triggers, including identification of areas with high disease prevalence in order to optimize medical management and resource allocation. This is discussed in Section 2.2.
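The observed-versus-expected logic behind a focused cluster test can be sketched in a few lines. The region names, populations and case counts below are invented purely for illustration; real investigations use formal spatial scan statistics and adjust for population demographics.

```python
# Sketch of a focused cluster test: compare the observed case count in each
# region with the count expected from its population and the overall rate.
# All region data are hypothetical illustration values.
import math

def poisson_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected), by direct summation."""
    cdf = sum(math.exp(-expected) * expected**i / math.factorial(i)
              for i in range(observed))
    return 1.0 - cdf

regions = {            # region: (population, observed cases)
    "A": (50_000, 4),
    "B": (80_000, 6),
    "C": (20_000, 12),  # candidate cluster
}

total_cases = sum(c for _, c in regions.values())
total_pop = sum(p for p, _ in regions.values())
overall_rate = total_cases / total_pop

for name, (pop, obs) in regions.items():
    expected = overall_rate * pop
    sir = obs / expected            # standardized incidence ratio
    p = poisson_tail(obs, expected)
    flag = "possible cluster" if sir > 2 and p < 0.01 else ""
    print(f"{name}: SIR={sir:.1f}, P(X>={obs})={p:.4f} {flag}")
```

Region C, with roughly four times its expected case count and a vanishingly small Poisson tail probability, would be flagged for follow-up investigation of shared exposures.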

#### **2.2. Applications of cluster investigations in identifying disease triggers**

#### *2.2.1. Cholera*

Cholera is an acute infectious diarrheal disease that can be fatal within days if left untreated. This disease became a major health threat in the 1800s, with several outbreaks that had devastating outcomes. The accepted explanation of cholera outbreaks at the time was the "Miasma theory", which held that poisonous vapor or mist filled with substances from decomposed matter ("Miasmata") caused many diseases, including cholera, chlamydia and plague [5]. In 1854, a severe outbreak of cholera occurred in London, England, killing more than 600 people. Dr. John Snow, an English physician, investigated the cause of this epidemic by analyzing the geographic distribution of cholera and plotting cholera cases on a map, along with certain landmarks in the city, including providers of potable water (**Figure 1**). Notably, most of the cholera cases occurred within 250 yards of the intersection of Broad and Cambridge streets and in close proximity to a public water pump on Broad Street. This observation prompted the local council to disable the water pump, which halted the spread of cholera. This analysis identified the precise source of the cholera outbreak in London as the public water pump, which was built near an old open toilet, and established for the first time that cholera can spread via contaminated water [6]. This breakthrough paved the way for the field of epidemiology.
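Snow's reasoning can be mimicked computationally: assign each mapped case to its nearest candidate water source and see whether one source accumulates a disproportionate share of cases. The coordinates below are hypothetical map units, not Snow's actual data.

```python
# Toy re-creation of the logic behind Snow's map: each case is attributed
# to its nearest candidate water source; a skewed tally points to the
# suspect source. All coordinates are invented for illustration.
from collections import Counter
import math

pumps = {"Broad St": (0.0, 0.0), "Other pump": (5.0, 5.0)}

cases = [(0.2, -0.1), (0.5, 0.3), (-0.4, 0.2), (0.1, 0.6),
         (4.8, 5.2), (0.3, 0.1)]

def nearest_pump(case):
    return min(pumps, key=lambda p: math.dist(case, pumps[p]))

counts = Counter(nearest_pump(c) for c in cases)
print(counts.most_common())   # the suspect source tops the tally
```

With five of six cases nearest the "Broad St" source, the tally reproduces, in miniature, the kind of spatial evidence that justified disabling the pump.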

#### *2.2.2. Mesothelioma*

Mesothelioma is a rare but aggressive cancer that arises in the mesothelium, the lining of the pleura, peritoneum and pericardium. Studying the prevalence of mesotheliomas in asbestos miners of South Africa established asbestos exposure as a critical factor responsible for this deadly malignancy. In a study by Wagner and colleagues, it was noted that while mesothelioma is a very rare disease in the Northwest Cape province of South Africa, 33 cases were described in the area, each with occupational exposure to crocidolite asbestos mining [8]. This finding was soon followed by several population studies in Quebec (Canada), the United Kingdom, the Netherlands, Germany, Scotland and Northern Ireland, demonstrating that most of the described mesothelioma patients clustered in communities where occupational exposure to asbestos was routine. At that time, asbestos was commonly used in insulation, construction and factory work, as well as in shipyards. This analysis confirmed the causal link between asbestos and mesotheliomas and led to legislative action banning the use of this carcinogen in construction and other workplaces [9].

**Figure 1.** Original map by Dr. John Snow illustrating clustering of cholera cases in the London epidemic of 1854. Cholera cases are highlighted as black lines [7].

#### *2.2.3. Squamous cell carcinoma*

For many centuries, arsenic was used by the Egyptians, Greeks, Asians and Romans for many applications, including the treatment of rheumatism and facial hair removal. Little was known about the carcinogenic effects of arsenic at that time. In 1898, Geyer conducted a detailed population study in Reichenstein, Silesia (Prussia). In this small arsenic-mining town, chronic poisoning occurred primarily through drinking water contaminated by arsenical fumes precipitating in the rain. Affected individuals developed a constellation of symptoms, including pigmentation changes and hyperkeratosis (wart-like lesions) on the palms and soles; the latter carried a high risk of progression to cutaneous squamous cell carcinoma. This condition was referred to as "Reichenstein's disease" [11]. The significantly increased incidence of this disease among the town's residents helped establish the link between arsenic and the occurrence of arsenical keratoses and squamous cell carcinomas of the skin, and this work shed light on the carcinogenic effects of arsenic [10]. This example further demonstrates the importance of non-random clustering of rare diseases in identifying novel environmental or occupational disease triggers.

#### *2.2.4. Cutaneous T-cell lymphoma*

Cutaneous T-cell lymphoma (CTCL) is a rare group of non-Hodgkin lymphomas that primarily involves the skin. Patients with CTCL typically present with persistent, red, itchy patches and thickened plaques located mostly on the trunk. As the malignancy progresses, patients can develop skin tumors with concomitant involvement of lymph nodes and visceral organs. In some stages, the disease involves the blood, and patients can develop erythroderma (generalized redness and desquamation of the skin) and suffer intractable pruritus as well as B-symptoms of lymphoma. Many patients with advanced disease succumb to this malignancy within 2–3 years. Unfortunately, the risk factors and promoters for this disease remained poorly understood for many years. It is recognized that disruption of molecular pathways in skin lymphocytes by bacterial, viral or environmental factors can lead to cutaneous lymphomas [12–14]. Although progress has been made in the past few decades, the precise pathogenesis by which CTCL develops remains poorly understood. Several reports from different parts of the world examined the distribution of CTCL patients, illustrating non-random clustering of cases. This was shown in Sweden [15], Houston, Texas (**Figure 2**) [16, 17] and the Pittsburgh metropolitan area [18]. Furthermore, an unusually high incidence of CTCL in married couples [19] and in families [20] was also noted. These clustering patterns of CTCL patients strongly argue for the existence of external and potentially preventable risk factors for this rare skin cancer.

Several factors have been implicated in CTCL carcinogenesis, including immunosuppression, vitamin D deficiency, bacterial agents (*Staphylococcus aureus*, *Mycobacterium leprae* and *Chlamydophila pneumoniae*), medications (calcium channel blockers, angiotensin converting enzyme inhibitors, hydrochlorothiazide, and serotonin reuptake inhibitors), dermatophytes and viruses (EBV, HSV and HTLV-1) [21]. However, none of these agents have been definitively linked with this skin lymphoma.

In addition, recent studies in Canada further confirmed the existence of disease clusters, as well as areas completely spared by this malignancy, and implicated industrial exposure and living in proximity to major transportation junctions as potential triggers for CTCL [22, 23]. Considering that the majority of skin cancers are caused by external and often preventable triggers (e.g., UV radiation, HPV, polyomaviruses), it is not surprising that skin lymphomas could also be caused by an external trigger. Currently, the search for such trigger(s) for this malignancy is ongoing.

**Figure 2.** Geographic mapping of cutaneous T-cell lymphoma cases in Houston, Texas, demonstrating a clustering pattern of patients with this rare malignancy, with incidence rates 5–20-fold higher than expected. Patients in the 'Spring' community are indicated in violet, patients in the 'Katy' community in green, and patients residing in the Houston Memorial area in orange. Adapted from [17].

#### *2.2.5. Childhood leukemia*

Another example revealing the cause of an important disease came from an observation in the early 1980s in Woburn, Massachusetts, where an elevated incidence rate of childhood leukemia was documented. An extensive investigation of the geographical distribution of these patients helped implicate chlorinated organic compounds contaminating two of the eight municipal wells serving Woburn as a cause of childhood leukemia. Specifically, it was shown that the dwellings where the patients with this cancer resided were supplied with water from these contaminated wells [24].

#### *2.2.6. Bladder cancer*

Bladder cancer is a disease of significant morbidity and mortality [25]. Cluster investigation recently helped identify occupational and behavioral promoters of this cancer. These factors are potentially modifiable, and thus rates of this malignancy could possibly be reduced with primary prevention. In 1895, Rehn, a German physician, made the astute observation that incidence rates of bladder cancer were remarkably high among aniline dye industry workers. This was the first evidence that occupational risk factors can be directly implicated in this malignancy [26]. By carefully analyzing the incidence of bladder cancers in industrial workers, it was possible to identify aromatic amines, polycyclic aromatic hydrocarbons and chlorinated hydrocarbons, which are now well recognized as causative agents for this disease [25].

#### *2.2.7. Emerging trends*

#### *2.2.7.1. Multiple sclerosis*

Multiple sclerosis (MS) is an autoimmune demyelinating disease affecting the central nervous system and resulting in a spectrum of neurological symptoms, including vision problems, fatigue, pain, spasms and cognitive decline. The precise triggers of this rare disease have not yet been identified. However, studying the epidemiology and geographic distribution of MS globally has yielded many interesting trends that have allowed generation of a number of hypotheses addressing the cause of MS. Clusters of new MS cases have been reported in many communities around the world, including the United States, Canada, Europe, Israel, New Zealand, Australia and Russia [27–30]. Many studies indicated significant variation in the global distribution of MS patients: the incidence of this autoimmune disease is relatively uncommon in tropical climates, but much more common in temperate zones and in the Western Hemisphere [31]. Furthermore, remarkably elevated incidence rates in northern latitudes were reported [32, 33]. Many theories have been postulated to implicate promoters of MS, such as diet, soil minerals and deficiency in vitamin D [32, 33]. The identity of a definite trigger for MS remains unknown, and extensive follow-up of identified clusters may potentially provide some clues in the future.
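The latitude-gradient argument amounts to comparing crude incidence rates across latitude bands. The counts below are invented solely to illustrate the calculation; they are not real MS figures.

```python
# Crude incidence rates per 100,000 person-years by latitude band, and the
# ratio between the extreme bands. All numbers are hypothetical.
bands = {                       # band: (cases, person-years)
    "tropical (<23°)":    (30, 4_000_000),
    "temperate (23–50°)": (180, 3_000_000),
    "northern (>50°)":    (150, 1_500_000),
}

rates = {band: cases / py * 100_000 for band, (cases, py) in bands.items()}
for band, rate in rates.items():
    print(f"{band}: {rate:.1f} per 100,000")

ratio = rates["northern (>50°)"] / rates["tropical (<23°)"]
print(f"north-to-tropical rate ratio: {ratio:.1f}")
```

A rate ratio well above 1 between high-latitude and tropical bands is the kind of signal that motivated the vitamin D and other environmental hypotheses cited above.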

#### *2.2.7.2. Alzheimer's disease*

Alzheimer's disease is a common, yet incompletely understood, form of dementia. Differences in the geographical distribution of patients with Alzheimer's disease have been reported, highlighting the possible contribution of nutritional or socio-environmental factors to the development and progression of the disease [34]. Indeed, levels of essential trace elements, including selenium, magnesium, iron, copper and zinc, were shown to be markedly reduced in Alzheimer's patients compared with healthy individuals of the same age [35]. This illustrates how further epidemiologic studies can be used to associate nutritional deficiencies with diseases.

#### **2.3. Applications of cluster investigations in identifying nutritional deficiencies**

#### *2.3.1. Scurvy*

Deficiency in micronutrients and vitamins can result in a variety of diseases. For instance, vitamin A deficiency is a known cause of keratomalacia, while vitamin D deficiency in childhood invariably causes rickets. One important use of clustering analysis in epidemiology is to identify nutritional deficiencies.

During the Age of Discovery in the fifteenth and sixteenth centuries, particularly during long transatlantic journeys, it was noted that the incidence of scurvy, a rare disease caused by a severe deficiency of vitamin C (ascorbic acid), was much higher in sailors, pirates and other sea explorers. The disease later also affected soldiers during the world wars. Scurvy is characterized by general weakness, gingivitis and bleeding disorders. It was noted that eating citrus fruits prevented and cured this disease in sailors, which enabled later confirmation that vitamin C deficiency is the sole cause of scurvy. Thus, careful demographic and epidemiologic analyses of these individuals, who did not have access to fresh fruit and vegetables, established a link between nutritional deficiency and disease.

#### *2.3.2. Goiter*

Thyroid goiters, which represent enlargement of the thyroid gland, are caused by iodine deficiency. Fortification of table salt, medications and common foods like bread with iodine has largely eliminated the once-endemic goiter, but the condition persists in some regions of the developing world. The first hypothesis linking iodine with the treatment of goiter was made in the mid-1800s by a French chemist, Adolphe Chatin [36]. However, fortification of table salt with iodine was not implemented in the United States until the early 1920s [37], and this was, at least in part, driven by epidemiological research.

It was noted that the prevalence of goiter was very high (approximately 26–70% of children) in the upper Midwest and Great Lakes regions of the United States; this endemic region was known at the time as the "Goiter Belt" [38]. The prevalence was reported to be as high as 64.4% in some areas of Michigan [39]. This highlighted the severity of the problem, sparking a major public health initiative to supplement table salt with iodine. The intervention was very successful: the incidence of goiter in Michigan dropped by up to 90% within a decade of iodine supplementation [40]. Several areas still have a remarkably high prevalence of goiter, such as parts of India and the Himalayan/sub-Himalayan belts [41]. In fact, despite efforts to implement iodine supplementation and table salt fortification with iodine, goiter prevalence in these communities has not decreased significantly [42]. Thus, more work needs to be done to address logistic, cultural and other obstacles to eliminate suffering from goiter in these regions. In conclusion, recognizing the high prevalence of 'uncommon' diseases such as goiter has important clinical implications. These studies help detect regions with micronutrient deficiency, which can serve as surrogate markers for poor nutrition, and encourage prioritizing resource allocation to the affected communities.

Using Patient Registries to Identify Triggers of Rare Diseases

http://dx.doi.org/10.5772/intechopen.76449

#### **2.4. Conducting a proper cluster investigation analysis**

#### *2.4.1. Systematic approach to conducting a cluster analysis*

The study of the incidence/prevalence of a disease and mapping its distribution requires a systematic approach when trying to implicate occupational and environmental exposures as disease triggers/promoters. Mapping and exposure investigations are critical to highlight the existence and significance of identified clusters. However, it is not enough to only learn about the geographical disease clusters (i.e., disease hot-spots). It is also important to identify regions that are significantly spared by the disease (i.e., disease cold-spots). Detailed epidemiological and statistical analysis of both can help rule-in or rule-out environmental contamination or exposures as disease triggers [43]. A point-by-point guide of a systematic approach to conducting a cluster investigation is provided below:

**1.** Define the disease and population(s) to be examined. Relevant collected information should include age at diagnosis, year of diagnosis (for incidence calculation), gender, ethnic background (to study disease ethnic predilection), patients' addresses (for geographical mapping), age at death, year of death (for mortality calculation), disease stage, etc.

**2.** Obtain 'background' information about patient demographics to enable standardization of incidence and mortality rates (such as standardization by age, gender, race, socioeconomic status, etc.).

**3.** Obtain census or other population information to enable calculating incidence and mortality rates per country, territory/state/province, city and postal code. It is also helpful to learn about common exposures or diseases in that population to adjust for potential confounders. For instance, when studying the incidence of hepatitis C infection in a population, the rate of HIV prevalence would be an important confounder, since in many patients there is co-infection with both viruses due to shared risk factors for viral transmission. Population demographic parameters often vary and can be useful for subsequent analysis of collected data. The specific parameters of interest will differ for each disease, but often include population size, age ranges, race, gender distribution, socioeconomic status, data on lifestyle/behaviors, other environmental, occupational, or local rates of communicable diseases, etc.

**4.** Obtain public health data on patients with the disease of interest (e.g. local or national cancer registries and Centers for Disease Control, etc.). It is critical to obtain the data from population-based registries since it is often very difficult to draw conclusions from data based on a single medical center or a few select hospitals' experience. One must always seek to correlate single center evidence with population-based registries/databases.

**5.** Subsequent calculations of incidence can be easily performed using the obtained data (incidence rate per year = number of new patients per year/population at risk per year). A plot of incidence rates (y axis) *vs.* year (x axis) will enable calculating an average incidence rate and trending the change of rate over time. Mortality calculations are done similarly, using number of deceased patients per year/population at risk.

**6.** Incidence rates in smaller geographical regions can be calculated similarly. For rare diseases, it is important to include only locations with at least 5000–10,000 residents per geographical area to reduce erroneous false-positive hits, in which a few cases of disease occurring within a scarcely populated area (e.g., <5000 residents) may artificially inflate the incidence/mortality rate.

**7.** The calculated incidence/mortality rates can be normalized to several variables (such as age, gender, ethnicity) or to a known distribution of relevant disease-specific variables (such as communicable diseases, geographical latitude, socioeconomic status, etc.). This is important to account for potential confounding variables and to highlight trends that can be 'masked' if rates are not normalized in subsequent analyses.

**8.** Conduct proper statistical analysis to determine statistically significant high and low incidence/mortality rates per geographical region at all levels. Two of the most commonly used methods of statistical analysis are the chi-square test (comparing observed number of cases to that expected under an assumed Poisson distribution) and the Knox test for time–space interaction, among more than 70 different methods, which have been used in previously published studies [43].

**9.** Plot the incidence rates in a specialized computer program such as ArcGIS or other geographic information system (GIS) software. Generate several maps, choosing appropriate color schemes representing standardized rates. It may also be advantageous to generate maps representing rates of statistical significance. Maps should serve as a clear, rapid and informative summary of complex geographical information and should help the reader identify interesting trends and generate relevant hypotheses.

**10.** Repeat the mapping analysis (step 9) using different normalized rates. Map the data in different formats and beware of "biased mapping", which was discussed elsewhere [44]. Ensure plotting maps that convey the message clearly and accurately.

**11.** Visualize and further analyze the plotted maps and note the presence of disease clusters ("disease hot-spots") as well as areas of significantly low incidence/mortality rates ("cold-spots"). Observe for interesting trends, particularly if several of these clusters occur geographically side-by-side and are supported by hypotheses/current evidence of disease pathogenesis. It is often useful to compare generated disease maps with land-use maps that can be obtained from local authorities.

132 eHealth - Making Health Care Smarter


**12.** Perform sub-analysis of the identified "disease hot-spots" and correlate with the surrounding environment for any prevalent occupations, exposures, environmental factors, etc. If the patients within the area of high incidence (e.g. within a zip/postal code or a city) demonstrate an additional level of clustering (e.g., living on the same street or up and down the stream or river) it can further strengthen clustering findings and provide clues regarding possible triggers/exposures.
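The rate and significance calculations in steps 5–8 can be sketched in a few lines. The sketch below is illustrative only: all region names, case counts and populations are hypothetical, and the chi-square statistic is computed against Poisson-expected counts without the critical values, degrees of freedom, or multiple-testing corrections a real cluster investigation would require.

```python
# Illustrative sketch of steps 5-8. All region names, case counts and
# populations below are hypothetical, not data from the chapter.

def incidence_rate(new_cases, population, per=100_000):
    """Step 5: incidence rate = new cases / population at risk (per 100,000)."""
    return new_cases / population * per

def eligible(population, minimum=5_000):
    """Step 6: exclude scarcely populated areas that artificially inflate rates."""
    return population >= minimum

def chi_square_statistic(observed, expected):
    """Step 8: chi-square statistic comparing observed case counts with the
    counts expected under an assumed Poisson distribution."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical one-year registry counts per region.
regions = {
    "Region A": {"cases": 14, "population": 80_000},
    "Region B": {"cases": 3, "population": 4_000},  # <5000 residents: excluded
    "Region C": {"cases": 7, "population": 60_000},
}

kept = {n: d for n, d in regions.items() if eligible(d["population"])}
rates = {n: incidence_rate(d["cases"], d["population"]) for n, d in kept.items()}

# Expected counts assume every eligible region shares the overall rate (step 7
# would further standardize by age, gender, etc.).
overall = sum(d["cases"] for d in kept.values()) / sum(d["population"] for d in kept.values())
observed = [d["cases"] for d in kept.values()]
expected = [d["population"] * overall for d in kept.values()]
stat = chi_square_statistic(observed, expected)
```

A full investigation would convert this statistic into a p-value and choose among the many published cluster-detection methods [43] rather than stopping at a single test.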

#### *2.4.2. Limitations and bias*

As illustrated in this chapter, studying the spatial patterns and geographical distribution of diseases has many benefits, including the identification of disease clusters. This can be a powerful tool to help identify disease triggers and to better allocate financial and logistic resources for the management of these medical conditions. When the analysis is conducted properly, the results are often specific. However, as in any type of analysis, one must be aware of the potential limitations and intrinsic biases of the method. When analyzing clusters of patients in a given geographical region, one must be aware that at least some of the observed clusters may be occurring by chance alone. Another important point when studying the incidence of rare diseases in small regions: it is imperative to bracket the population analysis to at least 5000–10,000 residents per geographical area to reduce erroneous false-positive hits. Also, association does not always imply causality; extensive additional field and experimental work must be performed to link identified associations causally with a given disease. Finally, one must be careful when directly comparing different geographic clustering studies, as differences in inclusion criteria, statistical methods or intrinsic differences of the populations at risk can produce divergent results.

#### **3. Conclusions**

The applications of cluster studies in medicine have developed rapidly in recent decades. These studies enable us to focus on risk factors and possible etiologic triggers of rare cancers and other conditions. Furthermore, this work can help inform decisions regarding resource allocation and promote the development of primary prevention programs.

#### **Acknowledgements**

The authors would like to sincerely thank both Dr. Linda Moreau and Dr. Elham Rahme for their generous support and valuable advice.

#### **Conflict of interest**

The authors declare no potential conflicts of interest with respect to the research, authorship, and/or publication of this book chapter.

#### **Author details**

Feras M. Ghazawi<sup>1</sup>, Steven J. Glassman<sup>1</sup>, Denis Sasseville<sup>2</sup> and Ivan V. Litvinov<sup>2</sup>\*

1 Division of Dermatology, University of Ottawa, Ottawa, Ontario, Canada

2 Division of Dermatology, McGill University Health Centre, Montréal, Québec, Canada

\*Address all correspondence to: ivan.litvinov@mcgill.ca

#### **References**

[1] Islami F, Goding Sauer A, Miller KD, Siegel RL, Fedewa SA, Jacobs EJ, et al. Proportion and number of cancer cases and deaths attributable to potentially modifiable risk factors in the United States. CA: A Cancer Journal for Clinicians. 2017;**68**(1):31-54. Epub 2017/11/22

[2] Elliott P, Wartenberg D. Spatial epidemiology: Current approaches and future challenges. Environmental Health Perspectives. 2004;**112**(9):998-1006 Epub 2004/06/17

[3] Rezaeian M, Dunn G, St Leger S, Appleby L. Geographical epidemiology, spatial analysis and geographical information systems: A multidisciplinary glossary. Journal of Epidemiology and Community Health. 2007;**61**(2):98-102 Epub 2007/01/20

[4] Torabi M, Rosychuk RJ. An examination of five spatial disease clustering methodologies for the identification of childhood cancer clusters in Alberta, Canada. Spatial and Spatio-Temporal Epidemiology. 2011;**2**(4):321-330 Epub 2012/07/04

[5] Halliday S. Death and miasma in Victorian London: An obstinate belief. BMJ. 2001;**323**(7327):1469-1471 Epub 2001/12/26

[6] Snow SJ. John Snow: The making of a hero? Lancet. 2008;**372**(9632):22-23 Epub 2008/07/10

[7] Snow J. On the Mode of Communication of Cholera. 2nd ed. London: J. Churchill; 1855. p. 162

[8] Wagner JC, Sleggs CA, Marchand P. Diffuse pleural mesothelioma and asbestos exposure in the north western Cape Province. British Journal of Industrial Medicine. 1960;**17**:260-271 Epub 1960/10/01

[9] McDonald JC. Epidemiology of malignant mesothelioma—An outline. The Annals of Occupational Hygiene. 2010;**54**(8):851-857 Epub 2010/11/10

[10] Geyer L. Über die chronischen Hautveränderungen beim Arsenicismus und Betrachtungen über die Massenerkrankungen in Reichenstein in Schlesien [On the chronic skin changes in arsenicism and observations on the mass illnesses in Reichenstein in Silesia]. Archiv für Dermatologie und Syphilis. 1898;**43**(1):221-280

[11] Schwartz RA. Reichenstein disease. International Journal of Dermatology. 1991;**30**(4):304-305 Epub 1991/04/01


[12] Bogle MA, Riddle CC, Triana EM, Jones D, Duvic M. Primary cutaneous B-cell lymphoma. Journal of the American Academy of Dermatology. 2005;**53**(3):479-484

[13] Suzuki R. Pathogenesis and treatment of extranodal natural killer/T-cell lymphoma. Seminars in Hematology. 2014;**51**(1):42-51

[14] Tsukasaki K, Tobinai K. Human T-cell lymphotropic virus type I-associated adult T-cell leukemia-lymphoma: New directions in clinical research. Clinical Cancer Research: An Official Journal of the American Association for Cancer Research. 2014;**20**(20):5217-5225

[15] Gip L, Nilsson E. Ansamling av mycosis fungoides i Vasternorrlands lan [Clustering of mycosis fungoides in the county of Vasternorrland]. Lakartidningen. 1977;**74**(12):1174-1176 Epub 1977/03/23

[16] Litvinov IV, Tetzlaff MT, Rahme E, Jennings MA, Risser DR, Gangar P, et al. Demographic patterns of cutaneous T-cell lymphoma incidence in Texas based on two different cancer registries. Cancer Medicine. 2015;**4**(9):1440-1447 Epub 2015/07/03

[17] Litvinov IV, Tetzlaff MT, Rahme E, Habel Y, Risser DR, Gangar P, et al. Identification of geographic clustering and regions spared by cutaneous T-cell lymphoma in Texas using 2 distinct cancer registries. Cancer. 2015;**121**(12):1993-2003 Epub 2015/03/03

[18] Moreau JF, Buchanich JM, Geskin JZ, Akilov OE, Geskin LJ. Non-random geographic distribution of patients with cutaneous T-cell lymphoma in the greater Pittsburgh area. Dermatology Online Journal. 2014;**20**(7):pii: 13030/qt4nw7592w. Epub 2014/07/22

[19] Hazen PG, Michel B. Hodgkin's disease and mycosis fungoides in a married couple. Dermatologica. 1977;**154**(5):257-260 Epub 1977/01/01

[20] Hodak E, Klein T, Gabay B, Ben-Amitai D, Bergman R, Gdalevich M, et al. Familial mycosis fungoides: Report of 6 kindreds and a study of the HLA system. Journal of the American Academy of Dermatology. 2005;**52**(3 Pt 1):393-402 Epub 2005/03/12

[21] Litvinov IV, Shtreis A, Kobayashi K, Glassman S, Tsang M, Woetmann A, et al. Investigating potential exogenous tumor initiating and promoting factors for cutaneous T-cell lymphomas (CTCL), a rare skin malignancy. Oncoimmunology. 2016;**5**(7):e1175799 Epub 2016/09/14

[22] Ghazawi FM, Netchiporouk E, Rahme E, Tsang M, Moreau L, Glassman S, et al. Comprehensive analysis of cutaneous T-cell lymphoma (CTCL) incidence and mortality in Canada reveals changing trends and geographic clustering for this malignancy. Cancer. 2017;**123**(18):3550-3567 Epub 2017/05/12

[23] Ghazawi FM, Netchiporouk E, Rahme E, Tsang M, Moreau L, Glassman S, et al. Distribution and clustering of cutaneous T-cell lymphoma (CTCL) cases in Canada during 1992 to 2010. Journal of Cutaneous Medicine and Surgery. 2018 Mar/Apr;**22**(2):154-165. DOI: 10.1177/1203475417745825. Epub 2017/12/16

[24] Lagakos SW, Wessen BJ, Zelen M. An analysis of contaminated well water and health effects in Woburn, Massachusetts. Journal of the American Statistical Association. 1986;**81**:583-596

[25] Burger M, Catto JW, Dalbagni G, Grossman HB, Herr H, Karakiewicz P, et al. Epidemiology and risk factors of urothelial bladder cancer. European Urology. 2013;**63**(2):234-241 Epub 2012/08/11

[26] Frumin E, Velez H, Bingham E, Gillen M, Brathwaite M, LaBarck R. Occupational bladder cancer in textile dyeing and printing workers: Six cases and their significance for screening programs. Journal of Occupational Medicine. 1990;**32**(9):887-890 Epub 1990/09/01

[27] Kurtzke JF. Multiple sclerosis in time and space—Geographic clues to cause. Journal of Neurovirology. 2000;**6**(Suppl 2):S134-S140 Epub 2000/06/29

[28] Bezzini D, Pepe P, Profili F, Meucci G, Ulivelli M, Bartalini S, et al. Multiple sclerosis spatial cluster in Tuscany. Neurological Sciences. 2017;**38**(12):2183-2187. Epub 2017/10/12

[29] Sheremata WA, Poskanzer DC, Withum DG, MacLeod CL, Whiteside ME. Unusual occurrence on a tropical island of multiple sclerosis. Lancet. 1985;**2**(8455):618 Epub 1985/09/14

[30] Schiffer RB, McDermott MP, Copley C. A multiple sclerosis cluster associated with a small, north-central Illinois community. Archives of Environmental Health. 2001;**56**(5):389-395 Epub 2002/01/05

[31] Jin Y, de Pedro-Cuesta J, Soderstrom M, Stawiarz L, Link H. Seasonal patterns in optic neuritis and multiple sclerosis: A meta-analysis. Journal of the Neurological Sciences. 2000;**181**(1-2):56-64 Epub 2000/12/02

[32] Simpson S Jr, Blizzard L, Otahal P, Van der Mei I, Taylor B. Latitude is significantly associated with the prevalence of multiple sclerosis: A meta-analysis. Journal of Neurology, Neurosurgery, and Psychiatry. 2011;**82**(10):1132-1141 Epub 2011/04/12

[33] Koch-Henriksen N, Sorensen PS. The changing demographic pattern of multiple sclerosis epidemiology. The Lancet Neurology. 2010;**9**(5):520-532 Epub 2010/04/20

[34] Russ TC, Batty GD, Hearnshaw GF, Fenton C, Starr JM. Geographical variation in dementia: Systematic review with meta-analysis. International Journal of Epidemiology. 2012;**41**(4):1012-1032 Epub 2012/07/17

[35] Vural H, Demirin H, Kara Y, Eren I, Delibas N. Alterations of plasma magnesium, copper, zinc, iron and selenium concentrations and some related erythrocyte antioxidant enzyme activities in patients with Alzheimer's disease. Journal of Trace Elements in Medicine and Biology. 2010;**24**(3):169-173 Epub 2010/06/24

[36] Chatin A. Recherches sur l'iode des eaux douces; de la présence de ce corps dans les plantes et les animaux terrestres [Research on the iodine of fresh waters; on the presence of this element in terrestrial plants and animals]. Comptes Rendus de l'Académie des Sciences. 1852;**35**:505-517


[37] Leung AM, Braverman LE, Pearce EN. History of U.S. iodine fortification and supplementation. Nutrients. 2012;**4**(11):1740-1746 Epub 2012/12/04

[38] Pearce EN. National trends in iodine nutrition: Is everyone getting enough? Thyroid: Official Journal of the American Thyroid Association. 2007;**17**(9):823-827 Epub 2007/10/25

[39] Markel H. "When it rains it pours": Endemic goiter, iodized salt, and David Murray Cowie, MD. American Journal of Public Health. 1987;**77**(2):219-229 Epub 1987/02/01

[40] Markel H. A grain of salt. The Milbank Quarterly. 2014;**92**(3):407-412 Epub 2014/09/10

[41] Manjunath B, Suman G, Hemanth T, Shivaraj NS, Murthy NS. Prevalence and factors associated with goitre among 6-12-year-old children in a rural area of Karnataka in South India. Biological Trace Element Research. 2016;**169**(1):22-26 Epub 2015/06/13

[42] Gupta RK, Langer B, Raina SK, Kumari R, Jan R, Rani R. Goiter prevalence in school-going children: A cross-sectional study in two border districts of sub-Himalayan Jammu and Kashmir. Journal of Family Medicine and Primary Care. 2016;**5**(4):825-828 Epub 2017/03/30

[43] Wartenberg D. Using disease-cluster and small-area analyses to study environmental justice. In: Toward Environmental Justice: Research, Education, and Health Policy Needs. USA: National Academies Press; 1999. pp. 23-35

[44] Monmonier MS. How to Lie with Maps. 2nd ed. Chicago: University of Chicago Press; 1996. p. 207

**Chapter 9**

#### **Real-Time Tele-Auscultation Consultation Services over the Internet: Effects of the Internet Quality of Service**

Sinchai Kamolphiwong, Thossapon Kamolphiwong, Soontorn Saechow and Verapol Chandeeying

DOI: 10.5772/intechopen.74680

Additional information is available at the end of the chapter

#### **Abstract**

Real-time tele-auscultation over the Internet is an effective medical service that increases the accessibility of healthcare in remote areas. However, the quality of the auscultation sounds transmitted over the Internet is the most critical issue, especially for real-time services; packet loss and packet delay variation are the main factors, and little is known about how they affect transmitted auscultation sounds. In this work, we investigate the effects of packet loss and packet delay variation on heart and lung sounds transmitted over the Internet in real time. We have found that both sounds are more sensitive to packet delay variation than to packet loss, and that lung sounds are more sensitive than heart sounds because their interpretation depends on timing. Different levels of packet loss can be tolerated, e.g., 10% for heart sounds and 2% for lung sounds, and a packet delay variation boundary of 50 msec is recommended. In addition, we have developed a real-time tele-auscultation prototype that aims to minimize packet delay variation. We have found that real-time visualization of the auscultation waveform increases physicians' confidence in interpreting the sounds. Techniques for further quality-of-service improvement are suggested, e.g., noise reduction and user interface (UI) design.

**Keywords:** tele-auscultation, e-stethoscope, e-health, tele-medicine, packet loss and delay variations, heart and lung sounds

#### **1. Introduction**

Quality of healthcare services in rural areas is a critical issue. Most developing countries are actively improving the quality of their healthcare services through short- and long-term policies; increasing healthcare staff and implementing new technologies are two widely adopted example policies [1–3].

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

**2.1. Differences between traditional auscultation and tele-auscultation**

Auscultation is the medical method of listening to sounds inside the patient's body in order to detect and identify abnormal sounds [22, 23]. Tele-auscultation provides a medical examination similar to the traditional one, but the key step beyond traditional auscultation is the mechanism for transmitting auscultation sounds over long distances; Internet technology and the electronic stethoscope are the key drivers of the tele-auscultation service. Moreover, the experience of tele-auscultation may differ from the traditional one: it is not face-to-face, and the body sound sent from the remote site may be delayed. This awareness should be raised when using the service. **Table 1** summarizes the differences between the traditional stethoscope and the e-stethoscope.

| Properties | Electronic stethoscope | Acoustic stethoscope |
|---|---|---|
| Record and playback | Yes/no | No |
| Volume control & amplification | Yes | No |
| Noise reduction | Yes | No |
| Power supply | Batteries | No |
| Transmission technologies | Bluetooth, RF, USB | No |
| Chest piece | Single side, tunable, button and screen | |
| Sound signal | Digital | Analog |
| Tubing | Similar | Similar |
| Headset | | |

**Table 1.** Comparison between electronic stethoscope and acoustic stethoscope.

**2.2. Compositions of tele-auscultation system**

Real-Time Tele-Auscultation Consultation Services over the Internet: Effects of the Internet…

http://dx.doi.org/10.5772/intechopen.74680

141

In our study, tele-auscultation can be summarized in three main compositions: stethoscope, client application, and application server. In terms of processes, they are: capturing, processing, transmission, and display. The system overview of tele-auscultation structure is shown in **Figure 1**.

In **Figure 1**, the first component is stethoscope. It is an instrument to capture the sounds in the body, and is used widely in auscultation process [24, 25]. Currently, an e-stethoscope (electronic stethoscope) is a new generation of auscultation's device. It can improve the quality of auscultation sounds [26], as summarized in **Table 1**, and it is a source for transmitting the sounds from the body to the next component. The next component, client application, is software that captures the sounds sent by e-stethoscope. There are two techniques of sound capturing: the modern one is receiving directly from e-stethoscope via the transmission module embedded in the device [27–30], and the older one is receiving from stethoscope's ear tip via audio input device such as microphone [31]. Moreover, the client application is the main processing function for the auscultation's sounds such as improving sound's quality, sounds volume control, encoding, filtering, waveform rendering, transmitting/receiving sound between cooperated components, and sound play out. The last component, application server, is a center for managing the

Single side tunable, dual side tunable (bell and

diaphragm)

Generally, people who live in rural areas receive healthcare services at a primary care unit as their first choice. However, rural healthcare units have limitations in infrastructure, healthcare staff, and advanced medical equipment, so patients are referred to secondary or tertiary care units, which incurs travel costs and wasted time. Effective care at primary care units is one of the significant keys to improving the quality of healthcare services in rural areas. Telemedicine is one of the main enablers: telemedicine applications can enhance the accessibility of healthcare services through collaboration between a primary care unit and a cooperating unit [4–8].

Interactive consultation over the Internet is a cost-effective channel between a physician and a specialist. Real-time applications such as Skype [9–12], Google Hangouts [13, 14], FaceTime [15, 16], and WebRTC [17–19] can give participants an experience close to face-to-face communication. However, real-time consultation between a physician and a specialist requires specific information and equipment depending on the consultation topic, such as a stethoscope for listening to body sounds during the auscultation process. When data are transmitted over the Internet, loss and delay can occur, and real-time applications are especially sensitive to them. Packet loss [20] and packet delay variations [21] are the two main factors that reduce sound quality in real-time communication, e.g., for body sounds from the stethoscope.

In our work, we investigate the effects of packet loss and delay on e-stethoscope sounds in a real-time tele-auscultation system over the Internet.

Significance: Little is known about the effects of packet loss and delay on real-time tele-auscultation systems over the Internet. This work investigates these effects on lung and heart sounds in depth, to identify the impacts and factors that influence the quality of tele-auscultation services. We discuss which effects occur, their causes, and the results of varying the amount of packet loss, the delay variations, and the types of lung and heart sounds; these sounds behave significantly differently from human conversation sounds. We suggest bounds on packet loss and packet delay variation that meet a given confidence level of sound interpretation, which can increase the success rate and the effectiveness of outcomes.

The rest of this chapter is organized as follows: Section 2 gives an overview of tele-auscultation, including the differences between traditional auscultation and tele-auscultation, the system components, and the types of services. Section 3 presents our prototype system design and development. Section 4 analyzes the effects of packet loss and packet delay variations on heart and lung sounds over the Internet. Finally, we conclude our work in Section 5.

#### **2. Overview of tele-auscultation**

Tele-auscultation is a system that provides remote auscultation at another location. The main challenge for this service is to find a suitable mechanism for transmitting auscultation sounds over the Internet effectively, with acceptable sound quality, together with the related supporting systems.

#### **2.1. Differences between traditional auscultation and tele-auscultation**

Auscultation is the medical method of listening to sounds inside the patient's body to detect and identify abnormal sounds [22, 23]. Tele-auscultation provides a medical examination similar to the traditional one; the key step beyond traditional auscultation is the mechanism for transmitting the auscultation sounds over a long distance. Internet technology and the electronic stethoscope are the keys driving the tele-auscultation service. Moreover, the experience of tele-auscultation may differ from the traditional one: it is not face-to-face, and the body sounds sent from the remote site may be delayed. This awareness should be raised when using the service. **Table 1** summarizes the differences between the traditional stethoscope and the e-stethoscope.

#### **2.2. Compositions of tele-auscultation system**


In our study, a tele-auscultation system comprises three main components: the stethoscope, the client application, and the application server. In terms of processes, they perform capturing, processing, transmission, and display. The overall tele-auscultation structure is shown in **Figure 1**.

**Figure 1.** Overview of tele-auscultation architecture.

In **Figure 1**, the first component is the stethoscope, an instrument for capturing sounds inside the body that is widely used in the auscultation process [24, 25]. The e-stethoscope (electronic stethoscope) is the current generation of auscultation device; it can improve the quality of auscultation sounds [26], as summarized in **Table 1**, and it is the source that transmits the body sounds to the next component. The next component, the client application, is software that captures the sounds sent by the e-stethoscope. There are two capture techniques: the modern one receives the sound directly from the e-stethoscope via the transmission module embedded in the device [27–30], and the older one records it from the stethoscope's ear tip via an audio input device such as a microphone [31]. The client application also performs the main processing of the auscultation sounds, such as improving sound quality, volume control, encoding, filtering, waveform rendering, transmitting/receiving sound between cooperating components, and sound play-out. The last component, the application server, is a center for managing sessions, signaling, user accounts, and sound forwarding (for the client/server service model). It should be noted that, apart from the client/server model, a peer-to-peer service model may be considered; in that case the server has fewer responsibilities, being used for communication establishment rather than for media streaming.

#### **2.3. Types of tele-auscultation services**

Synchronous and asynchronous (store-and-forward) are the two main communication types that characterize tele-auscultation services. A synchronous service is an interactive communication between participants [27–31]: auscultation sounds in the live session must be sent and played out immediately. An asynchronous service instead stores auscultation sounds in a middleware first, and participants request and receive the data later [28, 29]. With today's high-speed Internet, an asynchronous service may add only a small delay when sending stored auscultation sounds to the remote site. However, because a synchronous service is a live session, the physician can immediately request different auscultation sounds (from different body positions) from a patient or healthcare staff, improving the level of healthcare service. For example, the physician may ask the healthcare staff to move the stethoscope up/down/left/right from the current position, to follow up the result of the sound interpretation, or to capture a better-quality recording of an unclear sound. This makes the service quality of a real-time tele-stethoscope much better than that of an asynchronous one, but the service requires a certain level of QoS, e.g., high link capacity. In conclusion, both service types have significantly different impacts on tele-auscultation, and the choice depends on the purpose of use and the practice scenarios of the healthcare service.

#### **2.4. Communication models**

Client/server and peer-to-peer are the two widely used communication models for sharing media between a source unit and a destination unit. The peer-to-peer model has no central server for managing the media stream; each node communicates directly with the others, which minimizes processing time at a central server as well as communication link delay. Conversely, the client/server model has a central server that manages almost everything, e.g., all information must pass through the server first; for time-based information sharing, the client/server model is useful. For tele-auscultation services, both models are used: the client/server-based model [29, 30] and the peer-to-peer-based model [28].

## **3. Design and development of a real-time tele-auscultation**

We designed and developed a real-time tele-auscultation application covering both communication models, peer-to-peer and client/server. The system consists of two main components. The first is the client application, which includes the stethoscope controller, session controller, real-time audio waveform, and audio player. The other is the application server, which comprises account management and session management; the server is used for user authentication and session initiation between two client applications. Further details follow:

#### **3.1. Electronic stethoscope**


For the prototype demonstration, a 3M™ Littmann® Electronic Stethoscope Model 3200 [32] is used. The e-stethoscope provides digital audio in linear pulse-code modulation (LPCM) format, with a 4000 Hz sampling rate and 16 bits per sample.

As 4000 samples per second is not a standard voice sampling rate, we up-sample to 8000 samples per second before placing the audio in Real-time Transport Protocol (RTP) packets [34]. As a result, the sound quality is improved, but the required bandwidth is doubled.
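The up-sampling step can be sketched as follows. This is a minimal illustration using linear interpolation between adjacent samples; the chapter does not specify which resampling method the prototype actually uses:

```python
def upsample_2x(samples):
    """Double the sampling rate of 16-bit LPCM samples by linear
    interpolation: insert the midpoint between each adjacent pair."""
    out = []
    for i, s in enumerate(samples):
        out.append(s)
        nxt = samples[i + 1] if i + 1 < len(samples) else s
        out.append((s + nxt) // 2)  # interpolated midpoint sample
    return out

# A 50 msec block captured at 4000 Hz (200 samples) becomes
# 400 samples at 8000 Hz.
block_4k = [0, 100, 200, 300]
print(upsample_2x(block_4k))  # [0, 50, 100, 150, 200, 250, 300, 300]
```

Since every input sample yields two output samples, the payload size doubles, which is the bandwidth increase noted above.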

#### **3.2. Auscultation's sound capturing**

The stethoscope controller connects to the e-stethoscope over the Bluetooth stack to capture the auscultation sound and to handle optional messages between the e-stethoscope and the other components. In our experiment, we captured the auscultation sound every 50 msec, which produces 400 bytes of audio data per packet. These audio packets are transmitted to the destination node over the Internet for play-out at the destination.
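The 400-byte payload follows directly from the capture parameters (50 msec of 16-bit LPCM at the stethoscope's native 4000 Hz rate), which can be checked in a few lines:

```python
SAMPLE_RATE_HZ = 4000    # e-stethoscope native LPCM rate
BYTES_PER_SAMPLE = 2     # 16 bits per sample
INTERVAL_SEC = 0.05      # 50 msec capture interval

samples_per_packet = int(SAMPLE_RATE_HZ * INTERVAL_SEC)  # 200 samples
payload_bytes = samples_per_packet * BYTES_PER_SAMPLE    # 400 bytes
print(samples_per_packet, payload_bytes)  # 200 400
```

After up-sampling to 8000 Hz, the same 50 msec interval carries 800 bytes, consistent with the bandwidth doubling mentioned in Section 3.1.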

#### **3.3. Session and audio transmission**

Handshaking is the key signaling mechanism in peer-to-peer networks. Our prototype uses the Session Initiation Protocol (SIP) [33] and non-SIP (Web-based) signaling as the initial protocols for establishing a real-time session between participants. Once the connection is completed, the auscultation sounds are conveyed in RTP packets; the central server does not need to participate in the media session.
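As a sketch of how one 50 msec audio block might be wrapped in an RTP packet, the following builds the 12-byte fixed RTP header defined in RFC 3550. The payload type value 96 is an arbitrary dynamic-range placeholder chosen for illustration, not a value taken from the chapter:

```python
import struct

def rtp_packet(payload, seq, timestamp, ssrc, payload_type=96):
    """Build an RTP packet: 12-byte fixed header (RFC 3550) + payload.
    version=2, no padding/extension/CSRC, marker bit clear."""
    byte0 = 2 << 6                   # V=2, P=0, X=0, CC=0
    byte1 = payload_type & 0x7F      # M=0, PT=96 (dynamic)
    header = struct.pack("!BBHII", byte0, byte1,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)
    return header + payload

# One 50 msec LPCM block (400 bytes) plus the 12-byte header.
pkt = rtp_packet(b"\x00" * 400, seq=1, timestamp=0, ssrc=0x1234)
print(len(pkt))  # 412
```

The receiver unpacks the sequence number and timestamp from the header to detect loss and reorder packets before play-out.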

## **4. Effects of packet loss and packet delay variations**

In this section, we present a study and analysis of auscultation sound quality as affected by packet loss and packet delay variations in real-time communication over the Internet.

In our work, we focused on the characteristics of two body sounds: cardiac auscultation and lung auscultation. Cardiac auscultation is a method for screening heart sounds and heart murmurs [35]. In the cardiac auscultation process, patient positions (left lateral recumbent, sitting, and supine) and body locations (aortic, pulmonic, tricuspid, and mitral areas) affect the quality of the heart sounds; particular positions and locations are important for listening to specific heart sounds or murmurs [36–38]. Heart murmurs are critical sounds related to valvular heart disease; intensity (grade), pitch, timing, and location describe the characteristics of each murmur [36–38]. Lung auscultation is a method for screening abnormal sounds over the lung areas on the front and back of the patient's body [39]. Listening to the sounds of inspiration and expiration, together with comparing the intensity and pitch of each breath, is the fundamental process for diagnosing lung sounds [39, 40].

As noted above, a real-time auscultation service over the Internet should be feasible and reliable. Packet loss and packet delay variations are the critical factors that significantly damage heart and lung sound quality.

#### **4.1. The packet loss and packet delay variation generator**

We developed software that can generate different levels of packet loss and packet delay variation patterns in order to study and analyze their effects on heart and lung sound quality. The software components are described in **Figure 2**.

• Sender: It sends the auscultation sound with a packet size of 400 bytes and a 50 msec packet interval. The three original heart sounds [41] and five lung sounds [42] listed in **Table 2** are observed.

• Controller: It is the component that generates the patterns of packet loss and packet delay variation. The loss patterns were generated with the Gilbert-Elliott model [43–45] with 2, 5, 10, and 20% loss values. The packet delay variation patterns were generated from a Poisson distribution [46] with 50, 60, and 70 msec time delays.

• Receiver: It is the component that converts the received packets to the play-out format (LPCM, 4000 Hz, 16 bits, one mono channel), with a controlled jitter buffer.

**Figure 2.** Software components for packet loss and packet delay variations generator.

| Heart sounds | Lung sounds |
|---|---|
| 1. Normal heart sound (75 bpm) | 1. Coarse crackles (27 bpm) |
| 2. Early systolic murmur (75 bpm) | 2. Inspiratory stridor (23 bpm) |
| 3. Pan-systolic murmur (75 bpm) | 3. Normal vesicular (16 bpm) |
| | 4. Pleural friction (19 bpm) |
| | 5. Wheezing (27 bpm) |

**Table 2.** The property of heart sounds and lung sounds.

The waveforms of each sound are shown in **Figures 3** and **4**.

**Figure 3.** The original heart sounds: (1) early systolic murmur, (2) normal heart sound, and (3) pan-systolic murmur.

**Figure 4.** The original lung sounds: (1) coarse crackles, (2) inspiratory stridor, (3) normal vesicular, (4) pleural friction, and (5) wheezing.

In this experiment, we also measured the packet loss and delay variations across different network services. The Ethernet network was the Intranet at our experiment site, which has large bandwidth; the 3G/4G network was a service provided by local mobile operators; and the ADSL connection in the remote area was the Internet access provided by a local service provider. We collected measurements for a week at different times. **Table 3** shows the average packet loss and delays when information was sent across the different network services. We noticed that the Intranet gave the lowest packet loss and delays, while ADSL in the remote area gave the highest packet loss and largest delays.

| Network types (sender → receiver) | Packet loss range (%) | Packet loss average (%) | Packet delay variation range (msec) | Packet delay variation average (msec) |
|---|---|---|---|---|
| 1) Ethernet → remote area via 3G/4G | 0−3 | 2 | 10−70 | 55 |
| 2) Ethernet → Ethernet | 0 | 0 | 0−5 | 2 |
| 3) Ethernet → remote area via ADSL | 0−30 | 20 | 10−150 | 70 |

**Table 3.** Packet loss and delays between different network connection services.
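A loss-pattern generator of the kind described for the Controller can be sketched as a two-state Gilbert-Elliott model. The state-transition probabilities below are illustrative placeholders, not the values used in the chapter; a delay variation pattern could be drawn analogously from a Poisson model:

```python
import random

def gilbert_elliott_losses(n_packets, p_gb=0.05, p_bg=0.5,
                           loss_good=0.0, loss_bad=1.0, seed=1):
    """Two-state Gilbert-Elliott loss pattern: in the Good state packets
    survive, in the Bad state they are lost in bursts. Returns a list of
    booleans (True = packet lost)."""
    rng = random.Random(seed)
    bad = False
    pattern = []
    for _ in range(n_packets):
        if bad:
            lost = rng.random() < loss_bad
            bad = rng.random() >= p_bg   # stay in the Bad state?
        else:
            lost = rng.random() < loss_good
            bad = rng.random() < p_gb    # slip into the Bad state?
        pattern.append(lost)
    return pattern

# One minute of 50 msec packets = 1200 packets.
pattern = gilbert_elliott_losses(1200)
print(round(100 * sum(pattern) / len(pattern), 1), "% lost")
```

Tuning `p_gb` and `p_bg` sets both the average loss rate (their stationary ratio) and the burstiness, which is why this model is preferred over independent random drops for network experiments.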

#### **4.2. Distortions of heart and lung sounds**

Signal distortion is the alteration in the pulse of heart sound and the breath on lung sound. In our experiment method, we summarize the number of distortion pulses and breaths in a minute duration. Sample pulses of heart sound and breath of lung sound are shown in **Figures 5** and **6,** respectively.

#### *4.2.1. Packet loss*

In packet loss experiment, the heart and lung sound with 2, 5, 10, and 20% of packet loss are used.

Of replication five times, the signal distortions of three heart sounds and five lung sounds by comparing the pulse and breathing of each sound with its original sound. The following results are given:

**Figure 7** shows the results of packet loss by varying from 2, 5, 10, and 20% for pan-systolic murmur. We can see that shape and position losses are randomly occurred from time to time where a higher value of packet loss gives more damage of shape and position. We tested all other given heart and lung sounds in **Table 2**.

**Figure 8** shows sound damage positions of pan-systolic murmur randomly from five times of experiment where 20% of packet loss is applied. We can see that shape and position losses are randomly occurred from time to time. This will make the receiver node in hard condition for the result interpretation.

Of replication five times, with 2, 5, 10, and 20% of packet loss, the distortions are summarized in **Table 4**. The figures in the table are the percentage of distortions of each sound beats.

Real-Time Tele-Auscultation Consultation Services over the Internet: Effects of the Internet…

http://dx.doi.org/10.5772/intechopen.74680


eHealth - Making Health Care Smarter

**4.2. Distortions of heart and lung sounds**

Signal distortion is an alteration of the pulses of a heart sound or the breaths of a lung sound. In our experiments, we count the number of distorted pulses and breaths over a one-minute duration. Sample pulses of a heart sound and breaths of a lung sound are shown in **Figures 5** and **6**, respectively.

**Figure 5.** The pulse of a heart sound.

**Figure 6.** The breath of a lung sound.

Three network connection scenarios were tested:

1) Sender: Ethernet network; Receiver: remote area via 3G/4G

2) Sender: Ethernet network; Receiver: Ethernet network

3) Sender: Ethernet network; Receiver: remote area via ADSL

| Network type | Packet loss (%): range | Packet loss (%): average | Packet delay variation (msec): range | Packet delay variation (msec): average |
|---|---|---|---|---|
| 1) Ethernet to remote 3G/4G | 0−30 | 20 | 10−150 | 70 |
| 2) Ethernet to Ethernet | 0 | 0 | 0−5 | 2 |
| 3) Ethernet to remote ADSL | 0−3 | 2 | 10−70 | 55 |

**Table 3.** Packet loss and delays between different network connection services.

*4.2.1. Packet loss*

In the packet loss experiment, the heart and lung sounds were tested with 2, 5, 10, and 20% packet loss. Over five replications, the signal distortions of the three heart sounds and five lung sounds were measured by comparing the pulses and breaths of each sound with its original. The following results were obtained:

**Figure 7** shows the results of varying packet loss over 2, 5, 10, and 20% for the pan-systolic murmur. Shape and position losses occur randomly from time to time, and a higher packet loss causes more damage to shape and position. We tested all the other heart and lung sounds given in **Table 2** in the same way.

**Figure 7.** Packet loss varying from 2, 5, 10, and 20% for pan-systolic murmur.

**Figure 8** shows the positions of sound damage for the pan-systolic murmur across five experiment runs at 20% packet loss. Shape and position losses occur randomly from run to run, which puts the receiver node in a difficult position for interpreting the result.

**Figure 8.** Random sound distortions and positions of pan-systolic murmur at 20% packet loss.

Over five replications with 2, 5, 10, and 20% packet loss, the distortions are summarized in **Table 4**. The figures in the table are the percentage of distorted beats for each sound.

| Sounds | 2% | 5% | 10% | 20% |
|---|---|---|---|---|
| *Heart sounds* | | | | |
| Early systolic murmur | 3 | 13 | 28 | 48 |
| Heart normal | 3 | 16 | 36 | 43 |
| Pan-systolic murmur | 6 | 24 | 27 | 48 |
| *Lung sounds* | | | | |
| Coarse crackles | 11 | 25 | 68 | 86 |
| Inspiratory stridor | 13 | 48 | 71 | 91 |
| Normal vesicular | 23 | 27 | 69 | 94 |
| Pleural friction | 25 | 46 | 68 | 98 |
| Wheezing | 20 | 37 | 79 | 85 |

**Table 4.** Percent of distortions among various heart and lung sounds on each level of packet loss.

From **Table 4**, increasing packet loss is accompanied by short-range distortions on heart sounds, but fluctuating, long-range distortions on lung sounds. For all heart sounds at all packet loss levels, the percentage of distortion is below 50%. For lung sounds, distortion is below 50% at 2 and 5% packet loss but rises above 70% when packet loss reaches 10 and 20%.
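The loss rates above were injected at fixed levels for experimental control; real Internet losses arrive in bursts, which is why the literature we cite models them with the two-state Gilbert-Elliott chain [43-45]. A minimal sketch of that model (the transition probabilities are arbitrary example values, not fitted to our traces):

```python
import random

def gilbert_elliott(n_packets, p_good_to_bad, p_bad_to_good, seed=1):
    """Two-state Gilbert-Elliott loss model: packets are delivered in the
    'good' state and dropped in the 'bad' state; the two transition
    probabilities control the burstiness of the loss process."""
    rng = random.Random(seed)
    bad = False
    lost = []
    for _ in range(n_packets):
        p_switch = p_bad_to_good if bad else p_good_to_bad
        if rng.random() < p_switch:
            bad = not bad
        lost.append(bad)
    return lost

losses = gilbert_elliott(10_000, p_good_to_bad=0.05, p_bad_to_good=0.45)
rate = sum(losses) / len(losses)
print(f"overall loss rate: {rate:.1%}")  # stationary value: 0.05/(0.05+0.45) = 10%
```

With these parameters the long-run loss rate is about 10%, but the losses cluster into bursts a few packets long, which is exactly the pattern that removes whole pulses or breaths at once.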

#### *4.2.2. Packet delay variations*

In the packet delay variations experiment, the heart and lung sounds were tested at three levels of average packet delay variation: 50, 60, and 70 ms. Each delay level was tested five times. **Figure 9** shows sample results.

**Figure 9.** The result of packet delay variations for pan-systolic murmur.
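To see why larger delay variation hurts, consider a receiver that must play each 50-ms packet by a fixed deadline: every packet whose extra delay exceeds the deadline leaves a split in the reconstructed waveform. A toy model of this effect (the exponential delay distribution and the 100-ms deadline are assumptions for illustration, not our prototype's behavior):

```python
import random

def split_positions(n_packets, mean_jitter_ms, deadline_ms, seed=1):
    """Count packets whose random extra delay exceeds the playout
    deadline; each such packet splits the reconstructed sound."""
    rng = random.Random(seed)
    return sum(
        1 for _ in range(n_packets)
        if rng.expovariate(1.0 / mean_jitter_ms) > deadline_ms
    )

# One minute of audio = 1200 packets of 50 ms each.
for jitter in (50, 60, 70):  # the three tested average delay variations
    n = split_positions(1200, jitter, deadline_ms=100)
    print(f"{jitter} ms average jitter -> {n} splits per minute")
```

Even under this crude model, the number of splits grows quickly as the average delay variation rises from 50 toward 70 ms, consistent with the sharp increase in distortion reported below.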

Over five replications, the distortions of the three heart sounds and five lung sounds, measured by comparing the pulses and breaths of each sound with its original, are shown in **Table 5**.

| Sounds | 50 msec | 60 msec | 70 msec |
|---|---|---|---|
| *Heart sounds* | | | |
| Early systolic murmur | 3 | 83 | 100 |
| Heart normal | 3 | 72 | 100 |
| Pan-systolic murmur | 0 | 100 | 100 |
| *Lung sounds* | | | |
| Coarse crackles | 4 | 100 | 100 |
| Inspiratory stridor | 0 | 100 | 100 |
| Normal vesicular | 13 | 100 | 100 |
| Pleural friction | 5 | 100 | 100 |
| Wheezing | 15 | 100 | 100 |

**Table 5.** Percent of distortion among various heart and lung sounds on each level of packet delay variations.

From **Table 5**, packet delay variations of 50 msec cause only short-range distortion on both heart and lung sounds, while 60 to 70 msec of delay variation causes high distortion on both. Even a small increase in delay variation, e.g., from 50 to 60 msec, significantly impacts the sound distortion.

#### **4.3. Evaluation of sound quality by assessor: packet loss**

Ten medical professionals (doctors and nurses), each with at least 2 years of auscultation experience, participated in this evaluation. All tests were blinded: each assessor listened to the three heart sounds and five lung sounds at packet losses of 2, 5, 10, and 20% without knowing which sound was being played, then indicated the type of sound together with a confidence level. The results are shown in **Table 6**. Most assessors could detect the normal heart, pan-systolic murmur, early systolic murmur, and normal vesicular sounds when a small packet loss (e.g., less than 10%) was applied. More than 90% correctly detected all heart sounds at 2−20% packet loss. The percentage of correct detection for lung sounds depends on the type of lung sound and the level of packet loss.

| Sounds | 2% | 5% | 10% | 20% |
|---|---|---|---|---|
| *Heart sounds* | | | | |
| Heart normal | 100 | 100 | 100 | 90 |
| Pan-systolic murmur | 100 | 100 | 100 | 90 |
| Early systolic murmur | 100 | 90 | 90 | 90 |
| *Lung sounds* | | | | |
| Normal vesicular | 100 | 100 | 90 | 60 |
| Wheezing | 100 | 90 | 80 | 30 |
| Coarse crackles | 70 | 60 | 40 | 70 |
| Inspiratory stridor | 90 | 80 | 50 | 70 |
| Pleural friction | 60 | 60 | 50 | 50 |

**Table 6.** Percent of correct detection on each sound.

#### **4.4. Analysis of the effects of packet loss and packet delay variations**

From the experiments on packet loss and packet delay variations, we analyzed the results as follows:



**Figure 10.** Sound discontinuity by packet loss, (1) early systolic murmur and (2) inspiratory stridor.

**Figure 11.** Sound discontinuity by packet delay variations, (1) early systolic murmur and (2) inspiratory stridor.

#### *4.4.1. Sound missing and splitting*

The auscultation method requires a continuous sound in each cycle to recognize rhythm, pitch, and intensity. Packet loss and packet delay variations destroy this continuity in different, random patterns: packet loss removes some positions of the sound, as shown in **Figure 10**, while packet delay variation splits the sound at some positions, as shown in **Figure 11**.

Sound missing occurs in two patterns: (1) a whole pulse or breath missing, and (2) parts of a pulse or breath missing, as shown in **Figure 12**. A whole missing pulse or breath is caused by burst loss during transmission and damages the heart or lung sound over a narrow range. Missing parts of pulses or breaths are caused by irregular loss patterns during transmission and damage the sound over a wide range. Increasing the packet loss level does not determine which pattern of discontinuity occurs; it only increases the amount of missing sound.

**Figure 12.** Behaviors of sound discontinuity on early systolic murmur after packet loss occurred, (1) a pulse and (2) some part of pulse.
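The two missing patterns can be reproduced in a toy model: divide the stream into 50-ms segments, let each heart pulse occupy a few consecutive segments (the pulse geometry here is an assumption for illustration), and compare a burst loss with the same number of scattered losses:

```python
def classify_pulses(lost, n_pulses=60, pulse_len=4, period=20):
    """Given the indices of lost 50-ms segments, count pulses that are
    fully missing (pattern 1) versus partially missing (pattern 2).
    Pulse geometry (4 segments every 20, i.e. one pulse per second) is
    an assumption for illustration."""
    full = partial = 0
    for p in range(n_pulses):
        segs = range(p * period, p * period + pulse_len)
        hits = sum(1 for s in segs if s in lost)
        if hits == pulse_len:
            full += 1
        elif hits > 0:
            partial += 1
    return full, partial

burst = set(range(40, 44))      # 4 consecutive lost segments: one whole pulse
scattered = {0, 21, 41, 80}     # same number of losses, spread out
print(classify_pulses(burst))      # → (1, 0): pattern 1, narrow-range damage
print(classify_pulses(scattered))  # → (0, 4): pattern 2, wide-range damage
```

The same total loss thus either erases one pulse completely or chips parts off several pulses, matching the narrow-range versus wide-range damage described above.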

#### *4.4.2. Pulse transformation*


Transformation is an effect caused by packet loss: when the sound is missing in some positions, such as the murmur shape, the sound may transform into another type. Pulse transformation does not happen on every pulse. Some examples from our analysis: an early systolic murmur transforms into a normal heart sound when the murmur shape is lost but S1 and S2 remain, as shown in **Figure 13(a)**. An early systolic murmur transforms into a mid-systolic murmur when the beginning of the murmur shape is lost (**Figure 13(b)**). A pan-systolic murmur transforms into a normal heart sound when the whole murmur shape is lost (**Figure 13(c)**), into a late systolic murmur when the first half of the murmur shape is lost (**Figure 13(d)**), into an early systolic murmur when the second half is lost (**Figure 13(e)**), and into a mid-systolic murmur when the beginning and tail of the murmur shape are lost (**Figure 13(f)**).

**Figure 13.** Samples of pulse transformations. (a) Early systolic murmur to heart normal: (1) early systolic murmur, (2) heart normal; (b) early systolic murmur to mid-systolic murmur: (1) early systolic murmur, (2) mid-systolic murmur; (c) pan-systolic murmur to heart normal: (1) pan-systolic murmur, (2) heart normal; (d) pan-systolic murmur to late systolic murmur: (1) pan-systolic murmur, (2) late systolic murmur; (e) pan-systolic murmur to early systolic murmur: (1) pan-systolic murmur, (2) early systolic murmur; (f) pan-systolic murmur to mid-systolic murmur: (1) pan-systolic murmur, (2) mid-systolic murmur.
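The pan-systolic cases amount to a small lookup from the lost portion of the murmur shape to the sound an assessor is likely to hear; sketched below (the portion labels are informal names for this illustration, not code from our prototype):

```python
# Hypothetical lookup distilled from the pulse-transformation analysis:
# the portion of a pan-systolic murmur shape removed by packet loss
# determines which sound type is likely to be perceived instead.
PAN_SYSTOLIC_TRANSFORMS = {
    "whole shape": "heart normal",
    "first half": "late systolic murmur",
    "second half": "early systolic murmur",
    "beginning and tail": "mid-systolic murmur",
}

def perceived_type(lost_portion):
    """Return the sound type an assessor would likely report."""
    return PAN_SYSTOLIC_TRANSFORMS.get(lost_portion, "pan-systolic murmur")

print(perceived_type("first half"))  # → late systolic murmur
```

The point of the lookup is the clinical risk: every entry maps an abnormal sound to a different, plausible-sounding diagnosis, so the listener has no local cue that a transformation occurred.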

## **5. Design for improving the quality of service**

Voice quality factors have long been documented, e.g., in ITU guidelines and standards [47]; in tele-medicine applications, however, some techniques need to be re-applied for this particular situation. The following design and implementation points should be considered:


• ITU provides packet loss concealment (PLC) techniques for digital voice communications. However, waveform substitution (one PLC technique) may not be appropriate here; zero insertion or silence insertion (another PLC technique) is more suitable.

• Jitter buffer: a jitter adaptation technique is deployed to reduce the effect of delay fluctuation caused by the late or early arrival of voice packets (**Figure 14**). This helps the receiver end hear the sound at a more comfortable level. However, because the communication session is real-time, the delay-absorption buffer must be limited; a small jitter buffer of a few hundred milliseconds can be used. In our experiment, 500 msec of buffering (10 packets) seemed to be good enough; varying this figure is a traffic-engineering choice.

• Noise removal: as mentioned, the device is operated remotely, and we noticed that moving the stethoscope creates a lot of noise, which is uncomfortable for the remote side. We applied a two-stage noise-filtering technique: the first stage examines noise within a single voice packet (a 50 msec interval), while the second stage evaluates the average noise energy over three consecutive packets. This helps the doctor at the remote side work conveniently and comfortably.

**Figure 14.** Jitter adaptation for time delay variation reduction.

We have shown above that packet loss and delay affect the quality of hearing; as a result, symptom determination may become hesitant. According to our prototype testing, most physicians at the remote side are happy to have the e-stethoscope signal shown on the screen; once they are familiar with it, it improves their interpretation confidence. We tested a UI with packet loss and delay indicators, as shown in **Figure 15**, to raise the doctor's awareness during interactive operation. The levels of packet loss and delay can be noticed easily: green means no (or very little, e.g., 1%) packet loss and delay, yellow means a little (e.g., 5%), and red means high packet loss and delay (e.g., more than 10%). We conclude that this UI design raises awareness of the doctor's confidence level. Moreover, the packet loss and delay indication levels can be adjusted according to the doctor's experience.

**Figure 15.** Real-time e-stethoscope signal with packet loss and packet delay indicators.
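The jitter-buffer design above can be sketched as a small reordering buffer in front of playout (sequence-numbered packets with a 10-packet depth for 500 ms of buffering; this is a simplification, not the prototype's adaptive implementation):

```python
PACKET_MS = 50          # one voice packet covers 50 ms of sound
BUFFER_PACKETS = 10     # 10 packets = 500 ms of buffering, as in our tests

def playout(packets_in_arrival_order):
    """Reorder sequence-numbered packets with a small fixed jitter buffer.
    The buffer absorbs out-of-order arrivals up to its depth; if a gap
    persists beyond the buffer depth, playout skips over it."""
    buf, played, next_seq = [], [], 0
    for pkt in packets_in_arrival_order:
        buf.append(pkt)
        buf.sort(key=lambda p: p[0])
        if len(buf) > BUFFER_PACKETS:   # deadline passed: give up on the gap
            next_seq = buf[0][0]
        while buf and buf[0][0] == next_seq:
            played.append(buf.pop(0))
            next_seq += 1
    return played

# Packets 0-5 arrive with 2 and 3 swapped; the buffer restores the order.
arrivals = [(0, "a"), (1, "b"), (3, "d"), (2, "c"), (4, "e"), (5, "f")]
print([seq for seq, _ in playout(arrivals)])  # → [0, 1, 2, 3, 4, 5]
```

The buffer depth is the engineering trade-off named above: a deeper buffer absorbs more delay variation but adds end-to-end latency to the real-time session.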


#### **6. Conclusion**

This work studies the effects of packet loss and packet delay variations on real-time tele-auscultation services over the Internet. We categorize the communication models of tele-auscultation services as asynchronous or synchronous, and as client/server or peer-to-peer. Some important components of tele-auscultation are drawn out with a prototype software demonstration. We then focus on the characteristics of two body sounds, cardiac and lung auscultation, when these sounds are transmitted over the Internet in real-time applications.

From our experimental results, verified by medical professional staff, we found that both sounds are more sensitive to packet delay variations than to packet loss. Lung sound is more sensitive than heart sound because its interpretation depends on timing, i.e., on recognizing rhythm, pitch, and intensity. Different levels of packet loss can be tolerated for the two sounds, e.g., 10% for heart sounds and 2% for lung sounds; however, a packet delay variation boundary of 50 msec is recommended. Based on our analysis, sound missing and splitting, and pulse transformation, are the two factors that affect sound quality. Pulse transformation may lead to misinterpretation of abnormal sounds. We also found that distinguishing normal sounds is more accurate than distinguishing abnormal sounds. From our prototype software, we conclude that real-time visualization of the auscultation waveform can improve the physician's confidence in interpreting the sound. Moreover, showing the ratio of packet loss and delay variations as a clear icon raises awareness and increases the success rate and the effectiveness of outcomes.

[4] World Health Organization. Telemedicine: Opportunities and Developments in Member States: Report on the Second Global Survey on eHealth. World Health Organization; 2010

Real-Time Tele-Auscultation Consultation Services over the Internet: Effects of the Internet…

http://dx.doi.org/10.5772/intechopen.74680

155

[5] Kijsanayotin B, Kasitipradith N, Pannarunothai S. eHealth in Thailand: The current sta-

[6] Ekeland AG, Bowes A, Flottorp S. Effectiveness of telemedicine: A systematic review of

[7] Dávalos ME, French MT, Burdick AE, Simmons SC. Economic evaluation of telemedicine: Review of the literature and research guidelines for benefit–cost analysis. Telemedicine

[8] Eron L. Telemedicine: The future of outpatient therapy?. Clinical Infectious Diseases.

[9] Armfield NR, Bradford M, Bradford NK. The clinical use of Skype—For which patients, with which problems and in which settings? A snapshot review of the literature.

[10] Nield M, Hoo GW. Real-time telehealth for COPD self-management using skype™. COPD: Journal of Chronic Obstructive Pulmonary Disease. 2012;**9**(6):611-619

[11] Reynolds HN, Rogove H, Bander J, McCambridge M, Cowboy E, Niemeier M. A working lexicon for the tele-intensive care unit: We need to define tele-intensive care unit to

[12] Brecher DB. The use of Skype in a community hospital inpatient palliative medicine

[13] Engle X, Aird J, Tho L, Bintcliffe F, Monsell F, Gollogly J, Noor S. Combining continuing education with expert consultation via telemedicine in Cambodia. Tropical Doctor.

[14] Lee JF, Schieltz KM, Suess AN, Wacker DP, Romani PW, Lindgren SD, Kopelman TG, Dalmau YCP. Guidelines for developing telehealth services and troubleshooting problems with telehealth technology when coaching parents to conduct functional analyses and functional communication training in their homes. Behavior Analysis in Practice.

[15] Armstrong DG, Giovinco N, Mills JL, Rogers LC. FaceTime for physicians: Using real time mobile phone–based videoconferencing to augment diagnosis and care in telemed-

[16] Brandt R, Hensley D. Teledermatology: The use of ubiquitous technology to redefine traditional medical instruction, collaboration, and consultation. The Journal of Clinical

[17] Cola C, Valean H. E-health appointment solution, a web based approach. In: Proceedings

[18] Vidul AP, Hari S, Pranave KP, Vysakh KJ, Archana KR. Telemedicine for emergency care management using WebRTC. In: Proceedings of Advances in Computing,

of E-Health and Bioengineering Conference (EHB), IEEE; 2015. p. 1-4

grow and understand it. Telemedicine and e-Health. 2011;**17**(10):773-783

consultation service. Journal of Palliative Medicine. 2013;**16**(1):110-112

tus. Studies in Health Technology and Informatics. 2010;**160**(Pt 1):376

International Journal of Medical Informatics. 2015;**84**(10):737-742

and e-Health. 2009;**15**(10):933-948

2010;**51**(Supplement no 2):S224-S230

2013:0049475513515654

2015;**8**(2):190-200

icine. Eplasty. 2011;**11**

and Aesthetic Dermatology. 2012;**5**(11):35

reviews. International Journal of Medical Informatics. 2010;**79**(11):736-771

## **Acknowledgements**

This work is supported by the Higher Education Research Promotion and National Research University Project of Thailand, Office of the Higher Education Commission (under the funding no. MED540548S at Prince of Songkla University).

#### **Author details**

Sinchai Kamolphiwong1 \*, Thossapon Kamolphiwong1 , Soontorn Saechow<sup>1</sup> and Verapol Chandeeying2

\*Address all correspondence to: sinchai.k@psu.ac.th

1 Department of Computer Engineering, Faculty of Engineering, Prince of Songkla University, Hatyai, Songkla, Thailand

2 Faculty of Medicine, University of Phayao, Muang, Phayao, Thailand

#### **References**


[4] World Health Organization. Telemedicine: Opportunities and Developments in Member States: Report on the Second Global Survey on eHealth. World Health Organization; 2010

From our experiment results, based on medical professional staff verification, we have found that both sounds are more sensitive to packet delay variations than packet loss. Lung sound is more sensitive than heart sound due to its timing interpretation, to recognize the rhythm, pitch, and intensity. Some different levels of packet loss can be tolerated for both sounds, e.g., 10% for heart sounds, 2% for lung sounds. However, packet delay variation boundary of 50 msec is recommended. Based on our analysis, sound missing and split, and pulse transformations are the two factors that affect the sound quality. The pulse transformation result may lead to misinterpreting of abnormal sounds. We have also found that making distinct normal sounds is more accurate than abnormal sounds. In addition to our prototype software, we have concluded that real-time waveform of auscultation's visualization can help physician confident level for sound interpreting. Moreover, showing the ratio of packet loss and delay variations in

clear icon will raise awareness and increase the success rate and effective outcomes.

\*, Thossapon Kamolphiwong1

1 Department of Computer Engineering, Faculty of Engineering, Prince of Songkla

[1] World Health Organization. Increasing Access to Health Workers in Remote and Rural Areas Through Improved Retention: Global policy recommendations. World Health

[2] Dummer TJB, Cook IG. Exploring China's rural health crisis: Processes and policy impli-

[3] Agyepong IA, Adjei S. Public social policy development and implementation: A case study of the Ghana National Health Insurance scheme. Health Policy and Planning.

2 Faculty of Medicine, University of Phayao, Muang, Phayao, Thailand

This work is supported by the Higher Education Research Promotion and National Research University Project of Thailand, Office of the Higher Education Commission (under the fund-

, Soontorn Saechow<sup>1</sup>

and

**Acknowledgements**

154 eHealth - Making Health Care Smarter

**Author details**

**References**

Sinchai Kamolphiwong1

Verapol Chandeeying2

ing no. MED540548S at Prince of Songkla University).

\*Address all correspondence to: sinchai.k@psu.ac.th

cations. Health Policy. 2007;**83**(1):1-16

University, Hatyai, Songkla, Thailand

Organization; 2010

2008;**23**(2):150-160


Communications and Informatics (ICACCI), International Conference on, IEEE; 2015. pp. 1741-1745

[32] 3M™ Littmann® stethoscope. Electronic Stethoscope Model 3200. Internet: http:// solutions.3m.com/wps/portal/3M/en\_US/Littmann/stethoscope/electronic-auscultation/

Real-Time Tele-Auscultation Consultation Services over the Internet: Effects of the Internet…

http://dx.doi.org/10.5772/intechopen.74680

157

[33] Rosenberg, Jonathan, Henning Schulzrinne, Gonzalo Camarillo, Alan Johnston, Jon Peterson, Robert Sparks, Mark Handley, and Eve Schooler. SIP: Session Initiation

[34] Jacobson V, Frederick R, Casner S, Schulzrinne H. RTP: A transport protocol for real-

[35] Miller-Keane Encyclopedia and Dictionary of Medicine, Nursing, and Allied Health. 7th Edition. S.v. "cardiac auscultation". Available from http://medical-dictionary.thefreedic-

[36] Tavel ME. Cardiac auscultation a glorious past—And it does have a future! Circulation.

[37] Chizner MA. Cardiac auscultation: Rediscovering the lost art. Current Problems in

[38] Mangione S, Nieman LZ, Gracely E, Kaye D. The teaching and practice of cardiac auscultation during internal medicine and cardiology training: A nationwide survey. Annals of

[39] Bohadana A, Izbicki G, Kraman SS. Fundamentals of lung auscultation. New England

[40] Scott PR. Lung auscultation recordings from normal sheep and from sheep with welldefined respiratory tract pathology. Small Ruminant Research. 2010;**92**(1):104-107 [41] 3M™ Littmann® stethoscope. Basic Heart Sounds Course. Available from: http://www. littmann.ca/wps/portal/3M/en\_CA/3M-Littmann-CA/stethoscope/littmann-learning-

[42] 3M™ Littmann® stethoscope. Listen to Lung Sounds. Available from: http://www.littmann.ca/wps/portal/3M/en\_CA/3M-Littmann-CA/stethoscope/littmann-learning-insti-

[43] Gilbert EN. Capacity of a burst-noise channel. Bell System Technical Journal. 1960;**39**(5):

[44] Elliott EO. Estimates of error rates for codes on burst-noise channels. Bell System

eHealth - Making Health Care Smarter



**Chapter 10**

**Exploring the Interrelationship of Risk Factors for Supporting eHealth Knowledge-Based System**

DOI: 10.5772/intechopen.75033

Geletaw Sahle Tegenaw

Additional information is available at the end of the chapter

#### **Abstract**

In many developing countries, particularly in Africa, the physician-to-population ratio is below the World Health Organization (WHO) minimum recommendation. Because of the limited-resource setting, healthcare services have not achieved equitable access to health services, sustainable health financing, or quality of healthcare service provision. Efficient and effective teaching, alerting, and recommendation systems are required to support the activities of the healthcare service. To alleviate these issues, creating a competitive eHealth knowledge-based system (KBS) can bring substantial benefit. In this study, the Apriori technique is applied to a malaria dataset to explore the degree of association among risk factors. The output of the data mining (i.e., the interrelationship of risk factors) is then integrated with knowledge-based reasoning. Nearest neighbor retrieval algorithms (for the retrieval task) and a voting method (for the reuse task) are used to design and deliver a personalized knowledge-based system.

**Keywords:** knowledge-based system, eHealth, pattern discovery, data mining, association rule

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### **1. Introduction**

In Africa, there are on average nine hospital beds per 10,000 people, compared with the world average of 27. In sub-Saharan Africa, the physician-to-population ratio is the lowest in the world [1, 2]. Countries like Ethiopia have set strategic plans to improve access and equity of preventive, essential health interventions at the village and household levels to ensure healthcare coverage in rural areas [3, 4].

On the one hand, because of the limited-resource setting, healthcare services have not achieved equitable access to health services, sustainable health financing, or quality of healthcare service provision. The physician-to-population ratio is below the World Health Organization (WHO) minimum recommendations [1, 2]. Pneumonia, diarrhea, acute upper respiratory tract infection, acute febrile illness, and malaria still account for 64% of under-five morbidity [5].


*About 68% of the country's total population lives in areas at risk of malaria; 75% of the country is vulnerable to malaria (defined as areas below 2000 m). Those areas are fertile and suitable for agriculture, and malaria accounts for up to 17% of outpatient consultations, 15% of admissions, and 29% of inpatient deaths.* [6, 7]

On the other hand, computer programs have been reasoning with domain knowledge for more than four decades (e.g., MYCIN was developed in the early 1970s) to assist domain experts and minimize routine activities. To prevent and control the crisis of malaria, different scholars and responsible bodies have made remarkable efforts by conducting research and implementing strategies and policies [6, 7]. A predictive data mining model has been constructed using the Ethiopian WHO malaria database, a meteorological database, and a national mapping database [8]. The model is accurate in determining the occurrence of death, and it is good enough to identify the cases.

However, the healthcare system still lacks a mechanism to assist with routine healthcare activities, administrative and medical costs, demographic challenges, and equitable health distribution. For instance, health extension workers (HEWs) assist peripheral health services by bridging the gap between communities and health facilities [3, 4]. Each kebele has two HEWs responsible for providing outreach services. A kebele is the smallest governmental administrative unit and on average has a population of 5000 people. The HEWs teach the community house to house, reaching each and every person in the kebele, in order to create and promote healthy lifestyles. In all, the healthcare system is searching for a teaching, alerting, and recommendation mechanism to support its daily routine activities.

To alleviate those issues, the main goal of this work is to create a competitive eHealth knowledge-based system (KBS) that can bring substantial benefit in a low-resource setting. As a case study, we chose malaria (a malaria dataset) because malaria prevention and control at the community level face numerous challenges: climatic conditions (temperature, rainfall), epidemiological and genetic factors, poverty, malaria outbreaks, over-prescription for positive results, and so on. Knowing the pattern and interrelationship of risk factors is important for supporting the knowledge-based system as well as for predicting malaria death occurrences/cases. An attempt is made to explore the degree of association between malaria risk factors (related to malaria death occurrence and case identification). Investigating the degree of interrelationship among risk factors can contribute greatly toward eradicating malaria outbreaks. The outcome of the study helps to mitigate severity by investigating the association of risk factors and building a competitive knowledge-based system.

#### **2. Literature review**

Knowledge-based systems aim to understand or bring about human-level intelligence by simulating or enacting one or more intelligent behaviors (such as thinking, problem-solving, learning, understanding, emotions, consciousness, intuition and creativity, language capacity, etc.). On the one hand, a KBS is advantageous when there is a shortage of experts, when decision-making for problem-solving needs an intelligent assistant, when expertise needs to be stored for future use, and so on. On the other hand, KBS faces many challenges due to the abstract nature of knowledge and the limitations of cognitive science and other scientific methods [9, 10].


Knowledge representation and the inference engine are the two building blocks of a KBS. Knowledge acquired from experts, documents, books, and other resources is organized through knowledge representation. The inference engine takes this knowledge and directs how to apply it to solve problems using rule- or case-based reasoning. Rule-based reasoning is a technique that reasons about a problem based on knowledge represented in the form of rules [11]. A case-based system represents situations or domain knowledge in the form of cases and uses case-based reasoning to solve new problems or handle new situations [12].
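As a minimal illustration of the rule-based style described above, the following sketch forward-chains over a hypothetical rule set. The rules and symptom names are invented for illustration; they are not the chapter's actual knowledge base.

```python
# Rule-based reasoning sketch: forward chaining over if-then rules.
# Each rule is (set of required facts, fact to conclude). Hypothetical rules.
RULES = [
    ({"fever", "chills"}, "suspect_malaria"),
    ({"suspect_malaria", "lab_positive"}, "confirmed_malaria"),
]

def forward_chain(facts):
    """Repeatedly fire any rule whose conditions are all satisfied,
    until no new fact can be derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in RULES:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(sorted(forward_chain({"fever", "chills", "lab_positive"})))
```

The same working memory of facts could instead be matched against stored cases, which is the case-based alternative the paragraph describes.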

Knowledge-based systems in the health and medical domain have had a remarkable effect by providing reliable diagnostic and cost-effective services. Several systems have been implemented in different medical areas like cancer therapy, infections, blood diseases, general internal medicine, glaucoma, and pulmonary function tests [13, 14]. Such systems can be designed to exhaustively consider all possible diseases in a domain, and can outperform human experts in achieving a rapid and accurate diagnosis. Integrating and updating domain knowledge with knowledge discovery is relevant for increasing interestingness and user belief (for example, by matching discovered patterns with existing knowledge) [15]. An integrated eHealth knowledge-based system built on acquired health knowledge will support users in exchanging knowledge and improve accessibility through data collection, care documentation, and knowledge extraction [16].

Following and implementing a hybrid (integrated) intelligent system for medical data classification is a good way to produce an effective knowledge-based system [17]. Promising results have been achieved through integration aimed at improving the quality of knowledge-based systems [17, 18]. For instance, integrating the rules produced by the PART classification algorithm with a knowledge-based system delivered favorable results for the diagnosis and treatment of visceral leishmaniasis [19]. Seera and Lim used a fuzzy min-max neural network to learn incrementally from sample data, a classification and regression tree for prediction, and a random forest model to achieve high classification performance [17]. Kerdprasop and Kerdprasop also tried to automate the data mining model by focusing on the post-data-mining step of automatic knowledge deployment using induced knowledge and formalized classification rules [20].

However, more work is needed on providing explanatory rules and handling missing data in real-world applications. A mechanism for handling irrelevant rules (results) is also required in inductive experiment systems, and so on. To alleviate those issues, in our case study we explore the pattern and interrelationship of risk factors to support the knowledge-based system as well as the prediction of malaria death occurrences/cases. The result will increase the interestingness and believability of the eHealth knowledge-based system, which can bring substantial benefit in a low-resource setting.

#### **2.1. Research aim and objectives**

The main goal of this work is to investigate the potential of exploring the interrelationship of risk factors using data mining to create a competitive eHealth knowledge-based system.

#### **2.2. Methodology**

The cross-industry standard process for data mining (CRISP-DM) methodology is adopted to investigate the interrelationship of malaria risk factors. Then, to design the eHealth knowledge-based system, nearest neighbor retrieval algorithms (for the retrieval task) and a voting method (for the reuse task) are used. The technique is straightforward for exploring relevant cases and provides an opportunity to retrieve partially matching cases [21–24].
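The retrieval-and-reuse step described above can be sketched as follows: nearest-neighbor retrieval over a stored case base, then majority voting over the retrieved cases' outcomes. The case schema, attribute codes, and outcome labels here are illustrative assumptions, not the chapter's actual case base.

```python
from collections import Counter

# Each case: (attribute vector, outcome). Attributes are assumed to be
# already transformed to numeric codes, as in the preprocessing step.
CASE_BASE = [
    ((1, 5, 2), "death_probable"),
    ((1, 4, 2), "death_probable"),
    ((3, 1, 7), "death_not_probable"),
    ((3, 2, 6), "death_not_probable"),
    ((2, 5, 3), "death_probable"),
]

def retrieve(query, k=3):
    """Return the k stored cases nearest to the query (Euclidean
    distance). Partially matching cases are naturally retrieved:
    distance simply grows with each mismatching attribute."""
    def dist(case):
        return sum((a - b) ** 2 for a, b in zip(case[0], query)) ** 0.5
    return sorted(CASE_BASE, key=dist)[:k]

def reuse(neighbors):
    """Majority vote over the retrieved cases' outcomes."""
    votes = Counter(outcome for _, outcome in neighbors)
    return votes.most_common(1)[0][0]

print(reuse(retrieve((1, 5, 3))))  # majority outcome of the 3 nearest cases
```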


The malaria data is collected from a zonal health facility in each of the 86 zones of Ethiopia. To understand the problem domain, we used observation, interviews with experts and data managers, and reviews of documents, reports, and the literature. This helped us select and integrate decisive attributes from different sources. The data selected from the WHO (World Health Organization) database is integrated with the decisive attributes (such as temperature, rainfall, and altitude) extracted from the Ethiopian National Meteorological Agency and Mapping Agency in order to find the association of risk factors.

Exploratory data analysis is performed to become familiar with the data and prepare it for investigating the degree of interrelationship. The data mining task is to find the internal associations between data elements that determine the occurrence of a death/case. To maintain data quality, preprocessing tasks such as data cleaning (handling missing values, noise, and outliers), data integration, and data transformation are performed.
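The cleaning steps named above can be sketched minimally as follows, on an invented record layout (the field names are illustrative, not the study's schema): missing values are filled with the column mode, and outliers are flagged with a deliberately crude rule.

```python
from statistics import median, mode

records = [
    {"zone": "A", "temp": 22.0, "cases": 10},
    {"zone": "A", "temp": None, "cases": 12},   # missing value
    {"zone": "B", "temp": 22.0, "cases": 11},
    {"zone": "B", "temp": 23.5, "cases": 400},  # suspicious count
]

def fill_missing(records, field):
    """Replace None with the most common observed value (the mode)."""
    fill = mode(r[field] for r in records if r[field] is not None)
    for r in records:
        if r[field] is None:
            r[field] = fill
    return records

def outlier_flags(values, k=3.0):
    """Flag values more than k times the column median (a crude rule;
    a real pipeline would use domain knowledge or robust statistics)."""
    med = median(values)
    return [v > k * med for v in values]

fill_missing(records, "temp")
print(outlier_flags([r["cases"] for r in records]))  # [False, False, False, True]
```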

The collected Ethiopian WHO malaria database contains five basic attributes (more than 37,000 records) that provide information about geographic location and period of coverage. These attributes are country name, region (the administrative region from which malaria information is collected), zone and health facility name, year, and month. The detailed attribute information is categorized based on WHO standards and explicitly represents malaria in each zone of every region across Ethiopia. These categories contain age (less than, equal to, or greater than 5 years), malaria type (*P. vivax* and *P. falciparum*), cases (inpatient and outpatient), inpatient cases (cases and deaths), severe anemia (inpatient malaria cases under 5 years and 5 years and over), and uncomplicated lab-confirmed malaria under 5 years and 5 years and over (*P. vivax* outpatient cases and *P. falciparum* outpatient cases). Each attribute is preprocessed and statistically summarized into address, patient profile, weather, and altitude. For example, **Table 1** presents the statistical summary of uncomplicated malaria under 5 years of lab-confirmed *Plasmodium falciparum*.

In order to extract hidden patterns and relationships within the data, a number of attributes are constructed from the initial dataset. As shown in **Table 2**, from malaria with severe anemia, attributes such as age, malaria type, cases, and malaria visits, as well as the number of cases and deaths, are constructed.

Summary of the datasets compiled for association rule discovery, with their possible nominal values and descriptions, is depicted in **Figure 1**. The malaria dataset used for this study consists of 14 attributes. The first inner part presents the category of the attributes (i.e., profile, weather, address, and date), the second inner part presents the list of attributes constructed from each category, and the last inner part indicates the values of each attribute. The region and zone attributes cover all administrative regions in Ethiopia and the locations where the patients live. The date (year and month) indicates the year and month in which the patients visited the zonal health center. The age attribute indicates the category of patients, usually classified as under 5 and 5 or over. The type of malaria visit is classified into inpatient and outpatient malaria visits. The type-of-case attribute indicates the category of cases: in pregnancy, severe anemia, and uncomplicated lab-confirmed malaria. The type-of-malaria attribute contains PV, PF, and not-known values; in the cases of severe anemia and in pregnancy, the type of malaria is not known or determined in the dataset. The number of cases and the number of deaths indicate the total number of cases and deaths, respectively. The occurrence-of-deaths attribute contains not probable (i.e., no deaths exist), probable (deaths exist), and undetermined values; in the cases of outpatient visits, in pregnancy, and uncomplicated lab-confirmed malaria, death is not known or listed in the dataset. The rainfall attribute contains a numeric value representing average rainfall. The transformed temperature values 1, 2, 3, 4, 5, 6, 7, 8, and 9 represent 0–5 °C, 5–10 °C, 10–15 °C, 16–20 °C, 21–25 °C, 25–30 °C, 31–35 °C, 35–40 °C, and >40 °C, respectively. The transformed altitude values T1, T2, T3, T4, T5, T6, T7, and T8 represent >3500 m, 2500–3500 m, 2000–2500 m, 1500–2000 m, 1000–1500 m, 500–1000 m, 0–500 m, and <0 m, respectively.

**Table 1.** Summary of uncomplicated malaria less than 5 years.

**Table 2.** Malaria with severe anemia and list of attributes constructed.

**Figure 1.** Summary of attributes and their transformed values.
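The temperature and altitude transformations above can be sketched as follows. The bin edges follow the text; the function names are ours, and the handling of exact bin boundaries is our assumption, since the source bands have small gaps and overlaps (e.g., 10–15 °C followed by 16–20 °C), which we treat as contiguous.

```python
from bisect import bisect_right

# Upper edges of the temperature bands 1..9:
# 0-5, 5-10, 10-15, 16-20, 21-25, 25-30, 31-35, 35-40, >40 (°C).
TEMP_EDGES = [5, 10, 15, 20, 25, 30, 35, 40]

# Altitude codes T1..T8 run from high to low:
# >3500, 2500-3500, 2000-2500, 1500-2000, 1000-1500, 500-1000, 0-500, <0 (m).
ALT_EDGES = [3500, 2500, 2000, 1500, 1000, 500, 0]

def temp_code(celsius):
    """Map a raw temperature to the ordinal code 1..9."""
    return bisect_right(TEMP_EDGES, celsius) + 1

def altitude_code(meters):
    """Map a raw altitude to the code T1..T8 (T1 = highest band)."""
    rank = sum(1 for edge in ALT_EDGES if meters < edge)
    return f"T{rank + 1}"

print(temp_code(22))        # falls in 21-25 °C -> 5
print(altitude_code(1800))  # falls in 1500-2000 m -> T4
```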

In general, the support of an association rule is the frequency of occurrence of the set of items it mentions, and the confidence of the rule is the probability of j given i1,…,ik, that is, the fraction of transactions containing i1,…,ik that also contain item j. This will measure the strength of asso-


The key concepts are frequent item sets (the sets of items that meet minimum support, with Li denoting the frequent i-item sets), the Apriori property (any subset of a frequent item set must itself be frequent), and the join operation (to find Lk, a set of candidate k-item sets is generated by joining Lk-1 with itself). Once the frequent item sets are obtained, it is straightforward to generate association rules whose support and confidence are no smaller than the user-specified minimum support and minimum confidence. A further strength of the Apriori algorithm is that it achieves good performance by reducing the size of the candidate sets that are considered.
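The levelwise join/prune iteration and the rule-generation step described above can be sketched compactly as follows. The transactions and items are a toy example invented for illustration, and the thresholds are arbitrary; this is not the study's malaria data.

```python
from itertools import combinations

# Toy transaction set (invented items, not the study's malaria attributes).
transactions = [
    {"fever", "chills", "malaria"},
    {"fever", "malaria"},
    {"fever", "cough"},
    {"chills", "malaria"},
]

def support(itemset):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def apriori(min_support):
    """Levelwise search: join L(k-1) with itself, prune by the Apriori
    property, keep the candidates that meet the minimum support."""
    items = {frozenset([i]) for t in transactions for i in t}
    levels = [{s for s in items if support(s) >= min_support}]  # L1
    k = 2
    while levels[-1]:
        prev = levels[-1]
        # Join step: unions of (k-1)-item sets that yield k-item sets.
        joined = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        pruned = {c for c in joined
                  if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        levels.append({c for c in pruned if support(c) >= min_support})
        k += 1
    return [s for level in levels for s in level]

# Generate rules meeting a minimum confidence from each frequent itemset.
for itemset in (s for s in apriori(min_support=0.5) if len(s) >= 2):
    for r in range(1, len(itemset)):
        for lhs in map(frozenset, combinations(itemset, r)):
            conf = support(itemset) / support(lhs)
            if conf >= 0.6:
                print(sorted(lhs), "->", sorted(itemset - lhs), round(conf, 2))
```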

A class implementing an Apriori-type algorithm iteratively reduces the minimum support

For mining Weka (Waikato Environment for Knowledge Analysis), knowledge discovery tool using Java is used. In Weka 3.7.3, if class association rule (car) property is enabled, the class association is mined instead of (general) association rules. Class classification generates rules that are frequently happening to the probable occurrence of malaria cases. In many studies, associative classification has been found to be more accurate than some traditional classification methods, such as C4.5 [25]. Associative classification can search strong associations between frequent patterns (conjunctions of attribute-value pairs) and class labels. Because association rules explore highly confident associations among multiple attributes, this approach may overcome some constraints introduced by decision tree induction, which considers only one attribute at a time. In all, in association of rule mining, finding all the rules that satisfy both a minimum support and a minimum confidence threshold is important so as

Knowledge-based systems are computer programs that try to solve problems in a human expertlike fashion by using knowledge about the application domain (knowledge base) and problemsolving techniques (inference method). The rule-based reasoning technique can be used with other reasoning techniques in order to make a knowledge-based system more efficient. For example, case-based and rule-based reasoning can be used together. Rule-based system is an example of knowledge-based system that uses rules for knowledge representation and rule-based reasoning for reasoning techniques. The development of knowledge-based systems in medical areas has made it possible to provide reliable and thorough diagnostic services with a minimum cost. Such systems can be designed to exhaustively consider all possible diseases in a domain, which could outperform human experts to achieve a rapid and accurate diagnosis. Several systems have been implemented in different medical areas like cancer therapy, infections, blood diseases,

until it finds the required number of rules with the given minimum confidence [29].

to generate strong and interesting rules from the frequent patterns.

general internal medicine, glaucoma, and pulmonary function tests [13, 14].

**4. eHealth knowledge-based system**

ciations between i1, i2,…,ik, and j.

and selected for frequent k-item set [28].

#### **3. Apriori method**

The Apriori algorithm (a well-known association rule discovery method) takes a dataset with a list of items that can easily be transformed into transaction form by creating an item for each attribute-value pair that exists in the dataset [25–27]. Minimum support and minimum confidence thresholds are also defined to enable the Apriori algorithm to identify frequent items that are strongly associated. **Table 3** presents the step-by-step procedure to mine and extract frequent items using the Apriori method.

Given a support threshold S, sets of items X that appear in at least S baskets are called frequent item sets. The task is to find all rules over item sets of the form X→Y with minimum support and confidence. For example, an if-then rule about the content of baskets, {i1, i2,…,ik} → j, means "if a basket contains all of i1,…,ik, then it is likely to contain j." A typical Apriori question is to "find all association rules with support ≥ S and confidence ≥ C." In general, the support of an association rule is the frequency of occurrence of the set of items it mentions, and the confidence of the rule is the probability of j given i1,…,ik, that is, the fraction of transactions containing i1,…,ik that also contain item j. This measures the strength of the association between i1, i2,…,ik, and j.

**Table 3.** Apriori methods.
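These two measures can be sketched directly from their definitions. The basket contents below are illustrative, not records from the study's dataset:

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """P(consequent | antecedent): support of both over support of the antecedent."""
    both = set(antecedent) | {consequent}
    return support(transactions, both) / support(transactions, antecedent)

baskets = [{"fever", "outpatient", "PF"},
           {"fever", "outpatient"},
           {"fever", "inpatient", "PF"},
           {"outpatient", "PF"}]
print(support(baskets, {"fever"}))           # 0.75 (3 of 4 baskets)
print(confidence(baskets, {"fever"}, "PF"))  # 2/3
```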

The key concepts are frequent item sets (the sets of items that have minimum support, denoted Lk for the frequent k-item sets), the a priori property (any subset of a frequent item set must itself be frequent), and the join operation (to find Lk, a set of candidate k-item sets is generated by joining Lk-1 with itself). Once the frequent item sets are obtained, it is straightforward to generate association rules that satisfy a user-specified minimum support and minimum confidence. Another strength of the Apriori algorithm is that it achieves good performance by reducing the size of the candidate sets that are considered and selected for the frequent k-item set [28].
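The level-wise join-and-prune cycle described above can be sketched as follows. This is a compact illustration of the a priori property and the join operation, not the full procedure of Table 3; itemsets are frozensets so they can be collected in sets:

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return all frequent itemsets as {itemset: support_count},
    where min_count is an absolute support threshold."""
    items = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    level = {s for s in items
             if sum(s <= t for t in transactions) >= min_count}
    k = 1
    while level:
        for s in level:
            frequent[s] = sum(s <= t for t in transactions)
        # Join Lk with itself to build candidate (k+1)-item sets...
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # ...and prune with the a priori property: every k-subset must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level for sub in combinations(c, k))}
        level = {c for c in candidates
                 if sum(c <= t for t in transactions) >= min_count}
        k += 1
    return frequent
```

On the transactions `[{"a","b"}, {"a","b","c"}, {"a","c"}]` with a minimum count of 2, the pruning step discards the candidate {a, b, c} because its subset {b, c} is not frequent.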

A class implementing an Apriori-type algorithm iteratively reduces the minimum support until it finds the required number of rules with the given minimum confidence [29].
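That iterative reduction can be sketched as a loop around a rule miner. Here `mine_rules` is a hypothetical callable standing in for any miner that returns rules with a `"confidence"` field; the step sizes are illustrative:

```python
def rules_by_lowering_support(mine_rules, num_rules, min_confidence,
                              start=1.0, step=0.05, floor=0.1):
    """Lower the minimum support from `start` in `step` decrements until
    `num_rules` rules meeting `min_confidence` are found (or `floor` is hit)."""
    support, rules = start, []
    while support >= floor:
        rules = [r for r in mine_rules(support)
                 if r["confidence"] >= min_confidence]
        if len(rules) >= num_rules:
            return rules[:num_rules], support
        support = round(support - step, 10)
    return rules, support  # best effort if the quota was never reached
```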

For mining, Weka (Waikato Environment for Knowledge Analysis), a Java-based knowledge discovery tool, is used. In Weka 3.7.3, if the class association rule (car) property is enabled, class association rules are mined instead of (general) association rules. Class association mining generates rules that relate frequently occurring patterns to the probable occurrence of malaria cases. In many studies, associative classification has been found to be more accurate than some traditional classification methods, such as C4.5 [25]. Associative classification searches for strong associations between frequent patterns (conjunctions of attribute-value pairs) and class labels. Because association rules explore highly confident associations among multiple attributes, this approach may overcome some constraints of decision tree induction, which considers only one attribute at a time. In all, in association rule mining, it is important to find all the rules that satisfy both a minimum support and a minimum confidence threshold, so as to generate strong and interesting rules from the frequent patterns.
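The difference between general and class association rules can be illustrated by filtering mined rules on their consequent, which is Weka's car behaviour in spirit. The rule tuples below are illustrative, not actual output from the study:

```python
# Each rule: (antecedent, consequent, confidence) - illustrative mined rules.
rules = [
    (("visit=outpatient", "age>=5"), "death=undetermined", 1.00),
    (("region=West Gojjam",), "month=November", 0.62),
    (("malaria=PF",), "death=undetermined", 0.90),
]

def class_rules(rules, class_attribute):
    """Keep only rules whose consequent is a value of the class attribute."""
    return [r for r in rules if r[1].startswith(class_attribute + "=")]

# Keeps the two rules predicting the occurrence-of-death class.
print(class_rules(rules, "death"))
```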

#### **4. eHealth knowledge-based system**


164 eHealth - Making Health Care Smarter


Knowledge-based systems are computer programs that try to solve problems in a human-expert-like fashion by using knowledge about the application domain (knowledge base) and problem-solving techniques (inference method). The rule-based reasoning technique can be combined with other reasoning techniques to make a knowledge-based system more efficient; for example, case-based and rule-based reasoning can be used together. A rule-based system is an example of a knowledge-based system that uses rules for knowledge representation and rule-based reasoning as its reasoning technique. The development of knowledge-based systems in medical areas has made it possible to provide reliable and thorough diagnostic services at minimum cost. Such systems can be designed to exhaustively consider all possible diseases in a domain, and they could outperform human experts in achieving a rapid and accurate diagnosis. Several systems have been implemented in different medical areas such as cancer therapy, infections, blood diseases, general internal medicine, glaucoma, and pulmonary function tests [13, 14].
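The "IF a certain situation holds, THEN take a particular action" representation used in this chapter can be sketched as a minimal rule-based inference step. The rules and facts below are illustrative, not entries from the chapter's actual rule base:

```python
# Each rule: (set of conditions, conclusion) in IF-THEN form.
RULES = [
    ({"visit=outpatient", "age>=5"}, "death=undetermined"),
    ({"case=severe_anemia"}, "death=probable"),
]

def infer(facts):
    """Forward-chain once: fire every rule whose conditions all hold."""
    return [conclusion for conditions, conclusion in RULES
            if conditions <= facts]

print(infer({"visit=outpatient", "age>=5", "malaria=PF"}))
# ['death=undetermined']
```

A full inference engine would iterate this step, adding fired conclusions to the fact set until nothing new can be derived.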


**Figure 2.** Adopted and proposed architecture for supporting eHealth KBS [19, 23, 24].

**Figure 2** presents the detailed architecture of the proposed system. In this research we integrate the output of data mining (i.e., the interrelationship of risk factors) with a knowledge-based system. The Apriori algorithm, applied within the CRISP-DM methodology, is adopted to create the interrelationship of risk factors and is used for knowledge acquisition in developing the knowledge-based system. Nearest neighbor retrieval algorithms (for retrieval) and a voting method (for the reuse task) are used to design the eHealth knowledge-based system. The knowledge is represented as "IF a certain situation holds, THEN take a particular action," and the knowledge acquired from the Apriori algorithm (the interrelationship of risk factors) forms the rules. An inference engine of the knowledge-based system can then derive conclusions by examining the possible scenarios and recommendations. The goal of the case study is to provide a personalized knowledge-based system solution supported by the interrelationship of risk factors. The user interface provides communication between the user and the system.

**Figure 3.** Graphical user interface.

**Figure 3** presents the graphical user interface. The user initiates queries by selecting profile and address information. Based on the desired location (region and zone), weather information such as altitude, rainfall, and temperature is filled in automatically from an external weather API. Then, similarity matching is performed on the new query to retrieve and recommend a proposed solution. If similarity matching is unsuccessful, a voting technique is applied to select the relevant cases. Finally, before a solution is selected or recommended, the domain expert evaluates and validates the new case, and the knowledge-based system stores the validated case for future use.
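The retrieve-then-vote flow can be sketched as follows. The attribute weights, case base, similarity threshold, and exact-match local similarity are illustrative assumptions; as discussed later in the chapter, the real importance values are assigned manually with domain experts:

```python
def local_sim(a, b):
    """Local similarity of one attribute pair (exact match here; a real
    system might use graded similarity for numeric attributes)."""
    return 1.0 if a == b else 0.0

def global_sim(query, case, weights):
    """Weighted average of local similarities over the case attributes."""
    total = sum(weights.values())
    return sum(w * local_sim(query[k], case[k])
               for k, w in weights.items()) / total

def retrieve(query, case_base, weights, threshold=0.8):
    """Nearest-neighbour retrieval with a voting fallback over the top 3."""
    scored = sorted(((global_sim(query, c, weights), c) for c in case_base),
                    key=lambda sc: sc[0], reverse=True)
    best_score, best_case = scored[0]
    if best_score >= threshold:
        return best_case["solution"]
    top = [c["solution"] for _, c in scored[:3]]  # voting fallback
    return max(set(top), key=top.count)

# Illustrative case base and importance weights.
WEIGHTS = {"zone": 2, "age": 1, "temperature": 1}
CASES = [
    {"zone": "West Gojjam", "age": ">=5", "temperature": 5,
     "solution": "uncomplicated-malaria plan"},
    {"zone": "West Gojjam", "age": "<5", "temperature": 5,
     "solution": "under-5 referral"},
    {"zone": "Awi", "age": ">=5", "temperature": 4,
     "solution": "uncomplicated-malaria plan"},
]
```

A query identical to a stored case yields a global similarity of 1.0, matching the case similarity testing described in the evaluation section.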

#### **5. Experimental results and discussions**

The study explores the interrelationship of risk factors for supporting an eHealth knowledge-based system. We used a malaria dataset as a case study to discover the associations among the various malaria risk factors using association rule discovery data mining, and we integrated the result with the eHealth knowledge-based system.

#### **5.1. Experimental setup**

General and class association rules are used to discover interesting association patterns. A total of 120 experiments were executed using the Apriori algorithm (60 experiments using general association rule mining and 60 using class association rule mining), as depicted in **Table 4**. The confidence level is the most important parameter for attaining the required objective; accordingly, the experiments were run at confidence levels of 100, 90, 80, 70, 60, and 50%. Each confidence level was also experimented with a lower-bound support of 10–100%. In both scenarios the upper-bound minimum support is 100%.
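The 60 runs per mining mode follow from the parameter grid. This sketch enumerates it, assuming 10% steps for the lower-bound support, consistent with the 10–100% range above:

```python
from itertools import product

confidence_levels = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5]
support_lower_bounds = [s / 10 for s in range(1, 11)]  # 10% .. 100%

experiments = list(product(confidence_levels, support_lower_bounds))
print(len(experiments))      # 60 runs per mining mode
print(len(experiments) * 2)  # 120 with general + class association mining
```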

#### **5.2. Generated association rules**


**Table 4.** Scenario and result of general association rule experiment.

| Confidence level | No. of experiments | Min support lower bound | No. of association/interrelationship rules generated | No. of cycles performed | Min support used to generate the rules |
|---|---|---|---|---|---|
| 100% | 5 | 60–100% | None | | |
| | 1 | 50% | 7 | 10 | 50% |
| | 1 | 40% | 7 | 12 | 40% |
| | 3 | 10–30% | 10 | 11 | 35% |
| 90% | 4 | 70–100% | None | | |
| | 1 | 60% | 2 | 8 | 60% |
| | 5 | 10–50% | 10 | 10 | 50% |
| 80% | 4 | 70–100% | None | | |
| | 1 | 60% | 2 | 8 | 60% |
| | 5 | 10–50% | 10 | 10 | 50% |
| 70% | 4 | 70–100% | None | | |
| | 1 | 60% | 2 | 8 | 60% |
| | 5 | 10–50% | 10 | 10 | 50% |
| 60% | 4 | 70–100% | None | | |
| | 1 | 60% | 2 | 8 | 60% |
| | 5 | 10–50% | 10 | 10 | 50% |
| 50% | 4 | 70–100% | No rule generated | | |
| | 1 | 60% | 2 | 8 | 60% |
| | 1 | 50% | 7 | 10 | 50% |
| | 1 | 40% | 7 | 12 | 40% |
| | 3 | 10–30% | 10 | 13 | 35% |
| **Total exp.** | **60** | | | | |

• If the temperature is between 15 and 20°C, the type of malaria visit is outpatient, and the type of case is uncomplicated lab-confirmed malaria, then the occurrence of death is undetermined.

• If the type of malaria is PV, then the occurrence of death is undetermined, and if the type of malaria is PF, then the occurrence of death is undetermined.

• If age is under 5 years and the type of malaria visit is outpatient, then the occurrence of death is undetermined.

**Table 5.** Summarized experimental rules.

#### **6. Discussions**

From the experiment, we observed that class association mining supports the rules generated in general association mining. It also discovers interesting interrelationships (with a 100% confidence level) related to the type of visit, age, altitude, temperature, and malaria type. With a 100% confidence level and 60% support, outpatient cases are more closely related to an undetermined occurrence of death, specifically when the age group of the malaria patient is greater than 5. On the one hand, the result noted that occurrence of death is mostly related to outpatient cases rather than inpatient ones; this suggests that health workers offer great attention and intensive care for inpatient visits. On the other hand, most outpatient visits are uncomplicated lab-confirmed malaria. However, the occurrence of death is undetermined and probable when the type of malaria visit is outpatient and the age of the patient is greater than 5. This may be because of a lack of qualified health workers and patients not being properly prescribed, as confirmed by Ndiaye et al. [30] in Senegal, where lay health workers produced negative diagnostic tests.

**Table 5** illustrates the summary of experimental results. It is difficult to determine the occurrence of death for outpatient cases, and the experimental result revealed that the occurrence of death is related to the increment of malaria cases. Roca-Feltrer et al. [31] noted that the increment of malaria cases is related to transmission intensity, seasonality, and age, which lead to a probability of occurrence of deaths. For instance, the experimental result in west Gojjam (specifically in November) supports the probability of occurrence of deaths being related to the increment of malaria cases.

Knowing the seasonality of malaria helps to provide proper interventions to eradicate occurrences of death and cases [31]. Roca-Feltrer et al. [32] relate the transmission intensity, seasonality, and age pattern of malaria and confirm that younger age groups are associated with increasing transmission intensity. Our experimental result also confirmed that occurrence of death is undetermined if the altitude is 1500–2000 m and when the age of the person is greater than 5, with confidence levels of 62 and 60%, respectively. This happens because of high transmission intensity. With a 100% confidence level, if the type of malaria visit is outpatient and age is below 5, it is difficult to predict the occurrence of deaths. Interestingly, the occurrence of malaria death is related to severe anemia rather than pregnancy. As discussed by Knoblauch et al. [33], anemia is prevalent in 6- to 59-month-old children, and the association of anemia with child age, underpinned by iron requirements, is related to growth rate; hence iron demand declines with age. Further, the algorithm associated (with a 100% confidence level) the type of malaria as unknown for inpatient malaria visits.


The experimental result presents the association of risk factors (with relation to the malaria occurrence of death and type of case identification in Ethiopia) using climate, elevation, location, type of malaria, type of malaria visits, number of cases, and death attributes. Both general and class association minings are done using Apriori techniques for discovering the association or patterns of risk factors. The results noted the existence of strong association between occurrence of deaths, type of malaria visits, age, and type of cases. More interestingly, it discovers occurrence of malaria deaths, which are mostly related with severe anemia cases rather than pregnancy. It is also important to precede usability and user acceptance testing of eHealth knowledge-based system in real time and perform testing to compare and contrast with domain experts. So, health institutions have to give great attention to provide the necessary diagnosis and treatment for anemia, especially in regions that are more vulnerable for malaria. It also provides a significant contribution to design an optimal strategy in

First and foremost I would like to thank God and the Holy Mother. Then, I would like to express special thanks for Worku Birhane who provided me proofreading and valuable comments.

support of malaria prevention and control program within the country.

expert evaluation.

**Acknowledgements**

**7. Conclusion and future work**

[34, 36, 37]. Accuracy is used to measure the performance of the reuse process [34, 36].

However, some unexpected or interesting interrelationship is prevailed such as with 100% confident level, and it is difficult to determine the occurrence of death for both PV and PF malaria types. This needs further investigation to verify whether it is unrelated or expected.

In all, the study presents the association of malaria risk factors using climate, elevation, location, type of malaria, type of malaria visits, number of cases, and death attributes. Both general and class association minings are done using Apriori techniques for discovering the association or patterns between the occurrence of deaths with the type of cases and malaria visits. And then, integrate the output of data mining (i.e., the interrelationship of risk factors) with knowledge-based reasoning. Nearest neighbor retrieval algorithms (for retrieval) and voting method (to reuse tasks) are used to design and deliver personalized knowledge-based system.

#### **6.1. Evaluations**

The evaluation of the result is executed by combining both an expert and testing tool approaches. An overall measure of pattern values, combining novelty, usefulness, and simplicity, to achieve a predefined goal is evaluated to measure the interestingness of the interrelationship. We have used different multitudes of measurement in the evaluation such as accuracy, support level, confidence, confidence level, and complexity with a 10fold cross validation. We adopted the four measures such as sensitivity, specificity, prediction accuracy, and precision to evaluate the correctness of interrelationship and validate the system through performance testing. Tenfold cross validation is used in the experiment to predict the error rate [34, 35]. The basic measure is accuracy, which computes the percentage of correctly classified instances in the test set. The accuracy of a test compares how close a new test value is to a value predicted by if-then rules [36].

The interrelationship of risk factors (association rules) was evaluated in terms of the number of rules and meaning of patterns generated at different minimum support and confidence thresholds for measuring interestingness of the rules. Association (interrelationship) was analyzed in terms of different criteria. The criteria include the number of rules generated at different minimum support and confidence thresholds. The minimum support and confidence thresholds varied from 0.1 to 1 and 0.5 to 1, respectively. As depicted in **Table 4**, at 90% confidence level with min support of 60 and 20%, the techniques generate 2 and 10 rules, respectively. Furthermore, we investigate the following indicators of the quality of the rule ranking induced by the interestingness measures of the mining algorithm in the average rank of the first rule that covers a test instance and the average rank of the first rule that covers and correctly predicts a test instance.

The performance of eHealth knowledge-based system prototype is evaluated using test case. Thus, the effectiveness of the retrieval process of eHealth knowledge-based system reasoning is measured by using recall and precision. Precision and recall are useful measures of retrieval performance [34]. Recall is the percentage of relevant cases for the query (new case) that are retrieved, whereas precision is the percentage of retrieved cases that are relevant to the query [34, 36, 37]. Accuracy is used to measure the performance of the reuse process [34, 36].

The case similarity testing shows that when the query is made up of attribute values that have the same value with the case from the case base, the result of the global similarity becomes 1.0. But when there is a difference in the attribute values of the query and the case in the case base, the global similarity value decreases. Therefore, adding cases in the case base improves the performance of knowledge-based reasoning system in solving problems (new cases).

The nearest neighbor algorithm, which is used to develop the retrieval process of the prototype, uses distance to compute the similarity between the query and cases by representing the cases in N dimension vector. However, the recommendation doesn't have clear boundaries as it has subjectivity and depends on the experience of the domain experts as tested and adopted in [19, 23, 24]. In addition, the importance value that is assigned to the attributes of the case structure is done manually with the help of the domain experts, as there is no research that is conducted for the importance value of the attributes in malaria case management. This could affect the result of the retrieval and the reuse performance of the prototype. However, it needs user acceptance testing (using measuring usability with the system usability scale) in real-world scenarios to measure whether the potential users would like to use the proposed system frequently or not. So that, eHealth knowledge-based system for retrieving relevant cases and proposing solution will attain promising user acceptance, accuracy, and domain expert evaluation.

The evaluation of the results is executed by combining both expert and testing-tool approaches. An overall measure of pattern value, combining novelty, usefulness, and simplicity with respect to a predefined goal, is used to measure the interestingness of the interrelationships. Different measures are used in the evaluation, such as accuracy, support level, confidence level, and complexity, with 10-fold cross validation. Four measures, namely sensitivity, specificity, prediction accuracy, and precision, are adopted to evaluate the correctness of the interrelationships and to validate the system through performance testing. Tenfold cross validation is used in the experiment to estimate the error rate [34, 35]. The basic measure is accuracy, which computes the percentage of correctly classified instances in the test set; the accuracy of a test compares how close a new test value is to a value predicted by if-then rules [36].

The interrelationship of risk factors (association rules) was evaluated in terms of the number of rules and the meaning of the patterns generated at different minimum support and confidence thresholds, as a measure of the interestingness of the rules. The minimum support and confidence thresholds were varied from 0.1 to 1 and from 0.5 to 1, respectively. As depicted in **Table 4**, at a 90% confidence level with minimum supports of 60 and 20%, the technique generates 2 and 10 rules, respectively. Furthermore, we investigate the quality of the rule ranking induced by the interestingness measures of the mining algorithm using two indicators: the average rank of the first rule that covers a test instance, and the average rank of the first rule that covers and correctly predicts a test instance.

According to [33], anemia is prevalent in 6- to 59-month-old children, and the association of anemia with child age, underpinned by iron requirements, is related to growth rate; iron demand hence declines with age. Further, the algorithm found (with a 100% confidence level) that the type of malaria is unknown for inpatient malaria visits.

#### **7. Conclusion and future work**

However, some unexpected or interesting interrelationships also prevailed: for example, even at a 100% confidence level it is difficult to determine the occurrence of death for both PV and PF malaria types. This needs further investigation to verify whether the association is spurious or expected. In all, the study presents the association of malaria risk factors using climate, elevation, location, type of malaria, type of malaria visit, number of cases, and death attributes. Both general and class association mining are performed with Apriori techniques to discover patterns linking the occurrence of deaths with the type of cases and malaria visits. The output of the data mining (the interrelationship of risk factors) is then integrated with knowledge-based reasoning: a nearest neighbor algorithm handles the retrieval task and a voting method the reuse task in the design and delivery of a personalized knowledge-based system.

The experimental results present the association of risk factors (in relation to the occurrence of malaria deaths and type-of-case identification in Ethiopia) using climate, elevation, location, type of malaria, type of malaria visit, number of cases, and death attributes. The results note a strong association between the occurrence of deaths, type of malaria visit, age, and type of case. More interestingly, the mining discovers that malaria deaths are mostly related to severe anemia cases rather than to pregnancy. It remains important to proceed with usability and user acceptance testing of the eHealth knowledge-based system in real time, and to compare and contrast its performance with domain experts. Health institutions should therefore give great attention to providing the necessary diagnosis and treatment for anemia, especially in regions that are more vulnerable to malaria. The study also makes a significant contribution toward designing an optimal strategy in support of the malaria prevention and control program within the country.
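To make the support and confidence measures used throughout this evaluation concrete, the following minimal sketch computes both for a rule; the surveillance records and attribute labels are invented for illustration only. The support of a rule X → Y is the fraction of records containing both X and Y, and the confidence is the fraction of records containing X that also contain Y:

```python
def support(records, items):
    """Fraction of records containing all the given items."""
    items = set(items)
    return sum(items <= set(r) for r in records) / len(records)

def confidence(records, antecedent, consequent):
    """Of the records containing the antecedent, the fraction
    that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(records, both) / support(records, antecedent)

# Hypothetical malaria surveillance records (attribute=value items).
records = [
    {"visit=inpatient", "type=unknown", "death=no"},
    {"visit=inpatient", "type=unknown", "death=yes"},
    {"visit=outpatient", "type=PF", "death=no"},
    {"visit=outpatient", "type=PV", "death=no"},
]

# Rule: inpatient visit -> malaria type unknown
print(support(records, {"visit=inpatient", "type=unknown"}))       # 0.5
print(confidence(records, {"visit=inpatient"}, {"type=unknown"}))  # 1.0
```

In this toy data the rule holds with 100% confidence but only 50% support, which is why the evaluation above varies both thresholds: a high-confidence rule can still be rare, and rarity is what the support threshold filters.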

## **Acknowledgements**

First and foremost, I would like to thank God and the Holy Mother. I would also like to express special thanks to Worku Birhane, who provided proofreading and valuable comments.

#### **Author details**

Geletaw Sahle Tegenaw

Address all correspondence to: gelapril1985@gmail.com

Faculty of Computing, Jimma Institute of Technology, Jimma University (JU), Ethiopia

Exploring the Interrelationship of Risk Factors for Supporting eHealth Knowledge-Based System
http://dx.doi.org/10.5772/intechopen.75033

#### **References**

[1] ESTA, OSHD. Health in Africa Over the Next 50 Years. 2013. www.afdb.org

[2] AfricaNext Investment Research. The future of African broadband: Economics, business models and the rise of 3G. 2010;(5)

[3] Federal Ministry of Health. Health Extension Program in Ethiopia. Addis Ababa, Ethiopia: Health Extension and Education Center; 2007

[4] Family Health Department. National Strategy for Child Survival in Ethiopia. Addis Ababa, Ethiopia: Federal Ministry of Health; 2005

[5] http://www.moh.gov.et/factsheets [Accessed: August 2017]

[6] Federal Democratic Republic of Ethiopia Ministry of Health. Ethiopia National Malaria Indicator Survey 2007 Technical Summary; 2008

[7] Ministry of Health. Malaria Prevention Control Program. 2014. http://www.moh.gov.et/malaria [Accessed: June 2014]

[8] Sahle G, Meshesha M. Uncovering knowledge that supports malaria prevention and control intervention program in Ethiopia. Electronic Journal of Health Informatics. 2014;**8**(1):e7. www.eJHI.net

[9] Sajja PS, Akerkar R. Knowledge-based systems for development. Advanced Knowledge Based System: Model, Application & Research. 2010;**1**:1-11

[10] Tan C. A prototype of knowledge-based system for fault diagnosis in automatic wire bonding machine. Turkish Journal of Engineering and Environmental Sciences. 2008;**32**:235-244

[11] Hayes-Roth F. Rule-based systems. Communications of the ACM. September 1985;**28**(9):921-932. DOI: 10.1145/4284.4286

[12] Engelmore RS, Feigenbaum E. Knowledge-Based Systems in Japan. Japanese Technology Evaluation Center. Maryland: Loyola College; 1993. http://www.wtec.org/loyola/kb/toc.htm

[13] Abdelhamied K, Hafez S, Abdalla W, Hiekal H, Adel A. A rule-based expert system for rapid problem solving in crowded outpatient clinics in Egypt. IEEE. 1988;**3**:1419-1420

[14] Pandey J, Bajpai D. Developmental design of a rule based expert system for diagnosis. In: Proceedings of the First Regional Conference, IEEE Engineering in Medicine and Biology Society and 14th Conference of the Biomedical Engineering Society of India. New Delhi: An International Meet; 1995. pp. 1/41-1/42. DOI: 10.1109/RCEMBS.1995.508680

[15] Pohle C. Integrating and updating domain knowledge with knowledge discovery. In: 6th International Conference for Business Informatics 2003 (WI-2003); Dresden, Germany; September 15-17; 2003

[16] Nasiri S, Fathi M. Toward an integrated e-health based on acquired healthcare knowledge. In: 2014 Middle East Conference on Biomedical Engineering (MECBME); 2014

[17] Seera M, Lim CP. A hybrid intelligent system for medical data classification. Expert Systems with Applications. 2014;**41**:2239-2249

[18] Sedighian Z, Javanmard M. The effect of data mining on expert systems used for improving efficiency of correct speech E-learning systems. Advances in Natural and Applied Sciences. 2014;**8**(10):102-106

[19] Mulugeta T, Beshah T. Integrating Data Mining Results with the Knowledge Based System for Diagnosis and Treatment of Visceral Leishmaniasis. 2015. Available at www.ijarcsse.com

[20] Kerdprasop K, Kerdprasop N. Bridging data mining model to the automated knowledge base of biomedical informatics. International Journal of Bio-Science and Bio-Technology. 2012;**4**(1):13

[21] Martin B. Instance-based learning: Nearest neighbour with generalisation [MSc thesis]. Hamilton, New Zealand: University of Waikato; 1995. Available at: http://www.cs.waikato.ac.nz/pubs/wp/1995/uow-cs-wp-1995-18.pdf [Accessed: March 27, 2011]

[22] Mishra D, Sahu B. Feature selection for cancer classification: A signal-to-noise ratio approach. International Journal of Scientific & Engineering Research. 2011;**2**(4):1-7

[23] Salem AM. Case-based reasoning technology for medical diagnosis. World Academy of Science, Engineering and Technology. 2007;**31**:9-13

[24] Bekele H. A case based reasoning knowledge based system for hypertension management [MSc thesis]. Ethiopia: Addis Ababa University; 2011

[25] Han J, Kamber M. Data Mining: Concepts and Techniques. Waltham, USA: Morgan Kaufmann Publishers, an imprint of Elsevier; 2006

[26] Larose DT. Discovering Knowledge in Data: An Introduction to Data Mining. New Jersey, USA: John Wiley & Sons; 2005

[27] Witten IH, Frank E. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann Series in Data Management Systems. 2nd ed. San Francisco: Morgan Kaufmann Publishers; 2005. 525 p. https://trove.nla.gov.au/version/46617902

[28] Wu X, Kumar V, Ross Quinlan J, et al. Knowledge and Information Systems. Springer-Verlag. 2008;**14**:1. DOI: 10.1007/s10115-007-0114-2. Print ISSN: 0219-1377. Online ISSN: 0219-3116

[29] The University of Waikato. WEKA Manual (Waikato Environment for Knowledge Analysis) for Version 3-7-4. This manual is licensed under the GNU General Public License version 2. Available at http://www.gnu.org/copyleft/gpl.html

[30] Ndiaye Y, Ndiaye JLA, Cisse B, Blanas D, Bassene J, Manga IA, Ndiath M, Faye SL, Bocoum M, Ndiaye M, Thior PM, Sene D, Milligan P, Gaye O, Schellenberg D. Community case management in malaria: Review and perspectives after four years of operational experience in Saraya district, south-East Senegal. Malaria Journal. 2013;**12**:240

[31] Roca-Feltrer A, Armstrong Schellenberg J, Smith L, Carneiro I. A simple method for defining malaria seasonality. Malaria Journal. 2009;**8**:276

[32] Roca-Feltrer A, Carneiro I, Smith L, Schellenberg JRMA, Greenwood B, Schellenberg D. The age patterns of severe malaria syndromes in sub-Saharan Africa across a range of transmission intensities and seasonality settings. Malaria Journal. 2010;**9**:282

[33] Knoblauch AM, Winkler MS, Archer C, Divall MJ, Owuor M, Yapo RM, Yao PA, Utzinger J. The epidemiology of malaria and anaemia in the Bonikro mining area, central Côte d'Ivoire. Malaria Journal. 2014;**13**:194

[34] McSherry D. Precision and recall in interactive case-based reasoning. In: Case-Based Reasoning Research and Development (ICCBR). Lecture Notes in Artificial Intelligence. 2001;**2080**:392-406

[35] Kohavi R. A study of cross validation and bootstrap for accuracy estimation and model selection. In: The International Joint Conference on Artificial Intelligence; 1995

[36] Junker M, Hoch R, Dengel A. On the evaluation of document analysis components by recall, precision, and accuracy. In: Proceedings of the Fifth International Conference on Document Analysis and Recognition; 1999. pp. 713-716

[37] Losee RM. When information retrieval measures agree about the relative quality of document rankings. Journal of the American Society for Information Science. 2000;**51**:834-840

**eHealth - Making Health Care Smarter**

*Edited by Thomas F. Heston*

eHealth has revolutionized health care and the practice of medicine. Internet technologies have given the most rural communities access to healthcare services, and automated computer algorithms are improving medical diagnoses and speeding up the delivery of care. Handheld apps, wearable devices, and artificial intelligence lead the way, creating a global healthcare solution that is smarter and more accessible. Read what leaders in the field are doing to advance the use of electronic technology to improve global health.

Published in London, UK © 2018 IntechOpen. Cover image © ClaudioVentrella / iStock