Data Governance and Applications

#### **Chapter 5**

## Ethical Considerations for Health Research Data Governance

*Mantombi Maseme*

#### **Abstract**

Research involving humans often generates considerable data, irrespective of the context in which the research is conducted. These data must be protected from unauthorized access, use, and sharing as a means of safeguarding research participants' rights. Although several jurisdictions globally have promulgated laws and regulations aimed at protecting citizens' personal information, violations of privacy and related rights still occur. This could partly relate to a general lack, prevalent in most countries, of data governance policies and laws specific to the health research sector, including data transfer agreements. The chapter therefore covers the ethical aspects of health research data access, use, and sharing as a means of enabling health research institutions and policymakers to develop robust data governance structures and procedures. Its scope includes health research data generated in empirical research as well as data produced within a medical laboratory research context, i.e., human sample associated data.

**Keywords:** data access, data use, data sharing, data governance, privacy, confidentiality

#### **1. Introduction**

Data governance is defined as "all processes related to the collection, storage, processing, curation, use, and deletion of data" [1]. It entails not only the development of rules for data quality management but also specifying who is responsible for making decisions related to data handling and the duties attached to such decisions [2]. Data governance also assures compliance with the laws governing data [3]. Notably, there is a presumption that data governance is a universal, one-size-fits-all approach for all organizations; Weber et al. argue that this should not be the case [3]. Accordingly, this chapter considers data governance within the health research context, where health research data governance refers to the development of structures and processes for the access, use, and sharing of health research data. Data governance matters in health research mainly because it is the means by which data custodians safeguard the rights of individual data subjects. Infamous cases of unauthorized health research data access, use, and sharing have been well documented [4, 5], despite the existence in certain countries of regulations aimed at protecting citizens' personal information. To demonstrate the issue of health research data use that is not in line with the consent granted, health research data misuse is discussed in this chapter. A key consideration for using personal information in medical research is to seek informed consent [6]. Accordingly, health research data and consent are also discussed.

#### **2. Governance of health research data**

According to the WMA Declaration of Taipei, governance of health databases should be based on the principles of [7]: (1) protection of individuals' rights over the interests of other stakeholders and science; (2) transparency in making any relevant information available to the public; (3) participation and inclusion of individuals and communities by health database custodians through consultation and engagement; and (4) accountability, in that custodians of health databases should be accessible to all stakeholders. Each of these principles for data governance is discussed in turn in subsequent sections.

The individual (human participant) rights that should be respected in relation to access to data (interchangeably health research data herein) include the right to privacy and confidentiality. Notably, data custodians should consider anonymization or other measures for ensuring confidentiality [8]. Other applicable rights that ought to be observed with respect to data access, use, and sharing include the right to autonomy and the right to dignity [7]. Human rights should be protected by the rule of law [9]; they are inalienable (should not be taken away) except in specific circumstances, e.g., restriction of liberty if a person is found guilty of committing a crime by a court of law [10]. The CIOMS ethical guidelines also recommend governance systems that uphold the principle of accountability while maintaining good stewardship of samples and their associated data [11].

Elements for the governance of health data (including health research data) include, *inter alia*: arrangements for the duration of data storage; the nature and purpose of data collection; arrangements for disposal and destruction of data; arrangements for dealing with the data in the event of a change of ownership; obtaining consent; protecting the rights of data subjects (interchangeably research participants herein); criteria and procedures for data access and sharing; measures to prevent unauthorized access or inappropriate sharing; and responsibility for data governance [7].

#### **2.1 Protection of data subjects' rights**

As already mentioned, the rights that must be protected in relation to access, use, and sharing of personal information are the rights to privacy and confidentiality, autonomy, and dignity.

Confidential information is that which is sensitive, needs protection, and can only be disclosed in a trusting relationship [12]. One recommendation for maintaining the confidentiality of health research data is anonymization or coding, in order to protect the participant (data subject) from harm or stigmatization [11]. Moreover, confidentiality is a significant standard of professionalism [12]. Privacy, on the other hand, is difficult to define: it has no universally accepted definition and has been subject to extensive debate by philosophers and legal scholars [13]. In modern society, however, the term addresses the question of who has access to personal information and the conditions of such access [13]. The primary justification for protecting the privacy of persons is to protect their interests [13]. It follows that the ethical justification for protecting privacy aligns with respecting participants' autonomy (self-determination), since the objective of privacy is to protect individual interests.

Professional codes of practice for healthcare professionals sometimes appeal to norms of respecting autonomy and privacy as well as not harming others (non-maleficence), although these norms are not explicitly expressed in such codes [14]. The rights to autonomy, privacy, and confidentiality call for not only protection of bodily integrity but also a scope of decision-making that is free from interference by others [15]. It can therefore be inferred that access to and use of health research data outside the scope of the data subject's consent to participate violate these rights. Factors that promote noncompliance with the principles of ethics in health research include: lack of ethical supervision; paternalism caused by trust in the researcher, resulting in a subtle loss of participant autonomy; informed consent documents that do not adequately address all aspects of research participation, in particular the potential risks involved, thereby threatening autonomy and engendering maleficence where risks exist; lack of legislation to protect participants; and pressure on researchers to increase research output at the expense of ethical principles [16].

#### **2.2 Transparency in data governance**

Smits and Champagne define transparency as the disclosure of procedure and results [17]. Shabani and Obasa consider transparency to be "the development and public availability of data sharing and access policies" [18]. Notably, there is no single conception of transparency; rather, it has multiple definitions, purposes, and applications, and its users include self-governing citizens, governments, and private firms [19]. Transparent governance of health data requires clear information on data circulation, data-sharing agreements, research objectives, and findings to be made available to the public [20]. In the context of biobank governance, transparency enables donors to better understand how a biobank is governed and thereby make more informed decisions about donating samples and associated data as well as about research participation [21]. Lack of transparency has the potential to undermine public trust in initiatives that involve large datasets, such as biobanks [21].

A notable limitation of web-based transparency is that information made available to the public in this way does not reach communities that lack web access or information technology infrastructure [21]. If data collection or use raises specific ethical questions, e.g., with regard to consent and transparency or privacy and data subjects' rights and expectations, the operational plan for data collection and processing must explain how the ethical concerns will be mitigated [22]. Community engagement has been identified as an important element of ethical research data sharing by research stakeholders that include researchers and health providers, community representatives, assistant chiefs, and field workers [23].

#### **2.3 Community and individual engagement in the context of health research data governance**

Community (public) engagement is perceived as a means of cultivating public trust and cooperation in research activities [24]. The ethical justification for community engagement is that it improves the consent process, identifies ethical issues, and develops processes for resolving ethical issues when they arise [25]. Jao et al. identified a number of community engagement goals in relation to health research data access: (1) creating an awareness of data sharing activities, with information on how any benefits and risks would be managed; (2) giving feedback to the community or its representatives on the data sharing process; and (3) ad hoc community consultation in relation to specific data sharing requests [23]. Evans et al. propose Community Advisory Boards (CABs) as platforms to engage the affected populations on how they would want their data to be collected, stored, and shared [26]. The Mayo Clinic Biobank CABs provide a way of incorporating interested community groups in the governance of large-scale bio-resources [27].

Engagement of research participant groups is also important because participants are in a better position to speak to the risks associated with the use of their data and have an interest in that use which the public might not necessarily have [27]. Moreover, engaging participants in data management decisions has been cited as strengthening transparency and accountability [28].

#### **2.4 Accountability for health research data**

The WMA defines accountability as that which "requires being prepared to provide an explanation for something one has done or has not done" [7]. Accountability represents a moral obligation to answer and the practical ability to convey that answer [29]. Moreover, accountability serves to establish responsibilities [29]. In the context of health research data governance, it is the responsibilities of the data custodians that should be established. This chapter refers to data custodianship rather than ownership for two reasons: first, claiming ownership rights over data is a misconception, in that proprietary rights over data do not exist in the international intellectual property (IP) system [30]; second, trends in claims of data ownership are based on "flawed models and on implausible arguments" [30]. Data custodians are also referred to as stewards [31]. Accordingly, data steward(ship) and custodian(ship) are used interchangeably in this chapter. The concept of a data steward is intended to denote a level of fiduciary (trust) responsibility toward the data [32]. Moreover, responsibilities for data stewardship are conceptualized and fulfilled through the process of governance [32]. Data stewardship entails the existence of mechanisms for the responsible acquisition, storage, safeguarding, and use of data [32]. Custodians should also ensure that systems exist to protect the privacy of individuals at every stage [33], while maintaining an adequate level of confidentiality that preserves the data as much as possible for researchers [8].

Empirical research conducted among Australian data custodians shows that they perceive their role to be more about protecting data subjects' privacy than about addressing other vulnerabilities [31]. Data custodians also have the responsibility of ensuring that data sharing complies with legal and policy requirements prior to authorizing the release of data on behalf of institutions [31].

#### **3. Ethical considerations and issues in health research data access, sharing, and use**

There is wide recognition that sharing of data generated from research involving humans raises ethical and governance issues [34]. Some of the issues (risks) of data access and sharing include: (1) confidentiality and privacy breaches as well as the need to manage these two aspects and (2) violation of expectations of data reuse [35]. These are ethical challenges because of their potential to violate human dignity and autonomy as well as pose a risk of discrimination [35]. Other ethical considerations for data sharing include: valid consent, particularly when future uses of data are unclear; the potential impact of such sharing on public trust and the implications for future research in terms of inappropriate data use, e.g., publication of data in discriminatory ways; and issues related to decisions on data access [36]. According to interviewees in a study on health research data ethical practices conducted in South Africa (SA), researchers have an ethical duty to provide accurate data as a means of nurturing professional integrity through transparent practice, coupled with avoidance of unauthorized future research use [37]. Such stakeholder views should be addressed by policies to ensure ethical data sharing nationally and internationally [37]. The noted ethical issues pertaining to health research data access, sharing, and use are discussed in turn in the subsequent sections.

#### **3.1 Health research data breaches**

Sharing of research data requires adequate safeguards for the protection of participants' rights and should be fully consistent with the terms of the consent granted [38]. Unauthorized users are able to access databases because of software vulnerabilities, human error, and security failures, which expose sensitive data and lead to confidentiality breaches [39]. Lord et al. argue that using anonymized health research data need not be regarded as a confidentiality breach, claiming that informed consent is unnecessary and often impractical [40]. The basis for this claim is unclear, but it seems to rest on the notion that the benefits of anonymized data use outweigh any confidentiality issues. Anonymization is the process of irreversibly removing from a dataset those variables that can identify an individual [41]. In the context of this chapter, health research data misuse refers to access, sharing, and use of such data that is not in line with the consent granted. There is a paucity of literature on misuse of health research data; however, a case in point is the alleged transfer of African sample associated data from SA to the UK to develop gene chips [5]. Neither SA researchers nor research participants were aware of this purported commercialization of African health research data [5]. If proven true, these allegations would demonstrate a violation of research participants' autonomy and dignity. When data are not anonymized, the risk of malevolent exploitation seems to be significantly increased [42].
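The difference between anonymization, as defined above, and the coding mentioned in Section 2.1 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not a prescribed method: the field names, the sample record, the list of direct identifiers, and the helper functions `anonymize` and `code_record` are all hypothetical and are not drawn from any of the cited guidelines.

```python
import secrets

# Hypothetical direct identifiers; in practice this list would come from
# the institution's own data governance policy.
DIRECT_IDENTIFIERS = {"name", "national_id", "phone"}


def anonymize(record):
    """Return a copy of the record with the direct identifiers removed.

    No link back to the participant is kept anywhere.
    """
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def code_record(record, key_table):
    """Replace the direct identifiers with a random study code.

    The code-to-identity link is written to key_table, which the data
    custodian would store separately from the research dataset.
    """
    study_code = secrets.token_hex(8)
    key_table[study_code] = {
        k: record[k] for k in DIRECT_IDENTIFIERS if k in record
    }
    coded = anonymize(record)
    coded["study_code"] = study_code
    return coded


# Hypothetical record used only to exercise the two functions.
record = {
    "name": "A. Participant",
    "national_id": "0000000000",
    "phone": "000-0000",
    "age_band": "40-49",
    "hba1c": 6.1,
}
key_table = {}
anonymous = anonymize(record)
coded = code_record(record, key_table)
```

In this sketch the coded record can be re-linked to the participant only by whoever holds `key_table`, whereas the anonymized record carries no such link. Removing direct identifiers does not by itself guarantee that a dataset is anonymous, since combinations of the remaining variables may still identify an individual.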

Iceland's Health Sector Database (HSD) legislation and the visibility of its processes have opened genomic innovation to public debate, bringing the ethical issues of commodification of bioinformatics (the fusion of biotechnology and informatics) and human tissue onto the international cultural and political agenda [43]. Nicolson argues that data reuse by healthcare professionals and researchers commodifies people's medical records, reducing such data to a commodity that can be bought and sold, on the reasoning that data reuse may reinforce social inequalities [44]. This argument does not hold, particularly when data reuse is in line with ethico-legal requirements.

In a 2019 study by Staunton et al. in SA, a public forum comprising respondents from law, ethics, medicine, the medical and social sciences, government, security, digital health, and bioinformatics showed a general awareness of the need for protection of personal health information [45].

#### **3.2 Health research data reuse and consent**

It is ethically mandatory for the data subject's rights to be protected [46]. Meystre et al. propose principles for ethical data reuse, which include: information privacy and disposition (the right to privacy and to control the use of one's data); openness (appropriate and timely data disclosure); security (data protection through appropriate measures); least intrusive alternative (any violation of privacy or of an individual's right to control his/her data should occur in the least intrusive manner, with minimal interference with the person's rights); and accountability (infringement of an individual's rights and control over his/her data must be justified to the affected individual in a timely and appropriate manner) [46]. A significant number of research participants across empirical research studies prefer to be contacted and re-consented for the reuse of their data [47]. The majority of Quebec citizens in a study by Cumyn et al. expressed support for the reuse of health data provided that individuals are informed about such use and consent is sought [48]. Reusing health data without informed consent contravenes patients' expectations and violates their perceived ownership rights [49]. Moreover, autonomy is a fundamental human right, which may be limited during public health emergencies provided that such interference is deemed necessary [49]. Key issues in the discussion about limits on the use of personal data in medical research relate to the scope and limitations of consent as a legal basis for such use [50]. Moreover, one of the principles for processing personal data in the European Union (EU) regulatory framework is lawfulness, which requires consent or another legitimate basis laid down by law for such processing [50].
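One way a data custodian might put the requirement that reuse stay within the consent granted into practice is to record the purposes each participant agreed to and check every access request against them. The following Python sketch is illustrative only: the `ConsentRecord` class, the `review_request` function, and the purpose labels are hypothetical, and an actual access decision would also involve ethics review and the applicable legal basis.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentRecord:
    """Hypothetical record of what one participant consented to."""
    participant_id: str
    consented_purposes: set = field(default_factory=set)


def review_request(consent, requested_purpose):
    """Compare a requested use against the purposes consented to.

    A request outside the recorded scope is not released; it is flagged
    so that the participant can be re-contacted for consent.
    """
    if requested_purpose in consent.consented_purposes:
        return "release"
    return "re-consent required"


# Hypothetical participant and requests used only for illustration.
consent = ConsentRecord("P001", {"diabetes study"})
requests = ["diabetes study", "commercial gene chip development"]
decisions = [review_request(consent, purpose) for purpose in requests]
```

Here the second request falls outside the recorded consent, so the sketch returns a re-consent flag rather than releasing the data, mirroring the preference of many participants to be re-contacted before their data are reused.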

As already mentioned, another ethical concern arises when future uses of data are unclear; accordingly, the potential impact of such sharing on public trust is discussed in the next section.

#### **3.3 The potential impact of unclear purpose of health data sharing on public trust**

There is a long-standing doctor-patient trust relationship through which the doctor (or researcher in this context) is bound by professional integrity to act in the best interests of their patients (or research participants in this context) [51]. Kerasidou submits that trust is important in biomedical research and that professional integrity can promote trust in research [52]. The presence of legally binding ethico-regulatory frameworks aimed at protecting the dignity of research participants enables the development of researcher-participant trust [53]. Trust is an essential element of building and maintaining mutual respect, particularly in relationships where there is an imbalance of power [54]. Kraft et al. have identified factors that influence research participants' trust: (1) participants' varying benefits expectations, (2) historical discrimination in research, and (3) participants' fear that their data might be used inappropriately [55]. The first two factors are explored in the subsections that follow; the third aligns with the health data breaches discussed in Section 3.1 and is therefore not discussed further here. Trust issues in medical research are also caused by exploitation of vulnerable populations; differing regulatory frameworks, particularly in research collaborations; and a lack of robust operational management, particularly of biobanks, as cited by SA researchers in a study conducted by Moodley et al. [56].

A trust relationship between researchers and participants is built when researchers share information, with reciprocity based on integrity and equality replacing vulnerability and dependence [57]. It is not only prospective research information that can be shared with participants but also research findings, through community (public) engagement, as a means of building trust [58]. Public engagement has been identified as a key mechanism for building trust [59]. Another requirement for building the healthcare professional-patient trust relationship is confidentiality [60]. Therefore, trust imposes a duty of maintaining confidentiality on healthcare professionals.

#### *3.3.1 Research participants' varying benefits expectations and trust*

Molyneux et al., in a fair benefits framework, distinguish between direct benefits for research participants (e.g., diagnostic tests, distribution of medication, evaluation services) and indirect (collateral) benefits that are not specifically targeted at research participants but might reach them as well (e.g., provision of healthcare services to family or community members) [61]. A study of views and experiences of research participants' benefits and payments in Kenya showed that inconsistencies in research benefits, such as varying transport fares on different occasions for the same study, have the potential to introduce participant mistrust [61]. Biobank research participants in a qualitative study in Australia declared institutional trust: they did not necessarily trust the individual researchers but rather the research institution, based on a perception that the institution carrying out the research was reputable [62]. Some researchers, particularly in the social and behavioral sciences, believe that deception of research participants that does not involve any harm is justified, while those who oppose this view hold that such deception violates and takes advantage of participants' trust in scientists [63].

Benefit sharing in research, particularly in genetic data banking, is recommended not only as an ethical obligation but also as a potential solution to the loss of public trust, in the sense that benefit sharing recognizes participants' contribution to the research endeavor [64]. Nicol and Critchley note that, based on the notion of reciprocity, biobanks that reward contribution through benefit sharing should promote trust, which in turn leads to public participation in biomedical research [65]. Johnson et al., however, are of the opinion that patients are expected to participate in research without benefitting personally because research is a requirement for good-quality healthcare [66].

#### *3.3.2 Historical discrimination in research*

Literature on historical discrimination in research involving humans is dominated by the infamous Tuskegee study [67–69]. Briefly, the study commenced in 1932 in the USA and involved a total of 600 black men, of whom 399 had syphilis and 201 did not; a number of ethical transgressions were inflicted on the research participants [70]. The ethical violations included failure to seek participant consent and failure to inform participants adequately: they were misled about the purpose of sample collection under the guise of being treated for "bad blood" in exchange for free medical examinations, meals, and burial insurance. Another ethical transgression was the maleficence inflicted on the participants: even when penicillin became widely available as the treatment of choice in 1943, the participants were not offered treatment [70]. By the time the study was terminated in 1972, after having been leaked to the press, 28 of the 399 participants who had syphilis had died of syphilis, another 100 had died of syphilis-related complications, 40 patients' wives had contracted the disease, and 19 children had been infected at birth [71]. Lack of trust in the healthcare system and health researchers, as cited by research participants, particularly African Americans, has historical roots; the Tuskegee syphilis study has been cited either explicitly or implicitly, and its impact continues across generations [67]. Perceived discrimination contributes to higher distrust of the healthcare system among African Americans compared with their white counterparts [68]. Such historically nuanced concerns should be addressed by institutional review boards (IRBs), even though the process may be frustrating because such assessments are imprecise by nature [72]. An interesting finding by Brandon et al. was that black race, not necessarily knowledge about the Tuskegee study, was a predictor of mistrust of medical care; it is believed that African American mistrust arises from a general mistrust of societal institutions, with the Tuskegee study confirming what is speculated or already known about the treatment of African Americans in medical systems [69].

The Nuremberg trials addressed specific crimes committed during World War II, in which German physicians conducted a series of more than 12 medical experiments on concentration camp inmates; some of the crimes involved the killing of Jews for anatomical research, euthanasia of sick and disabled civilians, and the killing of tubercular Poles [73].

#### **4. Conclusion**

This chapter has considered health research data governance through the lens of the WMA Declaration of Taipei, which is based on requirements for: (1) protection of participants' rights; (2) transparency of information; (3) individual and community inclusion through engagement; and (4) accountability of health database custodians through being accessible to all stakeholders. The ethical considerations for health research data access, sharing, and use include: confidentiality and privacy breaches as well as the need to manage these two aspects; violations of expectations of data reuse; valid consent; and the potential impact of such sharing on public trust. Factors that influence research participants' trust include: (1) varying benefits expectations, (2) historical discrimination in research, and (3) participants' fear that their data might be used inappropriately. These factors have the potential to erode the trust of health research data subjects because the context is the same, i.e., research.

#### **Conflict of interest**

The author declares no conflict of interest.

#### **Acronyms and abbreviations**


*Ethical Considerations for Health Research Data Governance DOI: http://dx.doi.org/10.5772/intechopen.106940*

### **Author details**

Mantombi Maseme

National Health Laboratory Service, Johannesburg, South Africa

\*Address all correspondence to: masememr@gmail.com

© 2022 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

### **References**

[1] Stahl BC, Rainey S, Harris E, Fothergill BT. The role of ethics in data governance of large neuro-ICT projects. Journal of the American Medical Informatics Association. 2018;**25**(8):1099-1107. DOI: 10.1093/jamia/ocy040

[2] Otto B. Organizing data governance: Findings from the telecommunications industry and consequences for large service providers. Communications of the Association for Information Systems. 2011;**29**(3):45-66. Available from: http://aisel.aisnet.org/cais/vol29/iss1/3?utm\_source=aisel.aisnet.org%2Fcais%2Fvol29%2Fiss1%2F3&utm\_medium=PDF&utm\_campaign=PDFCoverPages

[3] Weber K, Otto B, Österle H. One size does not fit all—A contingency approach to data governance. ACM Journal of Data and Information Quality. 2009;**1**(1):1-27. DOI: 10.1145/1515693.1515696

[4] Caulfield T, Burningham S, Joly Y, Master Z, Shabani M, Borry P, et al. A review of the key issues associated with the commercialization of biobanks. Journal of Law and the Biosciences. 2014:94-110. DOI: 10.1093/jlb/lst004

[5] Moodley K, Kleinsmidt A. Allegations of misuse of African DNA in the UK: Will data protection legislation in South Africa be sufficient to prevent a recurrence? Developing World Bioethics. 2020;**00**:1-6. DOI: 10.1111/dewb.12277

[6] Singleton P, Wadsworth M. Confidentiality and consent in medical research: Consent for the use of personal medical data in research. BMJ. 2006;**333**:255-258

[7] World Medical Association. WMA Declaration of Taipei on Ethical Considerations Regarding Health Databases and Biobanks. 2016. Available from: https://www.wma.net/policies-post/wma-declaration-of-taipei-on-ethical-considerations-regarding-health-databases-and-biobanks/. [Accessed: June 14, 2022]

[8] Organisation for Economic Co-operation and Development. OECD Principles and Guidelines for Access to Research Data from Public Funding. 2007. Available from: https://www.oecd.org/sti/inno/38500813.pdf. [Accessed: June 14, 2022]

[9] United Nations. Universal Declaration of Human Rights. 2015. Available from: https://www.un.org/en/udhrbook/pdf/udhr\_booklet\_en\_web.pdf. [Accessed: June 14, 2022]

[10] United Nations Office of the High Commissioner. What are Human Rights. 2022. Available from: https://www.ohchr.org/en/what-are-human-rights. [Accessed: June 14, 2022]

[11] Council for International Organizations of Medical Sciences. International Ethical Guidelines for Health-related Research Involving Humans. 2016. Available from: https://cioms.ch/wp-content/uploads/2017/01/WEB-CIOMS-EthicalGuidelines.pdf. [Accessed: June 14, 2022]

[12] DeJong. Chapter Four-Confidentiality. 2014. Available from: https://www.sciencedirect.com/science/article/pii/B9780124081284000047. [Accessed: June 14, 2022]

[13] Nass SJ, Levit LA, Gostin LO. Beyond the HIPAA Privacy Rule: Enhancing Privacy, Improving Health through Research. Washington DC: The National Academies Press; 2009. 76 p. Available from: http://www.nap.edu/catalog/12458.html

[14] Beauchamp TL, Childress JF. Moral and ethical theory. In: Beauchamp TL, Childress JF, editors. Principles of Biomedical Ethics. 4th ed. New York: Oxford University Press; 1994. p. 6

[15] Beauchamp TL, Childress JF. Liberal individualism: Rights-based theory. In: Beauchamp TL, Childress JF, editors. Principles of Biomedical Ethics. 4th ed. New York: Oxford University Press; 1994. p. 6

[16] Moreno BAC, Arteaga GMG. Violation of ethical principles in clinical research. Influences and possible solutions for Latin America. BMC Medical Ethics. 2012;**13**(35):1-4. Available from: http://www.biomedcentral.com/1472-6939/13/35

[17] Smits P, Champagne F. Governance of health research funding institutions: An integrated conceptual framework and actionable functions of governance. Health Research Policy and Systems. 2020;**18**(22):1-19. DOI: 10.1186/s12961-020-0525-z

[18] Shabani M, Obasa M. Transparency and objectivity in governance of clinical trials data sharing: Current practices and approaches. Clinical Trials. 2021:45-60. DOI: 10.1177/1740774519865517

[19] Kosack S, Fung A. Does transparency improve governance? Annual Review of Political Science. 2014;**17**:65-87

[20] Ford E, Boyd A, Bowles JKF, Havard A, Aldridge RW, Curcin V, et al. Our data, our society, our health: A vision for inclusive and transparent health data science in the United Kingdom and beyond. Learning Health Systems. 2019;**3**(e10191):1-12. DOI: 10.1002/lrh2.10191

[21] Gille F, Axler R, Blasimme A. Transparency about governance contributes to biobanks' trustworthiness: Call for action. Biopreservation and Biobanking. 2020:1-2. DOI: 10.1089/bio.2020.0057

[22] European Commission. Ethics and Data Protection. 2018. Available from: https://ec.europa.eu/info/sites/default/files/5.\_h2020\_ethics\_and\_data\_protection\_0.pdf. [Accessed: June 14, 2022]

[23] Jao I, Kombe F, Mwalukore S, Bull S, Parker M, Kamuya D, et al. Involving research stakeholders in developing policy on sharing public health research data in Kenya: Views on fair process for informed consent, access oversight, and community engagement. Journal of Empirical Research on Human Research Ethics. 2015;**10**(3):264-277. DOI: 10.1177/1556264615592385

[24] Sleigh J, Vayena E. Public engagement with health data governance: The role of visuality. Humanities and Social Sciences Communications. 2021;**8**(149):1-12. DOI: 10.1057/s41599-021-00826-6

[25] National Institutes of Health. Community Engagement. 2011. Available from: https://www.atsdr.cdc.gov/communityengagement/pdf/PCE\_Report\_508\_FINAL.pdf. [Accessed: June 18, 2022]

[26] Evans EA, Delorme E, Cyr K, Goldstein DM. A qualitative study of big data and the opioid epidemic: Recommendations for data governance. BMC Medical Ethics. 2020;**21**(101):1-13. DOI: 10.1186/s12910-020-00544-9

[27] Milne R, Sorbie A, Dixon-Woods M. What can data trusts for health research learn from participatory governance in biobanks? Journal of Medical Ethics. 2022;**48**:323-328. DOI: 10.1136/medethics-2020-107020

[28] Shah N, Coathup V, Teare H, Forgie I, Giordano GN, Hansen TH, et al. Sharing data for future research— Engaging participants' views about data governance beyond the original project: A DIRECT study. Genetics in Medicine. 2019;**21**(5):1131-1138

[29] Hoeyer K, Bauer S, Pickersgill M. Datafication and accountability in public health: Introduction to a special issue. Social Studies of Science. 2019;**49**(4):459-475. DOI: 10.1177/0306312719860202

[30] Andanda P. Towards a Paradigm Shift in Governing Data Access and Related Intellectual Property Rights in Big Data and Health-Related Research. Springer. 2019;**50**:1052-1081. DOI: 10.1007/s40319-019-00873-2

[31] Allen J, Adams C, Flack F. The role of data custodians in establishing and maintaining social licence for health research. Bioethics. 2019;**41**:404-409. DOI: 10.1111/bioe.12549

[32] Rosenbaum S. Data governance and stewardship: Designing data stewardship entities and advancing data access. Health Services Research. 2020;(Special Issue):1442-1455. DOI: 10.1111/j.1475-6773.2010.01140.x

[33] United Nations. 2018. A Human Rights-based Approach to Data. Available from: https://www.ohchr.org/sites/default/files/Documents/Issues/HRIndicators/GuidanceNoteonApproachtoData.pdf. [Accessed: June 24, 2022]

[34] Wellcome Trust. 2015. Ethical Sharing of Health Research Data in Low- and Middle-income Countries: Views of Research Stakeholders. Available from: https://cms.wellcome.org/sites/default/files/ethical-sharing-of-health-researchdata-in-low-and-middle-incomecountries-phrdf-2014.pdf [Accessed: June 23, 2022]

[35] Organisation for Economic Co-Operation and Development. OECD iLibrary. 2019. Enhancing Access to and Sharing of Data: Reconciling Risks and Benefits for Data Re-use across Societies. Available from: https://www.oecd-ilibrary.org/sites/15c62f9c-en/index.html?itemId=/content/component/15c62f9c-en. [Accessed: June 23, 2022]

[36] O'Connell and Plewes. 2015. Sharing Research Data to Improve Public Health in Africa. Available from: https://www.ncbi.nlm.nih.gov/books/NBK321547/pdf/Bookshelf\_NBK321547.pdf. [Accessed: June 23, 2022]

[37] Denny SG, Silaigwana B, Wassenaar D, Bull S, Parker M. Developing ethical practices for public health research data sharing in South Africa: The views and experiences from a diverse sample of research stakeholders. Journal of Empirical Research on Human Research Ethics. 2015;**10**(3):290-301. DOI: 10.1177/1556264615592386

[38] Sankoh O. Sharing research data to improve public health. The Lancet. 2011;**377**:537-539. DOI: 10.1016/S0140-6736(10)62234-9

[39] Seh AH, Zarour M, Alenezi M, Sarkar AK, Agrawal A, Kumar R, et al. Healthcare data breaches: Insights and implications. Healthcare. 2020;**8**(133): 1-18. DOI: 10.3390/healthcare8020133

[40] Lord W, Doll R, Asscher W, Hurley R, Langman M, Gillon R, et al. Consequences for research if use of anonymised patient data breaches confidentiality. BMJ. 1999;**319**:1366-1372

[41] World Health Organisation. 2022. Sharing and Reuse of Health-related Data for Research Purposes: WHO Policy and Implementation Guidance. Available from: https://www.who.int/publications/i/item/9789240044968. [Accessed: July 04, 2022]

[42] Ostherr K, Borodina S, Bracken RC, Lotterman C, Storer E, Williams B. Trust and privacy in the context of user-generated health data. Big Data and Society. 2017:1-11. DOI: 10.1177/2053951717704673

[43] Rose H. The Commodification of Bioinformation: The Icelandic Health Sector Database. London: The Wellcome Trust; 2001. p. 31

[44] Nicolson J. The commodification of patient medical records. BMJ. 2013;**347**(f5867):1-1. DOI: 10.1136/bmj.f5867

[45] Staunton C, Tschigg K, Sherman G. Data protection, data management, and data sharing: Stakeholder perspectives on the protection of personal health information in South Africa. PLoS One. 2021;**16**(12):1-19. DOI: 10.1371/journal.pone.0260341

[46] Meystre SM, Lovis C, Bürkle T, Tognola G, Budrionis A, Lehmann CU. Clinical data reuse or secondary use: Current status and potential future progress. In: IMIA Yearbook of Medical Informatics. Germany: IMIA and Schattauer GmbH; 2017. pp. 38-52. DOI: 10.15265/IY-2017-007

[47] VandeVusse A, Mueller J, Karcher S. Qualitative data sharing: Participant understanding, motivation, and consent. Qualitative Health Research. 2022;**32**(1):182-191. DOI: 10.1177/10497323211054058

[48] Cumyn A, Dault R, Barton A, Cloutier A-M, Ethier J-F. Citizens, research ethics committee members and researchers' attitude toward information and consent for the secondary use of health data: Implications for research within learning health systems. Journal of Empirical Research on Human Research Ethics. 2021;**16**(3):165-178

[49] Tsai F-J, Junod V. Medical research using governments' health claims databases: With or without patients' consent? Journal of Public Health. 2018;**40**(4):71-877. DOI: 10.1093/pubmed/fdy034

[50] Mostert M, Bredenoord AL, Biesaart MCIH, van Delden JJM. Big Data in medical research and EU data protection law: Challenges to the consent or anonymise approach. European Journal of Human Genetics. 2016;**24**:956-960. DOI: 10.1038/ejhg.2015.239

[51] O'Neill O. Autonomy and Trust in Bioethics. United Kingdom: Cambridge University Press; 2002. p. 17

[52] Kerasidou A. Trust me, I'm a researcher!: The role of trust in biomedical research. Medicine, Health Care and Philosophy. 2017;**20**:43-50. DOI: 10.1007/s11019-016-9721-6

[53] Maseme M. Commodification of biomaterials and data when funding is contingent to transfer in biobank research. Medicine, Health Care and Philosophy. 2021:1-9. DOI: 10.1007/s11019-021-10042-3

[54] Wilkins CH. Effective engagement requires trust and being trustworthy. Medical Care. 2018;**56**(10):S6-S8

[55] Kraft S, Cho M, Gillespie K, Halley M, Varsava N, Ormond K, et al. Beyond consent: Building trusting relationships with diverse populations in precision medicine research. American Journal of Bioethics. 2018;**18**(4):3-20. DOI: 10.1080/15265161.2018.1431322

[56] Moodley K, Singh S. "It's all about trust": Reflections of researchers on the complexity and controversy surrounding biobanking in South Africa. BMC Medical Ethics. 2016;**17**(57):1-9. DOI: 10.1186/s12910-016-0140-2

[57] Mcdonald M, Townsend A, Cox SM, Paterson ND, Lafrenière D. Trust in health research relationships: Accounts of human subjects. Journal of Empirical Research on Human Research Ethics. 2016:35-47. DOI: 10.1525/jer.2008.3.4.35

[58] McDavitt B, Bogart LM, Mutchler MG, Wagner GJ, Green HD Jr, Lawrence SJ, et al. Dissemination as dialogue: Building trust and sharing research findings through community engagement. Public Health, Research, Practice, and Policy. 2016;**13**:150473. DOI: 10.5888/pcd13.150473

[59] Platt J, Kardia S. Public trust in health information sharing: Implications for biobanking and electronic health record systems. Journal of Personalised Medicine. 2015;**5**:3-21. DOI: 10.3390/jpm5010003

[60] O'Brien J, Chantler C. Confidentiality and the duties of care. Journal of Medical Ethics. 2003;**29**:36-40. DOI: 10.1136/jme.29.1.36

[61] Molyneux S, Mulupi S, Mbaabu L, Marsh V. Benefits and payments for research participants: Experiences and views from a research Centre on the Kenyan coast. BMC Medical Ethics. 2012;**13**(13):1-15

[62] Allen J, McNamara B. Reconsidering the value of consent in biobank research. Bioethics. 2011;**25**(3):155-166. DOI: 10.1111/j.1467-8519.2009.01749.x

[63] Tai MC-T. Deception and informed consent in social, behavioral, and educational research (SBER). Tzu Chi Medical Journal. 2012;**24**:218-222. DOI: 10.1016/j.tcmj.2012.05.003

[64] Nicol D. Public Trust, intellectual property and human genetic databanks: The need to take benefit sharing seriously. JIBL. 2006;**3**:89-103

[65] Nicol D, Critchley C. Benefit sharing and biobanking in Australia. Public Understanding of Science. 2011;**21**(5):534-555. DOI: 10.1177/0963662511402425

[66] Johnsson L, Helgesson G, Hansson MG, Eriksson S. Adequate trust avails, mistaken trust matters: On the moral responsibility of doctors as proxies for patients' trust in biobank research. Bioethics. 2012;**1-8**. DOI: 10.1111/j.1467-8519.2012.01977.x

[67] Scharff DP, Mathews KJ, Jackson P, Hoffsuemmer J, Martin E, Edwards D. More than Tuskegee: Understanding mistrust about research participation. Journal of Health Care for the Poor and Underserved. 2010;**21**(3):879-897. DOI: 10.1353/hpu.0.0323

[68] Durant RW, Legedza AT, Marcantonio ER, Freeman MB, Landon BE. Different types of distrust in clinical research among whites and African Americans. Journal of the National Medical Association. 2011;**103**(2):123-130

[69] Brandon DT, Isaac LA, LaVeist TA. The legacy of Tuskegee and Trust in Medical Care: Is Tuskegee responsible for race differences in mistrust of medical care? Journal of the National Medical Association. 2005;**97**(7):951-956

[70] Centers for Disease Control and Prevention. The Tuskegee Timeline. 2021. Available from: https://www.cdc.gov/tuskegee/timeline.htm#:~:text=In%201932%2C%20the%20USPHS%2C%20working,Syphilis%20Study%20at%20Tuskegee%E2%80%9D. [Accessed: July 05, 2022]

[71] ScienceDirect. Tuskegee Syphilis Experiment. 2022. Available from: https://www.sciencedirect.com/topics/medicine-and-dentistry/tuskegeesyphilis-experiment/pdf. [Accessed: July 05, 2022]

[72] Silvers A. Historical vulnerability and special scrutiny: Precautions against discrimination in medical research. The American Journal of Bioethics. 2004;**4**(3):56-57

[73] Harvard Law School. Nuremberg Trials Project. 2020. Available from: https://nuremberg.law.harvard.edu/nmt\_1\_intro. [Accessed: July 05, 2022]

#### **Chapter 6**

## Predictive Data Analysis Using Linear Regression and Random Forest

*Julius Olufemi Ogunleye*

#### **Abstract**

Predictive analysis (or analytics) is a statistical technique that uses machine learning and computing to find patterns in data and forecast future behavior. It is now common to go beyond descriptive analytics in order to learn whether training initiatives are effective and how they may be enhanced. Predictive analysis can use both historical and current data to predict what might occur in the future. After identifying potential risks or opportunities, businesses can take action to improve upcoming learning projects. This chapter compares two models used in the predictive analysis of data: the Generalized Linear Model with Linear Regression (LR) and Decision Trees with Random Forest (RF). With an RMSE (Root Mean Square Error) of 0.0264965 and an arithmetic mean of all errors of 0.016056967, Linear Regression performed better in this analysis than Random Forest, which had an RMSE of 0.117875 and an arithmetic mean of all errors of 0.07062315. These errors can be reduced further through hyper-parameter tuning. Combining the LR and RF predictions by averaging produced even more accurate predictions and can reduce the risk of over-fitting and of incorrect predictions by an individual algorithm, depending on the quality of the training data.

**Keywords:** data analysis, predictive data analysis, linear regression, random forest, generalized linear model, decision trees

#### **1. Introduction**

Data analysis is the process of analyzing data to increase productivity and business growth. It involves steps like data cleansing, transformation, inspection, and modeling to perform market analysis, gather hidden data insights, enhance business studies, and generate reports based on the available data using tools like Tableau, Power BI, R and Python, Apache Spark, etc.

#### **1.1 Predictive analysis**

Predictive analytics, also referred to as predictive analysis, is a subset of data analysis that focuses on creating future predictions from data. Other types of data analysis exist, such as descriptive and diagnostic analysis, but predictive analysis is very well-liked in business analysis because it is crucial for making wise decisions. Predictive analysis typically uses a variety of statistical models, techniques, and tools that all aid in understanding the patterns in datasets and making predictions. Data description and sorting are only a small part of predictive analysis; it largely relies on sophisticated models designed to draw conclusions from the data they encounter. To predict future trends, these models evaluate previous and present data using algorithms and machine learning. Each model varies depending on the particular requirements of the people using predictive analysis. Predictive analysis is very useful for assessing business decisions, because sound decisions involve understanding their effects and basing them on projections of how a project, group, environment, or other entity will perform. A few fundamental models that are often used are described in Section 1.2.


Prediction is a key component of data mining. Predictive analysis is a method for forecasting future patterns from current or historical data. As a result, businesses will be able to forecast future data trends. It can take many different forms, but some of the most advanced models make use of machine learning and artificial intelligence [1].

#### **1.2 Models for predictive analysis**

Predictive analysis encompasses several different types of data analysis models. Most of these are regression models, which aim to determine the connections between two or more variables. They can aid in predicting the value of an unknown variable as the value of a known variable changes by recognizing the links between these variables.

i. Generalized Linear Model - Linear Regression

The linear regression model is the most basic predictive analysis approach. It presumes that an unknown variable's value scales linearly with a known variable's value. Linear regression models are useful for tracking straightforward relationships and projecting them forward, such as the growth of a customer base.

ii. Decision Trees - Random Forests

Random forests are machine learning models that can be used, among other things, for regression. They are made up of many decision trees and are suited to large datasets with several variables.

iii. Neural Networks

Neural networks are a cutting-edge tool for predictive analysis. They are collections of interconnected artificial or biological neurons that pass signals to one another. A neural network adjusts its internal connections and reaches new conclusions based on the data it is given.

#### **1.3 Predictive analysis tools**

Aside from models, there are many specific tools available for conducting predictive analysis. These tools aid in discovering connections that can be used to make predictions from data. They take on the bulk of the user's work by incorporating many of the statistical models used in predictive analysis [2].

#### i. RapidMiner Studio

RapidMiner Studio is a well-liked commercial tool for all types of predictive analysis. It aids in data collection, processing, and the application of various statistical models to produce insightful results.

#### ii. KNIME

Many of the functionalities of RapidMiner Studio are also available in the open-source data analysis tool KNIME. It appears to be made for more experienced users, though.

#### iii. IBM Predictive Analytics

IBM provides a variety of predictive analytics technologies, including its premier SPSS Statistics software offering, as SaaS solutions. The system, which offers a variety of predictive analysis models, is primarily aimed at enterprise users.

#### iv. SAP Predictive Analytics

SAP has a well-known SaaS product in the predictive analytics market. The developer of enterprise management software provides an analytics cloud for business users that is implemented similarly to IBM's.

#### **2. Related works**

#### **2.1 Predictive analysis using linear regression with SAS (Bafna J., 2017)**

According to Bafna J., linear regression relates a scalar dependent variable to one or more explanatory independent variables. In linear regression, one of the most widely used prediction methods, the best-fitting straight line through the points is referred to as the regression line. To demonstrate this, the author gave the example of estimating people's weights from their heights. The dependent variable, which needs to be predicted, is the weight, and the independent variable is the height. The following outcomes were obtained using SAS's PROC REG to fit a linear regression and determine the relationship between the two variables:


• To check for any outliers in the observations, the value of r was determined. An observation was considered an outlier if its value of r was greater than 2 or less than −2 (that is, non-outliers satisfy −2 < r < 2).

No observations fell outside this range, leading the author to conclude that a single major variable (height) accounted for 95% of the variation in a person's weight [3].
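The outlier check described above can be sketched in a few lines of Python. This is only an illustration, not the SAS procedure used by the cited author: it assumes that r denotes a standardized residual (the residual divided by the residual standard error), and the height and weight values, as well as the function names, are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = c + m*x; returns (m, c)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, mean_y - m * mean_x

def standardized_residuals(xs, ys):
    """Residuals divided by the residual standard error (n - 2 degrees of freedom)."""
    m, c = fit_line(xs, ys)
    residuals = [y - (c + m * x) for x, y in zip(xs, ys)]
    sigma = math.sqrt(sum(r ** 2 for r in residuals) / (len(xs) - 2))
    return [r / sigma for r in residuals]

def flag_outliers(xs, ys, threshold=2.0):
    """Indices of observations whose standardized residual lies outside (-threshold, threshold)."""
    return [i for i, r in enumerate(standardized_residuals(xs, ys)) if abs(r) > threshold]

# Hypothetical heights (cm) and weights (kg); the last weight is deliberately off-trend.
heights = [150, 155, 160, 165, 170, 175, 180, 185, 190, 172]
weights = [50, 54, 58, 62, 66, 70, 74, 78, 82, 95]
outliers = flag_outliers(heights, weights)
print("Outlier indices:", outliers)
```

With this made-up sample, only the last observation falls outside the −2 to 2 band; in the cited study no observation did.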

#### **2.2 Random forest model to identify factors associated with anabolic-androgenic steroid use (Manoochehri Z., Barati M., Faradmal J. and Manoochehri S., 2021)**

Anabolic-androgenic steroids (AAS) are one form of doping that bodybuilders frequently use. In addition to breaking athletic ethics, using AAS harms physical and mental health. This study used the prototype willingness model (PWM) to identify the key characteristics influencing AAS use among bodybuilders. For this analytical cross-sectional study, a total of 280 male bodybuilders were selected in 2016 by multistage sampling from the bodybuilding clubs in Hamadan city. The data was gathered through a self-administered questionnaire that covered demographic data and PWM components, and a random forest model was employed to evaluate the data. The most important factors in determining behavioral intention were behavioral willingness, attitude, and prior AAS use. BMI, attitude, subjective norms, and prototypes had the biggest impact on predicting behavioral willingness to take AAS. It was also found that behavioral intention was more significant than behavioral willingness in predicting AAS use. The findings indicate that, compared with the social reaction path, the reasoned action path has a stronger impact on predicting the use of AAS among bodybuilders [4].

#### **2.3 Linear regression analysis study (Kumari K. and Yadav S., 2021)**

According to the authors, linear regression is a statistical method for determining the relationship between two variables and for estimating the value of a dependent variable from an independent variable. It is a modeling method in which one or more independent variables are used to forecast a dependent variable, and, according to the authors, it is the most widely applied statistical method. The article provided an overview of the underlying ideas and examples of performing linear regression calculations using SPSS and Excel (**Table 1**).


**Table 1.** *Summary output.*

*Predictive Data Analysis Using Linear Regression and Random Forest DOI: http://dx.doi.org/10.5772/intechopen.107818*

According to the table above, multiple R is the correlation coefficient, where 1 (one) denotes a perfect correlation and 0 (zero) denotes a lack of correlation. According to the R Square coefficient of determination, the factors account for 92% of the variation. Adjusted R-squared was used because it corrects for the number of explanatory factors. According to the authors, correlation and linear regression are the best methods for determining the link between two variables. Correlation measures the strength of a linear relationship between two variables, whereas regression describes the relationship as an equation. The article provided straightforward examples using SPSS and Excel to illustrate linear regression analysis and to encourage readers to adopt these techniques to analyze their own data [5].
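The quantities reported in such a summary output can be reproduced by hand. The sketch below assumes a regression with a single explanatory variable, in which case multiple R equals the absolute value of the Pearson correlation and R Square is its square. The data values and function names are hypothetical and are not taken from the cited article.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """R Square penalized for the number of explanatory variables."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Hypothetical data that is close to, but not exactly on, a straight line.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

multiple_r = abs(pearson_r(xs, ys))
r_square = multiple_r ** 2
adj_r_square = adjusted_r_squared(r_square, len(xs), 1)
print(f"Multiple R={multiple_r:.4f}  R Square={r_square:.4f}  Adjusted={adj_r_square:.4f}")
```

Adjusted R Square is never larger than R Square, and the gap widens as more explanatory variables are added for the same number of observations.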

#### **3. Methods**

#### **3.1 Linear regression**

Linear regression is a machine learning technique that maps numerical inputs to numerical outputs by fitting a line through the data points. In other words, linear regression is a method of modeling the relationship between a dependent variable and one or more independent variables (**Figure 1**) [6]. From a machine learning perspective, this is done to achieve generalization, which enables the model to forecast results for inputs it has never seen before. It is one of the most well-known concepts in statistics and machine learning, and because it is so important, it takes up a sizable part of almost every machine learning course [7].

$$y = mx + c,$$

where y is the score of the dependent variable, m is the regression coefficient (the slope), c is the constant (the intercept), and x is the independent variable, is the formula for a straight line on a plot.

In machine learning, this formula is written as h(x) = w0 + w1·x, where x is the input feature, w0 and w1 are weights, and h(x) is the predicted label (i.e., the y-value). The goal of linear regression is to identify the weights (w0 and w1) that produce the line that best fits the input data (i.e., the x features and their labels) [8].

**Figure 1.** *Graphical illustration of a line (in red) generated by linear regression [3].*
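As a minimal sketch of how the weights can be identified, the example below uses the closed-form least-squares solution for a single input feature. The chapter does not state which fitting procedure was used, so this choice is an assumption, and the function names and data are purely illustrative.

```python
def fit_weights(xs, ys):
    """Least-squares estimates (w0, w1) for h(x) = w0 + w1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    w1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
    # Intercept: the fitted line passes through the point of means.
    w0 = mean_y - w1 * mean_x
    return w0, w1

def h(x, w0, w1):
    """Predicted label for input feature x."""
    return w0 + w1 * x

# Hypothetical training data generated from y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

w0, w1 = fit_weights(xs, ys)
prediction = h(5.0, w0, w1)  # an input the model has not seen before
print(f"w0={w0:.2f}, w1={w1:.2f}, h(5)={prediction:.2f}")
```

Because the toy data lies exactly on a line, the recovered weights match the generating values; with noisy data the same formulas give the line that minimizes the sum of squared errors.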

#### **3.2 Random Forest**

Random forest is a machine learning method for solving classification and regression problems. It uses ensemble learning, a method for solving complex problems by combining a number of models. A random forest consists of many decision trees, and the resulting "forest" is trained via bagging, or bootstrap aggregation [9]. Bagging is an ensemble meta-algorithm that increases the accuracy of machine learning algorithms. The random forest algorithm determines its result from the predictions of the decision trees: it makes predictions by averaging the outputs of the individual trees [10]. The accuracy of the result generally improves as the number of trees increases. A random forest mitigates the drawbacks of a single decision tree: it reduces overfitting to the dataset and improves precision. It also generates predictions without requiring extensive configuration in packages such as Scikit-learn [11].
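The bagging-and-averaging idea can be illustrated with a deliberately small sketch. It is not a full random forest: each "tree" is a depth-1 regression stump on a single feature, so there is no random feature selection, and all names, parameters, and data are hypothetical.

```python
import random
from statistics import mean

def fit_stump(points):
    """Depth-1 regression tree: choose the x threshold with the lowest squared error."""
    xs = sorted({x for x, _ in points})
    best = None
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in points if x <= threshold]
        right = [y for x, y in points if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        error = (sum((y - left_mean) ** 2 for y in left)
                 + sum((y - right_mean) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # the bootstrap sample contained a single distinct x value
        constant = mean(y for _, y in points)
        return lambda x: constant
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def fit_bagged_stumps(points, n_trees=50, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(points) for _ in points]) for _ in range(n_trees)]

def predict(trees, x):
    """Ensemble prediction: the average of the individual tree outputs."""
    return mean(tree(x) for tree in trees)

# Hypothetical step-shaped data: y jumps from 1 to 3 at x = 5.
points = [(x, 1.0 if x < 5 else 3.0) for x in range(10)]
forest = fit_bagged_stumps(points)
low, high = predict(forest, 1), predict(forest, 8)
print(f"prediction at x=1: {low:.2f}, at x=8: {high:.2f}")
```

Each stump sees a slightly different sample, so individual stumps disagree near the jump, while the averaged prediction is more stable than that of any single stump.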

#### **3.3 Features of the random forest algorithm**

The random forest algorithm overcomes the problem of overfitting in decision trees and is more accurate than a single decision tree. It provides an efficient way of handling missing data. In every tree of the forest, a subset of features is randomly chosen at each node's splitting point [12].

The fundamental distinction between the random forest method and the decision tree algorithm is that the former selects root nodes and splits nodes randomly (**Figure 2**). To produce the required prediction, the random forest uses the bagging approach [13]. Bagging entails training on multiple samples of the training data rather than a single sample. A training dataset consists of observations and features that are used to make predictions. The decision trees generate different outputs depending on the training data fed to them by the random forest algorithm. These outputs are ranked, and the highest-ranked one is chosen as the final output [7].
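The three ingredients named above (bootstrap samples, a random subset of features at each split, and selection of the top-ranked output) can be sketched as small helper functions. This is an illustration under the assumption that, for classification, "highest ranking" means the label predicted by the most trees. The feature names and labels are hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) observations with replacement (one sample per tree)."""
    return [rng.choice(rows) for _ in rows]

def random_feature_subset(feature_names, k, rng):
    """Pick k distinct candidate features for one node split."""
    return rng.sample(feature_names, k)

def majority_vote(labels):
    """Final output: the label predicted by the largest number of trees."""
    return Counter(labels).most_common(1)[0][0]

rng = random.Random(42)
rows = list(range(8))                                 # row indices of a toy dataset
features = ["size", "duration", "team", "language"]   # hypothetical feature names

sample = bootstrap_sample(rows, rng)
subset = random_feature_subset(features, 2, rng)
final = majority_vote(["high", "low", "high", "high", "low"])
print(sample, subset, final)
```

Because sampling is done with replacement, some rows typically appear more than once in a bootstrap sample while others are left out, which is what makes the individual trees differ from one another.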


#### *3.3.1 Disadvantages of random forest*

- A random forest requires more computational resources.
- Training and prediction take longer than with a single decision tree [15].

#### **4. Discussions**

In this chapter, the predictive analysis methods Linear Regression and Random Forest are compared. Data on software cost estimation was obtained from Kaggle; the dataset contained details on the size of the implemented program, measured in function points. H2O AutoML was used to determine which model predicted the software cost with the lowest error. The performance of machine learning systems may be greatly affected by erroneous and noisy input. Poor data quality, notably a high incidence of missing values and outliers, may result in inconsistent and incorrect conclusions. Therefore, a key stage in developing ML models is pre-processing data through selection, cleaning, reduction, transformation, and feature selection (**Table 2**).

The project dataset was divided into training (80%) and test (20%) sets for modeling purposes using H2O. The training set was used to create the models, while the test set was used to verify their capacity to estimate effort. H2O AutoML was used to apply two data mining prediction methods (Generalized Linear Models - Linear Regression (LR) and Decision Trees - Random Forest (RF)) for both dependent variables. In order to assess their potential utility for implementation inside companies, their error and accuracy measures were compared. The error measures used to evaluate the accuracy of the software estimation models were *Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE),* and *Root Mean Squared Log Error (RMSLE)* (**Table 3**).
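For reference, the 80/20 split and the four error measures can be written out in plain Python. This is a sketch using the standard textbook definitions and does not reproduce H2O's implementation. The function names, the random seed, and the sample values are assumptions made for illustration.

```python
import math
import random

def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of the rows and cut it into training and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    """Mean Squared Error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error."""
    return math.sqrt(mse(actual, predicted))

def rmsle(actual, predicted):
    """Root Mean Squared Log Error; all values must be greater than -1."""
    squared_logs = [(math.log1p(a) - math.log1p(p)) ** 2
                    for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_logs) / len(actual))

train, test = train_test_split(list(range(10)))

# Hypothetical actual and predicted values for four test projects.
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.5, 5.0, 3.0, 6.5]
print(f"MAE={mae(actual, predicted):.4f}  MSE={mse(actual, predicted):.4f}  "
      f"RMSE={rmse(actual, predicted):.4f}  RMSLE={rmsle(actual, predicted):.4f}")
```

MSE and RMSE penalize large individual errors more heavily than MAE does, while RMSLE measures relative rather than absolute deviations, which is why several measures are usually reported together.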

Linear Regression outperformed Random Forest, as shown in **Figure 3**. Additionally, there is very little variation between the models' RMSLE and MAE, so it can be concluded that there were no significant errors. The difference in prediction accuracy between the algorithms was small, and each could be employed independently to examine the predictions. In conclusion, both models are quite good at making predictions; in this instance, however, linear regression outperformed random forest. Models may be more accurate if they are deployed for a specific company and trained on a homogeneous dataset.

**Table 2.** *Sample of sourced data.*

**Table 3.** *Errors and accuracy measures.*

**Figure 3.** *Graphical representation of the errors and accuracy measures.*

#### **5. Conclusion**

Predictive analysis is used in almost every field, even though it has drawn some criticism. With more information, future outcomes can be predicted with relative accuracy. This makes it possible for organizations and businesses to make informed decisions to increase production. Because it has applications in nearly every industry, learning the methods of predictive analysis has become essential for jobs in data science and business analysis. In this investigation, Random Forest had an RMSE of 0.117875 and an arithmetic mean of all errors of 0.07062315, while Linear Regression had an RMSE of 0.0264965 and an arithmetic mean of all errors of 0.016056967. These errors can be reduced further through hyper-parameter tuning.

This study compared the Generalized Linear Model with Linear Regression and Decision Trees with Random Forest for predictive analysis. Additionally, a merged strategy was investigated, which used the arithmetic mean to combine the predictions of the two models. The outcomes demonstrated that different data mining techniques can be applied to make predictions. Combining the LR and RF predictions by averaging produced even more accurate predictions and can reduce the risk of over-fitting and of incorrect predictions by an individual algorithm, depending on the quality of the training data. To maintain accuracy in a project's changing environment, project management offices should ensure good input data quality and keep the models updated.
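The merged strategy amounts to taking the arithmetic mean of the two models' predictions for each observation. The sketch below uses made-up prediction values, not the chapter's results, chosen so that the two models err in opposite directions. Averaging cannot do worse than the mean of the two individual RMSE values, but it is not guaranteed to beat the better of the two models on every dataset.

```python
import math

def rmse(actual, predicted):
    """Root Mean Squared Error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def average_predictions(pred_a, pred_b):
    """Combine two models by the arithmetic mean of their predictions."""
    return [(a + b) / 2 for a, b in zip(pred_a, pred_b)]

# Hypothetical values: the LR and RF errors partly cancel each other out.
actual = [10.0, 12.0, 15.0, 11.0]
lr_pred = [11.0, 11.0, 16.0, 10.0]
rf_pred = [9.0, 13.5, 13.5, 12.5]

combined = average_predictions(lr_pred, rf_pred)
print(f"LR RMSE={rmse(actual, lr_pred):.4f}  RF RMSE={rmse(actual, rf_pred):.4f}  "
      f"combined RMSE={rmse(actual, combined):.4f}")
```

In this toy case the combined RMSE is lower than either individual RMSE because the errors have opposite signs; when both models err in the same direction, the benefit shrinks accordingly.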

#### **Acknowledgements**

I, Julius Olufemi Ogunleye (the author), would like to express my gratitude to Ass. Prof. Zdenka Prokopova and Ass. Prof. Petr Silhavy for their support and guidance in making this research work possible. This work was supported by the Faculty of Applied Informatics, Tomas Bata University in Zlín, under Project IGA/CebiaTech/2022/001.

### **Author details**

Julius Olufemi Ogunleye
Tomas Bata University in Zlín, Czech Republic

\*Address all correspondence to: juliusolufemi@yahoo.com

© 2022 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

### **References**

[1] Data Mining Techniques: Algorithm, Methods & top Data Mining Tools. Software Testing Help; March 2020. Available from: https://www.softwaretestinghelp.com/data-mining-techniques/

[2] Steeneken F, Ackley D. A Complete Model of the Supermarket Business. BPTrends; January 2012

[3] Bafna J. Predictive Analysis Using Linear Regression With SAS. Big Data Zone – DZone; 2017

[4] Manoochehri Z, Barati M, Faradmal J, Manoochehri S. Random forest model to identify factors associated with anabolic-androgenic steroid use. BMC Sports Sci Med Rehabil. 2021;**13**(1):30

[5] Kumari K, Yadav S. Linear regression analysis study. Curriculum in Cardiology—Statistics. 2021;**4**:33-36

[6] Sumiran K. An overview of data mining techniques and their application in industrial engineering. Asian Journal of Applied Science and Technology. 2018;**2**:947-953

[7] Mehmed K. Data Mining – Concepts, Models, Methods, and Algorithms. Edition – 2, Illustrated Edition. Wiley; 2011. ISBN 1118029127, 9781118029121

[8] Varshini AGP, Kumari KA. Predictive analytics approaches for software effort estimation: A review. Indian Journal of Science and Technology. 2020;**13**:2094-2103

[9] Nassif AB et al. Software development effort estimation using regression fuzzy models. Computational Intelligence and Neuroscience. 2019;**2019**:8367214

[10] Azzeh MA, Nassif B, Banitaan S. Comparative analysis of soft computing techniques for predicting software effort based use case points. IET Software. 2018;**12**(1):19-29

[11] Dejaeger K et al. Data mining techniques for software effort estimation: A comparative study. IEEE Transactions on Software Engineering. 2012;**38**(2):375-397. DOI: 10.1109/TSE.2011.55

[12] Weiss GM, Davison BD. Data Mining. In: Bidgoli H, editor. Handbook of Technology Management. John Wiley and Sons; 2010

[13] Berson A et al. An Overview of Data Mining Techniques. (Excerpts from the book 'Building Data Mining Applications for CRM' by Alex Berson, Stephen Smith, and Kurt Thearling). McGraw-Hill; 2005

[14] Data Mining Techniques: Algorithm, Methods & top Data Mining Tools. Software Testing Help; April 2020. Available from: https://www.softwaretestinghelp.com/data-mining-techniques/

[15] Kushwaha DS, Misra AK. Software Test Effort Estimation. (ACM SIGSOFT Software Engineering Notes – Page 3). May 2008;**33**(3)

#### **Chapter 7**

## Field Programmable Reconfigurable Mesh (FPRM)

*Esti Stein and Yosi Ben Asher*

#### **Abstract**

Many application areas demand increasing amounts of processing capabilities. FPGAs have been widely used for improving this performance. FPRM (Field Programmable Reconfigurable Mesh) is a technique we propose to improve FPGA performance. A Reconfigurable Mesh (RM) consists of a grid of Processing Elements that use dynamic reconfigurations to create varying bus segments between them. The RM can thus perform computations such as Sorting or Counting in a constant number of steps. It has long been speculated that the RM's dynamic reconfigurations should replace the FPGA's static reconfigurations. We show that the RM is capable of not only speeding up specific computations such as sorting or summing, but also of speeding up the evaluation of Boolean circuits (BCs), which is the main purpose of the FPGA. Our proposed RM algorithm can evaluate BCs without causing size blowup. Furthermore, tri-state switching elements can be used instead of PEs in a grid.

**Keywords:** FPGA, reconfigurable mesh, DNF, Boolean circuits, tri-state

#### **1. Introduction**

FPGAs are integrated circuits that form a matrix of configurable logic units (CLUs)<sup>1</sup> connected via programmable routing interconnects. By downloading different routing configurations to the FPGA, any circuit $C(x_0, \ldots, x_{n-1})$ can be embedded and then executed/evaluated. After embedding the circuit's topology in the FPGA, the circuit is executed every time a new input is received. Due to the FPGA's routing interconnects and CLUs, this evaluation mode makes the FPGA relatively slow compared to ASICs.

Given the circuit $C(x_0, \ldots, x_{n-1})$ discussed above, we wish to examine the possibility of speeding up the evaluation of $C$ by using a dynamic mode of reconfiguration rather than the above-mentioned FPGA mode. Essentially, we devised an algorithm that evaluates $C(x_0, \ldots, x_{n-1})$ faster than its depth [1] (the longest path from the root/output to any leaf/input)<sup>2</sup>. In a sequence of reconfiguration steps this algorithm: 1) spans bus segments on different subsets of $\{x_0, \ldots, x_{n-1}\}$ in parallel; 2) uses a single broadcast in each of these bus segments, and computes in parallel the AND/OR/COUNTING-1s (counting the number of '1's) of each segment; 3) computes $C$ in a fixed small number of steps regardless of $C$'s depth, based on the above computations. The above algorithm uses a platform based on the Reconfigurable Mesh (RM) [2], which is a 2D grid of Processing Elements (PEs) that uses dynamic bus reconfiguration to create varying bus segments for fast communication. Consequently, computations such as summation and sorting can be expedited.

<sup>1</sup> See appendix A for all acronyms and abbreviations.

<sup>2</sup> A preliminary version of the following algorithm and results was presented as a poster in [1].

The Reconfigurable Mesh (RM) has been demonstrated to perform parallel computations faster than the parallel random access machine (PRAM) model [3], which is an abstract model for parallel computation. This includes $O(1)$ summing [4], $O(\log)$ integer summing [5], $O(1)$ multiplication [6], sorting [7], convex hull [8], graph algorithms [9, 10] and image processing [11]. Despite this potential power of the RM model, it has not yet been fully realized, since the model assumes a signal can be transmitted along a bus/connected component in a single step regardless of the number of switches/ports. From this perspective, a variety of restricted RMs have been proposed. These include the RMBM [12], where only the structure of the RM's switch has been simplified but busses with a linear number of switches are still used. The SRGA [13, 14] proposed a mesh where each row/column has a complete binary tree of reconfigurable switches, allowing messages to be routed between the leaves of this tree. [15] proposes a linear RM (LR-Mesh) with bending cost, where the delay of a bus varies as a function of how many times it bends between rows and columns. It showed that busses with a reasonable delay of at most $D = N^{\varepsilon}$ bends can simulate algorithms for LR-Meshes in constant time. A bus of length $d(n) = n^{1/k}$ was suggested in [16], which also showed that restricted RM algorithms can be directly coded in Verilog. This way of programming RM algorithms overcomes most of the drawbacks of the C-like programming style proposed so far for RM algorithms (e.g., ARMlang [17]). However, that work only addressed the problem of COUNTING-1s. Our method of evaluating the circuit is partially based on the solution for COUNTING-1s. [18] shows that integrating branching programs with Boolean circuits is better than using each of them separately. Other realizations of the RM [19] were mainly limited to a small-size grid of soft CPUs and cannot be synthesized for large values of $n$.

A number of dynamic reconfiguration (DR) FPGAs have also been proposed, mainly for the purpose of speed acceleration. However, the main challenge was the reconfiguration delay, and the use of DR is therefore rare [20]. One method of addressing this problem is commonly referred to as partial reconfiguration (PR) at runtime. PR can be implemented through external FPGA interfaces as well as special internal interfaces such as the ICAP on Xilinx devices [21]. Even so, PR is still primarily an auxiliary feature in modern commercial FPGAs rather than something around which the architecture is designed [22, 23]. Thus, PR design involves many details related to low-level architecture that require a high level of expertise. [24] proposed a time-multiplexed DRFPGA, where registers are added to store computational states and partial results. Yet, only a few contexts are allowed because of area overhead. Memristors (RRAM) have also been applied as programmable switches, as they are naturally more delay-efficient and lead to higher-performance FPGA architectures. However, [25, 26] only focus on the architectural repercussions of this technology. Very few works investigate realistic RRAM-based circuit design constraints, although these have a strong impact on the final architectural performance. Fine-grain DR (FDR), described in [27], consists of homogeneous reconfigurable logic elements (LEs). Each LE can be configured as a lookup table (LUT), as an interconnect, or as a combination of both. While this improves flexibility for allocating hardware resources between LUTs and interconnects, it still consumes a large amount of space. At first glance, this model seems to be close to what we have proposed, but one of the main differences lies in the algorithm for evaluating the Boolean circuit $C(x_0, \ldots, x_{n-1})$ faster than its depth.

Our approach to the speed-up evaluation problem is to use the Reconfigurable Mesh (RM). We propose an infrastructure called FPRM (Field Programmable Reconfigurable Mesh), which is a sub-model of the RM model based on current CMOS technology and adapted to the proposed algorithm. The FPRM consists of a two-dimensional grid of switches $pe_{i,j}$, with each switch connected to its four neighbors $pe_{i-1,j}$, $pe_{i+1,j}$, $pe_{i,j-1}$, $pe_{i,j+1}$ via four links. Each switch can reconfigure its internal links in different reconfiguration modes $S_0, S_1, S_2, \ldots$ as depicted in **Figure 1** (upper left). Each $pe_{i,j}$ has four registers used to read/write to each of the four links: $N_r$ to read/write to the link connecting $pe_{i,j}$ to $pe_{i-1,j}$, and $S_r$/$W_r$/$E_r$ to read/write to $pe_{i+1,j}$/$pe_{i,j-1}$/$pe_{i,j+1}$ respectively. Each $pe_{i,j}$ executes a program based on its current state, its coordinates $i,j$ and the values of $N_r, S_r, W_r, E_r$. Upon execution, each $pe_{i,j}$ can change its reconfiguration mode, its state, and the content of its registers $N_r, S_r, W_r, E_r$. **Figure 1** contains a four-instruction program (bottom left side) for executing COUNTING-1s of a four-bit input. As depicted on the right side of **Figure 1**, the execution of this program creates a bus whose bends correspond to the '1's of the input values. By examining the exit point (row number) of a signal sent through *S*>, we obtain the number of 1-bits in the input.
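The COUNTING-1s idea can be illustrated with a small software simulation. The following Python sketch is only an approximation under stated assumptions: the mode names `"bend"` and `"straight"` are hypothetical stand-ins for the reconfiguration modes of Figure 1, and the bus is followed column by column in a loop, whereas the RM model assumes the signal traverses the whole bus in a single step.

```python
def counting_ones(bits):
    """Simulate COUNTING-1s: the exit row of the signal equals the number of 1-bits."""
    n = len(bits)
    rows = n + 1  # the signal can drop by at most n rows
    # Step 1: broadcast bit j down column j; every switch in that column picks its mode.
    # "bend" moves the bus one row down, "straight" keeps it in the same row
    # (hypothetical names, not the paper's S0, S1, ... labels).
    mode = [["bend" if bits[j] else "straight" for j in range(n)]
            for _ in range(rows)]
    # Step 2: inject a signal at the top-left corner and follow the resulting bus.
    row = 0
    for j in range(n):
        if mode[row][j] == "bend":
            row += 1
    return row  # exit row number = number of 1-bits
```

For a four-bit input such as `[1, 0, 1, 1]` the bus bends three times, so the signal leaves the grid at row 3.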

The second step of the FPRM computation is shown in **Figure 2**, computing the DNF (Disjunctive Normal Form) $(x_0 \wedge x_1 \wedge x_2) \vee (x_0 \wedge x_3) \vee (x_1 \wedge x_3) \vee (x_2 \wedge x_3)$, where each and-term (minterm) is computed in a different row. As with COUNTING-1s, we broadcast the input values along the columns in the first step. If $pe_{i,j}$ is associated with $\ldots \wedge x_i \wedge \ldots$ in an and-term (minterm) and the input $x_i = 1$, then $pe_{i,j}$ switches to a connect mode by selecting $S_2$; otherwise ($x_i = 0$) it switches to a disconnect mode by selecting $S_4$. The opposite is performed if $pe_{i,j}$ is associated with $\ldots \wedge \overline{x_i} \wedge \ldots$. A *true* signal is sent from *S*> for every row, while each disconnected $pe_{i,j}$ broadcasts a *false* signal from its $E_r$. The or-term of these and-terms is computed in another broadcast along the last column. Obviously, the FPRM can be used to execute the $O(1)$ RM algorithms, such as summing of $n$ numbers, multiplication [6, 28], sorting, convex hull [2], graph algorithms [9] and image processing [11]. However, here we consider the problem of parallel evaluation of circuits with large depths, for which no previous RM algorithm exists. Preliminary results demonstrate the FPRM's feasibility and that it is likely to outperform FPGAs.
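The row-per-and-term scheme described above can be mimicked in software. The sketch below is a hedged illustration, not the FPRM program itself: the dictionary encoding of and-terms (variable index mapped to the required value, `False` for a negated literal) and the function name `eval_dnf_on_grid` are assumptions made for this example, and the connect/disconnect modes are reduced to a Boolean.

```python
def eval_dnf_on_grid(terms, x):
    """Evaluate a DNF the way the grid does: one row per and-term, then an OR column."""
    row_out = []
    for term in terms:                    # one grid row per and-term
        signal = True                     # a 'true' signal enters at the left end of the row
        for j, bit in enumerate(x):       # column j carries the broadcast value of x_j
            wanted = term.get(j)          # None: x_j does not occur in this and-term
            connected = wanted is None or wanted == bool(bit)
            if not connected:
                signal = False            # a disconnected switch sends 'false' eastwards
        row_out.append(signal)
    return any(row_out)                   # or-term: one broadcast along the last column


# The four and-terms of the formula in the text, with all literals taken as positive.
terms = [{0: True, 1: True, 2: True}, {0: True, 3: True},
         {1: True, 3: True}, {2: True, 3: True}]
```

With these terms, the input `[0, 0, 1, 1]` satisfies the last and-term, so its row stays connected and the OR column reports true.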

**Figure 1.** *The FPRM switches and a program to compute COUNTING-1s using a* 4 × 4 *FPRM.*

**Figure 2.** *Computing a DNF formula using a* 4 × 4 *FPRM.*

According to the proposed algorithm, boolean circuits $C(x_0, \ldots, x_{n-1})$ can be evaluated in a constant number of FPRM steps regardless of $C$'s depth. During compilation, we calculate the minimized DNF formula $dnf_y$ for every possible result of COUNTING-1s, which is $y = \sum_{i=0}^{n-1} x_i$. Each of these DNFs ($dnf_y$) is compiled into an FPRM program working in a similar manner to the algorithm described in **Figure 2**. At run-time, after performing COUNTING-1s of the input, the FPRM selects the DNF-program $dnf_y$ for $y$ and executes it. Thus, in a constant number of steps this algorithm computes $C(x_0, \ldots, x_{n-1})$ using dynamic reconfiguration (DR). By first applying COUNTING-1s, we get that the size of the FPRM grid needed to execute each of the $dnf_{y=0,\ldots,n-1}$ is less than or equal to the size of the original $C(x_0, \ldots, x_{n-1})$. This is expected, since each $dnf_y$ is $C(x_0, \ldots, x_{n-1})$ restricted to the case where $y = \sum_{i=0}^{n-1} x_i$. Further, the and-terms of $dnf_y$ are packed in a 2D-FPRM layout with multiple and-terms computed in a single row (unlike **Figure 2**, where each and-term is computed separately).
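The compile-time/run-time split can be sketched as follows. This is a minimal illustration under assumptions: the function names are hypothetical, the circuit is given as a Python predicate, and each $dnf_y$ is stored as its unminimized on-set (one full and-term per satisfying input) instead of a minimized DNF.

```python
from itertools import product


def compile_by_count(f, n):
    """Compile time: for each count y, collect the on-set of f among inputs with y ones."""
    table = {y: [] for y in range(n + 1)}
    for x in product((0, 1), repeat=n):
        if f(x):
            table[sum(x)].append(x)   # unminimized dnf_y: one and-term per satisfying input
    return table


def evaluate(table, x):
    """Run time: COUNTING-1s first, then run only the program selected by the count."""
    y = sum(x)                        # phase 1: count the 1-bits of the input
    return tuple(x) in table[y]       # phase 2: evaluate the selected dnf_y


def majority3(x):
    """Example circuit (an assumption for illustration): majority of three bits."""
    return sum(x) >= 2
```

For the majority example, the programs selected for counts 0 and 1 are empty, which shows how conditioning on the count can shrink the formula that actually has to be evaluated.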

The rest of the chapter is organized as follows. The following section describes the use of the COUNTING-1s operation in order to reduce the formula size. The section after that describes the problem of fitting as many and-terms as possible into an FPRM grid, which is one of the most challenging aspects of the technique. The results of the experiments are presented next, followed by a summary of the conclusions.

#### **2. Using the COUNTING-1s operation to reduce formula size**

For every possible outcome of $y = \sum_{i=0}^{n-1} x_i$, the proposed algorithm starts by obtaining the minimized DNF formula $dnf_y$. The input is divided into $k$ segments containing $\frac{n}{k}$ bits each, and the number of 1-bits is counted for each segment. For each of the $\left(\frac{n}{k} + 1\right)^k$ possible COUNTING-1s results $y_1, \ldots, y_k$ (with $y_i = 0, \ldots, \frac{n}{k}$), we compute a minimized DNF $dnf_{y_1, \ldots, y_k}$ by:

1. Building a truth table $T_{y_1, \ldots, y_k}$ of $C(x_0, \ldots, x_{n-1})$ for all the binary numbers $x_0, \ldots, x_{n-1}$ with $y_i$ '1's in the $i$-th segment.
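The segment-wise counting and the grouping of the truth table can be sketched in a few lines. The helper names below are hypothetical, and the sketch assumes that $k$ divides $n$; it only builds the per-case truth tables and does not perform any minimization.

```python
from itertools import product


def segment_counts(x, k):
    """Split x into k equal segments and return the tuple (y_1, ..., y_k) of 1-bit counts."""
    n = len(x)
    assert n % k == 0, "this sketch assumes that k divides n"
    size = n // k
    return tuple(sum(x[s * size:(s + 1) * size]) for s in range(k))


def truth_tables_by_counts(f, n, k):
    """Group the truth table of f by count vector: tables[(y_1..y_k)][x] = f(x)."""
    tables = {}
    for x in product((0, 1), repeat=n):
        tables.setdefault(segment_counts(x, k), {})[x] = bool(f(x))
    return tables
```

For $n = 6$ and $k = 2$ this yields $(6/2 + 1)^2 = 16$ truth tables, one per pair of segment counts, matching the number of cases stated above.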


Consider the truth table *T* of the address-function of *n* ¼ 6 boolean variables given in **Figure 3**.

$$F(a,b,c,d,e,f) = \begin{cases} c & \langle a,b \rangle = 0,0 \\ d & \langle a,b \rangle = 0,1 \\ e & \langle a,b \rangle = 1,0 \\ f & \langle a,b \rangle = 1,1 \end{cases}$$

*T* is arranged by COUNTING-1s in $\langle a,b,c \rangle$ ($y_1(a,b,c) \in \{0,1,2,3\}$) and COUNTING-1s in $\langle d,e,f \rangle$ ($y_2(d,e,f) \in \{0,1,2,3\}$). Since the address function has a very small formula to begin with

$$F(a,b,c,d,e,f) = ca'b' + da'b + eab' + fab$$

(where $x'$ is $\neg x$), it is not expected that using COUNTING-1s can significantly reduce the size of the remaining circuits $C_{y_1(a,b,c)=i,\,y_2(d,e,f)=j}(a,b,c,d,e,f)$. Indeed, the results in **Figure 3** show that the minimal boolean formula for $C_{y_1(a,b,c)=2,\,y_2(d,e,f)=1}(a,b,c,d,e,f)$ is $a'bcde'f' + ab'cd'ef' + abc'd'e'f$, which is even larger than the original formula for the whole function, $ca'b' + da'b + eab' + fab$.


**Figure 3.** *Truth table of the address function arranged by COUNTING-1s results.*

However, this happens only for four out of the sixteen possible cases of $y_1(a,b,c) = i,\ y_2(d,e,f) = j$. In all the remaining 12 cases the boolean formula has one or no variables.

Yet, COUNTING-1s is very helpful for the multiplication function

$$F(a,b,c,d,e,f) = 1 \text{ iff } \big((a \cdot 2 + b) \cdot (c \cdot 2 + d)\big) \bmod 4 = (e \cdot 2 + f)$$

The results depicted in **Figure 4** show that in all sixteen cases, the minimal boolean formula for $C_{y_1(a,b,c)=i,\,y_2(d,e,f)=j}(a,b,c,d,e,f)$ is very small.
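The two examples above can be explored with a short script that lists, for every pair of segment counts, the inputs on which the function is 1. This is a sketch under assumptions: the Python definitions of `address` and `mult` follow the two formulas in the text, `onsets_by_counts` is a hypothetical helper, and no formula minimization is attempted.

```python
from itertools import product


def address(a, b, c, d, e, f):
    """Address function: <a,b> selects one of c, d, e, f."""
    return (c, d, e, f)[2 * a + b]


def mult(a, b, c, d, e, f):
    """1 iff ((2a + b) * (2c + d)) mod 4 equals 2e + f."""
    return int(((2 * a + b) * (2 * c + d)) % 4 == 2 * e + f)


def onsets_by_counts(fn):
    """Map each case (y1, y2) to the inputs with fn = 1, where y1 counts <a,b,c>, y2 counts <d,e,f>."""
    cases = {(i, j): [] for i in range(4) for j in range(4)}
    for x in product((0, 1), repeat=6):
        if fn(*x):
            cases[(sum(x[:3]), sum(x[3:]))].append(x)
    return cases
```

For the address function, the case $(y_1, y_2) = (2, 1)$ contains exactly the three inputs `011100`, `101010` and `110001`, which are the three and-terms $a'bcde'f'$, $ab'cd'ef'$ and $abc'd'e'f$ quoted above.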

**Figure 5** depicts the largest $dnf_y$ for $C = STCON(x_0, \ldots, x_{48})$ of a seven-node directed graph, where the input is the $7 \times 7$ adjacency matrix of the graph. In this case we selected $k = \sqrt{49} = 7$, hence $y = \langle y_1, \ldots, y_7 \rangle$ with $y_i = 0, \ldots, 7$. Out of all the COUNTING-1s cases for $STCON(x_0, \ldots, x_{48})$, **Figure 5** depicts the worst/largest $dnf_y$ obtained. The $dnf_y$ of **Figure 5** should be read as follows:



**Figure 4.** *Truth table of the mult function arranged by COUNTING-1s results.*

*Field Programmable Reconfigurable Mesh (FPRM) DOI: http://dx.doi.org/10.5772/intechopen.107425*


**Figure 5.** *Resulting formula for worst COUNTING-1s case of* 7 × 7 *STCON.*

As shown in **Figure 5**, the $dnf_y$ contains only 16 minterms (and-terms), each containing 10 variables (or negations of variables), omitting some all-don't-care columns at the end of each row. For comparison, we obtained the full circuit for $STCON(x_0, \ldots, x_{48})$ using VIVADO-HLS on the following C-code:

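```
#define SQM 7

/* The function wrapper and the declarations of i, j, k are assumed here;
   the original listing shows only the loop nest and the return statement. */
int stcon(int mat[SQM][SQM]) {
  int i, j, k;
  for (k = 0; k < SQM; k++) {
#pragma HLS unroll factor=7
    for (i = 0; i < SQM; i++) {
#pragma HLS unroll factor=7
      for (j = 0; j < SQM; j++) {
#pragma HLS unroll factor=7
        /* i reaches j directly, or through the intermediate node k */
        if (mat[i][j] || (mat[i][k] && mat[k][j])) mat[i][j] = 1;
      }
    }
  }
  return mat[1][SQM - 1];
}
```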
The synthesis results of the Verilog code obtained for the above code are: clock latency = 6 ns, #registers = 591, #LUTs = 742 and #MUXes = 782. This is significantly larger than the $dnf_y$ of **Figure 5**, which is a DNF with 143 gates and a clock latency of less than 1 ns. The experiments show that using a fast pre-computation function of the inputs (COUNTING-1s) can significantly reduce the size of $\max_y |dnf_y|$ compared to the size of the complete circuit.

The above $dnf_y$ could have been further simplified by replacing monochromatic rectangles with new variables. As illustrated in **Figure 6**, it is possible to simplify the largest $dnf_y$ of a five-node graph by replacing eight monochromatic rectangles (left side) with new variables. The result is a reduction from $(94/14) \simeq 6$ to $(49/14) \simeq 3$ in the average number of variables per minterm. As each rectangle is evaluated at runtime, its subset of variables is logically ANDed. This can be achieved with a sub-bus of the FPRM on which any $x_i$ equal to zero broadcasts the value 0.

#### **3. Computing the FPRM layout of $dnf_y$**
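The effect of replacing a rectangle by a new variable can be illustrated on a toy DNF. The sketch below rests on assumptions: a DNF is modelled as a list of literal sets, a rectangle as a set of rows that all contain the same group of literals, and the names `substitute_rectangle`, `avg_literals` and `r1` are invented for this example. The toy data are not the formulas of Figure 6.

```python
def substitute_rectangle(minterms, rows, lits, new_var):
    """Replace the literal group `lits` by `new_var` in every minterm listed in `rows`."""
    lits = frozenset(lits)
    out = []
    for r, term in enumerate(minterms):
        if r in rows:
            # A rectangle requires every selected row to contain all of its literals.
            assert lits <= term, "not a rectangle: a literal is missing in this row"
            term = (term - lits) | {new_var}   # new_var stands for the AND of lits
        out.append(frozenset(term))
    return out


def avg_literals(minterms):
    """Average number of literals per minterm."""
    return sum(len(t) for t in minterms) / len(minterms)


# Toy DNF: abcd + abce + abf. Rows 0 and 1 share the literal group {a, b, c}.
dnf = [frozenset("abcd"), frozenset("abce"), frozenset("abf")]
reduced = substitute_rectangle(dnf, {0, 1}, "abc", "r1")
```

In this toy case the average drops from 11/3 to 7/3 literals per minterm, at the cost of pre-computing `r1` as the AND of `a`, `b` and `c`.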

Using the algorithm of **Figure 2**, we can evaluate the $dnf_y$s on the FPRM in three steps. Broadcasts representing the minterms of $dnf_y$ are used to evaluate the DNF. As a result, we can evaluate the DNF using an $n \times k$ sub-grid of the FPRM, where $n$ is the number of rows (minterms) and $k$ is the number of variables ($14 \times 17$ for the DNF shown on the right of **Figure 6**). There are two steps involved: broadcasting the values of each $x_i$ over the columns in the sub-FPRM; and configuring each row as a single bus and computing the logical AND on each row. We can, however, pack several minterms/bus segments in one row, reducing the size of the FPRM sub-grid needed for the computation. Our 2D layout of a $dnf_y$ can be optimized by swapping minterms in each level and arranging the literals in each minterm (node).

**Figure 6.** *Reducing the DNF size of a* $dnf_y$ *by pre-computing monochromatic rectangles.*

**Figure 7** illustrates the optimized (by hand) layout of the DNF of **Figure 5** (called *LG*), wherein the minterms are arranged in six levels, each containing 1–4 minterms. Straight busses are used to broadcast the values of the literals in this layout. As **Figure 7** shows, this layout also includes the required extra duplications of 'b' and 'n'. There is a significant improvement in the total area and max-switching length when compared to the simple method of arranging all minterms in one column. The optimized (by hand) layout of **Figure 7** is also better than that of **Figure 5** when used as an *LG*. In the following sections, we describe the details of the proposed algorithm to find a minimized *LG*.

**Figure 7.** *Optimized FPRM layout.*

#### **3.1 First stage: find the level arrangement of minterms in the final FPRM layout**


#### **3.2 Second stage: rearranging the minterms in each level**

The position/index of each node/minterm in the final layout is computed as follows:

• Create a leveled graph *LG* whose nodes *Vlevel*,*index* correspond to the minterms in each column of the layout previously obtained.

**Figure 11** depicts the resulting *LG*.

**Figure 8.** *The intersection graph G*<sup>0</sup> *and extracting first MIS.*

**Figure 9.** *G*<sup>2</sup> *and extracting the next level in the layout.*

**Figure 10.** *G*<sup>5</sup> *and extracting the last level in the layout.*

• Rearrange the minterms in each level of *LG* by: Finding a set of nodes (called "mid-cut"), one from each level of *LG*, and a partition of the remaining nodes in each level into a "left-part" and a "right-part" such that:

◦ The number of edges between the left-part and the right-part is minimal.

◦ The number of nodes in the left-part in each level is about the same as the number of nodes in the right-part of that level.


**Figure 11.** *The leveled graph (LG) of the MIS arrangement.*

**Figure 12** depicts finding a mid-cut and the resulting partition into a left-part and a right-part. As can be seen, only one edge (the 'b') crosses from the left-part to the right-part in **Figure 12**. In the final FPRM layout, the mid-cut minterms will be stacked one on top of the other. Recursively, the process is applied to the left-part and the right-part until all nodes of *LG* are arranged in 2D.
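One step of the mid-cut search can be sketched by brute force for very small leveled graphs. The code below is an illustrative assumption, not the search procedure used for Figure 12: it enumerates every choice of one mid-cut node per level and every balanced left/right split of the remaining nodes, and keeps the choice with the fewest edges between the left-part and the right-part. The node names and edges are invented.

```python
from itertools import combinations, product


def level_options(nodes):
    """All (left, mid, right) splits of one level with |left| and |right| differing by at most 1."""
    for mid in nodes:
        rest = [v for v in nodes if v != mid]
        for size in {len(rest) // 2, (len(rest) + 1) // 2}:
            for left in combinations(rest, size):
                yield set(left), mid, set(rest) - set(left)


def best_mid_cut(levels, edges):
    """Exhaustive search; returns (number of left-right edges, chosen split per level)."""
    best = None
    for choice in product(*(list(level_options(lv)) for lv in levels)):
        left = set().union(*(c[0] for c in choice))
        right = set().union(*(c[2] for c in choice))
        crossing = sum(1 for u, v in edges
                       if (u in left and v in right) or (u in right and v in left))
        if best is None or crossing < best[0]:
            best = (crossing, choice)
    return best


# Toy leveled graph: two levels of three minterms; edges join minterms sharing a literal.
levels = [["m1", "m2", "m3"], ["m4", "m5", "m6"]]
edges = [("m1", "m4"), ("m2", "m5"), ("m3", "m6"), ("m1", "m5")]
```

The exhaustive search is exponential in the number of nodes, so it only serves to make the two optimization criteria (few crossing edges, balanced parts) concrete.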

#### **3.3 Third stage: rearranging the order of literals in each minterm**

• Replacing one pair of crossing edges with a down-going source-edge.

• Reordering literals in the two minterms of the crossing edge. Crossing edges between minterms with the same level-index are resolved by rearranging the literals in those minterms.

**Figure 12.** *Rearranging the minterms in LG's levels via mid-cuts.*

The next edge selected to be replaced by a source edge is the one with the maximal number of crossings. For example, in **Figure 13** the first edge to be replaced by a source edge is the edge connecting the left 'b' in the first level to the right 'b' in the fourth level, as this edge cuts across seven edges. The process is repeated until there are no crossing edges, as shown in **Figure 14**. Since source edges will be aligned vertically later, crossings with source edges are not counted.
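The selection rule (pick the edge with the most crossings) can be sketched geometrically. The following code assumes, for illustration only, that every edge is drawn as a straight segment between two points `(position, level)`; it uses the standard orientation test for segment intersection and does not model source edges, which the text excludes from the count.

```python
def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): +1, 0 or -1."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def crosses(e, f):
    """True if the straight segments e and f intersect in their interiors."""
    (a, b), (c, d) = e, f
    return (_orient(a, b, c) * _orient(a, b, d) < 0 and
            _orient(c, d, a) * _orient(c, d, b) < 0)


def most_crossing_edge(edges):
    """Return the edge with the maximal number of crossings, together with that number."""
    counts = [sum(crosses(e, f) for f in edges if f is not e) for e in edges]
    worst = max(range(len(edges)), key=counts.__getitem__)
    return edges[worst], counts[worst]


# Toy layout: a long edge from level 0 to level 3 that cuts across three short edges.
edges = [((0, 0), (3, 3)), ((1, 0), (0, 1)), ((2, 1), (1, 2)),
         ((3, 2), (2, 3)), ((3, 0), (3, 1))]
```

In the toy layout the long diagonal edge crosses three others and would be the first candidate for replacement by a source edge; repeating the selection after each replacement mirrors the loop described above.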

#### **3.4 Fourth stage: completing the alignment**

At this stage, the minterms in each level and the literals in each minterm have been arranged so that no crossing edges exist. Aligning the literals such that all the edges form straight vertical columns leads to the final FPRM layout:



**Figure 13.** *Expanding current level graph LG to a literal graph TG.*

**Figure 14.** *Final arrangement of literals in each minterm.*

**Figure 15.** *Completing the alignment.*

**Figure 15** illustrates the resulting layout with a total area of about 143, which includes the area for broadcasting the duplicated literals (*c*,*h*,*b*,*k*,*n*,*l*).

The FPRMs are constructed with literals as control entries into the tri-state switches.

#### **4. Realization and results**

The tri-state switch is the natural candidate for the infrastructure logic realization of the final FPRM layout, as other switching devices such as NMOS are not supported by ASIC synthesis tools. For a control input of '1', the output of the tri-state is identical to its input, but for a control input of '0' the output is high impedance (disconnected or 'z'). As a result, multiple tri-states can share the same output wires. The DNF is simply $\bigvee_{i=1}^{n} m_i$, where $m_i$ represents a minterm $m_i = \bigwedge_{j=1}^{|m_i|} a_j$, and $a_j$ is a literal. A minterm can be represented conceptually by a list of connected tri-states, one for each literal. The leftmost tri-state outputs '1' or 'z' based on an input of '1' and control from the literal. The next tri-state produces its output according to the next literal value in $m_i$, and the chain thus performs the operation $\bigwedge_{j=1}^{|m_i|} a_j$. Since the output of each minterm $m_i$ is '1' or 'z', their output wires can be connected directly, performing $\bigvee_{i=1}^{n} m_i$. **Figure 16** illustrates the FPRM for the DNF of two minterms $M = (a_1 \wedge a_2) \vee (a_2 \wedge a_3)$, where the literal values are represented by thick vertical lines (dark-gray for '1' and light-gray for '0'). Each (potential) literal $a_j$ in a minterm $m_i$ residing in row $r$ consists of 6 tri-states and one encoder (depicted in the dashed-line rectangle). The $cn_{r,j}$ tri-state connects the literal value to minterm $m_i$, provided that $a_j$ appears in $m_i$. Otherwise, $cn_{r,j}$ will pass the incoming signal to the next literal. The output of $cn_{r,j}$ is the control of $tl_{r,j}$, which transfers the input from $a_{j-1}$ or disconnects (producing 'z') given the value of $a_j$. According to the encoder's $e_{r,j}$ input, $tr_{r,j}$ will pass/hold the current signal to $a_{j+1}$, or $ci_{r,j}$ will start a new minterm calculation, sending '1' to the next literal on the right. Finally, the role of $co_{r,j}$ is to send down the output of the current minterm, provided that it is the rightmost literal in the current minterm. Note that the outputs of the minterms are connected together since these are outputs of tri-states ('1' or 'z'). The output of the DNF is indicated by $M$ on the right-bottom side.

**Figure 16.** *Tri-state configuration of the FPRM.*
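The conceptual chain-of-tri-states view of a minterm can be modelled in a few lines. This sketch is an assumption-laden simplification: it covers only the logical behaviour ('1' or 'z') and none of the six-tri-state cell or the encoder of Figure 16, and the names `tri_state`, `minterm` and `wired_output` are invented for the example.

```python
Z = "z"  # high impedance (disconnected)


def tri_state(data, control):
    """Pass `data` when the control input is 1, otherwise output high impedance."""
    return data if control == 1 else Z


def minterm(literals):
    """Chain of tri-states fed with '1': the signal survives only if every literal is 1."""
    signal = 1
    for a in literals:          # each literal value controls the next tri-state in the chain
        signal = tri_state(signal, a)
    return signal


def wired_output(signals):
    """Shared output wire: reads 1 if any minterm drives 1, stays 'z' otherwise."""
    return 1 if any(s == 1 for s in signals) else Z


def M(a1, a2, a3):
    """The two-minterm DNF of the text: (a1 AND a2) OR (a2 AND a3)."""
    return wired_output([minterm([a1, a2]), minterm([a2, a3])])
```

Because each minterm drives either '1' or 'z', merging the outputs on one wire needs no extra OR gate, which is the point made in the paragraph above.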

Based on the description in [30], the FPRM implementation is compared to an island-style FPGA routing architecture. **Figure 17** shows a variant that contains a logic unit with an and/or-gate that is connected, via two connection units, to a grid of $N$ vertical 4-bit buses and $N$ horizontal 4-bit buses. Any vertical bus can be connected to any horizontal bus using a crossbar-like routing unit. Additionally, vertical/horizontal busses can be disconnected, so that bends will not consume the entire bus.

All connections/disconnections and fuse operations are made by a back-to-back pair of tri-state devices, allowing bi-directional signals. ASIC synthesis results were obtained with Synopsys Design Compiler using a 160 nm cell library. As shown in **Table 1**, the FPRM architecture is 4X faster and more efficient in both power and area<sup>3</sup> than the FPGA routing infrastructure. This is based on the FPRM of **Figure 16** and the assumption that the counting stage requires two cycles. When the expected latency of the FPGA is added, we get about twice as fast performance from the FPRM.

**Figure 17.** *The FPGA routing architecture used for the experiments.*

**Table 1.** *Synthesis results comparing the FPRM vs. the FPGA routing infrastructure.*

A chain of switches (tri-states) selects whether values should be passed on or not in the circuit we designed. This idea will obviously work faster than a chain of *and* and *or* gates as implemented in an FPGA. The fact that there are no switches along the wire that ends with $M$ further accelerates the speed of receiving the output, which is triggered when a value of 1 comes out of one of the minterms. Given that the tri-state consumes power as an ordinary buffer, and the and/or operations ($\bigvee_{i=1}^{n} minterm_i$) are implemented simply by merging the tri-state outputs, the power consumption is likely to be a function of the number of tri-state buffers. On the other hand, the FPGA needs to power the and/or gates as well as the switching systems that connect the logical blocks. Compared to a real FPGA, we have simplified our implementation, but this can only reduce power. Conversely, the FPRM is general, assuming that any Boolean circuit can be represented as a DNF.

#### **5. Conclusions**

As part of the contribution of this work, we developed the algorithm to evaluate boolean circuits on the RM; a method to compute an optimized FPRM layout; and a method for realizing the FPRM as a tri-state circuit with comparable performance to the conventional FPGA implementation. A tri-state (MOSFET transistor) acts as a switching element in both the FPGA and the FPRM. Passing a signal through a chain of $k$ switches (that is, a chain of $k$ source-drain connected transistors) incurs a quadratic delay of $\frac{k^2}{2} r \cdot c$ (where $r$ is the resistance and $c$ is the capacitance of each transistor). As a result of the reconfiguration of the FPRM, a relatively long chain of transistors can be created. Due to the short chains involved in the circuit evaluation problem discussed here, the FPRM will be able to execute the circuit evaluation process fairly quickly. In order to compare the FPRM with the FPGA/ASIC realization of $f(x_1, \ldots, x_n)$, a SPICE simulation of the FPRM can be carried out. This includes selecting the most appropriate MOSFET transistor technology to minimize signal propagation delays through the FPRM bus.
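The quadratic growth of the chain delay can be made concrete with a small helper. The function name `chain_delay` and the unit values of `r` and `c` used below are assumptions for illustration, not measured device parameters.

```python
def chain_delay(k, r, c):
    """Delay of a chain of k source-drain connected switches: (k^2 / 2) * r * c."""
    return (k ** 2) / 2 * r * c


# With unit resistance and capacitance, doubling the chain length quadruples the delay.
short_chain = chain_delay(10, 1.0, 1.0)
long_chain = chain_delay(20, 1.0, 1.0)
```

This is why short bus segments matter: halving the number of switches on a bus reduces the estimated delay by a factor of four under this model.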

<sup>3</sup> The area is categorized by Units of Cells (UC), which correspond to two-input NAND gate.


This research can be furthered by comparing the synthesized results with those obtained from HLS (High Level Synthesis) of the C-code for $f(x_0, \ldots, x_{n-1})$. Another direction is to analyze other functions that can be efficiently computed using the FPRM in $O(1)$. Finally, one can study how the partitioning into segments affects the size of the resulting formulas, and build a decision tree that computes $f(x_0, \ldots, x_{n-1})$ on the FPRM in an even smaller size.

#### **Appendix**

#### **A. List of acronyms and abbreviations**


### **Author details**

Esti Stein<sup>1</sup>\*† and Yosi Ben Asher<sup>2</sup>†

1 Department of Computer Science, The Academic College of Tel Aviv-Yaffo, Jaffa, Israel

2 Department of Computer Science, Haifa University, Haifa, Israel

\*Address all correspondence to: esterst@mta.ac.il

† These authors contributed equally.

© 2023 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


#### **References**

[1] Ben Asher Y, Stein E. Evaluation of circuits on the reconfigurable mesh. In: 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). Rio De Janeiro, Brazil: IEEE; 2019. pp. 71-74

[2] Vaidyanathan R, Trahan J. Dynamic Reconfiguration: Architectures and Algorithms. US: Springer Science & Business Media; 2004

[3] Matias Y and Schuster A. On the Power of a 2-Band Reconfigurable Network. Unpublished Manuscript. 1992

[4] Chen G, Wang B, Li H. Deriving algorithms on reconfigurable networks based on function decomposition. Theoretical Computer Science. 1993; **120**(2):215-227

[5] Nakano K, Wada K. Integer summing algorithms on reconfigurable meshes. Theoretical Computer Science. 1998;**197**: 57-77

[6] Jang J, Park H, Prasanna VK. An optimal multiplication algorithm on reconfigurable mesh. In: Proc. Symp. On Parallel and Distributed Processing. Beverly Hills, CA: IEEE; 1992. pp. 381-391

[7] Jang J, Prasanna VK. An optimal sorting algorithm on reconfigurable mesh. In: Proc. Inter. Parallel Processing Symp. Beverly Hills, CA: IEEE; 1992. pp. 130-137

[8] Elmesbahi M, KJ, Errami A, Bouattane O. Theta(1) time parallel algorithm for finding 2d convex hull on a reconfigurable mesh computer architecture. Global Journal of Computer Science and Technology. 2021;**21**:1-9

[9] Trahan JL, Subbaraman CP, Vaidyanathan R. List ranking and graph algorithms on the reconfigurable multiple machine. In: Proceedings of International Conference on Parallel Processing. NY: Syracuse University, CRC Press; 1993. pp. III–224-III–247

[10] Wang B-F, Chen G-H. Constant time algorithms for the transitive closure and some related graph problems on processor arrays with reconfigurable bus systems. IEEE Transactions on Parallel and Distributed Systems. 1990;**1**(4): 500-507

[11] Miller R, Prasanna-Kumar VK, Reisis DI, Stout QF. Image computations on reconfigurable VLSI arrays. In: Proceedings of the Conference on Vision and Pattern Recognition. Ann Arbor, MI: IEEE; 1988. pp. 925-930

[12] Trahan JL, Vaidyanathan R. Relative scalability of the reconfigurable multiple bus machine. In: Proc. Workshop Reconfigurable Arch. And Algs. Honolulu, HI: IEEE; 1996

[13] Sidhu R, Wadhwa S, Mei A, Prasanna VK. A self-reconfigurable gate array architecture. In: Field-Programmable Logic and Applications: The Roadmap to Reconfigurable Computing. Berlin, Heidelberg: Springer; 2000. pp. 106-120

[14] Hatem ME-B, Vaidyanathan R, Trahan JL, Rai S. On the communication capability of the self-reconfigurable gate array architecture. IPDPS. 2002:**500**: 0152b. IEEE

[15] Hatem ME-B, Vaidyanathan R, Trahan JL, Rai S. On designing implementable algorithms for the linear reconfigurable mesh. PDPTA. 2003: 241-246

[16] Ben-Asher Y, Stein E, Tartakovsky V. Fpga realization of the reconfigurable mesh counting algorithm. Journal of Circuits, Systems and Computers. 2021;**30**(9):2150157

[17] Giefers H, Platzner M. Armlang: A language and compiler for programming reconfigurable mesh many-cores. In: Parallel & Distributed Processing, 2009. IPDPS 2009. IEEE International Symposium. Rome, Italy: IEEE; 2009. pp. 1-8

[18] Ben-Asher Y, Stein E, Vaidyanathan R. Combining boolean gates and branching programs in one model can lead to faster circuits. In: Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2017 IEEE International. Orlando, FL: IEEE; 2017. pp. 184-191

[19] Giefers H, Platzner M. An Fpga-Based Reconfigurable Mesh Many-Core. IEEE Transactions on Computers. 2013; **63**(12):2919-2932

[20] Hauck S, Fry TW, Hosler MM, Kao JP. The chimaera reconfigurable functional unit. IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 2004;**12**(2):206-217

[21] Xilinx. Logicore Ip Xps Hwicap. Report DS586. San Jose, CA: Xilinx; 2010

[22] Intel Corporation. Intel® Quartus® Prime Pro Edition User Guide: Partial Reconfiguration. San Jose, CA: Intel; 2022

[23] Babu P, Parthasarathy E. Reconfigurable fpga architectures: A survey and applications. Journal of The Institution of Engineers (India): Series B. 2021;**102**(1):143-156

[24] Khan MA, Miyamoto N, Pantonial R, Kotani K, Sugawa S, Ohmi T. Improving multi-context execution speed on

drfpgas. In: Solid-State Circuits Conference, 2006. ASSCC 2006. IEEE Asian. San Francisco, CA: IEEE; 2006. pp. 275-278

[25] Cong J, Xiao B. A novel fpga architecture with memristor-based reconfiguration. In: Nanoscale Architectures (NANOARCH), 2011 IEEE/ACM International Symposium. San Diego, CA: IEEE; 2011. pp. 1-8

[26] Cong J, Xiao B. Fpga-rpi: A novel fpga architecture with rram-based programmable interconnects. IEEE Trans. VLSI Syst. 2014;**22**(4):864-877

[27] Lin T-J, Zhang W, Jha NK. A finegrain dynamically reconfigurable architecture aimed at reducing the fpgaasic gaps. IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 2014;**22**(12):2607-2620

[28] Ben-Asher Y, Stein E. Adaptive booth algorithm for three-integers multiplication for reconfigurable mesh. Journal of Interconnection Networks. 2016;**16**(1):1-25

[29] Rudell R, Sangiovanni-Vincentelli A. Espresso-mv: Algorithms for multiplevalued logic minimization. Proc. IEEE Custom Integrated Circuits Conf. 1985: 230-234

[30] Xilinx. The Programmable Logic Data Book. 2000. Available from: http:// www.xilinx.com/index.shtml.

## *Edited by B. Santhosh Kumar*

Data integrity is the overall accuracy, completeness, and consistency of data. Data integrity also refers to the safety of data regarding regulatory compliance, such as GDPR compliance, and security. It is maintained by a collection of processes, rules, and standards implemented during the design phase. Data governance is the process of managing the availability, usability, integrity, and security of the data in enterprise systems, based on internal data standards and policies that also control data usage. Effective data governance ensures that data is consistent and trustworthy and does not get misused. This book provides a comprehensive overview of data integrity and data governance and their myriad applications.

Published in London, UK © 2023 IntechOpen © gonin / iStock

Data Integrity and Data Governance
