**Abstract**

Improving our understanding of rare disease and developing new therapies can only succeed through global collaboration. Whole-genome sequencing is increasingly being deployed to diagnose rare disease, and can be combined with machine-learning tools that analyze patient photos to identify phenotypes. Clinical interpretation of genomic and phenotypic data in rare disease depends on sharing individual patient data internationally. Data sharing is essential in rare disease contexts to support the diagnosis of patients, recruitment into trials, the development of precision diagnostics and therapies, and clinical trial transparency. The sharing of rich molecular and phenotypic data presents privacy risks for rare disease patients, though many want their data made available to improve their care and advance research. Informed consent, access governance, and access technologies are important to realize the benefits of data sharing while mitigating risks. Rare disease patients should be involved in the design of data sharing governance to ensure it responds to their particular needs and preferences.

**Keywords:** rare disease, data sharing, law, ethics, consent, privacy, patient involvement

## **1. Introduction**

There is great interest in adopting data-intensive approaches in both care pathways and research in rare disease contexts. This interest is unsurprising: many rare diseases "have no treatments, are incurable, and have a devastating impact on patients and their families" [1]. One of the first areas in which whole-genome sequencing is demonstrating clinical utility is the genetic diagnosis of individuals with rare disease [2], where it can be a powerful tool for resolving diagnoses. Receiving a timely and accurate diagnosis can have a number of direct benefits for patients, "enabling a better understanding of their prognosis, more personalized treatment and tailored management and surveillance" [3]. An ethics report from Canada's health technology assessment body, CADTH, recently concluded that genome sequencing could be effective for patients with unexplained developmental disabilities and multiple congenital abnormalities, if responsibly administered [4].

Data-intensive medicine is powered by data sharing, and data sharing practice and policy have long been a hallmark of genomic research. Many health research funders and journals now require researchers to deposit sequence data in repositories or otherwise make data available to the broader research community. "Data sharing enables researchers to rigorously test the validity of research findings, strengthen analyses through combined datasets, reuse hard-to-generate data, and explore new frontiers of discovery" [5]. Data sharing in health care contexts is also growing in importance. The American College of Medical Genetics and Genomics (ACMG), for example, "advocates for extensive sharing of laboratory and clinical data from individuals who have undergone genomic testing" [6]. Data sharing is expected to have a range of benefits, including improving the diagnosis of other patients, informing the development of diagnostic approaches and tools, and powering research. Some have even argued that sharing minimal information about variant interpretations should be the standard of care in genetics [7].

Data sharing is of particular importance in rare disease contexts, where making a diagnosis often depends on many forms of data sharing. Databases of population genetic variation serve as a reference to help filter common, likely benign variants out of test results. Comparing the genomes of family trios (the patient and both parents) can narrow the list of candidate disease-causing variants even further. Publishing a genetic diagnosis in the literature or in public genetic variant databases can offer confirmation and can inform and accelerate the diagnosis of future patients. Data-intensive approaches are not limited to genomic data: facial recognition technologies such as Face2Gene can inform diagnosis based on images of facial morphology [8]. Because the meaning of all these data is not yet fully understood, data-intensive medicine for rare disease depends on adopting a learning health system approach. In learning health systems, rich data are generated as part of routine clinical care and are subsequently made available for quality improvement and research. Data sharing between rare disease clinicians, laboratories and scientists can help to refine interpretive techniques and analysis pipelines. Images and videos of facial morphology can be used to train machine-learning algorithms and improve diagnostic tools [9].
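To make the two filtering steps described above concrete, the following is a minimal Python sketch, not a clinical pipeline. It assumes a hypothetical, simplified `Variant` record that carries an allele frequency taken from a reference population database and trio genotypes encoded as counts of alternate alleles (0, 1 or 2). The 0.1% frequency threshold and the two inheritance models checked (de novo and homozygous recessive) are illustrative choices only; real pipelines rely on richer annotation and consider further models, such as compound heterozygous or X-linked inheritance.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Variant:
    """Hypothetical, simplified variant call used for illustration only."""
    variant_id: str
    population_af: float              # allele frequency in a reference population database
    genotypes: Tuple[int, int, int]   # alternate-allele counts: (proband, mother, father)


def filter_rare(variants: List[Variant], max_af: float = 0.001) -> List[Variant]:
    """Step 1: drop variants that are common in the reference population (likely benign).

    The 0.001 threshold is an illustrative assumption, not a clinical standard.
    """
    return [v for v in variants if v.population_af <= max_af]


def trio_candidates(variants: List[Variant]) -> List[Tuple[str, str]]:
    """Step 2: keep variants consistent with two simple inheritance models in a trio."""
    candidates = []
    for v in variants:
        proband, mother, father = v.genotypes
        if proband > 0 and mother == 0 and father == 0:
            # Present in the patient but absent in both parents.
            candidates.append((v.variant_id, "de novo"))
        elif proband == 2 and mother == 1 and father == 1:
            # Two copies in the patient, one inherited from each carrier parent.
            candidates.append((v.variant_id, "homozygous recessive"))
    return candidates


# Hypothetical example data for a single trio.
example_variants = [
    Variant("var1", 0.12, (1, 1, 0)),    # common in the population: removed in step 1
    Variant("var2", 0.0, (1, 0, 0)),     # rare and absent in both parents
    Variant("var3", 0.0005, (2, 1, 1)),  # rare, homozygous, parents are carriers
    Variant("var4", 0.0002, (1, 1, 0)),  # rare but inherited from an unaffected parent
]

rare_variants = filter_rare(example_variants)
print(trio_candidates(rare_variants))
```

Each step relies on shared data: the first depends on a population reference database, and the second on sequencing the parents alongside the patient.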

The impetus to make genomic and health-related data collected in routine clinical care available for research is especially strong in the rare disease context, where research faces numerous barriers and limited incentives. These data could also serve as a rich resource for natural history studies to better understand the progression of rare diseases, for biomarker discovery, as registries to recruit patients into precision clinical trials, and for ongoing surveillance of the clinical and cost effectiveness of rare disease therapies.

Biobanking and biobank networks are essential infrastructure for genomic research, which requires the collection and analysis of biospecimens [10]. Biobanks are organized collections of samples and associated data. To be comparable, samples have to be collected, stored, and shared according to scientific and technical standards, and biobanks must meet standards of quality and size to be scientifically valuable. Biobank networks enable the aggregation of samples and data from geographically dispersed patients, which is essential in the rare disease context, but fostering standardization across these networks is a complicated challenge. In the era of Big Data, the value of biobanks increasingly lies in their datafication, which includes: (1) the collection of rich associated demographic, health and clinical information about patients, and (2) the analysis of samples to generate molecular, imaging or other forms of biological data. Of course, the quality and compatibility of the data collected or generated are also essential to aggregating data across biobank networks. Biobank networks and datafication are key to attracting researchers and additional resources to understudied areas like rare disease. This article focuses primarily on data sharing, but the reader should keep in mind that generating data from biological samples is an essential step, one that is organizationally and scientifically non-trivial.

*International Data Sharing and Rare Disease: The Importance of Ethics and Patient Involvement DOI: http://dx.doi.org/10.5772/intechopen.91237*

Little will be achieved for rare disease patients without collaboration and international data sharing. No single institution, laboratory, or even country is likely to encounter enough patients with a given rare disease, in sufficient diversity, to advance research alone. In the next section, I review some major data-driven initiatives to improve rare disease care and research. These learning health system approaches, powered by international data sharing, are essential to deliver data-intensive medicine for rare disease. International data sharing does, however, raise concerns about the privacy of patients with rare disease, so I then discuss issues of privacy and consent in data-intensive rare disease medicine and research. Notably, many rare disease patients want to make their data available to improve their care and to support research. It is therefore important that patients are involved in developing and implementing data sharing governance, so that the benefits of data sharing are achieved while risks to patient privacy are managed.
