**2.2 Conceptual data model**

*Artificial Intelligence - Applications in Medicine and Biology*

meaning of big data [9].

requests were used to evaluate the model.

acts such as a surgery act and medical technical act.

**2. Methods**

**2.1 Data source**

request. Nevertheless, they are inadequate to build distributed data warehouses and needful to cope with the scalability of storage space and the increase of hospital stay data [5]. In addition, execution of decision requests demeans the performance of data warehouses in DBMS [6]. In the context of the Georges Pompidou European Hospital, the data warehouse is rapidly growing and contains structured (ICD-10 code, etc.) and unstructured (natural language text, etc.) data of stays from different medical specialties. These data are scattered and do not offer direct access to the medical act coding data based on hierarchies of the 10th revision of International Classification of Disease (ICD-10) for diagnostics as well as the French Common Classification of Medical Act (CCMA) for medical procedures. Moreover, the complexity of the classification in a coding process poses a major problem for sub coding (code forgotten) and over coding (addition of codes not justified) of hospital stays [7]. These challenges can be addressed by providing physicians with a big data storage environment dedicated to coding hospital stays whose particularity is to combine their size (volume), frequency of updates (velocity), or diversity (variety) [8]. Volume, variety, and velocity, often referred to as the three Vs, capture the real

Because big data bring many attractive opportunities to knowledge management [9, 10], the aim of this study is to model the coding data of hospital stays extracted from a data warehouse and implement them in a document-oriented NoSQL data model capable of storing a large distributed big data set. This study sought also to design a big data-coding warehouse efficient for medical coding according to a NoSQL document-oriented data model, which will allow to obtain the optimal

This section describes the methodology used to design, implement, and evaluate our data model. Generally, there is a need for a semantic data model to define how data will be structured and related in the database [11], and it is generally acquired that Unified Modeling Language (UML) meets this requirement [12]. Therefore, we first used the formalism of UML [13, 14] to build a conceptual model describing big data of hospital stays. Then, the corresponding rules were used to convert the conceptual model to NoSQL database. In this paper, we choose to focus on document-oriented NoSQL model, namely, MongoDB. This model developed since 2007 by the company of the same name is considered to be the most efficient in terms of performance, for multi-criteria access queries. Finally, the document-oriented model of the big data warehouse was described in JavaScript Object Notation (JSON) format and implemented in the MongoDB database. Subsequently, decision

The CIS grouping software for Georges Pompidou European Hospital was used to extract the base of hospital summary report (HSR) from the digestive, oncologic, and orthopedic surgery units. This base of HSR contains a year of hospital stays coded and validated by the department of medical information. Each HSR has medical benefit entities and temporal and administrative patients' data. The entity of medical benefits represented by the medical act and diagnostics appears in three different types of diagnoses: the associated significant diagnosis (ASD), principal diagnosis (PD), and related diagnosis (RD). Moreover, there are various types of

combination of codes (diagnoses and acts) for any given reason for care.

**120**

The goal of a document-oriented database is the representation of more or less complex information that satisfies the needs of flexibility, richness of structure, etc. Modeling of a big data-coding warehouse is a function of the hospital stay's structuring elements. The hospital stay is a document represented by a pair (key, value) and has a tree-shaped structure. The stay entity is the root of the tree. The entities (key) and values for coming up with a conceptual model (**Figure 1**) were designed from the base of HSR. The main entities were defined as an object class that consisted of stay, patient, movement of the patient between the different clinical unit entities, and terminology. The medical benefit entity is a sub-document of stay entities in which the related act and diagnosis are sub-entities.

The entities have a heterogeneous structure and multiple values. The relationship between them is of cardinality "n,m." To ensure that they are unified, it is important to have a flexible open data schema whose structure can be extensible and is able to adapt to more or less important variations. The rank of medical acts, as well as the rank of diagnoses, can easily complete the corresponding type of diagnosis and acts. The rule used to convert the conceptual model to a document-based model is based on the tree model so that the code associates the structure (the tree) and the content (the text in the sheets).
