**4. Discussion and conclusion**

*Artificial Intelligence - Applications in Medicine and Biology*

Based on the requests defined above and executed using the learn/control database, **Table 1** shows the results of the evaluation provided by the big data warehouse.

| Learn/control DB (%) | Precision (%) | Recall (%) |
| --- | --- | --- |
| Mono-criteria, 50/50 | 40 | 25 |
| Mono-criteria, 80/20 | 92 | 87 |
| Multi-criteria, 80/20 | 80 | 70 |

**Table 1.**
*Evaluation results of the big data model.*

Query #1 and Query #2 were used to compute the precision and the robustness of the model (**Table 3**). The list of associated codes present in **Tables 1** and **2** is not exhaustive; it can be extended to more than 100. We chose to present only a small number.

| Requested code | Associated code | Typology | Elapsed time (ms) |
| --- | --- | --- | --- |
| I49.9 | No associated code | No typology | 75 |
| Z09.8 | D35.0 | (PD, RD) or (PD, ASD) | 85 |
| Z09.8 | EBQM002, E78.0 | PD, RD, ASD | 90 |
| E660.0 | I10, N17.9 | PD, RD, ASD | 87 |
| DEQP003 | Z864, I708, E70.8 | ACT, PD, RD, ASD | 90 |

**Table 2.**
*Associations of diagnosis codes according to their typology and their elapsed time.*

| Requested code/typology | Associated code | Typology | Elapsed time (ms) |
| --- | --- | --- | --- |
| I49.9/dp | No associated code | No typology | 75 |
| Z09.8/dp | Q21.3 | (PD, RD) or (PD, ASD) | 85 |
| (Z09.8/dp) and (N185/das) | Z992.1, D638, and JVJB001 | PD, RD, and ASD | 89 |
| E26.0/dr | Z71.3, D35.0, and DZQM006 | PD, RD, ASD, and/or ACT | 87 |
| EQQP008 | N18.5, I70.2, and Z098 | ACT, PD, RD, and ASD | |

**Table 3.**
*Associations of diagnosis codes according to their typology and their elapsed time.*

These results show that the main coding rules have been respected: the associated diagnosis must always be coupled to the Z code declared as the main diagnosis, and the associated acts must be linked to the disease declared as the main diagnosis. The presence of related diagnoses demonstrates the quality of the associated codes contained in the data warehouse.

Based on these observations, the least selective queries (those selecting more lines) required longer execution times. According to our evaluation, the system is bijective and corresponds to the reality of the coding of clinical activity at HEGP. This suggests that, from the document-oriented model, we can recover the initial encoding data and vice versa. In this regard, everything that has been stored in the big data warehouse corresponds to the reality of the patient. The data warehouse makes it possible to be more aware of the coding performed in the previous year.
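Precision and recall figures such as those in **Table 1** compare the codes suggested by the warehouse against the control coding. A minimal sketch of such a computation is shown below; the code sets are invented examples, not the actual HEGP data.

```python
# Hypothetical sketch of a precision/recall evaluation over a control set.
# Each element is the set of codes for one stay; the data is illustrative.

def precision_recall(suggested, reference):
    """Compare suggested code sets against the reference (control) coding."""
    tp = sum(len(s & r) for s, r in zip(suggested, reference))  # correct codes
    fp = sum(len(s - r) for s, r in zip(suggested, reference))  # extra codes
    fn = sum(len(r - s) for s, r in zip(suggested, reference))  # missed codes
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Codes proposed by the warehouse vs. codes kept by the human coder
suggested = [{"Z09.8", "D35.0"}, {"I10", "N17.9", "E78.0"}]
reference = [{"Z09.8", "D35.0"}, {"I10", "N17.9"}]
p, r = precision_recall(suggested, reference)  # p = 0.8, r = 1.0
```

A larger control set, drawn from the learn/control split, would be evaluated the same way, one stay at a time.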

This study investigated the implementation of a big data coding warehouse for coding support in a document-oriented NoSQL system. We observed that flexibility is the particularity of this model, as it allows redundancy to be inserted into the database: a stay with four ASD codes and one PD code is split into four documents. The number of duplicated lines grows with the number of associated diagnoses and medical acts; in return, each entity is easier to present as a complete document. A "stay" with only a primary diagnosis, one or more associated diagnoses, and/or no medical act can be inserted into the database without implementing a generic code to replace the missing one. In most cases, such a generic code is added only to let the physician understand that no diagnostic code needs to be associated with the medical act.

This system is advantageous because the information is complete: the issue of missing data is solved, so the information can be handled without any join, and a single read retrieves all the information. Since there is no link between the documents, the collection can be arranged without any difficulty, which is an essential part of the construction of a big data coding warehouse. However, one disadvantage of this model is that, in addition to the redundancy, the hierarchization of access does not allow reaching ICD-10 code information without going through the type of medical benefit. We also note that the two pseudorandom splits provide effective results, while the hazardous split (50/50%) produces wrong results. To generate huge volumes of data, we used the same "HSR base" and swapped the concept ICD-10 for "Obicd10" and CCAM for "Obccam" (Ob for rejected). The rejected data were used to show that, in the coding optimization process, the system learns from as many rejected cases as accepted cases.
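The splitting of a stay into one self-contained document per associated diagnosis, as described above, can be sketched as follows; the field names and codes are illustrative assumptions, not the actual warehouse schema.

```python
# Illustrative sketch of the document-oriented split: a stay with one
# principal diagnosis (PD) and four associated diagnoses (ASD) becomes
# four documents, one per (PD, ASD) pair. Codes are invented examples.

stay = {
    "stay_id": "S001",
    "pd": "Z09.8",                                 # principal diagnosis
    "asd": ["N18.5", "I70.2", "E78.0", "D63.8"],   # associated diagnoses
}

def split_stay(stay):
    """Duplicate the stay into one document per ASD, so every document
    is self-contained and can be read without a join."""
    return [
        {"stay_id": stay["stay_id"], "pd": stay["pd"], "asd": code}
        for code in stay["asd"]
    ]

documents = split_stay(stay)
# Four documents, each repeating the PD: that repetition is the
# redundancy the model deliberately accepts in exchange for
# single-read access.
```

A stay with no medical act or no associated diagnosis would simply produce fewer documents, with no generic placeholder code required.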
The major interest of building the coding aid data warehouse is to exploit the huge volumes of coding information from a large number of hospitals, because this information is more exhaustive. The implemented model makes it possible to obtain an optimal combination of codes (diagnoses, acts) for a given reason for care. Because of the way they are structured, relational databases usually scale vertically: a single server has to host the entire database to ensure reliability and continuous availability of data. This quickly gets expensive, places limits on scale, and creates a relatively small number of failure points in the database infrastructure. This is why we propose our model to solve this problem. Indeed, our coding aid data warehouse scales horizontally: several servers host the database, which allows grouping all the data relevant to diagnosis and medical coding in a generic way, enriching the coding data by crossing coding information from other hospital sources, and exploring the associations between codes more easily. The system remains subject to expert review, which does not remove the richness of a Clinical Data Warehouse (CDW). Our contribution consists of building a specific document-based CDW to propose an "in silico" test framework that enhances the efficacy of coding-optimization algorithms, for example, algorithms based on manual decision-making [15] and various natural language processing (NLP) tools applied to the EHR in-/outpatient summary reports [16].
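Horizontal scaling of this kind typically routes each document to one of several servers by hashing a key. The following is a generic sketch of that idea, not the warehouse's actual sharding policy; the stay identifier used as the routing key is an assumption.

```python
# Generic sketch of hash-based document routing across shards.
# Documents for the same stay always land on the same server,
# so a read for one stay touches a single, known shard.

import hashlib

def shard_for(stay_id, n_shards):
    """Deterministically map a stay identifier to a shard index."""
    digest = hashlib.sha256(stay_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_shards

# Route a document from any hospital source to one of four servers
shard = shard_for("S001", 4)
```

Adding a server changes `n_shards` and therefore the placement of existing documents; production systems usually soften this with consistent hashing or range-based shard maps.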

