**2.5 Evaluation**

To evaluate our model, framework MongoDB was deployed in only one data node. The system resources used were (8Go RAM, processor i5–4 heats 2 Terabytes hard disk). MongoDB is a key-value system using document-oriented storage. The volume of data received by one node for the test is 1.6 million documents representing 1 year of the hospital stays. The volume of documents can be worm at 40 times the initial volume. These data are divided into two major groups: encoded and rejected data. The interest of the rejected data is to expand the database, introduce the noise, and have a case of associations of diagnostic code to avoid. Two evaluations were performed. The first one was performed by calculating the model performance (multi-criteria request (Query #2), mono's (Query #1) elapsed time), and the second one was performed by calculating the precision and recall of the model. The initial step was to create two main groups of requests arranged by dimensionality and selectivity validated by the business process, which can be used in a real context. Dimensionality is the value of different keys of the entities ("typediag" and/or "typeact"). Selectivity refers to the degree of data elimination through an aggregate function on the search attribute (code = "CCMA code or ICD-10 code").

**123**

of request.

**3. Results**

multi-criteria queries.

elapsed time of this request is 90 ms (**Table 1**).

ography at least 12 leads" (**Table 2**).

elapsed time represents the execution time of request.

*Using Artificial Intelligence and Big Data-Based Documents to Optimize Medical Coding*

correct associations to the number of all associations to be corrected.

The document-oriented model of the big data-coding warehouse was implemented in the MongoDB database. Ten separate single-criteria queries were executed with an elapsed time between 75 and 90 milliseconds (ms), while an elapsed time between 80 and 110 ms was obtained during the execution of 10 separate

Query #1 requests the data warehouse to display all association codes, in which the diagnosis code is equal to "Z092" in the ICD-10 coding system, corresponding to "the pharmacotherapy for other conditions." Associated codes obtained are the associated diagnostic code "E780" used to code "pure hypercholesterolemia" and the act code "EBQM002" used to code "Doppler ultrasonography of extracranial cervicocephalic arteries, with Doppler ultrasound of lower extremity arteries." The

Query #2 requests the data warehouse to display all association codes, in which the type of diagnosis is the main diagnosis and the diagnosis code is equal to "I51.4" in the ICD-10 coding system, corresponding to "cardiomegaly." The response time obtained with no index is approximately 1900 ms. The response time obtained with a diagnostic code indexed is approximately 110 ms. The associated codes are "I080" corresponding to "disorders of mitral and aortic valves" and "D721" corresponding to "eosinophilia," and associated act is "DEQP003" corresponding to "electrocardi-

**Table 2** presents results of five sequences of request of Query #2 where requested code represents the code to be queried and its typologies, associated code represents obtained associated codes (diagnostic and act), typology represents different types of diagnostic, and elapsed time represents the execution time

**Table 1** presents results of five sequences of request of Query #1 where requested code represents the code to be queried, associated code represents the obtained associated codes, typology represents different types of diagnostic, and

Query #1 is expressed as follows: for a given diagnosis or act codes, find all associated code and their corresponding type (typediag, typeact). Three functional use cases were used. The first one concerns the specific case of Z codes as part of the entire ICD-10 code set. Z codes are diagnosis codes used for situations where patients Do not have a known disorder and required a related code to precise the real medical. The second one concerns the code of another chapter of ICD-10, and the last concerns the CCMA act code. Query #2 is expressed as follows: for a given diagnosis and act codes, find all associated codes and their corresponding type (typediag, typeact). Second, through cross-validation, random evaluation is used in measuring the impact of a distributed data storage device on medical procedures' coding optimization. This method provides an opportunity for inserting the process of selection codes of medical acts in a random draw. There are two groups that arise from this draw: an oriented test group with 20% sampling size and 50% of the volume of the data warehouse. The second is a control group with a 50% sampling size and 80% of the data warehouse volume of coding derived by the medical information department. Ten samplings with the same percentage were generated to perform the tests. We used the request previously described to compute the precision and recall of the model. Precision is the ratio between the number of correct associations and the total number of associations, and the ratio is the number of

*DOI: http://dx.doi.org/10.5772/intechopen.85749*

## *Using Artificial Intelligence and Big Data-Based Documents to Optimize Medical Coding DOI: http://dx.doi.org/10.5772/intechopen.85749*

Query #1 is expressed as follows: for a given diagnosis or act codes, find all associated code and their corresponding type (typediag, typeact). Three functional use cases were used. The first one concerns the specific case of Z codes as part of the entire ICD-10 code set. Z codes are diagnosis codes used for situations where patients Do not have a known disorder and required a related code to precise the real medical. The second one concerns the code of another chapter of ICD-10, and the last concerns the CCMA act code. Query #2 is expressed as follows: for a given diagnosis and act codes, find all associated codes and their corresponding type (typediag, typeact). Second, through cross-validation, random evaluation is used in measuring the impact of a distributed data storage device on medical procedures' coding optimization. This method provides an opportunity for inserting the process of selection codes of medical acts in a random draw. There are two groups that arise from this draw: an oriented test group with 20% sampling size and 50% of the volume of the data warehouse. The second is a control group with a 50% sampling size and 80% of the data warehouse volume of coding derived by the medical information department. Ten samplings with the same percentage were generated to perform the tests. We used the request previously described to compute the precision and recall of the model. Precision is the ratio between the number of correct associations and the total number of associations, and the ratio is the number of correct associations to the number of all associations to be corrected.
