**1. Introduction**

Care processes are becoming increasingly complex with the growth of technological, biological, and genetic knowledge [1]. This leads naturally to a subdivision of medical specialties, with increasing patient's care costs. Such subdivision, in view of the multiplicity of pathologies of certain patients, complicates diagnosis coding measures, regardless of the coding plan used in clinical and medical-economic settings. It was assumed that incorrect coding of medical information caused, on average, a 14.7% hospital revenue loss per patient [2]. More than three-quarters of these errors are caused by clinicians. These numbers are explained by the intricacy of the classifications and nomenclatures used to code medical acts [3]. In addition, they make it difficult for medical officers to understand the process involved in the coding of this activity [4]. In this context, the establishment of clinical information systems in hospitals can be a key factor in optimizing the coding process of medical information and therefore the expenses of healthcare. The successful establishment of the latter is not done without challenge. In fact, its deployment is based on the data warehouse, which plays an important role in the collection and analysis of large volume of data for decision support. Generally, a data warehouse is often implanted under the relational database management systems (DBMS). The latter obtrudes itself by the richness of their functionality and performance of their

request. Nevertheless, they are inadequate to build distributed data warehouses and needful to cope with the scalability of storage space and the increase of hospital stay data [5]. In addition, execution of decision requests demeans the performance of data warehouses in DBMS [6]. In the context of the Georges Pompidou European Hospital, the data warehouse is rapidly growing and contains structured (ICD-10 code, etc.) and unstructured (natural language text, etc.) data of stays from different medical specialties. These data are scattered and do not offer direct access to the medical act coding data based on hierarchies of the 10th revision of International Classification of Disease (ICD-10) for diagnostics as well as the French Common Classification of Medical Act (CCMA) for medical procedures. Moreover, the complexity of the classification in a coding process poses a major problem for sub coding (code forgotten) and over coding (addition of codes not justified) of hospital stays [7]. These challenges can be addressed by providing physicians with a big data storage environment dedicated to coding hospital stays whose particularity is to combine their size (volume), frequency of updates (velocity), or diversity (variety) [8]. Volume, variety, and velocity, often referred to as the three Vs, capture the real meaning of big data [9].

Because big data bring many attractive opportunities to knowledge management [9, 10], the aim of this study is to model the coding data of hospital stays extracted from a data warehouse and implement them in a document-oriented NoSQL data model capable of storing a large distributed big data set. This study sought also to design a big data-coding warehouse efficient for medical coding according to a NoSQL document-oriented data model, which will allow to obtain the optimal combination of codes (diagnoses and acts) for any given reason for care.
