**1. Introduction**

Recent advances in artificial intelligence (AI) and machine learning (ML) have created opportunities to apply these methods in the healthcare industry, often improving on the performance and accuracy benchmarks established by classical statistical techniques [1]. A variety of ML techniques have already been applied to clinical data to study the onset, progression, and treatment options of many conditions and therapeutic areas. In addition, deep learning algorithms such as convolutional neural networks (CNNs) have been applied to medical image data to predict disease onset and progression with even greater precision [2–5].

Applied to large volumes of structured and unstructured data and combined with available data processing technology, ML algorithms have already improved researchers' ability to mine these data and have assisted in making patient healthcare decisions [6]. Because ML algorithms can offer higher precision and robustness than classical statistical methods, the insights they produce have become important in driving strategies and processes related to healthcare access, patient care, disease diagnostics, healthcare trend forecasting, and drug discovery. This, in turn, can help reduce medical costs, shorten the time to diagnosis and treatment, and enhance patients' quality of life and outcomes [7].

Endometriosis is one of the most common disorders in women of reproductive age. In this condition, tissue resembling the endometrial lining grows on the outer surface of the uterus and on other organs of the pelvic area. Signs and symptoms vary across patients: some individuals experience mild symptoms, while others display moderate to severe signs. The most common symptoms of endometriosis include pelvic pain, dysmenorrhea, and infertility. Laparoscopy, a surgical procedure performed under general anesthesia, is most commonly used to confirm the diagnosis of endometriosis [8]. Because it is invasive, it may not be suitable for all women. Laparoscopy is also expensive, and women must present a variety of indications of endometriosis before undergoing the procedure [9]. A number of studies have also investigated biomarkers of endometriosis by assessing endometrial tissue, uterine or menstrual fluids, immunological markers in blood or urine, gene expression, and other sources [10].

Noninvasive methods for predicting the likelihood of endometriosis could reduce diagnostic delays and the number of women undergoing unnecessary surgery, thereby avoiding unwanted complications and potential trauma [11]. Other researchers developed a new ensemble technique called GenomeForest to analyze gene expression data, systematically examining its ability to classify endometriosis and control samples using both transcriptomics and methylomics data [12, 13].

Another study developed symptom-based models that used logistic regression (LR) to predict the likelihood of endometriosis. Data on patient demographics, past medical history, obstetric history, family history, and other factors were collected through a 25-item self-administered questionnaire [14]. Researchers have also systematically applied selected ultrasound techniques to the diagnosis of endometriosis and concluded that these methods should remain the first-line procedures in the evaluation of patients with endometriosis [15].

More recently, researchers developed a CNN-based computer-aided diagnosis (CAD) system to classify endometrial lesion images obtained by hysteroscopy and evaluated its diagnostic performance [16]. Their system slightly outperformed gynecologists in classifying endometrial lesion images. Despite the large number of diagnostic procedures available, however, there is currently no guaranteed treatment for endometriosis. With early diagnosis and the available medical and surgical options, healthcare providers may nonetheless be able to reduce the risk of complications and improve their patients' quality of life [17, 18].

The studies above used either relatively small samples or a limited number of variables to develop models or systems that predict the likelihood of endometriosis, and their data came mostly from clinics and care providers in controlled environments. Few studies to date have leveraged US-based patient-level claims data to predict endometriosis. Claims data capture the patient's medical journey, including diagnoses, procedures, prescriptions, physician information, and patient demographics [19, 20]. In this chapter, transaction-level US patient claims datasets were used to develop accurate ML algorithms for predicting the likelihood of endometriosis onset. Predicting the probability of endometriosis from diagnosed patients' medical histories may benefit both the diagnostic process and patients' quality of life. The LR and eXtreme Gradient Boosting (XGB) algorithms were employed to identify the key drivers of endometriosis onset. An earlier version of this chapter is available on the Research Square website. Posting it allowed these insights to be shared with the research community in advance, while the feedback received was used to enhance the research design presented in this chapter.
