**2. Methodology overview**

As noted earlier, the analysis design was described in the previous version of this chapter, available on the Research Square website. The analysis used a US healthcare claims patient-level database covering the period from January 31, 2019 to December 31, 2019 [21]. Patients with a history of ICD-10 diagnosis codes for endometriosis were labeled as targets, and the remaining patients were assigned as controls. Because endometriosis is a condition that affects women only, the study target cohort was restricted to female patients aged 18 and older. A control cohort was built with a propensity matching algorithm to serve as the comparison group for the study targets. For both cohorts, thirty-six (36) months of medical history preceding the first condition event in 2019 were extracted. The claims data included diagnosis, medical, procedural, surgical, and hospital codes, as well as the medical treatments and therapies prescribed to patients. The dataset was provided at the transactional level so that medical events could be captured longitudinally [21]. Several analytical approaches were employed, ranging from rules-based patient qualification criteria to ML algorithms that estimate the probability of endometriosis onset. The dataset covered healthcare claims sourced from United States regions only.
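
The target-cohort selection and the 36-month lookback described above can be illustrated with a minimal Python sketch. It is not the study's implementation: the record layout (`sex`, `age`, `claims`, `date`, `code`), the function names, and the toy patients are hypothetical, and the use of the ICD-10 category prefix `N80` to flag endometriosis is an assumption about how the diagnosis codes were matched. Whether the history window includes the index event itself is also an assumption (here it is excluded). The propensity matching step for controls is not shown, since the source does not specify the algorithm.

```python
from calendar import monthrange
from datetime import date

ENDO_PREFIX = "N80"  # assumed: ICD-10 category for endometriosis
STUDY_START, STUDY_END = date(2019, 1, 31), date(2019, 12, 31)
LOOKBACK_MONTHS = 36


def months_before(d, months):
    """Return the date `months` calendar months before `d`, clamping the day."""
    idx = d.year * 12 + (d.month - 1) - months
    year, month = divmod(idx, 12)
    month += 1
    day = min(d.day, monthrange(year, month)[1])
    return date(year, month, day)


def index_date(claims):
    """First endometriosis diagnosis inside the study period, or None."""
    hits = [c["date"] for c in claims
            if c["code"].startswith(ENDO_PREFIX)
            and STUDY_START <= c["date"] <= STUDY_END]
    return min(hits) if hits else None


def lookback_history(claims, anchor):
    """Claims dated within the 36 months before `anchor` (anchor excluded)."""
    start = months_before(anchor, LOOKBACK_MONTHS)
    return [c for c in claims if start <= c["date"] < anchor]


def build_target_cohort(patients):
    """Map each qualifying patient id to her pre-index claim history."""
    cohort = {}
    for pid, p in patients.items():
        if p["sex"] != "F" or p["age"] < 18:
            continue  # target cohort: female patients aged 18 and older
        anchor = index_date(p["claims"])
        if anchor is not None:
            cohort[pid] = lookback_history(p["claims"], anchor)
    return cohort


# Toy, made-up records for illustration only.
patients = {
    "p1": {"sex": "F", "age": 34, "claims": [
        {"date": date(2015, 1, 1), "code": "R10.2"},   # outside the lookback
        {"date": date(2017, 5, 2), "code": "N94.6"},   # inside the lookback
        {"date": date(2019, 6, 15), "code": "N80.0"},  # index event
    ]},
    "p2": {"sex": "F", "age": 41, "claims": [
        {"date": date(2018, 3, 1), "code": "I10"},     # no endometriosis code
    ]},
    "p3": {"sex": "F", "age": 16, "claims": [
        {"date": date(2019, 4, 1), "code": "N80.1"},   # under 18, excluded
    ]},
}

cohort = build_target_cohort(patients)
```

In this sketch only `p1` qualifies as a target, and her extracted history holds the single claim that falls in the 36 months before her index date. Patients such as `p2` would form the pool from which matched controls are drawn.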
