## **4.1 The data source for post-marketing surveillance**

There are two types of data sources<sup>3</sup>: structured sources, such as spontaneous reporting systems (SRSs), and unstructured sources, such as medical literature, clinical notes, and social media posts [6]. Regulatory authorities collect ADRs through voluntary reporting to SRSs, so under-reporting is the main drawback of these sources; it is therefore important to use additional data sources to collect safety information comprehensively [6]. Supervised and semi-supervised machine learning techniques have helped in mining these other data sources, such as clinical notes, medical literature, and social media [4, 6, 9, 10, 12–14].

<sup>3</sup> Post-marketing surveillance "refers to the process of monitoring the safety of drugs once they reach the market" [20].

## **4.2 Improve the accuracy and time efficiency**

PV data sources are dynamic: they are updated periodically, so they grow large in addition to being unstructured [6]. The reported accuracy of ML techniques in detecting or predicting ADRs ranged from 74% to 90% [6, 9, 12], and the reported precision ranged from 0.7 to 0.9 [10, 11]. Furthermore, an ML model finished the ICSR identification task on social media data in 48 hr, at 74% accuracy, compared with an estimated 44,000 hr for human experts [12].
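To make the figures above concrete, the following is a minimal sketch of how accuracy and precision are computed from confusion-matrix counts, and of the time saving implied by the 48 hr and 44,000 hr figures reported in [12]. The function names and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not taken from any of the reviewed studies.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Share of all reports that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp: int, fp: int) -> float:
    """Share of reports flagged as ADRs that really are ADRs."""
    return tp / (tp + fp)


def speedup(human_hours: float, model_hours: float) -> float:
    """How many times faster the model is than manual review."""
    return human_hours / model_hours


# Hypothetical counts (assumption, for illustration only):
# true positives, false positives, true negatives, false negatives.
tp, fp, tn, fn = 80, 20, 68, 32

acc = accuracy(tp, fp, tn, fn)
prec = precision(tp, fp)

# Hours reported in the text for the ICSR identification task [12].
ratio = speedup(human_hours=44_000, model_hours=48)

print(f"accuracy={acc:.2f} precision={prec:.2f} speed-up={ratio:.0f}x")
```

Under these assumptions the time figures correspond to a speed-up of roughly 900 times, which is the practical argument for accepting a model whose accuracy sits at the lower end of the reported range.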

## **4.3 ADRs prediction**

Predicting ADRs at an early stage enhances drug safety activities and reduces financial costs, for example by avoiding the cost of hospitalization due to ADRs [21]. ML techniques have been used to predict ADRs from social media posts, achieving an F-score of 0.9 [14].
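For readers unfamiliar with the metric, the F-score combines precision $P$ and recall $R$ into a single number. The text does not state which variant was reported in [14]; assuming the commonly used balanced form ($F_1$), it is the harmonic mean of the two:

$$
F_1 = \frac{2 \, P \, R}{P + R}
$$

As an illustrative example only (these are not values reported in [14]), $P = R = 0.9$ gives $F_1 = \frac{2 \times 0.9 \times 0.9}{0.9 + 0.9} = 0.9$.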

## **4.4 Limitation of the review**

Only two databases were considered, and a scoping review is not as exhaustive as a systematic review; some relevant articles may therefore have been missed.
