**1. Introduction**

The World Health Organization (WHO) defines pharmacovigilance (PV) as "the science and activities relating to the detection, assessment, understanding, and prevention of adverse effects or any other drug-related problem" [1]. It is difficult to characterize a drug's safety comprehensively during development because clinical trials are conducted in a controlled environment, in a limited number of patients, and for a specific duration. After marketing, however, the drug is prescribed to thousands of patients across different age groups. It is therefore obligatory for the "safety of all medicines to be monitored throughout their use" [2].

In 2018, the WHO global database of individual case safety reports (VigiBase) held 17 million adverse drug reaction (ADR) reports [3], and the Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) held more than 10 million, of which 5 million were serious ADRs and one million resulted in death [4]. These databases collect ADRs through spontaneous reporting. However, the known criticisms of spontaneous reporting are under-reporting and uncertainty in the causality assessment [5]. There is therefore a need for other methods to predict ADRs and to analyze the available data efficiently, drawing not only on structured data from spontaneous reporting systems (SRS) but also on other data sources, such as electronic health records (EHR), clinical narratives, medical literature, social media, and health forums [6].

PV data sources are dynamic and diverse, and they include both structured and unstructured data. Consequently, manual detection of ADRs and manual processing of PV data are time-consuming, and automating ADR/signal detection and report processing would be more efficient [7].

Machine learning (ML) is a robust data analysis technique that uses statistical and probabilistic methods to develop models that learn automatically from data and thereby help to identify patterns and make accurate predictions [8]. ML algorithms fall into three categories: supervised, unsupervised, and semi-supervised learning. In supervised learning, data with known labels are used to train a model that predicts labels for new data. In unsupervised learning, mathematical methods are used to cluster data without labels. Semi-supervised learning uses models based on both [8].
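The distinction between supervised and unsupervised learning can be illustrated with a minimal sketch. It is not drawn from any of the reviewed studies: the toy data, the two features, the "ADR"/"non-ADR" labels, and all function names are hypothetical, and the nearest-centroid classifier and k-means clustering are simple stand-ins chosen only to show that the first approach uses known labels while the second does not.

```python
from math import dist
from statistics import fmean


def centroid(points):
    """Mean point of a non-empty list of equal-length tuples."""
    return tuple(fmean(axis) for axis in zip(*points))


def fit_supervised(points, labels):
    """Nearest-centroid classifier: one centroid per known label."""
    groups = {}
    for p, y in zip(points, labels):
        groups.setdefault(y, []).append(p)
    return {y: centroid(ps) for y, ps in groups.items()}


def predict(model, point):
    """Return the label whose centroid is closest to the point."""
    return min(model, key=lambda y: dist(model[y], point))


def cluster_unsupervised(points, k=2, iterations=10):
    """Plain k-means: groups points without using any labels."""
    centers = list(points[:k])  # naive initialisation: first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(centers[i], p))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [centroid(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return [min(range(k), key=lambda i: dist(centers[i], p)) for p in points]


# Hypothetical toy data: two numeric features per report (assumption).
points = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9),
          (5.0, 5.2), (5.3, 4.8), (4.9, 5.1)]
labels = ["non-ADR", "non-ADR", "non-ADR", "ADR", "ADR", "ADR"]

# Supervised: the known labels are used to train the model.
model = fit_supervised(points, labels)
new_label = predict(model, (5.1, 5.0))

# Unsupervised: the same points are grouped with no labels at all.
cluster_ids = cluster_unsupervised(points, k=2)
```

In this sketch the supervised model returns a named label for a new point, whereas the clustering step returns only arbitrary group indices whose meaning has to be interpreted afterwards. A semi-supervised variant would typically train on a small labelled subset and then use the unlabelled points to refine the model.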

This scoping review aims to explore the current applications of ML techniques to PV activities. The research questions are:

