**Figure 3.**

*Cheminformatics and Its Applications*

and molecular weight (MW), or others such as molar refractivity, are important physicochemical parameters for quantitative structure-activity relationship (QSAR) analysis. These molecular descriptors were chosen on the basis of Lipinski's rule and Veber's rule, which are used to predict whether compounds with pharmacological potential are likely to be orally active [65–67]. The statistical analysis of the physicochemical properties was performed with RStudio Software 1.0.136 AGPL [68].

**2.3 3D visualization of chemical space of compounds with antimalarial activity**

Principal component analyses (PCAs) were performed with MOE software [64]. The dominant characteristics, expressed as covariance, were visualized as the corresponding 2D or 3D score plots with the DataWarrior program v. 5.0 [69]. **Figures 2**–**8** show the distribution of the different compounds with antimalarial activity in chemical space. These plots indicate that NPs, drugs, and synthetic compounds generally occupy a similar chemical space and overlap in most of the evaluated databases.

**2.4 Molecular diversity based on fingerprints**

Three binary molecular fingerprints were calculated with the R package rcdk in RStudio: extended connectivity fingerprints with diameter 4 (ECFP-4) for similarity searching, molecular access system (MACCS) keys of 166 bits for determining similarity and molecular diversity, and PubChem keys of 881 bits for encoding molecular fragment information [42–44]. The pairwise fingerprint similarity between compounds was calculated with the Tanimoto coefficient and analyzed with the cumulative distribution function (CDF). This approach has been used to calculate, measure, and represent the molecular diversity of compound data sets [23].

**Figures 9**–**11** show the CDFs of the pairwise similarity of the different data sets, computed with the Tanimoto coefficient on ECFP-4, MACCS keys, and PubChem fingerprints, respectively.
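The Tanimoto coefficient compares two binary fingerprints as T = c / (a + b − c), where a and b are the numbers of bits set in each fingerprint and c is the number of bits set in both. The minimal Python sketch below computes T for every unordered pair of compounds in one data set. It assumes the fingerprints (ECFP-4, MACCS, or PubChem) have already been generated, for example with rcdk, and are stored as sets of on-bit indices; the function names are hypothetical and this is not the authors' R workflow.

```python
from itertools import combinations


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints given as sets of on-bit indices.

    T = c / (a + b - c): a and b are the bit counts of each fingerprint,
    c is the number of bits set in both.
    """
    c = len(fp_a & fp_b)
    denom = len(fp_a) + len(fp_b) - c
    # Assumption: two empty fingerprints are given similarity 0.0
    # (tools differ on this edge case).
    return c / denom if denom else 0.0


def pairwise_similarities(fingerprints: list[set[int]]) -> list[float]:
    """Tanimoto similarity for every unordered pair of compounds in one data set."""
    return [tanimoto(a, b) for a, b in combinations(fingerprints, 2)]
```

For three toy fingerprints invented for illustration (not taken from any of the databases), `{1, 5, 9, 12}`, `{1, 5, 9}`, and `{2, 7}`, `pairwise_similarities` returns `[0.75, 0.0, 0.0]`: the first two share three of four set bits, and the third shares no bits with either.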


**Figure 2.** *3D visualization of the chemical space of natural product databases.*

*3D visualization of the chemical space of synthetic compounds.*

**Figures 9**–**11** provide information on the structural diversity of the six databases. A similar approach has been published previously [23]. Judging from its curves, ECFP-4 did not prove to be a suitable fingerprint representation for these data sets. In all three fingerprint-based similarity graphs, the natural products with antimalarial activity, OMS, and MMV databases show the lowest molecular diversity, whereas GSK DB is the most diverse.
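The diversity comparison from CDF curves can be sketched from a list of pairwise similarity values, such as the Tanimoto similarities of all compound pairs in one data set. The stdlib-only Python sketch below (hypothetical function names) returns the points of the empirical CDF for plotting and evaluates the CDF at a chosen similarity threshold.

```python
from bisect import bisect_right


def empirical_cdf(similarities: list[float]) -> tuple[list[float], list[float]]:
    """Points of the empirical CDF: sorted similarity values and cumulative fractions."""
    if not similarities:
        raise ValueError("no similarity values")
    xs = sorted(similarities)
    n = len(xs)
    # The i-th smallest value (0-based) has cumulative fraction (i + 1) / n.
    ys = [(i + 1) / n for i in range(n)]
    return xs, ys


def cdf_at(similarities: list[float], threshold: float) -> float:
    """Fraction of compound pairs whose similarity is <= threshold."""
    if not similarities:
        raise ValueError("no similarity values")
    xs = sorted(similarities)
    return bisect_right(xs, threshold) / len(xs)
```

At a given threshold, a data set whose CDF is higher has a larger share of low-similarity pairs, so a curve lying further to the left indicates greater diversity. With two invented similarity lists, `cdf_at([0.10, 0.15, 0.20, 0.25], 0.3)` gives `1.0` while `cdf_at([0.40, 0.55, 0.60, 0.70], 0.3)` gives `0.0`, so the first toy set is the more diverse one.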

**Tables 2**–**4** summarize the statistics of the pairwise Tanimoto similarity for the analyzed data sets. CHEMBL and DrugBank are excluded from these tables because of the small amount of data available for them.
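Descriptive statistics of the pairwise Tanimoto similarities can be computed with Python's `statistics` module. The set of statistics below (minimum, quartiles, median, mean, standard deviation, maximum) is an assumption about what such a summary typically contains, not a transcription of the columns of Tables 2–4, and the function name is hypothetical.

```python
import statistics


def similarity_summary(similarities: list[float]) -> dict[str, float]:
    """Descriptive statistics of the pairwise Tanimoto similarities of one data set.

    Quartiles use statistics.quantiles with method="inclusive" (an assumption;
    other software may interpolate quartiles slightly differently).
    """
    if len(similarities) < 2:
        raise ValueError("need at least two pairwise similarity values")
    q1, median, q3 = statistics.quantiles(similarities, n=4, method="inclusive")
    return {
        "min": min(similarities),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(similarities),
        "mean": statistics.mean(similarities),
        "sd": statistics.stdev(similarities),  # sample standard deviation
    }
```

Quartile conventions differ between tools (R's default `quantile` type, for instance, is not the same as every Python method), so small numerical differences from published tables are expected when only the interpolation rule changes.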

*3D visualization of the chemical spaces of natural products and GNF DBs.*

**Figure 5.** *3D visualization of the chemical spaces of natural products and TCMDC DBs.*

**Figure 6.** *3D visualization of the chemical spaces of natural products and DBK DBs.*
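The chemical-space plots in Figures 2–8 are PCA score plots computed in MOE and drawn with DataWarrior. Purely to illustrate what such a projection involves, the stdlib-only Python sketch below mean-centres a compound-by-descriptor matrix, builds its sample covariance matrix, extracts the leading eigenvectors by power iteration with deflation, and returns three scores per compound by default. It is not MOE's implementation: descriptors are only mean-centred (real workflows often also scale them to unit variance), the function name `pca_scores` is hypothetical, power iteration converges slowly when eigenvalues are close, and for real data sets a linear-algebra library would be the practical choice.

```python
import math
import random


def _normalize(v: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0.0 else v


def pca_scores(data: list[list[float]], n_components: int = 3,
               iterations: int = 500, seed: int = 0) -> list[list[float]]:
    """Scores of each row (compound) on the leading principal components.

    `data` is a compounds x descriptors matrix. Columns are mean-centred only;
    eigenvectors of the sample covariance matrix come from power iteration
    with deflation.
    """
    n, p = len(data), len(data[0])
    if n < 2 or not 1 <= n_components <= p:
        raise ValueError("need >= 2 compounds and 1 <= n_components <= descriptors")
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in data]
    # Sample covariance matrix of the descriptors (p x p).
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(p)]
           for a in range(p)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = _normalize([rng.random() + 1e-3 for _ in range(p)])
        for _ in range(iterations):
            # Repeated multiplication by cov converges to the dominant eigenvector.
            w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
            if all(x == 0.0 for x in w):
                break  # remaining covariance is zero; keep the current vector
            v = _normalize(w)
        eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p))
                         for a in range(p))
        components.append(v)
        # Deflation: remove this component so the next pass finds the next one.
        cov = [[cov[a][b] - eigenvalue * v[a] * v[b] for b in range(p)]
               for a in range(p)]
    # Project the centred data onto each component.
    return [[sum(r[j] * comp[j] for j in range(p)) for comp in components]
            for r in centred]
```

Each row of `data` would hold one compound's physicochemical descriptors, such as MW; with the default `n_components=3` every compound receives three coordinates that can be drawn as a 3D score plot. The sign of each component is arbitrary, as in any PCA.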
