**Computational Studies and Biosynthesis of Natural Products with Promising Anticancer Properties**

Aurélien F.A. Moumbock, Conrad V. Simoben, Ludger Wessjohann, Wolfgang Sippl, Stefan Günther and Fidele Ntie‐Kang

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/67650

#### **Abstract**

We present an overview of computational approaches for predicting the metabolic pathways by which plants biosynthesise compounds, with a focus on selected, very promising anticancer secondary metabolites from floral sources. We also survey databases for retrieving useful genomic data and discuss the strengths and limitations of selected prediction software and of the main computational tools and methods that could be employed to investigate the uncharted biosynthetic routes to some of the identified anticancer metabolites from plant sources. Specific examples are used to address some knowledge gaps encountered when using these approaches.

**Keywords:** anticancer, biosynthesis, computational prediction, natural products, plant metabolism

### **1. Introduction**

An immense number of secondary metabolites (SMs) exist in nature, originating from plants, bacteria, fungi and marine life forms, and many serve as drugs for the treatment of life‐threatening diseases, including cancer [1–4]. Taxol, vinblastine, vincristine, podophyllotoxin and camptothecin, for example, are well‐known anticancer drugs of plant origin. The search for drugs against cancer has often turned to plants and marine life for lead compounds. To illustrate this, Newman and Cragg showed in a recent study that ~49% of drugs used in cancer treatment were either natural products (NPs) or their derivatives [5]. We will henceforth refer to SMs and NPs interchangeably, since NPs are the products of secondary (or specialised) metabolism, as opposed to primary metabolism, which yields molecules that play a key role in the physiological processes of the organism and are thus necessary for the plant's survival. SMs are nevertheless important for the plant's defence against attacks by other organisms.

Several efforts have also been made to collect data on naturally occurring plant metabolites showing anticancer properties. As an example, Mangal and co‐workers published the naturally occurring plant‐based anti‐cancer compound activity‐target database (NPACT), containing about 1,500 NPs [6]. In addition to the experimentally verified *in vitro* and *in vivo* data for these NPs, the authors include biological activities (in the form of IC50s, ED50s, EC50s, GI50s, etc.), along with physical, elemental and topological properties of the NPs, the tested cancer types, cell lines, protein targets, commercial suppliers and drug likeness of the NPACT compounds. A similar effort for NPs from African flora was published the following year, resulting in a dataset of about 400 compounds, named AfroCancer [7]. A further study showed that the NPACT and AfroCancer datasets overlap little, giving a combined dataset of about 2,000 NPs [8]. The anticancer properties of some of the most promising AfroCancer compounds have been described in detail in recent reviews [9–12]. Further curation of data from Northern African species has recently resulted in the Northern African Natural Products Database (NANPDB), a vast, web‐accessible and completely downloadable database of NPs, with a significant proportion of anticancer metabolites [13]. The NANPDB effort was founded on the observation that the Northern Africa region is particularly well endowed with diverse vegetation types, which serve as a huge reservoir of bioactive natural products [14–16].

© 2017 The Author(s). Licensee InTech. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

For decades, NPs were identified exclusively by chemical identification based on bioactivity‐guided screening. Recently, it has been postulated that genomics and bioinformatics will transform natural product discovery, even though genome mining has so far had little influence on its advancement [17]. Several algorithms have been developed for mining the (meta)genomic data that continue to be generated. Computational methods and tools have been developed for identifying biosynthetic gene clusters (BGCs; physically clustered groups of a few genes in a particular genome that together encode a biosynthetic pathway for the production of a specialised metabolite) in genome sequences and for predicting the chemical structures of their products [18]. BGCs for SM biosynthetic pathways are important in bacteria and filamentous fungi, and examples have recently been discovered in plants [19, 20]; some metabolic processes in plants, for example, the thalianol pathway for triterpene synthesis in *Arabidopsis thaliana*, have been suggested to be controlled by operon‐like gene clusters (clusters of unrelated genes) [21]. This, coupled with rapid progress in sequencing technologies, has led to the development of new screening methods that focus on the whole‐genome sequences of the organisms producing the NPs. Genome mining approaches for NP discovery basically focus on:



• identifying the genes of the organism involved in the biosynthesis of the NPs,

• identifying the metabolic pathways by which the NPs are biosynthesised and

• predicting the products of the identified pathways (**Figure 1A**).


258 Natural Products and Cancer Drug Discovery


**Figure 1.** (A) Summary of genome mining approaches for the discovery of SMs and (B) classification of tools by applicability domain.

The four main strategies employed to identify such pathways exploit features of the genes involved in the production of plant secondary metabolites, namely their physical clustering, co‐expression, evolutionary co‐occurrence and epigenomic co‐regulation [22–25]. Such approaches have been successfully applied to the investigation of fungal and microbial metabolites [26–28]. Since the discovery of the first gene cluster for secondary metabolism in maize (*Zea mays*) [29], BGCs for plant secondary metabolism have become an emerging theme in plant biology [30]. It is even believed that synthetic biology technologies will eventually lead to the effective functional reconstitution of candidate pathways using a variety of genetic systems [25]. Knowledge of BGCs and their manipulation is therefore important for understanding how to activate the many 'silent' gene clusters revealed by whole‐genome sequencing of organisms. This would make available a wealth of new chemical entities (NCEs), which could be evaluated as drug leads and biologically active compounds [20].
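As an illustration of the co‐expression strategy mentioned above, the following minimal Python sketch flags gene pairs whose expression profiles are strongly correlated across conditions. The gene names, expression values and the 0.9 threshold are hypothetical choices made only for this example; they are not taken from the cited studies, and real analyses would use normalised transcriptome data and statistical testing.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def coexpressed_pairs(expression, threshold=0.9):
    """Return (gene1, gene2, r) for pairs whose correlation r is at or above threshold."""
    pairs = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if r >= threshold:
            pairs.append((g1, g2, round(r, 3)))
    return pairs


# Toy expression values (arbitrary units) across five hypothetical conditions.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.8],  # rises together with geneA
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],  # unrelated profile
}

print(coexpressed_pairs(expression))
```

In this toy dataset only the geneA/geneB pair passes the threshold, which is the kind of signal that would nominate two genes as candidates for the same pathway.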

This chapter discusses the metabolic pathways by which plants biosynthesise compounds with anticancer activities, with a focus on selected, very promising anticancer SMs from the African flora. We also provide an overview of the computational tools that have been used to predict metabolic pathways and address knowledge gaps encountered when using these tools. Additionally, we present some databases for the retrieval of useful genomic data and discuss, with specific examples, the strengths and limitations of selected computational (prediction) tools that could be employed to investigate the uncharted biosynthetic routes to some of the identified anticancer metabolites from plant sources. Properly addressing the existing knowledge gaps should lay the foundation for future investigations.

### **2. Natural products and plant genomic data**

Genome data mining indicates that the vast majority of plant‐based NPs have not yet been discovered [24, 25]. In addition, SMs are normally produced only at later stages of plant growth and are frequently found only at low concentrations within complex mixtures in plant extracts, owing to physiological variations, geographic variations, environmental conditions and genetic factors [25, 31, 32]. These factors are the main obstacles to isolating and purifying NPs in meaningful quantities for either research or commercial purposes. Nowadays, BGCs can be investigated using computational methodologies and used to predict the NPs present in microbial, fungal and floral matter [18, 20, 33, 34]. More than 70 genome sequences of plant species have been made available, along with a wealth of transcriptome data [25]. However, the interpretation of such data, for example, the translation of predicted sequences into enzymes, pathways and SMs, remains challenging. Advances in bioinformatics and synthetic biology have permitted the cheap and efficient overproduction of secondary metabolites of medicinal interest in heterologous (non‐native) host organisms [35]. This is achieved by reengineering BGCs and by activating silent BGCs to yield unreported natural products of the target chemical space [17, 36]. For example, an engineered *Escherichia coli* strain was used as the heterologous host for the production of taxadiene, a vital precursor of the anticancer agent paclitaxel (taxol), which was isolated from the bark of *Taxus brevifolia* [37]. In this way, quite a number of interesting SMs of plant origin (e.g. resveratrol, vanillin, conolidine, etc.) have been objects of pathway engineering in bacteria, yeast and other plants [38]. Chemical libraries of diverse and novel hybrid natural product analogues can thus be generated through combinatorial biosynthesis by manipulating biosynthetic enzymes [39]; for example, several analogues of the antibiotic erythromycin were obtained *via* combinatorial biosynthesis [40]. Such bioengineered libraries of 'unnatural' natural products show promise in drug discovery campaigns against multidrug‐resistant cancer cells.

### **3. Some database resources for retrieving secondary metabolism prediction information**

A summary of databases for retrieving information on BGCs is provided in **Table 1**. The majority of them focus on microbial BGCs, for example, ClusterMine360, ClustScan, DoBISCUIT, IMG‐ABC and the Recombinant ClustScan Database. Details on the utility of these databases have been provided in excellent recent reviews [26–28, 53]. Further efforts towards the construction of plant‐based BGC and genomic databases include those of the Medicinal Plants Genomics and Metabolomics Resource consortium [47]. This effort has focused on 14 medicinal plants and includes a BLAST search module, a genome browser, a putative function search tool and transcriptome search tools. The entire database is available for download. Similar efforts from the Plant Metabolic Network (PMN) have the advantage of including several plant metabolic pathway databases, mostly for food crops [49, 50]. The PMN, for example, currently houses one multi‐species reference database called PlantCyc and 22 species/taxon‐specific databases, providing access to manually curated and/or computationally predicted information about enzymes, pathways, and more for individual species.




The PMN provides a broad network of plant metabolic pathway databases that contain curated information from the literature and computational analyses about the genes, enzymes, compounds, reactions and pathways involved in primary and secondary metabolism in the included plant species. The PlantCyc database also provides access to manually curated or reviewed information about shared and unique metabolic pathways present in over 350 plant species. Plant Reactome, on the other hand, is a pathway database for several crops and model plant species that is built on the framework of a eukaryotic cell model. It currently uses rice as a reference species, and gene homology‐based pathway projections have been made to 62 plant species [51].

### **4. Some computational tools for the analysis of genomic data and specialised metabolism prediction**

Some computational tools for biochemical pathway prediction have been summarised in an excellent review [54]. **Table 2** provides a more detailed summary of the main tools that could be useful in analysing plant and microbial genomic data for metabolism prediction. Some of the tools are designed for the detection and analysis of specialised metabolism in microbes (e.g. antiSMASH, CompGen, GNP, PRISM and WebAUGUSTUS). Others are specially designed for plant metabolism prediction or may only include data for some specific organisms (e.g. AraNet, MADIBA, miP3v2, PlantClusterFinder, SAVI and WikiPathways for plants), others are more general tools, useful for both microbial and plant metabolism prediction and BGC analysis (e.g. E‐zyme, KEGG, PathPred and PathComp), and still others are more useful for developers (e.g. Geneious, OptFlux, PathVisio and Pathway GeneSWAPPER) (**Figure 1B**). The tools can also be classified according to their respective tasks: prediction and analysis of BGCs (e.g. antiSMASH, MADIBA, Pathway GeneSWAPPER, WebAUGUSTUS); searching, visualisation and prediction of biosynthetic pathways and reaction paths (e.g. BioCyc, CycSim, FMM, GNP, KEGG, MetaCyc, PathComp, PathPred, PathSearch, PathVisio, Pathway GeneSWAPPER, PlantClusterFinder, SAVI, WikiPathways for plants); prediction of SMs (PRISM); metabolic engineering (OptFlux); and other functions (miP3v2).

Among the tools for specialised metabolism in plants, AraNet is a probabilistic functional gene network of *A. thaliana* (currently covering a total of 27,029 protein‐encoding genes). It is based on a modified Bayesian integration of data from multiple organisms, each data type being weighted according to how well it links genes that are known to function together in *A. thaliana*. Each interaction is associated with a log‐likelihood score (LLS), which measures the probability that the interaction represents a true functional linkage between two genes [56].
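To make the idea of a log‐likelihood score concrete, the sketch below computes an LLS in the form commonly used for probabilistic functional gene networks: the natural logarithm of the odds of a true linkage given the evidence, divided by the prior odds. This formulation, the function name and the counts are assumptions made for illustration; the exact scoring scheme used by AraNet should be checked against the original publication [56].

```python
from math import log


def log_likelihood_score(pos_with_evidence, neg_with_evidence, pos_total, neg_total):
    """LLS = ln( [P(L|E) / P(not L|E)] / [P(L) / P(not L)] ).

    pos_with_evidence / neg_with_evidence: gene pairs supported by the evidence E
        that are known to share / not to share a function (gold standard).
    pos_total / neg_total: all gold-standard positive / negative gene pairs,
        which define the prior odds of a functional linkage.
    """
    posterior_odds = pos_with_evidence / neg_with_evidence
    prior_odds = pos_total / neg_total
    return log(posterior_odds / prior_odds)


# Hypothetical counts: the evidence is enriched for true linkages (score > 0).
informative = log_likelihood_score(80, 20, 1000, 9000)

# Evidence that is no better than the prior gives a score of zero.
uninformative = log_likelihood_score(10, 90, 1000, 9000)

print(informative, uninformative)
```

Under this definition, a positive score means the data type links functionally related genes more often than expected by chance, which is why such scores can serve as weights when several data types are integrated.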
MADIBA facilitates the interpretation of *Plasmodium* and plant gene clusters (data are currently available for *Oryza sativa* and *A. thaliana*) [64]. The tool automates the post‐processing stage in which biological meaning is assigned to gene expression clusters. MADIBA is designed as a relational database and stores data from gene to pathway for the aforementioned species. Tools within the GUI allow each cluster to be analysed rapidly in order to identify the Gene Ontology terms and to visualise the metabolic pathways in which the genes are implicated, their genomic localisations, putative common transcriptional regulatory elements in the upstream sequences, and an analysis specific to the organism being studied.

**Table 1.** Summary of currently available database resources for retrieving genomic data for biosynthesis prediction.

| Database | Description | Web accessibility | Advantages | Disadvantages | Reference |
| --- | --- | --- | --- | --- | --- |
| Minimum Information about a Biosynthetic Gene cluster (MIBiG) | A community standard for annotations and metadata on biosynthetic gene clusters and their molecular products. | http://mibig.secondarymetabolites.org/index.html | Facilitates the standardised deposition and retrieval of biosynthetic gene cluster data. Useful for the development of comprehensive comparative analysis tools. Available for download. | | [18] |
| Plant Metabolic Network (PMN) | Several plant metabolic pathway databases. | http://www.plantcyc.org/ | Includes species/taxon‐specific data for more than 22 plant species. | | [49, 50] |
| Plant Reactome/"Cyc" Pathways | A pathway database for several crops and model plant species. | http://gramene.org/pathways | Currently includes gene homology‐based pathway projections to 62 plant species. | | [51] |
| Recombinant ClustScan Database | A database of gene cluster recombinants and their corresponding chemical structures. | http://csdb.bioserv.pbf.hr/csdb/RCSDB.html | Provides a virtual compound library, which could be a useful resource for computer‐aided drug design of pharmaceutically relevant chemical entities. | Currently contains only 47 cluster combinations. | [44, 52] |
| SMBP | Secondary metabolites bioinformatics portal. | http://www.secondarymetabolites.org/ | Includes hand‐curated links to all major tools and databases commonly used in the field. | | [53] |

| Tool | Utility | Web accessibility | Advantage | Disadvantage | Reference |
| --- | --- | --- | --- | --- | --- |
| antiSMASH\* | A web server and tool for the automatic genomic identification and analysis of biosynthetic gene clusters. | http://antismash.secondarymetabolites.org | Detects putative gene clusters of unknown types. Identifies similarities of identified clusters to any of 1172 clusters with known end products, etc. | Designed for analysis of BGCs in microbes. | [55] |
| AraNet | Gene function identification and genetic dissection of plant traits. | http://www.functionalnet.org/aranet/ | Had greater precision than literature‐based protein interactions (21%) for 55% of tested genes. Is highly predictive for diverse biological pathways. | Applicability is limited to one species, *A. thaliana*. | [56] |
| BioCyc/CycSim/MetaCyc | Online tools for genome‐scale metabolic modelling. | https://biocyc.org/, http://www.genoscope.cns.fr/cycsim, https://metacyc.org/ | Support the design and simulation of knockout experiments, e.g. deletion mutants on specified media, etc. | | [57, 58] |
| CompGen | Carries out *in silico* homologous recombination between gene clusters. | http://csdb.bioserv.pbf.hr/csdb/RCSDB.html | | Focuses on gene clusters encoding PKSs in *Streptomyces* sp. and related bacterial genera. | [52] |
| E‐zyme | Assignment of EC numbers. | http://www.genome.jp/tools/e-zyme/ | Classifies enzymatic reactions and links the enzyme genes or proteins to reactions in metabolic pathways. | | [59] |
| From Metabolite to Metabolite (FMM) | A web server to find biosynthetic routes between two metabolites within the KEGG database. | http://FMM.mbc.nctu.edu.tw/ | Both local and global graphical views of the metabolic pathways are provided. | | [60] |
| Geneious basic | Organisation and analysis of sequence data. | http://www.geneious.com/ | Includes a public application programming interface (API) available for developers. Freely available for download. | | [61] |
| Genomes‐to‐Natural Products platform (GNP) | Prediction, combinatorial design and identification of PKs and NRPs from biosynthetic assembly lines. | http://magarveylab.ca/gnp/ | Uses LC–MS/MS data of crude extracts to make predictions in a high‐throughput manner. | Focuses on bacterial NPs. | [62] |

**Table 2.** Summary of current computational tools which could be useful for the plant genomic data analysis.

PlantClusterFinder, SAVI and WikiPathways for plants are all‐purpose tools designed to assist in the prediction of metabolic gene clusters from plant genomes, although WikiPathways for plants currently includes data mostly for rice and *Arabidopsis* sp. SAVI has the added advantage of letting the user include pathway metadata (e.g. taxonomic distribution, key reactions, etc.) and decide which pathway(s) to keep and which to remove or validate manually.
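The general idea behind predicting metabolic gene clusters from physical clustering can be sketched in a few lines. The Python example below is a simplified illustration and not the algorithm implemented in PlantClusterFinder: it scans genes in chromosomal order and reports runs that contain a minimum number of enzyme‐coding genes separated by only a few intervening genes. The gene identifiers, the `max_gap` and `min_enzymes` parameters and the enzyme flags are hypothetical.

```python
def candidate_clusters(genes, max_gap=1, min_enzymes=3):
    """Find runs of enzyme-coding genes that lie close together on a chromosome.

    genes: list of (gene_id, is_enzyme) tuples in chromosomal order.
    max_gap: maximum number of consecutive non-enzyme genes tolerated inside a run.
    min_enzymes: minimum number of enzyme genes required to report a run.
    """
    clusters, current, gap = [], [], 0
    for gene_id, is_enzyme in genes:
        if is_enzyme:
            current.append(gene_id)
            gap = 0
        else:
            gap += 1
            if gap > max_gap:
                # The run is interrupted: keep it only if it is large enough.
                if len(current) >= min_enzymes:
                    clusters.append(current)
                current = []
    if len(current) >= min_enzymes:
        clusters.append(current)
    return clusters


# Toy chromosome: True marks a gene annotated as a metabolic enzyme.
chromosome = [
    ("g1", True), ("g2", True), ("g3", False), ("g4", True),
    ("g5", False), ("g6", False), ("g7", True), ("g8", False),
    ("g9", False), ("g10", True), ("g11", True), ("g12", True),
]

print(candidate_clusters(chromosome))
```

Real tools additionally weigh enzyme class, pathway annotation and gene duplication, so the output of such a scan should be read as a list of candidates rather than as confirmed clusters.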

### **5. Some computational methods for efficient production and the** *de novo* **engineering of natural products**

Two main areas of application for computational tools can be distinguished: on the one hand, the rational modification of genomes for the production of molecules by host organisms and, on the other hand, the modification or the *de novo* design of gene clusters for the biosynthesis of novel NPs. For both genetic engineering approaches, the already known genomes of bacteria, fungi and, increasingly, plants provide the basic datasets. A very important computational approach for the rational modification of NP‐producing host organisms is genome‐scale metabolic modelling [77, 78].

Automatically assigned functional annotations of all genes in a genome are ideally verified by manual curation and enriched with current knowledge about the metabolic network of the organism concerned. The curated genomes are then used for a completely automatic reconstruction of the metabolic pathways of the cell. These metabolic models are normally encoded in the Systems Biology Markup Language (SBML) and are compatible with various software tools, for example, Cytoscape [79], which can be applied for static network analyses. For instance, missing enzymes (gaps) within the network become apparent through substrates that are neither taken up nor produced by the cell, as well as through products that are neither consumed by other reactions nor secreted from the cell. The RAST annotation pipeline provides a fully automatic server for predicting all gene functions and discovering new pathways in microbial genomes [80]. Such models can then be used to predict the turnover rate of each reaction in a Flux Balance Analysis (FBA) [81]. Several tools have been built that apply FBA to identify enzymes that should be either introduced or knocked out to increase the production rate in the host organism. A widely used FBA package is the MATLAB‐based COBRA Toolbox [82]. With CycSim [58], BioMet [83] and FAME [84], powerful web‐based FBA applications have been published that do not require any software installation.
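The gap criterion described above (substrates that are neither taken up nor produced, and products that are neither consumed nor secreted) can be sketched as a simple static network check. The following Python example is a minimal illustration on a toy network; the reaction and metabolite names are hypothetical, and a real analysis would read the network from an SBML file with a dedicated library.

```python
def dead_end_metabolites(reactions, uptake=(), secreted=()):
    """Report metabolites that point to gaps in a metabolic network.

    reactions: dict mapping a reaction name to (substrates, products).
    uptake: metabolites the cell can import from the medium.
    secreted: metabolites the cell can export.
    Returns (never_produced, never_consumed) as sorted lists.
    """
    substrates, products = set(), set()
    for subs, prods in reactions.values():
        substrates.update(subs)
        products.update(prods)
    # Consumed somewhere, but neither produced by a reaction nor imported.
    never_produced = sorted(substrates - products - set(uptake))
    # Produced somewhere, but neither consumed by a reaction nor exported.
    never_consumed = sorted(products - substrates - set(secreted))
    return never_produced, never_consumed


# Toy network: A is imported, C is exported, D and E are not connected to the rest.
toy_network = {
    "R1": (["A"], ["B"]),
    "R2": (["B"], ["C"]),
    "R3": (["D"], ["E"]),
}

missing_sources, missing_sinks = dead_end_metabolites(
    toy_network, uptake=["A"], secreted=["C"]
)
print(missing_sources, missing_sinks)
```

Here D has no source and E has no sink, which would prompt a curator to look for a missing enzyme or transporter around reaction R3.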

Within the last 10 years, FBA has been applied to support numerous genetic engineering approaches, for example, for the determination of minimal media in *Helicobacter pylori* [85], for growth rate predictions in *Bacillus subtilis* [86] or for the development of metabolic engineering strategies in *Pseudomonas putida* [87]. Based on FBA, it was possible to increase vanillin production in baker's yeast twofold and to enhance sesquiterpene production in the same species [88, 89].
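For readers unfamiliar with the method, FBA is usually stated as a linear programme over the steady‐state fluxes of the network. The formulation below is the standard textbook form rather than one taken from the cited studies, and the symbols are introduced here only for illustration: $S$ is the stoichiometric matrix (metabolites by reactions), $v$ the vector of reaction fluxes, $c$ a weight vector selecting the objective (for example, biomass or product formation), and $v_j^{\min}$, $v_j^{\max}$ the lower and upper bounds on flux $j$.

$$
\begin{aligned}
\max_{v}\quad & c^{\mathsf{T}} v \\
\text{subject to}\quad & S\,v = 0, \\
& v_j^{\min} \le v_j \le v_j^{\max}, \qquad j = 1,\dots,n .
\end{aligned}
$$

In this setting, a gene knock‐out that removes reaction $k$ is simulated by setting $v_k^{\min} = v_k^{\max} = 0$ and solving the problem again; comparing the optimal objective values before and after the change indicates whether the modification is expected to raise or lower the production rate.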

**Table 2.** (continued) Summary of current computational tools which could be useful for the plant genomic data analysis.

| Tool | Utility | Web accessibility | Advantage | Disadvantage | Reference |
| --- | --- | --- | --- | --- | --- |
| PathVisio | A biological pathway analysis software that allows users to draw, edit and analyse biological pathways. | http://www.pathvisio.org/ | Plugins are included, which provide advanced analysis methods, visualisation options or additional import/export functionality. Available for download. | | [68, 69] |
| Pathway GeneSWAPPER | Maps homologous genes from one species onto the PathVisio pathway diagram of another species. | http://jaiswallab.cgrb.oregonstate.edu/software/PGS | Improves the functionalities of PathVisio and WikiPathways for plants. | | [70] |
| PlantClusterFinder | Predicts metabolic gene clusters from plant genomes. | https://dpb.carnegiescience.edu/labs/rhee-lab/software | Focuses on plant species. Available for download. | Only the algorithm is available. Lacks a graphical user interface. | |
| Prediction informatics for secondary metabolomes (PRISM) | Genomes to natural products prediction informatics for secondary metabolomes. | http://magarveylab.ca/prism/ | Open‐source, user‐friendly, web‐available application. | Focuses on microbial SMs. | [71] |
| RetroPath | A webserver for retrosynthetic pathway design. | http://www.jfaulon.com/bioretrosynth/ | Integrates pathway prediction and ranking, prediction of compatibility with host genes, toxicity prediction and metabolic modeling. | | [72, 73] |
| Semi‐Automated Validation Infrastructure (SAVI) | Predicts metabolic pathways using pathway metadata (e.g. taxonomic distribution, key reactions, etc.). | https://dpb.carnegiescience.edu/labs/rhee-lab/software | Decides which pathways to keep, remove or validate manually. Available for download. | Only the algorithm is available. Lacks a graphical user interface. | |
| WebAUGUSTUS | Gene prediction tool. | http://bioinf.uni-greifswald.de/webaugustus | One of the most accurate tools for eukaryotic gene prediction. | Focuses on eukaryotes. | [74] |
| WikiPathways for plants | A community pathway curation portal. | http://plants.wikipathways.org | Freely available. | Currently limited to rice and *Arabidopsis* sp. | [70, 75, 76] |

\*Currently provides detection rules for 44 classes and subclasses of SMs.

The rational modification of a given genome to design novel molecules requires a detailed understanding of the producing gene clusters. Well‐studied gene clusters, such as those encoding polyketide synthases, consist of specific domain types that can be identified by trained hidden Markov models stored in related databases, for example, PFAM [90]. Gene cluster analysis tools such as antiSMASH [55, 91] or PRISM [71] analyse a given gene cluster to predict the specific domains and to describe the architecture of the cluster. However, predicting the structure of the resulting natural products is difficult because the substrate recognition of active sites and the correct ordering of enzymatic reactions have to be predicted. If the enzymes in question act on multiple substrates, the availability of each substrate also has to be predicted. Most frequently, the automatic analysis of a cluster is based on deducing information from gene clusters similar to the queried one. If well‐annotated similar gene clusters do not exist, predicting the structure of the biosynthesised NP is challenging. As more knowledge accumulates about the structures of natural products and their encoding sequences, the relation between the composition of the active sites and substrate binding will be better understood. Existing algorithms are often based on machine‐learning approaches and predict the correct substrates for a selected set of enzyme families [92]. For the prediction of NPs synthesised by non‐ribosomal peptide synthetases, such a sequence‐based prediction method is integrated in the related webserver NRPSpredictor2 [93]. Rational substitution of residues to generate novel molecules still requires a detailed manual analysis of the encoding gene cluster, and new software tools that propose mutations leading to novel molecules might accelerate this approach considerably in future.
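The idea of deducing information from gene clusters similar to the queried one can be illustrated with a deliberately simplified sketch. It is not the algorithm used by antiSMASH or PRISM; it merely assumes that each cluster has already been reduced to a set of detected domain labels and ranks reference clusters by Jaccard similarity. The reference cluster names, their domain sets and the function names are hypothetical and chosen for illustration only.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical reference clusters (illustrative names and domain sets only).
REFERENCE_CLUSTERS: Dict[str, FrozenSet[str]] = {
    "reference_PKS": frozenset({"KS", "AT", "ACP", "KR", "TE"}),
    "reference_NRPS": frozenset({"C", "A", "PCP", "TE"}),
    "reference_terpene": frozenset({"terpene_synth", "P450"}),
}


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity: shared domain labels divided by all domain labels."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def rank_similar_clusters(
    query: FrozenSet[str], references: Dict[str, FrozenSet[str]]
) -> List[Tuple[str, float]]:
    """Return reference clusters sorted from most to least similar to the query."""
    scores = [(name, jaccard(query, domains)) for name, domains in references.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Assumed output of a prior domain-detection step for a queried cluster.
    query_domains = frozenset({"KS", "AT", "ACP", "TE"})
    for name, score in rank_similar_clusters(query_domains, REFERENCE_CLUSTERS):
        print(f"{name}: {score:.2f}")
```

When no reference cluster scores well, annotations cannot be transferred with any confidence, which is the situation described above as challenging.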

### **6. Selected natural products with promising anticancer properties from African sources**

Recent reviews on the anticancer potential of African flora have discussed the anticancer, cytotoxic, antiproliferative and antitumour activities of about 500 NPs [9–12]. In this section, we focus on the most promising recent results for anticancer SMs from African flora (**Table 3**, **Figure 2**), published after the last reviews. The isolation of two new lignans, 3α‐O‐(β‐*D*‐glucopyranosyl) desoxypodophyllotoxin (**1**) and 4‐O‐(β‐*D*‐glucopyranosyl) dehydropodophyllotoxin (**2**), alongside the known lignans **3** and **4**, has been reported from *Cleistanthus boivinianus* (Phyllanthaceae), collected in Madagascar (coordinates 13°06′37″S 049°09′39″E) [94]. These compounds showed potent to moderate antiproliferative activities against the A2780 ovarian cancer cell line, with compound **1** showing potent antiproliferative activity against the HCT‐116 human colon carcinoma cell line (IC50 = 0.03 µM). The known compounds with promising activities from this species were the lignans (±)‐β‐apopicropodophyllin (**3**, PubChem CID: 6452099) and (−)‐desoxypodophyllotoxin (**4**, PubChem CID: 345501). The same authors also isolated a new butanolide, macrocarpolide A (**5**, PubChem CID: 122372160), and two new secobutanolides, macrocarpolides B (**6**, PubChem CID: 122372161) and C (**7**, PubChem CID: 122372162), together with other known compounds, from the ethanol extract of the roots of the Madagascan species *Ocotea macrocarpa* (Lauraceae), which showed antiproliferative activities against the A2780 ovarian cell line [95]. The known isolates included the butanolides linderanolide B (**8**, PubChem CID: 53308122) and isolinderanolide (**9**, PubChem CID: 44576054). The IC50 values against the A2780 ovarian cancer cell line were 2.57 (**5**), 1.98 (**6**), 1.67 (**7**), 2.43 (**8**) and 1.65 µM (**9**). Additionally, the leaves of *Cleistochlamys kirkii* (Annonaceae) from Tanzania have recently been shown to be a rich source of polyoxygenated cyclohexene derivatives with antiplasmodial activities, along with very potent activities against the MDA‐MB‐231 triple‐negative human breast cancer cell line [96]. The isolates cleistodienediol (**10**), cleistodienol A (**11**), cleistodienol B (**12**), cleistenechlorohydrin A (**13**), cleistenechlorohydrin B (**14**), cleistenediol F (**15**), cleistophenolide (**16**), *ent*‐subglain C (**17**) and melodorinol (**18**, PubChem CID: 6438687) showed activities as low as IC50 = 0.09 µM against this cell line. To the best of our knowledge, mode‐of‐action studies have not yet been conducted for SMs **1** to **18**, and *in vivo* activity data are currently unavailable.

**Table 3.** Summary of recently published selected promising anticancer SMs from African flora.

**Figure 2.** Chemical structures of selected anticancer SMs from African flora.
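For comparing potencies such as those quoted above, IC50 values are often converted to the logarithmic pIC50 scale, pIC50 = −log10(IC50 in mol/L), which equals 6 − log10(IC50 in µM). The short sketch below applies this conversion to the A2780 values reported for compounds **5** to **9**; the function and variable names are our own illustrative choices.

```python
import math
from typing import Dict, List, Tuple

# IC50 values (in micromolar) against A2780 cells, as quoted in the text
# for compounds 5 to 9.
IC50_MICROMOLAR: Dict[int, float] = {5: 2.57, 6: 1.98, 7: 1.67, 8: 2.43, 9: 1.65}


def pic50(ic50_micromolar: float) -> float:
    """Convert an IC50 given in micromolar to pIC50 (= 6 - log10 of the value)."""
    if ic50_micromolar <= 0:
        raise ValueError("IC50 must be positive")
    return 6.0 - math.log10(ic50_micromolar)


def rank_by_potency(ic50s: Dict[int, float]) -> List[Tuple[int, float, float]]:
    """Return (compound, IC50 in micromolar, pIC50) sorted from most to least potent."""
    rows = [(cpd, value, pic50(value)) for cpd, value in ic50s.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for compound, ic50, p in rank_by_potency(IC50_MICROMOLAR):
        print(f"compound {compound}: IC50 = {ic50} uM, pIC50 = {p:.2f}")
```

On this scale, the 0.03 µM value quoted for compound **1** against HCT‐116 corresponds to a pIC50 of about 7.5, although values measured on different cell lines are not directly comparable.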

### **7. Case studies**


In this section, we discuss specific examples in which the biosynthesis of anticancer plant‐based SMs has been investigated by (computational) analysis of genomic data.

### **7.1. Biogenesis of several anticancer metabolites by** *Ocimum tenuiflorum* **(Lamiaceae)**

Species of the genus *Ocimum* are well known for their high medicinal value and are therefore used to cure a variety of ailments in Ayurveda, an Indian system of medicine [97, 98]. About 30 SMs with a variety of biological properties have been reported from the genus *Ocimum* [99]. Only 14 of these SMs belong to the five basic groups of compounds with complete biosynthetic pathway information in the PMN database [49, 50], leaving ~15 medicinally relevant metabolites from *Ocimum* sp. with unknown pathways. This has prompted further investigation of SMs with uncharted biosynthetic pathways. Several bioactive SMs, including the anticancer compounds apigenin (**19**, PubChem CID: 5280443), rosmarinic acid (**20**, PubChem CID: 5281792), taxol (**21**, PubChem CID: 36314), ursolic acid (**22**, PubChem CID: 64945), oleanolic acid (**23**, PubChem CID: 10494) and the plant steroid sitosterol (**24**, PubChem CID: 222284), have been identified from the herb Krishna Tulsi (*O. tenuiflorum*, Lamiaceae), with the mature leaves retaining the medicinally relevant metabolites [100]. Upadhyay et al. carried out a draft genome analysis of the species, generating paired‐end and mate‐pair sequence libraries for the whole sequenced genome, together with a transcriptomic analysis (RNA‐Seq) of two subtypes of *O. tenuiflorum* (Krishna and Rama Tulsi), and reported the relative expression of genes in both varieties. The authors further investigated the pathways that lead to the biosynthesis of the identified SMs, with respect to similar pathways in *A. thaliana* and other model plants (e.g. *Oryza sativa japonica*). Six important genes (including *Q8RWT0* and *F1T282*) were identified from the analysis of the genome data and found to be expressed. These were validated by q‐RT‐PCR on different tissues (e.g. roots and mature leaves) of five different species (e.g. *O. gratissimum*, *O. sacharicum*, *O. kilmund*, *Solanum lycopersicum* and *Vitis vinifera*), which showed high expression of ursolic acid‐producing genes in young leaves. The other identified anticancer metabolites included eugenol and ursolic acid. As an example, the authors employed sequence search algorithms to search the Tulsi genome for the three enzymes of the three‐step synthetic pathway of ursolic acid from squalene. Each of these enzymes (squalene epoxidase, α‐amyrin synthase and α‐amyrin 28‐monooxygenase) was queried starting from its protein sequence in the PlantCyc database. The search for analogous enzymes in the model plants *O. sativa japonica* and *A. thaliana* showed sequence identity covering 50 to 80% of the query length. The whole genome and sequence analysis of *O. tenuiflorum* suggested that small amino acid changes at the functional sites of genes involved in metabolite synthesis pathways could confer special medicinal (particularly anticancer) properties on this herb.
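Sequence searches of this kind are usually summarised by two numbers: the percentage identity of the alignment and the fraction of the query that the alignment covers. The sketch below computes both from a pre‐computed gapped pairwise alignment. The identity convention used here (identical columns divided by alignment length) is an assumption, the toy peptide strings are hypothetical, and the original study may have used different definitions.

```python
from typing import Tuple


def identity_and_coverage(
    aligned_query: str, aligned_subject: str, query_length: int
) -> Tuple[float, float]:
    """Return (percent identity, percent query coverage) for a gapped alignment.

    Identity   = identical columns / alignment length (gap columns included).
    Coverage   = query residues present in the alignment / full query length.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query or query_length <= 0:
        raise ValueError("alignment and query length must be non-empty")

    identical = 0
    query_residues = 0
    for q, s in zip(aligned_query, aligned_subject):
        if q != "-":
            query_residues += 1  # this column consumes one query residue
            if q == s:
                identical += 1   # same residue in query and subject

    identity = 100.0 * identical / len(aligned_query)
    coverage = 100.0 * query_residues / query_length
    return identity, coverage


if __name__ == "__main__":
    # Hypothetical toy alignment; '-' marks a gap. The full query has 10 residues.
    ident, cov = identity_and_coverage("MKT-LLVA", "MRTALL-A", query_length=10)
    print(f"identity = {ident:.1f}%, query coverage = {cov:.1f}%")
```

Reporting both values avoids ambiguity, since a short but perfectly conserved hit and a long but divergent hit can otherwise look alike.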

### **7.2. Biosynthesis of the anticancer alkaloid noscapine by** *Papaver somniferum* **(Papaveraceae)**

Noscapine (**25**, PubChem CID: 275196) is an antitumour phthalideisoquinoline alkaloid from opium poppy (*Papaver somniferum*, Papaveraceae). Compound **25** is known to bind stoichiometrically to tubulin, alter its conformation and affect microtubule assembly (promoting microtubule polymerisation), hence arresting metaphase and inducing apoptosis in many cell types [101]. It has been demonstrated that the compound has potent antitumour activity against solid murine lymphoid tumours (even when the drug was administered orally). This drug has also shown potency against human breast, ovarian and bladder tumours implanted in nude mice and in dividing human cells [102, 103]. Although the compound is water‐soluble and absorbed after oral administration, its chemotherapeutic potential in human cancer could not be fully exploited in drug discovery projects because, as for most SMs, only small amounts are typically produced in the slow‐growing plant species [104]. Improving the production levels of the NP is therefore essential for drug discovery. However, this requires a proper understanding of the biological processes underlying the biosynthesis of this SM, which has been known since the 1960s, from isotope‐labelling experiments, to be derived from scoulerine [105]. Winzer et al. carried out a transcriptomic analysis with the aim of elucidating the biosynthetic pathway of this important metabolite, in order to improve its commercial production in both poppy and other systems [106]. The analysis of a high noscapine‐producing poppy variety, HN1, showed the exclusive expression of 10 genes encoding five distinct enzyme classes, whereas five functionally characterised genes (*BBE*, *TNMT*, *SalR*, *SalAT* and *T6ODM*) were present in all three of the studied poppy varieties, which are rich in morphine, thebaine and noscapine (HM1, HT1 and HN1, respectively). The authors analysed the expressed sequence tag (EST) abundance and discovered some previously uncharacterised genes expressed in HN1, which were completely absent from the other (HM1 and HT1) EST libraries. This led to the identification of the corresponding enzymes as three *O*‐methyltransferases (*PSMT1*, *PSMT2*, *PSMT3*), four cytochrome P450s (*CYP82X1*, *CYP82X2*, *CYP82Y1* and *CYP719A21*), an acetyltransferase (*PSAT1*), a carboxylesterase (*PSCXE1*) and a short‐chain dehydrogenase/reductase (*PSSDR1*). Further analysis of an F2 mapping population, using HN1 and HM1 as parents, indicated that these genes are tightly linked in HN1. Moreover, bacterial artificial chromosome sequencing confirmed the existence of a complex BGC for plant alkaloids. Virus‐induced gene silencing resulted in the accumulation of pathway intermediates, thus allowing gene function to be linked to noscapine synthesis. Based on the knowledge derived from the investigation, the authors could make suggestions for the improved production of noscapine and related bioactive molecules by the molecular breeding of commercial poppy varieties or the engineering of new production systems [104, 106].
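The EST abundance comparison described above amounts to selecting genes that are detected in one library and absent from all others. A minimal sketch of that filtering step is given below. The gene labels and counts are hypothetical placeholders, not data from the study, and the `min_count` threshold is an assumption introduced for illustration.

```python
from typing import Dict, List

# Hypothetical EST counts per gene and per variety library (placeholder values).
EST_COUNTS: Dict[str, Dict[str, int]] = {
    "gene_A": {"HN1": 12, "HM1": 0, "HT1": 0},
    "gene_B": {"HN1": 30, "HM1": 25, "HT1": 28},
    "gene_C": {"HN1": 7, "HM1": 0, "HT1": 0},
    "gene_D": {"HN1": 0, "HM1": 9, "HT1": 4},
}


def exclusive_to(
    counts: Dict[str, Dict[str, int]], target: str, min_count: int = 1
) -> List[str]:
    """Return genes with at least min_count ESTs in target and none elsewhere."""
    selected = []
    for gene, per_library in counts.items():
        in_target = per_library.get(target, 0) >= min_count
        absent_elsewhere = all(
            n == 0 for library, n in per_library.items() if library != target
        )
        if in_target and absent_elsewhere:
            selected.append(gene)
    return sorted(selected)


if __name__ == "__main__":
    print(exclusive_to(EST_COUNTS, "HN1"))
```

Genes passing such a filter are only candidates; in the study, linkage analysis and gene silencing were needed to connect them to noscapine synthesis.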

### **7.3. Biosynthesis of vinblastine and vincristine by** *Catharanthus roseus* **(Apocynaceae)**

Vinblastine (**26**, PubChem CID: 13342) and vincristine (**27**, PubChem CID: 5978) are chemotherapy drugs used to treat a number of cancer types. These are among the >120 known terpenoid indole alkaloids from the medicinal plant *C. roseus*, also known as the Madagascar periwinkle [107]. Since these two very important anticancer compounds are produced only in very low amounts in *C. roseus*, as opposed to the fairly high levels of several monomeric alkaloids (e.g. ajmalicine and serpentine) [108], attempts to improve the yields of compounds **26** and **27** have led to the genome‐wide transcript profiling of elicited *C. roseus* cell cultures by cDNA‐amplified fragment‐length polymorphism, combined with metabolic profiling [107]. This resulted in the identification of several gene‐to‐gene and gene‐to‐metabolite networks, obtained by correlating the expression profiles of 417 gene tags with the accumulation profiles of 178 metabolite peaks. The results showed that different branches of terpenoid indole alkaloid biosynthesis and various other metabolic pathways are affected by differences in hormonal regulation. Thus, the investigations of Rischer et al. provided the foundations for a proper understanding of secondary metabolism in *C. roseus*, thereby enhancing the applicability of metabolic engineering of Madagascar periwinkle. This study provided the possibility of exploring a select number of genes (e.g. *STR*, *10HGO*, *T16H* and *DAT*) involved in the biosynthesis of terpenoid indole alkaloids [107].
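A gene‐to‐metabolite network of the kind described above can be sketched by correlating each gene expression profile with each metabolite accumulation profile and keeping the strongly correlated pairs as edges. The example below uses the Pearson coefficient and an arbitrary threshold of 0.9; both are assumptions made for illustration and not necessarily the statistics used in the original study. The tag and peak names and all profile values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / (sd_x * sd_y)


def gene_metabolite_edges(
    genes: Dict[str, Sequence[float]],
    metabolites: Dict[str, Sequence[float]],
    threshold: float = 0.9,
) -> List[Tuple[str, str, float]]:
    """Return (gene, metabolite, r) for every pair with |r| >= threshold."""
    edges = []
    for gene, expression in genes.items():
        for metabolite, accumulation in metabolites.items():
            r = pearson(expression, accumulation)
            if abs(r) >= threshold:
                edges.append((gene, metabolite, round(r, 3)))
    return edges


# Hypothetical profiles measured over five time points after elicitation.
GENE_TAGS = {"tag_1": [1, 2, 3, 4, 5], "tag_2": [5, 4, 3, 2, 1], "tag_3": [2, 2, 3, 2, 2]}
METABOLITE_PEAKS = {"peak_1": [2, 4, 6, 8, 10], "peak_2": [1, 3, 2, 3, 1]}

if __name__ == "__main__":
    for edge in gene_metabolite_edges(GENE_TAGS, METABOLITE_PEAKS):
        print(edge)
```

Both positive and negative correlations are retained here, since a gene whose expression falls as a metabolite accumulates may be as informative as one that rises with it.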

### **8. The way forward**


The case studies show that detailed computational analysis of the transcriptomic and metabolomic data of a plant species can reveal its metabolic capacity and hence help identify candidate genes involved in the biosynthesis of the important SMs it contains. Thus, modifying the plant genes could provide a basis for improving metabolite yield. It should be mentioned that other compounds from some of the aforementioned compound classes (**Table 3**), from both floral and microbial sources, have shown promising anticancer activities [109–113]. For example, isolinderanolide B (**28**, PubChem CID: 53308122) (**Figure 3**), a butanolide from the stems of *Cinnamomum subavenium* (Lauraceae), has shown antiproliferative activity in T24 human bladder cancer cells by blocking cell cycle progression and inducing apoptosis [112]. In addition, subamolide B (**29**, PubChem CID: 16104907), another butanolide from the same species, is known to induce cytotoxicity in human cutaneous squamous cell carcinoma through mitochondrial and CHOP‐dependent cell death pathways [113]. Meanwhile, obtusilactone B (**30**, PubChem CID: 101286261), from *Machilus thunbergii* (Lauraceae), is known to target barrier‐to‐autointegration factor to treat cancer [111].

Apart from the Lauraceae, Phyllanthaceae and Annonaceae, which are known to be rich in anticancer metabolites, the African flora includes the genus *Tacca* of the yam family (Dioscoreaceae), known for the abundant presence of taccalonolides, which are microtubule stabilisers with clinical potential for cancer treatment [114]. Additionally, the genus *Tamarix* (e.g. *T. aphylla* and *T. nilotica* from Northern Africa), together with the genus *Reaumuria* (Tamaricaceae), is known for the abundant presence of tannins (gallo‐ellagitannins, gallotannins) with remarkable cytotoxic effects. The high salt content of the leaves of *Tamarix* species, which renders them useful locally as a fire barrier, and their adaptability to drought and high salinity are of equal interest. It therefore becomes urgent to investigate the genomics of some of the aforementioned plant species, particularly *Cinnamomum* sp., *Ocotea* sp. and *Machilus* sp. (Lauraceae), *Tacca* sp. (Dioscoreaceae), *Cleistanthus* sp. (Phyllanthaceae), *Cleistochlamys* sp. (Annonaceae), *Tamarix* sp. (Tamaricaceae) and so on, and hence to further investigate the genes or BGCs responsible for secondary metabolism, with a view to understanding and better exploring the biosynthetic pathways of the anticancer SMs.

**Figure 3.** Chemical structures of selected anticancer butanolides from Lauraceae.

### **9. Conclusions**

It has been our intention in this chapter to provide a detailed overview of the important computational tools and resources for the analysis of plant genomic data and for the prediction of biosynthetic pathways in plants. We have taken a few case studies of anticancer SMs to illustrate this. Even though it is unclear how widespread plant gene clusters are, genes that encode the biosynthesis of several small plant SMs are well known, including the vital genes for the production of some highly potent anticancer drugs. With the use of the tools and databases described, along with the drop in the cost of whole genome sequencing in plant species, the future discovery of new plant‐based anticancer metabolites would involve the identification of one or more genes or BGCs encoding the enzymes in the biosynthetic pathway for the target compound(s), followed by co‐expression analysis, also exploiting knowledge of the chemical structure of the target compound, to identify other enzymes that might be involved in this pathway. As an example, the exploration of the pathway for podophyllotoxin biosynthesis by transcriptome mining in *Podophyllum hexandrum* led to the identification of biosynthetic genes, 29 of which were combinatorially expressed in the tobacco plant (*Nicotiana benthamiana*), leading to the identification of six pathway enzymes, among which is an oxoglutarate‐dependent dioxygenase responsible for closing the core cyclohexane ring of the aryltetralin scaffold [115]. Alternatively, if the metabolic pathway and the nature of the SMs are unknown, the identified co‐expressed genes encoding the enzymes for secondary metabolism could be combined with untargeted metabolomics for the elucidation of unknown pathways and chemical structures. As an example, a single pathogen‐induced P450 enzyme, CYP82C2, was used together with a combination of untargeted metabolomics and co‐expression analysis to uncover the complete biosynthetic pathway leading to the metabolite 4‐hydroxyindole‐3‐carbonyl nitrile (4‐OH‐ICN), previously unknown in *Arabidopsis* sp. This rare and hitherto unprecedented plant metabolite, with a cyanogenic functionality, revealed a hidden capacity of *Arabidopsis* sp. for cyanogenic glucoside biosynthesis. This was confirmed by expressing the 4‐OH‐ICN biosynthetic enzymes in *Saccharomyces cerevisiae* and *Nicotiana benthamiana* to reconstitute the complete pathway *in vitro* and *in vivo*, thus validating the functions of the enzymes involved in the pathway [116].

### **Acknowledgements**


FNK acknowledges a Georg Forster fellowship from the Alexander von Humboldt Foundation, Germany. CVS is currently a doctoral candidate financed by the German Academic Exchange Service (DAAD), Germany.

### **Competing interests**

The authors declare that they have no competing interests.

### **Abbreviations**


### **Author details**

Aurélien F.A. Moumbock<sup>1</sup>, Conrad V. Simoben<sup>2</sup>, Ludger Wessjohann<sup>3</sup>, Wolfgang Sippl<sup>2</sup>, Stefan Günther<sup>4</sup> and Fidele Ntie‐Kang<sup>1,2</sup>\*

\*Address all correspondence to: ntiekfidele@gmail.com

1 Department of Chemistry, University of Buea, Buea, Cameroon

2 Department of Pharmaceutical Chemistry, Martin‐Luther University of Halle‐Wittenberg, Halle, Germany

3 Leibniz Institute of Plant Biochemistry, Halle, Germany

4 Pharmaceutical Bioinformatics, Albert‐Ludwig‐University Freiburg, Freiburg, Germany

### **References**


[1] Cragg GM, Newman DJ. Plants as a source of anti‐cancer and anti‐HIV agents. Ann Appl Biol. 2003;143:127‐133. doi:10.1111/j.1744‐7348.2003.tb00278.x

[2] Cragg GM, Grothaus PG, Newman DJ. Impact of natural products on developing new anti‐cancer agents. Chem Rev. 2009;109:3012‐3043. doi:10.1021/cr900019j

[3] Lamari FN, Cordopatis P. Exploring the potential of natural products in cancer treatment. In: Missailidis S, editor. *Anticancer therapeutics*. West Sussex: Wiley‐Blackwell; 2008, pp. 3‐16.

[4] Pan L, Chai HB, Kinghorn AD. Discovery of new anticancer agents from higher plants. Front Biosci (Schol Ed). 2013;4:142‐156.

[5] Newman DJ, Cragg GM. Natural products as sources of new drugs from 1981 to 2014. J Nat Prod. 2016;79:629‐661. doi:10.1021/acs.jnatprod.5b01055

[6] Mangal M, Sagar P, Singh H, Raghava GPS, Agarwal SM. NPACT: naturally occurring plant‐based anti‐cancer compound activity‐target database. Nucleic Acids Res. 2013;41:D1124‐D1129. doi:10.1093/nar/gks1047

[7] Ntie‐Kang F, Nwodo JN, Ibezim A, Simoben CV, Karaman B, et al. Molecular modeling of potential anticancer agents from African medicinal plants. J Chem Inf Model. 2014;54:2433‐2450. doi:10.1021/ci5003697

[8] Ntie‐Kang F, Simoben CV, Karaman B, Ngwa VF, Judson PN, et al. Pharmacophore modeling and *in silico* toxicity assessment of potential anticancer agents from African medicinal plants. Drug Des Dev Ther. 2016;10:2137‐2154. doi:10.2147/DDDT.S108118

[9] Beutler JA, Cragg GM, Iwu M, Newman DJ, Okunji C. Anticancer potential of African plants: the experience of the United States National Cancer Institute and National Institutes of Health. In: Gurib‐Fakim A, editor. Novel plant bioresources: applications in food, medicine and cosmetics, 1st ed. Oxford: John Wiley & Sons Ltd; 2014, pp. 133‐149. doi:10.1002/9781118460566.ch10

[10] Nwodo JN, Ibezim A, Simoben CV, Ntie‐Kang F. Exploring cancer therapeutics with natural products from African medicinal plants, part II: alkaloids, terpenoids and flavonoids. Anticancer Agents Med Chem. 2016;16:108‐127. doi:10.2174/1871520615666150520143827

[22] Rhee SY, Mutwil M. Towards revealing the functions of all genes in plants. Trends Plant Sci. 2014;19:212‐221. doi:10.1016/j.tplants.2013.10.006

[23] Xu M, Rhee SY. Becoming data‐savvy in a big‐data world. Trends Plant Sci. 2014;19:619‐622. doi:10.1016/j.tplants.2014.08.003

[24] Chae L, Lee I, Shin J, Rhee SY. Towards understanding how molecular networks evolve in plants. Curr Opin Plant Biol. 2012;15:177‐184. doi:10.1016/j.pbi.2012.01.006

[25] Medema MH, Osbourn A. Computational genomic identification and functional reconstitution of plant natural product biosynthetic pathways. Nat Prod Rep. 2016;33:951‐962. doi:10.1039/c6np00035e

[26] Weber T. *In silico* tools for the analysis of antibiotic biosynthetic pathways. Int J Med Microbiol. 2014;304:230‐235. doi:10.1016/j.ijmm.2014.02.001

[27] Li YF, Tsai KJ, Harvey CJ, Li JJ, Ary BE, et al. Comprehensive curation and analysis of fungal biosynthetic gene clusters of published natural products. Fungal Genet Biol. 2016;89:18‐28. doi:10.1016/j.fgb.2016.01.012

[28] van der Lee TA, Medema MH. Computational strategies for genome‐based natural product discovery and engineering in fungi. Fungal Genet Biol. 2016;89:29‐36. doi:10.1016/j.fgb.2016.01.006

[29] Frey M, Chomet P, Glawischnig E, Stettner C, Grun S, et al. Analysis of a chemical plant defense mechanism in grasses. Science. 1997;277:696‐699. doi:10.1126/science.277.5326.696

[30] Osbourn A. Gene clusters for secondary metabolic pathways: an emerging theme in plant biology. Plant Physiol. 2010;154:531‐535. doi:10.1104/pp.110.161315

[31] Figueiredo AC, Barroso JG, Pedro LG, Scheffer JJC. Factors affecting secondary metabolite production in plants: volatile components and essential oils. Flavour Fragr J. 2008;23:213‐226. doi:10.1002/ffj.1875

[32] Leal MC, Hilario A, Munro MHG, Blunt JW, Calado R. Natural products discovery needs improved taxonomic and geographic information. Nat Prod Rep. 2016;33:747‐750. doi:10.1039/c5np00130g

[33] Luo Y, Enghiad B, Zhao H. New tools for reconstruction and heterologous expression of natural product biosynthetic gene clusters. Nat Prod Rep. 2016;33:174‐182. doi:10.1039/c5np00085h

[34] Carbonell P, Currin A, Jervis AJ, Rattray NJW, Swainston N, et al. Bioinformatics for the synthetic biology of natural products: integrating across the Design‐Build‐Test cycle. Nat Prod Rep. 2016;33:925‐932. doi:10.1039/c6np00018e

[35] Song MC, Kim EJ, Kim E, Rathwell K, Nama SJ, et al. Microbial biosynthesis of medicinally important plant secondary metabolites. Nat Prod Rep. 2014;31:1497‐1509. doi:10.1039/c4np00057a

[36] Zhao H, Medema MH. Standardization for natural product synthetic biology. Nat Prod Rep. 2016;33:920‐924. doi:10.1039/c6np00030d


[50] Dreher K. Putting the plant metabolic network pathway databases to work: going offline to gain new capabilities. Methods Mol Biol. 2014;1083:151‐171. doi:10.1007/978‐1‐62703‐661‐0\_10

[51] Naithani S, Preece J, D'Eustachio P, Gupta P, Amarasinghe V, et al. Plant Reactome: a resource for plant pathways and comparative analysis. Nucleic Acids Res. 2017;45:D1029‐D1039. doi:10.1093/nar/gkw932

[52] Starcevic A, Wolf K, Diminic J, Zucko J, Ruzic IT, et al. Recombinatorial biosynthesis of polyketides. J Ind Microbiol Biotechnol. 2012;39:503‐511. doi:10.1007/s10295‐011‐1049‐x

[53] Weber T, Kim HU. The secondary metabolite bioinformatics portal: computational tools to facilitate synthetic biology of secondary metabolite production. Synth Syst Biotechnol. 2016;1:69‐79. doi:10.1016/j.synbio.2015.12.002

[54] Medema MH, van Raaphorst R, Takano E, Breitling R. Computational tools for the synthetic design of biochemical pathways. Nat Rev Microbiol. 2012;10:191‐202. doi:10.1038/nrmicro2717

[55] Weber T, Blin K, Duddela S, Krug D, Kim HU, et al. antiSMASH 3.0—a comprehensive resource for the genome mining of biosynthetic gene clusters. Nucleic Acids Res. 2015;43:W237‐W243. doi:10.1093/nar/gkv437

[56] Lee I, Ambaru B, Thakkar P, Marcotte EM, Rhee SY. Rational association of genes with traits using a genome‐scale gene network for *Arabidopsis thaliana*. Nat Biotechnol. 2010;28:149‐156. doi:10.1038/nbt.1603

[57] Caspi R, Billington R, Ferrer L, Foerster H, Fulcher CA, et al. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Res. 2016;44:D471‐D480. doi:10.1093/nar/gkv1164

[58] Le Fèvre F, Smidtas S, Combe C, Durot M, d'Alché‐Buc F, et al. CycSim—an online tool for exploring and experimenting with genome‐scale metabolic models. Bioinformatics. 2009;25:1987‐1988. doi:10.1093/bioinformatics/btp268

[59] Yamanishi Y, Hattori M, Kotera M, Goto S, Kanehisa M. E‐zyme: predicting potential EC numbers from the chemical transformation pattern of substrate‐product pairs. Bioinformatics. 2009;25:i179‐i186. doi:10.1093/bioinformatics/btp223

[60] Chou CH, Chang WC, Chiu CM, Huang CC, Huang HD. FMM: a web server for metabolic pathway reconstruction and comparative analysis. Nucleic Acids Res. 2009;37:W129‐W134. doi:10.1093/nar/gkp264

[61] Kearse M, Moir R, Wilson A, Stones‐Havas S, Cheung M, et al. Geneious basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics. 2012;28:1647‐1649. doi:10.1093/bioinformatics/bts199

[62] Johnston CW, Skinnider MA, Wyatt MA, Li X, Ranieri MRM, et al. An automated Genomes‐to‐Natural Products platform (GNP) for the discovery of modular natural products. Nat Commun. 2015;6:8421. doi:10.1038/ncomms9421

[63] Kanehisa M. KEGG bioinformatics resource for plant genomics and metabolomics. Methods Mol Biol. 2016;1374:55‐70. doi:10.1007/978‐1‐4939‐3167‐5\_3


[76] Kutmon M, Riutta A, Nunes N, Hanspers K, Willighagen EL, et al. WikiPathways: capturing the full diversity of pathway knowledge. Nucleic Acids Res. 2016;44:D488‐D494. doi:10.1093/nar/gkv1024

[77] Durot M, Bourguignon PY, Schachter V. Genome‐scale models of bacterial metabolism: reconstruction and applications. FEMS Microbiol Rev. 2009;33:164‐190. doi:10.1111/j.1574‐6976.2008.00146.x

[78] Feist AM, Herrgård MJ, Thiele I, Reed JL, Palsson BØ. Reconstruction of biochemical networks in microorganisms. Nat Rev Microbiol. 2009;7:129‐143. doi:10.1038/nrmicro1949

[79] Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13:2498‐2504. doi:10.1101/gr.1239303

[80] Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 2014;42:D206‐D214. doi:10.1093/nar/gkt1226

[81] Orth JD, Thiele I, Palsson BØ. What is flux balance analysis? Nat Biotechnol. 2010;28:245‐248. doi:10.1038/nbt.1614

[82] Schellenberger J, Que R, Fleming RM, Thiele I, Orth JD, et al. Quantitative prediction of cellular metabolism with constraint‐based models: the COBRA Toolbox v2.0. Nat Protoc. 2011;6:1290‐1307. doi:10.1038/nprot.2011.308

[83] Garcia‐Albornoz M, Thankaswamy‐Kosalai S, Nilsson A, Väremo L, Nookaew I, Nielsen J. BioMet Toolbox 2.0: genome‐wide analysis of metabolism and omics data. Nucleic Acids Res. 2014;42:W175‐W181. doi:10.1093/nar/gku371

[84] Boele J, Olivier BG, Teusink B. FAME, the flux analysis and modeling environment. BMC Syst Biol. 2012;6:8. doi:10.1186/1752‐0509‐6‐8

[85] Schilling CH, Covert MW, Famili I, Church GM, Edwards JS, Palsson BO. Genome‐scale metabolic model of *Helicobacter pylori* 26695. J Bacteriol. 2002;184:4582‐4593. doi:10.1128/JB.184.16.4582‐4593.2002

[86] Oh YK, Palsson BO, Park SM, Schilling CH, Mahadevan R. Genome‐scale reconstruction of metabolic network in *Bacillus subtilis* based on high‐throughput phenotyping and gene essentiality data. J Biol Chem. 2007;282:28791‐28799. doi:10.1074/jbc.M703759200

[87] Puchałka J, Oberhardt MA, Godinho M, Bielecka A, Regenhardt D, et al. Genome‐scale reconstruction and analysis of the *Pseudomonas putida* KT2440 metabolic network facilitates applications in biotechnology. PLoS Comput Biol. 2008;4:e1000210. doi:10.1371/journal.pcbi.1000210

[88] Henry CS, Broadbelt LJ, Hatzimanikatis V. Thermodynamics‐based metabolic flux analysis. Biophys J. 2007;92:1792‐1805. doi:10.1529/biophysj.106.093138

[89] Asadollahi MA, Maury J, Patil KR, Schalk M, Clark A, Nielsen J. Enhancing sesquiterpene production in *Saccharomyces cerevisiae* through *in silico* driven metabolic engineering. Metab Eng. 2009;11:328‐334. doi:10.1016/j.ymben.2009.07.001


[102] Ye K, Ke Y, Keshava N, Shanks J, Kapp JA, et al. Opium alkaloid noscapine is an antitumor agent that arrests metaphase and induces apoptosis in dividing cells. Proc Natl Acad Sci USA. 1998;95:1601‐1606. PMID: 9465062

[103] Zhou J, Gupta K, Yao J, Ye K, Panda D, et al. Paclitaxel‐resistant human ovarian cancer cells undergo c‐Jun NH<sub>2</sub>‐terminal kinase‐mediated apoptosis in response to noscapine. J Biol Chem. 2002;277:39777‐39785. doi:10.1074/jbc.M203927200

[104] DellaPenna D, O'Connor SE. Plant gene clusters and opiates. Science. 2012;336:1648‐1649. doi:10.1126/science.1225473

[105] Battersby AR, Hirst M, McCaldin DJ, Southgate R, Staunton J. Alkaloid biosynthesis. XII. The biosynthesis of narcotine. J Chem Soc Perkin 1. 1968;17:2163‐2172. PMID: 5691486

[106] Winzer T, Gazda V, He Z, Kaminski F, Kern M, et al. A *Papaver somniferum* 10‐gene cluster for synthesis of the anticancer alkaloid noscapine. Science. 2012;336:1704‐1708. doi:10.1126/science.1220757

[107] Rischer H, Oresic M, Seppänen‐Laakso T, Katajamaa M, Lammertyn F, et al. Gene‐to‐metabolite networks for terpenoid indole alkaloid biosynthesis in *Catharanthus roseus* cells. Proc Natl Acad Sci USA. 2006;103:5614‐5619. doi:10.1073/pnas.0601027103

[108] Noble RL. The discovery of the vinca alkaloids‐chemotherapeutic agents against cancer. Biochem Cell Biol. 1990;68:1344‐1351. doi:10.1139/o90‐197

[109] Dong HP, Wu HM, Chen SJ, Chen CY. The effect of butanolides from *Cinnamomum tenuifolium* on platelet aggregation. Molecules. 2013;18:11836‐11841. doi:10.3390/molecules181011836

[110] Hoshino S, Wakimoto T, Onaka H, Abe I. Chojalactones A‐C, cytotoxic butanolides isolated from *Streptomyces* sp. cultivated with mycolic acid containing bacterium. Org Lett. 2015;17:1501‐1504. doi:10.1021/acs.orglett.5b00385

[111] Kim W, Lyu HN, Kwon HS, Kim YS, Lee KH, et al. Obtusilactone B from *Machilus thunbergii* targets barrier‐to‐autointegration factor to treat cancer. Mol Pharmacol. 2013;83:367‐376. doi:10.1124/mol.112.082578

[112] Shen KH, Lin ES, Kuo PL, Chen CY, Hsu YL. Isolinderanolide B, a butanolide extracted from the stems of *Cinnamomum subavenium*, inhibits proliferation of T24 human bladder cancer cells by blocking cell cycle progression and inducing apoptosis. Integr Cancer Ther. 2011;10:350‐358. doi:10.1177/1534735410391662

[113] Yang SY, Wang HM, Wu TW, Chen YJ, Shieh JJ, et al. Subamolide B isolated from medicinal plant *Cinnamomum subavenium* induces cytotoxicity in human cutaneous squamous cell carcinoma cells through mitochondrial and CHOP‐dependent cell death pathways. Evid Based Complement Alternat Med. 2013;2013:630415. doi:10.1155/2013/630415

[114] Risinger AL, Mooberry SL. Taccalonolides: novel microtubule stabilizers with clinical potential. Cancer Lett. 2010;291:14‐19. doi:10.1016/j.canlet.2009.09.020

[115] Lau W, Sattely ES. Six enzymes from mayapple that complete the biosynthetic pathway to the etoposide aglycone. Science. 2015;349:1224‐1228. doi:10.1126/science.aac7202

[116] Rajniak J, Barco B, Clay NK, Sattely ES. A new cyanogenic metabolite in *Arabidopsis* required for inducible pathogen defence. Nature. 2015;525:376‐379. doi:10.1038/nature14907
