© Springer Nature Switzerland AG 2019
A. D. Kinghorn et al. (eds.), Progress in the Chemistry of Organic Natural Products 110, https://doi.org/10.1007/978-3-030-14632-0_3

A Toolbox for the Identification of Modes of Action of Natural Products

Tiago Rodrigues

Chemical Biology, Instituto de Medicina Molecular João Lobo Antunes, Lisbon, Portugal

1 Introduction
2 Molecular Docking
2.1 Identification of Modes of Action with Docking
3 Pharmacophore Model-Based Screening
3.1 Identification of Modes of Action with Pharmacophore Models
4 Molecular Similarity Searches
4.1 Identification of Modes of Action Through Structural Similarity
5 Machine Learning Methods
5.1 Identification of Modes of Action Using Learning Algorithms
6 Outlook
References

Keywords

Cheminformatics, Machine learning, Chemical biology, Medicinal chemistry, Natural products, Drug discovery, Target identification

Tiago Rodrigues

received an M.Sc. in pharmaceutical sciences (2006) and a Ph.D. in medicinal chemistry (2010) from the University of Lisbon, Portugal. He was a postdoctoral fellow at ETH Zürich (2011–2015) and then joined the Instituto de Medicina Molecular, Portugal, where he is currently a Staff Scientist. Since 2015, he has been a visiting Assistant Professor at the Faculty of Pharmacy, University of Lisbon. His research interests span a range of disciplines, such as the development of drug delivery constructs, flow-assisted syntheses, and cheminformatics/machine learning for the deorphanization of natural products.

 

1 Introduction

Natural products have long played a leading role in successful chemical biology and drug discovery, providing chemotypes sufficiently tailored to serve as chemical probes, drug leads or, at the very least, as sources of inspiration for molecular design [1–4]. While the development of innovative chemistry has facilitated access to new and more diverse natural products in amounts suitable for bioactivity screening [5], prioritizing target-based assays remains a troublesome bottleneck in drug discovery [6]. In fact, screening natural products of interest in target-based assays is often motivated by a phenotypic change previously observed for the natural product in cell-based assays, e.g., cancer cell growth inhibition [6, 7]. Typically, the effective development of such bioactive natural products as useful drug leads relies on the deconvolution of the phenotypic readout and the correlation of that phenotype with the engagement of one or more drug targets [7]. It is now widely accepted that natural products, like synthetic small molecules, are rarely selective but engage dozens of related or unrelated targets [8], resulting in intricate pharmacology networks that might be explored in a drug discovery context [9, 10]. Crucially, such knowledge may benefit the design of leads with a lower probability of attrition and ultimately afford efficacious disease modulators.

Over the past few years, chemical proteomics (or chemoproteomics) has been established as the method of choice to identify binding counterparts for bioactive matter [7, 11]. In essence, the small molecule of interest is modified to incorporate a chemical handle prone to “tagging”; the modified chemical entity is then used to pull down proteins from cell lysates, and these proteins are subjected to a downstream analytical method for identification (Fig. 1). In a recent prominent example, Cravatt and coworkers modified the diterpenoid ester ingenol mebutate, a first-in-class drug used for the treatment of actinic keratosis, to obtain a diazirine probe [12]. Using this photoreactive moiety, the mitochondrial carnitine-acylcarnitine translocase SLC25A20 was identified as a functional target of ingenol mebutate. Despite the success in identifying a translocase as a binding counterpart, membrane proteins are seldom identified as molecular targets using chemoproteomics, as are proteins with low cellular expression [2]. Furthermore, the need for chemical modification of a molecule of interest may significantly increase the chemical synthesis work, particularly in the case of natural products, and inadvertently disrupt the binding affinity towards relevant on- and off-targets [13]. Altogether, chemical proteomics is laborious and time consuming, may require expensive equipment, and provides only motivated research hypotheses that must be validated with functional assays [6].
Fig. 1

Typical workflow for identifying drug targets through chemical proteomics approaches

It is conceivable that in silico methods can provide viable alternatives to generate such motivated research hypotheses, yet in a fraction of the time and with fewer resources. Virtual screening of an enumerated fraction of chemical space has been employed widely with vendor libraries as a means of accelerating hit discovery and prioritizing chemical matter for screening campaigns [14]. In contrast to chemical proteomics, where target identification is a step downstream from phenotypic assays, in silico screening often focuses on a drug target for which ligands are sought [3]; only in the event of successful experimental validation of the predicted ligand-target relationship is the engagement of the target correlated with modulation of disease or adverse drug reactions [13, 15].

This contribution provides an overview of computational methods that have been successfully employed for unveiling targets in the natural product realm, and discusses their strengths and limitations. In particular, molecular docking and pharmacophore model-based strategies will be described as a means of accounting for three-dimensionality in scrutinizing potential drug targets for either natural products or synthetic small molecules. Importantly, with the advent of big data in biological and chemical sciences [16, 17], molecular docking and pharmacophore screening have become suboptimal approaches to process large volumes of information. In fact, increasing computer power and storage capacity, together with improved algorithms to analyze unstructured and sparse data, are setting the tone for a new era of cheminformatics where artificial intelligence promises to tackle some of the long-standing problems in molecular informatics and chemistry in general [17, 18]. As such, a special focus is given to emerging machine learning tools that leverage topological descriptors as a workhorse for building predictive models, and to how such approaches can drive future chemical biology and early drug discovery programs. By comparing different tools, some of them accessible through webservers [8, 19], this contribution aims to be a reference work for the motivated selection of a tool according to the goal of the project.

2 Molecular Docking

Molecular docking has become a standard means of screening virtual libraries within the realm of receptor-based methods [20, 21]. In short, such methods sample the ligand conformation space in a user-defined “box”/binding site in an attempt to predict the so-called “docking pose” and rationalize, on a molecular structure level, the activity that any compound might present against a given protein [20]. Thus, molecular docking software tools do not aim at identifying ready-made and optimized ligands, but rather at discriminating the chemical features responsible for a molecular recognition event. These compounds and the spatial arrangement of their features might then be further tuned through medicinal chemistry to enhance binding affinity and, ideally, improve functional activity. Despite the simplicity of the concept and the existence of several user-friendly tools to carry out molecular docking studies, the researcher must bear in mind several caveats for proper data interpretation [20, 22, 23]. For instance, docking only provides motivated research hypotheses, or rationalizes them, ahead of experimental observation. Given that docking models account for only a snapshot of the protein in a conformational ensemble [21], they ought to be validated in biochemical studies (e.g., site mutagenesis), and the accuracy of the output is tightly connected to the quality of the protein X-ray structure on which docking is performed. Since X-ray structures represent electron density models, careful selection of the starting data is fundamental to avoid the propagation of errors and inaccurate predicted poses. To this end, it is often advisable to select high-resolution structures (≤2.5 Å) and screen/correct amino acid residue rotamers, as assessed through Ramachandran plots [24–26].

While the search algorithms are generally able to find the correct pose [26], the scoring function that discerns the most likely and complementary ligand–target complex is often inaccurate at estimating the magnitude of the binding affinity. This is not a trivial task and endures as an active field of research. Binding affinity is best quantified by a free energy change between bound and unbound states as defined in Eq. (1):

$$ \Delta G=\Delta H - T\Delta S $$
(1)
where G is the free energy of the ligand–receptor interaction, H is the enthalpy for the binding event, T is the absolute temperature, and S is the entropy. While the enthalpic contributions to binding can be both measured experimentally and modeled with some accuracy, this is not true for the entropic factor. Contributing to this are the limited information on protein flexibility in docking studies and the critical role of water molecules in mediating ligand–protein interactions, or their displacement if unfavorable [27–29]. For example, inhibition of HIV protease by transition state mimetics occurs via displacement of a catalytic water molecule [30]. Usually, only a rough estimation of the entropy change can be provided, or else it is assumed to be identical in all cases. To mitigate this limitation, software tools such as WaterMap can now evaluate statistically the position and the importance of each water molecule, and estimate whether it is structural or bulk solvent [31, 32]. Indeed, the physics-based modeling of water molecules can directly impact the entropic factor of binding for the ligand–target complex and provide more accurate modeling results [29]. Nonetheless, the scoring function data from mainstream molecular docking software tools should be analyzed with caution. These data serve well the purpose of generating a rank-ordered list of ligands and help prioritize further investigations, but generally correlate poorly with measured binding affinities.
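To make the magnitudes in Eq. (1) concrete, the following minimal sketch converts a dissociation constant into a standard binding free energy and backs out the entropic term. It relies on the textbook relationship ΔG° = RT ln Kd (1 M standard state), which is an assumption brought in for illustration and is not taken from this chapter; the function names and the example values are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g_from_kd(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant in mol/L.

    Assumes dG = R*T*ln(Kd) with a 1 M standard state.
    """
    return R_KCAL * temperature_k * math.log(kd_molar)


def entropic_term(delta_g, delta_h):
    """Return -T*dS (kcal/mol) implied by Eq. (1): dG = dH - T*dS."""
    return delta_g - delta_h


# Hypothetical ligand: Kd = 1 nM, measured binding enthalpy of -9.0 kcal/mol
dg = delta_g_from_kd(1e-9)
minus_t_ds = entropic_term(dg, -9.0)
print(f"dG = {dg:.2f} kcal/mol, -T*dS = {minus_t_ds:.2f} kcal/mol")
```

Under these assumptions, a tenfold change in affinity corresponds to roughly 1.4 kcal/mol at room temperature, which illustrates why small errors in the estimated entropic term can reorder a docking hit list.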

2.1 Identification of Modes of Action with Docking

Keeping the drawbacks of molecular docking in mind, and analyzing the generated binding poses with healthy skepticism, it has been possible to deploy this technology with great effectiveness on natural products with the goal of unveiling putative binding partners that explain therapeutic effects and/or adverse drug reactions. For example, through inverse molecular docking, i.e., docking a single structure into the binding pockets of a large array of proteins, cyclooxygenase-2 (COX2; 56% inhibition at a concentration of 0.4 μM) and peroxisome proliferator-activated receptor gamma (PPARγ; active at concentrations above 10 μM) were identified expeditiously as targets of meranzin (1) (Fig. 2). Importantly, this natural product displayed concentration-dependent effects and potencies comparable to indomethacin (COX2 ligand) and rosiglitazone (PPARγ ligand) [33], suggesting that it could serve as a source of inspiration to design improved target effectors.
Fig. 2

Predicted interactions between the natural product meranzin (1) and cyclooxygenase-2 (COX2) and the peroxisome proliferator-activated receptor gamma (PPAR γ)

3 Pharmacophore Model-Based Screening

A pharmacophore is the ensemble of steric and electronic features that is necessary to ensure the optimal supramolecular interactions with a specific biological target structure and to trigger (or to block) its biological response. A pharmacophore does not represent a real molecule or a real association of functional groups, but a purely abstract concept that accounts for the common molecular interaction capacities of a group of compounds toward their target structure. The pharmacophore can be considered the largest common denominator shared by a set of active molecules. This definition avoids a misuse often found in the medicinal chemistry literature, which consists of naming as pharmacophores simple chemical functionalities such as guanidines, sulfonamides, or dihydroimidazoles (formerly imidazolines), or typical structural skeletons such as flavones, phenothiazines, prostaglandins, or steroids. In summary, Wermuth’s definition of 3D pharmacophores encompasses different regions of molecules in 3D space that encode steric and electronic properties and are responsible for molecular recognition. Through application of the molecular similarity principle, one may then assume that ligands with similar pharmacophore feature arrangements are likely to bind to the same targets. With this in mind, it is possible to rapidly identify isofunctional molecules without explicitly comparing chemical structures, which may considerably speed up the search process when compared to molecular docking. Moreover, pharmacophores are a convenient and physicochemically valid means of comparing molecules and performing scaffold hopping, taking into account that entities with similar biological behavior can present disparate frameworks. As in the case of molecular docking, the output of pharmacophore model-based screening may vary considerably, depending on the software tool employed. Indeed, different tools present distinct pharmacophore feature assignment rules (Table 1), but all of them consider a tolerance zone that can be occupied by the atoms conferring a given property/feature.
Table 1

Comparison of pharmacophore feature assignment schemes by four popular software tools

| Feature | LigandScout | MOE | Phase | Catalyst |
| --- | --- | --- | --- | --- |
| H-bond | Acceptor and donor located on heavy atom | Acceptor and donor located on heavy atom | Donor located on hydrogen and acceptor on heavy atom | Acceptor and donor located on heavy atom |
| Lipophilic | Aromatic rings are recognized | Aromatic rings are not recognized | Aromatic rings are not recognized | Aromatic rings are recognized |
| Aromatic | Represented with plane orientation | Depends on pharmacophore scheme | Represented with plane orientation | Represented with plane orientation |
| Charge transfer | No explicit charges | Explicit charges | No explicit charges | No explicit charges |

It is good practice to take into account a range of different molecules binding to the target being queried and to generate several pharmacophore models for virtual screening purposes. Indeed, it is not uncommon that modulators of a given target recognize different surface patches or particular subpockets within a binding pocket. In this case, the ligands will modulate the same target through disparate binding modes. As such, it is sensible to cluster molecules in the reference ligand set by structural similarity, and generate as many models as the number of chemotypes, if there is no compelling evidence of identical modes of binding. Pharmacophore models may be computed either by performing multiple ligand alignment or, ideally, by superimposing known bioactive conformations. In doing so, one is more likely to build relevant models for virtual screening. With such data in hand, features and their tolerance spheres can then be calculated automatically. As in the case of the reference ligand (training) set, conformers must be calculated and stored for the search (test) ligand set. While searching for matches to the pharmacophore model within a conformer set is not a computationally expensive task, the same cannot be said of the conformer generation routine. A force field must be selected, the potential energy of each ligand minimized, and a user-defined array of energetically distinct conformers assembled. One may intuitively consider that an accurate three-dimensional representation of the ligands is key to the successful use of pharmacophore models. However, it has been suggested that the impact of the bioactive conformation on the overall database enrichment is limited [34–36]. Nevertheless, the computation of reasonable low-energy conformers is an important and difficult task [37, 38]. This consideration is particularly true for natural products [39], for which the high content of stereogenic centers can lead to several inaccurate and/or irrelevant conformers, as exemplified by archazolid A (2) (Fig. 3). Fortunately, as laborious as the conformer generation step may be, each search database needs to undergo the process only once; the output can be stored for future use. Taken together, and considering the caveats of conformer generation, pharmacophore model-based virtual screening is a viable alternative to molecular docking for rapid retrieval of hits.
Fig. 3

Structure of a natural product and the superimposition of energy-minimized conformers, as computed with MOE (Chemical Computing Group, Canada). Data show that several distinct conformers are generated from the same structure
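The search step described in this section, matching stored conformers against features with tolerance spheres, can be sketched as follows. This is a deliberately simplified illustration and not the algorithm of any of the tools in Table 1: it assumes that conformers are already aligned to the model's coordinate frame, uses a greedy one-to-one feature assignment, and all feature labels, coordinates, and ligand names are hypothetical.

```python
import math


def matches_pharmacophore(model, conformer_features):
    """Check whether one pre-aligned conformer satisfies every model feature.

    model: list of (feature_type, (x, y, z), tolerance_radius)
    conformer_features: list of (feature_type, (x, y, z))
    Each conformer feature may satisfy at most one model feature (greedy assignment).
    """
    used = set()
    for ftype, center, tolerance in model:
        hit = None
        for idx, (ltype, position) in enumerate(conformer_features):
            if idx in used or ltype != ftype:
                continue
            if math.dist(center, position) <= tolerance:
                hit = idx
                break
        if hit is None:
            return False
        used.add(hit)
    return True


def screen(model, conformers_by_ligand):
    """Return ligands for which at least one stored conformer matches the model."""
    return [
        name
        for name, conformers in conformers_by_ligand.items()
        if any(matches_pharmacophore(model, conf) for conf in conformers)
    ]


# Hypothetical two-feature model: H-bond acceptor (HBA) and aromatic ring (AR)
model = [("HBA", (0.0, 0.0, 0.0), 1.0), ("AR", (5.0, 0.0, 0.0), 1.5)]
library = {
    "lig_a": [
        [("HBA", (3.0, 0.0, 0.0)), ("AR", (5.0, 0.0, 0.0))],  # acceptor too far away
        [("HBA", (0.5, 0.0, 0.0)), ("AR", (5.5, 1.0, 0.0))],  # both within tolerance
    ],
    "lig_b": [[("HBD", (0.0, 0.0, 0.0)), ("AR", (5.0, 0.0, 0.0))]],  # no acceptor
}
hits = screen(model, library)
print(hits)
```

Because the conformers are precomputed and stored, only the inexpensive matching loop is repeated for each new model, which mirrors the cost argument made in the text.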

3.1 Identification of Modes of Action with Pharmacophore Models

Using 3D pharmacophore models, Rollinger and coworkers have successfully interrogated binding and engagement of targets by different natural products and their analogues. For example, a range of metabolites from common rue, Ruta graveolens (Plate 1), were screened against a panel of more than 2000 pharmacophore models to prioritize biochemical assays and experimentally confirm arborinine (3) (Fig. 4) and rutamarin (4) as inhibitors of the human rhinovirus coat protein and the G-protein coupled cannabinoid-2 receptor, respectively [40]. Moreover, hecogenin (5) isolated from the sisal plant, Agave sisalana, the labdane diterpenoid hispanolone (6), from Ballota africana, and lasalocid (7), from Streptomyces lasaliensis, have been identified as modulators of 11β-hydroxysteroid dehydrogenase [41], whereas several depside/depsidones, including perlatolic acid (8) from Pertusaria globularis and physodic acid (9) from Pseudevernia furfuracea were associated with inhibition of microsomal prostaglandin E2 synthase-1 [42]. Finally, PPARγ was identified as target for biphenyl-based natural products, such as dieugenol (10) from aged clove basil (Ocimum gratissimum), magnolol (11) from the cortex of Magnolia officinalis, tetrahydrodieugenol (12) from the flowers of Syzygium aromaticum, and honokiol (13) also from the cortex of M. officinalis [43, 44].
Plate 1

Ruta graveolens. Photograph: Jörg Hempel, Creative Commons 3.0

Fig. 4

Structures of natural products deorphanized through pharmacophore model-based virtual screening

4 Molecular Similarity Searches

Both molecular docking and 3D pharmacophore screening have been applied with great effectiveness to unveil putative binding counterparts for natural products. However, they rely on the computation of meaningful conformations, which, as discussed above, is a particularly challenging endeavor. Additionally, these methods remain computationally expensive and arguably are of limited throughput.

In contrast to 3D methods, topological (2D) approaches offer viable alternatives of comparable accuracy, yet at a fraction of the computational cost and time [45]. Importantly, from a target inference point of view, the use of topological descriptors is well motivated, as similar ligands (and hence similar descriptor vectors) are likely to bind to identical targets [19]. Thus, using appropriate descriptors/features to compare and correlate small molecules is key for success. Despite the high interest in designing efficient and highly informative descriptors, e.g., the extended 3-dimensional fingerprint (E3FP) that encodes stereochemical information [46], some methods remain mainstream. Among them is the use of physicochemical descriptors, MACCS keys, or extended connectivity fingerprints (ECFPs) of different correlation diameters. Irrespective of the approach undertaken, the goal is to translate molecular structure into computable units that can be compared by one of several available metrics. Arguably, the Tanimoto-Jaccard coefficient/index [Eq. (2)] is the most widely employed metric to compare fingerprints, but others, such as Dice similarity and Euclidean or Manhattan distances [Eq. (3)], have equally found applicability in cheminformatics to assess similarity between distinct molecules [47, 48]. The Tanimoto coefficient computes a value between zero and one to quantify the fingerprint similarity. A value of zero means complete dissimilarity between the fingerprints of the molecules under comparison, whereas a value of one indicates full identity. Therefore, the higher the value, the more similar the molecules are according to the chosen fingerprint. Although there is no hard cutoff for similarity, it is generally accepted that a value equal to or higher than 0.7–0.8 is obtained for similar ligands. Notably, the Tanimoto coefficient will vary significantly, depending on the chosen fingerprint and the number of bits (each encoding a certain substructural element) selected to store structural information. This will critically influence the accuracy of the approach and the molecules prioritized for experimental validation.

$$ T=\frac{c}{a+b-c} $$
(2)
where T is the Tanimoto coefficient, a and b are the numbers of bits set for molecules A and B, under comparison, and c is the number of common bits in the fingerprints of molecules A and B.
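As a worked instance of Eq. (2), the short sketch below computes the Tanimoto coefficient for two fingerprints represented as sets of on-bit positions. The bit positions are hypothetical, and returning zero for two empty fingerprints is a convention assumed here rather than one stated in the text.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient, Eq. (2), for fingerprints given as sets of on-bit indices."""
    a = len(bits_a)           # bits set in molecule A
    b = len(bits_b)           # bits set in molecule B
    c = len(bits_a & bits_b)  # bits common to A and B
    if a + b - c == 0:
        return 0.0            # assumed convention for two empty fingerprints
    return c / (a + b - c)


# Hypothetical fingerprints: a = 4, b = 5, c = 3, so T = 3 / (4 + 5 - 3)
fp_a = {1, 4, 7, 9}
fp_b = {4, 7, 9, 12, 15}
print(tanimoto(fp_a, fp_b))
```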
Euclidean and Manhattan distances can be computed through the Minkowski metric D, according to Eq. (3):

$$ D=\sqrt[p]{\sum \limits_{i=1}^n{\left|{\mathbf{A}}_i-{\mathbf{B}}_i\right|}^p} $$
(3)
where n is the number of descriptor elements for molecules A and B. The formula affords the Manhattan and Euclidean distances for p = 1 or 2, respectively.
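Eq. (3) translates directly into a few lines of code. The sketch below is a plain-Python illustration with hypothetical three-element descriptor vectors; in practice the vectors would hold physicochemical descriptors or fingerprint counts.

```python
def minkowski(vec_a, vec_b, p):
    """Minkowski distance, Eq. (3), between two descriptor vectors of equal length."""
    if len(vec_a) != len(vec_b):
        raise ValueError("descriptor vectors must have the same length")
    return sum(abs(x - y) ** p for x, y in zip(vec_a, vec_b)) ** (1.0 / p)


# Hypothetical descriptor vectors; element-wise differences are 3, 4, and 0
mol_a = [1.0, 0.0, 3.0]
mol_b = [4.0, 4.0, 3.0]
manhattan = minkowski(mol_a, mol_b, p=1)  # 3 + 4 + 0
euclidean = minkowski(mol_a, mol_b, p=2)  # sqrt(9 + 16 + 0)
print(manhattan, euclidean)
```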

In principle, any molecule with experimentally confirmed bioactivity against the target of interest can be used as a starting point (reference) for similarity searches. However, taking into account that the goal of the method is to retrieve hits from a search database, a high-affinity ligand is a better motivated choice as reference molecule. Naturally, the selection of the descriptors employed and the metric used to assess similarity are cornerstones for the success of a screening campaign [49]. A wealth of screening techniques and software is available (some of which is implemented in open-source pipelining tools like KNIME), and their proper selection depends on the goal, suitability and availability, among others [50–52]. Irrespective of the screening strategy, similarity (or distance) values are calculated and stored in a database, and are then sorted in order of decreasing similarity (or increasing distance) to the query/reference molecules. The rank-ordered list is provided as output for human inspection, wherein the molecule with the smallest distance or highest similarity is called the “nearest neighbor.”
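The ranking step can be illustrated with a minimal sketch that scores a small search library against one reference fingerprint and sorts it by decreasing Tanimoto similarity. The compound names and bit sets are hypothetical, and a real campaign would compute the fingerprints with a cheminformatics toolkit rather than writing them by hand.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient for fingerprints given as sets of on-bit indices."""
    common = len(bits_a & bits_b)
    union = len(bits_a) + len(bits_b) - common
    return common / union if union else 0.0


def rank_by_similarity(reference_bits, library):
    """Return (name, similarity) pairs sorted by decreasing similarity to the reference."""
    scored = [(name, tanimoto(reference_bits, bits)) for name, bits in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical reference ligand and search database
reference = {1, 2, 3, 4}
search_db = {
    "cpd_b": {1, 2},
    "cpd_a": {1, 2, 3, 4, 5},
    "cpd_c": {7, 8},
}
ranked = rank_by_similarity(reference, search_db)
nearest_neighbor = ranked[0][0]
print(ranked)
```

The first entry of the sorted list is the nearest neighbor; the remainder of the list is what would be handed over for human inspection.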

4.1 Identification of Modes of Action Through Structural Similarity

The Similarity Ensemble Approach (SEA, SeaChange Pharmaceuticals; webserver: http://sea.bkslab.org) [19, 53–55] leverages the similarity search concept discussed herein, coupled to probabilistic models to ascertain the relevance of its predictions. Notably, SEA allows prioritizing drug targets for screening with speed unrivaled by the abovementioned 3D methods. Since SEA was developed primarily to identify on- and off-targets for synthetic small molecules, one can expect high rates of false-positive predictions for natural products whose frameworks diverge from those in the reference ligand database (ChEMBL [56]). While a thorough proof-of-concept is warranted, there is encouraging evidence that SEA can also perform efficiently, with natural products such as physalins B, D, F, and G (14–17) having been associated successfully with antiplasmodial activity (Fig. 5) [57].
Fig. 5

Structures of antiplasmodial physalins

5 Machine Learning Methods

Machine (statistical) learning is a (re-)emerging technology in chemical biology and drug discovery, with the potential to reshape how fundamental science is performed [18]. Its greatest value resides in leveraging an increasing amount of chemical and biological data to identify patterns and establish correlations that are otherwise intractable to human analyses [58]. Indeed, the recent progress made in computer storage, hardware, and algorithms provides a platform to foster investigations using machine learning as a research tool. As in other modeling techniques, e.g., traditional quantitative structure-activity/property relationships, clearly defining the research question is key to allow an appropriate strategy selection. Moreover, a certain amount of quality data is warranted to ensure that generalizable models are obtained for prospective deployment. To this end, tuning hyperparameters and performing cross-validation studies are equally important steps to assess whether the selected algorithm is under- or overfitting the training data. Under- or overfit models will show compromised performance when applied to related, yet previously unseen data. In brief, machine-learning technologies can be subdivided into three different categories, depending on the type of output and data requirements for learning:
  1. Regression (supervised learning), if the output is a numeric value
  2. Classification (supervised learning), if the output is a label
  3. Clustering (unsupervised learning), if the algorithm associates data solely based on its structure

Independently of the method, all machine-learning approaches have proven useful in early drug discovery by streamlining processes and facilitating the design of relevant experiments. On one hand, regression and classification models have been employed prospectively for de novo design of small-molecule effectors [59], prediction of pharmacokinetics [60], prediction of drug-likeness [61], prediction of synthesis routes [62], optimization of chemical reactions [63], and conformational sampling [39], among many others. On the other hand, clustering methods have proven useful in the analysis of bioactivity landscapes [64, 65].
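The role of cross-validation mentioned above can be illustrated with a minimal sketch of k-fold validation for a classification task. The one-nearest-neighbor classifier, the two-dimensional descriptors, and the activity labels are all assumptions chosen for brevity; they are not a method described in this chapter, and an established machine-learning library would normally be used instead.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k interleaved folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


def nearest_neighbor_label(x, train_x, train_y):
    """Predict the label of x as the label of its closest training sample."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(x, train_x[i])),
    )
    return train_y[best]


def cross_validated_accuracy(xs, ys, k=3):
    """Fraction of samples predicted correctly when each is held out exactly once."""
    correct = 0
    for train, test in k_fold_indices(len(xs), k):
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        for i in test:
            correct += nearest_neighbor_label(xs[i], train_x, train_y) == ys[i]
    return correct / len(xs)


# Hypothetical, well-separated toy data: two descriptors per molecule
xs = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]
ys = ["inactive", "inactive", "inactive", "active", "active", "active"]
accuracy = cross_validated_accuracy(xs, ys, k=3)
print(accuracy)
```

A large gap between the accuracy on the training data and the cross-validated accuracy is one practical symptom of the overfitting discussed in the text.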

Given its utility for a number of tasks and the increase of bioactivity data for small molecules, machine learning has found applicability in research programs aiming at identifying targets for bioactive molecules of synthetic and natural origin [17]. Indeed, the need for minimal computational effort to afford statistically motivated research hypotheses renders machine learning as an attractive alternative to molecular docking and pharmacophore-based virtual screening.

5.1 Identification of Modes of Action Using Learning Algorithms

The Prediction of Activity Spectra for Substances (PASS) is available as an online tool (http://www.pharmaexpert.ru/passonline/) [66, 67], which uses topological fragment structure descriptors [68] and leverages a Bayesian-like method to infer more than 2500 kinds of activities, including drug targets, for the queried molecules. Being a Bayes theorem-inspired method, PASS outputs the probabilities of a studied molecule being active (Pa) or inactive (Pi). As such, a potentially interesting target for experimental validation will afford Pa > Pi, and the higher the difference, the more promising the target ought to be. To date, several marine sponge alkaloids have been scrutinized with PASS, and antitumor activity has been suggested for the great majority of them (80%) [69]. In addition to antitumor activity, PASS has also been able to predict different kinds of activities for halitulin (18), from the sponge Haliclona tulearensis, and betulin bishemiphthalate (19), a derivative of the triterpene betulin obtained from birch bark (Fig. 6). Thus, data suggest that these natural products may find broad applicability as therapeutics upon experimental confirmation of ligand–target correlations.
Fig. 6

Examples of a natural product 18 and a natural product derivative 19 studied with PASS
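The Pa > Pi prioritization rule described for PASS can be sketched as a simple filter and sort. The activity names and probability values below are hypothetical placeholders and are not PASS output; the snippet only illustrates how a list of predictions might be triaged before experimental validation.

```python
def prioritize(predictions):
    """Keep activities with Pa > Pi and sort them by decreasing Pa - Pi.

    predictions: list of (activity_name, pa, pi) tuples
    """
    kept = [(name, pa, pi) for name, pa, pi in predictions if pa > pi]
    return sorted(kept, key=lambda item: item[1] - item[2], reverse=True)


# Hypothetical predictions for one query molecule
predictions = [
    ("kinase inhibitor", 0.40, 0.35),
    ("antifungal", 0.10, 0.60),
    ("antineoplastic", 0.82, 0.01),
]
shortlist = prioritize(predictions)
for name, pa, pi in shortlist:
    print(f"{name}: Pa = {pa:.2f}, Pi = {pi:.2f}, Pa - Pi = {pa - pi:.2f}")
```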

Considering the intricate frameworks in natural products and their dissimilarity to those entailed in synthetic molecules in reference datasets, one may argue that fingerprints and substructural descriptors are suboptimal to leverage confident target predictions in natural product space. Indeed, SEA and PASS were designed for synthetic entities, and may afford less accurate predictions than software tools tailored for natural products. To mitigate this limitation, the Chemically Advanced Template Search (CATS) computes topological pairwise correlations of atom types in a given molecule, up to a distance of 10 bonds [70, 71]. This simple pharmacophore descriptor provides a fuzzy and size-independent molecular representation, which has proven well suited for scaffold hopping and correlation of structurally dissimilar chemical entities. According to the CATS descriptors, feature pairs are expressed as the number of bonds along the shortest path connecting two non-hydrogen nodes in the molecular graph. Atoms are typed as one of six possible features: hydrogen bond donor, hydrogen bond acceptor, positively charged, negatively charged, lipophilic, and aromatic, resulting in a 210-dimensional vector (21 feature combinations × 10 bonds) that can be employed to predict drug targets.
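The construction of a CATS-like descriptor can be sketched in plain Python. This is an illustrative approximation and not the published CATS implementation: atom typing is supplied by hand, the ten distance bins are assumed to cover topological distances of 0 to 9 bonds, raw counts are left unscaled, and the one-letter feature codes and the toy molecule are hypothetical.

```python
from collections import deque
from itertools import combinations_with_replacement, product

# Assumed codes: donor, acceptor, positive, negative, lipophilic, aromatic
FEATURES = ("D", "A", "P", "N", "L", "R")
PAIRS = list(combinations_with_replacement(FEATURES, 2))  # 21 unordered pairs
PAIR_INDEX = {pair: k for k, pair in enumerate(PAIRS)}
N_BINS = 10  # assumed bins for shortest-path distances of 0..9 bonds


def bond_distances(adjacency, start):
    """Breadth-first search: shortest path length in bonds from start to each atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def cats_like_vector(adjacency, atom_types):
    """Count feature pairs per topological distance; returns 21 x 10 = 210 values."""
    vector = [0] * (len(PAIRS) * N_BINS)
    atoms = sorted(adjacency)
    for pos, i in enumerate(atoms):
        dist = bond_distances(adjacency, i)
        for j in atoms[pos:]:
            d = dist.get(j)
            if d is None or d >= N_BINS:
                continue
            if i == j:  # pairs of features located on the same atom
                type_pairs = combinations_with_replacement(sorted(atom_types[i]), 2)
            else:
                type_pairs = product(atom_types[i], atom_types[j])
            for ti, tj in type_pairs:
                key = tuple(sorted((ti, tj), key=FEATURES.index))
                vector[PAIR_INDEX[key] * N_BINS + d] += 1
    return vector


# Toy heavy-atom graph C-C-O with hand-assigned types (hypothetical typing)
adjacency = {0: [1], 1: [0, 2], 2: [1]}
atom_types = {0: {"L"}, 1: set(), 2: {"D", "A"}}
descriptor = cats_like_vector(adjacency, atom_types)
print(len(descriptor), sum(descriptor))
```

Because only feature types and bond counts enter the vector, two molecules with different scaffolds but a similar arrangement of features obtain similar descriptors, which is the property exploited for scaffold hopping.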

Taking advantage of the CATS descriptors, the Self-Organizing Maps (SOMs)-based prediction of drug equivalence (SPiDER) software [8, 72] uses a heuristically inspired neural network to achieve a weighted projection of the descriptor/chemical space onto a toroidal map in an unsupervised fashion. To do so, the algorithm takes into account the structure of the input data and runs until convergence or for a user-defined number of epochs. SOMs, such as those implemented in SPiDER, are straightforward to interpret since the local neighborhoods in the data are preserved in the projection, i.e., similar data points are located in the same or adjacent neurons. Besides the CATS descriptors, the SPiDER software also uses 2D physicochemical properties computed by MOE (Chemical Computing Group, Canada) to afford a complementary vantage point on data for both reference ligands and queries. Next, through arithmetical combination of the SOMs derived from the CATS and physicochemical descriptors, and analyses of background distances between ligands, a consensus output is obtained together with a p-like value that allows assessment of the prediction significance (Fig. 7).
Fig. 7

Schematics of the SPiDER method workflow
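The projection onto a toroidal map can be illustrated with a generic self-organizing map. The sketch below is a textbook-style SOM written for illustration only; it is not the SPiDER implementation, and the grid size, learning-rate schedule, neighborhood function, and toy data are all assumptions.

```python
import math
import random


def torus_distance(a, b, rows, cols):
    """Grid distance between neurons a and b on a map whose edges wrap around."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    return math.hypot(min(dr, rows - dr), min(dc, cols - dc))


def best_matching_unit(weights, x):
    """Neuron whose weight vector is closest (squared Euclidean) to sample x."""
    return min(
        weights,
        key=lambda pos: sum((w - v) ** 2 for w, v in zip(weights[pos], x)),
    )


def train_som(data, rows=4, cols=4, epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    """Unsupervised training for a fixed number of epochs with linear decay."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        lr = lr0 * decay
        sigma = max(sigma0 * decay, 0.5)
        for x in data:
            bmu = best_matching_unit(weights, x)
            for pos, w in weights.items():
                # Gaussian neighborhood: neurons near the winner move the most
                h = math.exp(-torus_distance(pos, bmu, rows, cols) ** 2 / (2 * sigma ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return weights


# Hypothetical descriptor vectors forming two groups
data = [[0.0, 0.0], [0.05, 0.02], [1.0, 1.0], [0.95, 0.98]]
som = train_som(data)
for x in data:
    print(x, "->", best_matching_unit(som, x))
```

In a target-prediction setting, reference ligands with known targets would be projected first, and a query would inherit hypotheses from the ligands sharing its neuron or adjacent neurons.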

Although SPiDER was originally developed with the goal of inferring targets for de novo-designed small molecules [8], i.e., chemotypes displaying structural dissimilarity to their seed structures, Schneider and coworkers recognized that the same concept could be applied efficiently to deorphanize natural products and interrogate their polypharmacological profiles. In a first report, SPiDER was applied prospectively to the macrocyclic natural product archazolid A (2) [73]. As 2 differs considerably from ligands in the SPiDER reference database, only low confidence predictions could be obtained. The observation led to the deconvolution of the macrocyclic structure into its computationally generated fragments, assuming that the bioactivity fingerprint of 2 could be partly stored into those fragments and used subsequently as surrogate structures for SPiDER processing. Interestingly, different fragments afforded identical confident predictions, which were used to initiate biochemical assays. Compound 2 was confirmed as modulating COX2, PPARγ, glucocorticoid receptor (GR), mPGES-1, and 5-lipoxygenase (5-LO), among others. Albeit not confirmed experimentally, modulation of these targets may contribute to its possible anticancer activity (Fig. 8). Similarly, the highly cytotoxic macrocycle doliculide (20) from the Japanese sea hare (Aplysia juliana) was deorphanized as a nanomolar-potent prostanoid receptor 3 antagonist using synthetically motivated fragments to leverage a target prediction routine. Inhibition of the prostanoid receptor 3 may also be involved in cancer progression [74].
Fig. 8

Structures of archazolid A (2) and doliculide (20) and bioactivities identified by SPiDER
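The fragment-based strategy described above can be sketched as a simple vote across fragments: targets predicted confidently for several fragments are prioritized for testing. This is only an illustrative aggregation scheme and not the published workflow; the function name `consensus_targets`, the `min_fragments` threshold, and the fragment and target labels are all hypothetical placeholders.

```python
from collections import Counter


def consensus_targets(fragment_predictions, min_fragments=2):
    """Targets predicted for at least `min_fragments` fragments, most frequent first."""
    counts = Counter()
    for targets in fragment_predictions.values():
        counts.update(set(targets))  # count each target once per fragment
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(target, n) for target, n in ranked if n >= min_fragments]


# Hypothetical confident predictions per fragment (placeholder labels only).
predictions = {
    "fragment_1": ["target_A", "target_B"],
    "fragment_2": ["target_B", "target_C"],
    "fragment_3": ["target_A", "target_B"],
}

for target, n in consensus_targets(predictions):
    print(f"{target}: predicted for {n} fragments")
```

Requiring agreement between fragments is one way to compensate for the low confidence obtained with the intact macrocycle, although any such threshold remains a judgment call.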

The SPiDER method has also proven accurate in identifying drug targets for fragment-like natural products. While (–)-sparteine (21) modulates the κ-opioid receptor (EC50 = 245 μM, Fig. 9) [3], isomacroin (22) was found to be an inhibitor of the platelet-derived growth factor receptor alpha kinase (PDGFRα) without selectivity over the beta isoform, but with negligible effects against a panel of diverse kinases [15]. Replacing the imidazole ring with its N-methylpyrrole counterpart abrogated activity against PDGFRα, which indicated the paramount role of the imidazole moiety as a hinge-binding motif. Indeed, compound 22 is a substructure of a single-digit nanomolar PDGFRβ inhibitor developed by the pharmaceutical industry [75], further validating natural products as starting points for hit-to-lead optimization programs. In another case study, graveolinin (23) was identified as a COX2 and serotonin 5-HT2B modulator. Inhibition of COX2 may explain the antiplatelet aggregation effect displayed by extracts of Ruta graveolens (Plate 1), of which 23 is the major constituent [76]. Importantly, despite the structural dissimilarity to typical COX2 inhibitors, a similar pharmacophore can explain the prediction made by SPiDER, which suggests that potent COX2 inhibitors inspired by 23 can be developed.
Fig. 9

Structures of fragment-like natural products deorphanized with SPiDER

Finally, (–)-englerin A (24) (Fig. 10), a known agent against renal cancer cells from the African plant Phyllanthus engleri, which increases intracellular calcium concentration through activation of the transient receptor potential canonical channels 4 and 5 (TRPC4/5) [77, 78], was suggested as a ligand of the voltage-gated calcium channel Cav1.2 [79]. As for 2 and 20, target prediction with the full natural product structure afforded only a few confident predictions. To increase the number of confidently predicted targets, the authors used piperlongumine (25) from Piper longum as a pharmacophore surrogate for SPiDER, assuming that targets inferred for alkaloid 25 would equally represent motivated research hypotheses for 24, from a cheminformatics vantage point. A range of biochemical and cell-based assays confirmed that 24 moderately antagonized Cav1.2 channels (IC50 = 6 μM). Although the finding is of low relevance for explaining the antitumor activity of 24, the study afforded a rationale to graft natural product-derived fragments and tailor Cav1.2 modulators. Interestingly, SPiDER was also able to predict TRP channels as binding counterparts for 24, which could significantly speed up target exploration studies. The result of the pseudo-prospective evaluation of 24 is in line with the observation that natural products are privileged ligands of TRP channels [80]. Altogether, the validation of 24 as a calcium channel modulator provides another example of the utility of machine learning in identifying membrane proteins as targets of bioactive matter.
Fig. 10

(–)-Englerin A (24) and piperlongumine (25) display pharmacophore feature commonalities that allow cross-structure target inference. Cyan = hydrogen bond donor/acceptor; green = lipophilic; orange = aromatic and/or sp²-hybridized

The SPiDER concept has recently been reimplemented as the Target Inference Generator (TIGER) tool, which also leverages a consensus of two SOMs but uses slightly modified CATS descriptors, i.e., without charged features, and a different statistical approach. By encoding ligand–target relationships, TIGER is capable of performing qualitative predictions for up to 331 targets [81]. Among these, the orexin 1/2, glucocorticoid, and cholecystokinin-2 receptors were experimentally validated as targets of the marine natural product (±)-marinopyrrole A (26), isolated from a Streptomyces sp. (Fig. 11). In an additional prospective application of TIGER, resveratrol (27) was predicted and experimentally confirmed to modulate the estrogen receptor β (ERβ, Ki = 0.4 μM) with a reasonable degree of selectivity over its α counterpart (ERα, Ki = 4 μM) [82].
Fig. 11

Structures of natural products that have been studied by the TIGER method
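The degree of selectivity quoted for resveratrol (27) can be made explicit with a short calculation. The sketch below converts the two Ki values given in the text to a negative log10 molar scale and computes their ratio; the helper names `p_scale` and `fold_selectivity` are hypothetical and introduced here only for illustration.

```python
import math


def p_scale(value_um):
    """Convert a Ki, IC50 or EC50 given in micromolar to -log10 of its molar value."""
    return -math.log10(value_um * 1e-6)


def fold_selectivity(off_target_um, on_target_um):
    """Ratio of two affinity values; values above 1 favor the on-target."""
    return off_target_um / on_target_um


ki_er_beta_um = 0.4   # Ki for ERβ quoted in the text
ki_er_alpha_um = 4.0  # Ki for ERα quoted in the text

print("fold selectivity:", round(fold_selectivity(ki_er_alpha_um, ki_er_beta_um), 2))
print("difference in pKi:", round(p_scale(ki_er_beta_um) - p_scale(ki_er_alpha_um), 2))
```

With Ki values of 0.4 and 4 μM, the ratio is 10-fold, which corresponds to one log unit on the pKi scale.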

Built with the goal of scrutinizing the qualitative SPiDER predictions, Rodrigues et al. reported the Drug–Target Relationship Predictor (DEcRyPT) software tool [13], which uses random forest technology to predict affinity values for targets of interest. In short, random forest models leverage the individual predictions of a user-defined number of decision trees, each built with only a subset of all data. As such, each decision tree functions as a weak estimator, but as an ensemble, more robust and reliable predictions are made, with the added value of theoretically reducing over-fitting and improving the model generalizability. The tool DEcRyPT was built with curated and transformed bioactivity data collected from ChEMBL v.22 [56] and the CATS2 descriptors [70]. Applying DEcRyPT to β-lapachone (28) (Fig. 12), originally isolated from the heartwood of the South American Lapacho tree (Tabebuia avellanedae), 5-LO emerged as a potential target of interest. Using multiple cell-free assays, the authors confirmed that 28 must be converted to its hydroquinone form, which acts as a nanomolar inhibitor of 5-LO (IC50 = 240 nM). Importantly, the authors ruled out unspecific inhibition by colloidal aggregates [83] by confirming that 5-LO was inhibited regardless of the presence or absence of Triton X-100. Moreover, inhibition of 5-LO was comparable in cell-free and whole-cell assays, confirming the absence of permeability issues that could hamper further exploratory work on this chemotype. Unexpectedly, compound 28 displayed selectivity for 5-LO over its congeners 15- and 12-LO, which suggests binding to an allosteric site (Fig. 12). Follow-up of these preliminary findings showed that compound 28 indeed does not compete with the natural 5-LO ligand, arachidonic acid, nor is it a general metal chelator. Conversely, the potency of 28 is reduced significantly in competition assays with phosphatidylcholine, which binds at the interface of the catalytic and C2-like domains. Finally, inhibition of 5-LO by the hydroquinone form of 28 (Fig. 12) could be correlated with the antitumor effects, as cells overexpressing 5-LO are more sensitive to the natural product.
Fig. 12

Mechanism of anticancer activity of β-lapachone (28). (a) Natural product 28 is converted in the intracellular compartment to the corresponding hydroquinone, which is a potent, reversible, allosteric inhibitor of 5-lipoxygenase (5-LO). (b) Differentiated HL-60 cells overexpress 5-LO (left) and are more sensitive to 28 (middle and right). IC50 (differentiated) = 0.18 μM; IC50 (control) = 0.39 μM (middle). Percentage of live HL-60 cells in the differentiated and control groups when treated with 0.5 μM of 28 (right). ∗∗p < 0.005 (two-tailed Student's t-test)
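The ensemble idea behind random forest models, as outlined above, can be illustrated with a deliberately simplified sketch: many weak regressors, each fitted on a bootstrap sample and a random feature subset, are averaged into one prediction. This is not the DEcRyPT model. As assumptions for illustration, each "tree" here is a single-split stump, features are sampled once per tree rather than at each split, and the function names and toy data are hypothetical.

```python
import random
from statistics import mean


def fit_stump(X, y, features):
    """Fit a one-split regression tree (a deliberately weak estimator)."""
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            ml, mr = mean(left), mean(right)
            sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr, ml, mr)
    if best is None:  # no split possible: predict the sample mean everywhere
        m = mean(y)
        return (features[0], float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    f, thr, ml, mr = stump
    return ml if row[f] <= thr else mr


def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a feature subset."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feats = rng.sample(range(d), n_features)    # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Ensemble prediction: the average over all weak estimators."""
    return mean(predict_stump(s, row) for s in forest)


# Toy data: feature 0 tracks the (hypothetical) affinity value, feature 1 is noise.
X = [[0.0, 1.0], [0.1, 0.0], [0.9, 1.0], [1.0, 0.0]]
y = [5.0, 5.0, 8.0, 8.0]
forest = fit_forest(X, y, n_trees=100)
print(predict_forest(forest, [0.05, 0.5]), predict_forest(forest, [0.95, 0.5]))
```

Individual stumps are often uninformative, for instance when they are given only the noise feature, yet the averaged prediction still separates the two groups. This is the sense in which the ensemble is more robust than any single tree.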

In another application of DEcRyPT, secondary pharmacology was unveiled for DMP-1 (29), a synthetic analogue of militarinone A (30) (Fig. 13), which was isolated from the mycelium of the entomogenous fungus Paecilomyces militaris [84]. The tool DEcRyPT predicted potent modulation of the cannabinoid receptor 1 (CB1) by 29 (predicted affinity of 0.16 μM) with high confidence, which was confirmed experimentally by determining functional antagonism with a potency of 0.32 μM, and displacement of a radiolabeled ligand with a Ki value of 3.2 μM [84]. Again, the identification of a transmembrane protein was facilitated by machine intelligence, which could hardly have been achieved through chemical proteomics.
Fig. 13

Structure of militarinone A (30) and its derivative DMP-1 (29)

6 Outlook

Target identification and deconvolution of phenotypic readouts are important steps in early discovery programs. While this is a challenging task for synthetic small molecules, the difficulty is typically magnified for natural products, given their poorer synthetic accessibility and troublesome derivatization. However, with such knowledge in hand, developing bioactive natural products and designing analogues may be facilitated and assisted by state-of-the-art computational technologies. In this contribution, different in silico methods have been discussed that can help unveil the pharmacology of natural products and, in a broader sense, of any small molecule of interest, by generating motivated research hypotheses for confirmation in biochemistry laboratories.

There is no universal best method, and both 3D and 2D approaches can be deployed efficiently by keeping in mind their caveats and limitations. Any computational method is certain to fail occasionally, even when properly employed, and more often when applied outside its domain of applicability. However, there is compelling evidence that the accuracy and scope of computational methods are improving considerably. This offers great prospects for more successful case studies in the deconvolution of modes of action and biochemical liabilities of natural products. Much of the current enthusiasm is spearheaded by the emergence of big data, faster computers, and more efficient algorithms for pattern recognition, which parallels the need for sustainable drug discovery. Machine learning is primed to analyze large volumes of data; these algorithms will equally benefit from high-quality negative data, which historically have tended to be neglected. With the rise of digital chemistry, it is expected that laborious tasks such as target identification will be increasingly automated, thus opening new avenues for probabilistic drug discovery.

Acknowledgments

Tiago Rodrigues is a Marie Skłodowska-Curie Fellow (Grant 743640) and acknowledges the FCT/FEDER (02/SAICT/2017, Grant 28333) for funding.