1 Introduction
Pharmaceutical drug discovery is a lengthy and costly process, typically spanning 12 to 15 years and costing about 1–2 billion US dollars [1]. The process of identifying new active pharmaceutical ingredients (API) [2] starts with target identification and validation and proceeds through hit identification, lead discovery and lead optimization to deliver safe and effective new drug molecules at the preclinical stage [3]. Biological screening, the first step in drug discovery, is used to identify possible targets of a hit molecule as a developable drug candidate. Advances in systematic biological screening have produced automated parallel screening technologies, called high throughput screening (HTS) [4]. Virtual screening (VS) is a widely applied computational approach performed as a hit identification method in the early stages of the drug discovery pipeline. VS protocols involve searching chemical libraries to identify hit compounds with a putative affinity for a specific biological target (an enzyme or a receptor) for further development.
CADD methods in conjunction with VS studies have emerged as valuable tools to speed up this long process and to limit the expansion of R&D costs. Such studies demand a strong combination of computational resources and skills, biochemical understanding and medicinal motivation.

Drug discovery pipeline presented together with some of the computational approaches which are used to rationalize the process.
CADD implementations deal with computational calculations of both pharmacokinetic and pharmacodynamic parameters. Therefore, absorption, distribution, metabolism and excretion (ADME) and even toxicity (Tox) properties of a given compound can be predicted with computational chemistry programs prior to any experimental studies. Furthermore, in silico approaches can also be applied to determine the putative interactions between a ligand and a receptor (Fig. 1).
In this chapter, we briefly address challenges and applications of biochemical and computational drug discovery and development approaches, and their transformation into VS applications on cloud platforms.
2 Background
2.1 Drug Discovery and Development
Until the 19th century, the drug discovery and development process was based on trial-and-error learning for diagnosing and curing diseases. Therapeutic effects were produced entirely with natural products (NPs) [6]. These drugs were obtained from the whole or a fraction of the NPs containing the active pharmaceutical ingredient [7]. Natural compounds remain an important source of drugs, helped by improved sensitivity and better means for their biochemical separation [8].
Starting with the first successful in vitro organic synthesis in the laboratory by Wöhler [9], it became clear that organic compounds could be produced outside the bodies of living organisms. However, to synthesize chemically diverse organic compounds, structural information had to be described in a more efficient way. In 1858, Kekulé proposed structural theories, which were successfully followed by further theories from other scientists [10], leading to new discoveries [11].
Research Paradigms in Classical Terms of Drug Discovery
Magic Bullet.

Summary of the biological understanding after the “magic bullet” approach.
Research Paradigms in Modern Terms of Drug Discovery

Common diseases are polygenic, and this is one of the biggest challenges in the field.
Magic Shotgun.

A single compound may affect multiple targets. Such side effects may be beneficial or undesired.
Several in silico approaches have been developed to predict side-effects and to repurpose drugs that are already on the market.
Individualized Medicine.
The genetic polymorphisms of drug targets, metabolizing enzymes or transporters may explain differences in the molecular pathophysiology of patients, even among those assigned the same diagnosis [17]. The aim is to find drugs that compensate genetic deficiencies, i.e. to fight the disease at its roots. Today, with the transcriptome and the genetic parameters obtained from a patient's tissue or blood, one can be informed about their contributions to disorders [18]. For instance, pharmacogenetics investigates how a genetic variation affects the binding site of a drug; a disturbed molecular recognition may suggest a higher dosage.
Target Fishing.
It can be regarded as inverse screening, wherein a ligand is profiled against a wide array of biological targets to elucidate its molecular mechanism of action by experimental or computational means.
Drug Repositioning.
An approach to identify new medicinal applications of approved drugs for treating other diseases, since the drugs may also bind to other receptors.
Polypharmacology.
A ligand design approach aiming to exert an effect on multiple disease-associated biological targets.
2.2 Molecular Recognition Theories

Lock and key model.
Induced Fit.

Induced fit model.
Thermodynamics in Ligand-Protein Binding.

Desolvation effect and thermodynamics in ligand-protein binding.

ΔG = RT ln Kd ≈ RT ln Ki
In this equation, ΔG is the binding free energy, R is the gas constant, T is the absolute temperature, Kd is the equilibrium dissociation constant and Ki is the inhibition constant that appears in the Cheng-Prusoff equation [40].
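The free energy relation described here can be evaluated numerically. A minimal Python sketch (the gas constant is standard; the nanomolar Kd is an illustrative value):

```python
import math

def binding_free_energy(kd, temperature=298.15):
    """Standard binding free energy dG = RT ln(Kd), in kJ/mol.

    kd is the dissociation constant in mol/L; a lower Kd
    (tighter binding) gives a more negative dG.
    """
    R = 8.314  # gas constant, J/(mol*K)
    return R * temperature * math.log(kd) / 1000.0  # convert J to kJ

# A nanomolar binder: Kd = 1e-9 M
dg = binding_free_energy(1e-9)
print(round(dg, 1))  # -51.4 kJ/mol at 298.15 K
```

A micromolar binder gives a correspondingly less negative value, which is the thermodynamic footing for ranking ligands by affinity.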
2.3 Chemical Space
In 1996, Regine Bohacek generated predictions about the chemical compound types that might be chemically accessible. Her estimation pointed to 10^60 chemical compounds making up a "chemical space" that was identified virtually by using carbon, oxygen or nitrogen atoms and by considering linear molecules with up to 30 atoms. While making the predictions, Bohacek considered only chemical groups regarded as stable, and took chemical branches, ring structures and stereochemical possibilities into account [22].
In later studies, the limits of the chemical space were estimated to lie between 10^18 and 10^200, according to analyses with different methods and descriptors. Although many reports accept the range as Bohacek defined it, this number is expected to increase continuously with the discovery of new chemical skeletons [23–26]. Additionally, the number of organic compounds accessed experimentally is about 10^8, according to the CAS and Beilstein databases, which contain records from scientific papers published by the scientific community since 1771 [27, 28].
2.4 Rational Drug Design
The increasing scientific knowledge for novel drug discovery has opened new horizons and generated useful technological developments for researchers in the field. When these new tools are wisely combined with recent knowledge, they provide many advantages in drug design and development studies. Moreover, the available theoretical and experimental knowledge about drug safety, and the requirement of suitability for human use, create extra difficulties for drug design and development [29]. Not all candidate molecules with high potency can reach drug status, for reasons such as inefficient systemic exposure, unwanted side effects and off-target effects. Also, a drug may not be right for every patient due to genetic variations and off-target binding. This also affects drugs that are already on the market (Table 1).
However, molecular reasoning may give a second chance to drugs that once failed in late clinical studies (at great expense) or that have been retracted from clinical use. With an improved molecular understanding, and with hindsight from the now feasible pharmacogenetics and -genomics, these compounds have a chance to find their niche for a re-entry.
Definition of the key terms related to informatic approaches used in early-stage drug discovery.

3 Computer Aided Drug Design (CADD)
The development of mathematical formulas to calculate the potential and kinetic energies of biomolecular systems has made it possible to implement such complex calculations with computers [34]. CADD is applicable to hit molecule discovery for new chemotypes and to the design of new derivatives.
CADD processes may be divided into molecular mechanical and quantum mechanical methods. In both techniques, the results are obtained through energy-based calculations. Molecular mechanics deals with calculations at the molecular level that can be performed on an atomic basis, while quantum mechanics involves electron-related complex calculations performed at the quantum level [34].
Given the uncertainties in drug discovery studies, it is hard to reach the desired target. Physicochemical parameters can be helpful here, as they determine the ADME properties [30, 35]. Moreover, the drug candidate should bind its target with high affinity [30]. Accordingly, drug design processes are carried out within the framework of selected strategies, based on acquiring the three-dimensional bioactive conformations of the ligands. CADD is used for identifying and designing biologically active compounds, and the field integrates synergistically with related medicinal chemistry disciplines such as pharmacology, biology and chemistry [36].
3.1 Ligand-Based Drug Discovery

Basic workflow of ligand-based modeling applications.
Quantitative Structure Activity/Property Relationships (QSAR/QSPR).

Throughout history, many researchers have conducted studies relating the physicochemical properties and biological effects of compounds with different approaches. Through these studies, the necessity to consider molecular substitutions emerged in order to explain the biological effect or physicochemical property of a chemical structure. In 1937, Louis Hammett compared ionization rates of benzene derivatives substituted at various positions [40]. The first quantitative parameter was sigma (σ), the electronic substituent constant, defined by calculating electronic contribution values [38]. Then the first steric parameter, the Es constant for ester hydrolysis, was determined by Taft [41]. Many other two-dimensional parameters have since been developed for use in QSAR studies [38]. Later, Corwin Hansch presented a multi-parametric formula that brought these parameters together. The formula mathematically relates the biological activity, expressed as the logarithm of the minimal effective concentration, to several independent factors, such as the partition coefficient (log P), the aromatic substituent constant (π), the electronic substituent constant (σ) and the steric parameter (Es). Free-Wilson [42], Fujita-Ban [43], the mixed approach based on Hansch and Free-Wilson analysis [44] and the Kubinyi bilinear analysis method [45] are among the first models used in QSAR analysis.
When all the steps are evaluated, it can be observed that QSAR has different applications. Structure-activity or physicochemical property studies performed with one independent variable are named 0D-QSAR calculations, and studies with multivariate equations are called 1D-QSAR. In such equations, log P and the Hammett constant can be used as independent variables. Studies that take into account molecular descriptors and fingerprints containing structural and bonding information derived from the two-dimensional molecular representation are called 2D-QSAR; if extra structural information (e.g. chirality) is included, the studies are named 2.5D-QSAR [46].
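As an illustration of a one-variable QSAR equation of this kind, the following Python sketch fits log(1/C) against log P by ordinary least squares. The data points are synthetic; real studies add further descriptors (σ, π, Es) and proper validation:

```python
def fit_1d_qsar(logp_values, activities):
    """Ordinary least-squares fit of log(1/C) = a*logP + b.

    A minimal one-variable (Hansch-style) QSAR regression:
    slope a and intercept b from the closed-form solution.
    """
    n = len(logp_values)
    mean_x = sum(logp_values) / n
    mean_y = sum(activities) / n
    sxx = sum((x - mean_x) ** 2 for x in logp_values)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(logp_values, activities))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# Illustrative (synthetic) data: log P vs. log(1/C)
logp = [1.0, 2.0, 3.0, 4.0]
act = [4.1, 5.0, 6.1, 6.9]
slope, intercept = fit_1d_qsar(logp, act)
print(slope, intercept)
```

Adding more independent variables turns this into a multi-parametric Hansch-type model, solved by multiple linear regression instead of the closed form above.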
Summary of some of the 3D-QSAR approaches.

Pharmacophore Modeling.
A pharmacophore aggregates the functional groups of chemical compounds that are responsible for the biological response and exhibit the appropriate interactions with the biological target. The term pharmacophore modeling refers to the identification and 3D display of important pharmacophore groups to illustrate the basic interactions between the ligand and the receptor. Pharmacophore modeling is generally applied to determine common structural features within a series of similar or diverse molecules by extracting 3D feature maps. Once determined, the generated pharmacophore hypotheses may be used for virtual screening and to predict the biological activity of other molecules [37]. The model is generally obtained from information belonging to the ligands. However, it is also possible to generate a pharmacophore model from the receptor itself, or with a combined approach.
After generating possible bioactive conformers of the compounds, a pharmacophore model can be built in 3D by aligning the structures and mapping the 3D binding properties. More than one pharmacophore hypothesis can be generated, and the most suitable one(s) can be identified by the enrichment factor within their applicability domain. While generating the model(s), the optimum volume of each pharmacophore feature is of major interest.
The aim of pharmacophore modeling is to determine the optimum volume for the identified features. If the identified volumes are larger than required, the selectivity of the model for active compounds decreases, and active and inactive compounds are retrieved together. Conversely, if the volumes are smaller than they need to be, active compounds cannot be identified by pharmacophore screening. While creating a pharmacophore model, hydrogen bond acceptor (HBA) and hydrogen bond donor (HBD) features, hydrophobic (aliphatic or aromatic) properties and negative/positive charges can be used. Besides, it is possible to define desirable/undesirable regions or features without any specificity [37]. Most pharmacophore modeling programs create these features according to optimized ligand-receptor interactions. The ideas behind pharmacophores are also applied to specify new compounds that combine as many pharmacophore features as possible, e.g. from a series of in silico ligand docking experiments or after a first in vitro validation. The assembly of new compounds can be iterative, i.e. growing from existing binders, or can start as a de novo design of novel drugs.
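The tolerance-volume idea can be sketched in code. The following Python fragment checks whether a ligand conformer satisfies a pharmacophore hypothesis, with each feature modeled as a sphere of a given tolerance radius; the data layout and feature labels are hypothetical, not taken from any specific modeling package:

```python
import math

def matches_pharmacophore(conformer, hypothesis):
    """Check whether a conformer satisfies a pharmacophore hypothesis.

    conformer:  list of (feature_type, (x, y, z)) tuples for the ligand
    hypothesis: list of (feature_type, (x, y, z), tolerance_radius)
    Every hypothesis feature must be met by at least one ligand
    feature of the same type inside its tolerance sphere.
    """
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    return all(
        any(ftype == htype and dist(fpos, hpos) <= radius
            for ftype, fpos in conformer)
        for htype, hpos, radius in hypothesis
    )

# Two-feature hypothesis: an HBA and an aromatic centre (toy coordinates)
hypothesis = [("HBA", (0.0, 0.0, 0.0), 1.0),
              ("AR",  (4.0, 0.0, 0.0), 1.5)]
ligand = [("HBA", (0.3, 0.2, 0.0)), ("AR", (4.8, 0.5, 0.0))]
print(matches_pharmacophore(ligand, hypothesis))  # True
```

Shrinking a tolerance radius makes the hypothesis stricter, which is exactly the selectivity trade-off described above.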
Machine Learning (ML).
Relations between ligand/receptor structural properties and biochemical effects are captured by statistical models. With the advent of computational chemistry and advances in structural biology, an increasing number of features may be identified in silico for any given ligand and receptor, which may affect virtual screening. The integration of all these data sources and tools allows the formulation of many models.
ML covers the computer-aided development of models (or rules) and their evaluation. The use of ML techniques for drug discovery has been increasing recently both in ligand-based and structure-based studies to find rules from a set of molecular features and to predict a specific property [49]. Estimation of ADME-Tox properties by the use of physicochemical descriptors, generation of hit or lead molecules with the studies on prediction of biological activity, development of homology models, determination of bioactive conformation by the help of docking studies or pharmacophore modeling are some of the examples of ML applications in drug discovery [49].
ML can be applied as regression or classification models [49]. In regression, quantitative models are formed automatically, and the statistically most appropriate model is selected from those generated. Classifiers use such models to cluster the data; the learning process is carried out on known characteristics of the molecules to predict their activities/properties. In classification, branches are formed on a classification tree, with biological and chemical data placed on the leaves; the tree can be generated and used for various statistical decision-making scenarios. Artificial neural networks, support vector machines, decision trees and random forests are among the most applied ML techniques in drug discovery studies [49].
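As a toy illustration of such a classifier, the sketch below labels a compound from descriptor vectors with a k-nearest-neighbours vote. This is deliberately simpler than the techniques named above, and the descriptors, data and choice of k are illustrative only:

```python
import math
from collections import Counter

def knn_classify(query, training, k=3):
    """Classify a compound from descriptor vectors with k-NN.

    training: list of (descriptor_vector, label) pairs, where the
    labels mark e.g. "active" / "inactive" compounds. The query is
    assigned the majority label of its k nearest neighbours.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    neighbours = sorted(training, key=lambda item: dist(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Toy descriptors: (logP, molecular weight / 100)
train = [((1.2, 2.5), "active"), ((1.0, 2.8), "active"),
         ((4.5, 4.9), "inactive"), ((5.0, 5.2), "inactive")]
print(knn_classify((1.1, 2.6), train))  # active
```

Real studies replace the toy descriptors with validated descriptor sets or fingerprints and evaluate the model on held-out data.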
3.2 Structure-Based Drug Design

Obtaining the 3D protein structure.
Homology Modeling.
Homology modeling techniques are used to predict 3D representations of biomolecular structures. The model is generated from the sequence of monomers (nucleotides or amino acids). The applied algorithm transfers spatial arrangements from high-resolution crystal structures of phylogenetically sequence-similar biological structures [58].
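Template choice usually starts from sequence similarity. A crude first step can be sketched as a percent-identity calculation over a pre-aligned pair of sequences; the sequences below are made up, and real pipelines perform an alignment first and use more elaborate scoring:

```python
def sequence_identity(seq_a, seq_b):
    """Percent identity between two pre-aligned, equal-length sequences.

    A rough template-selection heuristic: templates well above ~30%
    identity to the query are commonly considered usable for homology
    modeling (gap-free alignment assumed here for simplicity).
    """
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)

query    = "MKTAYIAKQR"
template = "MKSAYIVKQR"
print(sequence_identity(query, template))  # 80.0
```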

Basic workflow for homology model generation.
Docking.

Basic workflow for docking simulations.
Docking concepts, and their limitations.
Technology | Limitations |
---|---|
Rigid Docking. The receptor structure is treated as a rigid body and the ligands are prepared with conformational sample(s), then fitted into the active site of the protein | The overall protein structure and its active site residues are flexible; this affects the binding orientations of the ligands in docking results |
Induced-fit Docking. Flexibility of the active site residues is considered for the protein, and these residues adapt flexibly to accommodate the ligand | It is a computationally expensive approach that requires careful arrangement of the docking parameters and cutoff values prior to the docking simulation |
Covalent Docking. The region of the receptor to which ligands bind covalently is identified, and the docking procedure is performed under this specific constraint | This method may generate chemically wrong binding patterns and incorrect binding poses, but lowers the computational search costs of docking |
Peptide Docking. Peptide docking is used to determine the binding modes of peptide structures in the active site of their biological target | Peptides, or fragments of proteins, are large and flexible ligands and hence difficult to parameterize and computationally expensive compared to small molecules |
Protein-Protein Docking. The general name of docking methods used to predict protein-protein or protein-DNA interactions that take place biologically | This is the most computationally demanding approach to simulate the interactions, due to the size and complexity of the macromolecules |
Reverse Docking. It can be applied for target fishing of drugs or active substances, or to become aware of side effects | Conceptually, this approach requires a catalog of configurations for screening many targets by docking [66] |
Molecular Dynamics (MD).

General workflow for MD simulations.
4 Application of Virtual Screening

General VS workflow.
4.1 Accessing Compound Databases
Compound databases.
Database/URL | Comment |
---|---|
BindingDB [69] www.bindingdb.org | Around 1.5 million binding affinity data points are available which are obtained from patents and scientific articles |
ChEMBL [70] www.ebi.ac.uk/chembl | More than 15 million biological activity results are available, curated from scientific papers and patents |
ChemSpider [71] www.chemspider.com | Contains over 67 million structure records, obtained from different data sources |
DrugBank [72–76] www.drugbank.ca | Database of drug entries linked to their biological targets |
DrugPort www.ebi.ac.uk/thornton-srv/databases/drugport | Structural representation of drugs with their receptors with data from DrugBank and PDB |
eMolecules www.emolecules.com | Database comprises over 7 million screening compounds and 1.5 million building blocks, which are ready to order |
Molport www.molport.com | Comprises around 20 million ready-to-be-synthesized molecules, over 7 million screening compounds and around 0.5 million building blocks, which are ready to order |
MMsINC [77] mms.dsfarm.unipd.it/MMsINC/search | Allows searching subsets of the database, containing over 4 million unique compounds with tautomeric and ionic states at physiological conditions, as well as a possible stable conformer for each molecular entry |
Open PHACTS [78] www.openphacts.org | Large knowledge-base of data integration for compounds, their receptors, and pathways |
PubChem [79] pubchem.ncbi.nlm.nih.gov | Contains around 100 million compounds and 250 million bioactivity data records |
ZINC [80] zinc.docking.org | Contains around 740 million purchasable molecules in 2D and 220 million purchasable lead-like molecules in 3D |
4.2 Preparing Ligands for Virtual Screening
It is important to assign the correct stereoisomeric or tautomeric forms and protonation states of each ligand at a specified pH, to avoid misrepresenting the physicochemical and conformational behavior of the molecules. Parameterization at this stage must be done very carefully for a VS study, because the molecules are generally stored in 1D or 2D within the databases. For example, a chiral entry within a database may represent a single enantiomer or a racemic mixture, and a molecule may not be neutral as stored, adopting different protonation states at different pH values.
Conformational flexibility of ligands is also important and computationally expensive. Every additional rotatable bond increases the number of conformations that need to be generated, resulting in a computationally more expensive VS process.
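This combinatorial growth can be made concrete: if each rotatable bond is sampled at k torsion angles, n bonds give up to k^n conformers. A small sketch (the three-angles-per-bond sampling is an illustrative choice):

```python
def conformer_count(rotatable_bonds, samples_per_bond=3):
    """Upper bound on conformers when each rotatable bond is
    sampled at a fixed number of torsion angles: k ** n.
    (Real generators prune clashing and duplicate conformers.)
    """
    return samples_per_bond ** rotatable_bonds

for n in (2, 5, 10):
    print(n, conformer_count(n))
```

With three angles per bond, ten rotatable bonds already imply up to 59,049 conformers per molecule, which is why flexible ligands dominate VS compute budgets.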
4.3 Compound Filtering
The main idea of compound filtering is to exclude molecules that do not carry suitable pharmacokinetic or pharmacodynamic properties. The aim is to prevent possible risks in the preclinical or clinical phases of the studies. This also helps to keep down computational costs in VS studies.
Clustering by Chemical Diversity.
It is possible to estimate the chemical similarities of compounds by computer-based calculations. This allows the identification of similar or diverse subsets of the libraries. Many different techniques are used for this purpose. The basic approach is to use similarity algorithms based on mathematical parameters obtained from chemical descriptors [81, 82]. Such calculations are generally done with molecular fingerprints, which encode the structure of a molecule, usually as binary digits symbolizing the presence or absence of a chemical substructure, allowing similarity searches in a chemical database. The most common similarity measure for such classification studies is the Tanimoto coefficient [82].
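A minimal Tanimoto implementation over fingerprints stored as sets of "on" bit positions might look as follows; the toy fingerprints are illustrative, and real fingerprints come from cheminformatics toolkits:

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two binary fingerprints.

    fp_a, fp_b: sets of "on" bit positions (a compact way to store
    a binary fingerprint). T = |A & B| / |A | B|, ranging 0..1.
    """
    if not fp_a and not fp_b:
        return 1.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

# Toy fingerprints as sets of on-bits
mol1 = {1, 4, 7, 9, 12}
mol2 = {1, 4, 7, 15}
print(round(tanimoto(mol1, mol2), 2))  # 0.5
```

Clustering then follows by grouping compounds whose pairwise Tanimoto value exceeds a chosen threshold, or by picking maximally diverse representatives below it.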
Filtering with Physicochemical Parameters.
One of the major examples of filtering by pharmacokinetics-related approaches is based on the scientific analysis of marketed drugs. Most drugs on the market are taken orally, and a generally applicable rule has been found to determine whether a biologically active compound has physicochemical properties that would make it an orally administrable drug in humans. It was formulated in 1997 by Christopher Lipinski [83, 84] and is called Lipinski's rule of five (RO5). It is designed to evaluate the drug-likeness of a compound in drug discovery and development studies. The RO5 criteria are: a molecular weight of less than 500 Dalton, an octanol-water partition coefficient (log P) of less than 5, no more than 5 HBD groups and no more than 10 HBA groups. Drug candidates that fulfil the RO5 tend to have more success in clinical trials and thus a better chance of reaching the market. Such approaches narrow a vast, unbounded chemical space down to a drug-like, biologically relevant subspace with a better pharmacokinetic profile.
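An RO5 filter is straightforward to sketch. The dictionary field names below are illustrative, not from any specific toolkit; following common practice with Lipinski's rule, the sketch accepts compounds with at most one violation:

```python
def passes_ro5(mol):
    """Lipinski rule-of-five filter.

    mol: dict with keys 'mw' (Da), 'logp', 'hbd', 'hba'.
    Counts rule violations and, as commonly practised, accepts
    a compound with at most one violation.
    """
    violations = sum([
        mol["mw"] > 500,
        mol["logp"] > 5,
        mol["hbd"] > 5,
        mol["hba"] > 10,
    ])
    return violations <= 1

aspirin_like = {"mw": 180.2, "logp": 1.2, "hbd": 1, "hba": 4}
greasy_giant = {"mw": 720.0, "logp": 7.5, "hbd": 6, "hba": 12}
print(passes_ro5(aspirin_like), passes_ro5(greasy_giant))  # True False
```

Applied before docking, such a filter can discard a large fraction of a library at negligible cost.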
Filtering with Ligand-Based or Receptor-Based Screening of Chemical Libraries.
Ligand-based approaches focus on model generation from pre-established binding affinity data of small molecules against biological targets. These approaches use calculated descriptors to predict the characteristics of molecules within a database. In contrast, structure-based approaches do not necessarily rely on existing data; they place the molecules in the binding site of the target and evaluate their potential affinity by binding scores within the complex biomolecular structures. Such computationally expensive calculations are preferably run on distributed compute platforms to speed up the in silico screening process.
Ties with Pre-clinical Research.
The search for new biologically relevant compounds is not possible without laboratory results in which theories are confirmed, refined or disproved. Considerable time and effort need to be invested in the development of such tests. Eventually, the findings need to be transferred to human tissues.
VS may support the selection of ligands for testing once a drug target is identified. This process is long and demands deep insights into the molecular understanding of the pathology of a disease. The selection of a target is likely to be supported by genetic studies, i.e. an observed association of DNA variation with disease phenotypes hints at the gene to target. Knock-down/out experiments are performed in the laboratory to confirm the molecular mechanisms that trigger the disease.
Many researchers on rare diseases do not have industrial partners, and no drugs may be marketed for such "orphan" diseases. There are many rare diseases and consequently many patients. To find new drugs, the availability of in silico approaches may be significant guidance for researchers, e.g. for repositioning drugs already on the market. The same approach can be applied to identify the biological targets of traditional medicines obtained from NPs [6].
Ties with Individualized Medicine.
Individualized medicine identifies genetic variations that are causative for a disorder in every patient and adjusts therapy accordingly. Knowing how a compound binds to its receptor yields a detailed molecular understanding of how the drug works.
Such insights are of interest whenever a genetic variation may affect the binding of a drug to its receptor. Some drugs address cells with a high mutation rate, like a virus or a tumor cell. Mutations appear at random, but when the pathogen/cell benefits from one because of a lower binding affinity to the drug, it will continue to divide and pass that genetic information on. Thus, the pathogen or tumor evades the therapy. For that reason, the FightAIDS@Home project has searched for compounds with strong binding affinities to many sequence variations of the viral protein target [85].
Today, cancer patients are monitored by DNA sequencing to follow the changes in relative frequencies of genetic variations in the diseased tissue [86]. The adoption of next-generation sequencing technologies is a recent development. It will indicate the frequency of "escape routes" that may be closed with new drugs for the same target, possibly in the same binding pocket. This data can be interpreted with the same modeling technology that is at the root of in silico screening. Without a molecular interpretation, clinicians would need to wait for insights from observations of how nuclear variation and subsequent therapy decisions affect the clinical outcome. Molecular reasoning, in contrast, is possible for every patient individually.
Automation.
Several tools are available to prepare receptors and ligands. A glue language can be used to integrate the different processes of such calculations [87]. In addition, there are workflow environments for CADD in particular, like OPAL [88] or KNIME [89]. Users can easily develop any CADD workflow integration that needs to be automated. Recent works integrate formal descriptions in databases (like bio.tools) [90] for content-dependent, semantics-driven execution of the tools [91].
4.4 Experimental Evaluation and Validation
- (a) Test of binding affinities (calorimetry, SPR)
- (b) Induced structural changes (co-crystallization, NMR)
- (c) Gene/Protein-function inhibition/induction (cell line, tissue sample)
- (d) Effect on disease models (animal trials)
These tests are expensive and may cost the lives of animals. If the wrong compounds are chosen for testing, the treatment of humans is delayed. Compounds selected for testing should therefore be diverse with respect to their binding mode and chemical structure.
5 Infrastructures and Computational Needs
Virtual screening is a data-parallel process. The throughput of ligands tested in the computer scales almost linearly with the number of contributing processors. A second concern is the number of applications contributing to the screening itself. High-performance computing (HPC) infrastructure is needed to speed up in silico screening studies. In addition, access to stronger HPC facilities has a direct impact on covering a bigger chemical and conformational space of compounds in VS studies, or on testing higher numbers of druggable targets to identify the biomolecular targets of compounds in target fishing studies.
When applied to thousands or millions of chemical compounds, the total VS simulation time exceeds the limits of single workstations. For example, a molecular docking-based VS run covering a chemical space of millions of compounds may take years of computation time on a single workstation. The overall compute time is fixed, even with multiple machines contributing; but by distributing the load, the effective wall-clock time can be lowered. The same VS run can be brought to the scale of months by accessing hundreds of central processing unit (CPU) cores in an in-house small or medium scale HPC infrastructure, or to the scale of hours or days by accessing thousands of CPU cores on supercomputers or on-demand cloud resources. Recent technical advances employing graphics processing units (GPUs) for parallel computing have fostered data-parallel applications such as deep learning or molecular dynamics.
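The scaling argument can be captured in a back-of-the-envelope calculation; the per-ligand cost of one CPU-minute is an assumed figure for illustration:

```python
def wall_clock_days(n_ligands, minutes_per_ligand, n_cores):
    """Idealised wall-clock time for an embarrassingly parallel VS run:
    total CPU time divided by core count (ignores scheduling overhead
    and load imbalance).
    """
    total_minutes = n_ligands * minutes_per_ligand
    return total_minutes / n_cores / 60.0 / 24.0

# 2 million ligands at an assumed 1 CPU-minute each:
for cores in (1, 200, 20000):
    print(cores, round(wall_clock_days(2_000_000, 1.0, cores), 2))
# 1 core ~ 1388.89 days, 200 cores ~ 6.94 days, 20000 cores ~ 0.07 days
```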
We here discuss alternative sources for computational infrastructures and single out unique properties of the cloud environments.
5.1 Local Compute Cluster
Many computers are integrated into a local network and combined with grid software to distribute compute jobs. The concept of a batch system is familiar to the IT specialists who maintain such hardware. Larger clusters with special hardware allowing fast I/O are referred to as HPC environments. VS is a mostly compute-intensive, data-parallel computation whose individual compute nodes do not need to communicate with each other during the computations.
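This data-parallel character can be sketched with Python's standard thread pool; the scoring function below is a hypothetical placeholder for a real docking engine invocation, and on a real cluster each item would become a separate batch job:

```python
from concurrent.futures import ThreadPoolExecutor

def score_ligand(smiles):
    """Placeholder scoring function; a real cluster job would call
    a docking engine here. (Hypothetical score for illustration.)
    """
    return round(len(smiles) * 0.1, 2)  # stand-in for a docking score

ligands = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"]

# Each ligand is independent, so scoring is trivially data parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    scores = list(pool.map(score_ligand, ligands))
print(scores)  # [0.3, 0.8, 2.1]
```

Because the tasks never exchange data, the same pattern maps directly onto a batch system: one job per ligand (or per chunk of ligands), with results merged afterwards.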
5.2 Volunteer Computing
Volunteer computing refers to the concept of having a program run in the background to address a scientific problem. If several thousand individuals contribute, one has a compute power or storage resource that can compete with large computational clusters. It should be noted that one needs to interact with the community to keep up the momentum.
Several in silico screening runs have been performed with volunteer computing. One of the best known was the search for an HIV protease inhibitor in FightAIDS@Home [85], built on the underlying BOINC technology [92]. Because of the enormous compute power offered by devices that are not operated by humans, the decentralization of computing is a strategically important route. Tapping these resources for computing, and generally supporting the interaction of devices at the periphery, is called edge computing.
5.3 Cloud Computing
Cloud technologies are a means to integrate different workflows and tools on dynamically allocated resources. These instances share their respective services in the specification of CADD studies, and cloud computing is an emerging solution for VS needs. Cloud solutions started with "IaaS", i.e. dynamically configured and started machines that are paid per use. Today, IaaS services are provided by many compute centers at very competitive price levels. Setups are easy to use for individuals familiar with remote access via a command line interface (CLI) or a graphical user interface (GUI).
Researchers in both academia and industry are likely to have access to a local compute cluster. But this is costly once the expenses of hardware upgrades, electricity and maintenance are added; remote resources are highly competitive in pricing. This is meant for IT-savvy individuals who could use local resources if they had them, and there is technology allowing these remote resources to behave much like an extension of a local cluster.
The best-known cloud service providers are Amazon Web Services [93] and Google Cloud [94], but there is an increasing number of open source cloud solutions that may be of interest. The OpenStack middleware [95] has evolved into a de facto standard for setting up public or private cloud environments. When software is accessed via a web interface, the server-side implementation becomes hidden. A look to the side, e.g. at how payments are performed online through an interplay of many service providers, tells what clouds also mean: an interplay of services that scale.
Classification of cloud services.

5.4 Cloud-Based Solutions
Recent reviews [97, 98] and the "Click2Drug" catalog [99] give a vivid impression of the many different sources of partial solutions, available as semi-automated web services or as instances that may be started at one's own disposal. How exactly information should be transferred between these tools, or how these tools should be run to generate redundant findings and thereby higher confidence, is yet to be clarified, both semantically and technically.
In silico approaches have been integrated into the drug discovery pipeline, and big pharmaceutical companies are likely to perform them routinely. Cloud environments are used for their ability to reduce setup costs for professionals. However, for creative new approaches to tackling a disease, the mere information on which target has been selected is already crucial for not losing a competitive advantage, and such computations may not be acceptable on extramural facilities. For common targets, in contrast, the interception of individual VS results, e.g. from a volunteer computing project, may not be considered critical.
Cloud-based solutions for virtual screening and target fishing. The table lists services that describe features of ligands or their receptors, receptor-based services and ligand-based services; services integrating multiple tools are tagged as workflows. F stands for "Feature", R for "Receptor-based", L for "Ligand-based" and W for "Workflow"; the Properties column marks how many of these categories apply to a service.

| Service | Properties | URL | Comment |
|---|---|---|---|
| 3decision | X X X | | Collaboration environment for researchers to exchange opinions on ligand-receptor interactions |
| AceCloud [100] | X | www.acellera.com/products/acecloud-molecular-dynamics-cloud-computing | Cloud image to run MD simulations with a CLI on the Amazon cloud; can be used for high-throughput MD [101] |
| Achilles [102] | X X | | Blind docking server with web interface |
| BindScope [103] | X X X X | | Structure-based binding prediction tool |
| DINC WebServer [104] | X | | A meta-docking web service for large ligands |
| DOCK Blaster [105] | X | | Automated molecular docking-based VS web service |
| DockingServer [106] | X | | Molecular docking and VS service |
| Evias Cloud Virtual Screening Tool [107] | X X | | Structure-based VS web service integrated with a chemical library management system on a scalable HPC platform |
| HDOCK [108] | X | | Protein-protein and protein-nucleic acid docking server |
| HADDOCK Web Server [109] | X | | Web-based biomolecular structure docking service |
| idock [110] | X | | Flexible docking-based VS tool with web interface |
| iScreen [111] | X X | | A web server that started with docking of traditional Chinese medicine compounds |
| LigandScout Remote [112] | X X X X | | A desktop application providing access to cloud-based HPC for VS studies |
| mCule [113] | X X X X | | Integrates structure-based VS tools with a purchasable chemical space |
| MEDock [114] | X X | | A web server for predicting binding sites and running docking calculations |
| MTiOpenScreen [115] | X X | bioserv.rpbs.univ-paris-diderot.fr/services/MTiOpenScreen | Integration of AutoDock and MTiOpenScreen in the Mobyle bioinformatics environment |
| ParDOCK [116] | X | | Fully automated rigid docking server based on a Monte Carlo search technique |
| | X | | A web server for docking simulations based on geometry and shape complementarity principles |
| Polypharmacology Browser 2 (PPB2) [119] | X | | A web server allowing target prediction for ligands |
| ProBiS [120] | X X X | | A web-based analysis tool for binding site identification |
| ProBiS-CHARMMing [121] | | | In addition to ProBiS, allows energy minimization of ligand-protein complexes |
| Py-CoMFA | X X | | Web-based platform for the generation and validation of 3D-QSAR models |
| SwissDock [122] | X X | | Web-based docking service to predict ligand-protein binding |
| USR-VS [123] | X | | Ligand-based VS web server using shape recognition techniques |
| ZincPharmer [124] | X X | | Online pharmacophore-based VS tool for screening the purchasable subsets of the ZINC and Molport databases |
6 Conclusions
The goal of VS is to identify, within the vast chemical space, novel hit molecules that ameliorate the symptoms of a disease. To this end, the compound interacts with a target protein and changes its action in a pathological condition. None of the in silico molecular modeling techniques can generate perfect models for all kinds of biochemical processes. However, a large variety of tools is provided by the scientific community, and alone or in combination there are VS technologies available that suit the problem at hand. With clouds as HPC resources, complete workflows have been established to directly address the needs of medicinal chemists. The same technology also supports collaborative efforts with computational chemists to adjust workflows to emerging preclinical demands.
Another problem to address is the demand for HPC facilities for VS projects; when workflows come already prepared as a cloud service, this demand turns into an indirect cost. Clouds are attractive both for scaling with computational demands and for the enormous diversity of tools one can integrate. To minimize costs and to complete the computation as quickly as possible, the selection of tools for a VS workflow should be well considered.
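The trade-off between cost and wall-clock time can be estimated before committing to a run. The sketch below uses purely hypothetical numbers (library size, per-ligand docking time, node price) to show the arithmetic; none of them come from a measured benchmark:

```python
# Back-of-the-envelope sizing of a cloud-based VS run.
# All numbers are hypothetical placeholders, not measurements.
n_ligands = 1_000_000        # compounds in the library
sec_per_ligand = 30          # assumed docking time per ligand on one core
cores_per_node = 16
price_per_node_hour = 0.50   # USD, an assumed on-demand price

core_hours = n_ligands * sec_per_ligand / 3600
node_hours = core_hours / cores_per_node

# Renting more nodes shortens the run but leaves the total cost unchanged,
# because the workload is data parallel.
for n_nodes in (10, 100):
    wall_clock_h = node_hours / n_nodes
    cost = node_hours * price_per_node_hour
    print(f"{n_nodes} nodes: ~{wall_clock_h:.0f} h wall clock, ~${cost:.0f}")
```

Under these assumptions, the cost stays the same whether the run takes days on ten nodes or hours on a hundred, which is what makes on-demand scaling attractive for VS.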
VS projects demand the investment of considerable time before the computation, for preparation, and afterwards, for in vitro analyses. For any isolated project it is likely to be more cost-effective to share compute facilities and expertise by using cloud-based solutions.
The authors of this text hope to have helped with the initial team building for interdisciplinary projects: there is a lot of good work to be done, algorithmically or on the preclinical data at hand, but few groups can develop and apply these techniques all on their own.
Acknowledgement
This work was partially supported by the Scientific and Technological Research Council of Turkey (Technology and Innovation Funding Programmes Directorate Grant No. 7141231 and Academic Research Funding Program Directorate Grant No. 112S596) and EU financial support, received through the cHiPSet COST Action IC1406.

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.