DOCKSTRING: easy molecular docking yields better benchmarks for ligand design
- URL: http://arxiv.org/abs/2110.15486v1
- Date: Fri, 29 Oct 2021 01:37:13 GMT
- Title: DOCKSTRING: easy molecular docking yields better benchmarks for ligand design
- Authors: Miguel García-Ortegón, Gregor N. C. Simm, Austin J. Tripp, José Miguel Hernández-Lobato, Andreas Bender and Sergio Bacallado
- Abstract summary: We present DOCKSTRING, a bundle for meaningful and robust comparison of machine learning models consisting of three components.
The Python package implements a robust ligand and target preparation protocol that allows non-experts to obtain meaningful docking scores.
Our dataset is the first to include docking poses, as well as the first of its size that is a full matrix.
- Score: 3.848364262836075
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The field of machine learning for drug discovery is witnessing an explosion
of novel methods. These methods are often benchmarked on simple physicochemical
properties such as solubility or general druglikeness, which can be readily
computed. However, these properties are poor representatives of objective
functions in drug design, mainly because they do not depend on the candidate's
interaction with the target. By contrast, molecular docking is a widely
successful method in drug discovery to estimate binding affinities. However,
docking simulations require a significant amount of domain knowledge to set up
correctly, which hampers adoption. To this end, we present DOCKSTRING, a bundle
for meaningful and robust comparison of ML models consisting of three
components: (1) an open-source Python package for straightforward computation
of docking scores; (2) an extensive dataset of docking scores and poses of more
than 260K ligands for 58 medically-relevant targets; and (3) a set of
pharmaceutically-relevant benchmark tasks including regression, virtual
screening, and de novo design. The Python package implements a robust ligand
and target preparation protocol that allows non-experts to obtain meaningful
docking scores. Our dataset is the first to include docking poses, as well as
the first of its size that is a full matrix, thus facilitating experiments in
multiobjective optimization and transfer learning. Overall, our results
indicate that docking scores are a more appropriate evaluation objective than
simple physicochemical properties, yielding more realistic benchmark tasks and
molecular candidates.
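The abstract lists virtual screening among the benchmark tasks. As a minimal sketch, assuming the common definition of an enrichment factor (not necessarily the exact protocol used in DOCKSTRING), the snippet below measures how well a model's predicted scores recover the best-docking molecules. "Actives" are taken to be the top fraction of ligands by true docking score, where lower (more negative) scores indicate stronger predicted binding, and the enrichment factor is the hit rate among the model's top picks divided by that base rate. The function name `enrichment_factor`, the 1% default, and the synthetic scores are illustrative assumptions, not part of the DOCKSTRING package or dataset.

```python
import random


def enrichment_factor(true_scores, pred_scores, fraction=0.01):
    """Hypothetical helper: enrichment factor at `fraction` (lower score = better).

    Actives are the best `fraction` of molecules by true docking score; the
    model "selects" the best `fraction` by predicted score.
    """
    n = len(true_scores)
    if n == 0 or n != len(pred_scores):
        raise ValueError("expected two non-empty sequences of equal length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, round(fraction * n))  # size of both the active set and the selection

    # Indices of the k lowest (best) scores under each ranking.
    actives = set(sorted(range(n), key=lambda i: true_scores[i])[:k])
    selected = sorted(range(n), key=lambda i: pred_scores[i])[:k]

    hit_rate = sum(1 for i in selected if i in actives) / k
    base_rate = k / n
    return hit_rate / base_rate


# Synthetic scores for illustration only (not DOCKSTRING data).
rng = random.Random(0)
true = [rng.gauss(-8.0, 1.5) for _ in range(1000)]
print(enrichment_factor(true, true))               # perfect ranking -> 100.0
print(enrichment_factor(true, [-s for s in true]))  # inverted ranking -> 0.0
```

A random predictor would score around 1.0 on average, so values well above 1 indicate that the model concentrates the strongest binders at the top of its ranking; the maximum at 1% is 100.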
Related papers
- Dockformer: A transformer-based molecular docking paradigm for large-scale virtual screening [29.947687129449278]
Deep learning algorithms can provide data-driven research and development models to increase the speed of the docking process.
A novel deep learning-based docking approach named Dockformer is introduced in this study.
The experimental results show that Dockformer achieves success rates of 90.53% and 82.71% on the PDBbind core set and PoseBusters benchmarks, respectively.
Date: 2024-11-11T06:25:13Z
- One-step Structure Prediction and Screening for Protein-Ligand Complexes using Multi-Task Geometric Deep Learning [6.605588716386855]
We show that structure prediction and screening of protein-ligand complexes can be accurately tackled with a single model, namely LigPose, based on multi-task geometric deep learning.
LigPose represents the ligand and the protein pair as a graph, with the learning of binding strength and atomic interactions as auxiliary tasks.
Experiments show LigPose achieved state-of-the-art performance on major tasks in drug research.
Date: 2024-08-21T05:53:50Z
- Smiles2Dock: an open large-scale multi-task dataset for ML-based molecular docking [0.0]
We introduce Smiles2Dock, an open large-scale multi-task dataset for molecular docking.
We dock 1.7 million ligands from the ChEMBL database against 15 AlphaFold proteins, giving us more than 25 million protein-ligand binding scores.
Our dataset and code are publicly available to support the development of novel ML-based methods for molecular docking.
Date: 2024-06-09T11:13:03Z
- Multi-scale Iterative Refinement towards Robust and Versatile Molecular Docking [17.28573902701018]
Molecular docking is a key computational tool utilized to predict the binding conformations of small molecules to protein targets.
We introduce DeltaDock, a robust and versatile framework designed for efficient molecular docking.
Date: 2023-11-30T14:09:20Z
- FABind: Fast and Accurate Protein-Ligand Binding [127.7790493202716]
FABind is an end-to-end model that combines pocket prediction and docking to achieve accurate and fast protein-ligand binding.
Our proposed model demonstrates strong advantages in terms of effectiveness and efficiency compared to existing methods.
Date: 2023-10-10T16:39:47Z
- Class Anchor Margin Loss for Content-Based Image Retrieval [97.81742911657497]
We propose a novel repeller-attractor loss that falls in the metric learning paradigm, yet directly optimizes for the L2 metric without the need to generate pairs.
We evaluate the proposed objective in the context of few-shot and full-set training on the CBIR task, by using both convolutional and transformer architectures.
Date: 2023-06-01T12:53:10Z
- SSM-DTA: Breaking the Barriers of Data Scarcity in Drug-Target Affinity Prediction [127.43571146741984]
Drug-Target Affinity (DTA) is of vital importance in early-stage drug discovery.
Wet-lab experiments remain the most reliable method, but they are time-consuming and resource-intensive.
Existing methods have primarily focused on developing techniques based on the available DTA data, without adequately addressing the data scarcity issue.
We present the SSM-DTA framework, which incorporates three simple yet highly effective strategies.
Date: 2022-06-20T14:53:25Z
- Tyger: Task-Type-Generic Active Learning for Molecular Property Prediction [121.97742787439546]
How to accurately predict the properties of molecules is an essential problem in AI-driven drug discovery.
To reduce annotation cost, deep active learning methods have been developed to select only the most representative and informative data for annotation.
We propose a Task-type-generic active learning framework (termed Tyger) that is able to handle different types of learning tasks in a unified manner.
Date: 2022-05-23T12:56:12Z
- Deep Learning for Virtual Screening: Five Reasons to Use ROC Cost Functions [80.12620331438052]
Deep learning has become an important tool for the rapid in silico screening of billions of molecules for potential hits containing desired chemical features.
Despite its importance, substantial challenges persist in training these models, such as severe class imbalance, high decision thresholds, and lack of ground truth labels in some datasets.
We argue in favor of directly optimizing the receiver operating characteristic (ROC) in such cases, due to its robustness to class imbalance.
Date: 2020-06-25T08:46:37Z
- A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
Date: 2020-06-22T08:35:58Z
- We Should at Least Be Able to Design Molecules That Dock Well [5.751280593108197]
We propose a benchmark based on docking, a popular computational method for assessing molecule binding to a protein.
We observe that popular graph-based generative models fail to generate molecules with a high docking score when trained using a realistically sized training set.
We propose a simplified version of the benchmark based on a simpler scoring function, and show that the tested models are able to partially solve it.
Date: 2020-06-20T16:40:56Z