We Should at Least Be Able to Design Molecules That Dock Well
- URL: http://arxiv.org/abs/2006.16955v5
- Date: Tue, 13 Jun 2023 18:15:18 GMT
- Title: We Should at Least Be Able to Design Molecules That Dock Well
- Authors: Tobiasz Cieplinski, Tomasz Danel, Sabina Podlewska, Stanislaw Jastrzebski
- Abstract summary: We propose a benchmark based on docking, a popular computational method for assessing molecule binding to a protein.
We observe that popular graph-based generative models fail to generate molecules with a high docking score when trained using a realistically sized training set.
We propose a simplified version of the benchmark based on a simpler scoring function, and show that the tested models are able to partially solve it.
- Score: 5.751280593108197
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Designing compounds with desired properties is a key element of the drug
discovery process. However, measuring progress in the field has been
challenging due to the lack of realistic retrospective benchmarks, and the
large cost of prospective validation. To close this gap, we propose a benchmark
based on docking, a popular computational method for assessing molecule binding
to a protein. Concretely, the goal is to generate drug-like molecules that are
scored highly by SMINA, a popular docking software. We observe that popular
graph-based generative models fail to generate molecules with a high docking
score when trained using a realistically sized training set. This suggests a
limitation of the current incarnation of models for de novo drug design.
Finally, we propose a simplified version of the benchmark based on a simpler
scoring function, and show that the tested models are able to partially solve
it. We release the benchmark as an easy-to-use package available at
https://github.com/cieplinski-tobiasz/smina-docking-benchmark. We hope that our
benchmark will serve as a stepping stone towards the goal of automatically
generating promising drug candidates.
Related papers
- Dockformer: A transformer-based molecular docking paradigm for large-scale virtual screening [29.947687129449278]
Deep learning algorithms can provide data-driven research and development models to increase the speed of the docking process.
A novel deep learning-based docking approach named Dockformer is introduced in this study.
The experimental results show that Dockformer achieves success rates of 90.53% and 82.71% on the PDBbind core set and PoseBusters benchmarks, respectively.
arXiv Detail & Related papers (2024-11-11T06:25:13Z)
- Small Effect Sizes in Malware Detection? Make Harder Train/Test Splits! [51.668411293817464]
Industry practitioners care about small improvements in malware detection accuracy because their models are deployed to hundreds of millions of machines.
Academic research is often restrained to public datasets on the order of ten thousand samples.
We devise an approach to generate a benchmark of difficulty from a pool of available samples.
arXiv Detail & Related papers (2023-12-25T21:25:55Z)
- Delta Score: Improving the Binding Assessment of Structure-Based Drug Design Methods [14.272327734087598]
We introduce the delta score, a novel evaluation metric grounded in tangible pharmaceutical requisites.
Our experiments reveal that molecules produced by current deep generative models significantly lag behind ground reference truth when assessed with the delta score.
arXiv Detail & Related papers (2023-11-01T08:37:39Z)
- Lo-Hi: Practical ML Drug Discovery Benchmark [0.0]
One of the hopes of drug discovery is to use machine learning models to predict molecular properties.
Existing benchmarks for molecular property prediction are unrealistic and are too different from applying the models in practice.
We have created a new practical Lo-Hi benchmark, corresponding to the real drug discovery process.
arXiv Detail & Related papers (2023-10-10T08:06:32Z)
- Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We make empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
arXiv Detail & Related papers (2023-05-18T16:28:29Z)
- nanoLM: an Affordable LLM Pre-training Benchmark via Accurate Loss Prediction across Scales [65.01417261415833]
We present an approach to predict the pre-training loss based on our observations that Maximal Update Parametrization (muP) enables accurate fitting of scaling laws.
With around 14% of the one-time pre-training cost, we can accurately forecast the loss for models up to 52B.
Our goal with nanoLM is to empower researchers with limited resources to reach meaningful conclusions on large models.
arXiv Detail & Related papers (2023-04-14T00:45:01Z)
- Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
arXiv Detail & Related papers (2022-08-23T17:01:16Z)
- DOCKSTRING: easy molecular docking yields better benchmarks for ligand design [3.848364262836075]
We present DOCKSTRING, a bundle for meaningful and robust comparison of machine learning models consisting of three components.
The Python package implements a robust ligand and target preparation protocol that allows non-experts to obtain meaningful docking scores.
Our dataset is the first to include docking poses, as well as the first of its size that is a full matrix.
arXiv Detail & Related papers (2021-10-29T01:37:13Z)
- MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization [51.00815310242277]
Generative models and reinforcement learning approaches have had initial success, but still face difficulties in simultaneously optimizing multiple drug properties.
We propose the MultI-constraint MOlecule SAmpling (MIMOSA) approach, a sampling framework that uses the input molecule as an initial guess and samples molecules from the target distribution.
arXiv Detail & Related papers (2020-10-05T20:18:42Z)
- Goal-directed Generation of Discrete Structures with Conditional Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short python expressions which evaluate to a given target value.
arXiv Detail & Related papers (2020-10-05T20:03:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.