Protein-Ligand Docking Surrogate Models: A SARS-CoV-2 Benchmark for Deep
Learning Accelerated Virtual Screening
- URL: http://arxiv.org/abs/2106.07036v1
- Date: Sun, 13 Jun 2021 16:27:38 GMT
- Authors: Austin Clyde, Thomas Brettin, Alex Partin, Hyunseung Yoo, Yadu Babuji,
Ben Blaiszik, Andre Merzky, Matteo Turilli, Shantenu Jha, Arvind Ramanathan,
Rick Stevens
- Abstract summary: We show surrogate docking models have six orders of magnitude more throughput than standard docking protocols.
We demonstrate the power of high-speed surrogate models by running each target against 1 billion molecules in under a day.
Our analysis of the speedup shows that, to screen more molecules under a docking paradigm, the next order-of-magnitude speedup must come from model accuracy rather than computing speed.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a benchmark to study surrogate model accuracy for protein-ligand
docking. We share a dataset consisting of 200 million 3D complex structures and
2D structure scores across a consistent set of 13 million "in-stock"
molecules over 15 receptors, or binding sites, across the SARS-CoV-2 proteome.
Our work shows surrogate docking models have six orders of magnitude more
throughput than standard docking protocols on the same supercomputer node
types. We demonstrate the power of high-speed surrogate models by running each
target against 1 billion molecules in under a day (50k predictions per GPU
second). We showcase a docking workflow that uses surrogate ML models as a
pre-filter. Our workflow screens a library of compounds ten times faster than
the standard technique, with an error rate below 0.01% in detecting the
underlying best-scoring 0.1% of compounds. Our analysis of the speedup shows
that, to screen more molecules under a docking paradigm, the next
order-of-magnitude speedup must come from model accuracy rather than computing
speed (further increases in computing speed alone would no longer raise our
screening throughput). We believe this is strong evidence for the community to
begin focusing on improving the accuracy of surrogate models to improve the
ability to screen massive compound libraries 100x or even 1000x faster than
current techniques.
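For intuition, the pre-filter workflow and its top-hit recall metric can be sketched in a few lines of Python. This is a minimal sketch, not the authors' actual pipeline: the `predict_scores` (surrogate) and `dock` (standard docking) callables and the `keep_frac=0.01` cutoff are hypothetical placeholders. The throughput arithmetic is also easy to check: at 50k predictions per GPU second, a single GPU covers 50,000 x 86,400, roughly 4.3 billion molecules per day, consistent with screening 1 billion molecules per target in under a day.

```python
import numpy as np

def surrogate_prefilter_screen(smiles, predict_scores, dock, keep_frac=0.01):
    """Score every molecule with the fast surrogate, then run standard
    docking only on the fraction predicted to score best."""
    pred = np.asarray(predict_scores(smiles))  # fast: ~50k preds per GPU second
    n_keep = max(1, int(len(smiles) * keep_frac))
    top_idx = np.argsort(pred)[:n_keep]        # lower docking score = better
    return {int(i): dock(smiles[i]) for i in top_idx}  # slow, exact docking

def top_hit_recall(pred_scores, true_scores, hit_frac=0.001, keep_frac=0.01):
    """Fraction of the true best-scoring hit_frac of compounds that the
    pre-filter retains when it keeps the top keep_frac of predictions
    (the abstract's error rate corresponds to 1 minus this recall)."""
    pred = np.asarray(pred_scores)
    true = np.asarray(true_scores)
    n = len(true)
    hits = set(np.argsort(true)[: max(1, int(n * hit_frac))].tolist())
    kept = set(np.argsort(pred)[: max(1, int(n * keep_frac))].tolist())
    return len(hits & kept) / len(hits)
```

Under this framing, an error rate below 0.01% in detecting the best-scoring 0.1% of compounds means the pre-filter retains more than 99.99% of the true top hits while sending only a small fraction of the library to the expensive docking stage.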
Related papers
- Foundation Models for Structural Health Monitoring
We propose for the first time the use of Transformer neural networks, with a Masked Auto-Encoder architecture, as Foundation Models for Structural Health Monitoring.
We demonstrate the ability of these models to learn generalizable representations from multiple large datasets through self-supervised pre-training.
We showcase the effectiveness of our foundation models using data from three operational viaducts.
arXiv Detail & Related papers (2024-04-03T13:32:44Z)
- Small Effect Sizes in Malware Detection? Make Harder Train/Test Splits!
Industry practitioners care about small improvements in malware detection accuracy because their models are deployed to hundreds of millions of machines.
Academic research is often constrained to public datasets on the order of ten thousand samples.
We devise an approach to generate a benchmark of difficulty from a pool of available samples.
arXiv Detail & Related papers (2023-12-25T21:25:55Z)
- Semi-Autoregressive Streaming ASR With Label Context
We propose a streaming "semi-autoregressive" ASR model that incorporates the labels emitted in previous blocks as additional context.
Experiments show that our method outperforms the existing streaming NAR model by 19% relative on Tedlium2, 16%/8% on Librispeech-100 clean/other test sets, and 19%/8% on the Switchboard(SWB)/Callhome(CH) test sets.
arXiv Detail & Related papers (2023-09-19T20:55:58Z)
- Publishing Efficient On-device Models Increases Adversarial Vulnerability
In this paper, we study the security considerations of publishing on-device variants of large-scale models.
We first show that an adversary can exploit on-device models to make attacking the large models easier.
We then show that the vulnerability increases as the similarity between a full-scale model and its efficient variant increases.
arXiv Detail & Related papers (2022-12-28T05:05:58Z)
- Deep Surrogate Docking: Accelerating Automated Drug Discovery with Graph Neural Networks
We introduce Deep Surrogate Docking (DSD), a framework that applies deep learning-based surrogate modeling to accelerate the docking process substantially.
We show that the DSD workflow combined with the FiLMv2 architecture provides a 9.496x speedup in molecule screening with a 3% recall error rate.
arXiv Detail & Related papers (2022-11-04T19:36:02Z)
- YOLO-ReT: Towards High Accuracy Real-time Object Detection on Edge GPUs
In order to map deep neural network (DNN) based object detection models to edge devices, one typically needs to compress such models significantly.
In this paper, we propose a novel edge GPU friendly module for multi-scale feature interaction.
We also propose a novel transfer learning backbone adoption inspired by the changing translational information flow across various tasks.
arXiv Detail & Related papers (2021-10-26T14:02:59Z)
- FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often have large numbers of parameters and incur heavy computation costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with the following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z)
- ParaVS: A Simple, Fast, Efficient and Flexible Graph Neural Network Framework for Structure-Based Virtual Screening
We introduce a docking-based SBVS method and a deep learning, non-docking-based method that avoids the computational cost of the docking process.
The inference speed of ParaVS-ND is about 3.6e5 molecules per core-hour, versus around 20 for a conventional docking-based method, making it roughly 16000 times faster.
arXiv Detail & Related papers (2021-02-08T08:24:05Z)
- Fast, Accurate, and Simple Models for Tabular Data via Augmented Distillation
We propose FAST-DAD to distill arbitrarily complex ensemble predictors into individual models like boosted trees, random forests, and deep networks.
Our individual distilled models are over 10x faster and more accurate than ensemble predictors produced by AutoML tools like H2O/AutoSklearn.
arXiv Detail & Related papers (2020-06-25T09:57:47Z)
- SUOD: Accelerating Large-Scale Unsupervised Heterogeneous Outlier Detection
Outlier detection (OD) is a key machine learning (ML) task for identifying abnormal objects from general samples.
We propose a modular acceleration system, called SUOD, to accelerate large-scale heterogeneous OD.
arXiv Detail & Related papers (2020-03-11T00:22:50Z)