ParaVS: A Simple, Fast, Efficient and Flexible Graph Neural Network
Framework for Structure-Based Virtual Screening
- URL: http://arxiv.org/abs/2102.06086v1
- Date: Mon, 8 Feb 2021 08:24:05 GMT
- Title: ParaVS: A Simple, Fast, Efficient and Flexible Graph Neural Network
Framework for Structure-Based Virtual Screening
- Authors: Junfeng Wu, Dawei Leng, Lurong Pan
- Abstract summary: We introduce a docking-based SBVS method and a deep-learning non-docking-based method that avoids the computational cost of the docking process.
The inference speed of ParaVS-ND is about 3.6e5 molecules per core-hour, compared with around 20 for a conventional docking-based method, making it about 16,000 times faster.
- Score: 2.5137859989323537
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Structure-based virtual screening (SBVS) is a promising in silico technique
that integrates computational methods into drug design. An extensively used
method in SBVS is molecular docking. However, the docking process can hardly be
computationally efficient and accurate at the same time, because its classical
mechanics scoring function can only approximate, and rarely reaches, quantum
mechanics precision. To reduce the computational cost of the protein-ligand
scoring process and to use a data-driven approach to boost scoring function
accuracy, we introduce a docking-based SBVS method and,
furthermore, a deep learning non-docking-based method that is able to avoid the
computational cost of the docking process. Then, we try to integrate these two
methods into an easy-to-use framework, ParaVS, that provides both choices for
researchers. A graph neural network (GNN) is employed in ParaVS, and we explain
how our in-house GNN works and how ligands and molecular targets are modeled. To
verify our approaches, cross-validation experiments are done on two datasets:
the open dataset Directory of Useful Decoys: Enhanced (DUD.E) and an in-house
proprietary dataset without computationally generated artificial decoys
(NoDecoy). On DUD.E we achieved a state-of-the-art AUC of 0.981 and a
state-of-the-art enrichment factor at 2% of 36.2; on NoDecoy we achieved an AUC
of 0.974. We further complete inference on an open database, the Enamine REAL
Database (RDB), comprising over 1.36 billion molecules, in 4050 core-hours using
our ParaVS non-docking method (ParaVS-ND). The inference speed of ParaVS-ND is
about 3.6e5 molecules per core-hour, compared with around 20 for a conventional
docking-based method, making it about 16,000 times faster. The experiments
indicate that ParaVS is accurate, computationally efficient, and can be
generalized to different molecular targets.
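The throughput and speedup figures above follow from simple arithmetic on the numbers quoted in the abstract; a quick back-of-the-envelope check in Python, using only those quoted figures:

    # Reproduce the quoted throughput and speedup from the abstract's own numbers.
    molecules = 1.36e9      # Enamine REAL Database size
    core_hours = 4050       # reported ParaVS-ND inference cost
    docking_rate = 20       # molecules per core-hour for a conventional docking method

    nd_rate = molecules / core_hours      # ~3.4e5 molecules per core-hour
    speedup = nd_rate / docking_rate      # ~1.7e4, i.e. "about 16,000 times faster"

    print(f"ParaVS-ND: {nd_rate:.3g} molecules/core-hour, {speedup:.3g}x faster than docking")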
Related papers
- A Specialized Semismooth Newton Method for Kernel-Based Optimal
Transport [92.96250725599958]
Kernel-based optimal transport (OT) estimators offer an alternative, functional estimation procedure to address OT problems from samples.
We show that our SSN method achieves a global convergence rate of $O(1/\sqrt{k})$, and a local quadratic convergence rate under standard regularity conditions.
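Read in the usual way, the two quoted rates can be written as follows (a standard formalization for illustration, not the paper's exact theorem statement):

    % Global sublinear rate: the optimality measure decays like 1/sqrt(k).
    % Local quadratic rate: near the solution x*, the error is squared at each step.
    \begin{align*}
      f(x_k) - f^\star &\le \frac{C_1}{\sqrt{k}}, \\
      \|x_{k+1} - x^\star\| &\le C_2\,\|x_k - x^\star\|^2 .
    \end{align*}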
arXiv Detail & Related papers (2023-10-21T18:48:45Z)
- DSDP: A Blind Docking Strategy Accelerated by GPUs [6.221048348194304]
We take advantage of both traditional and machine-learning-based methods and present a method, Deep Site and Docking Pose (DSDP), to improve the performance of blind docking.
DSDP reaches a top-1 success rate (RMSD < 2 Å) on an unbiased and challenging test dataset with 1.2 s wall-clock computational time per system.
It also performs well on the DUD-E dataset and the time-split PDBBind dataset used in EquiBind, TankBind, and DiffDock.
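The success criterion referenced here is the standard docking benchmark: a system counts as a success when the top-ranked pose is within 2 Å RMSD of the reference pose. A minimal sketch of that metric (toy arrays, not DSDP's evaluation code):

    import numpy as np

    def rmsd(pred, ref):
        # Root-mean-square deviation between two (N, 3) coordinate arrays.
        return np.sqrt(np.mean(np.sum((pred - ref) ** 2, axis=1)))

    def top1_success_rate(predictions, references, threshold=2.0):
        # Fraction of systems whose top-ranked pose is within `threshold` Angstrom RMSD.
        hits = [rmsd(p, r) < threshold for p, r in zip(predictions, references)]
        return sum(hits) / len(hits)

    # Toy example: three systems with five atoms each.
    refs = [np.random.rand(5, 3) * 10 for _ in range(3)]
    preds = [r + np.random.normal(scale=0.5, size=r.shape) for r in refs]
    print(top1_success_rate(preds, refs))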
arXiv Detail & Related papers (2023-03-16T07:00:21Z)
- Deep Surrogate Docking: Accelerating Automated Drug Discovery with Graph
Neural Networks [0.9785311158871759]
We introduce Deep Surrogate Docking (DSD), a framework that applies deep learning-based surrogate modeling to accelerate the docking process substantially.
We show that the DSD workflow combined with the FiLMv2 architecture provides a 9.496x speedup in molecule screening with a 3% recall error rate.
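The surrogate-screening pattern behind this kind of speedup is: score the whole library with a cheap learned model, send only the best-scoring fraction to the expensive docking stage, and check how many true hits survive the filter (recall). A hedged sketch with synthetic scores (illustrative only, not the DSD/FiLMv2 pipeline):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    true_score = rng.normal(size=n)                          # stand-in for exhaustive docking scores
    surrogate = true_score + rng.normal(scale=0.5, size=n)   # cheap, noisy surrogate predictions

    keep_frac = 0.10   # dock only the top 10% by surrogate score
    hit_frac = 0.01    # call the top 1% by true score the "hits"

    keep = set(np.argsort(surrogate)[-int(keep_frac * n):])
    hits = set(np.argsort(true_score)[-int(hit_frac * n):])
    recall = len(hits & keep) / len(hits)

    print(f"recall of true hits after filtering: {recall:.1%}; docking workload: {keep_frac:.0%} of library")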
arXiv Detail & Related papers (2022-11-04T19:36:02Z)
- TCT: Convexifying Federated Learning using Bootstrapped Neural Tangent
Kernels [141.29156234353133]
State-of-the-art federated learning methods can perform far worse than their centralized counterparts when clients have dissimilar data distributions.
We show this disparity can largely be attributed to optimization challenges presented by nonconvexity.
We propose a Train-Convexify neural network (TCT) procedure to sidestep this issue.
arXiv Detail & Related papers (2022-07-13T16:58:22Z)
- SS-GNN: A Simple-Structured Graph Neural Network for Affinity Prediction [8.508602451200352]
We propose a simple-structured graph neural network (GNN) model named SS-GNN to accurately predict drug-target binding affinity (DTBA).
By constructing a single undirected graph based on a distance threshold to represent protein-ligand interactions, the scale of the graph data is greatly reduced.
For a typical protein-ligand complex, affinity prediction takes only 0.2 ms.
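The graph construction described here, a single undirected graph whose edges connect atom pairs closer than a distance cutoff, can be sketched directly from coordinates (toy arrays, not the SS-GNN featurization):

    import numpy as np

    def distance_graph(coords, cutoff=5.0):
        # Undirected edges (i, j), i < j, for atom pairs within `cutoff` Angstrom.
        d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
        i, j = np.where((d < cutoff) & (d > 0.0))
        return {(a, b) for a, b in zip(i, j) if a < b}

    # Toy complex: stack ligand and pocket atom coordinates into one array.
    ligand = np.random.rand(10, 3) * 10
    pocket = np.random.rand(30, 3) * 10
    edges = distance_graph(np.vstack([ligand, pocket]), cutoff=5.0)
    print(len(edges), "edges")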
arXiv Detail & Related papers (2022-05-25T04:47:13Z)
- Active-learning-based non-intrusive Model Order Reduction [0.0]
In this work, we propose a new active learning approach with two novelties.
A novel idea in our approach is the use of single-time-step snapshots of the system states, taken from an estimation of the reduced-state space.
We also introduce a use case-independent validation strategy based on Probably Approximately Correct (PAC) learning.
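As a rough illustration of what a PAC-style validation budget looks like, a Hoeffding-type bound gives the number of validation samples needed to certify an error estimate to tolerance eps with confidence 1 - delta (a generic bound for illustration, not necessarily the exact criterion used in the paper):

    import math

    def pac_sample_size(eps, delta):
        # Samples so the empirical error is within eps of the true error
        # with probability at least 1 - delta (Hoeffding's inequality).
        return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))

    print(pac_sample_size(eps=0.05, delta=0.01))  # 1060 validation samples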
arXiv Detail & Related papers (2022-04-08T22:33:51Z)
- EAutoDet: Efficient Architecture Search for Object Detection [110.99532343155073]
The EAutoDet framework can discover practical backbone and FPN architectures for object detection in 1.4 GPU-days.
We propose a kernel reusing technique by sharing the weights of candidate operations on one edge and consolidating them into one convolution.
In particular, the discovered architectures surpass state-of-the-art object detection NAS methods and achieve 40.1 mAP with 120 FPS and 49.2 mAP with 41.3 FPS on COCO test-dev set.
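One way to read the kernel-reusing idea is that the candidate kernel sizes on an edge share a single weight tensor, so the weighted mixture of candidates can be evaluated with one convolution rather than one per candidate. A hedged torch sketch (illustrative, not the EAutoDet implementation):

    import torch
    import torch.nn.functional as F

    w5 = torch.randn(16, 8, 5, 5)                 # one shared weight tensor on the edge
    w3 = w5[:, :, 1:4, 1:4]                       # the 3x3 candidate reuses its centre
    alpha = torch.softmax(torch.randn(2), dim=0)  # architecture weights over the candidates

    # Consolidate: mix the (padded) 3x3 and the 5x5 candidate into one kernel,
    # so a single convolution evaluates the weighted combination of both.
    w_mixed = alpha[0] * F.pad(w3, (1, 1, 1, 1)) + alpha[1] * w5

    x = torch.randn(2, 8, 32, 32)
    y = F.conv2d(x, w_mixed, padding=2)
    print(y.shape)  # torch.Size([2, 16, 32, 32])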
arXiv Detail & Related papers (2022-03-21T05:56:12Z)
- Accelerating Training and Inference of Graph Neural Networks with Fast
Sampling and Pipelining [58.10436813430554]
Mini-batch training of graph neural networks (GNNs) requires a lot of computation and data movement.
We argue in favor of performing mini-batch training with neighborhood sampling in a distributed multi-GPU environment.
We present a sequence of improvements to mitigate these bottlenecks, including a performance-engineered neighborhood sampler.
We also conduct an empirical analysis that supports the use of sampling for inference, showing that test accuracies are not materially compromised.
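Neighborhood sampling caps the fan-out of each mini-batch node so that computation and data movement stay bounded. A minimal sketch of fixed-fanout sampling over an adjacency list (generic, not the paper's performance-engineered sampler):

    import random

    def sample_neighborhood(adj, batch_nodes, fanout=10, seed=0):
        # Keep at most `fanout` randomly chosen neighbors per mini-batch node.
        rng = random.Random(seed)
        sampled = {}
        for v in batch_nodes:
            nbrs = adj.get(v, [])
            sampled[v] = nbrs if len(nbrs) <= fanout else rng.sample(nbrs, fanout)
        return sampled

    # Toy graph: node -> list of neighbors.
    adj = {0: list(range(1, 50)), 1: [0, 2, 3], 2: [0, 1]}
    print(sample_neighborhood(adj, batch_nodes=[0, 1], fanout=5))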
arXiv Detail & Related papers (2021-10-16T02:41:35Z)
- FNAS: Uncertainty-Aware Fast Neural Architecture Search [54.49650267859032]
Reinforcement learning (RL)-based neural architecture search (NAS) generally guarantees better convergence yet suffers from the requirement of huge computational resources.
We propose a general pipeline to accelerate the convergence of the rollout process as well as the RL process in NAS.
Experiments on the Mobile Neural Architecture Search (MNAS) search space show the proposed Fast Neural Architecture Search (FNAS) accelerates standard RL-based NAS process by 10x.
arXiv Detail & Related papers (2021-05-25T06:32:52Z)
- Inception Convolution with Efficient Dilation Search [121.41030859447487]
Dilated convolution is an important variant of the standard convolutional neural network, used to control effective receptive fields and handle large variance in object scale.
We propose a new variant of dilated convolution, namely inception (dilated) convolution, in which the convolutions have independent dilation along different axes, channels and layers.
To fit the complex inception convolution to the data in practice, a simple yet effective dilation search algorithm (EDO) based on statistical optimization is developed.
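Per-axis dilation maps directly onto the dilation argument of a standard convolution layer, which already accepts a separate value for each spatial axis; different channel groups can likewise get different dilations in inception style. A hedged torch sketch with arbitrary dilation choices (not the configuration found by EDO):

    import torch
    import torch.nn as nn

    x = torch.randn(2, 8, 32, 32)

    # Independent dilation along height and width; padding keeps the spatial size.
    conv_hw = nn.Conv2d(8, 16, kernel_size=3, dilation=(1, 3), padding=(1, 3))
    print(conv_hw(x).shape)  # torch.Size([2, 16, 32, 32])

    # Inception-style: different dilation per channel group, concatenated back together.
    branch_a = nn.Conv2d(4, 8, kernel_size=3, dilation=(2, 1), padding=(2, 1))
    branch_b = nn.Conv2d(4, 8, kernel_size=3, dilation=(1, 2), padding=(1, 2))
    xa, xb = x.split(4, dim=1)
    y = torch.cat([branch_a(xa), branch_b(xb)], dim=1)
    print(y.shape)  # torch.Size([2, 16, 32, 32])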
arXiv Detail & Related papers (2020-12-25T14:58:35Z)
- Passive Batch Injection Training Technique: Boosting Network Performance
by Injecting Mini-Batches from a different Data Distribution [39.8046809855363]
This work presents a novel training technique for deep neural networks that makes use of additional data from a distribution that is different from that of the original input data.
To the best of our knowledge, this is the first work that makes use of a different data distribution to aid the training of convolutional neural networks (CNNs).
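The stated idea, occasionally feeding the optimizer mini-batches drawn from a different data distribution, can be illustrated with a plain training loop (toy data and a fixed injection schedule; the paper's actual schedule and auxiliary distribution may differ):

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # Hypothetical stand-ins for the original and the auxiliary distributions.
    main_ds = TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,)))
    aux_ds  = TensorDataset(torch.randn(256, 10) + 3.0, torch.randint(0, 2, (256,)))

    model = torch.nn.Linear(10, 2)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = torch.nn.CrossEntropyLoss()

    main_loader = DataLoader(main_ds, batch_size=32, shuffle=True)
    aux_iter = iter(DataLoader(aux_ds, batch_size=32, shuffle=True))

    k = 4  # every k-th step, inject a mini-batch from the auxiliary distribution
    for step, (x, y) in enumerate(main_loader):
        if step % k == k - 1:
            x, y = next(aux_iter)
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()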
arXiv Detail & Related papers (2020-06-08T08:17:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.