MoleHD: Ultra-Low-Cost Drug Discovery using Hyperdimensional Computing
- URL: http://arxiv.org/abs/2106.02894v3
- Date: Sat, 5 Feb 2022 19:58:11 GMT
- Title: MoleHD: Ultra-Low-Cost Drug Discovery using Hyperdimensional Computing
- Authors: Dongning Ma, Rahul Thapa, Xun Jiao
- Abstract summary: We present MoleHD, a method based on brain-inspired hyperdimensional computing (HDC) for molecular property prediction.
MoleHD achieves the highest ROC-AUC score on random and scaffold splits on average across 3 datasets.
To the best of our knowledge, this is the first HDC-based method for drug discovery.
- Score: 2.7462881838152913
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Modern drug discovery is often time-consuming, complex and cost-ineffective
due to the large volume of molecular data and complicated molecular properties.
Recently, machine learning algorithms have shown promising results in virtual
screening of automated drug discovery by predicting molecular properties. While
emerging learning methods such as graph neural networks and recurrent neural
networks exhibit high accuracy, they are also notoriously computation-intensive
and memory-intensive with operations such as feature embeddings or deep
convolutions. In this paper, we propose a viable alternative to existing
learning methods by presenting MoleHD, a method based on brain-inspired
hyperdimensional computing (HDC) for molecular property prediction. We develop
HDC encoders to project the SMILES representation of a molecule into
high-dimensional vectors that are used for HDC training and inference. We
perform an extensive evaluation using 29 classification tasks from 3
widely-used molecule datasets (Clintox, BBBP, SIDER) under three split methods
(random, scaffold, and stratified). Through a comprehensive comparison with 8
existing learning models including SOTA graph/recurrent neural networks, we
show that MoleHD achieves the highest ROC-AUC score on random and
scaffold splits on average across 3 datasets and the second-highest on the
stratified split. Importantly, MoleHD achieves such performance with
significantly reduced computing cost and training efforts. To the best of our
knowledge, this is the first HDC-based method for drug discovery. The promising
results presented in this paper can potentially lead to a novel path in drug
discovery research.
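The abstract describes encoding SMILES strings into high-dimensional vectors that are then used for HDC training and inference. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes character-level tokenization, random bipolar hypervectors, position encoding by cyclic shift, and cosine-similarity classification, all of which are common HDC choices rather than details stated in the abstract. The names `DIM`, `make_item_memory`, `encode`, `train`, and `predict` are hypothetical, and the four-molecule dataset (labelled only by whether the structure is aromatic) is a toy illustration, not one of the benchmark datasets.

```python
import numpy as np

DIM = 10_000  # hypervector dimensionality (assumed; not taken from the paper)
rng = np.random.default_rng(0)


def make_item_memory(alphabet, dim=DIM):
    """Assign every symbol a random bipolar (+1/-1) hypervector."""
    values = np.array([-1, 1], dtype=np.int8)
    return {ch: rng.choice(values, size=dim) for ch in alphabet}


def encode(smiles, item_memory, dim=DIM):
    """Encode a SMILES string: shift each character's hypervector by its
    position, sum the shifted vectors, then binarize back to +1/-1.
    Raises KeyError for characters missing from the item memory."""
    acc = np.zeros(dim, dtype=np.int32)
    for pos, ch in enumerate(smiles):
        acc += np.roll(item_memory[ch], pos)
    return np.where(acc >= 0, 1, -1).astype(np.int8)  # ties map to +1


def train(samples, item_memory, dim=DIM):
    """Build one class hypervector per label by summing sample encodings."""
    classes = {}
    for smiles, label in samples:
        if label not in classes:
            classes[label] = np.zeros(dim, dtype=np.int32)
        classes[label] += encode(smiles, item_memory, dim)
    return classes


def cosine(a, b):
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def predict(smiles, classes, item_memory, dim=DIM):
    """Return the label whose class hypervector is most similar to the query."""
    query = encode(smiles, item_memory, dim)
    return max(classes, key=lambda label: cosine(query, classes[label]))


# Toy data for illustration only: labels reflect aromaticity of the structure.
samples = [
    ("CCO", "aliphatic"),        # ethanol
    ("CCN", "aliphatic"),        # ethylamine
    ("c1ccccc1", "aromatic"),    # benzene
    ("c1ccncc1", "aromatic"),    # pyridine
]
alphabet = sorted({ch for smiles, _ in samples for ch in smiles})
item_memory = make_item_memory(alphabet)
classes = train(samples, item_memory)

print(predict("CCC", classes, item_memory))  # propane, not in the training set
```

Training here is a single pass of integer additions and inference is one similarity comparison per class, which is the kind of lightweight arithmetic that motivates the reduced computing cost claimed in the abstract. A real pipeline would also need a proper SMILES tokenizer (multi-character atoms such as `Cl` or `Br`) and ROC-AUC evaluation over the chosen split.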
Related papers
- Molecule Joint Auto-Encoding: Trajectory Pretraining with 2D and 3D Diffusion [19.151643496588022]
We propose a pretraining method for molecule joint auto-encoding (MoleculeJAE).
MoleculeJAE can learn both the 2D bond (topology) and 3D conformation (geometry) information.
Empirically, MoleculeJAE proves its effectiveness by reaching state-of-the-art performance on 15 out of 20 tasks.
arXiv Detail & Related papers (2023-12-06T12:58:37Z)
- Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations [55.42602325017405]
We propose a novel method called GODE, which takes into account the two-level structure of individual molecules.
By pre-training two graph neural networks (GNNs) on different graph structures, combined with contrastive learning, GODE fuses molecular structures with their corresponding knowledge graph substructures.
When fine-tuned across 11 chemical property tasks, our model outperforms existing benchmarks, registering an average ROC-AUC uplift of 13.8% for classification tasks and an average RMSE/MAE enhancement of 35.1% for regression tasks.
arXiv Detail & Related papers (2023-06-02T15:49:45Z)
- HD-Bind: Encoding of Molecular Structure with Low Precision, Hyperdimensional Binary Representations [3.3934198248179026]
Hyperdimensional Computing (HDC) is a proposed learning paradigm that is able to leverage low-precision binary vector arithmetic.
We show that HDC-based inference methods are as much as 90 times more efficient than more complex representative machine learning methods.
arXiv Detail & Related papers (2023-03-27T21:21:46Z)
- Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z)
- Scalable training of graph convolutional neural networks for fast and accurate predictions of HOMO-LUMO gap in molecules [1.8947048356389908]
This work focuses on building GCNN models on HPC systems to predict material properties of millions of molecules.
We use HydraGNN, our in-house library for large-scale GCNN training, leveraging distributed data parallelism in PyTorch.
We perform parallel training on two open-source large-scale graph datasets to build a GCNN predictor for an important quantum property known as the HOMO-LUMO gap.
arXiv Detail & Related papers (2022-07-22T20:54:22Z)
- Graph-based Molecular Representation Learning [59.06193431883431]
Molecular representation learning (MRL) is a key step to build the connection between machine learning and chemical science.
Recently, MRL has achieved considerable progress, especially in methods based on deep molecular graph learning.
arXiv Detail & Related papers (2022-07-08T17:43:20Z)
- ATOM3D: Tasks On Molecules in Three Dimensions [91.72138447636769]
Deep neural networks have recently gained significant attention.
In this work we present ATOM3D, a collection of both novel and existing datasets spanning several key classes of biomolecules.
We develop three-dimensional molecular learning networks for each of these tasks, finding that they consistently improve performance.
arXiv Detail & Related papers (2020-12-07T20:18:23Z)
- Advanced Graph and Sequence Neural Networks for Molecular Property Prediction and Drug Discovery [53.00288162642151]
We develop MoleculeKit, a suite of comprehensive machine learning tools spanning different computational models and molecular representations.
Built on these representations, MoleculeKit includes both deep learning and traditional machine learning methods for graph and sequence data.
Results on both online and offline antibiotics discovery and molecular property prediction tasks show that MoleculeKit achieves consistent improvements over prior methods.
arXiv Detail & Related papers (2020-12-02T02:09:31Z)
- Self-Supervised Graph Transformer on Large-Scale Molecular Data [73.3448373618865]
We propose a novel framework, GROVER, for molecular representation learning.
GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data.
We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules -- the biggest GNN and the largest training dataset in molecular representation learning.
arXiv Detail & Related papers (2020-06-18T08:37:04Z)
- Targeting SARS-CoV-2 with AI- and HPC-enabled Lead Generation: A First Data Release [8.090016327163564]
This data release encompasses structural information on the 4.2 B molecules and 60 TB of pre-computed data.
One promising approach is to train machine learning (ML) and artificial intelligence (AI) tools to screen large numbers of small molecules.
Future releases will expand the data to include more detailed molecular simulations, computed models, and other products.
arXiv Detail & Related papers (2020-05-28T01:33:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.