HD-Bind: Encoding of Molecular Structure with Low Precision,
Hyperdimensional Binary Representations
- URL: http://arxiv.org/abs/2303.15604v1
- Date: Mon, 27 Mar 2023 21:21:46 GMT
- Title: HD-Bind: Encoding of Molecular Structure with Low Precision,
Hyperdimensional Binary Representations
- Authors: Derek Jones, Jonathan E. Allen, Xiaohua Zhang, Behnam Khaleghi,
Jaeyoung Kang, Weihong Xu, Niema Moshiri, Tajana S. Rosing
- Abstract summary: Hyperdimensional Computing (HDC) is a proposed learning paradigm that is able to leverage low-precision binary vector arithmetic.
We show that HDC-based inference methods are as much as 90 times more efficient than more complex representative machine learning methods.
- Score: 3.3934198248179026
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Publicly available collections of drug-like molecules have grown to comprise
tens of billions of possibilities in recent history due to advances in chemical
synthesis. Traditional methods for identifying "hit" molecules from a large
collection of potential drug-like candidates have relied on biophysical theory
to compute approximations to the Gibbs free energy of the binding interaction
between the drug and its protein target. A major drawback of these approaches is
that they require exceptional computing capabilities even for
relatively small collections of molecules.
Hyperdimensional Computing (HDC) is a recently proposed learning paradigm
that is able to leverage low-precision binary vector arithmetic to build
efficient representations of the data that can be obtained without the need for
gradient-based optimization approaches that are required in many conventional
machine learning and deep learning approaches. This algorithmic simplicity
allows for acceleration in hardware that has been previously demonstrated for a
range of application areas. We consider existing HDC approaches for molecular
property classification and introduce two novel encoding algorithms that
leverage the extended connectivity fingerprint (ECFP) algorithm.
We show that HDC-based inference methods are as much as 90 times more
efficient than more complex representative machine learning methods and achieve
an acceleration of nearly 9 orders of magnitude as compared to inference with
molecular docking. We demonstrate multiple approaches for the encoding of
molecular data for HDC and examine their relative performance on a range of
challenging molecular property prediction and drug-protein binding
classification tasks. Our work thus motivates further investigation into
molecular representation learning to develop ultra-efficient pre-screening
tools.
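The binary, gradient-free learning described in the abstract can be illustrated with a minimal sketch. This is not the paper's encoder: it assumes a molecule is already given as the set of "on" bit indices of a fingerprint such as ECFP, assigns each index a random binary hypervector, bundles by bitwise majority vote, and classifies by Hamming distance to class prototypes. The dimensionality `DIM`, the toy fingerprint length `N_BITS`, the function names, and the toy data are all hypothetical choices made for illustration.

```python
import random

DIM = 2048    # hypervector dimensionality (assumed; not taken from the paper)
N_BITS = 64   # toy fingerprint length (assumed)


def random_hv(rng, dim=DIM):
    """Draw a random binary hypervector as a list of 0/1 ints."""
    return [rng.getrandbits(1) for _ in range(dim)]


def bundle(hvs):
    """Combine hypervectors by bitwise majority vote (ties resolve to 0)."""
    n = len(hvs)
    return [1 if 2 * sum(bits) > n else 0 for bits in zip(*hvs)]


def hamming(a, b):
    """Count the positions where two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def encode(on_bits, item_memory):
    """Encode a molecule given as a set of 'on' fingerprint bit indices."""
    return bundle([item_memory[i] for i in on_bits])


def train(examples, item_memory):
    """Build one prototype per class by bundling its encoded examples.

    No gradients are involved: training is a single pass of majority votes.
    """
    by_label = {}
    for on_bits, label in examples:
        by_label.setdefault(label, []).append(encode(on_bits, item_memory))
    return {label: bundle(hvs) for label, hvs in by_label.items()}


def predict(on_bits, prototypes, item_memory):
    """Return the label whose prototype is nearest in Hamming distance."""
    query = encode(on_bits, item_memory)
    return min(prototypes, key=lambda label: hamming(query, prototypes[label]))


rng = random.Random(0)
item_memory = [random_hv(rng) for _ in range(N_BITS)]

# Hypothetical toy "fingerprints": sets of on-bit indices with made-up labels.
train_set = [
    ({1, 2, 3, 4, 5}, "active"),
    ({1, 2, 3, 4, 6}, "active"),
    ({1, 2, 3, 5, 7}, "active"),
    ({40, 41, 42, 43, 44}, "inactive"),
    ({40, 41, 42, 43, 45}, "inactive"),
    ({40, 41, 42, 44, 46}, "inactive"),
]
prototypes = train(train_set, item_memory)

print(predict({1, 2, 3, 4, 7}, prototypes, item_memory))
print(predict({40, 41, 42, 45, 46}, prototypes, item_memory))
```

Because the two toy classes share no fingerprint bits, each query should fall nearest to the prototype built from the examples it overlaps with. In practice the on-bit indices would come from a cheminformatics toolkit, and the binary vectors would be packed into machine words so that Hamming distance reduces to XOR and popcount.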
Related papers
- Data-Efficient Molecular Generation with Hierarchical Textual Inversion [48.816943690420224]
We introduce Hierarchical textual Inversion for Molecular generation (HI-Mol), a novel data-efficient molecular generation method.
HI-Mol is inspired by the importance of hierarchical information, e.g., both coarse- and fine-grained features, in understanding the molecule distribution.
Compared to the conventional textual inversion method in the image domain using a single-level token embedding, our multi-level token embeddings allow the model to effectively learn the underlying low-shot molecule distribution.
arXiv Detail & Related papers (2024-05-05T08:35:23Z)
- Towards Predicting Equilibrium Distributions for Molecular Systems with Deep Learning [60.02391969049972]
We introduce a novel deep learning framework, called Distributional Graphormer (DiG), in an attempt to predict the equilibrium distribution of molecular systems.
DiG employs deep neural networks to transform a simple distribution towards the equilibrium distribution, conditioned on a descriptor of a molecular system.
arXiv Detail & Related papers (2023-06-08T17:12:08Z)
- Mitigating Molecular Aggregation in Drug Discovery with Predictive Insights from Explainable AI [1.0499611180329804]
A lack of understanding of the causes of molecular aggregation introduces difficulty in the development of predictive tools for detecting aggregating molecules.
We present an examination of the molecular features differentiating datasets of aggregating and non-aggregating molecules, as well as a machine learning approach to predicting molecular aggregation.
Our method uses explainable graph neural networks and counterfactuals to reliably predict and explain aggregation, giving additional insights and design rules for future screening.
arXiv Detail & Related papers (2023-06-03T22:30:45Z)
- Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z)
- Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
arXiv Detail & Related papers (2022-08-23T17:01:16Z)
- LIMO: Latent Inceptionism for Targeted Molecule Generation [14.391216237573369]
We present Latent Inceptionism on Molecules (LIMO), which significantly accelerates molecule generation with an inceptionism-like technique.
Comprehensive experiments show that LIMO performs competitively on benchmark tasks.
One of our generated drug-like compounds has a predicted $K_D$ of $6 \cdot 10^{-14}$ M against the human estrogen receptor.
arXiv Detail & Related papers (2022-06-17T21:05:58Z)
- MoleHD: Ultra-Low-Cost Drug Discovery using Hyperdimensional Computing [2.7462881838152913]
We present MoleHD, a method based on brain-inspired hyperdimensional computing (HDC) for molecular property prediction.
MoleHD achieves the highest ROC-AUC score on random and scaffold splits on average across 3 datasets.
To the best of our knowledge, this is the first HDC-based method for drug discovery.
arXiv Detail & Related papers (2021-06-05T13:33:21Z)
- Advanced Graph and Sequence Neural Networks for Molecular Property Prediction and Drug Discovery [53.00288162642151]
We develop MoleculeKit, a suite of comprehensive machine learning tools spanning different computational models and molecular representations.
Built on these representations, MoleculeKit includes both deep learning and traditional machine learning methods for graph and sequence data.
Results on both online and offline antibiotics discovery and molecular property prediction tasks show that MoleculeKit achieves consistent improvements over prior methods.
arXiv Detail & Related papers (2020-12-02T02:09:31Z)
- Self-Supervised Graph Transformer on Large-Scale Molecular Data [73.3448373618865]
We propose a novel framework, GROVER, for molecular representation learning.
GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data.
We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules -- the biggest GNN and the largest training dataset in molecular representation learning.
arXiv Detail & Related papers (2020-06-18T08:37:04Z)
- TorsionNet: A Reinforcement Learning Approach to Sequential Conformer Search [17.2131835813425]
We present an efficient sequential conformer search technique based on reinforcement learning under the rigid rotor approximation.
Our experimental results show that TorsionNet outperforms the highest scoring chemoinformatics method by 4x on large alkanes, and by several orders of magnitude on the previously unexplored biopolymer lignin.
arXiv Detail & Related papers (2020-06-12T11:03:29Z)
- DeepGS: Deep Representation Learning of Graphs and Sequences for Drug-Target Binding Affinity Prediction [8.292330541203647]
We propose a novel end-to-end learning framework, called DeepGS, which uses deep neural networks to extract the local chemical context from amino acids and SMILES sequences.
We have conducted extensive experiments to compare our proposed method with state-of-the-art models including KronRLS, Sim, DeepDTA and DeepCPI.
arXiv Detail & Related papers (2020-03-31T01:35:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.