Context-enriched molecule representations improve few-shot drug
discovery
- URL: http://arxiv.org/abs/2305.09481v1
- Date: Mon, 24 Apr 2023 17:58:05 GMT
- Title: Context-enriched molecule representations improve few-shot drug
discovery
- Authors: Johannes Schimunek, Philipp Seidl, Lukas Friedrich, Daniel Kuhn,
Friedrich Rippmann, Sepp Hochreiter, and Günter Klambauer
- Abstract summary: We introduce a new method for few-shot drug discovery.
Our main idea is to enrich a molecule representation by knowledge about known context or reference molecules.
Our approach is compared with other few-shot methods for drug discovery on the FS-Mol benchmark dataset.
- Score: 8.379853456273674
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A central task in computational drug discovery is to construct models from
known active molecules to find further promising molecules for subsequent
screening. However, typically only very few active molecules are known.
Therefore, few-shot learning methods have the potential to improve the
effectiveness of this critical phase of the drug discovery process. We
introduce a new method for few-shot drug discovery. Its main idea is to enrich
a molecule representation by knowledge about known context or reference
molecules. Our novel concept for molecule representation enrichment is to
associate molecules from both the support set and the query set with a large
set of reference (context) molecules through a Modern Hopfield Network.
Intuitively, this enrichment step is analogous to a human expert who would
associate a given molecule with familiar molecules whose properties are known.
The enrichment step reinforces and amplifies the covariance structure of the
data, while simultaneously removing spurious correlations arising from the
decoration of molecules. Our approach is compared with other few-shot methods
for drug discovery on the FS-Mol benchmark dataset. On FS-Mol, our approach
outperforms all compared methods and therefore sets a new state-of-the-art for
few-shot learning in drug discovery. An ablation study shows that the
enrichment step of our method is key to improving predictive quality. In
a domain shift experiment, we further demonstrate the robustness of our method.
Code is available at https://github.com/ml-jku/MHNfs.
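The enrichment step described in the abstract can be illustrated with a minimal NumPy sketch of Modern-Hopfield-style retrieval, in which each molecule embedding attends over a set of context embeddings via a softmax. This is a simplified illustration, not the authors' implementation (see the linked repository for that): the function name `hopfield_enrich`, the inverse temperature `beta`, the absence of learned projections, and the residual addition are all assumptions made for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def hopfield_enrich(molecules, context, beta=1.0):
    """Enrich molecule embeddings with retrieved context information.

    molecules: array of shape (n, d), support- or query-set embeddings.
    context:   array of shape (m, d), reference (context) embeddings.
    beta:      inverse temperature of the softmax (hypothetical default).
    """
    # Similarity of every molecule to every context molecule: (n, m).
    scores = beta * molecules @ context.T
    # Association weights; each row sums to 1.
    weights = softmax(scores, axis=1)
    # Softmax-weighted mixture of context embeddings: (n, d).
    retrieved = weights @ context
    # Residual combination is an assumption of this sketch.
    return molecules + retrieved


# Usage with random stand-in embeddings (not real molecule features).
rng = np.random.default_rng(0)
support = rng.normal(size=(4, 8))
query = rng.normal(size=(2, 8))
context_set = rng.normal(size=(100, 8))

enriched_support = hopfield_enrich(support, context_set)
enriched_query = hopfield_enrich(query, context_set)
print(enriched_support.shape, enriched_query.shape)
```

Support and query molecules go through the same function, mirroring the abstract's statement that both sets are associated with the same reference molecules. A larger `beta` makes each molecule retrieve mostly its single most similar context molecule, while a smaller `beta` averages over many.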
Related papers
- Data-Efficient Molecular Generation with Hierarchical Textual Inversion [48.816943690420224]
We introduce Hierarchical textual Inversion for Molecular generation (HI-Mol), a novel data-efficient molecular generation method.
HI-Mol is inspired by the importance of hierarchical information, e.g., both coarse- and fine-grained features, in understanding the molecule distribution.
Compared to the conventional textual inversion method in the image domain using a single-level token embedding, our multi-level token embeddings allow the model to effectively learn the underlying low-shot molecule distribution.
arXiv Detail & Related papers (2024-05-05T08:35:23Z)
- DecompDiff: Diffusion Models with Decomposed Priors for Structure-Based Drug Design [62.68420322996345]
Existing structure-based drug design methods treat all ligand atoms equally.
We propose a new diffusion model, DecompDiff, with decomposed priors over arms and scaffold.
Our approach achieves state-of-the-art performance in generating high-affinity molecules.
arXiv Detail & Related papers (2024-02-26T05:21:21Z)
- Multi-Modal Representation Learning for Molecular Property Prediction: Sequence, Graph, Geometry [6.049566024728809]
Deep learning-based molecular property prediction has emerged as a solution to the resource-intensive nature of traditional methods.
In this paper, we propose a novel multi-modal representation learning model, called SGGRL, for molecular property prediction.
To ensure consistency across modalities, SGGRL is trained to maximize the similarity of representations for the same molecule while minimizing similarity for different molecules.
arXiv Detail & Related papers (2024-01-07T02:18:00Z)
- MultiModal-Learning for Predicting Molecular Properties: A Framework Based on Image and Graph Structures [2.5563339057415218]
MolIG is a novel MultiModaL molecular pre-training framework for predicting molecular properties based on Image and Graph structures.
It amalgamates the strengths of both molecular representation forms.
It exhibits enhanced performance in downstream tasks pertaining to molecular property prediction within benchmark groups.
arXiv Detail & Related papers (2023-11-28T10:28:35Z)
- Graph-based Molecular Representation Learning [59.06193431883431]
Molecular representation learning (MRL) is a key step to build the connection between machine learning and chemical science.
Recently, MRL has achieved considerable progress, especially in methods based on deep molecular graph learning.
arXiv Detail & Related papers (2022-07-08T17:43:20Z)
- Exploring Chemical Space with Score-based Out-of-distribution Generation [57.15855198512551]
We propose MOOD, a score-based diffusion scheme that incorporates out-of-distribution control in the generative stochastic differential equation (SDE).
Since some novel molecules may not meet the basic requirements of real-world drugs, MOOD performs conditional generation by utilizing the gradients from a property predictor.
We experimentally validate that MOOD is able to explore the chemical space beyond the training distribution, generating molecules that outscore ones found with existing methods, and even the top 0.01% of the original training pool.
arXiv Detail & Related papers (2022-06-06T06:17:11Z)
- Fragment-based Sequential Translation for Molecular Optimization [23.152338167332374]
We propose a flexible editing paradigm that generates molecules using learned molecular fragments.
We use a variational autoencoder to encode molecular fragments in a coherent latent space.
We then utilize this latent space as a vocabulary for editing molecules to explore the complex chemical property space.
arXiv Detail & Related papers (2021-10-26T21:20:54Z)
- Advanced Graph and Sequence Neural Networks for Molecular Property Prediction and Drug Discovery [53.00288162642151]
We develop MoleculeKit, a suite of comprehensive machine learning tools spanning different computational models and molecular representations.
Built on these representations, MoleculeKit includes both deep learning and traditional machine learning methods for graph and sequence data.
Results on both online and offline antibiotics discovery and molecular property prediction tasks show that MoleculeKit achieves consistent improvements over prior methods.
arXiv Detail & Related papers (2020-12-02T02:09:31Z)
- Goal directed molecule generation using Monte Carlo Tree Search [15.462930062711237]
We propose a novel method, which we call unitMCTS, to perform molecule generation by making a unit change to the molecule at every step using Monte Carlo Tree Search.
We show that this method outperforms the recently published techniques on benchmark molecular optimization tasks such as QED and penalized logP.
arXiv Detail & Related papers (2020-10-30T17:49:59Z)
- MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization [51.00815310242277]
Generative models and reinforcement learning approaches have had initial success, but still face difficulties in simultaneously optimizing multiple drug properties.
We propose the MultI-constraint MOlecule SAmpling (MIMOSA) approach, a sampling framework that uses the input molecule as an initial guess and samples molecules from the target distribution.
arXiv Detail & Related papers (2020-10-05T20:18:42Z)