Active Deep Kernel Learning of Molecular Functionalities: Realizing Dynamic Structural Embeddings
- URL: http://arxiv.org/abs/2403.01234v1
- Date: Sat, 2 Mar 2024 15:34:31 GMT
- Title: Active Deep Kernel Learning of Molecular Functionalities: Realizing Dynamic Structural Embeddings
- Authors: Ayana Ghosh, Maxim Ziatdinov, and Sergei V. Kalinin
- Abstract summary: This paper explores an approach for active learning in molecular discovery using Deep Kernel Learning (DKL).
DKL offers a more holistic perspective by correlating structure with properties, creating latent spaces that prioritize molecular functionality.
The formation of exclusion regions around certain compounds indicates unexplored areas with potential for groundbreaking functionalities.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Exploring molecular spaces is crucial for advancing our understanding of
chemical properties and reactions, leading to groundbreaking innovations in
materials science, medicine, and energy. This paper explores an approach for
active learning in molecular discovery using Deep Kernel Learning (DKL), a
novel approach surpassing the limits of classical Variational Autoencoders
(VAEs). Employing the QM9 dataset, we contrast DKL with traditional VAEs, which
analyze molecular structures based on similarity, revealing limitations due to
sparse regularities in latent spaces. DKL, however, offers a more holistic
perspective by correlating structure with properties, creating latent spaces
that prioritize molecular functionality. This is achieved by recalculating
embedding vectors iteratively, aligning with the experimental availability of
target properties. The resulting latent spaces are not only better organized
but also exhibit unique characteristics such as concentrated maxima
representing molecular functionalities and a correlation between predictive
uncertainty and error. Additionally, the formation of exclusion regions around
certain compounds indicates unexplored areas with potential for groundbreaking
functionalities. This study underscores DKL's potential in molecular research,
offering new avenues for understanding and discovering molecular
functionalities beyond classical VAE limitations.
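The abstract describes the DKL loop only at a high level. The code below is a minimal sketch in a toy setting, not the authors' QM9 pipeline. It composes an RBF kernel with a small feature map g_w(x) = tanh(Wx), a hypothetical stand-in for the deep network. At every step it refits W on the currently labeled data by maximizing the Gaussian-process log marginal likelihood, so the embeddings are recomputed as new labels arrive. It then queries the pool point with the largest predictive variance.

Several choices here are substitutions, not the paper's method:
- Random search replaces the gradient-based optimizer normally used to train DKL.
- Max-variance selection is only one possible acquisition function.
- The names `embed`, `fit_deep_kernel`, and `active_learning` are illustrative, not from the paper.
- Only the standard library is used, for portability.

```python
import math
import random


def cholesky(a):
    """Lower-triangular Cholesky factor L of a symmetric positive-definite matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l


def forward_sub(l, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i])
    return y


def back_sub_t(l, y):
    """Solve L^T x = y, reading L^T from the lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x


def embed(w, x):
    """Hypothetical one-layer feature map g_w(x) = tanh(W x), the learned part of the kernel."""
    return [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in w]


def rbf(za, zb, length=0.5):
    """RBF base kernel evaluated on embeddings rather than on raw descriptors."""
    d2 = sum((a - b) ** 2 for a, b in zip(za, zb))
    return math.exp(-0.5 * d2 / length ** 2)


def gp_fit(w, xs, ys, noise=1e-2):
    """Exact GP regression on embeddings; returns the posterior and log marginal likelihood."""
    zs = [embed(w, x) for x in xs]
    n = len(xs)
    k = [[rbf(zs[i], zs[j]) + (noise if i == j else 0.0) for j in range(n)] for i in range(n)]
    l = cholesky(k)
    alpha = back_sub_t(l, forward_sub(l, ys))  # alpha = K^-1 y
    lml = (-0.5 * sum(y * a for y, a in zip(ys, alpha))
           - sum(math.log(l[i][i]) for i in range(n))
           - 0.5 * n * math.log(2 * math.pi))
    return {"w": w, "zs": zs, "l": l, "alpha": alpha, "lml": lml}


def gp_predict(model, x):
    """Posterior mean and latent (noise-free) variance at a new point."""
    z = embed(model["w"], x)
    ks = [rbf(z, zi) for zi in model["zs"]]
    mean = sum(kv * a for kv, a in zip(ks, model["alpha"]))
    v = forward_sub(model["l"], ks)
    var = max(rbf(z, z) - sum(vi * vi for vi in v), 0.0)
    return mean, var


def fit_deep_kernel(xs, ys, dim_in, dim_out=2, trials=50, seed=0):
    """Pick feature-map weights W by maximizing the log marginal likelihood.

    Random search stands in for the gradient-based training used in practice.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        w = [[rng.gauss(0.0, 1.0) for _ in range(dim_in)] for _ in range(dim_out)]
        model = gp_fit(w, xs, ys)
        if best is None or model["lml"] > best["lml"]:
            best = model
    return best


def active_learning(pool, oracle, n_init=3, n_steps=5, seed=0):
    """Refit the deep kernel each step, then query the most uncertain pool point."""
    labeled = list(range(n_init))
    ys = [oracle(pool[i]) for i in labeled]
    for step in range(n_steps):
        xs = [pool[i] for i in labeled]
        # Embeddings are recomputed from the refitted W on every iteration.
        model = fit_deep_kernel(xs, ys, len(pool[0]), seed=seed + step)
        candidates = [i for i in range(len(pool)) if i not in labeled]
        if not candidates:
            break
        nxt = max(candidates, key=lambda i: gp_predict(model, pool[i])[1])
        labeled.append(nxt)
        ys.append(oracle(pool[nxt]))  # "measure" the property of the chosen point
    return labeled, ys
```

Here `pool` is a list of descriptor vectors and `oracle` is any callable that returns the measured property. Calling `active_learning(pool, oracle)` returns the query order and the observed values. Swapping the variance criterion for an upper-confidence-bound score (mean plus a multiple of the standard deviation) would bias the loop toward high property values instead of pure exploration.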
Related papers
- Aligned Manifold Property and Topology Point Clouds for Learning Molecular Properties
This work introduces AMPTCR, a molecular surface representation that combines local quantum-derived scalar fields and custom topological descriptors within an aligned point cloud format.
For molecular weight, results confirm that AMPTCR encodes physically meaningful data, with a validation R² of 0.87.
In the bacterial inhibition task, AMPTCR enables both classification and direct regression of E. coli inhibition values.
arXiv (2025-07-22T04:35:50Z)
- Learning Hierarchical Interaction for Accurate Molecular Property Prediction
We propose a Hierarchical Interaction Message Passing Mechanism, which serves as the foundation of our novel model, HimNet.
Our method enables interaction-aware representation learning across atomic, motif, and molecular levels via hierarchical attention-guided message passing.
Our method exhibits promising hierarchical interpretability, aligning well with chemical intuition on representative molecules.
arXiv (2025-04-28T15:19:28Z)
- Knowledge-aware contrastive heterogeneous molecular graph learning
We propose a paradigm shift by encoding molecular graphs with Knowledge-aware Contrastive Heterogeneous Molecular Graph Learning (KCHML).
KCHML conceptualizes molecules through three distinct graph views (molecular, elemental, and pharmacological), enhanced by heterogeneous molecular graphs and a dual message-passing mechanism.
This design offers a comprehensive representation for property prediction, as well as for downstream tasks such as drug-drug interaction (DDI) prediction.
arXiv (2025-02-17T11:53:58Z)
- Active Causal Learning for Decoding Chemical Complexities with Targeted Interventions
We introduce an active learning approach that discerns underlying cause-effect relationships through strategic sampling.
This method identifies the smallest subset of the dataset capable of encoding the most information representative of a much larger chemical space.
The identified causal relations are then leveraged to conduct systematic interventions, optimizing the design task within a chemical space that the models have not encountered previously.
arXiv (2024-04-05T17:15:48Z)
- Mol-AIR: Molecular Reinforcement Learning with Adaptive Intrinsic Rewards for Goal-directed Molecular Generation
Mol-AIR is a reinforcement learning-based framework using adaptive intrinsic rewards for goal-directed molecular generation.
In benchmark tests, Mol-AIR demonstrates superior performance over existing approaches in generating molecules with desired properties.
arXiv (2024-03-29T10:44:51Z)
- TwinBooster: Synergising Large Language Models with Barlow Twins and Gradient Boosting for Enhanced Molecular Property Prediction
In this study, we use a fine-tuned large language model to integrate biological assays based on their textual information.
This architecture uses both assay information and molecular fingerprints to extract the true molecular information.
TwinBooster enables the prediction of properties of unseen bioassays and molecules, achieving state-of-the-art performance on zero-shot learning tasks.
arXiv (2024-01-09T10:36:20Z)
- From molecules to scaffolds to functional groups: building context-dependent molecular representation via multi-channel learning
This paper introduces a novel pre-training framework that learns robust and generalizable chemical knowledge.
Our approach demonstrates competitive performance across various molecular property benchmarks.
arXiv (2023-11-05T23:47:52Z)
- Unsupervised Learning of Molecular Embeddings for Enhanced Clustering and Emergent Properties for Chemical Compounds
We introduce various methods to detect and cluster chemical compounds based on their SMILES data.
Our first method, analyzing the graphical structures of chemical compounds using embedding data, employs vector search to meet our threshold value.
We also use natural-language description embeddings stored in a vector database with GPT-3.5, which outperforms the base model.
arXiv (2023-10-25T18:00:24Z)
- Molecule Design by Latent Space Energy-Based Modeling and Gradual Distribution Shifting
Generation of molecules with desired chemical and biological properties is critical for drug discovery.
We propose a probabilistic generative model to capture the joint distribution of molecules and their properties.
Our method achieves very strong performance on various molecule design tasks.
arXiv (2023-06-09T03:04:21Z)
- Towards Predicting Equilibrium Distributions for Molecular Systems with Deep Learning
We introduce a novel deep learning framework, called Distributional Graphormer (DiG), in an attempt to predict the equilibrium distribution of molecular systems.
DiG employs deep neural networks to transform a simple distribution towards the equilibrium distribution, conditioned on a descriptor of a molecular system.
arXiv (2023-06-08T17:12:08Z)
- Atomic and Subgraph-aware Bilateral Aggregation for Molecular Representation Learning
We introduce a new model for molecular representation learning called the Atomic and Subgraph-aware Bilateral Aggregation (ASBA).
ASBA addresses the limitations of previous atom-wise and subgraph-wise models by incorporating both types of information.
Our method offers a more comprehensive way to learn representations for molecular property prediction and has broad potential in drug and material discovery applications.
arXiv (2023-05-22T00:56:00Z)
- Deep Kernel Methods Learn Better: From Cards to Process Optimization
We show that DKL with active learning can produce a more compact and smooth latent space.
We demonstrate this behavior using a simple cards data set and extend it to the optimization of domain-generated trajectories in physical systems.
arXiv (2023-03-25T20:21:29Z)
- Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv (2023-02-04T01:32:40Z)
- Discovery of structure-property relations for molecules via hypothesis-driven active learning over the chemical space
We introduce a novel approach for active learning over chemical spaces based on hypothesis learning.
We construct hypotheses about the possible relationships between structures and functionalities of interest from a small subset of the data.
This approach combines elements of symbolic regression methods such as SISSO with active learning in a single framework.
arXiv (2023-01-06T14:22:43Z)
- MolCPT: Molecule Continuous Prompt Tuning to Generalize Molecular Representation Learning
We propose a novel paradigm of "pre-train, prompt, fine-tune" for molecular representation learning, named molecule continuous prompt tuning (MolCPT).
MolCPT defines a motif prompting function that uses the pre-trained model to project the standalone input into an expressive prompt.
Experiments on several benchmark datasets show that MolCPT efficiently generalizes pre-trained GNNs for molecular property prediction.
arXiv (2022-12-20T19:32:30Z)
- Graph neural networks for the prediction of molecular structure-property relationships
Graph neural networks (GNNs) are a novel machine learning method that works directly on the molecular graph.
GNNs allow properties to be learned in an end-to-end fashion, avoiding the need for informative descriptors.
We describe the fundamentals of GNNs and demonstrate the application of GNNs via two examples for molecular property prediction.
arXiv (2022-07-25T11:30:44Z)
- Graph-based Molecular Representation Learning
Molecular representation learning (MRL) is a key step to build the connection between machine learning and chemical science.
Recently, MRL has achieved considerable progress, especially in methods based on deep molecular graph learning.
arXiv (2022-07-08T17:43:20Z)
- Exploring Chemical Space with Score-based Out-of-distribution Generation
We propose a score-based diffusion scheme, MOOD, that incorporates out-of-distribution control in the generative stochastic differential equation (SDE).
Since some novel molecules may not meet the basic requirements of real-world drugs, MOOD performs conditional generation by utilizing the gradients from a property predictor.
We experimentally validate that MOOD is able to explore the chemical space beyond the training distribution, generating molecules that outscore ones found with existing methods, and even the top 0.01% of the original training pool.
arXiv (2022-06-06T06:17:11Z)
- Generative Enriched Sequential Learning (ESL) Approach for Molecular Design via Augmented Domain Knowledge
Generative machine learning techniques can generate novel chemical structures based on molecular fingerprint representations.
A lack of supervised domain knowledge can bias the learning procedure toward the prevalent molecules observed in the training data.
We alleviated this drawback by augmenting the training data with domain knowledge, e.g. quantitative estimate of drug-likeness (QED) scores.
arXiv (2022-04-05T20:16:11Z)
- ASGN: An Active Semi-supervised Graph Neural Network for Molecular Property Prediction
We propose a novel framework called Active Semi-supervised Graph Neural Network (ASGN) that incorporates both labeled and unlabeled molecules.
In the teacher model, we propose a novel semi-supervised learning method to learn a general representation that jointly exploits information from molecular structure and molecular distribution.
Finally, we propose a novel active learning strategy based on molecular diversity to select informative data throughout training of the whole framework.
arXiv (2020-07-07T04:22:39Z)