Discovery of structure-property relations for molecules via
hypothesis-driven active learning over the chemical space
- URL: http://arxiv.org/abs/2301.02665v2
- Date: Mon, 8 May 2023 20:05:41 GMT
- Authors: Ayana Ghosh, Sergei V. Kalinin and Maxim A. Ziatdinov
- Abstract summary: We introduce a novel approach for the active learning over the chemical spaces based on hypothesis learning.
We construct the hypotheses on the possible relationships between structures and functionalities of interest based on a small subset of data.
This approach combines the elements from the symbolic regression methods such as SISSO and active learning into a single framework.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Discovery of molecular candidates for applications in drug targets,
biomolecular systems, catalysts, photovoltaics, organic electronics, and
batteries necessitates the development of machine learning algorithms capable of
rapidly exploring chemical spaces while targeting the desired functionalities.
Here we introduce a novel approach for the active learning over the chemical
spaces based on hypothesis learning. We construct the hypotheses on the
possible relationships between structures and functionalities of interest based
on a small subset of data and introduce them as (probabilistic) mean functions
for the Gaussian process. This approach combines the elements from the symbolic
regression methods such as SISSO and active learning into a single framework.
The primary focus of constructing this framework is to approximate physical
laws in an active learning regime toward a more robust predictive performance,
as traditional evaluation on hold-out sets in machine learning doesn't account
for out-of-distribution effects and may lead to a complete failure on unseen
chemical space. Here, we demonstrate it for the QM9 dataset, but it can be
applied more broadly to datasets from both domains of molecular and solid-state
materials sciences.
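The core idea in the abstract, a hypothesis used as the mean function of a Gaussian process inside an active learning loop, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single linear hypothesis over a one-dimensional descriptor, fits the hypothesis parameters by ordinary least squares rather than treating them probabilistically, uses a fixed RBF kernel, and selects the next point by maximum predictive variance. All function names (`rbf_kernel`, `linear_hypothesis`, `fit_mean`, `gp_posterior`, `active_learning`) and hyperparameter values are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, length=1.0, variance=1.0):
    """Squared-exponential kernel between two 1-D arrays of descriptors."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length) ** 2)


def linear_hypothesis(x, theta):
    """Assumed hypothesis: the property depends linearly on the descriptor."""
    return theta[0] + theta[1] * x


def fit_mean(x, y):
    """Least-squares fit of the hypothesis parameters (a simplification)."""
    design = np.stack([np.ones_like(x), x], axis=1)
    theta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return theta


def gp_posterior(x_train, y_train, x_test, theta, noise=1e-2):
    """GP posterior mean and variance with the hypothesis as mean function."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_ts = rbf_kernel(x_train, x_test)
    # The GP models only the residual left unexplained by the hypothesis.
    resid = y_train - linear_hypothesis(x_train, theta)
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, resid))
    mean = linear_hypothesis(x_test, theta) + k_ts.T @ alpha
    v = np.linalg.solve(chol, k_ts)
    var = np.diag(rbf_kernel(x_test, x_test)) - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)


def active_learning(x_pool, oracle, n_init=3, n_steps=10, seed=0):
    """Query the pool point with the largest predictive variance at each step."""
    rng = np.random.default_rng(seed)
    idx = [int(i) for i in rng.choice(len(x_pool), size=n_init, replace=False)]
    for _ in range(n_steps):
        x_tr = x_pool[idx]
        y_tr = oracle(x_tr)
        theta = fit_mean(x_tr, y_tr)
        _, var = gp_posterior(x_tr, y_tr, x_pool, theta)
        var[idx] = -np.inf  # never re-query an already measured point
        idx.append(int(np.argmax(var)))
    # Refit the hypothesis on everything measured so far.
    theta = fit_mean(x_pool[idx], oracle(x_pool[idx]))
    return idx, theta
```

Here `oracle` stands in for whatever returns the property of interest for a queried molecule (for example, a lookup into a labeled dataset). Extending the sketch to several competing hypotheses would mean running one such model per hypothesis and comparing them at each step.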
Related papers
- Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model, which randomly masks SMILES subsequences corresponding to specific molecular atoms.
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
arXiv Detail & Related papers (2024-11-03T01:56:15Z)
- Active Causal Learning for Decoding Chemical Complexities with Targeted Interventions [0.0]
We introduce an active learning approach that discerns underlying cause-effect relationships through strategic sampling.
This method identifies the smallest subset of the dataset capable of encoding the most information representative of a much larger chemical space.
The identified causal relations are then leveraged to conduct systematic interventions, optimizing the design task within a chemical space that the models have not encountered previously.
arXiv Detail & Related papers (2024-04-05T17:15:48Z)
- Active Deep Kernel Learning of Molecular Functionalities: Realizing Dynamic Structural Embeddings [0.26716003713321473]
This paper explores an approach for active learning in molecular discovery using Deep Kernel Learning (DKL).
DKL offers a more holistic perspective by correlating structure with properties, creating latent spaces that prioritize molecular functionality.
The formation of exclusion regions around certain compounds indicates unexplored areas with potential for groundbreaking functionalities.
arXiv Detail & Related papers (2024-03-02T15:34:31Z)
- From molecules to scaffolds to functional groups: building context-dependent molecular representation via multi-channel learning [10.025809630976065]
This paper introduces a novel pre-training framework that learns robust and generalizable chemical knowledge.
Our approach demonstrates competitive performance across various molecular property benchmarks.
arXiv Detail & Related papers (2023-11-05T23:47:52Z)
- Unsupervised Learning of Molecular Embeddings for Enhanced Clustering and Emergent Properties for Chemical Compounds [2.6803933204362336]
We introduce various methods to detect and cluster chemical compounds based on their SMILES data.
Our first method, analyzing the graphical structures of chemical compounds using embedding data, employs vector search to meet our threshold value.
We also used natural language description embeddings stored in a vector database with GPT3.5, which outperforms the base model.
arXiv Detail & Related papers (2023-10-25T18:00:24Z)
- Towards Predicting Equilibrium Distributions for Molecular Systems with Deep Learning [60.02391969049972]
We introduce a novel deep learning framework, called Distributional Graphormer (DiG), in an attempt to predict the equilibrium distribution of molecular systems.
DiG employs deep neural networks to transform a simple distribution towards the equilibrium distribution, conditioned on a descriptor of a molecular system.
arXiv Detail & Related papers (2023-06-08T17:12:08Z)
- Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z)
- Accurate Machine Learned Quantum-Mechanical Force Fields for Biomolecular Simulations [51.68332623405432]
Molecular dynamics (MD) simulations allow atomistic insights into chemical and biological processes.
Recently, machine learned force fields (MLFFs) emerged as an alternative means to execute MD simulations.
This work proposes a general approach to constructing accurate MLFFs for large-scale molecular simulations.
arXiv Detail & Related papers (2022-05-17T13:08:28Z)
- Transferring Chemical and Energetic Knowledge Between Molecular Systems with Machine Learning [5.27145343046974]
We propose a novel methodology for transferring knowledge obtained from simple molecular systems to a more complex one.
We focus on the classification of high and low free-energy states.
Our results show a remarkable AUC of 0.92 for transfer learning from tri-alanine to the deca-alanine system.
arXiv Detail & Related papers (2022-05-06T16:21:00Z)
- Federated Learning of Molecular Properties in a Heterogeneous Setting [79.00211946597845]
We introduce federated heterogeneous molecular learning to address these challenges.
Federated learning allows end-users to build a global model collaboratively while preserving the training data distributed over isolated clients.
FedChem should enable a new type of collaboration for improving AI in chemistry that mitigates concerns about valuable chemical data.
arXiv Detail & Related papers (2021-09-15T12:49:13Z)
- ASGN: An Active Semi-supervised Graph Neural Network for Molecular Property Prediction [61.33144688400446]
We propose a novel framework called Active Semi-supervised Graph Neural Network (ASGN) by incorporating both labeled and unlabeled molecules.
In the teacher model, we propose a novel semi-supervised learning method to learn general representation that jointly exploits information from molecular structure and molecular distribution.
Finally, we propose a novel active learning strategy based on molecular diversity to select informative data throughout the learning of the whole framework.
arXiv Detail & Related papers (2020-07-07T04:22:39Z)