Generative Enriched Sequential Learning (ESL) Approach for Molecular
Design via Augmented Domain Knowledge
- URL: http://arxiv.org/abs/2204.02474v1
- Date: Tue, 5 Apr 2022 20:16:11 GMT
- Title: Generative Enriched Sequential Learning (ESL) Approach for Molecular
Design via Augmented Domain Knowledge
- Authors: Mohammad Sajjad Ghaemi, Karl Grantham, Isaac Tamblyn, Yifeng Li, Hsu
Kiang Ooi
- Abstract summary: Generative machine learning techniques can generate novel chemical structures based on molecular fingerprint representations.
Lack of supervised domain knowledge can mislead the learning procedure into being biased toward the prevalent molecules observed in the training data.
We alleviated this drawback by augmenting the training data with domain knowledge, e.g. quantitative estimates of the drug-likeness score (QEDs).
- Score: 1.4410716345002657
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Deploying generative machine learning techniques to generate novel chemical
structures based on molecular fingerprint representation has been well
established in molecular design. Typically, sequential learning (SL) schemes
such as hidden Markov models (HMM) and, more recently, in the sequential deep
learning context, recurrent neural network (RNN) and long short-term memory
(LSTM) were used extensively as generative models to discover unprecedented
molecules. To this end, the emission probability between two states of atoms plays
a central role, without considering specific chemical or physical properties.
Lack of supervised domain knowledge can mislead the learning procedure into
being relatively biased toward the prevalent molecules observed in the training data that
are not necessarily of interest. We alleviated this drawback by augmenting the
training data with domain knowledge, e.g. quantitative estimates of the
drug-likeness score (QEDs). As such, our experiments demonstrated that with
this subtle trick called enriched sequential learning (ESL), specific patterns
of particular interest can be learnt better, which led to generating de novo
molecules with ameliorated QEDs.
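The abstract stops at the idea, so here is a minimal sketch of the enrichment step, assuming RDKit for QED scoring: training SMILES are replicated in proportion to their drug-likeness so a downstream RNN/LSTM sees drug-like sequences more often. Function and parameter names are illustrative, not the authors' code.

```python
# Hypothetical sketch of ESL-style data enrichment: oversample training
# SMILES in proportion to their QED so the sequence model is biased
# toward drug-like patterns. Assumes RDKit is installed.
from rdkit import Chem
from rdkit.Chem import QED

def enrich_by_qed(smiles_list, max_copies=5):
    """Replicate each valid SMILES 1..max_copies times according to its QED."""
    enriched = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:            # skip unparsable strings
            continue
        score = QED.qed(mol)       # drug-likeness in [0, 1]
        copies = 1 + round(score * (max_copies - 1))
        enriched.extend([smi] * copies)
    return enriched

train_smiles = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"]
print(enrich_by_qed(train_smiles))
```

The enriched list would then be fed to whatever sequential generator (HMM, RNN, LSTM) is being trained; the weighting scheme is one plausible choice, not the paper's exact recipe.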
Related papers
- Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model that randomly masks SMILES subsequences corresponding to specific molecular atoms.
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
arXiv Detail & Related papers (2024-11-03T01:56:15Z)
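A hedged sketch of what random atom masking over SMILES could look like; the tokenizer and masking policy below are deliberately naive stand-ins, not the paper's scheme.

```python
# Naive illustration of masked-language-model corruption on SMILES:
# tokenize, then replace a fraction of atom tokens with a mask symbol.
import random
import re

# Match bracket atoms, two-letter halogens, or any single character.
TOKEN_RE = re.compile(r"\[[^\]]+\]|Cl|Br|.")
ATOM_RE = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")

def mask_smiles(smiles, mask_rate=0.15, mask_token="[MASK]"):
    tokens = TOKEN_RE.findall(smiles)
    masked = [
        mask_token if ATOM_RE.fullmatch(t) and random.random() < mask_rate else t
        for t in tokens
    ]
    return "".join(masked)

random.seed(0)
print(mask_smiles("CC(=O)Oc1ccccc1C(=O)O"))
```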
- Active Deep Kernel Learning of Molecular Functionalities: Realizing Dynamic Structural Embeddings [0.26716003713321473]
This paper explores an approach for active learning in molecular discovery using Deep Kernel Learning (DKL).
DKL offers a more holistic perspective by correlating structure with properties, creating latent spaces that prioritize molecular functionality.
The formation of exclusion regions around certain compounds indicates unexplored areas with potential for groundbreaking functionalities.
arXiv Detail & Related papers (2024-03-02T15:34:31Z)
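For orientation, a dependency-light sketch of the deep kernel learning idea referenced above: a neural feature extractor feeds an RBF-kernel Gaussian process that predicts properties. All sizes and names are illustrative, not taken from the paper.

```python
# DKL in miniature: the kernel acts on learned latents rather than raw
# fingerprints, so structure and property are correlated in latent space.
import torch

embed = torch.nn.Sequential(           # learnable feature extractor
    torch.nn.Linear(2048, 128), torch.nn.ReLU(), torch.nn.Linear(128, 8)
)

def rbf_kernel(a, b, lengthscale=1.0):
    d2 = torch.cdist(a, b).pow(2)
    return torch.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior_mean(x_train, y_train, x_test, noise=1e-2):
    z_tr, z_te = embed(x_train), embed(x_test)   # kernel on latents
    k_tr = rbf_kernel(z_tr, z_tr) + noise * torch.eye(len(x_train))
    k_te = rbf_kernel(z_te, z_tr)
    return k_te @ torch.linalg.solve(k_tr, y_train)

x_tr, y_tr = torch.randn(32, 2048), torch.randn(32)
print(gp_posterior_mean(x_tr, y_tr, torch.randn(4, 2048)))
```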
- From molecules to scaffolds to functional groups: building context-dependent molecular representation via multi-channel learning [10.025809630976065]
This paper introduces a novel pre-training framework that learns robust and generalizable chemical knowledge.
Our approach demonstrates competitive performance across various molecular property benchmarks.
arXiv Detail & Related papers (2023-11-05T23:47:52Z)
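One loose, hypothetical reading of the three channels named in the title above (molecule, scaffold, functional group), sketched with RDKit primitives; the paper's multi-channel pre-training pipeline is more involved.

```python
# Derive per-molecule "channel" views: full structure, Murcko scaffold,
# and a few functional-group flags from a tiny illustrative SMARTS library.
from rdkit import Chem
from rdkit.Chem.Scaffolds import MurckoScaffold

FUNCTIONAL_GROUPS = {
    "carboxylic_acid": Chem.MolFromSmarts("C(=O)[OH]"),
    "ester": Chem.MolFromSmarts("C(=O)O[CX4]"),
    "primary_amine": Chem.MolFromSmarts("[NX3;H2]"),
}

def channel_views(smiles):
    mol = Chem.MolFromSmiles(smiles)
    return {
        "molecule": Chem.MolToSmiles(mol),
        "scaffold": Chem.MolToSmiles(MurckoScaffold.GetScaffoldForMol(mol)),
        "functional_groups": [name for name, patt in FUNCTIONAL_GROUPS.items()
                              if mol.HasSubstructMatch(patt)],
    }

print(channel_views("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
```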
- Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z)
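Since Prototypical Networks are named above, here is a minimal sketch of the prototype step: class prototypes are mean embeddings of the support set, and queries score by negative distance. In the paper the embeddings come from the docking-informed encoder; here they are random stand-ins.

```python
# Prototypical Network classification step for a 2-way few-shot task.
import torch

def prototypical_logits(support, support_labels, query, n_classes=2):
    protos = torch.stack([
        support[support_labels == c].mean(dim=0) for c in range(n_classes)
    ])
    return -torch.cdist(query, protos)   # nearer prototype -> higher logit

support = torch.randn(10, 64)            # 10 support molecules, 64-d embeds
labels = torch.tensor([0] * 5 + [1] * 5)
query = torch.randn(3, 64)
print(prototypical_logits(support, labels, query).softmax(dim=-1))
```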
- MolCPT: Molecule Continuous Prompt Tuning to Generalize Molecular Representation Learning [77.31492888819935]
We propose a novel paradigm of "pre-train, prompt, fine-tune" for molecular representation learning, named molecule continuous prompt tuning (MolCPT).
MolCPT defines a motif prompting function that uses the pre-trained model to project the standalone input into an expressive prompt.
Experiments on several benchmark datasets show that MolCPT efficiently generalizes pre-trained GNNs for molecular property prediction.
arXiv Detail & Related papers (2022-12-20T19:32:30Z)
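A loose sketch of the prompting idea as described above: a frozen pre-trained encoder embeds the molecule, a small trainable module embeds motif features into a prompt, and the two combine before the property head. All shapes and modules are illustrative, not MolCPT's actual design.

```python
# "Pre-train, prompt, fine-tune" in miniature: only the prompt network
# and head are trainable; the stand-in encoder stays frozen.
import torch

encoder = torch.nn.Linear(2048, 128)      # stand-in for a pre-trained GNN
for p in encoder.parameters():
    p.requires_grad = False               # frozen backbone

prompt_net = torch.nn.Linear(16, 128)     # trainable motif-prompt projection
head = torch.nn.Linear(128, 1)

def predict(fingerprint, motif_counts):
    return head(encoder(fingerprint) + prompt_net(motif_counts))

print(predict(torch.randn(4, 2048), torch.randn(4, 16)).shape)
```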
- A Systematic Survey of Chemical Pre-trained Models [38.57023440288189]
Training Deep Neural Networks (DNNs) from scratch often requires abundant labeled molecules, which are expensive to acquire in the real world.
To alleviate this issue, tremendous efforts have been devoted to Chemical Pre-trained Models (CPMs).
CPMs are pre-trained using large-scale unlabeled molecular databases and then fine-tuned over specific downstream tasks.
arXiv Detail & Related papers (2022-10-29T03:53:11Z)
- Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
arXiv Detail & Related papers (2022-08-23T17:01:16Z)
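A hedged sketch of the retrieval ingredient described above, using RDKit Tanimoto similarity to fetch the exemplars that would steer a (not shown) pre-trained generator; the paper's retrieval and fusion mechanism is more elaborate.

```python
# Retrieve the k pool molecules most similar to a query by Morgan
# fingerprint Tanimoto similarity.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def retrieve(query_smiles, pool_smiles, k=2):
    fp = lambda s: AllChem.GetMorganFingerprintAsBitVect(
        Chem.MolFromSmiles(s), 2, nBits=2048)
    q = fp(query_smiles)
    scored = sorted(pool_smiles,
                    key=lambda s: DataStructs.TanimotoSimilarity(q, fp(s)),
                    reverse=True)
    return scored[:k]

pool = ["CCO", "CCN", "c1ccccc1O", "CC(=O)O"]
print(retrieve("CCOC", pool))
```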
- Graph-based Molecular Representation Learning [59.06193431883431]
Molecular representation learning (MRL) is a key step to build the connection between machine learning and chemical science.
Recently, MRL has achieved considerable progress, especially in methods based on deep molecular graph learning.
arXiv Detail & Related papers (2022-07-08T17:43:20Z)
- Exploring Chemical Space with Score-based Out-of-distribution Generation [57.15855198512551]
We propose MOOD, a score-based diffusion scheme that incorporates out-of-distribution control into the generative stochastic differential equation (SDE).
Since some novel molecules may not meet the basic requirements of real-world drugs, MOOD performs conditional generation by utilizing the gradients from a property predictor.
We experimentally validate that MOOD is able to explore the chemical space beyond the training distribution, generating molecules that outscore ones found with existing methods, and even the top 0.01% of the original training pool.
arXiv Detail & Related papers (2022-06-06T06:17:11Z)
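A schematic sketch of the property-gradient guidance described above: one sampling update shifts the score model's direction by the gradient of a property predictor. Every module here is a stand-in; the real MOOD operates on molecular graphs with a full reverse-SDE solver.

```python
# One Langevin-style update with classifier/property guidance.
import torch

score_model = torch.nn.Linear(32, 32)      # stand-in score network
prop_model = torch.nn.Linear(32, 1)        # stand-in property predictor

def guided_step(x, step_size=0.01, guidance=1.0):
    x = x.detach().requires_grad_(True)
    prop = prop_model(x).sum()
    grad_prop = torch.autograd.grad(prop, x)[0]   # push toward high property
    drift = score_model(x) + guidance * grad_prop
    noise = torch.randn_like(x)
    return x.detach() + step_size * drift.detach() + (2 * step_size) ** 0.5 * noise

x = torch.randn(8, 32)
for _ in range(10):                         # a few guided updates
    x = guided_step(x)
print(x.shape)
```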
- Knowledge-informed Molecular Learning: A Survey on Paradigm Transfer [20.893861195128643]
Machine learning, notably deep learning, has significantly propelled molecular investigations within the biochemical sphere.
Traditionally, modeling for such research has centered around a handful of paradigms.
To enhance the generalizability and decipherability of purely data-driven models, scholars have integrated biochemical domain knowledge into these molecular study models.
arXiv Detail & Related papers (2022-02-17T06:18:02Z)
- Do Large Scale Molecular Language Representations Capture Important
Structural Information? [31.76876206167457]
We present molecular embeddings obtained by training an efficient transformer encoder model, referred to as MoLFormer.
Experiments show that the learned molecular representation performs competitively, when compared to graph-based and fingerprint-based supervised learning baselines.
arXiv Detail & Related papers (2021-06-17T14:33:55Z)
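A generic sketch of pooling transformer-encoder states over SMILES tokens into a single molecule embedding, in the spirit of the MoLFormer entry above; the real model uses a trained linear-attention architecture and a proper tokenizer, neither of which is reproduced here.

```python
# Mean-pool a tiny (untrained) transformer encoder over SMILES characters.
import torch

vocab = {ch: i for i, ch in enumerate("#()+-=1234567890BCFHINOPSclnos[]")}
emb = torch.nn.Embedding(len(vocab), 64)
encoder = torch.nn.TransformerEncoder(
    torch.nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
)

def embed_smiles(smiles):
    ids = torch.tensor([[vocab[c] for c in smiles if c in vocab]])
    hidden = encoder(emb(ids))          # (1, seq_len, 64)
    return hidden.mean(dim=1)           # mean-pool into one vector

print(embed_smiles("CC(=O)Oc1ccccc1C(=O)O").shape)  # torch.Size([1, 64])
```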
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.