Domain-Agnostic Molecular Generation with Chemical Feedback
- URL: http://arxiv.org/abs/2301.11259v6
- Date: Mon, 4 Mar 2024 12:54:34 GMT
- Title: Domain-Agnostic Molecular Generation with Chemical Feedback
- Authors: Yin Fang, Ningyu Zhang, Zhuo Chen, Lingbing Guo, Xiaohui Fan, Huajun
Chen
- Abstract summary: MolGen is a pre-trained molecular language model tailored specifically for molecule generation.
It internalizes structural and grammatical insights through the reconstruction of over 100 million molecular SELFIES.
Our chemical feedback paradigm steers the model away from molecular hallucinations, ensuring alignment between the model's estimated probabilities and real-world chemical preferences.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The generation of molecules with desired properties has become increasingly
popular, revolutionizing the way scientists design molecular structures and
providing valuable support for chemical and drug design. However, despite the
potential of language models in molecule generation, they face challenges such
as generating syntactically or chemically flawed molecules, having narrow
domain focus, and struggling to create diverse and feasible molecules due to
limited annotated data or external molecular databases. To tackle these
challenges, we introduce MolGen, a pre-trained molecular language model
tailored specifically for molecule generation. Through the reconstruction of
over 100 million molecular SELFIES, MolGen internalizes structural and
grammatical insights. This is further enhanced by domain-agnostic molecular
prefix tuning, fostering robust knowledge transfer across diverse domains.
Importantly, our chemical feedback paradigm steers the model away from
molecular hallucinations, ensuring alignment between the model's estimated
probabilities and real-world chemical preferences. Extensive experiments on
well-known benchmarks underscore MolGen's optimization capabilities in
properties such as penalized logP, QED, and molecular docking. Additional
analyses confirm its proficiency in accurately capturing molecule
distributions, discerning intricate structural patterns, and efficiently
exploring the chemical space. Code is available at
https://github.com/zjunlp/MolGen.
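The chemical feedback paradigm described above aligns the model's estimated probabilities with external chemical preferences. One common way to express such alignment is a pairwise ranking (hinge) objective: whenever one sampled molecule scores better than another on a property such as penalized logP, the model is penalized unless it also assigns that molecule a higher log-likelihood. The sketch below is a minimal illustration of this idea, not MolGen's exact loss; the function name, input format, and margin value are assumptions.

```python
def rank_feedback_loss(candidates, margin=1.0):
    """Pairwise ranking loss over sampled molecules.

    candidates: list of (log_prob, property_score) pairs, where
    log_prob is the model's log-likelihood of a molecule and
    property_score is an external chemical score (e.g. penalized logP).
    For every ordered pair where molecule i has a strictly better
    property score than molecule j, add a hinge penalty if the model
    does not prefer i over j by at least `margin` in log-likelihood.
    """
    loss = 0.0
    for lp_i, s_i in candidates:
        for lp_j, s_j in candidates:
            if s_i > s_j:
                loss += max(0.0, margin - (lp_i - lp_j))
    return loss

# Toy example: the model already prefers the higher-scoring molecule
# by a margin of 2 nats, so the single hinge term is zero.
cands = [(-1.0, 2.5), (-3.0, 0.5)]
print(rank_feedback_loss(cands))  # max(0, 1.0 - 2.0) = 0.0
```

Minimizing such a loss pushes the generator's probability ordering toward the property ordering, which is the sense in which feedback steers the model away from "molecular hallucinations".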
Related papers
- MolTRES: Improving Chemical Language Representation Learning for Molecular Property Prediction [14.353313239109337]
MolTRES is a novel chemical language representation learning framework.
It incorporates generator-discriminator training, allowing the model to learn from more challenging examples.
Our model outperforms existing state-of-the-art models on popular molecular property prediction tasks.
arXiv Detail & Related papers (2024-07-09T01:14:28Z) - Navigating Chemical Space with Latent Flows [20.95884505685799]
We propose a new framework, ChemFlow, which traverses chemical space by using flows to navigate the latent space learned by molecule generative models.
We validate the efficacy of ChemFlow on molecule manipulation and single- and multi-objective optimization tasks under both supervised and unsupervised molecular discovery settings.
arXiv Detail & Related papers (2024-05-07T03:55:57Z) - From molecules to scaffolds to functional groups: building context-dependent molecular representation via multi-channel learning [10.025809630976065]
This paper introduces a novel pre-training framework that learns robust and generalizable chemical knowledge.
Our approach demonstrates competitive performance across various molecular property benchmarks.
arXiv Detail & Related papers (2023-11-05T23:47:52Z) - Interactive Molecular Discovery with Natural Language [69.89287960545903]
We propose conversational molecular design, a novel task that adopts natural language for describing and editing target molecules.
To better accomplish this task, we design ChatMol, a knowledgeable and versatile generative pre-trained model, enhanced by injecting experimental property information.
arXiv Detail & Related papers (2023-06-21T02:05:48Z) - An Equivariant Generative Framework for Molecular Graph-Structure
Co-Design [54.92529253182004]
We present MolCode, a machine learning-based generative framework for Molecular graph-structure Co-design.
In MolCode, 3D geometric information empowers the molecular 2D graph generation, which in turn helps guide the prediction of molecular 3D structure.
Our investigation reveals that the 2D topology and 3D geometry contain intrinsically complementary information in molecule design.
arXiv Detail & Related papers (2023-04-12T13:34:22Z) - A Molecular Multimodal Foundation Model Associating Molecule Graphs with
Natural Language [63.60376252491507]
We propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data.
We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
arXiv Detail & Related papers (2022-09-12T00:56:57Z) - Interpretable Molecular Graph Generation via Monotonic Constraints [19.401468196146336]
Deep graph generative models treat molecule design as a graph generation problem.
Existing models have many shortcomings, including poor interpretability and controllability toward desired molecular properties.
This paper proposes new methodologies for molecule generation with interpretable and controllable deep models.
arXiv Detail & Related papers (2022-02-28T08:35:56Z) - Scalable Fragment-Based 3D Molecular Design with Reinforcement Learning [68.8204255655161]
We introduce a novel framework for scalable 3D design that uses a hierarchical agent to build molecules.
In a variety of experiments, we show that our agent, guided only by energy considerations, can efficiently learn to produce molecules with over 100 atoms.
arXiv Detail & Related papers (2022-02-01T18:54:24Z) - Graph Energy-based Model for Substructure Preserving Molecular Design [15.939981475281309]
Our Graph Energy-based Model, or GEM, can fix substructures and generate the rest.
The experimental results show that GEMs trained on chemistry datasets successfully generate novel molecules.
arXiv Detail & Related papers (2021-02-09T01:46:12Z) - Learning Latent Space Energy-Based Prior Model for Molecule Generation [59.875533935578375]
We learn a latent-space energy-based prior model with a SMILES representation for molecule modeling.
Our method is able to generate molecules with validity and uniqueness competitive with state-of-the-art models.
arXiv Detail & Related papers (2020-10-19T09:34:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.