Geometric Representation Condition Improves Equivariant Molecule Generation
- URL: http://arxiv.org/abs/2410.03655v2
- Date: Mon, 10 Feb 2025 06:34:22 GMT
- Title: Geometric Representation Condition Improves Equivariant Molecule Generation
- Authors: Zian Li, Cai Zhou, Xiyuan Wang, Xingang Peng, Muhan Zhang
- Abstract summary: We introduce GeoRCG, a framework to enhance the performance of molecular generative models.
We decompose the molecule generation process into two stages: first, generating an informative geometric representation; second, generating a molecule conditioned on the representation.
We observe significant quality improvements in unconditional molecule generation tasks on the widely-used QM9 and GEOM-DRUG datasets.
- Score: 24.404588237915732
- License:
- Abstract: Recent advancements in molecular generative models have demonstrated substantial potential in accelerating scientific discovery, particularly in drug design. However, these models often face challenges in generating high-quality molecules, especially in conditional scenarios where specific molecular properties must be satisfied. In this work, we introduce GeoRCG, a general framework to enhance the performance of molecular generative models by integrating geometric representation conditions with provable theoretical guarantees. We decompose the molecule generation process into two stages: first, generating an informative geometric representation; second, generating a molecule conditioned on the representation. Compared to directly generating a molecule, the relatively easy-to-generate representation in the first stage guides the second-stage generation to reach a high-quality molecule in a more goal-oriented and much faster way. Leveraging EDM and SemlaFlow as the base generators, we observe significant quality improvements in unconditional molecule generation tasks on the widely-used QM9 and GEOM-DRUG datasets. More notably, in the challenging conditional molecular generation task, our framework achieves an average 31% performance improvement over state-of-the-art approaches, highlighting the superiority of conditioning on semantically rich geometric representations over conditioning on individual property values as in previous approaches. Furthermore, we show that, with such representation guidance, the number of diffusion steps can be reduced to as small as 100 while largely preserving the generation quality achieved with 1,000 steps, thereby significantly accelerating the generation process.
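Below is a minimal, self-contained sketch of the two-stage sampling scheme described in the abstract, using toy NumPy stand-ins. The helper names (`sample_representation`, `denoise_step`, `sample_molecule`) and all sizes are illustrative assumptions; this is not the authors' released GeoRCG implementation nor the actual EDM/SemlaFlow denoisers.

```python
# Toy sketch of GeoRCG-style two-stage sampling (hypothetical interfaces).
# Stage 1 samples a geometric representation; stage 2 runs a
# representation-conditioned denoising loop over atom coordinates,
# here with a reduced budget of 100 steps as mentioned in the abstract.
import numpy as np

rng = np.random.default_rng(0)
REP_DIM, N_ATOMS, COORD_DIM = 64, 9, 3  # toy sizes (QM9-like molecule)


def sample_representation(rep_dim: int) -> np.ndarray:
    """Stage 1: sample an informative geometric representation.

    Stands in for a small generative model trained over representations
    produced by a pre-trained geometric encoder.
    """
    return rng.standard_normal(rep_dim)


def denoise_step(x: np.ndarray, rep: np.ndarray, t: float) -> np.ndarray:
    """Placeholder for the representation-conditioned denoiser.

    A real model would predict noise from (x, rep, t); this toy version
    ignores t and simply shrinks x toward a representation-dependent anchor.
    """
    anchor = rep[: x.size].reshape(x.shape)
    return x + 0.05 * (anchor - x)


def sample_molecule(rep: np.ndarray, num_steps: int = 100) -> np.ndarray:
    """Stage 2: generate coordinates conditioned on the representation."""
    x = rng.standard_normal((N_ATOMS, COORD_DIM))  # start from pure noise
    for step in range(num_steps):
        t = 1.0 - step / num_steps
        x = denoise_step(x, rep, t)
    return x


rep = sample_representation(REP_DIM)
coords = sample_molecule(rep, num_steps=100)
print(coords.shape)  # (9, 3): toy atom coordinates guided by the representation
```

The point of the sketch is the control flow: the representation is drawn once and then conditions every denoising step, which is what lets the second stage proceed with far fewer steps.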
Related papers
- MolMiner: Transformer architecture for fragment-based autoregressive generation of molecular stories [7.366789601705544]
Chemical validity, interpretability of the generation process, and flexibility to variable molecular sizes are among the remaining challenges for generative models in computational materials design.
We propose an autoregressive approach that decomposes molecular generation into a sequence of discrete and interpretable steps.
Our results show that the model can effectively bias the generation distribution according to the prompted multi-target objective.
arXiv Detail & Related papers (2024-11-10T22:00:55Z)
- Rethinking Molecular Design: Integrating Latent Variable and Auto-Regressive Models for Goal Directed Generation [0.6800113478497425]
We return to the simplest representation of molecules, and investigate overlooked limitations of classical generative approaches.
We propose a hybrid model in the form of a novel regularizer that leverages the strengths of both to improve validity, conditional generation, and style transfer of molecular sequences.
arXiv Detail & Related papers (2024-08-19T11:50:23Z)
- Data-Efficient Molecular Generation with Hierarchical Textual Inversion [48.816943690420224]
We introduce Hierarchical textual Inversion for Molecular generation (HI-Mol), a novel data-efficient molecular generation method.
HI-Mol is inspired by the importance of hierarchical information, e.g., both coarse- and fine-grained features, in understanding the molecule distribution.
Compared to the conventional textual inversion method in the image domain using a single-level token embedding, our multi-level token embeddings allow the model to effectively learn the underlying low-shot molecule distribution.
arXiv Detail & Related papers (2024-05-05T08:35:23Z)
- Pre-training of Molecular GNNs via Conditional Boltzmann Generator [0.0]
We propose a pre-training method for molecular GNNs using an existing dataset of molecular conformations.
We show that our model has a better prediction performance for molecular properties than existing pre-training methods.
arXiv Detail & Related papers (2023-12-20T15:30:15Z)
- Improving Molecular Properties Prediction Through Latent Space Fusion [9.912768918657354]
We present a multi-view approach that combines latent spaces derived from state-of-the-art chemical models.
Our approach relies on two pivotal elements: the embeddings derived from MHG-GNN, which represent molecular structures as graphs, and MoLFormer embeddings rooted in chemical language.
We demonstrate the superior performance of our proposed multi-view approach compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2023-10-20T20:29:32Z)
- Molecule Design by Latent Space Energy-Based Modeling and Gradual Distribution Shifting [53.44684898432997]
Generation of molecules with desired chemical and biological properties is critical for drug discovery.
We propose a probabilistic generative model to capture the joint distribution of molecules and their properties.
Our method achieves very strong performances on various molecule design tasks.
arXiv Detail & Related papers (2023-06-09T03:04:21Z)
- Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning (a toy illustration of this retrieval-guided steering appears after this list).
arXiv Detail & Related papers (2022-08-23T17:01:16Z)
- Molecular Attributes Transfer from Non-Parallel Data [57.010952598634944]
We formulate molecular optimization as a style transfer problem and present a novel generative model that could automatically learn internal differences between two groups of non-parallel data.
Experiments on two molecular optimization tasks, toxicity modification and synthesizability improvement, demonstrate that our model significantly outperforms several state-of-the-art methods.
arXiv Detail & Related papers (2021-11-30T06:10:22Z)
- Learning Neural Generative Dynamics for Molecular Conformation Generation [89.03173504444415]
We study how to generate molecule conformations (i.e., 3D structures) from a molecular graph.
We propose a novel probabilistic framework to generate valid and diverse conformations given a molecular graph.
arXiv Detail & Related papers (2021-02-20T03:17:58Z)
- Self-Supervised Graph Transformer on Large-Scale Molecular Data [73.3448373618865]
We propose a novel framework, GROVER, for molecular representation learning.
GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data.
We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules -- the biggest GNN and the largest training dataset in molecular representation learning.
arXiv Detail & Related papers (2020-06-18T08:37:04Z)
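As referenced in the retrieval-based entry above, the following is a hypothetical toy sketch of retrieval-guided steering in a continuous latent space: samples from a frozen generator are nudged toward their nearest exemplars from a small retrieval set encoding the design criteria, with no fine-tuning. The names (`steer`, `exemplars`) and the interpolation rule are illustrative assumptions, not the cited paper's actual algorithm.

```python
# Toy sketch of retrieval-guided steering (hypothetical; not the cited
# paper's actual method). A frozen "generator" proposes latent samples,
# each sample is pulled toward its nearest exemplar from a small
# retrieval set, and the generator itself is never fine-tuned.
import numpy as np

rng = np.random.default_rng(1)
LATENT_DIM = 16

exemplars = rng.standard_normal((5, LATENT_DIM)) + 2.0   # small set meeting the design criteria
proposals = rng.standard_normal((32, LATENT_DIM))        # frozen generator's raw samples


def steer(z: np.ndarray, bank: np.ndarray, strength: float = 0.5) -> np.ndarray:
    """Move a latent sample toward its nearest retrieved exemplar."""
    dists = np.linalg.norm(bank - z, axis=1)
    nearest = bank[np.argmin(dists)]
    return (1.0 - strength) * z + strength * nearest


steered = np.stack([steer(z, exemplars) for z in proposals])
# Steered samples end up closer to the exemplar region than the raw proposals.
print(np.linalg.norm(proposals - exemplars.mean(0), axis=1).mean(),
      np.linalg.norm(steered - exemplars.mean(0), axis=1).mean())
```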
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.