Bridging the Gap between Learning and Inference for Diffusion-Based Molecule Generation
- URL: http://arxiv.org/abs/2411.05472v1
- Date: Fri, 08 Nov 2024 10:53:39 GMT
- Title: Bridging the Gap between Learning and Inference for Diffusion-Based Molecule Generation
- Authors: Peidong Liu, Wenbo Zhang, Xue Zhe, Jiancheng Lv, Xianggen Liu,
- Abstract summary: GapDiff is a training framework that mitigates the data distributional disparity between training and inference.
We conduct experiments using a 3D molecular generation model on the CrossDocked 2020 dataset.
- Score: 18.936142688346816
- License:
- Abstract: The efficacy of diffusion models in generating a spectrum of data modalities, including images, text, and videos, has spurred inquiries into their utility in molecular generation, yielding significant advancements in the field. However, the molecular generation process with diffusion models involves multiple autoregressive steps over a finite time horizon, leading to exposure bias issues inherently. To address the exposure bias issue, we propose a training framework named GapDiff. The core idea of GapDiff is to utilize model-predicted conformations as ground truth probabilistically during training, aiming to mitigate the data distributional disparity between training and inference, thereby enhancing the affinity of generated molecules. We conduct experiments using a 3D molecular generation model on the CrossDocked2020 dataset, and the vina energy and diversity demonstrate the potency of our framework with superior affinity. GapDiff is available at \url{https://github.com/HUGHNew/gapdiff}.
Related papers
- Conditional Synthesis of 3D Molecules with Time Correction Sampler [58.0834973489875]
Time-Aware Conditional Synthesis (TACS) is a novel approach to conditional generation on diffusion models.
It integrates adaptively controlled plug-and-play "online" guidance into a diffusion model, driving samples toward the desired properties.
arXiv Detail & Related papers (2024-11-01T12:59:25Z) - Constrained Diffusion Models via Dual Training [80.03953599062365]
Diffusion processes are prone to generating samples that reflect biases in a training dataset.
We develop constrained diffusion models by imposing diffusion constraints based on desired distributions.
We show that our constrained diffusion models generate new data from a mixture data distribution that achieves the optimal trade-off among objective and constraints.
arXiv Detail & Related papers (2024-08-27T14:25:42Z) - Context-Guided Diffusion for Out-of-Distribution Molecular and Protein Design [30.241533997522236]
We develop context-guided diffusion (CGD), a simple plug-and-play method that leverages unlabeled data and smoothness constraints to improve the out-of-distribution generalization of guided diffusion models.
This approach leads to substantial performance gains across various settings, including continuous, discrete, and graph-structured diffusion processes with applications across drug discovery, materials science, and protein design.
arXiv Detail & Related papers (2024-07-16T17:34:00Z) - LDMol: Text-to-Molecule Diffusion Model with Structurally Informative Latent Space [55.5427001668863]
We present a novel latent diffusion model dubbed LDMol for text-conditioned molecule generation.
LDMol comprises a molecule autoencoder that produces a learnable and structurally informative feature space.
We show that LDMol can be applied to downstream tasks such as molecule-to-text retrieval and text-guided molecule editing.
arXiv Detail & Related papers (2024-05-28T04:59:13Z) - Geometric-Facilitated Denoising Diffusion Model for 3D Molecule Generation [32.464905769094536]
Existing diffusion-based generative methods on de novo 3D molecule generation face two major challenges.
We introduce a Dual-Track Transformer Network (DTN) to fully excevate global spatial relationships and learn high quality representations.
As for the second challenge, we design Geometric-Facilitated Loss (GFLoss) which intervenes the formation of bonds during the training period, instead of directly embedding edges into the latent space.
arXiv Detail & Related papers (2024-01-05T07:29:21Z) - Navigating the Design Space of Equivariant Diffusion-Based Generative
Models for De Novo 3D Molecule Generation [1.3124513975412255]
Deep generative diffusion models are a promising avenue for 3D de novo molecular design in materials science and drug discovery.
We explore the design space of E(3)-equivariant diffusion models, focusing on previously unexplored areas.
We present the EQGAT-diff model, which consistently outperforms established models for the QM9 and GEOM-Drugs datasets.
arXiv Detail & Related papers (2023-09-29T14:53:05Z) - Directional diffusion models for graph representation learning [9.457273750874357]
We propose a new class of models called it directional diffusion models
These models incorporate data-dependent, anisotropic, and directional noises in the forward diffusion process.
We conduct extensive experiments on 12 publicly available datasets, focusing on two distinct graph representation learning tasks.
arXiv Detail & Related papers (2023-06-22T21:27:48Z) - Diff-Instruct: A Universal Approach for Transferring Knowledge From
Pre-trained Diffusion Models [77.83923746319498]
We propose a framework called Diff-Instruct to instruct the training of arbitrary generative models.
We show that Diff-Instruct results in state-of-the-art single-step diffusion-based models.
Experiments on refining GAN models show that the Diff-Instruct can consistently improve the pre-trained generators of GAN models.
arXiv Detail & Related papers (2023-05-29T04:22:57Z) - Diffusion Models in Vision: A Survey [80.82832715884597]
A diffusion model is a deep generative model that is based on two stages, a forward diffusion stage and a reverse diffusion stage.
Diffusion models are widely appreciated for the quality and diversity of the generated samples, despite their known computational burdens.
arXiv Detail & Related papers (2022-09-10T22:00:30Z) - A Survey on Generative Diffusion Model [75.93774014861978]
Diffusion models are an emerging class of deep generative models.
They have certain limitations, including a time-consuming iterative generation process and confinement to high-dimensional Euclidean space.
This survey presents a plethora of advanced techniques aimed at enhancing diffusion models.
arXiv Detail & Related papers (2022-09-06T16:56:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.