Learning Flexible Forward Trajectories for Masked Molecular Diffusion
- URL: http://arxiv.org/abs/2505.16790v3
- Date: Sun, 13 Jul 2025 14:11:54 GMT
- Title: Learning Flexible Forward Trajectories for Masked Molecular Diffusion
- Authors: Hyunjin Seo, Taewon Kim, Sihyun Yu, SungSoo Ahn
- Abstract summary: Masked diffusion models (MDMs) have achieved notable progress in modeling discrete data, while their potential in molecular generation remains underexplored. We report the surprising result that naively applying standard MDMs severely degrades performance. We propose Masked Element-wise Learnable Diffusion (MELD), which orchestrates per-element corruption trajectories to avoid collisions between distinct molecular graphs.
- Score: 14.219676204069655
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Masked diffusion models (MDMs) have achieved notable progress in modeling discrete data, while their potential in molecular generation remains underexplored. In this work, we explore their potential and report the surprising result that naively applying standard MDMs severely degrades performance. We identify the critical cause of this issue as a state-clashing problem, where the forward diffusion of distinct molecules collapses into a common state, resulting in a mixture of reconstruction targets that cannot be learned by a typical reverse diffusion process with unimodal predictions. To mitigate this, we propose Masked Element-wise Learnable Diffusion (MELD), which orchestrates per-element corruption trajectories to avoid collisions between distinct molecular graphs. This is achieved through a parameterized noise scheduling network that assigns distinct corruption rates to individual graph elements, i.e., atoms and bonds. Extensive experiments on diverse molecular benchmarks reveal that MELD markedly enhances overall generation quality compared to element-agnostic noise scheduling, increasing the chemical validity of vanilla MDMs on ZINC250K from 15% to 93%. Furthermore, it achieves state-of-the-art property alignment in conditional generation tasks.
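The abstract does not specify how MELD parameterizes its per-element schedule, so the following is only a hedged toy illustration of the state-clashing argument. It assumes a hypothetical keep-probability alpha_i(t) = 1 - t^{w_i} per element with hand-set rates w_i (in MELD the rates come from a learned noise scheduling network, which is not reproduced here), and computes the exact probability that two distinct molecules, written as lists of atom types, are masked into the same corrupted state. The names `keep_prob`, `corrupt`, and `clash_prob` are illustrative, not from the paper.

```python
import math
import random

MASK = "[MASK]"


def keep_prob(t: float, rate: float) -> float:
    """Probability that an element is still unmasked at time t in [0, 1].

    Hypothetical schedule alpha(t) = 1 - t**rate with rate > 0:
    rate > 1 keeps the element longer, rate < 1 masks it sooner.
    """
    return 1.0 - t ** rate


def corrupt(elements, rates, t, rng):
    """Forward masking: element i survives independently with prob keep_prob(t, rates[i])."""
    return [e if rng.random() < keep_prob(t, r) else MASK
            for e, r in zip(elements, rates)]


def clash_prob(graph_a, rates_a, graph_b, rates_b, t):
    """Exact probability that two equal-length graphs reach the same masked state at time t.

    Positions where the graphs differ must be masked in both; positions where
    they agree must be masked in both or kept in both.
    """
    p = 1.0
    for a, ra, b, rb in zip(graph_a, rates_a, graph_b, rates_b):
        ka, kb = keep_prob(t, ra), keep_prob(t, rb)
        both_masked = (1.0 - ka) * (1.0 - kb)
        p *= both_masked if a != b else both_masked + ka * kb
    return p


if __name__ == "__main__":
    # Two toy "molecules" as atom-type lists that differ only at index 2.
    mol_a = ["C", "C", "O", "N"]
    mol_b = ["C", "C", "N", "N"]

    uniform = [1.0] * 4                 # element-agnostic schedule
    elementwise = [1.0, 1.0, 3.0, 1.0]  # assumed: distinguishing element masked later

    print(clash_prob(mol_a, uniform, mol_b, uniform, 0.5))          # 0.03125
    print(clash_prob(mol_a, elementwise, mol_b, elementwise, 0.5))  # 0.001953125
    print(corrupt(mol_a, elementwise, 0.5, random.Random(0)))
```

With a uniform rate, the two toy graphs clash with probability 1/32 at t = 0.5; slowing the masking of the one distinguishing element (an assumed choice for illustration, not MELD's learned behavior) lowers it to 1/512. At t = 1 every element is masked, so all trajectories still meet in the fully masked state.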
Related papers
- Graph Diffusion that can Insert and Delete [14.488714063757278]
Generative models of graphs based on discrete Denoising Diffusion Probabilistic Models (DDPMs) offer a principled approach to molecular generation.
In this paper, we reformulate the noising and denoising processes to support monotonic insertion and deletion of nodes.
The resulting model, which we call GrIDDD, dynamically grows or shrinks the chemical graph during generation.
(2025-06-06T19:45:45Z)
- Remasking Discrete Diffusion Models with Inference-Time Scaling [12.593164604625384]
We introduce the remasking diffusion model (ReMDM) sampler, a method that can be applied to pretrained masked diffusion models in a principled way.
Most interestingly, ReMDM endows discrete diffusion with a form of inference-time compute scaling.
(2025-03-01T02:37:51Z)
- FragFM: Efficient Fragment-Based Molecular Generation via Discrete Flow Matching [0.3345437353879254]
We introduce FragFM, a novel fragment-based discrete flow matching framework for molecular graph generation.
FragFM generates molecules at the fragment level, leveraging a coarse-to-fine autoencoding mechanism to reconstruct atom-level details.
(2025-02-19T07:01:00Z)
- DiffMS: Diffusion Generation of Molecules Conditioned on Mass Spectra [60.39311767532607]
We present DiffMS, a formula-restricted encoder-decoder generative network that achieves state-of-the-art performance on generating molecular structures from mass spectra.
To develop a robust decoder that bridges latent embeddings and molecular structures, we pretrain the diffusion decoder with fingerprint-structure pairs.
Experiments on established benchmarks show that DiffMS outperforms existing models on de novo molecule generation.
(2025-02-13T18:29:48Z)
- Conditional Synthesis of 3D Molecules with Time Correction Sampler [58.0834973489875]
Time-Aware Conditional Synthesis (TACS) is a novel approach to conditional generation with diffusion models.
It integrates adaptively controlled plug-and-play "online" guidance into a diffusion model, driving samples toward the desired properties.
(2024-11-01T12:59:25Z)
- Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction [88.65168366064061]
We introduce Discrete Denoising Posterior Prediction (DDPP), a novel framework that casts the task of steering pre-trained MDMs as a problem of probabilistic inference.
Our framework leads to a family of three novel objectives that are all simulation-free, and thus scalable.
We substantiate our designs via wet-lab validation, where we observe transient expression of reward-optimized protein sequences.
(2024-10-10T17:18:30Z)
- Mitigating Exposure Bias in Score-Based Generation of Molecular Conformations [6.442534896075223]
We propose a method for measuring exposure bias in Score-Based Generative Models used for molecular conformation generation.
We design a new compensation algorithm Input Perturbation (IP), which is adapted from a method originally designed for DPMs only.
We achieve new state-of-the-art performance on the GEOM-Drugs dataset and perform on par with the state of the art on GEOM-QM9.
(2024-09-21T04:54:37Z)
- Masked Diffusion Models are Secretly Time-Agnostic Masked Models and Exploit Inaccurate Categorical Sampling [47.82616476928464]
Masked diffusion models (MDMs) have emerged as a popular research topic for generative modeling of discrete data.
We show that both training and sampling of MDMs are theoretically free from the time variable.
We identify, for the first time, an underlying numerical issue, even with the commonly used 32-bit floating-point precision.
(2024-09-04T17:48:19Z)
- Data-Efficient Molecular Generation with Hierarchical Textual Inversion [48.816943690420224]
We introduce Hierarchical textual Inversion for Molecular generation (HI-Mol), a novel data-efficient molecular generation method.
HI-Mol is inspired by the importance of hierarchical information, e.g., both coarse- and fine-grained features, in understanding the molecule distribution.
Compared to the conventional textual inversion method in the image domain using a single-level token embedding, our multi-level token embeddings allow the model to effectively learn the underlying low-shot molecule distribution.
(2024-05-05T08:35:23Z)
- Exploring Chemical Space with Score-based Out-of-distribution Generation [57.15855198512551]
We propose MOOD, a score-based diffusion scheme that incorporates out-of-distribution control in the generative stochastic differential equation (SDE).
Since some novel molecules may not meet the basic requirements of real-world drugs, MOOD performs conditional generation by utilizing the gradients from a property predictor.
We experimentally validate that MOOD is able to explore the chemical space beyond the training distribution, generating molecules that outscore ones found with existing methods, and even the top 0.01% of the original training pool.
(2022-06-06T06:17:11Z)