Cross-Modality Controlled Molecule Generation with Diffusion Language Model
- URL: http://arxiv.org/abs/2508.14748v1
- Date: Wed, 20 Aug 2025 14:48:44 GMT
- Title: Cross-Modality Controlled Molecule Generation with Diffusion Language Model
- Authors: Yunzhe Zhang, Yifei Wang, Khanh Vinh Nguyen, Pengyu Hong
- Abstract summary: We propose the Cross-Modality Controlled Molecule Generation with Diffusion Language Model (CMCM-DLM). Our approach builds upon a pre-trained diffusion model, incorporating two trainable modules, the Structure Control Module (SCM) and the Property Control Module (PCM). Phase I employs the SCM to inject structural constraints during the early diffusion steps, effectively anchoring the molecular backbone. Phase II builds on this by further introducing PCM to guide the later stages of inference to refine the generated molecules, ensuring their chemical properties match the specified targets.
- Score: 14.435311248340824
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current SMILES-based diffusion models for molecule generation typically support only unimodal constraints. They inject conditioning signals at the start of the training process and require retraining a new model from scratch whenever the constraint changes. However, real-world applications often involve multiple constraints across different modalities, and additional constraints may emerge over the course of a study. This raises a challenge: how to extend a pre-trained diffusion model not only to support cross-modality constraints but also to incorporate new ones without retraining. To tackle this problem, we propose the Cross-Modality Controlled Molecule Generation with Diffusion Language Model (CMCM-DLM), demonstrated on two distinct modalities: molecular structure and chemical properties. Our approach builds upon a pre-trained diffusion model, incorporating two trainable modules, the Structure Control Module (SCM) and the Property Control Module (PCM), and operates in two distinct phases during the generation process. In Phase I, we employ the SCM to inject structural constraints during the early diffusion steps, effectively anchoring the molecular backbone. Phase II builds on this by further introducing PCM to guide the later stages of inference to refine the generated molecules, ensuring their chemical properties match the specified targets. Experimental results on multiple datasets demonstrate the efficiency and adaptability of our approach, highlighting CMCM-DLM's significant advancement in molecular generation for drug discovery applications.
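The two-phase schedule described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the additive way the control signals are combined, the `switch_frac` parameter, and the choice to keep the SCM active during Phase II are all assumptions made for illustration. The abstract states only that the SCM acts in the early diffusion steps and that the PCM is further introduced in the later ones.

```python
from typing import Callable, List, Tuple

Vector = List[float]
# A module maps (current state, step index) to an update of the same length.
Module = Callable[[Vector, int], Vector]


def active_modules(step: int, num_steps: int, switch_frac: float = 0.5) -> Tuple[str, ...]:
    """Return the control modules active at an inference step.

    Steps count from 0 (earliest, noisiest) to num_steps - 1.
    switch_frac is a hypothetical knob marking where Phase II begins.
    """
    if not 0 <= step < num_steps:
        raise ValueError("step out of range")
    switch_step = int(num_steps * switch_frac)
    if step < switch_step:
        return ("SCM",)          # Phase I: structure control only
    return ("SCM", "PCM")        # Phase II: property control added (assumed to join SCM)


def generate(x: Vector, base: Module, scm: Module, pcm: Module,
             num_steps: int, switch_frac: float = 0.5) -> Vector:
    """Toy inference loop: a frozen base model plus phase-dependent control updates."""
    for step in range(num_steps):
        update = base(x, step)   # stands in for the pre-trained diffusion model
        names = active_modules(step, num_steps, switch_frac)
        if "SCM" in names:
            update = [u + c for u, c in zip(update, scm(x, step))]
        if "PCM" in names:
            update = [u + c for u, c in zip(update, pcm(x, step))]
        x = [xi + ui for xi, ui in zip(x, update)]
    return x


if __name__ == "__main__":
    # Placeholder modules with constant outputs, only to make the schedule visible.
    zero = lambda x, t: [0.0] * len(x)
    one = lambda x, t: [1.0] * len(x)
    ten = lambda x, t: [10.0] * len(x)
    print([active_modules(t, 4) for t in range(4)])
    print(generate([0.0], zero, one, ten, num_steps=4))
```

In this sketch the base model is never modified; only the small control modules contribute extra terms, which mirrors the abstract's point that new constraints are added without retraining the pre-trained model.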
Related papers
- MolHIT: Advancing Molecular-Graph Generation with Hierarchical Discrete Diffusion Models [37.89307688620534]
We introduce MolHIT, a powerful molecular graph generation framework that overcomes long-standing performance limitations in existing methods. Overall, MolHIT achieves new state-of-the-art performance on the MOSES dataset with near-perfect validity for the first time in graph diffusion.
arXiv Detail & Related papers (2026-02-19T18:27:11Z)
- MODA: A Unified 3D Diffusion Framework for Multi-Task Target-Aware Molecular Generation [16.07694748790297]
We introduce MODA, a diffusion framework that unifies fragment growing, linker design, scaffold hopping, and side-chain decoration with a Bayesian mask scheduler. During training, a contiguous spatial fragment is masked and then denoised in one pass, enabling the model to learn shared geometric and chemical priors across tasks.
arXiv Detail & Related papers (2025-07-09T18:19:50Z)
- AdaptMol: Adaptive Fusion from Sequence String to Topological Structure for Few-shot Drug Discovery [7.338199946027998]
We present AdaptMol, a prototypical network integrating adaptive multimodal fusion for representation. This framework employs a dual-level attention mechanism to dynamically integrate global and local molecular features. Experiments on three commonly used benchmarks under 5-shot and 10-shot settings demonstrate that AdaptMol achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-05-17T07:12:12Z)
- Conditional Synthesis of 3D Molecules with Time Correction Sampler [58.0834973489875]
Time-Aware Conditional Synthesis (TACS) is a novel approach to conditional generation on diffusion models.
It integrates adaptively controlled plug-and-play "online" guidance into a diffusion model, driving samples toward the desired properties.
arXiv Detail & Related papers (2024-11-01T12:59:25Z)
- LDMol: A Text-to-Molecule Diffusion Model with Structurally Informative Latent Space Surpasses AR Models [55.5427001668863]
We present a novel latent diffusion model dubbed LDMol for text-conditioned molecule generation. Experiments show that LDMol outperforms the existing autoregressive baselines on the text-to-molecule generation benchmark. We show that LDMol can be applied to downstream tasks such as molecule-to-text retrieval and text-guided molecule editing.
arXiv Detail & Related papers (2024-05-28T04:59:13Z)
- ControlMol: Adding Substructure Control To Molecule Diffusion Models [2.8372258697984627]
We propose a two-stage training approach, consisting of condition learning and condition optimization. In our experiments, trained only on randomly partitioned sub-structure data, the proposed method outperforms previous techniques by generating more valid and diverse molecules.
arXiv Detail & Related papers (2024-04-22T14:36:19Z)
- AUTODIFF: Autoregressive Diffusion Modeling for Structure-based Drug Design [16.946648071157618]
We propose a diffusion-based fragment-wise autoregressive generation model for structure-based drug design (SBDD).
We design a novel molecule assembly strategy named conformal motif that preserves the conformation of local structures of molecules first.
We then encode the interaction of the protein-ligand complex with an SE(3)-equivariant convolutional network and generate molecules motif-by-motif with diffusion modeling.
arXiv Detail & Related papers (2024-04-02T14:44:02Z)
- DecompDiff: Diffusion Models with Decomposed Priors for Structure-Based Drug Design [62.68420322996345]
Existing structure-based drug design methods treat all ligand atoms equally.
We propose a new diffusion model, DecompDiff, with decomposed priors over arms and scaffold.
Our approach achieves state-of-the-art performance in generating high-affinity molecules.
arXiv Detail & Related papers (2024-02-26T05:21:21Z)
- Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
arXiv Detail & Related papers (2022-08-23T17:01:16Z)
- Molecular Attributes Transfer from Non-Parallel Data [57.010952598634944]
We formulate molecular optimization as a style transfer problem and present a novel generative model that could automatically learn internal differences between two groups of non-parallel data.
Experiments on two molecular optimization tasks, toxicity modification and synthesizability improvement, demonstrate that our model significantly outperforms several state-of-the-art methods.
arXiv Detail & Related papers (2021-11-30T06:10:22Z)
- MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization [51.00815310242277]
Generative models and reinforcement learning approaches have had initial success but still face difficulties in simultaneously optimizing multiple drug properties.
We propose the MultI-constraint MOlecule SAmpling (MIMOSA) approach, a sampling framework that uses the input molecule as an initial guess and samples molecules from the target distribution.
arXiv Detail & Related papers (2020-10-05T20:18:42Z)
- Scaffold-constrained molecular generation [0.0]
We build on the well-known SMILES-based Recurrent Neural Network (RNN) generative model, with a modified sampling procedure to achieve scaffold-constrained generation.
We showcase the method's ability to perform scaffold-constrained generation on various tasks.
arXiv Detail & Related papers (2020-09-15T15:41:18Z)