M^4olGen: Multi-Agent, Multi-Stage Molecular Generation under Precise Multi-Property Constraints
- URL: http://arxiv.org/abs/2601.10131v2
- Date: Fri, 16 Jan 2026 04:22:46 GMT
- Title: M^4olGen: Multi-Agent, Multi-Stage Molecular Generation under Precise Multi-Property Constraints
- Authors: Yizhan Li, Florence Cloutier, Sifan Wu, Ali Parviz, Boris Knyazev, Yan Zhang, Glen Berseth, Bang Liu
- Abstract summary: M^4olGen is a fragment-level, retrieval-augmented, two-stage framework for molecule generation under multi-property constraints. A dataset with reasoning chains of fragment edits and measured property deltas underpins both stages. Experiments on generation under two sets of property constraints show consistent gains in validity and precise satisfaction of multi-property targets.
- Score: 35.83366265234892
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generating molecules that satisfy precise numeric constraints over multiple physicochemical properties is critical and challenging. Although large language models (LLMs) are expressive, they struggle with precise multi-objective control and numeric reasoning without external structure and feedback. We introduce M^4olGen, a fragment-level, retrieval-augmented, two-stage framework for molecule generation under multi-property constraints. Stage I (prototype generation): a multi-agent reasoner performs retrieval-anchored, fragment-level edits to produce a candidate near the feasible region. Stage II (RL-based fine-grained optimization): a fragment-level optimizer trained with Group Relative Policy Optimization (GRPO) applies one- or multi-hop refinements to explicitly minimize the property errors toward our target while regulating edit complexity and deviation from the prototype. A large, automatically curated dataset with reasoning chains of fragment edits and measured property deltas underpins both stages, enabling deterministic, reproducible supervision and controllable multi-hop reasoning. Unlike prior work, our framework better reasons about molecules by leveraging fragments and supports controllable refinement toward numeric targets. Experiments on generation under two sets of property constraints (QED, LogP, Molecular Weight and HOMO, LUMO) show consistent gains in validity and precise satisfaction of multi-property targets, outperforming strong LLMs and graph-based algorithms.
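The Stage II objective described in the abstract (minimize property error while regulating edit complexity and deviation from the prototype, trained with GRPO) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the reward form, the penalty weights `lam_edit` and `lam_dev`, the per-property `scale` values, and the similarity input. Only the group-relative normalization (each reward minus the group mean, divided by the group standard deviation) follows the standard GRPO formulation.

```python
from statistics import mean, pstdev


def reward(pred, target, scale, n_edits, similarity,
           lam_edit=0.05, lam_dev=0.1):
    """Hypothetical reward (not the paper's): negative scaled property error,
    minus a penalty per fragment edit and a penalty for drifting away
    from the Stage I prototype (similarity in [0, 1])."""
    err = sum(abs(pred[k] - target[k]) / scale[k] for k in target)
    return -err - lam_edit * n_edits - lam_dev * (1.0 - similarity)


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative targets and scales (assumed values, not from the paper).
target = {"qed": 0.8, "logp": 2.5, "mw": 350.0}
scale = {"qed": 1.0, "logp": 5.0, "mw": 500.0}

# A group of candidate refinements sampled for the same prototype:
# (predicted properties, number of edits, similarity to prototype).
group = [
    ({"qed": 0.78, "logp": 2.6, "mw": 355.0}, 1, 0.9),
    ({"qed": 0.60, "logp": 3.5, "mw": 420.0}, 1, 0.9),
    ({"qed": 0.80, "logp": 2.5, "mw": 350.0}, 4, 0.5),
]

rewards = [reward(p, target, scale, n, s) for p, n, s in group]
advantages = group_relative_advantages(rewards)
```

In this toy group the third candidate hits every target exactly but is penalized for using four edits and drifting from the prototype, which is the kind of trade-off the abstract describes; the actual weighting used by the authors is not stated in this summary.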
Related papers
- MoRA: On-the-fly Molecule-aware Low-Rank Adaptation Framework for LLM-based Multi-Modal Molecular Assistant [23.60380089071549]
We propose an instance-specific parameter space alignment approach for each molecule on-the-fly. MoRA produces a unique set of low-rank adaptation weights for each input molecular graph. Experiments demonstrate that MoRA's instance-specific dynamic adaptation outperforms statically adapted baselines.
arXiv Detail & Related papers (2025-10-14T07:54:43Z)
- Composable Score-based Graph Diffusion Model for Multi-Conditional Molecular Generation [85.58520120011269]
We propose Composable Score-based Graph Diffusion model (CSGD), which extends score matching to discrete graphs via concrete scores. We show that CSGD achieves state-of-the-art performance with a 15.3% average improvement in controllability over prior methods. Our findings highlight the practical advantages of score-based modeling for discrete graph generation and its capacity for flexible, multi-property molecular design.
arXiv Detail & Related papers (2025-09-11T13:37:56Z)
- Collaborative Expert LLMs Guided Multi-Objective Molecular Optimization [51.104444856052204]
We present MultiMol, a collaborative large language model (LLM) system designed to guide multi-objective molecular optimization. In evaluations across six multi-objective optimization tasks, MultiMol significantly outperforms existing methods, achieving an 82.30% success rate.
arXiv Detail & Related papers (2025-03-05T13:47:55Z)
- ChatMol: A Versatile Molecule Designer Based on the Numerically Enhanced Large Language Model [11.166536730901102]
Goal-oriented de novo molecule design is a crucial yet challenging task in drug discovery. We propose ChatMol, a novel approach that leverages Large Language Models for molecule design across diverse constraint settings. Experimental results across single-property, substructure-property, and multi-property constrained tasks demonstrate that ChatMol consistently outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2025-02-27T06:05:45Z)
- Multi-Attribute Constraint Satisfaction via Language Model Rewriting [67.5778646504987]
Multi-Attribute Constraint Satisfaction (MACS) is a method capable of finetuning language models to satisfy user-specified constraints on multiple external real-value attributes. Our work opens new avenues for generalized and real-value multi-attribute control, with implications for diverse applications spanning NLP and bioinformatics.
arXiv Detail & Related papers (2024-12-26T12:36:39Z)
- Balancing property optimization and constraint satisfaction for constrained multi-property molecular optimization [13.665517935917048]
We propose a constrained multi-property molecular optimization framework (CMOMO), which is a flexible and efficient method to simultaneously optimize multiple molecular properties.
Experimental results show the superior performance of the proposed CMOMO over five state-of-the-art molecular optimization methods.
arXiv Detail & Related papers (2024-11-19T02:01:13Z)
- Molecule Design by Latent Prompt Transformer [76.2112075557233]
This work explores the challenging problem of molecule design by framing it as a conditional generative modeling task.
We propose a novel generative model comprising three components: (1) a latent vector with a learnable prior distribution; (2) a molecule generation model based on a causal Transformer, which uses the latent vector as a prompt; and (3) a property prediction model that predicts a molecule's target properties and/or constraint values using the latent prompt.
arXiv Detail & Related papers (2024-02-27T03:33:23Z)
- Controlled Molecule Generator for Optimizing Multiple Chemical Properties [9.10095508718581]
We propose a new optimized molecule generator model based on the Transformer with two constraint networks.
Experiments demonstrate that our proposed model outperforms state-of-the-art models by a significant margin for optimizing multiple properties simultaneously.
arXiv Detail & Related papers (2020-10-26T21:26:14Z)
- MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization [51.00815310242277]
Generative models and reinforcement learning approaches have achieved initial success, but still face difficulties in simultaneously optimizing multiple drug properties.
We propose the MultI-constraint MOlecule SAmpling (MIMOSA) approach, a sampling framework that uses the input molecule as an initial guess and samples molecules from the target distribution.
arXiv Detail & Related papers (2020-10-05T20:18:42Z)