Large Language Models for Controllable Multi-property Multi-objective Molecule Optimization
- URL: http://arxiv.org/abs/2505.23987v1
- Date: Thu, 29 May 2025 20:29:14 GMT
- Title: Large Language Models for Controllable Multi-property Multi-objective Molecule Optimization
- Authors: Vishal Dey, Xiao Hu, Xia Ning
- Abstract summary: We introduce C-MuMOInstruct, the first instruction-tuning dataset focused on multi-property optimization with explicit, property-specific objectives. We develop GeLLMO-Cs, a series of instruction-tuned LLMs that can perform targeted property-specific optimization. Our experiments show that GeLLMO-Cs consistently outperform strong baselines, achieving up to 126% higher success rate.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In real-world drug design, molecule optimization requires selectively improving multiple molecular properties up to pharmaceutically relevant levels, while maintaining others that already meet such criteria. However, existing computational approaches and instruction-tuned LLMs fail to capture such nuanced property-specific objectives, limiting their practical applicability. To address this, we introduce C-MuMOInstruct, the first instruction-tuning dataset focused on multi-property optimization with explicit, property-specific objectives. Leveraging C-MuMOInstruct, we develop GeLLMO-Cs, a series of instruction-tuned LLMs that can perform targeted property-specific optimization. Our experiments across 5 in-distribution and 5 out-of-distribution tasks show that GeLLMO-Cs consistently outperform strong baselines, achieving up to 126% higher success rate. Notably, GeLLMO-Cs exhibit impressive 0-shot generalization to novel optimization tasks and unseen instructions. This offers a step toward a foundational LLM to support realistic, diverse optimizations with property-specific objectives. C-MuMOInstruct and code are accessible through https://github.com/ninglab/GeLLMO-C.
Related papers
- Generalizable Heuristic Generation Through Large Language Models with Meta-Optimization [14.919482411153185]
Heuristic design with large language models (LLMs) has emerged as a promising approach for tackling optimization problems. Existing approaches often rely on manually predefined evolutionary generalizations and single-task training schemes. We propose Meta-Optimization of Heuristics (MoH), a novel framework that operates at the level of meta-learning.
arXiv Detail & Related papers (2025-05-27T08:26:27Z)
- ChatMol: A Versatile Molecule Designer Based on the Numerically Enhanced Large Language Model [11.166536730901102]
Goal-oriented de novo molecule design is a crucial yet challenging task in drug discovery. We propose ChatMol, a novel approach that leverages Large Language Models for molecule design across diverse constraint settings. Experimental results across single-property, substructure-property, and multi-property constrained tasks demonstrate that ChatMol consistently outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2025-02-27T06:05:45Z)
- GeLLMO: Generalizing Large Language Models for Multi-property Molecule Optimization [2.152507712409726]
Large Language Models (LLMs) demonstrate remarkable out-of-domain generalizability to novel tasks. We introduce MuMOInstruct, the first high-quality instruction-tuning dataset specifically focused on complex multi-property molecule optimization tasks. We develop GeLLMOs, a series of instruction-tuned LLMs for molecule optimization.
arXiv Detail & Related papers (2025-02-19T03:14:11Z)
- DrugImproverGPT: A Large Language Model for Drug Optimization with Fine-Tuning via Structured Policy Optimization [53.27954325490941]
Finetuning a Large Language Model (LLM) is crucial for generating results towards specific objectives. This research introduces a novel reinforcement learning algorithm to finetune a drug optimization LLM-based generative model.
arXiv Detail & Related papers (2025-02-11T04:00:21Z)
- MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale [66.73529246309033]
Multimodal large language models (MLLMs) have shown significant potential in a broad range of multimodal tasks. Existing instruction-tuning datasets only provide phrase-level answers without any intermediate rationales. We introduce a scalable and cost-effective method to construct a large-scale multimodal instruction-tuning dataset with rich intermediate rationales.
arXiv Detail & Related papers (2024-12-06T18:14:24Z)
- Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization [65.64108848398696]
We introduce a preference optimization (PO) process to enhance the multimodal reasoning capabilities of MLLMs. Specifically, we design an automated preference data construction pipeline to create MMPR, a high-quality, large-scale multimodal reasoning preference dataset. We explore integrating PO with MLLMs, developing a simple yet effective method, termed Mixed Preference Optimization (MPO), which boosts multimodal CoT performance.
arXiv Detail & Related papers (2024-11-15T18:59:27Z)
- MetaAlign: Align Large Language Models with Diverse Preferences during Inference Time [50.41806216615488]
Large Language Models (LLMs) acquire extensive knowledge and remarkable abilities from large text corpora.
To make LLMs more usable, aligning them with human preferences is essential.
We propose an effective method, MetaAlign, which aims to help LLMs dynamically align with various explicit or implicit preferences specified at inference time.
arXiv Detail & Related papers (2024-10-18T05:31:13Z)
- XMOL: Explainable Multi-property Optimization of Molecules [2.320539066224081]
We propose Explainable Multi-property Optimization of Molecules (XMOL) to optimize multiple molecular properties simultaneously.
Our approach builds on state-of-the-art geometric diffusion models, extending them to multi-property optimization.
We integrate interpretive and explainable techniques throughout the optimization process.
arXiv Detail & Related papers (2024-09-12T06:35:04Z)
- Multi-Reference Preference Optimization for Large Language Models [56.84730239046117]
We introduce a novel closed-form formulation for direct preference optimization using multiple reference models.
The resulting algorithm, Multi-Reference Preference Optimization (MRPO), leverages broader prior knowledge from diverse reference models.
Our experiments demonstrate that LLMs finetuned with MRPO generalize better in various preference data, regardless of data scarcity or abundance.
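As background (this is not MRPO's own formulation), the standard single-reference direct preference optimization loss that such methods build on can be sketched as follows; the log-probabilities here are illustrative placeholders:

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Single-reference DPO loss for one preference pair:
    -log sigmoid(beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))),
    where logp_w / logp_l are the policy's log-probs of the chosen and
    rejected responses and ref_* are the reference model's log-probs."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A pair where the policy prefers the chosen response more strongly
# than the reference does yields a loss below log(2) (the zero-margin value).
loss = dpo_loss(logp_w=-2.0, logp_l=-5.0, ref_logp_w=-3.0, ref_logp_l=-4.0)
```

A multi-reference variant in the spirit of MRPO would aggregate the reference log-probabilities over several reference models rather than using a single one; the exact aggregation is method-specific.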
arXiv Detail & Related papers (2024-05-26T00:29:04Z)
- Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark [166.40879020706151]
This paper proposes a shift towards BP-free, zeroth-order (ZO) optimization as a solution for reducing memory costs during fine-tuning.
Unlike traditional ZO-SGD methods, our work expands the exploration to a wider array of ZO optimization techniques.
Our study unveils previously overlooked optimization principles, highlighting the importance of task alignment, the role of the forward gradient method, and the balance between algorithm complexity and fine-tuning performance.
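To illustrate the general idea (this is not the benchmark's own code), BP-free ZO-SGD replaces the backpropagated gradient with a two-point finite-difference estimate along a random direction; a minimal sketch on a toy objective:

```python
import numpy as np

def zo_sgd(loss, theta, steps=2000, mu=1e-3, lr=1e-2, seed=0):
    """Classic zeroth-order SGD: estimate the directional derivative with
    two forward evaluations of the loss, no backpropagation required."""
    rng = np.random.default_rng(seed)
    theta = theta.astype(float).copy()
    for _ in range(steps):
        u = rng.standard_normal(theta.shape)  # random probe direction
        # two-point estimate of the derivative along u
        g = (loss(theta + mu * u) - loss(theta - mu * u)) / (2 * mu)
        theta -= lr * g * u                   # descend along the probe
    return theta

# Minimize a simple quadratic without ever computing its gradient.
theta = zo_sgd(lambda x: float(np.sum(x ** 2)), np.ones(5))
```

Only loss evaluations are needed, which is why ZO methods avoid storing activations for backpropagation and thus cut fine-tuning memory costs; the trade-off is a gradient estimate whose variance grows with the parameter dimension.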
arXiv Detail & Related papers (2024-02-18T14:08:48Z)
- Are Large Language Models Good Prompt Optimizers? [65.48910201816223]
We conduct a study to uncover the actual mechanism of LLM-based Prompt Optimization.
Our findings reveal that the LLMs struggle to identify the true causes of errors during reflection, tending to be biased by their own prior knowledge.
We introduce a new "Automatic Behavior Optimization" paradigm, which directly optimizes the target model's behavior in a more controllable manner.
arXiv Detail & Related papers (2024-02-03T09:48:54Z)
- DrugAssist: A Large Language Model for Molecule Optimization [29.95488215594247]
DrugAssist is an interactive molecule optimization model that performs optimization through human-machine dialogue.
DrugAssist has achieved leading results in both single and multiple property optimization.
We publicly release a large instruction-based dataset called MolOpt-Instructions for fine-tuning language models on molecule optimization tasks.
arXiv Detail & Related papers (2023-12-28T10:46:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.