SEISMO: Increasing Sample Efficiency in Molecular Optimization with a Trajectory-Aware LLM Agent
- URL: http://arxiv.org/abs/2602.00663v1
- Date: Sat, 31 Jan 2026 11:23:48 GMT
- Title: SEISMO: Increasing Sample Efficiency in Molecular Optimization with a Trajectory-Aware LLM Agent
- Authors: Fabian P. Krüger, Andrea Hunklinger, Adrian Wolny, Tim J. Adler, Igor Tetko, Santiago David Villalba,
- Abstract summary: We introduce SEISMO, an online, inference-time molecular optimization agent.<n>It updates after every oracle call without the need for population-based or batched learning.<n>It achieves a 2-3 times higher area under the curve than prior methods, often reaching near-maximal task scores within 50 oracle calls.
- Score: 0.7377073690542307
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Optimizing the structure of molecules to achieve desired properties is a central bottleneck across the chemical sciences, particularly in the pharmaceutical industry where it underlies the discovery of new drugs. Since molecular property evaluation often relies on costly and rate-limited oracles, such as experimental assays, molecular optimization must be highly sample-efficient. To address this, we introduce SEISMO, an LLM agent that performs strictly online, inference-time molecular optimization, updating after every oracle call without the need for population-based or batched learning. SEISMO conditions each proposal on the full optimization trajectory, combining natural-language task descriptions with scalar scores and, when available, structured explanatory feedback. Across the Practical Molecular Optimization benchmark of 23 tasks, SEISMO achieves a 2-3 times higher area under the optimisation curve than prior methods, often reaching near-maximal task scores within 50 oracle calls. Our additional medicinal-chemistry tasks show that providing explanatory feedback further improves efficiency, demonstrating that leveraging domain knowledge and structured information is key to sample-efficient molecular optimization.
Related papers
- DrugR: Optimizing Molecular Drugs through LLM-based Explicit Reasoning [24.70952870676648]
DrugR is a large language model that introduces explicit, step-by-step pharmacological reasoning into the optimization process.<n>Our approach integrates domain-specific continual pretraining, supervised fine-tuning via reverse data engineering, and self-balanced multi-granular reinforcement learning.<n> Experimental results demonstrate that DrugR achieves comprehensive enhancement across multiple properties without compromising structural similarity or target binding affinity.
arXiv Detail & Related papers (2026-02-09T02:26:25Z) - Purely Agentic Black-Box Optimization for Biological Design [34.69167338457496]
Key challenges in biological design-such as small-molecule drug discovery, antimicrobial peptide development, and protein engineering-can be framed as black-box optimization.<n>Existing methods rely mainly on raw structural data and struggle to exploit the rich scientific literature.<n>We cast biological black-box optimization as a fully agentic, language-based reasoning process.
arXiv Detail & Related papers (2026-01-29T22:45:07Z) - ChemBOMAS: Accelerated BO in Chemistry with LLM-Enhanced Multi-Agent System [72.63341091857959]
We introduce ChemBOMAS: a large language model (LLM)-enhanced multi-agent system that accelerates Bayesian optimization.<n>Data-driven strategy involves an 8B-scale LLM regressor fine-tuned on a mere 1% labeled samples.<n>The knowledge-driven strategy employs a hybrid Retrieval-Augmented Generation approach to guide LLM in dividing the search space.<n>ChemBOMAS set a new state-of-the-art, accelerating optimization efficiency by up to 5-fold compared to baseline methods.
arXiv Detail & Related papers (2025-09-10T16:24:08Z) - ChemActor: Enhancing Automated Extraction of Chemical Synthesis Actions with LLM-Generated Data [53.78763789036172]
We present ChemActor, a fully fine-tuned large language model (LLM) as a chemical executor to convert between unstructured experimental procedures and structured action sequences.<n>This framework integrates a data selection module that selects data based on distribution divergence, with a general-purpose LLM, to generate machine-executable actions from a single molecule input.<n>Experiments on reaction-to-description (R2D) and description-to-action (D2A) tasks demonstrate that ChemActor achieves state-of-the-art performance, outperforming the baseline model by 10%.
arXiv Detail & Related papers (2025-06-30T05:11:19Z) - Collaborative Expert LLMs Guided Multi-Objective Molecular Optimization [51.104444856052204]
We present MultiMol, a collaborative large language model (LLM) system designed to guide multi-objective molecular optimization.<n>In evaluations across six multi-objective optimization tasks, MultiMol significantly outperforms existing methods, achieving a 82.30% success rate.
arXiv Detail & Related papers (2025-03-05T13:47:55Z) - DrugImproverGPT: A Large Language Model for Drug Optimization with Fine-Tuning via Structured Policy Optimization [53.27954325490941]
Finetuning a Large Language Model (LLM) is crucial for generating results towards specific objectives.<n>This research introduces a novel reinforcement learning algorithm to finetune a drug optimization LLM-based generative model.
arXiv Detail & Related papers (2025-02-11T04:00:21Z) - Conditional Latent Space Molecular Scaffold Optimization for Accelerated Molecular Design [14.63544637436396]
We introduce Conditional Latent Space Molecular Scaffold Optimization (CLaSMO)<n>CLaSMO integrates a Conditional Variational Autoencoder (CVAE) with Latent Space Bayesian Optimization (LSBO) to strategically modify molecules.<n>Our LSBO setting improves the sample-efficiency of the molecular optimization, and our modification approach helps us to obtain molecules with higher chances of real-world applicability.
arXiv Detail & Related papers (2024-11-03T03:17:38Z) - Text-Guided Multi-Property Molecular Optimization with a Diffusion Language Model [20.250683535089617]
We propose a text-guided multi-property molecular optimization method utilizing transformer-based diffusion language model (TransDLM)<n>By fusing physically and chemically detailed semantics with specialized molecular representations, TransDLM effectively integrates diverse information sources to guide precise optimization.
arXiv Detail & Related papers (2024-10-17T14:30:27Z) - XMOL: Explainable Multi-property Optimization of Molecules [2.320539066224081]
We propose Explainable Multi-property Optimization of Molecules (XMOL) to optimize multiple molecular properties simultaneously.
Our approach builds on state-of-the-art geometric diffusion models, extending them to multi-property optimization.
We integrate interpretive and explainable techniques throughout the optimization process.
arXiv Detail & Related papers (2024-09-12T06:35:04Z) - DrugAssist: A Large Language Model for Molecule Optimization [29.95488215594247]
DrugAssist is an interactive molecule optimization model that performs optimization through human-machine dialogue.
DrugAssist has achieved leading results in both single and multiple property optimization.
We publicly release a large instruction-based dataset called MolOpt-Instructions for fine-tuning language models on molecule optimization tasks.
arXiv Detail & Related papers (2023-12-28T10:46:56Z) - An Empirical Evaluation of Zeroth-Order Optimization Methods on
AI-driven Molecule Optimization [78.36413169647408]
We study the effectiveness of various ZO optimization methods for optimizing molecular objectives.
We show the advantages of ZO sign-based gradient descent (ZO-signGD)
We demonstrate the potential effectiveness of ZO optimization methods on widely used benchmark tasks from the Guacamol suite.
arXiv Detail & Related papers (2022-10-27T01:58:10Z) - Accurate Machine Learned Quantum-Mechanical Force Fields for
Biomolecular Simulations [51.68332623405432]
Molecular dynamics (MD) simulations allow atomistic insights into chemical and biological processes.
Recently, machine learned force fields (MLFFs) emerged as an alternative means to execute MD simulations.
This work proposes a general approach to constructing accurate MLFFs for large-scale molecular simulations.
arXiv Detail & Related papers (2022-05-17T13:08:28Z) - Optimizing Molecules using Efficient Queries from Property Evaluations [66.66290256377376]
We propose QMO, a generic query-based molecule optimization framework.
QMO improves the desired properties of an input molecule based on efficient queries.
We show that QMO outperforms existing methods in the benchmark tasks of optimizing small organic molecules.
arXiv Detail & Related papers (2020-11-03T18:51:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.