ChemBOMAS: Accelerated BO in Chemistry with LLM-Enhanced Multi-Agent System
- URL: http://arxiv.org/abs/2509.08736v2
- Date: Mon, 10 Nov 2025 15:52:53 GMT
- Title: ChemBOMAS: Accelerated BO in Chemistry with LLM-Enhanced Multi-Agent System
- Authors: Dong Han, Zhehong Ai, Pengxiang Cai, Shanya Lu, Jianpeng Chen, Zihao Ye, Shuzhou Sun, Ben Gao, Lingli Ge, Weida Wang, Xiangxin Zhou, Xihui Liu, Mao Su, Wanli Ouyang, Lei Bai, Dongzhan Zhou, Tao Xu, Yuqiang Li, Shufei Zhang
- Abstract summary: We introduce ChemBOMAS: a large language model (LLM)-enhanced multi-agent system that accelerates Bayesian optimization.
The data-driven strategy involves an 8B-scale LLM regressor fine-tuned on a mere 1% of labeled samples.
The knowledge-driven strategy employs a hybrid Retrieval-Augmented Generation approach to guide the LLM in dividing the search space.
ChemBOMAS sets a new state of the art, accelerating optimization efficiency by up to 5-fold compared to baseline methods.
- Score: 72.63341091857959
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bayesian optimization (BO) is a powerful tool for scientific discovery in chemistry, yet its efficiency is often hampered by sparse experimental data and vast search spaces. Here, we introduce ChemBOMAS: a large language model (LLM)-enhanced multi-agent system that accelerates BO through synergistic data- and knowledge-driven strategies. Firstly, the data-driven strategy involves an 8B-scale LLM regressor fine-tuned on a mere 1% of labeled samples for pseudo-data generation, robustly initializing the optimization process. Secondly, the knowledge-driven strategy employs a hybrid Retrieval-Augmented Generation approach to guide the LLM in dividing the search space while mitigating LLM hallucinations. An Upper Confidence Bound algorithm then identifies high-potential subspaces within this established partition. Operating within the LLM-refined subspaces and supported by LLM-generated data, BO gains both effectiveness and efficiency. Comprehensive evaluations across multiple scientific benchmarks demonstrate that ChemBOMAS sets a new state of the art, accelerating optimization efficiency by up to 5-fold compared to baseline methods.
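The pipeline above decomposes into three steps: pseudo-label a small labeled set with the LLM regressor, partition the search space under RAG guidance, and run UCB over the resulting subspaces with BO inside the chosen one. A minimal Python sketch of that loop follows; the callables `llm_regressor`, `partitioner`, and `bo_step`, and every constant, are hypothetical stand-ins, not the authors' code.

```python
# Minimal sketch of a ChemBOMAS-style loop, per the abstract above.
# llm_regressor, partitioner, and bo_step are hypothetical callables
# standing in for the fine-tuned LLM, the RAG-guided partitioner, and
# one propose-and-evaluate BO iteration -- not the authors' code.
import math
import random

def ucb_pick(subspaces, t, c=1.4):
    """Return the subspace with the highest Upper Confidence Bound."""
    for s in subspaces:
        if s["pulls"] == 0:
            return s  # try every subspace at least once
    return max(subspaces,
               key=lambda s: s["reward"] / s["pulls"]
               + c * math.sqrt(math.log(t) / s["pulls"]))

def chembomas_like(candidates, labeled, llm_regressor, partitioner,
                   bo_step, budget, n_pseudo=32):
    # 1) Data-driven warm start: pseudo-label a few unlabeled points
    #    with the fine-tuned LLM regressor to initialize the surrogate.
    pseudo = [(x, llm_regressor(x))
              for x in random.sample(candidates, n_pseudo)]
    observations = list(labeled) + pseudo
    # 2) Knowledge-driven partition: the RAG-guided LLM splits the space.
    subspaces = [{"region": r, "pulls": 0, "reward": 0.0}
                 for r in partitioner(candidates)]
    # 3) UCB selects a subspace; BO proposes and evaluates inside it.
    for t in range(1, budget + 1):
        s = ucb_pick(subspaces, t)
        x, y = bo_step(s["region"], observations)
        observations.append((x, y))
        s["pulls"] += 1
        s["reward"] += y
    return max(observations, key=lambda xy: xy[1])
```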
Related papers
- Pre-trained knowledge elevates large language models beyond traditional chemical reaction optimizers [0.0]
We demonstrate that pre-trained knowledge in large language models (LLMs) fundamentally changes this paradigm.
LLM-GO excels precisely where traditional methods struggle: complex categorical spaces requiring domain understanding rather than mathematical optimization.
arXiv Detail & Related papers (2025-08-27T21:09:51Z)
- Reasoning BO: Enhancing Bayesian Optimization with Long-Context Reasoning Power of LLMs [13.478684527247129]
This paper designs Reasoning BO, a novel framework that leverages reasoning models to guide the sampling process in BO.
Reasoning BO provides real-time sampling recommendations along with critical insights grounded in plausible scientific theories.
The framework demonstrates its capability to progressively refine sampling strategies through real-time insights and hypothesis evolution.
arXiv Detail & Related papers (2025-05-19T08:20:40Z)
- Distilling and exploiting quantitative insights from Large Language Models for enhanced Bayesian optimization of chemical reactions [0.0]
Large language models (LLMs) have demonstrated that chemical information present in foundation training data can give them utility for processing chemical data.
We show that chemical information from LLMs can be elicited and used for transfer learning to accelerate the BO of reaction conditions to maximize yield.
arXiv Detail & Related papers (2025-04-11T12:45:07Z)
- Symbiotic Cooperation for Web Agents: Harnessing Complementary Strengths of Large and Small LLMs [38.86873408585195]
Web browsing agents powered by large language models (LLMs) have shown tremendous potential in automating complex web-based tasks.
Existing approaches typically rely on large LLMs to explore web environments and generate trajectory data.
We propose AgentSymbiotic, an iterative framework that couples data synthesis with task performance.
arXiv Detail & Related papers (2025-02-11T20:41:49Z)
- DrugImproverGPT: A Large Language Model for Drug Optimization with Fine-Tuning via Structured Policy Optimization [53.27954325490941]
Fine-tuning a Large Language Model (LLM) is crucial for generating results towards specific objectives.
This research introduces a novel reinforcement learning algorithm to fine-tune a drug optimization LLM-based generative model.
arXiv Detail & Related papers (2025-02-11T04:00:21Z)
- The Dual-use Dilemma in LLMs: Do Empowering Ethical Capacities Make a Degraded Utility? [54.18519360412294]
Large Language Models (LLMs) must balance rejecting harmful requests for safety with accommodating legitimate ones for utility.
This paper presents a Direct Preference Optimization (DPO) based alignment framework that achieves better overall performance.
We analyze experimental results obtained from testing DeepSeek-R1 on our benchmark and reveal the critical ethical concerns raised by this highly acclaimed model.
arXiv Detail & Related papers (2025-01-20T06:35:01Z)
- Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning [71.2981957820888]
We propose a novel Star-Agents framework, which automates the enhancement of data quality across datasets.
The framework initially generates diverse instruction data with multiple LLM agents through a bespoke sampling method.
The generated data undergo a rigorous evaluation using a dual-model method that assesses both difficulty and quality.
arXiv Detail & Related papers (2024-11-21T02:30:53Z)
- Sequential Large Language Model-Based Hyper-parameter Optimization [0.0]
This study introduces SLLMBO, an innovative framework leveraging large language models (LLMs) for hyperparameter optimization (HPO).
It incorporates dynamic search-space adaptability, enhanced parameter-space exploitation, and a novel LLM-Tree-structured Parzen Estimator (LLM-TPE) sampler.
A comprehensive benchmark evaluates multiple LLMs, including GPT-3.5-Turbo, GPT-4o, Claude-Sonnet-3.5, and Gemini-1.5-Flash.
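The classical half of that sampler is the Tree-structured Parzen Estimator; a bare-bones TPE step on one continuous parameter is sketched below, assuming a minimization objective and enough observations on each side of the quantile split. It is generic TPE, not the paper's LLM-TPE.

```python
# Bare-bones Tree-structured Parzen Estimator step (one continuous
# parameter, minimization). Generic TPE, not SLLMBO's LLM-TPE sampler.
import numpy as np
from scipy.stats import gaussian_kde

def tpe_suggest(xs, ys, bounds, gamma=0.25, n_candidates=256, rng=None):
    """Suggest the candidate maximizing l(x)/g(x): l models the best
    gamma-fraction of observations, g models the rest."""
    rng = rng or np.random.default_rng()
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    cut = np.quantile(ys, gamma)
    good, bad = xs[ys <= cut], xs[ys > cut]
    if len(good) < 2 or len(bad) < 2:          # too few points for KDEs
        return float(rng.uniform(bounds[0], bounds[1]))
    l, g = gaussian_kde(good), gaussian_kde(bad)
    cands = rng.uniform(bounds[0], bounds[1], n_candidates)
    score = l(cands) / np.maximum(g(cands), 1e-12)
    return float(cands[np.argmax(score)])
```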
arXiv Detail & Related papers (2024-10-27T00:50:30Z)
- Ranking over Regression for Bayesian Optimization and Molecule Selection [0.0680892187976602]
We introduce Rank-based Bayesian Optimization (RBO), which utilizes a ranking model as the surrogate.
We present a comprehensive investigation of RBO's optimization performance compared to conventional BO on various chemical datasets.
We conclude RBO is an effective alternative to regression-based BO, especially for optimizing novel chemical compounds.
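RBO's core move, swapping the regression surrogate for a ranking model, can be illustrated with a toy pairwise-logistic scorer. The linear features and training loop below are hypothetical simplifications, not RBO's actual surrogate.

```python
# Toy rank-based surrogate in the spirit of RBO: learn a scoring
# function from pairwise preferences rather than regressing raw values.
# A hypothetical linear simplification, not RBO's actual model.
import numpy as np

def fit_rank_surrogate(X, y, epochs=200, lr=0.1):
    """Learn w so that w @ (x_i - x_j) > 0 whenever y_i > y_j."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    w = np.zeros(X.shape[1])
    pairs = [(i, j) for i in range(len(y))
             for j in range(len(y)) if y[i] > y[j]]
    for _ in range(epochs):
        for i, j in pairs:
            d = X[i] - X[j]
            p = 1.0 / (1.0 + np.exp(-(w @ d)))  # P(i ranked above j)
            w += lr * (1.0 - p) * d             # ascend the log-likelihood
    return w

def rank_scores(w, candidates):
    """Higher score = predicted better; feed into an acquisition rule."""
    return np.asarray(candidates, float) @ w
```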
arXiv Detail & Related papers (2024-10-11T22:38:14Z)
- Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System [75.25394449773052]
Large Language Model (LLM) based multi-agent systems (MAS) show remarkable potential in collaborative problem-solving.
Yet they still face critical challenges: low communication efficiency, poor scalability, and a lack of effective parameter-updating optimization methods.
We present Optima, a novel framework that addresses these issues by significantly enhancing both communication efficiency and task effectiveness.
arXiv Detail & Related papers (2024-10-10T17:00:06Z)
- Applying Multi-Fidelity Bayesian Optimization in Chemistry: Open Challenges and Major Considerations [0.0]
Multi-fidelity Bayesian optimization (MFBO) leverages experimental and/or computational data of varying quality and resource cost to optimize towards desired maxima cost-effectively.
Here, we investigate the application of MFBO to accelerate the identification of promising molecules or materials.
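The cost trade-off MFBO exploits can be shown with a two-fidelity screen: spend cheap evaluations broadly, then reserve the expensive budget for the survivors. The heuristic below is a stand-in for a real cost-aware MFBO acquisition; all names and costs are assumptions.

```python
# Two-fidelity screening heuristic illustrating the cost trade-off MFBO
# exploits; a stand-in sketch, not a full MFBO acquisition function.
def two_fidelity_screen(candidates, cheap_eval, costly_eval,
                        budget, cheap_cost=1.0, costly_cost=10.0):
    spent, screened = 0.0, []
    for x in candidates:                       # broad low-fidelity pass
        if spent + cheap_cost > budget:
            break
        screened.append((cheap_eval(x), x))
        spent += cheap_cost
    screened.sort(key=lambda t: t[0], reverse=True)
    results = []
    for _, x in screened:                      # promote the best survivors
        if spent + costly_cost > budget:
            break
        results.append((costly_eval(x), x))
        spent += costly_cost
    return max(results, key=lambda t: t[0]) if results else None
```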
arXiv Detail & Related papers (2024-09-11T11:22:17Z)
- SeRTS: Self-Rewarding Tree Search for Biomedical Retrieval-Augmented Generation [50.26966969163348]
Large Language Models (LLMs) have shown great potential in the biomedical domain with the advancement of retrieval-augmented generation (RAG).
Existing retrieval-augmented approaches face challenges in addressing diverse queries and documents, particularly for medical knowledge queries.
We propose Self-Rewarding Tree Search (SeRTS) based on Monte Carlo Tree Search (MCTS) and a self-rewarding paradigm.
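At the heart of any MCTS loop, SeRTS included, sits a UCT-style selection rule; the textbook version below is a generic sketch, not the SeRTS code.

```python
# Textbook UCT selection, the rule MCTS (and hence SeRTS) uses to pick
# which child to expand next; a generic sketch, not the SeRTS code.
import math

def uct_select(children, c=1.41):
    """children: dicts with visit count 'n' and accumulated value 'q'."""
    total = sum(ch["n"] for ch in children)
    def uct(ch):
        if ch["n"] == 0:
            return float("inf")  # always visit untried children first
        return ch["q"] / ch["n"] + c * math.sqrt(math.log(total) / ch["n"])
    return max(children, key=uct)
```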
arXiv Detail & Related papers (2024-06-17T06:48:31Z)
- Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark [166.40879020706151]
This paper proposes a shift towards backpropagation-free (BP-free), zeroth-order (ZO) optimization as a solution for reducing memory costs during fine-tuning.
Unlike traditional ZO-SGD methods, our work expands the exploration to a wider array of ZO optimization techniques.
Our study unveils previously overlooked optimization principles, highlighting the importance of task alignment, the role of the forward gradient method, and the balance between algorithm complexity and fine-tuning performance.
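The BP-free primitive behind ZO-SGD-style methods is the two-point gradient estimate, which needs only forward passes. A textbook sketch follows, generic ZO rather than the benchmark's code:

```python
# Two-point zeroth-order gradient estimate: the BP-free primitive behind
# ZO-SGD-style fine-tuning. Generic textbook ZO, not the benchmark's code.
import numpy as np

def zo_gradient(f, theta, mu=1e-3, rng=None):
    """Estimate grad f(theta) from two forward passes along one
    random Gaussian direction u."""
    rng = rng or np.random.default_rng()
    u = rng.standard_normal(theta.shape)
    return (f(theta + mu * u) - f(theta - mu * u)) / (2.0 * mu) * u

def zo_sgd_step(f, theta, lr=1e-2, mu=1e-3, rng=None):
    """One descent step using the zeroth-order gradient estimate."""
    return theta - lr * zo_gradient(f, theta, mu, rng)
```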
arXiv Detail & Related papers (2024-02-18T14:08:48Z)
- Optimizing Molecules using Efficient Queries from Property Evaluations [66.66290256377376]
We propose QMO, a generic query-based molecule optimization framework.
QMO improves the desired properties of an input molecule based on efficient queries.
We show that QMO outperforms existing methods in the benchmark tasks of optimizing small organic molecules.
arXiv Detail & Related papers (2020-11-03T18:51:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.