Purely Agentic Black-Box Optimization for Biological Design
- URL: http://arxiv.org/abs/2601.22382v1
- Date: Thu, 29 Jan 2026 22:45:07 GMT
- Title: Purely Agentic Black-Box Optimization for Biological Design
- Authors: Natalie Maus, Yimeng Zeng, Haydn Thomas Jones, Yining Huang, Gaurav Ng Goel, Alden Rose, Kyurae Kim, Hyun-Su Lee, Marcelo Der Torossian Torres, Fangping Wan, Cesar de la Fuente-Nunez, Mark Yatskar, Osbert Bastani, Jacob R. Gardner
- Abstract summary: Key challenges in biological design, such as small-molecule drug discovery, antimicrobial peptide development, and protein engineering, can be framed as black-box optimization. Existing methods rely mainly on raw structural data and struggle to exploit the rich scientific literature. We cast biological black-box optimization as a fully agentic, language-based reasoning process.
- Score: 34.69167338457496
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Many key challenges in biological design, such as small-molecule drug discovery, antimicrobial peptide development, and protein engineering, can be framed as black-box optimization over vast, complex structured spaces. Existing methods rely mainly on raw structural data and struggle to exploit the rich scientific literature. While large language models (LLMs) have been added to these pipelines, they have been confined to narrow roles within structure-centered optimizers. We instead cast biological black-box optimization as a fully agentic, language-based reasoning process. We introduce Purely Agentic BLack-box Optimization (PABLO), a hierarchical agentic system that uses scientific LLMs pretrained on chemistry and biology literature to generate and iteratively refine biological candidates. On both the standard GuacaMol molecular design and antimicrobial peptide optimization tasks, PABLO achieves state-of-the-art performance, substantially improving sample efficiency and final objective values over established baselines. Compared to prior optimization methods that incorporate LLMs, PABLO achieves competitive token usage per run despite relying on LLMs throughout the optimization loop. Beyond raw performance, the agentic formulation offers key advantages for realistic design: it naturally incorporates semantic task descriptions, retrieval-augmented domain knowledge, and complex constraints. In follow-up in vitro validation, PABLO-optimized peptides showed strong activity against drug-resistant pathogens, underscoring the practical potential of PABLO for therapeutic discovery.
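The abstract is the only description of the method available here, so the following is a minimal sketch rather than PABLO's implementation: an LLM proposes candidates in text, a black-box oracle scores them, and the evaluation history is fed back into the next prompt. `demo_llm` and `demo_oracle` are hypothetical stand-ins for a real chat-completion call and a real scoring oracle, and the hierarchical agent structure the abstract mentions is collapsed into a single proposer.

```python
# Minimal sketch of an agentic black-box optimization loop in the spirit of
# PABLO's description. `llm_propose` and `oracle` are hypothetical stand-ins.
from dataclasses import dataclass, field

@dataclass
class AgenticOptimizer:
    task_description: str
    history: list = field(default_factory=list)  # [(candidate, score), ...]

    def build_prompt(self, k: int = 5) -> str:
        # Condition the LLM on the task and the k best evaluations so far.
        best = sorted(self.history, key=lambda cs: cs[1], reverse=True)[:k]
        lines = [f"Task: {self.task_description}",
                 "Best candidates so far (candidate -> score):"]
        lines += [f"  {c} -> {s:.3f}" for c, s in best] or ["  (none yet)"]
        lines.append("Propose one improved candidate sequence.")
        return "\n".join(lines)

    def step(self, llm_propose, oracle) -> tuple:
        candidate = llm_propose(self.build_prompt())
        score = oracle(candidate)  # the only ground-truth signal
        self.history.append((candidate, score))
        return candidate, score

def demo_llm(prompt: str) -> str:
    # Placeholder for a real chat-completion call to a scientific LLM.
    import random
    return "".join(random.choice("ACDEFGHIKLMNPQRSTVWY") for _ in range(12))

def demo_oracle(seq: str) -> float:
    # Placeholder objective: fraction of hydrophobic residues, illustrative only.
    return sum(ch in "AILMFWV" for ch in seq) / len(seq)

opt = AgenticOptimizer("maximize antimicrobial activity of a 12-mer peptide")
for _ in range(20):
    opt.step(demo_llm, demo_oracle)
print(max(opt.history, key=lambda cs: cs[1]))
```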
Related papers
- BioLM-Score: Language-Prior Conditioned Probabilistic Geometric Potentials for Protein-Ligand Scoring [23.407269396970168]
We present BioLM-Score, a simple yet generalizable protein-ligand scoring model that couples modeling with representation learning. Evaluations on the CASF-2016 benchmark demonstrate significant improvements across docking, scoring, ranking, and screening tasks. In summary, BioLM-Score provides a principled and practical alternative to existing scoring functions, combining efficiency, generalization, and interpretability for structure-based drug discovery.
arXiv Detail & Related papers (2026-02-09T12:31:49Z)
- DrugR: Optimizing Molecular Drugs through LLM-based Explicit Reasoning [24.70952870676648]
DrugR is a large language model that introduces explicit, step-by-step pharmacological reasoning into the optimization process. Our approach integrates domain-specific continual pretraining, supervised fine-tuning via reverse data engineering, and self-balanced multi-granular reinforcement learning. Experimental results demonstrate that DrugR achieves comprehensive enhancement across multiple properties without compromising structural similarity or target binding affinity.
arXiv Detail & Related papers (2026-02-09T02:26:25Z)
- SEISMO: Increasing Sample Efficiency in Molecular Optimization with a Trajectory-Aware LLM Agent [0.7377073690542307]
We introduce SEISMO, an online, inference-time molecular optimization agent. It updates after every oracle call, with no need for population-based or batched learning; a minimal sketch of this per-call control flow follows this entry. It achieves a 2-3 times higher area under the curve than prior methods, often reaching near-maximal task scores within 50 oracle calls.
arXiv Detail & Related papers (2026-01-31T11:23:48Z)
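A minimal sketch of the per-oracle-call control flow the summary attributes to SEISMO, assuming a hypothetical trajectory-conditioned `propose` function and a toy oracle; the actual agent is LLM-based and this is not its implementation.

```python
# Online (per-oracle-call) optimization: the agent state is updated after
# every single evaluation, with no population or batch.
def propose(trajectory):
    # Stand-in for a trajectory-aware LLM agent: mutate the best candidate.
    import random
    if not trajectory:
        return [random.random() for _ in range(4)]
    best = max(trajectory, key=lambda t: t[1])[0]
    return [x + random.gauss(0, 0.05) for x in best]

def oracle(x):
    return -sum((xi - 0.7) ** 2 for xi in x)  # toy objective, maximum at 0.7

trajectory, best_curve = [], []
for call in range(50):
    x = propose(trajectory)
    y = oracle(x)
    trajectory.append((x, y))          # state update after EVERY call
    best_curve.append(max(t[1] for t in trajectory))

# Sample-efficiency metric akin to the summary's area under the curve:
auc = sum(best_curve) / len(best_curve)
print(f"best={best_curve[-1]:.4f}  AUC={auc:.4f}")
```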
- Self Distillation Fine-Tuning of Protein Language Models Improves Versatility in Protein Design [61.2846583160056]
Supervised fine-tuning (SFT) is a standard approach for adapting large language models to specialized domains, but it is harder to apply to protein language models (PLMs), in part because high-quality annotated data are far more difficult to obtain for proteins than for natural language. We present a simple and general recipe for fast SFT of PLMs, designed to improve the fidelity, reliability, and novelty of generated protein sequences.
arXiv Detail & Related papers (2025-12-10T05:34:47Z)
- ChemBOMAS: Accelerated BO in Chemistry with LLM-Enhanced Multi-Agent System [72.63341091857959]
We introduce ChemBOMAS, a large language model (LLM)-enhanced multi-agent system that accelerates Bayesian optimization. Its data-driven strategy involves an 8B-scale LLM regressor fine-tuned on a mere 1% of labeled samples. Its knowledge-driven strategy employs a hybrid Retrieval-Augmented Generation approach to guide the LLM in dividing the search space. ChemBOMAS sets a new state of the art, accelerating optimization efficiency by up to 5-fold compared to baseline methods.
arXiv Detail & Related papers (2025-09-10T16:24:08Z)
- DrugImproverGPT: A Large Language Model for Drug Optimization with Fine-Tuning via Structured Policy Optimization [53.27954325490941]
Finetuning a Large Language Model (LLM) is crucial for steering generation toward specific objectives. This research introduces a novel reinforcement learning algorithm to finetune a drug optimization LLM-based generative model.
arXiv Detail & Related papers (2025-02-11T04:00:21Z)
- Constrained composite Bayesian optimization for rational synthesis of polymeric particles [0.7499722271664147]
This study integrates constrained and composite Bayesian optimization (CCBO) to perform efficient target-value optimization under black-box feasibility constraints. CCBO strategically avoided infeasible conditions and efficiently optimized particle production towards predefined size targets; a minimal sketch of this constrained acquisition step follows this entry. Experiments validated CCBO's capability to guide the rational synthesis of poly(lactic-co-glycolic acid) particles with diameters of 300 nm and 3.0 µm via electrospraying.
arXiv Detail & Related papers (2024-11-06T14:40:03Z)
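A minimal sketch of the feasibility-weighted acquisition idea behind constrained BO as the summary describes it; the toy closed-form surrogates below stand in for what would be Gaussian-process posteriors in CCBO itself.

```python
# Candidates are ranked by an acquisition value weighted by the probability
# of satisfying a black-box feasibility constraint, so infeasible regions
# are strategically avoided.
import math, random

def predicted_objective(x):          # stand-in for a GP posterior mean of
    return -abs(x - 0.3)             # distance to a size target

def feasibility_probability(x):      # stand-in for a GP classifier posterior
    return 1.0 / (1.0 + math.exp(10 * (x - 0.8)))  # infeasible for large x

def constrained_acquisition(x, best_so_far):
    improvement = max(predicted_objective(x) - best_so_far, 0.0)
    return improvement * feasibility_probability(x)  # feasibility-weighted

best_so_far = -1.0
candidates = [random.random() for _ in range(200)]
x_next = max(candidates, key=lambda x: constrained_acquisition(x, best_so_far))
print(f"next condition to test: {x_next:.3f}")
```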
- LightCPPgen: An Explainable Machine Learning Pipeline for Rational Design of Cell Penetrating Peptides [0.32985979395737786]
We introduce an innovative approach for the de novo design of CPPs, leveraging the strengths of machine learning (ML) and optimization algorithms.
Our strategy, named LightCPPgen, integrates a LightGBM-based predictive model with a genetic algorithm (GA).
The GA specifically targets the candidate sequences' penetrability score while trying to maximize similarity with the original non-penetrating peptide; a minimal sketch of such a fitness function follows this entry.
arXiv Detail & Related papers (2024-05-31T10:57:25Z)
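A minimal sketch of the kind of fitness function the summary describes, trading off a predicted penetrability score against similarity to the parent peptide; `predict_penetrability`, the parent sequence, and the 50/50 weighting are all illustrative assumptions, not LightCPPgen's actual model or trade-off.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
ORIGINAL = "GASGASGASGAS"  # hypothetical non-penetrating parent sequence

def predict_penetrability(seq: str) -> float:
    # Placeholder for the trained LightGBM classifier's probability output.
    return sum(ch in "RKW" for ch in seq) / len(seq)

def similarity(seq: str, ref: str) -> float:
    return sum(a == b for a, b in zip(seq, ref)) / len(ref)

def fitness(seq: str, w: float = 0.5) -> float:
    # Weighted combination; the paper's exact trade-off is not specified.
    return w * predict_penetrability(seq) + (1 - w) * similarity(seq, ORIGINAL)

def mutate(seq: str) -> str:
    i = random.randrange(len(seq))
    return seq[:i] + random.choice(ALPHABET) + seq[i + 1:]

# Tiny (mu + lambda)-style GA loop.
population = [ORIGINAL] * 20
for gen in range(100):
    offspring = [mutate(random.choice(population)) for _ in range(20)]
    population = sorted(population + offspring, key=fitness, reverse=True)[:20]
print(population[0], fitness(population[0]))
```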
- Large Language Models to Enhance Bayesian Optimization [57.474613739645605]
We present LLAMBO, a novel approach that integrates the capabilities of Large Language Models (LLMs) within Bayesian optimization.
At a high level, we frame the BO problem in natural language, enabling LLMs to iteratively propose and evaluate promising solutions conditioned on historical evaluations.
Our findings illustrate that LLAMBO is effective at zero-shot warmstarting, and it enhances surrogate modeling and candidate sampling, especially in the early stages of search when observations are sparse; a minimal sketch of the history-conditioned prompting step follows this entry.
arXiv Detail & Related papers (2024-02-06T11:44:06Z)
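A minimal sketch of the history-conditioned prompting step the summary describes: past evaluations are serialized into natural language so an LLM can propose the next configuration. The template and the parsing convention are assumptions, not LLAMBO's actual prompts.

```python
def serialize_history(history, objective="validation accuracy"):
    lines = [f"You are optimizing {objective} over hyperparameters.",
             "Observed configurations and scores:"]
    for cfg, score in history:
        lines.append(f"  {cfg} => {score:.4f}")
    lines.append("Suggest one promising new configuration as 'lr=..., depth=...'.")
    return "\n".join(lines)

history = [({"lr": 0.1, "depth": 3}, 0.81),
           ({"lr": 0.01, "depth": 5}, 0.87)]
print(serialize_history(history))
# The LLM's reply would be parsed back into a configuration and evaluated,
# closing the loop; with an empty history, the same prompt performs the
# zero-shot warmstarting the summary mentions.
```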
- Designing Biological Sequences via Meta-Reinforcement Learning and Bayesian Optimization [68.28697120944116]
We train an autoregressive generative model via Meta-Reinforcement Learning to propose promising sequences for selection.
We pose this problem as that of finding an optimal policy over a distribution of MDPs induced by sampling subsets of the data.
Our in-silico experiments show that meta-learning over such ensembles provides robustness against reward misspecification and achieves competitive results.
arXiv Detail & Related papers (2022-09-13T18:37:27Z)
- ODBO: Bayesian Optimization with Search Space Prescreening for Directed Protein Evolution [18.726398852721204]
We propose an efficient, experimental design-oriented closed-loop optimization framework for protein directed evolution.
ODBO employs a combination of a novel low-dimensional protein encoding strategy and Bayesian optimization enhanced with search-space prescreening via outlier detection; a minimal sketch of the prescreening step follows this entry.
We conduct and report four protein directed evolution experiments that substantiate the capability of the proposed framework for finding variants with properties of interest.
arXiv Detail & Related papers (2022-05-19T13:21:31Z)
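A minimal sketch of the prescreening idea the ODBO summary describes, using an off-the-shelf IsolationForest as the outlier detector; the encodings are random toy data, and ODBO's actual detector and protein encoding may differ.

```python
# Before the BO acquisition step, candidate variants whose encodings look
# like outliers relative to observed data are filtered out, shrinking the
# search space.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
observed = rng.normal(0.0, 1.0, size=(100, 8))   # encodings of measured variants
candidates = rng.normal(0.0, 2.0, size=(500, 8)) # proposed variant encodings

# Fit the outlier detector on variants we trust (those already measured).
detector = IsolationForest(random_state=0).fit(observed)
keep = detector.predict(candidates) == 1          # 1 = inlier, -1 = outlier
prescreened = candidates[keep]
print(f"kept {len(prescreened)} / {len(candidates)} candidates for BO")

# A BO acquisition function would now be maximized over `prescreened` only.
```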
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.