Related papers: STELLA: Self-Evolving LLM Agent for Biomedical Research

URL: http://arxiv.org/abs/2507.02004v1
Date: Tue, 01 Jul 2025 20:52:01 GMT
Title: STELLA: Self-Evolving LLM Agent for Biomedical Research
Authors: Ruofan Jin, Zaixi Zhang, Mengdi Wang, Le Cong
Abstract summary: We introduce STELLA, a self-evolving AI agent designed to overcome the limitations of agents that rely on static, manually curated toolsets. STELLA employs a multi-agent architecture that autonomously improves its own capabilities. We demonstrate that STELLA achieves state-of-the-art accuracy on a suite of biomedical benchmarks.
Score: 40.841136388072385
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The rapid growth of biomedical data, tools, and literature has created a fragmented research landscape that outpaces human expertise. While AI agents offer a solution, they typically rely on static, manually curated toolsets, limiting their ability to adapt and scale. Here, we introduce STELLA, a self-evolving AI agent designed to overcome these limitations. STELLA employs a multi-agent architecture that autonomously improves its own capabilities through two core mechanisms: an evolving Template Library for reasoning strategies and a dynamic Tool Ocean that expands as a Tool Creation Agent automatically discovers and integrates new bioinformatics tools. This allows STELLA to learn from experience. We demonstrate that STELLA achieves state-of-the-art accuracy on a suite of biomedical benchmarks, scoring approximately 26% on Humanity's Last Exam: Biomedicine, 54% on LAB-Bench: DBQA, and 63% on LAB-Bench: LitQA, outperforming leading models by up to 6 percentage points. More importantly, we show that its performance systematically improves with experience; for instance, its accuracy on the Humanity's Last Exam benchmark almost doubles with increased trials. STELLA represents a significant advance towards AI Agent systems that can learn and grow, dynamically scaling their expertise to accelerate the pace of biomedical discovery.
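The abstract names two self-improvement mechanisms: a Template Library of reasoning strategies and a Tool Ocean that grows as new tools are created. The Python sketch below is a minimal illustration under stated assumptions, not STELLA's implementation. The class names, keyword-overlap retrieval, Laplace-smoothed success scores, and the reverse-complement tool are all hypothetical choices made only to show how stored strategies and a growing tool pool could let an agent improve with repeated trials.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Optional, Set

# Hypothetical names throughout: Template, TemplateLibrary and ToolOcean are
# illustrative stand-ins, not STELLA's actual classes or API.


@dataclass
class Template:
    """A stored reasoning strategy plus a record of how often it worked."""
    keywords: FrozenSet[str]
    steps: List[str]
    successes: int = 0
    trials: int = 0

    def success_rate(self) -> float:
        # Laplace smoothing: an untried template starts at 0.5, not 0 or 1.
        return (self.successes + 1) / (self.trials + 2)


@dataclass
class TemplateLibrary:
    templates: List[Template] = field(default_factory=list)

    def add(self, keywords: Set[str], steps: List[str]) -> Template:
        template = Template(frozenset(keywords), list(steps))
        self.templates.append(template)
        return template

    def retrieve(self, query: Set[str]) -> Optional[Template]:
        # Rank by keyword overlap first, then by smoothed success rate.
        candidates = [
            (len(t.keywords & query), t.success_rate(), i)
            for i, t in enumerate(self.templates)
        ]
        candidates = [c for c in candidates if c[0] > 0]
        if not candidates:
            return None
        return self.templates[max(candidates)[2]]

    def record(self, template: Template, success: bool) -> None:
        # Per-trial feedback changes success_rate(), which affects future ranking.
        template.trials += 1
        if success:
            template.successes += 1


ToolFn = Callable[[str], str]


@dataclass
class ToolOcean:
    tools: Dict[str, ToolFn] = field(default_factory=dict)

    def ensure(self, name: str, create: Callable[[], ToolFn]) -> ToolFn:
        # Stand-in for a tool-creation step: build and register a tool only
        # the first time it is needed, so the pool grows over time.
        if name not in self.tools:
            self.tools[name] = create()
        return self.tools[name]


library = TemplateLibrary()
plan = library.add({"gene", "pathway"}, ["look up gene", "query pathway database", "summarize"])
ocean = ToolOcean()

chosen = library.retrieve({"gene", "expression"})
assert chosen is not None
revcomp = ocean.ensure(
    "reverse_complement",
    lambda: (lambda seq: seq[::-1].translate(str.maketrans("ACGT", "TGCA"))),
)
library.record(chosen, success=True)
print(revcomp("ATGC"))  # GCAT
```

In this sketch, success feedback only reorders templates that already match a query; how STELLA actually selects, refines, or creates templates and tools is described in the paper itself.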

Related papers

ConfAgents: A Conformal-Guided Multi-Agent Framework for Cost-Efficient Medical Diagnosis [11.18347744454527]
arXiv Detail & Related papers (2025-08-06T22:39:38Z)
SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience [71.82719117238307]
We propose SEAgent, an agentic self-evolving framework enabling computer-use agents to evolve through interactions with unfamiliar software. We validate the effectiveness of SEAgent across five novel software environments within OS-World. Our approach achieves a significant improvement of 23.2 percentage points in success rate, from 11.3% to 34.5%, over a competitive open-source CUA.
arXiv Detail & Related papers (2025-08-06T17:58:46Z)
HealthFlow: A Self-Evolving AI Agent with Meta Planning for Autonomous Healthcare Research [16.963151914975438]
We introduce HealthFlow, a self-evolving AI agent that overcomes limitations through a novel meta-level evolution mechanism. HealthFlow autonomously refines its own high-level problem-solving policies by distilling procedural successes and failures into a durable, strategic knowledge base. Our experiments demonstrate that HealthFlow's self-evolving approach significantly outperforms state-of-the-art agent frameworks.
arXiv Detail & Related papers (2025-08-04T17:08:47Z)
A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence [87.08051686357206]
Large Language Models (LLMs) have demonstrated strong capabilities but remain fundamentally static. As LLMs are increasingly deployed in open-ended, interactive environments, this static nature has become a critical bottleneck. This survey provides the first systematic and comprehensive review of self-evolving agents.
arXiv Detail & Related papers (2025-07-28T17:59:05Z)
SciMaster: Towards General-Purpose Scientific AI Agents, Part I. X-Master as Foundation: Can We Lead on Humanity's Last Exam? [51.112225746095746]
We introduce X-Master, a tool-augmented reasoning agent designed to emulate human researchers. X-Masters sets a new state-of-the-art record on Humanity's Last Exam with a score of 32.1%.
arXiv Detail & Related papers (2025-07-07T17:50:52Z)
Towards Artificial Intelligence Research Assistant for Expert-Involved Learning [64.7438151207189]
Large Language Models (LLMs) and Large Multi-Modal Models (LMMs) have emerged as transformative tools in scientific research. We present ARtificial Intelligence research assistant for Expert-involved Learning (ARIEL).
arXiv Detail & Related papers (2025-05-03T14:21:48Z)
A Large-Scale Vision-Language Dataset Derived from Open Scientific Literature to Advance Biomedical Generalist AI [70.06771291117965]
We introduce Biomedica, an open-source dataset derived from the PubMed Central Open Access subset. Biomedica contains over 6 million scientific articles and 24 million image-text pairs. We provide scalable streaming and search APIs through a web server, facilitating seamless integration with AI systems.
arXiv Detail & Related papers (2025-03-26T05:56:46Z)
BioAgents: Democratizing Bioinformatics Analysis with Multi-Agent Systems [6.668992155393883]
We propose a multi-agent system built on small language models, fine-tuned on bioinformatics data, and enhanced with retrieval augmented generation (RAG). Our system, BioAgents, enables local operation and personalization using proprietary data. We observe performance comparable to human experts on conceptual genomics tasks, and suggest next steps to enhance code generation capabilities.
arXiv Detail & Related papers (2025-01-10T19:30:59Z)
BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments [112.25067497985447]
We introduce BioDiscoveryAgent, an agent that designs new experiments, reasons about their outcomes, and efficiently navigates the hypothesis space to reach desired solutions. BioDiscoveryAgent can uniquely design new experiments without the need to train a machine learning model. It achieves an average of 21% improvement in predicting relevant genetic perturbations across six datasets.
arXiv Detail & Related papers (2024-05-27T19:57:17Z)
Exploring Autonomous Agents through the Lens of Large Language Models: A Review [0.0]
Large Language Models (LLMs) are transforming artificial intelligence, enabling autonomous agents to perform diverse tasks across various domains. They face challenges such as multimodality, human value alignment, hallucinations, and evaluation. Evaluation platforms like AgentBench, WebArena, and ToolLLM provide robust methods for assessing these agents in complex scenarios.
arXiv Detail & Related papers (2024-04-05T22:59:02Z)
Empowering Biomedical Discovery with AI Agents [15.125735219811268]
We envision "AI scientists" as systems capable of skeptical learning and reasoning. Biomedical AI agents combine human creativity and expertise with AI's ability to analyze large datasets. AI agents can impact areas ranging from virtual cell simulation, programmable control of phenotypes, and the design of cellular circuits to developing new therapies.
arXiv Detail & Related papers (2024-04-03T16:08:01Z)
ADVISE: AI-accelerated Design of Evidence Synthesis for Global Development [2.6293574825904624]
This study develops an AI agent based on a bidirectional encoder representations from transformers (BERT) model. We explore the effectiveness of the human-AI hybrid team in accelerating the evidence synthesis process. Results show that incorporating the BERT-based AI agent into the human team can reduce the human screening effort by 68.5%.
arXiv Detail & Related papers (2023-05-02T01:29:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.