InternAgent: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification
- URL: http://arxiv.org/abs/2505.16938v3
- Date: Tue, 22 Jul 2025 15:05:22 GMT
- Title: InternAgent: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification
- Authors: InternAgent Team, Bo Zhang, Shiyang Feng, Xiangchao Yan, Jiakang Yuan, Runmin Ma, Yusong Hu, Zhiyin Yu, Xiaohan He, Songtao Huang, Shaowei Hou, Zheng Nie, Zhilong Wang, Jinyao Liu, Tianshuo Peng, Peng Ye, Dongzhan Zhou, Shufei Zhang, Xiaosong Wang, Yilan Zhang, Meng Li, Zhongying Tu, Xiangyu Yue, Wangli Ouyang, Bowen Zhou, Lei Bai
- Abstract summary: InternAgent is a unified closed-loop multi-agent framework to conduct Autonomous Scientific Research. It has demonstrated its versatility across 12 scientific research tasks. It has achieved promising performance gains in several scientific fields with significantly less time cost compared to human efforts.
- Score: 24.752098402554743
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Artificial Intelligence (AI) is accelerating the transformation of scientific research paradigms, not only enhancing research efficiency but also driving innovation. We introduce InternAgent, a unified closed-loop multi-agent framework to conduct Autonomous Scientific Research (ASR) across various scientific research fields, enabling researchers to tackle complicated problems in these fields with unprecedented speed and precision. InternAgent highlights three key advantages: 1) Scalability: InternAgent has demonstrated its versatility across 12 scientific research tasks, capable of generating innovative ideas to enhance the performance of baseline code. 2) Interactivity: InternAgent provides an interface for human expert feedback and multi-agent interaction in automated end-to-end processes, allowing for the seamless integration of domain expert knowledge. 3) Efficiency: InternAgent has achieved promising performance gains in several scientific fields with significantly less time cost compared to human efforts. For instance, in reaction yield prediction, performance increased from 27.6% to 35.4% in just 12 hours; in enhancer activity prediction, accuracy rose from 0.65 to 0.79 with only 4 hours of processing; and in 2D semantic segmentation, precision advanced from 78.8% to 81.0% in a mere 30 hours.
Related papers
- InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery [138.0404718571971]
We introduce InternAgent-1.5, a unified system designed for end-to-end scientific discovery. The system is built on a structured architecture composed of three coordinated subsystems for generation, verification, and evolution. We evaluate InternAgent-1.5 on scientific reasoning benchmarks such as GAIA, HLE, GPQA, and FrontierScience.
arXiv Detail & Related papers (2026-02-09T18:36:06Z)
- AstaBench: Rigorous Benchmarking of AI Agents with a Scientific Research Suite [75.58737079136942]
We present AstaBench, a suite that provides the first holistic measure of agentic ability to perform scientific research. Our suite comes with the first scientific research environment with production-grade search tools. Our evaluation of 57 agents across 22 agent classes reveals several interesting findings.
arXiv Detail & Related papers (2025-10-24T17:10:26Z)
- EvolveSearch: An Iterative Self-Evolving Search Agent [98.18686493123785]
Large language models (LLMs) have transformed agentic information seeking capabilities through the integration of tools such as search engines and web browsers. We propose EvolveSearch, a novel iterative self-evolution framework that combines SFT and RL to enhance agentic web search capabilities without any external human-annotated reasoning data.
arXiv Detail & Related papers (2025-05-28T15:50:48Z)
- ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows [82.07367406991678]
Large Language Models (LLMs) have extended their impact beyond Natural Language Processing. Among these, computer-using agents are capable of interacting with operating systems as humans do. We introduce ScienceBoard, which encompasses a realistic, multi-domain environment featuring dynamic and visually rich scientific software.
arXiv Detail & Related papers (2025-05-26T12:27:27Z)
- WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning [37.89715280583421]
WebAgent-R1 is an end-to-end multi-turn reinforcement learning framework for training web agents. Experiments on the WebArena-Lite benchmark demonstrate the effectiveness of WebAgent-R1, boosting the task success rate of Qwen-2.5-3B from 6.1% to 33.9%. In-depth analyses reveal the effectiveness of the thinking-based prompting strategy and test-time scaling through increased interactions for web tasks.
arXiv Detail & Related papers (2025-05-22T09:07:43Z)
- R&D-Agent: Automating Data-Driven AI Solution Building Through LLM-Powered Automated Research, Development, and Evolution [60.80016554091364]
R&D-Agent is a dual-agent framework for iterative exploration. The Researcher agent uses performance feedback to generate ideas, while the Developer agent refines code based on error feedback. R&D-Agent is evaluated on MLE-Bench and emerges as the top-performing machine learning engineering agent.
arXiv Detail & Related papers (2025-05-20T06:07:00Z)
- IRIS: Interactive Research Ideation System for Accelerating Scientific Discovery [27.218896203253987]
IRIS is an open-source platform designed for researchers to leverage large language model (LLM)-assisted scientific ideation. IRIS incorporates innovative features to enhance ideation, including adaptive test-time compute expansion via Monte Carlo Tree Search (MCTS), a fine-grained feedback mechanism, and query-based literature synthesis. We conduct a user study with researchers across diverse disciplines, validating the effectiveness of our system in enhancing ideation.
arXiv Detail & Related papers (2025-04-23T14:01:36Z)
- Completing A Systematic Review in Hours instead of Months with Interactive AI Agents [21.934330935124866]
We introduce InsightAgent, a human-centered interactive AI agent powered by large language models. InsightAgent partitions a large literature corpus based on semantics and employs a multi-agent design for more focused processing. Our user studies with 9 medical professionals demonstrate that the visualization and interaction mechanisms can effectively improve the quality of synthesized SRs.
arXiv Detail & Related papers (2025-04-21T02:57:23Z)
- AgentRxiv: Towards Collaborative Autonomous Research [3.583084119066612]
AgentRxiv lets agents collaborate toward research goals and enables researchers to accelerate discovery. We find that the best performing strategy generalizes to benchmarks in other domains. These findings suggest that autonomous agents may play a role in designing future AI systems alongside humans.
arXiv Detail & Related papers (2025-03-23T15:16:42Z)
- Collaborative Expert LLMs Guided Multi-Objective Molecular Optimization [51.104444856052204]
We present MultiMol, a collaborative large language model (LLM) system designed to guide multi-objective molecular optimization. In evaluations across six multi-objective optimization tasks, MultiMol significantly outperforms existing methods, achieving an 82.30% success rate.
arXiv Detail & Related papers (2025-03-05T13:47:55Z)
- Towards an AI co-scientist [48.11351101913404]
We introduce an AI co-scientist, a multi-agent system built on Gemini 2.0. The AI co-scientist is intended to help uncover new, original knowledge and to formulate demonstrably novel research hypotheses. The system's design incorporates a generate, debate, and evolve approach to hypothesis generation, inspired by the scientific method.
arXiv Detail & Related papers (2025-02-26T06:17:13Z)
- From Intention To Implementation: Automating Biomedical Research via LLMs [30.32209981487504]
This paper introduces BioResearcher, the first end-to-end automated system designed to streamline the entire biomedical research process. By decomposing complex tasks into logically related sub-tasks, BioResearcher effectively addresses the challenges of multidisciplinary requirements and logical complexity. BioResearcher successfully achieves an average execution success rate of 63.07% across eight previously unmet research objectives.
arXiv Detail & Related papers (2024-12-12T16:35:05Z)
- AI Expands Scientists' Impact but Contracts Science's Focus [11.634306888037273]
We analyze 67.9 million research papers across six major fields using a validated language model. Scientists who adopt AI tools publish 67.37% more papers, receive 3.16 times more citations, and become team leaders 4 years earlier than non-adopters.
arXiv Detail & Related papers (2024-12-10T18:24:17Z)
- CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark [11.794931453828974]
CORE-Bench is a benchmark consisting of 270 tasks based on 90 scientific papers across three disciplines (computer science, social science, and medicine). We provide an evaluation system to measure the accuracy of agents in a fast and parallelizable way. The best agent achieved an accuracy of 21% on the hardest task, showing the vast scope for improvement in automating routine scientific tasks.
arXiv Detail & Related papers (2024-09-17T17:13:19Z)
- Tree Search for Language Model Agents [69.43007235771383]
We propose an inference-time search algorithm for LM agents to perform exploration and multi-step planning in interactive web environments. Our approach is a form of best-first tree search that operates within the actual environment space. It is the first tree search algorithm for LM agents that shows effectiveness on realistic web tasks.
arXiv Detail & Related papers (2024-07-01T17:07:55Z)
- DISCOVERYWORLD: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents [49.74065769505137]
We introduce DISCOVERYWORLD, the first virtual environment for developing and benchmarking an agent's ability to perform complete cycles of novel scientific discovery. It includes 120 different challenge tasks spanning eight topics, each with three levels of difficulty and several parametric variations. We find that strong baseline agents that perform well in previously published environments struggle on most DISCOVERYWORLD tasks.
arXiv Detail & Related papers (2024-06-10T20:08:44Z)
- BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments [112.25067497985447]
We introduce BioDiscoveryAgent, an agent that designs new experiments, reasons about their outcomes, and efficiently navigates the hypothesis space to reach desired solutions. BioDiscoveryAgent can uniquely design new experiments without the need to train a machine learning model. It achieves an average of 21% improvement in predicting relevant genetic perturbations across six datasets.
arXiv Detail & Related papers (2024-05-27T19:57:17Z)
- ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models [56.08917291606421]
ResearchAgent is an AI-based system for ideation and operationalization of novel work. ResearchAgent automatically defines novel problems, proposes methods and designs experiments, while iteratively refining them. We experimentally validate our ResearchAgent on scientific publications across multiple disciplines.
arXiv Detail & Related papers (2024-04-11T13:36:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.