Live-SWE-agent: Can Software Engineering Agents Self-Evolve on the Fly?
- URL: http://arxiv.org/abs/2511.13646v3
- Date: Mon, 24 Nov 2025 15:55:51 GMT
- Title: Live-SWE-agent: Can Software Engineering Agents Self-Evolve on the Fly?
- Authors: Chunqiu Steven Xia, Zhe Wang, Yan Yang, Yuxiang Wei, Lingming Zhang,
- Abstract summary: Large Language Models (LLMs) are reshaping almost all industries, including software engineering.<n>We propose Live-SWE-agent, the first live software agent that can autonomously and continuously evolve itself on-the-fly when solving real-world software problems.<n>Our evaluation on the widely studied SWE-bench Verified benchmark shows that Live-SWE-AGENT can achieve an impressive solve rate of 77.4% without test-time scaling.
- Score: 19.772188613944596
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are reshaping almost all industries, including software engineering. In recent years, a number of LLM agents have been proposed to solve real-world software problems. Such software agents are typically equipped with a suite of coding tools and can autonomously decide the next actions to form complete trajectories to solve end-to-end software tasks. While promising, they typically require dedicated design and may still be suboptimal, since it can be extremely challenging and costly to exhaust the entire agent scaffold design space. Recognizing that software agents are inherently software themselves that can be further refined/modified, researchers have proposed a number of self-improving software agents recently, including the Darwin-Gödel Machine (DGM). Meanwhile, such self-improving agents require costly offline training on specific benchmarks and may not generalize well across different LLMs or benchmarks. In this paper, we propose Live-SWE-agent, the first live software agent that can autonomously and continuously evolve itself on-the-fly during runtime when solving real-world software problems. More specifically, Live-SWE-agent starts with the most basic agent scaffold with only access to bash tools (e.g., mini-SWE-agent), and autonomously evolves its own scaffold implementation while solving real-world software problems. Our evaluation on the widely studied SWE-bench Verified benchmark shows that LIVE-SWE-AGENT can achieve an impressive solve rate of 77.4% without test-time scaling, outperforming all existing software agents, including the best proprietary solution. Moreover, Live-SWE-agent outperforms state-of-the-art manually crafted software agents on the recent SWE-Bench Pro benchmark, achieving the best-known solve rate of 45.8%.
Related papers
- Toward Training Superintelligent Software Agents through Self-Play SWE-RL [66.11447353341926]
Self-play SWE-RL is a first step toward training paradigms for superintelligent software agents.<n>Our approach takes minimal data assumptions, only requiring access to sandboxed repositories with source code and installed dependencies.<n>Our results, albeit early, suggest a path where agents autonomously gather extensive learning experiences from real-world software repositories.
arXiv Detail & Related papers (2025-12-21T00:49:40Z) - Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning [84.70211451226835]
Large Language Model (LLM) Agents are constrained by a dependency on human-curated data.<n>We introduce Agent0, a fully autonomous framework that evolves high-performing agents without external data.<n>Agent0 substantially boosts reasoning capabilities, improving the Qwen3-8B-Base model by 18% on mathematical reasoning and 24% on general reasoning benchmarks.
arXiv Detail & Related papers (2025-11-20T05:01:57Z) - TOM-SWE: User Mental Modeling For Software Engineering Agents [75.28749912645127]
ToM-SWE is a dual-agent architecture that pairs a primary software-engineering (SWE) agent with a lightweight theory-of-mind (ToM) partner agent.<n>ToM-SWE infers user goals, constraints, and preferences from instructions and interaction history.<n>In two software engineering benchmarks, ToM-SWE improves task success rates and user satisfaction.
arXiv Detail & Related papers (2025-10-24T16:09:51Z) - SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience [71.82719117238307]
We propose SEAgent, an agentic self-evolving framework enabling computer-use agents to evolve through interactions with unfamiliar software.<n>We validate the effectiveness of SEAgent across five novel software environments within OS-World.<n>Our approach achieves a significant improvement of 23.2% in success rate, from 11.3% to 34.5%, over a competitive open-source CUA.
arXiv Detail & Related papers (2025-08-06T17:58:46Z) - Unified Software Engineering agent as AI Software Engineer [14.733475669942276]
Large Language Model (LLM) technology has raised expectations for automated coding.<n>In this paper, we seek to understand this question by developing a Unified Software Engineering agent or USEagent.<n>We envision USEagent as the first draft of a future AI Software Engineer which can be a team member in future software development teams involving both AI and humans.
arXiv Detail & Related papers (2025-06-17T16:19:13Z) - Can Agents Fix Agent Issues? [15.50260831159089]
LLM-based agent systems are emerging as a new software paradigm and have been widely adopted across diverse domains such as medicine, robotics, and programming.<n>Maintaining these systems requires substantial effort, as they are inevitably prone to bugs and continually evolve to meet changing external requirements.<n>Recent software engineering (SE) agents have shown promise in addressing issues in traditional software systems, but it remains unclear how effectively they can resolve real-world issues in agent systems.
arXiv Detail & Related papers (2025-05-27T05:45:03Z) - LLM Agents Making Agent Tools [2.5529148902034637]
Tool use has turned large language models (LLMs) into powerful agents that can perform complex multi-step tasks.<n>But these tools must be implemented in advance by human developers.<n>We propose ToolMaker, an agentic framework that autonomously transforms papers with code into LLM-compatible tools.
arXiv Detail & Related papers (2025-02-17T11:44:11Z) - Agentless: Demystifying LLM-based Software Engineering Agents [12.19683999553113]
We build Agentless -- an agentless approach to automatically solve software development problems.
Compared to the verbose and complex setup of agent-based approaches, Agentless employs a simplistic three-phase process of localization, repair, and patch validation.
Our results on the popular SWE-bench Lite benchmark show that surprisingly the simplistic Agentless is able to achieve both the highest performance and low cost.
arXiv Detail & Related papers (2024-07-01T17:24:45Z) - Agent-Driven Automatic Software Improvement [55.2480439325792]
This research proposal aims to explore innovative solutions by focusing on the deployment of agents powered by Large Language Models (LLMs)
The iterative nature of agents, which allows for continuous learning and adaptation, can help surpass common challenges in code generation.
We aim to use the iterative feedback in these systems to further fine-tune the LLMs underlying the agents, becoming better aligned to the task of automated software improvement.
arXiv Detail & Related papers (2024-06-24T15:45:22Z) - SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering [79.07755560048388]
SWE-agent is a system that facilitates LM agents to autonomously use computers to solve software engineering tasks.
SWE-agent's custom agent-computer interface (ACI) significantly enhances an agent's ability to create and edit code files, navigate entire repositories, and execute tests and other programs.
We evaluate SWE-agent on SWE-bench and HumanEvalFix, achieving state-of-the-art performance on both with a pass@1 rate of 12.5% and 87.7%, respectively.
arXiv Detail & Related papers (2024-05-06T17:41:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.