SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training
- URL: http://arxiv.org/abs/2602.03411v1
- Date: Tue, 03 Feb 2026 11:38:48 GMT
- Title: SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training
- Authors: Huatong Song, Lisheng Huang, Shuang Sun, Jinhao Jiang, Ran Le, Daixuan Cheng, Guoxin Chen, Yiwen Hu, Zongchao Chen, Wayne Xin Zhao, Yang Song, Tao Zhang, Ji-Rong Wen
- Abstract summary: We present SWE-Master, an open-source framework for building effective software engineering agents. SWE-Master systematically explores the complete agent development pipeline, including teacher-trajectory synthesis and data curation. We evaluate SWE-Master on SWE-bench Verified, a standard benchmark for realistic software engineering tasks.
- Score: 78.37721886775215
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this technical report, we present SWE-Master, an open-source and fully reproducible post-training framework for building effective software engineering agents. SWE-Master systematically explores the complete agent development pipeline, including teacher-trajectory synthesis and data curation, long-horizon SFT, RL with real execution feedback, and inference framework design. Starting from an open-source base model with limited initial SWE capability, SWE-Master demonstrates how systematic optimization can elicit strong long-horizon SWE task-solving abilities. We evaluate SWE-Master on SWE-bench Verified, a standard benchmark for realistic software engineering tasks. Under identical experimental settings, our approach achieves a resolve rate of 61.4% with Qwen2.5-Coder-32B, substantially outperforming existing open-source baselines. By further incorporating test-time scaling (TTS) with LLM-based environment feedback, SWE-Master reaches 70.8% at TTS@8, demonstrating strong performance potential. SWE-Master provides a practical and transparent foundation for advancing reproducible research on software engineering agents. The code is available at https://github.com/RUCAIBox/SWE-Master.
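The TTS@8 result above follows a best-of-k pattern: sample several independent agent rollouts on the same issue, score each candidate patch with LLM-based environment feedback, and keep the top one. A minimal sketch of that selection loop, where `agent` and `judge` are hypothetical callables rather than the released SWE-Master API:

```python
from typing import Callable

def tts_resolve(issue: str, agent: Callable, judge: Callable, k: int = 8) -> str:
    """Best-of-k test-time scaling: run k independent agent rollouts on
    the same issue, score each candidate patch with an LLM-based judge,
    and return the highest-scoring patch."""
    rollouts = [agent(issue, seed=i) for i in range(k)]
    return max(rollouts, key=lambda patch: judge(issue, patch))

# Illustrative stubs (not the real agent/judge): longer patches score higher.
demo_agent = lambda issue, seed: f"patch-{seed};" * (seed + 1)
demo_judge = lambda issue, patch: len(patch)
print(tts_resolve("fix issue #1", demo_agent, demo_judge, k=4))
```

The design choice that matters here is that the k rollouts are independent, so the selection step is the only place the feedback signal enters; a stronger judge directly raises TTS@k without retraining the agent.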
Related papers
- Immersion in the GitHub Universe: Scaling Coding Agents to Mastery [60.359983359258955]
ScaleSWE is an automated, sandboxed multi-agent workflow designed to construct high-quality SWE data at scale. The system coordinates three specialized agents for environment setup, test creation, and problem-description synthesis to process 6 million pull requests across 5,200 repositories.
arXiv Detail & Related papers (2026-02-10T15:30:19Z)
- SWE-World: Building Software Engineering Agents in Docker-Free Environments [91.17484806743641]
SWE-World is a Docker-free framework that replaces physical execution environments with a learned surrogate for training and evaluating software engineering agents. We show that SWE-World raises Qwen2.5-Coder-32B from 6.2% to 52.0% via Docker-free SFT, 55.0% with Docker-free RL, and 68.2% with further TTS.
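The learned-surrogate idea can be sketched as an environment step that never touches a container: a model predicts the observation a real shell would have returned. Here `surrogate` is a hypothetical stand-in for SWE-World's trained environment model, not its actual interface:

```python
from typing import Callable

def docker_free_step(repo_state: str, command: str,
                     surrogate: Callable[[str, str], str]) -> str:
    """One agent-environment interaction without physical execution:
    the surrogate model predicts the command's output from the current
    repository state instead of running it in Docker."""
    return surrogate(repo_state, command)

# Stub surrogate for illustration: returns a canned test report.
stub = lambda state, cmd: "2 passed, 0 failed" if "pytest" in cmd else "ok"
print(docker_free_step("repo@HEAD", "pytest tests/", stub))
```

Because the step is just a model call, rollouts parallelize without container orchestration, which is what makes Docker-free SFT and RL cheap to scale.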
arXiv Detail & Related papers (2026-02-03T11:44:39Z)
- SWE-RM: Execution-free Feedback For Software Engineering Agents [61.86380395896069]
Execution-based feedback is widely used in the development of coding agents through test-time scaling (TTS) and reinforcement learning (RL). In contrast, execution-free feedback from reward models can provide more fine-grained signals without depending on unit test cases. We introduce SWE-RM, an accurate and robust reward model adopting a mixture-of-experts architecture with 30B total parameters and 3B activated during inference.
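Execution-free feedback of this kind is typically used to rank candidate patches without running any tests. A sketch of that ranking step, where `reward_model` is a hypothetical scalar scorer standing in for a model like SWE-RM:

```python
from typing import Callable, List

def rank_execution_free(patches: List[str],
                        reward_model: Callable[[str], float]) -> List[str]:
    """Order candidate patches by reward-model score alone; no tests
    are executed, so this works even when a repo has no test suite."""
    return sorted(patches, key=reward_model, reverse=True)

# Illustrative stub reward model: prefer shorter diffs (fewer added lines).
stub_rm = lambda patch: -patch.count("\n+")
diffs = ["+fix\n", "+fix\n+extra\n+more\n"]
print(rank_execution_free(diffs, stub_rm)[0])
```

The same scorer can serve both roles the abstract mentions: as a TTS selector over candidate patches and as a dense reward signal during RL.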
arXiv Detail & Related papers (2025-12-26T08:26:18Z)
- Toward Training Superintelligent Software Agents through Self-Play SWE-RL [66.11447353341926]
Self-play SWE-RL is a first step toward training paradigms for superintelligent software agents. Our approach makes minimal data assumptions, requiring only access to sandboxed repositories with source code and installed dependencies. Our results, albeit early, suggest a path where agents autonomously gather extensive learning experiences from real-world software repositories.
arXiv Detail & Related papers (2025-12-21T00:49:40Z)
- Trae Agent: An LLM-based Agent for Software Engineering with Test-time Scaling [18.390443362388623]
Trae Agent is the first agent-based ensemble reasoning approach for repository-level issue resolution. We conduct experiments using three leading large language models (LLMs) on the widely adopted SWE-bench benchmark. Trae Agent consistently achieves superior performance, with an average improvement of 10.22% over all baselines in terms of Pass@1.
arXiv Detail & Related papers (2025-07-31T09:37:22Z)
- SWE-Dev: Building Software Engineering Agents with Training and Inference Scaling [39.53265893083118]
Large language models (LLMs) have advanced rapidly from conversational problem solving to addressing real-world tasks involving tool use. To address this issue, we present SWE-Dev, an SWE agent built upon open-source LLMs. Experiments on the SWE-bench-Verified benchmark show that the SWE-Dev models can achieve top performance among all open SWE agents.
arXiv Detail & Related papers (2025-06-09T11:03:16Z)
- First SFT, Second RL, Third UPT: Continual Improving Multi-Modal LLM Reasoning via Unsupervised Post-Training [37.80193099472551]
We propose MM-UPT, a simple yet effective framework for unsupervised post-training of MLLMs. Our experiments demonstrate that this training method effectively improves the reasoning ability of Qwen2.5-VL-7B. We extend our framework to a data self-generation setting, designing two strategies that prompt the MLLM to synthesize new training samples.
arXiv Detail & Related papers (2025-05-28T15:11:16Z)
- Training Software Engineering Agents and Verifiers with SWE-Gym [89.55822534364727]
SWE-Gym is the first environment for training real-world software engineering (SWE) agents. SWE-Gym contains 2,438 real-world Python task instances. We use SWE-Gym to train language-model-based SWE agents, achieving up to 19% absolute gains in resolve rate.
arXiv Detail & Related papers (2024-12-30T18:15:39Z)