SWE-Universe: Scale Real-World Verifiable Environments to Millions
- URL: http://arxiv.org/abs/2602.02361v1
- Date: Mon, 02 Feb 2026 17:20:30 GMT
- Title: SWE-Universe: Scale Real-World Verifiable Environments to Millions
- Authors: Mouxiang Chen, Lei Zhang, Yunlong Feng, Xuwu Wang, Wenting Zhao, Ruisheng Cao, Jiaxi Yang, Jiawei Chen, Mingze Li, Zeyao Ma, Hao Ge, Zongmeng Zhang, Zeyu Cui, Dayiheng Liu, Jingren Zhou, Jianling Sun, Junyang Lin, Binyuan Hui,
- Abstract summary: SWE-Universe is a framework for automatically constructing real-world software engineering (SWE) verifiable environments from GitHub pull requests (PRs)<n>We propose a building agent powered by an efficient custom-trained model to overcome the prevalent challenges of automatic building.<n>We demonstrate the profound value of our environments through large-scale agentic mid-training and reinforcement learning.
- Score: 84.63665266236963
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose SWE-Universe, a scalable and efficient framework for automatically constructing real-world software engineering (SWE) verifiable environments from GitHub pull requests (PRs). To overcome the prevalent challenges of automatic building, such as low production yield, weak verifiers, and prohibitive cost, our framework utilizes a building agent powered by an efficient custom-trained model. This agent employs iterative self-verification and in-loop hacking detection to ensure the reliable generation of high-fidelity, verifiable tasks. Using this method, we scale the number of real-world multilingual SWE environments to a million scale (807,693). We demonstrate the profound value of our environments through large-scale agentic mid-training and reinforcement learning. Finally, we applied this technique to Qwen3-Max-Thinking and achieved a score of 75.3% on SWE-Bench Verified. Our work provides both a critical resource and a robust methodology to advance the next generation of coding agents.
Related papers
- Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning [62.499592503950026]
Large language model (LLM) have empowered autonomous agents to perform complex tasks that require multi-turn interactions with tools and environments.<n>We propose Agent World Model (AWM), a fully synthetic environment generation pipeline.<n>We scale to 1,000 environments covering everyday scenarios, in which agents can interact with rich toolsets.
arXiv Detail & Related papers (2026-02-10T18:55:41Z) - Immersion in the GitHub Universe: Scaling Coding Agents to Mastery [60.359983359258955]
ScaleSWE is an automated, sandboxed multi agent workflow designed to construct high quality SWE data at scale.<n>The system coordinates three specialized agents for environment setup, test creation, and problem description synthesis to process 6 million pull requests across 5200 repositories.
arXiv Detail & Related papers (2026-02-10T15:30:19Z) - MEnvAgent: Scalable Polyglot Environment Construction for Verifiable Software Engineering [54.236614097082395]
We introduce MEnvAgent, a framework for automated Environment construction.<n>MEnvAgent employs a multi-agent Planning-Execution-Verification architecture to autonomously resolve construction failures.<n>MEnvData-SWE is the largest open-source polyglot dataset of realistic verifiable Docker environments to date.
arXiv Detail & Related papers (2026-01-30T11:36:10Z) - EmboCoach-Bench: Benchmarking AI Agents on Developing Embodied Robots [68.29056647487519]
Embodied AI is fueled by high-fidelity simulation and large-scale data collection.<n>However, this scaling capability remains bottlenecked by a reliance on labor-intensive manual oversight.<n>We introduce textscEmboCoach-Bench, a benchmark evaluating the capacity of LLM agents to autonomously engineer embodied policies.
arXiv Detail & Related papers (2026-01-29T11:33:49Z) - Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem [90.17610617854247]
We introduce the Agentic Learning Ecosystem (ALE), a foundational infrastructure that optimize the production pipeline for agentic model.<n>ALE consists of three components: ROLL, a post-training framework for weight optimization; ROCK, a sandbox environment manager for trajectory generation; and iFlow CLI, an agent framework for efficient context engineering.<n>We release ROME, an open-source agent grounded by ALE and trained on over one million trajectories.
arXiv Detail & Related papers (2025-12-31T14:03:39Z) - Multi-Docker-Eval: A `Shovel of the Gold Rush' Benchmark on Automatic Environment Building for Software Engineering [38.724704918577295]
Multi-Docker-Eval benchmark includes 40 real-world repositories spanning 9 programming languages.<n>Overall success rate of current models is low (F2P at most 37.7%), with environment construction being the primary bottleneck.<n>These findings provide actionable guidelines for building scalable, fully automated SWE pipelines.
arXiv Detail & Related papers (2025-12-07T16:43:45Z) - app.build: A Production Framework for Scaling Agentic Prompt-to-App Generation with Environment Scaffolding [0.09198412216120845]
We present app.build, an open-source framework that improves LLM-based application generation through systematic validation and structured environments.<n>Our approach combines multi-layered validation pipelines, stack-specific orchestration, and model-agnostic architecture, implemented across three reference stacks.<n>We demonstrate that comprehensive validation achieves 73.3% viability rate with 30% reaching perfect quality scores, while open-weights models achieve 80.8% of closed-model performance when provided structured environments.
arXiv Detail & Related papers (2025-09-03T13:41:45Z) - Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning [29.605396813225386]
We show how reinforcement learning can be used to train agents for multi-turn interactive tasks.<n>Our methodology offers a practical approach for training capable agents for multi-turn interactive tasks using open-weight models.
arXiv Detail & Related papers (2025-08-05T14:30:47Z) - Grammarization-Based Grasping with Deep Multi-Autoencoder Latent Space Exploration by Reinforcement Learning Agent [0.0]
We propose a novel framework for robotic grasping based on the idea of compressing high-dimensional target and gripper features in a common latent space.
Our approach simplifies grasping by using three autoencoders dedicated to the target, the gripper, and a third one that fuses their latent representations.
arXiv Detail & Related papers (2024-11-13T12:26:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.