Immersion in the GitHub Universe: Scaling Coding Agents to Mastery
- URL: http://arxiv.org/abs/2602.09892v1
- Date: Tue, 10 Feb 2026 15:30:19 GMT
- Title: Immersion in the GitHub Universe: Scaling Coding Agents to Mastery
- Authors: Jiale Zhao, Guoxin Chen, Fanzhe Meng, Minghao Li, Jie Chen, Hui Xu, Yongshuai Sun, Xin Zhao, Ruihua Song, Yuan Zhang, Peng Wang, Cheng Chen, Jirong Wen, Kai Jia,
- Abstract summary: ScaleSWE is an automated, sandboxed multi-agent workflow designed to construct high-quality SWE data at scale. The system coordinates three specialized agents for environment setup, test creation, and problem description synthesis to process 6 million pull requests across 5,200 repositories.
- Score: 60.359983359258955
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Achieving mastery in real-world software engineering tasks is fundamentally bottlenecked by the scarcity of large-scale, high-quality training data. Scaling such data has been limited by the complexity of environment setup, unit test generation, and problem statement curation. In this paper, we propose ScaleSWE, an automated, sandboxed multi-agent workflow designed to construct high-quality SWE data at scale. The system coordinates three specialized agents for environment setup, test creation, and problem description synthesis to process 6 million pull requests across 5,200 repositories, producing ScaleSWE-Data: 100k verified SWE instances, the largest such dataset to date. It substantially surpasses existing real-world datasets in repository diversity and reflects realistic task complexity. We further demonstrate the dataset's utility for training by distilling 71,498 high-quality trajectories and fine-tuning Qwen3-30B-A3B-Instruct to produce ScaleSWE-Agent. Our agent achieves a 64% resolve rate on SWE-bench Verified, a nearly three-fold improvement over the base model. ScaleSWE provides a scalable, reproducible approach to data construction for advancing LLM-based software engineering. ScaleSWE will be publicly available.
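The three-stage workflow described in the abstract can be illustrated as a filtering pipeline: each pull request must survive environment setup, test creation, and problem synthesis to become a verified instance. The sketch below is a minimal, hypothetical illustration (all function and field names are assumptions; the actual ScaleSWE agents are LLM-driven and run inside a sandbox):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PullRequest:
    repo: str
    diff: str
    test_diff: str

# Hypothetical stand-ins for ScaleSWE's three specialized agents.
def setup_environment(pr: PullRequest) -> bool:
    """Environment-setup agent: placeholder success check."""
    return bool(pr.repo)

def create_tests(pr: PullRequest) -> list:
    """Test-creation agent: derive tests from the PR's test changes."""
    return ["test_regression"] if pr.test_diff else []

def synthesize_problem(pr: PullRequest) -> str:
    """Problem-synthesis agent: write an issue-style task description."""
    return f"Resolve the defect addressed by this change in {pr.repo}."

def build_instance(pr: PullRequest) -> Optional[dict]:
    """Chain the three agents; PRs that fail any stage are discarded."""
    if not setup_environment(pr):
        return None
    tests = create_tests(pr)
    if not tests:
        return None
    return {"repo": pr.repo, "tests": tests,
            "problem": synthesize_problem(pr)}
```

The filtering structure explains why 6 million PRs yield only 100k verified instances: each stage discards candidates that cannot be built, tested, or described reliably.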
Related papers
- SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale [39.33317467753191]
SWE-rebench V2 is an automated pipeline for harvesting executable real-world SWE tasks and constructing RL training environments at scale. We construct a dataset of 32,000+ tasks spanning 20 languages and 3,600+ repositories, with pre-built images for reproducible execution. To further scale training data, we additionally release 120,000+ tasks with installation instructions, fail-to-pass tests and rich metadata.
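Several of these pipelines (SWE-rebench V2, SWE-smith, and ScaleSWE itself) verify instances with fail-to-pass tests: tests that fail on the buggy base commit and pass once the gold patch is applied. A minimal sketch of that validity check, with a hypothetical `run_passes` callback standing in for real test execution:

```python
# A dataset instance is "fail-to-pass" when the test suite fails at the
# buggy base commit and passes at the patched commit.
# `run_passes(commit) -> bool` abstracts how tests are actually run
# (e.g. inside a per-repository container image).
def is_fail_to_pass(run_passes, base_commit: str, patched_commit: str) -> bool:
    return (not run_passes(base_commit)) and run_passes(patched_commit)

# Example with a fake runner standing in for real test execution:
results = {"abc123": False, "def456": True}  # commit -> suite passed?
valid = is_fail_to_pass(results.get, "abc123", "def456")  # True
```

Injecting the runner keeps the check independent of the execution backend, which is what lets such pipelines swap Docker images, sandboxes, or (as in SWE-World) learned surrogates underneath the same validation logic.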
arXiv Detail & Related papers (2026-02-27T10:06:10Z)
- SWE-World: Building Software Engineering Agents in Docker-Free Environments [91.17484806743641]
SWE-World is a Docker-free framework that replaces physical execution environments with a learned surrogate for training and evaluating software engineering agents. We show that SWE-World raises Qwen2.5-Coder-32B from 6.2% to 52.0% via Docker-free SFT, 55.0% with Docker-free RL, and 68.2% with further TTS.
arXiv Detail & Related papers (2026-02-03T11:44:39Z)
- SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training [78.37721886775215]
We present SWE-Master, an open-source framework for building effective software engineering agents. SWE-Master systematically explores the complete agent development pipeline, including teacher-trajectory synthesis and data curation. We evaluate SWE-Master on SWE-bench Verified, a standard benchmark for realistic software engineering tasks.
arXiv Detail & Related papers (2026-02-03T11:38:48Z)
- SWE-Universe: Scale Real-World Verifiable Environments to Millions [84.63665266236963]
SWE-Universe is a framework for automatically constructing real-world software engineering (SWE) verifiable environments from GitHub pull requests (PRs). We propose a building agent powered by an efficient custom-trained model to overcome the prevalent challenges of automatic building. We demonstrate the profound value of our environments through large-scale agentic mid-training and reinforcement learning.
arXiv Detail & Related papers (2026-02-02T17:20:30Z)
- EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis [101.67583081810136]
Large language models (LLMs) are expected to be trained to act as agents in various real-world environments. This process relies on rich and varied tool-interaction sandboxes. We propose EnvScaler, an automated framework for scalable tool-interaction environments.
arXiv Detail & Related papers (2026-01-09T14:32:06Z)
- SWE-Mirror: Scaling Issue-Resolving Datasets by Mirroring Issues Across Repositories [15.458389392000706]
SWE-Mirror is a pipeline that distills a real-world issue's semantic essence, mirrors it into another repository with a configured Gym environment, and re-animates it as a verifiable issue-resolving task. Applying SWE-Mirror to 40 repositories across 4 languages, we have curated a dataset with 60,671 issue-resolving tasks. Post-training experiments show that models trained with the dataset exhibit improvements in issue-resolving capabilities.
arXiv Detail & Related papers (2025-09-10T16:15:23Z)
- Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs [19.766885088032932]
Software engineering (SWE) has emerged as a crucial testbed for next-generation LLM agents. Most existing datasets are limited to only a few thousand GitHub-sourced instances. We propose an incremental, automated data-curation pipeline that systematically scales both the volume and diversity of SWE datasets.
arXiv Detail & Related papers (2025-06-24T03:53:36Z)
- SWE-Dev: Building Software Engineering Agents with Training and Inference Scaling [39.53265893083118]
Large language models (LLMs) have advanced rapidly from conversational problem solving to addressing real-world tasks involving tool use. To address this issue, we present SWE-Dev, an SWE agent built upon open-source LLMs. Experiments on the SWE-bench-Verified benchmark show that the SWE-Dev models can achieve top performance among all open SWE agents.
arXiv Detail & Related papers (2025-06-09T11:03:16Z)
- SWE-smith: Scaling Data for Software Engineering Agents [100.30273957706237]
SWE-smith is a novel pipeline for generating software engineering training data at scale. We create a dataset of 50k instances sourced from 128 GitHub repositories. We train SWE-agent-LM-32B, achieving 40.2% Pass@1 resolve rate on the SWE-bench Verified benchmark.
arXiv Detail & Related papers (2025-04-30T16:56:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.