Fugu-MT Paper Translation (Summary): Toward Training Superintelligent Software Agents through Self-Play SWE-RL

Paper summary: Toward Training Superintelligent Software Agents through Self-Play SWE-RL

arxiv url: http://arxiv.org/abs/2512.18552v1
Date: Sun, 21 Dec 2025 00:49:40 GMT
Status: Translation complete
Last updated in system: 2025-12-23 18:54:32.392368
Title: Toward Training Superintelligent Software Agents through Self-Play SWE-RL
Title (reference translation): Toward Training Superintelligent Software Agents through Self-Play SWE-RL
Authors: Yuxiang Wei, Zhiqing Sun, Emily McMilin, Jonas Gehring, David Zhang, Gabriel Synnaeve, Daniel Fried, Lingming Zhang, Sida Wang
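Abstract summary: Self-play SWE-RL is a first step toward a training paradigm for superintelligent software agents. The approach requires only access to sandboxed repositories with source code and installed dependencies. The results, albeit early, suggest a path where agents autonomously gather extensive learning experiences from real-world software repositories.
Reference score (independently computed attention score): 66.11447353341926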
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While current software agents powered by large language models (LLMs) and agentic reinforcement learning (RL) can boost programmer productivity, their training data (e.g., GitHub issues and pull requests) and environments (e.g., pass-to-pass and fail-to-pass tests) heavily depend on human knowledge or curation, posing a fundamental barrier to superintelligence. In this paper, we present Self-play SWE-RL (SSR), a first step toward training paradigms for superintelligent software agents. Our approach takes minimal data assumptions, only requiring access to sandboxed repositories with source code and installed dependencies, with no need for human-labeled issues or tests. Grounded in these real-world codebases, a single LLM agent is trained via reinforcement learning in a self-play setting to iteratively inject and repair software bugs of increasing complexity, with each bug formally specified by a test patch rather than a natural language issue description. On the SWE-bench Verified and SWE-Bench Pro benchmarks, SSR achieves notable self-improvement (+10.4 and +7.8 points, respectively) and consistently outperforms the human-data baseline over the entire training trajectory, despite being evaluated on natural language issues absent from self-play. Our results, albeit early, suggest a path where agents autonomously gather extensive learning experiences from real-world software repositories, ultimately enabling superintelligent systems that exceed human capabilities in understanding how systems are constructed, solving novel challenges, and autonomously creating new software from scratch.
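Abstract (reference translation): Current software agents that use large language models (LLMs) and agentic reinforcement learning (RL) can boost programmer productivity, but their training data (e.g., GitHub issues and pull requests) and environments (e.g., pass-to-pass and fail-to-pass tests) depend heavily on human knowledge or curation, which poses a fundamental barrier to superintelligence. This paper presents Self-play SWE-RL (SSR). The approach requires only access to sandboxed repositories with source code and installed dependencies. Grounded in these real-world codebases, a single LLM agent is trained via reinforcement learning in a self-play setting to iteratively inject and repair software bugs of increasing complexity. On the SWE-bench Verified and SWE-Bench Pro benchmarks, SSR achieves notable self-improvement (+10.4 and +7.8 points, respectively) and consistently outperforms the human-data baseline over the entire training trajectory, despite being evaluated on natural language issues absent from self-play. The results, albeit early, suggest a path where agents autonomously gather extensive learning experiences from real-world software repositories, ultimately enabling superintelligent systems that understand how systems are constructed, solve novel challenges, and autonomously create new software from scratch.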
