Fugu-MT Paper Translation (Abstract): From Patches to Trajectories: Privileged Process Supervision for Software-Engineering Agents

Paper overview: From Patches to Trajectories: Privileged Process Supervision for Software-Engineering Agents

arxiv url: http://arxiv.org/abs/2605.21996v1
Date: Thu, 21 May 2026 04:54:55 GMT
Status: Translation complete
Last updated in system: 2026-05-22 16:35:42.09568
Title: From Patches to Trajectories: Privileged Process Supervision for Software-Engineering Agents
Title (reference translation): From Patches to Trajectories: Privileged Process Supervision for Software-Engineering Agents
Authors: Murong Ma, Tianyu Chen, Yun Lin, Shuai Lu, Qinglin Zhu, Yeyun Gong, Zhiyong Huang, Peng Cheng, Yan Lu, Jin Song Dong
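Abstract summary: Supervised fine-tuning (SFT) on long teacher trajectories is the dominant way to instill investigation and reasoning in open software-engineering (SWE) agents. This paper proposes Patches-to-Trajectories (P2T), which uses the developer-authored reference patch as privileged information during curation and formulates trajectory construction as bi-objective optimization over per-step effectiveness and trajectory length.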
Reference score (independently computed attention score): 56.31499185764872
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Supervised fine-tuning (SFT) on long teacher trajectories is the dominant way to instill investigation and reasoning in open software-engineering (SWE) agents. Since every retained response becomes an imitation target, the student inherits both the final outcome and the intermediate flaws, including ungrounded leaps and redundant loops. High-quality training data must be effective (each step is grounded and narrows the agent's epistemic gap to the correct fix) and efficient (each step is information-bearing rather than redundant or looping). Existing recipes filter or relabel teacher rollouts using only a binary terminal verifier, which does not directly target these axes and provides no supervision on instances where the teacher fails. Most real issues include a developer-authored reference patch, $p^\star$, revealing the file paths, runtime behaviors, and coding conventions presupposed by the correct fix, yet standard pipelines discard it. We propose Patches-to-Trajectories (P2T), which uses $p^\star$ as privileged information during curation and formulates trajectory construction as bi-objective optimization over per-step effectiveness and trajectory length. A reverse phase distills $p^\star$ into a latent process graph, $G^\star$, of contextual facts and solution milestones. A forward phase curates trajectories from blinded teacher continuations by scoring per-step progress against $G^\star$ under a leakage-blocking groundedness check and retaining the shortest effective segments. Using only 1.8k curated SWE-Gym instances, P2T improves effectiveness and efficiency over outcome-filtered SFT and its tool-error-masking variant. On SWE-bench Verified, it raises Pass@1 by up to 10.8 points while reducing per-instance inference cost by ~15%, with consistent gains on SWE-bench Lite. Size-matched ablations and qualitative analysis further isolate trajectory quality from data scale.
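Abstract (reference translation): Supervised fine-tuning (SFT) on long teacher trajectories is the main way to instill investigation and reasoning in open software-engineering (SWE) agents. Because every retained response becomes an imitation target, the student inherits the final outcome together with intermediate flaws, including ungrounded leaps and redundant loops. High-quality training data must be effective (each step is grounded and narrows the agent's epistemic gap to the correct fix) and efficient (each step carries information rather than being redundant or looping). Existing recipes rely only on a binary terminal verifier, which does not directly target these axes and provides no supervision on instances where the teacher fails. Most real issues come with a developer-authored reference patch, $p^\star$, which reveals the file paths, runtime behaviors, and coding conventions presupposed by the correct fix, yet standard pipelines discard it. The paper proposes Patches-to-Trajectories (P2T), which uses $p^\star$ as privileged information during curation and formulates trajectory construction as bi-objective optimization over per-step effectiveness and trajectory length. A reverse phase distills $p^\star$ into a latent process graph, $G^\star$, of contextual facts and solution milestones. A forward phase curates trajectories from blinded teacher continuations by scoring per-step progress against $G^\star$ under a leakage-blocking groundedness check and retaining the shortest effective segments. Using only 1.8k curated SWE-Gym instances, P2T improves effectiveness and efficiency over outcome-filtered SFT and its tool-error-masking variant. On SWE-bench Verified, it raises Pass@1 by up to 10.8 points while reducing per-instance inference cost by about 15%, with consistent gains on SWE-bench Lite. Size-matched ablations and qualitative analysis further isolate trajectory quality from data scale.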
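The forward phase described in the abstract (score each step against $G^\star$, block ungrounded steps, keep the shortest effective segment) can be illustrated with a toy sketch. This is not the authors' implementation: the `Step` encoding, the node names, the helper functions, and the tie-breaking rule (most coverage first, then fewest steps) are all assumptions made here for illustration, and the boolean `grounded` flag only stands in for the leakage-blocking groundedness check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One teacher step; `covers` lists the G* nodes it touches (hypothetical encoding)."""
    action: str
    covers: frozenset
    grounded: bool  # stand-in for the leakage-blocking groundedness check


def step_progress(step, graph_nodes, seen):
    """Return the G* nodes newly reached by a step; ungrounded steps earn nothing."""
    if not step.grounded:
        return frozenset()
    return (step.covers & graph_nodes) - seen


def score_segment(segment, graph_nodes, seen):
    """Return (nodes gained, whether every step made progress) for one candidate."""
    gained = set()
    all_effective = True
    for step in segment:
        new = step_progress(step, graph_nodes, seen | gained)
        if not new:
            all_effective = False  # redundant, looping, or ungrounded step
        gained |= new
    return frozenset(gained), all_effective


def pick_segment(candidates, graph_nodes, seen):
    """Keep effective candidates, then prefer more coverage and, on ties, fewer steps."""
    scored = []
    for segment in candidates:
        gained, effective = score_segment(segment, graph_nodes, seen)
        if effective and gained:
            scored.append((-len(gained), len(segment), segment))
    if not scored:
        return None
    scored.sort(key=lambda item: (item[0], item[1]))
    return scored[0][2]


# Hypothetical process graph and three blinded teacher continuations.
G_STAR = frozenset({"fact:config_path", "fact:failing_call", "milestone:patch_parser"})
looping = [
    Step("open config", frozenset({"fact:config_path"}), True),
    Step("open config again", frozenset({"fact:config_path"}), True),
    Step("edit parser", frozenset({"milestone:patch_parser"}), True),
]
ungrounded = [Step("guess the fix", frozenset({"milestone:patch_parser"}), False)]
direct = [
    Step("open config", frozenset({"fact:config_path"}), True),
    Step("edit parser", frozenset({"milestone:patch_parser"}), True),
]
best = pick_segment([looping, ungrounded, direct], G_STAR, frozenset())
```

In this toy setup the looping candidate is rejected because its second step adds no new node, and the ungrounded candidate is rejected because its only step earns no credit, so `best` is the two-step `direct` segment. Ordering by coverage and then length is just one way to collapse the two objectives into a single choice; the paper's actual formulation may differ.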


Related papers