Fugu-MT Paper Translation (Abstract): What Makes Interaction Trajectories Effective for Training Terminal Agents?

Paper overview: What Makes Interaction Trajectories Effective for Training Terminal Agents?

arxiv url: http://arxiv.org/abs/2606.03461v1
Date: Tue, 02 Jun 2026 10:37:47 GMT
Status: Translation complete
Last updated in system: 2026-06-03 22:00:04.941909
Title: What Makes Interaction Trajectories Effective for Training Terminal Agents?
Title (reference translation): What Makes Interaction Trajectories Effective for Training Terminal Agents?
Authors: Sidi Yang, Chaofan Tao, Jierun Chen, Tiezheng Yu, Ruoyu Wang, Yuxin Jiang, Yiming Du, Wendong Xu, Jing Xiong, Taiqiang Wu, Lifeng Shang, Xiaohui Li, Ngai Wong, Haoli Bai,
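Abstract summary: Terminal-Lego is a scalable pipeline that transforms real-world issues into environment-verified agentic tasks. Students fine-tuned on trajectories from DeepSeek-V3.2, a lower-scoring agent, exhibit significantly stronger generalization. With only 15.3k Terminal-Lego trajectories, Qwen3-32B achieves a 24.3% score on Terminal-Bench 2.0, rivaling previous SOTA performance established with over 30x the data volume.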
Reference score (independently computed attention score): 55.62817294510983
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Stronger code agents are commonly assumed to be superior teachers for post-training, yet this assumption remains poorly disentangled from task difficulty, harness design, and student capacity. We investigate this pedagogical link using Terminal-Lego, a scalable pipeline that transforms multi-domain real-world issues into environment-verified agentic tasks. Surprisingly, standalone performance does not dictate teaching efficacy: while Claude Opus 4.6 achieves higher scores on Terminal-Bench 2.0, students fine-tuned on trajectories from DeepSeek-V3.2, a lower-scoring agent, exhibit significantly stronger generalization. We attribute this "pedagogical paradox" to Environment-Grounded Supervision (EGS): trajectories that explicitly expose inspect-act-verify behaviors through harness-visible interactions allow students to internalize robust problem-solving routines rather than fragile action sequences. Scaling analysis reveals exceptional data efficiency: with only 15.3k Terminal-Lego trajectories, for example, Qwen3-32B achieves a 24.3% score on Terminal-Bench 2.0, rivaling previous SOTA performance established with over 30x the data volume. Our results suggest that the frontier of agent post-training lies beyond mere outcome-matching, shifting the focus toward "Harness Engineering", where the systematic design of environment-grounded interaction structures serves as the primary catalyst for reproducible and generalizable agentic intelligence.
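Abstract (reference translation): Stronger code agents are commonly assumed to be better teachers for post-training, yet this assumption has not been properly disentangled from task difficulty, harness design, and student capacity. We investigate this pedagogical link using Terminal-Lego, a scalable pipeline that transforms multi-domain real-world issues into environment-verified agentic tasks. While Claude Opus 4.6 achieves higher scores on Terminal-Bench 2.0, students fine-tuned on trajectories from DeepSeek-V3.2, a lower-scoring agent, exhibit significantly stronger generalization. We attribute this "pedagogical paradox" to Environment-Grounded Supervision (EGS). For example, with only 15.3k Terminal-Lego trajectories, Qwen3-32B achieves a 24.3% score on Terminal-Bench 2.0, rivaling previous SOTA performance established with over 30x the data volume. These results suggest that the frontier of agent post-training lies beyond mere outcome-matching, shifting the focus toward "Harness Engineering", where the systematic design of environment-grounded interaction structures serves as the primary catalyst for reproducible and generalizable agentic intelligence.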

Related papers