Fugu-MT Paper Translation (Abstract): Terminal-World: Scaling Terminal-Agent Environments via Agent Skills

Paper overview: Terminal-World: Scaling Terminal-Agent Environments via Agent Skills

arxiv url: http://arxiv.org/abs/2605.20876v1
Date: Wed, 20 May 2026 08:14:51 GMT
Status: Translation complete
Last updated in system: 2026-05-21 19:19:56.566733
Title: Terminal-World: Scaling Terminal-Agent Environments via Agent Skills
Title (reference translation): Terminal-World: Scaling Terminal-Agent Environments with Agent Skills
Authors: Zihao Cheng, Hongru Wang, Zeming Liu, Xinyi Wang, Xiangrong Zhu, Yuhang Guo, Wei Lin, Jeff Z. Pan, Yunhong Wang
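Abstract summary: We introduce Terminal-World, a fully automated pipeline that uses agent skills as the central synthesis primitive. We construct 5,723 training environments and evaluate Terminal-World-8B/14B/32B on 6 benchmarks. Terminal-World-32B surpasses Nemotron-Terminal-32B on Terminal-Bench 2.0 by +4.5 Pass@1 (31.5) and reaches 43.8 Pass@3.
Reference score (independently computed attention score): 52.39713754337834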
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Terminal agents extend Large Language Models with the ability to execute tasks directly in command-line environments, but their progress is bottlenecked by the scarcity of high-quality training data. Existing approaches bootstrap from partial sources such as human-defined seeds or GitHub repositories to instantiate one component and then complete the rest, producing tasks confined to narrow seed distributions, environments misaligned with task semantics, and inefficient trajectories from unguided exploration. To address these limitations, we introduce Terminal-World, a fully automated pipeline that uses agent skills as the central synthesis primitive, which jointly encode what to accomplish, when to apply (preconditions and environment state), and how to execute, enabling task instructions, environments, and teacher trajectories to be co-derived. To further broaden the synthesis space, Terminal-World composes skills into skill teams and skill graphs for multi-role and cross-domain task synthesis. Using this pipeline, we construct 5,723 training environments and train Terminal-World-8B/14B/32B, evaluated across 6 benchmarks where the Terminal-World series consistently outperforms terminal-agent baselines. Notably, using the same teacher model and only 1.2% of the training data, Terminal-World-32B surpasses Nemotron-Terminal-32B on Terminal-Bench 2.0 by +4.5 Pass@1 (31.5) and achieves 43.8 Pass@3.
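The abstract describes a skill as jointly encoding what to accomplish, when to apply, and how to execute, so that the task instruction, environment, and teacher trajectory can be derived from the same source. The following is a minimal sketch of that idea only. The class names `Skill` and `SynthesizedTask`, the function `derive_task`, and the example skills are all hypothetical illustrations and are not taken from the paper, whose actual pipeline is not specified here.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    """Hypothetical skill record: the what / when / how triple from the abstract."""
    what: str        # goal the skill accomplishes
    when: list[str]  # preconditions on the environment state
    how: list[str]   # shell commands that carry out the skill


@dataclass
class SynthesizedTask:
    """Hypothetical container for the three co-derived artifacts."""
    instruction: str       # task instruction shown to the agent
    setup: list[str]       # environment requirements to provision
    trajectory: list[str]  # reference (teacher) command sequence


def derive_task(skills: list[Skill]) -> SynthesizedTask:
    """Derive instruction, environment, and trajectory from the same skills."""
    instruction = "; then ".join(s.what for s in skills)
    # dict.fromkeys drops repeated preconditions while preserving order.
    setup = list(dict.fromkeys(p for s in skills for p in s.when))
    trajectory = [cmd for s in skills for cmd in s.how]
    return SynthesizedTask(instruction, setup, trajectory)


# Two made-up skills that share one precondition.
archive = Skill(
    what="archive the logs directory",
    when=["working directory is writable", "directory ./logs exists"],
    how=["tar -czf logs.tar.gz logs"],
)
checksum = Skill(
    what="record a SHA-256 checksum of the archive",
    when=["working directory is writable", "sha256sum is installed"],
    how=["sha256sum logs.tar.gz > logs.tar.gz.sha256"],
)

task = derive_task([archive, checksum])
print(task.instruction)
print(task.setup)
print(task.trajectory)
```

Because all three outputs are read from the same skill records, the environment requirements cannot drift away from what the instruction asks for, which is one plausible reading of the alignment benefit the abstract claims.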
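Abstract (reference translation): Terminal agents extend large language models with the ability to execute tasks directly in command-line environments, but their progress is bottlenecked by the scarcity of high-quality training data. Existing approaches bootstrap from partial sources such as human-defined seeds or GitHub repositories to instantiate one component and then complete the rest. To address these limitations, we introduce Terminal-World, a fully automated pipeline that uses agent skills as the central synthesis primitive. To further broaden the synthesis space, Terminal-World composes skills into skill teams and skill graphs for multi-role and cross-domain task synthesis. Using this pipeline, we construct 5,723 training environments and train Terminal-World-8B/14B/32B, evaluated across 6 benchmarks where the Terminal-World series consistently outperforms terminal-agent baselines. Notably, using the same teacher model and only 1.2% of the training data, Terminal-World-32B surpasses Nemotron-Terminal-32B on Terminal-Bench 2.0 by +4.5 Pass@1 (31.5) and achieves 43.8 Pass@3.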
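The results above are reported as Pass@1 and Pass@3. The abstract does not say how these are computed, so the sketch below assumes the widely used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), averaged over tasks. The function names and the per-task attempt counts are illustrative assumptions and are not data from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one task: n attempts, c of them successful."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Probability that a random size-k subset of the n attempts has no success.
    # math.comb returns 0 when k > n - c, so the result is then exactly 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Mean pass@k over tasks, as a percentage. Each item is (n, c)."""
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Made-up outcomes for four tasks with three attempts each: (attempts, successes).
toy_results = [(3, 1), (3, 0), (3, 3), (3, 2)]

print(benchmark_pass_at_k(toy_results, k=1))
print(benchmark_pass_at_k(toy_results, k=3))
```

With three attempts per task, Pass@3 under this estimator reduces to the fraction of tasks solved at least once, which is why it is never lower than Pass@1. If the 31.5 in the abstract is read as Terminal-World-32B's Pass@1, the stated +4.5 margin would put the Nemotron-Terminal-32B baseline at 27.0.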

Related papers