Fugu-MT Paper Translation (Summary): OSExpert: Computer-Use Agents Learning Professional Skills via Exploration

Paper summary: OSExpert: Computer-Use Agents Learning Professional Skills via Exploration

arxiv url: http://arxiv.org/abs/2603.07978v1
Date: Mon, 09 Mar 2026 05:27:56 GMT
Status: Translation complete
Last updated in system: 2026-03-10 15:13:15.509433
Title: OSExpert: Computer-Use Agents Learning Professional Skills via Exploration
Title (reference translation): OSExpert: Computer-use agents learn professional skills through exploration
Authors: Jiateng Liu, Zhenhailong Wang, Rushi Wang, Bingxuan Li, Jeonghwan Kim, Aditi Tiwari, Pengfei Yu, Denghui Zhang, Heng Ji
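Abstract summary: General-purpose computer-use agents are less helpful than human experts. This work proposes a depth-first search algorithm for exploring and verifying an environment's unit functions. The agent exploits compositionality between unit skills to self-construct a curriculum of composite tasks.
Reference score (independently computed attention score): 55.660669638732024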
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: General-purpose computer-use agents have shown impressive performance across diverse digital environments. However, our new benchmark, OSExpert-Eval, indicates they remain far less helpful than human experts. Although inference-time scaling enables adaptation, these agents complete complex tasks inefficiently with degraded performance, transfer poorly to unseen UIs, and struggle with fine-grained action sequences. To solve the problem, we introduce a GUI-based depth-first search (GUI-DFS) exploration algorithm to comprehensively explore and verify an environment's unit functions. The agent then exploits compositionality between unit skills to self-construct a curriculum for composite tasks. To support fine-grained actions, we curate a database of action primitives for agents to discover during exploration; these are saved as a skill set once the exploration is complete. We use the learned skills to improve the agent's performance and efficiency by (1) enriching agents with ready-to-use procedural knowledge, allowing them to plan only once for long trajectories and generate accurate actions, and (2) enabling them to end inference-time scaling earlier by realizing their boundary of capabilities. Extensive experiments show that our environment-learned agent takes a meaningful step toward expert-level computer use, achieving a performance gain of around 20 percent on OSExpert-Eval and closing the efficiency gap to humans by around 80 percent.
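Abstract (reference translation): General-purpose computer-use agents show impressive performance across a variety of digital environments. However, our new benchmark, OSExpert-Eval, shows that they are far less helpful than human experts. Inference-time scaling enables adaptation, but these agents complete complex tasks inefficiently and with degraded performance, transfer poorly to unseen UIs, and struggle with fine-grained action sequences. To address this, we introduce the GUI-DFS exploration algorithm, which comprehensively explores and verifies an environment's unit functions. The agent exploits compositionality between unit skills to self-construct a curriculum of composite tasks. To support fine-grained actions, we curate a database of action primitives for agents to discover during exploration. The learned skills improve agent performance and efficiency by (1) equipping agents with ready-to-use procedural knowledge, so that they plan only once for long trajectories and generate accurate actions, and (2) letting them end inference-time scaling early by recognizing the boundary of their capabilities. Extensive experiments show that our environment-learned agent takes a meaningful step toward expert-level computer use, achieving a performance gain of about 20% on OSExpert-Eval and narrowing the efficiency gap to humans by about 80%.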
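
Illustrative sketch (not from the paper): the abstract describes GUI-DFS only at a high level, so the following Python snippet is a minimal toy rendering of the general idea, namely a depth-first walk over GUI states that records a click path for each verified unit function. The toy GUI dictionary, the `explore_dfs` function, and the `verify` callback are all hypothetical names introduced here for illustration; the authors' actual algorithm, state representation, and verification step may differ substantially.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy GUI: each screen maps a clickable element to the screen it opens.
# Screens with no outgoing elements stand in for "unit functions".
ToyGui = Dict[str, Dict[str, str]]


def explore_dfs(gui: ToyGui, start: str,
                verify: Callable[[str], bool]) -> Dict[str, List[str]]:
    """Depth-first walk over the toy GUI.

    Returns the verified leaf screens, each mapped to the click path
    that reached it first.
    """
    skills: Dict[str, List[str]] = {}
    visited = set()
    stack: List[Tuple[str, List[str]]] = [(start, [])]
    while stack:
        screen, path = stack.pop()
        if screen in visited:
            continue  # avoid looping through back-links such as "Home"
        visited.add(screen)
        children = gui.get(screen, {})
        if not children and verify(screen):
            skills[screen] = path  # keep only leaves that pass verification
        # Push in reverse so elements are explored in their listed order.
        for element, next_screen in reversed(list(children.items())):
            stack.append((next_screen, path + [element]))
    return skills


TOY_GUI: ToyGui = {
    "home": {"File": "file_menu", "Edit": "edit_menu"},
    "file_menu": {"Save": "save_done", "Export": "export_dialog"},
    "export_dialog": {"PDF": "pdf_exported"},
    "edit_menu": {"Undo": "undo_done", "Home": "home"},
}

# The verifier here is a stand-in that rejects one leaf to show filtering.
skills = explore_dfs(TOY_GUI, "home", verify=lambda s: s != "undo_done")
print(skills)
```

In this toy setting the recorded click paths play the role of the saved skill set: a planner could replay `skills["pdf_exported"]` directly instead of rediscovering the menu sequence step by step.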

Related papers