Fugu-MT Paper Translation (Abstract): HATS: Hardness-Aware Trajectory Synthesis for GUI Agents

Paper overview: HATS: Hardness-Aware Trajectory Synthesis for GUI Agents

arxiv url: http://arxiv.org/abs/2603.12138v1
Date: Thu, 12 Mar 2026 16:40:59 GMT
Status: Translation complete
Last updated in system: 2026-03-13 14:46:26.222226
Title: HATS: Hardness-Aware Trajectory Synthesis for GUI Agents
Title (reference translation): HATS: Hardness-Aware Trajectory Synthesis for GUI Agents
Authors: Rui Shao, Ruize Gao, Bin Xie, Yixing Li, Kaiwen Zhou, Shuai Wang, Weili Guan, Gongwei Chen,
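Abstract summary: This paper proposes HATS, a hardness-aware trajectory synthesis framework. Hardness is defined as the degree of semantic ambiguity associated with an action. Agents trained with HATS are shown to consistently outperform state-of-the-art baselines in benchmark GUI environments.
Reference score (independently computed attention score): 46.54830370011904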
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Graphical user interface (GUI) agents powered by large vision-language models (VLMs) have shown remarkable potential in automating digital tasks, highlighting the need for high-quality trajectory data to support effective agent training. Yet existing trajectory synthesis pipelines often yield agents that fail to generalize beyond simple interactions. We identify this limitation as stemming from the neglect of semantically ambiguous actions, whose meanings are context-dependent, sequentially dependent, or visually ambiguous. Such actions are crucial for real-world robustness but are under-represented and poorly processed in current datasets, leading to semantic misalignment between task instructions and execution. To address these issues, we propose HATS, a Hardness-Aware Trajectory Synthesis framework designed to mitigate the impact of semantic ambiguity. We define hardness as the degree of semantic ambiguity associated with an action and develop two complementary modules: (1) hardness-driven exploration, which guides data collection toward ambiguous yet informative interactions, and (2) alignment-guided refinement, which iteratively validates and repairs instruction-execution alignment. The two modules operate in a closed loop: exploration supplies refinement with challenging trajectories, while refinement feedback updates the hardness signal to guide future exploration. Extensive experiments show that agents trained with HATS consistently outperform state-of-the-art baselines across benchmark GUI environments.
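Abstract (reference translation): Graphical user interface (GUI) agents built on large vision-language models (VLMs) show remarkable potential for automating digital tasks, which underscores the need for high-quality trajectory data to support effective agent training. However, existing trajectory synthesis pipelines often produce agents that fail to generalize beyond simple interactions. We attribute this limitation to the neglect of semantically ambiguous actions, whose meanings are context-dependent, sequentially dependent, or visually ambiguous. Such actions are essential for real-world robustness, but they are under-represented and poorly processed in current datasets, which leads to semantic misalignment between task instructions and execution. To address these issues, we propose HATS, a Hardness-Aware Trajectory Synthesis framework that mitigates the impact of semantic ambiguity. We define hardness as the degree of semantic ambiguity associated with an action and develop two complementary modules: (1) hardness-driven exploration, which steers data collection toward ambiguous yet informative interactions, and (2) alignment-guided refinement, which iteratively validates and repairs instruction-execution alignment. The two modules operate in a closed loop: exploration supplies refinement with challenging trajectories, and refinement feedback updates the hardness signal to guide future exploration. Extensive experiments show that agents trained with HATS consistently outperform state-of-the-art baselines across benchmark GUI environments.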
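
The closed loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no algorithmic details, so every name here (`HardnessTable`, `refine`, `synthesize`, the toy `draft`, `is_aligned`, and `repair` functions) is hypothetical, and the hardness-weighted sampling and moving-average update are assumptions chosen only to make the loop concrete.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class HardnessTable:
    """Hypothetical per-action hardness scores in [0, 1] (an assumption, not from the paper)."""
    scores: dict[str, float]
    lr: float = 0.5  # assumed step size for the moving-average update

    def sample(self, rng: random.Random) -> str:
        # Hardness-driven exploration: harder actions are sampled more often.
        actions = list(self.scores)
        weights = [self.scores[a] + 1e-3 for a in actions]  # keep easy actions reachable
        return rng.choices(actions, weights=weights, k=1)[0]

    def update(self, action: str, misaligned: bool) -> None:
        # Refinement feedback moves the score toward 1 (hard) or 0 (easy).
        target = 1.0 if misaligned else 0.0
        self.scores[action] += self.lr * (target - self.scores[action])


def refine(instruction: str, action: str,
           is_aligned: Callable[[str, str], bool],
           repair: Callable[[str, str], str],
           max_rounds: int = 3) -> tuple[str, int, bool]:
    """Alignment-guided refinement: validate, repair, and re-validate."""
    repairs = 0
    while not is_aligned(instruction, action) and repairs < max_rounds:
        instruction = repair(instruction, action)
        repairs += 1
    return instruction, repairs, is_aligned(instruction, action)


def synthesize(table: HardnessTable, draft, is_aligned, repair,
               steps: int, seed: int = 0) -> list[tuple[str, str]]:
    """Closed loop: exploration feeds refinement, refinement updates hardness."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(steps):
        action = table.sample(rng)
        instruction, repairs, ok = refine(draft(action), action, is_aligned, repair)
        table.update(action, misaligned=repairs > 0)
        if ok:
            dataset.append((instruction, action))
    return dataset


# Toy stand-ins for the annotator, the alignment check, and the repair step.
def draft(action: str) -> str:
    return "tap it" if action == "click_icon" else f"perform {action}"

def is_aligned(instruction: str, action: str) -> bool:
    return action in instruction

def repair(instruction: str, action: str) -> str:
    return f"{instruction} ({action})"


table = HardnessTable({"click_submit": 0.5, "click_icon": 0.5})
dataset = synthesize(table, draft, is_aligned, repair, steps=20)
```

In this toy setup the vaguely described action (`click_icon`) needs a repair each time it is sampled, so its hardness score can only rise, while the unambiguous action's score can only fall. That is one simple way the refinement signal could steer later exploration; the actual hardness definition and update rule used by HATS are in the paper itself.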

Related papers