Fugu-MT Paper Translation (Abstract): ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents

Paper overview: ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents

arXiv URL: http://arxiv.org/abs/2604.11784v1
Date: Mon, 13 Apr 2026 17:52:04 GMT
Status: Translation complete
Last updated in system: 2026-04-14 20:13:16.733362
Title: ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents
Authors: Fei Tang, Zhiqiong Lu, Boxuan Zhang, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen
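Abstract summary: ClawGUI-RL provides the first open-source GUI agent RL infrastructure that supports both parallel virtual environments and real physical devices. ClawGUI-Eval runs a fully standardized evaluation pipeline across 6 benchmarks and 11+ models. ClawGUI-Agent brings trained agents to Android, HarmonyOS, and iOS through 12+ chat platforms, with hybrid CLI-GUI control and persistent personalized memory.
Reference score (independently computed attention score): 54.04035382782041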
License: http://creativecommons.org/licenses/by/4.0/
Abstract: GUI agents drive applications through their visual interfaces instead of programmatic APIs, interacting with arbitrary software via taps, swipes, and keystrokes, reaching a long tail of applications that CLI-based agents cannot. Yet progress in this area is bottlenecked less by modeling capacity than by the absence of a coherent full-stack infrastructure: online RL training suffers from environment instability and closed pipelines, evaluation protocols drift silently across works, and trained agents rarely reach real users on real devices. We present ClawGUI, an open-source framework addressing these three gaps within a single harness. ClawGUI-RL provides the first open-source GUI agent RL infrastructure with validated support for both parallel virtual environments and real physical devices, integrating GiGPO with a Process Reward Model for dense step-level supervision. ClawGUI-Eval enforces a fully standardized evaluation pipeline across 6 benchmarks and 11+ models, achieving 95.8% reproduction against official baselines. ClawGUI-Agent brings trained agents to Android, HarmonyOS, and iOS through 12+ chat platforms with hybrid CLI-GUI control and persistent personalized memory. Trained end to end within this pipeline, ClawGUI-2B achieves 17.1% Success Rate on MobileWorld GUI-Only, outperforming the same-scale MAI-UI-2B baseline by 6.0%.
