Fugu-MT Paper Translation (Abstract): STAMP: Training Explicit Memory for Mobile GUI Agents in Controllable and Scalable Virtual Environments

Paper overview: STAMP: Training Explicit Memory for Mobile GUI Agents in Controllable and Scalable Virtual Environments

arxiv url: http://arxiv.org/abs/2605.29324v1
Date: Thu, 28 May 2026 04:00:13 GMT
Status: Translation complete
Last updated in system: 2026-05-30 02:45:55.65243
Title: STAMP: Training Explicit Memory for Mobile GUI Agents in Controllable and Scalable Virtual Environments
Authors: Junyang Wang, Haiyang Xu, Xi Zhang, Zhaoqing Zhu, Ming Yan, Jieping Ye, Jitao Sang
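Abstract summary: Mobile agents excel at immediate reactive control but frequently fail in realistic long-horizon tasks that require memory. The paper presents STAMP, a framework that trains explicit memory in mobile agents through controllable virtual environments. The resulting Stamp-GUI agent sets a new high watermark on the Memory-World benchmark, demonstrating exceptional memory accuracy and task resilience.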
Reference score (independently computed attention score): 63.39393178045112
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Mobile GUI agents excel at immediate reactive control but frequently fail in realistic, long-horizon tasks that require memory. This failure stems from a fundamental conflict between limited context windows and token-heavy screenshots. To save the limited context, agents must progressively discard older visual history, permanently losing crucial transient information. Furthermore, existing action-centric datasets fail to teach agents what or when to explicitly memorize, and augmenting static real-world data is prohibitively expensive and lacks interactive verification. To resolve this, we present STAMP, a framework that trains explicit memory in mobile agents through controllable virtual environments, where deterministic memory variables are programmatically injected into synthesized tasks to control what must be memorized, when it should be encoded, and when it must later be retrieved, thereby producing verifiable supervised data at scale and enabling online reinforcement learning through environment-driven reward feedback. Evaluated on our newly introduced Memory-World benchmark, the resulting Stamp-GUI agent achieves state-of-the-art performance among GUI-specialized models and sets a new high watermark on our Memory-World benchmark, demonstrating exceptional memory accuracy and task resilience while maintaining strong general mobile navigation capabilities.
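The abstract describes injecting deterministic memory variables into synthesized tasks so that what is memorized, when it is encoded, and when it is retrieved can all be verified by the environment. The toy sketch below illustrates that idea only; it is not the authors' implementation. Every name in it (`MemoryTask`, `synthesize_task`, `observe`, `reward`, `run_episode`) is hypothetical, and text strings stand in for screenshots.

```python
import random
from dataclasses import dataclass


@dataclass
class MemoryTask:
    """A synthesized task with one injected memory variable (hypothetical schema)."""
    secret: str         # what must be memorized
    encode_step: int    # when the value is visible on screen
    retrieve_step: int  # when the value must be entered


def synthesize_task(seed: int, horizon: int = 6) -> MemoryTask:
    """Build a task deterministically from a seed (assumed generation scheme)."""
    rng = random.Random(seed)
    secret = f"{rng.randrange(10_000):04d}"
    encode_step = rng.randrange(0, horizon - 1)  # always before the final step
    return MemoryTask(secret, encode_step, horizon - 1)


def observe(task: MemoryTask, step: int) -> str:
    """Text stand-in for a screenshot; the secret is visible at one step only."""
    if step == task.encode_step:
        return f"code: {task.secret}"
    if step == task.retrieve_step:
        return "enter code"
    return "filler screen"


def reward(task: MemoryTask, answer: str) -> float:
    """Environment-driven reward: 1.0 only if the injected value was recalled."""
    return 1.0 if answer == task.secret else 0.0


def run_episode(task: MemoryTask, use_memory: bool) -> float:
    """Scripted agent that sees only the current screen plus its explicit memory."""
    prefix = "code: "
    memory = []
    answer = ""
    for step in range(task.retrieve_step + 1):
        screen = observe(task, step)  # older screens are not kept in context
        if use_memory and screen.startswith(prefix):
            memory.append(screen[len(prefix):])  # explicit memory write
        if screen == "enter code":
            answer = memory[-1] if memory else ""  # explicit memory read
    return reward(task, answer)


if __name__ == "__main__":
    task = synthesize_task(seed=0)
    print(run_episode(task, use_memory=True), run_episode(task, use_memory=False))
```

Because the injected value is known to the environment, the same check can label supervised trajectories or serve as a reward signal for online training, which is the property the abstract attributes to controllable virtual environments.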

Related papers