Fugu-MT Paper Translation (Abstract): MobileExplorer: Accelerating On-Device Inference for Mobile GUI Agents via Online Exploration

Paper summary: MobileExplorer: Accelerating On-Device Inference for Mobile GUI Agents via Online Exploration

arxiv url: http://arxiv.org/abs/2605.26546v1
Date: Tue, 26 May 2026 04:53:53 GMT
Status: Translation complete
Last updated in system: 2026-05-27 17:51:41.675519
Title: MobileExplorer: Accelerating On-Device Inference for Mobile GUI Agents via Online Exploration
Title (reference translation): MobileExplorer: Accelerating On-Device Inference for Mobile GUI Agents via Online Exploration
Authors: Runxi Huang, Liyu Zhang, Shengzhong Liu, Xiaomin Ouyang
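Abstract summary: MobileExplorer accelerates on-device inference for vision-based mobile GUI agents through online exploration. It uses a two-level rollback mechanism that restores the initial UI state when a fast but naive backtracking strategy fails. MobileExplorer reduces the average number of reasoning steps and end-to-end latency by 23%, while maintaining or improving task success rates by up to 5%.
Reference score (independently computed attention score): 3.5101477906303633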
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Mobile graphical user interface (GUI) agents enable AI models to autonomously operate smartphones on behalf of users. However, most existing systems focus primarily on optimizing task accuracy and rely on cloud-hosted models for inference, which introduces privacy concerns and network-dependent latency. As a result, fully on-device deployment of mobile GUI agents remains underexplored. We propose MobileExplorer, a new framework that accelerates on-device inference for vision-based mobile GUI agents via online exploration. The key idea is to exploit the long per-step reasoning time of vision-language models (VLMs) by performing lightweight, parallel exploration of UI elements. During model inference, the agent proactively probes semantically relevant UI elements and records these exploration traces as structured memory. To ensure reliable execution in live mobile environments, we design a two-level rollback mechanism that robustly restores the initial UI state when a fast but naive backtracking strategy fails. The collected exploration traces are then summarized into concise contextual hints and injected into the prompt to enhance the subsequent reasoning step. We evaluate MobileExplorer on multiple off-the-shelf devices using the AndroidWorld benchmark, as well as newly designed, more complex tasks and dynamic on-device environments. MobileExplorer reduces the average number of reasoning steps and end-to-end latency by 23%, while maintaining or improving task success rates by up to 5%. A video demonstration of MobileExplorer performance in the real world is available at https://youtu.be/thK7MJmdlvM.
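Abstract (reference translation): Mobile graphical user interface (GUI) agents allow AI models to operate smartphones autonomously on behalf of users. However, most existing systems focus mainly on optimizing task accuracy and rely on cloud-hosted models, which raises privacy concerns and introduces network-dependent latency. As a result, fully on-device deployment of mobile GUI agents remains underexplored. We propose MobileExplorer, a new framework that accelerates on-device inference for mobile GUI agents through online exploration. The key idea is to make use of the long per-step inference time of vision-language models (VLMs) by running lightweight, parallel exploration of UI elements. During model inference, the agent proactively probes semantically relevant UI elements and records the exploration traces as structured memory. To ensure reliability in live mobile environments, we design a two-level rollback mechanism that robustly restores the initial UI state when a fast but naive backtracking strategy fails. The collected exploration traces are then condensed into concise contextual hints and injected into the prompt to strengthen the subsequent reasoning step. We evaluate MobileExplorer on multiple off-the-shelf devices using the AndroidWorld benchmark, as well as on newly designed, more complex tasks and dynamic on-device environments. MobileExplorer reduces the average number of reasoning steps and end-to-end latency by 23%, while maintaining or improving task success rates by up to 5%. A video demonstration of MobileExplorer's real-world performance is available at https://youtu.be/thK7MJmdlvM.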
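
The mechanism outlined in the abstract (probe UI elements while the VLM is still reasoning, roll back after each probe, then feed the results into the next prompt) can be illustrated with a toy sketch. This is not the authors' implementation: `ToyUI`, `explore`, `reset_and_replay`, `summarize`, and `slow_vlm_step` are hypothetical names, the "device" is an in-memory screen graph, and the second rollback level (restart from home and replay the recorded taps) is an assumption, since the abstract does not say how the robust restore works.

```python
from concurrent.futures import ThreadPoolExecutor
import time


class ToyUI:
    """Hypothetical stand-in for a live device: screens linked by tappable elements."""

    def __init__(self, graph, no_back):
        self.graph = graph        # screen -> {element: next_screen}
        self.no_back = no_back    # screens on which "back" has no effect
        self.stack = ["home"]     # navigation stack; the top is the current screen

    @property
    def screen(self):
        return self.stack[-1]

    def tap(self, element):
        self.stack.append(self.graph[self.screen][element])

    def back(self):
        if self.screen not in self.no_back and len(self.stack) > 1:
            self.stack.pop()

    def reset_and_replay(self, path):
        """Assumed level-2 restore: restart from home and replay the taps."""
        self.stack = ["home"]
        for element in path:
            self.tap(element)


def explore(ui, path_to_here, candidates):
    """Probe each candidate element, record where it leads, then roll back."""
    start = ui.screen
    traces = []
    for element in candidates:
        ui.tap(element)
        result = ui.screen
        ui.back()                              # level 1: fast, naive backtracking
        level = 1
        if ui.screen != start:                 # verify; fall back if it failed
            ui.reset_and_replay(path_to_here)  # level 2: slower, robust restore
            level = 2
        traces.append({"element": element, "leads_to": result, "rollback": level})
    return traces


def summarize(traces):
    """Condense exploration traces into a short hint string for the prompt."""
    return "; ".join(f"tapping '{t['element']}' opens '{t['leads_to']}'" for t in traces)


def slow_vlm_step(prompt):
    time.sleep(0.05)  # placeholder for multi-second on-device VLM inference
    return f"action chosen from a prompt of {len(prompt)} characters"


graph = {
    "home": {"Settings": "settings"},
    "settings": {"Wi-Fi": "wifi", "About": "about_dialog"},
    "wifi": {},
    "about_dialog": {},
}
ui = ToyUI(graph, no_back={"about_dialog"})
ui.tap("Settings")
path = ["Settings"]  # taps that lead from home to the current screen

with ThreadPoolExecutor(max_workers=1) as pool:
    pending = pool.submit(slow_vlm_step, "Task: turn on Wi-Fi")  # inference runs
    traces = explore(ui, path, ["Wi-Fi", "About"])               # explore meanwhile
    action = pending.result()

hints = summarize(traces)
next_prompt = f"Task: turn on Wi-Fi\nHints: {hints}"  # hints feed the next step
print(next_prompt)
```

In this toy run, the probe of "Wi-Fi" is undone by a plain back press, while the probe of "About" lands on a screen that ignores back, so the sketch falls through to the replay-based restore. The point of the two levels is that the cheap path handles the common case and the expensive path is paid only when verification of the UI state fails.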


Related papers