Fugu-MT Paper Translation (Summary): Beyond the GUI Paradigm: Do Mobile Agents Need the Phone Screen?

Paper summary: Beyond the GUI Paradigm: Do Mobile Agents Need the Phone Screen?

arxiv url: http://arxiv.org/abs/2606.19388v1
Date: Tue, 16 Jun 2026 02:36:22 GMT
Status: Translation completed
Last updated in system: 2026-06-19 18:23:39.436367
Title: Beyond the GUI Paradigm: Do Mobile Agents Need the Phone Screen?
Title (reference translation): Beyond the GUI Paradigm: Do Mobile Agents Need a Phone Screen?
Authors: Li Gu, Zihuan Jiang, Linqiang Guo, Zhixiang Chi, Ziqiang Wang, Huan Liu, Yuanhao Yu, Tse-Hsun Chen, Yang Wang
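Abstract summary: Mobile platforms expose a command-line interface (CLI) that provides direct access to device services and data. We evaluate three coding agents across four model APIs on AndroidWorld and MobileWorld, without any mobile-specific post-training. We provide oracle CLI solutions that reach 88.8% on AndroidWorld (103/116 tasks CLI-solvable) and 86.3% on MobileWorld (101/117 tasks CLI-solvable). To support future research on mobile CLI agents, we will open-source the agent implementations, oracle solutions, CLI-Advantage suite, and evaluation infrastructure.
Reference score (site-computed attention score): 23.855513024800526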
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in mobile agents are dominated by the GUI paradigm, in which agents perceive UI information and emit screen interactions. However, mobile platforms also expose a command-line interface (CLI) that provides direct access to device services and data. We argue CLI deserves first-class consideration alongside GUI. We evaluate three coding agents (Claude Code, Terminus-2, mini-swe-agent) across four model APIs on AndroidWorld and MobileWorld without any mobile-specific post-training, comparing against three reproducible GUI baselines (GUI-Owl-1.5-32B, MAI-UI, Qwen3-VL-32B). Claude Code (Opus 4.7) reaches 71.8% and 51.9%, outperforming every reproducible GUI baseline (69.3/68.1/57.8% on AndroidWorld; 43.2/26.3/13.3% on MobileWorld), while every other CLI configuration remains competitive. To establish the paradigm's ceiling, we provide oracle CLI solutions that reach 88.8% on AndroidWorld (103/116 tasks CLI-solvable) and 86.3% on MobileWorld (101/117 tasks CLI-solvable), indicating substantial room for future improvement. To cover everyday user intents beyond the GUI scope, we introduce the CLI-Advantage Task Suite, comprising 45 templates across five categories: bulk operations, multi-condition filtering, aggregation, cross-app workflows, and hidden device state. Every CLI agent outperforms every GUI baseline in all five categories, with substantially fewer steps per task (10.7 vs. 18.6). To support future research on mobile CLI agents, we will open-source agent implementations, oracle solutions, the CLI-Advantage suite, and evaluation infrastructure.
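The oracle "ceiling" figures above are just the fraction of tasks that have a CLI oracle solution, and the step comparison is a ratio of two averages. The sketch below re-derives those numbers from the counts quoted in the abstract; the helper name `success_rate_pct` and the "percent fewer steps" framing are our own illustration, not the authors' evaluation code.

```python
# Counts taken from the abstract: (CLI-solvable tasks, total tasks) per benchmark.
ORACLE_COUNTS = {
    "AndroidWorld": (103, 116),
    "MobileWorld": (101, 117),
}


def success_rate_pct(solved: int, total: int) -> float:
    """Hypothetical helper: success rate as a percentage, rounded to one decimal."""
    return round(100.0 * solved / total, 1)


for benchmark, (solved, total) in ORACLE_COUNTS.items():
    print(f"{benchmark}: {solved}/{total} = {success_rate_pct(solved, total)}%")

# Average steps per task on the CLI-Advantage suite (CLI vs. GUI), from the abstract.
cli_steps, gui_steps = 10.7, 18.6
# Relative reduction in average steps; our derived figure, not one the paper reports.
reduction = round(100.0 * (1 - cli_steps / gui_steps), 1)
print(f"CLI agents use about {reduction}% fewer steps per task on average")
```

Running it reproduces the reported 88.8% and 86.3% ceilings and shows that 10.7 vs. 18.6 steps corresponds to roughly a 42.5% reduction in average steps.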
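Abstract (reference translation): Recent progress in mobile agents has been dominated by the GUI paradigm, in which agents perceive UI information and issue screen interactions. However, mobile platforms also expose a command-line interface (CLI) that gives direct access to device services and data. We argue that the CLI should be given first-class consideration alongside the GUI. We evaluate three coding agents (Claude Code, Terminus-2, mini-swe-agent) across four model APIs on AndroidWorld and MobileWorld and compare them against reproducible GUI baselines (GUI-Owl-1.5-32B, MAI-UI, Qwen3-VL-32B). Claude Code (Opus 4.7) reaches 71.8% and 51.9%, outperforming every reproducible GUI baseline (69.3/68.1/57.8% on AndroidWorld; 43.2/26.3/13.3% on MobileWorld). To establish the paradigm's ceiling, we provide oracle CLI solutions that reach 88.8% on AndroidWorld (103/116 tasks CLI-solvable) and 86.3% on MobileWorld (101/117 tasks CLI-solvable). To cover everyday user intents beyond the scope of the GUI, we introduce the CLI-Advantage Task Suite, which comprises 45 templates across five categories: bulk operations, multi-condition filtering, aggregation, cross-app workflows, and hidden device state. Every CLI agent outperforms every GUI baseline in all five categories while taking substantially fewer steps per task (10.7 vs. 18.6). To support future research on mobile CLI agents, we will open-source the agent implementations, oracle solutions, the CLI-Advantage suite, and the evaluation infrastructure.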

Related papers