Fugu-MT paper translation (abstract): AppCopilot: Toward General, Accurate, Long-Horizon, and Efficient Mobile Agent

Paper overview: AppCopilot: Toward General, Accurate, Long-Horizon, and Efficient Mobile Agent

arxiv url: http://arxiv.org/abs/2509.02444v2
Date: Fri, 17 Oct 2025 00:57:58 GMT
Status: Translation complete
Last updated in system: 2025-10-20 15:58:54.414154
Title: AppCopilot: Toward General, Accurate, Long-Horizon, and Efficient Mobile Agent
Authors: Jingru Fan, Yufan Dang, Jingyao Wu, Huatao Li, Runde Yang, Xiyuan Yang, Yuheng Wang, Chen Qian,
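Abstract summary: This paper identifies four core problems that mobile agents must solve to deliver practical, scalable impact. It proposes AppCopilot, a multimodal, multi-agent, general-purpose mobile agent that operates across applications.
Reference score (independently computed attention score): 12.27790226999309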
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the rapid evolution of large language models and multimodal models, the mobile-agent landscape has proliferated without converging on the fundamental challenges. This paper identifies four core problems that should be solved for mobile agents to deliver practical, scalable impact: (1) generalization across tasks, APPs, and devices; (2) accuracy, specifically precise on-screen interaction and click targeting; (3) long-horizon capability for sustained, multi-step goals; and (4) efficiency, specifically high-performance runtime on resource-constrained devices. We present AppCopilot, a multimodal, multi-agent, general-purpose mobile agent that operates across applications. AppCopilot operationalizes this position through an end-to-end pipeline spanning data collection, training, finetuning, efficient inference, and PC/mobile application. At the model layer, it integrates multimodal foundation models with robust Chinese-English support. At the reasoning and control layer, it combines chain-of-thought reasoning, hierarchical task planning and decomposition, and multi-agent collaboration. At the execution layer, it enables experiential adaptation, voice interaction, function calling, cross-APP and cross-device orchestration, and comprehensive mobile APP support. The system design incorporates profiling-driven optimization for latency and memory across heterogeneous hardware. Empirically, AppCopilot achieves significant improvements on four dimensions: stronger generalization, higher precision of on-screen actions, more reliable long-horizon task completion, and a faster, more resource-efficient runtime. By articulating a cohesive position and a reference architecture that closes the loop from data collection and training to finetuning and efficient inference, this paper offers a concrete roadmap for general-purpose mobile agents and provides actionable guidance.
