Fugu-MT Paper Translation (Abstract): AppCopilot: Toward General, Accurate, Long-Horizon, and Efficient Mobile Agent

Paper overview: AppCopilot: Toward General, Accurate, Long-Horizon, and Efficient Mobile Agent

arxiv url: http://arxiv.org/abs/2509.02444v1
Date: Tue, 02 Sep 2025 15:48:21 GMT
Status: Translation complete
Last updated in system: 2025-09-04 15:17:04.089237
Title: AppCopilot: Toward General, Accurate, Long-Horizon, and Efficient Mobile Agent
Authors: Jingru Fan, Yufan Dang, Jingyao Wu, Huatao Li, Runde Yang, Xiyuan Yang, Yuheng Wang, Zhong Zhang, Yaxi Lu, Yankai Lin, Zhiyuan Liu, Dahai Li, Chen Qian
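Abstract summary: This paper identifies four core problems that must be solved for mobile agents to deliver practical, scalable impact. It presents AppCopilot, a multimodal, multi-agent, general-purpose on-device assistant. AppCopilot operates across applications and constitutes a full-stack, closed-loop system from data to deployment.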
Reference score (independently computed attention score): 49.61420186190895
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the rapid evolution of large language models and multimodal foundation models, the mobile-agent landscape has proliferated without converging on the fundamental challenges. This paper identifies four core problems that must be solved for mobile agents to deliver practical, scalable impact: (1) generalization across tasks, modalities, apps, and devices; (2) accuracy, specifically precise on-screen interaction and click targeting; (3) long-horizon capability for sustained, multi-step goals; and (4) efficiency, specifically high-performance runtime on resource-constrained devices. We present AppCopilot, a multimodal, multi-agent, general-purpose on-device assistant that operates across applications and constitutes a full-stack, closed-loop system from data to deployment. AppCopilot operationalizes this position through an end-to-end autonomous pipeline spanning data collection, training, deployment, high-quality and efficient inference, and mobile application development. At the model layer, it integrates multimodal foundation models with robust Chinese-English support. At the reasoning and control layer, it combines chain-of-thought reasoning, hierarchical task planning and decomposition, and multi-agent collaboration. At the execution layer, it enables user personalization and experiential adaptation, voice interaction, function calling, cross-app and cross-device orchestration, and comprehensive mobile app support. The system design incorporates profiling-driven optimization for latency, memory, and energy across heterogeneous hardware. Empirically, AppCopilot achieves significant improvements along all four dimensions: stronger generalization, higher-precision on-screen actions, more reliable long-horizon task completion, and faster, more resource-efficient runtime.


Related papers