Fugu-MT Paper Translation (Abstract): Atomic-to-Compositional Generalization for Mobile Agents with A New Benchmark and Scheduling System

Paper overview: Atomic-to-Compositional Generalization for Mobile Agents with A New Benchmark and Scheduling System

arxiv url: http://arxiv.org/abs/2506.08972v1
Date: Tue, 10 Jun 2025 16:45:29 GMT
Status: Translation complete
System update date: 2025-06-11 15:11:42.879315
Title: Atomic-to-Compositional Generalization for Mobile Agents with A New Benchmark and Scheduling System
Title (reference translation): Atomic-to-Compositional Generalization for Mobile Agents with a New Benchmark and Scheduling System
Authors: Yuan Guo, Tingjia Miao, Zheng Wu, Pengzhou Cheng, Ming Zhou, Zhuosheng Zhang
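Abstract summary: This work introduces a benchmark for evaluating mobile agents on three categories of compositional operations. AGENT-NEXUS is a lightweight and efficient scheduling system for tackling compositional mobile tasks.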
Reference score (independently computed attention score): 28.996849369783032
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Autonomous agents powered by multimodal large language models have been developed to facilitate task execution on mobile devices. However, prior work has predominantly focused on atomic tasks -- such as short-chain execution tasks and single-screen grounding tasks -- while overlooking the generalization to compositional tasks, which are indispensable for real-world applications. This work introduces UI-NEXUS, a comprehensive benchmark designed to evaluate mobile agents on three categories of compositional operations: Simple Concatenation, Context Transition, and Deep Dive. UI-NEXUS supports interactive evaluation in 20 fully controllable local utility app environments, as well as 30 online Chinese and English service apps. It comprises 100 interactive task templates with an average optimal step count of 14.05. Experimental results across a range of mobile agents with agentic workflow or agent-as-a-model show that UI-NEXUS presents significant challenges. Specifically, existing agents generally struggle to balance performance and efficiency, exhibiting representative failure modes such as under-execution, over-execution, and attention drift, causing a visible atomic-to-compositional generalization gap. Inspired by these findings, we propose AGENT-NEXUS, a lightweight and efficient scheduling system to tackle compositional mobile tasks. AGENT-NEXUS extrapolates the abilities of existing mobile agents by dynamically decomposing long-horizon tasks into a series of self-contained atomic subtasks. AGENT-NEXUS achieves 24% to 40% task success rate improvement for existing mobile agents on compositional operation tasks within the UI-NEXUS benchmark without significantly increasing inference overhead. The demo video, dataset, and code are available on the project page at https://ui-nexus.github.io.
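Abstract (reference translation): Autonomous agents powered by multimodal large language models have been developed to facilitate task execution on mobile devices. However, prior work has focused mainly on atomic tasks, such as short-chain execution tasks and single-screen grounding tasks, while overlooking generalization to compositional tasks, which are indispensable in real-world applications. UI-NEXUS is a comprehensive benchmark designed to evaluate mobile agents on three categories of compositional operations: Simple Concatenation, Context Transition, and Deep Dive. UI-NEXUS supports interactive evaluation in 20 fully controllable local utility app environments and 30 online Chinese and English service apps. It comprises 100 interactive task templates with an average optimal step count of 14.05. Experimental results across a range of mobile agents built on agentic workflows or agent-as-a-model show that UI-NEXUS poses significant challenges. Specifically, existing agents generally struggle to balance performance and efficiency and exhibit representative failure modes such as under-execution, over-execution, and attention drift, which produce a visible atomic-to-compositional generalization gap. Inspired by these findings, we propose AGENT-NEXUS, a lightweight and efficient scheduling system for compositional mobile tasks. AGENT-NEXUS extrapolates the abilities of existing mobile agents by dynamically decomposing long-horizon tasks into a series of self-contained atomic subtasks. On compositional operation tasks within the UI-NEXUS benchmark, AGENT-NEXUS improves the task success rate of existing mobile agents by 24% to 40% without significantly increasing inference overhead. The demo video, dataset, and code are available on the project page (https://ui-nexus.github.io).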

Related papers