Fugu-MT Paper Translation (Abstract): MAPLE: A Mobile Agent with Persistent Finite State Machines for Structured Task Reasoning

Paper summary: MAPLE: A Mobile Agent with Persistent Finite State Machines for Structured Task Reasoning

arxiv url: http://arxiv.org/abs/2505.23596v2
Date: Mon, 02 Jun 2025 18:32:13 GMT
Status: Translation complete
System update date: 2025-06-04 16:31:03.67668
Title: MAPLE: A Mobile Agent with Persistent Finite State Machines for Structured Task Reasoning
Title (reference translation): MAPLE: A Mobile Agent with Persistent Finite State Machines for Structured Task Reasoning
Authors: Linqiang Guo, Wei Liu, Yi Wen Heng, Tse-Hsun Chen, Yang Wang
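Abstract summary: We propose MAPLE, a state-aware multi-agent framework that abstracts app interactions as a Finite State Machine (FSM). Each UI screen is modeled as a discrete state and each user action as a transition, so that the FSM provides a structured representation of app execution. MAPLE consists of specialized agents responsible for four phases of task execution: planning, execution, verification, error recovery, and knowledge retention.
Reference score (independently computed attention score): 46.18718721121415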
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Mobile GUI agents aim to autonomously complete user-instructed tasks across mobile apps. Recent advances in Multimodal Large Language Models (MLLMs) enable these agents to interpret UI screens, identify actionable elements, and perform interactions such as tapping or typing. However, existing agents remain reactive: they reason only over the current screen and lack a structured model of app navigation flow, limiting their ability to understand context, detect unexpected outcomes, and recover from errors. We present MAPLE, a state-aware multi-agent framework that abstracts app interactions as a Finite State Machine (FSM). We computationally model each UI screen as a discrete state and user actions as transitions, allowing the FSM to provide a structured representation of the app execution. MAPLE consists of specialized agents responsible for four phases of task execution: planning, execution, verification, error recovery, and knowledge retention. These agents collaborate to dynamically construct FSMs in real time based on perception data extracted from the UI screen, allowing the GUI agents to track navigation progress and flow, validate action outcomes through pre- and post-conditions of the states, and recover from errors by rolling back to previously stable states. Our evaluation results on two challenging cross-app benchmarks, Mobile-Eval-E and SPA-Bench, show that MAPLE outperforms the state-of-the-art baseline, improving task success rate by up to 12%, recovery success by 13.8%, and action accuracy by 6.5%. Our results highlight the importance of structured state modeling in guiding mobile GUI agents during task execution. Moreover, our FSM representation can be integrated into future GUI agent architectures as a lightweight, model-agnostic memory layer to support structured planning, execution verification, and error recovery.
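Abstract (reference translation): Mobile GUI agents aim to autonomously complete user-instructed tasks across mobile apps. Recent advances in Multimodal Large Language Models (MLLMs) allow these agents to interpret UI screens, identify actionable elements, and perform interactions such as tapping and typing. However, existing agents remain reactive: they reason only over the current screen and lack a structured model of the app's navigation flow, which limits their ability to understand context, detect unexpected outcomes, and recover from errors. We propose MAPLE, a state-aware multi-agent framework that abstracts app interactions as a Finite State Machine (FSM). Each UI screen is modeled as a discrete state and each user action as a transition, so that the FSM provides a structured representation of app execution. MAPLE consists of specialized agents responsible for four phases of task execution: planning, execution, verification, error recovery, and knowledge retention. These agents dynamically construct FSMs in real time from perception data extracted from the UI screen, so that the GUI agents can track navigation progress and flow, validate action outcomes against the pre- and post-conditions of the states, and recover from errors by rolling back to a previously stable state. In evaluations on Mobile-Eval-E and SPA-Bench, MAPLE outperforms the state-of-the-art baseline, improving task success rate by up to 12%, recovery success by 13.8%, and action accuracy by 6.5%. The results highlight the importance of structured state modeling in guiding mobile GUI agents during task execution. Moreover, the FSM representation can be integrated into future GUI agent architectures as a lightweight, model-agnostic memory layer that supports structured planning, execution verification, and error recovery.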
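
The idea in the abstract of treating screens as states, actions as transitions, and post-conditions as the check that triggers a rollback can be illustrated with a small sketch. This is not MAPLE's implementation: the class names, the `expected_elements` post-condition, and the screen and action labels below are all hypothetical, chosen only to make the mechanism concrete.

```python
from dataclasses import dataclass, field


@dataclass
class ScreenState:
    """One UI screen, modeled as a discrete FSM state (hypothetical type)."""
    name: str
    # Assumed post-condition: element labels that must be visible on this screen.
    expected_elements: frozenset = frozenset()


@dataclass
class ScreenFSM:
    """FSM that grows as screens are observed and actions are verified."""
    states: dict = field(default_factory=dict)       # name -> ScreenState
    transitions: dict = field(default_factory=dict)  # (source, action) -> target
    history: list = field(default_factory=list)      # visited states, oldest first

    def add_state(self, state):
        self.states[state.name] = state

    def start(self, name):
        self.history = [name]

    @property
    def current(self):
        return self.history[-1]

    def step(self, action, target, observed):
        """Tentatively move to `target`, then verify its post-condition
        against the elements actually observed on screen."""
        source = self.current
        self.history.append(target)
        if self.states[target].expected_elements <= set(observed):
            self.transitions[(source, action)] = target  # keep verified edge
            return True
        self.history.pop()  # roll back to the last verified state
        return False


fsm = ScreenFSM()
fsm.add_state(ScreenState("home", frozenset({"search_bar"})))
fsm.add_state(ScreenState("results", frozenset({"result_list"})))
fsm.start("home")

# The observed screen matches the "results" post-condition, so the move is kept.
ok = fsm.step("tap_search", "results", observed={"result_list", "filter_button"})
# An unexpected dialog appears, so the post-condition fails and the FSM rolls back.
bad = fsm.step("tap_back", "home", observed={"error_dialog"})
```

In this sketch only verified transitions are stored, so the recorded graph can serve as a simple memory of navigation paths that are known to work; how MAPLE itself persists and reuses its FSMs is not specified in the abstract.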

Related papers