Training High-Level Schedulers with Execution-Feedback Reinforcement Learning for Long-Horizon GUI Automation
- URL: http://arxiv.org/abs/2511.22235v1
- Date: Thu, 27 Nov 2025 09:01:38 GMT
- Title: Training High-Level Schedulers with Execution-Feedback Reinforcement Learning for Long-Horizon GUI Automation
- Authors: Zehao Deng, Tianjie Ju, Zheng Wu, Zhuosheng Zhang, Gongshen Liu
- Abstract summary: Single-agent GUI agents struggle to balance high-level planning and low-level execution. Rather than training a unified policy model, this work trains high-level scheduling models and builds the Coordinator-Executor-State Tracker framework, which can be integrated with any low-level Executor model.
- Score: 25.0921056409982
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid development of large vision-language models (VLMs) has greatly advanced research on GUI agents. However, GUI agents still face significant challenges in handling long-horizon tasks. First, single-agent models struggle to balance high-level planning and low-level execution, suffering from responsibility coupling and capability conflicts. Second, agents lack awareness of the task state, leading to progress loss in long-horizon tasks. To address these challenges, we propose a staged execution-feedback reinforcement learning algorithm. Instead of training a unified policy model, we focus on training high-level scheduling models. Specifically, we propose and train two agents: a Coordinator, responsible for strategic planning and task decomposition, and a State Tracker, responsible for context compression and information management to maintain the task's state and coherence. On this basis, we build the Coordinator-Executor-State Tracker (CES) multi-agent framework, which can be integrated with any low-level Executor model, assisting the Executor in solving long-horizon tasks through task scheduling and state management. Experiments on long-horizon task benchmarks demonstrate that CES significantly enhances the system's planning and state management capabilities. Further analysis confirms that the trained high-level scheduling module is a generalizable, plug-and-play component that significantly improves the long-horizon capabilities of various Executors. Code is available at https://github.com/hehehahi4/CES.
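To make the division of labor concrete, the control flow the abstract describes can be sketched as a simple loop: the Coordinator decomposes the task, any Executor carries out each subtask, and the State Tracker compresses the history that feeds back into planning and execution. All class and method names below are hypothetical illustrations, not the paper's actual interfaces, and the "models" are trivial stand-ins for trained VLMs.

```python
# Hedged sketch of a Coordinator-Executor-State Tracker (CES) loop.
# Interfaces are illustrative; the paper's real components are trained models.

class Coordinator:
    """High-level scheduler: decomposes a long-horizon task into subtasks."""
    def plan(self, task: str, state: str) -> list[str]:
        # A trained Coordinator would reason over the task and tracked state;
        # here we simply split a "step1; step2; ..." task string as a stand-in.
        return [s.strip() for s in task.split(";") if s.strip()]

class StateTracker:
    """Maintains task state and coherence via compressed context."""
    def __init__(self):
        self.history: list[str] = []
    def update(self, subtask: str, result: str) -> None:
        # A real tracker would summarize/compress; we keep a compact log entry.
        self.history.append(f"{subtask} -> {result}")
    def summary(self) -> str:
        return " | ".join(self.history)

class Executor:
    """Placeholder for any plug-in low-level GUI action model."""
    def execute(self, subtask: str, context: str) -> str:
        # A real Executor would emit grounded GUI actions (click, type, ...).
        return "done"

def run_ces(task: str, coordinator: Coordinator,
            executor: Executor, tracker: StateTracker) -> str:
    for subtask in coordinator.plan(task, tracker.summary()):
        result = executor.execute(subtask, tracker.summary())
        tracker.update(subtask, result)
    return tracker.summary()

if __name__ == "__main__":
    print(run_ces("open settings; enable wifi; return home",
                  Coordinator(), Executor(), StateTracker()))
```

The key design point the sketch preserves is that `Executor` is swappable: the Coordinator and State Tracker only exchange text (subtasks and state summaries) with it, which is what makes the scheduling module plug-and-play across Executors.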
Related papers
- LongCLI-Bench: A Preliminary Benchmark and Study for Long-horizon Agentic Programming in Command-Line Interfaces [65.11019654023978]
LongCLI-Bench is a benchmark designed to evaluate agentic capabilities across long-horizon, realistic tasks. We curated 20 high-quality, long-horizon tasks from over 1,000 computer science assignments and real-world tasks. Experiments reveal that even state-of-the-art agents achieve pass rates below 20% on LongCLI-Bench.
arXiv Detail & Related papers (2026-02-15T23:12:57Z) - AgentProg: Empowering Long-Horizon GUI Agents with Program-Guided Context Management [24.465443389008055]
AgentProg is a program-guided approach to agent context management. It reframes the interaction history as a program with variables and control flow. Experiments on AndroidWorld and our extended long-horizon task suite demonstrate that AgentProg achieves state-of-the-art success rates.
arXiv Detail & Related papers (2025-12-11T07:37:38Z) - Mobile-Agent-RAG: Driving Smart Multi-Agent Coordination with Contextual Knowledge Empowerment for Long-Horizon Mobile Automation [57.12284831164602]
Mobile agents show immense potential, yet current state-of-the-art (SoTA) agents exhibit inadequate success rates on real-world, long-horizon, cross-application tasks. We propose Mobile-Agent-RAG, a novel hierarchical multi-agent framework that integrates dual-level retrieval augmentation.
arXiv Detail & Related papers (2025-11-15T15:22:42Z) - CHOP: Mobile Operating Assistant with Constrained High-frequency Optimized Subtask Planning [18.826366389246385]
We propose a new mobile assistant architecture with constrained high-frequency optimized planning (CHOP). Our approach overcomes the VLM's deficiency in planning for GUI scenarios by using human-planned subtasks as basis vectors. We evaluate our architecture in both English and Chinese contexts across 20 apps, demonstrating significant improvements in both effectiveness and efficiency.
arXiv Detail & Related papers (2025-03-05T18:56:16Z) - Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks [85.48034185086169]
Mobile-Agent-E is a hierarchical multi-agent framework capable of self-evolution through past experience. It achieves a 22% absolute improvement over previous state-of-the-art approaches.
arXiv Detail & Related papers (2025-01-20T20:35:46Z) - MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation [23.026244256950086]
We propose MobA, a novel MLLM-based mobile assistant system. We introduce an adaptive planning module that incorporates a reflection mechanism for error recovery. We also present MobBench, a dataset designed for complex mobile interactions.
arXiv Detail & Related papers (2024-10-17T16:53:50Z) - PlanDQ: Hierarchical Plan Orchestration via D-Conductor and Q-Performer [47.924941959320996]
We propose a hierarchical planner designed for offline RL called PlanDQ.
PlanDQ incorporates a diffusion-based planner at the high level, named D-Conductor, which guides the low-level policy through sub-goals.
At the low level, we use a Q-learning-based approach called the Q-Performer to accomplish these sub-goals.
arXiv Detail & Related papers (2024-06-10T20:59:53Z) - RL-GPT: Integrating Reinforcement Learning and Code-as-policy [82.1804241891039]
We introduce a two-level hierarchical framework, RL-GPT, comprising a slow agent and a fast agent.
The slow agent analyzes actions suitable for coding, while the fast agent executes coding tasks.
This decomposition effectively focuses each agent on specific tasks, proving highly efficient within our pipeline.
arXiv Detail & Related papers (2024-02-29T16:07:22Z) - Hierarchical Imitation Learning with Vector Quantized Models [77.67190661002691]
We propose to use reinforcement learning to identify subgoals in expert trajectories.
We build a vector-quantized generative model for the identified subgoals to perform subgoal-level planning.
In experiments, the algorithm excels at solving complex, long-horizon decision-making problems, outperforming the state-of-the-art.
arXiv Detail & Related papers (2023-01-30T15:04:39Z) - Hierarchically Structured Scheduling and Execution of Tasks in a Multi-Agent Environment [1.0660480034605238]
In a warehouse environment, tasks appear dynamically. Consequently, a task management system that matches them with the workforce too early is necessarily sub-optimal.
We propose to use deep reinforcement learning to solve both the high-level scheduling problem and the low-level multi-agent problem of schedule execution.
arXiv Detail & Related papers (2022-03-06T18:11:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.