Co-EPG: A Framework for Co-Evolution of Planning and Grounding in Autonomous GUI Agents
- URL: http://arxiv.org/abs/2511.10705v1
- Date: Thu, 13 Nov 2025 03:41:02 GMT
- Title: Co-EPG: A Framework for Co-Evolution of Planning and Grounding in Autonomous GUI Agents
- Authors: Yuan Zhao, Hualei Zhu, Tingyu Jiang, Shen Li, Xiaohang Xu, Hao Henry Wang,
- Abstract summary: Co-EPG is a self-iterative training framework for Co-Evolution of Planning and Grounding.<n>This work establishes a novel training paradigm for GUI agents, shifting from isolated optimization to an integrated, self-driven co-evolution approach.
- Score: 10.528687017443852
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graphical User Interface (GUI) task automation constitutes a critical frontier in artificial intelligence research. While effective GUI agents synergistically integrate planning and grounding capabilities, current methodologies exhibit two fundamental limitations: (1) insufficient exploitation of cross-model synergies, and (2) over-reliance on synthetic data generation without sufficient utilization. To address these challenges, we propose Co-EPG, a self-iterative training framework for Co-Evolution of Planning and Grounding. Co-EPG establishes an iterative positive feedback loop: through this loop, the planning model explores superior strategies under grounding-based reward guidance via Group Relative Policy Optimization (GRPO), generating diverse data to optimize the grounding model. Concurrently, the optimized Grounding model provides more effective rewards for subsequent GRPO training of the planning model, fostering continuous improvement. Co-EPG thus enables iterative enhancement of agent capabilities through self-play optimization and training data distillation. On the Multimodal-Mind2Web and AndroidControl benchmarks, our framework outperforms existing state-of-the-art methods after just three iterations without requiring external data. The agent consistently improves with each iteration, demonstrating robust self-enhancement capabilities. This work establishes a novel training paradigm for GUI agents, shifting from isolated optimization to an integrated, self-driven co-evolution approach.
Related papers
- MAESTRO: Meta-learning Adaptive Estimation of Scalarization Trade-offs for Reward Optimization [56.074760766965085]
Group-Relative Policy Optimization has emerged as an efficient paradigm for aligning Large Language Models (LLMs)<n>We propose MAESTRO, which treats reward scalarization as a dynamic latent policy, leveraging the model's terminal hidden states as a semantic bottleneck.<n>We formulate this as a contextual bandit problem within a bi-level optimization framework, where a lightweight Conductor network co-evolves with the policy by utilizing group-relative advantages as a meta-reward signal.
arXiv Detail & Related papers (2026-01-12T05:02:48Z) - Generative Actor Critic [74.04971271003869]
Generative Actor Critic (GAC) is a novel framework that decouples sequential decision-making by reframing textitpolicy evaluation as learning a generative model of the joint distribution over trajectories and returns.<n>Experiments on Gym-MuJoCo and Maze2D benchmarks demonstrate GAC's strong offline performance and significantly enhanced offline-to-online improvement compared to state-of-the-art methods.
arXiv Detail & Related papers (2025-12-25T06:31:11Z) - GPR: Towards a Generative Pre-trained One-Model Paradigm for Large-Scale Advertising Recommendation [38.48999566011862]
We propose GPR (Generative Pre-trained Recommender), a one-model framework that redefines advertising recommendation as an end-to-end generative task.<n>We introduce three key innovations spanning unified representation, network architecture, and training strategy.<n>GPR has been fully deployed in the Tencent Weixin Channels advertising system, delivering significant improvements in key business metrics.
arXiv Detail & Related papers (2025-11-13T09:50:53Z) - EmbodiedBrain: Expanding Performance Boundaries of Task Planning for Embodied Intelligence [17.644658293987955]
Embodied AI agents are capable of robust spatial perception, effective task planning, and adaptive execution in physical environments.<n>Current large language models (LLMs) and multimodal LLMs (MLLMs) for embodied tasks suffer from key limitations.<n>We propose EmbodiedBrain, a novel vision-language foundation model available in both 7B and 32B parameter sizes.
arXiv Detail & Related papers (2025-10-23T14:05:55Z) - Enhancing Generative Auto-bidding with Offline Reward Evaluation and Policy Search [24.02739832976663]
Auto-bidding serves as a critical tool for advertisers to improve their performance.<n>Recent progress has demonstrated that AI-Generated Bidding (AIGB) achieves superior performance compared to typical offline reinforcement learning (RL)-based auto-bidding methods.<n>We propose AIGB-Pearl, a novel method that integrates generative planning and policy optimization.
arXiv Detail & Related papers (2025-09-19T12:30:26Z) - Improved particle swarm optimization algorithm: multi-target trajectory optimization for swarm drones [20.531764063763678]
Traditional Particle Swarm Optimization (PSO) methods struggle with premature convergence and latency in real-time scenarios.<n>We propose PE-PSO, an enhanced PSO-based online trajectory planner.<n>We develop a multi-agent framework that combines genetic algorithm (GA)-based task allocation with distributed PE-PSO, supporting scalable and coordinated trajectory generation.
arXiv Detail & Related papers (2025-07-18T04:31:49Z) - Reinforced Reasoning for Embodied Planning [18.40186665383579]
Embodied planning requires agents to make coherent multi-step decisions based on dynamic visual observations and natural language goals.<n>We introduce a reinforcement fine-tuning framework that brings R1-style reasoning enhancement into embodied planning.
arXiv Detail & Related papers (2025-05-28T07:21:37Z) - Killing Two Birds with One Stone: Unifying Retrieval and Ranking with a Single Generative Recommendation Model [71.45491434257106]
Unified Generative Recommendation Framework (UniGRF) is a novel approach that integrates retrieval and ranking into a single generative model.<n>To enhance inter-stage collaboration, UniGRF introduces a ranking-driven enhancer module.<n>UniGRF significantly outperforms existing models on benchmark datasets.
arXiv Detail & Related papers (2025-04-23T06:43:54Z) - Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models [79.2162092822111]
We systematically evaluate reinforcement learning (RL) and control-based methods on a suite of navigation tasks.<n>We employ a latent dynamics model using the Joint Embedding Predictive Architecture (JEPA) and employ it for planning.<n>Our results show that model-free RL benefits most from large amounts of high-quality data, whereas model-based planning generalizes better to unseen layouts.
arXiv Detail & Related papers (2025-02-20T18:39:41Z) - EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought [95.37585041654535]
Embodied AI is capable of planning and executing action sequences for robots to accomplish long-horizon tasks in physical environments.
In this work, we introduce EmbodiedGPT, an end-to-end multi-modal foundation model for embodied AI.
Experiments show the effectiveness of EmbodiedGPT on embodied tasks, including embodied planning, embodied control, visual captioning, and visual question answering.
arXiv Detail & Related papers (2023-05-24T11:04:30Z) - MapGo: Model-Assisted Policy Optimization for Goal-Oriented Tasks [37.529217646431825]
In Goal-oriented Reinforcement learning, relabeling the raw goals in past experience to provide agents with hindsight ability is a major solution to the reward sparsity problem.
We develop FGI (Foresight Goal Inference), a new relabeling strategy that relabels the goals by looking into the future with a learned dynamics model.
To improve sample efficiency, we propose to use the dynamics model to generate simulated trajectories for policy training.
arXiv Detail & Related papers (2021-05-13T15:07:23Z) - Model-based Reinforcement Learning for Decentralized Multiagent
Rendezvous [66.6895109554163]
Underlying the human ability to align goals with other agents is their ability to predict the intentions of others and actively update their own plans.
We propose hierarchical predictive planning (HPP), a model-based reinforcement learning method for decentralized multiagent rendezvous.
arXiv Detail & Related papers (2020-03-15T19:49:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.