CGL: Advancing Continual GUI Learning via Reinforcement Fine-Tuning
- URL: http://arxiv.org/abs/2603.02951v1
- Date: Tue, 03 Mar 2026 13:02:20 GMT
- Title: CGL: Advancing Continual GUI Learning via Reinforcement Fine-Tuning
- Authors: Zhenquan Yao, Zitong Huang, Yihan Zeng, Jianhua Han, Hang Xu, Chun-Mei Feng, Jianwei Ma, Wangmeng Zuo
- Abstract summary: While Supervised Fine-Tuning (SFT) facilitates fast adaptation, it often triggers knowledge overwriting. Reinforcement Learning (RL) demonstrates an inherent resilience that shields prior interaction logic from erasure. We propose a Continual GUI Learning (CGL) framework that balances efficiency and skill retention.
- Score: 67.78566256784404
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graphical User Interface (GUI) agents, benefiting from recent advances in multimodal large language models (MLLMs), have advanced significantly. However, due to the frequent updates of GUI applications, adapting to new tasks without forgetting old ones remains an open problem in GUI continual learning. In this work, we reveal that while Supervised Fine-Tuning (SFT) facilitates fast adaptation, it often triggers knowledge overwriting, whereas Reinforcement Learning (RL) demonstrates an inherent resilience that shields prior interaction logic from erasure. Based on this insight, we propose a Continual GUI Learning (CGL) framework that dynamically balances adaptation efficiency and skill retention by enhancing the synergy between SFT and RL. Specifically, we introduce an SFT proportion adjustment mechanism guided by policy entropy to dynamically control the weight allocation between the SFT and RL training phases. To resolve explicit gradient interference, we further develop a specialized gradient surgery strategy: by projecting exploratory SFT gradients onto GRPO-based anchor gradients, our method explicitly clips the components of SFT gradients that conflict with GRPO. Furthermore, we establish an AndroidControl-CL benchmark, which divides GUI applications into distinct task groups to effectively simulate and evaluate continual GUI learning. Experimental results demonstrate the effectiveness of the proposed CGL framework across continual learning scenarios. The benchmark, code, and model will be made publicly available.
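The abstract describes two coupled mechanisms: an entropy-guided weight on the SFT loss and a projection that removes SFT gradient components conflicting with the GRPO anchor gradient. The PyTorch sketch below illustrates one plausible reading under stated assumptions; the entropy-to-weight mapping, the clipping rule, and all function names are ours, since the abstract does not give the exact formulation.

```python
import torch

def sft_weight_from_entropy(entropy: float, e_min: float = 0.1,
                            e_max: float = 2.0) -> float:
    # Hypothetical mapping: the abstract only says the SFT proportion is
    # "guided by policy entropy". Here, a high-entropy (uncertain) policy
    # leans on SFT for fast adaptation; a low-entropy one leans on RL.
    t = (entropy - e_min) / (e_max - e_min)
    return min(max(t, 0.0), 1.0)

def clip_conflicting_sft_grad(g_sft: torch.Tensor,
                              g_grpo: torch.Tensor) -> torch.Tensor:
    # Gradient surgery: project the SFT gradient onto the GRPO anchor
    # gradient and remove the component pointing against it
    # (a PCGrad-style projection).
    dot = torch.dot(g_sft.flatten(), g_grpo.flatten())
    if dot < 0:
        g_sft = g_sft - (dot / (g_grpo.norm() ** 2 + 1e-12)) * g_grpo
    return g_sft

def combined_update(g_sft: torch.Tensor, g_grpo: torch.Tensor,
                    entropy: float) -> torch.Tensor:
    # Blend the surgered SFT gradient with the GRPO gradient using the
    # entropy-derived weight.
    lam = sft_weight_from_entropy(entropy)
    return lam * clip_conflicting_sft_grad(g_sft, g_grpo) + (1.0 - lam) * g_grpo
```

In practice the projection would be applied per parameter (or per parameter group) before the optimizer step, with `g_sft` and `g_grpo` gathered from separate backward passes.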
Related papers
- PROPA: Toward Process-level Optimization in Visual Reasoning via Reinforcement Learning [30.44007644340425]
We introduce PROPA, a novel framework that integrates Monte Carlo Tree Search (MCTS) with GRPO to generate dense, process-level rewards and optimize reasoning at each intermediate step without human annotations. Across seven benchmarks and four VLM backbones, PROPA consistently outperforms both SFT- and RLVR-based baselines. It achieves up to 17.0% gains on in-domain tasks and 21.0% gains on out-of-domain tasks compared to the existing state-of-the-art.
arXiv Detail & Related papers (2025-11-13T13:06:12Z)
- Orcust: Stepwise-Feedback Reinforcement Learning for GUI Agent [12.334063115362758]
Orcust is a framework that integrates Principle-Constrained Reward Modeling with Online VM-Grounded Trajectory Construction (OVTC). OVTC spins up instrumented virtual machines to autonomously collect structured GUI interaction trajectories.
arXiv Detail & Related papers (2025-09-22T15:40:31Z)
- GRIL: Knowledge Graph Retrieval-Integrated Learning with Large Language Models [59.72897499248909]
We propose a novel graph retriever trained end-to-end with Large Language Models (LLMs). Within the extracted subgraph, structural knowledge and semantic features are encoded via soft tokens and the verbalized graph, respectively, and both are infused into the LLM together. Our approach consistently achieves state-of-the-art performance, validating the strength of joint graph-LLM optimization for complex reasoning tasks.
arXiv Detail & Related papers (2025-09-20T02:38:00Z)
- SWIRL: A Staged Workflow for Interleaved Reinforcement Learning in Mobile GUI Control [38.81034547191083]
We introduce SWIRL, a staged workflow for interleaved reinforcement learning designed for multi-agent systems. SWIRL reformulates MARL into a sequence of single-agent reinforcement learning tasks, updating one agent at a time while keeping the others fixed. In application to mobile GUI control, SWIRL instantiates a Navigator that converts language and screen context into structured plans, and an Interactor that grounds these plans into executable atomic actions.
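The staged schedule ("updating one agent at a time while keeping the others fixed") can be sketched in a few lines. This is an illustrative Python sketch, not SWIRL's implementation; `collect_rollout` and `rl_update` are hypothetical helpers passed in by the caller.

```python
from torch import nn

def swirl_round(agents: dict, env, steps_per_agent: int,
                collect_rollout, rl_update):
    # One staged round: train each agent in turn while all others stay
    # frozen, reducing multi-agent RL to a sequence of single-agent updates.
    # For mobile GUI control, `agents` might hold a navigator and an
    # interactor (names are illustrative).
    for name, agent in agents.items():
        for other_name, other in agents.items():
            other.requires_grad_(other_name == name)  # freeze the rest
        for _ in range(steps_per_agent):
            rollout = collect_rollout(env, agents)  # joint rollout (hypothetical helper)
            rl_update(agent, rollout)               # single-agent RL step (hypothetical helper)
```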
arXiv Detail & Related papers (2025-08-27T16:27:19Z)
- CRAFT-GUI: Curriculum-Reinforced Agent For GUI Tasks [11.121687042616974]
Reinforcement Learning (RL) can effectively enhance agents' performance in dynamic interactive GUI environments. However, most approaches collapse task-specific nuances into a single, coarse reward, leaving the agent with a uniform signal that yields inefficient policy updates. We propose CRAFT-GUI, a curriculum learning framework based on Group Relative Policy Optimization (GRPO) that explicitly accounts for the varying difficulty across trajectories.
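As a rough illustration of a curriculum over trajectory difficulty (the linear schedule and threshold below are assumptions, not CRAFT-GUI's actual design):

```python
import random

def curriculum_batch(scored_trajectories, progress: float, batch_size: int):
    # scored_trajectories: list of (trajectory, difficulty in [0, 1]) pairs.
    # progress: fraction of training completed, in [0, 1]. Admit
    # progressively harder trajectories as training advances; the linear
    # schedule below is an illustrative choice.
    threshold = min(1.0, 0.25 + 0.75 * progress)
    pool = [t for t, d in scored_trajectories if d <= threshold]
    return random.sample(pool, min(batch_size, len(pool)))
```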
arXiv Detail & Related papers (2025-08-15T09:55:02Z)
- On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification [61.607788999847564]
We present a simple yet theoretically motivated improvement to Supervised Fine-Tuning (SFT) for Large Language Models (LLMs). We reveal that standard SFT gradients implicitly encode a problematic reward structure that may severely restrict the generalization capabilities of the model. We propose Dynamic Fine-Tuning (DFT), stabilizing gradient updates for each token by dynamically rescaling the objective function with the probability of that token.
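One common reading of this rescaling multiplies each token's negative log-likelihood by the detached probability the model assigns to it, which damps updates on low-probability tokens. The sketch below follows that reading and should not be taken as the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def dft_style_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # logits: (batch, seq, vocab); targets: (batch, seq).
    logp = F.log_softmax(logits, dim=-1)
    tok_logp = logp.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    weight = tok_logp.exp().detach()  # p(token), treated as a constant
    # Low-probability tokens receive down-weighted updates, stabilizing
    # the per-token gradient relative to plain cross-entropy.
    return -(weight * tok_logp).mean()
```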
arXiv Detail & Related papers (2025-08-07T17:59:04Z)
- Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections [65.36449542323277]
We present a unified theoretical framework bridging Supervised Fine-Tuning (SFT) and preference learning in Large Language Model (LLM) post-training. We propose a simple yet effective learning rate reduction approach that yields significant performance improvements.
arXiv Detail & Related papers (2025-06-15T05:42:29Z)
- ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay [88.74638385288773]
Agentic Replay Policy Optimization (ARPO) improves performance on complex, long-horizon computer tasks. We propose a task selection strategy that filters tasks based on baseline agent performance. Experiments on the OSWorld benchmark demonstrate that ARPO achieves competitive results.
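At its simplest, the task-selection idea filters tasks by a baseline success rate, and replay keeps past trajectories around for reuse. The band and buffer below are illustrative assumptions, not ARPO's published design.

```python
import random
from collections import deque

class TrajectoryReplay:
    # Minimal replay buffer of collected trajectories (illustrative).
    def __init__(self, capacity: int = 1000):
        self.buffer = deque(maxlen=capacity)

    def add(self, trajectory):
        self.buffer.append(trajectory)

    def sample(self, k: int):
        return random.sample(list(self.buffer), min(k, len(self.buffer)))

def select_tasks(tasks, baseline_success, lo=0.1, hi=0.9):
    # Keep tasks the baseline agent neither always fails nor always solves,
    # so rollouts carry useful learning signal. The (lo, hi) band is a
    # hypothetical choice for illustration.
    return [t for t in tasks if lo <= baseline_success[t] <= hi]
```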
arXiv Detail & Related papers (2025-05-22T06:24:32Z)
- Guiding VLM Agents with Process Rewards at Inference Time for GUI Navigation [101.09478572153239]
We propose an approach that guides VLM agents with process supervision by a reward model during GUI navigation and control at inference time. This guidance allows the VLM agent to optimize actions at each inference step, thereby improving performance in both static and dynamic environments.
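A minimal best-of-N sketch of this kind of inference-time guidance, assuming a reward model that scores (observation, action) pairs; `sample_actions` and `score` are placeholder APIs, not the paper's interface.

```python
def guided_step(policy, reward_model, observation, num_candidates: int = 8):
    # Sample candidate actions from the VLM policy, score each with the
    # process reward model, and execute the highest-scoring action.
    candidates = policy.sample_actions(observation, n=num_candidates)  # hypothetical API
    scores = [reward_model.score(observation, a) for a in candidates]  # hypothetical API
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best]
```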
arXiv Detail & Related papers (2025-04-22T17:52:42Z)