UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning
- URL: http://arxiv.org/abs/2509.11543v2
- Date: Wed, 24 Sep 2025 15:05:34 GMT
- Title: UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning
- Authors: Zhengxi Lu, Jiabo Ye, Fei Tang, Yongliang Shen, Haiyang Xu, Ziwei Zheng, Weiming Lu, Ming Yan, Fei Huang, Jun Xiao, Yueting Zhuang,
- Abstract summary: We present Semi-online Reinforcement Learning, a novel paradigm that simulates online RL on offline trajectories.<n>To capture long-term training signals, Semi-online RL introduces discounted future returns into the reward computation.<n>Experiments show that ours Semi-online RL achieves SOTA performance among 7B models across four dynamic benchmarks.
- Score: 78.86567400365392
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Graphical User Interface (GUI) agents have demonstrated remarkable progress in automating complex user interface interactions through reinforcement learning. However, current approaches face a fundamental dilemma: offline RL enables stable training on pre-collected trajectories, but struggles with multi-step task execution for lack of trajectory-level reward signals; online RL captures these signals through environment interaction, but suffers from sparse rewards and prohibitive deployment costs. To address it, we present Semi-online Reinforcement Learning, a novel paradigm that simulates online RL on offline trajectories. During each rollout process, we preserve the original model output within the multi-turn dialogue, where a Patch Module adaptively recovers the divergence between rollout and expert trajectories. To capture long-term training signals, Semi-online RL introduces discounted future returns into the reward computation and optimizes the policy with weighted step-level and episode-level advantages. We further introduce Semi-Online Performance (SOP), a metric that aligns better with true online performance, serving as a practical and effective proxy for real-world evaluation. Experiments show that ours Semi-online RL achieves SOTA performance among 7B models across four dynamic benchmarks, with significant gains over the base model (e.g., +12.0% on AndroidWorld, +23.8% on AITW), demonstrating significant progress in bridging the gap between offline training efficiency and online multi-turn reasoning. The code is available at https://github.com/X-PLUG/MobileAgent/tree/main/UI-S1.
Related papers
- DiRL: An Efficient Post-Training Framework for Diffusion Language Models [54.405206032785706]
Diffusion Language Models (dLLMs) have emerged as promising alternatives to Auto-Regressive (AR) models.<n>Existing methods suffer from computational inefficiency and objective mismatches between training and inference.<n>We introduce DiRL, an efficient post-training framework that tightly integrates FlexAttention-accelerated blockwise training with LMDeploy-optimized inference.
arXiv Detail & Related papers (2025-12-23T08:33:19Z) - Human-in-the-loop Online Rejection Sampling for Robotic Manipulation [55.99788088622936]
Hi-ORS stabilizes value estimation by filtering out negatively rewarded samples during online fine-tuning.<n>Hi-ORS fine-tunes a pi-base policy to master contact-rich manipulation in just 1.5 hours of real-world training.
arXiv Detail & Related papers (2025-10-30T11:53:08Z) - Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation [65.3648667980258]
Vision-language model (VLM) based GUI agents show promise for automating complex tasks, but face significant challenges in applying reinforcement learning (RL)<n>We propose DART, a Decoupled Agentic RL Training framework for GUI agents, which coordinates heterogeneous modules in a highly decoupled manner.<n>On the OSWorld benchmark, DART-GUI-7B achieves a 42.13% task success rate, a 14.61% absolute gain over the base model, and 7.34% higher than open-source SOTA.
arXiv Detail & Related papers (2025-09-28T13:19:20Z) - MOORL: A Framework for Integrating Offline-Online Reinforcement Learning [6.7265073544042995]
We propose Meta Offline-Online Reinforcement Learning (MOORL), a hybrid framework that unifies offline and online learning.<n>Our theoretical analysis demonstrates that the hybrid approach enhances exploration by effectively combining the complementary strengths of offline and online data.<n>With minimal computational overhead, MOORL achieves strong performance, underscoring its potential for practical applications in real-world scenarios.
arXiv Detail & Related papers (2025-06-11T10:12:50Z) - DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning [61.10299147201369]
This paper introduces a novel autonomous RL approach, called DigiRL, for training in-the-wild device control agents.
We build a scalable and parallelizable Android learning environment equipped with a VLM-based evaluator.
We demonstrate the effectiveness of DigiRL using the Android-in-the-Wild dataset, where our 1.3B VLM trained with RL achieves a 49.5% absolute improvement.
arXiv Detail & Related papers (2024-06-14T17:49:55Z) - Offline Trajectory Optimization for Offline Reinforcement Learning [42.306438854850434]
offline reinforcement learning aims to learn policies without online explorations.<n>Existing data augmentation methods for offline RL suffer from (i) trivial improvement from short-horizon simulation.<n>We propose offline trajectory optimization for offline reinforcement learning (OTTO)
arXiv Detail & Related papers (2024-04-16T08:48:46Z) - ENOTO: Improving Offline-to-Online Reinforcement Learning with Q-Ensembles [52.34951901588738]
We propose a novel framework called ENsemble-based Offline-To-Online (ENOTO) RL.
By increasing the number of Q-networks, we seamlessly bridge offline pre-training and online fine-tuning without degrading performance.
Experimental results demonstrate that ENOTO can substantially improve the training stability, learning efficiency, and final performance of existing offline RL methods.
arXiv Detail & Related papers (2023-06-12T05:10:10Z) - Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning [93.99377042564919]
This paper tries to build more flexible constraints for value estimation without impeding the exploration of potential advantages.
The key idea is to leverage off-the-shelf RL simulators, which can be easily interacted with in an online manner, as the "test bed" for offline policies.
We introduce CoWorld, a model-based RL approach that mitigates cross-domain discrepancies in state and reward spaces.
arXiv Detail & Related papers (2023-05-24T15:45:35Z) - Using Offline Data to Speed Up Reinforcement Learning in Procedurally Generated Environments [16.62777710035937]
We study whether agents can leverage offline data in the form of trajectories to improve the sample-efficiency in procedurally generated environments.<n>We consider two settings of using IL from offline data for RL: (1) pre-training a policy before online RL training and (2) concurrently training a policy with online RL and IL from offline data.
arXiv Detail & Related papers (2023-04-18T16:23:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.