UI-Mem: Self-Evolving Experience Memory for Online Reinforcement Learning in Mobile GUI Agents
- URL: http://arxiv.org/abs/2602.05832v1
- Date: Thu, 05 Feb 2026 16:21:43 GMT
- Title: UI-Mem: Self-Evolving Experience Memory for Online Reinforcement Learning in Mobile GUI Agents
- Authors: Han Xiao, Guozhi Wang, Hao Wang, Shilong Liu, Yuxiang Chai, Yue Pan, Yufeng Zhou, Xiaoxin Chen, Yafei Wen, Hongsheng Li
- Abstract summary: Online Reinforcement Learning (RL) offers a promising paradigm for enhancing GUI agents through direct environment interaction. We propose UI-Mem, a novel framework that enhances GUI online RL with a Hierarchical Experience Memory. We show that UI-Mem significantly outperforms traditional RL baselines and static reuse strategies.
- Score: 50.053654092780825
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Online Reinforcement Learning (RL) offers a promising paradigm for enhancing GUI agents through direct environment interaction. However, its effectiveness is severely hindered by inefficient credit assignment in long-horizon tasks and repetitive errors across tasks due to the lack of experience transfer. To address these challenges, we propose UI-Mem, a novel framework that enhances GUI online RL with a Hierarchical Experience Memory. Unlike traditional replay buffers, our memory accumulates structured knowledge, including high-level workflows, subtask skills, and failure patterns. These experiences are stored as parameterized templates that enable cross-task and cross-application transfer. To effectively integrate memory guidance into online RL, we introduce Stratified Group Sampling, which injects varying levels of guidance across trajectories within each rollout group to maintain outcome diversity, driving the unguided policy toward internalizing guided behaviors. Furthermore, a Self-Evolving Loop continuously abstracts novel strategies and errors to keep the memory aligned with the agent's evolving policy. Experiments on online GUI benchmarks demonstrate that UI-Mem significantly outperforms traditional RL baselines and static reuse strategies, with strong generalization to unseen applications. Project page: https://ui-mem.github.io
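To make the abstract's two mechanisms concrete, here is a minimal Python sketch of a hierarchical experience memory holding parameterized templates, together with stratified group sampling over a rollout group. Every class, field, and retrieval rule below is an illustrative assumption, not the authors' implementation.

```python
# Hypothetical sketch of UI-Mem's two key ideas: a hierarchical experience
# memory of parameterized templates, and stratified group sampling that
# injects different guidance levels across trajectories in one rollout group.
# All names and structures are illustrative assumptions, not the paper's code.
from dataclasses import dataclass, field

@dataclass
class ExperienceTemplate:
    """A parameterized experience: placeholders such as {app} or {query}
    allow cross-task and cross-application transfer."""
    level: str                 # "workflow" | "subtask_skill" | "failure_pattern"
    pattern: str               # e.g. "open {app}, search for {query}, ..."
    params: list[str] = field(default_factory=list)

class HierarchicalMemory:
    def __init__(self):
        self.store = {"workflow": [], "subtask_skill": [], "failure_pattern": []}

    def add(self, template: ExperienceTemplate):
        self.store[template.level].append(template)

    def retrieve(self, task: str, level: str, k: int = 3):
        # Placeholder retrieval; a real system would match templates to the
        # task semantically before instantiating their parameters.
        return self.store[level][:k]

def stratified_group_rollouts(policy, task, memory, group_size=8):
    """Sample one rollout group (as in group-based RL) while stratifying
    the amount of memory guidance across its trajectories."""
    levels = [None, "subtask_skill", "workflow"]  # None = unguided
    rollouts = []
    for i in range(group_size):
        level = levels[i % len(levels)]  # vary guidance within the group
        guidance = memory.retrieve(task, level) if level else []
        rollouts.append(policy.rollout(task, guidance=guidance))  # hypothetical API
    return rollouts  # advantages are then computed within the group
```

Because only some trajectories in each group receive guidance, group-relative advantages can reward the unguided policy for reproducing guided behavior, which is the internalization effect the abstract describes.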
Related papers
- SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning [83.98129545309277]
We propose SkillRL, a framework that bridges the gap between raw experience and policy improvement. Our approach introduces an experience-based distillation mechanism to build a hierarchical skill library, SkillBank. Experimental results on ALF, WebShop and seven search-augmented tasks demonstrate that SkillRL achieves state-of-the-art performance.
arXiv Detail & Related papers (2026-02-09T03:17:17Z)
- Darwinian Memory: A Training-Free Self-Regulating Memory System for GUI Agent Evolution [18.68532215387754]
Multimodal Large Language Model (MLLM) agents facilitate Graphical User Interface (GUI) automation but struggle with long-horizon, cross-application tasks. Existing paradigms fail to adapt to dynamic GUI environments, suffering from a mismatch between high-level intent and low-level execution. We propose the Darwinian Memory System (DMS), a self-evolving architecture that constructs memory as a dynamic ecosystem governed by the law of survival of the fittest.
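The "survival of the fittest" framing suggests a memory pool governed by fitness-based selection and eviction. A minimal sketch of that idea follows; the fitness update, decay, and capacity rules are assumptions, not DMS's actual design.

```python
# Hypothetical sketch of a fitness-governed memory pool in the spirit of the
# Darwinian Memory System: useful entries survive, unused ones die out.
# The scoring rule and capacity policy are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class MemoryEntry:
    content: str
    fitness: float = 1.0

class DarwinianPool:
    def __init__(self, capacity: int = 100, decay: float = 0.95):
        self.capacity = capacity
        self.decay = decay
        self.entries: list[MemoryEntry] = []

    def add(self, content: str):
        self.entries.append(MemoryEntry(content))
        self._select()

    def reinforce(self, entry: MemoryEntry, task_succeeded: bool):
        # Reward entries whose retrieval preceded a successful episode.
        entry.fitness += 1.0 if task_succeeded else -0.5

    def step(self):
        # Fitness decays every cycle, so experiences that stop being
        # retrieved (or stop helping) are eventually selected out.
        for e in self.entries:
            e.fitness *= self.decay
        self._select()

    def _select(self):
        # Keep only the fittest entries up to capacity.
        self.entries.sort(key=lambda e: e.fitness, reverse=True)
        self.entries = self.entries[: self.capacity]
```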
arXiv Detail & Related papers (2026-01-30T04:01:21Z)
- Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory [89.65731902036669]
Evo-Memory is a streaming benchmark and framework for evaluating self-evolving memory in large language model (LLM) agents. We evaluate over ten representative memory modules across 10 diverse multi-turn goal-oriented and single-turn reasoning and QA datasets.
arXiv Detail & Related papers (2025-11-25T21:08:07Z)
- SCoPE VLM: Selective Context Processing for Efficient Document Navigation in Vision-Language Models [0.0]
Understanding long-context visual information remains a fundamental challenge for vision-language models. We propose SCoPE VLM, a document navigation expert that leverages a novel Chain of Scroll mechanism. SCoPE VLM is the first framework to explicitly model agentic reading patterns in multi-page document question answering.
arXiv Detail & Related papers (2025-10-22T17:47:12Z)
- Memo: Training Memory-Efficient Embodied Agents with Reinforcement Learning [53.72709564555407]
Memo is a transformer-based architecture and training recipe for reinforcement learning. It incorporates the creation and retrieval of memory by interleaving periodic summarization tokens with the model's inputs during training. We demonstrate Memo's effectiveness on a gridworld meta-RL benchmark and a multi-object navigation task in photo-realistic indoor settings.
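The mechanism named here, interleaving periodic summarization tokens with the model's inputs, can be pictured with a short sketch. The sentinel token, period, and counts below are illustrative assumptions, not Memo's actual recipe.

```python
# Hypothetical sketch of interleaving summarization tokens with an input
# stream, as Memo's summary describes. The sentinel id and the period are
# illustrative assumptions; the paper's training recipe may differ.
SUMMARY_TOKEN = -1  # sentinel id standing in for a learned <summarize> token

def interleave_summary_tokens(obs_tokens: list[int], period: int = 8,
                              n_summary: int = 2) -> list[int]:
    """Insert n_summary summarization tokens after every `period` inputs.
    The model can learn to compress the preceding history into the hidden
    states at these positions, which are later retrieved as memory."""
    out: list[int] = []
    for i, tok in enumerate(obs_tokens, start=1):
        out.append(tok)
        if i % period == 0:
            out.extend([SUMMARY_TOKEN] * n_summary)
    return out

# Example: a 20-step observation stream gains summary slots every 8 steps.
seq = interleave_summary_tokens(list(range(20)))
```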
arXiv Detail & Related papers (2025-10-22T16:24:47Z)
- LLM-Driven Policy Diffusion: Enhancing Generalization in Offline Reinforcement Learning [23.628360655654507]
Reinforcement Learning (RL) is known for its strong decision-making capabilities and has been widely applied in various real-world scenarios. Due to the limitations of offline data, RL agents often struggle to generalize to new tasks or environments. We propose LLM-Driven Policy Diffusion (LLMDPD), a novel approach that enhances generalization in offline RL using task-specific prompts.
arXiv Detail & Related papers (2025-08-30T04:02:33Z)
- MobileGUI-RL: Advancing Mobile GUI Agent through Reinforcement Learning in Online Environment [63.62778707277929]
MobileGUI-RL is a scalable framework that trains GUI agents in online environments. It synthesizes a curriculum of learnable tasks through self-exploration and filtering. It adapts GRPO to GUI navigation with trajectory-aware advantages and composite rewards.
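As a rough illustration of the GRPO adaptation mentioned here, the sketch below computes group-relative advantages over a composite reward. The reward terms, weights, and trajectory fields are assumptions; MobileGUI-RL's trajectory-aware formulation is more involved.

```python
# Hypothetical sketch of GRPO-style advantages over one rollout group with a
# composite reward, loosely following MobileGUI-RL's summary. Reward terms,
# weights, and the normalization details are illustrative assumptions.
import statistics

def composite_reward(traj: dict) -> float:
    # e.g. task success plus shaping for progress, minus a step penalty
    return (1.0 * traj["success"]
            + 0.3 * traj["progress"]      # fraction of subgoals reached
            - 0.01 * traj["num_steps"])   # discourage overly long episodes

def group_relative_advantages(group: list[dict]) -> list[float]:
    """GRPO normalizes each trajectory's reward against the mean and std of
    its own rollout group, avoiding a separate value network."""
    rewards = [composite_reward(t) for t in group]
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]

group = [{"success": 1, "progress": 1.0, "num_steps": 12},
         {"success": 0, "progress": 0.5, "num_steps": 30}]
advantages = group_relative_advantages(group)
```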
arXiv Detail & Related papers (2025-07-08T07:07:53Z)
- Reinforcement Learning for Dynamic Memory Allocation [0.09960699557848594]
We present a framework in which an RL agent continuously learns from interactions with the system to improve memory management tactics. Our results show that RL can successfully train agents that match and surpass traditional allocation strategies. We also explore the potential of history-aware policies that leverage previous allocation requests to enhance the allocator's ability to handle complex request patterns.
arXiv Detail & Related papers (2024-10-20T20:13:46Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can outperform the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
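The adaptive scheme itself is not detailed in this summary, but the general idea of data-driven action quantization can be illustrated: fit discretization bins to the actions present in the offline dataset instead of imposing a uniform grid. The per-dimension k-means below is one generic such scheme, offered as a sketch rather than the paper's method.

```python
# Generic sketch of adaptive action discretization for offline RL: learn bin
# centers from the dataset's action distribution (per-dimension k-means)
# instead of using a uniform grid. Illustrative only; the paper's scheme and
# its integration with IQL/CQL/BRAC differ in detail.
import numpy as np

def fit_adaptive_bins(actions: np.ndarray, n_bins: int = 16,
                      iters: int = 20) -> np.ndarray:
    """actions: (N, dims) array; returns (dims, n_bins) learned centers."""
    centers = []
    for d in range(actions.shape[1]):
        x = actions[:, d]
        c = np.quantile(x, np.linspace(0, 1, n_bins))  # initialize at quantiles
        for _ in range(iters):
            idx = np.abs(x[:, None] - c[None, :]).argmin(axis=1)
            for k in range(n_bins):
                if np.any(idx == k):  # leave empty bins where they are
                    c[k] = x[idx == k].mean()
        centers.append(c)
    return np.stack(centers)

def quantize(action: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Snap each action dimension to its nearest learned bin center."""
    return np.array([centers[d, np.abs(centers[d] - a).argmin()]
                     for d, a in enumerate(action)])
```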
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interaction between the agent and the environment.
We propose a new method to address this, using unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resilience to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)