Mobile-R1: Towards Interactive Reinforcement Learning for VLM-Based Mobile Agent via Task-Level Rewards
- URL: http://arxiv.org/abs/2506.20332v2
- Date: Fri, 27 Jun 2025 05:38:24 GMT
- Title: Mobile-R1: Towards Interactive Reinforcement Learning for VLM-Based Mobile Agent via Task-Level Rewards
- Authors: Jihao Gu, Qihang Ai, Yingyao Wang, Pi Bu, Jingxuan Xing, Zekun Zhu, Wei Jiang, Ziming Wang, Yingxiu Zhao, Ming-Liang Zhang, Jun Song, Yuning Jiang, Bo Zheng
- Abstract summary: We introduce Mobile-R1, which employs interactive multi-turn reinforcement learning with task-level rewards for mobile agents. Our training framework consists of three stages: initial format finetuning, single-step online training via action-level reward, followed by online training via task-level reward based on multi-turn trajectories. This strategy is designed to enhance the exploration and error correction capabilities of Mobile-R1, leading to significant performance improvements.
- Score: 29.467847768352282
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision-language model-based mobile agents have gained the ability not only to understand complex instructions and mobile screenshots, but also to optimize their action outputs via thinking and reasoning, benefiting from reinforcement learning methods such as Group Relative Policy Optimization (GRPO). However, existing research centers on offline reinforcement learning or on online optimization using action-level rewards, which limits the agent's dynamic interaction with the environment. This often results in agents settling into local optima, weakening their capacity for exploration and for correcting erroneous actions. To address these challenges, we introduce Mobile-R1, an approach that employs interactive multi-turn reinforcement learning with task-level rewards for mobile agents. Our training framework consists of three stages: initial format finetuning, single-step online training via action-level rewards, and online training via task-level rewards based on multi-turn trajectories. This strategy is designed to enhance the exploration and error-correction capabilities of Mobile-R1, leading to significant performance improvements. Moreover, we have collected a dataset covering 28 Chinese applications with 24,521 high-quality manual annotations and established a new benchmark with 500 trajectories. We will open-source all resources, including the dataset, benchmark, model weights, and code: https://mobile-r1.github.io/Mobile-R1/.
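The abstract names GRPO and contrasts action-level with task-level rewards but gives no formulas. Below is a minimal sketch of how a task-level reward and a GRPO-style group-relative advantage could fit together; it is an illustration under stated assumptions (the binary reward, the group of rollouts, and the eps smoothing term are mine), not the authors' implementation.

```python
import statistics

def task_level_reward(goal_reached: bool) -> float:
    """One scalar reward per multi-turn trajectory: credit is given only
    for completing the whole task, not for each UI action. This binary
    form is an assumption; the abstract does not give the exact reward."""
    return 1.0 if goal_reached else 0.0

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style group-relative advantage: normalize each rollout's
    reward against the mean and standard deviation of the group of
    rollouts sampled for the same instruction. The eps term is an
    assumption that avoids division by zero when every rollout in the
    group earns the same reward."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical group of four multi-turn rollouts for one instruction:
# two reach the goal, two do not.
rewards = [task_level_reward(ok) for ok in (True, False, True, False)]
print(group_relative_advantages(rewards))  # successes get positive advantage
```

In stage two of the paper's pipeline, a normalization of this kind would presumably apply to per-step action-level rewards; stage three scores whole multi-turn trajectories as above, which is what the abstract credits for the improved exploration and error correction.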
Related papers
- From Seeing to Experiencing: Scaling Navigation Foundation Models with Reinforcement Learning [59.88543114325153]
We introduce the Seeing-to-Experiencing framework to scale the capability of navigation foundation models with reinforcement learning. S2E combines the strengths of pre-training on videos and post-training through RL. We establish a comprehensive end-to-end evaluation benchmark, NavBench-GS, built on photorealistic 3DGS reconstructions of real-world scenes.
arXiv Detail & Related papers (2025-07-29T17:26:10Z)
- Multi-Agent Reinforcement Learning for Dynamic Mobility Resource Allocation with Hierarchical Adaptive Grouping [2.167718390410225]
We propose a novel multi-agent reinforcement learning approach named Hierarchical Adaptive Grouping-based Sharing (HAG-PS) for dynamic mobility resource allocation. HAG-PS aims to address two important research challenges regarding multi-agent reinforcement learning for mobility resource allocation. We have performed extensive experimental studies based on real-world NYC bike-sharing data and demonstrated the superior performance of HAG-PS compared with other baseline approaches.
arXiv Detail & Related papers (2025-07-27T18:40:04Z)
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise rewards to optimize the agent's reinforcement learning process. We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- Continuously Improving Mobile Manipulation with Autonomous Real-World RL [33.085671103158866]
We present a fully autonomous real-world RL framework for mobile manipulation that can learn policies without extensive instrumentation or human supervision.
This is enabled by task-relevant autonomy, which guides exploration towards object interactions and prevents stagnation near goal states.
We demonstrate that our approach allows Spot robots to continually improve their performance on a set of four challenging mobile manipulation tasks.
arXiv Detail & Related papers (2024-09-30T17:59:50Z)
- PLANRL: A Motion Planning and Imitation Learning Framework to Bootstrap Reinforcement Learning [13.564676246832544]
We introduce PLANRL, a framework that chooses when the robot should use classical motion planning and when it should learn a policy.
PLANRL switches between two modes of operation: reaching a waypoint using classical techniques when away from the objects and fine-grained manipulation control when about to interact with objects.
We evaluate our approach across multiple challenging simulation environments and real-world tasks, demonstrating superior performance in terms of adaptability, efficiency, and generalization compared to existing methods.
arXiv Detail & Related papers (2024-08-07T19:30:08Z)
- Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning [58.3994826169858]
We introduce RoboFuME, a reset-free fine-tuning system for robotic reinforcement learning.
Our key insight is to utilize offline reinforcement learning techniques to enable efficient online fine-tuning of a pre-trained policy.
Our method can incorporate data from an existing robot dataset and improve on a target task within as little as 3 hours of autonomous real-world experience.
arXiv Detail & Related papers (2023-10-23T17:50:08Z)
- Model-Based Reinforcement Learning with Multi-Task Offline Pretraining [59.82457030180094]
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task.
The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure task relevance.
We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
arXiv Detail & Related papers (2023-06-06T02:24:41Z)
- Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving [103.745551954983]
In this paper, we investigate the transfer performance of various types of self-supervised methods, including MoCo and SimCLR, on three downstream tasks.
We find that their performance is sub-optimal, or even lags far behind the single-task baseline.
We propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training.
arXiv Detail & Related papers (2022-09-19T12:15:31Z)
- ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for Mobile Manipulation [99.2543521972137]
ReLMoGen is a framework that combines a learned policy to predict subgoals and a motion generator to plan and execute the motion needed to reach these subgoals.
Our method is benchmarked on a diverse set of seven robotics tasks in photo-realistic simulation environments.
ReLMoGen shows outstanding transferability between different motion generators at test time, indicating great potential to transfer to real robots.
arXiv Detail & Related papers (2020-08-18T08:05:15Z)
- On Reward Shaping for Mobile Robot Navigation: A Reinforcement Learning and SLAM Based Approach [7.488722678999039]
We present a map-less path planning algorithm based on Deep Reinforcement Learning (DRL) for mobile robots navigating in unknown environments.
The planner is trained using a reward function shaped based on the online knowledge of the map of the training environment.
The policy trained in the simulation environment can be directly and successfully transferred to the real robot.
arXiv Detail & Related papers (2020-02-10T22:00:16Z)
- MDLdroid: a ChainSGD-reduce Approach to Mobile Deep Learning for Personal Mobile Sensing [14.574274428615666]
Running deep learning on devices offers several advantages, including data privacy preservation and low-latency response for both model robustness and model updates.
Personal mobile sensing applications are mostly user-specific and highly affected by the environment.
We present MDLdroid, a novel decentralized mobile deep learning framework to enable resource-aware on-device collaborative learning.
arXiv Detail & Related papers (2020-02-07T16:55:21Z)