DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents
- URL: http://arxiv.org/abs/2410.14803v3
- Date: Tue, 12 Nov 2024 14:57:08 GMT
- Title: DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents
- Authors: Taiyi Wang, Zhihao Wu, Jianheng Liu, Jianye Hao, Jun Wang, Kun Shao
- Abstract summary: DistRL is a novel framework designed to enhance the efficiency of online RL fine-tuning for mobile device control agents.
On average, DistRL delivers a 3X improvement in training efficiency and enables training data collection 2.4X faster than the leading synchronous multi-machine methods.
- Score: 38.0441002097771
- Abstract: On-device control agents, especially on mobile devices, are responsible for operating mobile devices to fulfill users' requests, enabling seamless and intuitive interactions. Integrating Multimodal Large Language Models (MLLMs) into these agents enhances their ability to understand and execute complex commands, thereby improving user experience. However, fine-tuning MLLMs for on-device control presents significant challenges due to limited data availability and inefficient online training processes. This paper introduces DistRL, a novel framework designed to enhance the efficiency of online RL fine-tuning for mobile device control agents. DistRL employs centralized training and decentralized data acquisition to ensure efficient fine-tuning in the context of dynamic online interactions. Additionally, the framework is backed by our tailor-made RL algorithm, which effectively balances exploration with the prioritized utilization of collected data to ensure stable and robust training. Our experiments show that, on average, DistRL delivers a 3X improvement in training efficiency and enables training data collection 2.4X faster than the leading synchronous multi-machine methods. Notably, after training, DistRL achieves a 20% relative improvement in success rate compared to state-of-the-art methods on general Android tasks from an open benchmark, significantly outperforming existing approaches while maintaining the same training time. These results validate DistRL as a scalable and efficient solution, offering substantial improvements in both training efficiency and agent performance for real-world, in-the-wild device control tasks.
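The abstract's pairing of centralized training with decentralized, asynchronous data acquisition can be illustrated with a toy sketch. This is not DistRL's implementation: `Trajectory`, `CentralLearner`, `worker`, and `run` are hypothetical names, threads stand in for separate machines, the reward is a dummy value, and the "update" only bumps a version counter instead of taking a gradient step. The prioritized use of collected data that the abstract mentions is not modeled.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Trajectory:
    worker_id: int
    policy_version: int  # version of the policy that produced this data
    reward: float


class CentralLearner:
    """Hypothetical central trainer: owns the only copy of the policy version."""

    def __init__(self):
        self.policy_version = 0
        self.updates = 0
        self._lock = threading.Lock()

    def current_version(self):
        with self._lock:
            return self.policy_version

    def update(self, batch):
        # Placeholder for a real gradient step on the batch.
        with self._lock:
            self.policy_version += 1
            self.updates += 1


def worker(worker_id, learner, out_queue, episodes):
    """Collects episodes without waiting for other workers or for the learner."""
    for episode in range(episodes):
        version = learner.current_version()  # may already be stale when consumed
        reward = float((worker_id + episode) % 2)  # dummy success signal
        out_queue.put(Trajectory(worker_id, version, reward))


def run(num_workers=4, episodes_per_worker=5, batch_size=2):
    learner = CentralLearner()
    buffer = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(i, learner, buffer, episodes_per_worker))
        for i in range(num_workers)
    ]
    for t in threads:
        t.start()

    total = num_workers * episodes_per_worker
    consumed, batch = 0, []
    while consumed < total:
        batch.append(buffer.get())  # blocks until some worker delivers data
        consumed += 1
        if len(batch) == batch_size:
            learner.update(batch)
            batch = []

    for t in threads:
        t.join()
    return learner, consumed


if __name__ == "__main__":
    learner, consumed = run()
    print(f"consumed={consumed} updates={learner.updates}")
```

Because workers never block on one another, a slow worker delays only its own episodes. The cost is that a trajectory may have been generated by an older policy version than the one currently being trained, which is why each record carries its `policy_version`.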
Related papers
- DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning [61.10299147201369]
This paper introduces a novel autonomous RL approach, called DigiRL, for training in-the-wild device control agents.
We build a scalable and parallelizable Android learning environment equipped with a VLM-based evaluator.
We demonstrate the effectiveness of DigiRL using the Android-in-the-Wild dataset, where our 1.3B VLM trained with RL achieves a 49.5% absolute improvement.
arXiv Detail & Related papers (2024-06-14T17:49:55Z)
- M2CURL: Sample-Efficient Multimodal Reinforcement Learning via Self-Supervised Representation Learning for Robotic Manipulation [0.7564784873669823]
We propose Multimodal Contrastive Unsupervised Reinforcement Learning (M2CURL).
Our approach employs a novel multimodal self-supervised learning technique that learns efficient representations and contributes to faster convergence of RL algorithms.
We evaluate M2CURL on the Tactile Gym 2 simulator and we show that it significantly enhances the learning efficiency in different manipulation tasks.
arXiv Detail & Related papers (2024-01-30T14:09:35Z)
- Grow Your Limits: Continuous Improvement with Real-World RL for Robotic Locomotion [66.69666636971922]
We present APRL, a policy regularization framework that modulates the robot's exploration over the course of training.
APRL enables a quadrupedal robot to efficiently learn to walk entirely in the real world within minutes.
arXiv Detail & Related papers (2023-10-26T17:51:46Z)
- Transfer of Reinforcement Learning-Based Controllers from Model- to Hardware-in-the-Loop [1.8218298349840023]
Reinforcement Learning has great potential for autonomously training agents to perform complex control tasks.
To use RL effectively in embedded system function development, the generated agents must be able to handle real-world applications.
This work focuses on accelerating the training process of RL agents by combining Transfer Learning (TL) and X-in-the-Loop (XiL) simulation.
arXiv Detail & Related papers (2023-10-25T09:13:12Z)
- Hybrid Reinforcement Learning for Optimizing Pump Sustainability in Real-World Water Distribution Networks [55.591662978280894]
This article addresses the pump-scheduling optimization problem to enhance real-time control of real-world water distribution networks (WDNs).
Our primary objectives are to adhere to physical operational constraints while reducing energy consumption and operational costs.
Traditional optimization techniques, such as evolution-based and genetic algorithms, often fall short due to their lack of convergence guarantees.
arXiv Detail & Related papers (2023-10-13T21:26:16Z)
- Digital Twin Assisted Deep Reinforcement Learning for Online Admission Control in Sliced Network [19.152875040151976]
We propose a digital twin (DT) accelerated DRL solution to address this issue.
A neural network-based DT is established with a customized output layer for queuing systems, trained through supervised learning, and then employed to assist the training phase of the DRL model.
Extensive simulations show that the DT-accelerated DRL improves resource utilization by over 40% compared to the directly trained state-of-the-art dueling deep Q-learning model.
arXiv Detail & Related papers (2023-10-07T09:09:19Z)
- A Real-World Quadrupedal Locomotion Benchmark for Offline Reinforcement Learning [27.00483962026472]
We benchmark 11 offline reinforcement learning (ORL) algorithms on a realistic quadrupedal locomotion dataset.
Experiments show that the best-performing ORL algorithms can achieve competitive performance compared with model-free RL.
Our proposed benchmark will serve as a development platform for testing and evaluating the performance of ORL algorithms in real-world legged locomotion tasks.
arXiv Detail & Related papers (2023-09-13T13:18:29Z)
- Train a Real-world Local Path Planner in One Hour via Partially Decoupled Reinforcement Learning and Vectorized Diversity [8.068886870457561]
Deep Reinforcement Learning (DRL) has exhibited efficacy in resolving the Local Path Planning (LPP) problem.
However, its real-world application is severely limited by the poor training efficiency and generalization capability of DRL.
A solution named Color is proposed, which consists of an Actor-Sharer-Learner (ASL) training framework and a mobile robot-oriented simulator Sparrow.
arXiv Detail & Related papers (2023-05-07T03:39:31Z)
- DL-DRL: A double-level deep reinforcement learning approach for large-scale task scheduling of multi-UAV [65.07776277630228]
We propose a double-level deep reinforcement learning (DL-DRL) approach based on a divide and conquer framework (DCF).
Particularly, we design an encoder-decoder structured policy network in our upper-level DRL model to allocate the tasks to different UAVs.
We also use an attention-based policy network in our lower-level DRL model to construct the route for each UAV, with the objective of maximizing the number of executed tasks.
arXiv Detail & Related papers (2022-08-04T04:35:53Z)
- AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [84.94748183816547]
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience.
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
arXiv Detail & Related papers (2020-06-16T17:54:41Z)