Train a Real-world Local Path Planner in One Hour via Partially
Decoupled Reinforcement Learning and Vectorized Diversity
- URL: http://arxiv.org/abs/2305.04180v2
- Date: Wed, 17 Jan 2024 08:37:45 GMT
- Authors: Jinghao Xin, Jinwoo Kim, Zhi Li, and Ning Li
- Abstract summary: Deep Reinforcement Learning (DRL) has exhibited efficacy in resolving the Local Path Planning (LPP) problem.
However, its real-world application is severely limited by the deficient training efficiency and generalization capability of DRL.
A solution named Color is proposed, consisting of an Actor-Sharer-Learner (ASL) training framework and a mobile-robot-oriented simulator, Sparrow.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Reinforcement Learning (DRL) has exhibited efficacy in resolving the
Local Path Planning (LPP) problem. However, its real-world application is
severely limited by the deficient training efficiency and generalization
capability of DRL. To alleviate these two issues, a solution named Color is
proposed, consisting of an Actor-Sharer-Learner (ASL) training framework and a
mobile-robot-oriented simulator, Sparrow. Specifically, ASL aims to improve the
training efficiency of DRL algorithms. It employs a Vectorized Data Collection
(VDC) mode to expedite data acquisition, decouples data collection from model
optimization via multithreading, and partially connects the two procedures
through a Time Feedback Mechanism (TFM) that avoids both data underuse and
overuse. Meanwhile, the Sparrow simulator achieves a lightweight design through
a 2D grid-based world, simplified kinematics, and conversion-free data flow.
This lightness facilitates vectorized diversity, allowing diversified
simulation setups across extensive copies of the vectorized environments and
thereby notably enhancing the generalization capability of the DRL algorithm
being trained. Comprehensive experiments, comprising 57 DRL benchmark
environments as well as 32 simulated and 36 real-world LPP scenarios,
corroborate the superiority of our method in terms of efficiency and
generalization. The code and video for this paper are available at
https://github.com/XinJingHao/Color.
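The partially decoupled training loop is the paper's central systems idea: collection and optimization run concurrently, with the Time Feedback Mechanism keeping their rates matched. Below is a minimal Python sketch of that pattern; all names and constants (TARGET_REUSE, the toy transitions, the sleep intervals) are illustrative assumptions, not the actual Color/ASL API.

```python
# Minimal sketch of an Actor-Sharer-Learner-style loop: a collection thread
# and an optimization thread share a buffer, and a time-feedback rule
# throttles whichever side runs too fast. Names are hypothetical.
import random
import threading
import time
from collections import deque

BUFFER = deque(maxlen=100_000)   # shared experience pool (the "Sharer")
LOCK = threading.Lock()
STATS = {"collected": 0, "consumed": 0}
TARGET_REUSE = 4.0               # hypothetical target: sample each transition ~4x
STOP = threading.Event()

def reuse_ratio() -> float:
    return STATS["consumed"] / max(1, STATS["collected"])

def actor(num_envs: int = 16) -> None:
    """Vectorized data collection: each step yields num_envs transitions."""
    while not STOP.is_set():
        step = [(random.random(), random.random()) for _ in range(num_envs)]
        with LOCK:
            BUFFER.extend(step)
            STATS["collected"] += num_envs
        if reuse_ratio() < TARGET_REUSE:   # data underused -> slow collection
            time.sleep(0.001)

def learner(batch_size: int = 64) -> None:
    """Model optimization, decoupled from collection via its own thread."""
    while not STOP.is_set():
        with LOCK:
            batch = (random.sample(list(BUFFER), batch_size)
                     if len(BUFFER) >= batch_size else None)
            if batch:
                STATS["consumed"] += batch_size
        if batch is None:
            time.sleep(0.01)
            continue
        _ = sum(x + y for x, y in batch)   # stand-in for a gradient step
        if reuse_ratio() > TARGET_REUSE:   # data overused -> slow optimization
            time.sleep(0.001)

threads = [threading.Thread(target=actor), threading.Thread(target=learner)]
for t in threads:
    t.start()
time.sleep(1.0)
STOP.set()
for t in threads:
    t.join()
print(f"reuse ratio ~= {reuse_ratio():.2f} (target {TARGET_REUSE})")
```

The throttle is deliberately symmetric: whichever thread pushes the sample-reuse ratio away from the target backs off, so neither stale data (overuse) nor wasted data (underuse) dominates.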
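Vectorized diversity is the complementary idea on the simulator side: because each Sparrow copy is lightweight, every environment in the vectorized batch can be given its own setup. The toy batched environment below illustrates the principle; the parameter names (speed_scale, ctrl_dt, obstacle_density) are invented for illustration and do not reflect Sparrow's actual interface.

```python
# Toy illustration of vectorized diversity: every copy of a lightweight
# 2D environment draws its own kinematic and map parameters, so a single
# batched rollout spans many simulation variants. Names are hypothetical.
import numpy as np

class DiverseVectorEnv:
    def __init__(self, num_envs: int = 64, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.num_envs = num_envs
        # Per-copy diversity: kinematic scale, control interval, map density.
        self.speed_scale = rng.uniform(0.5, 1.5, size=num_envs)
        self.ctrl_dt = rng.uniform(0.05, 0.2, size=num_envs)
        self.obstacle_density = rng.uniform(0.05, 0.3, size=num_envs)
        self.pos = np.zeros((num_envs, 2))

    def step(self, actions: np.ndarray):
        """Batched, conversion-free step: one NumPy call advances all copies
        under simplified kinematics (velocity integrated directly)."""
        self.pos += actions * (self.speed_scale * self.ctrl_dt)[:, None]
        rewards = -np.linalg.norm(self.pos, axis=1)  # toy objective
        dones = np.zeros(self.num_envs, dtype=bool)  # collision checks omitted
        return self.pos.copy(), rewards, dones

env = DiverseVectorEnv()
obs, rew, done = env.step(np.ones((env.num_envs, 2)))
print(obs.shape, rew.shape)   # (64, 2) (64,)
```

A policy trained against such a batch never sees a single fixed dynamics, which is the mechanism the abstract credits for the improved sim-to-real generalization.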
Related papers
- DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents [38.0441002097771]
DistRL is a novel framework designed to enhance the efficiency of online RL fine-tuning for mobile device control agents.
On average, DistRL delivers a 3X improvement in training efficiency and collects training data 2.4X faster than leading synchronous multi-machine methods.
arXiv Detail & Related papers (2024-10-18T18:19:56Z)
- Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers.
Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy.
We find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
arXiv Detail & Related papers (2024-09-12T11:50:06Z)
- D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments.
Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
arXiv Detail & Related papers (2024-08-15T22:27:00Z)
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
- M2CURL: Sample-Efficient Multimodal Reinforcement Learning via Self-Supervised Representation Learning for Robotic Manipulation [0.7564784873669823]
We propose Multimodal Contrastive Unsupervised Reinforcement Learning (M2CURL).
Our approach employs a novel multimodal self-supervised learning technique that learns efficient representations and contributes to faster convergence of RL algorithms.
We evaluate M2CURL in the Tactile Gym 2 simulator and show that it significantly enhances learning efficiency across different manipulation tasks.
arXiv Detail & Related papers (2024-01-30T14:09:35Z)
- Transfer of Reinforcement Learning-Based Controllers from Model- to Hardware-in-the-Loop [1.8218298349840023]
Reinforcement Learning has great potential for autonomously training agents to perform complex control tasks.
To use RL effectively in embedded system function development, the generated agents must be able to handle real-world applications.
This work focuses on accelerating the training process of RL agents by combining Transfer Learning (TL) and X-in-the-Loop (XiL) simulation.
arXiv Detail & Related papers (2023-10-25T09:13:12Z)
- Personalized Federated Deep Reinforcement Learning-based Trajectory Optimization for Multi-UAV Assisted Edge Computing [22.09756306579992]
UAVs can serve as intelligent servers in edge computing environments, optimizing their flight trajectories to maximize communication system throughput.
Deep reinforcement learning (DRL)-based trajectory optimization algorithms may suffer from poor training performance due to intricate terrain features and inadequate training data.
This work proposes a novel solution, namely personalized federated deep reinforcement learning (PF-DRL), for multi-UAV trajectory optimization.
arXiv Detail & Related papers (2023-09-05T12:54:40Z)
- Accelerated Policy Learning with Parallel Differentiable Simulation [59.665651562534755]
We present a differentiable simulator and a new policy learning algorithm (SHAC).
Our algorithm alleviates problems with local minima through a smooth critic function.
We show substantial improvements in sample efficiency and wall-clock time over state-of-the-art RL and differentiable simulation-based algorithms.
arXiv Detail & Related papers (2022-04-14T17:46:26Z)
- Dynamic Network-Assisted D2D-Aided Coded Distributed Learning [59.29409589861241]
We propose a novel device-to-device (D2D)-aided coded federated learning method (D2D-CFL) for load balancing across devices.
We derive an optimal compression rate for achieving minimum processing time and establish its connection with the convergence time.
Our proposed method is beneficial for real-time collaborative applications, where the users continuously generate training data.
arXiv Detail & Related papers (2021-11-26T18:44:59Z)
- POAR: Efficient Policy Optimization via Online Abstract State Representation Learning [6.171331561029968]
State Representation Learning (SRL) learns to encode task-relevant features from complex sensory data into low-dimensional states.
We introduce a new SRL prior called domain resemblance to leverage expert demonstration to improve SRL interpretations.
We empirically verify POAR to efficiently handle tasks in high dimensions and facilitate training real-life robots directly from scratch.
arXiv Detail & Related papers (2021-09-17T16:52:03Z)
- AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [84.94748183816547]
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience.
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
arXiv Detail & Related papers (2020-06-16T17:54:41Z)
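For readers comparing against this last entry: the core of AWAC is an advantage-weighted policy update over mixed offline and online data. Below is a compact, hedged sketch of that loss, simplified from the paper; the temperature, clip value, and toy inputs are illustrative choices, not the authors' exact implementation.

```python
# Sketch of an AWAC-style advantage-weighted policy loss. The temperature
# (lam), clip constant, and toy data here are illustrative assumptions.
import numpy as np

def awac_policy_loss(log_probs, advantages, lam=1.0):
    """Weighted behavioral cloning: buffer actions with higher critic
    advantage receive exponentially larger imitation weight."""
    weights = np.minimum(np.exp(advantages / lam), 20.0)  # clip for stability
    return -np.mean(log_probs * weights)

rng = np.random.default_rng(0)
adv = rng.normal(size=256)             # pretend critic advantages
logp = rng.normal(-1.0, 0.3, size=256) # pretend policy log-probs for buffer actions
print(awac_policy_loss(logp, adv))
```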