Sample-efficient Imitative Multi-token Decision Transformer for Real-world Driving
- URL: http://arxiv.org/abs/2407.02508v2
- Date: Fri, 04 Oct 2024 03:45:21 GMT
- Title: Sample-efficient Imitative Multi-token Decision Transformer for Real-world Driving
- Authors: Hang Zhou, Dan Xu, Yiding Ji
- Abstract summary: We propose the Sample-efficient Imitative Multi-token Decision Transformer (SimDT).
SimDT introduces multi-token prediction, an online imitative learning pipeline, and prioritized experience replay to sequence-modelling reinforcement learning.
Results exceed popular imitation and reinforcement learning algorithms in both open-loop and closed-loop settings on the Waymax benchmark.
- Score: 18.34685506480288
- Abstract: Recent advancements in autonomous driving technologies involve the capability to effectively process and learn from extensive real-world driving data. Current imitation learning and offline reinforcement learning methods have shown remarkable promise in autonomous systems, harnessing the power of offline datasets to make informed decisions in open-loop (non-reactive agents) settings. However, learning-based agents face significant challenges when transferring knowledge from open-loop to closed-loop (reactive agents) environments. Performance is significantly impacted by data distribution shift, sample inefficiency, and the complexity of uncovering hidden world models and physics. To address these issues, we propose the Sample-efficient Imitative Multi-token Decision Transformer (SimDT). SimDT introduces multi-token prediction, an online imitative learning pipeline, and prioritized experience replay to sequence-modelling reinforcement learning. The performance is evaluated through empirical experiments, and the results exceed popular imitation and reinforcement learning algorithms in both open-loop and closed-loop settings on the Waymax benchmark. SimDT exhibits a 41% reduction in collision rate and an 18% improvement in reaching the destination compared with the baseline method.
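The multi-token idea can be made concrete with a small sketch. Below is a minimal PyTorch illustration of a multi-token action head on top of a causal transformer backbone, in the spirit of SimDT's decoding; the head count, action vocabulary, and supervision scheme are illustrative assumptions, not the authors' exact architecture.

```python
# A minimal sketch of multi-token action prediction: position t is
# supervised with the next k future actions. Head count and vocabulary
# size are illustrative assumptions.
import torch
import torch.nn as nn

class MultiTokenActionHead(nn.Module):
    def __init__(self, d_model: int, action_vocab: int, k: int = 4):
        super().__init__()
        # One linear head per future action-token offset.
        self.heads = nn.ModuleList(
            nn.Linear(d_model, action_vocab) for _ in range(k)
        )

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, d_model) -> logits: (batch, seq, k, vocab)
        return torch.stack([h(hidden) for h in self.heads], dim=2)

def multi_token_loss(logits: torch.Tensor, actions: torch.Tensor, k: int = 4):
    # Supervise position t with actions a_{t+1..t+k}; truncate the tail.
    b, s, _, v = logits.shape
    loss = 0.0
    for i in range(k):
        pred = logits[:, : s - i - 1, i, :]   # predictions for offset i+1
        tgt = actions[:, i + 1 : s]           # ground-truth future actions
        loss = loss + nn.functional.cross_entropy(
            pred.reshape(-1, v), tgt.reshape(-1)
        )
    return loss / k
```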
Related papers
- Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers.
Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy.
We find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
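The core mechanism, gradients of the environment dynamics flowing into the policy, can be sketched in a few lines. The toy dynamics, horizon, and cost below are assumptions for illustration, not the paper's simulator.

```python
# A minimal sketch of analytic policy gradients (APG): because each
# simulator step is differentiable, the loss gradient flows through the
# environment dynamics into the policy parameters.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def step(state, action):
    # Differentiable toy dynamics: state = [x, y, vx, vy], force control.
    pos, vel = state[..., :2], state[..., 2:]
    vel = vel + 0.1 * action
    return torch.cat([pos + 0.1 * vel, vel], dim=-1)

state = torch.zeros(64, 4)          # batch of rollouts
goal = torch.tensor([1.0, 1.0])
loss = 0.0
for _ in range(20):                 # unrolled horizon
    state = step(state, policy(state))
    loss = loss + ((state[..., :2] - goal) ** 2).mean()
opt.zero_grad()
loss.backward()                     # gradients flow through the dynamics
opt.step()
```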
arXiv Detail & Related papers (2024-09-12T11:50:06Z)
- REBOOT: Reuse Data for Bootstrapping Efficient Real-World Dexterous Manipulation [61.7171775202833]
We introduce an efficient system for learning dexterous manipulation skills with reinforcement learning.
The main idea of our approach is the integration of recent advances in sample-efficient RL and replay buffer bootstrapping.
Our system completes the real-world training cycle by incorporating learned resets via an imitation-based pickup policy.
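Buffer bootstrapping itself is simple to sketch: transitions gathered on earlier tasks pre-fill the replay buffer for the new one. The plain-deque buffer below is an illustrative stand-in for the paper's system, not its actual interface.

```python
# A minimal sketch of replay-buffer bootstrapping: reuse prior-task
# transitions so the agent does not start from an empty buffer.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity: int = 100_000):
        self.data = deque(maxlen=capacity)

    def add(self, transition):
        self.data.append(transition)  # (obs, action, reward, next_obs, done)

    def bootstrap(self, prior_transitions):
        # Pre-fill with data reused from previous tasks or policies.
        for t in prior_transitions:
            self.add(t)

    def sample(self, batch_size: int):
        return random.sample(self.data, min(batch_size, len(self.data)))

buffer = ReplayBuffer()
buffer.bootstrap(prior_transitions=[((0.0,), 0, 0.0, (1.0,), False)])
batch = buffer.sample(32)
```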
arXiv Detail & Related papers (2023-09-06T19:05:31Z)
- Benchmarking Offline Reinforcement Learning on Real-Robot Hardware [35.29390454207064]
Dexterous manipulation in particular remains an open problem in its general form.
We propose a benchmark including a large collection of data for offline learning from a dexterous manipulation platform on two tasks.
We evaluate prominent open-sourced offline reinforcement learning algorithms on the datasets and provide a reproducible experimental setup for offline reinforcement learning on real systems.
arXiv Detail & Related papers (2023-07-28T17:29:49Z)
- Supervised Pretraining Can Learn In-Context Reinforcement Learning [96.62869749926415]
In this paper, we study the in-context learning capabilities of transformers in decision-making problems.
We introduce and study Decision-Pretrained Transformer (DPT), a supervised pretraining method where the transformer predicts an optimal action.
We find that the pretrained transformer can be used to solve a range of RL problems in-context, exhibiting both exploration online and conservatism offline.
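A minimal sketch of the pretraining objective follows, assuming a toy tokenization of (state, action, reward) triples; the dimensions, architecture, and readout are illustrative, not the paper's exact setup.

```python
# A minimal sketch of DPT-style supervised pretraining: a transformer
# conditions on an in-context dataset of interactions plus a query
# state and is trained to predict the optimal action.
import torch
import torch.nn as nn

d_model, n_actions = 64, 5
embed = nn.Linear(3, d_model)                      # token = (state, action, reward)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
    num_layers=2,
)
head = nn.Linear(d_model, n_actions)

def pretrain_step(context, query, optimal_action, opt):
    # context: (batch, T, 3) in-context interactions; query: (batch, 1, 3).
    tokens = embed(torch.cat([context, query], dim=1))
    logits = head(encoder(tokens)[:, -1])          # read out at the query token
    loss = nn.functional.cross_entropy(logits, optimal_action)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```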
arXiv Detail & Related papers (2023-06-26T17:58:50Z)
- Offline Robot Reinforcement Learning with Uncertainty-Guided Human Expert Sampling [11.751910133386254]
Recent advances in batch (offline) reinforcement learning have shown promising results in learning from available offline data.
We propose a novel approach that uses uncertainty estimation to trigger the injection of human demonstration data.
Our experiments show that this approach is more sample efficient when compared to a naive way of combining expert data with data collected from a sub-optimal agent.
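The trigger can be sketched as ensemble disagreement gating which data source feeds the next training batch; the Q-ensemble interface, threshold, and mixing rule below are assumptions for illustration.

```python
# A minimal sketch of uncertainty-triggered expert-data injection: when
# a Q-ensemble disagrees strongly on a batch of states, the next batch
# is drawn from human demonstrations instead of the agent's own data.
import torch

def pick_batch(q_ensemble, states, agent_data, expert_data, threshold=0.5):
    with torch.no_grad():
        # (n_models, batch, n_actions) Q-value predictions.
        qs = torch.stack([q(states) for q in q_ensemble])
        uncertainty = qs.std(dim=0).mean().item()   # ensemble disagreement
    # High epistemic uncertainty -> inject human demonstration data.
    return expert_data if uncertainty > threshold else agent_data
```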
arXiv Detail & Related papers (2022-12-16T01:41:59Z)
- UMBRELLA: Uncertainty-Aware Model-Based Offline Reinforcement Learning Leveraging Planning [1.1339580074756188]
Offline reinforcement learning (RL) provides a framework for learning decision-making from offline data.
Self-driving vehicles (SDVs) can learn a policy from such data that potentially even outperforms the behavior in the sub-optimal dataset.
This motivates the use of model-based offline RL approaches, which leverage planning.
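A minimal sketch of uncertainty-aware planning over a learned model: random-shooting candidates are rolled out through a dynamics ensemble and penalized where the ensemble disagrees. The models, horizon, and penalty weight are illustrative assumptions, not the paper's method in detail.

```python
# A minimal sketch of pessimistic model-based planning: score candidate
# action sequences by ensemble-mean reward minus an ensemble-disagreement
# penalty, then execute the first action of the best plan.
import torch

def plan(models, reward_fn, state, horizon=5, n_candidates=64,
         beta=1.0, act_dim=2):
    candidates = torch.randn(n_candidates, horizon, act_dim)  # random shooting
    returns = torch.zeros(n_candidates)
    for i, actions in enumerate(candidates):
        s = state.expand(len(models), -1).clone()   # one copy per model
        for a in actions:
            preds = torch.stack(
                [m(torch.cat([si, a])) for m, si in zip(models, s)]
            )
            # Mean-model reward, penalized by epistemic disagreement.
            returns[i] += reward_fn(preds.mean(0)) - beta * preds.std(0).mean()
            s = preds
    return candidates[returns.argmax(), 0]          # first action of best plan
```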
arXiv Detail & Related papers (2021-11-22T10:37:52Z)
- Efficient Robotic Manipulation Through Offline-to-Online Reinforcement Learning and Goal-Aware State Information [5.604859261995801]
We propose a unified offline-to-online RL framework that resolves the transition performance drop issue.
We introduce goal-aware state information to the RL agent, which can greatly reduce task complexity and accelerate policy learning.
Our framework achieves strong training efficiency and performance compared with state-of-the-art methods in multiple robotic manipulation tasks.
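Goal-aware state information can be as simple as appending the goal and the displacement to it onto the raw observation; the helper and field layout below are hypothetical illustrations.

```python
# A minimal sketch of goal-aware state augmentation: the policy sees
# [raw observation | goal | displacement to goal] instead of the raw
# observation alone, which can simplify the learning problem.
import numpy as np

def goal_aware_state(obs: np.ndarray, goal: np.ndarray) -> np.ndarray:
    return np.concatenate([obs, goal, goal - obs[: goal.shape[0]]])

obs = np.array([0.2, 0.1, 0.0, 0.0])      # e.g. gripper position + velocity
goal = np.array([1.0, 0.5])               # target position
state = goal_aware_state(obs, goal)       # shape (8,)
```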
arXiv Detail & Related papers (2021-10-21T05:34:25Z)
- What Matters in Learning from Offline Human Demonstrations for Robot Manipulation [64.43440450794495]
We conduct an extensive study of six offline learning algorithms for robot manipulation.
Our study analyzes the most critical challenges when learning from offline human data.
We highlight opportunities for learning from human datasets.
arXiv Detail & Related papers (2021-08-06T20:48:30Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
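The adversarial IfO recipe can be sketched as follows: a discriminator learns to separate expert state transitions from the agent's, and its output becomes the reward, so no expert actions are required. The architecture and reward shaping below are illustrative assumptions, not DEALIO's specific model-based machinery.

```python
# A minimal sketch of adversarial imitation from observation: the
# discriminator classifies (state, next_state) pairs, and the agent is
# rewarded for transitions it mistakes for expert data.
import torch
import torch.nn as nn

disc = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))

def disc_loss(expert_pairs, agent_pairs):
    # Binary classification: expert transitions -> 1, agent's -> 0.
    bce = nn.functional.binary_cross_entropy_with_logits
    return (bce(disc(expert_pairs), torch.ones(len(expert_pairs), 1)) +
            bce(disc(agent_pairs), torch.zeros(len(agent_pairs), 1)))

def ifo_reward(state, next_state):
    # Reward expert-like transitions; no expert actions are needed.
    pair = torch.cat([state, next_state], dim=-1)
    return -torch.log(1 - torch.sigmoid(disc(pair)) + 1e-8)
```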
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [84.94748183816547]
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience.
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
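The AWAC actor update reduces to advantage-weighted behavior cloning; a minimal sketch follows, where the networks, the policy's log_prob interface, and the temperature are assumptions for illustration.

```python
# A minimal sketch of the AWAC actor loss: dataset actions are cloned
# with weights proportional to the exponentiated advantage, so good
# offline/online data is imitated more strongly.
import torch

def awac_actor_loss(policy, q_fn, v_fn, states, actions, lam=1.0):
    with torch.no_grad():
        adv = q_fn(states, actions) - v_fn(states)       # advantage estimate
        weights = torch.exp(adv / lam).clamp(max=20.0)   # stabilized weights
    log_prob = policy.log_prob(states, actions)          # log pi(a|s), assumed API
    return -(weights * log_prob).mean()                  # weighted behavior cloning
```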
arXiv Detail & Related papers (2020-06-16T17:54:41Z)
- Multiplicative Controller Fusion: Leveraging Algorithmic Priors for Sample-efficient Reinforcement Learning and Safe Sim-To-Real Transfer [18.50206483493784]
We present a novel approach to model-free reinforcement learning that can leverage existing sub-optimal solutions.
During training, our gated fusion approach enables the prior to guide the initial stages of exploration.
We show the efficacy of our Multiplicative Controller Fusion approach on the task of robot navigation.
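Multiplicative fusion of two Gaussian controllers has a closed form: the product is Gaussian with a precision-weighted mean, so a confident prior dominates early exploration. A minimal sketch with illustrative numbers:

```python
# A minimal sketch of multiplicative controller fusion: the prior
# controller and the learned policy each output a Gaussian over actions,
# and their product (also Gaussian) is sampled during training.
import torch

def fuse_gaussians(mu_p, sigma_p, mu_l, sigma_l):
    # Product of two Gaussians: precision-weighted mean, combined precision.
    var_p, var_l = sigma_p ** 2, sigma_l ** 2
    var = 1.0 / (1.0 / var_p + 1.0 / var_l)
    mu = var * (mu_p / var_p + mu_l / var_l)
    return mu, var.sqrt()

mu, sigma = fuse_gaussians(
    mu_p=torch.tensor([0.5]), sigma_p=torch.tensor([0.2]),   # confident prior
    mu_l=torch.tensor([0.1]), sigma_l=torch.tensor([0.8]),   # untrained policy
)
action = torch.normal(mu, sigma)   # early on, the confident prior dominates
```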
arXiv Detail & Related papers (2020-03-11T05:12:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.