Rethinking Closed-loop Training for Autonomous Driving
- URL: http://arxiv.org/abs/2306.15713v1
- Date: Tue, 27 Jun 2023 17:58:39 GMT
- Title: Rethinking Closed-loop Training for Autonomous Driving
- Authors: Chris Zhang, Runsheng Guo, Wenyuan Zeng, Yuwen Xiong, Binbin Dai, Rui
Hu, Mengye Ren, Raquel Urtasun
- Abstract summary: We present the first empirical study which analyzes the effects of different training benchmark designs on the success of learning agents.
We propose trajectory value learning (TRAVL), an RL-based driving agent that performs planning with multistep look-ahead.
Our experiments show that TRAVL can learn much faster and produce safer maneuvers compared to all the baselines.
- Score: 82.61418945804544
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in high-fidelity simulators have enabled closed-loop training
of autonomous driving agents, potentially solving the distribution shift
between training and deployment, and allowing training to be scaled both safely and
cheaply. However, there is a lack of understanding of how to build effective
training benchmarks for closed-loop training. In this work, we present the
first empirical study which analyzes the effects of different training
benchmark designs on the success of learning agents, such as how to design
traffic scenarios and scale training environments. Furthermore, we show that
many popular RL algorithms cannot achieve satisfactory performance in the
context of autonomous driving, as they lack long-term planning and take an
extremely long time to train. To address these issues, we propose trajectory
value learning (TRAVL), an RL-based driving agent that performs planning with
multistep look-ahead and exploits cheaply generated imagined data for efficient
learning. Our experiments show that TRAVL can learn much faster and produce
safer maneuvers compared to all the baselines. For more information, visit the
project website: https://waabi.ai/research/travl
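The abstract describes TRAVL only at a high level. As a rough, hypothetical sketch of the general pattern it names (scoring whole candidate trajectories with a learned value and planning with multistep look-ahead), not the authors' implementation:

```python
import numpy as np

# Hypothetical sketch of trajectory-value planning with multistep look-ahead.
# This is NOT the paper's TRAVL implementation; it only shows the general
# pattern of scoring whole candidate trajectories and executing the best one.

def sample_candidate_trajectories(num_candidates=16, horizon=10, seed=0):
    """Sample candidate action sequences, e.g. from trajectory primitives."""
    rng = np.random.default_rng(seed)
    return rng.uniform(-1.0, 1.0, size=(num_candidates, horizon, 2))  # (steer, accel)

def trajectory_value(state, actions, weights):
    """Stand-in learned value: a linear score over simple trajectory features."""
    features = np.concatenate([state, actions.mean(axis=0)])
    return features @ weights

def plan(state, weights):
    """Multistep look-ahead: score each full candidate, return its first action."""
    candidates = sample_candidate_trajectories()
    scores = [trajectory_value(state, a, weights) for a in candidates]
    return candidates[int(np.argmax(scores))][0]

state = np.zeros(4)          # toy ego state
weights = 0.1 * np.ones(6)   # stand-in for learned value parameters
print("chosen action:", plan(state, weights))
```

The imagined data mentioned in the abstract would correspond to training the value model on trajectories the agent never executes; this sketch only shows the planning side.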
Related papers
- Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers.
Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy.
We find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
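As a minimal sketch of the analytic-policy-gradients idea, assuming a toy differentiable dynamics model in place of the paper's simulator, the loss gradient can flow through the environment step itself:

```python
import torch

# Minimal sketch of analytic policy gradients through a differentiable
# simulator. The dynamics below are a toy stand-in for the paper's simulator;
# the point is that loss.backward() differentiates through the environment step.

policy = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(),
                             torch.nn.Linear(32, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def step(state, action, dt=0.1):
    """Differentiable toy dynamics: a 1-D position/velocity integrator."""
    pos, vel = state[..., 0:1], state[..., 1:2]
    vel = vel + action * dt
    pos = pos + vel * dt
    return torch.cat([pos, vel], dim=-1)

for _ in range(200):
    state = torch.tensor([[1.0, 0.0]])               # start away from the origin
    loss = torch.zeros(())
    for _ in range(20):                              # unrolled horizon
        state = step(state, policy(state))
        loss = loss + (state[..., 0] ** 2).mean()    # track the origin
    opt.zero_grad()
    loss.backward()                                  # gradients flow through dynamics
    opt.step()
```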
arXiv Detail & Related papers (2024-09-12T11:50:06Z)
- Boosting Offline Reinforcement Learning for Autonomous Driving with Hierarchical Latent Skills [37.31853034449015]
We present a skill-based framework that enhances offline RL to overcome the long-horizon vehicle planning challenge.
Specifically, we design a variational autoencoder (VAE) to learn skills from offline demonstrations.
To mitigate the posterior collapse common in VAEs, we introduce a two-branch sequence encoder to capture both discrete options and continuous variations of the complex driving skills.
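A hedged sketch of what such a two-branch sequence encoder could look like (the module names, dimensions, and Gumbel-softmax choice below are assumptions, not the paper's exact design):

```python
import torch
import torch.nn.functional as F

# Hedged sketch of a two-branch sequence encoder: one branch infers a discrete
# skill option (here via Gumbel-softmax), the other a continuous variation
# latent. Dimensions and architecture are assumptions, not the paper's design.

class TwoBranchEncoder(torch.nn.Module):
    def __init__(self, obs_dim=10, hidden=64, num_options=8, z_dim=16):
        super().__init__()
        self.rnn = torch.nn.GRU(obs_dim, hidden, batch_first=True)
        self.option_head = torch.nn.Linear(hidden, num_options)  # discrete branch
        self.mu_head = torch.nn.Linear(hidden, z_dim)            # continuous branch
        self.logvar_head = torch.nn.Linear(hidden, z_dim)

    def forward(self, traj):
        _, h = self.rnn(traj)                        # summarize the trajectory
        h = h.squeeze(0)
        option = F.gumbel_softmax(self.option_head(h), tau=1.0, hard=True)
        mu, logvar = self.mu_head(h), self.logvar_head(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return option, z, mu, logvar

enc = TwoBranchEncoder()
option, z, mu, logvar = enc(torch.randn(4, 25, 10))  # 4 trajectories of 25 steps
print(option.shape, z.shape)
```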
arXiv Detail & Related papers (2023-09-24T11:51:17Z)
- FastRLAP: A System for Learning High-Speed Driving via Deep RL and Autonomous Practicing [71.76084256567599]
We present a system that enables an autonomous small-scale RC car to drive aggressively from visual observations using reinforcement learning (RL).
Our system, FastRLAP (faster lap), trains autonomously in the real world, without human interventions, and without requiring any simulation or expert demonstrations.
Over the course of training, the resulting policies exhibit emergent aggressive driving skills, such as timing braking and acceleration around turns and avoiding areas that impede the robot's motion, approaching the performance of a human driver using a similar first-person interface.
arXiv Detail & Related papers (2023-04-19T17:33:47Z)
- Optimizing Trajectories for Highway Driving with Offline Reinforcement Learning [11.970409518725491]
We propose a Reinforcement Learning-based approach to autonomous driving.
We compare the performance of our agent against four other highway driving agents.
We demonstrate that our agent, trained offline on randomly collected data, learns to drive smoothly, tracking the desired velocity as closely as possible while outperforming the other agents.
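The summary suggests an objective that trades velocity tracking against smoothness. A hypothetical reward of that shape (the terms and weights are illustrative assumptions, not the paper's reward):

```python
# Hypothetical reward shaping for smooth velocity tracking; the terms and
# weights are illustrative assumptions, not the paper's actual reward.

def highway_reward(velocity, accel, jerk, v_desired=30.0,
                   w_track=1.0, w_accel=0.1, w_jerk=0.05):
    """Penalize deviation from the desired velocity and harsh accelerations."""
    tracking = -w_track * abs(velocity - v_desired) / v_desired
    comfort = -w_accel * accel ** 2 - w_jerk * jerk ** 2
    return tracking + comfort

print(highway_reward(velocity=28.0, accel=0.5, jerk=0.1))
```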
arXiv Detail & Related papers (2022-03-21T13:13:08Z)
- RvS: What is Essential for Offline RL via Supervised Learning? [77.91045677562802]
Recent work has shown that supervised learning alone, without temporal difference (TD) learning, can be remarkably effective for offline RL.
In every environment suite we consider, simply maximizing likelihood with a two-layer feedforward MLP is competitive.
We also probe the limits of existing RvS methods, finding them comparatively weak on random data.
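A minimal sketch of the RvS recipe as described, outcome-conditioned behavior cloning with a small feedforward network (dimensions and data below are placeholders):

```python
import torch

# Minimal sketch of RvS-style conditioned behavior cloning: a small
# feedforward network maximizes the likelihood of dataset actions given the
# state and an outcome (e.g. return-to-go or goal). Data here are placeholders.

obs_dim, cond_dim, act_dim = 17, 1, 6
net = torch.nn.Sequential(
    torch.nn.Linear(obs_dim + cond_dim, 256), torch.nn.ReLU(),
    torch.nn.Linear(256, 256), torch.nn.ReLU(),
    torch.nn.Linear(256, act_dim),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

states = torch.randn(512, obs_dim)     # stand-in offline batch
outcomes = torch.rand(512, cond_dim)   # e.g. normalized return-to-go
actions = torch.randn(512, act_dim)    # "expert" actions to imitate

for _ in range(100):
    pred = net(torch.cat([states, outcomes], dim=-1))
    loss = ((pred - actions) ** 2).mean()  # Gaussian MLE up to a constant
    opt.zero_grad()
    loss.backward()
    opt.step()
```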
arXiv Detail & Related papers (2021-12-20T18:55:16Z)
- Generative Adversarial Imitation Learning for End-to-End Autonomous Driving on Urban Environments [0.8122270502556374]
Generative Adversarial Imitation Learning (GAIL) can train policies without requiring an explicitly defined reward function.
We show that both trained models are capable of imitating the expert trajectory from start to finish after training ends.
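For context, the core GAIL mechanism can be sketched as follows: a discriminator learns to tell expert state-action pairs from the policy's, and its output serves as a surrogate reward (toy tensors; the reward below is one common variant):

```python
import torch

# Sketch of the core GAIL mechanism: a discriminator separates expert
# state-action pairs from the policy's, and its output serves as a learned
# reward, so no hand-designed reward function is needed. Toy data throughout.

disc = torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.Tanh(),
                           torch.nn.Linear(64, 1))
opt = torch.optim.Adam(disc.parameters(), lr=1e-3)
bce = torch.nn.BCEWithLogitsLoss()

expert_sa = torch.randn(256, 8)        # placeholder expert (state, action) pairs
policy_sa = torch.randn(256, 8) + 0.5  # placeholder policy rollouts

for _ in range(100):
    logits = torch.cat([disc(expert_sa), disc(policy_sa)])
    labels = torch.cat([torch.ones(256, 1), torch.zeros(256, 1)])
    loss = bce(logits, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()

# One common surrogate reward: high where the policy looks expert-like.
reward = -torch.log(1.0 - torch.sigmoid(disc(policy_sa)) + 1e-8)
```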
arXiv Detail & Related papers (2021-10-16T15:04:13Z)
- Vision-Based Autonomous Car Racing Using Deep Imitative Reinforcement Learning [13.699336307578488]
The deep imitative reinforcement learning (DIRL) approach achieves agile autonomous racing using visual inputs.
We validate our algorithm both in a high-fidelity driving simulation and on a real-world 1/20-scale RC-car with limited onboard computation.
arXiv Detail & Related papers (2021-07-18T00:00:48Z)
- On the Theory of Reinforcement Learning with Once-per-Episode Feedback [120.5537226120512]
We introduce a theory of reinforcement learning in which the learner receives feedback only once at the end of an episode.
This is arguably more representative of real-world applications than the traditional requirement that the learner receive feedback at every time step.
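As a toy illustration of the setting (an invented two-action problem, not from the paper): the learner samples a whole episode, receives one binary signal at its end, and credits every step with that single signal:

```python
import numpy as np

# Toy illustration (invented problem, not from the paper): the learner gets a
# single binary signal at the end of each episode and credits every step of
# the episode with it, REINFORCE-style.

rng = np.random.default_rng(0)
theta = np.zeros(2)  # logits over two actions

for _ in range(2000):
    probs = np.exp(theta - theta.max())
    probs /= probs.sum()
    actions = rng.choice(2, size=5, p=probs)       # one 5-step episode
    feedback = 1.0 if actions.sum() >= 4 else 0.0  # one bit, only at the end
    for a in actions:                              # same credit for every step
        grad = -probs.copy()
        grad[a] += 1.0
        theta += 0.05 * feedback * grad

probs = np.exp(theta - theta.max())
print("learned action probabilities:", probs / probs.sum())
```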
arXiv Detail & Related papers (2021-05-29T19:48:51Z)
- AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [84.94748183816547]
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience.
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
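The heart of AWAC is an advantage-weighted actor update: dataset actions are imitated in proportion to exp(A/λ). A minimal sketch with placeholder tensors (the softmax-normalized weighting is one common variant):

```python
import torch

# Sketch of the AWAC actor update: a behavior-cloning-style loss where each
# dataset action is weighted by its exponentiated advantage exp(A / lambda),
# so good prior data is imitated more strongly. Tensors are placeholders, and
# the softmax-normalized weighting is one common variant.

obs_dim, act_dim, lam = 17, 6, 1.0
actor = torch.nn.Sequential(torch.nn.Linear(obs_dim, 256), torch.nn.ReLU(),
                            torch.nn.Linear(256, act_dim))
opt = torch.optim.Adam(actor.parameters(), lr=3e-4)

states = torch.randn(512, obs_dim)   # offline batch
actions = torch.randn(512, act_dim)
advantages = torch.randn(512)        # would come from a learned critic

weights = torch.softmax(advantages / lam, dim=0) * len(advantages)
log_lik = -((actor(states) - actions) ** 2).sum(dim=-1)  # Gaussian, fixed variance
loss = -(weights.detach() * log_lik).mean()
opt.zero_grad()
loss.backward()
opt.step()
```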
arXiv Detail & Related papers (2020-06-16T17:54:41Z)
- Accelerating Reinforcement Learning for Reaching using Continuous Curriculum Learning [6.703429330486276]
We focus on accelerating reinforcement learning (RL) training and improving the performance of multi-goal reaching tasks.
Specifically, we propose a precision-based continuous curriculum learning (PCCL) method in which the requirements are gradually adjusted during the training process.
This approach is tested using a Universal Robots UR5e in both simulation and real-world multi-goal reach experiments.
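A sketch of how such a precision-based curriculum could adjust the requirement during training (the schedule and thresholds are illustrative assumptions, not the paper's settings):

```python
# Sketch of a precision-based curriculum: the reaching tolerance starts loose
# and is tightened as the success rate improves. Schedule values are
# illustrative assumptions, not the paper's settings.

def update_tolerance(tolerance, success_rate,
                     shrink=0.9, target_rate=0.8, min_tol=0.001):
    """Tighten the goal tolerance once the agent succeeds often enough."""
    if success_rate >= target_rate:
        tolerance = max(min_tol, tolerance * shrink)
    return tolerance

tol = 0.10  # start with a loose tolerance (in meters)
for epoch, rate in enumerate([0.5, 0.7, 0.85, 0.9, 0.95]):
    tol = update_tolerance(tol, rate)
    print(f"epoch {epoch}: success={rate:.2f} -> tolerance={tol:.4f}")
```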
arXiv Detail & Related papers (2020-02-07T10:08:18Z)