Deep Surrogate Q-Learning for Autonomous Driving
- URL: http://arxiv.org/abs/2010.11278v2
- Date: Thu, 17 Feb 2022 18:50:37 GMT
- Title: Deep Surrogate Q-Learning for Autonomous Driving
- Authors: Maria Kalweit, Gabriel Kalweit, Moritz Werling, Joschka Boedecker
- Abstract summary: We propose Surrogate Q-learning for learning lane-change behavior for autonomous driving.
We show that the architecture leads to a novel replay sampling technique we call Scene-centric Experience Replay.
We also show that our methods enhance real-world applicability of RL systems by learning policies on the real highD dataset.
- Score: 17.30342128504405
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Challenging problems for deep reinforcement learning systems in
their application to real systems are their adaptivity to changing
environments and their efficiency with respect to computational resources
and data. In the
application of learning lane-change behavior for autonomous driving, agents
have to deal with a varying number of surrounding vehicles. Furthermore, the
number of required transitions imposes a bottleneck, since test drivers cannot
perform an arbitrary number of lane changes in the real world. In the
off-policy setting, additional information on solving the task can be gained by
observing actions from others. While in the classical RL setup this knowledge
remains unused, we use other drivers as surrogates to learn the agent's value
function more efficiently. We propose Surrogate Q-learning that deals with the
aforementioned problems and reduces the required driving time drastically. We
further propose an efficient implementation based on a permutation-equivariant
deep neural network architecture of the Q-function to estimate action-values
for a variable number of vehicles in sensor range. We show that the
architecture leads to a novel replay sampling technique we call Scene-centric
Experience Replay and evaluate the performance of Surrogate Q-learning and
Scene-centric Experience Replay in the open traffic simulator SUMO.
Additionally, we show that our methods enhance real-world applicability of RL
systems by learning policies on the real highD dataset.
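The permutation-equivariant Q-function can be pictured with a small DeepSets-style network: a shared encoder embeds each vehicle in sensor range, an order-invariant pooling step aggregates the embeddings, and action-values are read out from the pooled scene code. The sketch below is a minimal illustration of that idea under assumed feature sizes and max-pooling; it is not the authors' exact architecture. Because a single stored scene contains every surrounding driver, one sampled scene can supply surrogate transitions for all of them, which is the intuition behind Scene-centric Experience Replay.

```python
# Minimal DeepSets-style sketch of a permutation-equivariant Q-network for a
# variable number of surrounding vehicles. Feature sizes, action count, and
# max-pooling are illustrative assumptions, not the authors' architecture.
import torch
import torch.nn as nn

class SetQNetwork(nn.Module):
    def __init__(self, ego_dim=4, veh_dim=4, hidden=64, n_actions=3):
        super().__init__()
        # phi is shared across vehicles, so the pooled code is invariant
        # to the order in which vehicles appear in the input set
        self.phi = nn.Sequential(nn.Linear(veh_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        self.rho = nn.Sequential(nn.Linear(hidden + ego_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_actions))

    def forward(self, ego, vehicles, mask):
        # ego: (B, ego_dim); vehicles: (B, N, veh_dim); mask: (B, N) with
        # 1 for real vehicles and 0 for padding, so N can vary per scene
        h = self.phi(vehicles)                                  # (B, N, hidden)
        h = h.masked_fill(mask.unsqueeze(-1) == 0, float("-inf"))
        pooled, _ = h.max(dim=1)                                # order-invariant pooling
        return self.rho(torch.cat([ego, pooled], dim=-1))       # (B, n_actions)

q = SetQNetwork()
ego = torch.randn(2, 4)
vehicles = torch.randn(2, 5, 4)          # up to 5 vehicles in sensor range
mask = torch.tensor([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]])
print(q(ego, vehicles, mask).shape)      # torch.Size([2, 3])
```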
Related papers
- An Examination of Offline-Trained Encoders in Vision-Based Deep Reinforcement Learning for Autonomous Driving [0.0]
This research investigates the challenges Deep Reinforcement Learning (DRL) faces in Partially Observable Markov Decision Processes (POMDPs).
Our research adopts an offline-trained encoder to leverage large video datasets through self-supervised learning to learn generalizable representations.
We show that the features learned by watching BDD100K driving videos can be directly transferred to achieve lane following and collision avoidance in CARLA simulator.
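A hedged sketch of the transfer pattern this summary describes: pre-train a visual encoder offline, freeze it, and train only a small policy head online. The stand-in CNN, module shapes, and 84x84 input below are illustrative assumptions, not details from the paper.

```python
# Sketch: a frozen offline-trained encoder feeding a trainable policy head.
import torch
import torch.nn as nn

encoder = nn.Sequential(                 # stand-in for a pretrained encoder
    nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
    nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
    nn.Flatten(),
)
for p in encoder.parameters():
    p.requires_grad = False              # keep the offline features fixed

with torch.no_grad():
    feat_dim = encoder(torch.zeros(1, 3, 84, 84)).shape[1]

policy_head = nn.Linear(feat_dim, 3)     # only this part is trained online

obs = torch.zeros(1, 3, 84, 84)
logits = policy_head(encoder(obs))       # action logits from frozen features
```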
arXiv Detail & Related papers (2024-09-02T14:16:23Z)
- Unsupervised Domain Adaptation for Self-Driving from Past Traversal Features [69.47588461101925]
We propose a method to adapt 3D object detectors to new driving environments.
Our approach enhances LiDAR-based detection models using spatially quantized historical features.
Experiments on real-world datasets demonstrate significant improvements.
arXiv Detail & Related papers (2023-09-21T15:00:31Z)
- REBOOT: Reuse Data for Bootstrapping Efficient Real-World Dexterous Manipulation [61.7171775202833]
We introduce an efficient system for learning dexterous manipulation skills with reinforcement learning.
The main idea of our approach is the integration of recent advances in sample-efficient RL and replay buffer bootstrapping.
Our system completes the real-world training cycle by incorporating learned resets via an imitation-based pickup policy.
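The replay-buffer bootstrapping idea can be shown in a few lines: seed the buffer with reused transitions before collecting any new data, so the first gradient updates are not data-starved. A toy sketch under assumed transition tuples; `load_prior_task_data` is a hypothetical stand-in, not part of REBOOT.

```python
# Toy sketch of replay buffer bootstrapping: reuse prior data to seed training.
import random

def load_prior_task_data():
    # hypothetical stand-in for transitions logged on an earlier task or run
    return [((0.0, 0.0), 0, 0.0, (0.0, 0.0), False) for _ in range(1000)]

replay_buffer = []
replay_buffer.extend(load_prior_task_data())   # seed before any new rollouts

def sample_batch(batch_size=256):
    # uniform sampling mixes reused and freshly collected transitions
    return random.sample(replay_buffer, min(batch_size, len(replay_buffer)))

batch = sample_batch()
```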
arXiv Detail & Related papers (2023-09-06T19:05:31Z)
- Comprehensive Training and Evaluation on Deep Reinforcement Learning for Automated Driving in Various Simulated Driving Maneuvers [0.4241054493737716]
This study implements, evaluates, and compares two DRL algorithms, Deep Q-networks (DQN) and Trust Region Policy Optimization (TRPO).
Models trained on the designed ComplexRoads environment can adapt well to other driving maneuvers with promising overall performance.
arXiv Detail & Related papers (2023-06-20T11:41:01Z)
- FastRLAP: A System for Learning High-Speed Driving via Deep RL and Autonomous Practicing [71.76084256567599]
We present a system that enables an autonomous small-scale RC car to drive aggressively from visual observations using reinforcement learning (RL).
Our system, FastRLAP (faster lap), trains autonomously in the real world, without human intervention, and without requiring any simulation or expert demonstrations.
The resulting policies exhibit emergent aggressive driving skills, such as timing braking and acceleration around turns and avoiding areas which impede the robot's motion, approaching the performance of a human driver using a similar first-person interface over the course of training.
arXiv Detail & Related papers (2023-04-19T17:33:47Z)
- Learning energy-efficient driving behaviors by imitating experts [75.12960180185105]
This paper examines the role of imitation learning in bridging the gap between control strategies and realistic limitations in communication and sensing.
We show that imitation learning can succeed in deriving policies that, if adopted by 5% of vehicles, may boost the energy-efficiency of networks with varying traffic conditions by 15% using only local observations.
arXiv Detail & Related papers (2022-06-28T17:08:31Z)
- Learning to Walk Autonomously via Reset-Free Quality-Diversity [73.08073762433376]
Quality-Diversity algorithms can discover large and complex behavioural repertoires consisting of both diverse and high-performing skills.
Existing QD algorithms need large numbers of evaluations as well as episodic resets, which require manual human supervision and interventions.
This paper proposes Reset-Free Quality-Diversity optimization (RF-QD) as a step towards autonomous learning for robotics in open-ended environments.
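For intuition, Quality-Diversity can be reduced to a MAP-Elites-style loop: keep an archive holding the best solution found in each behaviour niche, and fill it by mutating archived solutions. The toy objective and descriptor below are invented for illustration; RF-QD's contribution is making such a loop run without manual resets, which this sketch does not model.

```python
# Toy MAP-Elites-style loop illustrating the Quality-Diversity idea.
import random

archive = {}   # behaviour descriptor (niche) -> (fitness, solution)

def evaluate(solution):
    # toy objective and behaviour descriptor; a robot would use rollouts here
    fitness = -sum((x - 0.5) ** 2 for x in solution)
    niche = tuple(round(x, 1) for x in solution)   # discretised behaviour space
    return fitness, niche

for _ in range(1000):
    if archive:
        parent = random.choice(list(archive.values()))[1]
    else:
        parent = [random.random(), random.random()]
    child = [min(1.0, max(0.0, x + random.gauss(0, 0.1))) for x in parent]
    fitness, niche = evaluate(child)
    if niche not in archive or fitness > archive[niche][0]:
        archive[niche] = (fitness, child)   # keep the best solution per niche
```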
arXiv Detail & Related papers (2022-04-07T14:07:51Z)
- Learning Interactive Driving Policies via Data-driven Simulation [125.97811179463542]
Data-driven simulators promise high data-efficiency for driving policy learning.
Small underlying datasets often lack interesting and challenging edge cases for learning interactive driving.
We propose a simulation method that uses in-painted ado vehicles for learning robust driving policies.
arXiv Detail & Related papers (2021-11-23T20:14:02Z)
- Vision-Based Autonomous Car Racing Using Deep Imitative Reinforcement Learning [13.699336307578488]
Our deep imitative reinforcement learning (DIRL) approach achieves agile autonomous racing using visual inputs.
We validate our algorithm both in a high-fidelity driving simulation and on a real-world 1/20-scale RC-car with limited onboard computation.
arXiv Detail & Related papers (2021-07-18T00:00:48Z)
- Investigating Value of Curriculum Reinforcement Learning in Autonomous Driving Under Diverse Road and Weather Conditions [0.0]
This paper focuses on investigating the value of curriculum reinforcement learning in autonomous driving applications.
We set up several different driving scenarios in a realistic driving simulator, with varying road complexity and weather conditions.
Results show that curriculum RL can yield significant gains in complex driving tasks, both in terms of driving performance and sample complexity.
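A curriculum of this kind can be expressed as an ordered list of scenarios plus a promotion rule. The scenario names and the return threshold below are made-up placeholders, not the paper's setup.

```python
# Toy curriculum scheduler: advance to harder scenarios once the agent
# performs well enough on the current one.
curriculum = ["straight_clear", "curvy_clear", "curvy_rain", "complex_storm"]

def next_stage(stage, mean_return, threshold=0.8):
    # move on once the agent is good enough at the current stage
    if mean_return >= threshold and stage < len(curriculum) - 1:
        return stage + 1
    return stage

stage = 0
for episode_return in [0.5, 0.9, 0.85, 0.95]:   # pretend training results
    stage = next_stage(stage, episode_return)
print(curriculum[stage])
```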
arXiv Detail & Related papers (2021-03-14T12:05:05Z)
- Hyperparameter Auto-tuning in Self-Supervised Robotic Learning [12.193817049957733]
Insufficient learning (due to convergence to local optima) results in under-performing policies whilst redundant learning wastes time and resources.
We propose an auto-tuning technique based on the Evidence Lower Bound (ELBO) for self-supervised reinforcement learning.
Our method can auto-tune online and yields the best performance at a fraction of the time and computational resources.
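One plausible reading of ELBO-based auto-tuning is a stopping rule that watches the ELBO trace and halts once it plateaus, avoiding both insufficient and redundant learning. The window-average rule below is an assumption for illustration, not the paper's criterion.

```python
# Assumed plateau rule: stop when the recent average ELBO stops improving.
def should_stop(elbo_history, window=10, min_improvement=1e-3):
    if len(elbo_history) < 2 * window:
        return False
    recent = sum(elbo_history[-window:]) / window
    previous = sum(elbo_history[-2 * window:-window]) / window
    return recent - previous < min_improvement

# e.g. call should_stop(elbos) after appending each training step's ELBO
```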
arXiv Detail & Related papers (2020-10-16T08:58:24Z)