Related papers: Amortized Q-learning with Model-based Action Proposals for Autonomous Driving on Highways

Amortized Q-learning with Model-based Action Proposals for Autonomous Driving on Highways

URL: http://arxiv.org/abs/2012.03234v1
Date: Sun, 6 Dec 2020 11:04:40 GMT
Title: Amortized Q-learning with Model-based Action Proposals for Autonomous Driving on Highways
Authors: Branka Mirchevska, Maria H\"ugle, Gabriel Kalweit, Moritz Werling, Joschka Boedecker
Abstract summary: We introduce a Reinforcement Learning based approach that coupled with a trajectory planner, learns an optimal long-term driving strategy. By online generating locally optimal maneuvers as actions, we balance between the infinite low-level continuous action space and the limited flexibility of a fixed number of predefined standard lane-change actions.
Score: 10.687104237121408
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Well-established optimization-based methods can guarantee an optimal trajectory for a short optimization horizon, typically no longer than a few seconds. As a result, choosing the optimal trajectory for this short horizon may still result in a sub-optimal long-term solution. At the same time, the resulting short-term trajectories allow for effective, comfortable and provable safe maneuvers in a dynamic traffic environment. In this work, we address the question of how to ensure an optimal long-term driving strategy, while keeping the benefits of classical trajectory planning. We introduce a Reinforcement Learning based approach that coupled with a trajectory planner, learns an optimal long-term decision-making strategy for driving on highways. By online generating locally optimal maneuvers as actions, we balance between the infinite low-level continuous action space, and the limited flexibility of a fixed number of predefined standard lane-change actions. We evaluated our method on realistic scenarios in the open-source traffic simulator SUMO and were able to achieve better performance than the 4 benchmark approaches we compared against, including a random action selecting agent, greedy agent, high-level, discrete actions agent and an IDM-based SUMO-controlled agent.

Related papers

World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning [60.100794160682646]
We propose a new learning framework that jointly optimize state prediction and action selection through preference learning. To automatically collect trajectories and stepwise preference data without human annotation, we introduce a tree search mechanism for extensive exploration via trial-and-error. Our method significantly outperforms existing methods and GPT-4o when applied to Qwen2-VL (7B), LLaVA-1.6 (7B), and LLaMA-3.2 (11B)
arXiv Detail & Related papers (2025-03-13T15:49:56Z)
Integrating Higher-Order Dynamics and Roadway-Compliance into Constrained ILQR-based Trajectory Planning for Autonomous Vehicles [3.200238632208686]
Trajectory planning aims to produce a globally optimal route for Autonomous Passenger Vehicles. Existing implementations utilizing the vehicle bicycle kinematic model may not guarantee controllable trajectories. We augment this model by higher-order terms, including the first and second-order derivatives of curvature and longitudinal jerk.
arXiv Detail & Related papers (2023-09-25T22:30:18Z)
Bi-Level Optimization Augmented with Conditional Variational Autoencoder for Autonomous Driving in Dense Traffic [0.9281671380673306]
This paper presents a parameterized bi-level optimization that jointly computes the optimal behavioural decisions and the resulting trajectory. Our approach runs in real-time using a custom GPU-accelerated batch, and a Variational Autoencoder learnt warm-start strategy. Our approach outperforms state-of-the-art model predictive control and RL approaches in terms of collision rate while being competitive in driving efficiency.
arXiv Detail & Related papers (2022-12-05T12:56:42Z)
Optimizing Trajectories for Highway Driving with Offline Reinforcement Learning [11.970409518725491]
We propose a Reinforcement Learning-based approach to autonomous driving. We compare the performance of our agent against four other highway driving agents. We demonstrate that our offline trained agent, with randomly collected data, learns to drive smoothly, achieving as close as possible to the desired velocity, while outperforming the other agents.
arXiv Detail & Related papers (2022-03-21T13:13:08Z)
Generating Useful Accident-Prone Driving Scenarios via a Learned Traffic Prior [135.78858513845233]
STRIVE is a method to automatically generate challenging scenarios that cause a given planner to produce undesirable behavior, like collisions. To maintain scenario plausibility, the key idea is to leverage a learned model of traffic motion in the form of a graph-based conditional VAE. A subsequent optimization is used to find a "solution" to the scenario, ensuring it is useful to improve the given planner.
arXiv Detail & Related papers (2021-12-09T18:03:27Z)
Model-Based Reinforcement Learning via Latent-Space Collocation [110.04005442935828]
We argue that it is easier to solve long-horizon tasks by planning sequences of states rather than just actions. We adapt the idea of collocation, which has shown good results on long-horizon tasks in optimal control literature, to the image-based setting by utilizing learned latent state space models.
arXiv Detail & Related papers (2021-06-24T17:59:18Z)
Learning Space Partitions for Path Planning [54.475949279050596]
PlaLaM outperforms existing path planning methods in 2D navigation tasks, especially in the presence of difficult-to-escape local optima. These gains transfer to highly multimodal real-world tasks, where we outperform strong baselines in compiler phase ordering by up to 245% and in molecular design by up to 0.4 on properties on a 0-1 scale.
arXiv Detail & Related papers (2021-06-19T18:06:11Z)
An End-to-end Deep Reinforcement Learning Approach for the Long-term Short-term Planning on the Frenet Space [0.0]
This paper presents a novel end-to-end continuous deep reinforcement learning approach towards autonomous cars' decision-making and motion planning. For the first time, we define both states and action spaces on the Frenet space to make the driving behavior less variant to the road curvatures. The algorithm generates continuoustemporal trajectories on the Frenet frame for the feedback controller to track.
arXiv Detail & Related papers (2020-11-26T02:40:07Z)
Path Planning Followed by Kinodynamic Smoothing for Multirotor Aerial Vehicles (MAVs) [61.94975011711275]
We propose a geometrically based motion planning technique textquotedblleft RRT*textquotedblright; for this purpose. In the proposed technique, we modified original RRT* introducing an adaptive search space and a steering function. We have tested the proposed technique in various simulated environments.
arXiv Detail & Related papers (2020-08-29T09:55:49Z)
Decision-making for Autonomous Vehicles on Highway: Deep Reinforcement Learning with Continuous Action Horizon [14.059728921828938]
This paper utilizes the deep reinforcement learning (DRL) method to address the continuous-horizon decision-making problem on the highway. The running objective of the ego automated vehicle is to execute an efficient and smooth policy without collision. The PPO-DRL-based decision-making strategy is estimated from multiple perspectives, including the optimality, learning efficiency, and adaptability.
arXiv Detail & Related papers (2020-08-26T22:49:27Z)
The Importance of Prior Knowledge in Precise Multimodal Prediction [71.74884391209955]
Roads have well defined geometries, topologies, and traffic rules. In this paper we propose to incorporate structured priors as a loss function. We demonstrate the effectiveness of our approach on real-world self-driving datasets.
arXiv Detail & Related papers (2020-06-04T03:56:11Z)
Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained policy optimization (CPPO) We show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.