Related papers: Lane-Merging Using Policy-based Reinforcement Learning and Post-Optimization

Lane-Merging Using Policy-based Reinforcement Learning and Post-Optimization

URL: http://arxiv.org/abs/2003.03168v1
Date: Fri, 6 Mar 2020 12:57:25 GMT
Title: Lane-Merging Using Policy-based Reinforcement Learning and Post-Optimization
Authors: Patrick Hart, Leonard Rychly, Alois Knol
Abstract summary: We combine policy-based reinforcement learning with local optimization to foster and synthesize the best of the two methodologies. We evaluate the proposed method using lane-change scenarios with a varying number of vehicles.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Many current behavior generation methods struggle to handle real-world traffic situations as they do not scale well with complexity. However, behaviors can be learned off-line using data-driven approaches. Especially, reinforcement learning is promising as it implicitly learns how to behave utilizing collected experiences. In this work, we combine policy-based reinforcement learning with local optimization to foster and synthesize the best of the two methodologies. The policy-based reinforcement learning algorithm provides an initial solution and guiding reference for the post-optimization. Therefore, the optimizer only has to compute a single homotopy class, e.g.\ drive behind or in front of the other vehicle. By storing the state-history during reinforcement learning, it can be used for constraint checking and the optimizer can account for interactions. The post-optimization additionally acts as a safety-layer and the novel method, thus, can be applied in safety-critical applications. We evaluate the proposed method using lane-change scenarios with a varying number of vehicles.

Related papers

Rethinking Optimal Transport in Offline Reinforcement Learning [64.56896902186126]
In offline reinforcement learning, the data is provided by various experts and some of them can be sub-optimal. To extract an efficient policy, it is necessary to emphstitch the best behaviors from the dataset. We present an algorithm that aims to find a policy that maps states to a emphpartial distribution of the best expert actions for each given state.
arXiv Detail & Related papers (2024-10-17T22:36:43Z)
Self-Improvement for Neural Combinatorial Optimization: Sample without Replacement, but Improvement [1.1510009152620668]
Current methods for constructive neural optimization usually train a policy using behavior cloning from expert solutions or policy gradient methods from reinforcement learning. We bridge the two by sampling multiple solutions for random instances using the current model in each epoch and then selecting the best solution as an expert trajectory for supervised imitation learning. We evaluate our approach on the Traveling Salesman Problem and the Capacitated Vehicle Routing Problem. The models trained with our method achieve comparable performance and generalization to those trained with expert data.
arXiv Detail & Related papers (2024-03-22T13:09:10Z)
Enforcing the consensus between Trajectory Optimization and Policy Learning for precise robot control [75.28441662678394]
Reinforcement learning (RL) and trajectory optimization (TO) present strong complementary advantages. We propose several improvements on top of these approaches to learn global control policies quicker.
arXiv Detail & Related papers (2022-09-19T13:32:09Z)
Adaptive Decision Making at the Intersection for Autonomous Vehicles Based on Skill Discovery [13.134487965031667]
In urban environments, the complex and uncertain intersection scenarios are challenging for autonomous driving. To ensure safety, it is crucial to develop an adaptive decision making system that can handle the interaction with other vehicles. We propose a hierarchical framework that can autonomously accumulate and reuse knowledge.
arXiv Detail & Related papers (2022-07-24T11:56:45Z)
A Regularized Implicit Policy for Offline Reinforcement Learning [54.7427227775581]
offline reinforcement learning enables learning from a fixed dataset, without further interactions with the environment. We propose a framework that supports learning a flexible yet well-regularized fully-implicit policy. Experiments and ablation study on the D4RL dataset validate our framework and the effectiveness of our algorithmic designs.
arXiv Detail & Related papers (2022-02-19T20:22:04Z)
Learning Interactive Driving Policies via Data-driven Simulation [125.97811179463542]
Data-driven simulators promise high data-efficiency for driving policy learning. Small underlying datasets often lack interesting and challenging edge cases for learning interactive driving. We propose a simulation method that uses in-painted ado vehicles for learning robust driving policies.
arXiv Detail & Related papers (2021-11-23T20:14:02Z)
Learning Interaction-aware Guidance Policies for Motion Planning in Dense Traffic Scenarios [8.484564880157148]
This paper presents a novel framework for interaction-aware motion planning in dense traffic scenarios. We propose to learn, via deep Reinforcement Learning (RL), an interaction-aware policy providing global guidance about the cooperativeness of other vehicles. The learned policy can reason and guide the local optimization-based planner with interactive behavior to pro-actively merge in dense traffic while remaining safe in case the other vehicles do not yield.
arXiv Detail & Related papers (2021-07-09T16:43:12Z)
Non-Stationary Off-Policy Optimization [50.41335279896062]
We study the novel problem of off-policy optimization in piecewise-stationary contextual bandits. In the offline learning phase, we partition logged data into categorical latent states and learn a near-optimal sub-policy for each state. In the online deployment phase, we adaptively switch between the learned sub-policies based on their performance.
arXiv Detail & Related papers (2020-06-15T09:16:09Z)
Reward-Conditioned Policies [100.64167842905069]
imitation learning requires near-optimal expert data. Can we learn effective policies via supervised learning without demonstrations? We show how such an approach can be derived as a principled method for policy search.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.