SoftCTRL: Soft conservative KL-control of Transformer Reinforcement Learning for Autonomous Driving
- URL: http://arxiv.org/abs/2410.22752v1
- Date: Wed, 30 Oct 2024 07:18:00 GMT
- Title: SoftCTRL: Soft conservative KL-control of Transformer Reinforcement Learning for Autonomous Driving
- Authors: Minh Tri Huynh, Duc Dung Nguyen
- Abstract summary: We introduce a method that combines IL with reinforcement learning (RL) using an implicit entropy-KL control that offers a simple way to reduce the over-conservation characteristic.
In particular, we validate on challenging simulated urban scenarios from an unseen dataset, showing that although IL can perform well in imitation tasks, our proposed method significantly improves robustness (over 17% reduction in failures) and generates human-like driving behavior.
- Score: 0.6906005491572401
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, motion planning for urban self-driving vehicles (SDV) has become a popular problem due to the complex interactions among road components. To tackle this, many methods rely on large-scale, human-sampled data processed through imitation learning (IL). Although effective, IL alone cannot adequately handle safety and reliability concerns. Combining IL with reinforcement learning (RL) by adding a KL divergence between the RL and IL policies to the RL loss can alleviate IL's weaknesses, but it suffers from over-conservation caused by the covariate shift of IL. To address this limitation, we introduce a method that combines IL with RL using an implicit entropy-KL control that offers a simple way to reduce this over-conservation. In particular, we validate on challenging simulated urban scenarios from an unseen dataset, showing that although IL can perform well in imitation tasks, our proposed method significantly improves robustness (over 17% reduction in failures) and generates human-like driving behavior.
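As a minimal sketch of the general KL-regularized objective the abstract describes (not the paper's actual implementation), the RL loss can be augmented with a KL penalty that pulls the learned policy toward the IL policy. The function names, the discrete action distributions, and the weight `beta` below are illustrative assumptions:

```python
import math

def softmax(logits):
    """Convert unnormalized action scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete action distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def kl_regularized_loss(rl_loss, logits_rl, logits_il, beta=0.1):
    """Augment an RL policy loss with a KL penalty toward an IL policy.

    A larger beta keeps the RL policy closer to the imitation policy
    (more conservative); a smaller beta lets it deviate more freely.
    """
    p_rl = softmax(logits_rl)
    p_il = softmax(logits_il)
    return rl_loss + beta * kl_divergence(p_rl, p_il)

# When the two policies agree, the KL term vanishes and the loss
# reduces to the plain RL loss.
loss_same = kl_regularized_loss(1.0, [0.2, 0.5, 0.3], [0.2, 0.5, 0.3])
# When they disagree, the penalty increases the loss.
loss_diff = kl_regularized_loss(1.0, [2.0, 0.0, 0.0], [0.0, 0.0, 2.0])
```

The over-conservation the paper targets corresponds to this penalty dominating the objective; the implicit entropy-KL control it proposes is a way to soften that effect.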
Related papers
- RoaD: Rollouts as Demonstrations for Closed-Loop Supervised Fine-Tuning of Autonomous Driving Policies [30.632104005565832]
Rollouts as Demonstrations (RoaD) is a method to mitigate covariate shift when training autonomous driving policies in closed loop. During rollout generation, RoaD incorporates expert guidance to bias trajectories toward high-quality behavior, producing informative yet realistic demonstrations for fine-tuning. We demonstrate the effectiveness of RoaD on WOSAC, a large-scale traffic simulation benchmark, where it performs similarly to or better than the prior CL-SFT method.
arXiv Detail & Related papers (2025-12-01T18:52:03Z) - ROAD: Responsibility-Oriented Reward Design for Reinforcement Learning in Autonomous Driving [6.713954449470747]
This study introduces a responsibility-oriented reward function that explicitly incorporates traffic regulations into the reinforcement learning framework. We introduce a Traffic Regulation Knowledge Graph and leverage Vision-Language Models alongside Retrieval-Augmented Generation techniques to automate reward assignment.
arXiv Detail & Related papers (2025-05-30T08:00:51Z) - TeLL-Drive: Enhancing Autonomous Driving with Teacher LLM-Guided Deep Reinforcement Learning [61.33599727106222]
TeLL-Drive is a hybrid framework that integrates a Teacher LLM to guide an attention-based Student DRL policy.
A self-attention mechanism then fuses these strategies with the DRL agent's exploration, accelerating policy convergence and boosting robustness.
arXiv Detail & Related papers (2025-02-03T14:22:03Z) - CIMRL: Combining IMitation and Reinforcement Learning for Safe Autonomous Driving [45.05135725542318]
The Combining IMitation and Reinforcement Learning (CIMRL) approach enables training driving policies in simulation by leveraging imitative motion priors and safety constraints.
By combining RL and imitation, we demonstrate that our method achieves state-of-the-art results in closed-loop simulation and real-world driving benchmarks.
arXiv Detail & Related papers (2024-06-13T07:31:29Z) - RACER: Epistemic Risk-Sensitive RL Enables Fast Driving with Fewer Crashes [57.319845580050924]
We propose a reinforcement learning framework that combines risk-sensitive control with an adaptive action space curriculum.
We show that our algorithm is capable of learning high-speed policies for a real-world off-road driving task.
arXiv Detail & Related papers (2024-05-07T23:32:36Z) - Efficient Off-Policy Safe Reinforcement Learning Using Trust Region
Conditional Value at Risk [16.176812250762666]
An on-policy safe RL method, called TRC, deals with a CVaR-constrained RL problem using a trust region method.
To achieve outstanding performance in complex environments and satisfy safety constraints quickly, RL methods are required to be sample efficient.
We propose novel surrogate functions in which the effect of the distributional shift can be reduced, and introduce an adaptive trust-region constraint to keep the policy from deviating far from the replay buffer distribution.
arXiv Detail & Related papers (2023-12-01T04:29:19Z) - Towards Safe Autonomous Driving Policies using a Neuro-Symbolic Deep
Reinforcement Learning Approach [6.961253535504979]
This paper introduces a novel neuro-symbolic model-free DRL approach, called DRL with Symbolic Logics (DRLSL).
It combines the strengths of DRL (learning from experience) and symbolic first-order logics (knowledge-driven reasoning) to enable safe learning in real-time interactions of autonomous driving within real environments.
We have implemented the DRLSL framework in autonomous driving using the highD dataset and demonstrated that our method successfully avoids unsafe actions during both the training and testing phases.
arXiv Detail & Related papers (2023-07-03T19:43:21Z) - A Multiplicative Value Function for Safe and Efficient Reinforcement
Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
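The multiplicative composition of a safety critic and a reward critic described above can be illustrated with a hedged sketch; the function name and the exact discounting form are assumptions, not the paper's code:

```python
def multiplicative_value(reward_value, violation_prob):
    """Combine a reward critic and a safety critic multiplicatively.

    reward_value: estimated constraint-free return for a state.
    violation_prob: safety critic's predicted probability of a
    constraint violation, in [0, 1].

    The safety critic discounts the reward estimate, so a risky state
    receives a low combined value even when its raw return is high.
    """
    assert 0.0 <= violation_prob <= 1.0
    return reward_value * (1.0 - violation_prob)

# A high-return but risky state ends up valued below a
# lower-return but safe one.
risky = multiplicative_value(10.0, 0.9)
safe = multiplicative_value(5.0, 0.1)
```

The design choice is that safety acts as a multiplicative gate rather than an additive penalty, so no amount of reward can compensate for a near-certain constraint violation.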
arXiv Detail & Related papers (2023-03-07T18:29:15Z) - Imitation Is Not Enough: Robustifying Imitation with Reinforcement
Learning for Challenging Driving Scenarios [147.16925581385576]
We show how imitation learning combined with reinforcement learning can substantially improve the safety and reliability of driving policies.
We train a policy on over 100k miles of urban driving data, and measure its effectiveness in test scenarios grouped by different levels of collision likelihood.
arXiv Detail & Related papers (2022-12-21T23:59:33Z) - RORL: Robust Offline Reinforcement Learning via Conservative Smoothing [72.8062448549897]
Offline reinforcement learning can exploit massive amounts of offline data for complex decision-making tasks.
Current offline RL algorithms are generally designed to be conservative for value estimation and action selection.
We propose Robust Offline Reinforcement Learning (RORL) with a novel conservative smoothing technique.
arXiv Detail & Related papers (2022-06-06T18:07:41Z) - Accelerated Policy Learning with Parallel Differentiable Simulation [59.665651562534755]
We present a differentiable simulator and a new policy learning algorithm (SHAC).
Our algorithm alleviates problems with local minima through a smooth critic function.
We show substantial improvements in sample efficiency and wall-clock time over state-of-the-art RL and differentiable simulation-based algorithms.
arXiv Detail & Related papers (2022-04-14T17:46:26Z) - Model-based Safe Reinforcement Learning using Generalized Control
Barrier Function [6.556257209888797]
This paper proposes a model-based feasibility enhancement technique of constrained RL.
By using the model information, the policy can be optimized safely without violating actual safety constraints.
The proposed method achieves up to four times fewer constraint violations and converges 3.36 times faster than baseline constrained RL approaches.
arXiv Detail & Related papers (2021-03-02T08:17:38Z) - Decision-making at Unsignalized Intersection for Autonomous Vehicles:
Left-turn Maneuver with Deep Reinforcement Learning [17.715274169051494]
This work proposes a deep reinforcement learning based left-turn decision-making framework at unsignalized intersections for autonomous vehicles.
The presented decision-making strategy effectively reduces the collision rate and improves transport efficiency.
This work also shows that the constructed left-turn control structure has great potential for real-time application.
arXiv Detail & Related papers (2020-08-14T22:44:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.