Related papers: Efficient Sampling-Based Maximum Entropy Inverse Reinforcement Learning with Application to Autonomous Driving

URL: http://arxiv.org/abs/2006.13704v1
Date: Mon, 22 Jun 2020 01:41:13 GMT
Title: Efficient Sampling-Based Maximum Entropy Inverse Reinforcement Learning with Application to Autonomous Driving
Authors: Zheng Wu, Liting Sun, Wei Zhan, Chenyu Yang, Masayoshi Tomizuka
Abstract summary: We present an efficient sampling-based maximum-entropy inverse reinforcement learning (IRL) algorithm in this paper. We evaluate the proposed algorithm on real driving data, including both non-interactive and interactive scenarios.
Score: 35.44498286245894
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In the past decades, we have witnessed significant progress in the domain of autonomous driving. Advanced techniques based on optimization and reinforcement learning (RL) become increasingly powerful at solving the forward problem: given designed reward/cost functions, how should we optimize them and obtain driving policies that interact with the environment safely and efficiently. Such progress has raised another equally important question: what should we optimize? Instead of manually specifying the reward functions, it is desired that we can extract what human drivers try to optimize from real traffic data and assign that to autonomous vehicles to enable more naturalistic and transparent interaction between humans and intelligent agents. To address this issue, we present an efficient sampling-based maximum-entropy inverse reinforcement learning (IRL) algorithm in this paper. Different from existing IRL algorithms, by introducing an efficient continuous-domain trajectory sampler, the proposed algorithm can directly learn the reward functions in the continuous domain while considering the uncertainties in demonstrated trajectories from human drivers. We evaluate the proposed algorithm on real driving data, including both non-interactive and interactive scenarios. The experimental results show that the proposed algorithm achieves more accurate prediction performance with faster convergence speed and better generalization compared to other baseline IRL algorithms.
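Illustration: the general idea behind sampling-based maximum-entropy IRL can be sketched as follows. This is a minimal sketch and not the authors' implementation. It assumes a reward that is linear in trajectory features, and it takes the sampled trajectories as a given feature matrix instead of using the paper's continuous-domain trajectory sampler. The function names (maxent_irl_gradient, fit_reward) and the toy features are hypothetical. Under these assumptions, the gradient of the demonstration log-likelihood is the mean demonstrated feature vector minus the feature expectation over the samples, weighted by the softmax of their rewards.

```python
import numpy as np


def maxent_irl_gradient(theta, demo_features, sample_features):
    """Log-likelihood gradient with the partition function approximated by samples.

    Assumes reward(trajectory) = theta . features(trajectory).
    """
    scores = sample_features @ theta       # reward of each sampled trajectory
    scores = scores - scores.max()         # shift for a numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum()               # P(sample) proportional to exp(reward)
    expected = weights @ sample_features   # model feature expectation over the samples
    return demo_features.mean(axis=0) - expected


def fit_reward(demo_features, sample_features, lr=0.5, steps=2000):
    """Plain gradient ascent on the (concave) sample-based log-likelihood."""
    theta = np.zeros(demo_features.shape[1])
    for _ in range(steps):
        theta += lr * maxent_irl_gradient(theta, demo_features, sample_features)
    return theta


# Toy data (hypothetical): four sampled trajectories with two binary features,
# and demonstrations whose mean feature vector is [0.8, 0.2].
samples = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
demos = np.array([[0.8, 0.2]])
theta = fit_reward(demos, samples)
print(theta)
```

In this toy case the fitted weights are positive for the first feature and negative for the second, so trajectories that resemble the demonstrations receive higher reward. A practical version would regenerate or reweight the samples per scenario, which is the role the abstract assigns to the trajectory sampler.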

Related papers

Rethinking Optimal Transport in Offline Reinforcement Learning [64.56896902186126]
In offline reinforcement learning, the data is provided by various experts and some of them can be sub-optimal. To extract an efficient policy, it is necessary to stitch the best behaviors from the dataset. We present an algorithm that aims to find a policy that maps states to a partial distribution of the best expert actions for each given state.
arXiv Detail & Related papers (2024-10-17T22:36:43Z)
Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers. Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy. We find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
arXiv Detail & Related papers (2024-09-12T11:50:06Z)
REBEL: Reward Regularization-Based Approach for Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and human preferences can lead to catastrophic outcomes in the real world. Recent methods aim to mitigate misalignment by learning reward functions from human preferences. We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z)
Bi-Level Optimization Augmented with Conditional Variational Autoencoder for Autonomous Driving in Dense Traffic [0.9281671380673306]
This paper presents a parameterized bi-level optimization that jointly computes the optimal behavioural decisions and the resulting trajectory. Our approach runs in real-time using a custom GPU-accelerated batch optimizer and a warm-start strategy learnt by a Conditional Variational Autoencoder. Our approach outperforms state-of-the-art model predictive control and RL approaches in terms of collision rate while being competitive in driving efficiency.
arXiv Detail & Related papers (2022-12-05T12:56:42Z)
Fast and computationally efficient generative adversarial network algorithm for unmanned aerial vehicle-based network coverage optimization [1.2853186701496802]
The challenge of dynamic traffic demand in mobile networks is tackled by moving cells based on unmanned aerial vehicles. Considering the tremendous potential of unmanned aerial vehicles in the future, we propose a new algorithm for coverage optimization. The proposed algorithm is implemented based on a conditional generative adversarial neural network, with a unique multilayer sum-pooling loss function.
arXiv Detail & Related papers (2022-03-25T12:13:21Z)
Dynamic Origin-Destination Matrix Estimation in Urban Traffic Networks [0.05735035463793007]
We model the problem as a bi-level optimization problem. In the inner level, given a tentative travel demand, we solve a dynamic traffic assignment problem to decide the routing of the users between their origins and destinations. In the outer level, we adjust the number of trips and their origins and destinations, aiming at minimizing the discrepancy between the counters generated in the inner level and the given vehicle counts measured by sensors in the traffic network.
arXiv Detail & Related papers (2022-01-31T21:33:46Z)
Towards Optimal Strategies for Training Self-Driving Perception Models in Simulation [98.51313127382937]
We focus on the use of labels in the synthetic domain alone. Our approach introduces both a way to learn neural-invariant representations and a theoretically inspired view on how to sample the data from the simulator. We showcase our approach on the bird's-eye-view vehicle segmentation task with multi-sensor data.
arXiv Detail & Related papers (2021-11-15T18:37:43Z)
Model-based Decision Making with Imagination for Autonomous Parking [50.41076449007115]
The proposed algorithm consists of three parts: an imaginative model for anticipating results before parking, an improved rapidly-exploring random tree (RRT), and a path smoothing module. Our algorithm is based on a real kinematic vehicle model, which makes it more suitable for application on real autonomous cars. To evaluate the algorithm's effectiveness, we have compared it with traditional RRT in three different parking scenarios.
arXiv Detail & Related papers (2021-08-25T18:24:34Z)
Integrated Decision and Control: Towards Interpretable and Efficient Driving Intelligence [13.589285628074542]
We present an interpretable and efficient decision and control framework for automated vehicles. It decomposes the driving task into multi-path planning and optimal tracking that are structured hierarchically. Results show that our method has better online computing efficiency and driving performance including traffic efficiency and safety.
arXiv Detail & Related papers (2021-03-18T14:43:31Z)
Real-world Ride-hailing Vehicle Repositioning using Deep Reinforcement Learning [52.2663102239029]
We present a new practical framework based on deep reinforcement learning and decision-time planning for real-world vehicle repositioning on ride-hailing platforms. Our approach learns a state-value function using a batch training algorithm with deep value networks. We benchmark our algorithm with baselines in a ride-hailing simulation environment to demonstrate its superiority in improving income efficiency.
arXiv Detail & Related papers (2021-03-08T05:34:05Z)
Sample Efficient Interactive End-to-End Deep Learning for Self-Driving Cars with Selective Multi-Class Safe Dataset Aggregation [0.13048920509133805]
End-to-end imitation learning is a popular method for computing self-driving car policies. The standard approach relies on collecting pairs of inputs (camera images) and outputs (steering angle, etc.) from an expert policy and fitting a deep neural network to this data to learn the driving policy.
arXiv Detail & Related papers (2020-07-29T08:38:00Z)
DADA: Differentiable Automatic Data Augmentation [58.560309490774976]
We propose Differentiable Automatic Data Augmentation (DADA) which dramatically reduces the cost. We conduct extensive experiments on CIFAR-10, CIFAR-100, SVHN, and ImageNet datasets. Results show our DADA is at least one order of magnitude faster than the state-of-the-art while achieving very comparable accuracy.
arXiv Detail & Related papers (2020-03-08T13:23:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.