Counterfactual Policy Evaluation for Decision-Making in Autonomous
Driving
- URL: http://arxiv.org/abs/2003.11919v3
- Date: Thu, 12 Nov 2020 14:30:42 GMT
- Title: Counterfactual Policy Evaluation for Decision-Making in Autonomous Driving
- Authors: Patrick Hart and Alois Knoll
- Abstract summary: Learning-based approaches, such as reinforcement and imitation learning, are gaining popularity in decision-making for autonomous driving.
In this work, a counterfactual policy evaluation is introduced that makes use of counterfactual worlds.
We show that the proposed approach significantly decreases the collision-rate whilst maintaining a high success-rate.
- Score: 3.1410342959104725
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning-based approaches, such as reinforcement and imitation learning, are
gaining popularity in decision-making for autonomous driving. However, learned
policies often fail to generalize and cannot handle novel situations well.
Asking and answering questions in the form of "Would a policy perform well if
the other agents had behaved differently?" can shed light on whether a policy
has seen similar situations during training and generalizes well. In this work,
a counterfactual policy evaluation is introduced that makes use of
counterfactual worlds - worlds in which the behaviors of others are non-actual.
If a policy can handle all counterfactual worlds well, it either has seen
similar situations during training or it generalizes well and is deemed to be
fit enough to be executed in the actual world. Additionally, by performing the
counterfactual policy evaluation, causal relations and the influence of
changing a vehicle's behavior on the surrounding vehicles become evident. To
validate the proposed method, we learn a policy using reinforcement learning
for a lane merging scenario. In the application-phase, the policy is only
executed after the counterfactual policy evaluation has been performed and if
the policy is found to be safe enough. We show that the proposed approach
significantly decreases the collision-rate whilst maintaining a high
success-rate.
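The evaluation scheme described in the abstract (roll the policy out in counterfactual worlds, and execute it in the actual world only if it passes) can be sketched as follows. This is a toy illustration, not the paper's implementation: the world model, the variant generator, and all names are hypothetical.

```python
import random

class ToyWorld:
    """Hypothetical stand-in for a traffic simulation: the 'world' is
    reduced to an aggressiveness level of the other agents."""
    def __init__(self, aggressiveness):
        self.aggressiveness = aggressiveness

    def rollout(self, policy):
        # In this toy model, a policy that keeps a larger gap than the
        # others' aggressiveness avoids a collision.
        return "success" if policy(self) >= self.aggressiveness else "collision"

def make_variants(world, n):
    # Counterfactual worlds: same scene, altered behaviors of the others.
    rng = random.Random(0)
    return [ToyWorld(world.aggressiveness + rng.uniform(-0.2, 0.2))
            for _ in range(n)]

def counterfactual_policy_evaluation(policy, actual_world, n_worlds=10,
                                     max_collisions=0):
    """True if the policy handles (almost) all counterfactual worlds."""
    collisions = sum(1 for w in make_variants(actual_world, n_worlds)
                     if w.rollout(policy) == "collision")
    return collisions <= max_collisions

cautious = lambda world: 0.9   # keeps a large gap
reckless = lambda world: 0.3
world = ToyWorld(aggressiveness=0.5)
print(counterfactual_policy_evaluation(cautious, world))   # prints True
print(counterfactual_policy_evaluation(reckless, world))   # prints False
```

Only a policy that passes the counterfactual gate would then be executed in the actual world, which is how the paper reduces the collision rate at execution time.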
Related papers
- DRARL: Disengagement-Reason-Augmented Reinforcement Learning for Efficient Improvement of Autonomous Driving Policy [24.36567420971839]
Disengagement-reason-augmented reinforcement learning (DRARL) enhances the driving-policy improvement process. The method is evaluated using real-world disengagement cases collected by autonomous driving robotaxis.
arXiv Detail & Related papers (2025-06-20T03:32:01Z)
- Robust Driving Policy Learning with Guided Meta Reinforcement Learning [49.860391298275616]
We introduce an efficient method to train diverse driving policies for social vehicles as a single meta-policy.
By randomizing the interaction-based reward functions of social vehicles, we can generate diverse objectives and efficiently train the meta-policy.
We propose a training strategy to enhance the robustness of the ego vehicle's driving policy using the environment where social vehicles are controlled by the learned meta-policy.
arXiv Detail & Related papers (2023-07-19T17:42:36Z)
- Conformal Off-Policy Evaluation in Markov Decision Processes [53.786439742572995]
Reinforcement Learning aims at identifying and evaluating efficient control policies from data.
Most methods for this learning task, referred to as Off-Policy Evaluation (OPE), do not come with accuracy and certainty guarantees.
We present a novel OPE method based on Conformal Prediction that outputs an interval containing the true reward of the target policy with a prescribed level of certainty.
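The interval guarantee can be illustrated with a generic split-conformal sketch over i.i.d. samples of the target policy's return; the paper's MDP-specific construction is more involved, and the function names here are illustrative.

```python
import math

def split_conformal_interval(cal_returns, alpha=0.1):
    """Interval containing a fresh draw of the return with probability
    >= 1 - alpha (split conformal; assumes i.i.d. calibration samples).

    Nonconformity score: absolute deviation from the calibration mean.
    """
    n = len(cal_returns)
    mu = sum(cal_returns) / n
    scores = sorted(abs(r - mu) for r in cal_returns)
    k = math.ceil((n + 1) * (1 - alpha))   # finite-sample corrected rank
    q = scores[min(k, n) - 1]
    return (mu - q, mu + q)

# Calibration returns of the target policy (toy numbers).
returns = [1.0, 1.2, 0.8, 1.1, 0.9, 1.05, 0.95, 1.15, 0.85, 1.0]
lo, hi = split_conformal_interval(returns, alpha=0.2)
print(round(lo, 3), round(hi, 3))   # prints 0.8 1.2
```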
arXiv Detail & Related papers (2023-04-05T16:45:11Z)
- Supervised Off-Policy Ranking [145.3039527243585]
Off-policy evaluation (OPE) leverages data generated by other policies to evaluate a target policy.
We propose supervised off-policy ranking that learns a policy scoring model by correctly ranking training policies with known performance.
Our method outperforms strong baseline OPE methods in terms of both rank correlation and performance gap between the truly best and the best of the ranked top three policies.
arXiv Detail & Related papers (2021-07-03T07:01:23Z)
- Goal-Conditioned Reinforcement Learning with Imagined Subgoals [89.67840168694259]
We propose to incorporate imagined subgoals into policy learning to facilitate learning of complex tasks.
Imagined subgoals are predicted by a separate high-level policy, which is trained simultaneously with the policy and its critic.
We evaluate our approach on complex robotic navigation and manipulation tasks and show that it outperforms existing methods by a large margin.
arXiv Detail & Related papers (2021-07-01T15:30:59Z)
- Reinforcement Learning based Control of Imitative Policies for Near-Accident Driving [41.54021613421446]
In near-accident scenarios, even a minor change in the vehicle's actions may result in drastically different consequences.
We propose a hierarchical reinforcement and imitation learning (H-ReIL) approach that consists of low-level policies learned by IL for discrete driving modes, and a high-level policy learned by RL that switches between different driving modes.
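A minimal sketch of that hierarchical structure, with hypothetical low-level modes and a hand-coded switching rule standing in for the IL- and RL-learned components:

```python
def aggressive_mode(obs):
    # Hypothetical low-level policy (learned via IL in H-ReIL).
    return {"accel": 1.0}

def timid_mode(obs):
    # Hypothetical low-level policy for cautious, near-accident driving.
    return {"accel": -0.5}

class HierarchicalPolicy:
    """High-level policy switches between discrete driving modes;
    each mode is a low-level policy."""
    def __init__(self, modes, switcher):
        self.modes = modes        # list of low-level policies
        self.switcher = switcher  # obs -> mode index (RL-learned in H-ReIL)

    def act(self, obs):
        return self.modes[self.switcher(obs)](obs)

# Hand-coded rule standing in for the learned high-level policy.
switcher = lambda obs: 1 if obs["distance_to_obstacle"] < 10.0 else 0
policy = HierarchicalPolicy([aggressive_mode, timid_mode], switcher)
print(policy.act({"distance_to_obstacle": 5.0}))   # prints {'accel': -0.5}
```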
arXiv Detail & Related papers (2020-07-01T01:41:45Z)
- Efficient Evaluation of Natural Stochastic Policies in Offline Reinforcement Learning [80.42316902296832]
We study the efficient off-policy evaluation of natural policies, which are defined in terms of deviations from the behavior policy.
This is a departure from the literature on off-policy evaluation, where most work considers the evaluation of explicitly specified policies.
arXiv Detail & Related papers (2020-06-06T15:08:24Z)
- Reinforcement Learning [36.664136621546575]
Reinforcement learning (RL) is a general framework for adaptive control, which has proven to be efficient in many domains.
In this chapter, we present the basic framework of RL and recall the two main families of approaches that have been developed to learn a good policy.
arXiv Detail & Related papers (2020-05-29T06:53:29Z)
- BRPO: Batch Residual Policy Optimization [79.53696635382592]
In batch reinforcement learning, one often constrains a learned policy to be close to the behavior (data-generating) policy.
We propose residual policies, where the allowable deviation of the learned policy is state-action-dependent.
We derive a new RL method, BRPO, which learns both the policy and the allowable deviation that jointly maximize a lower bound on policy performance.
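The residual-policy idea can be sketched as follows; for simplicity the allowable deviation here depends on the state only (BRPO makes it state-action-dependent), and all components are hypothetical stand-ins for learned ones.

```python
def behavior_policy(state):
    # Data-generating policy (hypothetical): simple proportional control.
    return -0.5 * state

def residual_policy(state, residual_fn, deviation_bound_fn):
    """Learned action = behavior action + residual, with the residual
    clipped to a state-dependent allowable deviation."""
    base = behavior_policy(state)
    eps = deviation_bound_fn(state)      # allowable deviation at this state
    raw = residual_fn(state)             # unconstrained learned residual
    clipped = max(-eps, min(eps, raw))   # keep the deviation within bounds
    return base + clipped

# Hypothetical learned components.
residual_fn = lambda s: 0.3 * s   # learned correction
bound = lambda s: 0.1 * abs(s)    # tighter allowable deviation near s = 0
print(residual_policy(2.0, residual_fn, bound))   # prints -0.8
```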
arXiv Detail & Related papers (2020-02-08T01:59:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.