Counterfactual Explanation Policies in RL
- URL: http://arxiv.org/abs/2307.13192v1
- Date: Tue, 25 Jul 2023 01:14:56 GMT
- Title: Counterfactual Explanation Policies in RL
- Authors: Shripad V. Deshmukh, Srivatsan R, Supriti Vijay, Jayakumar
Subramanian, Chirag Agarwal
- Abstract summary: COUNTERPOL is the first framework to analyze Reinforcement Learning policies using counterfactual explanations.
We establish a theoretical connection between Counterpol and widely used trust region-based policy optimization methods in RL.
- Score: 3.674863913115432
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As Reinforcement Learning (RL) agents are increasingly employed in diverse
decision-making problems using reward preferences, it becomes important to
ensure that policies learned by these frameworks in mapping observations to a
probability distribution of the possible actions are explainable. However,
there is little to no work in the systematic understanding of these complex
policies in a contrastive manner, i.e., what minimal changes to the policy
would improve/worsen its performance to a desired level. In this work, we
present COUNTERPOL, the first framework to analyze RL policies using
counterfactual explanations in the form of minimal changes to the policy that
lead to the desired outcome. We do so by incorporating counterfactuals in
supervised learning in RL with the target outcome regulated using desired
return. We establish a theoretical connection between Counterpol and widely
used trust region-based policy optimization methods in RL. Extensive empirical
analysis shows the efficacy of COUNTERPOL in generating explanations for
(un)learning skills while keeping close to the original policy. Our results on
five different RL environments with diverse state and action spaces demonstrate
the utility of counterfactual explanations, paving the way for new frontiers in
designing and developing counterfactual policies.
Related papers
- Reflective Policy Optimization [20.228281670899204]
Reflective Policy Optimization (RPO) amalgamates past and future state-action information for policy optimization.
RPO empowers the agent for introspection, allowing modifications to its actions within the current state.
Empirical results demonstrate RPO's feasibility and efficacy in two reinforcement learning benchmarks.
arXiv Detail & Related papers (2024-06-06T01:46:49Z) - Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization [55.97310586039358]
Diffusion models have garnered widespread attention in Reinforcement Learning (RL) for their powerful expressiveness and multimodality.
We propose a novel model-free diffusion-based online RL algorithm, Q-weighted Variational Policy Optimization (QVPO)
Specifically, we introduce the Q-weighted variational loss, which can be proved to be a tight lower bound of the policy objective in online RL under certain conditions.
We also develop an efficient behavior policy to enhance sample efficiency by reducing the variance of the diffusion policy during online interactions.
arXiv Detail & Related papers (2024-05-25T10:45:46Z) - POTEC: Off-Policy Learning for Large Action Spaces via Two-Stage Policy
Decomposition [40.851324484481275]
We study off-policy learning of contextual bandit policies in large discrete action spaces.
We propose a novel two-stage algorithm, called Policy Optimization via Two-Stage Policy Decomposition.
We show that POTEC provides substantial improvements in OPL effectiveness particularly in large and structured action spaces.
arXiv Detail & Related papers (2024-02-09T03:01:13Z) - A Regularized Implicit Policy for Offline Reinforcement Learning [54.7427227775581]
offline reinforcement learning enables learning from a fixed dataset, without further interactions with the environment.
We propose a framework that supports learning a flexible yet well-regularized fully-implicit policy.
Experiments and ablation study on the D4RL dataset validate our framework and the effectiveness of our algorithmic designs.
arXiv Detail & Related papers (2022-02-19T20:22:04Z) - Combing Policy Evaluation and Policy Improvement in a Unified
f-Divergence Framework [33.90259939664709]
We study the f-divergence between learning policy and sampling policy and derive a novel DRL framework, termed f-Divergence Reinforcement Learning (FRL)
The FRL framework achieves two advantages: (1) policy evaluation and policy improvement processes are derived simultaneously by f-divergence; (2) overestimation issue of value function are alleviated.
arXiv Detail & Related papers (2021-09-24T10:20:46Z) - Off-Policy Imitation Learning from Observations [78.30794935265425]
Learning from Observations (LfO) is a practical reinforcement learning scenario from which many applications can benefit.
We propose a sample-efficient LfO approach that enables off-policy optimization in a principled manner.
Our approach is comparable with state-of-the-art locomotion in terms of both sample-efficiency and performance.
arXiv Detail & Related papers (2021-02-25T21:33:47Z) - Expert-Supervised Reinforcement Learning for Offline Policy Learning and
Evaluation [21.703965401500913]
We propose an Expert-Supervised RL (ESRL) framework which uses uncertainty quantification for offline policy learning.
In particular, we have three contributions: 1) the method can learn safe and optimal policies through hypothesis testing, 2) ESRL allows for different levels of risk averse implementations tailored to the application context, and 3) we propose a way to interpret ESRL's policy at every state through posterior distributions.
arXiv Detail & Related papers (2020-06-23T17:43:44Z) - Stable Policy Optimization via Off-Policy Divergence Regularization [50.98542111236381]
Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) are among the most successful policy gradient approaches in deep reinforcement learning (RL)
We propose a new algorithm which stabilizes the policy improvement through a proximity term that constrains the discounted state-action visitation distribution induced by consecutive policies to be close to one another.
Our proposed method can have a beneficial effect on stability and improve final performance in benchmark high-dimensional control tasks.
arXiv Detail & Related papers (2020-03-09T13:05:47Z) - Efficient Deep Reinforcement Learning via Adaptive Policy Transfer [50.51637231309424]
Policy Transfer Framework (PTF) is proposed to accelerate Reinforcement Learning (RL)
Our framework learns when and which source policy is the best to reuse for the target policy and when to terminate it.
Experimental results show it significantly accelerates the learning process and surpasses state-of-the-art policy transfer methods.
arXiv Detail & Related papers (2020-02-19T07:30:57Z) - BRPO: Batch Residual Policy Optimization [79.53696635382592]
In batch reinforcement learning, one often constrains a learned policy to be close to the behavior (data-generating) policy.
We propose residual policies, where the allowable deviation of the learned policy is state-action-dependent.
We derive a new for RL method, BRPO, which learns both the policy and allowable deviation that jointly maximize a lower bound on policy performance.
arXiv Detail & Related papers (2020-02-08T01:59:33Z) - Population-Guided Parallel Policy Search for Reinforcement Learning [17.360163137926]
A new population-guided parallel learning scheme is proposed to enhance the performance of off-policy reinforcement learning (RL)
In the proposed scheme, multiple identical learners with their own value-functions and policies share a common experience replay buffer, and search a good policy in collaboration with the guidance of the best policy information.
arXiv Detail & Related papers (2020-01-09T10:13:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.