Causal policy ranking
- URL: http://arxiv.org/abs/2111.08415v1
- Date: Tue, 16 Nov 2021 12:33:36 GMT
- Title: Causal policy ranking
- Authors: Daniel McNamee, Hana Chockler
- Abstract summary: Given a trained policy, we propose a black-box method based on counterfactual reasoning that estimates the causal effect that these decisions have on reward attainment.
In this work, we compare our measure against an alternative, non-causal, ranking procedure, and discuss potential future work integrating causal algorithms into the interpretation of RL agent policies.
- Score: 3.7819322027528113
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Policies trained via reinforcement learning (RL) are often very complex even
for simple tasks. In an episode with $n$ time steps, a policy will make $n$
decisions on actions to take, many of which may appear non-intuitive to the
observer. Moreover, it is not clear which of these decisions directly
contribute towards achieving the reward and how significant their
contribution is. Given a trained policy, we propose a black-box method based on
counterfactual reasoning that estimates the causal effect that these decisions
have on reward attainment and ranks the decisions according to this estimate.
In this preliminary work, we compare our measure against an alternative,
non-causal, ranking procedure, highlight the benefits of causality-based policy
ranking, and discuss potential future work integrating causal algorithms into
the interpretation of RL agent policies.
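The abstract describes the method only at a high level, so the following is a minimal illustrative sketch of one way a counterfactual, intervention-based ranking of per-step decisions could be estimated; it is not the paper's implementation. It assumes a deterministic trained `policy(state)`, a discrete action space of size `n_actions`, and a hypothetical episodic `env` whose `step(action)` returns `(state, reward, done)`. Decisions whose replacement by a random action costs the most reward are ranked as most causally important.

```python
import random
from statistics import mean

def rollout(env, policy, intervene_at=None, n_actions=None, max_steps=500):
    """Run one episode, optionally replacing the policy's decision at a single
    time step with a uniformly random (counterfactual) action."""
    state, total_reward = env.reset(), 0.0
    for t in range(max_steps):
        action = policy(state)
        if t == intervene_at:
            action = random.randrange(n_actions)  # counterfactual intervention
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward

def rank_decisions(env, policy, horizon, n_actions, n_samples=30):
    """Rank time steps by the estimated causal effect of the decision taken at
    that step on the episode return (baseline minus intervened return)."""
    baseline = mean(rollout(env, policy) for _ in range(n_samples))
    effect = {}
    for t in range(horizon):
        intervened = mean(
            rollout(env, policy, intervene_at=t, n_actions=n_actions)
            for _ in range(n_samples)
        )
        effect[t] = baseline - intervened
    return sorted(effect.items(), key=lambda kv: kv[1], reverse=True)
```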
Related papers
- Clustered Policy Decision Ranking [6.338178373376447]
In an episode with n time steps, a policy will make n decisions on actions to take, many of which may appear non-intuitive to the observer.
It is not clear which of these decisions directly contribute towards achieving the reward and how significant their contribution is.
We propose a black-box method based on statistical covariance estimation that clusters the states of the environment and ranks each cluster according to the importance of decisions made in its states.
arXiv Detail & Related papers (2023-11-21T20:16:02Z)
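As a rough, hypothetical sketch of the clustering idea (not the paper's code): cluster the states visited by the trained policy, randomly follow or override the policy per cluster in probe rollouts, and rank clusters by the covariance between "policy followed in this cluster" and the episode return. It assumes vector-valued states, a discrete action space, the same hypothetical `env`/`policy` interface as the sketch above, and scikit-learn for clustering.

```python
import numpy as np
from sklearn.cluster import KMeans

def rank_state_clusters(env, policy, n_actions, n_clusters=10,
                        n_probe_episodes=200, max_steps=500):
    """Cluster visited states, then score each cluster by the covariance between
    'the policy was followed in this cluster' and the episode return."""
    # 1. Collect states visited under the trained policy and cluster them.
    visited = []
    for _ in range(50):
        s = env.reset()
        for _ in range(max_steps):
            visited.append(np.asarray(s, dtype=float))
            s, _, done = env.step(policy(s))
            if done:
                break
    kmeans = KMeans(n_clusters=n_clusters, n_init=10).fit(np.stack(visited))

    # 2. Probe rollouts: per episode, randomly follow or override the policy per cluster.
    follow_flags, returns = [], []
    for _ in range(n_probe_episodes):
        follow = np.random.rand(n_clusters) < 0.5
        s, ret = env.reset(), 0.0
        for _ in range(max_steps):
            c = int(kmeans.predict(np.asarray(s, dtype=float).reshape(1, -1))[0])
            a = policy(s) if follow[c] else np.random.randint(n_actions)
            s, r, done = env.step(a)
            ret += r
            if done:
                break
        follow_flags.append(follow.astype(float))
        returns.append(ret)

    # 3. Covariance between following the policy in a cluster and the return.
    F, R = np.stack(follow_flags), np.asarray(returns)
    scores = [np.cov(F[:, c], R)[0, 1] for c in range(n_clusters)]
    return sorted(enumerate(scores), key=lambda kv: kv[1], reverse=True)
```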
- Off-Policy Evaluation for Large Action Spaces via Policy Convolution [60.6953713877886]
The Policy Convolution family of estimators uses latent structure within actions to strategically convolve the logging and target policies.
Experiments on synthetic and benchmark datasets demonstrate remarkable mean squared error (MSE) improvements when using PC.
arXiv Detail & Related papers (2023-10-24T01:00:01Z)
- Offline Reinforcement Learning with On-Policy Q-Function Regularization [57.09073809901382]
We deal with the (potentially catastrophic) extrapolation error induced by the distribution shift between the history dataset and the desired policy.
We propose two algorithms taking advantage of the estimated Q-function through regularizations, and demonstrate they exhibit strong performance on the D4RL benchmarks.
arXiv Detail & Related papers (2023-07-25T21:38:08Z)
- Conformal Off-Policy Evaluation in Markov Decision Processes [53.786439742572995]
Reinforcement Learning aims at identifying and evaluating efficient control policies from data.
Most methods for this learning task, referred to as Off-Policy Evaluation (OPE), do not come with accuracy and certainty guarantees.
We present a novel OPE method based on Conformal Prediction that outputs an interval containing the true reward of the target policy with a prescribed level of certainty.
arXiv Detail & Related papers (2023-04-05T16:45:11Z)
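The summary states what this method guarantees (an interval with prescribed coverage) rather than how it is built; below is a generic split-conformal sketch, not the paper's estimator. It assumes you already have a point estimate of the target policy's return for each held-out calibration trajectory (e.g. from an off-policy value model) together with the observed returns; all names are hypothetical.

```python
import numpy as np

def conformal_return_interval(point_estimate, calib_estimates, calib_returns, alpha=0.1):
    """Split conformal prediction: widen a point estimate of the target policy's
    return into an interval covering the true return with probability >= 1 - alpha
    (under exchangeability of the calibration scores)."""
    scores = np.abs(np.asarray(calib_returns) - np.asarray(calib_estimates))  # nonconformity
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)  # finite-sample correction
    q = np.quantile(scores, level, method="higher")
    return point_estimate - q, point_estimate + q
```

The `method="higher"` quantile and the (n + 1) correction are the standard choices that give split conformal prediction its finite-sample coverage guarantee.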
- Policy Dispersion in Non-Markovian Environment [53.05904889617441]
This paper learns diverse policies from the history of state-action pairs in a non-Markovian environment.
We first adopt a transformer-based method to learn policy embeddings.
Then, we stack the policy embeddings to construct a dispersion matrix to induce a set of diverse policies.
arXiv Detail & Related papers (2023-02-28T11:58:39Z)
- Safe Policy Learning through Extrapolation: Application to Pre-trial Risk Assessment [0.0]
We develop a robust optimization approach that partially identifies the expected utility of a policy, and then finds an optimal policy.
We extend this approach to common and important settings where humans make decisions with the aid of algorithmic recommendations.
We derive new classification and recommendation rules that retain the transparency and interpretability of the existing risk assessment instrument.
arXiv Detail & Related papers (2021-09-22T00:52:03Z)
- Ranking Policy Decisions [14.562620527204686]
Policies trained via Reinforcement Learning (RL) are often needlessly complex, making them difficult to analyse and interpret.
We propose a novel black-box method based on statistical fault localisation that ranks the states of the environment according to the importance of decisions made in those states.
arXiv Detail & Related papers (2020-08-31T13:54:44Z)
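Statistical fault localisation scores program elements by how strongly their execution correlates with test failures; a hypothetical adaptation to ranking states (not the paper's implementation) is sketched below. It assumes episodes in which the policy's action was replaced by a default action in a random subset of discretised, hashable states, and labels an episode as failing when its return falls below a threshold.

```python
import math
from collections import defaultdict

def fault_localisation_ranking(episodes, pass_threshold):
    """Rank states by an Ochiai-style suspiciousness score.

    `episodes` is a list of (mutated_states, episode_return) pairs, where
    mutated_states is the set of states whose action was overridden with a
    default action in that episode.  States whose mutation co-occurs with
    low-return ('failing') episodes are ranked as more important.
    """
    fail_with, pass_with, total_fail = defaultdict(int), defaultdict(int), 0
    for mutated, ret in episodes:
        failed = ret < pass_threshold
        total_fail += failed
        for s in mutated:
            (fail_with if failed else pass_with)[s] += 1

    def ochiai(s):
        denom = math.sqrt(total_fail * (fail_with[s] + pass_with[s]))
        return fail_with[s] / denom if denom else 0.0

    states = set(fail_with) | set(pass_with)
    return sorted(states, key=ochiai, reverse=True)
```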
- Learning "What-if" Explanations for Sequential Decision-Making [92.8311073739295]
Building interpretable parameterizations of real-world decision-making on the basis of demonstrated behavior is essential.
We propose learning explanations of expert decisions by modeling their reward function in terms of preferences with respect to "what if" outcomes.
We highlight the effectiveness of our batch, counterfactual inverse reinforcement learning approach in recovering accurate and interpretable descriptions of behavior.
arXiv Detail & Related papers (2020-07-02T14:24:17Z)
- Continuous Action Reinforcement Learning from a Mixture of Interpretable Experts [35.80418547105711]
We propose a policy scheme that retains a complex function approximator for its internal value predictions but constrains the policy to have a concise, hierarchical, and human-readable structure.
The main technical contribution of the paper is to address the challenges introduced by this non-differentiable state selection procedure.
arXiv Detail & Related papers (2020-06-10T16:02:08Z)
- Reinforcement Learning [36.664136621546575]
Reinforcement learning (RL) is a general framework for adaptive control, which has proven to be efficient in many domains.
In this chapter, we present the basic framework of RL and recall the two main families of approaches that have been developed to learn a good policy.
arXiv Detail & Related papers (2020-05-29T06:53:29Z)
- BRPO: Batch Residual Policy Optimization [79.53696635382592]
In batch reinforcement learning, one often constrains a learned policy to be close to the behavior (data-generating) policy.
We propose residual policies, where the allowable deviation of the learned policy is state-action-dependent.
We derive a new RL method, BRPO, which learns both the policy and allowable deviation that jointly maximize a lower bound on policy performance.
arXiv Detail & Related papers (2020-02-08T01:59:33Z)
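The BRPO summary describes the key object (a residual policy whose allowable deviation from the behaviour policy is state-action-dependent) but not its parameterisation; one hypothetical way to realise that constraint, not the paper's actual model, is to bound a learned residual by a learned per-(state, action) allowance before adding it to the behaviour policy's logits.

```python
import numpy as np

def residual_policy_probs(behavior_logits, residual_logits, allowance):
    """Combine the behaviour (data-generating) policy with a learned residual whose
    magnitude is capped by a state-action-dependent allowance, then renormalise.
    All three arguments are vectors over the actions available in the current state."""
    bounded = np.asarray(allowance) * np.tanh(np.asarray(residual_logits, dtype=float))
    logits = np.asarray(behavior_logits) + bounded  # deviation limited per (s, a)
    exp = np.exp(logits - logits.max())             # numerically stable softmax
    return exp / exp.sum()
```

In BRPO the residual and the allowance would be trained jointly to maximise a lower bound on policy performance; the sketch only shows how the constrained combination could be formed.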
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.