Clustered Policy Decision Ranking
- URL: http://arxiv.org/abs/2311.12970v2
- Date: Sun, 28 Apr 2024 01:48:57 GMT
- Title: Clustered Policy Decision Ranking
- Authors: Mark Levin, Hana Chockler
- Abstract summary: In an episode with n time steps, a policy will make n decisions on actions to take, many of which may appear non-intuitive to the observer.
It is not clear which of these decisions directly contribute towards achieving the reward and how significant their contribution is.
We propose a black-box method based on statistical covariance estimation that clusters the states of the environment and ranks each cluster according to the importance of decisions made in its states.
- Score: 6.338178373376447
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Policies trained via reinforcement learning (RL) are often very complex even for simple tasks. In an episode with n time steps, a policy will make n decisions on actions to take, many of which may appear non-intuitive to the observer. Moreover, it is not clear which of these decisions directly contribute towards achieving the reward and how significant their contribution is. Given a trained policy, we propose a black-box method based on statistical covariance estimation that clusters the states of the environment and ranks each cluster according to the importance of decisions made in its states. We compare our measure against a previous statistical fault localization based ranking procedure.
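The abstract leaves the estimation procedure at a high level. Below is a minimal sketch of one plausible reading, assuming episodes are collected under a protocol where the policy's action is occasionally replaced by a random one and a per-step "followed" flag is recorded; the KMeans clustering, the function name, and the per-cluster covariance score are all illustrative, not the paper's implementation.

```python
# A minimal sketch, NOT the paper's implementation: cluster visited states,
# then score each cluster by the covariance between "the trained policy's
# action was actually followed in this cluster" and the episode return.
import numpy as np
from sklearn.cluster import KMeans

def rank_clusters(episodes, n_clusters=5, seed=0):
    """episodes: list of (states, followed, ret) tuples, where
    states   -- (T, d) array of visited states,
    followed -- (T,) 0/1 array, 1 if the policy's own action was executed,
                0 if a random action was substituted at that step (assumed
                data-collection protocol),
    ret      -- scalar episode return."""
    all_states = np.vstack([s for s, _, _ in episodes])
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(all_states)

    X = np.zeros((len(episodes), n_clusters))   # per-cluster "followed" rates
    y = np.zeros(len(episodes))                 # episode returns
    for i, (states, followed, ret) in enumerate(episodes):
        labels = km.predict(states)
        for c in range(n_clusters):
            mask = labels == c
            X[i, c] = followed[mask].mean() if mask.any() else 0.0
        y[i] = ret

    # Larger covariance with the return -> decisions in that cluster matter more.
    scores = np.array([np.cov(X[:, c], y)[0, 1] for c in range(n_clusters)])
    return np.argsort(scores)[::-1], scores
```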
Related papers
- Off-Policy Evaluation for Large Action Spaces via Policy Convolution [60.6953713877886]
The Policy Convolution (PC) family of estimators uses latent structure within actions to strategically convolve the logging and target policies.
Experiments on synthetic and benchmark datasets demonstrate remarkable mean squared error (MSE) improvements when using PC.
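As a rough illustration of the convolution idea (not the paper's estimator), the sketch below smooths context-independent action distributions over a latent action-embedding space with a Gaussian kernel before forming importance weights; the kernel choice, bandwidth, and the context-free simplification are all assumptions.

```python
import numpy as np

def convolve_policy(probs, embeddings, bandwidth=1.0):
    """Share probability mass among actions that are close in embedding space.
    probs: (n_actions,) distribution; embeddings: (n_actions, d)."""
    d2 = ((embeddings[:, None, :] - embeddings[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * bandwidth ** 2))
    K /= K.sum(axis=0, keepdims=True)   # columns sum to 1: output stays a distribution
    return K @ probs

def ips_estimate(rewards, actions, logging_probs, target_probs, embeddings, h=1.0):
    """Vanilla IPS, but with both policies convolved over the action structure."""
    mu = convolve_policy(logging_probs, embeddings, h)
    pi = convolve_policy(target_probs, embeddings, h)
    w = pi[actions] / mu[actions]
    return float(np.mean(w * rewards))
```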
arXiv Detail & Related papers (2023-10-24T01:00:01Z)
- Conformal Off-Policy Evaluation in Markov Decision Processes [53.786439742572995]
Reinforcement Learning aims at identifying and evaluating efficient control policies from data.
Most methods for this learning task, referred to as Off-Policy Evaluation (OPE), do not come with accuracy and certainty guarantees.
We present a novel OPE method based on Conformal Prediction that outputs an interval containing the true reward of the target policy with a prescribed level of certainty.
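A heavily simplified sketch of the flavour of such an interval, using split conformal quantiles of importance-weighted held-out returns; the paper's construction is considerably more careful, so treat this only as intuition.

```python
import numpy as np

def weighted_conformal_interval(returns, weights, alpha=0.1):
    """(1 - alpha) interval for the target policy's per-episode return.
    returns: held-out returns under the logging policy;
    weights: per-episode importance ratios target/logging (assumed given)."""
    order = np.argsort(returns)
    r = np.asarray(returns, dtype=float)[order]
    w = np.asarray(weights, dtype=float)[order]
    cdf = np.cumsum(w / w.sum())          # weighted empirical CDF over returns
    lo = r[min(np.searchsorted(cdf, alpha / 2), len(r) - 1)]
    hi = r[min(np.searchsorted(cdf, 1 - alpha / 2), len(r) - 1)]
    return lo, hi
```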
arXiv Detail & Related papers (2023-04-05T16:45:11Z)
- Policy Dispersion in Non-Markovian Environment [53.05904889617441]
This paper learns diverse policies from histories of state-action pairs in a non-Markovian environment.
We first adopt a transformer-based method to learn policy embeddings.
Then, we stack the policy embeddings to construct a dispersion matrix to induce a set of diverse policies.
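A small sketch of the dispersion step, taking the transformer-learned embeddings as given inputs; the max-min greedy selection is an illustrative stand-in for how a dispersion matrix might be used to induce a diverse set.

```python
import numpy as np

def dispersion_matrix(embeddings):
    """Pairwise L2 distances between policy embeddings -> (n, n) matrix."""
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def select_diverse(embeddings, k):
    """Greedy max-min selection of k mutually dispersed policies."""
    D = dispersion_matrix(embeddings)
    chosen = [int(np.argmax(D.sum(axis=1)))]     # seed with the most spread-out policy
    while len(chosen) < k:
        dist_to_set = D[:, chosen].min(axis=1)   # distance to the selected set
        dist_to_set[chosen] = -np.inf            # never re-pick a chosen policy
        chosen.append(int(np.argmax(dist_to_set)))
    return chosen
```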
arXiv Detail & Related papers (2023-02-28T11:58:39Z)
- Generalizing Off-Policy Learning under Sample Selection Bias [15.733136147164032]
We propose a novel framework for learning policies that generalize to the target population.
We prove that, if the uncertainty set is well specified, our policies generalize to the target population, since they cannot perform worse than they do on the training data.
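To make the uncertainty-set idea concrete, here is a generic worst-case evaluation over test distributions whose density ratios against the training data are bounded in [1/gamma, gamma]; this box-style set is an illustrative stand-in for the paper's actual construction.

```python
import numpy as np

def worst_case_value(rewards, gamma=2.0):
    """Pessimistic value: the adversary reweights samples within the ratio
    bounds [1/gamma, gamma] (then renormalises) to minimise the mean reward."""
    r = np.sort(np.asarray(rewards, dtype=float))
    n, best = len(r), np.inf
    # The optimum has threshold structure: weight gamma on the k worst
    # outcomes, 1/gamma on the rest; scan all split points k.
    for k in range(n + 1):
        w = np.concatenate([np.full(k, gamma), np.full(n - k, 1.0 / gamma)])
        best = min(best, float(w @ r / w.sum()))
    return best
```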
arXiv Detail & Related papers (2021-12-02T16:18:16Z)
- Causal policy ranking [3.7819322027528113]
Given a trained policy, we propose a black-box method based on counterfactual reasoning that estimates the causal effect the policy's individual decisions have on reward attainment.
In this work, we compare our measure against an alternative, non-causal, ranking procedure, and discuss potential future work integrating causal algorithms into the interpretation of RL agent policies.
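The intervention behind such a counterfactual estimate can be sketched as follows, assuming the classic Gym `reset`/`step` API (pre-0.26) and an arbitrary `in_group` predicate over states; none of this is the paper's code.

```python
import numpy as np

def causal_effect(env, policy, in_group, n_episodes=100):
    """Average drop in return when the policy's action is replaced by a
    random one whenever the state falls in the given group."""
    def rollout(intervene):
        total, state, done = 0.0, env.reset(), False
        while not done:
            if intervene and in_group(state):
                action = env.action_space.sample()   # do(action := random)
            else:
                action = policy(state)
            state, reward, done, _ = env.step(action)
            total += reward
        return total

    base = np.mean([rollout(False) for _ in range(n_episodes)])
    dosed = np.mean([rollout(True) for _ in range(n_episodes)])
    return base - dosed   # larger drop -> the group's decisions matter more
```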
arXiv Detail & Related papers (2021-11-16T12:33:36Z)
- Supervised Off-Policy Ranking [145.3039527243585]
Off-policy evaluation (OPE) leverages data generated by other policies to evaluate a target policy.
We propose supervised off-policy ranking that learns a policy scoring model by correctly ranking training policies with known performance.
Our method outperforms strong baseline OPE methods in terms of both rank correlation and the performance gap between the truly best policy and the best of the top three ranked policies.
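One standard way to realise "learning a scoring model by correctly ranking training policies" is a pairwise reduction to binary classification on feature differences; the sketch below uses that reduction with a linear model, whereas the paper's scoring model is richer.

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import LogisticRegression

def fit_policy_ranker(features, perf):
    """features: (n_policies, d) policy descriptors (assumed given);
    perf: (n_policies,) known training-policy performance."""
    X, y = [], []
    for i, j in combinations(range(len(perf)), 2):
        if perf[i] == perf[j]:
            continue
        # Add both orderings so both classes are present.
        X.append(features[i] - features[j]); y.append(int(perf[i] > perf[j]))
        X.append(features[j] - features[i]); y.append(int(perf[j] > perf[i]))
    return LogisticRegression().fit(np.array(X), np.array(y))

def score_policies(clf, features):
    # A linear pairwise model induces a per-policy score w . x.
    return features @ clf.coef_.ravel()
```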
arXiv Detail & Related papers (2021-07-03T07:01:23Z)
- Privacy-Constrained Policies via Mutual Information Regularized Policy Gradients [54.98496284653234]
We consider the task of training a policy that maximizes reward while minimizing disclosure of certain sensitive state variables through the actions.
We solve this problem by introducing a regularizer based on the mutual information between the sensitive state and the actions.
We develop a model-based estimator for optimization of privacy-constrained policies.
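The shape of the objective can be illustrated with a plug-in mutual-information penalty on discrete trajectories; the paper's estimator is model-based and differentiable, so this is only a conceptual sketch, and the penalty weight `beta` is illustrative.

```python
import numpy as np

def mutual_information(sensitive, actions):
    """Plug-in MI between two sequences of non-negative integer codes."""
    s, a = np.asarray(sensitive), np.asarray(actions)
    joint = np.zeros((s.max() + 1, a.max() + 1))
    for si, ai in zip(s, a):
        joint[si, ai] += 1
    joint /= joint.sum()
    ps, pa = joint.sum(1, keepdims=True), joint.sum(0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (ps @ pa)[nz])).sum())

def private_objective(episode_return, sensitive, actions, beta=0.1):
    """Reward minus an information-leakage penalty."""
    return episode_return - beta * mutual_information(sensitive, actions)
```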
arXiv Detail & Related papers (2020-12-30T03:22:35Z)
- Offline Policy Selection under Uncertainty [113.57441913299868]
We consider offline policy selection as learning preferences over a set of policy prospects given a fixed experience dataset.
Access to the full distribution over one's belief of the policy value enables more flexible selection algorithms under a wider range of downstream evaluation metrics.
We show how the proposed estimator, BayesDICE, may be used to rank policies with respect to any downstream policy selection metric.
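Once posterior samples of each policy's value are available (which BayesDICE is designed to provide), ranking under different downstream metrics is straightforward; a sketch follows, with the metric names being illustrative.

```python
import numpy as np

def rank_by_metric(value_samples, metric="prob_best"):
    """value_samples: (n_policies, n_samples) draws from each value posterior."""
    v = np.asarray(value_samples, dtype=float)
    if metric == "prob_best":
        scores = (v == v.max(axis=0)).mean(axis=1)       # P(policy is the best)
    elif metric == "mean":
        scores = v.mean(axis=1)                          # posterior mean value
    elif metric == "cvar":
        k = max(1, v.shape[1] // 10)                     # pessimistic 10% tail
        scores = np.sort(v, axis=1)[:, :k].mean(axis=1)
    else:
        raise ValueError(f"unknown metric: {metric}")
    return np.argsort(scores)[::-1]                      # best policy first
```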
arXiv Detail & Related papers (2020-12-12T23:09:21Z)
- Robust Batch Policy Learning in Markov Decision Processes [0.0]
We study the offline, data-driven sequential decision-making problem in the framework of Markov decision processes (MDPs).
We propose to evaluate each policy by a set of average rewards with respect to distributions centered at the policy-induced stationary distribution.
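One concrete instance of "a set of distributions centered at" an empirical distribution is a KL ball, whose worst-case mean has a standard dual form; the sketch below uses that instance, with the radius `rho` being illustrative rather than the paper's choice.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import logsumexp

def kl_worst_case_mean(rewards, rho=0.1):
    """Worst average reward over {Q : KL(Q || P_hat) <= rho}, via the dual
    sup_{lam > 0} [-lam * log E_P[exp(-r / lam)] - lam * rho]."""
    r = np.asarray(rewards, dtype=float)

    def neg_dual(lam):
        log_mgf = logsumexp(-r / lam) - np.log(len(r))   # log E[exp(-r/lam)]
        return -(-lam * log_mgf - lam * rho)

    res = minimize_scalar(neg_dual, bounds=(1e-6, 1e3), method="bounded")
    return -res.fun
```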
arXiv Detail & Related papers (2020-11-09T04:41:21Z)
- Ranking Policy Decisions [14.562620527204686]
Policies trained via Reinforcement Learning (RL) are often needlessly complex, making them difficult to analyse and interpret.
We propose a novel black-box method based on statistical fault localisation that ranks the states of the environment according to the importance of decisions made in those states.
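The fault-localisation analogy can be sketched as follows: mutate episodes by substituting a default action at chosen states, mark each episode pass/fail against a reward threshold, and score states with a classic SFL measure such as Ochiai (the specific measure here is an assumption).

```python
import numpy as np

def ochiai_rank(spectra, passed):
    """spectra: (n_episodes, n_states) 0/1, 1 if the default action was
    substituted for the policy's action in that state during the episode;
    passed: (n_episodes,) 0/1 pass/fail against a reward threshold."""
    spectra = np.asarray(spectra, dtype=float)
    passed = np.asarray(passed).astype(bool)
    ef = spectra[~passed].sum(axis=0)   # substituted and the episode failed
    ep = spectra[passed].sum(axis=0)    # substituted but the episode passed
    n_fail = max(int((~passed).sum()), 1)
    # High score: overriding the policy here tends to break the episode,
    # i.e. the policy's decision in this state is important.
    scores = ef / (np.sqrt(n_fail * (ef + ep)) + 1e-12)
    return np.argsort(scores)[::-1], scores
```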
arXiv Detail & Related papers (2020-08-31T13:54:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.