Evaluating Interpretable Reinforcement Learning by Distilling Policies into Programs
- URL: http://arxiv.org/abs/2503.08322v1
- Date: Tue, 11 Mar 2025 11:34:06 GMT
- Title: Evaluating Interpretable Reinforcement Learning by Distilling Policies into Programs
- Authors: Hector Kohler, Quentin Delfosse, Waris Radji, Riad Akrour, Philippe Preux
- Abstract summary: We tackle the problem of empirically evaluating policy interpretability without humans. Despite the lack of a clear definition, researchers agree on the notion of "simulatability": policy interpretability should relate to how humans understand policy actions given states. Our methodology relies on proxies for simulatability that we use to conduct a large-scale empirical evaluation of policy interpretability.
- Score: 8.851129384632994
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There exist applications of reinforcement learning, such as medicine, where policies need to be "interpretable" by humans. User studies have shown that some policy classes might be more interpretable than others. However, it is costly to conduct human studies of policy interpretability. Furthermore, there is no clear definition of policy interpretability, i.e., no clear metrics for interpretability, and thus claims depend on the chosen definition. We tackle the problem of empirically evaluating policy interpretability without humans. Despite this lack of a clear definition, researchers agree on the notion of "simulatability": policy interpretability should relate to how humans understand policy actions given states. To advance research in interpretable reinforcement learning, we contribute a new methodology to evaluate policy interpretability. This methodology relies on proxies for simulatability that we use to conduct a large-scale empirical evaluation of policy interpretability. We use imitation learning to compute baseline policies by distilling expert neural networks into small programs. We then show that using our methodology to evaluate the baselines' interpretability leads to conclusions similar to those of user studies. We also show that increasing interpretability does not necessarily reduce performance and can sometimes increase it. Finally, we show that no policy class better trades off interpretability and performance across all tasks, making it necessary for researchers to have methodologies for comparing policies' interpretability.
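As a rough illustration of the distillation step described in the abstract, the sketch below behaviour-clones a stand-in "expert" policy into a depth-limited decision tree and reports fidelity and tree size; the expert function, the sampled states, and the particular proxies are assumptions made for this example, not the paper's actual pipeline or simulatability proxies.

```python
# Illustrative sketch only (not the paper's exact pipeline): behaviour-clone a
# stand-in "expert" into a small decision tree and report two possible proxies.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

def expert_policy(state: np.ndarray) -> int:
    """Hypothetical stand-in for a trained neural expert (e.g., a DQN/PPO net)."""
    # Toy rule over a 4-dimensional observation.
    return int(state[2] + 0.1 * state[3] > 0.0)

# 1) Sample states and label them with the expert's actions (imitation data).
rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(5000, 4))
actions = np.array([expert_policy(s) for s in states])

# 2) Distill: fit a small, human-readable program (here, a depth-3 tree).
program = DecisionTreeClassifier(max_depth=3, random_state=0)
program.fit(states, actions)

# 3) Candidate proxies: fidelity to the expert, and program size (node count)
#    as a crude stand-in for how easy the policy is to simulate by hand.
print(f"fidelity to expert: {program.score(states, actions):.3f}")
print(f"program size (tree nodes): {program.tree_.node_count}")
print(export_text(program, feature_names=[f"s{i}" for i in range(4)]))
```

In practice one would roll out the trained neural expert in the target environment and choose the program class (trees, rule lists, small programs) and the proxies to match the intended evaluation protocol.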
Related papers
- Efficient and Sharp Off-Policy Learning under Unobserved Confounding [25.068617118126824]
We develop a novel method for personalized off-policy learning in scenarios with unobserved confounding. Our method is highly relevant for decision-making where unobserved confounding can be problematic.
arXiv Detail & Related papers (2025-02-18T16:42:24Z)
- Counterfactual Learning with General Data-generating Policies [3.441021278275805]
We develop an off-policy evaluation (OPE) method for a class of full-support and deficient-support logging policies in contextual-bandit settings.
We prove that our method's prediction converges in probability to the true performance of a counterfactual policy as the sample size increases.
arXiv Detail & Related papers (2022-12-04T21:07:46Z)
- Policy Regularization for Legible Behavior [0.0]
In Reinforcement Learning, interpretability generally means providing insight into the agent's mechanisms.
This paper borrows from the Explainable Planning literature methods that focus on the legibility of the agent.
In our formulation, the decision boundary introduced by legibility impacts the states in which the agent's policy returns an action that also has high likelihood under other policies.
arXiv Detail & Related papers (2022-03-08T10:55:46Z)
- Interpretable Off-Policy Learning via Hyperbox Search [20.83151214072516]
We propose an algorithm for interpretable off-policy learning via hyperbox search.
Our policies can be represented in disjunctive normal form (i.e., OR-of-ANDs) and are thus intelligible.
We demonstrate that our algorithm outperforms state-of-the-art methods from interpretable off-policy learning in terms of regret.
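As a concrete illustration of the OR-of-ANDs (hyperbox) policy class mentioned above, here is a minimal sketch; the boxes and the binary-action setup are invented for the example and are not taken from the cited paper.

```python
# Minimal sketch of a hyperbox (OR-of-ANDs) policy; boxes and the binary-action
# setting are invented for illustration, not taken from the cited paper.
import numpy as np

# Each hyperbox is an AND of per-feature interval constraints: lower <= x <= upper.
hyperboxes = [
    {"lower": np.array([0.0, -1.0]), "upper": np.array([0.5, 0.0])},
    {"lower": np.array([0.5, 0.2]), "upper": np.array([1.0, 1.0])},
]

def dnf_policy(x: np.ndarray) -> int:
    """Return action 1 if x lies in ANY of the hyperboxes (the OR), else 0."""
    for box in hyperboxes:
        if np.all(box["lower"] <= x) and np.all(x <= box["upper"]):
            return 1
    return 0

print(dnf_policy(np.array([0.25, -0.5])))  # inside the first box  -> 1
print(dnf_policy(np.array([0.25, 0.5])))   # outside both boxes    -> 0
```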
arXiv Detail & Related papers (2022-03-04T18:10:24Z)
- Evaluation of post-hoc interpretability methods in time-series classification [0.6249768559720122]
We propose a framework with quantitative metrics to assess the performance of existing post-hoc interpretability methods.
We show that several drawbacks identified in the literature are addressed, namely dependence on human judgement, the need for retraining, and the shift in data distribution caused by occluding samples.
The proposed methodology and quantitative metrics can be used to assess the reliability of the results that interpretability methods produce in practical applications.
arXiv Detail & Related papers (2022-02-11T14:55:56Z)
- Sayer: Using Implicit Feedback to Optimize System Policies [63.992191765269396]
We develop a methodology that leverages implicit feedback to evaluate and train new system policies.
Sayer builds on two ideas from reinforcement learning to leverage data collected by an existing policy.
We show that Sayer can evaluate arbitrary policies accurately, and train new policies that outperform the production policies.
arXiv Detail & Related papers (2021-10-28T04:16:56Z)
- Goal-Conditioned Reinforcement Learning with Imagined Subgoals [89.67840168694259]
We propose to incorporate imagined subgoals into policy learning to facilitate learning of complex tasks.
Imagined subgoals are predicted by a separate high-level policy, which is trained simultaneously with the policy and its critic.
We evaluate our approach on complex robotic navigation and manipulation tasks and show that it outperforms existing methods by a large margin.
arXiv Detail & Related papers (2021-07-01T15:30:59Z)
- Are Interpretations Fairly Evaluated? A Definition Driven Pipeline for Post-Hoc Interpretability [54.85658598523915]
We propose to have a concrete definition of interpretation before we could evaluate faithfulness of an interpretation.
We find that although interpretation methods perform differently under a given evaluation metric, such differences may not result from differences in interpretation quality or faithfulness.
arXiv Detail & Related papers (2020-09-16T06:38:03Z)
- Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies [80.42316902296832]
We study the estimation of policy value and gradient of a deterministic policy from off-policy data when actions are continuous.
In this setting, standard importance sampling and doubly robust estimators for policy value and gradient fail because the density ratio does not exist.
We propose several new doubly robust estimators based on different kernelization approaches.
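For intuition about the failure mode described above, the single-step (contextual-bandit style) doubly robust estimator can be written as below; this simplified form and its notation are an illustration, not the estimators studied in the cited paper.

```latex
% Single-step doubly robust estimator, for intuition: logged tuples
% (s_i, a_i, r_i) come from a behaviour policy \mu, \hat{q} is a learned
% outcome model, and \pi is the target policy being evaluated.
\hat{V}_{\mathrm{DR}}
  = \frac{1}{n} \sum_{i=1}^{n}
    \left[ \hat{q}\!\left(s_i, \pi(s_i)\right)
      + \frac{\pi(a_i \mid s_i)}{\mu(a_i \mid s_i)}
        \left( r_i - \hat{q}(s_i, a_i) \right) \right]
% When \pi is deterministic and actions are continuous, \pi(\cdot \mid s_i) is a
% point mass, so the density ratio is zero almost everywhere and the correction
% term vanishes; this is the failure mode that kernelized estimators address.
```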
arXiv Detail & Related papers (2020-06-06T15:52:05Z)
- Policy Evaluation Networks [50.53250641051648]
We introduce a scalable, differentiable fingerprinting mechanism that retains essential policy information in a concise embedding.
Our empirical results demonstrate that combining these three elements can produce policies that outperform those that generated the training data.
arXiv Detail & Related papers (2020-02-26T23:00:27Z)
- Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)
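Below is a minimal sketch of the reward-conditioned idea this entry alludes to, assuming synthetic replay data and a linear softmax "policy" (both invented for illustration): the achieved return of each logged trajectory is fed to the policy as a conditioning input, and the policy is trained with plain supervised learning on the actions that were taken.

```python
# Hedged sketch of reward-conditioned supervised learning: condition the policy
# on the return a trajectory achieved, then train with ordinary supervised
# learning. The synthetic data and the linear model are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

# Replay data: states, the actions that were taken, and the return each
# trajectory eventually achieved (used as the conditioning "target return").
states = rng.normal(size=(1024, 4))
actions = rng.integers(0, 2, size=1024)
returns = rng.normal(size=(1024, 1))

# Inputs are (state, achieved return); targets are the actions actually taken.
inputs = np.hstack([states, returns])

# One-layer softmax classifier trained by plain gradient descent (supervised).
W = np.zeros((inputs.shape[1], 2))
for _ in range(200):
    logits = inputs @ W
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    grad = inputs.T @ (probs - np.eye(2)[actions]) / len(inputs)
    W -= 0.5 * grad

# At test time, condition on a HIGH target return to request good behaviour.
test_input = np.hstack([rng.normal(size=4), [2.0]])
print("action logits:", test_input @ W)
```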
This list is automatically generated from the titles and abstracts of the papers on this site.