There is no Accuracy-Interpretability Tradeoff in Reinforcement Learning
for Mazes
- URL: http://arxiv.org/abs/2206.04266v1
- Date: Thu, 9 Jun 2022 04:23:26 GMT
- Title: There is no Accuracy-Interpretability Tradeoff in Reinforcement Learning
for Mazes
- Authors: Yishay Mansour, Michal Moshkovitz, Cynthia Rudin
- Abstract summary: Interpretability is an essential building block for trustworthiness in reinforcement learning systems.
We show that in certain cases, one can achieve policy interpretability while maintaining its optimality.
- Score: 64.05903267230467
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Interpretability is an essential building block for trustworthiness in
reinforcement learning systems. However, interpretability might come at the
cost of deteriorated performance, leading many researchers to build complex
models. Our goal is to analyze the cost of interpretability. We show that in
certain cases, one can achieve policy interpretability while maintaining its
optimality. We focus on a classical problem from reinforcement learning: mazes
with $k$ obstacles in $\mathbb{R}^d$. We prove the existence of a small
decision tree with a linear function at each inner node and depth $O(\log k +
2^d)$ that represents an optimal policy. Note that for the interesting case of
a constant $d$, we have $O(\log k)$ depth. Thus, in this setting, there is no
accuracy-interpretability tradeoff. To prove this result, we use a new
"compressing" technique that might be useful in additional settings.
Related papers
- Improved Sample Complexity Bounds for Distributionally Robust
Reinforcement Learning [3.222802562733787]
We consider the problem of learning a control policy that is robust against the parameter mismatches between the training environment and testing environment.
We propose the Robust Phased Value Learning (RPVL) algorithm to solve this problem for the uncertainty sets specified by four different divergences.
We show that our algorithm achieves $\tilde{\mathcal{O}}(|\mathcal{S}||\mathcal{A}| H^5)$ sample complexity, which is uniformly better than the existing results.
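For intuition about divergence-constrained robustness, here is a generic sketch of a worst-case Bellman backup over a total-variation uncertainty set around a nominal transition distribution. It only illustrates the general robust-MDP idea; the RPVL algorithm and the four divergences studied in the paper are not reproduced here.

```python
import numpy as np

def tv_robust_backup(p, v, rho):
    """Worst-case expected value of v over {p' : ||p' - p||_1 <= rho}.

    Greedy construction: shift at most rho/2 probability mass from the
    highest-value states onto the single lowest-value state.
    """
    order = np.argsort(v)              # states sorted by value, ascending
    worst = order[0]                   # mass gets pushed onto this state
    q = np.asarray(p, dtype=float).copy()
    budget = rho / 2.0
    for s in order[::-1]:              # take mass from high-value states first
        if s == worst or budget <= 0:
            continue
        move = min(q[s], budget)
        q[s] -= move
        q[worst] += move
        budget -= move
    return float(q @ v)

p = np.array([0.7, 0.2, 0.1])          # nominal next-state distribution
v = np.array([1.0, 0.0, 2.0])          # current value estimates
print(tv_robust_backup(p, v, rho=0.2)) # pessimistic backup: 0.7 vs nominal 0.9
```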
arXiv Detail & Related papers (2023-03-05T21:47:08Z)
- Understanding Deep Neural Function Approximation in Reinforcement Learning via $\epsilon$-Greedy Exploration [53.90873926758026]
This paper provides a theoretical study of deep neural function approximation in reinforcement learning (RL)
We focus on the value-based algorithm with $\epsilon$-greedy exploration via deep (and two-layer) neural networks endowed by Besov (and Barron) function spaces.
Our analysis reformulates the temporal difference error in an $L^2(\mathrm{d}\mu)$-integrable space over a certain averaged measure $\mu$, and transforms it to a generalization problem under the non-iid setting.
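As a concrete reference point for the setting described above, here is a minimal sketch of $\epsilon$-greedy action selection on top of a two-layer ReLU $Q$-network. The network sizes, random weights, and the value of $\epsilon$ are arbitrary placeholders, not the Besov/Barron constructions analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, width, n_actions, epsilon = 4, 16, 3, 0.1   # placeholder sizes
W1 = rng.normal(size=(width, d))               # first-layer weights
W2 = rng.normal(size=(n_actions, width))       # second-layer weights

def q_values(s):
    """Two-layer ReLU network: Q(s, .) = W2 @ relu(W1 @ s)."""
    return W2 @ np.maximum(W1 @ s, 0.0)

def epsilon_greedy(s):
    """Explore uniformly with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(q_values(s)))

print(epsilon_greedy(rng.normal(size=d)))
```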
arXiv Detail & Related papers (2022-09-15T15:42:47Z)
- A Few Expert Queries Suffices for Sample-Efficient RL with Resets and Linear Value Approximation [16.29514743112387]
We study sample-efficient Reinforcement Learning (RL) in settings where only the optimal value function is assumed to be linearly realizable.
We present a statistically and computationally efficient algorithm (Delphi) for blending exploration with expert queries.
Delphi requires $\tilde{\mathcal{O}}(d)$ expert queries and a $\texttt{poly}(d,|\mathcal{A}|,1/\varepsilon)$ amount of exploratory samples to provably recover an $\varepsilon$-suboptimal policy.
arXiv Detail & Related papers (2022-07-18T01:39:13Z)
- Certifiably Robust Interpretation via Renyi Differential Privacy [77.04377192920741]
We study the problem of interpretation robustness from a new perspective of Renyi differential privacy (RDP)
First, it can offer provable and certifiable top-$k$ robustness.
Second, our proposed method offers $\sim 10\%$ better experimental robustness than existing approaches.
Third, our method can provide a smooth tradeoff between robustness and computational efficiency.
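As a rough illustration of how randomization can stabilize a top-$k$ interpretation, here is a toy sketch that averages a saliency map over noise-perturbed inputs before taking the top-$k$ features. The linear model, saliency definition, and noise scale are invented; the paper's actual method and its Renyi-DP certificate are developed in the paper itself.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=20)          # toy linear model f(x) = w . x
x = rng.normal(size=20)          # a toy input

def saliency(x):
    """Gradient-times-input attribution for the toy linear model."""
    return np.abs(w * x)

def smoothed_topk(x, k=5, sigma=0.5, n=200):
    """Average the saliency map over Gaussian-perturbed inputs, then take top-k."""
    maps = [saliency(x + sigma * rng.normal(size=x.shape)) for _ in range(n)]
    return np.argsort(-np.mean(maps, axis=0))[:k]

print(smoothed_topk(x))          # indices of the k most salient features
```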
arXiv Detail & Related papers (2021-07-04T06:58:01Z)
- Instance-optimality in optimal value estimation: Adaptivity via variance-reduced Q-learning [99.34907092347733]
We analyze the problem of estimating optimal $Q$-value functions for a discounted Markov decision process with discrete states and actions.
Using a local minimax framework, we show that this functional arises in lower bounds on the accuracy of any estimation procedure.
In the other direction, we establish the sharpness of our lower bounds, up to factors logarithmic in the state and action spaces, by analyzing a variance-reduced version of $Q$-learning.
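For readers unfamiliar with the recentering idea, here is a generic sketch of one synchronous, SVRG-style variance-reduced $Q$-learning update driven by samples from a generative model. It is only meant to illustrate a recentered Bellman backup and is not necessarily the exact procedure analyzed in the paper.

```python
import numpy as np

def vr_q_update(Q, Q_bar, T_bar, next_states, r, gamma, lr):
    """One recentered synchronous Q-learning step (generic sketch).

    Q, Q_bar, T_bar : arrays of shape (S, A). Q_bar is a fixed reference
    point and T_bar is a high-accuracy estimate of the Bellman operator
    applied to Q_bar. next_states holds sampled s' ~ P(.|s, a) as integer
    indices of shape (S, A); r holds rewards of shape (S, A).
    """
    # Empirical Bellman backups at the current iterate and at the reference;
    # the common sampling noise largely cancels when Q is close to Q_bar.
    T_hat_Q = r + gamma * Q.max(axis=1)[next_states]
    T_hat_Qbar = r + gamma * Q_bar.max(axis=1)[next_states]
    target = T_hat_Q - T_hat_Qbar + T_bar
    return (1 - lr) * Q + lr * target
```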
arXiv Detail & Related papers (2021-06-28T00:38:54Z)
- Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations [79.66404989555566]
We consider the more realistic setting of agnostic RL with rich observation spaces and a fixed class of policies $\Pi$ that may not contain any near-optimal policy.
We provide an algorithm for this setting whose error is bounded in terms of the rank $d$ of the underlying MDP.
arXiv Detail & Related papers (2021-06-22T03:20:40Z)
- Corruption-Robust Offline Reinforcement Learning [19.300465320692066]
We study adversarial robustness in offline reinforcement learning.
We show that a worst-case $\Omega(d\epsilon)$ optimality gap is unavoidable.
We propose robust variants of the Least-Squares Value Iteration (LSVI) algorithm.
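To show the building block being made robust, here is a generic sketch of one backward-recursion step of standard Least-Squares Value Iteration with linear features. The corruption-robust modifications proposed in the paper are not included, and the shapes and ridge parameter are placeholders.

```python
import numpy as np

def lsvi_step(Phi, R, Phi_next_all, w_next, lam=1.0):
    """One standard LSVI regression step on a batch of logged transitions.

    Phi          : (n, d) features phi(s_i, a_i).
    R            : (n,)   observed rewards.
    Phi_next_all : (n, A, d) features phi(s'_i, a) for every action a.
    w_next       : (d,)   weights of the next-stage value estimate.
    Returns ridge-regression weights fitting the one-step Bellman targets.
    """
    v_next = (Phi_next_all @ w_next).max(axis=1)     # max_a phi(s', a) . w_next
    y = R + v_next                                   # Bellman regression targets
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])     # regularized Gram matrix
    return np.linalg.solve(A, Phi.T @ y)
```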
arXiv Detail & Related papers (2021-06-11T22:41:53Z)
- Agnostic learning with unknown utilities [70.14742836006042]
In many real-world problems, the utility of a decision depends on the underlying context $x$ and decision $y$.
We study this as agnostic learning with unknown utilities.
We show that estimating the utilities of only the sampled points $S$ suffices to learn a decision function which generalizes well.
arXiv Detail & Related papers (2021-04-17T08:22:04Z)
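As a toy illustration of the statement above, here is a hypothetical sketch that queries (noisy) utility estimates only on the sampled points $S$ and picks the decision rule with the highest average estimated utility from a small, made-up hypothesis class. The utility model and hypotheses are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.normal(size=(50, 2))                       # sampled contexts x in S

def estimated_utility(x, y):
    """Stand-in for a noisy estimate of the unknown utility u(x, y)."""
    return float(x @ np.array([1.0, -0.5]) * y) + 0.1 * rng.normal()

hypotheses = {                                     # candidate decision rules y = h(x)
    "always_plus": lambda x: 1.0,
    "always_minus": lambda x: -1.0,
    "sign_of_x0": lambda x: float(np.sign(x[0]) or 1.0),
}
scores = {name: np.mean([estimated_utility(x, h(x)) for x in S])
          for name, h in hypotheses.items()}
print(max(scores, key=scores.get))                 # empirically best rule on S
```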
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.