There is no Accuracy-Interpretability Tradeoff in Reinforcement Learning
for Mazes
- URL: http://arxiv.org/abs/2206.04266v1
- Date: Thu, 9 Jun 2022 04:23:26 GMT
- Title: There is no Accuracy-Interpretability Tradeoff in Reinforcement Learning
for Mazes
- Authors: Yishay Mansour, Michal Moshkovitz, Cynthia Rudin
- Abstract summary: Interpretability is an essential building block for trustworthiness in reinforcement learning systems.
We show that in certain cases, one can achieve policy interpretability while maintaining its optimality.
- Score: 64.05903267230467
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Interpretability is an essential building block for trustworthiness in
reinforcement learning systems. However, interpretability might come at the
cost of deteriorated performance, leading many researchers to build complex
models. Our goal is to analyze the cost of interpretability. We show that in
certain cases, one can achieve policy interpretability while maintaining its
optimality. We focus on a classical problem from reinforcement learning: mazes
with $k$ obstacles in $\mathbb{R}^d$. We prove the existence of a small
decision tree with a linear function at each inner node and depth $O(\log k +
2^d)$ that represents an optimal policy. Note that for the interesting case of
a constant $d$, we have $O(\log k)$ depth. Thus, in this setting, there is no
accuracy-interpretability tradeoff. To prove this result, we use a new
"compressing" technique that might be useful in additional settings.
Related papers
- Improved Sample Complexity Bounds for Distributionally Robust
Reinforcement Learning [3.222802562733787]
We consider the problem of learning a control policy that is robust against the parameter mismatches between the training environment and testing environment.
We propose the Robust Phased Value Learning (RPVL) algorithm to solve this problem for the uncertainty sets specified by four different divergences.
We show that our algorithm achieves $\tilde{\mathcal{O}}(|\mathcal{S}||\mathcal{A}| H^5)$ sample complexity, which is uniformly better than the existing results.
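For intuition about divergence-constrained robustness, here is a generic sketch of a worst-case Bellman backup over a total-variation uncertainty set around a nominal transition distribution. It only illustrates the general robust-MDP idea; the RPVL algorithm and the four divergences studied in the paper are not reproduced here.

```python
import numpy as np

def tv_robust_backup(p, v, rho):
    """Worst-case expected value of v over {p' : ||p' - p||_1 <= rho}.

    Greedy construction: shift at most rho/2 probability mass from the
    highest-value states onto the single lowest-value state.
    """
    order = np.argsort(v)              # states sorted by value, ascending
    worst = order[0]                   # mass gets pushed onto this state
    q = np.asarray(p, dtype=float).copy()
    budget = rho / 2.0
    for s in order[::-1]:              # take mass from high-value states first
        if s == worst or budget <= 0:
            continue
        move = min(q[s], budget)
        q[s] -= move
        q[worst] += move
        budget -= move
    return float(q @ v)

p = np.array([0.7, 0.2, 0.1])          # nominal next-state distribution
v = np.array([1.0, 0.0, 2.0])          # current value estimates
print(tv_robust_backup(p, v, rho=0.2)) # pessimistic backup: 0.7 vs nominal 0.9
```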
arXiv Detail & Related papers (2023-03-05T21:47:08Z)
- Understanding Deep Neural Function Approximation in Reinforcement Learning via $\epsilon$-Greedy Exploration [53.90873926758026]
This paper provides a theoretical study of deep neural function approximation in reinforcement learning (RL)
We focus on the value-based algorithm with $\epsilon$-greedy exploration via deep (and two-layer) neural networks endowed by Besov (and Barron) function spaces.
Our analysis reformulates the temporal difference error in an $L^2(\mathrm{d}\mu)$-integrable space over a certain averaged measure $\mu$, and transforms it to a generalization problem under the non-iid setting.
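As a concrete reference point for the setting described above, here is a minimal sketch of $\epsilon$-greedy action selection on top of a two-layer ReLU $Q$-network. The network sizes, random weights, and the value of $\epsilon$ are arbitrary placeholders, not the Besov/Barron constructions analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, width, n_actions, epsilon = 4, 16, 3, 0.1   # placeholder sizes
W1 = rng.normal(size=(width, d))               # first-layer weights
W2 = rng.normal(size=(n_actions, width))       # second-layer weights

def q_values(s):
    """Two-layer ReLU network: Q(s, .) = W2 @ relu(W1 @ s)."""
    return W2 @ np.maximum(W1 @ s, 0.0)

def epsilon_greedy(s):
    """Explore uniformly with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(q_values(s)))

print(epsilon_greedy(rng.normal(size=d)))
```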
arXiv Detail & Related papers (2022-09-15T15:42:47Z)
- A Few Expert Queries Suffices for Sample-Efficient RL with Resets and Linear Value Approximation [16.29514743112387]
We study sample-efficient Reinforcement Learning (RL) in settings where only the optimal value function is assumed to be linearly realizable.
We present a statistically and computationally efficient algorithm (Delphi) for blending exploration with expert queries.
Delphi requires $\tilde{\mathcal{O}}(d)$ expert queries and a $\texttt{poly}(d,|\mathcal{A}|,1/\varepsilon)$ amount of exploratory samples to provably recover an $\varepsilon$-suboptimal policy.
arXiv Detail & Related papers (2022-07-18T01:39:13Z)
- Certifiably Robust Interpretation via Renyi Differential Privacy [77.04377192920741]
We study the problem of interpretation robustness from a new perspective of Renyi differential privacy (RDP)
First, it can offer provable and certifiable top-$k$ robustness.
Second, our proposed method offers $\sim 10\%$ better experimental robustness than existing approaches.
Third, our method can provide a smooth tradeoff between robustness and computational efficiency.
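As a rough illustration of how randomization can stabilize a top-$k$ interpretation, here is a toy sketch that averages a saliency map over noise-perturbed inputs before taking the top-$k$ features. The linear model, saliency definition, and noise scale are invented; the paper's actual method and its Renyi-DP certificate are developed in the paper itself.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=20)          # toy linear model f(x) = w . x
x = rng.normal(size=20)          # a toy input

def saliency(x):
    """Gradient-times-input attribution for the toy linear model."""
    return np.abs(w * x)

def smoothed_topk(x, k=5, sigma=0.5, n=200):
    """Average the saliency map over Gaussian-perturbed inputs, then take top-k."""
    maps = [saliency(x + sigma * rng.normal(size=x.shape)) for _ in range(n)]
    return np.argsort(-np.mean(maps, axis=0))[:k]

print(smoothed_topk(x))          # indices of the k most salient features
```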
arXiv Detail & Related papers (2021-07-04T06:58:01Z)
- Instance-optimality in optimal value estimation: Adaptivity via variance-reduced Q-learning [99.34907092347733]
We analyze the problem of estimating optimal $Q$-value functions for a discounted Markov decision process with discrete states and actions.
Using a local minimax framework, we show that this functional arises in lower bounds on the accuracy of any estimation procedure.
In the other direction, we establish the sharpness of our lower bounds, up to factors logarithmic in the state and action spaces, by analyzing a variance-reduced version of $Q$-learning.
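For readers unfamiliar with the recentering idea, here is a generic sketch of one synchronous, SVRG-style variance-reduced $Q$-learning update driven by samples from a generative model. It is only meant to illustrate a recentered Bellman backup and is not necessarily the exact procedure analyzed in the paper.

```python
import numpy as np

def vr_q_update(Q, Q_bar, T_bar, next_states, r, gamma, lr):
    """One recentered synchronous Q-learning step (generic sketch).

    Q, Q_bar, T_bar : arrays of shape (S, A). Q_bar is a fixed reference
    point and T_bar is a high-accuracy estimate of the Bellman operator
    applied to Q_bar. next_states holds sampled s' ~ P(.|s, a) as integer
    indices of shape (S, A); r holds rewards of shape (S, A).
    """
    # Empirical Bellman backups at the current iterate and at the reference;
    # the common sampling noise largely cancels when Q is close to Q_bar.
    T_hat_Q = r + gamma * Q.max(axis=1)[next_states]
    T_hat_Qbar = r + gamma * Q_bar.max(axis=1)[next_states]
    target = T_hat_Q - T_hat_Qbar + T_bar
    return (1 - lr) * Q + lr * target
```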
arXiv Detail & Related papers (2021-06-28T00:38:54Z)
- Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations [79.66404989555566]
We consider the more realistic setting of agnostic RL with rich observation spaces and a fixed class of policies $\Pi$ that may not contain any near-optimal policy.
We provide an algorithm for this setting whose error is bounded in terms of the rank $d$ of the underlying MDP.
arXiv Detail & Related papers (2021-06-22T03:20:40Z)
- Corruption-Robust Offline Reinforcement Learning [19.300465320692066]
We study adversarial robustness in offline reinforcement learning.
We show that a worst-case $\Omega(d\epsilon)$ optimality gap is unavoidable.
We propose robust variants of the Least-Squares Value Iteration (LSVI) algorithm.
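To show the building block being made robust, here is a generic sketch of one backward-recursion step of standard Least-Squares Value Iteration with linear features. The corruption-robust modifications proposed in the paper are not included, and the shapes and ridge parameter are placeholders.

```python
import numpy as np

def lsvi_step(Phi, R, Phi_next_all, w_next, lam=1.0):
    """One standard LSVI regression step on a batch of logged transitions.

    Phi          : (n, d) features phi(s_i, a_i).
    R            : (n,)   observed rewards.
    Phi_next_all : (n, A, d) features phi(s'_i, a) for every action a.
    w_next       : (d,)   weights of the next-stage value estimate.
    Returns ridge-regression weights fitting the one-step Bellman targets.
    """
    v_next = (Phi_next_all @ w_next).max(axis=1)     # max_a phi(s', a) . w_next
    y = R + v_next                                   # Bellman regression targets
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])     # regularized Gram matrix
    return np.linalg.solve(A, Phi.T @ y)
```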
arXiv Detail & Related papers (2021-06-11T22:41:53Z)
- Agnostic learning with unknown utilities [70.14742836006042]
In many real-world problems, the utility of a decision depends on the underlying context $x$ and decision $y$.
We study this as agnostic learning with unknown utilities.
We show that estimating the utilities of only the sampled points $S$ suffices to learn a decision function which generalizes well.
arXiv Detail & Related papers (2021-04-17T08:22:04Z)
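As a toy illustration of the statement above, here is a hypothetical sketch that queries (noisy) utility estimates only on the sampled points $S$ and picks the decision rule with the highest average estimated utility from a small, made-up hypothesis class. The utility model and hypotheses are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.normal(size=(50, 2))                       # sampled contexts x in S

def estimated_utility(x, y):
    """Stand-in for a noisy estimate of the unknown utility u(x, y)."""
    return float(x @ np.array([1.0, -0.5]) * y) + 0.1 * rng.normal()

hypotheses = {                                     # candidate decision rules y = h(x)
    "always_plus": lambda x: 1.0,
    "always_minus": lambda x: -1.0,
    "sign_of_x0": lambda x: float(np.sign(x[0]) or 1.0),
}
scores = {name: np.mean([estimated_utility(x, h(x)) for x in S])
          for name, h in hypotheses.items()}
print(max(scores, key=scores.get))                 # empirically best rule on S
```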
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.