Limits of Actor-Critic Algorithms for Decision Tree Policies Learning in IBMDPs
- URL: http://arxiv.org/abs/2309.13365v3
- Date: Sun, 21 Jan 2024 13:07:27 GMT
- Title: Limits of Actor-Critic Algorithms for Decision Tree Policies Learning in IBMDPs
- Authors: Hector Kohler, Riad Akrour, Philippe Preux
- Abstract summary: Interpretability of AI models allows for user safety checks to build trust in such AIs.
Decision Trees (DTs) provide a global look at the learned model and transparently reveal which features of the input are critical for making a decision.
A recent Reinforcement Learning framework has been proposed to explore the space of DTs using deep RL.
- Score: 9.587070290189507
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Interpretability of AI models allows for user safety checks to build trust in
such AIs. In particular, Decision Trees (DTs) provide a global look at the
learned model and transparently reveal which features of the input are critical
for making a decision. However, interpretability is hindered if the DT is too
large. To learn compact trees, a recent Reinforcement Learning (RL) framework
has been proposed to explore the space of DTs using deep RL. This framework
augments a decision problem (e.g. a supervised classification task) with
additional actions that gather information about the features of an otherwise
hidden input. By appropriately penalizing these actions, the agent learns to
optimally trade off size and performance of DTs. In practice, a reactive policy
for a partially observable Markov decision process (MDP) needs to be learned,
which is still an open problem. We show in this paper that deep RL can fail
even on simple toy tasks of this class. However, when the underlying decision
problem is a supervised classification task, we show that finding the optimal
tree can be cast as a fully observable Markov decision problem and be solved
efficiently, giving rise to a new family of algorithms for learning DTs that go
beyond the classical greedy maximization ones.
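
To make the augmentation concrete, below is a minimal sketch of an IBMDP-style environment built on top of a classification dataset, assuming features scaled to [0, 1]; the class name `IBMDPEnv`, the penalty parameter `zeta`, and the bound-based observation encoding are illustrative choices, not the paper's exact formulation.

```python
# A hedged sketch of the framework described above: the true input is hidden,
# and the agent either pays a penalty to test a feature against a threshold
# (an information-gathering action) or commits to a class label (a base action).
import numpy as np

class IBMDPEnv:
    def __init__(self, X, y, thresholds, zeta=0.05, seed=0):
        self.X, self.y = X, y                     # features assumed in [0, 1]
        self.tests = [(f, t) for f in range(X.shape[1]) for t in thresholds]
        self.zeta = zeta                          # penalty per feature test
        self.n_classes = len(np.unique(y))
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.i = self.rng.integers(len(self.X))   # draw a hidden input
        # observation: per-feature (lower, upper) bounds refined by the tests
        self.obs = np.array([[0.0, 1.0]] * self.X.shape[1])
        return self.obs.copy()

    def step(self, action):
        if action < len(self.tests):              # information-gathering action
            f, t = self.tests[action]
            if self.X[self.i, f] <= t:
                self.obs[f, 1] = min(self.obs[f, 1], t)
            else:
                self.obs[f, 0] = max(self.obs[f, 0], t)
            return self.obs.copy(), -self.zeta, False
        label = action - len(self.tests)          # base action: predict a class
        return self.obs.copy(), float(label == self.y[self.i]), True
```

A deterministic policy for such an environment reads directly as a decision tree: information-gathering actions form the internal test nodes, class predictions form the leaves, and raising `zeta` steers the agent toward smaller trees.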
Related papers
- RGMDT: Return-Gap-Minimizing Decision Tree Extraction in Non-Euclidean Metric Space [28.273737052758907]
We introduce an upper bound on the return gap between the oracle expert policy and an optimal decision tree policy.
This enables us to recast the DT extraction problem into a novel non-Euclidean clustering problem over each agent's local observation and action-value space.
We also propose the Return-Gap-Minimization Decision Tree (RGMDT) algorithm, which has a surprisingly simple design and integrates with reinforcement learning.
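
As a rough illustration of that recast (a sketch, not the paper's algorithm): group observations whose oracle action-values are close, then fit a shallow tree over raw observations that reproduces each group's greedy action. Plain Euclidean k-means below stands in for the return-gap-minimizing, non-Euclidean criterion, and all names are made up for the example.

```python
# Hedged sketch: cluster in action-value space, then distill a small tree.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

def extract_tree(observations, q_values, n_leaves=8, seed=0):
    """observations: (n, d) array; q_values: (n, |A|) oracle action-values."""
    leaf_of = KMeans(n_clusters=n_leaves, n_init=10,
                     random_state=seed).fit_predict(q_values)
    greedy = q_values.argmax(axis=1)              # oracle's greedy action
    # each cluster acts with the majority greedy action of its members
    action_of = np.array([
        np.bincount(greedy[leaf_of == c],
                    minlength=q_values.shape[1]).argmax()
        for c in range(n_leaves)
    ])
    # a size-limited tree over raw observations reproduces the cluster actions
    tree = DecisionTreeClassifier(max_leaf_nodes=n_leaves, random_state=seed)
    tree.fit(observations, action_of[leaf_of])
    return tree   # tree.predict(new_obs) is the extracted interpretable policy
```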
arXiv Detail & Related papers (2024-10-21T21:19:49Z)
- Tractable Offline Learning of Regular Decision Processes [50.11277112628193]
This work studies offline Reinforcement Learning (RL) in a class of non-Markovian environments called Regular Decision Processes (RDPs).
In RDPs, the unknown dependency of future observations and rewards on past interactions can be captured by a hidden finite-state automaton.
Many algorithms first reconstruct this unknown dependency using automata learning techniques.
arXiv Detail & Related papers (2024-09-04T14:26:58Z)
- Analyzing Adversarial Inputs in Deep Reinforcement Learning [53.3760591018817]
We present a comprehensive characterization of adversarial inputs through the lens of formal verification.
We introduce a novel metric, the Adversarial Rate, to classify models based on their susceptibility to such perturbations.
Our analysis empirically demonstrates how such adversarial inputs can compromise the safety of a given DRL system.
arXiv Detail & Related papers (2024-02-07T21:58:40Z)
- Interpretable Decision Tree Search as a Markov Decision Process [8.530182510074983]
Finding an optimal decision tree for a supervised learning task is a challenging problem to solve at scale.
It was recently proposed to frame the problem as a Markov Decision Problem (MDP) and use deep reinforcement learning to tackle scaling.
We propose instead to scale the resolution of such MDPs using an information-theoretic test-generating function that, for every state, generates a small set of admissible candidate tests.
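
As a hedged sketch of what such a generating function could look like (plain information gain below; the paper's exact scoring and interface are not reproduced): score every (feature, threshold) test on the samples reaching the current state and expose only the top-k as admissible actions.

```python
# Illustrative test-generating function: keep only the k most informative
# candidate splits for the data reaching the current search state.
import numpy as np

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def generate_tests(X, y, k=5):
    base, scored = entropy(y), []
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f])[:-1]:         # candidate split points
            left = X[:, f] <= t
            n_left = left.sum()
            gain = base - (n_left * entropy(y[left]) +
                           (len(y) - n_left) * entropy(y[~left])) / len(y)
            scored.append((gain, f, t))
    scored.sort(reverse=True)
    return [(f, t) for _, f, t in scored[:k]]     # the k admissible tests
```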
arXiv Detail & Related papers (2023-09-22T08:18:08Z)
- Verifiable Learning for Robust Tree Ensembles [8.207928136395184]
A class of decision tree ensembles called large-spread ensembles admits a security verification algorithm running in polynomial time.
We show the benefits of this idea by designing a new training algorithm that automatically learns a large-spread decision tree ensemble from labelled data.
Experimental results on public datasets confirm that large-spread ensembles trained using our algorithm can be verified in a matter of seconds.
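
For intuition, here is a minimal single-tree version of the verification problem (hypothetical node encoding, not the paper's data structures): collect every leaf label reachable when each feature is only known up to an interval, such as an L-infinity ball around an input. Roughly, the large-spread condition keeps this tractable for whole ensembles by ensuring a bounded perturbation can influence at most one tree at a time.

```python
# Hedged sketch: robustness check for one tree over a box of inputs.
# Internal nodes route x left when x[feature] <= threshold.
def reachable_labels(node, lo, hi):
    """Labels of all leaves reachable when lo[f] <= x[f] <= hi[f] for all f."""
    if "label" in node:                            # leaf node
        return {node["label"]}
    f, t = node["feature"], node["threshold"]
    labels = set()
    if lo[f] <= t:                                 # some input in the box goes left
        labels |= reachable_labels(node["left"], lo, hi)
    if hi[f] > t:                                  # some input in the box goes right
        labels |= reachable_labels(node["right"], lo, hi)
    return labels

def is_robust(tree, x, eps):
    """True iff every input within L-inf distance eps of x gets the same label."""
    lo = [v - eps for v in x]
    hi = [v + eps for v in x]
    return len(reachable_labels(tree, lo, hi)) == 1
```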
arXiv Detail & Related papers (2023-05-05T15:37:23Z)
- Optimal Interpretability-Performance Trade-off of Classification Trees with Black-Box Reinforcement Learning [0.0]
Interpretability of AI models allows for user safety checks to build trust in these models.
Decision trees (DTs) provide a global view of the learned model and clearly outline which features are critical to classify the data.
To learn compact trees, a Reinforcement Learning framework has been recently proposed to explore the space of DTs.
arXiv Detail & Related papers (2023-04-11T09:43:23Z)
- Optimal Decision Tree Policies for Markov Decision Processes [7.995360025953931]
We study the optimization of size-limited decision trees for Markov Decision Processes (MDPs).
We show that this is due to an inherent shortcoming of imitation learning, namely that complex policies cannot be represented using size-limited trees.
While there is generally a trade-off between the performance and interpretability of machine learning models, we find that OMDTs limited to a depth of 3 often perform close to the optimal limit.
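
To make the object of study concrete, here is a minimal sketch of a size-limited decision tree policy and a Monte-Carlo estimate of its return; the environment interface (`reset`/`step`) and all names are assumptions for illustration, and the paper's actual optimization procedure is not shown.

```python
# Hedged sketch: a depth-limited tree policy acting in an MDP, scored by its
# own return rather than by how closely it imitates an expert policy.
class TreeNode:
    """Leaf: fixed action. Internal node: test state[feature] <= threshold."""
    def __init__(self, action=None, feature=None, threshold=None,
                 left=None, right=None):
        self.action, self.feature, self.threshold = action, feature, threshold
        self.left, self.right = left, right

    def act(self, state):
        if self.action is not None:               # leaf reached
            return self.action
        child = self.left if state[self.feature] <= self.threshold else self.right
        return child.act(state)

def policy_return(tree, env, episodes=100, gamma=0.99):
    """Monte-Carlo estimate of the tree policy's expected discounted return.
    Assumes env.reset() -> state and env.step(a) -> (state, reward, done)."""
    total = 0.0
    for _ in range(episodes):
        state, done, discount = env.reset(), False, 1.0
        while not done:
            state, reward, done = env.step(tree.act(state))
            total += discount * reward
            discount *= gamma
    return total / episodes
```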
arXiv Detail & Related papers (2023-01-30T18:51:02Z)
- Denoised MDPs: Learning World Models Better Than the World Itself [94.74665254213588]
This work categorizes information out in the wild into four types based on controllability and relation with reward, and formulates useful information as that which is both controllable and reward-relevant.
Experiments on variants of DeepMind Control Suite and RoboDesk demonstrate superior performance of our denoised world model over using raw observations alone.
arXiv Detail & Related papers (2022-06-30T17:59:49Z)
- Reinforcement Learning with a Terminator [80.34572413850186]
We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.
We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret.
arXiv Detail & Related papers (2022-05-30T18:40:28Z)
- Optimal Decision Diagrams for Classification [68.72078059880018]
We study the training of optimal decision diagrams from a mathematical programming perspective.
We introduce a novel mixed-integer linear programming model for training.
We show how this model can be easily extended for fairness, parsimony, and stability notions.
arXiv Detail & Related papers (2022-05-28T18:31:23Z)
- R(Det)^2: Randomized Decision Routing for Object Detection [64.48369663018376]
We propose a novel approach to combine decision trees and deep neural networks in an end-to-end learning manner for object detection.
To facilitate effective learning, we propose randomized decision routing with node selective and associative losses.
We name this approach randomized decision routing for object detection, abbreviated as R(Det)$^2$.
arXiv Detail & Related papers (2022-04-02T07:54:58Z)