Optimal Decision Tree Policies for Markov Decision Processes
- URL: http://arxiv.org/abs/2301.13185v2
- Date: Wed, 14 Feb 2024 01:00:18 GMT
- Title: Optimal Decision Tree Policies for Markov Decision Processes
- Authors: Daniël Vos and Sicco Verwer
- Abstract summary: We study the optimization of size-limited decision trees for Markov Decision Processes (MDPs).
We show that this is due to an inherent shortcoming of imitation learning, namely that complex policies cannot be represented using size-limited trees.
While there is generally a trade-off between the performance and interpretability of machine learning models, we find that OMDTs limited to a depth of 3 often perform close to the optimal limit.
- Score: 7.995360025953931
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Interpretability of reinforcement learning policies is essential for many
real-world tasks but learning such interpretable policies is a hard problem.
Particularly rule-based policies such as decision trees and rule lists are
difficult to optimize due to their non-differentiability. While existing
techniques can learn verifiable decision tree policies, there is no guarantee
that the learners generate a decision tree that performs optimally. In this
work, we study the optimization of size-limited decision trees for Markov
Decision Processes (MDPs) and propose OMDTs: Optimal MDP Decision Trees. Given
a user-defined size limit and MDP formulation, OMDT directly maximizes the
expected discounted return for the decision tree using Mixed-Integer Linear
Programming. By training optimal decision tree policies for different MDPs we
empirically study the optimality gap for existing imitation learning techniques
and find that they perform sub-optimally. We show that this is due to an
inherent shortcoming of imitation learning, namely that complex policies cannot
be represented using size-limited trees. In such cases, it is better to
directly optimize the tree for expected return. While there is generally a
trade-off between the performance and interpretability of machine learning
models, we find that OMDTs limited to a depth of 3 often perform close to the
optimal limit.
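For intuition, formulations like OMDT build on the classical dual linear program over discounted state-action occupancy measures. The sketch below (illustrative PuLP code, not the authors' implementation) solves just that underlying LP for a made-up two-state MDP; OMDT's MILP additionally constrains the induced policy to be a size-limited decision tree, which is omitted here.

```python
# A minimal sketch, assuming the dual "occupancy measure" LP view of MDP
# optimization: maximize expected discounted return over occupancy variables.
# The two-state MDP below is a made-up example, not from the paper.
import pulp

states, actions = range(2), range(2)
gamma = 0.95
mu0 = [0.5, 0.5]                       # initial state distribution
# P[s][a][s'] = transition probability, R[s][a] = reward (illustrative values)
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.7, 0.3], [0.05, 0.95]]]
R = [[1.0, 0.0], [0.0, 2.0]]

prob = pulp.LpProblem("mdp_dual_lp", pulp.LpMaximize)
d = {(s, a): pulp.LpVariable(f"d_{s}_{a}", lowBound=0)
     for s in states for a in actions}

# Objective: expected discounted return = sum of occupancy * reward.
prob += pulp.lpSum(d[s, a] * R[s][a] for s in states for a in actions)

# Flow (Bellman) constraints on the discounted occupancy measure.
for s in states:
    prob += (pulp.lpSum(d[s, a] for a in actions)
             == mu0[s] + gamma * pulp.lpSum(P[sp][ap][s] * d[sp, ap]
                                            for sp in states for ap in actions))

prob.solve(pulp.PULP_CBC_CMD(msg=0))
policy = {s: max(actions, key=lambda a: d[s, a].value()) for s in states}
print("optimal return:", pulp.value(prob.objective), "policy:", policy)
```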
Related papers
- Optimizing Interpretable Decision Tree Policies for Reinforcement Learning [10.68128849363198]
Decision trees have gained increased attention in supervised learning for their inherent interpretability.
This paper considers the problem of optimizing interpretable decision tree policies to replace neural networks in reinforcement learning settings.
arXiv Detail & Related papers (2024-08-21T14:04:00Z)
- Learning a Decision Tree Algorithm with Transformers [75.96920867382859]
We introduce MetaTree, a transformer-based model trained via meta-learning to directly produce strong decision trees.
We fit both greedy decision trees and globally optimized decision trees on a large number of datasets, and train MetaTree to produce only the trees that achieve strong generalization performance.
arXiv Detail & Related papers (2024-02-06T07:40:53Z)
- Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism [91.52263068880484]
We study offline Reinforcement Learning with Human Feedback (RLHF).
We aim to learn the human's underlying reward and the MDP's optimal policy from a set of trajectories induced by human choices.
RLHF is challenging for multiple reasons: large state space but limited human feedback, the bounded rationality of human decisions, and the off-policy distribution shift.
arXiv Detail & Related papers (2023-05-29T01:18:39Z)
- Policy learning for many outcomes of interest: Combining optimal policy trees with multi-objective Bayesian optimisation [0.0]
Multi-Objective Policy Learning combines optimal decision trees for policy learning with a multi-objective Bayesian optimisation approach.
The method is applied to a real-world case-study of non-price rationing of anti-malarial medication in Kenya.
arXiv Detail & Related papers (2022-12-13T01:39:14Z)
- bsnsing: A decision tree induction method based on recursive optimal boolean rule composition [2.28438857884398]
This paper proposes a new mixed-integer programming (MIP) formulation to optimize split rule selection in the decision tree induction process.
It develops an efficient search solver that is able to solve practical instances faster than commercial solvers. A sketch of the general idea follows below.
arXiv Detail & Related papers (2022-05-30T17:13:57Z)
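As a hedged illustration of MIP-based split selection (my own toy construction, not bsnsing's exact formulation), the sketch below picks a subset of binary features whose OR routes as many samples as possible consistently with their labels; all data is made up.

```python
# A minimal sketch of selecting a boolean (OR-of-features) split rule via MIP:
# binary variable w_j selects feature j, binary variable z_i says whether
# sample i satisfies the OR rule. The OR is linearized with two constraints.
import pulp

X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0], [1, 1, 0]]  # binary features
y = [1, 1, 0, 0, 1]                                          # binary labels
n, m = len(X), len(X[0])

prob = pulp.LpProblem("or_split_rule", pulp.LpMaximize)
w = [pulp.LpVariable(f"w_{j}", cat="Binary") for j in range(m)]
z = [pulp.LpVariable(f"z_{i}", cat="Binary") for i in range(n)]

for i in range(n):
    active = [w[j] for j in range(m) if X[i][j] == 1]
    prob += z[i] <= pulp.lpSum(active)   # z_i = 0 if no selected feature fires
    for wj in active:
        prob += z[i] >= wj               # z_i = 1 if any selected feature fires

# Maximize label agreement: positives routed one way, negatives the other.
prob += pulp.lpSum(z[i] if y[i] == 1 else 1 - z[i] for i in range(n))

prob.solve(pulp.PULP_CBC_CMD(msg=0))
print("split on OR of features:", [j for j in range(m) if w[j].value() > 0.5])
```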
- Efficient Policy Iteration for Robust Markov Decision Processes via Regularization [49.05403412954533]
Robust Markov decision processes (MDPs) provide a framework to model decision problems where the system dynamics are changing or only partially known.
Recent work established the equivalence between $s$-rectangular $L_p$ robust MDPs and regularized MDPs, and derived a regularized policy iteration scheme that enjoys the same level of efficiency as standard MDPs.
In this work, we focus on the policy improvement step and derive concrete forms for the greedy policy and the optimal robust Bellman operators.
arXiv Detail & Related papers (2022-05-28T04:05:20Z)
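For context, the operator evaluated in the policy improvement step has, under the simpler $(s,a)$-rectangular assumption, the standard min-max form below (a textbook statement, not the paper's $s$-rectangular derivation):

$$(T v)(s) = \max_{a \in \mathcal{A}} \, \min_{p \in \mathcal{P}_{s,a}} \Big[ r(s,a) + \gamma \sum_{s'} p(s')\, v(s') \Big]$$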
- Robust Optimal Classification Trees Against Adversarial Examples [5.254093731341154]
We propose a collection of methods to train decision trees that are optimally robust against user-specified attack models.
We show that the min-max optimization problem that arises in adversarial learning can be solved using a single minimization formulation.
We also present a method that determines the upper bound on adversarial accuracy for any model using bipartite matching.
arXiv Detail & Related papers (2021-09-08T18:10:49Z)
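A hedged sketch of the bipartite-matching bound mentioned above, assuming $L_\infty$ perturbations of radius eps (illustrative data and code, not the authors'): if two samples with different labels have overlapping perturbation boxes, no classifier can be adversarially correct on both, so each matched pair contributes at least one guaranteed error.

```python
# Upper-bound adversarial accuracy via maximum bipartite matching between
# positive and negative samples whose L-infinity balls of radius eps overlap.
import networkx as nx

X = [[0.1, 0.2], [0.15, 0.25], [0.9, 0.8], [0.5, 0.5], [0.55, 0.45]]
y = [1, 0, 0, 1, 0]
eps = 0.1

G = nx.Graph()
pos_idx = [i for i in range(len(X)) if y[i] == 1]
neg_idx = [i for i in range(len(X)) if y[i] == 0]
G.add_nodes_from(pos_idx, bipartite=0)
G.add_nodes_from(neg_idx, bipartite=1)
for i in pos_idx:
    for j in neg_idx:
        # Perturbation boxes overlap iff max coordinate distance <= 2 * eps.
        if max(abs(a - b) for a, b in zip(X[i], X[j])) <= 2 * eps:
            G.add_edge(i, j)

matching = nx.bipartite.maximum_matching(G, top_nodes=pos_idx)
n_pairs = len(matching) // 2   # the matching dict lists both directions
print("adversarial accuracy <=", (len(X) - n_pairs) / len(X))
```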
- Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning [52.74071439183113]
We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) solved via reinforcement learning.
Two significant computational challenges arise in applying decision-focused learning to MDPs.
arXiv Detail & Related papers (2021-06-06T23:53:31Z)
- MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances. A toy sketch of the idea follows below.
arXiv Detail & Related papers (2020-07-24T17:06:55Z)
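A toy dynamic-programming search for an optimal depth-limited tree on binary features, in the spirit of the approach above (illustrative only: the real algorithm adds specialized bounds and caching to scale; data is made up).

```python
# Memoized recursion over (sample subset, remaining depth): the fewest
# misclassifications achievable by any tree with at most `depth` splits.
from functools import lru_cache

X = [(1, 0, 1), (1, 1, 0), (0, 1, 1), (0, 0, 0), (1, 1, 1)]  # binary features
y = (1, 1, 0, 0, 1)                                          # binary labels
NUM_FEATURES = len(X[0])

@lru_cache(maxsize=None)
def min_errors(idx, depth):
    pos = sum(y[i] for i in idx)
    leaf = min(pos, len(idx) - pos)       # best constant-label leaf
    if depth == 0 or leaf == 0:
        return leaf
    best = leaf
    for j in range(NUM_FEATURES):         # try every split feature
        left = tuple(i for i in idx if X[i][j] == 0)
        right = tuple(i for i in idx if X[i][j] == 1)
        if left and right:                # skip degenerate splits
            best = min(best, min_errors(left, depth - 1)
                             + min_errors(right, depth - 1))
    return best

print("optimal depth-2 errors:", min_errors(tuple(range(len(X))), 2))
```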
- Generalized and Scalable Optimal Sparse Decision Trees [56.35541305670828]
We present techniques that produce optimal decision trees over a variety of objectives.
We also introduce a scalable algorithm that produces provably optimal results in the presence of continuous variables.
arXiv Detail & Related papers (2020-06-15T19:00:11Z)