In Search of Trees: Decision-Tree Policy Synthesis for Black-Box Systems via Search
- URL: http://arxiv.org/abs/2409.03260v1
- Date: Thu, 5 Sep 2024 05:51:42 GMT
- Title: In Search of Trees: Decision-Tree Policy Synthesis for Black-Box Systems via Search
- Authors: Emir Demirović, Christian Schilling, Anna Lukina
- Abstract summary: We present an approach to synthesise optimal decision-tree policies given a black-box environment and specification.
Our approach is a specialised search algorithm which systematically explores the space of decision trees under the given discretisation.
- Score: 6.74890780471356
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Decision trees, owing to their interpretability, are attractive as control policies for (dynamical) systems. Unfortunately, constructing, or synthesising, such policies is a challenging task. Previous approaches do so by imitating a neural-network policy, approximating a tabular policy obtained via formal synthesis, employing reinforcement learning, or modelling the problem as a mixed-integer linear program. However, these works may require access to a hard-to-obtain accurate policy or a formal model of the environment (within reach of formal synthesis), and may not provide guarantees on the quality or size of the final tree policy. In contrast, we present an approach to synthesise optimal decision-tree policies given a black-box environment and specification, and a discretisation of the tree predicates, where optimality is defined with respect to the number of steps to achieve the goal. Our approach is a specialised search algorithm which systematically explores the (exponentially large) space of decision trees under the given discretisation. The key component is a novel pruning mechanism that significantly reduces the search space. Our approach represents a conceptually novel way of synthesising small decision-tree policies with optimality guarantees even for black-box environments with black-box specifications.
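To make the search-based synthesis idea concrete, here is a minimal sketch of depth-bounded enumeration over discretised tree predicates with an incumbent-bound cut-off. This is not the paper's algorithm: the environment interface (`step`, `is_goal`), the predicate and action sets, and the simple cut-off rule are illustrative assumptions.

```python
import itertools

def enumerate_trees(predicates, actions, depth):
    """Yield every decision tree up to `depth`: a tree is either a leaf
    action (assumed to be a non-tuple value) or a triple
    (predicate_index, true_child, false_child)."""
    for action in actions:
        yield action
    if depth == 0:
        return
    children = list(enumerate_trees(predicates, actions, depth - 1))
    for p in range(len(predicates)):
        for left, right in itertools.product(children, repeat=2):
            yield (p, left, right)

def act(tree, predicates, state):
    """Route a state through the tree to a leaf action."""
    while isinstance(tree, tuple):
        p, left, right = tree
        tree = left if predicates[p](state) else right
    return tree

def evaluate(tree, predicates, env, init_states, horizon, bound):
    """Total steps-to-goal over all initial states, or None if a rollout
    exceeds the horizon or the partial cost already exceeds the incumbent
    bound (the cut-off plays the role of pruning in this sketch)."""
    total = 0
    for state in init_states:
        steps = 0
        while not env.is_goal(state):
            if steps >= horizon or total + steps >= bound:
                return None
            state = env.step(state, act(tree, predicates, state))
            steps += 1
        total += steps
    return total

def synthesise(predicates, actions, env, init_states, max_depth, horizon):
    """Exhaustive search for the tree minimising total steps to the goal."""
    best, best_cost = None, float("inf")
    for tree in enumerate_trees(predicates, actions, max_depth):
        cost = evaluate(tree, predicates, env, init_states, horizon, best_cost)
        if cost is not None and cost < best_cost:
            best, best_cost = tree, cost
    return best, best_cost
```

The enumeration shows why the space is exponentially large in the depth; the paper's actual contribution is a dedicated pruning mechanism over that tree space, of which the incumbent bound above only conveys the branch-and-bound flavour.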
Related papers
- SYMPOL: Symbolic Tree-Based On-Policy Reinforcement Learning [9.035959289139102]
We introduce SYMPOL, a novel method for SYMbolic tree-based on-POLicy RL.
SYMPOL employs a tree-based model integrated with a policy gradient method, enabling the agent to learn and adapt its actions.
We evaluate SYMPOL on a set of benchmark RL tasks, demonstrating its superiority over alternative tree-based RL approaches.
arXiv Detail & Related papers (2024-08-16T14:04:40Z)
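A heavily simplified sketch of the tree-plus-policy-gradient idea, assuming a toy contextual bandit and a fixed tree structure where only the per-leaf action logits are trained with REINFORCE. SYMPOL itself learns the tree end-to-end, so everything below is illustrative rather than the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(s, a):
    # Toy contextual bandit: action 1 is correct when s > 0, else action 0.
    return float(a == int(s > 0))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Fixed one-split tree (s > 0); only the per-leaf action logits are trained.
leaf_logits = np.zeros((2, 2))  # [leaf, action]
alpha = 0.1

for _ in range(2000):
    s = rng.uniform(-1.0, 1.0)
    leaf = int(s > 0)                 # hard routing through the tree
    probs = softmax(leaf_logits[leaf])
    a = rng.choice(2, p=probs)
    r = reward(s, a)
    # REINFORCE at the active leaf: grad log pi(a|s) = onehot(a) - probs.
    grad = -probs
    grad[a] += 1.0
    leaf_logits[leaf] += alpha * r * grad

print(softmax(leaf_logits[0]), softmax(leaf_logits[1]))
```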
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches for dealing with continuous reinforcement learning (RL) problems.
In common practice, convergent (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
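One way to read this is through parameter-space exploration: learn a Gaussian hyperpolicy over deterministic policy parameters, where the standard deviation is the tunable exploration level, then deploy the mean. The toy one-parameter objective and the constants below are assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def episode_return(theta):
    # Stand-in for running one episode with deterministic parameter theta;
    # the return peaks at theta = 2.0.
    return -(theta - 2.0) ** 2

mu, sigma = 0.0, 1.0   # sigma is the tunable exploration level
alpha = 0.05
for _ in range(500):
    thetas = mu + sigma * rng.standard_normal(16)
    returns = np.array([episode_return(t) for t in thetas])
    baseline = returns.mean()
    # Gaussian hyperpolicy: grad_mu log p(theta) = (theta - mu) / sigma^2.
    grad_mu = np.mean((returns - baseline) * (thetas - mu)) / sigma**2
    mu += alpha * grad_mu

print("deployed deterministic parameter:", mu)  # the mean, not a sample
```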
- Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of DeepMatching Networks and Reinforcement Learning methods.
Our main contribution holds for a broad class of problems including Max- and Min-Cut, Max-$k$-Bipartite-Matching, Maximum-Weight-Bipartite-Matching, and the Traveling Salesman Problem.
As a byproduct of our analysis we introduce a novel regularization process over vanilla descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
arXiv Detail & Related papers (2023-10-08T23:39:38Z)
- Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states.
The algorithm is able to provide more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z)
- A Surprisingly Simple Continuous-Action POMDP Solver: Lazy Cross-Entropy Search Over Policy Trees [5.250288418639076]
We propose an online POMDP solver called Lazy Cross-Entropy Search Over Policy Trees (LCEOPT).
At each planning step, our method uses a novel lazy Cross-Entropy method to search the space of policy trees.
Our method is surprisingly simple as compared to existing state-of-the-art methods, yet empirically outperforms them on several continuous-action POMDP problems.
arXiv Detail & Related papers (2023-05-14T03:12:53Z)
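The cross-entropy core can be sketched generically: sample candidate policy-tree parameters from a Gaussian, evaluate them by rollouts, and refit the distribution to the elite samples. The "lazy" part of LCEOPT (re-evaluating only the parameters that changed) and the actual tree parameterisation are omitted; the rollout stand-in below is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout_value(params):
    # Stand-in for simulating a policy tree with these continuous
    # parameters in the POMDP and returning its estimated value.
    return -np.sum((params - 0.5) ** 2)

dim, population, n_elite = 4, 64, 8
mean, std = np.zeros(dim), np.ones(dim)
for _ in range(50):
    samples = mean + std * rng.standard_normal((population, dim))
    values = np.array([rollout_value(s) for s in samples])
    elite = samples[np.argsort(values)[-n_elite:]]   # best candidates
    mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3

print("recovered parameters:", mean)
```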
- Optimal Decision Tree Policies for Markov Decision Processes [7.995360025953931]
We study the optimization of size-limited decision trees for Markov Decision Processes (MDPs).
We show that the underperformance of imitation-learned trees is due to an inherent shortcoming of imitation learning, namely that complex policies cannot be represented using size-limited trees.
While there is generally a trade-off between the performance and interpretability of machine learning models, we find that OMDTs limited to a depth of 3 often perform close to the optimal limit.
arXiv Detail & Related papers (2023-01-30T18:51:02Z)
- Policy learning for many outcomes of interest: Combining optimal policy trees with multi-objective Bayesian optimisation [0.0]
Multi-Objective Policy Learning combines optimal decision trees for policy learning with a multi-objective Bayesian optimisation approach.
The method is applied to a real-world case-study of non-price rationing of anti-malarial medication in Kenya.
arXiv Detail & Related papers (2022-12-13T01:39:14Z)
- Tree ensemble kernels for Bayesian optimization with known constraints over mixed-feature spaces [54.58348769621782]
Tree ensembles can be well-suited for black-box optimization tasks such as algorithm tuning and neural architecture search.
Two well-known challenges in using tree ensembles for black-box optimization are (i) effectively quantifying model uncertainty for exploration and (ii) optimizing over the piece-wise constant acquisition function.
Our framework performs as well as state-of-the-art methods for unconstrained black-box optimization over continuous/discrete features and outperforms competing methods for problems combining mixed-variable feature spaces and known input constraints.
arXiv Detail & Related papers (2022-07-02T16:59:37Z)
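A rough sketch of the loop this entry addresses: a random-forest surrogate whose per-tree spread serves as the uncertainty estimate (challenge (i)), and candidate sampling filtered by a known constraint instead of gradient-based acquisition optimisation, since the acquisition is piece-wise constant (challenge (ii)). The paper's kernel-based formulation differs, and the objective and constraint here are made up.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def black_box(x):
    # Made-up expensive objective over three continuous features.
    return -np.sum((x - 0.3) ** 2, axis=-1)

X = rng.uniform(0.0, 1.0, (10, 3))
y = black_box(X)
for _ in range(20):
    forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
    cand = rng.uniform(0.0, 1.0, (2000, 3))
    cand = cand[cand.sum(axis=1) <= 2.0]    # known input constraint
    # (i) uncertainty from the spread of the individual trees' predictions
    preds = np.stack([t.predict(cand) for t in forest.estimators_])
    ucb = preds.mean(axis=0) + preds.std(axis=0)
    # (ii) the acquisition is piece-wise constant, so pick the next query
    # point by sampling rather than by differentiating
    x_next = cand[np.argmax(ucb)]
    X = np.vstack([X, x_next])
    y = np.append(y, black_box(x_next))

print("best observed:", X[np.argmax(y)], y.max())
```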
- Policy Manifold Search: Exploring the Manifold Hypothesis for Diversity-based Neuroevolution [4.920145245773581]
This paper proposes a novel method for diversity-based policy search via Neuroevolution.
We use the Quality-Diversity framework which provides a principled approach to policy search.
We also use the Jacobian of the inverse-mapping function to guide the search in the representation space.
arXiv Detail & Related papers (2021-04-27T18:52:03Z)
- MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
arXiv Detail & Related papers (2020-07-24T17:06:55Z)
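The dynamic-programming core can be illustrated by the classic recursion: the best depth-d tree for a subset of instances splits on some feature and combines the best depth-(d-1) subtrees of the two induced subsets, memoised on the subset. MurTree adds specialised depth-two solvers, bounds, and caching that this sketch omits.

```python
def leaf_error(data):
    """Misclassifications of the best single-label leaf on this subset."""
    positives = sum(label for _, label in data)
    return min(positives, len(data) - positives)

def best_error(data, depth, num_features, memo=None):
    """Minimum misclassifications achievable by any tree of at most
    `depth` splits on `data`, a tuple of (binary-feature-tuple, label)."""
    if memo is None:
        memo = {}
    key = (data, depth)
    if key in memo:
        return memo[key]
    best = leaf_error(data)
    if depth > 0 and best > 0:
        for f in range(num_features):
            left = tuple(d for d in data if d[0][f] == 0)
            right = tuple(d for d in data if d[0][f] == 1)
            if not left or not right:
                continue  # this split does not separate anything
            best = min(best,
                       best_error(left, depth - 1, num_features, memo)
                       + best_error(right, depth - 1, num_features, memo))
    memo[key] = best
    return best

# XOR over two binary features: inexpressible at depth 1, exact at depth 2.
data = tuple(((a, b), a ^ b) for a in (0, 1) for b in (0, 1))
print(best_error(data, 1, 2), best_error(data, 2, 2))  # -> 2 0
```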
- Decentralized MCTS via Learned Teammate Models [89.24858306636816]
We present a trainable online decentralized planning algorithm based on decentralized Monte Carlo Tree Search.
We show that deep learning and convolutional neural networks can be employed to produce accurate policy approximators.
arXiv Detail & Related papers (2020-03-19T13:10:20Z)