Optimal Decision Tree Pruning Revisited: Algorithms and Complexity
- URL: http://arxiv.org/abs/2503.03576v1
- Date: Wed, 05 Mar 2025 15:02:46 GMT
- Title: Optimal Decision Tree Pruning Revisited: Algorithms and Complexity
- Authors: Juha Harviainen, Frank Sommer, Manuel Sorge, Stefan Szeider
- Abstract summary: We focus on the fundamental pruning operations of subtree replacement and raising. While optimal pruning can be performed in polynomial time for subtree replacement, the problem is NP-complete for subtree raising. For example, while subtree raising is hard for small domain size $D$ or number $d$ of features, it can be solved in $D^{2d} \cdot |I|^{O(1)}$ time, where $|I|$ is the input size.
- Score: 22.57063332430197
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a comprehensive classical and parameterized complexity analysis of decision tree pruning operations, extending recent research on the complexity of learning small decision trees. Thereby, we offer new insights into the computational challenges of decision tree simplification, a crucial aspect of developing interpretable and efficient machine learning models. We focus on fundamental pruning operations of subtree replacement and raising, which are used in heuristics. Surprisingly, while optimal pruning can be performed in polynomial time for subtree replacement, the problem is NP-complete for subtree raising. Therefore, we identify parameters and combinations thereof that lead to fixed-parameter tractability or hardness, establishing a precise borderline between these complexity classes. For example, while subtree raising is hard for small domain size $D$ or number $d$ of features, it can be solved in $D^{2d} \cdot |I|^{O(1)}$ time, where $|I|$ is the input size. We complement our theoretical findings with preliminary experimental results, demonstrating the practical implications of our analysis.
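The subtree replacement operation discussed above can be illustrated with a minimal sketch (not the authors' algorithm): working bottom-up, replace a subtree with a leaf predicting the majority label of the examples reaching it, whenever this does not increase training error. The `Node` structure and helper names here are illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Node:
    feature: int = None                            # None marks a leaf
    children: dict = field(default_factory=dict)   # feature value -> child Node
    label: int = None                              # prediction at a leaf

def predict(node, x):
    # Follow the split features until a leaf is reached.
    while node.feature is not None:
        node = node.children[x[node.feature]]
    return node.label

def errors(node, examples):
    # Number of training examples the subtree misclassifies.
    return sum(1 for x, y in examples if predict(node, x) != y)

def replace_prune(node, examples):
    """Bottom-up subtree replacement: turn a subtree into a leaf with the
    majority label of the examples reaching it, if error does not grow."""
    if node.feature is None:
        return node
    # Route examples to each child and prune the children first.
    for v, child in node.children.items():
        sub = [(x, y) for x, y in examples if x[node.feature] == v]
        node.children[v] = replace_prune(child, sub)
    majority = Counter(y for _, y in examples).most_common(1)[0][0] if examples else 0
    leaf = Node(label=majority)
    # Keep the candidate leaf only if it is at least as accurate on the data.
    return leaf if errors(leaf, examples) <= errors(node, examples) else node
```

This greedy bottom-up pass runs in polynomial time, consistent with the tractability result for subtree replacement stated in the abstract; subtree raising, by contrast, is shown to be NP-complete.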
Related papers
- Provably optimal decision trees with arbitrary splitting rules in polynomial time [1.9405875431318445]
We provide the first axiomatic definition of decision trees. We refer to decision trees that satisfy the axioms as proper decision trees. We develop the first provably correct polynomial-time algorithm for solving the optimal decision tree problem.
arXiv Detail & Related papers (2025-03-03T12:14:53Z) - LiteSearch: Efficacious Tree Search for LLM [70.29796112457662]
This study introduces a novel guided tree search algorithm with dynamic node selection and node-level exploration budget.
Experiments conducted on the GSM8K and TabMWP datasets demonstrate that our approach enjoys significantly lower computational costs compared to baseline methods.
arXiv Detail & Related papers (2024-06-29T05:14:04Z) - Learning accurate and interpretable decision trees [27.203303726977616]
We develop approaches to design decision tree learning algorithms given repeated access to data from the same domain.
We study the sample complexity of tuning prior parameters in Bayesian decision tree learning, and extend our results to decision tree regression.
We also study the interpretability of the learned decision trees and introduce a data-driven approach for optimizing the explainability versus accuracy trade-off using decision trees.
arXiv Detail & Related papers (2024-05-24T20:10:10Z) - Des-q: a quantum algorithm to provably speedup retraining of decision trees [2.7262923206583136]
We introduce Des-q, a novel quantum algorithm to construct and retrain decision trees for regression and binary classification tasks. We benchmark the simulated version of Des-q against the state-of-the-art classical methods on multiple data sets. Our algorithm exhibits similar performance to the state-of-the-art decision trees while significantly speeding up the periodic tree retraining.
arXiv Detail & Related papers (2023-09-18T17:56:08Z) - TreeDQN: Learning to minimize Branch-and-Bound tree [78.52895577861327]
Branch-and-Bound is a convenient approach to solving optimization tasks in the form of Mixed Linear Programs.
The efficiency of the solver depends on the branching rule used to select a variable for splitting.
We propose a reinforcement learning method that can efficiently learn the branching.
arXiv Detail & Related papers (2023-06-09T14:01:26Z) - On Single-Objective Sub-Graph-Based Mutation for Solving the
Bi-Objective Minimum Spanning Tree Problem [0.0]
We contribute to the efficient approximation of the $\mathcal{NP}$-hard multi-objective minimum spanning tree problem (moMST) adopting evolutionary computation.
We design several highly biased sub-graph-based mutation operators founded on the gained insights.
Our results confirm that the sub-graph-based operators beat baseline algorithms from the literature.
arXiv Detail & Related papers (2023-05-31T22:35:17Z) - Properly learning decision trees in almost polynomial time [25.763690981846125]
We give an $n^{O(\log\log n)}$-time membership query algorithm for properly and agnostically learning decision trees.
Our algorithm shares similarities with practical heuristics for learning decision trees.
We show how every decision tree can be "pruned" so that every variable in the resulting tree is influential.
arXiv Detail & Related papers (2021-09-01T22:12:47Z) - Robustifying Algorithms of Learning Latent Trees with Vector Variables [92.18777020401484]
We present the sample complexities of Recursive Grouping (RG) and Chow-Liu Recursive Grouping (CLRG).
We robustify RG, CLRG, Neighbor Joining (NJ) and Spectral NJ (SNJ) by using the truncated inner product.
We derive the first known instance-dependent impossibility result for structure learning of latent trees.
arXiv Detail & Related papers (2021-06-02T01:37:52Z) - Growing Deep Forests Efficiently with Soft Routing and Learned
Connectivity [79.83903179393164]
This paper further extends the deep forest idea in several important aspects.
We employ a probabilistic tree whose nodes make probabilistic routing decisions, a.k.a., soft routing, rather than hard binary decisions.
Experiments on the MNIST dataset demonstrate that our empowered deep forests can achieve performance better than or comparable to [1], [3].
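The soft-routing idea in this entry, internal nodes making probabilistic rather than hard binary decisions, can be sketched as follows. This is a minimal illustration, not the paper's implementation; the dictionary-based tree and the routing parameters `w` and `b` are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_route(node, x):
    """Return P(label | x) for a probabilistic tree whose internal nodes
    route to the right child with probability sigmoid(w . x + b)."""
    if "label" in node:  # leaf: all probability mass on its label
        return {node["label"]: 1.0}
    p_right = sigmoid(sum(w * xi for w, xi in zip(node["w"], x)) + node["b"])
    probs = {}
    # Mix the label distributions of both children, weighted by routing probability.
    for child, p in ((node["left"], 1.0 - p_right), (node["right"], p_right)):
        for label, q in soft_route(child, x).items():
            probs[label] = probs.get(label, 0.0) + p * q
    return probs
```

Unlike a hard decision tree, every leaf contributes to the prediction, with weights given by the product of routing probabilities along its path, which makes the tree differentiable and trainable end to end.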
arXiv Detail & Related papers (2020-12-29T18:05:05Z) - Estimating decision tree learnability with polylogarithmic sample
complexity [16.832966312395126]
We show that top-down decision tree learning algorithms are amenable to highly efficient learnability estimation.
We show how the label $T(x^\star)$ of the decision tree hypothesis $T$ can be computed with polylogarithmically many labeled examples.
arXiv Detail & Related papers (2020-11-03T09:26:27Z) - MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
arXiv Detail & Related papers (2020-07-24T17:06:55Z) - Generalized and Scalable Optimal Sparse Decision Trees [56.35541305670828]
We present techniques that produce optimal decision trees over a variety of objectives.
We also introduce a scalable algorithm that produces provably optimal results in the presence of continuous variables.
arXiv Detail & Related papers (2020-06-15T19:00:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.