Properly learning decision trees in almost polynomial time
- URL: http://arxiv.org/abs/2109.00637v1
- Date: Wed, 1 Sep 2021 22:12:47 GMT
- Title: Properly learning decision trees in almost polynomial time
- Authors: Guy Blanc, Jane Lange, Mingda Qiao, Li-Yang Tan
- Abstract summary: We give an $n^{O(\log\log n)}$-time membership query algorithm for properly and agnostically learning decision trees.
Our algorithm shares similarities with practical heuristics for learning decision trees.
We show how every decision tree can be "pruned" so that every variable in the resulting tree is influential.
- Score: 25.763690981846125
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We give an $n^{O(\log\log n)}$-time membership query algorithm for properly
and agnostically learning decision trees under the uniform distribution over
$\{\pm 1\}^n$. Even in the realizable setting, the previous fastest runtime was
$n^{O(\log n)}$, a consequence of a classic algorithm of Ehrenfeucht and
Haussler.
Our algorithm shares similarities with practical heuristics for learning
decision trees, which we augment with additional ideas to circumvent known
lower bounds against these heuristics. To analyze our algorithm, we prove a new
structural result for decision trees that strengthens a theorem of O'Donnell,
Saks, Schramm, and Servedio. While the OSSS theorem says that every decision
tree has an influential variable, we show how every decision tree can be
"pruned" so that every variable in the resulting tree is influential.
Related papers
- Fast decision tree learning solves hard coding-theoretic problems [7.420043502440765]
We show that any improvement of Ehrenfeucht and Haussler's algorithm will yield $O(\log n)$-approximation algorithms for $k$-NCP.
This can be interpreted as a new avenue for designing algorithms for $k$-NCP, or as one for establishing the optimality of Ehrenfeucht and Haussler's algorithm.
arXiv Detail & Related papers (2024-09-19T21:40:57Z) - LiteSearch: Efficacious Tree Search for LLM [70.29796112457662]
This study introduces a novel guided tree search algorithm with dynamic node selection and node-level exploration budget.
Experiments conducted on the GSM8K and TabMWP datasets demonstrate that our approach enjoys significantly lower computational costs compared to baseline methods.
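The abstract gives few specifics, but its two named ingredients can be sketched generically. The best-first loop below, with hypothetical `value`, `expand`, and `budget_for` callbacks, is one plausible reading of dynamic node selection plus a node-level exploration budget, not LiteSearch itself.

```python
import heapq
from itertools import count

def guided_tree_search(root, value, expand, budget_for, max_steps=100):
    """Best-first tree search: always expand the node with the highest
    estimated value, and give more promising nodes a larger expansion
    budget (the number of children generated)."""
    tie = count()  # tie-breaker so heapq never compares node objects
    frontier = [(-value(root), next(tie), root)]
    best, best_v = root, value(root)
    for _ in range(max_steps):
        if not frontier:
            break
        neg_v, _, node = heapq.heappop(frontier)
        if -neg_v > best_v:
            best, best_v = node, -neg_v
        # Node-level budget: budget_for maps a value estimate to a child count.
        for child in expand(node, k=budget_for(-neg_v)):
            heapq.heappush(frontier, (-value(child), next(tie), child))
    return best
```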
arXiv Detail & Related papers (2024-06-29T05:14:04Z) - Learning a Decision Tree Algorithm with Transformers [75.96920867382859]
We introduce MetaTree, a transformer-based model trained via meta-learning to directly produce strong decision trees.
We fit both greedy decision trees and globally optimized decision trees on a large number of datasets, and train MetaTree to produce only the trees that achieve strong generalization performance.
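A minimal sketch of that data pipeline, with hypothetical `fit_greedy`, `fit_optimal`, and `score` callbacks (the abstract does not specify interfaces):

```python
def build_supervision(tasks, fit_greedy, fit_optimal, score, threshold=0.9):
    """Assumed pipeline: per dataset, fit both tree families and keep only
    trees whose held-out score clears a bar; the surviving (dataset, tree)
    pairs then serve as supervision targets for the transformer."""
    pairs = []
    for train, valid in tasks:
        for tree in (fit_greedy(train), fit_optimal(train)):
            if score(tree, valid) >= threshold:
                pairs.append((train, tree))
    return pairs
```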
arXiv Detail & Related papers (2024-02-06T07:40:53Z) - How Smart Guessing Strategies Can Yield Massive Scalability Improvements
for Sparse Decision Tree Optimization [18.294573939199438]
Current algorithms often require impractical amounts of time and memory to find optimal or near-optimal trees for some real-world datasets.
We address this problem via smart guessing strategies that can be applied to any optimal branch-and-bound-based decision tree algorithm.
Our approach enables guesses about how to bin continuous features, the size of the tree, and lower bounds on the error for the optimal decision tree.
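As one concrete reading of the binning guess (a sketch under the assumption that candidate thresholds are harvested from a cheap reference ensemble; the helper name `guessed_thresholds` is ours), the splits handed to a branch-and-bound solver can be restricted to the thresholds a boosted model actually uses:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def guessed_thresholds(X, y, n_estimators=20, max_depth=2):
    """Harvest split thresholds from a cheap reference ensemble and use them
    as the only candidate bins per continuous feature, shrinking the search
    space handed to an optimal-tree solver."""
    X = np.asarray(X)
    ref = GradientBoostingClassifier(
        n_estimators=n_estimators, max_depth=max_depth
    ).fit(X, y)
    per_feature = {j: set() for j in range(X.shape[1])}
    for est in ref.estimators_.ravel():   # one regression tree per stage
        tree = est.tree_
        for node in range(tree.node_count):
            f = tree.feature[node]
            if f >= 0:                    # internal (non-leaf) node
                per_feature[f].add(float(tree.threshold[node]))
    return {j: sorted(ts) for j, ts in per_feature.items()}
```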
arXiv Detail & Related papers (2021-12-01T19:39:28Z) - On Finding the $K$-best Non-projective Dependency Trees [39.5549669100436]
We provide a simplification of the $K$-best spanning tree algorithm of Camerini et al. (1980).
We present a novel extension of the algorithm for decoding the $K$-best dependency trees of a graph which are subject to a root constraint.
arXiv Detail & Related papers (2021-06-01T20:23:41Z) - Learning stochastic decision trees [19.304587350775385]
We give a quasipolynomial-time algorithm for learning decision trees that is optimally resilient to adversarial noise.
Our algorithm is furthermore proper, returning a hypothesis that is itself a decision tree.
arXiv Detail & Related papers (2021-05-08T04:54:12Z) - Testing and reconstruction via decision trees [19.304587350775385]
We study sublinear and local computation algorithms for decision trees, focusing on testing and reconstruction.
A tester that runs in $\mathrm{poly}(\log s, 1/\varepsilon)\cdot n\log n$ time makes $\mathrm{poly}(\log s, 1/\varepsilon)\cdot n$ queries to an unknown function.
arXiv Detail & Related papers (2020-12-16T04:18:00Z) - Convex Polytope Trees [57.56078843831244]
Convex polytope trees (CPT) are proposed to expand the family of decision trees by an interpretable generalization of their decision boundary.
We develop a greedy method to efficiently construct CPT and scalable end-to-end training algorithms for the tree parameters when the tree structure is given.
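To make the generalized decision boundary concrete: where a classical node tests a single coordinate against a threshold, a polytope node tests membership in an intersection of halfspaces. A minimal sketch in our own notation, with `W` and `b` parameterizing the node's linear cuts:

```python
import numpy as np

def polytope_route(x, W, b):
    """Route to the left child iff x lies inside the convex polytope
    {z : Wz + b <= 0}, i.e., x satisfies every linear cut at this node.
    A classical axis-aligned split is the special case of a single cut
    with one nonzero weight."""
    return bool(np.all(W @ x + b <= 0.0))
```

Evaluating such a tree then proceeds exactly as for a standard decision tree, with `polytope_route` replacing the per-node threshold test.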
arXiv Detail & Related papers (2020-10-21T19:38:57Z) - On $\ell_p$-norm Robustness of Ensemble Stumps and Trees [83.81523991945018]
We develop an efficient dynamic programming based algorithm for sound verification of ensemble stumps.
We demonstrate the first certified defense method for training ensemble stumps and trees with respect to $\ell_p$-norm perturbations.
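As a sketch of why stump ensembles are tractable to verify: under an $\ell_\infty$ budget each feature can be perturbed independently, so the exact worst-case score decomposes per feature (the shared-budget $\ell_p$ case is where a dynamic program over budget allocation comes in). The stump encoding `(feature, threshold, left_val, right_val)` below is our own assumption, and the small epsilon-shift of thresholds is a simplification.

```python
from collections import defaultdict

def worst_case_score(stumps, x, eps):
    """Exact minimum score of a stump ensemble over the l_inf ball of
    radius eps around x. Each stump outputs left_val if x[f] <= threshold
    and right_val otherwise; features decouple, so the adversary optimizes
    each coordinate separately."""
    by_feat = defaultdict(list)
    for f, thr, left_val, right_val in stumps:
        by_feat[f].append((thr, left_val, right_val))
    total = 0.0
    for f, group in by_feat.items():
        lo, hi = x[f] - eps, x[f] + eps
        # One representative point per region the thresholds cut [lo, hi] into.
        points = [lo, hi] + [t + 1e-9 for t, _, _ in group if lo < t < hi]
        total += min(
            sum(l if z <= t else r for t, l, r in group) for z in points
        )
    return total
```

Certification then amounts to checking that this worst-case score stays on the correct side of the decision threshold.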
arXiv Detail & Related papers (2020-08-20T03:42:40Z) - MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
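The core recurrence behind dynamic-programming approaches of this kind (heavily simplified; MurTree adds specialized depth-two solvers, caching schemes, and bounds we omit) is: the optimal error on a data subset is either the best leaf error or the best split into two memoized subproblems. A toy version for binary features:

```python
from functools import lru_cache

def optimal_tree_error(X, y, depth):
    """Minimum misclassification count of any depth-bounded decision tree
    on binary features, via the recurrence
    err(D, d) = min(leaf error, min over f of err(D_f=0, d-1) + err(D_f=1, d-1))."""
    n_feat = len(X[0]) if X else 0

    @lru_cache(maxsize=None)
    def solve(idx, d):
        ones = sum(y[i] for i in idx)
        leaf = min(ones, len(idx) - ones)   # error of the best constant leaf
        if d == 0 or leaf == 0:
            return leaf
        best = leaf
        for f in range(n_feat):
            left = frozenset(i for i in idx if X[i][f] == 0)
            right = idx - left
            if left and right:
                best = min(best, solve(left, d - 1) + solve(right, d - 1))
        return best

    return solve(frozenset(range(len(X))), depth)

# Example: optimal_tree_error([[0, 1], [1, 1], [1, 0]], [0, 1, 1], depth=1) == 0
```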
arXiv Detail & Related papers (2020-07-24T17:06:55Z) - Generalized and Scalable Optimal Sparse Decision Trees [56.35541305670828]
We present techniques that produce optimal decision trees over a variety of objectives.
We also introduce a scalable algorithm that produces provably optimal results in the presence of continuous variables.
arXiv Detail & Related papers (2020-06-15T19:00:11Z)