Harnessing the Power of Choices in Decision Tree Learning
- URL: http://arxiv.org/abs/2310.01551v2
- Date: Wed, 25 Oct 2023 20:40:49 GMT
- Title: Harnessing the Power of Choices in Decision Tree Learning
- Authors: Guy Blanc, Jane Lange, Chirag Pabbaraju, Colin Sullivan, Li-Yang Tan,
Mo Tiwari
- Abstract summary: We propose a simple generalization of standard and empirically successful decision tree learning algorithms such as ID3, C4.5, and CART.
Our algorithm, Top-$k$, considers the $k$ best attributes as possible splits instead of just the single best attribute.
We show, through extensive experiments, that Top-$k$ outperforms the two main approaches to decision tree learning.
- Score: 20.08390529995706
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a simple generalization of standard and empirically successful
decision tree learning algorithms such as ID3, C4.5, and CART. These
algorithms, which have been central to machine learning for decades, are greedy
in nature: they grow a decision tree by iteratively splitting on the best
attribute. Our algorithm, Top-$k$, considers the $k$ best attributes as
possible splits instead of just the single best attribute. We demonstrate,
theoretically and empirically, the power of this simple generalization. We
first prove a greediness hierarchy theorem showing that for every $k \in
\mathbb{N}$, Top-$(k+1)$ can be dramatically more powerful than Top-$k$: there
are data distributions for which the former achieves accuracy $1-\varepsilon$,
whereas the latter only achieves accuracy $\frac{1}{2}+\varepsilon$. We then
show, through extensive experiments, that Top-$k$ outperforms the two main
approaches to decision tree learning: classic greedy algorithms and more recent
"optimal decision tree" algorithms. On one hand, Top-$k$ consistently enjoys
significant accuracy gains over greedy algorithms across a wide range of
benchmarks. On the other hand, Top-$k$ is markedly more scalable than optimal
decision tree algorithms and is able to handle dataset and feature set sizes
that remain far beyond the reach of these algorithms.
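To make the choice point concrete, below is a minimal sketch of the Top-$k$ idea on binary features: where a greedy learner splits on the single attribute with the highest information gain, this sketch recurses on each of the $k$ highest-gain attributes and keeps whichever subtree fits the data best. The function names, the information-gain criterion, the depth budget, and the XOR demo are all illustrative assumptions based on the abstract, not the authors' reference implementation.

```python
# Sketch of Top-k decision tree induction on binary features (0/1),
# assuming information gain as the split criterion. Not the paper's code.
import numpy as np

def entropy(y):
    """Empirical binary entropy of a 0/1 label vector."""
    if len(y) == 0:
        return 0.0
    p = float(np.mean(y))
    if p == 0.0 or p == 1.0:
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def information_gain(X, y, j):
    """Entropy reduction from splitting on binary feature j."""
    left, right = y[X[:, j] == 0], y[X[:, j] == 1]
    children = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
    return entropy(y) - children

def top_k_tree(X, y, k, depth):
    """Grow a tree, branching over the k best candidate splits at each node.
    Returns (tree, training_errors); a tree is either a leaf label or a
    (feature_index, left_subtree, right_subtree) triple."""
    majority = int(np.mean(y) >= 0.5)
    best = (majority, int(np.sum(y != majority)))
    if depth == 0 or best[1] == 0:
        return best
    # The one change from greedy induction: consider the k highest-gain
    # attributes as candidate splits instead of only the argmax.
    gains = [information_gain(X, y, j) for j in range(X.shape[1])]
    for j in np.argsort(gains)[::-1][:k]:
        mask = X[:, j] == 1
        if mask.all() or not mask.any():
            continue  # degenerate split on this subsample
        left_tree, left_err = top_k_tree(X[~mask], y[~mask], k, depth - 1)
        right_tree, right_err = top_k_tree(X[mask], y[mask], k, depth - 1)
        if left_err + right_err < best[1]:
            best = ((int(j), left_tree, right_tree), left_err + right_err)
    return best

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(200, 8))
    y = X[:, 0] ^ X[:, 1]  # XOR-style target: no single first split is informative
    for k in (1, 2, 4):
        _, err = top_k_tree(X, y, k=k, depth=3)
        print(f"k={k}: training errors = {err}")
```

With $k=1$ the sketch reduces to ordinary greedy (ID3-style) induction, consistent with the abstract's framing of Top-$k$ as a strict generalization; larger $k$ multiplies the search cost but can escape the kind of myopia that the XOR target above illustrates.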
Related papers
- A Scalable Algorithm for Individually Fair K-means Clustering [77.93955971520549]
We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al.
A clustering is then called individually fair if it has centers within distance $\delta(x)$ of $x$ for each $x \in P$.
We show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.
arXiv Detail & Related papers (2024-02-09T19:01:48Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon > 0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - On Computing Optimal Tree Ensembles [7.424944196676223]
Random forests and, more generally, (decision-)tree ensembles are widely used methods for classification and regression.
Recent algorithmic advances make it possible to compute decision trees that are optimal for various measures, such as their size or depth.
We provide two novel algorithms and corresponding lower bounds.
arXiv Detail & Related papers (2023-06-07T13:30:43Z) - Fully-Dynamic Approximate Decision Trees With Worst-Case Update Time
Guarantees [3.5509551353363644]
We give the first algorithm that maintains an approximate decision tree over an arbitrary sequence of insertions and deletions of labeled examples.
We provide a deterministic algorithm that maintains an $\epsilon$-approximate tree using $O\!\left(\frac{d\, f(n)}{n} \operatorname{poly}\frac{h}{\epsilon}\right)$ operations per update.
arXiv Detail & Related papers (2023-02-08T11:02:58Z) - Differentially-Private Hierarchical Clustering with Provable
Approximation Guarantees [79.59010418610625]
We study differentially private approximation algorithms for hierarchical clustering.
We show strong lower bounds for the problem: any $\epsilon$-DP algorithm must exhibit $\Omega(|V|^2/\epsilon)$ additive error for an input dataset $V$.
We propose a private $1+o(1)$-approximation algorithm which also recovers the blocks exactly.
arXiv Detail & Related papers (2023-01-31T19:14:30Z) - Private estimation algorithms for stochastic block models and mixture
models [63.07482515700984]
We develop general tools for designing efficient private estimation algorithms.
We give the first efficient $(\epsilon, \delta)$-differentially private algorithm for both weak recovery and exact recovery.
arXiv Detail & Related papers (2023-01-11T09:12:28Z) - Fully-Dynamic Decision Trees [3.0058005235097114]
We develop the first fully dynamic algorithm that maintains a decision tree over an arbitrary sequence of insertions and deletions of labeled examples.
For real-valued features the algorithm has an amortized running time per insertion/deletion of $O\big(\frac{d \log^3 n}{\epsilon^2}\big)$.
We show that any algorithm with similar guarantees uses amortized running time $\Omega(d)$ and space $\tilde{\Omega}(nd)$.
arXiv Detail & Related papers (2022-12-01T18:58:19Z) - Optimal Low-Degree Hardness of Maximum Independent Set [93.59919600451487]
We study the algorithmic task of finding a large independent set in a sparse Erdős–Rényi random graph.
We show that the class of low-degree algorithms can find independent sets of half-optimal size but no larger.
arXiv Detail & Related papers (2020-10-13T17:26:09Z) - MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
arXiv Detail & Related papers (2020-07-24T17:06:55Z) - Provable guarantees for decision tree induction: the agnostic setting [16.784355746717562]
We give strengthened provable guarantees on the performance of widely employed and empirically successful top-down decision tree learning heuristics.
We show that for all monotone functions $f$ and parameters $s \in \mathbb{N}$, these heuristics construct a decision tree of size $s^{\tilde{O}((\log s)/\varepsilon^2)}$ that achieves error close to that of the best size-$s$ decision tree.
We complement our algorithmic guarantee with a near-matching $s^{\tilde{\Omega}(\log s)}$ lower bound.
arXiv Detail & Related papers (2020-06-01T06:44:07Z) - Optimal Sparse Decision Trees [25.043477914272046]
This work introduces the first practical algorithm for optimal decision trees for binary variables.
The algorithm is a co-design of analytical bounds that reduce the search space and modern systems techniques.
Our experiments highlight advantages in scalability, speed, and proof of optimality.
arXiv Detail & Related papers (2019-04-29T17:56:34Z)