Fully-Dynamic Decision Trees
- URL: http://arxiv.org/abs/2212.00778v1
- Date: Thu, 1 Dec 2022 18:58:19 GMT
- Title: Fully-Dynamic Decision Trees
- Authors: Marco Bressan and Gabriel Damay and Mauro Sozio
- Abstract summary: We develop the first fully dynamic algorithm that maintains a decision tree over an arbitrary sequence of insertions and deletions of labeled examples.
For real-valued features the algorithm has an amortized running time per insertion/deletion of $O\big(\frac{d \log^3 n}{\epsilon^2}\big)$.
We show that any algorithm with similar guarantees uses amortized running time $\Omega(d)$ and space $\tilde{\Omega}(nd)$.
- Score: 3.0058005235097114
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop the first fully dynamic algorithm that maintains a decision tree
over an arbitrary sequence of insertions and deletions of labeled examples.
Given $\epsilon > 0$ our algorithm guarantees that, at every point in time,
every node of the decision tree uses a split with Gini gain within an additive
$\epsilon$ of the optimum. For real-valued features the algorithm has an
amortized running time per insertion/deletion of $O\big(\frac{d \log^3
n}{\epsilon^2}\big)$, which improves to $O\big(\frac{d \log^2
n}{\epsilon}\big)$ for binary or categorical features, while it uses space $O(n
d)$, where $n$ is the maximum number of examples at any point in time and $d$
is the number of features. Our algorithm is nearly optimal, as we show that any
algorithm with similar guarantees uses amortized running time $\Omega(d)$ and
space $\tilde{\Omega} (n d)$. We complement our theoretical results with an
extensive experimental evaluation on real-world data, showing the effectiveness
of our algorithm.
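The quantity each tree node must keep near-optimal is the Gini gain of its split. The sketch below (plain Python, hypothetical helper names) computes Gini gain and exhaustively finds the best threshold on a single real-valued feature; the paper's dynamic algorithm maintains, per node, a split whose gain is within additive $\epsilon$ of this optimum without recomputing it from scratch on every update.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a multiset of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_gain(labels, left, right):
    """Additive reduction in Gini impurity from splitting `labels`
    into the sub-multisets `left` and `right`."""
    n = len(labels)
    return (gini(labels)
            - (len(left) / n) * gini(left)
            - (len(right) / n) * gini(right))

def best_threshold(xs, ys):
    """Exhaustive best threshold split on one real-valued feature.
    Returns (gain, threshold); the dynamic algorithm approximates
    this within additive epsilon instead of recomputing it."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = (0.0, None)
    for k in range(1, len(xs)):
        t = xs[order[k]]  # examples with x < t go left
        left = [ys[i] for i in order[:k]]
        right = [ys[i] for i in order[k:]]
        g = gini_gain(ys, left, right)
        if g > best[0]:
            best = (g, t)
    return best
```

Across $d$ features this exhaustive scan costs $\Theta(nd)$ per update if redone from scratch; the point of the paper is to amortize it to polylogarithmic time per insertion/deletion.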
Related papers
- Mini-Batch Kernel $k$-means [4.604003661048267]
A single iteration of our algorithm takes $\widetilde{O}(kb^2)$ time, significantly faster than the $O(n^2)$ time required by the full-batch kernel $k$-means.
Experiments demonstrate that our algorithm consistently achieves a 10-100x speedup with minimal loss in quality.
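As an illustration of the mini-batch idea only (not the paper's kernelized algorithm), here is a standard mini-batch $k$-means sketch with Sculley-style per-center learning rates, where each iteration touches just $b$ sampled points instead of all $n$:

```python
import random

def minibatch_kmeans(points, k, b, iters, seed=0):
    """Simplified (non-kernel) mini-batch k-means. Each iteration
    samples b points and moves each one's nearest center toward it
    with a per-center learning rate 1/count, so a center always
    holds the running mean of the points assigned to it."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k
    for _ in range(iters):
        batch = rng.sample(points, min(b, len(points)))
        for p in batch:
            j = min(range(k),
                    key=lambda c: sum((p[i] - centers[c][i]) ** 2
                                      for i in range(len(p))))
            counts[j] += 1
            eta = 1.0 / counts[j]  # per-center learning rate
            for i in range(len(p)):
                centers[j][i] = (1 - eta) * centers[j][i] + eta * p[i]
    return centers
```

The kernel variant summarized above works in feature space via the kernel matrix restricted to the batch, which is where the $b^2$ factor comes from.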
arXiv Detail & Related papers (2024-10-08T10:59:14Z)
- Perturb-and-Project: Differentially Private Similarities and Marginals [73.98880839337873]
We revisit the input perturbations framework for differential privacy where noise is added to the input $A \in \mathcal{S}$.
We first design novel efficient algorithms to privately release pair-wise cosine similarities.
We derive a novel algorithm to compute $k$-way marginal queries over $n$ features.
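A minimal sketch of the input-perturbation idea for pairwise cosine similarities, using the Laplace mechanism with a caller-supplied sensitivity; the function name and the calibration are illustrative, not the paper's actual mechanism:

```python
import math
import random

def private_cosine_similarities(vectors, epsilon, sensitivity, seed=0):
    """Release all pairwise cosine similarities with Laplace noise of
    scale sensitivity/epsilon added to each value. `sensitivity` is
    assumed to be supplied by the caller; a real mechanism must bound
    it for the neighboring-input relation in use."""
    rng = random.Random(seed)

    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)

    scale = sensitivity / epsilon
    out = {}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            # Laplace(scale) noise via inverse-CDF sampling
            u = rng.random() - 0.5
            noise = -scale * math.copysign(math.log(1 - 2 * abs(u)), u)
            out[(i, j)] = cos(vectors[i], vectors[j]) + noise
    return out
```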
arXiv Detail & Related papers (2024-06-07T12:07:16Z)
- A Scalable Algorithm for Individually Fair K-means Clustering [77.93955971520549]
We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al.
A clustering is then called individually fair if it has a center within distance $\delta(x)$ of $x$ for each $x \in P$.
We show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.
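The fairness condition of Jung et al. can be checked directly: $\delta(x)$ is the smallest radius around $x$ containing at least $\lceil n/k \rceil$ points of $P$, and a clustering is individually fair if every point has a center within its own $\delta(x)$. A brute-force sketch (hypothetical helper names):

```python
import math

def fairness_radius(x, points, k):
    """delta(x): smallest radius around x containing at least
    ceil(n/k) points of P (x itself included)."""
    dists = sorted(math.dist(x, p) for p in points)
    m = math.ceil(len(points) / k)
    return dists[m - 1]

def is_individually_fair(points, centers, k):
    """Check that every point has some center within its
    fairness radius delta(x)."""
    return all(
        min(math.dist(x, c) for c in centers) <= fairness_radius(x, points, k)
        for x in points
    )
```

This check is $O(n^2)$; the scalability claim of the paper concerns finding a low-cost fair clustering, not merely verifying one.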
arXiv Detail & Related papers (2024-02-09T19:01:48Z)
- Harnessing the Power of Choices in Decision Tree Learning [20.08390529995706]
We propose a simple generalization of standard and empirically successful decision tree learning algorithms such as ID3, C4.5, and CART.
Our algorithm, Top-$k$, considers the $k$ best attributes as possible splits instead of just the single best attribute.
We show, through extensive experiments, that Top-$k$ outperforms the two main approaches to decision tree learning.
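The Top-$k$ rule itself is a one-line change to greedy split selection, sketched here over precomputed gains (names illustrative; the actual algorithm recurses on each candidate attribute and keeps the best resulting tree):

```python
def choose_splits(gains, k):
    """Top-k rule: return the k attributes with the highest split
    gain as candidate splits. Standard greedy learners (ID3, C4.5,
    CART) correspond to k = 1. `gains` maps attribute -> gain and
    is assumed to be precomputed."""
    return sorted(gains, key=gains.get, reverse=True)[:k]
```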
arXiv Detail & Related papers (2023-10-02T18:45:46Z)
- Fully-Dynamic Approximate Decision Trees With Worst-Case Update Time Guarantees [3.5509551353363644]
We give the first algorithm that maintains an approximate decision tree over an arbitrary sequence of insertions and deletions of labeled examples.
We provide a deterministic algorithm that maintains an $\epsilon$-approximate tree using $O\!\left(\frac{d\, f(n)}{n} \operatorname{poly}\frac{h}{\epsilon}\right)$ operations per update.
arXiv Detail & Related papers (2023-02-08T11:02:58Z)
- Mind the gap: Achieving a super-Grover quantum speedup by jumping to the end [114.3957763744719]
We present a quantum algorithm that has rigorous runtime guarantees for several families of binary optimization problems.
We show that the algorithm finds the optimal solution in time $O^*(2^{(0.5-c)n})$ for an $n$-independent constant $c$.
We also show that for a large fraction of random instances from the $k$-spin model and for any fully satisfiable or slightly frustrated $k$-CSP formula, statement (a) is the case.
arXiv Detail & Related papers (2022-12-03T02:45:23Z)
- Scalable Differentially Private Clustering via Hierarchically Separated Trees [82.69664595378869]
We show that our method computes a solution with cost at most $O(d^{3/2}\log n)\cdot OPT + O(k d^2 \log^2 n / \epsilon^2)$, where $\epsilon$ is the privacy guarantee.
Although the worst-case guarantee is worse than that of state-of-the-art private clustering methods, the algorithm we propose is practical.
arXiv Detail & Related papers (2022-06-17T09:24:41Z)
- Bayesian Optimistic Optimisation with Exponentially Decaying Regret [58.02542541410322]
The current practical BO algorithms have regret bounds ranging from $\mathcal{O}(\frac{\log N}{\sqrt{N}})$ to $\mathcal{O}(e^{-\sqrt{N}})$, where $N$ is the number of evaluations.
This paper explores the possibility of improving the regret bound in the noiseless setting by intertwining concepts from BO and tree-based optimistic optimisation.
We propose the BOO algorithm, a first practical approach which can achieve an exponential regret bound with order $\mathcal{O}(N^{-\sqrt{N}})$.
arXiv Detail & Related papers (2021-05-10T13:07:44Z)
- Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimension $\frac{1}{\lambda\epsilon}$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z)
- Provable guarantees for decision tree induction: the agnostic setting [16.784355746717562]
We give strengthened provable guarantees on the performance of widely employed and empirically successful top-down decision tree learning heuristics.
We show that for all monotone functions $f$ and parameters $s \in \mathbb{N}$, these heuristics construct a decision tree of size $s^{\tilde{O}((\log s)/\varepsilon^2)}$ that achieves error close to that of the best size-$s$ decision tree.
We complement our algorithmic guarantee with a near-matching $s^{\tilde{\Omega}(\log s)}$ lower bound.
arXiv Detail & Related papers (2020-06-01T06:44:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.