Fully-Dynamic Decision Trees
- URL: http://arxiv.org/abs/2212.00778v1
- Date: Thu, 1 Dec 2022 18:58:19 GMT
- Title: Fully-Dynamic Decision Trees
- Authors: Marco Bressan and Gabriel Damay and Mauro Sozio
- Abstract summary: We develop the first fully dynamic algorithm that maintains a decision tree over an arbitrary sequence of insertions and deletions of labeled examples.
For real-valued features the algorithm has an amortized running time per insertion/deletion of $O\big(\frac{d \log^3 n}{\epsilon^2}\big)$.
We show that any algorithm with similar guarantees uses amortized running time $\Omega(d)$ and space $\tilde{\Omega}(nd)$.
- Score: 3.0058005235097114
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop the first fully dynamic algorithm that maintains a decision tree
over an arbitrary sequence of insertions and deletions of labeled examples.
Given $\epsilon > 0$ our algorithm guarantees that, at every point in time,
every node of the decision tree uses a split with Gini gain within an additive
$\epsilon$ of the optimum. For real-valued features the algorithm has an
amortized running time per insertion/deletion of $O\big(\frac{d \log^3
n}{\epsilon^2}\big)$, which improves to $O\big(\frac{d \log^2
n}{\epsilon}\big)$ for binary or categorical features, while it uses space $O(n
d)$, where $n$ is the maximum number of examples at any point in time and $d$
is the number of features. Our algorithm is nearly optimal, as we show that any
algorithm with similar guarantees uses amortized running time $\Omega(d)$ and
space $\tilde{\Omega} (n d)$. We complement our theoretical results with an
extensive experimental evaluation on real-world data, showing the effectiveness
of our algorithm.
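The quantity each tree node must keep near-optimal is the Gini gain of its split. The sketch below (plain Python, hypothetical helper names) computes Gini gain and exhaustively finds the best threshold on a single real-valued feature; the paper's dynamic algorithm maintains, per node, a split whose gain is within additive $\epsilon$ of this optimum without recomputing it from scratch on every update.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a multiset of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_gain(labels, left, right):
    """Additive reduction in Gini impurity from splitting `labels`
    into the sub-multisets `left` and `right`."""
    n = len(labels)
    return (gini(labels)
            - (len(left) / n) * gini(left)
            - (len(right) / n) * gini(right))

def best_threshold(xs, ys):
    """Exhaustive best threshold split on one real-valued feature.
    Returns (gain, threshold); the dynamic algorithm approximates
    this within additive epsilon instead of recomputing it."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = (0.0, None)
    for k in range(1, len(xs)):
        t = xs[order[k]]  # examples with x < t go left
        left = [ys[i] for i in order[:k]]
        right = [ys[i] for i in order[k:]]
        g = gini_gain(ys, left, right)
        if g > best[0]:
            best = (g, t)
    return best
```

Across $d$ features this exhaustive scan costs $\Theta(nd)$ per update if redone from scratch; the point of the paper is to amortize it to polylogarithmic time per insertion/deletion.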
Related papers
- Mini-Batch Kernel $k$-means [4.604003661048267]
A single iteration of our algorithm takes $\widetilde{O}(kb^2)$ time, significantly faster than the $O(n^2)$ time required by the full-batch kernel $k$-means.
Experiments demonstrate that our algorithm consistently achieves a 10-100x speedup with minimal loss in quality.
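As an illustration of the mini-batch idea only (not the paper's kernelized algorithm), here is a standard mini-batch $k$-means sketch with Sculley-style per-center learning rates, where each iteration touches just $b$ sampled points instead of all $n$:

```python
import random

def minibatch_kmeans(points, k, b, iters, seed=0):
    """Simplified (non-kernel) mini-batch k-means. Each iteration
    samples b points and moves each one's nearest center toward it
    with a per-center learning rate 1/count, so a center always
    holds the running mean of the points assigned to it."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k
    for _ in range(iters):
        batch = rng.sample(points, min(b, len(points)))
        for p in batch:
            j = min(range(k),
                    key=lambda c: sum((p[i] - centers[c][i]) ** 2
                                      for i in range(len(p))))
            counts[j] += 1
            eta = 1.0 / counts[j]  # per-center learning rate
            for i in range(len(p)):
                centers[j][i] = (1 - eta) * centers[j][i] + eta * p[i]
    return centers
```

The kernel variant summarized above works in feature space via the kernel matrix restricted to the batch, which is where the $b^2$ factor comes from.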
arXiv Detail & Related papers (2024-10-08T10:59:14Z)
- Perturb-and-Project: Differentially Private Similarities and Marginals [73.98880839337873]
We revisit the input perturbations framework for differential privacy where noise is added to the input $A \in \mathcal{S}$.
We first design novel efficient algorithms to privately release pair-wise cosine similarities.
We derive a novel algorithm to compute $k$-way marginal queries over $n$ features.
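A minimal sketch of the input-perturbation idea for pairwise cosine similarities, using the Laplace mechanism with a caller-supplied sensitivity; the function name and the calibration are illustrative, not the paper's actual mechanism:

```python
import math
import random

def private_cosine_similarities(vectors, epsilon, sensitivity, seed=0):
    """Release all pairwise cosine similarities with Laplace noise of
    scale sensitivity/epsilon added to each value. `sensitivity` is
    assumed to be supplied by the caller; a real mechanism must bound
    it for the neighboring-input relation in use."""
    rng = random.Random(seed)

    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)

    scale = sensitivity / epsilon
    out = {}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            # Laplace(scale) noise via inverse-CDF sampling
            u = rng.random() - 0.5
            noise = -scale * math.copysign(math.log(1 - 2 * abs(u)), u)
            out[(i, j)] = cos(vectors[i], vectors[j]) + noise
    return out
```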
arXiv Detail & Related papers (2024-06-07T12:07:16Z)
- A Scalable Algorithm for Individually Fair K-means Clustering [77.93955971520549]
We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al.
A clustering is then called individually fair if it has a center within distance $\delta(x)$ of $x$ for each $x \in P$.
We show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.
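The fairness condition of Jung et al. can be checked directly: $\delta(x)$ is the smallest radius around $x$ containing at least $\lceil n/k \rceil$ points of $P$, and a clustering is individually fair if every point has a center within its own $\delta(x)$. A brute-force sketch (hypothetical helper names):

```python
import math

def fairness_radius(x, points, k):
    """delta(x): smallest radius around x containing at least
    ceil(n/k) points of P (x itself included)."""
    dists = sorted(math.dist(x, p) for p in points)
    m = math.ceil(len(points) / k)
    return dists[m - 1]

def is_individually_fair(points, centers, k):
    """Check that every point has some center within its
    fairness radius delta(x)."""
    return all(
        min(math.dist(x, c) for c in centers) <= fairness_radius(x, points, k)
        for x in points
    )
```

This check is $O(n^2)$; the scalability claim of the paper concerns finding a low-cost fair clustering, not merely verifying one.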
arXiv Detail & Related papers (2024-02-09T19:01:48Z)
- Harnessing the Power of Choices in Decision Tree Learning [20.08390529995706]
We propose a simple generalization of standard and empirically successful decision tree learning algorithms such as ID3, C4.5, and CART.
Our algorithm, Top-$k$, considers the $k$ best attributes as possible splits instead of just the single best attribute.
We show, through extensive experiments, that Top-$k$ outperforms the two main approaches to decision tree learning.
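The Top-$k$ rule itself is a one-line change to greedy split selection, sketched here over precomputed gains (names illustrative; the actual algorithm recurses on each candidate attribute and keeps the best resulting tree):

```python
def choose_splits(gains, k):
    """Top-k rule: return the k attributes with the highest split
    gain as candidate splits. Standard greedy learners (ID3, C4.5,
    CART) correspond to k = 1. `gains` maps attribute -> gain and
    is assumed to be precomputed."""
    return sorted(gains, key=gains.get, reverse=True)[:k]
```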
arXiv Detail & Related papers (2023-10-02T18:45:46Z)
- Fully-Dynamic Approximate Decision Trees With Worst-Case Update Time Guarantees [3.5509551353363644]
We give the first algorithm that maintains an approximate decision tree over an arbitrary sequence of insertions and deletions of labeled examples.
We provide a deterministic algorithm that maintains an $\epsilon$-approximate tree using $O\!\left(\frac{d\, f(n)}{n} \operatorname{poly}\frac{h}{\epsilon}\right)$ operations per update.
arXiv Detail & Related papers (2023-02-08T11:02:58Z)
- Mind the gap: Achieving a super-Grover quantum speedup by jumping to the end [114.3957763744719]
We present a quantum algorithm that has rigorous runtime guarantees for several families of binary optimization problems.
We show that the algorithm finds the optimal solution in time $O^*(2^{(0.5-c)n})$ for an $n$-independent constant $c$.
We also show that for a large fraction of random instances from the $k$-spin model and for any fully satisfiable or slightly frustrated $k$-CSP formula, statement (a) is the case.
arXiv Detail & Related papers (2022-12-03T02:45:23Z)
- Scalable Differentially Private Clustering via Hierarchically Separated Trees [82.69664595378869]
We show that our method computes a solution with cost at most $O(d^{3/2}\log n)\cdot OPT + O(k d^2 \log^2 n / \epsilon^2)$, where $\epsilon$ is the privacy guarantee.
Although the worst-case guarantee is worse than that of state-of-the-art private clustering methods, the algorithm we propose is practical.
arXiv Detail & Related papers (2022-06-17T09:24:41Z)
- Bayesian Optimistic Optimisation with Exponentially Decaying Regret [58.02542541410322]
The current practical BO algorithms have regret bounds ranging from $\mathcal{O}(\frac{\log N}{\sqrt{N}})$ to $\mathcal{O}(e^{-\sqrt{N}})$, where $N$ is the number of evaluations.
This paper explores the possibility of improving the regret bound in the noiseless setting by intertwining concepts from BO and tree-based optimistic optimisation.
We propose the BOO algorithm, a first practical approach which can achieve an exponential regret bound with order $\mathcal{O}(N^{-\sqrt{N}})$.
arXiv Detail & Related papers (2021-05-10T13:07:44Z)
- Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimension $\frac{1}{\lambda\epsilon}$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z)
- Provable guarantees for decision tree induction: the agnostic setting [16.784355746717562]
We give strengthened provable guarantees on the performance of widely employed and empirically successful top-down decision tree learning heuristics.
We show that for all monotone functions $f$ and parameters $s \in \mathbb{N}$, these heuristics construct a decision tree of size $s^{\tilde{O}((\log s)/\varepsilon^2)}$ that achieves error close to that of the best size-$s$ decision tree.
We complement our algorithmic guarantee with a near-matching $s^{\tilde{\Omega}(\log s)}$ lower bound.
arXiv Detail & Related papers (2020-06-01T06:44:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.