Learning-Augmented B-Trees
- URL: http://arxiv.org/abs/2211.09251v2
- Date: Mon, 24 Jul 2023 07:08:59 GMT
- Title: Learning-Augmented B-Trees
- Authors: Xinyuan Cao, Jingbang Chen, Li Chen, Chris Lambert, Richard Peng,
Daniel Sleator
- Abstract summary: We study learning-augmented binary search trees (BSTs) and B-Trees via Treaps with composite priorities.
The result is a simple search tree where the depth of each item is determined by its predicted weight $w_x$.
- Score: 11.542679443281411
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study learning-augmented binary search trees (BSTs) and B-Trees via Treaps
with composite priorities. The result is a simple search tree where the depth
of each item is determined by its predicted weight $w_x$. To achieve the
result, each item $x$ has its composite priority
$-\lfloor\log\log(1/w_x)\rfloor + U(0, 1)$, where $U(0, 1)$ is a uniform
random variable. This generalizes the recent learning-augmented BSTs
[Lin-Luo-Woodruff ICML`22], which only work for Zipfian distributions, to
arbitrary inputs and predictions. It also gives the first B-Tree data structure
that can provably take advantage of localities in the access sequence via
online self-reorganization. The data structure is robust to prediction errors
and handles insertions, deletions, as well as prediction updates.
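The composite priority above can be illustrated with an ordinary treap whose priorities mix a predicted-weight bucket with a uniform tiebreaker. A minimal sketch, not the authors' full data structure; base-2 logarithms, a max-heap convention on priorities, and weights restricted to $(0, 1)$ are all assumptions made here:

```python
import math
import random


def composite_priority(w, rng=random):
    # -floor(log log(1/w)) + U(0,1) for a predicted weight w in (0, 1);
    # base-2 logs are an assumption, the abstract does not fix the base.
    return -math.floor(math.log2(math.log2(1.0 / w))) + rng.random()


class Node:
    def __init__(self, key, pri):
        self.key, self.pri = key, pri
        self.left = self.right = None


def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    return x


def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    return y


def insert(root, key, pri):
    # standard treap insertion: BST order on keys, max-heap on priorities
    if root is None:
        return Node(key, pri)
    if key < root.key:
        root.left = insert(root.left, key, pri)
        if root.left.pri > root.pri:
            root = rotate_right(root)
    else:
        root.right = insert(root.right, key, pri)
        if root.right.pri > root.pri:
            root = rotate_left(root)
    return root


def inorder(root):
    return [] if root is None else inorder(root.left) + [root.key] + inorder(root.right)


def depth(root, key, d=0):
    if root is None:
        return None
    if key == root.key:
        return d
    child = root.left if key < root.key else root.right
    return depth(child, key, d + 1)
```

Because the bucket term $-\lfloor\log\log(1/w_x)\rfloor$ dominates the $U(0,1)$ tiebreaker, an item whose predicted weight puts it in a heavier bucket always sits above lighter-bucket items, which is how the predicted weight controls depth.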
Related papers
- Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls [83.89771461061903]
Recent advancements in tree search algorithms guided by verifiers have significantly enhanced the reasoning capabilities of large language models (LLMs).
We identify two key challenges contributing to this inefficiency: $\textit{over-exploration}$ due to redundant states with semantically equivalent content, and $\textit{under-exploration}$ caused by high variance in verifier scoring.
We propose FETCH, a flexible, plug-and-play system compatible with various tree search algorithms.
arXiv Detail & Related papers (2025-02-16T16:12:01Z) - Statistical-Computational Trade-offs for Density Estimation [60.81548752871115]
We show that for a broad class of data structures their bounds cannot be significantly improved.
This is a novel \emph{statistical-computational} trade-off for density estimation.
arXiv Detail & Related papers (2024-10-30T15:03:33Z) - Learning Tree Pattern Transformations [5.767156832161818]
Explaining why and how a tree $t$ structurally differs from another tree $t*$ is a question that is encountered throughout computer science.
In this article, we explore how to learn explanations for structural differences between pairs of trees from sample data.
arXiv Detail & Related papers (2024-10-10T08:20:57Z) - Differentially Private Kernel Density Estimation [11.526850085349155]
We introduce a refined differentially private (DP) data structure for kernel density estimation (KDE). We study the mathematical problem: given a similarity function $f$ and a private dataset $X \subset \mathbb{R}^d$, our goal is to preprocess $X$ so that for any query $y \in \mathbb{R}^d$, we approximate $\sum_{x \in X} f(x, y)$ in a differentially private fashion.
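The private-sum formulation above can be illustrated with the generic Laplace mechanism rather than the paper's refined structure: if $f$ maps into $[0, 1]$, adding or removing one point changes the sum by at most 1 (sensitivity 1), so Laplace noise of scale $1/\varepsilon$ makes each query $\varepsilon$-DP. A minimal sketch under those assumptions, not the authors' data structure:

```python
import math
import random


def laplace_noise(scale, rng=random):
    # Laplace(0, scale) sampled as the difference of two Exp(1) draws
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_kde_query(X, y, f, eps, rng=random):
    # Assumes f maps into [0, 1], so one point changes the sum by at
    # most 1; Laplace(1/eps) noise then gives eps-DP for this query.
    true_sum = sum(f(x, y) for x in X)
    return true_sum + laplace_noise(1.0 / eps, rng)
```

This per-query mechanism spends $\varepsilon$ of privacy budget on every answer; the point of a dedicated KDE data structure is to do better than that across many queries.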
arXiv Detail & Related papers (2024-09-03T08:01:19Z) - LiteSearch: Efficacious Tree Search for LLM [70.29796112457662]
This study introduces a novel guided tree search algorithm with dynamic node selection and node-level exploration budget.
Experiments conducted on the GSM8K and TabMWP datasets demonstrate that our approach enjoys significantly lower computational costs compared to baseline methods.
arXiv Detail & Related papers (2024-06-29T05:14:04Z) - Turnstile $\ell_p$ leverage score sampling with applications [56.403488578703865]
We develop a novel algorithm for sampling rows $a_i$ of a matrix $A \in \mathbb{R}^{n \times d}$, proportional to their $\ell_p$ norm, when $A$ is presented in a turnstile data stream.
Our algorithm not only returns the set of sampled row indices but also returns slightly perturbed rows $\tilde{a}_i \approx a_i$ and approximates their sampling probabilities up to $\varepsilon$ relative error.
For logistic regression, our framework yields the first algorithm that achieves a $
arXiv Detail & Related papers (2024-06-01T07:33:41Z) - Hierarchical Clustering via Local Search [0.0]
We introduce a local search algorithm for hierarchical clustering.
We show that any locally optimal tree guarantees a revenue of at least $\frac{n-2}{3}\sum_{i<j} w(i,j)$, where $n$ is the number of objects and $w: [n] \times [n] \to \mathbb{R}_+$ is the associated similarity function.
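The guarantee above can be checked numerically, assuming the Moseley–Wang revenue objective (a pair contributes its similarity times the number of leaves outside the subtree of its least common ancestor); that objective is an assumption here, taken from the standard revenue-maximization literature rather than stated in the snippet:

```python
def revenue(tree, w, n):
    # Moseley-Wang revenue of a binary tree given as nested tuples with
    # integer leaves; returns (revenue, leaf list) for the subtree.
    if not isinstance(tree, tuple):
        return 0.0, [tree]
    rl, ll = revenue(tree[0], w, n)
    rr, lr = revenue(tree[1], w, n)
    # pairs split between the two children have their LCA at this node
    here = sum(w(i, j) for i in ll for j in lr) * (n - len(ll) - len(lr))
    return rl + rr + here, ll + lr


def local_search_bound(w, n):
    # the (n-2)/3 * sum_{i<j} w(i,j) lower bound from the abstract
    total = sum(w(i, j) for i in range(n) for j in range(i + 1, n))
    return (n - 2) / 3 * total
```

For uniform similarities on $n = 4$ objects the bound is $\frac{2}{3}\cdot 6 = 4$, and both the balanced tree and the caterpillar tree attain exactly that revenue, so the bound is tight on this instance.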
arXiv Detail & Related papers (2024-05-24T23:46:24Z) - End-to-end Feature Selection Approach for Learning Skinny Trees [13.388576838688202]
We propose a new optimization-based approach for feature selection in tree ensembles.
Skinny Trees is an end-to-end toolkit for feature selection in tree ensembles.
arXiv Detail & Related papers (2023-10-28T00:15:10Z) - Data Structures for Density Estimation [66.36971978162461]
Given a sublinear (in $n$) number of samples from $p$, our main result is the first data structure that identifies $v_i$ in time sublinear in $k$.
We also give an improved version of the algorithm of Acharya et al. that reports $v_i$ in time linear in $k$.
arXiv Detail & Related papers (2023-06-20T06:13:56Z) - On Computing Optimal Tree Ensembles [7.424944196676223]
Random forests and, more generally, (decision-)tree ensembles are widely used methods for classification and regression.
Recent algorithmic advances make it possible to compute decision trees that are optimal for various measures such as their size or depth.
We provide two novel algorithms and corresponding lower bounds.
arXiv Detail & Related papers (2023-06-07T13:30:43Z) - The Effect of Points Dispersion on the $k$-nn Search in Random
Projection Forests [1.376408511310322]
Partitioning trees are efficient data structures for $k$-nearest neighbor search.
$k$d-trees can be ineffective in high dimensions because they need more tree levels to decrease the vector quantization (VQ) error.
With a larger number of trees, the dispersion of points has a very limited effect on the $k$-nn search.
arXiv Detail & Related papers (2023-02-25T20:57:06Z) - Active-LATHE: An Active Learning Algorithm for Boosting the Error
Exponent for Learning Homogeneous Ising Trees [75.93186954061943]
We design and analyze an algorithm that boosts the error exponent by at least 40% when $\rho$ is at least $0.8$.
Our analysis hinges on judiciously exploiting the minute but detectable statistical variation of the samples to allocate more data to parts of the graph.
arXiv Detail & Related papers (2021-10-27T10:45:21Z) - Exact and Approximate Hierarchical Clustering Using A* [51.187990314731344]
We introduce a new approach based on A* search for clustering.
We overcome the prohibitively large search space by combining A* with a novel \emph{trellis} data structure.
We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks.
arXiv Detail & Related papers (2021-04-14T18:15:27Z) - List-Decodable Mean Estimation in Nearly-PCA Time [50.79691056481693]
We study the fundamental task of list-decodable mean estimation in high dimensions.
Our algorithm runs in time $\widetilde{O}(ndk)$ for all $k = O(\sqrt{d}) \cup \Omega(d)$, where $n$ is the size of the dataset.
A variant of our algorithm has runtime $\widetilde{O}(ndk)$ for all $k$, at the expense of an $O(\sqrt{\log k})$ factor in the recovery guarantee.
arXiv Detail & Related papers (2020-11-19T17:21:37Z) - Decision trees as partitioning machines to characterize their
generalization properties [2.370481325034443]
We revisit binary decision trees on real-valued features from the perspective of partitions of the data.
We show that the VC dimension of a binary tree structure with $N$ internal nodes is of order $N \log(N\ell)$.
We elaborate a pruning algorithm based on these results that performs better than the CART algorithm on a number of datasets.
arXiv Detail & Related papers (2020-10-14T19:25:58Z) - On $\ell_p$-norm Robustness of Ensemble Stumps and Trees [83.81523991945018]
We develop an efficient programming-based algorithm for sound verification of ensemble stumps.
We demonstrate the first certified defense method for training ensemble stumps and trees with respect to $ell_p$ norm perturbations.
arXiv Detail & Related papers (2020-08-20T03:42:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.