The Effect of Points Dispersion on the $k$-nn Search in Random
Projection Forests
- URL: http://arxiv.org/abs/2302.13160v1
- Date: Sat, 25 Feb 2023 20:57:06 GMT
- Title: The Effect of Points Dispersion on the $k$-nn Search in Random
Projection Forests
- Authors: Mashaan Alshammari, John Stavrakakis, Adel F. Ahmed, Masahiro
Takatsuka
- Abstract summary: Partitioning trees are efficient data structures for $k$-nearest neighbor search.
$k$d-trees can be ineffective in high dimensions because they need more tree levels to decrease the vector quantization (VQ) error.
With a larger number of trees, the dispersion of points has a very limited effect on the $k$-nn search.
- Score: 1.376408511310322
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Partitioning trees are efficient data structures for $k$-nearest neighbor
search. Machine learning libraries commonly use a special type of partitioning
trees called $k$d-trees to perform $k$-nn search. Unfortunately, $k$d-trees can
be ineffective in high dimensions because they need more tree levels to
decrease the vector quantization (VQ) error. Random projection trees (rpTrees)
solve this scalability problem by using random directions to split the data. A
collection of rpTrees is called an rpForest. $k$-nn search in an rpForest is
influenced by two factors: 1) the dispersion of points along the random
direction and 2) the number of rpTrees in the rpForest. In this study, we
investigate how these two factors affect the $k$-nn search with varying $k$
values and different datasets. We found that with a larger number of trees, the
dispersion of points has a very limited effect on the $k$-nn search. One should
use the original rpTree algorithm by picking a random direction regardless of
the dispersion of points.
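To make the two factors concrete, below is a minimal Python sketch of an rpTree build and an rpForest $k$-nn query. This is not the authors' code: the leaf size, the median split, and all function names are illustrative choices.

```python
import numpy as np

def build_rptree(X, idx, rng, leaf_size=20):
    """Recursively split points by projecting them onto a random direction."""
    if len(idx) <= leaf_size:
        return {"leaf": idx}
    # Original rpTree rule: pick the direction at random. A dispersion-aware
    # variant would sample several directions and keep the one with the
    # largest spread of projections; the paper finds this unnecessary once
    # the forest has enough trees.
    direction = rng.normal(size=X.shape[1])
    direction /= np.linalg.norm(direction)
    proj = X[idx] @ direction
    split = np.median(proj)
    left, right = idx[proj <= split], idx[proj > split]
    if len(left) == 0 or len(right) == 0:  # degenerate split: stop early
        return {"leaf": idx}
    return {"dir": direction, "split": split,
            "left": build_rptree(X, left, rng, leaf_size),
            "right": build_rptree(X, right, rng, leaf_size)}

def query_leaf(tree, x):
    """Route a query point down to the leaf it falls into."""
    while "leaf" not in tree:
        tree = tree["left"] if x @ tree["dir"] <= tree["split"] else tree["right"]
    return tree["leaf"]

def rpforest_knn(X, forest, x, k):
    """Union the candidate leaves over all trees, then rank by true distance."""
    candidates = np.unique(np.concatenate([query_leaf(t, x) for t in forest]))
    dists = np.linalg.norm(X[candidates] - x, axis=1)
    return candidates[np.argsort(dists)[:k]]

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))
forest = [build_rptree(X, np.arange(len(X)), rng) for _ in range(8)]
print(rpforest_knn(X, forest, X[0], k=5))
```

With more trees, the union of candidate leaves grows, which matches the paper's finding that the dispersion along the split direction matters less as the forest gets larger.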
Related papers
- Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls [83.89771461061903]
Recent advancements in tree search algorithms guided by verifiers have significantly enhanced the reasoning capabilities of large language models (LLMs).
We identify two key challenges contributing to this inefficiency: $\textit{over-exploration}$ due to redundant states with semantically equivalent content, and $\textit{under-exploration}$ caused by high variance in verifier scoring.
We propose FETCH, a flexible, plug-and-play system compatible with various tree search algorithms.
arXiv Detail & Related papers (2025-02-16T16:12:01Z)
- LiteSearch: Efficacious Tree Search for LLM [70.29796112457662]
This study introduces a novel guided tree search algorithm with dynamic node selection and node-level exploration budget.
Experiments conducted on the GSM8K and TabMWP datasets demonstrate that our approach enjoys significantly lower computational costs compared to baseline methods.
arXiv Detail & Related papers (2024-06-29T05:14:04Z)
- Learning a Decision Tree Algorithm with Transformers [75.96920867382859]
We introduce MetaTree, a transformer-based model trained via meta-learning to directly produce strong decision trees.
We fit both greedy decision trees and globally optimized decision trees on a large number of datasets, and train MetaTree to produce only the trees that achieve strong generalization performance.
arXiv Detail & Related papers (2024-02-06T07:40:53Z)
- On Computing Optimal Tree Ensembles [7.424944196676223]
Random forests and, more generally, (decision-)tree ensembles are widely used methods for classification and regression.
Recent algorithmic advances make it possible to compute decision trees that are optimal for various measures, such as their size or depth.
We provide two novel algorithms and corresponding lower bounds.
arXiv Detail & Related papers (2023-06-07T13:30:43Z)
- Random projection tree similarity metric for SpectralNet [1.376408511310322]
SpectralNet is a graph clustering method that uses a neural network to find an embedding that separates the data.
We propose a new SpectralNet similarity metric based on random projection trees (rpTrees).
Our experiments revealed that SpectralNet achieves better clustering accuracy with the rpTree similarity metric than with a $k$-nn graph using a distance metric.
arXiv Detail & Related papers (2023-02-25T21:32:16Z)
- Learning-Augmented B-Trees [11.542679443281411]
We study learning-augmented binary search trees (BSTs) and B-Trees via Treaps with composite priorities.
The result is a simple search tree where the depth of each item is determined by its predicted weight $w_x$.
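The entry above describes treaps whose priorities encode predicted weights so that heavier items sit nearer the root. Below is a minimal Python sketch of that idea; the specific priority scheme (log-weight plus Gumbel noise) is an illustrative assumption, not the paper's composite priorities.

```python
import math, random

class Node:
    def __init__(self, key, priority):
        self.key, self.priority = key, priority
        self.left = self.right = None

def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    return x

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    return y

def insert(root, key, w):
    # Priority = log(w) + Gumbel noise (assumed scheme): by the Gumbel-max
    # trick, the maximum-priority item in any subtree is drawn with
    # probability proportional to its weight, so items with large predicted
    # weight w_x settle near the root (expected depth roughly log(W / w_x)).
    pri = math.log(w) - math.log(random.expovariate(1.0))
    return _insert(root, key, pri)

def _insert(root, key, pri):
    if root is None:
        return Node(key, pri)
    if key < root.key:
        root.left = _insert(root.left, key, pri)
        if root.left.priority > root.priority:
            root = rotate_right(root)
    else:
        root.right = _insert(root.right, key, pri)
        if root.right.priority > root.priority:
            root = rotate_left(root)
    return root

def depth(root, key, d=0):
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
        d += 1
    return d if root is not None else None

random.seed(0)
root = None
for k in range(1, 1001):
    root = insert(root, k, w=1.0 / k)  # Zipf-like predicted weights
print(depth(root, 1), depth(root, 1000))  # heavy key 1 sits much higher
```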
arXiv Detail & Related papers (2022-11-16T22:50:40Z)
- Unbiased and Efficient Sampling of Dependency Trees [0.0]
Most treebanks require that every valid dependency tree has a single edge coming out of the ROOT node.
Zmigrod et al. have recently proposed algorithms for sampling with and without replacement from the single-root dependency tree distribution.
We show that their fastest algorithm for sampling with replacement, Wilson-RC, is in fact biased.
arXiv Detail & Related papers (2022-05-25T09:57:28Z)
- Shallow decision trees for explainable $k$-means clustering [1.2891210250935146]
We propose an efficient algorithm that takes into account metrics related to the depths of the leaves in a decision tree.
In experiments on 16 datasets, our algorithm yields better results than decision-tree clustering algorithms.
arXiv Detail & Related papers (2021-12-29T18:22:28Z)
- Active-LATHE: An Active Learning Algorithm for Boosting the Error Exponent for Learning Homogeneous Ising Trees [75.93186954061943]
We design and analyze an algorithm that boosts the error exponent by at least 40% when $\rho$ is at least $0.8$.
Our analysis hinges on judiciously exploiting the minute but detectable statistical variation of the samples to allocate more data to parts of the graph.
arXiv Detail & Related papers (2021-10-27T10:45:21Z)
- SGA: A Robust Algorithm for Partial Recovery of Tree-Structured Graphical Models with Noisy Samples [75.32013242448151]
We consider learning Ising tree models when the observations from the nodes are corrupted by independent but non-identically distributed noise.
Katiyar et al. (2020) showed that although the exact tree structure cannot be recovered, one can recover a partial tree structure.
We propose Symmetrized Geometric Averaging (SGA), a more statistically robust algorithm for partial tree recovery.
arXiv Detail & Related papers (2021-01-22T01:57:35Z)
- Growing Deep Forests Efficiently with Soft Routing and Learned Connectivity [79.83903179393164]
This paper further extends the deep forest idea in several important aspects.
We employ a probabilistic tree whose nodes make probabilistic routing decisions, a.k.a. soft routing, rather than hard binary decisions.
Experiments on the MNIST dataset demonstrate that our empowered deep forests achieve performance better than or comparable to [1], [3].
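As a sketch of what soft routing means here: each internal node emits a probability of routing left, and a sample's prediction is the mixture of leaf predictions weighted by the probability of reaching each leaf. The minimal numpy forward pass below is illustrative only; the paper's learned connectivity is omitted.

```python
import numpy as np

def soft_tree_forward(x, W, b, leaf_values):
    """Forward pass of a soft-routing tree of depth D.

    W: (n_internal, d) gate weights, b: (n_internal,) gate biases,
    leaf_values: (2**D, c) per-leaf predictions. Nodes are indexed in
    heap order (children of node i are 2i+1 and 2i+2).
    """
    n_internal = W.shape[0]
    gates = 1.0 / (1.0 + np.exp(-(W @ x + b)))  # P(go left) at each node
    # Probability of reaching each node, propagated top-down in heap order.
    reach = np.ones(2 * n_internal + 1)
    for i in range(n_internal):
        reach[2 * i + 1] = reach[i] * gates[i]          # left child
        reach[2 * i + 2] = reach[i] * (1.0 - gates[i])  # right child
    leaf_reach = reach[n_internal:]  # last 2**D entries are the leaves
    return leaf_reach @ leaf_values  # probability-weighted leaf mixture

# Usage with random parameters: depth-3 tree, 10-dim input, 3 classes.
rng = np.random.default_rng(0)
D, d, c = 3, 10, 3
n_internal = 2**D - 1
W, b = rng.normal(size=(n_internal, d)), rng.normal(size=n_internal)
leaf_logits = rng.normal(size=(2**D, c))
print(soft_tree_forward(rng.normal(size=d), W, b, leaf_logits))
```

Because every gate is differentiable, such a tree can be trained end-to-end with gradient descent, which is what distinguishes soft routing from hard binary splits.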
arXiv Detail & Related papers (2020-12-29T18:05:05Z)
- An Efficient Adversarial Attack for Tree Ensembles [91.05779257472675]
We study adversarial attacks on tree-based ensembles such as gradient boosting decision trees (GBDTs) and random forests (RFs).
We show that our method can be thousands of times faster than the previous mixed-integer linear programming (MILP) based approach.
Our code is available at https://github.com/chong-z/tree-ensemble-attack.
arXiv Detail & Related papers (2020-10-22T10:59:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.