Empirical complexity of comparator-based nearest neighbor descent
- URL: http://arxiv.org/abs/2202.00517v1
- Date: Sun, 30 Jan 2022 21:37:53 GMT
- Title: Empirical complexity of comparator-based nearest neighbor descent
- Authors: Jacob D. Baron and R. W. R. Darling
- Abstract summary: A Java parallel streams implementation of the $K$-nearest neighbor descent algorithm is presented.
Experiments with the Kullback-Leibler divergence Comparator support the prediction that the number of rounds of $K$-nearest neighbor updates need not exceed twice the diameter of the undirected version of a random regular out-degree $K$ digraph on $n$ vertices.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A Java parallel streams implementation of the $K$-nearest neighbor descent
algorithm is presented using a natural statistical termination criterion. Input
data consist of a set $S$ of $n$ objects of type V, and a Function<V,
Comparator<V>>, which enables any $x \in S$ to decide which of $y, z \in
S\setminus\{x\}$ is more similar to $x$. Experiments with the Kullback-Leibler
divergence Comparator support the prediction that the number of rounds of
$K$-nearest neighbor updates need not exceed twice the diameter of the
undirected version of a random regular out-degree $K$ digraph on $n$ vertices.
Overall complexity was $O(n K^2 \log_K(n))$ in the class of examples studied.
When objects are sampled uniformly from a $d$-dimensional simplex, accuracy of
the $K$-nearest neighbor approximation is high up to $d = 20$, but declines in
higher dimensions, as theory would predict.
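
The abstract fully specifies the algorithm's interface: each object $x$ supplies, via a Function<V, Comparator<V>>, a Comparator that decides which of two other objects is more similar to $x$, and $K$-nearest neighbor descent repeatedly refines each object's friend list using friends-of-friends as candidates. The Java sketch below illustrates this setup under stated assumptions: it is not the authors' implementation, all class and variable names are hypothetical, the round structure is simplified (reverse neighbors are ignored), and the paper's statistical termination criterion is replaced by a plain no-change test.

```java
import java.util.*;
import java.util.function.Function;
import java.util.stream.*;

// Illustrative sketch of comparator-based K-NN descent (hypothetical names,
// not the authors' code). Similarity is judged only through a Comparator
// supplied by each query point; no numeric distances are ever exposed.
public class KnnDescentSketch {

    // Kullback-Leibler divergence D(p || q) for strictly positive vectors.
    static double kl(double[] p, double[] q) {
        double d = 0.0;
        for (int i = 0; i < p.length; i++) d += p[i] * Math.log(p[i] / q[i]);
        return d;
    }

    // The Function<V, Comparator<V>> of the abstract: x yields a Comparator
    // deciding which of y, z is more similar to x (smaller KL(x || .)).
    static final Function<double[], Comparator<double[]>> SIMILARITY =
            x -> Comparator.comparingDouble(y -> kl(x, y));

    public static void main(String[] args) {
        int n = 500, dim = 5, K = 8, maxRounds = 20;
        Random rng = new Random(42);

        // Sample n points uniformly from the (dim-1)-dimensional simplex
        // by normalizing i.i.d. exponential coordinates.
        List<double[]> points = Stream.generate(() -> {
            double[] e = rng.doubles(dim).map(u -> -Math.log(1.0 - u)).toArray();
            double s = Arrays.stream(e).sum();
            return Arrays.stream(e).map(v -> v / s).toArray();
        }).limit(n).collect(Collectors.toList());

        // Initialize each point's friend list with K random out-neighbors.
        Map<Integer, Set<Integer>> friends = new HashMap<>();
        for (int i = 0; i < n; i++) {
            Set<Integer> s = new HashSet<>();
            while (s.size() < K) { int j = rng.nextInt(n); if (j != i) s.add(j); }
            friends.put(i, s);
        }

        // K-NN descent rounds, parallelized over points with streams: each
        // point considers friends and friends-of-friends, keeping the K most
        // similar as judged by its own Comparator. The paper uses a
        // statistical termination criterion; here we simply stop once no
        // friend list changes.
        for (int round = 1; round <= maxRounds; round++) {
            final Map<Integer, Set<Integer>> current = friends;
            Map<Integer, Set<Integer>> updated = IntStream.range(0, n)
                .parallel().boxed()
                .collect(Collectors.toConcurrentMap(i -> i, i -> {
                    Comparator<double[]> cmp = SIMILARITY.apply(points.get(i));
                    Set<Integer> candidates = new HashSet<>(current.get(i));
                    for (int f : current.get(i)) candidates.addAll(current.get(f));
                    candidates.remove(i);
                    return candidates.stream()
                            .sorted(Comparator.comparing(points::get, cmp))
                            .limit(K)
                            .collect(Collectors.toSet());
                }));
            if (updated.equals(friends)) {
                System.out.println("Converged after " + round + " rounds");
                break;
            }
            friends = updated;
        }
    }
}
```

Each round inspects $O(K^2)$ candidates per object, so if the number of rounds tracks the diameter of the associated random out-degree $K$ digraph, roughly $\log_K(n)$, the total work matches the $O(n K^2 \log_K(n))$ empirical complexity reported above.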
Related papers
- Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch.
We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work.
We also show an $\Omega(k^{2/p} n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z) - Measuring quantum relative entropy with finite-size effect [53.64687146666141]
We study the estimation of relative entropy $D(\rho\|\sigma)$ when $\sigma$ is known.
Our estimator attains the Cramér-Rao type bound when the dimension $d$ is fixed.
arXiv Detail & Related papers (2024-06-25T06:07:20Z) - A Theory of Interpretable Approximations [61.90216959710842]
We study the idea of approximating a target concept $c$ by a small aggregation of concepts from some base class $\mathcal{H}$.
For any given pair of $\mathcal{H}$ and $c$, exactly one of several cases holds, the first being that $c$ cannot be approximated by $\mathcal{H}$ with arbitrary accuracy.
We show that, in the case of interpretable approximations, even a slightly nontrivial a-priori guarantee on the complexity of approximations implies approximations with constant (distribution-free and accuracy-free) complexity.
arXiv Detail & Related papers (2024-06-15T06:43:45Z) - List-Decodable Sparse Mean Estimation via Difference-of-Pairs Filtering [42.526664955704746]
We develop a novel, conceptually simpler technique for list-decodable sparse mean estimation.
In particular, for distributions with "certifiably bounded" $t$-th moments in $k$-sparse directions, our algorithm achieves error of $(1/\alpha)^{O(1/t)}$ with sample complexity $m = (k\log(n))^{O(t)}/\alpha$ and running time $\mathrm{poly}(mn^t)$.
For the special case of Gaussian inliers, our algorithm achieves the optimal error guarantee of $\Theta(\sqrt{\log(1/\alpha)})$.
arXiv Detail & Related papers (2022-06-10T17:38:18Z) - TURF: A Two-factor, Universal, Robust, Fast Distribution Learning
Algorithm [64.13217062232874]
One of its most powerful and successful modalities approximates every distribution to an $\ell_1$ distance essentially at most a constant times larger than its closest $t$-piece degree-$d$ polynomial.
We provide a method that estimates this number near-optimally, hence helps approach the best possible approximation.
arXiv Detail & Related papers (2022-02-15T03:49:28Z) - Statistically Near-Optimal Hypothesis Selection [33.83129262033921]
We derive an optimal $2$-approximation learning strategy for the Hypothesis Selection problem.
This is the first algorithm that simultaneously achieves the best approximation factor and sample complexity.
arXiv Detail & Related papers (2021-08-17T21:11:20Z) - On Efficient Low Distortion Ultrametric Embedding [18.227854382422112]
A widely-used method to preserve the underlying hierarchical structure of the data is to find an embedding of the data into a tree or an ultrametric.
In this paper, we provide a new algorithm which takes as input a set of points in $\mathbb{R}^d$ and, for every $c \ge 1$, runs in time $n^{1+\rho/c^2}$ (for a universal constant $\rho>1$) to output an ultrametric $\Delta$.
We show that the output of our algorithm is comparable to the output of the linkage algorithms while achieving a much faster running time.
arXiv Detail & Related papers (2020-08-15T11:06:45Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, in low dimensions, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z) - Robust testing of low-dimensional functions [8.927163098772589]
A recent work of the authors showed that linear $k$-juntas are testable.
Following the surge of interest in noise-tolerant property testing, in this paper we prove a noise-tolerant (or robust) version of this result.
We obtain a fully noise tolerant tester with query complexity $k^{O(\mathsf{poly}(\log k/\epsilon))}$ for the class of intersections of $k$ halfspaces.
arXiv Detail & Related papers (2020-04-24T10:23:12Z) - Agnostic Q-learning with Function Approximation in Deterministic
Systems: Tight Bounds on Approximation Error and Sample Complexity [94.37110094442136]
We study the problem of agnostic $Q$-learning with function approximation in deterministic systems.
We show that if $\delta = O\left(\rho/\sqrt{\dim_E}\right)$, then one can find the optimal policy using $O\left(\dim_E\right)$ trajectories.
arXiv Detail & Related papers (2020-02-17T18:41:49Z) - Efficient Distance Approximation for Structured High-Dimensional
Distributions via Learning [31.552581550603005]
We design efficient distance approximation algorithms for several classes of structured high-dimensional distributions.
Our results are the first efficient distance approximation algorithms for these well-studied problems.
arXiv Detail & Related papers (2020-02-13T07:42:06Z)