Analysis of KNN Density Estimation
- URL: http://arxiv.org/abs/2010.00438v1
- Date: Wed, 30 Sep 2020 03:33:17 GMT
- Title: Analysis of KNN Density Estimation
- Authors: Puning Zhao, Lifeng Lai
- Abstract summary: kNN density estimation is minimax optimal under both $\ell_1$ and $\ell_\infty$ criteria if the support set is known.
The $\ell_1$ error does not reach the minimax lower bound, but is better than that of kernel density estimation.
- Score: 56.29748742084386
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We analyze the $\ell_1$ and $\ell_\infty$ convergence rates of the k nearest neighbor (kNN) density estimation method. Our analysis includes two different cases
depending on whether the support set is bounded or not. In the first case, the
probability density function has a bounded support and is bounded away from
zero. We show that kNN density estimation is minimax optimal under both
$\ell_1$ and $\ell_\infty$ criteria, if the support set is known. If the
support set is unknown, then the convergence rate of $\ell_1$ error is not
affected, while $\ell_\infty$ error does not converge. In the second case, the
probability density function can approach zero and is smooth everywhere.
Moreover, the Hessian is assumed to decay with the density values. For this
case, our result shows that the $\ell_\infty$ error of kNN density estimation
is nearly minimax optimal. The $\ell_1$ error does not reach the minimax lower
bound, but is better than that of kernel density estimation.
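For reference, the object of analysis is typically the classical kNN density estimate $\hat{f}(x) = k / (n\, c_d\, R_k(x)^d)$, where $R_k(x)$ is the distance from $x$ to its $k$-th nearest sample and $c_d$ is the volume of the unit ball in $\mathbb{R}^d$. Below is a minimal Python sketch of this standard form; it is an illustration under that assumption, not necessarily the paper's exact construction (which may differ in, e.g., boundary handling).

```python
import math
import numpy as np

def knn_density(x, samples, k):
    """Classical kNN density estimate f_hat(x) = k / (n * c_d * R_k(x)^d),
    where R_k(x) is the distance from x to its k-th nearest sample and
    c_d = pi^(d/2) / Gamma(d/2 + 1) is the unit-ball volume in R^d.
    (Standard form; the paper's exact estimator may differ in details.)"""
    n, d = samples.shape
    dists = np.linalg.norm(samples - x, axis=1)  # distances from x to all samples
    r_k = np.sort(dists)[k - 1]                  # k-th nearest neighbor distance
    c_d = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    return k / (n * c_d * r_k ** d)

# usage: the standard bivariate normal has density 1/(2*pi) ~ 0.159 at the origin
rng = np.random.default_rng(0)
samples = rng.standard_normal((2000, 2))
print(knn_density(np.zeros(2), samples, k=50))
```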
Related papers
- Statistical-Computational Trade-offs for Density Estimation [60.81548752871115]
We show that for a broad class of data structures their bounds cannot be significantly improved.
This is a novel statistical-computational trade-off for density estimation.
arXiv Detail & Related papers (2024-10-30T15:03:33Z) - Kernel Density Estimators in Large Dimensions [9.299356601085586]
We study the kernel-based estimate of the density $\hat{\rho}_h^{\mathcal{D}}(x) = \frac{1}{n h^d}\sum_{i=1}^{n} K\left(\frac{x - y_i}{h}\right)$, depending on the bandwidth $h$.
We show that the optimal bandwidth threshold based on Kullback-Leibler divergence lies in the new statistical regime identified in this paper.
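For concreteness, here is a minimal sketch of this kernel estimate; the Gaussian choice of $K$ is an assumption, since the abstract leaves the kernel generic.

```python
import numpy as np

def kde(x, samples, h):
    """Kernel density estimate rho_hat(x) = 1/(n * h^d) * sum_i K((x - y_i)/h),
    with a Gaussian kernel K(u) = (2*pi)^(-d/2) * exp(-|u|^2 / 2).
    (The kernel choice is an assumption; the abstract leaves K generic.)"""
    n, d = samples.shape
    u = (x - samples) / h                       # (n, d) scaled differences
    k_vals = (2 * np.pi) ** (-d / 2) * np.exp(-0.5 * np.sum(u ** 2, axis=1))
    return k_vals.sum() / (n * h ** d)

# usage: estimate a standard bivariate normal at the origin, bandwidth h = 0.5
rng = np.random.default_rng(0)
samples = rng.standard_normal((2000, 2))
print(kde(np.zeros(2), samples, h=0.5))         # roughly 1/(2*pi) ~ 0.159
```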
arXiv Detail & Related papers (2024-08-11T15:56:44Z) - Further Understanding of a Local Gaussian Process Approximation: Characterising Convergence in the Finite Regime [1.3518297878940662]
We show that common choices of kernel functions for a highly accurate and massively scalable GPnn regression model exhibit gradual convergence to asymptotic behaviour as the dataset size $n$ increases.
Similar bounds can be found under model misspecification and combined to give overall rates of convergence of both MSE and an important calibration metric.
arXiv Detail & Related papers (2024-04-09T10:47:01Z) - Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems [56.86067111855056]
We consider clipped optimization problems with heavy-tailed noise that has a structured density.
We show that it is possible to get faster rates of convergence than $\mathcal{O}(K^{-(\alpha - 1)/\alpha})$ when the gradients have finite moments of order $\alpha$.
We prove that the resulting estimates have negligible bias and controllable variance.
arXiv Detail & Related papers (2023-11-07T17:39:17Z) - Data Structures for Density Estimation [66.36971978162461]
Given a sublinear (in $n$) number of samples from $p$, our main result is the first data structure that identifies $v_i$ in time sublinear in $k$.
We also give an improved version of the algorithm of Acharya et al. that reports $v_i$ in time linear in $k$.
arXiv Detail & Related papers (2023-06-20T06:13:56Z) - High Probability Bounds for a Class of Nonconvex Algorithms with AdaGrad Stepsize [55.0090961425708]
We propose a new, simplified high probability analysis of AdaGrad for smooth, non-convex problems.
We present our analysis in a modular way and obtain a complementary $\mathcal{O}(1/T)$ convergence rate in the deterministic setting.
To the best of our knowledge, this is the first high probability analysis for AdaGrad with a truly adaptive scheme, i.e., completely oblivious to the knowledge of smoothness.
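For orientation, the scheme in question is usually the AdaGrad-norm update, whose stepsize adapts to accumulated gradient norms without knowing the smoothness constant. A minimal sketch of that standard form follows; treating it as the paper's exact variant is an assumption.

```python
import numpy as np

def adagrad_norm(grad, x0, eta=1.0, b0=1e-8, steps=1000):
    """AdaGrad-norm: scalar stepsize eta / sqrt(b0^2 + sum_s |g_s|^2),
    adaptive and oblivious to the smoothness constant.
    (Standard form; the paper's exact variant may differ.)"""
    x = np.asarray(x0, dtype=float)
    b2 = b0 ** 2                        # running sum of squared gradient norms
    for _ in range(steps):
        g = grad(x)
        b2 += np.dot(g, g)
        x = x - eta * g / np.sqrt(b2)   # stepsize shrinks as gradients accumulate
    return x

# usage: minimize f(x) = |x|^2 (gradient 2x); iterates approach the origin
print(adagrad_norm(lambda x: 2 * x, np.ones(5)))
```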
arXiv Detail & Related papers (2022-04-06T13:50:33Z) - Random quantum circuits transform local noise into global white noise [118.18170052022323]
We study the distribution over measurement outcomes of noisy random quantum circuits in the low-fidelity regime.
For local noise that is sufficiently weak and unital, correlations (measured by the linear cross-entropy benchmark) between the output distribution $p_{\text{noisy}}$ of a generic noisy circuit instance and that of the corresponding noiseless instance shrink exponentially.
If the noise is incoherent, the output distribution approaches the uniform distribution $p_{\text{unif}}$ at precisely the same rate.
arXiv Detail & Related papers (2021-11-29T19:26:28Z) - Localization in 1D non-parametric latent space models from pairwise affinities [6.982738885923206]
We consider the problem of estimating latent positions in a one-dimensional torus from pairwise affinities.
We introduce an estimation procedure that provably localizes all the latent positions with a maximum error of the order of $\sqrt{\log(n)/n}$, with high probability.
arXiv Detail & Related papers (2021-08-06T13:05:30Z) - Rates of convergence for density estimation with generative adversarial networks [19.71040653379663]
We prove an oracle inequality for the Jensen-Shannon (JS) divergence between the underlying density $\mathsf{p}^*$ and the GAN estimate.
We show that the JS-divergence between the GAN estimate and $\mathsf{p}^*$ decays as fast as $(\log n/n)^{2\beta/(2\beta + d)}$.
arXiv Detail & Related papers (2021-01-30T09:59:14Z) - Convergence of Graph Laplacian with kNN Self-tuned Kernels [14.645468999921961]
A self-tuned kernel adaptively sets a bandwidth $\sigma_i$ at each point $x_i$ by the $k$-nearest neighbor (kNN) distance.
This paper proves the convergence of the graph Laplacian operator $L_N$ to the manifold (weighted) Laplacian for a new family of kNN self-tuned kernels.
arXiv Detail & Related papers (2020-11-03T04:55:33Z)
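To make the self-tuned construction concrete, here is a minimal sketch in the Zelnik-Manor/Perona style, with $\sigma_i$ set by the kNN distance and an unnormalized Laplacian $L = D - W$; the paper's kernel family and the normalization behind $L_N$ may differ.

```python
import numpy as np

def self_tuned_laplacian(X, k):
    """Self-tuned affinity W_ij = exp(-|x_i - x_j|^2 / (sigma_i * sigma_j)),
    with sigma_i the kNN distance at x_i, then L = D - W (unnormalized).
    (Zelnik-Manor/Perona-style sketch; the paper's kernel family and the
    normalization behind L_N may differ.)"""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
    sigma = np.sqrt(np.sort(sq, axis=1)[:, k])  # index 0 is the point itself
    W = np.exp(-sq / (sigma[:, None] * sigma[None, :]))
    np.fill_diagonal(W, 0.0)
    D = np.diag(W.sum(axis=1))
    return D - W

# usage: Laplacian of 200 noisy points on a circle, k = 10
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
X = np.column_stack([np.cos(theta), np.sin(theta)]) + 0.05 * rng.standard_normal((200, 2))
L = self_tuned_laplacian(X, k=10)
```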
This list is automatically generated from the titles and abstracts of the papers on this site.