Kernel Density Estimators in Large Dimensions
- URL: http://arxiv.org/abs/2408.05807v3
- Date: Fri, 18 Oct 2024 15:19:04 GMT
- Title: Kernel Density Estimators in Large Dimensions
- Authors: Giulio Biroli, Marc Mézard
- Abstract summary: We study the kernel-based estimate of the density $\hat\rho_h^{\mathcal{D}}(x)=\frac{1}{n h^d}\sum_{i=1}^n K\left(\frac{x-y_i}{h}\right)$, depending on the bandwidth $h$.
We show that the optimal bandwidth threshold based on Kullback-Leibler divergence lies in the new statistical regime identified in this paper.
- Score: 9.299356601085586
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies Kernel Density Estimation for a high-dimensional distribution $\rho(x)$. Traditional approaches have focused on the limit of a large number of data points $n$ and fixed dimension $d$. We analyze instead the regime where both the number $n$ of data points $y_i$ and their dimensionality $d$ grow with a fixed ratio $\alpha=(\log n)/d$. Our study reveals three distinct statistical regimes for the kernel-based estimate of the density $\hat \rho_h^{\mathcal {D}}(x)=\frac{1}{n h^d}\sum_{i=1}^n K\left(\frac{x-y_i}{h}\right)$, depending on the bandwidth $h$: a classical regime for large bandwidth where the Central Limit Theorem (CLT) holds, which is akin to the one found in traditional approaches. Below a certain value of the bandwidth, $h_{CLT}(\alpha)$, we find that the CLT breaks down. The statistics of $\hat\rho_h^{\mathcal {D}}(x)$ for a fixed $x$ drawn from $\rho(x)$ is given by a heavy-tailed distribution (an alpha-stable distribution). In particular, below a value $h_G(\alpha)$, we find that $\hat\rho_h^{\mathcal {D}}(x)$ is governed by extreme value statistics: only a few points in the database matter and give the dominant contribution to the density estimator. We provide a detailed analysis for high-dimensional multivariate Gaussian data. We show that the optimal bandwidth threshold based on Kullback-Leibler divergence lies in the new statistical regime identified in this paper. As known by practitioners, when decreasing the bandwidth a kernel-estimated density changes from a smooth curve to a collection of peaks centred on the data points. Our findings reveal that this general phenomenon is related to sharp transitions between phases characterized by different statistical properties, and offer new insights for kernel density estimation in high-dimensional settings.
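To make the object of study concrete, here is a minimal numerical sketch of the estimator $\hat\rho_h^{\mathcal{D}}(x)$ for Gaussian data. The Gaussian kernel $K$, the dimension, the sample size, and the bandwidth values are illustrative assumptions and do not correspond to the paper's thresholds $h_{CLT}(\alpha)$ or $h_G(\alpha)$; the printed share of the largest single-point contribution simply illustrates how, as $h$ decreases, a few data points start to dominate the estimate, as described in the abstract.

```python
# Minimal sketch of the kernel density estimator studied in the paper:
#   rho_hat_h(x) = (1 / (n h^d)) * sum_i K((x - y_i) / h)
# with an assumed Gaussian kernel K. Dimensions, sample size and bandwidths
# below are illustrative, not the paper's thresholds h_CLT or h_G.
import numpy as np

rng = np.random.default_rng(0)

def kde_at_point(x, data, h):
    """Evaluate rho_hat_h at a single test point x for data y_1..y_n in R^d."""
    n, d = data.shape
    # Gaussian kernel K(u) = (2*pi)^(-d/2) * exp(-|u|^2 / 2)
    sq_dists = np.sum((data - x) ** 2, axis=1) / h**2
    contrib = np.exp(-0.5 * sq_dists) / (2 * np.pi) ** (d / 2)
    return contrib.sum() / (n * h**d), contrib

d, n = 20, 10_000                    # here the paper's ratio alpha = log(n)/d is about 0.46
data = rng.standard_normal((n, d))   # y_i ~ N(0, I_d): the Gaussian case analysed in the paper
x = rng.standard_normal(d)           # test point drawn from the same rho

for h in (2.0, 1.0, 0.5):
    est, contrib = kde_at_point(x, data, h)
    # Fraction of the estimate carried by the single closest data point:
    # at small h only a few points matter (the extreme-value regime).
    top_share = contrib.max() / contrib.sum()
    print(f"h={h:.1f}  rho_hat={est:.3e}  largest single-point share={top_share:.2%}")
```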
Related papers
- Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators on distributions over $\mathbb{R}^d$ suffer from a curse of dimensionality.
We present an algorithm whose sample complexity has improved dependence on dimension.
arXiv Detail & Related papers (2024-11-01T17:59:53Z) - Statistical-Computational Trade-offs for Density Estimation [60.81548752871115]
We show that for a broad class of data structures their bounds cannot be significantly improved.
This is a novel statistical-computational trade-off for density estimation.
arXiv Detail & Related papers (2024-10-30T15:03:33Z) - Convergence Analysis of Probability Flow ODE for Score-based Generative Models [5.939858158928473]
We study the convergence properties of deterministic samplers based on probability flow ODEs from both theoretical and numerical perspectives.
We prove that the total variation between the target and the generated data distributions can be bounded above by $\mathcal{O}(d^{3/4}\delta^{1/2})$ at the continuous-time level.
arXiv Detail & Related papers (2024-04-15T12:29:28Z) - Efficient Estimation of the Central Mean Subspace via Smoothed Gradient Outer Products [12.047053875716506]
We consider the problem of sufficient dimension reduction for multi-index models.
We show that a fast parametric convergence rate of the form $C_d \cdot n^{-1/2}$ is achievable.
arXiv Detail & Related papers (2023-12-24T12:28:07Z) - Optimal Rate of Kernel Regression in Large Dimensions [13.641780902673792]
We first build a general tool to characterize the upper bound and the minimax lower bound of kernel regression for large dimensional data.
We utilize the new tool to show that the minimax rate of the excess risk of kernel regression is $n^{-1/2}$ when $n \asymp d^{\gamma}$ for $\gamma = 2, 4, 6, 8, \cdots$.
arXiv Detail & Related papers (2023-09-08T11:29:05Z) - Nearly $d$-Linear Convergence Bounds for Diffusion Models via Stochastic
Localization [40.808942894229325]
We provide the first convergence bounds which are linear in the data dimension.
We show that diffusion models require at most $\tilde O\left(\frac{d \log^2(1/\delta)}{\varepsilon^2}\right)$ steps to approximate an arbitrary distribution.
arXiv Detail & Related papers (2023-08-07T16:01:14Z) - Effective Minkowski Dimension of Deep Nonparametric Regression: Function
Approximation and Statistical Theories [70.90012822736988]
Existing theories on deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to intrinsic data structures.
This paper introduces a relaxed assumption that input data are concentrated around a subset of $\mathbb{R}^d$ denoted by $\mathcal{S}$, and the intrinsic dimension of $\mathcal{S}$ can be characterized by a new complexity notion -- effective Minkowski dimension.
arXiv Detail & Related papers (2023-06-26T17:13:31Z) - Data Structures for Density Estimation [66.36971978162461]
Given a sublinear (in $n$) number of samples from $p$, our main result is the first data structure that identifies $v_i$ in time sublinear in $k$.
We also give an improved version of the algorithm of Acharya et al. that reports $v_i$ in time linear in $k$.
arXiv Detail & Related papers (2023-06-20T06:13:56Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random features technique called Ternary Random Feature (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ less bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Convergence of Graph Laplacian with kNN Self-tuned Kernels [14.645468999921961]
Self-tuned kernels adaptively set a bandwidth $\sigma_i$ at each point $x_i$ from its $k$-nearest-neighbor (kNN) distance (a minimal sketch of this construction appears after this list).
This paper proves the convergence of the graph Laplacian operator $L_N$ to the manifold (weighted) Laplacian for a new family of kNN self-tuned kernels.
arXiv Detail & Related papers (2020-11-03T04:55:33Z) - Analysis of KNN Density Estimation [56.29748742084386]
kNN density estimation is minimax optimal under both $\ell_1$ and $\ell_\infty$ criteria, if the support set is known.
The $\ell_\infty$ error does not reach the minimax lower bound, but is better than kernel density estimation.
arXiv Detail & Related papers (2020-09-30T03:33:17Z)
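As a companion to the kNN self-tuned kernels entry in the list above, here is a minimal sketch of the adaptive-bandwidth construction: $\sigma_i$ is set to the distance from $x_i$ to its $k$-th nearest neighbour and enters the affinity through the product $\sigma_i\sigma_j$. The Gaussian affinity form and the choice $k=7$ are assumptions for illustration, not details taken from the cited paper.

```python
# Minimal sketch of a kNN self-tuned kernel: each point x_i gets its own
# bandwidth sigma_i (its distance to the k-th nearest neighbour), and the
# affinity W_ij = exp(-|x_i - x_j|^2 / (sigma_i * sigma_j)).
# The Gaussian form and k=7 are illustrative assumptions.
import numpy as np

def self_tuned_kernel(X, k=7):
    """Return the n x n self-tuned affinity matrix for data X of shape (n, d)."""
    # Pairwise squared distances.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    dists = np.sqrt(sq)
    # sigma_i = distance from x_i to its k-th nearest neighbour
    # (column 0 of the sorted rows is the self-distance 0).
    sigma = np.sort(dists, axis=1)[:, k]
    return np.exp(-sq / np.outer(sigma, sigma))

X = np.random.default_rng(1).standard_normal((200, 3))
W = self_tuned_kernel(X)
print(W.shape, W.diagonal()[:3])  # diagonal entries are exp(0) = 1
```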
This list is automatically generated from the titles and abstracts of the papers in this site.