Convergence of Graph Laplacian with kNN Self-tuned Kernels
- URL: http://arxiv.org/abs/2011.01479v2
- Date: Wed, 10 Feb 2021 02:17:46 GMT
- Title: Convergence of Graph Laplacian with kNN Self-tuned Kernels
- Authors: Xiuyuan Cheng, Hau-Tieng Wu
- Abstract summary: The self-tuned kernel adaptively sets a bandwidth $\sigma_i$ at each point $x_i$ by the $k$-nearest neighbor (kNN) distance.
This paper proves the convergence of the graph Laplacian operator $L_N$ to the manifold (weighted-)Laplacian for a new family of kNN self-tuned kernels.
- Score: 14.645468999921961
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Kernelized Gram matrix $W$ constructed from data points $\{x_i\}_{i=1}^N$ as
$W_{ij}= k_0( \frac{ \| x_i - x_j \|^2} {\sigma^2} )$ is widely used in
graph-based geometric data analysis and unsupervised learning. An important
question is how to choose the kernel bandwidth $\sigma$, and a common practice
called self-tuned kernel adaptively sets a $\sigma_i$ at each point $x_i$ by
the $k$-nearest neighbor (kNN) distance. When $x_i$'s are sampled from a
$d$-dimensional manifold embedded in a possibly high-dimensional space, unlike
with fixed-bandwidth kernels, theoretical results of graph Laplacian
convergence with self-tuned kernels have been incomplete. This paper proves the
convergence of graph Laplacian operator $L_N$ to manifold (weighted-)Laplacian
for a new family of kNN self-tuned kernels $W^{(\alpha)}_{ij} = k_0( \frac{ \|
x_i - x_j \|^2}{ \epsilon \hat{\rho}(x_i)
\hat{\rho}(x_j)})/\hat{\rho}(x_i)^\alpha \hat{\rho}(x_j)^\alpha$, where
$\hat{\rho}$ is the estimated bandwidth function {by kNN}, and the limiting
operator is also parametrized by $\alpha$. When $\alpha = 1$, the limiting
operator is the weighted manifold Laplacian $\Delta_p$. Specifically, we prove
the point-wise convergence of $L_N f $ and convergence of the graph Dirichlet
form with rates. Our analysis is based on first establishing a $C^0$
consistency for $\hat{\rho}$ which bounds the relative estimation error
$|\hat{\rho} - \bar{\rho}|/\bar{\rho}$ uniformly with high probability, where
$\bar{\rho} = p^{-1/d}$, and $p$ is the data density function. Our theoretical
results reveal the advantage of the self-tuned kernel over the fixed-bandwidth
kernel via a smaller variance error in low-density regions. In the algorithm, no prior
knowledge of $d$ or data density is needed. The theoretical results are
supported by numerical experiments on simulated data and hand-written digit
image data.
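Since the abstract specifies the affinity construction explicitly, a minimal NumPy/SciPy sketch may help make it concrete. Here $k_0$ is taken to be Gaussian and $L_N$ uses the random-walk normalization; both are assumptions for illustration, since the abstract does not fix $k_0$ or the exact normalization and scaling of $L_N$.

```python
import numpy as np
from scipy.spatial.distance import cdist

def self_tuned_laplacian(X, k=10, alpha=1.0, eps=1.0):
    """kNN self-tuned kernel graph Laplacian, following the abstract:
    rho_hat(x_i) = distance from x_i to its k-th nearest neighbor,
    W_ij = k0(||x_i - x_j||^2 / (eps * rho_i * rho_j)) / (rho_i^alpha * rho_j^alpha).
    k0 is assumed Gaussian, and L_N is the random-walk normalization
    D^{-1} W - I; the paper's precise normalization/scaling may differ.
    """
    dists = cdist(X, X)                    # pairwise Euclidean distances
    rho = np.sort(dists, axis=1)[:, k]     # kNN distance (column 0 is self)
    bandwidth = eps * np.outer(rho, rho)   # eps * rho_i * rho_j
    W = np.exp(-dists**2 / bandwidth)      # k0 = Gaussian (assumption)
    W /= np.outer(rho**alpha, rho**alpha)  # divide by rho_i^alpha rho_j^alpha
    D = W.sum(axis=1)
    return W / D[:, None] - np.eye(len(X)) # random-walk graph Laplacian

# Usage: points sampled from a circle, a 1-D manifold embedded in R^2
theta = np.random.default_rng(0).uniform(0, 2 * np.pi, 500)
X = np.c_[np.cos(theta), np.sin(theta)]
L = self_tuned_laplacian(X, k=10, alpha=1.0, eps=0.05)
```

With `alpha=1.0` this corresponds to the case whose limiting operator the abstract identifies as the weighted manifold Laplacian $\Delta_p$.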
Related papers
- Improved convergence rate of kNN graph Laplacians [11.93971616098517]
A general class of $k$NN graphs is studied, where the graph affinity is $W_{ij} = \epsilon^{-d/2} \, k_0\!\left( \frac{\| x_i - x_j \|^2}{\epsilon \hat{\rho}(x_i) \hat{\rho}(x_j)} \right)$.
We prove the point-wise convergence of the $k$NN graph Laplacian to the limiting manifold operator.
arXiv Detail & Related papers (2024-10-30T17:01:00Z)
- Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components that set the transformer architecture apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that, in the worst case, exponential dependence on the number of heads $m$ is unavoidable.
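For reference, a minimal NumPy sketch of the forward pass of a multi-head attention layer (the object being learned); the shapes, scaled dot-product softmax, and per-head projections follow the standard transformer convention rather than anything specified in the snippet:

```python
import numpy as np

def multi_head_attention(X, Wq, Wk, Wv, Wo):
    """Single multi-head attention layer (no masking, no biases).
    Assumed shapes, purely for illustration:
      X:  (n, d)               n tokens of dimension d
      Wq, Wk, Wv: (m, d, d_h)  per-head query/key/value projections
      Wo: (m * d_h, d)         output projection
    """
    heads = []
    for q, k, v in zip(Wq, Wk, Wv):
        Q, K, V = X @ q, X @ k, X @ v              # per-head projections
        scores = Q @ K.T / np.sqrt(q.shape[-1])    # scaled dot-product
        A = np.exp(scores - scores.max(-1, keepdims=True))
        A /= A.sum(-1, keepdims=True)              # row-wise softmax
        heads.append(A @ V)
    return np.concatenate(heads, axis=-1) @ Wo     # concat heads, project

# Usage with random weights: m=4 heads, d=32, d_h=8
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 32))
Wq, Wk, Wv = (rng.normal(size=(4, 32, 8)) for _ in range(3))
Wo = rng.normal(size=(32, 32))
Y = multi_head_attention(X, Wq, Wk, Wv, Wo)        # -> shape (10, 32)
```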
arXiv Detail & Related papers (2024-02-06T15:39:09Z)
- A Unified Framework for Uniform Signal Recovery in Nonlinear Generative Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z)
- Detection of Dense Subhypergraphs by Low-Degree Polynomials [72.4451045270967]
Detection of a planted dense subgraph in a random graph is a fundamental statistical and computational problem.
We consider detecting the presence of a planted $G^r(n^\gamma, n^{-\alpha})$ subhypergraph in a $G^r(n, n^{-\beta})$ hypergraph.
Our results are already new in the graph case $r=2$, as we consider the subtle log-density regime where hardness based on average-case reductions is not known.
arXiv Detail & Related papers (2023-04-17T10:38:08Z)
- Random Graph Matching in Geometric Models: the Case of Complete Graphs [21.689343447798677]
This paper studies the problem of matching two complete graphs with edge weights correlated through latent geometries.
We derive an approximate maximum likelihood estimator, which provably achieves, with high probability, perfect recovery of $\pi^*$.
As a side discovery, we show that the celebrated spectral algorithm of [Ume88] emerges as a further approximation to the maximum likelihood in the geometric model.
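[Ume88] refers to Umeyama's classical eigendecomposition heuristic for weighted graph matching. A minimal sketch of that classical heuristic, assuming symmetric weight matrices (this is not the paper's approximate MLE, whose exact form the snippet does not give):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def umeyama_match(A, B):
    """Spectral matching in the spirit of [Ume88]: eigendecompose both
    symmetric weight matrices, build a node-similarity matrix from the
    element-wise absolute eigenvectors (absolute values resolve the sign
    ambiguity), and round to a permutation via linear assignment.
    """
    _, Ua = np.linalg.eigh(A)               # eigenvectors, ascending order
    _, Ub = np.linalg.eigh(B)
    S = np.abs(Ua) @ np.abs(Ub).T           # node-to-node similarity
    rows, cols = linear_sum_assignment(-S)  # maximize total similarity
    return cols                             # node i of A matched to cols[i] of B

# Usage: B is a permuted, noisy copy of A; at low noise the heuristic
# plausibly recovers perm, though recovery is not guaranteed.
rng = np.random.default_rng(1)
n = 30
A = rng.normal(size=(n, n))
A = (A + A.T) / 2
perm = rng.permutation(n)
B = A[np.ix_(perm, perm)] + 0.01 * rng.normal(size=(n, n))
B = (B + B.T) / 2
pi_hat = umeyama_match(B, A)                # compare pi_hat against perm
```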
arXiv Detail & Related papers (2022-02-22T04:14:45Z)
- Fast Graph Sampling for Short Video Summarization using Gershgorin Disc Alignment [52.577757919003844]
We study the problem of efficiently summarizing a short video into several keyframes, leveraging recent progress in fast graph sampling.
Experimental results show that our algorithm achieves comparable video summarization as state-of-the-art methods, at a substantially reduced complexity.
arXiv Detail & Related papers (2021-10-21T18:43:00Z)
- Random matrices in service of ML footprint: ternary random features with no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random features technique, called Ternary Random Features (TRF).
The computation of the proposed random features requires no multiplication, and storage takes a factor of $b$ fewer bits compared to classical random features.
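A minimal sketch of what a ternary random features map could look like: the projection matrix has entries in $\{-1, 0, +1\}$, so applying it needs only signed additions. The sparsity level, scaling, and ternary activation below are illustrative assumptions, not the paper's tuned choices:

```python
import numpy as np

def ternary_random_features(X, n_features, sparsity=2/3, seed=0):
    """Project with a random matrix whose entries lie in {-1, 0, +1},
    then apply a ternary activation. Multiplication-free in spirit:
    X @ W reduces to signed additions since W is ternary.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # entries: 0 with prob `sparsity`, else +/-1 with equal probability
    W = rng.choice([-1, 0, 1], size=(d, n_features),
                   p=[(1 - sparsity) / 2, sparsity, (1 - sparsity) / 2])
    Z = X @ W / np.sqrt(n_features)
    return np.sign(Z)                   # ternary output in {-1, 0, +1}

# Usage: approximate a kernel Gram matrix with ternary features
X = np.random.default_rng(1).normal(size=(100, 20))
Phi = ternary_random_features(X, n_features=2000)
K_approx = Phi @ Phi.T / Phi.shape[1]
```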
arXiv Detail & Related papers (2021-10-05T09:33:49Z)
- Non-Parametric Estimation of Manifolds from Noisy Data [1.0152838128195467]
We consider the problem of estimating a $d$-dimensional submanifold of $\mathbb{R}^D$ from a finite set of noisy samples.
We show that the estimation yields convergence rates of $n^{-\frac{k}{2k+d}}$ for the point estimation and $n^{-\frac{k-1}{2k+d}}$ for the estimation of the tangent space.
arXiv Detail & Related papers (2021-05-11T02:29:33Z)
- From Smooth Wasserstein Distance to Dual Sobolev Norm: Empirical Approximation and Statistical Applications [18.618590805279187]
We show that $\mathsf{W}_p^{(\sigma)}$ is controlled by a $p$th-order smooth dual Sobolev norm $\mathsf{d}_p^{(\sigma)}$.
We derive the limit distribution of $\sqrt{n}\,\mathsf{d}_p^{(\sigma)}(\hat{\mu}_n, \mu)$ in all dimensions $d$, when $\mu$ is sub-Gaussian.
arXiv Detail & Related papers (2021-01-11T17:23:24Z)
- Optimal Coreset for Gaussian Kernel Density Estimation [0.8376091455761259]
Given a point set $P \subset \mathbb{R}^d$, the kernel density estimate of $P$ is defined as $\overline{\mathcal{G}}_P(x) = \frac{1}{|P|} \sum_{p \in P} e^{-\lVert x - p \rVert^2}$ for any $x \in \mathbb{R}^d$.
We study how to construct a small subset $Q$ of $P$ whose kernel density estimate approximates that of $P$.
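The displayed KDE is direct to evaluate; a short sketch, with uniform subsampling standing in for the paper's coreset construction (an assumption purely for illustration):

```python
import numpy as np

def gaussian_kde(P, x):
    """Evaluate the KDE from the entry above:
    G_P(x) = (1/|P|) * sum over p in P of exp(-||x - p||^2).
    The unit bandwidth is built into the displayed formula.
    """
    sq_dists = np.sum((P - x)**2, axis=1)   # ||x - p||^2 for each p in P
    return np.exp(-sq_dists).mean()

# A coreset Q should keep |G_P(x) - G_Q(x)| small for all x.
# Illustrative check with uniform subsampling (not the paper's method):
rng = np.random.default_rng(0)
P = rng.normal(size=(5000, 2))
Q = P[rng.choice(len(P), size=200, replace=False)]
x = np.zeros(2)
print(abs(gaussian_kde(P, x) - gaussian_kde(Q, x)))  # small for a good coreset
```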
arXiv Detail & Related papers (2020-07-15T22:58:50Z)
- Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
Solving optimal transport with an entropic regularization requires computing an $n \times n$ kernel matrix that is repeatedly applied to a vector.
We propose to use instead ground costs of the form $c(x,y) = -\log \langle \varphi(x), \varphi(y) \rangle$, where $\varphi$ is a map from the ground space onto the positive orthant $\mathbb{R}^r_+$, with $r \ll n$.
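With such a cost and regularization $\epsilon = 1$ (a simplifying assumption made here), the Gibbs kernel factors as $K = \Phi_x \Phi_y^\top$ with rank $r$, so each Sinkhorn iteration costs $O(nr)$ instead of $O(n^2)$. A minimal sketch:

```python
import numpy as np

def sinkhorn_positive_features(Phi_x, Phi_y, a, b, n_iter=200):
    """Sinkhorn iterations when the Gibbs kernel factors as
    K = Phi_x @ Phi_y.T (entrywise positive, rank r << n), which the
    cost c(x,y) = -log<phi(x), phi(y)> yields at eps = 1. K @ v is
    never formed explicitly: it is computed as Phi_x @ (Phi_y.T @ v).
    """
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (Phi_x @ (Phi_y.T @ v))   # u-update in O(n r)
        v = b / (Phi_y @ (Phi_x.T @ u))   # v-update in O(n r)
    return u, v                           # scalings of the optimal plan

# Usage: random strictly positive features (illustrative, not a learned phi)
rng = np.random.default_rng(0)
n, r = 1000, 20
Phi_x = rng.random((n, r)) + 1e-3
Phi_y = rng.random((n, r)) + 1e-3
a = np.full(n, 1 / n)
b = np.full(n, 1 / n)
u, v = sinkhorn_positive_features(Phi_x, Phi_y, a, b)
```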
arXiv Detail & Related papers (2020-06-12T10:21:40Z)