Kernel Two-Sample Tests for Manifold Data
- URL: http://arxiv.org/abs/2105.03425v4
- Date: Mon, 26 Feb 2024 18:36:46 GMT
- Title: Kernel Two-Sample Tests for Manifold Data
- Authors: Xiuyuan Cheng, Yao Xie
- Abstract summary: We present a kernel-based two-sample test statistic related to the Maximum Mean Discrepancy (MMD) in the manifold data setting.
We characterize the test level and power in relation to the kernel bandwidth, the number of samples, and the intrinsic dimensionality of the manifold.
Our results indicate that the kernel two-sample test has no curse-of-dimensionality when the data lie on or near a low-dimensional manifold.
- Score: 22.09364630430699
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a study of a kernel-based two-sample test statistic related to the
Maximum Mean Discrepancy (MMD) in the manifold data setting, assuming that
high-dimensional observations are close to a low-dimensional manifold. We
characterize the test level and power in relation to the kernel bandwidth, the
number of samples, and the intrinsic dimensionality of the manifold.
Specifically, when data densities $p$ and $q$ are supported on a
$d$-dimensional sub-manifold ${M}$ embedded in an $m$-dimensional space and are
H\"older with order $\beta$ (up to 2) on ${M}$, we prove a guarantee of the
test power for finite sample size $n$ that exceeds a threshold depending on
$d$, $\beta$, and $\Delta_2$, the squared $L^2$-divergence between $p$ and $q$
on the manifold, with a properly chosen kernel bandwidth $\gamma$. For
small density departures, we show that with large $n$ they can be detected by
the kernel test when $\Delta_2$ is greater than $n^{-2\beta/(d+4\beta)}$ up to a
certain constant and $\gamma$ scales as $n^{-1/(d+4\beta)}$. The
analysis extends to cases where the manifold has a boundary and the data
samples contain high-dimensional additive noise. Our results indicate that the
kernel two-sample test has no curse-of-dimensionality when the data lie on or
near a low-dimensional manifold. We validate our theory and the properties of
the kernel test for manifold data through a series of numerical experiments.
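To make the scaling concrete, below is a minimal sketch of a kernel (MMD) two-sample test with a Gaussian kernel whose bandwidth follows the $\gamma \sim n^{-1/(d+4\beta)}$ rule from the abstract. The unbiased U-statistic estimator, the permutation calibration, and the default $\beta = 2$ are illustrative assumptions rather than the authors' exact procedure.

```python
import numpy as np

def gaussian_kernel(X, Y, gamma):
    """Gaussian kernel matrix k(x, y) = exp(-||x - y||^2 / (2 * gamma^2))."""
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Y**2, axis=1)[None, :] - 2 * X @ Y.T)
    return np.exp(-sq_dists / (2 * gamma**2))

def mmd2_unbiased(X, Y, gamma):
    """Unbiased U-statistic estimate of the squared MMD between samples X and Y."""
    nx, ny = len(X), len(Y)
    Kxx = gaussian_kernel(X, X, gamma)
    Kyy = gaussian_kernel(Y, Y, gamma)
    Kxy = gaussian_kernel(X, Y, gamma)
    term_xx = (Kxx.sum() - np.trace(Kxx)) / (nx * (nx - 1))  # drop diagonal terms
    term_yy = (Kyy.sum() - np.trace(Kyy)) / (ny * (ny - 1))
    return term_xx + term_yy - 2 * Kxy.mean()

def kernel_two_sample_test(X, Y, d, beta=2.0, n_perm=200, alpha=0.05, seed=None):
    """Permutation-calibrated kernel test; the bandwidth uses the intrinsic
    manifold dimension d (not the ambient dimension) via gamma ~ n^{-1/(d+4*beta)}."""
    rng = np.random.default_rng(seed)
    n = len(X)
    gamma = n ** (-1.0 / (d + 4 * beta))      # bandwidth scaling from the paper
    stat = mmd2_unbiased(X, Y, gamma)
    Z = np.vstack([X, Y])
    null_stats = []
    for _ in range(n_perm):
        perm = rng.permutation(len(Z))        # resample the group labels
        null_stats.append(mmd2_unbiased(Z[perm[:n]], Z[perm[n:]], gamma))
    p_value = (1 + np.sum(np.array(null_stats) >= stat)) / (1 + n_perm)
    return stat, p_value, p_value < alpha
```

Here $d$ is the intrinsic dimension of the manifold, which in practice would have to be estimated; the point of the paper's result is that the sample size needed for power scales with $d$ rather than the ambient dimension $m$.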
Related papers
- Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators on distributions over $\mathbb{R}^d$ suffer from a curse of dimensionality.
We present an algorithm whose sample complexity has improved dependence on dimension.
arXiv Detail & Related papers (2024-11-01T17:59:53Z) - Kernel Density Estimators in Large Dimensions [9.299356601085586]
We study the kernel-based estimate of the density $\hat\rho_h^{\mathcal{D}}(x)=\frac{1}{n h^d}\sum_{i=1}^n K\left(\frac{x-y_i}{h}\right)$, depending on the bandwidth $h$ (a minimal numerical sketch of this estimator appears after this list).
We show that the optimal bandwidth threshold based on Kullback-Leibler divergence lies in the new statistical regime identified in this paper.
arXiv Detail & Related papers (2024-08-11T15:56:44Z) - Effective Minkowski Dimension of Deep Nonparametric Regression: Function
Approximation and Statistical Theories [70.90012822736988]
Existing theories on deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to intrinsic data structures.
This paper introduces a relaxed assumption that input data are concentrated around a subset of $\mathbb{R}^d$ denoted by $\mathcal{S}$, and the intrinsic dimension of $\mathcal{S}$ can be characterized by a new complexity notion, the effective Minkowski dimension.
arXiv Detail & Related papers (2023-06-26T17:13:31Z) - The $L^\infty$ Learnability of Reproducing Kernel Hilbert Spaces [3.2931415075553576]
We analyze the learnability of reproducing kernel Hilbert spaces (RKHS) under the $L^\infty$ norm.
For dot-product kernels on the sphere, we identify conditions under which $L^\infty$ learning can be achieved with Hilbert samples.
arXiv Detail & Related papers (2023-06-05T12:29:13Z) - A Manifold Two-Sample Test Study: Integral Probability Metric with
Neural Networks [46.62713126719579]
Two-sample tests aim to determine whether two collections of observations follow the same distribution.
We propose two-sample tests based on integral probability metric (IPM) for high-dimensional samples supported on a low-dimensional manifold.
Our proposed tests are adaptive to low-dimensional geometric structure because their performance crucially depends on the intrinsic dimension instead of the data dimension.
arXiv Detail & Related papers (2022-05-04T13:03:31Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random feature technique called Ternary Random Feature (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ fewer bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Non-Parametric Estimation of Manifolds from Noisy Data [1.0152838128195467]
We consider the problem of estimating a $d$-dimensional sub-manifold of $\mathbb{R}^D$ from a finite set of noisy samples.
We show that the estimation yields rates of convergence of $n^{-\frac{k}{2k+d}}$ for the point estimation and $n^{-\frac{k-1}{2k+d}}$ for the estimation of the tangent space.
arXiv Detail & Related papers (2021-05-11T02:29:33Z) - The Sample Complexity of Robust Covariance Testing [56.98280399449707]
We are given i.i.d. samples from a distribution of the form $Z = (1-\epsilon) X + \epsilon B$, where $X$ is a zero-mean Gaussian $\mathcal{N}(0, \Sigma)$ with unknown covariance $\Sigma$.
In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples.
We prove a sample complexity lower bound of $\Omega(d^2)$ for $\epsilon$ an arbitrarily small constant and $\gamma$
arXiv Detail & Related papers (2020-12-31T18:24:41Z) - Convergence of Graph Laplacian with kNN Self-tuned Kernels [14.645468999921961]
A self-tuned kernel adaptively sets a bandwidth $\sigma_i$ at each point $x_i$ by the $k$-nearest neighbor (kNN) distance (a minimal sketch of this construction appears after this list).
This paper proves the convergence of the graph Laplacian operator $L_N$ to the manifold (weighted-)Laplacian for a new family of kNN self-tuned kernels.
arXiv Detail & Related papers (2020-11-03T04:55:33Z) - How isotropic kernels perform on simple invariants [0.5729426778193397]
We investigate how the training curve of isotropic kernel methods depends on the symmetry of the task to be learned.
We show that for large bandwidth, $\beta = \frac{d-1+\xi}{3d-3+\xi}$, where $\xi \in (0,2)$ is the exponent characterizing the singularity of the kernel at the origin.
arXiv Detail & Related papers (2020-06-17T09:59:18Z) - Curse of Dimensionality on Randomized Smoothing for Certifiable
Robustness [151.67113334248464]
We show that extending the smoothing technique to defend against other attack models can be challenging.
We present experimental results on CIFAR to validate our theory.
arXiv Detail & Related papers (2020-02-08T22:02:14Z)
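The kernel density estimate quoted in the "Kernel Density Estimators in Large Dimensions" entry above can be written out directly. A minimal sketch, assuming a Gaussian choice for $K$ (the paper treats more general kernels) and an arbitrary example bandwidth:

```python
import numpy as np

def kde(x, data, h):
    """Kernel density estimate rho_hat(x) = (1 / (n * h^d)) * sum_i K((x - y_i) / h),
    with a Gaussian kernel K chosen for illustration."""
    n, d = data.shape
    diffs = (x[None, :] - data) / h                       # (n, d) scaled differences
    K_vals = np.exp(-0.5 * np.sum(diffs**2, axis=1)) / (2 * np.pi) ** (d / 2)
    return K_vals.sum() / (n * h**d)

# Example: density of a standard Gaussian in d = 2 evaluated at the origin;
# approaches the true value 1 / (2*pi) ~ 0.159 as n grows and h shrinks.
rng = np.random.default_rng(0)
samples = rng.standard_normal((2000, 2))
print(kde(np.zeros(2), samples, h=0.2))
```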
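Similarly, the self-tuned kNN kernel from the "Convergence of Graph Laplacian with kNN Self-tuned Kernels" entry can be sketched with the classical self-tuning affinity $W_{ij} = \exp(-\|x_i - x_j\|^2 / (\sigma_i \sigma_j))$, where $\sigma_i$ is the kNN distance; the specific kernel family and Laplacian normalization analyzed in that paper may differ, so this is only an illustration.

```python
import numpy as np

def self_tuned_affinity(X, k=7):
    """Self-tuned affinity: sigma_i is the distance from x_i to its k-th nearest
    neighbor, and W_ij = exp(-||x_i - x_j||^2 / (sigma_i * sigma_j))."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    sigma = np.sqrt(np.sort(sq_dists, axis=1)[:, k])   # k-th NN distance (self excluded)
    return np.exp(-sq_dists / (sigma[:, None] * sigma[None, :]))

def unnormalized_graph_laplacian(W):
    """L = D - W with D the diagonal degree matrix (one common normalization choice)."""
    return np.diag(W.sum(axis=1)) - W

# Example: points sampled near a circle, a 1-dimensional manifold in R^2.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, size=300)
X = np.stack([np.cos(theta), np.sin(theta)], axis=1) + 0.01 * rng.standard_normal((300, 2))
L = unnormalized_graph_laplacian(self_tuned_affinity(X, k=7))
```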