Maximum Mean Discrepancy Meets Neural Networks: The
Radon-Kolmogorov-Smirnov Test
- URL: http://arxiv.org/abs/2309.02422v3
- Date: Mon, 6 Nov 2023 23:17:07 GMT
- Title: Maximum Mean Discrepancy Meets Neural Networks: The
Radon-Kolmogorov-Smirnov Test
- Authors: Seunghoon Paik, Michael Celentano, Alden Green, Ryan J. Tibshirani
- Abstract summary: We study the MMD defined by taking $\mathcal{F}$ to be the unit ball in the RBV space of a given smoothness order $k \geq 0$.
This test can be viewed as a generalization of the well-known and classical Kolmogorov-Smirnov (KS) test to multiple dimensions and higher orders of smoothness.
We leverage the power of modern deep learning toolkits to optimize the criterion that underlies the RKS test.
- Score: 5.255750357176021
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Maximum mean discrepancy (MMD) refers to a general class of nonparametric
two-sample tests that are based on maximizing the mean difference over samples
from one distribution $P$ versus another $Q$, over all choices of data
transformations $f$ living in some function space $\mathcal{F}$. Inspired by
recent work that connects what are known as functions of $\textit{Radon bounded
variation}$ (RBV) and neural networks (Parhi and Nowak, 2021, 2023), we study
the MMD defined by taking $\mathcal{F}$ to be the unit ball in the RBV space of
a given smoothness order $k \geq 0$. This test, which we refer to as the
$\textit{Radon-Kolmogorov-Smirnov}$ (RKS) test, can be viewed as a
generalization of the well-known and classical Kolmogorov-Smirnov (KS) test to
multiple dimensions and higher orders of smoothness. It is also intimately
connected to neural networks: we prove that the witness in the RKS test -- the
function $f$ achieving the maximum mean difference -- is always a ridge spline
of degree $k$, i.e., a single neuron in a neural network. This allows us to
leverage the power of modern deep learning toolkits to (approximately) optimize
the criterion that underlies the RKS test. We prove that the RKS test has
asymptotically full power at distinguishing any distinct pair $P \not= Q$ of
distributions, derive its asymptotic null distribution, and carry out extensive
experiments to elucidate the strengths and weaknesses of the RKS test versus
the more traditional kernel MMD test.
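Because the abstract states that the RKS witness is always a single ridge spline of degree $k$ (a single neuron), the test criterion can be approximated by gradient ascent over one neuron's parameters. The sketch below is illustrative only and not the authors' implementation: the function name `rks_statistic`, the use of Adam, and the simple unit-sphere normalization of the direction $w$ (standing in for the RBV unit-ball constraint) are all assumptions.

```python
import torch

def rks_statistic(X, Y, k=1, steps=500, lr=0.1):
    """Gradient-ascent sketch of the RKS criterion for samples X ~ P and Y ~ Q.

    The witness of the RKS test is a single ridge spline of degree k,
    so we parameterize one neuron f(x) = max(w @ x - b, 0) ** k with the
    direction w kept on the unit sphere, and maximize the empirical mean
    difference |mean f(X) - mean f(Y)|. Illustrative only; k >= 1 keeps
    the objective differentiable.
    """
    d = X.shape[1]
    w = torch.randn(d, requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    opt = torch.optim.Adam([w, b], lr=lr)

    def mean_diff():
        u = w / w.norm()                 # unit-sphere direction (stand-in for the RBV unit ball)
        fX = torch.relu(X @ u - b) ** k  # single neuron: ridge spline of degree k
        fY = torch.relu(Y @ u - b) ** k
        return (fX.mean() - fY.mean()).abs()

    for _ in range(steps):
        opt.zero_grad()
        (-mean_diff()).backward()        # ascend the empirical mean difference
        opt.step()
    return mean_diff().item()
```

As a rough usage check, `rks_statistic(torch.randn(200, 3), torch.randn(200, 3) + 0.5)` should return a noticeably larger value than a call with two samples from the same distribution; in practice the statistic would still need to be calibrated against the null, e.g. by permutation or via the asymptotic null distribution derived in the paper.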
Related papers
- Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - Collaborative non-parametric two-sample testing [55.98760097296213]
The goal is to identify nodes where the null hypothesis $p_v = q_v$ should be rejected.
We propose the non-parametric collaborative two-sample testing (CTST) framework that efficiently leverages the graph structure.
Our methodology integrates elements from f-divergence estimation, Kernel Methods, and Multitask Learning.
arXiv Detail & Related papers (2024-02-08T14:43:56Z) - Kernel-Based Tests for Likelihood-Free Hypothesis Testing [21.143798051525646]
Given $n$ observations from two balanced classes, consider the task of labeling an additional $m$ inputs that are known to all belong to $\textit{one}$ of the two classes.
Special cases of this problem are well-known: when $m=1$ it corresponds to binary classification, and when $m \approx n$ it is equivalent to two-sample testing.
In recent work it was discovered that there is a fundamental trade-off between $m$ and $n$: increasing the sample size $m$ reduces the amount $n$ of training/simulation data needed.
arXiv Detail & Related papers (2023-08-17T15:24:03Z) - Degree Heterogeneity in Higher-Order Networks: Inference in the Hypergraph $\boldsymbol{\beta}$-Model [4.540236408836132]
We study the hypergraph $\boldsymbol{\beta}$-model with multiple layers, which allows for hyperedges of different sizes across the layers.
We derive the rates of convergence of maximum likelihood (ML) estimates and establish their minimax rate optimality.
We also consider the goodness-of-fit problem in the hypergraph $\boldsymbol{\beta}$-model.
arXiv Detail & Related papers (2023-07-06T07:23:06Z) - Neural Inference of Gaussian Processes for Time Series Data of Quasars [72.79083473275742]
We introduce a new model that can describe quasar spectra completely.
We also introduce a new method of inference of Gaussian process parameters, which we call $\textit{Neural Inference}$.
The combination of both the CDRW model and Neural Inference significantly outperforms the baseline DRW and MLE.
arXiv Detail & Related papers (2022-11-17T13:01:26Z) - A Manifold Two-Sample Test Study: Integral Probability Metric with
Neural Networks [46.62713126719579]
Two-sample tests aim to determine whether two collections of observations follow the same distribution or not.
We propose two-sample tests based on integral probability metric (IPM) for high-dimensional samples supported on a low-dimensional manifold.
Our proposed tests are adaptive to low-dimensional geometric structure because their performance crucially depends on the intrinsic dimension instead of the data dimension.
arXiv Detail & Related papers (2022-05-04T13:03:31Z) - Deformed semicircle law and concentration of nonlinear random matrices
for ultra-wide neural networks [29.03095282348978]
We study the limiting spectral distributions of two empirical kernel matrices associated with $f(X)$.
We show that random feature regression induced by the empirical kernel achieves the same performance as its limiting kernel regression under the ultra-wide regime.
arXiv Detail & Related papers (2021-09-20T05:25:52Z) - Fundamental tradeoffs between memorization and robustness in random
features and neural tangent regimes [15.76663241036412]
We prove for a large class of activation functions that, if the model memorizes even a fraction of the training data, then its Sobolev seminorm is lower-bounded.
Experiments reveal, for the first time, a multiple-descent phenomenon in the robustness of the min-norm interpolator.
arXiv Detail & Related papers (2021-06-04T17:52:50Z) - The Sample Complexity of Robust Covariance Testing [56.98280399449707]
We are given i.i.d. samples from a distribution of the form $Z = (1-\epsilon) X + \epsilon B$, where $X$ is a zero-mean Gaussian $\mathcal{N}(0, \Sigma)$ with unknown covariance.
In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples.
We prove a sample complexity lower bound of $\Omega(d^2)$ for $\epsilon$ an arbitrarily small constant and $\gamma$
arXiv Detail & Related papers (2020-12-31T18:24:41Z) - Adjusted chi-square test for degree-corrected block models [13.122543280692641]
We propose a goodness-of-fit test for degree-corrected stochastic block models (DCSBM).
We show that a simple adjustment allows the statistic to converge in distribution, under null, as long as the harmonic mean of $d_i$ grows to infinity.
Our distributional results are nonasymptotic, with explicit constants, providing finite-sample bounds on the Kolmogorov-Smirnov distance to the target distribution.
arXiv Detail & Related papers (2020-12-30T05:20:59Z) - Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK [58.5766737343951]
We consider the dynamics of gradient descent for learning a two-layer neural network.
We show that an over-parametrized two-layer neural network can provably learn via gradient descent beyond the neural tangent kernel regime.
arXiv Detail & Related papers (2020-07-09T07:09:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.