Integral Probability Metrics Meet Neural Networks: The Radon-Kolmogorov-Smirnov Test
- URL: http://arxiv.org/abs/2309.02422v4
- Date: Mon, 13 Jan 2025 02:53:39 GMT
- Title: Integral Probability Metrics Meet Neural Networks: The Radon-Kolmogorov-Smirnov Test
- Authors: Seunghoon Paik, Michael Celentano, Alden Green, Ryan J. Tibshirani
- Abstract summary: We study the IPM defined by taking $\mathcal{F}$ to be the unit ball in the RBV space of a given smoothness degree $k \geq 0$.
This test can be viewed as a generalization of the well-known and classical Kolmogorov-Smirnov (KS) test to multiple dimensions and higher orders of smoothness.
- Score: 4.81107183727255
- License:
- Abstract: Integral probability metrics (IPMs) constitute a general class of nonparametric two-sample tests that are based on maximizing the mean difference between samples from one distribution $P$ versus another $Q$, over all choices of data transformations $f$ living in some function space $\mathcal{F}$. Inspired by recent work that connects what are known as functions of $\textit{Radon bounded variation}$ (RBV) and neural networks (Parhi and Nowak, 2021, 2023), we study the IPM defined by taking $\mathcal{F}$ to be the unit ball in the RBV space of a given smoothness degree $k \geq 0$. This test, which we refer to as the $\textit{Radon-Kolmogorov-Smirnov}$ (RKS) test, can be viewed as a generalization of the well-known and classical Kolmogorov-Smirnov (KS) test to multiple dimensions and higher orders of smoothness. It is also intimately connected to neural networks: we prove that the witness in the RKS test -- the function $f$ achieving the maximum mean difference -- is always a ridge spline of degree $k$, i.e., a single neuron in a neural network. We can thus leverage the power of modern neural network optimization toolkits to (approximately) maximize the criterion that underlies the RKS test. We prove that the RKS test has asymptotically full power at distinguishing any distinct pair $P \not= Q$ of distributions, derive its asymptotic null distribution, and carry out experiments to elucidate the strengths and weaknesses of the RKS test versus the more traditional kernel MMD test.
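As a hedged illustration of the neural-network connection described in the abstract: the minimal PyTorch sketch below (not the authors' implementation; the function name `rks_statistic` and the simple normalization of the direction $w$ are assumptions on my part) approximates the RKS criterion by gradient ascent over a single degree-$k$ ridge spline $f(x) = (w^\top x - b)_+^k$, the form the paper proves the witness always takes. Handling of the RBV unit-ball constraint is deliberately simplified.
```python
# Minimal sketch (assumed, not the paper's code): approximate the RKS
# criterion max_f | mean_P f - mean_Q f | with f a single degree-k ridge
# spline (one ReLU^k neuron), optimized by gradient ascent.
import torch

def rks_statistic(X, Y, k=1, steps=500, lr=0.05):
    d = X.shape[1]
    w = torch.randn(d, requires_grad=True)   # neuron direction
    b = torch.zeros(1, requires_grad=True)   # neuron offset
    opt = torch.optim.Adam([w, b], lr=lr)
    best = 0.0
    for _ in range(steps):
        opt.zero_grad()
        u = w / w.norm()                            # simplified normalization
        f = lambda Z: torch.relu(Z @ u - b) ** k    # degree-k ridge spline
        crit = (f(X).mean() - f(Y).mean()).abs()    # two-sided mean difference
        (-crit).backward()                          # ascend the criterion
        opt.step()
        best = max(best, crit.item())
    return best

# Toy usage: P = N(0, I_3) vs. Q shifted by 0.5 in every coordinate.
X, Y = torch.randn(200, 3), torch.randn(200, 3) + 0.5
print(rks_statistic(X, Y, k=1))
```
Since the criterion is nonconcave in $(w, b)$, a few random restarts are a sensible addition in practice.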
Related papers
- Biology-inspired joint distribution neurons based on Hierarchical Correlation Reconstruction allowing for multidirectional neural networks [0.49728186750345144]
Novel artificial neurons based on HCR (Hierarchical Correlation Reconstruction) are proposed, allowing low-level differences to be removed.
Such an HCR network can also propagate probability distributions (including joint ones) like $\rho(y,z|x)$.
It also allows for additional training approaches, like direct $(a_{\mathbf{j}})$ estimation, through tensor decomposition.
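As a hedged reading of the "direct $(a_{\mathbf{j}})$ estimation" mentioned above (a toy sketch, not the paper's code): expand a joint density on the normalized square $[0,1]^2$ in a product basis of polynomials orthonormal on $[0,1]$, and estimate each coefficient $a_{\mathbf{j}}$ as a plain sample mean of the corresponding basis product. The function name `hcr_coefficients`, the bivariate setting, and the degree-2 cutoff are illustrative assumptions.
```python
# Hedged HCR-style sketch: rho(x, y) ~ sum_j a_(j1,j2) f_j1(x) f_j2(y),
# with each coefficient estimated directly as a sample mean.
import numpy as np
from itertools import product

# First few polynomials orthonormal on [0, 1] (rescaled Legendre).
BASIS = [
    lambda x: np.ones_like(x),
    lambda x: np.sqrt(3.0) * (2 * x - 1),
    lambda x: np.sqrt(5.0) * (6 * x**2 - 6 * x + 1),
]

def hcr_coefficients(data, degree=2):
    """data: (n, 2) array, both coordinates already normalized to [0, 1]."""
    coeffs = {}
    for j1, j2 in product(range(degree + 1), repeat=2):
        coeffs[(j1, j2)] = np.mean(BASIS[j1](data[:, 0]) * BASIS[j2](data[:, 1]))
    return coeffs

# Toy usage: positively dependent coordinates give a positive a_(1,1).
rng = np.random.default_rng(0)
u = rng.uniform(size=1000)
data = np.column_stack([u, np.clip(u + 0.1 * rng.normal(size=1000), 0.0, 1.0)])
print(hcr_coefficients(data)[(1, 1)])
```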
arXiv Detail & Related papers (2024-05-08T14:49:27Z) - Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - Multi-layer random features and the approximation power of neural networks [4.178980693837599]
We prove that a reproducing kernel Hilbert space contains only functions that can be approximated by the architecture.
We show that if the eigenvalues of the integral operator of the NNGP decay slower than $k^{-n-\frac{2}{3}}$, where $k$ is the order of an eigenvalue, our theorem guarantees a more succinct neural network approximation than Barron's theorem.
arXiv Detail & Related papers (2024-04-26T14:57:56Z) - Collaborative non-parametric two-sample testing [55.98760097296213]
The goal is to identify nodes where the null hypothesis $p_v = q_v$ should be rejected.
We propose the non-parametric collaborative two-sample testing (CTST) framework that efficiently leverages the graph structure.
Our methodology integrates elements from f-divergence estimation, Kernel Methods, and Multitask Learning.
arXiv Detail & Related papers (2024-02-08T14:43:56Z) - Neural Inference of Gaussian Processes for Time Series Data of Quasars [72.79083473275742]
We introduce a new model that can describe quasar spectra completely.
We also introduce a new method of inference of Gaussian process parameters, which we call $\textit{Neural Inference}$.
The combination of both the CDRW model and Neural Inference significantly outperforms the baseline DRW and MLE.
arXiv Detail & Related papers (2022-11-17T13:01:26Z) - A Manifold Two-Sample Test Study: Integral Probability Metric with Neural Networks [46.62713126719579]
Two-sample testing is an important problem that aims to determine whether two collections of observations follow the same distribution or not.
We propose two-sample tests based on integral probability metric (IPM) for high-dimensional samples supported on a low-dimensional manifold.
Our proposed tests are adaptive to low-dimensional geometric structure because their performance crucially depends on the intrinsic dimension instead of the data dimension.
arXiv Detail & Related papers (2022-05-04T13:03:31Z) - Deformed semicircle law and concentration of nonlinear random matrices for ultra-wide neural networks [29.03095282348978]
We study the limiting spectral distributions of two empirical kernel matrices associated with $f(X)$.
We show that random feature regression induced by the empirical kernel achieves the same performance as its limiting kernel regression under the ultra-wide regime.
arXiv Detail & Related papers (2021-09-20T05:25:52Z) - Fundamental tradeoffs between memorization and robustness in random features and neural tangent regimes [15.76663241036412]
We prove, for a large class of activation functions, that if the model memorizes even a fraction of the training data, then its Sobolev seminorm is lower-bounded.
Experiments reveal, for the first time, a multiple-descent phenomenon in the robustness of the min-norm interpolator.
arXiv Detail & Related papers (2021-06-04T17:52:50Z) - The Sample Complexity of Robust Covariance Testing [56.98280399449707]
We are given i.i.d. samples from a distribution of the form $Z = (1-\epsilon) X + \epsilon B$, where $X$ is a zero-mean Gaussian $\mathcal{N}(0, \Sigma)$ with unknown covariance $\Sigma$.
In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples.
We prove a sample complexity lower bound of $\Omega(d^2)$ for $\epsilon$ an arbitrarily small constant and $\gamma$
arXiv Detail & Related papers (2020-12-31T18:24:41Z) - Adjusted chi-square test for degree-corrected block models [13.122543280692641]
We propose a goodness-of-fit test for degree-corrected stochastic block models (DCSBM).
We show that a simple adjustment allows the statistic to converge in distribution, under null, as long as the harmonic mean of $d_i$ grows to infinity.
Our distributional results are nonasymptotic, with explicit constants, providing finite-sample bounds on the Kolmogorov-Smirnov distance to the target distribution.
arXiv Detail & Related papers (2020-12-30T05:20:59Z) - Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK [58.5766737343951]
We consider the dynamics of gradient descent for learning a two-layer neural network.
We show that an over-parametrized two-layer neural network trained by gradient descent can provably learn the ground truth, going beyond what Neural Tangent Kernel analyses guarantee.
arXiv Detail & Related papers (2020-07-09T07:09:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.