Kernel-Based Tests for Likelihood-Free Hypothesis Testing
- URL: http://arxiv.org/abs/2308.09043v2
- Date: Thu, 23 Nov 2023 23:39:55 GMT
- Title: Kernel-Based Tests for Likelihood-Free Hypothesis Testing
- Authors: Patrik Róbert Gerber, Tianze Jiang, Yury Polyanskiy, Rui Sun
- Abstract summary: Given $n$ observations from two balanced classes, consider the task of labeling an additional $m$ inputs that are known to all belong to \emph{one} of the two classes.
Special cases of this problem are well-known: when $m=1$ it corresponds to binary classification; and when $m\approx n$ it is equivalent to two-sample testing.
In recent work it was discovered that there is a fundamental trade-off between $m$ and $n$: increasing the data sample $m$ reduces the amount $n$ of training/simulation data needed.
- Score: 21.143798051525646
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Given $n$ observations from two balanced classes, consider the task of
labeling an additional $m$ inputs that are known to all belong to \emph{one} of
the two classes. Special cases of this problem are well-known: with complete
knowledge of class distributions ($n=\infty$) the problem is solved optimally
by the likelihood-ratio test; when $m=1$ it corresponds to binary
classification; and when $m\approx n$ it is equivalent to two-sample testing.
The intermediate settings occur in the field of likelihood-free inference,
where labeled samples are obtained by running forward simulations and the
unlabeled sample is collected experimentally. In recent work it was discovered
that there is a fundamental trade-off between $m$ and $n$: increasing the data
sample $m$ reduces the amount $n$ of training/simulation data needed. In this
work we (a) introduce a generalization where unlabeled samples come from a
mixture of the two classes -- a case often encountered in practice; (b) study
the minimax sample complexity for non-parametric classes of densities under
\textit{maximum mean discrepancy} (MMD) separation; and (c) investigate the
empirical performance of kernels parameterized by neural networks on two tasks:
detection of the Higgs boson and detection of planted DDPM generated images
amidst CIFAR-10 images. For both problems we confirm the existence of the
theoretically predicted asymmetric $m$ vs $n$ trade-off.
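The MMD separation in (b) can be illustrated with a minimal sketch of likelihood-free labeling via a kernel two-sample statistic. This is a hypothetical illustration using a fixed Gaussian kernel, not the paper's neural-network-parameterized kernels; the function names and the bandwidth choice are assumptions for the example.

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    # Pairwise Gaussian kernel matrix between rows of X and rows of Y
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / (2 * bandwidth**2))

def mmd2_unbiased(X, Y, bandwidth=1.0):
    # Unbiased estimate of squared MMD: drop diagonal self-similarity terms
    m, n = len(X), len(Y)
    Kxx = gaussian_kernel(X, X, bandwidth)
    Kyy = gaussian_kernel(Y, Y, bandwidth)
    Kxy = gaussian_kernel(X, Y, bandwidth)
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2 * Kxy.mean()

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(500, 2))  # n simulated samples from class P
Y = rng.normal(1.0, 1.0, size=(500, 2))  # n simulated samples from class Q
Z = rng.normal(1.0, 1.0, size=(200, 2))  # m unlabeled samples, all from one class

# Label the whole unlabeled batch with the class whose sample it is
# closer to in MMD; pooling all m points is what drives the m vs n trade-off.
closer_to_q = mmd2_unbiased(Z, Y) < mmd2_unbiased(Z, X)
```

In the paper's setting the kernel itself is learned from the $n$ labeled simulations, but the decision rule has this same compare-two-MMDs shape.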
Related papers
- Collaborative non-parametric two-sample testing [55.98760097296213]
The goal is to identify nodes where the null hypothesis $p_v = q_v$ should be rejected.
We propose the non-parametric collaborative two-sample testing (CTST) framework that efficiently leverages the graph structure.
Our methodology integrates elements from f-divergence estimation, Kernel Methods, and Multitask Learning.
arXiv Detail & Related papers (2024-02-08T14:43:56Z) - Testable Learning with Distribution Shift [9.036777309376697]
We define a new model called testable learning with distribution shift.
We obtain provably efficient algorithms for certifying the performance of a classifier on a test distribution.
We give several positive results for learning concept classes such as halfspaces, intersections of halfspaces, and decision trees.
arXiv Detail & Related papers (2023-11-25T23:57:45Z) - Out-Of-Domain Unlabeled Data Improves Generalization [0.7589678255312519]
We propose a novel framework for incorporating unlabeled data into semi-supervised classification problems.
We show that unlabeled samples can be harnessed to narrow the generalization gap.
We validate our claims through experiments conducted on a variety of synthetic and real-world datasets.
arXiv Detail & Related papers (2023-09-29T02:00:03Z) - A Manifold Two-Sample Test Study: Integral Probability Metric with
Neural Networks [46.62713126719579]
Two-sample tests aim to determine whether two collections of observations follow the same distribution or not.
We propose two-sample tests based on integral probability metric (IPM) for high-dimensional samples supported on a low-dimensional manifold.
Our proposed tests are adaptive to low-dimensional geometric structure because their performance crucially depends on the intrinsic dimension instead of the data dimension.
arXiv Detail & Related papers (2022-05-04T13:03:31Z) - Testing distributional assumptions of learning algorithms [5.204779946147061]
We study the design of tester-learner pairs $(\mathcal{A}, \mathcal{T})$.
We show that if the distribution on examples in the data passes the tester $\mathcal{T}$ then one can safely trust the output of the agnostic learner $\mathcal{A}$ on the data.
arXiv Detail & Related papers (2022-04-14T19:10:53Z) - Learning Gaussian Mixtures with Generalised Linear Models: Precise
Asymptotics in High-dimensions [79.35722941720734]
Generalised linear models for multi-class classification problems are among the fundamental building blocks of modern machine learning tasks.
We prove exact asymptotics characterising the estimator in high dimensions via empirical risk minimisation.
We discuss how our theory can be applied beyond the scope of synthetic data.
arXiv Detail & Related papers (2021-06-07T16:53:56Z) - The Sample Complexity of Robust Covariance Testing [56.98280399449707]
We are given i.i.d. samples from a distribution of the form $Z = (1-\epsilon) X + \epsilon B$, where $X$ is a zero-mean Gaussian $\mathcal{N}(0, \Sigma)$ with unknown covariance.
In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples.
We prove a sample complexity lower bound of $\Omega(d^2)$ for $\epsilon$ an arbitrarily small constant and $\gamma$
arXiv Detail & Related papers (2020-12-31T18:24:41Z) - Binary classification with ambiguous training data [69.50862982117127]
In supervised learning, we often face ambiguous (A) samples that are difficult to label even by domain experts.
This problem is substantially different from semi-supervised learning since unlabeled samples are not necessarily difficult samples.
arXiv Detail & Related papers (2020-11-05T00:53:58Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP).
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5\varepsilon^2} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.