Compress Then Test: Powerful Kernel Testing in Near-linear Time
- URL: http://arxiv.org/abs/2301.05974v1
- Date: Sat, 14 Jan 2023 21:02:58 GMT
- Title: Compress Then Test: Powerful Kernel Testing in Near-linear Time
- Authors: Carles Domingo-Enrich, Raaz Dwivedi, Lester Mackey
- Abstract summary: Kernel two-sample testing provides a powerful framework for distinguishing any pair of distributions based on $n$ sample points.
We introduce Compress Then Test (CTT), a new framework for high-powered kernel testing based on sample compression.
CTT cheaply approximates an expensive test by compressing each $n$ point sample into a small but provably high-fidelity coreset.
- Score: 27.723775378945643
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Kernel two-sample testing provides a powerful framework for distinguishing
any pair of distributions based on $n$ sample points. However, existing kernel
tests either run in $n^2$ time or sacrifice undue power to improve runtime. To
address these shortcomings, we introduce Compress Then Test (CTT), a new
framework for high-powered kernel testing based on sample compression. CTT
cheaply approximates an expensive test by compressing each $n$ point sample
into a small but provably high-fidelity coreset. For standard kernels and
subexponential distributions, CTT inherits the statistical behavior of a
quadratic-time test -- recovering the same optimal detection boundary -- while
running in near-linear time. We couple these advances with cheaper permutation
testing, justified by new power analyses; improved time-vs.-quality guarantees
for low-rank approximation; and a fast aggregation procedure for identifying
especially discriminating kernels. In our experiments with real and simulated
data, CTT and its extensions provide 20--200x speed-ups over state-of-the-art
approximate MMD tests with no loss of power.
Related papers
- Robust Kernel Hypothesis Testing under Data Corruption [6.430258446597413]
We propose two general methods for constructing robust permutation tests under data corruption.
We prove their consistency in power under minimal conditions.
This contributes to the practical deployment of hypothesis tests for real-world applications with potential adversarial attacks.
arXiv Detail & Related papers (2024-05-30T10:23:16Z) - Collaborative non-parametric two-sample testing [55.98760097296213]
The goal is to identify nodes where the null hypothesis $p_v = q_v$ should be rejected.
We propose the non-parametric collaborative two-sample testing (CTST) framework that efficiently leverages the graph structure.
Our methodology integrates elements from f-divergence estimation, Kernel Methods, and Multitask Learning.
arXiv Detail & Related papers (2024-02-08T14:43:56Z) - Sequential Kernelized Independence Testing [101.22966794822084]
We design sequential kernelized independence tests inspired by kernelized dependence measures.
We demonstrate the power of our approaches on both simulated and real data.
arXiv Detail & Related papers (2022-12-14T18:08:42Z) - Efficient Aggregated Kernel Tests using Incomplete $U$-statistics [22.251118308736327]
Three proposed tests aggregate over several kernel bandwidths to detect departures from the null on various scales.
We show that our proposed linear-time aggregated tests obtain higher power than current state-of-the-art linear-time kernel tests.
arXiv Detail & Related papers (2022-06-18T12:30:06Z) - Exact Paired-Permutation Testing for Structured Test Statistics [67.71280539312536]
We provide an efficient exact algorithm for the paired-permutation test for a family of structured test statistics.
Our exact algorithm was $10$x faster than the Monte Carlo approximation with $20000$ samples on a common dataset.
arXiv Detail & Related papers (2022-05-03T11:00:59Z) - An Optimal Witness Function for Two-Sample Testing [13.159512679346685]
We propose data-dependent test statistics based on a one-dimensional witness function, which we call witness two-sample tests (WiTS)
We show that the WiTS test based on a characteristic kernel is consistent against any fixed alternative.
arXiv Detail & Related papers (2021-02-10T17:13:21Z) - Optimal Testing of Discrete Distributions with High Probability [49.19942805582874]
We study the problem of testing discrete distributions with a focus on the high probability regime.
We provide the first algorithms for closeness and independence testing that are sample-optimal, within constant factors.
arXiv Detail & Related papers (2020-09-14T16:09:17Z) - Learning Kernel Tests Without Data Splitting [18.603394415852765]
We propose an approach that enables learning the hyper parameters and testing on the full sample without data splitting.
Our approach's test power is empirically larger than that of the data-splitting approach, regardless of its split proportion.
arXiv Detail & Related papers (2020-06-03T14:07:39Z) - Noisy Adaptive Group Testing using Bayesian Sequential Experimental
Design [63.48989885374238]
When the infection prevalence of a disease is low, Dorfman showed 80 years ago that testing groups of people can prove more efficient than testing people individually.
Our goal in this paper is to propose new group testing algorithms that can operate in a noisy setting.
arXiv Detail & Related papers (2020-04-26T23:41:33Z) - Learning Deep Kernels for Non-Parametric Two-Sample Tests [50.92621794426821]
We propose a class of kernel-based two-sample tests, which aim to determine whether two sets of samples are drawn from the same distribution.
Our tests are constructed from kernels parameterized by deep neural nets, trained to maximize test power.
arXiv Detail & Related papers (2020-02-21T03:54:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.