Related papers: Compress Then Test: Powerful Kernel Testing in Near-linear Time

Compress Then Test: Powerful Kernel Testing in Near-linear Time

URL: http://arxiv.org/abs/2301.05974v1
Date: Sat, 14 Jan 2023 21:02:58 GMT
Title: Compress Then Test: Powerful Kernel Testing in Near-linear Time
Authors: Carles Domingo-Enrich, Raaz Dwivedi, Lester Mackey
Abstract summary: Kernel two-sample testing provides a powerful framework for distinguishing any pair of distributions based on $n$ sample points. We introduce Compress Then Test (CTT), a new framework for high-powered kernel testing based on sample compression. CTT cheaply approximates an expensive test by compressing each $n$ point sample into a small but provably high-fidelity coreset.
Score: 27.723775378945643
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Kernel two-sample testing provides a powerful framework for distinguishing any pair of distributions based on $n$ sample points. However, existing kernel tests either run in $n^2$ time or sacrifice undue power to improve runtime. To address these shortcomings, we introduce Compress Then Test (CTT), a new framework for high-powered kernel testing based on sample compression. CTT cheaply approximates an expensive test by compressing each $n$ point sample into a small but provably high-fidelity coreset. For standard kernels and subexponential distributions, CTT inherits the statistical behavior of a quadratic-time test -- recovering the same optimal detection boundary -- while running in near-linear time. We couple these advances with cheaper permutation testing, justified by new power analyses; improved time-vs.-quality guarantees for low-rank approximation; and a fast aggregation procedure for identifying especially discriminating kernels. In our experiments with real and simulated data, CTT and its extensions provide 20--200x speed-ups over state-of-the-art approximate MMD tests with no loss of power.

Related papers

$\ exttt{SPECS}$: Faster Test-Time Scaling through Speculative Drafts [55.231201692232894]
$textttSPECS$ is a latency-aware test-time scaling method inspired by speculative decoding.<n>Our results show that $textttSPECS$matches or surpasses beam search accuracy while reducing latency by up to $sim$19.1%.
arXiv Detail & Related papers (2025-06-15T05:50:05Z)
Robust Kernel Hypothesis Testing under Data Corruption [6.430258446597413]
We propose two general methods for constructing robust permutation tests under data corruption. We prove their consistency in power under minimal conditions. This contributes to the practical deployment of hypothesis tests for real-world applications with potential adversarial attacks.
arXiv Detail & Related papers (2024-05-30T10:23:16Z)
Collaborative non-parametric two-sample testing [55.98760097296213]
The goal is to identify nodes where the null hypothesis $p_v = q_v$ should be rejected. We propose the non-parametric collaborative two-sample testing (CTST) framework that efficiently leverages the graph structure. Our methodology integrates elements from f-divergence estimation, Kernel Methods, and Multitask Learning.
arXiv Detail & Related papers (2024-02-08T14:43:56Z)
Sequential Kernelized Independence Testing [101.22966794822084]
We design sequential kernelized independence tests inspired by kernelized dependence measures. We demonstrate the power of our approaches on both simulated and real data.
arXiv Detail & Related papers (2022-12-14T18:08:42Z)
Efficient Aggregated Kernel Tests using Incomplete $U$-statistics [22.251118308736327]
Three proposed tests aggregate over several kernel bandwidths to detect departures from the null on various scales. We show that our proposed linear-time aggregated tests obtain higher power than current state-of-the-art linear-time kernel tests.
arXiv Detail & Related papers (2022-06-18T12:30:06Z)
Exact Paired-Permutation Testing for Structured Test Statistics [67.71280539312536]
We provide an efficient exact algorithm for the paired-permutation test for a family of structured test statistics. Our exact algorithm was $10$x faster than the Monte Carlo approximation with $20000$ samples on a common dataset.
arXiv Detail & Related papers (2022-05-03T11:00:59Z)
An Optimal Witness Function for Two-Sample Testing [13.159512679346685]
We propose data-dependent test statistics based on a one-dimensional witness function, which we call witness two-sample tests (WiTS) We show that the WiTS test based on a characteristic kernel is consistent against any fixed alternative.
arXiv Detail & Related papers (2021-02-10T17:13:21Z)
Optimal Testing of Discrete Distributions with High Probability [49.19942805582874]
We study the problem of testing discrete distributions with a focus on the high probability regime. We provide the first algorithms for closeness and independence testing that are sample-optimal, within constant factors.
arXiv Detail & Related papers (2020-09-14T16:09:17Z)
Learning Kernel Tests Without Data Splitting [18.603394415852765]
We propose an approach that enables learning the hyper parameters and testing on the full sample without data splitting. Our approach's test power is empirically larger than that of the data-splitting approach, regardless of its split proportion.
arXiv Detail & Related papers (2020-06-03T14:07:39Z)
Noisy Adaptive Group Testing using Bayesian Sequential Experimental Design [63.48989885374238]
When the infection prevalence of a disease is low, Dorfman showed 80 years ago that testing groups of people can prove more efficient than testing people individually. Our goal in this paper is to propose new group testing algorithms that can operate in a noisy setting.
arXiv Detail & Related papers (2020-04-26T23:41:33Z)
Learning Deep Kernels for Non-Parametric Two-Sample Tests [50.92621794426821]
We propose a class of kernel-based two-sample tests, which aim to determine whether two sets of samples are drawn from the same distribution. Our tests are constructed from kernels parameterized by deep neural nets, trained to maximize test power.
arXiv Detail & Related papers (2020-02-21T03:54:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.