Related papers: Minimax-Optimal Two-Sample Test with Sliced Wasserstein

URL: http://arxiv.org/abs/2510.27498v1
Date: Fri, 31 Oct 2025 14:20:06 GMT
Title: Minimax-Optimal Two-Sample Test with Sliced Wasserstein
Authors: Binh Thuan Tran, Nicolas Schreuder
Abstract summary: We study the problem of nonparametric two-sample testing using the sliced Wasserstein (SW) distance. We propose a permutation-based SW test and analyze its performance.
Score: 2.019622939313173
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study the problem of nonparametric two-sample testing using the sliced Wasserstein (SW) distance. While prior theoretical and empirical work indicates that the SW distance offers a promising balance between strong statistical guarantees and computational efficiency, its theoretical foundations for hypothesis testing remain limited. We address this gap by proposing a permutation-based SW test and analyzing its performance. The test inherits finite-sample Type I error control from the permutation principle. Moreover, we establish non-asymptotic power bounds and show that the procedure achieves the minimax separation rate $n^{-1/2}$ over multinomial and bounded-support alternatives, matching the optimal guarantees of kernel-based tests while building on the geometric foundations of Wasserstein distances. Our analysis further quantifies the trade-off between the number of projections and statistical power. Finally, numerical experiments demonstrate that the test combines finite-sample validity with competitive power and scalability, and -- unlike kernel-based tests, which require careful kernel tuning -- it performs consistently well across all scenarios we consider.
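
As an illustration of the kind of procedure the abstract describes, here is a minimal sketch of a permutation test built on a Monte Carlo sliced Wasserstein statistic. It is not the authors' implementation: the function names (`sliced_wasserstein`, `sw_permutation_test`), the default parameters, the equal-sample-size requirement, and the choice to reuse the same random directions across permutations are all assumptions made for this example.

```python
import numpy as np


def sliced_wasserstein(x, y, directions, p=2):
    """Monte Carlo SW_p between two equal-size samples (hypothetical helper).

    x, y: arrays of shape (n, d); directions: unit vectors of shape (L, d).
    """
    # Project both samples onto every direction and sort each projection.
    px = np.sort(x @ directions.T, axis=0)
    py = np.sort(y @ directions.T, axis=0)
    # For equal-size 1D empirical measures, W_p^p is the mean of
    # |sorted difference|^p; average over directions, then take the p-th root.
    return np.mean(np.abs(px - py) ** p) ** (1.0 / p)


def sw_permutation_test(x, y, n_projections=50, n_permutations=200, p=2, seed=0):
    """Return (observed statistic, permutation p-value). Assumes len(x) == len(y)."""
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes and dimensions")
    rng = np.random.default_rng(seed)
    n, d = x.shape
    # Draw random directions uniformly on the unit sphere (normalized Gaussians).
    directions = rng.standard_normal((n_projections, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)

    observed = sliced_wasserstein(x, y, directions, p)
    pooled = np.vstack([x, y])
    count = 0
    for _ in range(n_permutations):
        # Reshuffle the pooled sample and split it into two groups of size n.
        perm = rng.permutation(2 * n)
        stat = sliced_wasserstein(pooled[perm[:n]], pooled[perm[n:]], directions, p)
        count += int(stat >= observed)
    # The "+1" correction keeps the p-value valid at finite sample sizes.
    p_value = (1 + count) / (1 + n_permutations)
    return observed, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    x = rng.standard_normal((200, 5))
    y = rng.standard_normal((200, 5)) + 0.5  # mean-shifted alternative
    stat, p_value = sw_permutation_test(x, y)
    print(f"SW statistic: {stat:.3f}, p-value: {p_value:.4f}")
```

Rejecting when the p-value is at most the chosen level alpha gives Type I error control through exchangeability under the null. The `n_projections` argument is the knob behind the trade-off between the number of projections and statistical power that the abstract mentions.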

Related papers

Minimax Optimal Kernel Two-Sample Tests with Random Features [6.747832388017275]
We propose a spectral-regularized two-sample test based on random Fourier feature (RFF) approximation. We show the proposed test to be minimax optimal if the approximation order of RFF is sufficiently large. We develop a practically implementable permutation-based version of the proposed test with a data-adaptive strategy for selecting the regularization parameter.
arXiv Detail & Related papers (2025-02-28T06:12:00Z)
A Scalable Nyström-Based Kernel Two-Sample Test with Permutations [9.849635250118912]
Two-sample hypothesis testing is a fundamental problem in statistics and machine learning. In this work, we use a Nyström approximation of the maximum mean discrepancy (MMD) to design a computationally efficient and practical testing algorithm.
arXiv Detail & Related papers (2025-02-19T09:22:48Z)
Collaborative non-parametric two-sample testing [55.98760097296213]
The goal is to identify nodes where the null hypothesis $p_v = q_v$ should be rejected. We propose the non-parametric collaborative two-sample testing (CTST) framework that efficiently leverages the graph structure. Our methodology integrates elements from f-divergence estimation, Kernel Methods, and Multitask Learning.
arXiv Detail & Related papers (2024-02-08T14:43:56Z)
Precise Error Rates for Computationally Efficient Testing [67.30044609837749]
We revisit the question of simple-versus-simple hypothesis testing with an eye towards computational complexity. An existing test based on linear spectral statistics achieves the best possible tradeoff curve between type I and type II error rates.
arXiv Detail & Related papers (2023-11-01T04:41:16Z)
Near-Optimal Non-Parametric Sequential Tests and Confidence Sequences with Possibly Dependent Observations [44.71254888821376]
We provide the first type-I-error and expected-rejection-time guarantees under general non-i.i.d. data-generating processes. We show how to apply our results to inference on parameters defined by estimating equations, such as average treatment effects.
arXiv Detail & Related papers (2022-12-29T18:37:08Z)
Efficient Aggregated Kernel Tests using Incomplete $U$-statistics [22.251118308736327]
Three proposed tests aggregate over several kernel bandwidths to detect departures from the null on various scales. We show that our proposed linear-time aggregated tests obtain higher power than current state-of-the-art linear-time kernel tests.
arXiv Detail & Related papers (2022-06-18T12:30:06Z)
Sequential Permutation Testing of Random Forest Variable Importance Measures [68.8204255655161]
It is proposed here to use sequential permutation tests and sequential p-value estimation to reduce the high computational costs associated with conventional permutation tests. The results of simulation studies confirm that the theoretical properties of the sequential tests apply. The numerical stability of the methods is investigated in two additional application studies.
arXiv Detail & Related papers (2022-06-02T20:16:50Z)
Nonparametric Conditional Local Independence Testing [69.31200003384122]
Conditional local independence is an independence relation among continuous time processes. No nonparametric test of conditional local independence has been available. We propose such a nonparametric test based on double machine learning.
arXiv Detail & Related papers (2022-03-25T10:31:02Z)
Adversarial Attacks and Defense for Non-Parametric Two-Sample Tests [73.32304304788838]
This paper systematically uncovers the failure mode of non-parametric TSTs through adversarial attacks. To enable TST-agnostic attacks, we propose an ensemble attack framework that jointly minimizes the different types of test criteria. To robustify TSTs, we propose a max-min optimization that iteratively generates adversarial pairs to train the deep kernels.
arXiv Detail & Related papers (2022-02-07T11:18:04Z)
An $\ell^p$-based Kernel Conditional Independence Test [21.689461247198388]
We propose a new computationally efficient test for conditional independence based on the $L^p$ distance between two kernel-based representatives of well-suited distributions. We conduct a series of experiments showing that our new tests outperform state-of-the-art methods in terms of both statistical power and type-I error, even in the high-dimensional setting.
arXiv Detail & Related papers (2021-10-28T03:18:27Z)
Cross-validation Confidence Intervals for Test Error [83.67415139421448]
This work develops central limit theorems for cross-validation and consistent estimators of its variance under weak stability conditions on the learning algorithm. Results are the first of their kind for the popular choice of leave-one-out cross-validation.
arXiv Detail & Related papers (2020-07-24T17:40:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.