Related papers: Boosting the Power of Kernel Two-Sample Tests

URL: http://arxiv.org/abs/2302.10687v1
Date: Tue, 21 Feb 2023 14:14:30 GMT
Title: Boosting the Power of Kernel Two-Sample Tests
Authors: Anirban Chatterjee, Bhaswar B. Bhattacharya
Abstract summary: We propose a method to boost the power of the kernel test by combining MMD estimates over multiple kernels using their Mahalanobis distance. The resulting test is universally consistent and, since it is obtained by aggregating over a collection of kernels/bandwidths, is more powerful in detecting a wide range of alternatives in finite samples.
Score: 7.1795069620810805
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The kernel two-sample test based on the maximum mean discrepancy (MMD) is one of the most popular methods for detecting differences between two distributions over general metric spaces. In this paper we propose a method to boost the power of the kernel test by combining MMD estimates over multiple kernels using their Mahalanobis distance. We derive the asymptotic null distribution of the proposed test statistic and use a multiplier bootstrap approach to efficiently compute the rejection region. The resulting test is universally consistent and, since it is obtained by aggregating over a collection of kernels/bandwidths, is more powerful in detecting a wide range of alternatives in finite samples. We also derive the distribution of the test statistic for both fixed and local contiguous alternatives. The latter, in particular, implies that the proposed test is statistically efficient, that is, it has non-trivial asymptotic (Pitman) efficiency. Extensive numerical experiments are performed on both synthetic and real-world datasets to illustrate the efficacy of the proposed method over single kernel tests. Our asymptotic results rely on deriving the joint distribution of MMD estimates using the framework of multiple stochastic integrals, which is more broadly useful, specifically, in understanding the efficiency properties of recently proposed adaptive MMD tests based on kernel aggregation.
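The idea of combining MMD estimates from several kernels through a Mahalanobis-type quadratic form can be illustrated with a rough NumPy sketch. This is not the authors' procedure: the abstract describes an asymptotic null distribution and a multiplier bootstrap, whereas the sketch below swaps in a plain permutation scheme both to estimate the covariance of the MMD vector and to calibrate the test. The function names (`gaussian_gram`, `mmd2_unbiased`, `multi_kernel_test`), the choice of Gaussian kernels, and the bandwidth grid are all assumptions made for illustration.

```python
import numpy as np


def gaussian_gram(z, bandwidth):
    """Gaussian-kernel Gram matrix of the pooled sample z (rows are points)."""
    sq = np.sum(z ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * z @ z.T, 0.0)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))


def mmd2_unbiased(gram, idx_x, idx_y):
    """Unbiased estimate of squared MMD from a pooled Gram matrix."""
    kxx = gram[np.ix_(idx_x, idx_x)]
    kyy = gram[np.ix_(idx_y, idx_y)]
    kxy = gram[np.ix_(idx_x, idx_y)]
    m, n = len(idx_x), len(idx_y)
    term_x = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_y = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return term_x + term_y - 2.0 * kxy.mean()


def multi_kernel_test(x, y, bandwidths, n_perm=200, seed=0):
    """Illustrative multi-kernel test (assumption: permutation calibration).

    Returns the quadratic-form statistic v^T S^{-1} v, where v is the vector
    of MMD estimates (one per bandwidth) and S is a covariance estimate
    obtained from permuted samples, together with a permutation p-value.
    """
    rng = np.random.default_rng(seed)
    z = np.vstack([x, y])
    m, total = len(x), len(x) + len(y)
    grams = [gaussian_gram(z, h) for h in bandwidths]

    def mmd_vector(perm):
        return np.array([mmd2_unbiased(g, perm[:m], perm[m:]) for g in grams])

    observed = mmd_vector(np.arange(total))
    null = np.array([mmd_vector(rng.permutation(total)) for _ in range(n_perm)])

    # Covariance of the MMD vector, estimated from observed + permuted vectors.
    all_vecs = np.vstack([observed, null])
    cov = np.atleast_2d(np.cov(all_vecs, rowvar=False))
    cov = cov + 1e-12 * np.eye(len(bandwidths))  # small ridge for stability
    cov_inv = np.linalg.pinv(cov)

    def quad(v):
        return float(v @ cov_inv @ v)

    stat = quad(observed)
    null_stats = np.array([quad(v) for v in null])
    p_value = (1 + np.sum(null_stats >= stat)) / (1 + n_perm)
    return stat, p_value
```

A call such as `multi_kernel_test(x, y, bandwidths=[0.5, 1.0, 2.0])` on two arrays of shape `(m, d)` and `(n, d)` returns the statistic and a p-value; rejecting when the p-value falls below the chosen level gives a test that pools evidence across all listed bandwidths rather than committing to one.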

Related papers

Kernel Trace Distance: Quantum Statistical Metric between Measures through RKHS Density Operators [11.899035547580201]
We introduce a novel distance between measures that compares them through a Schatten norm of their kernel covariance operators. We show that this new distance is an integral probability metric that can be framed between a Maximum Mean Discrepancy (MMD) and a Wasserstein distance.
arXiv Detail & Related papers (2025-07-08T14:56:44Z)
Minimax Optimal Kernel Two-Sample Tests with Random Features [8.030917052755195]
We propose a spectral regularized two-sample test based on random Fourier feature (RFF) approximation. We show the proposed test to be minimax optimal if the approximation order of RFF is sufficiently large. We develop a practically implementable permutation-based version of the proposed test with a data-adaptive strategy for selecting the regularization parameter and the kernel.
arXiv Detail & Related papers (2025-02-28T06:12:00Z)
An Efficient Permutation-Based Kernel Two-Sample Test [13.229867216847534]
Two-sample hypothesis testing is a fundamental problem in statistics and machine learning. In this work, we use a Nyström approximation of the maximum mean discrepancy (MMD) to design a computationally efficient and practical testing algorithm.
arXiv Detail & Related papers (2025-02-19T09:22:48Z)
Adaptive Sampled Softmax with Inverted Multi-Index: Methods, Theory and Applications [79.53938312089308]
The MIDX-Sampler is a novel adaptive sampling strategy based on an inverted multi-index approach. Our method is backed by rigorous theoretical analysis, addressing key concerns such as sampling bias, gradient bias, convergence rates, and generalization error bounds.
arXiv Detail & Related papers (2025-01-15T04:09:21Z)
Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers. We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art. In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z)
Efficient Numerical Integration in Reproducing Kernel Hilbert Spaces via Leverage Scores Sampling [16.992480926905067]
We consider the problem of approximating integrals with respect to a target probability measure using only pointwise evaluations of the integrand. We propose an efficient procedure which exploits a small i.i.d. random subset of $m < n$ samples drawn either uniformly or using approximate leverage scores from the initial observations.
arXiv Detail & Related papers (2023-11-22T17:44:18Z)
MMD-FUSE: Learning and Combining Kernels for Two-Sample Testing Without Data Splitting [28.59390881834003]
We propose novel statistics which maximise the power of a two-sample test based on the Maximum Mean Discrepancy (MMD). We show how these kernels can be chosen in a data-dependent but permutation-independent way, in a well-calibrated test, avoiding data splitting. We highlight the applicability of our MMD-FUSE test on both synthetic low-dimensional and real-world high-dimensional data, and compare its performance in terms of power against current state-of-the-art kernel tests.
arXiv Detail & Related papers (2023-06-14T23:13:03Z)
Spectral Regularized Kernel Two-Sample Tests [7.915420897195129]
We show the popular MMD (maximum mean discrepancy) two-sample test to be not optimal in terms of the separation boundary measured in Hellinger distance. We propose a modification to the MMD test based on spectral regularization and prove the proposed test to be minimax optimal with a smaller separation boundary than that achieved by the MMD test. Our results hold for the permutation variant of the test where the test threshold is chosen elegantly through the permutation of the samples.
arXiv Detail & Related papers (2022-12-19T00:42:21Z)
FaDIn: Fast Discretized Inference for Hawkes Processes with General Parametric Kernels [82.53569355337586]
This work offers an efficient solution to temporal point processes inference using general parametric kernels with finite support. The method's effectiveness is evaluated by modeling the occurrence of stimuli-induced patterns from brain signals recorded with magnetoencephalography (MEG). Results show that the proposed approach estimates pattern latency better than the state of the art.
arXiv Detail & Related papers (2022-10-10T12:35:02Z)
Efficient Aggregated Kernel Tests using Incomplete $U$-statistics [22.251118308736327]
Three proposed tests aggregate over several kernel bandwidths to detect departures from the null on various scales. We show that our proposed linear-time aggregated tests obtain higher power than current state-of-the-art linear-time kernel tests.
arXiv Detail & Related papers (2022-06-18T12:30:06Z)
Kernel Two-Sample Tests in High Dimension: Interplay Between Moment Discrepancy and Dimension-and-Sample Orders [1.9303929635966661]
We study the behavior of kernel two-sample tests when the dimension and sample sizes both diverge to infinity. We establish the central limit theorem (CLT) under both the null hypothesis and the local and fixed alternatives. The new non-null CLT results allow us to perform exact power analysis, which reveals a delicate interplay between the moment discrepancy that can be detected and the dimension-and-sample orders.
arXiv Detail & Related papers (2021-12-31T23:12:44Z)
A Stochastic Newton Algorithm for Distributed Convex Optimization [62.20732134991661]
We analyze a Newton algorithm for homogeneous distributed convex optimization, where each machine can calculate gradients of the same population objective. We show that our method can reduce the number, and frequency, of required communication rounds compared to existing methods without hurting performance.
arXiv Detail & Related papers (2021-10-07T17:51:10Z)
Kernel distance measures for time series, random fields and other structured data [71.61147615789537]
kdiff is a novel kernel-based measure for estimating distances between instances of structured data. It accounts for both self and cross similarities across the instances and is defined using a lower quantile of the distance distribution. Some theoretical results are provided for separability conditions using kdiff as a distance measure for clustering and classification problems.
arXiv Detail & Related papers (2021-09-29T22:54:17Z)
A Note on Optimizing Distributions using Kernel Mean Embeddings [94.96262888797257]
Kernel mean embeddings represent probability measures by their infinite-dimensional mean embeddings in a reproducing kernel Hilbert space. We show that when the kernel is characteristic, distributions with a kernel sum-of-squares density are dense. We provide algorithms to optimize such distributions in the finite-sample setting.
arXiv Detail & Related papers (2021-06-18T08:33:45Z)
Learning Deep Kernels for Non-Parametric Two-Sample Tests [50.92621794426821]
We propose a class of kernel-based two-sample tests, which aim to determine whether two sets of samples are drawn from the same distribution. Our tests are constructed from kernels parameterized by deep neural nets, trained to maximize test power.
arXiv Detail & Related papers (2020-02-21T03:54:23Z)

This list is automatically generated from the titles and abstracts of the papers on this site.