Robust Kernel Hypothesis Testing under Data Corruption
- URL: http://arxiv.org/abs/2405.19912v1
- Date: Thu, 30 May 2024 10:23:16 GMT
- Title: Robust Kernel Hypothesis Testing under Data Corruption
- Authors: Antonin Schrab, Ilmun Kim,
- Abstract summary: We propose two general methods for constructing robust permutation tests under data corruption.
We prove their consistency in power under minimal conditions.
This contributes to the practical deployment of hypothesis tests for real-world applications with potential adversarial attacks.
- Score: 6.430258446597413
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose two general methods for constructing robust permutation tests under data corruption. The proposed tests effectively control the non-asymptotic type I error under data corruption, and we prove their consistency in power under minimal conditions. This contributes to the practical deployment of hypothesis tests for real-world applications with potential adversarial attacks. One of our methods inherently ensures differential privacy, further broadening its applicability to private data analysis. For the two-sample and independence settings, we show that our kernel robust tests are minimax optimal, in the sense that they are guaranteed to be non-asymptotically powerful against alternatives uniformly separated from the null in the kernel MMD and HSIC metrics at some optimal rate (tight with matching lower bound). Finally, we provide publicly available implementations and empirically illustrate the practicality of our proposed tests.
Related papers
- Minimax Optimal Goodness-of-Fit Testing with Kernel Stein Discrepancy [13.429541377715298]
We explore the minimax optimality of goodness-of-fit tests on general domains using the kernelized Stein discrepancy (KSD)
The KSD framework offers a flexible approach for goodness-of-fit testing, avoiding strong distributional assumptions.
We introduce an adaptive test capable of achieving minimax optimality up to a logarithmic factor by adapting to unknown parameters.
arXiv Detail & Related papers (2024-04-12T07:06:12Z) - Efficient Conformal Prediction under Data Heterogeneity [79.35418041861327]
Conformal Prediction (CP) stands out as a robust framework for uncertainty quantification.
Existing approaches for tackling non-exchangeability lead to methods that are not computable beyond the simplest examples.
This work introduces a new efficient approach to CP that produces provably valid confidence sets for fairly general non-exchangeable data distributions.
arXiv Detail & Related papers (2023-12-25T20:02:51Z) - Differentially Private Permutation Tests: Applications to Kernel Methods [7.596498528060537]
differential privacy has emerged as a rigorous framework for privacy protection, gaining widespread recognition in both academic and industrial circles.
This paper aims to alleviate concerns in the context of hypothesis testing by introducing differentially private permutation tests.
The proposed framework extends classical non-private permutation tests to private settings, maintaining both finite-sample validity and differential privacy in a rigorous manner.
arXiv Detail & Related papers (2023-10-29T15:13:36Z) - MMD-FUSE: Learning and Combining Kernels for Two-Sample Testing Without
Data Splitting [28.59390881834003]
We propose novel statistics which maximise the power of a two-sample test based on the Maximum Mean Discrepancy (MMD)
We show how these kernels can be chosen in a data-dependent but permutation-independent way, in a well-calibrated test, avoiding data splitting.
We highlight the applicability of our MMD-FUSE test on both synthetic low-dimensional and real-world high-dimensional data, and compare its performance in terms of power against current state-of-the-art kernel tests.
arXiv Detail & Related papers (2023-06-14T23:13:03Z) - Conservative Prediction via Data-Driven Confidence Minimization [70.93946578046003]
In safety-critical applications of machine learning, it is often desirable for a model to be conservative.
We propose the Data-Driven Confidence Minimization framework, which minimizes confidence on an uncertainty dataset.
arXiv Detail & Related papers (2023-06-08T07:05:36Z) - Sequential Predictive Two-Sample and Independence Testing [114.4130718687858]
We study the problems of sequential nonparametric two-sample and independence testing.
We build upon the principle of (nonparametric) testing by betting.
arXiv Detail & Related papers (2023-04-29T01:30:33Z) - Spectral Regularized Kernel Two-Sample Tests [7.915420897195129]
We show the popular MMD (maximum mean discrepancy) two-sample test to be not optimal in terms of the separation boundary measured in Hellinger distance.
We propose a modification to the MMD test based on spectral regularization and prove the proposed test to be minimax optimal with a smaller separation boundary than that achieved by the MMD test.
Our results hold for the permutation variant of the test where the test threshold is chosen elegantly through the permutation of the samples.
arXiv Detail & Related papers (2022-12-19T00:42:21Z) - Sequential Kernelized Independence Testing [101.22966794822084]
We design sequential kernelized independence tests inspired by kernelized dependence measures.
We demonstrate the power of our approaches on both simulated and real data.
arXiv Detail & Related papers (2022-12-14T18:08:42Z) - Conformal Off-Policy Prediction in Contextual Bandits [54.67508891852636]
Conformal off-policy prediction can output reliable predictive intervals for the outcome under a new target policy.
We provide theoretical finite-sample guarantees without making any additional assumptions beyond the standard contextual bandit setup.
arXiv Detail & Related papers (2022-06-09T10:39:33Z) - Kernel Robust Hypothesis Testing [20.78285964841612]
In this paper, uncertainty sets are constructed in a data-driven manner using kernel method.
The goal is to design a test that performs well under the worst-case distributions over the uncertainty sets.
For the Neyman-Pearson setting, the goal is to minimize the worst-case probability of miss detection subject to a constraint on the worst-case probability of false alarm.
arXiv Detail & Related papers (2022-03-23T23:59:03Z) - Adversarial Attacks and Defense for Non-Parametric Two-Sample Tests [73.32304304788838]
This paper systematically uncovers the failure mode of non-parametric TSTs through adversarial attacks.
To enable TST-agnostic attacks, we propose an ensemble attack framework that jointly minimizes the different types of test criteria.
To robustify TSTs, we propose a max-min optimization that iteratively generates adversarial pairs to train the deep kernels.
arXiv Detail & Related papers (2022-02-07T11:18:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.