A Data-Driven Approach to Robust Hypothesis Testing Using Sinkhorn
Uncertainty Sets
- URL: http://arxiv.org/abs/2202.04258v2
- Date: Fri, 11 Feb 2022 04:29:44 GMT
- Title: A Data-Driven Approach to Robust Hypothesis Testing Using Sinkhorn
Uncertainty Sets
- Authors: Jie Wang and Yao Xie
- Abstract summary: We seek the worst-case detector over distributional uncertainty sets centered around the empirical distribution from samples using Sinkhorn distance.
Compared with the Wasserstein robust test, the corresponding least favorable distributions are supported beyond the training samples, which provides a more flexible detector.
- Score: 12.061662346636645
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hypothesis testing for small-sample scenarios is a practically important
problem. In this paper, we investigate the robust hypothesis testing problem in
a data-driven manner, where we seek the worst-case detector over distributional
uncertainty sets centered around the empirical distribution from samples using
Sinkhorn distance. Compared with the Wasserstein robust test, the corresponding
least favorable distributions are supported beyond the training samples, which
provides a more flexible detector. Various numerical experiments are conducted
on both synthetic and real datasets to validate the competitive performance of
our proposed method.
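As a rough illustration (not the paper's implementation), the Sinkhorn distance underlying these uncertainty sets is an entropy-regularized optimal-transport cost that can be computed between two empirical samples with a few matrix-scaling iterations. Function names, the regularization level, and the squared-Euclidean ground cost below are all illustrative choices:

```python
import numpy as np

def sinkhorn_distance(x, y, eps=0.1, n_iter=200):
    """Entropy-regularized optimal-transport cost between two empirical
    samples with uniform weights, via Sinkhorn's matrix-scaling iterations."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]
    n, m = len(x), len(y)
    # Squared-Euclidean ground cost between all sample pairs.
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-C / eps)                    # Gibbs kernel
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    u = np.ones(n)
    for _ in range(n_iter):                 # alternate marginal rescalings
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]         # (approximate) transport plan
    return float((P * C).sum())
```

As the regularization parameter `eps` shrinks toward zero, this cost approaches the (squared-cost) Wasserstein distance; the entropic term is what allows least favorable distributions to place mass beyond the training samples.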
Related papers
- Non-Convex Robust Hypothesis Testing using Sinkhorn Uncertainty Sets [18.46110328123008]
We present a new framework to address the non-convex robust hypothesis testing problem.
The goal is to seek the optimal detector that minimizes the maximum of the worst-case risks.
arXiv Detail & Related papers (2024-03-21T20:29:43Z) - Sequential Predictive Two-Sample and Independence Testing [114.4130718687858]
We study the problems of sequential nonparametric two-sample and independence testing.
We build upon the principle of (nonparametric) testing by betting.
arXiv Detail & Related papers (2023-04-29T01:30:33Z) - Null Hypothesis Test for Anomaly Detection [0.0]
We extend the use of Classification Without Labels for anomaly detection with a hypothesis test designed to exclude the background-only hypothesis.
By testing for statistical independence of the two discriminating dataset regions, we are able to exclude the background-only hypothesis without relying on fixed anomaly score cuts or extrapolations of background estimates between regions.
arXiv Detail & Related papers (2022-10-05T13:03:55Z) - Kernel Robust Hypothesis Testing [20.78285964841612]
In this paper, uncertainty sets are constructed in a data-driven manner using kernel methods.
The goal is to design a test that performs well under the worst-case distributions over the uncertainty sets.
For the Neyman-Pearson setting, the goal is to minimize the worst-case probability of missed detection subject to a constraint on the worst-case probability of false alarm.
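Stated abstractly (in our notation, not necessarily the paper's), with uncertainty sets $\mathcal{P}_0, \mathcal{P}_1$ for the two hypotheses and a randomized test $\phi(X) \in [0,1]$ that rejects the null with probability $\phi(X)$, this is the minimax program

```latex
\min_{\phi}\; \sup_{P_1 \in \mathcal{P}_1} \mathbb{E}_{P_1}\!\left[1 - \phi(X)\right]
\quad \text{s.t.} \quad
\sup_{P_0 \in \mathcal{P}_0} \mathbb{E}_{P_0}\!\left[\phi(X)\right] \le \alpha ,
```

where $\alpha$ is the prescribed worst-case false-alarm level.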
arXiv Detail & Related papers (2022-03-23T23:59:03Z) - Robust hypothesis testing and distribution estimation in Hellinger
distance [18.950453666957692]
We propose a simple robust hypothesis test that has the same sample complexity as that of the optimal Neyman-Pearson test up to constants.
We discuss the applicability of such a robust test for estimating distributions in Hellinger distance.
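For reference, the squared Hellinger distance between distributions $P$ and $Q$ is

```latex
H^2(P, Q) \;=\; \frac{1}{2}\int \left(\sqrt{\mathrm{d}P} - \sqrt{\mathrm{d}Q}\right)^2
\;=\; 1 - \int \sqrt{\mathrm{d}P\,\mathrm{d}Q},
```

which is symmetric and bounded in $[0, 1]$.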
arXiv Detail & Related papers (2020-11-03T17:09:32Z) - Cross-validation Confidence Intervals for Test Error [83.67415139421448]
This work develops central limit theorems for cross-validation and consistent estimators of its variance under weak stability conditions on the learning algorithm.
Results are the first of their kind for the popular choice of leave-one-out cross-validation.
arXiv Detail & Related papers (2020-07-24T17:40:06Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - Two-Sample Testing on Ranked Preference Data and the Role of Modeling
Assumptions [57.77347280992548]
In this paper, we design two-sample tests for pairwise comparison data and ranking data.
Our test requires essentially no assumptions on the distributions.
By applying our two-sample test on real-world pairwise comparison data, we conclude that ratings and rankings provided by people are indeed distributed differently.
arXiv Detail & Related papers (2020-06-21T20:51:09Z) - Balance-Subsampled Stable Prediction [55.13512328954456]
We propose a novel balance-subsampled stable prediction (BSSP) algorithm based on the theory of fractional factorial design.
A design-theoretic analysis shows that the proposed method can reduce the confounding effects among predictors induced by the distribution shift.
Numerical experiments on both synthetic and real-world data sets demonstrate that our BSSP algorithm significantly outperforms the baseline methods for stable prediction across unknown test data.
arXiv Detail & Related papers (2020-06-08T07:01:38Z) - Learning Kernel Tests Without Data Splitting [18.603394415852765]
We propose an approach that enables learning the hyperparameters and testing on the full sample without data splitting.
Our approach's test power is empirically larger than that of the data-splitting approach, regardless of its split proportion.
arXiv Detail & Related papers (2020-06-03T14:07:39Z) - Compressing Large Sample Data for Discriminant Analysis [78.12073412066698]
We consider the computational issues due to large sample size within the discriminant analysis framework.
We propose a new compression approach for reducing the number of training samples for linear and quadratic discriminant analysis.
arXiv Detail & Related papers (2020-05-08T05:09:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.