On the Robustness of Kernel Goodness-of-Fit Tests
- URL: http://arxiv.org/abs/2408.05854v2
- Date: Fri, 23 Aug 2024 08:32:33 GMT
- Title: On the Robustness of Kernel Goodness-of-Fit Tests
- Authors: Xing Liu, François-Xavier Briol
- Abstract summary: We show that existing kernel goodness-of-fit tests are not robust according to common notions of robustness.
We propose the first robust kernel goodness-of-fit test, which resolves this open problem using kernel Stein discrepancy balls.
- Score: 5.959410850280868
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Goodness-of-fit testing is often criticized for its lack of practical relevance; since "all models are wrong", the null hypothesis that the data conform to our model is ultimately always rejected when the sample size is large enough. Despite this, probabilistic models are still used extensively, raising the more pertinent question of whether the model is good enough for a specific task. This question can be formalized as a robust goodness-of-fit testing problem by asking whether the data were generated by a distribution corresponding to our model up to some mild perturbation. In this paper, we show that existing kernel goodness-of-fit tests are not robust according to common notions of robustness, including qualitative and quantitative robustness. We also show that robust techniques based on tilted kernels from the parameter estimation literature are not sufficient for ensuring both types of robustness in the context of goodness-of-fit testing. We therefore propose the first robust kernel goodness-of-fit test, which resolves this open problem using kernel Stein discrepancy balls, which encompass perturbation models such as Huber contamination models and density uncertainty bands.
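To make the objects concrete, here is a minimal sketch (illustrative only, not the paper's robust test): a V-statistic estimate of the squared kernel Stein discrepancy for a one-dimensional model with known score function and an RBF kernel, evaluated on a clean and a Huber-contaminated sample. The helper name `ksd_vstat`, the bandwidth, and the contamination level are all assumed choices, not taken from the paper.

```python
# Minimal sketch: V-statistic estimate of the squared kernel Stein discrepancy
# (KSD) for a 1-d model with score s(x) = d/dx log p(x), using an RBF kernel.
# Illustrative only -- a plain KSD estimate, not the paper's robust test.
import numpy as np

def ksd_vstat(x, score, sigma=1.0):
    """Squared-KSD V-statistic of sample `x` against a model with score function `score`."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    dx = x[:, None] - x[None, :]                  # pairwise differences x_i - x_j
    k = np.exp(-dx**2 / (2 * sigma**2))           # RBF kernel matrix
    s = score(x)                                  # model score at each sample point
    # Stein kernel u_p(x_i, x_j) for the Langevin Stein operator:
    u = (s[:, None] * s[None, :] * k              # s(x_i) s(x_j) k
         + s[:, None] * (dx / sigma**2) * k       # s(x_i) * d/dx_j k
         - s[None, :] * (dx / sigma**2) * k       # s(x_j) * d/dx_i k
         + (1.0 / sigma**2 - dx**2 / sigma**4) * k)  # d^2 k / dx_i dx_j
    return u.sum() / n**2

rng = np.random.default_rng(0)
clean = rng.normal(size=500)
# Huber contamination: replace a fraction eps of points with outliers.
eps = 0.05
contaminated = np.where(rng.random(500) < eps, rng.normal(10.0, 1.0, 500), clean)

score_std_normal = lambda x: -x                   # score of the N(0, 1) model
print(ksd_vstat(clean, score_std_normal))         # small: data fit the model
print(ksd_vstat(contaminated, score_std_normal))  # inflated by mild contamination
```

A non-robust KSD test rejects once this inflation clears the rejection threshold, no matter how small eps is, given enough data; the paper's proposal instead tests against a kernel Stein discrepancy ball around the model, so that mild perturbations of this kind are tolerated.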
Related papers
- A Bias-Variance-Covariance Decomposition of Kernel Scores for Generative Models [13.527864898609398]
We introduce the first bias-variance-covariance decomposition for kernel scores.
We derive a kernel-based variance and entropy for uncertainty estimation.
Based on the wide applicability of kernels, we demonstrate our framework via generalization and uncertainty experiments for image, audio, and language generation.
arXiv Detail & Related papers (2023-10-09T16:22:11Z)
- Shortcomings of Top-Down Randomization-Based Sanity Checks for Evaluations of Deep Neural Network Explanations [67.40641255908443]
We identify limitations of model-randomization-based sanity checks for the purpose of evaluating explanations.
Top-down model randomization preserves scales of forward pass activations with high probability.
arXiv Detail & Related papers (2022-11-22T18:52:38Z)
- Reliability-Aware Prediction via Uncertainty Learning for Person Image Retrieval [51.83967175585896]
Uncertainty-aware learning (UAL) aims to provide reliability-aware predictions by considering data uncertainty and model uncertainty simultaneously.
Data uncertainty captures the "noise" inherent in the sample, while model uncertainty depicts the model's confidence in the sample's prediction.
arXiv Detail & Related papers (2022-10-24T17:53:20Z)
- Kernel Robust Hypothesis Testing [20.78285964841612]
In this paper, uncertainty sets are constructed in a data-driven manner using kernel method.
The goal is to design a test that performs well under the worst-case distributions over the uncertainty sets.
For the Neyman-Pearson setting, the goal is to minimize the worst-case probability of miss detection subject to a constraint on the worst-case probability of false alarm.
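Schematically (the notation below is assumed for illustration, not taken from the paper's abstract), this minimax objective reads
$\min_{\phi} \, \sup_{P_1 \in \mathcal{P}_1} \mathbb{E}_{P_1}[1 - \phi(X)]$ subject to $\sup_{P_0 \in \mathcal{P}_0} \mathbb{E}_{P_0}[\phi(X)] \le \alpha$,
where $\phi$ is the (possibly randomized) test, $\mathcal{P}_0$ and $\mathcal{P}_1$ are the data-driven uncertainty sets (e.g., kernel-distance balls around nominal distributions), and $\alpha$ bounds the worst-case false-alarm probability.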
arXiv Detail & Related papers (2022-03-23T23:59:03Z)
- Theoretical characterization of uncertainty in high-dimensional linear classification [24.073221004661427]
We show that uncertainty for learning from a limited number of samples of high-dimensional input data and labels can be obtained by the approximate message passing algorithm.
We discuss how over-confidence can be mitigated by appropriately regularising, and show that cross-validating with respect to the loss leads to better calibration than with the 0/1 error.
arXiv Detail & Related papers (2022-02-07T15:32:07Z)
- Composite Goodness-of-fit Tests with Kernels [19.744607024807188]
We propose a kernel-based hypothesis test for the challenging composite testing problem.
Our tests make use of minimum distance estimators based on the maximum mean discrepancy and the kernel Stein discrepancy.
As our main result, we show that we are able to estimate the parameter and conduct our test on the same data, while maintaining a correct test level.
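As a heavily simplified illustration of the minimum-distance idea (a Gaussian location model, RBF kernel, and grid search are assumed here; the paper's level-correction analysis is omitted):

```python
# Sketch: minimum-MMD estimation of a Gaussian location parameter, then an MMD
# statistic computed on the same data. Kernel, bandwidth, and grid are assumed.
import numpy as np

def mmd2(x, y, sigma=1.0):
    """V-statistic estimate of the squared MMD between 1-d samples x and y (RBF kernel)."""
    gram = lambda a, b: np.exp(-(a[:, None] - b[None, :])**2 / (2 * sigma**2))
    return gram(x, x).mean() + gram(y, y).mean() - 2 * gram(x, y).mean()

rng = np.random.default_rng(1)
data = rng.normal(1.5, 1.0, 300)             # observed sample, true location 1.5

z = rng.normal(0.0, 1.0, 300)                # common random numbers: N(t, 1) = t + z
theta_grid = np.linspace(-3.0, 3.0, 121)
theta_hat = min(theta_grid, key=lambda t: mmd2(data, t + z))  # minimum-distance fit
stat = mmd2(data, theta_hat + z)             # test statistic at the fitted parameter
print(f"theta_hat = {theta_hat:.2f}, MMD^2 = {stat:.4f}")
```

The subtlety the paper addresses is exactly this double use of the sample, fitting `theta_hat` and computing `stat` on the same data, while still controlling the test level.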
arXiv Detail & Related papers (2021-11-19T15:25:06Z)
- When Hearst Is not Enough: Improving Hypernymy Detection from Corpus with Distributional Models [59.46552488974247]
This paper addresses whether an is-a relationship exists between words (x, y) with the help of large textual corpora.
Recent studies suggest that pattern-based methods are superior when large-scale Hearst pairs are extracted and fed in, relieving the sparsity of unseen (x, y) pairs.
For the first time, this paper quantifies the non-negligible existence of those specific cases. We also demonstrate that distributional methods are ideal to make up for pattern-based ones in such cases.
arXiv Detail & Related papers (2020-10-10T08:34:19Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
- The Curse of Performance Instability in Analysis Datasets: Consequences, Source, and Suggestions [93.62888099134028]
We find that the performance of state-of-the-art models on Natural Language Inference (NLI) and Reading Comprehension (RC) analysis/stress sets can be highly unstable.
This raises several questions, among them: (1) how will the instability affect the reliability of conclusions drawn from these analysis sets?
We give both theoretical explanations and empirical evidence regarding the source of the instability.
arXiv Detail & Related papers (2020-04-28T15:41:12Z)
- A Kernel Stein Test for Comparing Latent Variable Models [48.32146056855925]
We propose a kernel-based nonparametric test of relative goodness of fit, where the goal is to compare two models, both of which may have unobserved latent variables.
We show that our test significantly outperforms the relative Maximum Mean Discrepancy test, which is based on samples from the models and does not exploit the latent structure.
arXiv Detail & Related papers (2019-07-01T07:46:16Z)