Related papers: On the Robustness of Kernel Goodness-of-Fit Tests

On the Robustness of Kernel Goodness-of-Fit Tests

URL: http://arxiv.org/abs/2408.05854v2
Date: Fri, 23 Aug 2024 08:32:33 GMT
Title: On the Robustness of Kernel Goodness-of-Fit Tests
Authors: Xing Liu, François-Xavier Briol
Abstract summary: We show that existing kernel goodness-of-fit tests are not robust according to common notions of robustness. We propose the first robust kernel goodness-of-fit test which resolves this open problem using kernel Stein discrepancy balls.
Score: 5.959410850280868
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Goodness-of-fit testing is often criticized for its lack of practical relevance; since "all models are wrong", the null hypothesis that the data conform to our model is ultimately always rejected when the sample size is large enough. Despite this, probabilistic models are still used extensively, raising the more pertinent question of whether the model is good enough for a specific task. This question can be formalized as a robust goodness-of-fit testing problem by asking whether the data were generated by a distribution corresponding to our model up to some mild perturbation. In this paper, we show that existing kernel goodness-of-fit tests are not robust according to common notions of robustness including qualitative and quantitative robustness. We also show that robust techniques based on tilted kernels from the parameter estimation literature are not sufficient for ensuring both types of robustness in the context of goodness-of-fit testing. We therefore propose the first robust kernel goodness-of-fit test which resolves this open problem using kernel Stein discrepancy balls, which encompass perturbation models such as Huber contamination models and density uncertainty bands.
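Illustration (not from the paper): the sensitivity described in the abstract can be seen in a minimal sketch of a standard kernel Stein discrepancy (KSD) V-statistic. The sketch assumes a one-dimensional standard normal model, a Gaussian kernel with bandwidth 1, and a Huber-style contamination that replaces 5% of the sample with the value 10; the function name ksd_vstat and all parameter choices are hypothetical. It computes only the non-robust statistic, not the authors' robust test.

```python
import numpy as np


def ksd_vstat(x, score, bandwidth=1.0):
    """Squared KSD V-statistic in 1D with a Gaussian kernel (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    d = x[:, None] - x[None, :]           # pairwise differences x_i - x_j
    h2 = bandwidth ** 2
    k = np.exp(-d ** 2 / (2 * h2))        # k(x_i, x_j)
    dk_dx = -d / h2 * k                   # derivative in the first argument
    dk_dy = d / h2 * k                    # derivative in the second argument
    d2k = (1 / h2 - d ** 2 / h2 ** 2) * k # mixed second derivative
    s = score(x)                          # model score, d/dx log p(x)
    # Langevin Stein kernel u_p(x_i, x_j)
    u = (s[:, None] * s[None, :] * k
         + s[:, None] * dk_dy
         + s[None, :] * dk_dx
         + d2k)
    return u.mean()


def normal_score(x):
    # Score of the assumed N(0, 1) model.
    return -x


rng = np.random.default_rng(0)
clean = rng.standard_normal(500)

# Huber-style contamination (assumed setup): 5% of points moved to 10.
contaminated = clean.copy()
contaminated[:25] = 10.0

ksd_clean = ksd_vstat(clean, normal_score)
ksd_contaminated = ksd_vstat(contaminated, normal_score)
print(ksd_clean, ksd_contaminated)
```

In this setup the statistic on the contaminated sample is much larger than on the clean one, because the unbounded score of the Gaussian model makes the outliers dominate the average.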

Related papers

The Surprising Harmfulness of Benign Overfitting for Adversarial Robustness [13.120373493503772]
We prove a surprising result that even if the ground truth itself is robust to adversarial examples, the benignly overfitted model is benign in terms of the "standard" out-of-sample risk objective. Our finding provides theoretical insights into the puzzling phenomenon observed in practice, where the true target function (e.g., human) is robust against adversarial attack, while benignly overfitted neural networks lead to models that are not robust.
arXiv Detail & Related papers (2024-01-19T15:40:46Z)
A Bias-Variance-Covariance Decomposition of Kernel Scores for Generative Models [13.527864898609398]
We introduce the first bias-variance-covariance decomposition for kernel scores. We derive a kernel-based variance and entropy for uncertainty estimation. Based on the wide applicability of kernels, we demonstrate our framework via generalization and uncertainty experiments for image, audio, and language generation.
arXiv Detail & Related papers (2023-10-09T16:22:11Z)
RobustMQ: Benchmarking Robustness of Quantized Models [54.15661421492865]
Quantization is an essential technique for deploying deep neural networks (DNNs) on devices with limited resources. We thoroughly evaluated the robustness of quantized models against various noises (adversarial attacks, natural corruptions, and systematic noises) on ImageNet. Our research contributes to advancing the robust quantization of models and their deployment in real-world scenarios.
arXiv Detail & Related papers (2023-08-04T14:37:12Z)
Beyond the Universal Law of Robustness: Sharper Laws for Random Features and Neural Tangent Kernels [14.186776881154127]
This paper focuses on empirical risk minimization in two settings, namely, random features and the neural tangent kernel (NTK). We prove that, for random features, the model is not robust for any degree of over-parameterization, even when the necessary condition coming from the universal law of robustness is satisfied. Our results are corroborated by numerical evidence on both synthetic and standard prototypical datasets.
arXiv Detail & Related papers (2023-02-03T09:58:31Z)
Shortcomings of Top-Down Randomization-Based Sanity Checks for Evaluations of Deep Neural Network Explanations [67.40641255908443]
We identify limitations of model-randomization-based sanity checks for the purpose of evaluating explanations. Top-down model randomization preserves scales of forward pass activations with high probability.
arXiv Detail & Related papers (2022-11-22T18:52:38Z)
Reliability-Aware Prediction via Uncertainty Learning for Person Image Retrieval [51.83967175585896]
UAL aims at providing reliability-aware predictions by considering data uncertainty and model uncertainty simultaneously. Data uncertainty captures the "noise" inherent in the sample, while model uncertainty depicts the model's confidence in the sample's prediction.
arXiv Detail & Related papers (2022-10-24T17:53:20Z)
Kernel Robust Hypothesis Testing [20.78285964841612]
In this paper, uncertainty sets are constructed in a data-driven manner using a kernel method. The goal is to design a test that performs well under the worst-case distributions over the uncertainty sets. For the Neyman-Pearson setting, the goal is to minimize the worst-case probability of miss detection subject to a constraint on the worst-case probability of false alarm.
arXiv Detail & Related papers (2022-03-23T23:59:03Z)
Theoretical characterization of uncertainty in high-dimensional linear classification [24.073221004661427]
We show that uncertainty for learning from a limited number of samples of high-dimensional input data and labels can be obtained by the approximate message passing algorithm. We discuss how over-confidence can be mitigated by appropriately regularising, and show that cross-validating with respect to the loss leads to better calibration than with the 0/1 error.
arXiv Detail & Related papers (2022-02-07T15:32:07Z)
Meta-Learning Hypothesis Spaces for Sequential Decision-making [79.73213540203389]
We propose to meta-learn a kernel from offline data (Meta-KeL). Under mild conditions, we guarantee that our estimated RKHS yields valid confidence sets. We also empirically evaluate the effectiveness of our approach on a Bayesian optimization task.
arXiv Detail & Related papers (2022-02-01T17:46:51Z)
Composite Goodness-of-fit Tests with Kernels [19.744607024807188]
We propose kernel-based hypothesis tests for the challenging composite testing problem. Our tests make use of minimum distance estimators based on the maximum mean discrepancy and the kernel Stein discrepancy. As our main result, we show that we are able to estimate the parameter and conduct our test on the same data, while maintaining a correct test level.
arXiv Detail & Related papers (2021-11-19T15:25:06Z)
When Hearst Is not Enough: Improving Hypernymy Detection from Corpus with Distributional Models [59.46552488974247]
This paper addresses whether an is-a relationship exists between words (x, y) with the help of large textual corpora. Recent studies suggest that pattern-based methods are superior if large-scale Hearst pairs are extracted and fed, with the sparsity of unseen (x, y) pairs relieved. For the first time, this paper quantifies the non-negligible existence of those specific cases. We also demonstrate that distributional methods are ideal to make up for pattern-based ones in such cases.
arXiv Detail & Related papers (2020-10-10T08:34:19Z)
Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers. We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model. Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
The Curse of Performance Instability in Analysis Datasets: Consequences, Source, and Suggestions [93.62888099134028]
We find that the performance of state-of-the-art models on Natural Language Inference (NLI) and Reading Comprehension (RC) analysis/stress sets can be highly unstable. This raises three questions: (1) How will the instability affect the reliability of the conclusions drawn based on these analysis sets? We give both theoretical explanations and empirical evidence regarding the source of the instability.
arXiv Detail & Related papers (2020-04-28T15:41:12Z)
Understanding the Intrinsic Robustness of Image Distributions using Conditional Generative Models [87.00072607024026]
We study the intrinsic robustness of two common image benchmarks under $\ell_2$ perturbations. We show the existence of a large gap between the robustness limits implied by our theory and the adversarial robustness achieved by current state-of-the-art robust models.
arXiv Detail & Related papers (2020-03-01T01:45:04Z)
A Kernel Stein Test for Comparing Latent Variable Models [48.32146056855925]
We propose a kernel-based nonparametric test of relative goodness of fit, where the goal is to compare two models, both of which may have unobserved latent variables. We show that our test significantly outperforms the relative Maximum Mean Discrepancy test, which is based on samples from the models and does not exploit the latent structure.
arXiv Detail & Related papers (2019-07-01T07:46:16Z)