Semiparametric KSD test: unifying score and distance-based approaches for goodness-of-fit testing
- URL: http://arxiv.org/abs/2512.20007v1
- Date: Tue, 23 Dec 2025 03:05:26 GMT
- Title: Semiparametric KSD test: unifying score and distance-based approaches for goodness-of-fit testing
- Authors: Zhihan Huang, Ziang Niu
- Abstract summary: We show that a score-based goodness-of-fit (GoF) test is equivalent to tests based on integral probability metrics (IPMs) indexed by a function class. We propose a new nonparametric score-based GoF test through a special class of IPM induced by kernelized Stein's function class. Our method achieves power comparable to task-specific normality tests such as Anderson-Darling and Lilliefors.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Goodness-of-fit (GoF) tests are fundamental for assessing model adequacy. Score-based tests are appealing because they require fitting the model only once under the null. However, extending them to powerful nonparametric alternatives is difficult due to the lack of suitable score functions. Through a class of exponentially tilted models, we show that the resulting score-based GoF tests are equivalent to tests based on integral probability metrics (IPMs) indexed by a function class. When the class is rich, the test is universally consistent. This simple yet insightful perspective enables reinterpretation of classical distance-based testing procedures (including those based on the Kolmogorov-Smirnov distance, Wasserstein-1 distance, and maximum mean discrepancy) as arising from score-based constructions. Building on this insight, we propose a new nonparametric score-based GoF test through a special class of IPM induced by a kernelized Stein function class, called the semiparametric kernelized Stein discrepancy (SKSD) test. Compared with other nonparametric score-based tests, the SKSD test is computationally efficient and accommodates general nuisance-parameter estimators, supported by a generic parametric bootstrap procedure. The SKSD test is universally consistent and attains Pitman efficiency. Moreover, the SKSD test provides simple GoF tests for models with intractable likelihoods but tractable scores with the help of Stein's identity; we use two popular models, the kernel exponential family and conditional Gaussian models, to illustrate the power of our method. Our method achieves power comparable to task-specific normality tests such as Anderson-Darling and Lilliefors, despite being designed for general nonparametric alternatives.
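To make the construction concrete, below is a minimal illustrative sketch of a kernel Stein discrepancy GoF test with a parametric bootstrap, for a Gaussian null whose mean and variance are estimated as nuisance parameters. This is not the paper's SKSD implementation: the Langevin Stein kernel with an RBF base kernel and the median-heuristic bandwidth are standard choices from the KSD literature, and the function names (`ksd_statistic`, `parametric_bootstrap_pvalue`) are hypothetical.

```python
# A minimal, illustrative KSD goodness-of-fit test with a parametric bootstrap.
# The Gaussian null, median-heuristic bandwidth, and function names are
# assumptions for this sketch, not the paper's SKSD implementation.
import numpy as np

def rbf_stein_kernel(x, y, score, h):
    """Langevin Stein kernel u_p(x, y) built on an RBF kernel with bandwidth h."""
    sx, sy = score(x), score(y)                        # grad log p at x and y
    diff = x[:, None, :] - y[None, :, :]               # (n, m, d) pairwise x - y
    sqdist = np.sum(diff ** 2, axis=-1)                # squared distances
    k = np.exp(-sqdist / (2.0 * h ** 2))               # base RBF kernel values
    d = x.shape[1]
    term1 = (sx @ sy.T) * k                                    # s(x)^T s(y) k
    term2 = np.einsum("id,ijd->ij", sx, diff) * k / h ** 2     # s(x)^T grad_y k
    term3 = -np.einsum("jd,ijd->ij", sy, diff) * k / h ** 2    # s(y)^T grad_x k
    term4 = (d / h ** 2 - sqdist / h ** 4) * k                 # tr(grad_x grad_y k)
    return term1 + term2 + term3 + term4

def ksd_statistic(x, score):
    """V-statistic estimate of the squared kernel Stein discrepancy."""
    n = x.shape[0]
    pdist = np.sqrt(np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1))
    h = np.median(pdist[pdist > 0])                    # median-heuristic bandwidth
    return np.sum(rbf_stein_kernel(x, x, score, h)) / n ** 2

def parametric_bootstrap_pvalue(x, n_boot=300, seed=None):
    """Test a Gaussian null with plugged-in (nuisance) mean and pooled std."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    mu, sigma = x.mean(axis=0), x.std()
    t_obs = ksd_statistic(x, lambda z, m=mu, s=sigma: -(z - m) / s ** 2)
    t_boot = np.empty(n_boot)
    for b in range(n_boot):
        xb = rng.normal(mu, sigma, size=(n, d))        # draw from the fitted null
        mb, sb = xb.mean(axis=0), xb.std()             # re-estimate nuisances
        t_boot[b] = ksd_statistic(xb, lambda z, m=mb, s=sb: -(z - m) / s ** 2)
    return (1 + np.sum(t_boot >= t_obs)) / (1 + n_boot)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_null = rng.normal(size=(200, 2))                 # data from the null
    x_alt = rng.standard_t(df=3, size=(200, 2))        # heavy-tailed alternative
    print("p-value, H0 data:", parametric_bootstrap_pvalue(x_null, seed=1))
    print("p-value, H1 data:", parametric_bootstrap_pvalue(x_alt, seed=1))
```

Under the null the returned p-value should be roughly uniform across repetitions, while the heavy-tailed alternative should push it toward zero. The paper's SKSD test additionally accommodates general nuisance-parameter estimators and models with intractable likelihoods but tractable scores, which this sketch does not attempt.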
Related papers
- RANSAC Scoring Functions: Analysis and Reality Check [0.0]
We revisit the problem of assigning a score (a quality of fit) to candidate geometric models. We show that a threshold-based parameterization leads to a unified view of likelihood-based and robust M-estimators.
arXiv Detail & Related papers (2025-12-22T20:08:46Z) - Reference-Free Rating of LLM Responses via Latent Information [53.463883683503106]
We study the common practice of asking a judge model to assign Likert-scale scores to free-text responses. We then propose and evaluate Latent Judges, which derive scalar ratings from internal model signals. Across a broad suite of pairwise and single-rating benchmarks, latent methods match or surpass standard prompting.
arXiv Detail & Related papers (2025-09-29T12:15:52Z) - Sample, Don't Search: Rethinking Test-Time Alignment for Language Models [55.2480439325792]
We introduce QAlign, a new test-time alignment approach. As we scale test-time compute, QAlign converges to sampling from the optimal aligned distribution for each individual prompt. By adopting recent advances in Markov chain Monte Carlo for text generation, our method enables better-aligned outputs without modifying the underlying model or even requiring logit access.
arXiv Detail & Related papers (2025-04-04T00:41:40Z) - Context-Aware Testing: A New Paradigm for Model Testing with Large Language Models [49.06068319380296]
We introduce context-aware testing (CAT) which uses context as an inductive bias to guide the search for meaningful model failures.
We instantiate the first CAT system, SMART Testing, which employs large language models to hypothesize relevant and likely failures.
arXiv Detail & Related papers (2024-10-31T15:06:16Z) - Precise Error Rates for Computationally Efficient Testing [67.30044609837749]
We revisit the question of simple-versus-simple hypothesis testing with an eye towards computational complexity. An existing test based on linear spectral statistics achieves the best possible tradeoff curve between type I and type II error rates.
arXiv Detail & Related papers (2023-11-01T04:41:16Z) - Auto-Encoding Goodness of Fit [9.560668678348579]
We develop a new type of generative autoencoder called the Goodness-of-Fit Autoencoder (GoFAE). At the minibatch level, it uses GoF test statistics as regularization objectives. At a more global level, it selects a regularization coefficient based on higher criticism.
arXiv Detail & Related papers (2022-10-12T19:21:57Z) - A Simple Unified Approach to Testing High-Dimensional Conditional Independences for Categorical and Ordinal Data [0.26651200086513094]
Conditional independence (CI) tests underlie many approaches to model testing and structure learning in causal inference.
Most existing CI tests for categorical and ordinal data stratify the sample by the conditioning variables, perform simple independence tests in each stratum, and combine the results.
Here we propose a simple unified CI test for ordinal and categorical data that maintains reasonable calibration and power in high dimensions.
arXiv Detail & Related papers (2022-06-09T08:56:12Z) - Generalised Kernel Stein Discrepancy (GKSD): A Unifying Approach for Non-parametric Goodness-of-fit Testing [5.885020100736158]
Non-parametric goodness-of-fit testing procedures based on kernel Stein discrepancies (KSD) are promising approaches to validate general unnormalised distributions.
We propose a unifying framework, the generalised kernel Stein discrepancy (GKSD), to theoretically compare and interpret different Stein operators in performing the KSD-based goodness-of-fit tests.
arXiv Detail & Related papers (2021-06-23T00:44:31Z) - Cross-validation Confidence Intervals for Test Error [83.67415139421448]
This work develops central limit theorems for cross-validation and consistent estimators of its variance under weak stability conditions on the learning algorithm.
Results are the first of their kind for the popular choice of leave-one-out cross-validation.
arXiv Detail & Related papers (2020-07-24T17:40:06Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)