Stein's unbiased risk estimate and Hyvärinen's score matching
- URL: http://arxiv.org/abs/2502.20123v2
- Date: Wed, 24 Sep 2025 03:35:52 GMT
- Title: Stein's unbiased risk estimate and Hyvärinen's score matching
- Authors: Sulagna Ghosh, Nikolaos Ignatiadis, Frederic Koehler, Amber Lee,
- Abstract summary: In empirical Bayes, the predominant approach is via nonparametric maximum likelihood estimation (NPMLE)<n>In our setting, Hyv"arinen's implicit SM is equivalent to another classical idea from statistics -- Stein's Unbiased Risk Estimate (SURE)
- Score: 9.035498732406039
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given a collection of observed signals corrupted with Gaussian noise, how can we learn to optimally denoise them? This fundamental problem arises in both empirical Bayes and generative modeling. In empirical Bayes, the predominant approach is via nonparametric maximum likelihood estimation (NPMLE), while in generative modeling, score matching (SM) methods have proven very successful. In our setting, Hyv\"arinen's implicit SM is equivalent to another classical idea from statistics -- Stein's Unbiased Risk Estimate (SURE). Revisiting SURE minimization, we establish, for the first time, that SURE achieves nearly parametric rates of convergence of the regret in the classical empirical Bayes setting with homoscedastic noise. We also prove that SURE-training can achieve fast rates of convergence to the oracle denoiser in a commonly studied misspecified model. In contrast, the NPMLE may not even converge to the oracle denoiser under misspecification of the class of signal distributions. We show how to practically implement our method in settings involving heteroscedasticity and side-information, such as in an application to the estimation of economic mobility in the Opportunity Atlas. Our empirical results demonstrate the superior performance of SURE-training over NPMLE under misspecification. Collectively, our findings advance SURE/SM as a strong alternative to the NPMLE for empirical Bayes problems in both theory and practice.
Related papers
- Improving Minimax Estimation Rates for Contaminated Mixture of Multinomial Logistic Experts via Expert Heterogeneity [49.809923981964715]
Contaminated mixture of experts (MoE) is motivated by transfer learning methods where a pre-trained model, acting as a frozen expert, is integrated with an adapter model, functioning as a trainable expert, in order to learn a new task.<n>In this work, we characterize uniform convergence rates for estimating parameters under challenging settings where ground-truth parameters vary with the sample size.<n>We also establish corresponding minimax lower bounds to ensure that these rates are minimax optimal.
arXiv Detail & Related papers (2026-01-31T23:45:50Z) - Statistical Estimation of Adversarial Risk in Large Language Models under Best-of-N Sampling [50.872910438715486]
Large Language Models (LLMs) are typically evaluated for safety under single-shot or low-budget adversarial prompting.<n>We propose a scaling-aware Best-of-N estimation of risk, SABER, for modeling jailbreak vulnerability under Best-of-N sampling.
arXiv Detail & Related papers (2026-01-30T06:54:35Z) - Bayesian Semiparametric Causal Inference: Targeted Doubly Robust Estimation of Treatment Effects [1.2833734915643464]
We propose a semiparametric Bayesian methodology for estimating the average treatment effect (ATE) within the potential outcomes framework.<n>Our method introduces a Bayesian debiasing procedure that corrects for bias arising from nuisance estimation.<n>Extensive simulations confirm the theoretical results, demonstrating accurate point estimation and credible intervals with nominal coverage.
arXiv Detail & Related papers (2025-11-19T22:15:04Z) - Estimation of discrete distributions in relative entropy, and the deviations of the missing mass [3.4265828682659705]
We study the problem of estimating a distribution over a finite alphabet from an i.i.d. sample, with accuracy measured in relative entropy.
While optimal expected risk bounds are known, high-probability guarantees remain less well-understood.
arXiv Detail & Related papers (2025-04-30T16:47:10Z) - Towards Self-Supervised Covariance Estimation in Deep Heteroscedastic Regression [102.24287051757469]
We study self-supervised covariance estimation in deep heteroscedastic regression.<n>We derive an upper bound on the 2-Wasserstein distance between normal distributions.<n>Experiments over a wide range of synthetic and real datasets demonstrate that the proposed 2-Wasserstein bound coupled with pseudo label annotations results in a computationally cheaper yet accurate deep heteroscedastic regression.
arXiv Detail & Related papers (2025-02-14T22:37:11Z) - Smooth Sailing: Lipschitz-Driven Uncertainty Quantification for Spatial Association [15.706262708190643]
We show that existing methods for constructing confidence intervals for associations can fail to provide nominal coverage in the face of model misspecification and nonrandom locations.<n>We introduce a method that constructs valid frequentist confidence intervals for associations in spatial settings.<n>Our approach is the first to guarantee nominal coverage in this setting and outperforms existing techniques in both real and simulated experiments.
arXiv Detail & Related papers (2025-02-09T23:20:03Z) - Confidence Aware Learning for Reliable Face Anti-spoofing [52.23271636362843]
We propose a Confidence Aware Face Anti-spoofing model, which is aware of its capability boundary.<n>We estimate its confidence during the prediction of each sample.<n>Experiments show that the proposed CA-FAS can effectively recognize samples with low prediction confidence.
arXiv Detail & Related papers (2024-11-02T14:29:02Z) - Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers [49.97755400231656]
We present the first performance guarantee with explicit dimensional general score-mismatched diffusion samplers.
We show that score mismatches result in an distributional bias between the target and sampling distributions, proportional to the accumulated mismatch between the target and training distributions.
This result can be directly applied to zero-shot conditional samplers for any conditional model, irrespective of measurement noise.
arXiv Detail & Related papers (2024-10-17T16:42:12Z) - Contextual Optimization under Covariate Shift: A Robust Approach by Intersecting Wasserstein Balls [18.047245099229325]
We propose a distributionally robust approach that uses an ambiguity set by the intersection of two Wasserstein balls.
We demonstrate the strong empirical performance of our proposed models.
arXiv Detail & Related papers (2024-06-04T15:46:41Z) - Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z) - An analysis of the noise schedule for score-based generative models [7.180235086275926]
Score-based generative models (SGMs) aim at estimating a target data distribution by learning score functions using only noise-perturbed samples from the target.<n>Recent literature has focused extensively on assessing the error between the target and estimated distributions, gauging the generative quality through the Kullback-Leibler (KL) divergence and Wasserstein distances.<n>We establish an upper bound for the KL divergence between the target and the estimated distributions, explicitly depending on any time-dependent noise schedule.
arXiv Detail & Related papers (2024-02-07T08:24:35Z) - On diffusion-based generative models and their error bounds: The log-concave case with full convergence estimates [5.13323375365494]
We provide theoretical guarantees for the convergence behaviour of diffusion-based generative models under strongly log-concave data.
Our class of functions used for score estimation is made of Lipschitz continuous functions avoiding any Lipschitzness assumption on the score function.
This approach yields the best known convergence rate for our sampling algorithm.
arXiv Detail & Related papers (2023-11-22T18:40:45Z) - TIC-TAC: A Framework for Improved Covariance Estimation in Deep Heteroscedastic Regression [109.69084997173196]
Deepscedastic regression involves jointly optimizing the mean and covariance of the predicted distribution using the negative log-likelihood.
Recent works show that this may result in sub-optimal convergence due to the challenges associated with covariance estimation.
We study two questions: (1) Does the predicted covariance truly capture the randomness of the predicted mean?
Our results show that not only does TIC accurately learn the covariance, it additionally facilitates an improved convergence of the negative log-likelihood.
arXiv Detail & Related papers (2023-10-29T09:54:03Z) - Statistical Estimation Under Distribution Shift: Wasserstein
Perturbations and Minimax Theory [24.540342159350015]
We focus on Wasserstein distribution shifts, where every data point may undergo a slight perturbation.
We consider perturbations that are either independent or coordinated joint shifts across data points.
We analyze several important statistical problems, including location estimation, linear regression, and non-parametric density estimation.
arXiv Detail & Related papers (2023-08-03T16:19:40Z) - Statistically Optimal Generative Modeling with Maximum Deviation from the Empirical Distribution [2.1146241717926664]
We show that the Wasserstein GAN, constrained to left-invertible push-forward maps, generates distributions that avoid replication and significantly deviate from the empirical distribution.
Our most important contribution provides a finite-sample lower bound on the Wasserstein-1 distance between the generative distribution and the empirical one.
We also establish a finite-sample upper bound on the distance between the generative distribution and the true data-generating one.
arXiv Detail & Related papers (2023-07-31T06:11:57Z) - A Semi-Bayesian Nonparametric Estimator of the Maximum Mean Discrepancy
Measure: Applications in Goodness-of-Fit Testing and Generative Adversarial
Networks [3.623570119514559]
We propose a semi-Bayesian nonparametric (semi-BNP) procedure for the goodness-of-fit (GOF) test.
Our method introduces a novel Bayesian estimator for the maximum mean discrepancy (MMD) measure.
We demonstrate that our proposed test outperforms frequentist MMD-based methods by achieving a lower false rejection and acceptance rate of the null hypothesis.
arXiv Detail & Related papers (2023-03-05T10:36:21Z) - Learning to Estimate Without Bias [57.82628598276623]
Gauss theorem states that the weighted least squares estimator is a linear minimum variance unbiased estimation (MVUE) in linear models.
In this paper, we take a first step towards extending this result to non linear settings via deep learning with bias constraints.
A second motivation to BCE is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z) - Towards Robust Waveform-Based Acoustic Models [41.82019240477273]
We propose an approach for learning robust acoustic models in adverse environments, characterized by a significant mismatch between training and test conditions.
Our approach is an instance of vicinal risk minimization, which aims to improve risk estimates during training by replacing the delta functions that define the empirical density over the input space with an approximation of the marginal population density in the vicinity of the training samples.
arXiv Detail & Related papers (2021-10-16T18:21:34Z) - Variational Refinement for Importance Sampling Using the Forward
Kullback-Leibler Divergence [77.06203118175335]
Variational Inference (VI) is a popular alternative to exact sampling in Bayesian inference.
Importance sampling (IS) is often used to fine-tune and de-bias the estimates of approximate Bayesian inference procedures.
We propose a novel combination of optimization and sampling techniques for approximate Bayesian inference.
arXiv Detail & Related papers (2021-06-30T11:00:24Z) - Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z) - Scalable Personalised Item Ranking through Parametric Density Estimation [53.44830012414444]
Learning from implicit feedback is challenging because of the difficult nature of the one-class problem.
Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem.
We propose a learning-to-rank approach, which achieves convergence speed comparable to the pointwise counterpart.
arXiv Detail & Related papers (2021-05-11T03:38:16Z) - Binary Classification of Gaussian Mixtures: Abundance of Support
Vectors, Benign Overfitting and Regularization [39.35822033674126]
We study binary linear classification under a generative Gaussian mixture model.
We derive novel non-asymptotic bounds on the classification error of the latter.
Our results extend to a noisy model with constant probability noise flips.
arXiv Detail & Related papers (2020-11-18T07:59:55Z) - Generative Modeling with Denoising Auto-Encoders and Langevin Sampling [88.83704353627554]
We show that both DAE and DSM provide estimates of the score of the smoothed population density.
We then apply our results to the homotopy method of arXiv:1907.05600 and provide theoretical justification for its empirical success.
arXiv Detail & Related papers (2020-01-31T23:50:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.