Generalization of Gibbs and Langevin Monte Carlo Algorithms in the Interpolation Regime
- URL: http://arxiv.org/abs/2510.06028v1
- Date: Tue, 07 Oct 2025 15:25:56 GMT
- Title: Generalization of Gibbs and Langevin Monte Carlo Algorithms in the Interpolation Regime
- Authors: Andreas Maurer, Erfan Mirzaei, Massimiliano Pontil
- Abstract summary: The bounds are stable under approximation with Langevin Monte Carlo algorithms. Experiments on the MNIST and CIFAR-10 datasets verify that the bounds yield nontrivial predictions on true labeled data.
- Score: 25.99627853121106
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The paper provides data-dependent bounds on the test error of the Gibbs algorithm in the overparameterized interpolation regime, where low training errors are also obtained for impossible data, such as random labels in classification. The bounds are stable under approximation with Langevin Monte Carlo algorithms. Experiments on the MNIST and CIFAR-10 datasets verify that the bounds yield nontrivial predictions on true labeled data and correctly upper bound the test error for random labels. Our method indicates that generalization in the low-temperature, interpolation regime is already signaled by small training errors in the more classical high temperature regime.
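To make the objects concrete: the Gibbs algorithm draws a hypothesis from the posterior density proportional to $\exp(-\beta \hat{L}(w))$, where $\hat{L}$ is the training error and $\beta$ the inverse temperature, and the paper's bounds remain valid when this sampler is approximated by Langevin Monte Carlo. Below is a minimal sketch of such an approximation via the unadjusted Langevin algorithm on a toy least-squares loss; the loss, step size `h`, and temperature `beta` are illustrative choices, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set; the Gibbs posterior over weights w is
# proportional to exp(-beta * train_loss(w)).
X = rng.normal(size=(50, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.1 * rng.normal(size=50)

def train_loss(w):
    return np.mean((X @ w - y) ** 2)

def grad_loss(w):
    return 2.0 * X.T @ (X @ w - y) / len(y)

beta = 100.0   # inverse temperature; the interpolation regime is large beta
h = 1e-4       # step size (illustrative)
w = np.zeros(10)

# Unadjusted Langevin algorithm: each step descends the gradient of
# beta * train_loss and adds Gaussian noise, so the iterates
# approximately sample the Gibbs posterior exp(-beta * train_loss).
for _ in range(20_000):
    w = w - h * beta * grad_loss(w) + np.sqrt(2 * h) * rng.normal(size=10)

print("training loss of an approximate Gibbs sample:", train_loss(w))
```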
Related papers
- When Langevin Monte Carlo Meets Randomization: Non-asymptotic Error Bounds beyond Log-Concavity and Gradient Lipschitzness [7.783499788849107]
We revisit the randomized Langevin Monte Carlo (RLMC) algorithm for sampling from high-dimensional distributions without log-concavity. We prove a uniform-in-time error bound in $\mathcal{W}$-distance of order $O(\sqrt{dh})$ for the RLMC sampling algorithm. Modified RLMC algorithms are also proposed and analyzed, with non-asymptotic error bounds established.
arXiv Detail & Related papers (2025-09-30T00:48:51Z)
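As a concrete reference point, one standard randomization of Langevin Monte Carlo is the randomized midpoint rule of Shen and Lee (2019), in which the gradient is evaluated at a uniformly random time inside each step; whether this matches the paper's RLMC variant exactly is an assumption here. A minimal sketch on a toy non-log-concave potential:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_U(x):
    # Toy non-log-concave potential U(x) = |x|^2/2 - 2*sum(cos(x)).
    return x + 2.0 * np.sin(x)

def rlmc_step(x, h):
    """One randomized-midpoint Langevin step: the gradient is evaluated
    at a uniformly random time theta*h inside the step, and the Brownian
    increments over [0, theta*h] and [theta*h, h] are kept consistent."""
    d = x.shape[0]
    theta = rng.uniform()
    W1 = np.sqrt(theta * h) * rng.normal(size=d)         # B(theta h)
    W2 = np.sqrt((1 - theta) * h) * rng.normal(size=d)   # B(h) - B(theta h)
    y = x - theta * h * grad_U(x) + np.sqrt(2.0) * W1    # state at theta h
    return x - h * grad_U(y) + np.sqrt(2.0) * (W1 + W2)

x = np.zeros(5)
for _ in range(2000):
    x = rlmc_step(x, h=0.05)
print("approximate sample:", x)
```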
- Consistency of Learned Sparse Grid Quadrature Rules using NeuralODEs [1.3654846342364308]
This paper provides a proof of the consistency of sparse grid quadrature for numerical integration of high dimensional distributions. A decomposition of the total numerical error into quadrature error and statistical error is provided.
arXiv Detail & Related papers (2025-07-02T09:37:16Z)
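The error decomposition can be illustrated on a toy integral: a deterministic quadrature rule contributes only quadrature error, while Monte Carlo contributes only statistical error. The sketch below uses a tensor-product Gauss-Hermite rule as a stand-in for the paper's sparse grid construction (a Smolyak grid would thin this tensor product in higher dimensions); the integrand is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2
f = lambda x: np.exp(-np.sum(x ** 2, axis=-1) / 4.0)  # illustrative integrand
exact = (2.0 / 3.0) ** (d / 2)                        # E[f(X)], X ~ N(0, I_d)

# (a) Tensor-product Gauss-Hermite rule: deterministic, so the total
# error here is pure quadrature error.
nodes, w = np.polynomial.hermite_e.hermegauss(5)
w = w / np.sqrt(2 * np.pi)                            # weights for N(0, 1)
grid = np.stack(np.meshgrid(nodes, nodes), axis=-1).reshape(-1, d)
wgrid = np.outer(w, w).ravel()
quad = np.sum(wgrid * f(grid))

# (b) Plain Monte Carlo: unbiased, so the total error is pure
# statistical error of order n^{-1/2}.
mc = f(rng.normal(size=(10_000, d))).mean()

print("quadrature error :", abs(quad - exact))
print("statistical error:", abs(mc - exact))
```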
- Maximum a Posteriori Inference for Factor Graphs via Benders' Decomposition [0.38233569758620056]
We present a method for maximum a-posteriori inference in general Bayesian factor models.
We derive MAP estimation algorithms for the Bayesian Gaussian mixture model and latent Dirichlet allocation.
arXiv Detail & Related papers (2024-10-24T19:57:56Z)
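For contrast with the Benders' decomposition approach, here is the ordinary MAP-EM baseline for a toy Gaussian mixture with a Dirichlet prior on the weights (variances held fixed for brevity); this is a standard algorithm, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D data from two components.
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 300)])
K, n = 2, len(x)

alpha = 2.0                      # Dirichlet prior on mixture weights
pi = np.full(K, 1.0 / K)
mu = np.array([-1.0, 1.0])
sigma2 = 1.0                     # variances fixed for brevity

for _ in range(100):
    # E-step: responsibilities under the current parameters.
    logp = -0.5 * (x[:, None] - mu) ** 2 / sigma2 + np.log(pi)
    r = np.exp(logp - logp.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)
    # M-step: the Dirichlet prior adds (alpha - 1) pseudo-counts to
    # the weight update, turning ML-EM into MAP-EM.
    Nk = r.sum(axis=0)
    pi = (Nk + alpha - 1.0) / (n + K * (alpha - 1.0))
    mu = (r * x[:, None]).sum(axis=0) / Nk

print("MAP weights:", pi)
print("MAP means  :", mu)
```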
- General regularization in covariate shift adaptation [1.5469452301122175]
We show that the number of samples needed to achieve the same order of accuracy as in standard supervised learning without distribution shift is smaller than established by state-of-the-art analyses.
arXiv Detail & Related papers (2023-07-21T11:19:00Z)
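The classical mechanism behind covariate shift adaptation is importance weighting of a regularized estimator. A minimal sketch with weighted ridge regression on polynomial features, assuming the density ratio between test and training distributions is known in closed form (a simplification; in practice it must be estimated):

```python
import numpy as np

rng = np.random.default_rng(0)

# Covariate shift: training inputs ~ N(0, 1), test inputs ~ N(1, 1),
# but the regression function is the same on both domains.
f = lambda x: np.sin(2 * x)
x_tr = rng.normal(0.0, 1.0, 300)
x_te = rng.normal(1.0, 1.0, 300)
y_tr = f(x_tr) + 0.1 * rng.normal(size=300)

def features(x, deg=6):
    return np.vander(x, deg + 1, increasing=True)  # polynomial features

# Importance weights w(x) = p_test(x) / p_train(x) = exp(x - 1/2),
# known exactly here because both densities are Gaussian.
w = np.exp(x_tr - 0.5)

# Importance-weighted ridge regression:
#   min_c  sum_i w_i (y_i - phi(x_i)^T c)^2 + lam * |c|^2
Phi = features(x_tr)
lam = 1e-2
A = Phi.T @ (w[:, None] * Phi) + lam * np.eye(Phi.shape[1])
c = np.linalg.solve(A, Phi.T @ (w * y_tr))

mse = np.mean((features(x_te) @ c - f(x_te)) ** 2)
print("test-domain MSE of weighted ridge:", mse)
```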
- Posterior and Computational Uncertainty in Gaussian Processes [52.26904059556759]
Gaussian processes scale prohibitively with the size of the dataset.
Many approximation methods have been developed, which inevitably introduce approximation error.
This additional source of uncertainty, due to limited computation, is entirely ignored when using the approximate posterior.
We develop a new class of methods that provides consistent estimation of the combined uncertainty arising from both the finite number of data observed and the finite amount of computation expended.
arXiv Detail & Related papers (2022-05-30T22:16:25Z)
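The gap the paper targets can be seen by comparing an exact GP posterior variance with a cheap low-rank approximation: the discrepancy is uncertainty due to limited computation, which the naive approximate posterior simply does not report. The sketch below uses a Nystrom-style surrogate built from a subset of inputs, an illustrative approximation rather than the paper's method:

```python
import numpy as np

rng = np.random.default_rng(0)

def k(a, b, ell=0.5):
    # Squared-exponential kernel on scalars.
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ell ** 2))

n, m, noise = 200, 20, 1e-2
x = np.sort(rng.uniform(-3, 3, n))
y = np.sin(2 * x) + 0.1 * rng.normal(size=n)
xs = np.array([0.3])                 # single test input

# Exact GP posterior variance at xs: O(n^3) but correct.
K = k(x, x) + noise * np.eye(n)
kx = k(x, xs)
var_exact = (k(xs, xs) - kx.T @ np.linalg.solve(K, kx)).item()

# Cheap Nystrom-style surrogate built from m << n inducing points.
z = x[:: n // m]
Kzz = k(z, z) + 1e-8 * np.eye(len(z))
Kxz = k(x, z)
Q = Kxz @ np.linalg.solve(Kzz, Kxz.T) + noise * np.eye(n)
var_approx = (k(xs, xs) - kx.T @ np.linalg.solve(Q, kx)).item()

print("exact variance :", var_exact)
print("approx variance:", var_approx)
print("unreported gap :", abs(var_exact - var_approx))
```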
- A Robust and Flexible EM Algorithm for Mixtures of Elliptical Distributions with Missing Data [71.9573352891936]
This paper tackles the problem of missing data imputation for noisy and non-Gaussian data.
A new EM algorithm is investigated for mixtures of elliptical distributions with the property of handling potential missing data.
Experimental results on synthetic data demonstrate that the proposed algorithm is robust to outliers and can be used with non-Gaussian data.
arXiv Detail & Related papers (2022-01-28T10:01:37Z)
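A heavily simplified sketch of the missing-data E-step: for a single multivariate Gaussian (one component, Gaussian rather than general elliptical), missing coordinates enter EM through their conditional means and covariances given the observed coordinates. All model choices below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 3-D Gaussian data with entries missing completely at random.
n, d = 500, 3
mu_true = np.array([1.0, -1.0, 0.5])
S_true = np.array([[1.0, 0.6, 0.2], [0.6, 1.0, 0.4], [0.2, 0.4, 1.0]])
X = rng.multivariate_normal(mu_true, S_true, n)
mask = rng.uniform(size=(n, d)) < 0.15          # True = missing
X[mask] = np.nan

mu, Sigma = np.zeros(d), np.eye(d)
for _ in range(50):
    S1, S2 = np.zeros(d), np.zeros((d, d))
    for i in range(n):
        miss = np.isnan(X[i])
        obs = ~miss
        xi, Ci = X[i].copy(), np.zeros((d, d))
        if miss.all():                          # nothing observed in this row
            xi, Ci = mu.copy(), Sigma.copy()
        elif miss.any():
            # E-step: condition the missing block on the observed one.
            Soo = Sigma[np.ix_(obs, obs)]
            Smo = Sigma[np.ix_(miss, obs)]
            xi[miss] = mu[miss] + Smo @ np.linalg.solve(Soo, X[i, obs] - mu[obs])
            Ci[np.ix_(miss, miss)] = (
                Sigma[np.ix_(miss, miss)] - Smo @ np.linalg.solve(Soo, Smo.T)
            )
        S1 += xi
        S2 += np.outer(xi, xi) + Ci
    # M-step: moment updates from the expected sufficient statistics.
    mu = S1 / n
    Sigma = S2 / n - np.outer(mu, mu)

print("estimated mean:", mu)
```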
- Partial Counterfactual Identification from Observational and Experimental Data [83.798237968683]
We develop effective Monte Carlo algorithms to approximate the optimal bounds from an arbitrary combination of observational and experimental data.
Our algorithms are validated extensively on synthetic and real-world datasets.
arXiv Detail & Related papers (2021-10-12T02:21:30Z)
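For intuition on partial identification, the classical Manski bounds are the simplest special case: with bounded outcomes, the unobserved potential outcomes can lie anywhere in the outcome range, yielding an interval rather than a point estimate. The sketch below computes these bounds by Monte Carlo on simulated confounded data; it is not the paper's optimal-bound algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Confounded observational data with outcomes bounded in [0, 1].
n = 100_000
U = rng.uniform(size=n)                                   # hidden confounder
T = (rng.uniform(size=n) < 0.3 + 0.4 * U).astype(int)     # treatment choice
Y = np.clip(0.2 + 0.4 * T + 0.3 * U + 0.05 * rng.normal(size=n), 0.0, 1.0)

p1 = T.mean()
m1 = Y[T == 1].mean()         # E[Y | T = 1], identifiable
m0 = Y[T == 0].mean()         # E[Y | T = 0], identifiable

# Manski bounds: E[Y(1)] = E[Y|T=1] P(T=1) + E[Y(1)|T=0] P(T=0), and the
# unidentified second term can be anywhere in [0, 1]; same for E[Y(0)].
ey1 = (m1 * p1, m1 * p1 + (1 - p1))
ey0 = (m0 * (1 - p1), m0 * (1 - p1) + p1)
print("ATE bounds: [%.3f, %.3f]" % (ey1[0] - ey0[1], ey1[1] - ey0[0]))
```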
- Information-Theoretic Generalization Bounds for Iterative Semi-Supervised Learning [81.1071978288003]
In particular, we seek to understand the behaviour of the generalization error of iterative SSL algorithms using information-theoretic principles.
Our theoretical results suggest that when the class conditional variances are not too large, the upper bound on the generalization error decreases monotonically with the number of iterations, but quickly saturates.
arXiv Detail & Related papers (2021-10-03T05:38:49Z)
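The object being analyzed is the iterative self-training loop: fit on labeled data, pseudo-label the unlabeled pool, refit, repeat. A toy sketch with two Gaussian classes and a mean-difference classifier, where the test error typically drops over the first few iterations and then saturates, mirroring the behaviour the bound describes (all distributional choices here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two spherical Gaussian classes in d dimensions, unit mean separation.
d, n_lab, n_unl, n_test = 20, 10, 2000, 2000
mu = np.ones(d) / np.sqrt(d)

def sample(n):
    lab = rng.integers(0, 2, n) * 2 - 1        # labels in {-1, +1}
    return lab[:, None] * mu + rng.normal(size=(n, d)), lab

Xl, yl = sample(n_lab)
Xu, _ = sample(n_unl)
Xt, yt = sample(n_test)

# Iteration 0: mean-difference classifier from the few labels; then
# repeatedly pseudo-label the unlabeled pool and refit.
w = (yl[:, None] * Xl).mean(axis=0)
for it in range(8):
    err = np.mean(np.sign(Xt @ w) != yt)
    print(f"iteration {it}: test error {err:.3f}")
    yu = np.sign(Xu @ w)                        # pseudo-labels
    Xa = np.vstack([Xl, Xu])
    ya = np.concatenate([yl, yu])
    w = (ya[:, None] * Xa).mean(axis=0)         # refit on all points
```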
- Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z)
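The phenomenon is easy to reproduce: when a two-armed bandit is run greedily and both arms have equal means, the ordinary sample mean (the least squares estimate in this design) of an arm is biased downward. The sketch below demonstrates the bias by simulation; the paper's online debiasing correction is not implemented here.

```python
import numpy as np

rng = np.random.default_rng(0)

def greedy_bandit_estimate(T=200, n_rep=2000):
    """Sample-mean estimate of arm 0 when arms are chosen greedily.
    Both arms have true mean 0, yet adaptivity biases the estimate."""
    est = np.empty(n_rep)
    for r in range(n_rep):
        sums = np.zeros(2)
        counts = np.zeros(2)
        for t in range(T):
            if t < 2:
                a = t                              # pull each arm once
            else:
                a = int(np.argmax(sums / counts))  # greedy choice
            sums[a] += rng.normal(0.0, 1.0)        # reward, true mean 0
            counts[a] += 1
        est[r] = sums[0] / counts[0]
    return est

est = greedy_bandit_estimate()
print("mean of arm-0 estimates:", est.mean())      # noticeably below 0
```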
- Spectral clustering under degree heterogeneity: a case for the random walk Laplacian [83.79286663107845]
This paper shows that graph spectral embedding using the random walk Laplacian produces vector representations which are completely corrected for node degree.
In the special case of a degree-corrected block model, the embedding concentrates about K distinct points, representing communities.
arXiv Detail & Related papers (2021-05-03T16:36:27Z)
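A minimal sketch of the embedding itself: on a degree-corrected two-block graph, the second eigenvector of the random walk Laplacian (computed via its symmetric similar matrix for numerical stability) separates the communities despite degree heterogeneity. The generative parameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Degree-corrected two-block model: heterogeneous node propensities.
n = 300
z = np.repeat([0, 1], n // 2)                  # true communities
theta = rng.uniform(0.3, 1.0, n)               # degree heterogeneity
B = np.array([[0.20, 0.04], [0.04, 0.20]])     # block connection rates
P = theta[:, None] * theta[None, :] * B[np.ix_(z, z)]
A = (rng.uniform(size=(n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                                    # symmetric, no self-loops

deg = A.sum(axis=1)
keep = deg > 0                                 # drop isolated nodes
A, z, deg = A[np.ix_(keep, keep)], z[keep], deg[keep]

# Random walk Laplacian embedding: eigenvectors of D^{-1} A equal
# D^{-1/2} v for eigenvectors v of the symmetric D^{-1/2} A D^{-1/2}.
Dh = 1.0 / np.sqrt(deg)
S = Dh[:, None] * A * Dh[None, :]
vals, vecs = np.linalg.eigh(S)
u2 = Dh * vecs[:, -2]                          # second eigenvector
pred = (u2 > 0).astype(int)

acc = max(np.mean(pred == z), np.mean(pred != z))  # up to label swap
print("community recovery accuracy:", acc)
```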
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
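The question can be posed in a toy overparameterized linear model, where interpolators form an affine subspace: sample many of them and inspect the spread of their test errors. This sketch only illustrates the phenomenon; it is not the paper's methodology for computing the full distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Overparameterized linear regression: d > n, so interpolators form
# an affine set w = w_minnorm + N z with N spanning the null space of X.
n, d, n_test = 40, 120, 2000
w_star = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ w_star
Xt = rng.normal(size=(n_test, d))
yt = Xt @ w_star

w_mn, *_ = np.linalg.lstsq(X, y, rcond=None)   # min-norm interpolator
_, _, Vt = np.linalg.svd(X, full_matrices=True)
N = Vt[n:].T                                   # (d, d - n) null-space basis

errs = []
for _ in range(2000):
    w = w_mn + N @ (0.3 * rng.normal(size=d - n))  # random interpolator
    errs.append(np.mean((Xt @ w - yt) ** 2))
errs = np.array(errs)

# Every sampled w fits the training data exactly, yet test errors
# concentrate around a typical value far below the worst sample.
print("max train residual:", float(np.abs(X @ w_mn - y).max()))
print("typical test MSE  :", np.median(errs))
print("worst sampled MSE :", errs.max())
```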
- Matrix Completion with Quantified Uncertainty through Low Rank Gaussian Copula [30.84155327760468]
This paper proposes a framework for missing value imputation with quantified uncertainty.
The time required to fit the model scales linearly with the number of rows and the number of columns in the dataset.
Empirical results show the method yields state-of-the-art imputation accuracy across a wide range of data types.
arXiv Detail & Related papers (2020-06-18T19:51:42Z)
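A much-simplified, full-rank sketch of Gaussian copula imputation for continuous columns (the paper's model is low-rank and handles mixed data types): map each column to normal scores through its empirical CDF, impute missing scores by conditional means under an estimated latent correlation, and map back through empirical quantiles.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Toy correlated, non-Gaussian data with missing entries.
n, d = 400, 3
Z = rng.multivariate_normal(
    np.zeros(d), [[1, .7, .4], [.7, 1, .5], [.4, .5, 1]], n)
X = np.exp(Z)                                  # lognormal marginals
mask = rng.uniform(size=(n, d)) < 0.1          # True = missing
Xobs = np.where(mask, np.nan, X)

# 1) Normal scores per column from the empirical CDF of observed values.
S = np.full((n, d), np.nan)
for j in range(d):
    obs = ~mask[:, j]
    ranks = stats.rankdata(Xobs[obs, j]) / (obs.sum() + 1)
    S[obs, j] = stats.norm.ppf(ranks)

# 2) Latent Gaussian correlation, estimated on fully observed rows.
full = ~np.isnan(S).any(axis=1)
C = np.corrcoef(S[full].T)

# 3) Impute each missing score by its conditional mean, then map back
#    through the observed empirical quantiles of that column.
Ximp = Xobs.copy()
for i in range(n):
    miss, obs = mask[i], ~mask[i]
    if miss.any() and obs.any():
        Coo = C[np.ix_(obs, obs)]
        Cmo = C[np.ix_(miss, obs)]
        s_hat = Cmo @ np.linalg.solve(Coo, S[i, obs])
        for jj, j in enumerate(np.where(miss)[0]):
            q = stats.norm.cdf(s_hat[jj])
            Ximp[i, j] = np.quantile(Xobs[~mask[:, j], j], q)
    elif miss.any():                           # row fully missing: fall back
        for j in np.where(miss)[0]:
            Ximp[i, j] = np.nanmedian(Xobs[:, j])

print("imputation RMSE:", np.sqrt(np.mean((Ximp[mask] - X[mask]) ** 2)))
```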