Mixing times of data-augmentation Gibbs samplers for high-dimensional probit regression
- URL: http://arxiv.org/abs/2505.14343v1
- Date: Tue, 20 May 2025 13:29:01 GMT
- Title: Mixing times of data-augmentation Gibbs samplers for high-dimensional probit regression
- Authors: Filippo Ascolani, Giacomo Zanella
- Abstract summary: Leveraging recent results on Gibbs samplers for log-concave targets, we provide non-asymptotic bounds on the associated mixing times. The bounds depend explicitly on the design matrix and the prior precision, while they hold uniformly over the vector of responses. An empirical analysis based on coupling techniques suggests that the bounds are effective in predicting practically observed behaviours.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the convergence properties of popular data-augmentation samplers for Bayesian probit regression. Leveraging recent results on Gibbs samplers for log-concave targets, we provide simple and explicit non-asymptotic bounds on the associated mixing times (in Kullback-Leibler divergence). The bounds depend explicitly on the design matrix and the prior precision, while they hold uniformly over the vector of responses. We specialize the results for different regimes of statistical interest, when both the number of data points $n$ and parameters $p$ are large: in particular we identify scenarios where the mixing times remain bounded as $n,p\to\infty$, and ones where they do not. The results are shown to be tight (in the worst case with respect to the responses) and provide guidance on choices of prior distributions that provably lead to fast mixing. An empirical analysis based on coupling techniques suggests that the bounds are effective in predicting practically observed behaviours.
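The data-augmentation sampler in question is, in its standard form, the Albert and Chib (1993) scheme: latent Gaussian utilities are imputed given the coefficients, and the coefficients are then drawn from their Gaussian full conditional. Below is a minimal sketch under a $N(0, Q^{-1})$ prior with precision matrix $Q$ (variable names are illustrative, not taken from the paper):

```python
import numpy as np
from scipy.stats import truncnorm

def probit_gibbs(X, y, Q, n_iter=1000, rng=None):
    """Albert-Chib data-augmentation Gibbs sampler for Bayesian probit
    regression with a N(0, Q^{-1}) prior on the coefficients.

    X: (n, p) design matrix, y: (n,) binary responses, Q: (p, p) prior precision.
    """
    rng = np.random.default_rng(rng)
    n, p = X.shape
    # The conditional covariance of beta given the latent z never changes,
    # so it (and its Cholesky factor) can be computed once.
    V = np.linalg.inv(X.T @ X + Q)
    L = np.linalg.cholesky(V)
    beta = np.zeros(p)
    samples = np.empty((n_iter, p))
    for t in range(n_iter):
        m = X @ beta
        # z_i | beta, y_i is N(m_i, 1) truncated to (0, inf) if y_i = 1
        # and to (-inf, 0] otherwise; bounds are in standardized units.
        lower = np.where(y == 1, -m, -np.inf)
        upper = np.where(y == 1, np.inf, -m)
        z = m + truncnorm.rvs(lower, upper, random_state=rng)
        # beta | z is N(V X^T z, V).
        beta = V @ (X.T @ z) + L @ rng.standard_normal(p)
        samples[t] = beta
    return samples
```

The paper's bounds concern how many such iterations are needed as $n$ and $p$ grow; the per-iteration cost is dominated by the one-off $O(p^3)$ factorization and the $O(np)$ matrix-vector products.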
Related papers
- Spurious Correlations in High Dimensional Regression: The Roles of Regularization, Simplicity Bias and Over-Parameterization [19.261178173399784]
Learning models have been shown to rely on spurious correlations between non-predictive features and the associated labels in the training data.
We quantify the amount of spurious correlations $C$ learned via linear regression, in terms of the data covariance and the strength $\lambda$ of the ridge regularization.
arXiv Detail & Related papers (2025-02-03T13:38:42Z)
- Entropy contraction of the Gibbs sampler under log-concavity [0.16385815610837165]
We show that the random scan Gibbs sampler contracts in relative entropy and provide a sharp characterization of the associated contraction rate.
Our techniques are versatile and extend to Metropolis-within-Gibbs schemes and the Hit-and-Run algorithm.
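For concreteness, a random scan Gibbs sampler picks one coordinate uniformly at random per step and resamples it from its full conditional. A minimal sketch for a Gaussian target, a canonical log-concave example (not code from the paper):

```python
import numpy as np

def random_scan_gibbs(Sigma, n_iter=5000, rng=None):
    """Random-scan Gibbs sampler for a zero-mean Gaussian N(0, Sigma):
    each step resamples one uniformly chosen coordinate from its
    conditional distribution given the others."""
    rng = np.random.default_rng(rng)
    P = np.linalg.inv(Sigma)          # precision matrix
    d = Sigma.shape[0]
    x = np.zeros(d)
    chain = np.empty((n_iter, d))
    for t in range(n_iter):
        i = rng.integers(d)
        # Conditional of x_i given the rest: N(mu_i, 1 / P_ii) with
        # mu_i = -(1 / P_ii) * sum_{j != i} P_ij x_j.
        rest = P[i] @ x - P[i, i] * x[i]
        mu_i = -rest / P[i, i]
        x[i] = mu_i + rng.standard_normal() / np.sqrt(P[i, i])
        chain[t] = x
    return chain
```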
arXiv Detail & Related papers (2024-10-01T16:50:36Z)
- Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We characterize the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations.
We demonstrate that in this setting, the generalized cross validation estimator (GCV) fails to correctly predict the out-of-sample risk.
We further extend our analysis to the case where the test point has nontrivial correlations with the training set, a setting often encountered in time series forecasting.
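As a point of reference, GCV estimates out-of-sample risk from in-sample residuals and the trace of the hat matrix, and its usual justification assumes i.i.d. samples. A hedged sketch (the penalty convention and the toy correlation structure are illustrative choices, not the paper's setup):

```python
import numpy as np

def ridge_gcv(X, y, lam):
    """Generalized cross-validation estimate for ridge regression,
    using the convention beta_hat = argmin ||y - X b||^2 + lam ||b||^2."""
    n, p = X.shape
    # Hat matrix S maps y to fitted values: S = X (X^T X + lam I)^{-1} X^T.
    S = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    resid = y - S @ y
    df = np.trace(S)                       # effective degrees of freedom
    return (resid @ resid / n) / (1.0 - df / n) ** 2

# Toy check with autocorrelated rows: GCV targets the i.i.d. setting,
# so under strong sample correlation it can misestimate out-of-sample risk.
rng = np.random.default_rng(0)
n, p, lam = 200, 20, 1.0
beta = rng.standard_normal(p) / np.sqrt(p)
t = np.arange(n)
C = 0.9 ** np.abs(t[:, None] - t[None, :])   # AR(1)-style row correlation
X = np.linalg.cholesky(C) @ rng.standard_normal((n, p))
y = X @ beta + rng.standard_normal(n)
print("GCV risk estimate:", ridge_gcv(X, y, lam))
```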
arXiv Detail & Related papers (2024-08-08T17:27:29Z)
- Quasi-Bayes meets Vines [2.3124143670964448]
We propose a different way to extend Quasi-Bayesian prediction to high dimensions through the use of Sklar's theorem.
We show that our proposed Quasi-Bayesian Vine (QB-Vine) is a fully non-parametric density estimator with an analytical form.
arXiv Detail & Related papers (2024-06-18T16:31:02Z) - Conformal inference for regression on Riemannian Manifolds [49.7719149179179]
We investigate prediction sets for regression scenarios when the response variable, denoted by $Y$, resides in a manifold, and the covariate, denoted by $X$, lies in Euclidean space.
We prove the almost sure convergence of the empirical version of these regions on the manifold to their population counterparts.
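In the Euclidean special case, such prediction sets can be built with standard split conformal inference; the manifold setting essentially replaces absolute residuals with geodesic distances. A simplified sketch (the `fit` interface is an assumption for illustration):

```python
import numpy as np

def split_conformal(X_train, y_train, X_cal, y_cal, x_new, fit, alpha=0.1):
    """Split conformal prediction interval for a real-valued response.
    `fit` maps (X, y) to a prediction function; the manifold-valued case
    would replace absolute residuals with geodesic distances."""
    model = fit(X_train, y_train)
    # Conformity scores on the held-out calibration set.
    scores = np.abs(y_cal - model(X_cal))
    n = len(scores)
    # Finite-sample-valid quantile level.
    q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    pred = model(np.atleast_2d(x_new))[0]
    return pred - q, pred + q

# Example `fit`: ordinary least squares.
def ols(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return lambda Xq: Xq @ coef
```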
arXiv Detail & Related papers (2023-10-12T10:56:25Z)
- Empirical Risk Minimization with Shuffled SGD: A Primal-Dual Perspective and Improved Bounds [12.699376765058137]
Stochastic gradient descent (SGD) is perhaps the most prevalent optimization method in modern machine learning.
It is only very recently that SGD with sampling without replacement -- shuffled SGD -- has been analyzed.
We prove fine-grained complexity bounds that depend on the data matrix and are never worse than what is predicted by the existing bounds.
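A minimal sketch of shuffled SGD, i.e. SGD with sampling without replacement: each epoch visits every data point exactly once, in a freshly shuffled order (the least-squares example is illustrative):

```python
import numpy as np

def shuffled_sgd(grad_i, x0, n_samples, epochs=10, lr=0.1, rng=None):
    """SGD with sampling without replacement.
    grad_i(x, i) returns the gradient of the i-th component function at x."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(epochs):
        for i in rng.permutation(n_samples):   # one without-replacement pass
            x -= lr * grad_i(x, i)
    return x

# Least-squares example: f_i(x) = 0.5 * (a_i @ x - b_i)^2.
rng = np.random.default_rng(1)
A = rng.standard_normal((100, 5)); b = A @ np.ones(5)
x_hat = shuffled_sgd(lambda x, i: (A[i] @ x - b[i]) * A[i],
                     np.zeros(5), n_samples=100, lr=0.01)
```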
arXiv Detail & Related papers (2023-06-21T18:14:44Z)
- Dimension-free mixing times of Gibbs samplers for Bayesian hierarchical models [0.0]
We analyse the behaviour of total variation mixing times of Gibbs samplers targeting hierarchical models.
We obtain convergence results under random data-generating assumptions for a broad class of two-level models.
arXiv Detail & Related papers (2023-04-14T08:30:40Z)
- Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped gradient descent and provide an improved analysis under a more nuanced condition on the noise of gradients.
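A sketch of the basic clipping idea for streaming mean estimation (the clip threshold and step-size schedule are illustrative; the paper's analysis is more refined):

```python
import numpy as np

def clipped_streaming_mean(sample_stream, dim, clip=5.0, lr0=1.0):
    """Streaming mean estimation with clipped gradient descent: the update
    direction is norm-clipped so that a single heavy-tailed sample cannot
    move the estimate arbitrarily far."""
    theta = np.zeros(dim)
    for t, x in enumerate(sample_stream, start=1):
        g = theta - x                       # gradient of 0.5 * ||x - theta||^2
        norm = np.linalg.norm(g)
        if norm > clip:
            g *= clip / norm                # clip the gradient norm
        theta -= (lr0 / t) * g              # decaying step size
    return theta

# Heavy-tailed example: Student-t samples with 2.5 degrees of freedom.
rng = np.random.default_rng(2)
stream = (rng.standard_t(2.5, size=3) for _ in range(10_000))
print(clipped_streaming_mean(stream, dim=3))
```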
arXiv Detail & Related papers (2021-08-25T21:30:27Z) - Minimax Estimation of Partially-Observed Vector AutoRegressions [0.0]
We study the properties of a partially-observed state-space model.
We describe a sparse estimator based on the Dantzig selector and upper bound its non-asymptotic error.
An application to open railway data highlights the relevance of this model for public transport traffic analysis.
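The Dantzig selector itself is a linear program: minimize $\|\beta\|_1$ subject to $\|X^\top(y - X\beta)\|_\infty \le \lambda$. A small sketch via `scipy.optimize.linprog` in a generic regression setup (not the paper's state-space estimator):

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(X, y, lam):
    """Dantzig selector as a linear program, using the split
    beta = u - v with u, v >= 0 so that ||beta||_1 = sum(u) + sum(v)."""
    n, p = X.shape
    G, c = X.T @ X, X.T @ y
    cost = np.ones(2 * p)
    # |G beta - c|_inf <= lam, written as two sets of linear inequalities.
    A_ub = np.block([[G, -G], [-G, G]])
    b_ub = np.concatenate([lam + c, lam - c])
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
    u, v = res.x[:p], res.x[p:]
    return u - v

# Sparse recovery example.
rng = np.random.default_rng(3)
X = rng.standard_normal((50, 20))
beta = np.zeros(20); beta[:3] = [2.0, -1.5, 1.0]
y = X @ beta + 0.1 * rng.standard_normal(50)
print(np.round(dantzig_selector(X, y, lam=5.0), 2))
```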
arXiv Detail & Related papers (2021-06-17T08:46:53Z)
- SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z)
- A Random Matrix Analysis of Random Fourier Features: Beyond the Gaussian Kernel, a Precise Phase Transition, and the Corresponding Double Descent [85.77233010209368]
This article characterizes the exact asymptotics of random Fourier feature (RFF) regression, in the realistic setting where the number of data samples $n$, their dimension $p$, and the number of random features $N$ are all large and comparable.
This analysis also provides accurate estimates of training and test regression errors for large $n,p,N$.
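For reference, RFF regression replaces kernel ridge regression with ridge regression on $N$ random cosine features. A minimal sketch for the Gaussian kernel (hyperparameters are illustrative):

```python
import numpy as np

def rff_features(X, N, gamma=1.0, rng=None):
    """Random Fourier features approximating the Gaussian (RBF) kernel
    k(x, x') = exp(-gamma * ||x - x'||^2): z(x) = sqrt(2/N) cos(W x + b)
    with W_ij ~ N(0, 2 * gamma) and b_j ~ Uniform(0, 2 * pi)."""
    rng = np.random.default_rng(rng)
    p = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(p, N))
    b = rng.uniform(0, 2 * np.pi, size=N)
    return np.sqrt(2.0 / N) * np.cos(X @ W + b)

def rff_ridge(X, y, N=500, lam=1e-2, gamma=1.0, rng=0):
    """Ridge regression in the random-feature space."""
    Z = rff_features(X, N, gamma, rng)
    return np.linalg.solve(Z.T @ Z + lam * np.eye(N), Z.T @ y)
```

The double descent phenomenon the paper characterizes shows up as the ratio $N/n$ crosses one.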
arXiv Detail & Related papers (2020-06-09T02:05:40Z)
- Ambiguity in Sequential Data: Predicting Uncertain Futures with Recurrent Models [110.82452096672182]
We propose an extension of the Multiple Hypothesis Prediction (MHP) model to handle ambiguous predictions with sequential data.
We also introduce a novel metric for ambiguous problems, which is better suited to account for uncertainties.
arXiv Detail & Related papers (2020-03-10T09:15:42Z)
- Efficiently Sampling Functions from Gaussian Process Posteriors [76.94808614373609]
We propose an easy-to-use and general-purpose approach for fast posterior sampling.
We demonstrate how decoupled sample paths accurately represent Gaussian process posteriors at a fraction of the usual cost.
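Decoupled sampling rests on pathwise conditioning (Matheron's rule): a joint prior draw is corrected by an exact data-dependent update. A simplified sketch that samples the prior exactly rather than with the Fourier-feature approximation the paper uses for scalability:

```python
import numpy as np

def matheron_posterior_sample(X, y, X_star, kernel, noise, rng=None):
    """One posterior GP sample at test points X_star via Matheron's rule:
    f_post(x*) = f_prior(x*)
                 + K(x*, X) (K(X, X) + noise I)^{-1} (y - f_prior(X) - eps)."""
    rng = np.random.default_rng(rng)
    n, m = X.shape[0], X_star.shape[0]
    Xa = np.vstack([X, X_star])
    K = kernel(Xa, Xa)
    # Joint prior sample over training and test locations (jitter for stability).
    f = np.linalg.cholesky(K + 1e-10 * np.eye(n + m)) @ rng.standard_normal(n + m)
    f_train, f_star = f[:n], f[n:]
    eps = np.sqrt(noise) * rng.standard_normal(n)
    update = np.linalg.solve(kernel(X, X) + noise * np.eye(n),
                             y - f_train - eps)
    return f_star + kernel(X_star, X) @ update

def rbf(A, B, ell=1.0):
    """Squared-exponential kernel matrix between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)
```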
arXiv Detail & Related papers (2020-02-21T14:03:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.