Robust Variational Bayes by Min-Max Median Aggregation
- URL: http://arxiv.org/abs/2512.12676v1
- Date: Sun, 14 Dec 2025 13:02:00 GMT
- Title: Robust Variational Bayes by Min-Max Median Aggregation
- Authors: Jiawei Yan, Ju Liu, Weidong Liu, Jiyuan Tu
- Abstract summary: We propose a robust variational Bayes framework to handle contamination and outliers in datasets. Our approach partitions the data into $m$ disjoint subsets and formulates a joint optimization problem based on robust aggregation principles. Our findings indicate that the two-stage approach yields a smaller approximation error compared to directly aggregating the $m$-powered local posteriors.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a robust and scalable variational Bayes (VB) framework designed to handle contamination and outliers in datasets effectively. Our approach partitions the data into $m$ disjoint subsets and formulates a joint optimization problem based on robust aggregation principles. A key insight is that the full posterior distribution is equivalent to the minimizer of the mean Kullback-Leibler (KL) divergence from the $m$-powered local posterior distributions. To enhance robustness, we replace the mean KL divergence with a min-max median formulation. The min-max formulation not only ensures consistency between the KL minimizer and the Evidence Lower Bound (ELBO) maximizer but also facilitates the establishment of improved statistical rates for the mean of the variational posterior. We observe a notable discrepancy in the $m$-powered marginal log-likelihood function depending on the presence of local latent variables. To address this, we treat the two scenarios separately to guarantee the consistency of the aggregated variational posterior. Specifically, when local latent variables are present, we introduce an aggregate-and-rescale strategy. Theoretically, we provide a non-asymptotic analysis of the proposed posterior, incorporating a refined analysis of the Bernstein-von Mises (BvM) theorem to accommodate a diverging number of subsets $m$. Our findings indicate that the two-stage approach yields a smaller approximation error than directly aggregating the $m$-powered local posteriors. Furthermore, we establish a nearly optimal statistical rate for the mean of the proposed posterior, advancing existing theory on min-max median estimators. The efficacy of our method is demonstrated through extensive simulation studies.
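To make the aggregation idea concrete, here is a minimal numerical sketch in a toy Gaussian-mean model: each subset yields an $m$-powered local posterior in closed form, and a Gaussian variational distribution is fit by minimizing the median of the KL divergences to them. This replaces the paper's full min-max formulation with a plain min-median objective and omits the aggregate-and-rescale step for local latent variables; the unit noise variance, flat prior, and Nelder-Mead optimizer below are all illustrative assumptions, not the paper's algorithm.

```python
import numpy as np
from scipy.optimize import minimize

# Toy model: scalar Gaussian mean with known unit noise variance and a flat
# prior, so the m-powered local posterior of subset j is N(mean_j, 1/(m*n_j)).
rng = np.random.default_rng(0)
n, m = 3000, 10
subsets = np.array_split(rng.normal(2.0, 1.0, n), m)
subsets[0] = subsets[0] + 50.0     # grossly contaminate a minority
subsets[1] = subsets[1] + 50.0     # of the subsets

locs = np.array([s.mean() for s in subsets])
vars_ = np.array([1.0 / (m * len(s)) for s in subsets])

def kl_gauss(mu, var, mu_j, var_j):
    # KL( N(mu, var) || N(mu_j, var_j) ) in closed form; vectorized over j.
    return 0.5 * (np.log(var_j / var) + (var + (mu - mu_j) ** 2) / var_j - 1.0)

def median_kl(params):
    mu, log_var = params
    return np.median(kl_gauss(mu, np.exp(log_var), locs, vars_))

# Nelder-Mead copes with the nonsmooth median objective.
res = minimize(median_kl, x0=[np.median(locs), np.log(vars_.mean())],
               method="Nelder-Mead")
print("aggregated posterior mean:", res.x[0])   # stays near 2.0
```

With two of ten subsets grossly contaminated, the median objective effectively ignores them, whereas minimizing the mean KL divergence would be dragged toward the corrupted local posteriors.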
Related papers
- Minimax optimal transfer learning for high-dimensional additive regression
We first introduce a target-only estimation procedure based on the smooth backfitting estimator with local linear smoothing.
We then develop a novel two-stage estimation method within a transfer learning framework, and provide theoretical guarantees at both the population and empirical levels.
arXiv Detail & Related papers (2025-09-08T03:16:05Z)
- Bayesian Multivariate Density-Density Regression
We introduce a novel and scalable Bayesian framework for multivariate density-density regression (DDR).
Our approach addresses the critical issue of distributions residing in spaces of differing dimensions.
We show that Bayesian DDR provides robust fits, superior predictive performance compared to traditional methods, and valuable insights into complex biological interactions.
arXiv Detail & Related papers (2025-04-17T03:46:03Z)
- Robust and Scalable Variational Bayes
We propose a robust framework for variational Bayes (VB) that effectively handles outliers and contamination of arbitrary nature in large datasets.
Our approach divides the dataset into disjoint subsets, computes the posterior for each subset, and applies VB approximation independently to these posteriors.
This novel aggregation method yields the Variational Median Posterior (VM-Posterior) distribution.
arXiv Detail & Related papers (2025-04-16T23:20:43Z)
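As a loose illustration of median-based aggregation in the spirit of the VM-Posterior entry above: a geometric median of subset posterior summaries can be computed with Weiszfeld iterations. The paper takes a median of distributions in an appropriate metric on measures; this sketch only medians the subset posterior means, a simplifying assumption.

```python
import numpy as np

def geometric_median(points, iters=200, eps=1e-9):
    """Weiszfeld iterations for the geometric median of the rows of `points`."""
    z = points.mean(axis=0)
    for _ in range(iters):
        d = np.maximum(np.linalg.norm(points - z, axis=1), eps)  # avoid /0
        w = 1.0 / d
        z_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(z_new - z) < eps:
            break
        z = z_new
    return z

# Toy stand-in: median-aggregate ten subset posterior means, two corrupted.
rng = np.random.default_rng(1)
means = rng.normal(0.0, 0.05, size=(10, 3))
means[:2] += 25.0
print(geometric_median(means))   # remains near the origin
```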
- Variance Reduction for the Independent Metropolis Sampler
We prove that if $\pi$ is close enough under KL divergence to another density $q$, an independent sampler that obtains samples from $q$ achieves smaller variance than i.i.d. sampling from $\pi$.
We propose an adaptive independent Metropolis algorithm that adapts the proposal density so that its KL divergence to the target is reduced.
arXiv Detail & Related papers (2024-06-25T16:38:53Z)
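For reference alongside the entry above, a minimal sketch of the standard independent Metropolis-Hastings step, where proposals come from a fixed density $q$ and the acceptance ratio reduces to a ratio of importance weights $w = \pi/q$. The adaptive update of $q$ studied in the paper is not reproduced; the particular target and proposal below are placeholders.

```python
import numpy as np
from scipy import stats

def independent_metropolis(log_target, proposal, n_steps, rng):
    """Independent M-H: proposals come from a fixed density q, not the state."""
    x = proposal.rvs(random_state=rng)
    lw = log_target(x) - proposal.logpdf(x)   # log importance weight of x
    chain = np.empty(n_steps)
    for t in range(n_steps):
        y = proposal.rvs(random_state=rng)
        lw_y = log_target(y) - proposal.logpdf(y)
        # Accept with probability min(1, w(y) / w(x)).
        if np.log(rng.uniform()) < lw_y - lw:
            x, lw = y, lw_y
        chain[t] = x
    return chain

rng = np.random.default_rng(2)
target = stats.norm(0.0, 1.0)       # pi
proposal = stats.norm(0.5, 1.3)     # q, deliberately close to pi
chain = independent_metropolis(target.logpdf, proposal, 20_000, rng)
print(chain.mean(), chain.var())    # approx. 0 and 1
```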
- Statistical Estimation Under Distribution Shift: Wasserstein Perturbations and Minimax Theory
We focus on Wasserstein distribution shifts, where every data point may undergo a slight perturbation.
We consider perturbations that are either independent or coordinated joint shifts across data points.
We analyze several important statistical problems, including location estimation, linear regression, and non-parametric density estimation.
arXiv Detail & Related papers (2023-08-03T16:19:40Z)
- Optimal Scaling for Locally Balanced Proposals in Discrete Spaces
We show that the efficiency of Metropolis-Hastings (M-H) algorithms in discrete spaces can be characterized by an acceptance rate that is independent of the target distribution.
Knowledge of the optimal acceptance rate allows one to automatically tune the neighborhood size of a proposal distribution in a discrete space, directly analogous to step-size control in continuous spaces.
arXiv Detail & Related papers (2022-09-16T22:09:53Z)
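A sketch of the tuning idea from the entry above: adjust the neighborhood size of a proposal on the integers so the empirical acceptance rate approaches a target, via a Robbins-Monro update. The target rate, the toy target distribution, and the plain symmetric random-walk proposal (not a locally balanced one) are all illustrative assumptions, not the paper's derivation.

```python
import numpy as np

rng = np.random.default_rng(3)

def log_pi(k):
    return -abs(k) / 5.0    # unnormalized discrete target on the integers

target_rate = 0.574         # illustrative target acceptance rate
r = 1.0                     # continuous "radius" state for adaptation

x = 0
for t in range(1, 20_001):
    radius = max(1, int(round(r)))
    # Symmetric uniform proposal on {x - radius, ..., x + radius} \ {x}.
    y = x + int(rng.integers(1, radius + 1)) * int(rng.choice([-1, 1]))
    accept = np.log(rng.uniform()) < log_pi(y) - log_pi(x)
    if accept:
        x = y
    # Robbins-Monro: widen the neighborhood when accepting too often.
    r = max(1.0, r + (float(accept) - target_rate) / np.sqrt(t))
print("final neighborhood radius:", max(1, int(round(r))))
```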
- A Unified Framework for Multi-distribution Density Ratio Estimation
Binary density ratio estimation (DRE) provides the foundation for many state-of-the-art machine learning algorithms.
We develop a general framework from the perspective of Bregman divergence minimization.
We show that our framework leads to methods that strictly generalize their counterparts in binary DRE.
arXiv Detail & Related papers (2021-12-07T01:23:20Z)
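As context for the entry above, a sketch of the standard binary DRE baseline it generalizes: the classifier-based density-ratio trick, where with equal sample sizes $p(x)/q(x) = P(y{=}1 \mid x) / P(y{=}0 \mid x)$. The quadratic features and the scikit-learn classifier are illustrative choices, not the paper's Bregman framework.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
xp = rng.normal(0.0, 1.0, size=(4000, 1))   # samples from p
xq = rng.normal(1.0, 1.5, size=(4000, 1))   # samples from q

# Label p-samples 1 and q-samples 0, then fit a probabilistic classifier.
X = np.vstack([xp, xq])
X = np.hstack([X, X ** 2])    # quadratic features suffice for Gaussian pairs
y = np.concatenate([np.ones(len(xp)), np.zeros(len(xq))])
clf = LogisticRegression(C=10.0).fit(X, y)

def ratio(x):
    """Estimated p(x)/q(x); valid because the two classes are equally sized."""
    feats = np.column_stack([x, x ** 2])
    proba = clf.predict_proba(feats)
    return proba[:, 1] / proba[:, 0]

print(ratio(np.array([-1.0, 0.0, 1.0])))
```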
- Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
arXiv Detail & Related papers (2021-10-20T02:25:25Z)
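A minimal contrast of the two sampling schemes from the entry above on a least-squares toy problem: Random Reshuffling draws indices without replacement each epoch, while vanilla SGD samples with replacement. The model, step size, and epoch count are arbitrary illustrative choices; the paper's minibatch/local variants and synchronized shuffling are not covered here.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(256, 10))
b = A @ rng.normal(size=10) + 0.01 * rng.normal(size=256)

def sgd(without_replacement, epochs=50, lr=0.01):
    """Single-machine SGD on least squares, one sample per step."""
    w = np.zeros(A.shape[1])
    n = len(b)
    for _ in range(epochs):
        idx = (rng.permutation(n) if without_replacement   # Random Reshuffling
               else rng.integers(0, n, size=n))            # with replacement
        for i in idx:
            w -= lr * (A[i] @ w - b[i]) * A[i]             # per-sample gradient
    return w

for flag in (True, False):
    print("shuffled:", flag, "residual:", float(np.linalg.norm(A @ sgd(flag) - b)))
```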
- Variational Refinement for Importance Sampling Using the Forward Kullback-Leibler Divergence
Variational Inference (VI) is a popular alternative to exact sampling in Bayesian inference.
Importance sampling (IS) is often used to fine-tune and de-bias the estimates of approximate Bayesian inference procedures.
We propose a novel combination of optimization and sampling techniques for approximate Bayesian inference.
arXiv Detail & Related papers (2021-06-30T11:00:24Z)
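A sketch of the generic IS-based de-biasing mentioned in the entry above: draw from a crude variational fit $q$, then self-normalize the importance weights $w \propto \pi/q$ to correct estimates under the target. The specific densities are placeholders, and this is plain self-normalized importance sampling rather than the paper's refinement procedure.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
log_target = stats.norm(1.0, 2.0).logpdf   # stand-in for an intractable posterior
q = stats.norm(0.6, 2.5)                   # crude variational fit, over-dispersed
                                           # on purpose so weights stay bounded

# Self-normalized importance sampling: reweight draws from q toward the target.
xs = q.rvs(size=50_000, random_state=rng)
lw = log_target(xs) - q.logpdf(xs)         # log importance weights
w = np.exp(lw - lw.max())
w /= w.sum()

print("reweighted mean:", float((w * xs).sum()))       # approx. 1.0
print("ESS fraction:", float(1.0 / (w ** 2).sum() / len(xs)))
```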
- Minimax Optimal Estimation of KL Divergence for Continuous Distributions
Estimating Kullback-Leibler divergence from independent and identically distributed samples is an important problem in various domains.
One simple and effective estimator is based on the $k$-nearest-neighbor distances between these samples.
arXiv Detail & Related papers (2020-02-26T16:37:37Z)
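A sketch of the $k$-nearest-neighbor KL estimator discussed in the entry above, in the style of the classical Wang-Kulkarni-Verdu construction: compare each point's $k$-NN distance within its own sample to its $k$-NN distance in the other sample. The constant-correction form below is the standard one and may differ in details from the paper's minimax-optimal variant.

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_kl(x, y, k=5):
    """k-NN estimate of KL(p || q) from x ~ p (n, d) and y ~ q (m, d); k >= 2."""
    n, d = x.shape
    m = y.shape[0]
    # k-th neighbor within x itself; query k + 1 because each point is its
    # own nearest neighbor at distance zero.
    rho = cKDTree(x).query(x, k + 1)[0][:, -1]
    # k-th neighbor of each x-point among the y sample.
    nu = cKDTree(y).query(x, k)[0][:, -1]
    return d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))

rng = np.random.default_rng(7)
x = rng.normal(0.0, 1.0, size=(5000, 1))
y = rng.normal(1.0, 1.0, size=(5000, 1))
print(knn_kl(x, y))   # true KL( N(0,1) || N(1,1) ) = 0.5
```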
- Distributionally Robust Bayesian Quadrature Optimization
We study Bayesian quadrature optimization (BQO) under distributional uncertainty in which the underlying probability distribution is unknown except for a limited set of its i.i.d. samples.
A standard BQO approach maximizes the Monte Carlo estimate of the true expected objective given the fixed sample set.
We propose a novel posterior-sampling-based algorithm, namely distributionally robust BQO (DRBQO), for this purpose.
arXiv Detail & Related papers (2020-01-19T12:00:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.