Unbiased Gradient Estimation for Distributionally Robust Learning
- URL: http://arxiv.org/abs/2012.12367v1
- Date: Tue, 22 Dec 2020 21:35:03 GMT
- Title: Unbiased Gradient Estimation for Distributionally Robust Learning
- Authors: Soumyadip Ghosh and Mark Squillante
- Abstract summary: We consider a new approach based on distributionally robust learning (DRL) that applies stochastic gradient descent to the outer minimization problem.
Our algorithm efficiently estimates the gradient of the inner maximization problem through multi-level Monte Carlo randomization.
- Score: 2.1777837784979277
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Seeking to improve model generalization, we consider a new approach based on
distributionally robust learning (DRL) that applies stochastic gradient descent
to the outer minimization problem. Our algorithm efficiently estimates the
gradient of the inner maximization problem through multi-level Monte Carlo
randomization. Leveraging theoretical results that shed light on why standard
gradient estimators fail, we establish the optimal parameterization of the
gradient estimators of our approach that balances a fundamental tradeoff
between computation time and statistical variance. Numerical experiments
demonstrate that our DRL approach yields significant benefits over previous
work.
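The multi-level Monte Carlo randomization described in the abstract can be illustrated with a single-term randomized estimator in the Rhee-Glynn debiasing style. Everything below is an illustrative assumption, not the authors' construction: the toy `noisy_grad` stands in for a finite-sample estimate of the inner-maximization gradient, the level distribution is geometric with parameter `p`, and `max_level` is a practical cap. The parameter `p` is precisely where the computation/variance tradeoff mentioned above appears: sampling deep (expensive) levels more often raises cost, while sampling them too rarely inflates the reweighting variance.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_grad(theta, xs):
    # Toy stand-in for a batch estimate of the inner-problem gradient;
    # its expectation here is simply theta at every batch size.
    return theta + xs.mean()

def mlmc_gradient(theta, p=0.25, max_level=16):
    # Single-term randomized multi-level Monte Carlo estimator
    # (hypothetical sketch): draw level N with P(N=n) = p*(1-p)**n,
    # form a coupled fine/coarse difference at that level, and reweight
    # by 1/P(N=n).  The expectation then telescopes to the
    # infinite-sample gradient, removing fixed-sample-size bias.
    n = min(int(rng.geometric(p)) - 1, max_level)  # cap for practicality
    prob = p * (1.0 - p) ** n
    xs = rng.standard_normal(2 ** (n + 1))     # shared noise couples levels
    fine = noisy_grad(theta, xs)               # 2^(n+1)-sample estimate
    coarse = noisy_grad(theta, xs[: 2 ** n])   # 2^n-sample estimate
    base = noisy_grad(theta, rng.standard_normal(1))
    return base + (fine - coarse) / prob
```

With `p = 0.25` the level differences decay fast enough that the reweighted correction has finite variance; averaging many draws recovers the target gradient without the systematic bias of a single fixed-size batch.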
Related papers
- Stein-Rule Shrinkage for Stochastic Gradient Estimation in High Dimensions [0.0]
Stochastic gradient methods are central to large-scale learning, but they treat mini-batch gradients as unbiased estimators, which classical decision theory shows are inadmissible in high dimensions. We introduce a framework based on Stein-rule shrinkage and construct a gradient estimator that adaptively contracts mini-batch gradients toward a stable estimator derived from historical momentum. Empirical evaluations on CIFAR10 and CIFAR100 show consistent improvements over Adam in the large-batch regime.
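The contraction described above can be sketched with a positive-part James-Stein rule; this is a generic illustration of the shrinkage idea, not the paper's exact estimator, and `sigma2` (an estimate of the per-coordinate gradient noise variance) is an assumed input.

```python
import numpy as np

def stein_shrunk_gradient(grad, momentum, sigma2, eps=1e-12):
    # Positive-part James-Stein-style shrinkage (illustrative sketch):
    # contract the noisy mini-batch gradient toward a momentum-based
    # target, shrinking more aggressively when the estimated noise
    # variance sigma2 is large relative to how far the gradient sits
    # from the target.
    d = grad.size
    diff = grad - momentum
    shrink = max(0.0, 1.0 - (d - 2) * sigma2 / float(diff @ diff + eps))
    return momentum + shrink * diff
```

In the noiseless limit the rule returns the raw gradient unchanged; as the noise estimate grows, the update collapses toward the historical momentum, which is the stabilizing behaviour the large-batch experiments above exploit.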
arXiv Detail & Related papers (2026-02-02T08:01:13Z) - Efficient Stochastic Optimisation via Sequential Monte Carlo [0.5599792629509229]
We develop sequential Monte Carlo samplers for optimisation of functions with intractable gradients. Our approach replaces expensive inner sampling methods with efficient SMC approximations, which can result in significant computational gains. We demonstrate the effectiveness of our approach on the reward-tuning of energy-based models within various settings.
arXiv Detail & Related papers (2026-01-29T17:13:25Z) - Prior-Informed Zeroth-Order Optimization with Adaptive Direction Alignment for Memory-Efficient LLM Fine-Tuning [4.278794376089146]
We propose a plug-and-play method that incorporates prior-informed perturbations to refine gradient estimation. Our method significantly accelerates convergence compared to standard ZO approaches. We prove that our gradient estimator achieves stronger alignment with the true gradient direction.
arXiv Detail & Related papers (2026-01-08T08:27:15Z) - On the Optimal Construction of Unbiased Gradient Estimators for Zeroth-Order Optimization [57.179679246370114]
A potential limitation of existing methods is the bias inherent in most perturbation-based estimators unless the stepsize is driven to zero. We propose a novel family of unbiased gradient scaling estimators that eliminates this bias while maintaining a favourable construction.
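The bias that motivates the entry above is easy to exhibit with the standard two-point zeroth-order estimator; the cubic test function and all names below are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def zo_forward_grad(f, x, mu, n_dirs):
    # Standard two-point zeroth-order estimator with Gaussian directions.
    # Its expectation is the gradient of the mu-smoothed surrogate of f,
    # not of f itself, so any fixed stepsize mu leaves a systematic
    # bias -- the limitation unbiased constructions aim to remove.
    g = np.zeros_like(x)
    for _ in range(n_dirs):
        u = rng.standard_normal(x.shape)
        g += (f(x + mu * u) - f(x)) / mu * u
    return g / n_dirs
```

For `f(x) = x**3` at `x = 1` the true gradient is 3, but the smoothed surrogate has gradient `3 + 3*mu**2`, so with `mu = 0.5` the estimator concentrates around 3.75 rather than 3 no matter how many directions are averaged.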
arXiv Detail & Related papers (2025-10-22T18:25:43Z) - Uncertainty quantification for Markov chain induced martingales with application to temporal difference learning [55.197497603087065]
We analyze the performance of the Temporal Difference (TD) learning algorithm with linear function approximations. We establish novel and general high-dimensional concentration inequalities and Berry-Esseen bounds for vector-valued martingales induced by Markov chains.
arXiv Detail & Related papers (2025-02-19T15:33:55Z) - Eliminating Ratio Bias for Gradient-based Simulated Parameter Estimation [0.7673339435080445]
This article addresses the challenge of parameter calibration in models where the likelihood function is not analytically available.
We propose a gradient-based simulated parameter estimation framework, leveraging a multi-time scale that tackles the issue of ratio bias in both maximum likelihood estimation and posterior density estimation problems.
arXiv Detail & Related papers (2024-11-20T02:46:15Z) - Differentially Private Optimization with Sparse Gradients [60.853074897282625]
We study differentially private (DP) optimization problems under sparsity of individual gradients.
Building on this, we obtain pure- and approximate-DP algorithms with almost optimal rates for convex optimization with sparse gradients.
arXiv Detail & Related papers (2024-04-16T20:01:10Z) - Non-asymptotic Analysis of Biased Adaptive Stochastic Approximation [3.328448170090945]
Stochastic Gradient Descent (SGD) with adaptive steps is widely used to train deep neural networks and generative models.
This paper provides a comprehensive analysis of the effect of bias in gradient estimates.
arXiv Detail & Related papers (2024-02-05T10:17:36Z) - Model-Based Reparameterization Policy Gradient Methods: Theory and
Practical Algorithms [88.74308282658133]
Reparameterization (RP) Policy Gradient Methods (PGMs) have been widely adopted for continuous control tasks in robotics and computer graphics.
Recent studies have revealed that, when applied to long-term reinforcement learning problems, model-based RP PGMs may experience chaotic and non-smooth optimization landscapes.
We propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls.
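The spectral normalization mentioned above can be sketched generically with power iteration; this shows the standard idea of constraining a layer's Lipschitz constant, not the paper's exact variance-control procedure.

```python
import numpy as np

def spectral_normalize(W, n_iters=50):
    # Divide a weight matrix by its largest singular value, estimated by
    # power iteration, so the corresponding linear map is approximately
    # 1-Lipschitz.  Bounding each layer this way keeps long model
    # unrolls from compounding into exploding gradient variance.
    rng = np.random.default_rng(0)
    u = rng.standard_normal(W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v          # Rayleigh estimate of the top singular value
    return W / sigma
```

After normalization the matrix's spectral norm is 1 up to power-iteration error, so a product of many such layers cannot amplify perturbations geometrically.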
arXiv Detail & Related papers (2023-10-30T18:43:21Z) - Scalable method for Bayesian experimental design without integrating
over posterior distribution [0.0]
We address the computational efficiency in solving the A-optimal Bayesian design of experiments problems.
A-optimality is a widely used and easy-to-interpret criterion for Bayesian experimental design.
This study presents a novel likelihood-free approach to the A-optimal experimental design.
arXiv Detail & Related papers (2023-06-30T12:40:43Z) - Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for BCE is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z) - Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped gradient descent and provide an improved analysis under a more nuanced condition on the noise of gradients.
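The clipped-gradient idea above can be illustrated on the simplest streaming problem, mean estimation under heavy-tailed noise; the squared loss, step-size schedule, and clipping threshold below are illustrative choices, not the paper's algorithm.

```python
import numpy as np

def clipped_streaming_mean(stream, lr=0.1, clip=2.0):
    # Streaming mean estimation by clipped SGD on the squared loss
    # 0.5*(theta - x)**2: clipping each gradient bounds the influence
    # of extreme samples, which is what keeps the estimate stable when
    # the noise distribution has heavy tails (even infinite variance).
    theta = 0.0
    for t, x in enumerate(stream, start=1):
        g = np.clip(theta - x, -clip, clip)   # clipped gradient
        theta -= (lr / np.sqrt(t)) * g        # decaying step size
    return theta
```

Without clipping, a single extreme sample from a heavy-tailed distribution can throw the iterate arbitrarily far; with it, each sample moves the estimate by at most `lr * clip / sqrt(t)`.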
arXiv Detail & Related papers (2021-08-25T21:30:27Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - Tighter Bounds on the Log Marginal Likelihood of Gaussian Process
Regression Using Conjugate Gradients [19.772149500352945]
We show that approximate maximum likelihood learning of model parameters by maximising our lower bound retains many of the sparse variational approach benefits.
In experiments, we show improved predictive performance with our model for a comparable amount of training time compared to other conjugate gradient based approaches.
arXiv Detail & Related papers (2021-02-16T17:54:59Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z) - Efficient Debiased Evidence Estimation by Multilevel Monte Carlo
Sampling [0.0]
We propose a new optimization algorithm for Bayesian inference based on multilevel Monte Carlo (MLMC) methods.
Our numerical results confirm considerable computational savings compared to the conventional estimators.
arXiv Detail & Related papers (2020-01-14T09:14:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences of its use.