Resampling Stochastic Gradient Descent Cheaply for Efficient Uncertainty
Quantification
- URL: http://arxiv.org/abs/2310.11065v1
- Date: Tue, 17 Oct 2023 08:18:10 GMT
- Title: Resampling Stochastic Gradient Descent Cheaply for Efficient Uncertainty
Quantification
- Authors: Henry Lam, Zitong Wang
- Abstract summary: We investigate two computationally cheap resampling-based methods to construct confidence intervals for SGD solutions.
One uses multiple, but few, SGDs in parallel via resampling with replacement from the data, and another operates this in an online fashion.
- Score: 6.832893024035705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stochastic gradient descent (SGD) or stochastic approximation has been widely
used in model training and stochastic optimization. While there is a huge
literature on analyzing its convergence, inference on the obtained solutions
from SGD has only been recently studied, yet is important due to the growing
need for uncertainty quantification. We investigate two computationally cheap
resampling-based methods to construct confidence intervals for SGD solutions.
One uses multiple, but few, SGDs in parallel via resampling with replacement
from the data, and another operates this in an online fashion. Our methods can
be regarded as enhancements of established bootstrap schemes to substantially
reduce the computation effort in terms of resampling requirements, while at the
same time bypassing the intricate mixing conditions in existing batching
methods. We achieve these via a recent so-called cheap bootstrap idea and
Berry-Esseen-type bound for SGD.
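As a rough illustration of the first scheme, the sketch below runs one SGD on the original data plus a handful of SGDs on with-replacement resamples and combines them through a t-quantile, in the spirit of the cheap bootstrap; the toy mean-estimation model, step-size schedule, and choice of B are illustrative assumptions, not the paper's exact procedure or guarantees.

```python
# Minimal sketch: cheap-bootstrap-style confidence interval around an SGD
# solution, for a 1-D least-squares (mean estimation) problem.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def sgd_mean(data, n_steps=2000, lr0=1.0):
    """Plain SGD for argmin_theta E[(theta - X)^2], i.e. the mean of X."""
    theta = 0.0
    for t in range(1, n_steps + 1):
        x = data[rng.integers(len(data))]
        grad = 2.0 * (theta - x)
        theta -= (lr0 / t**0.6) * grad   # Robbins-Monro-type step size
    return theta

data = rng.normal(loc=3.0, scale=2.0, size=5000)

# One SGD run on the original data plus only B runs on resampled data.
B, alpha = 4, 0.05
psi_hat = sgd_mean(data)
psi_star = np.array([
    sgd_mean(rng.choice(data, size=len(data), replace=True)) for _ in range(B)
])

# The interval uses a t_B quantile, so only a few resamples are needed.
s = np.sqrt(np.mean((psi_star - psi_hat) ** 2))
half_width = stats.t.ppf(1 - alpha / 2, df=B) * s
print(f"{1-alpha:.0%} CI for the solution: "
      f"[{psi_hat - half_width:.3f}, {psi_hat + half_width:.3f}]")
```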
Related papers
- Scalable Signature-Based Distribution Regression via Reference Sets [1.8980236415886387]
Path signatures are used to leverage the information encoded in paths via signature-based features.
Current state-of-the-art distribution regression (DR) solutions are memory-intensive and incur a high computation cost.
This computational bottleneck limits the application to small sample sizes.
We present a methodology that addresses these issues and resolves the associated estimation uncertainties.
We also propose a pipeline that enables us to use DR for a wide variety of learning tasks.
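As a rough illustration of signature-based features, the sketch below computes the level-1 and level-2 signature terms of a piecewise-linear path; the truncation depth and toy data are illustrative assumptions and do not reflect the paper's reference-set construction.

```python
# Minimal sketch: low-order path-signature features of a d-dimensional path
# sampled at discrete times (piecewise-linear interpolation).
import numpy as np

def signature_level_1_2(path):
    """Level-1 and level-2 signature terms of an (n+1, d) array of points."""
    increments = np.diff(path, axis=0)                   # (n, d) segment increments
    s1 = increments.sum(axis=0)                          # level 1: path[-1] - path[0]
    # Level 2: S2[i, j] = integral of (x^i_t - x^i_0) dx^j_t, computed exactly
    # for the piecewise-linear path.
    prefix = np.cumsum(increments, axis=0) - increments  # x_{k-1} - x_0 per segment
    s2 = prefix.T @ increments + 0.5 * increments.T @ increments
    return np.concatenate([s1, s2.ravel()])

path = np.cumsum(np.random.default_rng(0).normal(size=(100, 2)), axis=0)
features = signature_level_1_2(path)   # d + d^2 = 6 features for d = 2
print(features.shape)
```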
arXiv Detail & Related papers (2024-10-11T18:58:28Z) - CoLiDE: Concomitant Linear DAG Estimation [12.415463205960156]
We deal with the problem of learning the directed acyclic graph (DAG) structure underlying a linear structural equation model from observational data.
We propose a new convex score function for sparsity-aware learning of DAGs.
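For orientation, the sketch below writes down a generic sparsity-aware score for linear-SEM DAG learning in the NOTEARS style (least squares plus an L1 penalty and a smooth acyclicity term); this is a common reference formulation, not CoLiDE's concomitant, scale-adaptive score.

```python
# Minimal sketch: a generic sparsity-aware score for linear-SEM DAG learning.
import numpy as np
from scipy.linalg import expm

def score(W, X, lam=0.1, rho=10.0):
    """X: (n, d) data matrix; W: (d, d) weighted adjacency of a candidate graph."""
    n, d = X.shape
    residual = X - X @ W
    fit = 0.5 / n * np.sum(residual ** 2)        # least-squares data fit
    sparsity = lam * np.abs(W).sum()             # L1 term promotes few edges
    acyclicity = np.trace(expm(W * W)) - d       # = 0 exactly when W encodes a DAG
    return fit + sparsity + rho * acyclicity

X = np.random.default_rng(0).normal(size=(200, 5))
print(score(np.zeros((5, 5)), X))                # score of the empty graph
```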
arXiv Detail & Related papers (2023-10-04T15:32:27Z) - SketchySGD: Reliable Stochastic Optimization via Randomized Curvature
Estimates [19.420605210427635]
SketchySGD improves upon existing gradient methods in machine learning by using randomized low-rank approximations to the subsampled Hessian.
We show theoretically that SketchySGD with a fixed stepsize converges linearly to a small ball around the optimum.
In the ill-conditioned setting, we show that SketchySGD converges at a faster rate than SGD for least-squares problems.
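A rough sketch of the underlying idea: precondition SGD with a randomized Nystrom (low-rank) approximation of a subsampled Hessian on a least-squares problem. The rank, batch sizes, regularization, and step size below are illustrative heuristics, not the paper's tuned defaults.

```python
# Minimal sketch: SGD preconditioned by a randomized low-rank approximation
# of a subsampled Hessian, for least squares.
import numpy as np

rng = np.random.default_rng(0)
n, d, rank = 2000, 50, 10
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# Subsampled Hessian of f(x) = 0.5/n * ||Ax - b||^2 on a Hessian batch S.
S = rng.choice(n, size=200, replace=False)
H_S = A[S].T @ A[S] / len(S)

# Randomized Nystrom approximation  H_S ~= U diag(lams) U^T  of the given rank.
Omega = np.linalg.qr(rng.normal(size=(d, rank)))[0]
Y = H_S @ Omega
U, lams, _ = np.linalg.svd(Y @ np.linalg.pinv(Omega.T @ Y) @ Y.T)
U, lams = U[:, :rank], lams[:rank]
rho = lams[-1]                        # regularization: smallest kept eigenvalue

x, lr = np.zeros(d), 0.5
for _ in range(500):
    batch = rng.choice(n, size=32, replace=False)
    g = A[batch].T @ (A[batch] @ x - b[batch]) / len(batch)   # stochastic gradient
    # Apply (U diag(lams) U^T + rho * I)^{-1} to g via the low-rank factors.
    coef = U.T @ g
    x -= lr * (U @ (coef / (lams + rho)) + (g - U @ coef) / rho)
print("residual RMS:", np.linalg.norm(A @ x - b) / np.sqrt(n))
```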
arXiv Detail & Related papers (2022-11-16T01:05:41Z) - Log Barriers for Safe Black-box Optimization with Application to Safe
Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing constraint violations in policy optimization tasks in safe reinforcement learning.
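A rough sketch of the kind of log-barrier step such methods build on: descend the surrogate f(x) - eta * log(-h(x)) while capping the step size so the iterate stays strictly feasible. The toy problem, eta, and the particular step-size cap below are illustrative choices, not the paper's exact rule.

```python
# Minimal sketch: log-barrier SGD step with a feasibility-aware step-size cap.
import numpy as np

rng = np.random.default_rng(0)
target, eta, lr = np.array([2.0, 0.0]), 0.1, 0.05

f_grad = lambda x: 2 * (x - target) + 0.1 * rng.normal(size=2)  # noisy objective gradient
h = lambda x: np.sum(x**2) - 1.0          # constraint h(x) <= 0 (stay in the unit ball)
h_grad = lambda x: 2 * x

x = np.zeros(2)
for _ in range(300):
    slack = -h(x)                                       # distance to the boundary, > 0
    g = f_grad(x) + eta * h_grad(x) / slack             # gradient of the barrier surrogate
    # Heuristic cap: never move further than roughly half the current slack allows.
    safe = 0.5 * slack / (np.linalg.norm(h_grad(x)) * np.linalg.norm(g) + 1e-12)
    x -= min(lr, safe) * g
print(x, "feasible:", h(x) < 0)                         # pushed toward, but inside, the boundary
```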
arXiv Detail & Related papers (2022-07-21T11:14:47Z) - Faster One-Sample Stochastic Conditional Gradient Method for Composite
Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
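As a rough illustration, the sketch below drives a Frank-Wolfe (conditional gradient) loop with a SAG-style average-gradient table that touches one sample per iteration; the L1-ball constraint, least-squares losses, and step rule are illustrative simplifications, not the paper's composite setting or guarantees.

```python
# Minimal sketch: one-sample conditional gradient with a SAG-style estimator.
import numpy as np

rng = np.random.default_rng(0)
n, d, radius = 500, 20, 5.0
A = rng.normal(size=(n, d))
b = A @ (rng.normal(size=d) * (rng.random(d) < 0.3)) + 0.05 * rng.normal(size=n)

x = np.zeros(d)
grad_table = np.zeros((n, d))         # last seen per-sample gradients (SAG memory)
g_avg = np.zeros(d)

for t in range(1, 20001):
    i = rng.integers(n)                                  # one sample per iteration
    g_i = A[i] * (A[i] @ x - b[i])                       # gradient of 0.5*(a_i x - b_i)^2
    g_avg += (g_i - grad_table[i]) / n                   # refresh the running average
    grad_table[i] = g_i
    # Linear minimization oracle over the L1 ball of the given radius.
    s = np.zeros(d)
    j = np.argmax(np.abs(g_avg))
    s[j] = -radius * np.sign(g_avg[j])
    x += (2.0 / (t + 2)) * (s - x)                       # classic Frank-Wolfe step
print("train RMSE:", np.linalg.norm(A @ x - b) / np.sqrt(n))
```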
arXiv Detail & Related papers (2022-02-26T19:10:48Z) - Simple Stochastic and Online Gradient Descent Algorithms for Pairwise
Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances.
Online gradient descent (OGD) is a popular approach to handle streaming data in pairwise learning.
In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning.
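A rough sketch of online gradient descent for pairwise learning: each arriving example is paired with one randomly chosen earlier example and a gradient step is taken on the pairwise loss. The hinge ranking loss, pairing rule, and step size below are illustrative, not the paper's exact algorithms.

```python
# Minimal sketch: online gradient descent on a pairwise (AUC-style) hinge loss.
import numpy as np

rng = np.random.default_rng(0)
d, lr = 10, 0.05
w = np.zeros(d)
w_true = rng.normal(size=d)
seen = []                                        # buffer of past (x, y) examples

for t in range(2000):
    x = rng.normal(size=d)
    y = 1 if x @ w_true > 0 else -1              # stream of labelled examples
    if seen:
        x_old, y_old = seen[rng.integers(len(seen))]
        if y != y_old:                           # a positive/negative pair
            x_pos, x_neg = (x, x_old) if y == 1 else (x_old, x)
            if 1.0 - w @ (x_pos - x_neg) > 0:    # pairwise hinge loss is active
                w += lr * (x_pos - x_neg)        # gradient step on this pair
    seen.append((x, y))
print("alignment with true scorer:",
      w @ w_true / (np.linalg.norm(w) * np.linalg.norm(w_true)))
```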
arXiv Detail & Related papers (2021-11-23T18:10:48Z) - Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and
Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
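For concreteness, the sketch below contrasts with-replacement minibatch SGD against Random Reshuffling (without-replacement) on a least-squares problem; the model, step size, and epoch budget are illustrative and carry none of the paper's bounds.

```python
# Minimal sketch: with-replacement sampling vs Random Reshuffling in minibatch SGD.
import numpy as np

rng = np.random.default_rng(0)
n, d, batch, lr = 1024, 20, 32, 0.05
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
x_star = np.linalg.lstsq(A, b, rcond=None)[0]

def run(shuffling, epochs=30):
    x = np.zeros(d)
    for _ in range(epochs):
        if shuffling:                        # Random Reshuffling: a fresh permutation each
            order = rng.permutation(n)       # epoch, so every point is used exactly once
        else:                                # with-replacement sampling
            order = rng.integers(n, size=n)
        for start in range(0, n, batch):
            idx = order[start:start + batch]
            g = A[idx].T @ (A[idx] @ x - b[idx]) / len(idx)
            x -= lr * g
    return np.linalg.norm(x - x_star)

print("with replacement:", run(False), " random reshuffling:", run(True))
```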
arXiv Detail & Related papers (2021-10-20T02:25:25Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
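A rough sketch of the idea: run AIS with unadjusted Langevin transitions and no accept/reject correction, which keeps every operation differentiable at the cost of some bias. The 1-D Gaussian target, annealing schedule, and step size below are illustrative choices, not the paper's algorithm.

```python
# Minimal sketch: AIS with Metropolis-Hastings steps dropped (unadjusted Langevin).
import numpy as np

rng = np.random.default_rng(0)
K, n_chains, eps, sigma = 300, 2000, 0.05, 0.5
betas = np.linspace(0.0, 1.0, K + 1)

log_p0 = lambda x: -0.5 * x**2 - 0.5 * np.log(2 * np.pi)      # N(0, 1) prior
log_pt = lambda x: -0.5 * (x - 3.0) ** 2 / sigma**2           # unnormalized target
log_anneal = lambda x, b: (1 - b) * log_p0(x) + b * log_pt(x)
grad_anneal = lambda x, b: (1 - b) * (-x) + b * (-(x - 3.0) / sigma**2)

x = rng.normal(size=n_chains)                 # exact samples from the prior
log_w = np.zeros(n_chains)
for b_prev, b in zip(betas[:-1], betas[1:]):
    log_w += log_anneal(x, b) - log_anneal(x, b_prev)          # AIS weight update
    # Unadjusted Langevin transition (no accept/reject step).
    x += eps * grad_anneal(x, b) + np.sqrt(2 * eps) * rng.normal(size=n_chains)

log_Z = np.log(np.mean(np.exp(log_w - log_w.max()))) + log_w.max()
print("estimated log Z:", log_Z, " exact:", 0.5 * np.log(2 * np.pi * sigma**2))
```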
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - Stochastic Reweighted Gradient Descent [4.355567556995855]
We propose an importance-sampling-based algorithm we call SRG (stochastic reweighted gradient).
We pay particular attention to the time and memory overhead of our proposed method.
We present empirical results to support our findings.
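As a rough illustration of the reweighting involved, the sketch below runs importance-sampling SGD with the unbiased 1/(n p_i) correction, using fixed row-norm-based probabilities rather than SRG's adaptively maintained weights; the problem and step size are illustrative assumptions.

```python
# Minimal sketch: importance-sampling SGD with the unbiased 1/(n p_i) reweighting.
import numpy as np

rng = np.random.default_rng(0)
n, d, lr = 1000, 20, 0.01
A = rng.normal(size=(n, d)) * rng.lognormal(sigma=1.0, size=(n, 1))  # uneven rows
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

p = np.sum(A**2, axis=1)
p /= p.sum()                                    # sampling probabilities p_i

x = np.zeros(d)
for _ in range(20000):
    i = rng.choice(n, p=p)                      # importance-sampled index
    grad_i = A[i] * (A[i] @ x - b[i])           # gradient of 0.5*(a_i x - b_i)^2
    x -= lr * grad_i / (n * p[i])               # reweight so the estimate stays unbiased
print("RMSE:", np.linalg.norm(A @ x - b) / np.sqrt(n))
```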
arXiv Detail & Related papers (2021-03-23T04:09:43Z) - Second-order step-size tuning of SGD for non-convex optimization [6.021787236982659]
In view of a direct and simple improvement of vanilla SGD, this paper presents a fine-tuning of its step-sizes in the mini-batch case.
One obtains a new first-order gradient method (Step-Tuned SGD) which can be seen as a version of the classical Barzilai-Borwein method.
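A rough sketch of a Barzilai-Borwein-style step-size rule in the mini-batch setting, with the gradient difference evaluated on a shared batch and the resulting step clipped to a safe range; these choices are illustrative and differ from the paper's exact Step-Tuned SGD.

```python
# Minimal sketch: Barzilai-Borwein-type step-size tuning inside mini-batch SGD.
import numpy as np

rng = np.random.default_rng(0)
n, d, batch = 2000, 30, 64
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def minibatch_grad(x, idx):
    return A[idx].T @ (A[idx] @ x - b[idx]) / len(idx)

x_prev = np.zeros(d)
x = x_prev - 0.01 * minibatch_grad(x_prev, rng.choice(n, batch, replace=False))
lr = 0.01
for _ in range(500):
    idx = rng.choice(n, batch, replace=False)
    g, g_prev = minibatch_grad(x, idx), minibatch_grad(x_prev, idx)  # same batch twice
    s, y = x - x_prev, g - g_prev
    if y @ y > 0:
        lr = np.clip((s @ y) / (y @ y), 1e-4, 0.5)   # BB1 step, kept in a safe range
    x_prev, x = x, x - lr * g
print("RMSE:", np.linalg.norm(A @ x - b) / np.sqrt(n))
```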
arXiv Detail & Related papers (2021-03-05T10:01:48Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of large-batch training variants can be covered in a unified extrapolation framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.