Resampling Stochastic Gradient Descent Cheaply for Efficient Uncertainty
Quantification
- URL: http://arxiv.org/abs/2310.11065v1
- Date: Tue, 17 Oct 2023 08:18:10 GMT
- Title: Resampling Stochastic Gradient Descent Cheaply for Efficient Uncertainty
Quantification
- Authors: Henry Lam, Zitong Wang
- Abstract summary: We investigate two computationally cheap resampling-based methods to construct confidence intervals for SGD solutions.
One uses multiple, but few, SGDs in parallel via resampling with replacement from the data, and another operates this in an online fashion.
- Score: 6.832893024035705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stochastic gradient descent (SGD) or stochastic approximation has been widely
used in model training and stochastic optimization. While there is a huge
literature on analyzing its convergence, inference on the obtained solutions
from SGD has only been recently studied, yet is important due to the growing
need for uncertainty quantification. We investigate two computationally cheap
resampling-based methods to construct confidence intervals for SGD solutions.
One uses multiple, but few, SGDs in parallel via resampling with replacement
from the data, and another operates this in an online fashion. Our methods can
be regarded as enhancements of established bootstrap schemes to substantially
reduce the computation effort in terms of resampling requirements, while at the
same time bypassing the intricate mixing conditions in existing batching
methods. We achieve these via a recent so-called cheap bootstrap idea and
Berry-Esseen-type bound for SGD.
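As a rough illustration of the first scheme, the sketch below runs one SGD on the original data plus a handful of SGDs on with-replacement resamples and combines them through a t-quantile, in the spirit of the cheap bootstrap; the toy mean-estimation model, step-size schedule, and choice of B are illustrative assumptions, not the paper's exact procedure or guarantees.

```python
# Minimal sketch: cheap-bootstrap-style confidence interval around an SGD
# solution, for a 1-D least-squares (mean estimation) problem.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def sgd_mean(data, n_steps=2000, lr0=1.0):
    """Plain SGD for argmin_theta E[(theta - X)^2], i.e. the mean of X."""
    theta = 0.0
    for t in range(1, n_steps + 1):
        x = data[rng.integers(len(data))]
        grad = 2.0 * (theta - x)
        theta -= (lr0 / t**0.6) * grad   # Robbins-Monro-type step size
    return theta

data = rng.normal(loc=3.0, scale=2.0, size=5000)

# One SGD run on the original data plus only B runs on resampled data.
B, alpha = 4, 0.05
psi_hat = sgd_mean(data)
psi_star = np.array([
    sgd_mean(rng.choice(data, size=len(data), replace=True)) for _ in range(B)
])

# The interval uses a t_B quantile, so only a few resamples are needed.
s = np.sqrt(np.mean((psi_star - psi_hat) ** 2))
half_width = stats.t.ppf(1 - alpha / 2, df=B) * s
print(f"{1-alpha:.0%} CI for the solution: "
      f"[{psi_hat - half_width:.3f}, {psi_hat + half_width:.3f}]")
```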
Related papers
- Scalable Signature-Based Distribution Regression via Reference Sets [1.8980236415886387]
Path signatures are used to leverage the information encoded in paths via signature-based features.
Current state-of-the-art distribution regression (DR) solutions are memory-intensive and incur a high computation cost.
This computational bottleneck limits the application to small sample sizes.
We present a methodology that addresses these issues and resolves the associated estimation uncertainties.
We also propose a pipeline that enables us to use DR for a wide variety of learning tasks.
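As a rough illustration of signature-based features, the sketch below computes the level-1 and level-2 signature terms of a piecewise-linear path; the truncation depth and toy data are illustrative assumptions and do not reflect the paper's reference-set construction.

```python
# Minimal sketch: low-order path-signature features of a d-dimensional path
# sampled at discrete times (piecewise-linear interpolation).
import numpy as np

def signature_level_1_2(path):
    """Level-1 and level-2 signature terms of an (n+1, d) array of points."""
    increments = np.diff(path, axis=0)                   # (n, d) segment increments
    s1 = increments.sum(axis=0)                          # level 1: path[-1] - path[0]
    # Level 2: S2[i, j] = integral of (x^i_t - x^i_0) dx^j_t, computed exactly
    # for the piecewise-linear path.
    prefix = np.cumsum(increments, axis=0) - increments  # x_{k-1} - x_0 per segment
    s2 = prefix.T @ increments + 0.5 * increments.T @ increments
    return np.concatenate([s1, s2.ravel()])

path = np.cumsum(np.random.default_rng(0).normal(size=(100, 2)), axis=0)
features = signature_level_1_2(path)   # d + d^2 = 6 features for d = 2
print(features.shape)
```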
arXiv Detail & Related papers (2024-10-11T18:58:28Z) - CoLiDE: Concomitant Linear DAG Estimation [12.415463205960156]
We deal with the problem of learning the directed acyclic graph (DAG) structure underlying a linear structural equation model from observational data.
We propose a new convex score function for sparsity-aware learning of DAGs.
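For orientation, the sketch below writes down a generic sparsity-aware score for linear-SEM DAG learning in the NOTEARS style (least squares plus an L1 penalty and a smooth acyclicity term); this is a common reference formulation, not CoLiDE's concomitant, scale-adaptive score.

```python
# Minimal sketch: a generic sparsity-aware score for linear-SEM DAG learning.
import numpy as np
from scipy.linalg import expm

def score(W, X, lam=0.1, rho=10.0):
    """X: (n, d) data matrix; W: (d, d) weighted adjacency of a candidate graph."""
    n, d = X.shape
    residual = X - X @ W
    fit = 0.5 / n * np.sum(residual ** 2)        # least-squares data fit
    sparsity = lam * np.abs(W).sum()             # L1 term promotes few edges
    acyclicity = np.trace(expm(W * W)) - d       # = 0 exactly when W encodes a DAG
    return fit + sparsity + rho * acyclicity

X = np.random.default_rng(0).normal(size=(200, 5))
print(score(np.zeros((5, 5)), X))                # score of the empty graph
```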
arXiv Detail & Related papers (2023-10-04T15:32:27Z) - SketchySGD: Reliable Stochastic Optimization via Randomized Curvature
Estimates [19.420605210427635]
SketchySGD improves upon existing gradient methods in machine learning by using randomized low-rank approximations to the subsampled Hessian.
We show theoretically that SketchySGD with a fixed stepsize converges linearly to a small ball around the optimum.
In the ill-conditioned setting, we show that SketchySGD converges at a faster rate than SGD for least-squares problems.
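A rough sketch of the underlying idea: precondition SGD with a randomized Nystrom (low-rank) approximation of a subsampled Hessian on a least-squares problem. The rank, batch sizes, regularization, and step size below are illustrative heuristics, not the paper's tuned defaults.

```python
# Minimal sketch: SGD preconditioned by a randomized low-rank approximation
# of a subsampled Hessian, for least squares.
import numpy as np

rng = np.random.default_rng(0)
n, d, rank = 2000, 50, 10
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# Subsampled Hessian of f(x) = 0.5/n * ||Ax - b||^2 on a Hessian batch S.
S = rng.choice(n, size=200, replace=False)
H_S = A[S].T @ A[S] / len(S)

# Randomized Nystrom approximation  H_S ~= U diag(lams) U^T  of the given rank.
Omega = np.linalg.qr(rng.normal(size=(d, rank)))[0]
Y = H_S @ Omega
U, lams, _ = np.linalg.svd(Y @ np.linalg.pinv(Omega.T @ Y) @ Y.T)
U, lams = U[:, :rank], lams[:rank]
rho = lams[-1]                        # regularization: smallest kept eigenvalue

x, lr = np.zeros(d), 0.5
for _ in range(500):
    batch = rng.choice(n, size=32, replace=False)
    g = A[batch].T @ (A[batch] @ x - b[batch]) / len(batch)   # stochastic gradient
    # Apply (U diag(lams) U^T + rho * I)^{-1} to g via the low-rank factors.
    coef = U.T @ g
    x -= lr * (U @ (coef / (lams + rho)) + (g - U @ coef) / rho)
print("residual RMS:", np.linalg.norm(A @ x - b) / np.sqrt(n))
```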
arXiv Detail & Related papers (2022-11-16T01:05:41Z) - Log Barriers for Safe Black-box Optimization with Application to Safe
Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing constraint violations in policy optimization tasks in safe reinforcement learning.
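A rough sketch of the kind of log-barrier step such methods build on: descend the surrogate f(x) - eta * log(-h(x)) while capping the step size so the iterate stays strictly feasible. The toy problem, eta, and the particular step-size cap below are illustrative choices, not the paper's exact rule.

```python
# Minimal sketch: log-barrier SGD step with a feasibility-aware step-size cap.
import numpy as np

rng = np.random.default_rng(0)
target, eta, lr = np.array([2.0, 0.0]), 0.1, 0.05

f_grad = lambda x: 2 * (x - target) + 0.1 * rng.normal(size=2)  # noisy objective gradient
h = lambda x: np.sum(x**2) - 1.0          # constraint h(x) <= 0 (stay in the unit ball)
h_grad = lambda x: 2 * x

x = np.zeros(2)
for _ in range(300):
    slack = -h(x)                                       # distance to the boundary, > 0
    g = f_grad(x) + eta * h_grad(x) / slack             # gradient of the barrier surrogate
    # Heuristic cap: never move further than roughly half the current slack allows.
    safe = 0.5 * slack / (np.linalg.norm(h_grad(x)) * np.linalg.norm(g) + 1e-12)
    x -= min(lr, safe) * g
print(x, "feasible:", h(x) < 0)                         # pushed toward, but inside, the boundary
```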
arXiv Detail & Related papers (2022-07-21T11:14:47Z) - Faster One-Sample Stochastic Conditional Gradient Method for Composite
Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
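As a rough illustration, the sketch below drives a Frank-Wolfe (conditional gradient) loop with a SAG-style average-gradient table that touches one sample per iteration; the L1-ball constraint, least-squares losses, and step rule are illustrative simplifications, not the paper's composite setting or guarantees.

```python
# Minimal sketch: one-sample conditional gradient with a SAG-style estimator.
import numpy as np

rng = np.random.default_rng(0)
n, d, radius = 500, 20, 5.0
A = rng.normal(size=(n, d))
b = A @ (rng.normal(size=d) * (rng.random(d) < 0.3)) + 0.05 * rng.normal(size=n)

x = np.zeros(d)
grad_table = np.zeros((n, d))         # last seen per-sample gradients (SAG memory)
g_avg = np.zeros(d)

for t in range(1, 20001):
    i = rng.integers(n)                                  # one sample per iteration
    g_i = A[i] * (A[i] @ x - b[i])                       # gradient of 0.5*(a_i x - b_i)^2
    g_avg += (g_i - grad_table[i]) / n                   # refresh the running average
    grad_table[i] = g_i
    # Linear minimization oracle over the L1 ball of the given radius.
    s = np.zeros(d)
    j = np.argmax(np.abs(g_avg))
    s[j] = -radius * np.sign(g_avg[j])
    x += (2.0 / (t + 2)) * (s - x)                       # classic Frank-Wolfe step
print("train RMSE:", np.linalg.norm(A @ x - b) / np.sqrt(n))
```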
arXiv Detail & Related papers (2022-02-26T19:10:48Z) - Simple Stochastic and Online Gradient Descent Algorithms for Pairwise
Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances.
Online gradient descent (OGD) is a popular approach to handle streaming data in pairwise learning.
In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning.
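A rough sketch of online gradient descent for pairwise learning: each arriving example is paired with one randomly chosen earlier example and a gradient step is taken on the pairwise loss. The hinge ranking loss, pairing rule, and step size below are illustrative, not the paper's exact algorithms.

```python
# Minimal sketch: online gradient descent on a pairwise (AUC-style) hinge loss.
import numpy as np

rng = np.random.default_rng(0)
d, lr = 10, 0.05
w = np.zeros(d)
w_true = rng.normal(size=d)
seen = []                                        # buffer of past (x, y) examples

for t in range(2000):
    x = rng.normal(size=d)
    y = 1 if x @ w_true > 0 else -1              # stream of labelled examples
    if seen:
        x_old, y_old = seen[rng.integers(len(seen))]
        if y != y_old:                           # a positive/negative pair
            x_pos, x_neg = (x, x_old) if y == 1 else (x_old, x)
            if 1.0 - w @ (x_pos - x_neg) > 0:    # pairwise hinge loss is active
                w += lr * (x_pos - x_neg)        # gradient step on this pair
    seen.append((x, y))
print("alignment with true scorer:",
      w @ w_true / (np.linalg.norm(w) * np.linalg.norm(w_true)))
```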
arXiv Detail & Related papers (2021-11-23T18:10:48Z) - Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and
Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
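For concreteness, the sketch below contrasts with-replacement minibatch SGD against Random Reshuffling (without-replacement) on a least-squares problem; the model, step size, and epoch budget are illustrative and carry none of the paper's bounds.

```python
# Minimal sketch: with-replacement sampling vs Random Reshuffling in minibatch SGD.
import numpy as np

rng = np.random.default_rng(0)
n, d, batch, lr = 1024, 20, 32, 0.05
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
x_star = np.linalg.lstsq(A, b, rcond=None)[0]

def run(shuffling, epochs=30):
    x = np.zeros(d)
    for _ in range(epochs):
        if shuffling:                        # Random Reshuffling: a fresh permutation each
            order = rng.permutation(n)       # epoch, so every point is used exactly once
        else:                                # with-replacement sampling
            order = rng.integers(n, size=n)
        for start in range(0, n, batch):
            idx = order[start:start + batch]
            g = A[idx].T @ (A[idx] @ x - b[idx]) / len(idx)
            x -= lr * g
    return np.linalg.norm(x - x_star)

print("with replacement:", run(False), " random reshuffling:", run(True))
```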
arXiv Detail & Related papers (2021-10-20T02:25:25Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
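A rough sketch of the idea: run AIS with unadjusted Langevin transitions and no accept/reject correction, which keeps every operation differentiable at the cost of some bias. The 1-D Gaussian target, annealing schedule, and step size below are illustrative choices, not the paper's algorithm.

```python
# Minimal sketch: AIS with Metropolis-Hastings steps dropped (unadjusted Langevin).
import numpy as np

rng = np.random.default_rng(0)
K, n_chains, eps, sigma = 300, 2000, 0.05, 0.5
betas = np.linspace(0.0, 1.0, K + 1)

log_p0 = lambda x: -0.5 * x**2 - 0.5 * np.log(2 * np.pi)      # N(0, 1) prior
log_pt = lambda x: -0.5 * (x - 3.0) ** 2 / sigma**2           # unnormalized target
log_anneal = lambda x, b: (1 - b) * log_p0(x) + b * log_pt(x)
grad_anneal = lambda x, b: (1 - b) * (-x) + b * (-(x - 3.0) / sigma**2)

x = rng.normal(size=n_chains)                 # exact samples from the prior
log_w = np.zeros(n_chains)
for b_prev, b in zip(betas[:-1], betas[1:]):
    log_w += log_anneal(x, b) - log_anneal(x, b_prev)          # AIS weight update
    # Unadjusted Langevin transition (no accept/reject step).
    x += eps * grad_anneal(x, b) + np.sqrt(2 * eps) * rng.normal(size=n_chains)

log_Z = np.log(np.mean(np.exp(log_w - log_w.max()))) + log_w.max()
print("estimated log Z:", log_Z, " exact:", 0.5 * np.log(2 * np.pi * sigma**2))
```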
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - Stochastic Reweighted Gradient Descent [4.355567556995855]
We propose an importance-sampling-based algorithm we call SRG (stochastic reweighted gradient).
We pay particular attention to the time and memory overhead of our proposed method.
We present empirical results to support our findings.
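As a rough illustration of the reweighting involved, the sketch below runs importance-sampling SGD with the unbiased 1/(n p_i) correction, using fixed row-norm-based probabilities rather than SRG's adaptively maintained weights; the problem and step size are illustrative assumptions.

```python
# Minimal sketch: importance-sampling SGD with the unbiased 1/(n p_i) reweighting.
import numpy as np

rng = np.random.default_rng(0)
n, d, lr = 1000, 20, 0.01
A = rng.normal(size=(n, d)) * rng.lognormal(sigma=1.0, size=(n, 1))  # uneven rows
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

p = np.sum(A**2, axis=1)
p /= p.sum()                                    # sampling probabilities p_i

x = np.zeros(d)
for _ in range(20000):
    i = rng.choice(n, p=p)                      # importance-sampled index
    grad_i = A[i] * (A[i] @ x - b[i])           # gradient of 0.5*(a_i x - b_i)^2
    x -= lr * grad_i / (n * p[i])               # reweight so the estimate stays unbiased
print("RMSE:", np.linalg.norm(A @ x - b) / np.sqrt(n))
```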
arXiv Detail & Related papers (2021-03-23T04:09:43Z) - Second-order step-size tuning of SGD for non-convex optimization [6.021787236982659]
In view of a direct and simple improvement of vanilla SGD, this paper presents a fine-tuning of its step-sizes in the mini-batch case.
One obtains a new first-order gradient method (Step-Tuned SGD) which can be seen as a version of the classical Barzilai-Borwein method.
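A rough sketch of a Barzilai-Borwein-style step-size rule in the mini-batch setting, with the gradient difference evaluated on a shared batch and the resulting step clipped to a safe range; these choices are illustrative and differ from the paper's exact Step-Tuned SGD.

```python
# Minimal sketch: Barzilai-Borwein-type step-size tuning inside mini-batch SGD.
import numpy as np

rng = np.random.default_rng(0)
n, d, batch = 2000, 30, 64
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def minibatch_grad(x, idx):
    return A[idx].T @ (A[idx] @ x - b[idx]) / len(idx)

x_prev = np.zeros(d)
x = x_prev - 0.01 * minibatch_grad(x_prev, rng.choice(n, batch, replace=False))
lr = 0.01
for _ in range(500):
    idx = rng.choice(n, batch, replace=False)
    g, g_prev = minibatch_grad(x, idx), minibatch_grad(x_prev, idx)  # same batch twice
    s, y = x - x_prev, g - g_prev
    if y @ y > 0:
        lr = np.clip((s @ y) / (y @ y), 1e-4, 0.5)   # BB1 step, kept in a safe range
    x_prev, x = x, x - lr * g
print("RMSE:", np.linalg.norm(A @ x - b) / np.sqrt(n))
```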
arXiv Detail & Related papers (2021-03-05T10:01:48Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of large-batch training variants can be covered in a unified extrapolation framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.