Robust learning with anytime-guaranteed feedback
- URL: http://arxiv.org/abs/2105.11135v1
- Date: Mon, 24 May 2021 07:31:52 GMT
- Title: Robust learning with anytime-guaranteed feedback
- Authors: Matthew J. Holland
- Abstract summary: Under possibly heavy-tailed data, stochastic gradient-based learning algorithms are driven by queried feedback that on its own carries almost no performance guarantees.
Here we explore a modified "anytime online-to-batch" mechanism which admits high-probability error bounds.
In practice, we show noteworthy gains on real-world data applications.
- Score: 6.903929927172917
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Under data distributions which may be heavy-tailed, many stochastic
gradient-based learning algorithms are driven by feedback queried at points
with almost no performance guarantees on their own. Here we explore a modified
"anytime online-to-batch" mechanism which for smooth objectives admits
high-probability error bounds while requiring only lower-order moment bounds on
the stochastic gradients. Using this conversion, we can derive a wide variety
of "anytime robust" procedures, for which the task of performance analysis can
be effectively reduced to regret control, meaning that existing regret bounds
(for the bounded gradient case) can be robustified and leveraged in a
straightforward manner. As a direct takeaway, we obtain an easily implemented
stochastic gradient-based algorithm for which all queried points formally enjoy
sub-Gaussian error bounds, and in practice show noteworthy gains on real-world
data applications.
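The mechanism described above can be sketched in a few lines: a base online learner is fed (robust) gradient feedback queried at the running average of its own iterates, so that every queried point is a valid output. The choice of online gradient descent as the base learner and simple norm-clipping as the robust gradient estimate are illustrative stand-ins, not necessarily the paper's exact construction.

```python
import numpy as np

def clip(g, tau):
    """Norm-clip a gradient to radius tau (one simple robust estimate)."""
    norm = np.linalg.norm(g)
    return g if norm <= tau else g * (tau / norm)

def anytime_online_to_batch(grad, w0, lr=0.1, tau=10.0, steps=500):
    """Sketch of an anytime online-to-batch conversion: the base online
    learner (plain online gradient descent here) receives robust gradient
    feedback queried at the running average of its iterates, so every
    queried point -- not just a final average -- serves as an output."""
    w = np.asarray(w0, dtype=float)   # online learner's iterate
    x = w.copy()                      # running average: the queried point
    for t in range(1, steps + 1):
        g = clip(grad(x), tau)        # robust feedback at the average
        w = w - lr * g                # online gradient descent update
        x = x + (w - x) / (t + 1)     # incremental running average
    return x
```

The key point is that gradients are evaluated at `x`, the averaged point, rather than at the raw iterate `w`, which is what makes guarantees hold for every queried point.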
Related papers
- Gradient-Variation Online Learning under Generalized Smoothness [56.38427425920781]
Gradient-variation online learning aims to achieve regret guarantees that scale with the variation in the gradients of the online functions.
Recent efforts in neural network optimization suggest a generalized smoothness condition, allowing smoothness to correlate with gradient norms.
We provide the applications for fast-rate convergence in games and extended adversarial optimization.
arXiv Detail & Related papers (2024-08-17T02:22:08Z)
- An Effective Dynamic Gradient Calibration Method for Continual Learning [11.555822066922508]
Continual learning (CL) is a fundamental topic in machine learning, where the goal is to train a model with continuously incoming data and tasks.
Due to the memory limit, we cannot store all the historical data, and therefore confront the "catastrophic forgetting" problem.
We develop an effective algorithm to calibrate the gradient in each updating step of the model.
arXiv Detail & Related papers (2024-07-30T16:30:09Z)
- Dealing with unbounded gradients in stochastic saddle-point optimization [9.983014605039658]
We study the performance of first-order methods for finding saddle points of convex-concave functions.
A notorious challenge is that the gradients can grow arbitrarily large during optimization.
We propose a simple and effective regularization technique that stabilizes the iterates and yields meaningful performance guarantees.
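A toy illustration of why such stabilization matters: plain gradient descent-ascent on the bilinear objective f(x, y) = xy spirals outward, while adding a small strongly-convex-concave regularizer pulls the iterates in. The regularizer below is an illustrative choice, not the paper's exact technique.

```python
def gda(steps, reg, lr=0.1):
    """Simultaneous gradient descent-ascent on f(x, y) = x*y plus the
    regularizer (reg/2)*x**2 - (reg/2)*y**2; returns the final iterate norm.
    With reg = 0 the iterates spiral outward; with reg > 0 they contract."""
    x, y = 1.0, 1.0
    for _ in range(steps):
        # descent in x, ascent in y, updated simultaneously
        x, y = x - lr * (y + reg * x), y + lr * (x - reg * y)
    return (x * x + y * y) ** 0.5
```

The update matrix has spectral radius sqrt((1 - lr*reg)**2 + lr**2), which drops below 1 once the regularization outweighs the rotation induced by the bilinear coupling.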
arXiv Detail & Related papers (2024-02-21T16:13:49Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- High-probability Bounds for Non-Convex Stochastic Optimization with Heavy Tails [55.561406656549686]
We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails.
We show that a combination of gradient clipping, momentum, and normalized gradient descent yields convergence to critical points in high probability with best-known rates for smooth losses.
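That combination can be sketched as follows; the heavy-tailed noise model and all parameter choices here are illustrative only, not the paper's.

```python
import numpy as np

def clipped_momentum_ngd(grad, w0, lr=0.01, tau=1.0, beta=0.9, steps=800, seed=0):
    """Sketch of the combination summarized above: clip each noisy
    gradient, average with momentum, then move in the normalized
    momentum direction so single heavy-tailed samples cannot cause
    arbitrarily large steps."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float)
    m = np.zeros_like(w)
    for _ in range(steps):
        g = grad(w) + rng.standard_t(df=2, size=w.shape)  # heavy-tailed noise
        norm = np.linalg.norm(g)
        if norm > tau:
            g = g * (tau / norm)                          # gradient clipping
        m = beta * m + (1 - beta) * g                     # momentum averaging
        w = w - lr * m / (np.linalg.norm(m) + 1e-12)      # normalized step
    return w
```

Note that Student-t noise with two degrees of freedom has infinite variance, exactly the regime where plain SGD's moment assumptions fail.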
arXiv Detail & Related papers (2021-06-28T00:17:01Z)
- Low-memory stochastic backpropagation with multi-channel randomized trace estimation [6.985273194899884]
We propose to approximate the gradient of convolutional layers in neural networks with a multi-channel randomized trace estimation technique.
Compared to other methods, this approach is simple, amenable to analyses, and leads to a greatly reduced memory footprint.
We discuss the performance of networks trained with backpropagation and how the error can be controlled while maximizing memory usage and minimizing computational overhead.
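The underlying principle can be illustrated with the classic Hutchinson estimator; the paper's multi-channel scheme specializes this idea to convolutional layers, so the generic version below is only a sketch of the mechanism.

```python
import numpy as np

def hutchinson_trace(matvec, dim, n_probes=100, seed=0):
    """Hutchinson-style randomized trace estimation: for probes z with
    i.i.d. Rademacher entries, E[z^T A z] = tr(A), so averaging
    z^T (A z) over a few probes estimates the trace using only
    matrix-vector products -- A itself is never stored."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_probes):
        z = rng.choice([-1.0, 1.0], size=dim)  # Rademacher probe
        total += z @ matvec(z)
    return total / n_probes
```

The memory saving comes precisely from the matrix-free access pattern: only `matvec` products are needed, never the full matrix.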
arXiv Detail & Related papers (2021-06-13T13:54:02Z)
- Stochastic Reweighted Gradient Descent [4.355567556995855]
We propose an importance-sampling-based algorithm we call SRG (stochastic reweighted gradient).
We pay particular attention to the time and memory overhead of our proposed method.
We present empirical results to support our findings.
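The core importance-sampling idea can be sketched generically: draw a component with non-uniform probability and reweight so the estimate remains unbiased. This is a textbook sketch of the principle, not necessarily SRG's exact sampler or weight updates.

```python
import numpy as np

def reweighted_gradient(grads, p, rng):
    """Importance-sampled gradient estimate: draw component i with
    probability p[i] and reweight by 1/(n * p[i]), so the expectation
    equals the uniform average of the component gradients."""
    n = len(grads)
    i = rng.choice(n, p=p)
    return grads[i] / (n * p[i])
```

Choosing `p` proportional to (estimates of) the component gradient norms reduces the variance of the estimate without biasing it, which is the motivation for reweighting schemes of this kind.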
arXiv Detail & Related papers (2021-03-23T04:09:43Z)
- Acceleration via Fractal Learning Rate Schedules [37.878672787331105]
The learning rate schedule remains notoriously difficult to understand and expensive to tune.
We reinterpret an iterative algorithm from the numerical analysis literature as what we call the Chebyshev learning rate schedule for accelerating vanilla gradient descent.
We provide some experiments and discussion to challenge current understandings of the "edge of stability" in deep learning.
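The Chebyshev construction can be sketched for a quadratic objective: pick step sizes whose reciprocals are the Chebyshev nodes of an interval [mu, L] containing the Hessian spectrum, which makes the n-step error polynomial the minimax-optimal scaled Chebyshev polynomial. The function name and parameters below are illustrative.

```python
import numpy as np

def chebyshev_lr_schedule(mu, L, n):
    """Step sizes whose reciprocals are the n Chebyshev nodes of [mu, L].
    Running n gradient-descent steps with these rates on a quadratic whose
    Hessian eigenvalues lie in [mu, L] makes the error polynomial the
    scaled Chebyshev polynomial, the minimax-optimal degree-n choice."""
    k = np.arange(1, n + 1)
    nodes = (L + mu) / 2 + (L - mu) / 2 * np.cos((2 * k - 1) * np.pi / (2 * n))
    return 1.0 / nodes
```

For a 2-D quadratic with eigenvalues 1 and 10, eight such steps shrink the error by roughly two orders of magnitude more than eight constant steps of size 1/L, though the step ordering matters for numerical stability at larger n.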
arXiv Detail & Related papers (2021-03-01T22:52:13Z)
- Improper Learning with Gradient-based Policy Optimization [62.50997487685586]
We consider an improper reinforcement learning setting where the learner is given M base controllers for an unknown Markov Decision Process.
We propose a gradient-based approach that operates over a class of improper mixtures of the controllers.
arXiv Detail & Related papers (2021-02-16T14:53:55Z)
- Regret minimization in stochastic non-convex learning via a proximal-gradient approach [80.59047515124198]
Motivated by applications in machine learning and operations research, we study regret minimization with first-order oracle feedback in online constrained problems.
We develop a new proximal-gradient method whose guarantees are obtained via complexity reduction techniques.
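For context, a generic proximal-gradient step alternates a gradient step on the smooth part with a proximal step on the nonsmooth part; the version below, using the l1 proximal operator (soft-thresholding), is a textbook sketch, not the paper's specific algorithm.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_grad(grad_smooth, prox, w0, lr=0.1, steps=200):
    """Generic proximal-gradient loop: a gradient step on the smooth
    part of the objective followed by a proximal step on the nonsmooth
    part, e.g. an l1 penalty handled by soft-thresholding."""
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        w = prox(w - lr * grad_smooth(w), lr)
    return w
```

For the lasso-style objective (1/2)||w - b||^2 + ||w||_1, the minimizer is exactly soft_threshold(b, 1), so the loop can be checked against a closed-form answer.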
arXiv Detail & Related papers (2020-10-13T09:22:21Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.