A straightforward line search approach on the expected empirical loss
for stochastic deep learning problems
- URL: http://arxiv.org/abs/2010.00921v1
- Date: Fri, 2 Oct 2020 11:04:02 GMT
- Title: A straightforward line search approach on the expected empirical loss
for stochastic deep learning problems
- Authors: Maximus Mutschler and Andreas Zell
- Abstract summary: In deep learning, searching for good step sizes on the expected empirical loss is too costly because the measured losses are noisy.
This work shows that the expected empirical loss can be approximated cheaply on vertical cross sections for common deep learning tasks.
- Score: 20.262526694346104
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A fundamental challenge in deep learning is that the optimal step
sizes for update steps of stochastic gradient descent are unknown. In
traditional optimization, line searches are used to determine good step sizes;
in deep learning, however, searching for good step sizes on the expected
empirical loss is too costly because the measured losses are noisy. This
empirical work shows that the expected empirical loss can be approximated
cheaply on vertical cross sections for common deep learning tasks. This is
achieved by applying traditional one-dimensional function fitting to the
measured noisy losses of such cross sections. The step to a minimum of the
resulting approximation is then used as the step size for the optimization.
This approach leads to a robust and straightforward optimization method which
performs well across datasets and architectures without the need for
hyperparameter tuning.
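A minimal sketch of the idea described in the abstract, not the authors' implementation: measure noisy mini-batch losses at several step-size candidates along the update direction (a vertical cross section of the loss landscape), fit a one-dimensional polynomial to the measurements, and take the step to the minimum of the fit as the step size. The helper `loss_at` is a hypothetical placeholder for code that evaluates a trial step.

```python
import numpy as np

def line_search_step_size(loss_at, s_max=1.0, n_samples=10, degree=2):
    """Estimate a step size by fitting a 1-D polynomial to noisy losses
    measured along the current update direction.

    loss_at(s) is assumed to return a (noisy) mini-batch loss after a
    trial step of length s along the negative gradient direction.
    """
    # Measure noisy losses at several step-size candidates on the cross section.
    steps = np.linspace(0.0, s_max, n_samples)
    losses = np.array([loss_at(s) for s in steps])

    # Fit a low-degree polynomial (e.g. a parabola) to the noisy measurements.
    coeffs = np.polyfit(steps, losses, deg=degree)
    poly = np.poly1d(coeffs)

    # Take the step to the minimum of the fitted approximation,
    # restricted to the sampled interval for robustness.
    candidates = np.linspace(0.0, s_max, 1000)
    return candidates[np.argmin(poly(candidates))]

# Toy usage: a noisy parabola with its minimum at s = 0.3.
rng = np.random.default_rng(0)
noisy_loss = lambda s: (s - 0.3) ** 2 + 0.01 * rng.normal()
print(line_search_step_size(noisy_loss))  # ~0.3
```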
Related papers
- Training-set-free two-stage deep learning for spectroscopic data
de-noising [0.0]
De-noising is a prominent step in the spectra post-processing procedure.
Previous machine learning-based methods are fast but mostly based on supervised learning.
Unsupervised learning-based algorithms are slow and require a training set that is typically expensive to obtain in real experimental measurements.
arXiv Detail & Related papers (2024-02-29T03:31:41Z) - Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights (a sketch of the basic estimator appears after this list).
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
arXiv Detail & Related papers (2022-10-07T03:52:27Z) - BOME! Bilevel Optimization Made Easy: A Simple First-Order Approach [46.457298683984924]
Bilevel optimization (BO) is useful for solving a variety of important machine learning problems.
Conventional methods need to differentiate through the low-level optimization process with implicit differentiation.
First-order BO depends only on first-order information and requires no implicit differentiation.
arXiv Detail & Related papers (2022-09-19T01:51:12Z) - Simple Stochastic and Online Gradient Descent Algorithms for Pairwise
Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances.
Online gradient descent (OGD) is a popular approach to handle streaming data in pairwise learning.
In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning (a minimal pairwise-update sketch appears after this list).
arXiv Detail & Related papers (2021-11-23T18:10:48Z) - Using a one dimensional parabolic model of the full-batch loss to
estimate learning rates during training [21.35522589789314]
This work introduces a line-search method that approximates the full-batch loss with a parabola estimated over several mini-batches.
In the experiments conducted, our approach mostly outperforms SGD tuned with a piece-wise constant learning rate schedule.
arXiv Detail & Related papers (2021-08-31T14:36:23Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - Deep learning: a statistical viewpoint [120.94133818355645]
Deep learning has revealed some major surprises from a theoretical perspective.
In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems.
We conjecture that specific principles underlie these phenomena.
arXiv Detail & Related papers (2021-03-16T16:26:36Z) - Low-Rank Robust Online Distance/Similarity Learning based on the
Rescaled Hinge Loss [0.34376560669160383]
Existing online methods usually assume that training triplets or pairwise constraints exist in advance.
We formulate the online Distance-Similarity learning problem with the robust Rescaled hinge loss function.
The proposed model is rather general and can be applied to any PA-based online Distance-Similarity algorithm.
arXiv Detail & Related papers (2020-10-07T08:38:34Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z) - Disentangling Adaptive Gradient Methods from Learning Rates [65.0397050979662]
We take a deeper look at how adaptive gradient methods interact with the learning rate schedule.
We introduce a "grafting" experiment which decouples an update's magnitude from its direction (a sketch appears after this list).
We present some empirical and theoretical retrospectives on the generalization of adaptive gradient methods.
arXiv Detail & Related papers (2020-02-26T21:42:49Z)
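Sketch referenced from "Scaling Forward Gradient With Local Losses": the basic forward-gradient estimator with a random weight-space direction, shown only for intuition. The paper's contributions (perturbing activations instead of weights and using local losses to reduce variance) are not reproduced here, and the analytic gradient below stands in for a forward-mode directional derivative.

```python
import numpy as np

def forward_gradient(grad_f, w, rng):
    """Basic forward-gradient estimator (weight perturbation).

    Samples a random direction v and returns (grad_f(w) . v) * v,
    an unbiased estimate of the gradient that only needs a
    directional derivative (computable with forward-mode autodiff).
    The analytic gradient is used here purely for this toy example.
    """
    v = rng.normal(size=w.shape)
    directional_derivative = np.dot(grad_f(w), v)
    return directional_derivative * v

# Toy quadratic f(w) = ||w||^2 with gradient 2w.
rng = np.random.default_rng(0)
w = np.array([1.0, -2.0, 0.5])
estimates = [forward_gradient(lambda w: 2 * w, w, rng) for _ in range(10000)]
print(np.mean(estimates, axis=0))  # close to the true gradient 2w
```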
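Sketch referenced from "Simple Stochastic and Online Gradient Descent Algorithms for Pairwise Learning": a generic pairwise hinge ranking loss on a linear scorer with plain online gradient steps, intended only to show what a loss on a pair of instances looks like; it is not the algorithms analyzed in that paper.

```python
import numpy as np

def pairwise_hinge_grad(w, x_pos, x_neg, margin=1.0):
    """Gradient of a hinge ranking loss on a pair of instances:
    loss = max(0, margin - (w.x_pos - w.x_neg))."""
    if margin - (w @ x_pos - w @ x_neg) > 0:
        return -(x_pos - x_neg)
    return np.zeros_like(w)

# Online-style updates: draw a pair per step and take a gradient step.
rng = np.random.default_rng(0)
w = np.zeros(5)
lr = 0.1
for _ in range(1000):
    x_pos = rng.normal(size=5) + 1.0   # instances that should score higher
    x_neg = rng.normal(size=5) - 1.0   # instances that should score lower
    w -= lr * pairwise_hinge_grad(w, x_pos, x_neg)
print(w)  # weights that rank the positive group above the negative group
```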
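Sketch referenced from "Disentangling Adaptive Gradient Methods from Learning Rates": the grafting recipe as commonly described, taking the step magnitude from one optimizer and the step direction from another. The paper's exact protocol (e.g., per-layer versus global norms, which optimizers are combined) may differ.

```python
import numpy as np

def graft(step_magnitude_from, step_direction_from, eps=1e-12):
    """Combine two optimizer updates: the norm of one, the direction
    of the other (a sketch of the 'grafting' recipe)."""
    direction = step_direction_from / (np.linalg.norm(step_direction_from) + eps)
    return np.linalg.norm(step_magnitude_from) * direction

# Example: magnitude from a plain SGD step, direction from an Adam-like step.
sgd_step = np.array([0.10, -0.20, 0.05])
adam_step = np.array([0.01, -0.03, 0.02])
print(graft(sgd_step, adam_step))
```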
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.