The Two Regimes of Deep Network Training
- URL: http://arxiv.org/abs/2002.10376v1
- Date: Mon, 24 Feb 2020 17:08:24 GMT
- Title: The Two Regimes of Deep Network Training
- Authors: Guillaume Leclerc, Aleksander Madry
- Abstract summary: We study the effects of different learning rate schedules and the appropriate way to select them.
To this end, we isolate two distinct phases of training, which we refer to as the "large-step" regime and the "small-step" regime.
By specializing the training algorithm to each regime, we can significantly simplify learning rate schedules.
- Score: 93.84309968956941
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The learning rate schedule has a major impact on the performance of deep learning
models. Still, the choice of a schedule is often heuristic. We aim to develop
a precise understanding of the effects of different learning rate schedules and
the appropriate way to select them. To this end, we isolate two distinct phases
of training: the first, which we refer to as the "large-step" regime, exhibits
rather poor performance from an optimization point of view but is the primary
contributor to model generalization; the second, the "small-step" regime, exhibits
much more "convex-like" optimization behavior but, used in isolation, produces
models that generalize poorly. We find that by treating these regimes
separately, and specializing our training algorithm to each one of them, we
can significantly simplify learning rate schedules.
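The abstract's idea lends itself to a very simple schedule: a long constant large-step phase followed by a short small-step phase. The sketch below is a minimal illustration of such a two-phase schedule, not the authors' exact training procedure; the learning rates (0.1, 1e-3) and the 80% switch point are placeholder assumptions.
```python
# Minimal sketch of a two-phase ("large-step" then "small-step") learning rate
# schedule. The specific values below are illustrative, not the paper's settings.

def two_phase_lr(step: int, total_steps: int,
                 large_lr: float = 0.1,      # assumed large-step learning rate
                 small_lr: float = 1e-3,     # assumed small-step learning rate
                 switch_frac: float = 0.8) -> float:
    """Constant large learning rate for most of training, then a constant
    small learning rate for the remainder."""
    if step < switch_frac * total_steps:
        return large_lr   # "large-step" regime: poor optimization, drives generalization
    return small_lr       # "small-step" regime: convex-like fine-tuning

# Example: learning rate at a few points of a 1000-step run.
if __name__ == "__main__":
    for s in (0, 500, 799, 800, 999):
        print(s, two_phase_lr(s, total_steps=1000))
```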
Related papers
- Understanding Optimization in Deep Learning with Central Flows [53.66160508990508]
We show that an optimizer's implicit behavior can be explicitly captured by a "central flow": a differential equation modeling the time-averaged optimization trajectory.
We show that these flows can empirically predict long-term optimization trajectories of generic neural networks.
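As a loose analogy only (the paper's central flows are derived for specific optimizers and capture time-averaged, oscillation-smoothed dynamics), the toy snippet below contrasts discrete gradient descent iterates with a continuous gradient-flow ODE on a quadratic; everything in it is an illustrative assumption, not the authors' construction.
```python
# Toy illustration (not the paper's central flow): compare discrete gradient
# descent with the gradient-flow ODE dw/dt = -grad(w) on a simple quadratic.
import numpy as np

def grad(w: np.ndarray) -> np.ndarray:
    # Gradient of the toy objective 0.5 * w^T diag(1, 10) w
    return np.array([1.0, 10.0]) * w

w_discrete = np.array([1.0, 1.0])
w_flow = np.array([1.0, 1.0])
lr = 0.15   # large enough that the second coordinate oscillates in discrete steps

for _ in range(100):
    # Discrete optimizer step (oscillates along the sharp coordinate).
    w_discrete = w_discrete - lr * grad(w_discrete)
    # Euler-integrate the flow with a much finer time step; its smooth
    # trajectory is the kind of averaged path a flow-based model describes.
    for _ in range(50):
        w_flow = w_flow - (lr / 50) * grad(w_flow)

print("discrete iterate:", w_discrete)
print("flow trajectory :", w_flow)
```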
arXiv Detail & Related papers (2024-10-31T17:58:13Z)
- Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate [105.86576388991713]
We introduce a normalized gradient difference (NGDiff) algorithm, enabling us to have better control over the trade-off between the objectives.
We provide a theoretical analysis and empirically demonstrate the superior performance of NGDiff among state-of-the-art unlearning methods on the TOFU and MUSE datasets.
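A heavily hedged sketch of the general idea of a normalized gradient difference between a retain objective and a forget objective; the exact combination rule and the adaptive learning rate in NGDiff may differ, so treat the formula below as an assumption for illustration.
```python
# Illustrative sketch of combining two task gradients via a normalized
# difference (assumed form; NGDiff's exact update rule may differ).
import numpy as np

def normalized_gradient_difference(g_retain: np.ndarray,
                                   g_forget: np.ndarray,
                                   eps: float = 1e-12) -> np.ndarray:
    """Normalize each objective's gradient so neither dominates. Using the
    result in a descent step descends the retain loss while ascending the
    forget loss."""
    g_r = g_retain / (np.linalg.norm(g_retain) + eps)
    g_f = g_forget / (np.linalg.norm(g_forget) + eps)
    return g_r - g_f

# Example usage with random placeholder gradients.
rng = np.random.default_rng(0)
params = rng.normal(size=10)
g = normalized_gradient_difference(rng.normal(size=10), rng.normal(size=10))
params -= 0.01 * g   # placeholder step size
```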
arXiv Detail & Related papers (2024-10-29T14:41:44Z)
- Narrowing the Focus: Learned Optimizers for Pretrained Models [24.685918556547055]
We propose a novel technique that learns a layer-specific linear combination of update directions provided by a set of base optimizers.
When evaluated on image classification tasks, this specialized optimizer significantly outperforms both traditional off-the-shelf methods such as Adam and existing general-purpose learned optimizers.
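A minimal sketch of the general recipe of combining update directions from a few hand-designed optimizers with learned per-layer weights; the base directions chosen here (plain gradient, sign descent, a momentum buffer) and the way the weights are stored are assumptions for illustration, not the paper's architecture.
```python
# Sketch: per-layer learned linear combination of base update directions
# (illustrative; the paper's base optimizers and meta-learning are not reproduced).
import numpy as np

def base_directions(grad: np.ndarray, momentum: np.ndarray):
    # Three simple hand-designed update directions.
    return [grad,             # plain gradient (SGD-like)
            np.sign(grad),    # sign descent (scale-invariant, Adam-flavored)
            momentum]         # momentum buffer

def combined_update(grad, momentum, layer_weights):
    dirs = base_directions(grad, momentum)
    return sum(w * d for w, d in zip(layer_weights, dirs))

# Example for one layer: the weights would normally be learned per layer.
rng = np.random.default_rng(0)
g = rng.normal(size=(4, 4))
m = 0.9 * rng.normal(size=(4, 4))
layer_weights = np.array([0.5, 0.3, 0.2])   # placeholder learned coefficients
params = rng.normal(size=(4, 4))
params -= 0.01 * combined_update(g, m, layer_weights)
```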
arXiv Detail & Related papers (2024-08-17T23:55:19Z)
- EfficientTrain++: Generalized Curriculum Learning for Efficient Visual Backbone Training [79.96741042766524]
We reformulate the training curriculum as a soft-selection function.
We show that gradually exposing the contents of natural images can be readily achieved by adjusting the intensity of data augmentation.
The resulting method, EfficientTrain++, is simple, general, yet surprisingly effective.
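One way to read this is as a schedule on augmentation strength: weak augmentation early, so the network first sees the easier content of images, and full augmentation later. The linear ramp below is a guess at such a schedule, not EfficientTrain++'s actual soft-selection function.
```python
# Illustrative curriculum: ramp data-augmentation intensity with training
# progress (assumed linear ramp; the paper derives its schedule differently).

def augmentation_magnitude(step: int, total_steps: int,
                           min_mag: float = 0.1,
                           max_mag: float = 1.0) -> float:
    """Return an augmentation strength in [min_mag, max_mag] that grows
    linearly from the start to the end of training."""
    progress = min(step / max(total_steps, 1), 1.0)
    return min_mag + (max_mag - min_mag) * progress

# Example: the magnitude could scale color jitter, RandAugment strength, etc.
for s in (0, 2500, 5000, 7500, 10000):
    print(s, round(augmentation_magnitude(s, total_steps=10000), 2))
```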
arXiv Detail & Related papers (2024-05-14T17:00:43Z)
- Optimal Linear Decay Learning Rate Schedules and Further Refinements [46.79573408189601]
Learning rate schedules used in practice bear little resemblance to those recommended by theory.
We close much of this theory/practice gap, and as a consequence are able to derive new problem-adaptive learning rate schedules.
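For reference, a plain linear-decay schedule of the kind the paper analyzes can be written in a few lines; the peak learning rate and decay-to-zero endpoint below are placeholder assumptions, and the paper's refined, problem-adaptive variants are not reproduced.
```python
# Minimal linear-decay learning rate schedule (illustrative values).

def linear_decay_lr(step: int, total_steps: int,
                    peak_lr: float = 0.1, final_lr: float = 0.0) -> float:
    """Linearly interpolate from peak_lr at step 0 to final_lr at the end."""
    progress = min(step / max(total_steps, 1), 1.0)
    return peak_lr + (final_lr - peak_lr) * progress

for s in (0, 250, 500, 750, 1000):
    print(s, linear_decay_lr(s, total_steps=1000))
```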
arXiv Detail & Related papers (2023-10-11T19:16:35Z)
- Towards Compute-Optimal Transfer Learning [82.88829463290041]
We argue that zero-shot structured pruning of pretrained models allows them to increase compute efficiency with minimal reduction in performance.
Our results show that pruning convolutional filters of pretrained models can lead to more than 20% performance improvement in low computational regimes.
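A minimal sketch of zero-shot structured pruning in the sense of dropping whole convolutional filters by a simple magnitude criterion, with no fine-tuning; the L1-norm ranking and the keep ratio are assumptions, not necessarily the criterion used in the paper.
```python
# Sketch: structured pruning of a conv layer's filters by L1 norm
# (assumed criterion; the keep ratio is a placeholder).
import numpy as np

def prune_filters(weight: np.ndarray, keep_ratio: float = 0.5) -> np.ndarray:
    """weight has shape (out_channels, in_channels, kh, kw). Keep the
    out_channels with the largest L1 norm and drop the rest."""
    scores = np.abs(weight).sum(axis=(1, 2, 3))        # one score per filter
    n_keep = max(1, int(round(keep_ratio * weight.shape[0])))
    keep = np.sort(np.argsort(scores)[-n_keep:])       # indices to retain
    return weight[keep]

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 32, 3, 3))             # a pretrained conv layer's weights
print(prune_filters(w, keep_ratio=0.5).shape)   # -> (32, 32, 3, 3)
```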
arXiv Detail & Related papers (2023-04-25T21:49:09Z)
- Joint Training of Deep Ensembles Fails Due to Learner Collusion [61.557412796012535]
Ensembles of machine learning models have been well established as a powerful method of improving performance over a single model.
Traditionally, ensembling algorithms train their base learners independently or sequentially with the goal of optimizing their joint performance.
Surprisingly, directly minimizing the loss of the ensemble appears to be rarely applied in practice; we show that such joint training fails due to learner collusion.
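The distinction at stake can be made concrete in a few lines: "joint training" minimizes the loss of the averaged prediction, whereas the traditional recipe averages the individual losses. The toy comparison below (squared error, random predictions) is purely illustrative and makes no claim about the paper's experimental setup.
```python
# Toy contrast between the ensemble's loss and the mean of individual losses.
import numpy as np

rng = np.random.default_rng(0)
target = rng.normal(size=100)
preds = target + rng.normal(scale=1.0, size=(5, 100))   # 5 noisy base learners

loss_of_ensemble = np.mean((preds.mean(axis=0) - target) ** 2)   # joint objective
mean_of_losses = np.mean((preds - target) ** 2)                  # traditional objective

# By Jensen's inequality the averaged prediction has lower (or equal) loss.
print(loss_of_ensemble, mean_of_losses)
```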
arXiv Detail & Related papers (2023-01-26T18:58:07Z)
- Learning Rate Perturbation: A Generic Plugin of Learning Rate Schedule towards Flatter Local Minima [40.70374106466073]
We propose a generic learning rate schedule plugin called LEArning Rate Perturbation (LEAP)
LEAP can be applied to various learning rate schedules to improve the model training by introducing a certain perturbation to the learning rate.
We conduct extensive experiments which show that training with LEAP can improve the performance of various deep learning models on diverse datasets.
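The plugin nature of the idea suggests a wrapper around any existing schedule that multiplies the learning rate by a small random perturbation; the Gaussian noise and its scale below are assumptions, not the perturbation distribution used in the paper.
```python
# Sketch of perturbing an arbitrary learning rate schedule
# (assumed multiplicative Gaussian noise; LEAP's exact distribution may differ).
import random

def perturbed_lr(base_schedule, step: int, noise_scale: float = 0.1,
                 rng: random.Random = random.Random(0)) -> float:
    """Wrap any schedule `base_schedule(step) -> lr` with a small random
    perturbation, keeping the result positive."""
    lr = base_schedule(step)
    factor = max(1.0 + rng.gauss(0.0, noise_scale), 0.1)
    return lr * factor

# Example with a simple step-decay base schedule.
base = lambda s: 0.1 if s < 500 else 0.01
print([round(perturbed_lr(base, s), 4) for s in (0, 100, 600)])
```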
arXiv Detail & Related papers (2022-08-25T05:05:18Z)
- Automatic Tuning of Stochastic Gradient Descent with Bayesian Optimisation [8.340191147575307]
We introduce an original probabilistic model for traces of optimisers, based on latent Gaussian processes and an auto-regressive formulation.
It flexibly adjusts to abrupt changes of behaviours induced by new learning rate values.
It is well-suited to tackle a set of problems: first, for the on-line adaptation of the learning rate for a cold-started run; then, for tuning the schedule for a set of similar tasks, as well as warm-starting it for a new task.
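As a rough, generic sketch only: the snippet fits a Gaussian process to (learning rate, observed loss) pairs and picks the next learning rate with a lower-confidence-bound rule. It uses scikit-learn's stock GP rather than the paper's latent, auto-regressive trace model, and the candidate grid, kernel, and observations are placeholder assumptions.
```python
# Generic Bayesian-optimisation sketch for picking the next learning rate
# (scikit-learn GP stand-in; not the paper's latent GP model of full traces).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Placeholder observations: log10 learning rates tried so far and final losses.
observed_log_lr = np.array([[-4.0], [-3.0], [-2.0], [-1.0]])
observed_loss = np.array([1.9, 1.2, 0.9, 2.5])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp.fit(observed_log_lr, observed_loss)

# Score a grid of candidate learning rates with a lower confidence bound.
candidates = np.linspace(-5.0, 0.0, 101).reshape(-1, 1)
mean, std = gp.predict(candidates, return_std=True)
best = candidates[np.argmin(mean - 1.0 * std)]
print("next learning rate to try:", 10 ** best[0])
```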
arXiv Detail & Related papers (2020-06-25T13:18:18Z)
- Auto-Ensemble: An Adaptive Learning Rate Scheduling based Deep Learning Model Ensembling [11.324407834445422]
This paper proposes Auto-Ensemble (AE) to collect checkpoints of deep learning model and ensemble them automatically.
The advantage of this method is that it makes the model converge to various local optima by scheduling the learning rate within a single training run.
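The mechanics can be sketched as a cyclical learning rate with a checkpoint captured at the end of each cycle, whose predictions are then averaged; the cosine cycles and the toy linear model below are assumptions for illustration, not AE's actual scheduling rule.
```python
# Sketch of checkpoint ensembling driven by a cyclical learning rate
# (illustrative cosine cycles and toy least-squares model; not AE's exact procedure).
import math
import numpy as np

def cyclic_lr(step: int, cycle_len: int = 100, lr_max: float = 0.1,
              lr_min: float = 1e-3) -> float:
    """Cosine-annealed learning rate that restarts every `cycle_len` steps."""
    t = (step % cycle_len) / cycle_len
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))

# Toy training loop; snapshot weights at each cycle boundary, then average predictions.
rng = np.random.default_rng(0)
X, w_true = rng.normal(size=(200, 5)), rng.normal(size=5)
y = X @ w_true
w = np.zeros(5)
snapshots = []
for step in range(500):
    gradient = 2 * X.T @ (X @ w - y) / len(y)
    w -= cyclic_lr(step) * gradient
    if (step + 1) % 100 == 0:
        snapshots.append(w.copy())        # one "local optimum" per cycle

ensemble_pred = np.mean([X @ s for s in snapshots], axis=0)
print("ensemble mse:", np.mean((ensemble_pred - y) ** 2))
```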
arXiv Detail & Related papers (2020-03-25T08:17:31Z)