MLR-SNet: Transferable LR Schedules for Heterogeneous Tasks
- URL: http://arxiv.org/abs/2007.14546v3
- Date: Thu, 13 May 2021 15:39:27 GMT
- Title: MLR-SNet: Transferable LR Schedules for Heterogeneous Tasks
- Authors: Jun Shu, Yanwen Zhu, Qian Zhao, Zongben Xu, Deyu Meng
- Abstract summary: The learning rate (LR) is one of the most important hyper-parameters in stochastic gradient descent (SGD) for training deep neural networks (DNNs).
In this paper, we propose to learn a proper LR schedule through an explicit, parameterized mapping called MLR-SNet.
We also transfer the meta-learned MLR-SNet to query tasks that differ from the training ones in noise, network architecture, data modality, and dataset size, and achieve comparable or even better performance.
- Score: 56.66010634895913
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The learning rate (LR) is one of the most important hyper-parameters in
stochastic gradient descent (SGD) algorithm for training deep neural networks
(DNN). However, current hand-designed LR schedules need to manually pre-specify
a fixed form, which limits their ability to adapt to practical non-convex
optimization problems due to the significant diversification of training
dynamics. Meanwhile, proper LR schedules always need to be searched from scratch
for new tasks, which often differ largely from one another in variations such as
data modality, network architecture, or training data capacity. To address these
learning-rate-schedule setting issues, we propose
to parameterize LR schedules with an explicit mapping formulation, called
\textit{MLR-SNet}. The learnable parameterized structure brings more
flexibility for MLR-SNet to learn a proper LR schedule to comply with the
training dynamics of DNN. Image and text classification benchmark experiments
substantiate the capability of our method for achieving proper LR schedules.
Moreover, the explicit parameterized structure makes the meta-learned LR
schedules capable of being transferable and plug-and-play, which can be easily
generalized to new heterogeneous tasks. We transfer our meta-learned MLR-SNet
to query tasks like different training epochs, network architectures, data
modalities, dataset sizes from the training ones, and achieve comparable or
even better performance compared with hand-designed LR schedules specifically
designed for the query tasks. The robustness of MLR-SNet is also substantiated
when the training data are biased with corrupted noise. We further prove the
convergence of the SGD algorithm equipped with LR schedule produced by our
MLR-SNet, with a convergence rate comparable to the best-known ones of the
algorithm for solving the problem.
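As a concrete illustration of the explicit mapping formulation described above, the following is a minimal sketch of a learnable LR schedule, assuming a small LSTM that maps the current training loss to a bounded learning rate. The class name, input choice, and hyper-parameters here are hypothetical and do not reproduce the authors' exact MLR-SNet architecture or its meta-training procedure.

```python
import torch
import torch.nn as nn

class LRScheduleNet(nn.Module):
    """Illustrative learnable LR schedule (not the authors' exact MLR-SNet):
    an LSTM cell reads the current training loss and emits an LR in
    (0, max_lr), so the schedule can adapt to the observed training
    dynamics rather than following a fixed hand-designed form."""

    def __init__(self, hidden_size: int = 50, max_lr: float = 0.1):
        super().__init__()
        self.lstm = nn.LSTMCell(input_size=1, hidden_size=hidden_size)
        self.head = nn.Linear(hidden_size, 1)
        self.max_lr = max_lr
        self.state = None  # (h, c) hidden state carried across training steps

    def forward(self, loss_value: torch.Tensor) -> torch.Tensor:
        x = loss_value.detach().reshape(1, 1)  # treat the scalar loss as input
        self.state = self.lstm(x, self.state)  # update the recurrent state
        h, _ = self.state
        # squash to (0, 1) and scale so the predicted LR stays bounded
        return self.max_lr * torch.sigmoid(self.head(h)).squeeze()
```

In use, the mini-batch loss at each SGD step would be fed to this network and the returned scalar used as the step size; meta-learning the schedule network's own parameters against validation performance, the bi-level part of the method, is omitted from this sketch.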
Related papers
- Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting LLMs to downstream tasks.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z) - Scaling Optimal LR Across Token Horizons [81.29631219839311]
We show how the optimal learning rate depends on the token horizon in LLM training.
We also provide evidence that LLaMA-1 used a learning rate that was too high, and estimate the performance hit from this.
arXiv Detail & Related papers (2024-09-30T03:32:02Z) - Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering their practical deployments: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC).
arXiv Detail & Related papers (2023-11-23T15:46:54Z) - Optimizing Learning Rate Schedules for Iterative Pruning of Deep Neural
Networks [25.84452767219292]
We propose a learning rate (LR) schedule for network pruning called SILO.
SILO has a strong theoretical motivation and dynamically adjusts the LR during pruning to improve generalization.
We find that SILO is able to precisely adjust the value of max_lr to lie within the Oracle-optimized interval, resulting in performance competitive with the Oracle at significantly lower complexity.
arXiv Detail & Related papers (2022-12-09T14:39:50Z) - Selecting and Composing Learning Rate Policies for Deep Neural Networks [10.926538783768219]
This paper presents a systematic approach to selecting and composing an LR policy for effective Deep Neural Networks (DNNs) training.
We develop an LR tuning mechanism for auto-verification of a given LR policy with respect to the desired accuracy goal under the pre-defined training time constraint.
Second, we develop an LR policy recommendation system (LRBench) to select and compose good LR policies from the same and/or different LR functions through dynamic tuning.
Third, we extend LRBench to support different DNNs and show the significant mutual impact of different LR policies and different DNN models.
arXiv Detail & Related papers (2022-10-24T03:32:59Z) - An Optimization-Based Meta-Learning Model for MRI Reconstruction with
Diverse Dataset [4.9259403018534496]
We develop a generalizable MRI reconstruction model in the meta-learning framework.
The proposed network learns the regularization function within a learner-adaptive model.
After meta-training, it adapts quickly to unseen tasks while saving about half of the training time.
arXiv Detail & Related papers (2021-10-02T03:21:52Z) - Automated Learning Rate Scheduler for Large-batch Training [24.20872850681828]
Large-batch training has been essential in leveraging large-scale datasets and models in deep learning.
It often requires a specially designed learning rate (LR) schedule to achieve a comparable level of performance as in smaller batch training.
We propose an automated LR scheduling algorithm which is effective for neural network training with a large batch size under the given epoch budget.
arXiv Detail & Related papers (2021-07-13T05:23:13Z) - A Wasserstein Minimax Framework for Mixed Linear Regression [69.40394595795544]
Multi-modal distributions are commonly used to model clustered data in learning tasks.
We propose an optimal transport-based framework for Mixed Linear Regression problems.
arXiv Detail & Related papers (2021-06-14T16:03:51Z) - AutoLRS: Automatic Learning-Rate Schedule by Bayesian Optimization on
the Fly [22.754424957856052]
We propose AutoLRS, which automatically optimizes the learning rate for each training stage by modeling training dynamics.
We demonstrate the advantages and the generality of AutoLRS through extensive experiments on training tasks from diverse domains.
arXiv Detail & Related papers (2021-05-22T16:41:10Z) - Closed-loop Matters: Dual Regression Networks for Single Image
Super-Resolution [73.86924594746884]
Deep neural networks have exhibited promising performance in image super-resolution.
These networks learn a nonlinear mapping function from low-resolution (LR) images to high-resolution (HR) images.
We propose a dual regression scheme by introducing an additional constraint on LR data to reduce the space of possible mapping functions (a sketch of this objective appears after the list).
arXiv Detail & Related papers (2020-03-16T04:23:42Z)
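To make the closed-loop constraint in the last entry concrete, here is a minimal sketch of a dual-regression objective, assuming hypothetical primal (low-resolution to high-resolution) and dual (high-resolution back to low-resolution) networks, an L1 reconstruction loss, and a placeholder weight lam; it is not the paper's exact DRN architecture or loss weighting.

```python
import torch
import torch.nn.functional as F

def dual_regression_loss(primal: torch.nn.Module,
                         dual: torch.nn.Module,
                         x_lr: torch.Tensor,
                         y_hr: torch.Tensor,
                         lam: float = 0.1) -> torch.Tensor:
    """Sketch of a dual-regression objective: the primal net maps LR -> HR,
    the dual net maps the prediction back to LR space, and the cycle term
    constrains the space of admissible mappings."""
    y_pred = primal(x_lr)                      # primal mapping: LR -> HR
    primal_loss = F.l1_loss(y_pred, y_hr)      # supervised HR reconstruction
    dual_loss = F.l1_loss(dual(y_pred), x_lr)  # closed-loop LR consistency
    return primal_loss + lam * dual_loss
```

The dual term penalizes super-resolved outputs whose mapped-back versions drift from the observed LR input, which is what reduces the space of admissible LR-to-HR mappings.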