MLR-SNet: Transferable LR Schedules for Heterogeneous Tasks
- URL: http://arxiv.org/abs/2007.14546v3
- Date: Thu, 13 May 2021 15:39:27 GMT
- Title: MLR-SNet: Transferable LR Schedules for Heterogeneous Tasks
- Authors: Jun Shu, Yanwen Zhu, Qian Zhao, Zongben Xu, Deyu Meng
- Abstract summary: The learning rate (LR) is one of the most important hyper-parameters in stochastic gradient descent (SGD) for training deep neural networks (DNN).
In this paper, we propose MLR-SNet, an explicit parameterized mapping that learns a proper LR schedule.
We also transfer MLR-SNet to query tasks that differ from the training ones in noise, architecture, data modality, and size, and achieve comparable or even better performance.
- Score: 56.66010634895913
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The learning rate (LR) is one of the most important hyper-parameters in
stochastic gradient descent (SGD) algorithm for training deep neural networks
(DNN). However, current hand-designed LR schedules need to manually pre-specify
a fixed form, which limits their ability to adapt to practical non-convex
optimization problems, since training dynamics vary significantly. Meanwhile, a
proper LR schedule always has to be searched from scratch for each new task,
because tasks often differ largely in data modality, network architecture, or
training data capacity. To address these issues in setting LR schedules, we propose
to parameterize LR schedules with an explicit mapping formulation, called
\textit{MLR-SNet}. The learnable parameterized structure brings more
flexibility for MLR-SNet to learn a proper LR schedule to comply with the
training dynamics of DNN. Image and text classification benchmark experiments
substantiate the capability of our method for achieving proper LR schedules.
Moreover, the explicit parameterized structure makes the meta-learned LR
schedules capable of being transferable and plug-and-play, which can be easily
generalized to new heterogeneous tasks. We transfer our meta-learned MLR-SNet
to query tasks that differ from the training ones in training epochs, network
architectures, data modalities, and dataset sizes, and achieve comparable or
even better performance compared with hand-designed LR schedules specifically
designed for the query tasks. The robustness of MLR-SNet is also substantiated
when the training data are biased with corrupted noise. We further prove the
convergence of the SGD algorithm equipped with the LR schedules produced by our
MLR-SNet, with a convergence rate comparable to the best-known ones of the
algorithm for solving the problem.
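The abstract describes MLR-SNet as an explicit, learnable mapping that produces the LR along with training, but does not spell out the architecture here. Below is a minimal sketch of the plug-and-play idea, assuming a small LSTM that maps the current training loss to a bounded LR; the class name LRScheduleNet, the base_lr bound, and the toy training loop are illustrative assumptions, and the meta-training of the scheduler's own parameters is omitted.

```python
# Minimal sketch of a learnable LR schedule (not the paper's exact design):
# an LSTM observes the current training loss and emits a bounded learning rate.
import torch
import torch.nn as nn


class LRScheduleNet(nn.Module):
    """Toy learnable LR schedule: maps the current loss to an LR in (0, base_lr)."""

    def __init__(self, hidden_size=32, base_lr=0.1):
        super().__init__()
        self.lstm = nn.LSTMCell(input_size=1, hidden_size=hidden_size)
        self.out = nn.Linear(hidden_size, 1)
        self.base_lr = base_lr
        self.state = None  # (h, c) carried across training steps

    def forward(self, loss_value):
        x = loss_value.detach().reshape(1, 1)        # current mini-batch loss as input
        h, c = self.lstm(x, self.state)
        self.state = (h.detach(), c.detach())        # carry state without growing the graph
        # Sigmoid keeps the predicted LR positive and bounded by base_lr.
        return self.base_lr * torch.sigmoid(self.out(h)).squeeze()


# Plug-and-play usage: at every SGD step, ask the scheduler for the next LR.
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = LRScheduleNet()
for step in range(100):
    x, y = torch.randn(16, 10), torch.randint(0, 2, (16,))
    loss = nn.functional.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    new_lr = float(scheduler(loss))                  # predicted LR for this step
    for group in optimizer.param_groups:
        group["lr"] = new_lr
    optimizer.step()
```

In this sketch the scheduler only reacts to the loss; the transferability claimed in the abstract comes from meta-learning the scheduler's parameters on source tasks and then reusing them, unchanged, to drive SGD on new tasks.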
Related papers
- Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z) - Scaling Optimal LR Across Token Horizons [81.29631219839311]
We show how optimal learning rate depends on token horizon in LLM training.
We also provide evidence that LLaMA-1 used too high an LR, and estimate the performance hit from this.
arXiv Detail & Related papers (2024-09-30T03:32:02Z) - Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering their practical deployment: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC).
arXiv Detail & Related papers (2023-11-23T15:46:54Z) - Optimizing Learning Rate Schedules for Iterative Pruning of Deep Neural
Networks [25.84452767219292]
We propose a learning rate (LR) schedule for network pruning called SILO.
SILO has a strong theoretical motivation and dynamically adjusts the LR during pruning to improve generalization.
We find that SILO is able to precisely adjust max_lr to lie within the Oracle-optimized interval, resulting in performance competitive with the Oracle at significantly lower complexity.
arXiv Detail & Related papers (2022-12-09T14:39:50Z) - Selecting and Composing Learning Rate Policies for Deep Neural Networks [10.926538783768219]
This paper presents a systematic approach to selecting and composing an LR policy for effective Deep Neural Networks (DNNs) training.
We develop an LR tuning mechanism for auto-verification of a given LR policy with respect to the desired accuracy goal under the pre-defined training time constraint.
Second, we develop an LR policy recommendation system (LRBench) to select and compose good LR policies from the same and/or different LR functions through dynamic tuning.
Third, we extend LRBench by supporting different DNNs and show the significant mutual impact of different LR policies and different DNN models.
arXiv Detail & Related papers (2022-10-24T03:32:59Z) - An Optimization-Based Meta-Learning Model for MRI Reconstruction with
Diverse Dataset [4.9259403018534496]
We develop a generalizable MRI reconstruction model in the meta-learning framework.
The proposed network learns a regularization function within a learner-adaptive model.
After meta-training, the model can be quickly trained on unseen tasks, saving about half of the training time.
arXiv Detail & Related papers (2021-10-02T03:21:52Z) - Automated Learning Rate Scheduler for Large-batch Training [24.20872850681828]
Large-batch training has been essential in leveraging large-scale datasets and models in deep learning.
It often requires a specially designed learning rate (LR) schedule to achieve performance comparable to smaller-batch training.
We propose an automated LR scheduling algorithm which is effective for neural network training with a large batch size under the given epoch budget.
arXiv Detail & Related papers (2021-07-13T05:23:13Z) - A Wasserstein Minimax Framework for Mixed Linear Regression [69.40394595795544]
Multi-modal distributions are commonly used to model clustered data in learning tasks.
We propose an optimal transport-based framework for Mixed Linear Regression problems.
arXiv Detail & Related papers (2021-06-14T16:03:51Z) - AutoLRS: Automatic Learning-Rate Schedule by Bayesian Optimization on
the Fly [22.754424957856052]
We propose AutoLRS, which automatically optimizes the learning rate for each training stage by modeling training dynamics.
We demonstrate the advantages and the generality of AutoLRS through extensive experiments on training tasks from diverse domains.
arXiv Detail & Related papers (2021-05-22T16:41:10Z) - Closed-loop Matters: Dual Regression Networks for Single Image
Super-Resolution [73.86924594746884]
Deep neural networks have exhibited promising performance in image super-resolution.
These networks learn a nonlinear mapping function from low-resolution (LR) images to high-resolution (HR) images.
We propose a dual regression scheme by introducing an additional constraint on LR data to reduce the space of the possible functions (see the sketch after this list).
arXiv Detail & Related papers (2020-03-16T04:23:42Z)
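The dual regression entry above can be illustrated with a minimal sketch, assuming a primal LR-to-HR network paired with a dual HR-to-LR network and an extra reconstruction penalty on the round trip. The module names (primal_sr, dual_down) and the weight lambda_dual are illustrative assumptions, not the paper's implementation.

```python
# Sketch of a dual regression loss for super-resolution: the primal network maps
# low-resolution (LR) inputs to high-resolution (HR) outputs, while a dual network
# maps predictions back to LR space; the extra LR-reconstruction term constrains
# the space of admissible primal mappings. Architectures and weights are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

scale = 2  # assumed upscaling factor

primal_sr = nn.Sequential(             # hypothetical primal LR -> HR mapping
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Upsample(scale_factor=scale, mode="bilinear", align_corners=False),
    nn.Conv2d(32, 3, 3, padding=1),
)
dual_down = nn.Sequential(              # hypothetical dual HR -> LR mapping
    nn.Conv2d(3, 32, 3, stride=scale, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1),
)


def dual_regression_loss(x_lr, y_hr, lambda_dual=0.1):
    sr = primal_sr(x_lr)                       # primal prediction in HR space
    primal = F.l1_loss(sr, y_hr)               # usual SR reconstruction loss
    dual = F.l1_loss(dual_down(sr), x_lr)      # closed-loop constraint back in LR space
    return primal + lambda_dual * dual


x_lr = torch.randn(4, 3, 16, 16)
y_hr = torch.randn(4, 3, 32, 32)
print(dual_regression_loss(x_lr, y_hr))
```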