In-Loop Meta-Learning with Gradient-Alignment Reward
- URL: http://arxiv.org/abs/2102.03275v1
- Date: Fri, 5 Feb 2021 16:27:08 GMT
- Title: In-Loop Meta-Learning with Gradient-Alignment Reward
- Authors: Samuel Müller, André Biedenkapp, Frank Hutter
- Abstract summary: We present a cheap-to-compute and memory-saving reward, the gradient-alignment reward (GAR), that can guide the optimization.
First, we present the application of GAR to choosing the data distribution as a mixture of multiple dataset splits in a small-scale setting.
Second, we show that it can successfully guide learning augmentation strategies competitive with state-of-the-art augmentation strategies on CIFAR-10 and CIFAR-100.
- Score: 34.1954698584925
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: At the heart of the standard deep learning training loop is a greedy gradient
step minimizing a given loss. We propose to add a second step to maximize
training generalization. To do this, we optimize the loss of the next training
step. While computing the gradient for this is generally very expensive and
many interesting applications involve non-differentiable parameters (e.g. due
to hard samples), we present a cheap-to-compute and memory-saving reward, the
gradient-alignment reward (GAR), that can guide the optimization. We use this
reward to optimize multiple distributions during model training. First, we
present the application of GAR to choosing the data distribution as a mixture
of multiple dataset splits in a small-scale setting. Second, we show that it
can successfully guide learning augmentation strategies competitive with
state-of-the-art augmentation strategies on CIFAR-10 and CIFAR-100.
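To make the idea concrete, below is a minimal sketch of a gradient-alignment signal: the cosine similarity between the loss gradient on the current mini-batch and the loss gradient on the next mini-batch. This is an illustrative reading of the abstract, not the paper's exact formulation; the function names and the choice of cosine similarity as the alignment measure are assumptions.

```python
import torch
import torch.nn.functional as F


def _flat_grad(loss, params):
    """Gradient of `loss` w.r.t. `params`, flattened into a single vector."""
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])


def gradient_alignment_reward(model, loss_fn, batch_now, batch_next):
    """Illustrative gradient-alignment signal (assumed form, not the paper's exact GAR).

    Returns the cosine similarity between the loss gradient on the current
    mini-batch and the loss gradient on the next mini-batch; high alignment
    suggests the current update also reduces the loss of the upcoming step.
    """
    params = [p for p in model.parameters() if p.requires_grad]
    x_now, y_now = batch_now
    x_next, y_next = batch_next
    g_now = _flat_grad(loss_fn(model(x_now), y_now), params)
    g_next = _flat_grad(loss_fn(model(x_next), y_next), params)
    return F.cosine_similarity(g_now, g_next, dim=0)
```

In the paper's setting, such a reward would then guide an outer choice over non-differentiable quantities, e.g. the mixture weights over dataset splits or an augmentation policy, rather than being backpropagated directly.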
Related papers
- AdAdaGrad: Adaptive Batch Size Schemes for Adaptive Gradient Methods [17.043034606088234]
We introduce AdAdaGrad's scalar variant AdAdaGradNorm, which increases batch sizes during training.
We also perform image classification experiments, highlighting the merits of our proposed strategies.
arXiv Detail & Related papers (2024-02-17T07:49:50Z) - ELRA: Exponential learning rate adaption gradient descent optimization
method [83.88591755871734]
We present a novel, fast (exponential rate), ab initio (hyper-free) gradient based adaption.
The main idea of the method is to adapt the learning rate $\alpha$ by situational awareness.
It can be applied to problems of any dimension $n$ and scales only linearly.
arXiv Detail & Related papers (2023-09-12T14:36:13Z) - Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
arXiv Detail & Related papers (2022-10-07T03:52:27Z) - Penalizing Gradient Norm for Efficiently Improving Generalization in
Deep Learning [13.937644559223548]
How to train deep neural networks (DNNs) to generalize well is a central concern in deep learning.
We propose an effective method to improve model generalization by penalizing the gradient norm of the loss function during optimization (a generic sketch of this kind of penalty appears after this list).
arXiv Detail & Related papers (2022-02-08T02:03:45Z) - Meta-Learning with Adjoint Methods [16.753336086160598]
Model-Agnostic Meta-Learning (MAML) is widely used to find a good initialization for a family of tasks.
Despite its success, a critical challenge in MAML is to calculate the gradient w.r.t. the initialization of a long training trajectory for the sampled tasks.
We propose Adjoint MAML (A-MAML) to address this problem.
We demonstrate the advantage of our approach in both synthetic and real-world meta-learning tasks.
arXiv Detail & Related papers (2021-10-16T01:18:50Z) - Tom: Leveraging trend of the observed gradients for faster convergence [0.0]
Tom is a novel variant of Adam that takes into account the trend observed in the gradients over the loss landscape traversed by the neural network.
Tom outperforms Adagrad, Adadelta, RMSProp and Adam both in accuracy and in speed of convergence.
arXiv Detail & Related papers (2021-09-07T20:19:40Z) - Adapting Stepsizes by Momentumized Gradients Improves Optimization and
Generalization [89.66571637204012]
AdaMomentum performs well on vision tasks and achieves state-of-the-art results consistently on other tasks including language processing.
arXiv Detail & Related papers (2021-06-22T03:13:23Z) - Faster Meta Update Strategy for Noise-Robust Deep Learning [62.08964100618873]
We introduce a novel Faster Meta Update Strategy (FaMUS) to replace the most expensive step in the meta gradient with a faster layer-wise approximation.
We show our method is able to save two-thirds of the training time while maintaining comparable or even better generalization performance.
arXiv Detail & Related papers (2021-04-30T16:19:07Z) - Improving Calibration for Long-Tailed Recognition [68.32848696795519]
We propose two methods to improve calibration and performance in such scenarios.
For dataset bias due to different samplers, we propose shifted batch normalization.
Our proposed methods set new records on multiple popular long-tailed recognition benchmark datasets.
arXiv Detail & Related papers (2021-04-01T13:55:21Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z) - Graph Learning with Loss-Guided Training [16.815638149823744]
We explore loss-guided training in a new domain of node embedding methods pioneered by DeepWalk.
Our empirical evaluation on a rich collection of datasets shows significant acceleration over the baseline static methods, both in terms of total training performed and overall computation.
arXiv Detail & Related papers (2020-05-31T08:03:06Z)
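As a side note on the entry "Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning" above, a gradient-norm penalty can be sketched generically as follows. The penalty weight `lam` and the plain double-backward computation are assumptions for illustration; the paper itself relies on a more efficient approximation.

```python
import torch


def loss_with_grad_norm_penalty(model, loss_fn, x, y, lam=0.1):
    """Task loss plus a penalty on the L2 norm of its gradient (illustrative).

    `lam` is a hypothetical penalty weight; the double backward below is the
    straightforward (not the paper's efficient) way to make the gradient-norm
    term differentiable w.r.t. the model parameters.
    """
    params = [p for p in model.parameters() if p.requires_grad]
    task_loss = loss_fn(model(x), y)
    # create_graph=True keeps the graph so the gradient norm is itself differentiable
    grads = torch.autograd.grad(task_loss, params, create_graph=True)
    grad_norm = torch.cat([g.reshape(-1) for g in grads]).norm(p=2)
    return task_loss + lam * grad_norm
```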