Negative Inner-Loop Learning Rates Learn Universal Features
- URL: http://arxiv.org/abs/2203.10185v1
- Date: Fri, 18 Mar 2022 22:43:16 GMT
- Title: Negative Inner-Loop Learning Rates Learn Universal Features
- Authors: Tom Starshak
- Abstract summary: We study the effect that a learned learning-rate has on the per-task feature representations in Meta-SGD.
Negative learning rates push features away from task-specific features and towards task-agnostic features.
This confirms the hypothesis that Meta-SGD's negative learning rates cause the model to learn task-agnostic features rather than simply adapt to task-specific features.
- Score: 0.0
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Model Agnostic Meta-Learning (MAML) consists of two optimization loops: the outer loop learns a meta-initialization of model parameters that is shared across tasks, and the inner loop performs a task-specific adaptation step. A variant of MAML, Meta-SGD, uses the same two-loop structure, but also learns the learning-rate for the adaptation step. Little attention has been paid to how
the learned learning-rate of Meta-SGD affects feature reuse. In this paper, we
study the effect that a learned learning-rate has on the per-task feature
representations in Meta-SGD. The learned learning-rate of Meta-SGD often
contains negative values. During the adaptation phase, these negative learning
rates push features away from task-specific features and towards task-agnostic
features.
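To make the role of the learned learning-rate concrete, here is a minimal PyTorch sketch (not the authors' code) of a single Meta-SGD-style inner-loop step; the function and variable names, shapes, and toy loss are illustrative assumptions. In MAML the step size would be a fixed positive scalar, whereas Meta-SGD meta-learns a per-parameter step size that is free to become negative.

```python
# Minimal sketch (assumed details, not the paper's implementation) of one
# inner-loop adaptation step: theta' = theta - alpha * grad(L(theta)).
import torch

def inner_loop_step(params, alphas, loss_fn, batch):
    loss = loss_fn(params, batch)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    # Element-wise product: a negative entry of alpha moves that parameter
    # *up* its gradient, which the paper argues pushes features toward
    # task-agnostic rather than task-specific directions.
    return [p - a * g for p, a, g in zip(params, alphas, grads)]

# Toy usage with a linear model; in MAML, `alpha` would be a fixed scalar,
# while Meta-SGD learns it in the outer loop jointly with the initialization.
w = torch.randn(5, 3, requires_grad=True)              # meta-initialization
alpha = torch.full_like(w, 0.01).requires_grad_(True)  # learned step sizes
x, y = torch.randn(8, 5), torch.randn(8, 3)
toy_loss = lambda ps, b: ((b[0] @ ps[0] - b[1]) ** 2).mean()
(w_adapted,) = inner_loop_step([w], [alpha], toy_loss, (x, y))
```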
We performed several experiments on the Mini-Imagenet dataset. Two neural
networks were trained, one with MAML, and one with Meta-SGD. The feature
quality for both models was tested as follows: strip away the linear classification layer, pass labeled and unlabeled samples through the resulting encoder, and classify each unlabeled sample according to its nearest labeled neighbor in feature space. This
process was performed: 1) after training and using the meta-initialization
parameters; 2) after adaptation, and validated on that task; and 3) after
adaptation, and validated on a different task. The MAML-trained model improved on the task it was adapted to, but had worse performance on other tasks. The Meta-SGD-trained model showed the opposite behavior: it had worse performance on the task it was adapted to, but improved on other tasks. This confirms the hypothesis that Meta-SGD's negative learning rates cause the model to learn task-agnostic features rather than simply adapt to task-specific features.
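As a rough illustration of this evaluation protocol, the sketch below (assumed details, not the authors' exact code) strips the classifier, embeds labeled support and unlabeled query samples with the remaining encoder, and labels each query by its nearest support embedding; the encoder architecture, Euclidean distance metric, and sample counts are placeholders.

```python
# Nearest-neighbor feature-quality test (minimal sketch, assumptions noted above).
import torch
import torch.nn as nn

def nearest_neighbor_accuracy(encoder, support_x, support_y, query_x, query_y):
    encoder.eval()
    with torch.no_grad():
        s = encoder(support_x)                 # (n_support, d) features
        q = encoder(query_x)                   # (n_query, d) features
        dists = torch.cdist(q, s)              # pairwise query-to-support distances
        pred = support_y[dists.argmin(dim=1)]  # label of the nearest support sample
    return (pred == query_y).float().mean().item()

# Toy usage: a random MLP stands in for the MAML/Meta-SGD backbone with its
# linear classification layer removed. The paper runs this check three times:
# with the meta-initialization, after adapting to the evaluated task, and
# after adapting to a different task.
enc = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 8))
sx, sy = torch.randn(25, 32), torch.randint(0, 5, (25,))
qx, qy = torch.randn(75, 32), torch.randint(0, 5, (75,))
print(nearest_neighbor_accuracy(enc, sx, sy, qx, qy))
```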
Related papers
- Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering their practical deployment: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC).
arXiv Detail & Related papers (2023-11-23T15:46:54Z) - Theoretical Characterization of the Generalization Performance of Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z) - Learning to Learn with Indispensable Connections [6.040904021861969]
We propose a novel meta-learning method called Meta-LTH that includes indispensable (necessary) connections.
Our method improves the classification accuracy by approximately 2% (20-way 1-shot task setting) on the Omniglot dataset.
arXiv Detail & Related papers (2023-04-06T04:53:13Z) - Meta-Learning with Self-Improving Momentum Target [72.98879709228981]
We propose Self-improving Momentum Target (SiMT) to improve the performance of a meta-learner.
SiMT generates the target model by adapting from the temporal ensemble of the meta-learner.
We show that SiMT brings a significant performance gain when combined with a wide range of meta-learning methods.
arXiv Detail & Related papers (2022-10-11T06:45:15Z) - The Effect of Diversity in Meta-Learning [79.56118674435844]
Few-shot learning aims to learn representations that can tackle novel tasks given a small number of examples.
Recent studies show that task distribution plays a vital role in the model's performance.
We study different task distributions on a myriad of models and datasets to evaluate the effect of task diversity on meta-learning algorithms.
arXiv Detail & Related papers (2022-01-27T19:39:07Z) - MT3: Meta Test-Time Training for Self-Supervised Test-Time Adaption [69.76837484008033]
An unresolved problem in Deep Learning is the ability of neural networks to cope with domain shifts at test time.
We combine meta-learning, self-supervision and test-time training to learn to adapt to unseen test distributions.
Our approach significantly improves the state-of-the-art results on the CIFAR-10-Corrupted image classification benchmark.
arXiv Detail & Related papers (2021-03-30T09:33:38Z) - Meta-Regularization by Enforcing Mutual-Exclusiveness [0.8057006406834467]
We propose a regularization technique for meta-learning models that gives the model designer more control over the information flow during meta-training.
Our proposed regularization function shows an accuracy boost of approximately 36% on the Omniglot dataset.
arXiv Detail & Related papers (2021-01-24T22:57:19Z) - A Nested Bi-level Optimization Framework for Robust Few Shot Learning [10.147225934340877]
NestedMAML learns to assign weights to training tasks or instances.
Experiments on synthetic and real-world datasets demonstrate that NestedMAML efficiently mitigates the effects of "unwanted" tasks or instances.
arXiv Detail & Related papers (2020-11-13T06:41:22Z) - Adaptive Task Sampling for Meta-Learning [79.61146834134459]
The key idea of meta-learning for few-shot classification is to mimic the few-shot situations faced at test time.
We propose an adaptive task sampling method to improve the generalization performance.
arXiv Detail & Related papers (2020-07-17T03:15:53Z) - Few Is Enough: Task-Augmented Active Meta-Learning for Brain Cell Classification [8.998976678920236]
We propose a tAsk-auGmented actIve meta-LEarning (AGILE) method to efficiently adapt Deep Neural Networks to new tasks.
AGILE combines a meta-learning algorithm with a novel task augmentation technique which we use to generate an initial adaptive model.
We show that the proposed task-augmented meta-learning framework can learn to classify new cell types after a single gradient step.
arXiv Detail & Related papers (2020-07-09T18:03:12Z) - BI-MAML: Balanced Incremental Approach for Meta Learning [9.245355087256314]
We present a novel Balanced Incremental Model Agnostic Meta Learning system (BI-MAML) for learning multiple tasks.
Our method implements a meta-update rule to incrementally adapt its model to new tasks without forgetting old tasks.
Our system performs the meta-updates with only a few shots and can successfully accomplish them.
arXiv Detail & Related papers (2020-06-12T18:28:48Z)