Stateless Neural Meta-Learning using Second-Order Gradients
- URL: http://arxiv.org/abs/2104.10527v1
- Date: Wed, 21 Apr 2021 13:34:31 GMT
- Title: Stateless Neural Meta-Learning using Second-Order Gradients
- Authors: Mike Huisman and Aske Plaat and Jan N. van Rijn
- Abstract summary: We show that the meta-learner LSTM subsumes MAML.
We construct a new algorithm (dubbed TURTLE) which is simpler than the meta-learner LSTM yet more expressive than MAML.
- Score: 1.933681537640272
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning typically requires large data sets and much compute power for
each new problem that is learned. Meta-learning can be used to learn a good
prior that facilitates quick learning, thereby relaxing these requirements so
that new tasks can be learned more quickly; two popular approaches are MAML and the
meta-learner LSTM. In this work, we compare the two and formally show that the
meta-learner LSTM subsumes MAML. Combining this insight with recent empirical
findings, we construct a new algorithm (dubbed TURTLE) which is simpler than
the meta-learner LSTM yet more expressive than MAML. TURTLE outperforms both
techniques at few-shot sine wave regression and image classification on
miniImageNet and CUB without any additional hyperparameter tuning, at a
computational cost that is comparable with second-order MAML. The key to
TURTLE's success lies in the use of second-order gradients, which also
significantly increases the performance of the meta-learner LSTM by 1-6% in
accuracy.
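To make the role of second-order gradients concrete, below is a minimal sketch of a second-order MAML-style inner/outer loop on few-shot sine wave regression, written in PyTorch. This is an illustration only, not the authors' TURTLE or meta-learner LSTM implementation; the network size, learning rates, and task sampler are assumptions. The key detail is create_graph=True, which keeps the inner adaptation step in the autograd graph so the meta-update differentiates through it.

```python
# Hedged sketch: second-order MAML on few-shot sine regression (illustrative only).
# Network size, learning rates, and the task sampler are assumptions, not the
# paper's actual TURTLE / meta-learner LSTM configuration.
import math
import torch

def forward(params, x):
    # Two-layer MLP applied functionally with explicit parameter tensors.
    w1, b1, w2, b2 = params
    h = torch.tanh(x @ w1 + b1)
    return h @ w2 + b2

def sample_sine_task(k=10):
    # Hypothetical task sampler: random-amplitude, random-phase sine wave.
    amp = torch.rand(1) * 4.9 + 0.1
    phase = torch.rand(1) * math.pi
    x = torch.rand(2 * k, 1) * 10.0 - 5.0
    y = amp * torch.sin(x + phase)
    return (x[:k], y[:k]), (x[k:], y[k:])  # (support set, query set)

# Meta-parameters: the shared initialization that MAML learns.
params = [(torch.randn(1, 40) * 0.1).requires_grad_(),
          torch.zeros(40, requires_grad=True),
          (torch.randn(40, 1) * 0.1).requires_grad_(),
          torch.zeros(1, requires_grad=True)]
meta_opt = torch.optim.Adam(params, lr=1e-3)
inner_lr = 0.01

for step in range(1000):
    (xs, ys), (xq, yq) = sample_sine_task()

    # Inner loop: one gradient step on the support set.
    # create_graph=True retains the graph, making the meta-update second order.
    support_loss = ((forward(params, xs) - ys) ** 2).mean()
    grads = torch.autograd.grad(support_loss, params, create_graph=True)
    fast = [p - inner_lr * g for p, g in zip(params, grads)]

    # Outer loop: evaluate the adapted ("fast") weights on the query set and
    # backpropagate through the adaptation step into the initialization.
    query_loss = ((forward(fast, xq) - yq) ** 2).mean()
    meta_opt.zero_grad()
    query_loss.backward()
    meta_opt.step()
```

A first-order variant would detach the inner gradients (create_graph=False), which is cheaper but discards the curvature information that the abstract identifies as the key to TURTLE's success.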
Related papers
- Are LSTMs Good Few-Shot Learners? [4.316506818580031]
In 2001, Hochreiter et al. showed that an LSTM trained with backpropagation across different tasks is capable of meta-learning.
We revisit this approach and test it on modern few-shot learning benchmarks.
We find that, surprisingly, the LSTM outperforms the popular meta-learning technique MAML on a simple few-shot sine wave regression benchmark but, as expected, falls short on more complex few-shot image classification benchmarks.
arXiv Detail & Related papers (2023-10-22T00:16:30Z)
- Understanding Transfer Learning and Gradient-Based Meta-Learning Techniques [5.2997197698288945]
We investigate performance differences between finetuning, MAML, and another meta-learning technique called Reptile.
Our findings show that both the output layer and the noisy training conditions induced by data scarcity play important roles in facilitating this specialization for MAML.
We show that the pre-trained features as obtained by the finetuning baseline are more diverse and discriminative than those learned by MAML and Reptile.
arXiv Detail & Related papers (2023-10-09T20:51:49Z)
- Learning to Learn with Indispensable Connections [6.040904021861969]
We propose a novel meta-learning method called Meta-LTH that includes indispensable (necessary) connections.
Our method improves the classification accuracy by approximately 2% (20-way 1-shot task setting) on the Omniglot dataset.
arXiv Detail & Related papers (2023-04-06T04:53:13Z)
- Model-Agnostic Multitask Fine-tuning for Few-shot Vision-Language Transfer Learning [59.38343286807997]
We propose Model-Agnostic Multitask Fine-tuning (MAMF) for vision-language models on unseen tasks.
Compared with model-agnostic meta-learning (MAML), MAMF discards the bi-level optimization and uses only first-order gradients.
We show that MAMF consistently outperforms the classical fine-tuning method for few-shot transfer learning on five benchmark datasets.
arXiv Detail & Related papers (2022-03-09T17:26:53Z)
- Meta-learning Spiking Neural Networks with Surrogate Gradient Descent [1.90365714903665]
Bi-level learning, such as meta-learning, is increasingly used in deep learning to overcome limitations.
We show that SNNs meta-trained using MAML match or exceed the performance of conventional ANNs meta-trained with MAML on event-based meta-datasets.
Our results emphasize how meta-learning techniques can become instrumental for deploying neuromorphic learning technologies on real-world problems.
arXiv Detail & Related papers (2022-01-26T06:53:46Z)
- Can we learn gradients by Hamiltonian Neural Networks? [68.8204255655161]
We propose a meta-learner based on ODE neural networks that learns gradients.
We demonstrate that our method outperforms an LSTM-based meta-learner on an artificial task and on the MNIST dataset, with ReLU activations in the optimizee.
arXiv Detail & Related papers (2021-10-31T18:35:10Z)
- Improving Deep Learning for HAR with shallow LSTMs [70.94062293989832]
We propose to alter DeepConvLSTM to employ a one-layer instead of a two-layer LSTM.
Our results stand in contrast to the belief that at least a two-layer LSTM is needed when dealing with sequential data.
arXiv Detail & Related papers (2021-08-02T08:14:59Z)
- Memory-Based Optimization Methods for Model-Agnostic Meta-Learning and Personalized Federated Learning [56.17603785248675]
Model-agnostic meta-learning (MAML) has become a popular research area.
Existing MAML algorithms rely on the 'episode' idea by sampling a few tasks and data points to update the meta-model at each iteration.
This paper proposes memory-based algorithms for MAML that converge with vanishing error.
arXiv Detail & Related papers (2021-06-09T08:47:58Z)
- Meta-Learning with Neural Tangent Kernels [58.06951624702086]
We propose the first meta-learning paradigm in the Reproducing Kernel Hilbert Space (RKHS) induced by the meta-model's Neural Tangent Kernel (NTK).
Within this paradigm, we introduce two meta-learning algorithms, which no longer need a sub-optimal iterative inner-loop adaptation as in the MAML framework.
We achieve this goal by 1) replacing the adaptation with a fast-adaptive regularizer in the RKHS; and 2) solving the adaptation analytically based on the NTK theory.
arXiv Detail & Related papers (2021-02-07T20:53:23Z)
- La-MAML: Look-ahead Meta Learning for Continual Learning [14.405620521842621]
We propose Look-ahead MAML (La-MAML), a fast optimisation-based meta-learning algorithm for online-continual learning, aided by a small episodic memory.
La-MAML achieves performance superior to other replay-based, prior-based and meta-learning based approaches for continual learning on real-world visual classification benchmarks.
arXiv Detail & Related papers (2020-07-27T23:07:01Z)
- Depth-Adaptive Graph Recurrent Network for Text Classification [71.20237659479703]
Sentence-State LSTM (S-LSTM) is a powerful and highly efficient graph recurrent network.
We propose a depth-adaptive mechanism for the S-LSTM, which allows the model to learn how many computational steps to conduct for different words as required.
arXiv Detail & Related papers (2020-02-29T03:09:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.