Are LSTMs Good Few-Shot Learners?
- URL: http://arxiv.org/abs/2310.14139v1
- Date: Sun, 22 Oct 2023 00:16:30 GMT
- Title: Are LSTMs Good Few-Shot Learners?
- Authors: Mike Huisman, Thomas M. Moerland, Aske Plaat, Jan N. van Rijn
- Abstract summary: In 2001, Hochreiter et al. showed that an LSTM trained with backpropagation across different tasks is capable of meta-learning.
We revisit this approach and test it on modern few-shot learning benchmarks.
We find that LSTMs, surprisingly, outperform the popular meta-learning technique MAML on a simple few-shot sine-wave regression benchmark but, as expected, fall short on more complex few-shot image classification benchmarks.
- Score: 4.316506818580031
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning requires large amounts of data to learn new tasks well,
limiting its applicability to domains where such data is available.
Meta-learning overcomes this limitation by learning how to learn. In 2001,
Hochreiter et al. showed that an LSTM trained with backpropagation across
different tasks is capable of meta-learning. Despite promising results of this
approach on small problems, and more recently, also on reinforcement learning
problems, the approach has received little attention in the supervised few-shot
learning setting. We revisit this approach and test it on modern few-shot
learning benchmarks. We find that LSTMs, surprisingly, outperform the popular
meta-learning technique MAML on a simple few-shot sine-wave regression
benchmark but, as expected, fall short on more complex few-shot image
classification benchmarks. We identify two potential causes and propose a new
method called Outer Product LSTM (OP-LSTM) that resolves these issues and
displays substantial performance gains over the plain LSTM. Compared to popular
meta-learning baselines, OP-LSTM yields competitive performance on
within-domain few-shot image classification, and performs better in
cross-domain settings by 0.5% to 1.9% in accuracy score. While these results
alone do not set a new state-of-the-art, the advances of OP-LSTM are orthogonal
to other advances in the field of meta-learning, yield new insights into how
LSTMs work in image classification, and open up a whole range of new research
directions. For reproducibility purposes, we publish all our research code
publicly.
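For context, the approach the paper revisits can be pictured in a few lines. The snippet below is a minimal, illustrative reconstruction (not the authors' released code) of a Hochreiter-style black-box LSTM meta-learner on few-shot sine-wave regression: the LSTM receives each input together with the previous target, so all task adaptation happens in its hidden state, and plain backpropagation across many sampled tasks serves as meta-training. The names (LSTMMetaLearner, sample_sine_tasks) and all hyperparameters are assumptions made for illustration.
```python
# Minimal sketch (not the authors' released code) of the Hochreiter-style
# "black-box" LSTM meta-learner: a single LSTM processes a task as a sequence
# of (input, previous target) pairs, so all adaptation happens in its hidden
# state, and backpropagation across many sampled tasks is the meta-training.
import torch
import torch.nn as nn

class LSTMMetaLearner(nn.Module):  # hypothetical name
    def __init__(self, hidden_size=40):
        super().__init__()
        # Step t receives [x_t, y_{t-1}]: the current input plus the previous
        # target, which is the feedback signal the meta-learner learns from.
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x, y):
        # x, y: (tasks, steps, 1); shift targets right so step t sees y_{t-1}.
        prev_y = torch.cat([torch.zeros_like(y[:, :1]), y[:, :-1]], dim=1)
        h, _ = self.lstm(torch.cat([x, prev_y], dim=-1))
        return self.head(h)  # a prediction for every step

def sample_sine_tasks(tasks=4, steps=10):
    """Illustrative sine-wave tasks: random amplitude and phase per task."""
    amp = torch.rand(tasks, 1, 1) * 4.9 + 0.1
    phase = torch.rand(tasks, 1, 1) * torch.pi
    x = torch.rand(tasks, steps, 1) * 10 - 5
    return x, amp * torch.sin(x + phase)

model = LSTMMetaLearner()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(10_000):  # meta-training across randomly sampled tasks
    x, y = sample_sine_tasks()
    loss = nn.functional.mse_loss(model(x, y), y)
    opt.zero_grad(); loss.backward(); opt.step()
```
A MAML-style learner would instead adapt the network's weights with gradient steps inside each task; the comparison drawn in the paper is between this hidden-state adaptation and that weight-space adaptation.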
Related papers
- LLMEmbed: Rethinking Lightweight LLM's Genuine Function in Text Classification [13.319594321038926]
We propose a simple and effective transfer learning strategy, namely LLMEmbed, to address the classical but challenging task of text classification.
We perform extensive experiments on publicly available datasets, and the results show that LLMEmbed achieves strong performance while enjoying low training overhead.
arXiv Detail & Related papers (2024-06-06T03:46:59Z)
- Kernel Corrector LSTM [1.034961673489652]
We propose a new RW-ML algorithm, Kernel Corrector LSTM (KcLSTM), that replaces the meta-learner of cLSTM with a simpler method: Kernel Smoothing.
We empirically evaluate the forecasting accuracy and the training time of the new algorithm and compare it with cLSTM and LSTM.
arXiv Detail & Related papers (2024-04-28T18:44:10Z)
- Image Classification using Sequence of Pixels [3.04585143845864]
This study compares sequential image classification methods based on recurrent neural networks.
We describe methods based on Long Short-Term Memory (LSTM) and bidirectional Long Short-Term Memory (BiLSTM) architectures, among others.
arXiv Detail & Related papers (2022-09-23T09:42:44Z)
- Few-Shot Class-Incremental Learning by Sampling Multi-Phase Tasks [59.12108527904171]
A model should recognize new classes and maintain discriminability over old classes.
The task of recognizing few-shot new classes without forgetting old classes is called few-shot class-incremental learning (FSCIL).
We propose a new paradigm for FSCIL based on meta-learning by LearnIng Multi-phase Incremental Tasks (LIMIT).
arXiv Detail & Related papers (2022-03-31T13:46:41Z)
- Model-Agnostic Multitask Fine-tuning for Few-shot Vision-Language Transfer Learning [59.38343286807997]
We propose Model-Agnostic Multitask Fine-tuning (MAMF) for vision-language models on unseen tasks.
Compared with model-agnostic meta-learning (MAML), MAMF discards the bi-level optimization and uses only first-order gradients; the sketch after this list illustrates that distinction.
We show that MAMF consistently outperforms the classical fine-tuning method for few-shot transfer learning on five benchmark datasets.
arXiv Detail & Related papers (2022-03-09T17:26:53Z)
- Improving Deep Learning for HAR with shallow LSTMs [70.94062293989832]
We propose to alter DeepConvLSTM to employ a one-layer instead of a two-layer LSTM.
Our results stand in contrast to the belief that at least a two-layer LSTM is needed when dealing with sequential data.
arXiv Detail & Related papers (2021-08-02T08:14:59Z)
- MAML is a Noisy Contrastive Learner [72.04430033118426]
Model-agnostic meta-learning (MAML) is one of the most popular and widely-adopted meta-learning algorithms nowadays.
We provide a new perspective on the working mechanism of MAML and show that MAML is analogous to a meta-learner using a supervised contrastive objective function.
We propose a simple but effective technique, the zeroing trick, to alleviate the resulting interference.
arXiv Detail & Related papers (2021-06-29T12:52:26Z)
- Stateless Neural Meta-Learning using Second-Order Gradients [1.933681537640272]
We show that the meta-learner LSTM subsumes MAML.
We construct a new algorithm (dubbed TURTLE) which is simpler than the meta-learner LSTM yet more expressive than MAML.
arXiv Detail & Related papers (2021-04-21T13:34:31Z)
- Understanding Self-supervised Learning with Dual Deep Networks [74.92916579635336]
We propose a novel framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks.
We prove that in each SGD update of SimCLR with various loss functions, the weights at each layer are updated by a covariance operator.
To further study what role the covariance operator plays and which features are learned in such a process, we model data generation and augmentation processes through a hierarchical latent tree model (HLTM).
arXiv Detail & Related papers (2020-10-01T17:51:49Z)
- Revisiting LSTM Networks for Semi-Supervised Text Classification via Mixed Objective Function [106.69643619725652]
We develop a training strategy that allows even a simple BiLSTM model, when trained with cross-entropy loss, to achieve competitive results.
We report state-of-the-art results for the text classification task on several benchmark datasets.
arXiv Detail & Related papers (2020-09-08T21:55:22Z)
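Several of the entries above (MAMF, TURTLE, the contrastive view of MAML) hinge on the distinction between MAML's bi-level, second-order update and cheaper first-order shortcuts. The sketch below is an illustrative toy example of that distinction only, not the method of any listed paper; the names (inner_adapt, forward_with) and the random support/query data are placeholders for a sampled task.
```python
# Toy contrast between MAML's bi-level update and a first-order shortcut.
# Requires PyTorch 2.x for torch.func.functional_call.
import torch
import torch.nn as nn

def inner_adapt(model, x_s, y_s, lr=0.01, second_order=True):
    """One inner-loop gradient step; returns task-adapted parameters."""
    loss = nn.functional.mse_loss(model(x_s), y_s)
    params = dict(model.named_parameters())
    grads = torch.autograd.grad(loss, list(params.values()), create_graph=second_order)
    return {n: p - lr * g for (n, p), g in zip(params.items(), grads)}

def forward_with(model, params, x):
    # Evaluate the model functionally with the adapted parameters.
    return torch.func.functional_call(model, params, (x,))

model = nn.Sequential(nn.Linear(1, 40), nn.ReLU(), nn.Linear(40, 1))
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for it in range(1_000):
    x_s, y_s = torch.randn(10, 1), torch.randn(10, 1)  # toy support set
    x_q, y_q = torch.randn(10, 1), torch.randn(10, 1)  # toy query set
    # second_order=True  -> MAML: the outer gradient flows through the inner step.
    # second_order=False -> first-order shortcut: inner gradients treated as constants.
    adapted = inner_adapt(model, x_s, y_s, second_order=True)
    outer_loss = nn.functional.mse_loss(forward_with(model, adapted, x_q), y_q)
    meta_opt.zero_grad(); outer_loss.backward(); meta_opt.step()
```
With second_order=True the outer gradient differentiates through the inner update (bi-level optimization); with second_order=False the inner gradients are treated as constants, which is the first-order approximation these papers contrast against.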