Few-shot Sequence Learning with Transformers
- URL: http://arxiv.org/abs/2012.09543v1
- Date: Thu, 17 Dec 2020 12:30:38 GMT
- Title: Few-shot Sequence Learning with Transformers
- Authors: Lajanugen Logeswaran, Ann Lee, Myle Ott, Honglak Lee, Marc'Aurelio Ranzato, Arthur Szlam
- Abstract summary: Few-shot algorithms aim at learning new tasks provided only a handful of training examples.
In this work we investigate few-shot learning in the setting where the data points are sequences of tokens.
We propose an efficient learning algorithm based on Transformers.
- Score: 79.87875859408955
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Few-shot algorithms aim at learning new tasks provided only a handful of
training examples. In this work we investigate few-shot learning in the setting
where the data points are sequences of tokens and propose an efficient learning
algorithm based on Transformers. In the simplest setting, we append a token to
an input sequence which represents the particular task to be undertaken, and
show that the embedding of this token can be optimized on the fly given few
labeled examples. Our approach does not require complicated changes to the
model architecture, such as adapter layers, or the computation of second-order
derivatives, as is currently popular in the meta-learning and few-shot learning
literature. We demonstrate our approach on a variety of tasks and analyze the
generalization properties of several model variants and baseline approaches. In
particular, we show that compositional task descriptors can improve
performance. Experiments show that our approach works at least as well as other
methods, while being more computationally efficient.
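Below is a minimal sketch of the task-token mechanism the abstract describes, assuming a generic pre-trained PyTorch encoder that maps embedded sequences to hidden states; `TaskTokenAdapter`, `adapt`, and the optimizer settings are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class TaskTokenAdapter(nn.Module):
    """Wraps a frozen encoder with one learnable task-token embedding."""

    def __init__(self, encoder: nn.Module, d_model: int, num_classes: int):
        super().__init__()
        self.encoder = encoder  # pre-trained Transformer, kept frozen
        for p in self.encoder.parameters():
            p.requires_grad_(False)
        # The only task-specific parameter: a single token embedding.
        self.task_emb = nn.Parameter(torch.zeros(1, 1, d_model))
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, token_embs: torch.Tensor) -> torch.Tensor:
        # token_embs: (batch, seq_len, d_model), already-embedded inputs.
        task = self.task_emb.expand(token_embs.size(0), -1, -1)
        hidden = self.encoder(torch.cat([task, token_embs], dim=1))
        return self.classifier(hidden[:, 0])  # read out at the task token

def adapt(model: TaskTokenAdapter, xs, ys, steps: int = 100, lr: float = 1e-2):
    # Optimize only the task-token embedding on the few labeled examples.
    opt = torch.optim.Adam([model.task_emb], lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(xs), ys).backward()
        opt.step()
```

Adaptation here is plain first-order gradient descent on a single embedding vector, which is what keeps the method cheap: no adapter layers are added and no second-order derivatives are computed.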
Related papers
- Understanding In-Context Learning in Transformers and LLMs by Learning to Learn Discrete Functions [32.59746882017483]
We show that Transformers can learn to implement two distinct algorithms to solve a single task.
We also show that extant Large Language Models (LLMs) can compete with nearest-neighbor baselines on prediction tasks.
arXiv Detail & Related papers (2023-10-04T17:57:33Z)
- Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation [24.488427641442694]
We propose a novel conditional neural process-based approach for few-shot text classification.
Our key idea is to represent each task using gradient information from a base model (see the gradient-pooling sketch after this list).
Our approach outperforms traditional fine-tuning, sequential transfer learning, and state-of-the-art meta-learning approaches.
arXiv Detail & Related papers (2022-01-27T15:29:30Z)
- Meta Navigator: Search for a Good Adaptation Policy for Few-shot Learning [113.05118113697111]
Few-shot learning aims to adapt knowledge learned from previous tasks to novel tasks with only a limited amount of labeled data.
The research literature on few-shot learning exhibits great diversity, and different algorithms often excel at different few-shot scenarios.
We present Meta Navigator, a framework that addresses this fragmentation by searching for a good higher-level adaptation policy.
arXiv Detail & Related papers (2021-09-13T07:20:01Z)
- Hash Layers For Large Sparse Models [48.90784451703753]
We modify the feedforward layer so that each token in the sequence is hashed to a different set of weights, determined by the token itself (see the routing sketch after this list).
We show that this procedure either outperforms or is competitive with learning-to-route mixture-of-expert methods.
arXiv Detail & Related papers (2021-06-08T14:54:24Z)
- Multi-task Supervised Learning via Cross-learning [102.64082402388192]
We consider a problem known as multi-task learning, consisting of fitting a set of regression functions intended for solving different tasks.
In our novel formulation, we couple the parameters of these functions so that they learn in their task-specific domains while staying close to each other (see the coupling sketch after this list).
This facilitates cross-fertilization, in which data collected in one domain helps improve learning performance on the other tasks.
arXiv Detail & Related papers (2020-10-24T21:35:57Z)
- Adaptive Task Sampling for Meta-Learning [79.61146834134459]
The key idea of meta-learning for few-shot classification is to mimic the few-shot situations faced at test time.
We propose an adaptive task sampling method to improve the generalization performance.
arXiv Detail & Related papers (2020-07-17T03:15:53Z)
- Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm which directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)
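The gradient-pooling sketch referenced in the Grad2Task entry above: a minimal illustration of turning gradients from a base model into a fixed-size task representation. Pooling each parameter's gradient into its mean and norm is an assumption made here for brevity, not the paper's exact encoder.

```python
import torch
import torch.nn as nn

def task_representation(base_model: nn.Module,
                        support_x: torch.Tensor,
                        support_y: torch.Tensor) -> torch.Tensor:
    # One backward pass on the task's labeled support set.
    base_model.zero_grad()
    loss = nn.CrossEntropyLoss()(base_model(support_x), support_y)
    loss.backward()
    # Pool each parameter's gradient into (mean, norm), yielding a
    # fixed-size vector that can condition an adaptation network.
    feats = [torch.stack([p.grad.mean(), p.grad.norm()])
             for p in base_model.parameters() if p.grad is not None]
    return torch.cat(feats).detach()
```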
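The routing sketch referenced in the Hash Layers entry above: a minimal illustration of hashing each token to a fixed expert feedforward block. The modulo hash over token ids and the two-layer experts are assumptions for illustration; the paper's hash function and expert layout may differ.

```python
import torch
import torch.nn as nn

class HashFFN(nn.Module):
    """Feedforward layer whose weights are selected by a fixed token hash."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.num_experts = num_experts
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(num_experts))

    def forward(self, x: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); token_ids: (batch, seq) integer ids.
        # Fixed, non-learned routing: each token id hashes to one expert.
        route = token_ids % self.num_experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = route == e
            if mask.any():
                out[mask] = expert(x[mask])
        return out
```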
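The coupling sketch referenced in the cross-learning entry above: one plausible instantiation of coupling per-task parameters, using a quadratic pull toward the cross-task mean. The penalty form is an assumption, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn as nn

def coupled_loss(models, batches, lam: float = 0.1) -> torch.Tensor:
    # models: one nn.Module per task, all with identical architecture;
    # batches: one (inputs, targets) pair per task.
    mse = nn.MSELoss()
    task_loss = sum(mse(m(x), y) for m, (x, y) in zip(models, batches))
    # Stack corresponding parameters across tasks and penalize each
    # task's deviation from the cross-task mean, pulling the per-task
    # regressors toward each other without forcing them to be equal.
    prox = 0.0
    for params in zip(*(m.parameters() for m in models)):
        stacked = torch.stack(params)  # (num_tasks, ...)
        prox = prox + (stacked - stacked.mean(dim=0)).pow(2).sum()
    return task_loss + lam * prox
```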
This list is automatically generated from the titles and abstracts of the papers on this site.