Self-Supervised Meta-Learning for Few-Shot Natural Language
Classification Tasks
- URL: http://arxiv.org/abs/2009.08445v2
- Date: Sun, 15 Nov 2020 20:31:22 GMT
- Title: Self-Supervised Meta-Learning for Few-Shot Natural Language
Classification Tasks
- Authors: Trapit Bansal, Rishikesh Jha, Tsendsuren Munkhdalai, Andrew McCallum
- Abstract summary: We propose a self-supervised approach to generate a large, rich, meta-learning task distribution from unlabeled text.
We show that this meta-training leads to better few-shot generalization than language-model pre-training followed by finetuning.
- Score: 40.97125791174191
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised pre-training of transformer models has revolutionized NLP
applications. Such pre-training with language modeling objectives provides a
useful initial point for parameters that generalize well to new tasks with
fine-tuning. However, fine-tuning is still data inefficient -- when there are
few labeled examples, accuracy can be low. Data efficiency can be improved by
optimizing pre-training directly for future fine-tuning with few examples; this
can be treated as a meta-learning problem. However, standard meta-learning
techniques require many training tasks in order to generalize; unfortunately,
finding a diverse set of such supervised tasks is usually difficult. This paper
proposes a self-supervised approach to generate a large, rich, meta-learning
task distribution from unlabeled text. This is achieved using a cloze-style
objective, but creating separate multi-class classification tasks by gathering
the tokens to be blanked from among only a handful of vocabulary terms. This yields
as many unique meta-training tasks as the number of subsets of vocabulary
terms. We meta-train a transformer model on this distribution of tasks using a
recent meta-learning framework. On 17 NLP tasks, we show that this
meta-training leads to better few-shot generalization than language-model
pre-training followed by finetuning. Furthermore, we show how the
self-supervised tasks can be combined with supervised tasks for meta-learning,
providing substantial accuracy gains over previous supervised meta-learning.
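As a concrete illustration of the construction described in the abstract, the sketch below builds N-way cloze classification tasks by blanking out words drawn from a small subset of the vocabulary, so that every subset of vocabulary words defines a distinct task. It is a minimal sketch under simplifying assumptions: the whitespace tokenization, the literal "[MASK]" placeholder, and all function names are illustrative, not the authors' implementation.

```python
import random
from collections import defaultdict

MASK = "[MASK]"  # placeholder for the blanked-out word (an assumption, not the paper's exact token)

def build_word_index(sentences):
    """Map each word to the sentences that contain it."""
    index = defaultdict(list)
    for sent in sentences:
        for word in set(sent.split()):
            index[word].append(sent)
    return index

def sample_cloze_task(index, num_classes=3, shots=4, queries=1, rng=random):
    """Sample one N-way cloze classification task.

    Each class corresponds to one sampled vocabulary word; its examples are
    sentences with that word blanked out, labeled by which word was removed.
    """
    usable = [w for w, sents in index.items() if len(sents) >= shots + queries]
    words = rng.sample(usable, num_classes)
    support, query = [], []
    for label, word in enumerate(words):
        sents = rng.sample(index[word], shots + queries)
        masked = [" ".join(MASK if tok == word else tok for tok in s.split()) for s in sents]
        support += [(m, label) for m in masked[:shots]]
        query += [(m, label) for m in masked[shots:]]
    return support, query

# Every subset of vocabulary words yields a different task, so the task
# distribution grows combinatorially with the vocabulary.
corpus = ["the cat sat on the mat", "a dog chased the cat", "the cat slept all day",
          "she read a long book", "the book was on the shelf", "he bought a new book"]
support, query = sample_cloze_task(build_word_index(corpus), num_classes=2, shots=2, queries=1)
```

In the paper itself, a transformer is then meta-trained on episodes sampled from this distribution using a recent meta-learning framework; the support/query split above only demonstrates the task format.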
Related papers
- Grad2Task: Improved Few-shot Text Classification Using Gradients for
Task Representation [24.488427641442694]
We propose a novel conditional neural process-based approach for few-shot text classification.
Our key idea is to represent each task using gradient information from a base model.
Our approach outperforms traditional fine-tuning, sequential transfer learning, and state-of-the-art meta-learning approaches.
arXiv Detail & Related papers (2022-01-27T15:29:30Z)
- MetaICL: Learning to Learn In Context [87.23056864536613]
We introduce MetaICL, a new meta-training framework for few-shot learning in which a pretrained language model is tuned to do in-context learning on a large set of training tasks (a minimal sketch of this objective appears after this list).
We show that MetaICL approaches (and sometimes beats) the performance of models fully finetuned on the target task's training data, and outperforms models with nearly 8x more parameters.
arXiv Detail & Related papers (2021-10-29T17:42:08Z)
- Task Attended Meta-Learning for Few-Shot Learning [3.0724051098062097]
We introduce a training curriculum motivated by selective focus in humans, called task-attended meta-training, to weight the tasks in a batch (a minimal weighting sketch appears after this list).
Comparisons with non-task-attended counterparts on complex datasets validate its effectiveness.
arXiv Detail & Related papers (2021-06-20T07:34:37Z)
- Meta-Regularization by Enforcing Mutual-Exclusiveness [0.8057006406834467]
We propose a regularization technique for meta-learning models that gives the model designer more control over the information flow during meta-training.
Our proposed regularization function shows an accuracy boost of roughly 36% on the Omniglot dataset.
arXiv Detail & Related papers (2021-01-24T22:57:19Z)
- Variable-Shot Adaptation for Online Meta-Learning [123.47725004094472]
We study the problem of learning new tasks from a small, fixed number of examples, by meta-learning across static data from a set of previous tasks.
We find that meta-learning solves the full task set with fewer overall labels and greater cumulative performance, compared to standard supervised methods.
These results suggest that meta-learning is an important ingredient for building learning systems that continuously learn and improve over a sequence of problems.
arXiv Detail & Related papers (2020-12-14T18:05:24Z)
- Adaptive Task Sampling for Meta-Learning [79.61146834134459]
The key idea of meta-learning for few-shot classification is to mimic the few-shot situations faced at test time.
We propose an adaptive task sampling method to improve the generalization performance.
arXiv Detail & Related papers (2020-07-17T03:15:53Z)
- Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm which directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)
- Incremental Meta-Learning via Indirect Discriminant Alignment [118.61152684795178]
We develop a notion of incremental learning during the meta-training phase of meta-learning.
Our approach performs favorably at test time as compared to training a model with the full meta-training set.
arXiv Detail & Related papers (2020-02-11T01:39:12Z)
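For the MetaICL entry referenced above, here is a minimal sketch of what the per-episode objective could look like: k labeled examples are concatenated with a query input, and the language model is trained to predict only the query's label tokens. The "gpt2" checkpoint, the newline-based prompt format, and the function name are assumptions for illustration, not the paper's exact setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in checkpoint for illustration only
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def in_context_loss(train_examples, test_input, test_label):
    """One meta-training episode: concatenate k labeled examples with a query
    input and train the LM to predict only the query's label tokens."""
    context = "".join(f"{x}\n{y}\n\n" for x, y in train_examples) + test_input + "\n"
    ctx_ids = tok(context, return_tensors="pt").input_ids
    lbl_ids = tok(test_label + tok.eos_token, return_tensors="pt").input_ids
    input_ids = torch.cat([ctx_ids, lbl_ids], dim=1)
    labels = input_ids.clone()
    labels[:, : ctx_ids.shape[1]] = -100  # ignore context positions in the loss
    return model(input_ids=input_ids, labels=labels).loss

# Usage: sample k examples from one training task plus a held-out query.
episode = [("great movie, loved it", "positive"), ("dull and slow", "negative")]
loss = in_context_loss(episode, "what a waste of time", "negative")
loss.backward()
```

At test time the same prompt format is reused with the target task's k examples, without any gradient updates.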
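For the task-attended meta-learning entry above, the sketch below replaces the usual uniform average over a meta-batch of tasks with learned weights. Scoring each task by its loss alone and using a small MLP scorer are simplifying assumptions; the summary does not specify how the paper computes its task weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaskAttention(nn.Module):
    """Scores each task in a meta-batch (here from its loss alone, an
    assumption) and normalizes the scores into weights."""
    def __init__(self, hidden=16):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, task_losses):                  # shape: (num_tasks,)
        scores = self.scorer(task_losses.unsqueeze(-1)).squeeze(-1)
        return F.softmax(scores, dim=0)              # weights sum to 1

def weighted_meta_loss(task_losses, attention):
    """Attended meta-objective: weight tasks instead of averaging them uniformly."""
    weights = attention(task_losses)
    return (weights * task_losses).sum()

# Usage with dummy per-task query losses from a meta-batch of 4 tasks;
# in real meta-training these losses carry gradients from the learner.
attention = TaskAttention()
task_losses = torch.tensor([0.9, 1.4, 0.3, 0.7])
meta_loss = weighted_meta_loss(task_losses, attention)
meta_loss.backward()
```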