Grad2Task: Improved Few-shot Text Classification Using Gradients for
Task Representation
- URL: http://arxiv.org/abs/2201.11576v1
- Date: Thu, 27 Jan 2022 15:29:30 GMT
- Title: Grad2Task: Improved Few-shot Text Classification Using Gradients for
Task Representation
- Authors: Jixuan Wang, Kuan-Chieh Wang, Frank Rudzicz, Michael Brudno
- Abstract summary: We propose a novel conditional neural process-based approach for few-shot text classification.
Our key idea is to represent each task using gradient information from a base model.
Our approach outperforms traditional fine-tuning, sequential transfer learning, and state-of-the-art meta learning approaches.
- Score: 24.488427641442694
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large pretrained language models (LMs) like BERT have improved performance in
many disparate natural language processing (NLP) tasks. However, fine-tuning
such models requires a large number of training examples for each target task.
Simultaneously, many realistic NLP problems are "few shot", without a
sufficiently large training set. In this work, we propose a novel conditional
neural process-based approach for few-shot text classification that learns to
transfer from other diverse tasks with rich annotation. Our key idea is to
represent each task using gradient information from a base model and to train
an adaptation network that modulates a text classifier conditioned on the task
representation. While previous task-aware few-shot learners represent tasks by
input encoding, our novel task representation is more powerful, as the gradient
captures input-output relationships of a task. Experimental results show that
our approach outperforms traditional fine-tuning, sequential transfer learning,
and state-of-the-art meta learning approaches on a collection of diverse
few-shot tasks. We further conducted analysis and ablations to justify our
design choices.
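The core mechanism can be illustrated with a short sketch: gradients of the support-set loss serve as the task representation, and an adaptation network maps them to scale/shift parameters that modulate the classifier. This is a minimal sketch under assumed names (AdaptationNet, gradient_task_embedding, classify_query) and an assumed FiLM-style modulation scheme, not the authors' implementation.

```python
# Minimal sketch of gradient-based task conditioning. Names and the exact
# modulation scheme are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptationNet(nn.Module):
    """Maps a gradient-based task embedding to per-feature scale/shift parameters."""

    def __init__(self, grad_dim: int, hidden_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(grad_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2 * hidden_dim),
        )

    def forward(self, task_embedding: torch.Tensor):
        scale, shift = self.net(task_embedding).chunk(2, dim=-1)
        return 1.0 + scale, shift  # gamma, beta


def gradient_task_embedding(encoder, head, support_x, support_y):
    """Represent a task by the gradient of the support-set loss w.r.t. the head."""
    loss = F.cross_entropy(head(encoder(support_x)), support_y)
    grads = torch.autograd.grad(loss, list(head.parameters()))
    return torch.cat([g.flatten() for g in grads]).detach()


def classify_query(encoder, head, gamma, beta, query_x):
    """Classify query examples with task-conditioned feature modulation."""
    h = encoder(query_x)
    return head(gamma * h + beta)


# Toy usage: a small MLP stands in for a pretrained text encoder such as BERT.
hidden = 32
encoder = nn.Sequential(nn.Linear(16, hidden), nn.ReLU())
head = nn.Linear(hidden, 3)
adapter = AdaptationNet(grad_dim=sum(p.numel() for p in head.parameters()),
                        hidden_dim=hidden)

support_x, support_y = torch.randn(9, 16), torch.randint(0, 3, (9,))
query_x = torch.randn(5, 16)

task_emb = gradient_task_embedding(encoder, head, support_x, support_y)
gamma, beta = adapter(task_emb)
query_logits = classify_query(encoder, head, gamma, beta, query_x)  # shape (5, 3)
```

In the paper, the base model is a pretrained LM such as BERT and training is episodic over many diverse source tasks; the sketch only shows how a single task's gradients could condition classification of its query examples.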
Related papers
- Meta-training with Demonstration Retrieval for Efficient Few-shot Learning [11.723856248352007]
Large language models show impressive results on few-shot NLP tasks.
However, these models are memory- and computation-intensive.
We propose meta-training with demonstration retrieval.
arXiv Detail & Related papers (2023-06-30T20:16:22Z)
- Learning Easily Updated General Purpose Text Representations with Adaptable Task-Specific Prefixes [22.661527526471996]
Fine-tuning a large pre-trained language model for each downstream task causes computational burdens.
We propose a prefix-based method to learn fixed text representations with source tasks.
arXiv Detail & Related papers (2023-05-22T21:31:03Z)
- Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Task Adaptive Parameter Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
arXiv Detail & Related papers (2022-03-30T23:16:07Z)
- MetaICL: Learning to Learn In Context [87.23056864536613]
We introduce MetaICL, a new meta-training framework for few-shot learning in which a pretrained language model is tuned to do in-context learning on a large set of training tasks; a toy sketch of this episode format appears after this list.
We show that MetaICL approaches (and sometimes beats) the performance of models fully finetuned on the target task training data, and outperforms much larger models with nearly 8x more parameters.
arXiv Detail & Related papers (2021-10-29T17:42:08Z)
- XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation [80.18830380517753]
We develop a new task-agnostic distillation framework XtremeDistilTransformers.
We study the transferability of several source tasks, augmentation resources and model architecture for distillation.
arXiv Detail & Related papers (2021-06-08T17:49:33Z)
- Making Pre-trained Language Models Better Few-shot Learners [11.90626040104822]
The recent GPT-3 model achieves remarkable few-shot performance solely by leveraging a natural-language prompt and a few task demonstrations as input context.
Inspired by their findings, we study few-shot learning in a more practical scenario, where we use smaller language models for which fine-tuning is computationally efficient.
We present LM-BFF--better few-shot fine-tuning of language models--a suite of simple and complementary techniques for fine-tuning language models on a small number of annotated examples.
arXiv Detail & Related papers (2020-12-31T17:21:26Z)
- Few-shot Sequence Learning with Transformers [79.87875859408955]
Few-shot algorithms aim at learning new tasks provided only a handful of training examples.
In this work we investigate few-shot learning in the setting where the data points are sequences of tokens.
We propose an efficient learning algorithm based on Transformers.
arXiv Detail & Related papers (2020-12-17T12:30:38Z)
- Self-Supervised Meta-Learning for Few-Shot Natural Language Classification Tasks [40.97125791174191]
We propose a self-supervised approach to generate a large, rich, meta-learning task distribution from unlabeled text.
We show that this meta-training leads to better few-shot generalization than language-model pre-training followed by finetuning.
arXiv Detail & Related papers (2020-09-17T17:53:59Z)
- Adaptive Task Sampling for Meta-Learning [79.61146834134459]
The key idea of meta-learning for few-shot classification is to mimic the few-shot situations faced at test time.
We propose an adaptive task sampling method to improve the generalization performance.
arXiv Detail & Related papers (2020-07-17T03:15:53Z)
- Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm that directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)
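The MetaICL entry above describes meta-training a causal language model to perform in-context learning. Below is a hedged toy sketch of that episode format, not the released MetaICL code; the GPT-2 checkpoint, the build_episode helper, and the tab-separated prompt layout are assumptions for illustration.

```python
# Toy sketch of MetaICL-style meta-training: concatenate k demonstrations with a
# query from the same task and compute the LM loss only on the query's label tokens.
import random
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)


def build_episode(task_examples, k=4):
    """Sample k demonstrations plus one held-out query from a single training task."""
    *demos, query = random.sample(task_examples, k + 1)
    prompt = "".join(f"{x}\t{y}\n" for x, y in demos) + f"{query[0]}\t"
    return prompt, query[1]


def meta_train_step(task_examples):
    prompt, target = build_episode(task_examples)
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    target_ids = tokenizer(target, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, target_ids], dim=1)
    labels = input_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # ignore loss on the context tokens
    loss = model(input_ids=input_ids, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

At test time the same demonstrations-plus-query format is applied to an unseen task, with no parameter updates.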