Zero-shot Learning by Generating Task-specific Adapters
- URL: http://arxiv.org/abs/2101.00420v1
- Date: Sat, 2 Jan 2021 10:50:23 GMT
- Title: Zero-shot Learning by Generating Task-specific Adapters
- Authors: Qinyuan Ye, Xiang Ren
- Abstract summary: We introduce Hypter, a framework that improves zero-shot transferability by training a hypernetwork to generate task-specific adapters from task descriptions.
This formulation enables learning at task level, and greatly reduces the number of parameters by using light-weight adapters.
- Score: 38.452434222367515
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained text-to-text transformers achieve impressive performance across a
wide range of NLP tasks, and they naturally support zero-shot learning (ZSL) by
using the task description as prompt in the input. However, this approach has
potential limitations, as it learns from input-output pairs at instance level,
instead of learning to solve tasks at task level. Alternatively, applying
existing ZSL methods to text-to-text transformers is non-trivial due to their
text generation objective and huge size. To address these issues, we introduce
Hypter, a framework that improves zero-shot transferability by training a
hypernetwork to generate task-specific adapters from task descriptions. This
formulation enables learning at task level, and greatly reduces the number of
parameters by using light-weight adapters. Experiments on two datasets
demonstrate Hypter improves upon fine-tuning baselines.
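The mechanism the abstract describes, a hypernetwork that maps a task description to the weights of a small adapter, can be illustrated with a toy sketch. This is not the authors' implementation. The single linear hypernetwork, the bottleneck adapter with a ReLU and residual connection, all dimensions, and every name (`HyperNetwork`, `generate_adapter`, `apply_adapter`) are assumptions chosen for illustration, and the task embeddings stand in for an encoded task description.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def reshape(flat, rows, cols):
    """Cut a flat list into a rows x cols matrix."""
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


class HyperNetwork:
    """Hypothetical linear hypernetwork: task embedding -> adapter weights."""

    def __init__(self, task_dim, hidden_dim, bottleneck_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.bottleneck_dim = bottleneck_dim
        # One output per adapter weight: down-projection plus up-projection.
        n_out = 2 * hidden_dim * bottleneck_dim
        self.weights = [
            [rng.gauss(0.0, 0.02) for _ in range(task_dim)]
            for _ in range(n_out)
        ]

    def generate_adapter(self, task_embedding):
        """Produce (down, up) adapter matrices for one task."""
        flat = matvec(self.weights, task_embedding)
        split = self.hidden_dim * self.bottleneck_dim
        down = reshape(flat[:split], self.bottleneck_dim, self.hidden_dim)
        up = reshape(flat[split:], self.hidden_dim, self.bottleneck_dim)
        return down, up


def apply_adapter(hidden, down, up):
    """Bottleneck adapter with a residual connection: h + up(relu(down(h)))."""
    bottleneck = [max(0.0, v) for v in matvec(down, hidden)]
    return [h + d for h, d in zip(hidden, matvec(up, bottleneck))]


# Assumed toy sizes; real models use far larger hidden dimensions.
TASK_DIM, HIDDEN_DIM, BOTTLENECK_DIM = 4, 8, 2

hyper = HyperNetwork(TASK_DIM, HIDDEN_DIM, BOTTLENECK_DIM)
task_a = [1.0, 0.0, 0.5, -0.5]   # stand-in for one encoded task description
task_b = [0.0, 1.0, -0.5, 0.5]   # stand-in for a different task description
hidden_state = [0.1 * i for i in range(HIDDEN_DIM)]

down_a, up_a = hyper.generate_adapter(task_a)
down_b, up_b = hyper.generate_adapter(task_b)
out_a = apply_adapter(hidden_state, down_a, up_a)

# The adapter holds 2 * hidden * bottleneck weights, fewer than a dense
# hidden x hidden layer whenever bottleneck < hidden / 2.
adapter_params = 2 * HIDDEN_DIM * BOTTLENECK_DIM
dense_params = HIDDEN_DIM * HIDDEN_DIM
```

In this sketch only the hypernetwork would be trained, and each task description yields its own adapter weights, which is one way to read the abstract's "learning at task level". How the real system encodes descriptions and where adapters sit in the transformer is not specified in the text above.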
Related papers
- From Instance Training to Instruction Learning: Task Adapters Generation from Instructions [29.452006810725184]
This paper focuses on simulating human learning to address the shortcomings of instance training.
We introduce Task Adapters Generation from Instructions (TAGI), which automatically constructs the task-specific model.
We evaluate TAGI on the Super-Natural Instructions and P3 datasets.
arXiv Detail & Related papers (2024-06-18T08:14:28Z)
- TransPrompt v2: A Transferable Prompting Framework for Cross-task Text Classification [37.824031151922604]
We propose TransPrompt v2, a novel transferable prompting framework for few-shot learning across similar or distant text classification tasks.
For learning across similar tasks, we employ a multi-task meta-knowledge acquisition (MMA) procedure to train a meta-learner.
For learning across distant tasks, we inject the task type descriptions into the prompt, and capture the intra-type and inter-type prompt embeddings.
arXiv Detail & Related papers (2023-08-29T04:16:57Z)
- Reducing Sequence Length by Predicting Edit Operations with Large Language Models [50.66922361766939]
This paper proposes predicting edit spans for the source text for local sequence transduction tasks.
We apply instruction tuning for Large Language Models on the supervision data of edit spans.
Experiments show that the proposed method achieves comparable performance to the baseline in four tasks.
arXiv Detail & Related papers (2023-05-19T17:51:05Z)
- Improving Cross-task Generalization of Unified Table-to-text Models with Compositional Task Configurations [63.04466647849211]
Methods typically encode task information with a simple dataset name as a prefix to the encoder.
We propose compositional task configurations, a set of prompts prepended to the encoder to improve cross-task generalization.
We show this not only allows the model to better learn shared knowledge across different tasks at training, but also allows us to control the model by composing new configurations.
arXiv Detail & Related papers (2022-12-17T02:20:14Z)
- Task Residual for Tuning Vision-Language Models [69.22958802711017]
We propose a new efficient tuning approach for vision-language models (VLMs) named Task Residual Tuning (TaskRes).
TaskRes explicitly decouples the prior knowledge of the pre-trained models from the new knowledge regarding a target task.
The proposed TaskRes is simple yet effective and significantly outperforms previous methods on 11 benchmark datasets.
arXiv Detail & Related papers (2022-11-18T15:09:03Z)
- Selective Token Generation for Few-shot Natural Language Generation [19.015739016376532]
We develop a novel additive learning algorithm based on reinforcement learning (RL).
We show that the proposed selective token generation significantly outperforms previous PLM-based additive learning algorithms.
arXiv Detail & Related papers (2022-09-17T00:48:52Z)
- Learning to Transfer Prompts for Text Generation [97.64625999380425]
We propose a novel prompt-based method (PTG) for text generation in a transferable setting.
First, PTG learns a set of source prompts for various source generation tasks and then transfers these prompts as target prompts to perform target generation tasks.
In extensive experiments, PTG yields competitive or better results than fine-tuning methods.
arXiv Detail & Related papers (2022-05-03T14:53:48Z)
- Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation [24.488427641442694]
We propose a novel conditional neural process-based approach for few-shot text classification.
Our key idea is to represent each task using gradient information from a base model.
Our approach outperforms traditional fine-tuning, sequential transfer learning, and state-of-the-art meta learning approaches.
arXiv Detail & Related papers (2022-01-27T15:29:30Z)
- Zero-Shot Information Extraction as a Unified Text-to-Triple Translation [56.01830747416606]
We cast a suite of information extraction tasks into a text-to-triple translation framework.
We formalize the task as a translation between task-specific input text and output triples.
We study the zero-shot performance of this framework on open information extraction.
arXiv Detail & Related papers (2021-09-23T06:54:19Z)
- Levenshtein Training for Word-level Quality Estimation [15.119782800097711]
Levenshtein Transformer is a natural fit for the word-level QE task.
A Levenshtein Transformer can learn to post-edit without explicit supervision.
arXiv Detail & Related papers (2021-09-12T20:45:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.