WARP: Word-level Adversarial ReProgramming
- URL: http://arxiv.org/abs/2101.00121v1
- Date: Fri, 1 Jan 2021 00:41:03 GMT
- Title: WARP: Word-level Adversarial ReProgramming
- Authors: Karen Hambardzumyan, Hrant Khachatrian, Jonathan May
- Abstract summary: In many applications it is preferable to tune much smaller sets of parameters, so that the majority of parameters can be shared across multiple tasks.
We present an alternative approach based on adversarial reprogramming, which extends earlier work on automatic prompt generation.
We show that this approach outperforms other methods with a similar number of trainable parameters on SST-2 and MNLI datasets.
- Score: 13.08689221166729
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transfer learning from pretrained language models recently became the
dominant approach for solving many NLP tasks. While fine-tuning large language
models usually gives the best performance, in many applications it is
preferable to tune much smaller sets of parameters, so that the majority of
parameters can be shared across multiple tasks. The main approach is to train
one or more task-specific layers on top of the language model. In this paper we
present an alternative approach based on adversarial reprogramming, which
extends earlier work on automatic prompt generation. It attempts to learn
task-specific word embeddings that, when concatenated to the input text,
instruct the language model to solve the specified task. We show that this
approach outperforms other methods with a similar number of trainable
parameters on SST-2 and MNLI datasets. On SST-2, the performance of our model
is comparable to the fully fine-tuned baseline, while on MNLI it is the best
among the methods that do not modify the parameters of the body of the language
model.
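To make the approach concrete, below is a minimal sketch of WARP-style prompt-embedding tuning: a frozen masked language model, a handful of trainable prompt embeddings concatenated to the input embeddings, and trainable class vectors scored against the [MASK] hidden state. The PromptTunedClassifier class, the roberta-base backbone, and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Sketch of prompt-embedding tuning in the spirit of WARP (assumptions: PyTorch,
# Hugging Face transformers, a roberta-base backbone; values are illustrative).
import torch
import torch.nn as nn
from transformers import AutoModelForMaskedLM, AutoTokenizer


class PromptTunedClassifier(nn.Module):
    """Frozen masked LM + trainable prompt embeddings and class vectors."""

    def __init__(self, backbone="roberta-base", n_prompt_tokens=8, n_classes=2):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(backbone)
        self.lm = AutoModelForMaskedLM.from_pretrained(backbone)
        for p in self.lm.parameters():          # keep the LM body shared and frozen
            p.requires_grad = False

        hidden = self.lm.config.hidden_size
        embed = self.lm.get_input_embeddings()
        init_ids = torch.randint(0, embed.num_embeddings, (n_prompt_tokens,))
        # Trainable prompt embeddings, initialized from random vocabulary rows.
        self.prompt = nn.Parameter(embed(init_ids).detach().clone())
        # Trainable class ("verbalizer") vectors scored against the [MASK] state.
        self.class_vectors = nn.Parameter(0.02 * torch.randn(n_classes, hidden))

    def forward(self, texts):
        tok = self.tokenizer(
            [f"{t} {self.tokenizer.mask_token}" for t in texts],
            return_tensors="pt", padding=True, truncation=True,
        )
        text_embeds = self.lm.get_input_embeddings()(tok["input_ids"])
        batch, n_prompt = text_embeds.size(0), self.prompt.size(0)
        # Concatenate the learned prompt embeddings in front of the input text.
        embeds = torch.cat(
            [self.prompt.unsqueeze(0).expand(batch, -1, -1), text_embeds], dim=1)
        attn = torch.cat(
            [torch.ones(batch, n_prompt, dtype=tok["attention_mask"].dtype),
             tok["attention_mask"]], dim=1)

        states = self.lm(inputs_embeds=embeds, attention_mask=attn,
                         output_hidden_states=True).hidden_states[-1]
        # Read out the hidden state at the [MASK] position, shifted by the prompt.
        where = (tok["input_ids"] == self.tokenizer.mask_token_id).nonzero()
        mask_states = states[where[:, 0], where[:, 1] + n_prompt]
        return mask_states @ self.class_vectors.T   # [batch, n_classes] logits
```

Only the prompt embeddings and class vectors require gradients, so an optimizer built over the trainable parameters updates on the order of (n_prompt_tokens + n_classes) x hidden_size values while the language-model body remains shared across tasks.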
Related papers
- Less is More: Parameter-Efficient Selection of Intermediate Tasks for Transfer Learning [5.119396962985841]
Intermediate task transfer learning can greatly improve model performance.
We conduct the largest study on NLP task transferability and task selection with 12k source-target pairs.
Applying Embedding Space Maps (ESMs) to a prior method reduces execution time and disk space usage by factors of 10 and 278, respectively.
arXiv Detail & Related papers (2024-10-19T16:22:04Z)
- Effectively Prompting Small-sized Language Models for Cross-lingual Tasks via Winning Tickets [2.803947848713182]
Current soft prompt methods yield limited performance when applied to small-sized models.
Deep prompt-tuning entails prepending parameters in each layer for enhanced efficacy.
We introduce the Lottery Ticket Prompt-learning framework that integrates winning tickets with soft prompts.
arXiv Detail & Related papers (2024-04-01T17:03:16Z)
- On the Analysis of Cross-Lingual Prompt Tuning for Decoder-based Multilingual Model [49.81429697921861]
We study the interaction between parameter-efficient fine-tuning (PEFT) and cross-lingual tasks in multilingual autoregressive models.
We show that prompt tuning is more effective in enhancing the performance of low-resource languages than fine-tuning.
arXiv Detail & Related papers (2023-11-14T00:43:33Z)
- Zero-Shot Learners for Natural Language Understanding via a Unified Multiple Choice Perspective [26.41585967095811]
Zero-shot learning aims to train a model on a given task such that it can address new learning tasks without any additional training.
Our approach converts zero-shot learning into multiple-choice tasks, avoiding problems in commonly used large-scale generative models such as FLAN.
Our approach shows state-of-the-art performance on several benchmarks and produces satisfactory results on tasks such as natural language inference and text classification.
arXiv Detail & Related papers (2022-10-16T17:24:06Z)
- An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks [112.1942546460814]
We report the first exploration of the prompt tuning paradigm for speech processing tasks based on Generative Spoken Language Model (GSLM).
Experiment results show that the prompt tuning technique achieves competitive performance in speech classification tasks with fewer trainable parameters than fine-tuning specialized downstream models.
arXiv Detail & Related papers (2022-03-31T03:26:55Z)
- Switch Point biased Self-Training: Re-purposing Pretrained Models for Code-Switching [44.034300203700234]
Code-switching is a ubiquitous phenomenon due to the ease of communication it offers in multilingual communities.
We propose a self-training method to repurpose the existing pretrained models using a switch-point bias.
Our approach performs well on both tasks by narrowing the performance gap at switch points.
arXiv Detail & Related papers (2021-11-01T19:42:08Z)
- Continual Learning in Multilingual NMT via Language-Specific Embeddings [92.91823064720232]
The approach replaces the shared vocabulary with a small language-specific vocabulary and fine-tunes the new embeddings on the new language's parallel data; a minimal sketch of this recipe appears after this list.
Because the parameters of the original model are not modified, its performance on the initial languages does not degrade.
arXiv Detail & Related papers (2021-10-20T10:38:57Z)
- UniPELT: A Unified Framework for Parameter-Efficient Language Model Tuning [64.638804236566]
We propose a unified framework, UniPELT, which incorporates different PELT methods as submodules and learns to activate the ones that best suit the current data or task setup.
Remarkably, on the GLUE benchmark, UniPELT consistently achieves 1-4pt gains compared to the best individual PELT method that it incorporates and even outperforms fine-tuning under different setups.
arXiv Detail & Related papers (2021-10-14T17:40:08Z)
- Language Models as Few-Shot Learner for Task-Oriented Dialogue Systems [74.8759568242933]
Task-oriented dialogue systems use four connected modules: Natural Language Understanding (NLU), Dialogue State Tracking (DST), Dialogue Policy (DP), and Natural Language Generation (NLG).
A research challenge is to learn each module with as few samples as possible, given the high cost of data collection.
We evaluate the priming few-shot ability of language models in the NLU, DP and NLG tasks.
arXiv Detail & Related papers (2020-08-14T08:23:21Z)
- Exploring Versatile Generative Language Model Via Parameter-Efficient Transfer Learning [70.81910984985683]
We propose an effective way to fine-tune multiple down-stream generation tasks simultaneously using a single, large pre-trained model.
The experiments on five diverse language generation tasks show that by just using an additional 2-3% parameters for each task, our model can maintain or even improve the performance of fine-tuning the whole model.
arXiv Detail & Related papers (2020-04-08T06:18:44Z)
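The continual-learning entry above ("Continual Learning in Multilingual NMT via Language-Specific Embeddings") applies the same freeze-the-body pattern to translation: keep the original model untouched and train only a new, language-specific embedding table. The sketch below illustrates that recipe under stated assumptions; the `src_embed` attribute and the example sizes are hypothetical placeholders, not the paper's code.

```python
# Sketch of adding a new language by training only fresh embeddings while the
# shared NMT body stays frozen. The `src_embed` slot is an assumed attribute of
# a generic PyTorch encoder-decoder, not an actual library API.
import torch
import torch.nn as nn


def add_language(model: nn.Module, new_vocab_size: int, d_model: int) -> nn.Embedding:
    """Freeze the shared NMT parameters and attach a trainable embedding table."""
    for p in model.parameters():
        p.requires_grad = False            # original languages keep their behavior
    new_embed = nn.Embedding(new_vocab_size, d_model)
    model.src_embed = new_embed            # swap in the language-specific vocabulary
    return new_embed


# Usage (illustrative): only the new embeddings go into the optimizer, so the
# frozen body is shared and performance on the initial languages does not degrade.
# new_embed = add_language(model, new_vocab_size=8000, d_model=512)
# optimizer = torch.optim.Adam(new_embed.parameters(), lr=1e-4)
```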