Learning Easily Updated General Purpose Text Representations with
Adaptable Task-Specific Prefixes
- URL: http://arxiv.org/abs/2305.13499v2
- Date: Sat, 14 Oct 2023 15:35:07 GMT
- Title: Learning Easily Updated General Purpose Text Representations with
Adaptable Task-Specific Prefixes
- Authors: Kuan-Hao Huang, Liang Tan, Rui Hou, Sinong Wang, Amjad Almahairi, Ruty
Rinott
- Abstract summary: Fine-tuning a large pre-trained language model for each downstream task incurs a computational burden at inference time.
We propose a prefix-based method for learning fixed text representations from source tasks.
- Score: 22.661527526471996
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many real-world applications require making multiple predictions from the
same text. Fine-tuning a large pre-trained language model for each downstream
task incurs a computational burden at inference time, since each task requires
its own forward pass. To amortize this cost, a common solution is to freeze the
language model and build lightweight models for the downstream tasks on top of
fixed text representations. Accordingly, learning fixed yet general text
representations that generalize well to unseen downstream tasks becomes a
challenge. Previous work has shown that the generalizability of representations
can be improved by fine-tuning the pre-trained language model on a set of
source tasks in a multi-task fashion. In this work, we propose a prefix-based
method to learn fixed text representations from source tasks. We learn a
task-specific prefix for each source task independently and combine them to
obtain the final representations. Our experimental results show that
prefix-based training outperforms multi-task training and can update the text
representations at a smaller computational cost than multi-task training.
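To make the recipe in the abstract concrete, below is a minimal sketch (PyTorch with Hugging Face Transformers): the language model is frozen, a small learnable prefix is kept for each source task, and the combined prefixes condition a single forward pass that yields a fixed text representation for lightweight downstream heads. The class name, base model, prefix length, prefix-concatenation strategy, and mean pooling are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (NOT the authors' code): frozen encoder + per-task learnable
# prefixes that are combined to produce a fixed, general-purpose representation.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class PrefixCombinedEncoder(nn.Module):
    def __init__(self, model_name="bert-base-uncased", prefix_len=16, num_source_tasks=3):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        for p in self.encoder.parameters():
            p.requires_grad = False          # the language model stays frozen
        hidden = self.encoder.config.hidden_size
        # one learnable prefix per source task; each would be trained
        # independently on its source task and reused at inference
        self.prefixes = nn.ParameterList(
            [nn.Parameter(torch.randn(prefix_len, hidden) * 0.02)
             for _ in range(num_source_tasks)]
        )

    def forward(self, input_ids, attention_mask, task_ids=None):
        # look up (frozen) word embeddings for the input tokens
        tok_emb = self.encoder.get_input_embeddings()(input_ids)      # (B, T, H)
        batch = tok_emb.size(0)
        # combine prefixes (here: simple concatenation of all source-task prefixes)
        ids = range(len(self.prefixes)) if task_ids is None else task_ids
        prefix = torch.cat([self.prefixes[i] for i in ids], dim=0)    # (P, H)
        prefix = prefix.unsqueeze(0).expand(batch, -1, -1)            # (B, P, H)
        inputs_embeds = torch.cat([prefix, tok_emb], dim=1)
        prefix_mask = torch.ones(batch, prefix.size(1),
                                 dtype=attention_mask.dtype,
                                 device=attention_mask.device)
        mask = torch.cat([prefix_mask, attention_mask], dim=1)
        out = self.encoder(inputs_embeds=inputs_embeds, attention_mask=mask)
        # fixed text representation: mean over the original token positions
        # (padding not masked out, for brevity)
        return out.last_hidden_state[:, prefix.size(1):].mean(dim=1)


if __name__ == "__main__":
    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    enc = PrefixCombinedEncoder()
    batch = tok(["an example sentence"], return_tensors="pt")
    rep = enc(batch["input_ids"], batch["attention_mask"])            # (1, hidden)
    # a lightweight task head trained on top of the fixed representation
    head = nn.Linear(rep.size(-1), 2)
    logits = head(rep)
```

Under these assumptions, each prefix would be optimized separately on its source task with the encoder frozen, so adding or refreshing a source task only requires retraining that single prefix rather than re-running multi-task training, which is the cheaper-update property the abstract claims.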
Related papers
- Multi-Task Learning for Front-End Text Processing in TTS [15.62497569424995]
We propose a multi-task learning (MTL) model for jointly performing three tasks that are commonly solved in a text-to-speech front-end.
Our framework utilizes a tree-like structure with a trunk that learns shared representations, followed by separate task-specific heads.
arXiv Detail & Related papers (2024-01-12T02:13:21Z)
- Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation [24.488427641442694]
We propose a novel conditional neural process-based approach for few-shot text classification.
Our key idea is to represent each task using gradient information from a base model.
Our approach outperforms traditional fine-tuning, sequential transfer learning, and state-of-the-art meta learning approaches.
arXiv Detail & Related papers (2022-01-27T15:29:30Z)
- Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation [86.26522210882699]
We propose Unified multimodal pre-training for both Vision-Language understanding and generation.
The proposed UniVL is capable of handling both understanding tasks and generative tasks.
Our experiments show that there is a trade-off between understanding tasks and generation tasks while using the same model.
arXiv Detail & Related papers (2021-12-10T14:59:06Z)
- Multitask Prompted Training Enables Zero-Shot Task Generalization [70.12770442071657]
We develop a system for mapping general natural language tasks into a human-readable prompted form.
We fine-tune a pretrained encoder-decoder model on this multitask mixture covering a wide variety of tasks.
The model attains strong zero-shot performance on several standard datasets, often outperforming models 16x its size.
arXiv Detail & Related papers (2021-10-15T17:08:57Z)
- VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding [78.28397557433544]
We present a task-agnostic multi-modal pre-training approach that can accept either video or text input, or both for a variety of end tasks.
Experimental results show strong performance across a wider range of tasks than any previous method, often outperforming task-specific pre-training.
arXiv Detail & Related papers (2021-05-20T19:13:27Z)
- Temporally Correlated Task Scheduling for Sequence Learning [143.70523777803723]
In many applications, a sequence learning task is usually associated with multiple temporally correlated auxiliary tasks.
We introduce a learnable scheduler to sequence learning, which can adaptively select auxiliary tasks for training.
Our method significantly improves the performance of simultaneous machine translation and stock trend forecasting.
arXiv Detail & Related papers (2020-07-10T10:28:54Z)
- Pre-training via Paraphrasing [96.79972492585112]
We introduce MARGE, a pre-trained sequence-to-sequence model learned with an unsupervised multi-lingual paraphrasing objective.
We show it is possible to jointly learn to do retrieval and reconstruction, given only a random initialization.
For example, with no additional task-specific training we achieve BLEU scores of up to 35.8 for document translation.
arXiv Detail & Related papers (2020-06-26T14:43:43Z)
- General Purpose Text Embeddings from Pre-trained Language Models for Scalable Inference [34.47592026375839]
We show that some of the computational cost during inference can be amortized over the different tasks using a shared text encoder.
We also compare approaches for training such an encoder and show that encoders pre-trained over multiple tasks generalize well to unseen tasks.
arXiv Detail & Related papers (2020-04-29T16:11:26Z)
- Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm that directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.