Embedding Recycling for Language Models
- URL: http://arxiv.org/abs/2207.04993v1
- Date: Mon, 11 Jul 2022 16:36:14 GMT
- Title: Embedding Recycling for Language Models
- Authors: Jon Saad-Falcon, Amanpreet Singh, Luca Soldaini, Mike D'Arcy, Arman
Cohan, Doug Downey
- Abstract summary: We study how to decrease computational cost in settings where the underlying documents remain mostly unchanged, through embedding recycling (ER).
We propose caching an intermediate layer's output from a pretrained model and finetuning the remaining layers for new tasks.
We show that our method provides a 100% speedup during training and a 55-86% speedup for inference, and has negligible impacts on accuracy for text classification and entity recognition tasks in the scientific domain.
- Score: 38.11465250435789
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training and inference with large neural models is expensive. However, for
many application domains, while new tasks and models arise frequently, the
underlying documents being modeled remain mostly unchanged. We study how to
decrease computational cost in such settings through embedding recycling (ER):
re-using activations from previous model runs when performing training or
inference. In contrast to prior work focusing on freezing small classification
heads for finetuning, which often leads to notable drops in performance, we
propose caching an intermediate layer's output from a pretrained model and
finetuning the remaining layers for new tasks. We show that our method provides
a 100% speedup during training and a 55-86% speedup for inference, and has
negligible impacts on accuracy for text classification and entity recognition
tasks in the scientific domain. For general-domain question answering tasks, ER
offers a similar speedup and lowers accuracy by a small amount. Finally, we
identify several open challenges and future directions for ER.
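To make the caching idea concrete, the following is a minimal sketch of embedding recycling, assuming a BERT-style encoder from Hugging Face `transformers`. It is not the authors' released implementation: the model name, the choice of layer 6 as the cache point, and the `RecycledClassifier` class are illustrative assumptions. The frozen lower layers are run once per document to build a cache, and only the upper layers plus a small task head are finetuned against the cached activations.

```python
# Minimal sketch of embedding recycling (ER); illustrative, not the paper's released code.
# Assumptions: a BERT-style encoder, Hugging Face `transformers`, caching after layer 6.
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # placeholder backbone; the paper targets scientific-domain tasks
CACHE_LAYER = 6                   # hypothetical cache point (output of the 6th transformer block)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
backbone = AutoModel.from_pretrained(MODEL_NAME)


@torch.no_grad()
def cache_activations(texts):
    """Run the frozen lower layers once and return the cached layer output plus its mask."""
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    out = backbone(**enc, output_hidden_states=True)
    # hidden_states[i] is the output after i transformer blocks (index 0 = embeddings)
    return enc["attention_mask"], out.hidden_states[CACHE_LAYER]


class RecycledClassifier(nn.Module):
    """Finetunes only the transformer blocks above the cache point, plus a task head."""

    def __init__(self, backbone, num_labels):
        super().__init__()
        # BERT-specific internals: reuse the upper blocks of the pretrained encoder
        self.upper_layers = backbone.encoder.layer[CACHE_LAYER:]
        self.head = nn.Linear(backbone.config.hidden_size, num_labels)

    def forward(self, cached_hidden, attention_mask):
        # Convert the 0/1 padding mask to the additive mask BERT layers expect
        ext_mask = (1.0 - attention_mask[:, None, None, :].float()) * torch.finfo(torch.float32).min
        hidden = cached_hidden
        for layer in self.upper_layers:
            hidden = layer(hidden, attention_mask=ext_mask)[0]
        return self.head(hidden[:, 0])  # classify from the [CLS] position


# Usage: build the cache once per corpus, then train many task models against it.
mask, cached = cache_activations(["Embedding recycling caches intermediate activations."])
model = RecycledClassifier(backbone, num_labels=2)
logits = model(cached, mask)
```

Because the lower layers never run again during training, each epoch only pays for the layers above the cache point, which is where the reported training speedup comes from.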
Related papers
- Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach [87.8330887605381]
We show how to adapt a pre-trained Vision Transformer to downstream recognition tasks with only a few learnable parameters.
We synthesize a task-specific query with a learnable and lightweight module, which is independent of the pre-trained model.
Our method achieves state-of-the-art performance under memory constraints, showcasing its applicability in real-world situations.
arXiv Detail & Related papers (2024-07-09T15:45:04Z)
- Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines [83.65380507372483]
Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box.
This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval.
arXiv Detail & Related papers (2023-11-29T05:33:28Z)
- Less is More: On the Feature Redundancy of Pretrained Models When Transferring to Few-shot Tasks [120.23328563831704]
Transferring a pretrained model to a downstream task can be as easy as conducting linear probing with target data.
We show that, for linear probing, the pretrained features can be extremely redundant when the downstream data is scarce.
arXiv Detail & Related papers (2023-10-05T19:00:49Z)
- Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [89.8764435351222]
We propose a new family of unbiased estimators, called WTA-CRS, for matrix multiplication with reduced variance.
Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones.
arXiv Detail & Related papers (2023-05-24T15:52:08Z)
- Voting from Nearest Tasks: Meta-Vote Pruning of Pre-trained Models for Downstream Tasks [55.431048995662714]
We create a small model for a new task from the pruned models of similar tasks.
We show that a few fine-tuning steps on this model suffice to produce a promising pruned model for the new task.
We develop a simple but effective "Meta-Vote Pruning (MVP)" method that significantly reduces the pruning iterations for a new task.
arXiv Detail & Related papers (2023-01-27T06:49:47Z)
- Multi-task Retrieval for Knowledge-Intensive Tasks [21.725935960568027]
We propose a multi-task trained model for neural retrieval.
Our approach not only outperforms previous methods in the few-shot setting, but also rivals specialised neural retrievers.
With the help of our retriever, we improve existing models for downstream tasks and closely match or improve the state of the art on multiple benchmarks.
arXiv Detail & Related papers (2021-01-01T00:16:34Z)
- Patient-Specific Domain Adaptation for Fast Optical Flow Based on Teacher-Student Knowledge Transfer [2.0303656145222857]
Fast motion feedback is crucial in computer-aided surgery (CAS) on moving tissue.
Current deep learning optical flow (OF) models show the common speed vs. accuracy trade-off.
We propose patient-specific fine-tuning of a fast model to achieve high accuracy at high processing rates.
arXiv Detail & Related papers (2020-07-09T17:01:08Z)
- An Efficient Method of Training Small Models for Regression Problems with Knowledge Distillation [1.433758865948252]
We propose a new formalism of knowledge distillation for regression problems.
First, we propose a new loss function, the teacher outlier rejection loss, which rejects outliers in the training samples using teacher model predictions.
Second, we consider a multi-task network, which makes training of the student model's feature extractor more effective.
arXiv Detail & Related papers (2020-02-28T08:46:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.