EncT5: Fine-tuning T5 Encoder for Non-autoregressive Tasks
- URL: http://arxiv.org/abs/2110.08426v1
- Date: Sat, 16 Oct 2021 00:50:08 GMT
- Title: EncT5: Fine-tuning T5 Encoder for Non-autoregressive Tasks
- Authors: Frederick Liu, Siamak Shakeri, Hongkun Yu, Jing Li
- Abstract summary: We study fine-tuning pre-trained encoder-decoder models such as T5.
Our experimental results show that EncT5, with less than half of the parameters of T5, performs similarly to T5 models on the GLUE benchmark.
- Score: 9.141586109808895
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Encoder-decoder transformer architectures have become popular recently with
the advent of T5 models. Given its generality, this architecture is also more favorable
than architectures like BERT for pre-training on the language modeling task at large
scale, where models can take months to train. While it is able to generalize to more
tasks, it is not evident whether the proposed encoder-decoder architecture is the most
efficient choice for fine-tuning on classification and regression tasks given the
pre-trained model. In this work, we study fine-tuning pre-trained encoder-decoder
models such as T5. In particular, we propose EncT5 as a way to efficiently fine-tune
pre-trained encoder-decoder T5 models for classification and regression tasks by using
the encoder layers. Our experimental results show that EncT5, with less than half of
the parameters of T5, performs similarly to T5 models on the GLUE benchmark. We believe
our proposed approach can be easily applied to any pre-trained encoder-decoder model.
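As a rough illustration of the idea (not the paper's exact architecture), the sketch below fine-tunes only the T5 encoder with a small randomly initialized classification head on top. It uses Hugging Face's T5EncoderModel; the mean-pooling head, the `t5-base` checkpoint, and the toy batch are assumptions for illustration only.

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, T5EncoderModel

class EncoderOnlyClassifier(nn.Module):
    def __init__(self, model_name="t5-base", num_labels=2):
        super().__init__()
        # Pre-trained T5 encoder only (roughly half of the full T5 parameters).
        self.encoder = T5EncoderModel.from_pretrained(model_name)
        # Randomly initialized classification head; the paper's exact head may differ.
        self.head = nn.Linear(self.encoder.config.d_model, num_labels)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        # Mean-pool over non-padding tokens before classifying.
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
        return self.head(pooled)

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = EncoderOnlyClassifier()
batch = tokenizer(["a great movie", "a dull movie"], padding=True, return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])
loss = nn.functional.cross_entropy(logits, torch.tensor([1, 0]))  # standard fine-tuning loss
loss.backward()
```

Because only the encoder plus a small head is trained, the fine-tuned model uses well under half of T5's parameters, which is consistent with the parameter-count claim in the abstract; the head design above is only one plausible choice.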
Related papers
- Shallow Cross-Encoders for Low-Latency Retrieval [69.06104373460597]
Cross-Encoders based on large transformer models (such as BERT or T5) are computationally expensive and allow for scoring only a small number of documents within a reasonably small latency window.
We show that weaker shallow transformer models (i.e., transformers with a limited number of layers) actually perform better than full-scale models when constrained to these practical low-latency settings.
arXiv Detail & Related papers (2024-03-29T15:07:21Z) - SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced
Token Detection [49.43407207482008]
SpacTor is a new training procedure consisting of a hybrid objective that combines span corruption (SC) and replaced token detection (RTD).
In our experiments with encoder-decoder architectures (T5) on a variety of NLP tasks, SpacTor-T5 yields the same downstream performance as standard SC pre-training.
arXiv Detail & Related papers (2024-01-24T00:36:13Z) - UT5: Pretraining Non autoregressive T5 with unrolled denoising [9.656399724144192]
We studied unsupervised pretraining for non-autoregressive T5 models via unrolled denoising.
We showed state-of-the-art results on downstream generation tasks such as SQuAD question generation and XSum summarization.
arXiv Detail & Related papers (2023-11-14T21:28:10Z) - nanoT5: A PyTorch Framework for Pre-training and Fine-tuning T5-style
Models with Limited Resources [1.9813574408340644]
We present nanoT5, a framework for efficient pre-training and fine-tuning of T5 models.
nanoT5 allows a T5-Base model to be pre-trained on a single GPU in just 16 hours, without any loss in performance.
We make our contributions, including configurations, insights, and pre-trained models, available to the public.
arXiv Detail & Related papers (2023-09-05T16:35:41Z) - ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking
Inference [70.36083572306839]
This paper proposes a new training and inference paradigm for re-ranking.
We fine-tune a pretrained encoder-decoder model on document-to-query generation.
We show that this encoder-decoder architecture can be decomposed into a decoder-only language model during inference.
arXiv Detail & Related papers (2022-04-25T06:26:29Z) - What Language Model Architecture and Pretraining Objective Work Best for
Zero-Shot Generalization? [50.84738303888189]
We present a large-scale evaluation of modeling choices and their impact on zero-shot generalization.
We train models with over 5 billion parameters for more than 170 billion tokens.
We find that pretrained causal decoder models can be efficiently adapted into non-causal decoder models.
arXiv Detail & Related papers (2022-04-12T14:19:49Z) - LongT5: Efficient Text-To-Text Transformer for Long Sequences [8.743996838160825]
We present a new model, called LongT5, with which we explore the effects of scaling both the input length and model size at the same time.
We are able to achieve state-of-the-art results on several summarization tasks and outperform the original T5 models on question answering tasks.
arXiv Detail & Related papers (2021-12-15T06:35:29Z) - Scale Efficiently: Insights from Pre-training and Fine-tuning
Transformers [57.931830650323]
This paper presents scaling insights from pretraining and finetuning Transformers.
We show that, beyond model size alone, model shape matters for downstream fine-tuning.
We present improved scaling protocols whereby our redesigned models achieve similar downstream fine-tuning quality.
arXiv Detail & Related papers (2021-09-22T12:29:15Z) - Primer: Searching for Efficient Transformers for Language Modeling [79.2677566332444]
Training and inference costs of large Transformer models have grown rapidly and become expensive.
Here we aim to reduce the costs of Transformers by searching for a more efficient variant.
We identify an architecture, named Primer, that has a smaller training cost than the original Transformer.
arXiv Detail & Related papers (2021-09-17T17:50:39Z) - Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text
Models [10.645591218689058]
We provide the first exploration of text-to-text transformers (T5) sentence embeddings.
We investigate three methods for extracting T5 sentence embeddings.
Our encoder-only models outperform BERT-based sentence embeddings on both transfer tasks and semantic textual similarity (a minimal extraction sketch follows this list).
arXiv Detail & Related papers (2021-08-19T18:58:02Z)
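As a small, hedged sketch of the Sentence-T5 idea referenced above, the snippet below extracts sentence embeddings by mean-pooling T5 encoder outputs, one of the extraction strategies that line of work investigates. The checkpoint name and example sentences are illustrative, not the paper's setup.

```python
import torch
from transformers import AutoTokenizer, T5EncoderModel

tokenizer = AutoTokenizer.from_pretrained("t5-base")
encoder = T5EncoderModel.from_pretrained("t5-base")

def embed(sentences):
    batch = tokenizer(sentences, padding=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state
    # Mean-pool over real (non-padding) tokens, then L2-normalize.
    mask = batch["attention_mask"].unsqueeze(-1).float()
    emb = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
    return torch.nn.functional.normalize(emb, dim=-1)

a, b = embed(["A cat sits on the mat.", "A kitten rests on a rug."])
print(torch.dot(a, b).item())  # cosine similarity between the two sentences
```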
This list is automatically generated from the titles and abstracts of the papers on this site.