Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text
Models
- URL: http://arxiv.org/abs/2108.08877v1
- Date: Thu, 19 Aug 2021 18:58:02 GMT
- Title: Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text
Models
- Authors: Jianmo Ni, Gustavo Hernández Ábrego, Noah Constant, Ji Ma, Keith
B. Hall, Daniel Cer, Yinfei Yang
- Abstract summary: We provide the first exploration of text-to-text transformers (T5) sentence embeddings.
We investigate three methods for extracting T5 sentence embeddings.
Our encoder-only models outperform BERT-based sentence embeddings on both transfer tasks and semantic textual similarity.
- Score: 10.645591218689058
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We provide the first exploration of text-to-text transformers (T5) sentence
embeddings. Sentence embeddings are broadly useful for language processing
tasks. While T5 achieves impressive performance on language tasks cast as
sequence-to-sequence mapping problems, it is unclear how to produce sentence
embeddings from encoder-decoder models. We investigate three methods for
extracting T5 sentence embeddings: two utilize only the T5 encoder and one uses
the full T5 encoder-decoder model. Our encoder-only models outperform
BERT-based sentence embeddings on both transfer tasks and semantic textual
similarity (STS). Our encoder-decoder method achieves further improvement on
STS. Scaling up T5 from millions to billions of parameters is found to produce
consistent improvements on downstream tasks. Finally, we introduce a two-stage
contrastive learning approach that achieves a new state of the art on STS using
sentence embeddings, outperforming both Sentence BERT and SimCSE.
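As a concrete illustration of the encoder-only setup described above, the sketch below mean-pools T5 encoder outputs into a fixed-size sentence vector using Hugging Face Transformers. This is a minimal sketch, not the authors' released code: the checkpoint name, mean pooling, and L2 normalization are illustrative assumptions rather than the paper's exact recipe.

```python
# Minimal sketch (illustrative, not the authors' code): build a sentence embedding
# by mean-pooling the T5 encoder's outputs, one of the encoder-only strategies
# described in the abstract. Checkpoint and pooling details are assumptions.
import torch
from transformers import AutoTokenizer, T5EncoderModel

tokenizer = AutoTokenizer.from_pretrained("t5-base")
encoder = T5EncoderModel.from_pretrained("t5-base").eval()

def sentence_embedding(text: str) -> torch.Tensor:
    batch = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state        # (1, seq_len, d_model)
    mask = batch["attention_mask"].unsqueeze(-1).float()   # zero out padding positions
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # mean over real tokens
    return torch.nn.functional.normalize(pooled, dim=-1)   # unit-length vector

a = sentence_embedding("Sentence embeddings are broadly useful.")
b = sentence_embedding("Vector representations of sentences have many uses.")
print(float(a @ b.T))  # cosine similarity, since both vectors are normalized
```

For the encoder-decoder variant mentioned in the abstract, one would presumably run the decoder for a single step and use its hidden state instead of pooling; the two-stage contrastive fine-tuning is likewise out of scope for this sketch.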
Related papers
- MrT5: Dynamic Token Merging for Efficient Byte-level Language Models [50.46453950887946]
This work introduces MrT5 (MergeT5), a more efficient variant of ByT5.
MrT5 integrates a token deletion mechanism in its encoder to dynamically shorten the input sequence length.
When trained on English text, MrT5 demonstrates the capability to transfer its deletion feature zero-shot across several languages.
arXiv Detail & Related papers (2024-10-28T06:14:12Z)
- Code-Switching Text Generation and Injection in Mandarin-English ASR [57.57570417273262]
We investigate text generation and injection to improve the performance of Transformer-Transducer (T-T), a streaming model commonly used in industry.
We first propose a strategy to generate code-switching text data and then investigate injecting the generated text into the T-T model, either explicitly via Text-To-Speech (TTS) conversion or implicitly by tying the speech and text latent spaces.
Experimental results on the T-T model trained with a dataset containing 1,800 hours of real Mandarin-English code-switched speech show that our approaches to inject generated code-switching text significantly boost the performance of T-T models.
arXiv Detail & Related papers (2023-03-20T09:13:27Z)
- Graphix-T5: Mixing Pre-Trained Transformers with Graph-Aware Layers for Text-to-SQL Parsing [56.232873134174056]
One of the major challenges in text-to-SQL parsing is domain generalization, i.e., how to generalize well to unseen databases.
In this work, we explore ways to further augment the pre-trained text-to-text transformer model with specialized components for text-to-SQL parsing.
To this end, we propose a new architecture, GRAPHIX-T5, which augments the pre-trained transformer with specially designed graph-aware layers.
arXiv Detail & Related papers (2023-01-18T13:29:05Z)
- Evaluating Byte and Wordpiece Level Models for Massively Multilingual Semantic Parsing [3.431659287330068]
We compare a byte-level (ByT5) and a wordpiece-based (mT5) sequence-to-sequence model on the 51 languages of the MASSIVE multilingual semantic parsing dataset.
We are able to reduce the gap in exact match accuracy to only 5 points with respect to a model trained on gold data from all the languages.
arXiv Detail & Related papers (2022-12-14T13:48:32Z)
- EdiT5: Semi-Autoregressive Text-Editing with T5 Warm-Start [21.4394742421462]
EdiT5 is a novel semi-autoregressive text-editing approach.
It combines the strengths of non-autoregressive text-editing and autoregressive decoding.
arXiv Detail & Related papers (2022-05-24T17:13:22Z)
- TransAug: Translate as Augmentation for Sentence Embeddings [8.89078869712101]
We present TransAug (Translate as Augmentation), which provides the first exploration of utilizing translated sentence pairs as data augmentation for text.
Rather than adopting an encoder trained on other languages, we first distill a Chinese encoder from a SimCSE encoder (pretrained in English) so that their embeddings are close in semantic space, which can be regarded as implicit data augmentation.
Our approach achieves a new state of the art on standard semantic textual similarity (STS), outperforming both SimCSE and Sentence-T5, and the best performance in the corresponding tracks of the transfer tasks evaluated by SentEval.
arXiv Detail & Related papers (2021-10-30T03:13:28Z)
- EncT5: Fine-tuning T5 Encoder for Non-autoregressive Tasks [9.141586109808895]
We study fine-tuning pre-trained encoder-decoder models such as T5.
Our experimental results show that EncT5, with less than half of the parameters of T5, performs similarly to T5 models on the GLUE benchmark.
arXiv Detail & Related papers (2021-10-16T00:50:08Z)
- SpeechT5: Unified-Modal Encoder-Decoder Pre-training for Spoken Language Processing [77.4527868307914]
We propose a unified-modal SpeechT5 framework that explores the encoder-decoder pre-training for self-supervised speech/text representation learning.
The SpeechT5 framework consists of a shared encoder-decoder network and six modal-specific (speech/text) pre/post-nets.
To align the textual and speech information into a unified semantic space, we propose a cross-modal vector quantization method with random mixing-up to bridge speech and text.
arXiv Detail & Related papers (2021-10-14T07:59:27Z)
- mT6: Multilingual Pretrained Text-to-Text Transformer with Translation Pairs [51.67970832510462]
We improve the multilingual Text-to-Text Transfer Transformer with translation pairs (mT6).
We explore three cross-lingual text-to-text pre-training tasks, namely, machine translation, translation pair span corruption, and translation span corruption.
Experimental results show that the proposed mT6 improves cross-lingual transferability over mT5.
arXiv Detail & Related papers (2021-04-18T03:24:07Z)
- mT5: A massively multilingual pre-trained text-to-text transformer [60.0210636815514]
"Text-to-Text Transfer Transformer" (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on English-language NLP tasks.
We introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages.
arXiv Detail & Related papers (2020-10-22T17:58:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.