Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text
Models
- URL: http://arxiv.org/abs/2108.08877v1
- Date: Thu, 19 Aug 2021 18:58:02 GMT
- Title: Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text
Models
- Authors: Jianmo Ni, Gustavo Hernández Ábrego, Noah Constant, Ji Ma, Keith
B. Hall, Daniel Cer, Yinfei Yang
- Abstract summary: We provide the first exploration of text-to-text transformers (T5) sentence embeddings.
We investigate three methods for extracting T5 sentence embeddings.
Our encoder-only models outperform BERT-based sentence embeddings on both transfer tasks and semantic textual similarity.
- Score: 10.645591218689058
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We provide the first exploration of text-to-text transformers (T5) sentence
embeddings. Sentence embeddings are broadly useful for language processing
tasks. While T5 achieves impressive performance on language tasks cast as
sequence-to-sequence mapping problems, it is unclear how to produce sentence
embeddings from encoder-decoder models. We investigate three methods for
extracting T5 sentence embeddings: two utilize only the T5 encoder and one uses
the full T5 encoder-decoder model. Our encoder-only models outperform
BERT-based sentence embeddings on both transfer tasks and semantic textual
similarity (STS). Our encoder-decoder method achieves further improvement on
STS. Scaling up T5 from millions to billions of parameters is found to produce
consistent improvements on downstream tasks. Finally, we introduce a two-stage
contrastive learning approach that achieves a new state of the art on STS using
sentence embeddings, outperforming both Sentence BERT and SimCSE.
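As a concrete illustration of the encoder-only setup described above, the sketch below mean-pools T5 encoder outputs into a fixed-size sentence vector using Hugging Face Transformers. This is a minimal sketch, not the authors' released code: the checkpoint name, mean pooling, and L2 normalization are illustrative assumptions rather than the paper's exact recipe.

```python
# Minimal sketch (illustrative, not the authors' code): build a sentence embedding
# by mean-pooling the T5 encoder's outputs, one of the encoder-only strategies
# described in the abstract. Checkpoint and pooling details are assumptions.
import torch
from transformers import AutoTokenizer, T5EncoderModel

tokenizer = AutoTokenizer.from_pretrained("t5-base")
encoder = T5EncoderModel.from_pretrained("t5-base").eval()

def sentence_embedding(text: str) -> torch.Tensor:
    batch = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state        # (1, seq_len, d_model)
    mask = batch["attention_mask"].unsqueeze(-1).float()   # zero out padding positions
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # mean over real tokens
    return torch.nn.functional.normalize(pooled, dim=-1)   # unit-length vector

a = sentence_embedding("Sentence embeddings are broadly useful.")
b = sentence_embedding("Vector representations of sentences have many uses.")
print(float(a @ b.T))  # cosine similarity, since both vectors are normalized
```

For the encoder-decoder variant mentioned in the abstract, one would presumably run the decoder for a single step and use its hidden state instead of pooling; the two-stage contrastive fine-tuning is likewise out of scope for this sketch.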
Related papers
- MrT5: Dynamic Token Merging for Efficient Byte-level Language Models [50.46453950887946]
This work introduces MrT5 (MergeT5), a more efficient variant of ByT5.
MrT5 integrates a token deletion mechanism in its encoder to dynamically shorten the input sequence length.
When trained on English text, MrT5 demonstrates the capability to transfer its deletion feature zero-shot across several languages.
arXiv Detail & Related papers (2024-10-28T06:14:12Z)
- Code-Switching Text Generation and Injection in Mandarin-English ASR [57.57570417273262]
We investigate text generation and injection to improve the performance of Transformer-Transducer (T-T), a streaming model commonly used in industry.
We first propose a strategy to generate code-switching text data and then investigate injecting the generated text into the T-T model, either explicitly via Text-To-Speech (TTS) conversion or implicitly by tying the speech and text latent spaces.
Experimental results on the T-T model trained with a dataset containing 1,800 hours of real Mandarin-English code-switched speech show that our approaches to inject generated code-switching text significantly boost the performance of T-T models.
arXiv Detail & Related papers (2023-03-20T09:13:27Z)
- Graphix-T5: Mixing Pre-Trained Transformers with Graph-Aware Layers for Text-to-SQL Parsing [56.232873134174056]
One of the major challenges in text-to-SQL parsing is domain generalization, i.e., how to generalize well to unseen databases.
In this work, we explore ways to further augment the pre-trained text-to-text transformer model with specialized components for text-to-SQL parsing.
To this end, we propose a new architecture, GRAPHIX-T5, which augments the pre-trained transformer with specially designed graph-aware layers.
arXiv Detail & Related papers (2023-01-18T13:29:05Z)
- Evaluating Byte and Wordpiece Level Models for Massively Multilingual Semantic Parsing [3.431659287330068]
We compare a byte-level (ByT5) and a wordpiece-based (mT5) sequence-to-sequence model on the 51 languages of the MASSIVE multilingual semantic parsing dataset.
We are able to reduce the gap in exact match accuracy to only 5 points with respect to a model trained on gold data from all the languages.
arXiv Detail & Related papers (2022-12-14T13:48:32Z)
- EdiT5: Semi-Autoregressive Text-Editing with T5 Warm-Start [21.4394742421462]
EdiT5 is a novel semi-autoregressive text-editing approach.
It combines the strengths of non-autoregressive text-editing and autoregressive decoding.
arXiv Detail & Related papers (2022-05-24T17:13:22Z)
- TransAug: Translate as Augmentation for Sentence Embeddings [8.89078869712101]
We present TransAug (Translate as Augmentation), which provides the first exploration of utilizing translated sentence pairs as data augmentation for text.
Rather than adopting an encoder trained on other languages, we first distill a Chinese encoder from a SimCSE encoder (pretrained in English) so that their embeddings are close in semantic space, which can be regarded as implicit data augmentation.
Our approach achieves a new state of the art on standard semantic textual similarity (STS), outperforming both SimCSE and Sentence-T5, and the best performance in the corresponding tracks of the transfer tasks evaluated by SentEval.
arXiv Detail & Related papers (2021-10-30T03:13:28Z)
- EncT5: Fine-tuning T5 Encoder for Non-autoregressive Tasks [9.141586109808895]
We study fine-tuning pre-trained encoder-decoder models such as T5.
Our experimental results show that EncT5, with less than half of the parameters of T5, performs similarly to T5 models on the GLUE benchmark.
arXiv Detail & Related papers (2021-10-16T00:50:08Z)
- SpeechT5: Unified-Modal Encoder-Decoder Pre-training for Spoken Language Processing [77.4527868307914]
We propose a unified-modal SpeechT5 framework that explores the encoder-decoder pre-training for self-supervised speech/text representation learning.
The SpeechT5 framework consists of a shared encoder-decoder network and six modal-specific (speech/text) pre/post-nets.
To align the textual and speech information into a unified semantic space, we propose a cross-modal vector quantization method with random mixing-up to bridge speech and text.
arXiv Detail & Related papers (2021-10-14T07:59:27Z)
- mT6: Multilingual Pretrained Text-to-Text Transformer with Translation Pairs [51.67970832510462]
We improve the multilingual Text-to-Text Transfer Transformer with translation pairs (mT6).
We explore three cross-lingual text-to-text pre-training tasks, namely, machine translation, translation pair span corruption, and translation span corruption.
Experimental results show that the proposed mT6 improves cross-lingual transferability over mT5.
arXiv Detail & Related papers (2021-04-18T03:24:07Z)
- mT5: A massively multilingual pre-trained text-to-text transformer [60.0210636815514]
"Text-to-Text Transfer Transformer" (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on English-language NLP tasks.
We introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages.
arXiv Detail & Related papers (2020-10-22T17:58:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.