Discovering Useful Sentence Representations from Large Pretrained
Language Models
- URL: http://arxiv.org/abs/2008.09049v1
- Date: Thu, 20 Aug 2020 16:03:51 GMT
- Title: Discovering Useful Sentence Representations from Large Pretrained
Language Models
- Authors: Nishant Subramani and Nivedita Suresh
- Abstract summary: We explore the question of whether pretrained language models can be adapted to be used as universal decoders.
For large transformer-based language models trained on vast amounts of English text, we investigate whether such representations can be easily discovered.
We present and compare three representation injection techniques for transformer-based models and three accompanying methods which map sentences to and from this representation space.
- Score: 8.212920842986689
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the extensive success of pretrained language models as encoders for
building NLP systems, they haven't seen prominence as decoders for sequence
generation tasks. We explore the question of whether these models can be
adapted to be used as universal decoders. To be considered "universal," a
decoder must have an implicit representation for any target sentence $s$, such
that it can recover that sentence exactly when conditioned on its
representation. For large transformer-based language models trained on vast
amounts of English text, we investigate whether such representations can be
easily discovered using standard optimization methods. We present and compare
three representation injection techniques for transformer-based models and
three accompanying methods which map sentences to and from this representation
space. Experiments show not only that such representations exist for sentences
from a variety of genres, but, more importantly, that without needing complex
optimization algorithms, our methods recover these sentences almost perfectly
without fine-tuning the underlying language model at all.
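Below is a minimal sketch of the general recipe the abstract describes, not the authors' exact injection techniques: a short soft prefix is optimized with standard gradient descent against a frozen GPT-2 until greedy decoding from the prefix alone reproduces the target sentence. The prefix length, step count, and learning rate are assumptions.
```python
# Illustrative sketch: learn a continuous "sentence representation" (a soft
# prefix) for a frozen GPT-2 so that decoding from it recovers the sentence.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tok = GPT2Tokenizer.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").to(device).eval()
for p in lm.parameters():
    p.requires_grad_(False)                      # the language model stays frozen

target = "The quick brown fox jumps over the lazy dog."
ids = tok(target, return_tensors="pt").input_ids.to(device)        # (1, T)
tgt_emb = lm.get_input_embeddings()(ids)                           # (1, T, d)

prefix_len, d = 4, lm.config.n_embd
z = torch.nn.Parameter(0.02 * torch.randn(1, prefix_len, d, device=device))
opt = torch.optim.Adam([z], lr=0.1)

# Optimize only z with a teacher-forced reconstruction loss on the target.
labels = torch.cat(
    [torch.full((1, prefix_len), -100, dtype=torch.long, device=device), ids], dim=1)
for step in range(300):
    inputs_embeds = torch.cat([z, tgt_emb], dim=1)
    loss = lm(inputs_embeds=inputs_embeds, labels=labels).loss
    opt.zero_grad(); loss.backward(); opt.step()

# Recovery check: greedy decoding conditioned on the learned representation only.
with torch.no_grad():
    cur, out_ids = z, []
    for _ in range(ids.size(1)):
        nxt = lm(inputs_embeds=cur).logits[:, -1].argmax(-1, keepdim=True)
        out_ids.append(nxt)
        cur = torch.cat([cur, lm.get_input_embeddings()(nxt)], dim=1)
print(tok.decode(torch.cat(out_ids, dim=1)[0]))
```
If optimization succeeds, the sentence is recovered without changing any model weights, which is the sense in which the representation is discovered rather than trained into the model.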
Related papers
- Understanding and Mitigating Tokenization Bias in Language Models [6.418593476658017]
State-of-the-art language models are autoregressive and operate on subword units known as tokens.
We show that popular encoding schemes induce a sampling bias that cannot be mitigated with more training or data.
We propose a novel algorithm to obtain unbiased estimates from any language model trained on tokenized data.
arXiv Detail & Related papers (2024-06-24T17:38:02Z)
- Adapting Dual-encoder Vision-language Models for Paraphrased Retrieval [55.90407811819347]
We consider the task of paraphrased text-to-image retrieval where a model aims to return similar results given a pair of paraphrased queries.
We train a dual-encoder model starting from a language model pretrained on a large text corpus.
Compared to public dual-encoder models such as CLIP and OpenCLIP, the model trained with our best adaptation strategy achieves a significantly higher ranking similarity for paraphrased queries.
arXiv Detail & Related papers (2024-05-06T06:30:17Z)
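As a rough illustration of what "ranking similarity for paraphrased queries" means, the sketch below scores a small image gallery with the public CLIP dual encoder mentioned above and compares the rankings induced by two paraphrases. The image paths, queries, and the Spearman-based measure are illustrative assumptions, not the paper's evaluation protocol.
```python
# Sketch: score a gallery with a dual encoder and compare rankings from
# two paraphrased queries.
import torch
from PIL import Image
from scipy.stats import spearmanr
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Placeholder gallery paths and a pair of paraphrased queries (assumptions).
images = [Image.open(p) for p in ["img0.jpg", "img1.jpg", "img2.jpg"]]
queries = ["a dog catching a frisbee", "a frisbee being caught by a dog"]

inputs = proc(text=queries, images=images, return_tensors="pt", padding=True)
with torch.no_grad():
    sims = model(**inputs).logits_per_text          # (2 queries, 3 images)

# Ranking similarity: rank correlation between the two paraphrases' scores.
rho, _ = spearmanr(sims[0].numpy(), sims[1].numpy())
print(rho)
```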
- Sentence Embedding Leaks More Information than You Expect: Generative Embedding Inversion Attack to Recover the Whole Sentence [37.63047048491312]
We propose a generative embedding inversion attack (GEIA) that aims to reconstruct input sequences based only on their sentence embeddings.
Given black-box access to a language model, we treat the sentence embedding as the initial token's representation and train or fine-tune a powerful decoder model to decode the whole sequence directly.
arXiv Detail & Related papers (2023-05-04T17:31:41Z)
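A minimal sketch of that setup follows: the victim's sentence embedding is projected into the decoder's embedding space, placed at the first position, and the decoder is trained to reconstruct the text. The choice of victim encoder, the projection layer, and the hyperparameters are illustrative assumptions, not the paper's exact configuration.
```python
# Sketch: train a GPT-2 decoder to invert sentence embeddings.
import torch
from torch import nn
from sentence_transformers import SentenceTransformer
from transformers import GPT2LMHeadModel, GPT2Tokenizer

victim = SentenceTransformer("all-MiniLM-L6-v2")        # black-box embedder
tok = GPT2Tokenizer.from_pretrained("gpt2")
dec = GPT2LMHeadModel.from_pretrained("gpt2")           # attacker's decoder
proj = nn.Linear(victim.get_sentence_embedding_dimension(), dec.config.n_embd)
opt = torch.optim.AdamW(list(dec.parameters()) + list(proj.parameters()), lr=5e-5)

sentences = ["My password is hidden in the second drawer."]   # attacker's corpus
for text in sentences:
    emb = torch.tensor(victim.encode(text)).unsqueeze(0)      # (1, d_victim)
    prefix = proj(emb).unsqueeze(1)                            # (1, 1, d_gpt2)
    ids = tok(text, return_tensors="pt").input_ids             # (1, T)
    tok_emb = dec.get_input_embeddings()(ids)
    inputs_embeds = torch.cat([prefix, tok_emb], dim=1)        # embedding first
    labels = torch.cat([torch.full((1, 1), -100, dtype=torch.long), ids], dim=1)
    loss = dec(inputs_embeds=inputs_embeds, labels=labels).loss
    opt.zero_grad(); loss.backward(); opt.step()
# At attack time, the trained decoder generates text autoregressively from the
# projected embedding of an unseen sentence alone.
```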
- Twist Decoding: Diverse Generators Guide Each Other [116.20780037268801]
We introduce Twist decoding, a simple and general inference algorithm that generates text while benefiting from diverse models.
Our method does not assume the vocabulary, tokenization or even generation order is shared.
arXiv Detail & Related papers (2022-05-19T01:27:53Z)
- Few-Shot Semantic Parsing with Language Models Trained On Code [52.23355024995237]
We find that Codex performs better at semantic parsing than equivalent GPT-3 models.
We find that, unlike GPT-3, Codex performs similarly well when targeting meaning representations directly, perhaps because the meaning representations used in semantic parsing are structured similarly to code.
arXiv Detail & Related papers (2021-12-16T08:34:06Z)
- Sentence Bottleneck Autoencoders from Transformer Language Models [53.350633961266375]
We build a sentence-level autoencoder from a pretrained, frozen transformer language model.
We adapt the masked language modeling objective as a generative, denoising one, while only training a sentence bottleneck and a single-layer modified transformer decoder.
We demonstrate that the sentence representations discovered by our model achieve better quality than previous methods that extract representations from pretrained transformers on text similarity tasks, style transfer, and single-sentence classification tasks in the GLUE benchmark, while using fewer parameters than large pretrained models.
arXiv Detail & Related papers (2021-08-31T19:39:55Z)
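A compact sketch in the spirit of this architecture: a frozen pretrained encoder, a learned pooling bottleneck, and a single-layer decoder trained to reconstruct the sentence from the bottleneck alone. The mean-pooling choice and the plain autoregressive reconstruction objective (rather than the paper's denoising variant) are assumptions.
```python
# Sketch: sentence-bottleneck autoencoder over a frozen transformer encoder.
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("roberta-base")
enc = AutoModel.from_pretrained("roberta-base").eval()
for p in enc.parameters():
    p.requires_grad_(False)                           # pretrained encoder is frozen

d = enc.config.hidden_size
pool = nn.Linear(d, d)                                # learned sentence bottleneck
dec_layer = nn.TransformerDecoderLayer(d_model=d, nhead=8, batch_first=True)
decoder = nn.TransformerDecoder(dec_layer, num_layers=1)     # single-layer decoder
head = nn.Linear(d, enc.config.vocab_size)

def reconstruction_loss(text):
    ids = tok(text, return_tensors="pt").input_ids                   # (1, T)
    with torch.no_grad():
        hidden = enc(ids).last_hidden_state                          # (1, T, d)
    bottleneck = torch.tanh(pool(hidden.mean(dim=1, keepdim=True)))  # (1, 1, d)
    tgt = enc.embeddings.word_embeddings(ids)                        # teacher forcing
    T = ids.size(1)
    causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
    rec = decoder(tgt, memory=bottleneck, tgt_mask=causal)  # sees only the bottleneck
    logits = head(rec)
    return nn.functional.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)), ids[:, 1:].reshape(-1))

loss = reconstruction_loss("A sentence squeezed through the bottleneck.")
loss.backward()     # gradients reach only pool, decoder, and head
```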
- Discrete Cosine Transform as Universal Sentence Encoder [10.355894890759377]
We use Discrete Cosine Transform (DCT) to generate universal sentence representation for different languages.
The experimental results clearly show the superior effectiveness of DCT encoding.
arXiv Detail & Related papers (2021-06-02T04:43:54Z)
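The core of a DCT sentence encoder is simple, sketched below under the assumption of pre-computed static word vectors: take the DCT of the word-embedding matrix along the token axis and keep the first K coefficient rows as a fixed-size sentence vector. K=4 and the random toy input are illustrative choices.
```python
# Sketch: fixed-size sentence embedding from the first K DCT coefficients.
import numpy as np
from scipy.fft import dct

def dct_sentence_embedding(word_vectors: np.ndarray, k: int = 4) -> np.ndarray:
    """word_vectors: (T, d) static embeddings for one sentence's tokens."""
    coeffs = dct(word_vectors, type=2, norm="ortho", axis=0)  # DCT along token axis
    out = np.zeros((k, word_vectors.shape[1]))                # zero-pad short sentences
    n = min(k, coeffs.shape[0])
    out[:n] = coeffs[:n]
    return out.reshape(-1)                                    # fixed-size k*d vector

# Toy usage: a 7-token sentence with 300-d word vectors yields a 1200-d embedding.
print(dct_sentence_embedding(np.random.randn(7, 300)).shape)
```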
- Constrained Language Models Yield Few-Shot Semantic Parsers [73.50960967598654]
We explore the use of large pretrained language models as few-shot semantic parsers.
The goal in semantic parsing is to generate a structured meaning representation given a natural language input.
We use language models to paraphrase inputs into a controlled sublanguage resembling English that can be automatically mapped to a target meaning representation.
arXiv Detail & Related papers (2021-04-18T08:13:06Z)
- Learning Universal Representations from Word to Sentence [89.82415322763475]
This work introduces and explores universal representation learning, i.e., embedding linguistic units of different levels in a uniform vector space.
We present our approach of constructing analogy datasets in terms of words, phrases and sentences.
We empirically verify that well pre-trained Transformer models, combined with appropriate training settings, can effectively yield universal representations.
arXiv Detail & Related papers (2020-09-10T03:53:18Z)