Efficient Long-Text Understanding with Short-Text Models
- URL: http://arxiv.org/abs/2208.00748v1
- Date: Mon, 1 Aug 2022 11:14:39 GMT
- Title: Efficient Long-Text Understanding with Short-Text Models
- Authors: Maor Ivgi, Uri Shaham, Jonathan Berant
- Abstract summary: SLED is a simple approach for processing long sequences that re-uses and leverages battle-tested short-text pretrained LMs.
We partition the input into overlapping chunks, encode each with a short-text LM encoder and use the pretrained decoder to fuse information across chunks.
We find that SLED is competitive with specialized models that are up to 50x larger and require a dedicated and expensive pretraining step.
- Score: 38.8375175429553
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer-based pretrained language models (LMs) are ubiquitous across
natural language understanding, but cannot be applied to long sequences such as
stories, scientific articles and long documents, due to their quadratic
complexity. While a myriad of efficient transformer variants have been
proposed, they are typically based on custom implementations that require
expensive pretraining from scratch. In this work, we propose SLED:
SLiding-Encoder and Decoder, a simple approach for processing long sequences
that re-uses and leverages battle-tested short-text pretrained LMs.
Specifically, we partition the input into overlapping chunks, encode each with
a short-text LM encoder and use the pretrained decoder to fuse information
across chunks (fusion-in-decoder). We illustrate through controlled experiments
that SLED offers a viable strategy for long text understanding and evaluate our
approach on SCROLLS, a benchmark with seven datasets across a wide range of
language understanding tasks. We find that SLED is competitive with specialized
models that are up to 50x larger and require a dedicated and expensive
pretraining step.
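The chunk-encode-fuse procedure described in the abstract can be sketched in plain Python. This is a toy illustration under stated assumptions, not the authors' implementation. `toy_encoder` is a hypothetical stand-in for a pretrained short-text encoder. The rule for which chunk supplies each token's representation is also an assumption: the sketch uses the chunk in which the token sits farthest from an edge. `cross_attend` is a single unparameterized attention step standing in for the pretrained decoder. All function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Encoder = Callable[[Sequence[int]], List[Vector]]


def overlapping_chunks(n_tokens: int, chunk_len: int, overlap: int) -> List[Tuple[int, int]]:
    """(start, end) spans of windows of up to chunk_len tokens; neighbours share `overlap` tokens."""
    if not 0 <= overlap < chunk_len:
        raise ValueError("need 0 <= overlap < chunk_len")
    stride = chunk_len - overlap
    spans: List[Tuple[int, int]] = []
    start = 0
    while True:
        end = min(start + chunk_len, n_tokens)
        spans.append((start, end))
        if end >= n_tokens:
            break
        start += stride
    return spans


def toy_encoder(chunk: Sequence[int]) -> List[Vector]:
    """Hypothetical stand-in for a short-text pretrained encoder: each vector pairs
    the token id with the chunk mean, so outputs depend on the chunk's context."""
    mean = sum(chunk) / len(chunk)
    return [[float(t), mean] for t in chunk]


def sled_encode(token_ids: Sequence[int], encoder: Encoder,
                chunk_len: int, overlap: int) -> List[Vector]:
    """Encode each chunk independently, then keep exactly one vector per input token."""
    n = len(token_ids)
    if n == 0:
        return []
    spans = overlapping_chunks(n, chunk_len, overlap)
    encoded = [encoder(token_ids[s:e]) for s, e in spans]
    fused: List[Vector] = []
    for i in range(n):
        # Assumed rule: among chunks covering token i, use the one where i is
        # farthest from a chunk edge, so it has context on both sides.
        covering = [k for k, (s, e) in enumerate(spans) if s <= i < e]
        best = max(covering, key=lambda k: min(i - spans[k][0], spans[k][1] - 1 - i))
        fused.append(encoded[best][i - spans[best][0]])
    return fused


def cross_attend(query: Vector, memory: List[Vector]) -> Vector:
    """One single-head dot-product cross-attention step (no learned projections)
    over the fused states; this is where information from all chunks is mixed."""
    scores = [sum(q * m for q, m in zip(query, mem)) for mem in memory]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * mem[d] for w, mem in zip(weights, memory)) / total
            for d in range(len(memory[0]))]


def attention_pairs(n_tokens: int, chunk_len: int, overlap: int) -> Tuple[int, int]:
    """Query-key pairs scored by full self-attention vs. chunked encoder self-attention."""
    chunked = sum((e - s) ** 2 for s, e in overlapping_chunks(n_tokens, chunk_len, overlap))
    return n_tokens ** 2, chunked


if __name__ == "__main__":
    tokens = list(range(20))
    print(overlapping_chunks(len(tokens), chunk_len=8, overlap=4))
    # → [(0, 8), (4, 12), (8, 16), (12, 20)]
    fused = sled_encode(tokens, toy_encoder, chunk_len=8, overlap=4)
    print(len(fused))  # → 20: one vector per input token
    print(attention_pairs(20, 8, 4))  # → (400, 256)
    print(cross_attend([1.0, 0.0], fused))
```

With a fixed chunk length, the encoder's self-attention work grows with the number of chunks, roughly linearly in input length, rather than quadratically. In this toy run, 256 query-key pairs are scored instead of 400, and the gap widens as inputs get longer. The decoder still cross-attends to all fused states.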
Related papers
- Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs [57.27982780697922]
Large language models have demonstrated exceptional capability in natural language understanding and generation.
However, their generation speed is limited by the inherently sequential nature of their decoding process.
This paper introduces Lexical Unit Decoding, a novel decoding methodology implemented in a data-driven manner.
(2024-05-24T04:35:13Z)
- Extending Context Window of Large Language Models via Semantic Compression [21.35020344956721]
Large Language Models (LLMs) often impose limitations on the length of the text input to ensure the generation of fluent and relevant responses.
We propose a novel semantic compression method that enables generalization to texts 6-8 times longer, without incurring significant computational costs or requiring fine-tuning.
(2023-12-15T07:04:33Z)
- LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models [83.98062659664785]
Large language models (LLMs) typically train on short text segments (e.g., 4K tokens) due to the quadratic complexity of their Transformer architectures.
This work identifies three major factors behind the failure of LLMs to generalize to sequences longer than those seen in training.
We propose LM-Infinite, a simple and effective method for enhancing LLMs' capabilities of handling long contexts.
(2023-08-30T16:47:51Z)
- SCROLLS: Standardized CompaRison Over Long Language Sequences [62.574959194373264]
We introduce SCROLLS, a suite of tasks that require reasoning over long texts.
SCROLLS contains summarization, question answering, and natural language inference tasks.
We make all datasets available in a unified text-to-text format and host a live leaderboard to facilitate research on model architecture and pretraining methods.
(2022-01-10T18:47:15Z)
- ConvFiT: Conversational Fine-Tuning of Pretrained Language Models [42.7160113690317]
Transformer-based language models (LMs) pretrained on large text collections are proven to store a wealth of semantic knowledge.
We propose ConvFiT, a simple and efficient two-stage procedure which turns any pretrained LM into a universal conversational encoder.
(2021-09-21T12:16:56Z)
- Sentence Bottleneck Autoencoders from Transformer Language Models [53.350633961266375]
We build a sentence-level autoencoder from a pretrained, frozen transformer language model.
We adapt the masked language modeling objective as a generative, denoising one, while only training a sentence bottleneck and a single-layer modified transformer decoder.
We demonstrate that the sentence representations discovered by our model achieve better quality than previous methods that extract representations from pretrained transformers on text similarity tasks, style transfer, and single-sentence classification tasks in the GLUE benchmark, while using fewer parameters than large pretrained models.
(2021-08-31T19:39:55Z)
- DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders [92.90543340071007]
We introduce DeltaLM, a pretrained multilingual encoder-decoder model.
Specifically, we augment the pretrained multilingual encoder with a decoder and pre-train it in a self-supervised way.
Experiments show that DeltaLM outperforms various strong baselines on both natural language generation and translation tasks.
(2021-06-25T16:12:10Z)
- Efficient Wait-k Models for Simultaneous Machine Translation [46.01342928010307]
Simultaneous machine translation involves starting output generation before the entire input sequence is available.
Wait-k decoders offer a simple but efficient approach for this problem.
We investigate the behavior of wait-k decoding in low resource settings for spoken corpora using IWSLT datasets.
(2020-05-18T11:14:23Z)
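The wait-k decoding mentioned in the last entry can be illustrated with a short read/write schedule. This sketch assumes the standard wait-k formulation, which comes from the original wait-k work rather than from the summary above. Under that formulation, the decoder has read min(k + t - 1, |x|) source tokens before writing target token t. The function names are hypothetical, and no translation model is involved: the code only produces the sequence of actions.

```python
from typing import List


def wait_k_prefix(t: int, k: int, src_len: int) -> int:
    """Source tokens read before writing target token t (1-indexed) under wait-k (assumed formula)."""
    return min(k + t - 1, src_len)


def wait_k_schedule(k: int, src_len: int, tgt_len: int) -> List[str]:
    """Interleaved READ/WRITE actions: wait for k source tokens, then alternate."""
    actions: List[str] = []
    read = 0
    for t in range(1, tgt_len + 1):
        # Read just enough source tokens to satisfy the prefix required for step t.
        while read < wait_k_prefix(t, k, src_len):
            actions.append("READ")
            read += 1
        actions.append("WRITE")
    return actions


if __name__ == "__main__":
    print(" ".join(wait_k_schedule(k=2, src_len=4, tgt_len=5)))
    # → READ READ WRITE READ WRITE READ WRITE WRITE WRITE
```

Once the source is exhausted, the remaining target tokens are written back to back, which is why the example schedule ends with three WRITEs.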