Efficient Long-Text Understanding with Short-Text Models
- URL: http://arxiv.org/abs/2208.00748v1
- Date: Mon, 1 Aug 2022 11:14:39 GMT
- Title: Efficient Long-Text Understanding with Short-Text Models
- Authors: Maor Ivgi, Uri Shaham, Jonathan Berant
- Abstract summary: SLED is a simple approach for processing long sequences that re-uses and leverages battle-tested short-text pretrained LMs.
We partition the input into overlapping chunks, encode each with a short-text LM encoder and use the pretrained decoder to fuse information across chunks.
We find that SLED is competitive with specialized models that are up to 50x larger and require a dedicated and expensive pretraining step.
- Score: 38.8375175429553
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer-based pretrained language models (LMs) are ubiquitous across
natural language understanding, but cannot be applied to long sequences such as
stories, scientific articles and long documents, due to their quadratic
complexity. While a myriad of efficient transformer variants have been
proposed, they are typically based on custom implementations that require
expensive pretraining from scratch. In this work, we propose SLED:
SLiding-Encoder and Decoder, a simple approach for processing long sequences
that re-uses and leverages battle-tested short-text pretrained LMs.
Specifically, we partition the input into overlapping chunks, encode each with
a short-text LM encoder and use the pretrained decoder to fuse information
across chunks (fusion-in-decoder). We illustrate through controlled experiments
that SLED offers a viable strategy for long text understanding and evaluate our
approach on SCROLLS, a benchmark with seven datasets across a wide range of
language understanding tasks. We find that SLED is competitive with specialized
models that are up to 50x larger and require a dedicated and expensive
pretraining step.
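A minimal sketch of the recipe the abstract describes (overlapping chunks, a shared short-text encoder, fusion in the pretrained decoder) is given below. It is an illustration built on a vanilla BART checkpoint with arbitrary chunk and overlap sizes, not the authors' released SLED implementation.
```python
# Illustration of the SLED-style recipe on a vanilla BART checkpoint (assumed
# chunk and overlap sizes; not the authors' released implementation).
import torch
from transformers import AutoTokenizer, BartForConditionalGeneration
from transformers.modeling_outputs import BaseModelOutput

tok = AutoTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

def encode_long_input(text, chunk_len=256, overlap=64):
    """Split the token sequence into overlapping chunks and encode each chunk
    independently with the short-text encoder."""
    ids = tok(text, return_tensors="pt", add_special_tokens=False).input_ids[0]
    stride = chunk_len - overlap
    encoder = model.get_encoder()
    states, masks = [], []
    for start in range(0, len(ids), stride):
        chunk = ids[start:start + chunk_len].unsqueeze(0)   # (1, <=chunk_len)
        mask = torch.ones_like(chunk)
        out = encoder(input_ids=chunk, attention_mask=mask)
        states.append(out.last_hidden_state)                # (1, <=chunk_len, d)
        masks.append(mask)
    # Fusion-in-decoder: concatenate all chunk encodings along the length axis
    # so the single pretrained decoder cross-attends over the whole document.
    return torch.cat(states, dim=1), torch.cat(masks, dim=1)

fused, fused_mask = encode_long_input("a very long document ... " * 500)
labels = tok("a short target summary", return_tensors="pt").input_ids
outputs = model(encoder_outputs=BaseModelOutput(last_hidden_state=fused),
                attention_mask=fused_mask,   # used as the cross-attention mask
                labels=labels)
print(float(outputs.loss))
```
Because each chunk is encoded independently, encoder cost grows linearly with document length; only the decoder's cross-attention sees the full concatenated sequence.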
Related papers
- E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning [20.660297311025417]
We introduce E2LLM (Encoder Elongated Large Language Models), a novel approach that effectively navigates the "impossible triangle" of performance, efficiency, and compatibility with pretrained models.
The method involves splitting long contexts into chunks, compressing each into embedding vectors via a pretrained text encoder, and utilizing an adapter to align these representations with a decoder-only LLM.
Experimental results demonstrate that E2LLM achieves superior performance in long-context scenarios while balancing efficiency, performance, and compatibility with pretrained models.
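A rough sketch of the chunk-compress-align pattern described above follows; the specific encoder and decoder-only LLM, the mean pooling, and the two-layer MLP adapter are placeholder assumptions for illustration, not E2LLM's actual configuration.
```python
# Placeholder components (MiniLM encoder, GPT-2 decoder, mean pooling, 2-layer
# MLP adapter) standing in for E2LLM's actual choices.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel, AutoModelForCausalLM

enc_tok = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
encoder = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
llm_tok = AutoTokenizer.from_pretrained("gpt2")
llm = AutoModelForCausalLM.from_pretrained("gpt2")

# Trainable adapter bridging encoder space (d_enc) to LLM embedding space (d_llm).
adapter = nn.Sequential(
    nn.Linear(encoder.config.hidden_size, llm.config.hidden_size),
    nn.GELU(),
    nn.Linear(llm.config.hidden_size, llm.config.hidden_size),
)

def compress_context(text, chunk_chars=2000):
    """Compress each chunk of the long context into one pooled vector, then
    project it into the LLM's input-embedding space."""
    chunks = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
    vecs = []
    for chunk in chunks:
        batch = enc_tok(chunk, return_tensors="pt", truncation=True)
        hidden = encoder(**batch).last_hidden_state          # (1, L, d_enc)
        vecs.append(hidden.mean(dim=1))                       # mean-pool -> (1, d_enc)
    return adapter(torch.cat(vecs, dim=0)).unsqueeze(0)       # (1, n_chunks, d_llm)

soft_tokens = compress_context("a very long context ... " * 500)
prompt_ids = llm_tok("Question: what is the document about?", return_tensors="pt").input_ids
prompt_emb = llm.get_input_embeddings()(prompt_ids)           # (1, T, d_llm)
# Prepend the compressed-context "soft tokens" to the prompt embeddings.
out = llm(inputs_embeds=torch.cat([soft_tokens, prompt_emb], dim=1))
print(out.logits.shape)                                       # (1, n_chunks + T, vocab)
```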
arXiv Detail & Related papers (2024-09-10T17:44:35Z)
- Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer [1.728027753702854]
Large Language Models (LLMs) with long context capabilities are integral to complex tasks in natural language processing and computational biology.
We propose Fully Pipelined Distributed Transformer (FPDT) for efficiently training long-context LLMs.
For GPT and Llama models, we achieve a 16x increase in sequence length that can be trained on the same hardware.
arXiv Detail & Related papers (2024-08-30T02:44:26Z)
- mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval [67.50604814528553]
We first introduce a text encoder enhanced with RoPE and unpadding, pre-trained in a native 8192-token context.
Then we construct a hybrid text representation model (TRM) and a cross-encoder reranker by contrastive learning.
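For illustration, a generic in-batch-negative contrastive (InfoNCE) objective of the kind commonly used to train text-embedding retrievers is sketched below; it is not mGTE's exact objective or architecture.
```python
# Generic in-batch-negative InfoNCE loss over (query, passage) pairs; row i of
# each batch is treated as the positive pair, all other rows as negatives.
import torch
import torch.nn.functional as F

def info_nce(query_emb, passage_emb, temperature=0.05):
    """query_emb, passage_emb: (B, d) embeddings from some text encoder."""
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(passage_emb, dim=-1)
    logits = q @ p.T / temperature              # (B, B); diagonal = positives
    targets = torch.arange(q.size(0))           # query i matches passage i
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(8, 768), torch.randn(8, 768))
print(float(loss))
```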
arXiv Detail & Related papers (2024-07-29T03:12:28Z)
- Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs [57.27982780697922]
Large language models have demonstrated exceptional capability in natural language understanding and generation.
However, their generation speed is limited by the inherently sequential nature of their decoding process.
This paper introduces Lexical Unit Decoding, a novel decoding methodology implemented in a data-driven manner.
arXiv Detail & Related papers (2024-05-24T04:35:13Z)
- LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models [83.98062659664785]
Large language models (LLMs) typically train on short text segments (e.g., 4K tokens) due to the quadratic complexity of their Transformer architectures.
This work identifies three major factors contributing to this length generalization failure.
We propose LM-Infinite, a simple and effective method for enhancing LLMs' capabilities of handling long contexts.
arXiv Detail & Related papers (2023-08-30T16:47:51Z)
- SCROLLS: Standardized CompaRison Over Long Language Sequences [62.574959194373264]
We introduce SCROLLS, a suite of tasks that require reasoning over long texts.
SCROLLS contains summarization, question answering, and natural language inference tasks.
We make all datasets available in a unified text-to-text format and host a live leaderboard to facilitate research on model architecture and pretraining methods.
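A minimal sketch of consuming one SCROLLS task in this text-to-text format is shown below; the "tau/scrolls" dataset identifier, the "gov_report" config, and the "input"/"output" field names are assumptions about the Hugging Face hosting of the benchmark and should be checked against the official release.
```python
# Assumed identifiers: the "tau/scrolls" dataset repo, the "gov_report" config,
# and the "input"/"output" fields; verify against the official SCROLLS release.
from datasets import load_dataset

ds = load_dataset("tau/scrolls", "gov_report", split="validation")
example = ds[0]
source_text = example["input"]    # the long document (plus the query, for QA tasks)
target_text = example["output"]   # the reference summary / answer
print(len(source_text.split()), "source words ->", len(target_text.split()), "target words")
```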
arXiv Detail & Related papers (2022-01-10T18:47:15Z)
- ConvFiT: Conversational Fine-Tuning of Pretrained Language Models [42.7160113690317]
Transformer-based language models (LMs) pretrained on large text collections have been shown to store a wealth of semantic knowledge.
We propose ConvFiT, a simple and efficient two-stage procedure which turns any pretrained LM into a universal conversational encoder.
arXiv Detail & Related papers (2021-09-21T12:16:56Z)
- DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders [92.90543340071007]
We introduce DeltaLM, a pretrained multilingual encoder-decoder model.
Specifically, we augment the pretrained multilingual encoder with a decoder and pre-train it in a self-supervised way.
Experiments show that DeltaLM outperforms various strong baselines on both natural language generation and translation tasks.
arXiv Detail & Related papers (2021-06-25T16:12:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.