Adapting Pretrained Text-to-Text Models for Long Text Sequences
- URL: http://arxiv.org/abs/2209.10052v1
- Date: Wed, 21 Sep 2022 00:41:07 GMT
- Title: Adapting Pretrained Text-to-Text Models for Long Text Sequences
- Authors: Wenhan Xiong, Anchit Gupta, Shubham Toshniwal, Yashar Mehdad, Wen-tau Yih
- Abstract summary: We adapt an existing pretrained text-to-text model for long-sequence inputs.
We build a long-context model that achieves competitive performance on long-text QA tasks.
- Score: 39.62224414485055
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present an empirical study of adapting an existing pretrained text-to-text
model for long-sequence inputs. Through a comprehensive study along three axes
of the pretraining pipeline -- model architecture, optimization objective, and
pretraining corpus -- we propose an effective recipe to build long-context models
from existing short-context models. Specifically, we replace the full attention
in transformers with pooling-augmented blockwise attention, and pretrain the
model with a masked-span prediction task with spans of varying length. In terms
of the pretraining corpus, we find that using randomly concatenated
short documents from a large open-domain corpus results in better performance
than using existing long-document corpora, which are typically limited in their
domain coverage. With these findings, we build a long-context model that
achieves competitive performance on long-text QA tasks and establishes the new
state of the art on five long-text summarization datasets, often outperforming
previous methods with larger model sizes.
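As a rough illustration of the architectural change described in the abstract, the sketch below shows one plausible reading of pooling-augmented blockwise attention: each token attends to the tokens in its own block plus a mean-pooled summary of every block. The block size, the choice of mean pooling, and the single-head NumPy layout are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumptions noted above): local block attention augmented with
# one pooled key/value per block, so every token also gets a coarse global view.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pooled_blockwise_attention(q, k, v, block_size=128):
    """q, k, v: (seq_len, d); seq_len is assumed divisible by block_size."""
    seq_len, d = q.shape
    n_blocks = seq_len // block_size
    qb = q.reshape(n_blocks, block_size, d)
    kb = k.reshape(n_blocks, block_size, d)
    vb = v.reshape(n_blocks, block_size, d)

    # Mean-pooled summary of each block (illustrative pooling choice).
    k_pool = kb.mean(axis=1)   # (n_blocks, d)
    v_pool = vb.mean(axis=1)   # (n_blocks, d)

    out = np.empty_like(q).reshape(n_blocks, block_size, d)
    for b in range(n_blocks):
        # Keys/values seen by block b: its own tokens plus all pooled summaries.
        k_aug = np.concatenate([kb[b], k_pool], axis=0)   # (block_size + n_blocks, d)
        v_aug = np.concatenate([vb[b], v_pool], axis=0)
        scores = qb[b] @ k_aug.T / np.sqrt(d)             # (block_size, block_size + n_blocks)
        out[b] = softmax(scores, axis=-1) @ v_aug
    return out.reshape(seq_len, d)

# Toy usage: 1,024 tokens, 64-dimensional head.
rng = np.random.default_rng(0)
x = rng.standard_normal((1024, 64))
print(pooled_blockwise_attention(x, x, x, block_size=128).shape)  # (1024, 64)
```

Restricting each token's keys to one block plus per-block summaries keeps the attention cost roughly linear in sequence length rather than quadratic, which is what makes the long-context adaptation affordable.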
Related papers
- A Novel LLM-based Two-stage Summarization Approach for Long Dialogues [9.835499880812646]
This study proposes a hierarchical framework that segments and condenses information from long documents.
The condensation stage utilizes an unsupervised generation model to generate condensed data.
The summarization stage fine-tunes the abstractive summarization model on the condensed data to generate the final results (a generic sketch of this two-stage pattern appears after this list).
arXiv Detail & Related papers (2024-10-09T03:42:40Z)
- Summarizing long regulatory documents with a multi-step pipeline [2.2591852560804675]
We show that the effectiveness of a two-step architecture for summarizing long regulatory texts varies depending on the model used.
For abstractive encoder-decoder models with short context lengths, the effectiveness of an extractive step varies, whereas for long-context encoder-decoder models, the extractive step worsens their performance.
arXiv Detail & Related papers (2024-08-19T08:07:25Z)
- LongWanjuan: Towards Systematic Measurement for Long Text Quality [102.46517202896521]
LongWanjuan is a dataset of over 160B tokens specifically tailored to enhance the training of language models on long-text tasks.
In LongWanjuan, we categorize long texts into holistic, aggregated, and chaotic types, enabling a detailed analysis of long-text quality.
We devise a data mixture recipe that strategically balances different types of long texts within LongWanjuan, leading to significant improvements in model performance on long-text tasks.
arXiv Detail & Related papers (2024-02-21T07:27:18Z)
- Effective Long-Context Scaling of Foundation Models [90.57254298730923]
We present a series of long-context LLMs that support effective context windows of up to 32,768 tokens.
Our models achieve consistent improvements on most regular tasks and significant improvements on long-context tasks over Llama 2.
arXiv Detail & Related papers (2023-09-27T21:41:49Z)
- A Survey on Long Text Modeling with Transformers [33.9069167068622]
We provide an overview of recent advances in long text modeling based on Transformer models.
We discuss how to preprocess long inputs to satisfy the length limitation and how to design improved Transformer architectures.
We describe four typical applications involving long text modeling and conclude this paper with a discussion of future directions.
arXiv Detail & Related papers (2023-02-28T11:34:30Z)
- Retrieve-and-Fill for Scenario-based Task-Oriented Semantic Parsing [110.4684789199555]
We introduce scenario-based semantic parsing: a variant of the original task that first requires disambiguating an utterance's "scenario".
This formulation enables us to isolate coarse-grained and fine-grained aspects of the task, each of which we solve with off-the-shelf neural modules.
Our model is modular, differentiable, interpretable, and allows us to garner extra supervision from scenarios.
arXiv Detail & Related papers (2022-02-02T08:00:21Z)
- Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed afterwards, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
- Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder for Long-Form Document Matching [28.190001111358438]
We propose the Siamese Multi-depth Transformer-based Hierarchical (SMITH) Encoder for long-form document matching.
Our model contains several innovations to adapt self-attention models for longer text input.
We will open-source a Wikipedia-based benchmark dataset, code, and a pre-trained checkpoint to accelerate future research on long-form document matching.
arXiv Detail & Related papers (2020-04-26T07:04:08Z)
- Pre-training for Abstractive Document Summarization by Reinstating Source Text [105.77348528847337]
This paper presents three pre-training objectives which allow us to pre-train a Seq2Seq based abstractive summarization model on unlabeled text.
Experiments on two benchmark summarization datasets show that all three objectives can improve performance upon baselines.
arXiv Detail & Related papers (2020-04-04T05:06:26Z)
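As flagged in the first related-paper entry, the sketch below is a generic, hypothetical illustration of the segment-then-condense-then-summarize pattern that the first two related papers describe. The chunking rule, the frequency-based condenser, and the stubbed abstractive stage are all assumptions for illustration, not any paper's actual method.

```python
# Hypothetical two-stage pipeline: segment a long document, condense each chunk
# with a crude unsupervised salience score, then hand the condensed text to an
# abstractive summarizer (stubbed here so the sketch stays runnable).
from collections import Counter
import re

def segment(document: str, chunk_size: int = 200) -> list[str]:
    """Split a long document into word-count-limited chunks (illustrative rule)."""
    words = document.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

def condense(chunk: str, keep_ratio: float = 0.3) -> str:
    """Stage 1: keep the sentences whose words are most frequent in the chunk,
    standing in for the unsupervised condensation model."""
    sentences = re.split(r"(?<=[.!?])\s+", chunk)
    freqs = Counter(w.lower() for w in chunk.split())
    scored = sorted(
        sentences,
        key=lambda s: -sum(freqs[w.lower()] for w in s.split()) / max(len(s.split()), 1),
    )
    kept = set(scored[:max(1, int(len(sentences) * keep_ratio))])
    return " ".join(s for s in sentences if s in kept)

def abstractive_summary(condensed: str) -> str:
    """Stage 2 placeholder: in the papers this would be a fine-tuned seq2seq
    summarizer; here it simply truncates the condensed text."""
    return " ".join(condensed.split()[:60]) + " ..."

def two_stage_summarize(document: str) -> str:
    condensed = " ".join(condense(chunk) for chunk in segment(document))
    return abstractive_summary(condensed)

print(two_stage_summarize("Long documents exceed model context limits. " * 100))
```

The point of the pattern is that only the condensation stage has to see the full document; the abstractive model then operates on input short enough to fit its context window.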