A Survey on Long Text Modeling with Transformers
- URL: http://arxiv.org/abs/2302.14502v1
- Date: Tue, 28 Feb 2023 11:34:30 GMT
- Authors: Zican Dong, Tianyi Tang, Lunyi Li and Wayne Xin Zhao
- Abstract summary: We provide an overview of recent advances in long text modeling based on Transformer models.
We discuss how to process long input to satisfy the length limitation and design improved Transformer architectures.
We describe four typical applications involving long text modeling and conclude this paper with a discussion of future directions.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modeling long texts has been an essential technique in the field of natural
language processing (NLP). With the ever-growing number of long documents, it
is important to develop effective modeling methods that can process and analyze
such texts. However, long texts pose significant research challenges for existing
text models, owing to their more complex semantics and special characteristics.
In this paper, we provide an overview of recent advances in long text modeling
based on Transformer models. First, we introduce the formal definition of
long text modeling. Then, as the core content, we discuss how to process long
input to satisfy the length limitation and how to design improved Transformer
architectures that effectively extend the maximum context length. Following this,
we discuss how to adapt Transformer models to capture the special
characteristics of long texts. Finally, we describe four typical applications
involving long text modeling and conclude this paper with a discussion of
future directions. Our survey intends to provide researchers with a synthesis
of and pointer to related work on long text modeling.
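One family of input-processing strategies the survey covers is splitting a long input into overlapping chunks so each piece fits a fixed-length Transformer context. The sketch below is illustrative only, not taken from the paper; the window and stride values are hypothetical defaults chosen for the example:

```python
def chunk_tokens(tokens, window=512, stride=384):
    """Split a long token sequence into overlapping windows.

    Each chunk fits a model with a `window`-token context limit; the
    overlap (window - stride tokens) carries context across boundaries.
    """
    if len(tokens) <= window:
        return [tokens]
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):  # final chunk reaches the end
            break
    return chunks

# Example: 1000 pseudo-tokens -> three overlapping 512-token windows.
toks = list(range(1000))
chunks = chunk_tokens(toks)
```

Each chunk can then be encoded independently and the results aggregated, one of several length-limitation workarounds the survey contrasts with architectures that natively extend the context window.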
Related papers
- Enhancing Short-Text Topic Modeling with LLM-Driven Context Expansion and Prefix-Tuned VAEs [25.915607750636333]
We propose a novel approach that leverages large language models (LLMs) to extend short texts into more detailed sequences before applying topic modeling.
Our method significantly improves short-text topic modeling performance, as demonstrated by extensive experiments on real-world datasets with extreme data sparsity.
arXiv Detail & Related papers (2024-10-04T01:28:56Z)
- Summarizing long regulatory documents with a multi-step pipeline [2.2591852560804675]
We show that the effectiveness of a two-step architecture for summarizing long regulatory texts varies depending on the model used.
For abstractive encoder-decoder models with short context lengths, the effectiveness of an extractive step varies, whereas for long-context encoder-decoder models, the extractive step worsens their performance.
arXiv Detail & Related papers (2024-08-19T08:07:25Z)
- LongWanjuan: Towards Systematic Measurement for Long Text Quality [102.46517202896521]
LongWanjuan is a dataset specifically tailored to enhance the training of language models for long-text tasks with over 160B tokens.
In LongWanjuan, we categorize long texts into holistic, aggregated, and chaotic types, enabling a detailed analysis of long-text quality.
We devise a data mixture recipe that strategically balances different types of long texts within LongWanjuan, leading to significant improvements in model performance on long-text tasks.
arXiv Detail & Related papers (2024-02-21T07:27:18Z)
- Prompting Large Language Models for Topic Modeling [10.31712610860913]
We propose PromptTopic, a novel topic modeling approach that harnesses the advanced language understanding of large language models (LLMs).
It involves extracting topics at the sentence level from individual documents, then aggregating and condensing these topics into a predefined quantity, ultimately providing coherent topics for texts of varying lengths.
We benchmark PromptTopic against the state-of-the-art baselines on three vastly diverse datasets, establishing its proficiency in discovering meaningful topics.
arXiv Detail & Related papers (2023-12-15T11:15:05Z)
- Model Criticism for Long-Form Text Generation [113.13900836015122]
We apply a statistical tool, model criticism in latent space, to evaluate the high-level structure of generated text.
We perform experiments on three representative aspects of high-level discourse -- coherence, coreference, and topicality.
We find that transformer-based language models are able to capture topical structures but have a harder time maintaining structural coherence or modeling coreference.
arXiv Detail & Related papers (2022-10-16T04:35:58Z)
- Adapting Pretrained Text-to-Text Models for Long Text Sequences [39.62224414485055]
We adapt an existing pretrained text-to-text model for long-sequence inputs.
We build a long-context model that achieves competitive performance on long-text QA tasks.
arXiv Detail & Related papers (2022-09-21T00:41:07Z)
- How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN [63.79300884115027]
Current language models can generate high-quality text.
Are they simply copying text they have seen before, or have they learned generalizable linguistic abstractions?
We introduce RAVEN, a suite of analyses for assessing the novelty of generated text.
arXiv Detail & Related papers (2021-11-18T04:07:09Z)
- LOT: A Benchmark for Evaluating Chinese Long Text Understanding and Generation [49.57366550980932]
Long text modeling requires many capabilities such as modeling long-range commonsense and discourse relations.
We propose LOT, a benchmark including two understanding and two generation tasks for Chinese long text modeling evaluation.
We release an encoder-decoder Chinese long text pretraining model named LongLM with up to 1 billion parameters.
arXiv Detail & Related papers (2021-08-30T02:38:32Z)
- Pretrained Language Models for Text Generation: A Survey [46.03096493973206]
We present an overview of the major advances achieved in the topic of pretrained language models (PLMs) for text generation.
We discuss how to adapt existing PLMs to model different input data and satisfy special properties in the generated text.
arXiv Detail & Related papers (2021-05-21T12:27:44Z)
- Progressive Generation of Long Text with Pretrained Language Models [83.62523163717448]
Large-scale language models (LMs) pretrained on massive corpora of text, such as GPT-2, are powerful open-domain text generators.
However, it is still challenging for such models to generate coherent long passages of text, especially when they are fine-tuned to the target domain on a small corpus.
We propose a simple but effective method of generating text in a progressive manner, inspired by generating images from low to high resolution.
arXiv Detail & Related papers (2020-06-28T21:23:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.