Improving Citation Text Generation: Overcoming Limitations in Length Control
- URL: http://arxiv.org/abs/2407.14997v1
- Date: Sat, 20 Jul 2024 22:10:37 GMT
- Title: Improving Citation Text Generation: Overcoming Limitations in Length Control
- Authors: Biswadip Mandal, Xiangci Li, Jessica Ouyang,
- Abstract summary: Key challenge in citation text generation is that the length of generated text often differs from the length of the target, lowering the quality of the generation.
In this work, we present an in-depth study of the limitations of predicting scientific citation text length and explore the use of estimates of desired length.
- Score: 10.555859097367286
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A key challenge in citation text generation is that the length of generated text often differs from the length of the target, lowering the quality of the generation. While prior works have investigated length-controlled generation, their effectiveness depends on knowing the appropriate generation length. In this work, we present an in-depth study of the limitations of predicting scientific citation text length and explore the use of heuristic estimates of desired length.
Related papers
- LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs [4.4965596747053]
Long-form text generation is critical for applications such as design proposals and creative writing.
New long-form text evaluation benchmark, LongGenBench, tests models' ability to identify specific events within generated long text sequences.
arXiv Detail & Related papers (2024-09-03T17:25:54Z) - Analysis of Plan-based Retrieval for Grounded Text Generation [78.89478272104739]
hallucinations occur when a language model is given a generation task outside its parametric knowledge.
A common strategy to address this limitation is to infuse the language models with retrieval mechanisms.
We analyze how planning can be used to guide retrieval to further reduce the frequency of hallucinations.
arXiv Detail & Related papers (2024-08-20T02:19:35Z) - LongLaMP: A Benchmark for Personalized Long-form Text Generation [87.41296912519992]
We develop the Long-text Language Model Personalization (LongLaMP) Benchmark.
LongLaMP provides a comprehensive and diverse evaluation framework for personalized long-text generation.
The results highlight the importance of personalization across a wide variety of long-text generation tasks.
arXiv Detail & Related papers (2024-06-27T01:52:05Z) - LongWanjuan: Towards Systematic Measurement for Long Text Quality [102.46517202896521]
LongWanjuan is a dataset specifically tailored to enhance the training of language models for long-text tasks with over 160B tokens.
In LongWanjuan, we categorize long texts into holistic, aggregated, and chaotic types, enabling a detailed analysis of long-text quality.
We devise a data mixture recipe that strategically balances different types of long texts within LongWanjuan, leading to significant improvements in model performance on long-text tasks.
arXiv Detail & Related papers (2024-02-21T07:27:18Z) - LongAlign: A Recipe for Long Context Alignment of Large Language Models [61.85923382850057]
LongAlign is a recipe of the instruction data, training, and evaluation for long context alignment.
We construct a long instruction-following dataset using Self-Instruct.
We adopt the packing and sorted strategies to speed up supervised fine-tuning on data with varied length distributions.
arXiv Detail & Related papers (2024-01-31T18:29:39Z) - LISTER: Neighbor Decoding for Length-Insensitive Scene Text Recognition [27.280917081410955]
We propose a method called Length-Insensitive Scene TExt Recognizer (LISTER)
A Neighbor Decoder is proposed to obtain accurate character attention maps with the assistance of a novel neighbor matrix.
A Feature Enhancement Module is devised to model the long-range dependency with low cost.
arXiv Detail & Related papers (2023-08-24T13:26:18Z) - Summarization with Precise Length Control [23.688834410051]
We present a framework to generate summaries with precisely the specified number of tokens or sentences.
We jointly train the models to predict the lengths, so our model can generate summaries with optimal length.
arXiv Detail & Related papers (2023-05-09T04:45:24Z) - Sequentially Controlled Text Generation [97.22539956688443]
GPT-2 generates sentences that are remarkably human-like, longer documents can ramble and do not follow human-like writing structure.
We study the problem of imposing structure on long-range text.
We develop a sequential controlled text generation pipeline with generation and editing.
arXiv Detail & Related papers (2023-01-05T21:23:51Z) - A Survey on Retrieval-Augmented Text Generation [53.04991859796971]
Retrieval-augmented text generation has remarkable advantages and has achieved state-of-the-art performance in many NLP tasks.
It firstly highlights the generic paradigm of retrieval-augmented generation, and then it reviews notable approaches according to different tasks.
arXiv Detail & Related papers (2022-02-02T16:18:41Z) - LenAtten: An Effective Length Controlling Unit For Text Summarization [5.554982420311913]
Fixed length summarization aims at generating summaries with a preset number of words or characters.
Most recent researches incorporate length information with word embeddings as the input to the recurrent decoding unit.
We present an effective length controlling unit Length Attention (LenAtten) to break this trade-off.
arXiv Detail & Related papers (2021-06-01T08:45:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.