LongLaMP: A Benchmark for Personalized Long-form Text Generation
- URL: http://arxiv.org/abs/2407.11016v3
- Date: Tue, 15 Oct 2024 03:24:47 GMT
- Title: LongLaMP: A Benchmark for Personalized Long-form Text Generation
- Authors: Ishita Kumar, Snigdha Viswanathan, Sushrita Yerra, Alireza Salemi, Ryan A. Rossi, Franck Dernoncourt, Hanieh Deilamsalehy, Xiang Chen, Ruiyi Zhang, Shubham Agarwal, Nedim Lipka, Chien Van Nguyen, Thien Huu Nguyen, Hamed Zamani
- Abstract summary: We develop the Long-text Language Model Personalization (LongLaMP) Benchmark.
LongLaMP provides a comprehensive and diverse evaluation framework for personalized long-text generation.
The results highlight the importance of personalization across a wide variety of long-text generation tasks.
- Score: 87.41296912519992
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Long-text generation is seemingly ubiquitous in real-world applications of large language models, such as generating an email or writing a review. Despite the fundamental importance and prevalence of long-text generation in many practical applications, existing work on personalized generation has focused on very short text. To overcome this limitation, we study the problem of personalized long-text generation: generating long text that is personalized for a specific user while remaining practically useful for the vast majority of real-world applications, which naturally require longer text. In this work, we demonstrate the importance of user-specific personalization for long-text generation tasks and develop the Long-text Language Model Personalization (LongLaMP) Benchmark. LongLaMP provides a comprehensive and diverse evaluation framework for personalized long-text generation. Extensive experiments on LongLaMP with zero-shot and fine-tuned language models demonstrate the effectiveness of the proposed benchmark and its utility for developing and evaluating techniques for personalized long-text generation across a wide variety of tasks. The results highlight the importance of personalization across these tasks. Finally, we release the benchmark for others to use for this important problem.
Related papers
- HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models [89.28591263741973]
We introduce the Hierarchical Long Text Generation Benchmark (HelloBench) to evaluate Large Language Models' performance in generating long text.
Based on Bloom's taxonomy, HelloBench categorizes long text generation tasks into five subtasks: open-ended QA, summarization, chat, text completion, and text generation.
In addition, we propose Hierarchical Long Text Evaluation (HelloEval), a human evaluation method that significantly reduces the time and effort required for human evaluation.
arXiv Detail & Related papers (2024-09-24T15:38:11Z) - LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs [4.4965596747053]
Long-form text generation is critical for applications such as design proposals and creative writing.
LongGenBench, a new long-form text evaluation benchmark, tests models' ability to identify specific events within generated long text sequences.
arXiv Detail & Related papers (2024-09-03T17:25:54Z) - Improving Citation Text Generation: Overcoming Limitations in Length Control [10.555859097367286]
A key challenge in citation text generation is that the length of the generated text often differs from that of the target, lowering the quality of the generation.
In this work, we present an in-depth study of the limitations of predicting scientific citation text length and explore the use of estimates of desired length.
arXiv Detail & Related papers (2024-07-20T22:10:37Z) - LongWanjuan: Towards Systematic Measurement for Long Text Quality [102.46517202896521]
LongWanjuan is a dataset of over 160B tokens specifically tailored to enhance the training of language models for long-text tasks.
In LongWanjuan, we categorize long texts into holistic, aggregated, and chaotic types, enabling a detailed analysis of long-text quality.
We devise a data mixture recipe that strategically balances different types of long texts within LongWanjuan, leading to significant improvements in model performance on long-text tasks.
arXiv Detail & Related papers (2024-02-21T07:27:18Z) - A Survey on Retrieval-Augmented Text Generation [53.04991859796971]
Retrieval-augmented text generation offers remarkable advantages and has achieved state-of-the-art performance on many NLP tasks.
The survey first highlights the generic paradigm of retrieval-augmented generation, then reviews notable approaches according to their tasks.
arXiv Detail & Related papers (2022-02-02T16:18:41Z) - SCROLLS: Standardized CompaRison Over Long Language Sequences [62.574959194373264]
We introduce SCROLLS, a suite of tasks that require reasoning over long texts.
SCROLLS contains summarization, question answering, and natural language inference tasks.
We make all datasets available in a unified text-to-text format and host a live leaderboard to facilitate research on model architecture and pretraining methods.
arXiv Detail & Related papers (2022-01-10T18:47:15Z) - LOT: A Benchmark for Evaluating Chinese Long Text Understanding and Generation [49.57366550980932]
Long text modeling requires many capabilities such as modeling long-range commonsense and discourse relations.
We propose LOT, a benchmark including two understanding and two generation tasks for Chinese long text modeling evaluation.
We release an encoder-decoder Chinese long text pretraining model named LongLM with up to 1 billion parameters.
arXiv Detail & Related papers (2021-08-30T02:38:32Z) - Pretrained Language Models for Text Generation: A Survey [46.03096493973206]
We present an overview of the major advances achieved in the topic of pretrained language models (PLMs) for text generation.
We discuss how to adapt existing PLMs to model different input data and satisfy special properties in the generated text.
arXiv Detail & Related papers (2021-05-21T12:27:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.