Summarization is (Almost) Dead
- URL: http://arxiv.org/abs/2309.09558v1
- Date: Mon, 18 Sep 2023 08:13:01 GMT
- Title: Summarization is (Almost) Dead
- Authors: Xiao Pu, Mingqi Gao, Xiaojun Wan
- Abstract summary: We develop new datasets and conduct human evaluation experiments to evaluate the zero-shot generation capability of large language models (LLMs).
Our findings indicate a clear preference among human evaluators for LLM-generated summaries over human-written summaries and summaries generated by fine-tuned models.
- Score: 49.360752383801305
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How well can large language models (LLMs) generate summaries? We develop new
datasets and conduct human evaluation experiments to evaluate the zero-shot
generation capability of LLMs across five distinct summarization tasks. Our
findings indicate a clear preference among human evaluators for LLM-generated
summaries over human-written summaries and summaries generated by fine-tuned
models. Specifically, LLM-generated summaries exhibit better factual
consistency and fewer instances of extrinsic hallucinations. Due to the
satisfactory performance of LLMs in summarization tasks (even surpassing the
benchmark of reference summaries), we believe that most conventional works in
the field of text summarization are no longer necessary in the era of LLMs.
However, we recognize that there are still some directions worth exploring,
such as the creation of novel datasets with higher quality and more reliable
evaluation methods.
Related papers
- Towards Enhancing Coherence in Extractive Summarization: Dataset and Experiments with LLMs [70.15262704746378]
We propose a systematically created human-annotated dataset consisting of coherent summaries for five publicly available datasets and natural language user feedback.
Preliminary experiments with Falcon-40B and Llama-2-13B show significant performance improvements (10% ROUGE-L) in terms of producing coherent summaries.
arXiv Detail & Related papers (2024-07-05T20:25:04Z)
- LaMSUM: Creating Extractive Summaries of User Generated Content using LLMs [6.770555526416268]
Large Language Models (LLMs) have demonstrated impressive performance across a wide range of NLP tasks, including summarization.
We introduce LaMSUM, a novel framework designed to generate extractive summaries from large collections of user-generated text.
arXiv Detail & Related papers (2024-06-22T10:25:55Z)
- Assessing LLMs for Zero-shot Abstractive Summarization Through the Lens of Relevance Paraphrasing [37.400757839157116]
Large Language Models (LLMs) have achieved state-of-the-art performance at zero-shot generation of abstractive summaries for given articles.
We propose relevance paraphrasing, a simple strategy that can be used to measure the robustness of LLMs as summarizers.
arXiv Detail & Related papers (2024-06-06T12:08:43Z)
- Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization [132.25202059478065]
We benchmark large language models (LLMs) on instruction controllable text summarization.
Our study reveals that instruction controllable text summarization remains a challenging task for LLMs.
arXiv Detail & Related papers (2023-11-15T18:25:26Z)
- BooookScore: A systematic exploration of book-length summarization in the era of LLMs [53.42917858142565]
We develop an automatic metric, BooookScore, that measures the proportion of sentences in a summary that do not contain any of the identified error types.
We find that closed-source LLMs such as GPT-4 and Claude 2 produce summaries with higher BooookScore than those generated by open-source models.
arXiv Detail & Related papers (2023-10-01T20:46:44Z)
- On Learning to Summarize with Large Language Models as References [101.79795027550959]
Large language models (LLMs) are favored by human annotators over the original reference summaries in commonly used summarization datasets.
We study an LLM-as-reference learning setting for smaller text summarization models to investigate whether their performance can be substantially improved.
arXiv Detail & Related papers (2023-05-23T16:56:04Z)
- Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method [35.181659789684545]
Automatic summarization generates concise summaries that contain key ideas of source documents.
References from CNN/DailyMail and BBC XSum are noisy, mainly in terms of factual hallucination and information redundancy.
We propose a Summary Chain-of-Thought (SumCoT) technique to elicit LLMs to generate summaries step by step.
Experimental results show our method outperforms state-of-the-art fine-tuned PLMs and zero-shot LLMs by +4.33/+4.77 in ROUGE-L.
arXiv Detail & Related papers (2023-05-22T18:54:35Z)
- Benchmarking Large Language Models for News Summarization [79.37850439866938]
Large language models (LLMs) have shown promise for automatic summarization but the reasons behind their successes are poorly understood.
We find that instruction tuning, not model size, is the key to LLMs' zero-shot summarization capability.
arXiv Detail & Related papers (2023-01-31T18:46:19Z)
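Several of the papers above report gains in ROUGE-L, which scores a candidate summary by the longest common subsequence (LCS) it shares with a reference. A minimal sketch of the metric, using plain whitespace tokens (published scores typically also apply stemming and proper tokenization, e.g. via the rouge-score package):

```python
# Minimal ROUGE-L (F1) sketch: LCS-based overlap between a reference
# summary and a candidate summary, on lowercased whitespace tokens.
# Illustrative only, not the exact scoring pipeline used by these papers.

def _lcs_len(a, b):
    """Length of the longest common subsequence of token lists a and b."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, tok_a in enumerate(a, 1):
        for j, tok_b in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if tok_a == tok_b else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 = harmonic mean of LCS precision and recall."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    lcs = _lcs_len(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

# LCS = "the cat on the mat" (5 of 6 tokens on each side)
print(round(rouge_l_f1("the cat sat on the mat", "the cat is on the mat"), 3))  # 0.833
```

Because ROUGE-L rewards lexical overlap with the reference, it can penalize abstractive LLM summaries that paraphrase well, which is part of why the papers above lean on human evaluation.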
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.