PreSumm: Predicting Summarization Performance Without Summarizing
- URL: http://arxiv.org/abs/2504.05420v1
- Date: Mon, 07 Apr 2025 18:43:00 GMT
- Title: PreSumm: Predicting Summarization Performance Without Summarizing
- Authors: Steven Koniaev, Ori Ernst, Jackie Chi Kit Cheung
- Abstract summary: We introduce PreSumm, a novel task in which a system predicts summarization performance based solely on the source document. Our analysis sheds light on common properties of documents with low PreSumm scores, revealing that they often suffer from coherence issues, complex content, or a lack of a clear main theme.
- Score: 20.149416378181872
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite recent advancements in automatic summarization, state-of-the-art models do not summarize all documents equally well, raising the question: why? While prior research has extensively analyzed summarization models, little attention has been given to the role of document characteristics in influencing summarization performance. In this work, we explore two key research questions. First, do documents exhibit consistent summarization quality across multiple systems? If so, can we predict a document's summarization performance without generating a summary? We answer both questions affirmatively and introduce PreSumm, a novel task in which a system predicts summarization performance based solely on the source document. Our analysis sheds light on common properties of documents with low PreSumm scores, revealing that they often suffer from coherence issues, complex content, or a lack of a clear main theme. In addition, we demonstrate PreSumm's practical utility in two key applications: improving hybrid summarization workflows by identifying documents that require manual summarization and enhancing dataset quality by filtering outliers and noisy documents. Overall, our findings highlight the critical role of document properties in summarization performance and offer insights into the limitations of current systems that could serve as the basis for future improvements.
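The task as described can be illustrated with a toy sketch: fit a regressor from cheap source-side features (document length, lexical diversity) to an observed summarization score such as mean ROUGE across systems, then score unseen documents without ever running a summarizer. The feature set and the one-variable least-squares model below are illustrative assumptions, not the paper's actual method.

```python
def document_features(text: str) -> dict:
    """Cheap source-side features; the abstract's analysis suggests
    properties like complexity correlate with summarizability."""
    tokens = text.lower().split()
    n = len(tokens)
    return {
        "length": n,
        "type_token_ratio": len(set(tokens)) / n if n else 0.0,
    }

def fit_presumm(train):
    """Fit a one-feature least-squares regressor mapping
    type/token ratio -> observed summarization score.
    `train` is a list of (document_text, score) pairs."""
    xs = [document_features(doc)["type_token_ratio"] for doc, _ in train]
    ys = [score for _, score in train]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    var = sum((x - mx) ** 2 for x in xs) or 1e-9
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    intercept = my - slope * mx
    # Return a predictor: score a new document from its features alone.
    return lambda doc: intercept + slope * document_features(doc)["type_token_ratio"]
```

In the hybrid-workflow application the abstract mentions, documents whose predicted score falls below a threshold would be routed to manual summarization instead of an automatic system.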
Related papers
- Write Summary Step-by-Step: A Pilot Study of Stepwise Summarization [48.57273563299046]
We propose the task of Stepwise Summarization, which aims to generate a new appended summary each time a new document is added.
The appended summary should not only summarize the newly added content but also be coherent with the previous summary.
We show that SSG achieves state-of-the-art performance in terms of both automatic metrics and human evaluations.
arXiv Detail & Related papers (2024-06-08T05:37:26Z) - Thesis: Document Summarization with applications to Keyword extraction and Image Retrieval [0.0]
We propose a set of submodular functions for opinion summarization.
Opinion summarization has built in it the tasks of summarization and sentiment detection.
Our functions generate summaries such that there is good correlation between document sentiment and summary sentiment, along with good ROUGE scores.
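Submodular-function summarization of this kind is typically solved with the classical greedy algorithm: repeatedly add the sentence with the largest marginal gain in a monotone submodular objective. The coverage objective below is a generic stand-in for illustration; the thesis's sentiment-aware functions are its own contribution.

```python
def coverage(selected, sentences):
    """Submodular objective: number of distinct words covered
    by the currently selected sentences."""
    covered = set()
    for i in selected:
        covered |= set(sentences[i].lower().split())
    return len(covered)

def greedy_summary(sentences, budget):
    """Greedily add the sentence with the largest marginal coverage
    gain until `budget` sentences are selected or no gain remains."""
    selected = []
    while len(selected) < budget:
        best, best_gain = None, 0
        for i in range(len(sentences)):
            if i in selected:
                continue
            gain = coverage(selected + [i], sentences) - coverage(selected, sentences)
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:
            break  # no sentence adds new coverage
        selected.append(best)
    # Restore original document order for readability.
    return [sentences[i] for i in sorted(selected)]
```

For monotone submodular objectives, this greedy procedure carries the well-known (1 - 1/e) approximation guarantee, which is why submodularity is an attractive framing for extractive summarization.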
arXiv Detail & Related papers (2024-05-20T21:27:18Z) - Bias in News Summarization: Measures, Pitfalls and Corpora [4.917075909999548]
We introduce definitions for biased behaviours in summarization models, along with practical operationalizations.
We measure gender bias in English summaries generated by both purpose-built summarization models and general purpose chat models.
We find content selection in single document summarization to be largely unaffected by gender bias, while hallucinations exhibit evidence of bias.
arXiv Detail & Related papers (2023-09-14T22:20:27Z) - QuOTeS: Query-Oriented Technical Summarization [0.2936007114555107]
We propose QuOTeS, an interactive system designed to retrieve sentences related to a summary of the research from a collection of potential references.
QuOTeS integrates techniques from Query-Focused Extractive Summarization and High-Recall Information Retrieval to provide Interactive Query-Focused Summarization of scientific documents.
The results show that QuOTeS provides a positive user experience and consistently provides query-focused summaries that are relevant, concise, and complete.
arXiv Detail & Related papers (2023-06-20T18:43:24Z) - UniSumm and SummZoo: Unified Model and Diverse Benchmark for Few-Shot Summarization [54.59104881168188]
UniSumm is a unified few-shot summarization model pre-trained with multiple summarization tasks.
SummZoo is a new benchmark to better evaluate few-shot summarizers.
arXiv Detail & Related papers (2022-11-17T18:54:47Z) - GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z) - Long Document Summarization with Top-down and Bottom-up Inference [113.29319668246407]
We propose a principled inference framework to improve summarization models on two aspects.
Our framework assumes a hierarchical latent structure of a document, where the top level captures long-range dependencies.
We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
arXiv Detail & Related papers (2022-03-15T01:24:51Z) - Leveraging Information Bottleneck for Scientific Document Summarization [26.214930773343887]
This paper presents an unsupervised extractive approach to summarize scientific long documents.
Inspired by previous work which uses the Information Bottleneck principle for sentence compression, we extend it to document level summarization.
arXiv Detail & Related papers (2021-10-04T09:43:47Z) - Text Summarization with Latent Queries [60.468323530248945]
We introduce LaQSum, the first unified text summarization system that learns Latent Queries from documents for abstractive summarization with any existing query forms.
Under a deep generative framework, our system jointly optimizes a latent query model and a conditional language model, allowing users to plug in queries of any type at test time.
Our system robustly outperforms strong comparison systems across summarization benchmarks with different query types, document settings, and target domains.
arXiv Detail & Related papers (2021-05-31T21:14:58Z) - Summaformers @ LaySumm 20, LongSumm 20 [14.44754831438127]
In this paper, we look at the problem of summarizing scientific research papers from multiple domains.
We differentiate between two types of summaries, namely, LaySumm and LongSumm.
While leveraging the latest Transformer-based models, our systems are simple, intuitive, and based on how specific paper sections contribute to human summaries.
arXiv Detail & Related papers (2021-01-10T13:48:12Z) - Summarizing Text on Any Aspects: A Knowledge-Informed Weakly-Supervised Approach [89.56158561087209]
We study summarizing on arbitrary aspects relevant to the document.
Due to the lack of supervision data, we develop a new weak supervision construction method and an aspect modeling scheme.
Experiments show our approach achieves performance boosts on summarizing both real and synthetic documents.
arXiv Detail & Related papers (2020-10-14T03:20:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences arising from its use.