An Automated Length-Aware Quality Metric for Summarization
- URL: http://arxiv.org/abs/2507.07653v1
- Date: Thu, 10 Jul 2025 11:25:16 GMT
- Title: An Automated Length-Aware Quality Metric for Summarization
- Authors: Andrew D. Foland
- Abstract summary: This paper proposes NOrmed Index of Retention (NOIR), a quantitative objective metric for evaluating summarization quality of arbitrary texts. Experiments demonstrate that NOIR effectively captures the token-length / semantic retention tradeoff of a summarizer and correlates with human perception of summarization quality. The proposed metric can be applied to various summarization tasks, offering an automated tool for evaluating and improving summarization algorithms.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes NOrmed Index of Retention (NOIR), a quantitative objective metric for evaluating summarization quality of arbitrary texts that relies on both the retention of semantic meaning and the summary length compression. This gives a measure of how well the recall-compression tradeoff is managed, the most important skill in summarization. Experiments demonstrate that NOIR effectively captures the token-length / semantic retention tradeoff of a summarizer and correlates with human perception of summarization quality. Using a language-model embedding to measure semantic similarity, it provides an automated alternative for assessing summarization quality without relying on time-consuming human-generated reference summaries. The proposed metric can be applied to various summarization tasks, offering an automated tool for evaluating and improving summarization algorithms, summarization prompts, and synthetically-generated summaries.
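The abstract does not reproduce the NOIR formula itself. As an illustration of the general idea only (semantic retention measured via embedding similarity, scaled by length compression), here is a hypothetical sketch; the toy bag-of-words "embedding" and the `retention * compression` combination are stand-ins, not the published method, which uses a language-model embedding:

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding" for illustration only;
    # the paper uses a language-model embedding instead.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def noir_like_score(original, summary):
    """Hypothetical length-aware quality score: semantic retention
    scaled by compression. NOT the published NOIR formula."""
    retention = cosine(embed(original), embed(summary))
    compression = 1.0 - len(summary.split()) / max(len(original.split()), 1)
    return retention * compression
```

Under this toy scoring, copying the original verbatim scores 0 (no compression), and an empty summary scores 0 (no retention); a good summary must balance the two, which is the recall-compression tradeoff the abstract describes.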
Related papers
- Evaluating Code Summarization Techniques: A New Metric and an Empirical Characterization [16.127739014966487]
We investigate the complementarity of different types of metrics in capturing the quality of a generated summary.
We present a new metric based on contrastive learning to capture said aspect.
arXiv Detail & Related papers (2023-12-24T13:12:39Z)
- Is Summary Useful or Not? An Extrinsic Human Evaluation of Text Summaries on Downstream Tasks [45.550554287918885]
This paper focuses on evaluating the usefulness of text summaries with extrinsic methods.
We design three different downstream tasks for extrinsic human evaluation of summaries, i.e., question answering, text classification and text similarity assessment.
We find summaries are particularly useful in tasks that rely on an overall judgment of the text, while being less effective for question answering tasks.
arXiv Detail & Related papers (2023-05-24T11:34:39Z)
- Towards Interpretable and Efficient Automatic Reference-Based Summarization Evaluation [160.07938471250048]
Interpretability and efficiency are two important considerations for the adoption of neural automatic metrics.
We develop strong-performing automatic metrics for reference-based summarization evaluation.
arXiv Detail & Related papers (2023-03-07T02:49:50Z)
- Towards Interpretable Summary Evaluation via Allocation of Contextual Embeddings to Reference Text Topics [1.5749416770494706]
The multifaceted interpretable summary evaluation method (MISEM) is based on allocation of a summary's contextual token embeddings to semantic topics identified in the reference text.
MISEM achieves a promising .404 Pearson correlation with human judgment on the TAC'08 dataset.
arXiv Detail & Related papers (2022-10-25T17:09:08Z)
- WIDAR -- Weighted Input Document Augmented ROUGE [26.123086537577155]
The proposed metric WIDAR is designed to adapt the evaluation score according to the quality of the reference summary.
The proposed metric correlates better than ROUGE with human judgment scores by 26%, 76%, 82%, and 15% in coherence, consistency, fluency, and relevance, respectively.
arXiv Detail & Related papers (2022-01-23T14:40:42Z)
- A Training-free and Reference-free Summarization Evaluation Metric via Centrality-weighted Relevance and Self-referenced Redundancy [60.419107377879925]
We propose a training-free and reference-free summarization evaluation metric.
Our metric consists of a centrality-weighted relevance score and a self-referenced redundancy score.
Our methods can significantly outperform existing methods on both multi-document and single-document summarization evaluation.
arXiv Detail & Related papers (2021-06-26T05:11:27Z)
- Improving Factual Consistency of Abstractive Summarization via Question Answering [25.725873545789046]
We present an approach to address factual consistency in summarization.
We first propose an efficient automatic evaluation metric to measure factual consistency.
We then propose a novel learning algorithm that maximizes the proposed metric during model training.
arXiv Detail & Related papers (2021-05-10T19:07:21Z)
- Understanding the Extent to which Summarization Evaluation Metrics Measure the Information Quality of Summaries [74.28810048824519]
We analyze the token alignments used by ROUGE and BERTScore to compare summaries.
We argue that their scores largely cannot be interpreted as measuring information overlap.
arXiv Detail & Related papers (2020-10-23T15:55:15Z)
- Unsupervised Reference-Free Summary Quality Evaluation via Contrastive Learning [66.30909748400023]
We propose to evaluate the summary qualities without reference summaries by unsupervised contrastive learning.
Specifically, we design a new metric which covers both linguistic qualities and semantic informativeness based on BERT.
Experiments on Newsroom and CNN/Daily Mail demonstrate that our new evaluation method outperforms other metrics even without reference summaries.
arXiv Detail & Related papers (2020-10-05T05:04:14Z)
- Towards Question-Answering as an Automatic Metric for Evaluating the Content Quality of a Summary [65.37544133256499]
We propose a metric to evaluate the content quality of a summary using question answering (QA).
We demonstrate the experimental benefits of QA-based metrics through an analysis of our proposed metric, QAEval.
arXiv Detail & Related papers (2020-10-01T15:33:09Z)
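Several of the papers above (QAEval and the factual-consistency work) evaluate a summary by how well it supports answering questions derived from the source. As a deliberately simplified, hypothetical illustration of that idea only, the sketch below scores a summary by the fraction of gold answers it contains via substring match; real QA-based metrics use learned question-generation and question-answering models, and `qa_content_score` and its inputs are invented names for this example:

```python
def qa_content_score(summary, qa_pairs):
    """Toy QA-based content metric: fraction of gold answers
    recoverable from the summary by substring match. A crude
    stand-in for learned QA models, for illustration only."""
    if not qa_pairs:
        return 0.0
    hits = sum(1 for _question, answer in qa_pairs
               if answer.lower() in summary.lower())
    return hits / len(qa_pairs)
```

The design intuition shared with the real metrics: a summary that preserves the information needed to answer questions about the source has retained its important content, independent of surface wording.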
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the listed information and is not responsible for any consequences of its use.