FactSim: Fact-Checking for Opinion Summarization
- URL: http://arxiv.org/abs/2602.08709v1
- Date: Mon, 09 Feb 2026 14:21:19 GMT
- Title: FactSim: Fact-Checking for Opinion Summarization
- Authors: Leandro Anghinoni, Jorge Sanchez
- Abstract summary: We explore the need for more comprehensive and precise evaluation techniques for generative artificial intelligence (GenAI) in text summarization tasks. Traditional methods, which leverage automated metrics to compare machine-generated summaries from a collection of opinion pieces, have shown limitations due to the paradigm shift introduced by large language models (LLM). This paper proposes a novel, fully automated methodology for assessing the factual consistency of such summaries.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We explore the need for more comprehensive and precise evaluation techniques for generative artificial intelligence (GenAI) in text summarization tasks, specifically in the area of opinion summarization. Traditional methods, which leverage automated metrics to compare machine-generated summaries from a collection of opinion pieces, e.g. product reviews, have shown limitations due to the paradigm shift introduced by large language models (LLMs). This paper addresses these shortcomings by proposing a novel, fully automated methodology for assessing the factual consistency of such summaries. The method is based on measuring the similarity between the claims in a given summary and those from the original reviews, thereby measuring the coverage and consistency of the generated summary. To do so, we rely on a simple approach to extract factual assessments from texts, which we then compare and summarize in a suitable score. We demonstrate that the proposed metric attributes higher scores to similar claims, regardless of whether the claim is negated, paraphrased, or expanded, and that the score has a high correlation to human judgment when compared to state-of-the-art metrics.
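The abstract describes scoring a summary by comparing its claims with claims from the original reviews, in terms of coverage and consistency. The following is a minimal sketch of that general idea, not the paper's implementation. The Jaccard token overlap in `similarity`, the best-match averaging, and the function names `consistency` and `coverage` are all assumptions made for illustration, and the claims are assumed to be already extracted as short strings.

```python
import re


def tokens(text):
    """Lowercase word tokens; a stand-in for a real claim representation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a, b):
    """Jaccard token overlap between two claims (assumed similarity function)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def best_match(claim, candidates):
    """Highest similarity between one claim and any candidate claim."""
    return max((similarity(claim, c) for c in candidates), default=0.0)


def consistency(summary_claims, review_claims):
    """Assumed definition: mean best-match score of each summary claim
    against the review claims (is the summary supported by the reviews?)."""
    if not summary_claims:
        return 0.0
    scores = [best_match(c, review_claims) for c in summary_claims]
    return sum(scores) / len(scores)


def coverage(summary_claims, review_claims):
    """Assumed definition: mean best-match score of each review claim
    against the summary claims (how much of the reviews is reflected?)."""
    if not review_claims:
        return 0.0
    scores = [best_match(c, summary_claims) for c in review_claims]
    return sum(scores) / len(scores)


# Hypothetical, hand-written claims for illustration only.
review_claims = ["battery life is great", "screen is dim"]
summary_claims = ["battery life is great"]

consistency_score = consistency(summary_claims, review_claims)
coverage_score = coverage(summary_claims, review_claims)
print(consistency_score, coverage_score)
```

In this toy example the single summary claim is fully supported by the reviews, while the second review claim is largely missing from the summary, so coverage is lower than consistency. A real system would likely replace the token overlap with a semantic similarity model.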
Related papers
- FENICE: Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction [85.26780391682894]
We propose Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction (FENICE).
FENICE leverages an NLI-based alignment between information in the source document and a set of atomic facts, referred to as claims, extracted from the summary.
Our metric sets a new state of the art on AGGREFACT, the de-facto benchmark for factuality evaluation.
arXiv Detail & Related papers (2024-03-04T17:57:18Z)
- Interpretable Automatic Fine-grained Inconsistency Detection in Text Summarization [56.94741578760294]
We propose the task of fine-grained inconsistency detection, the goal of which is to predict the fine-grained types of factual errors in a summary.
Motivated by how humans inspect factual inconsistency in summaries, we propose an interpretable fine-grained inconsistency detection model, FineGrainFact.
arXiv Detail & Related papers (2023-05-23T22:11:47Z)
- Factual Consistency Evaluation for Text Summarization via Counterfactual Estimation [42.63902468258758]
We propose a novel metric to evaluate the factual consistency in text summarization via counterfactual estimation.
We conduct a series of experiments on three public abstractive text summarization datasets.
arXiv Detail & Related papers (2021-08-30T11:48:41Z)
- A Training-free and Reference-free Summarization Evaluation Metric via Centrality-weighted Relevance and Self-referenced Redundancy [60.419107377879925]
We propose a training-free and reference-free summarization evaluation metric.
Our metric consists of a centrality-weighted relevance score and a self-referenced redundancy score.
Our methods can significantly outperform existing methods on both multi-document and single-document summarization evaluation.
arXiv Detail & Related papers (2021-06-26T05:11:27Z)
- Improving Factual Consistency of Abstractive Summarization via Question Answering [25.725873545789046]
We present an approach to address factual consistency in summarization.
We first propose an efficient automatic evaluation metric to measure factual consistency.
We then propose a novel learning algorithm that maximizes the proposed metric during model training.
arXiv Detail & Related papers (2021-05-10T19:07:21Z)
- Understanding Factuality in Abstractive Summarization with FRANK: A Benchmark for Factuality Metrics [17.677637487977208]
Modern summarization models generate highly fluent but often factually unreliable outputs.
Due to the lack of common benchmarks, metrics attempting to measure the factuality of automatically generated summaries cannot be compared.
We devise a typology of factual errors and use it to collect human annotations of generated summaries from state-of-the-art summarization systems.
arXiv Detail & Related papers (2021-04-27T17:28:07Z)
- Understanding the Extent to which Summarization Evaluation Metrics Measure the Information Quality of Summaries [74.28810048824519]
We analyze the token alignments used by ROUGE and BERTScore to compare summaries.
We argue that their scores largely cannot be interpreted as measuring information overlap.
arXiv Detail & Related papers (2020-10-23T15:55:15Z)
- SummEval: Re-evaluating Summarization Evaluation [169.622515287256]
We re-evaluate 14 automatic evaluation metrics in a comprehensive and consistent fashion.
We benchmark 23 recent summarization models using the aforementioned automatic evaluation metrics.
We assemble the largest collection of summaries generated by models trained on the CNN/DailyMail news dataset.
arXiv Detail & Related papers (2020-07-24T16:25:19Z)
- SueNes: A Weakly Supervised Approach to Evaluating Single-Document Summarization via Negative Sampling [25.299937353444854]
We present a proof-of-concept study of a weakly supervised summary evaluation approach that does not require reference summaries.
Massive data in existing summarization datasets are transformed for training by pairing documents with corrupted reference summaries.
arXiv Detail & Related papers (2020-05-13T15:40:13Z)
- Enhancing Factual Consistency of Abstractive Summarization [57.67609672082137]
We propose a fact-aware summarization model FASum to extract and integrate factual relations into the summary generation process.
We then design a factual corrector model FC to automatically correct factual errors from summaries generated by existing systems.
arXiv Detail & Related papers (2020-03-19T07:36:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.