Improving Truthfulness of Headline Generation
- URL: http://arxiv.org/abs/2005.00882v2
- Date: Tue, 5 May 2020 02:02:50 GMT
- Title: Improving Truthfulness of Headline Generation
- Authors: Kazuki Matsumaru, Sho Takase, Naoaki Okazaki
- Abstract summary: We show that the state-of-the-art encoder-decoder model sometimes generates untruthful headlines.
We conjecture that one of the reasons lies in untruthful supervision data used for training the model.
After confirming quite a few untruthful instances in the datasets, this study hypothesizes that removing untruthful instances from the supervision data may remedy the problem.
- Score: 24.07832528012763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most studies on abstractive summarization report ROUGE scores between system and reference summaries. However, we are concerned about the truthfulness of generated summaries: whether every fact in a generated summary is mentioned in the source text. This paper explores improving truthfulness in headline generation on two popular datasets. Analyzing headlines generated by the state-of-the-art encoder-decoder model, we show that the model sometimes generates untruthful headlines. We conjecture that one of the reasons lies in untruthful supervision data used for training the model. To quantify the truthfulness of article-headline pairs, we consider textual entailment: whether an article entails its headline. After confirming quite a few untruthful instances in the datasets, this study hypothesizes that removing untruthful instances from the supervision data may remedy the model's untruthful behavior. Building a binary classifier that predicts the entailment relation between an article and its headline, we filter untruthful instances out of the supervision data. Experimental results demonstrate that a headline generation model trained on the filtered supervision data shows no clear difference in ROUGE scores but remarkable improvements in automatic and manual evaluations of the generated headlines.
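Below is a minimal sketch of the filtering idea described above, assuming an off-the-shelf NLI model (here `roberta-large-mnli` via Hugging Face `transformers`) as a stand-in for the paper's entailment classifier; the model choice and the 0.5 threshold are illustrative assumptions, not the authors' exact setup.

```python
# Hedged sketch: filter article-headline pairs by predicted entailment.
# The NLI model and threshold are assumptions for illustration only.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "roberta-large-mnli"  # assumed stand-in for the paper's classifier
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
model.eval()

def entailment_prob(article: str, headline: str) -> float:
    """P(article entails headline) under the NLI model."""
    inputs = tokenizer(article, headline, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # roberta-large-mnli label order: 0=contradiction, 1=neutral, 2=entailment
    return logits.softmax(dim=-1)[0, 2].item()

def filter_supervision(pairs, threshold=0.5):
    """Keep only pairs judged truthful, i.e., headline entailed by article."""
    return [(a, h) for a, h in pairs if entailment_prob(a, h) >= threshold]
```

Pairs falling below the threshold are simply dropped from training, matching the paper's remove-rather-than-correct strategy.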
Related papers
- AMRFact: Enhancing Summarization Factuality Evaluation with AMR-Driven Negative Samples Generation [57.8363998797433]
We propose AMRFact, a framework that generates perturbed summaries using Abstract Meaning Representations (AMRs).
Our approach parses factually consistent summaries into AMR graphs and injects controlled factual inconsistencies to create negative examples, allowing coherent, factually inconsistent summaries to be generated with high error-type coverage.
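As a rough illustration of the AMR-perturbation step, the sketch below swaps agent and patient roles in a toy AMR graph using the `penman` library; parsing text to AMR and generating text back would require an AMR toolkit (e.g., amrlib), and both the example graph and the perturbation are assumptions, not AMRFact's actual pipeline.

```python
# Hedged sketch: one AMR perturbation (agent/patient swap) that yields a
# coherent but factually inconsistent graph. The example graph is invented.
import penman

amr = """
(s / say-01
   :ARG0 (c / company)
   :ARG1 (p / profit :mod (r / record)))
"""

def swap_arg0_arg1(graph_str: str) -> str:
    """Swap :ARG0 and :ARG1 edges, an 'agent error'-style perturbation."""
    g = penman.decode(graph_str)
    swapped = []
    for src, role, tgt in g.triples:
        if role == ":ARG0":
            role = ":ARG1"
        elif role == ":ARG1":
            role = ":ARG0"
        swapped.append((src, role, tgt))
    return penman.encode(penman.Graph(swapped))

print(swap_arg0_arg1(amr))  # now the profit "says" the company
```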
arXiv Detail & Related papers (2023-11-16T02:56:29Z)
- Benchmarking the Generation of Fact Checking Explanations [19.363672064425504]
We focus on the generation of justifications (textual explanation of why a claim is classified as either true or false) and benchmark it with novel datasets and advanced baselines.
Results show that, for justification production, summarization benefits from the claim information.
Although cross-dataset experiments suffer from performance degradation, a single model trained on a combination of the two datasets retains style information efficiently.
arXiv Detail & Related papers (2023-08-29T10:40:46Z)
- Correcting Diverse Factual Errors in Abstractive Summarization via Post-Editing and Language Model Infilling [56.70682379371534]
We show that our approach vastly outperforms prior methods in correcting erroneous summaries.
Our model -- FactEdit -- improves factuality scores by over 11 points on CNN/DM and over 31 points on XSum.
arXiv Detail & Related papers (2022-10-22T07:16:19Z)
- Does Your Model Classify Entities Reasonably? Diagnosing and Mitigating Spurious Correlations in Entity Typing [29.820473012776283]
Existing entity typing models are subject to the problem of spurious correlations.
We identify six types of existing model biases, including mention-context bias, lexical overlapping bias, named entity bias, pronoun bias, dependency bias, and overgeneralization bias.
Augmenting the original training set with bias-free counterparts forces models to fully comprehend the sentences.
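A hypothetical sketch of the augmentation idea, assuming a mention-masking scheme as the "bias-free counterpart" (targeting mention-context bias); the field names and masking strategy below are illustrative, not the authors' construction.

```python
# Hedged sketch: pair each entity-typing example with a counterpart whose
# mention is masked, so models cannot rely on mention-only shortcuts.
from dataclasses import dataclass, replace
from typing import List

@dataclass
class TypingExample:
    context: str        # sentence containing the mention
    mention: str        # entity mention span
    labels: List[str]   # gold fine-grained types

def mention_masked(ex: TypingExample) -> TypingExample:
    """Bias-free counterpart: same context and labels, hidden mention."""
    return replace(ex, context=ex.context.replace(ex.mention, "[MASK]"),
                   mention="[MASK]")

def augment(dataset: List[TypingExample]) -> List[TypingExample]:
    """Original examples interleaved with their masked counterparts."""
    return [x for ex in dataset for x in (ex, mention_masked(ex))]
```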
arXiv Detail & Related papers (2022-05-25T10:34:22Z)
- MiRANews: Dataset and Benchmarks for Multi-Resource-Assisted News Summarization [19.062996443574047]
We present a new dataset MiRANews and benchmark existing summarization models.
We show via data analysis that the models are not solely to blame.
Multi-resource-assisted summarization reduces hallucinations by 55% compared to single-document summarization models trained on the main article only.
arXiv Detail & Related papers (2021-09-22T10:58:40Z)
- Hidden Biases in Unreliable News Detection Datasets [60.71991809782698]
We show that selection bias during data collection leads to undesired artifacts in the datasets.
We observed a significant drop (>10%) in accuracy for all models tested in a clean split with no train/test source overlap.
We suggest that future dataset creation include a simple model as a difficulty/bias probe, and that future model development use a clean, non-overlapping site and date split.
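The sketch below shows one way such a clean split could be constructed, with no site shared between train and test and test articles strictly later in time; the record fields (`site`, `date`) are assumptions.

```python
# Hedged sketch: source- and date-disjoint train/test split.
import random

def clean_split(articles, test_frac=0.2, seed=0):
    """Hold out whole sites for test, then enforce a date cutoff."""
    sites = sorted({a["site"] for a in articles})
    random.Random(seed).shuffle(sites)
    test_sites = set(sites[: max(1, int(len(sites) * test_frac))])

    train = [a for a in articles if a["site"] not in test_sites]
    test = [a for a in articles if a["site"] in test_sites]

    # Keep only test articles newer than everything in train.
    cutoff = max(a["date"] for a in train)
    return train, [a for a in test if a["date"] > cutoff]
```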
arXiv Detail & Related papers (2021-04-20T17:16:41Z)
- Annotating and Modeling Fine-grained Factuality in Summarization [36.88018450067003]
A major barrier to the practical use of summarization models is their propensity to output summaries that are not faithful to the input and that contain factual errors.
We explore both synthetic and human-labeled data sources for training models to identify factual errors in summarization.
We show that our best factuality detection model enables training of more factual XSum summarization models by allowing us to identify non-factual tokens in the training data.
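A hedged sketch of how token-level factuality predictions could clean training data; the detector interface below is an assumption (the paper trains its own factuality model), shown only to make the filtering step concrete.

```python
# Hedged sketch: drop training pairs whose reference summary contains
# tokens a factuality detector flags as unsupported by the source.
from typing import Callable, List, Tuple

# (source, summary) -> per-token flags, True meaning "supported by source"
TokenTagger = Callable[[str, str], List[bool]]

def filter_by_token_factuality(
    data: List[Tuple[str, str]],
    tagger: TokenTagger,
    max_bad_tokens: int = 0,
) -> List[Tuple[str, str]]:
    """Keep pairs with at most max_bad_tokens non-factual summary tokens."""
    kept = []
    for source, summary in data:
        flags = tagger(source, summary)
        if sum(not ok for ok in flags) <= max_bad_tokens:
            kept.append((source, summary))
    return kept
```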
arXiv Detail & Related papers (2021-04-09T11:20:44Z)
- Few-Shot Learning for Opinion Summarization [117.70510762845338]
Opinion summarization is the automatic creation of text reflecting subjective information expressed in multiple documents.
In this work, we show that even a handful of summaries is sufficient to bootstrap generation of the summary text.
Our approach substantially outperforms previous extractive and abstractive methods in automatic and human evaluation.
arXiv Detail & Related papers (2020-04-30T15:37:38Z)
- Unsupervised Opinion Summarization with Noising and Denoising [85.49169453434554]
We create a synthetic dataset from a corpus of user reviews by sampling a review, pretending it is a summary, and generating noisy versions thereof.
At test time, the model accepts genuine reviews and generates a summary containing salient opinions, treating those that do not reach consensus as noise.
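A toy sketch of the noising step under stated assumptions (the drop rate, chunk length, and splicing scheme are invented): one sampled review becomes the pseudo-summary, and corrupted copies of it play the role of the input reviews the model learns to denoise.

```python
# Hedged sketch: synthesize (noisy inputs, pseudo-summary) training pairs
# from a pool of plain reviews. All noise hyperparameters are assumptions.
import random

def noisy_version(summary, other_reviews, drop_prob=0.2, chunk_len=5, rng=None):
    """One corrupted copy: random token drops plus a spliced-in chunk."""
    rng = rng or random.Random()
    kept = [t for t in summary.split() if rng.random() > drop_prob]
    donor = rng.choice(other_reviews).split()
    i = rng.randrange(len(donor))
    chunk = donor[i : i + chunk_len]
    j = rng.randrange(len(kept) + 1)
    return " ".join(kept[:j] + chunk + kept[j:])

def make_synthetic_example(reviews, n_noisy=8, seed=0):
    rng = random.Random(seed)
    summary = rng.choice(reviews)
    others = [r for r in reviews if r != summary]
    inputs = [noisy_version(summary, others, rng=rng) for _ in range(n_noisy)]
    return inputs, summary  # train a model to denoise inputs -> summary
```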
arXiv Detail & Related papers (2020-04-21T16:54:57Z)
- Enhancing Factual Consistency of Abstractive Summarization [57.67609672082137]
We propose a fact-aware summarization model FASum to extract and integrate factual relations into the summary generation process.
We then design a factual corrector model FC to automatically correct factual errors from summaries generated by existing systems.
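As a rough, hedged illustration of what "factual relations" might look like, the sketch below extracts crude subject-verb-object triples with spaCy and flags summary relations absent from the source; this post-hoc check is for intuition only, whereas FASum integrates relations into generation itself.

```python
# Hedged sketch: crude dependency-based relation triples and a simple
# consistency check. Not FASum's actual extraction or correction model.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this model is installed

def svo_triples(text):
    """Rough (subject, verb, object) triples from a dependency parse."""
    triples = set()
    for token in nlp(text):
        if token.pos_ == "VERB":
            subs = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
            objs = [c for c in token.children if c.dep_ in ("dobj", "obj")]
            for s in subs:
                for o in objs:
                    triples.add((s.lemma_.lower(), token.lemma_.lower(), o.lemma_.lower()))
    return triples

def unsupported_relations(source: str, summary: str):
    """Summary triples with no exact match in the source."""
    return svo_triples(summary) - svo_triples(source)
```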
arXiv Detail & Related papers (2020-03-19T07:36:10Z)