SNaC: Coherence Error Detection for Narrative Summarization
- URL: http://arxiv.org/abs/2205.09641v1
- Date: Thu, 19 May 2022 16:01:47 GMT
- Title: SNaC: Coherence Error Detection for Narrative Summarization
- Authors: Tanya Goyal, Junyi Jessy Li, Greg Durrett
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Progress in summarizing long texts is inhibited by the lack of appropriate
evaluation frameworks. When a long summary must be produced to appropriately
cover the facets of that text, that summary needs to present a coherent
narrative to be understandable by a reader, but current automatic and human
evaluation methods fail to identify gaps in coherence. In this work, we
introduce SNaC, a narrative coherence evaluation framework rooted in
fine-grained annotations for long summaries. We develop a taxonomy of coherence
errors in generated narrative summaries and collect span-level annotations for
6.6k sentences across 150 book and movie screenplay summaries. Our work
provides the first characterization of coherence errors generated by
state-of-the-art summarization models and a protocol for eliciting coherence
judgments from crowd annotators. Furthermore, we show that the collected
annotations allow us to train a strong classifier for automatically localizing
coherence errors in generated summaries as well as benchmarking past work in
coherence modeling. Finally, our SNaC framework can support future work in long
document summarization and coherence evaluation, including improved
summarization modeling and post-hoc summary correction.
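The abstract does not spell out the classifier's design; below is a minimal sketch of the kind of sentence-level coherence-error detector that span-level annotations like SNaC's could support. The base checkpoint, binary label set, and predecessor-sentence context are illustrative assumptions, not the paper's actual architecture.

```python
# Sketch of a sentence-level coherence-error detector trainable on span-level
# annotations. Checkpoint, labels, and context choice are assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

LABELS = ["coherent", "coherence_error"]  # hypothetical label set

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=len(LABELS)
)  # would be fine-tuned on the span-level annotations in practice

def localize_errors(summary_sentences):
    """Return (index, sentence) pairs flagged as coherence errors."""
    flagged = []
    for i, sentence in enumerate(summary_sentences):
        # Score each sentence with its predecessor as local context.
        context = summary_sentences[i - 1] if i > 0 else ""
        inputs = tokenizer(context, sentence, return_tensors="pt",
                           truncation=True, max_length=512)
        with torch.no_grad():
            logits = model(**inputs).logits
        if logits.argmax(dim=-1).item() == 1:
            flagged.append((i, sentence))
    return flagged
```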
Related papers
- Lexical Repetitions Lead to Rote Learning: Unveiling the Impact of Lexical Overlap in Train and Test Reference Summaries
Ideal summarization models should generalize to novel summary-worthy content without remembering reference training summaries by rote.
We propose a fine-grained evaluation protocol by partitioning a test set based on the lexical similarity of reference test summaries with training summaries.
arXiv Detail & Related papers (2023-11-15T23:47:53Z)
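A hedged sketch of the partitioning protocol in the entry above (Lexical Repetitions Lead to Rote Learning): bucket each test example by the maximum lexical overlap of its reference summary with any training summary. Unigram F1 and the 0.5 threshold are illustrative assumptions, not the paper's exact statistic or cut-off.

```python
# Partition test references by their maximum lexical overlap with the
# training summaries. Overlap statistic and threshold are assumptions.
from collections import Counter

def unigram_f1(a: str, b: str) -> float:
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    overlap = sum((ca & cb).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cb.values())
    recall = overlap / sum(ca.values())
    return 2 * precision * recall / (precision + recall)

def partition_by_overlap(test_refs, train_refs, threshold=0.5):
    low, high = [], []
    for ref in test_refs:
        score = max(unigram_f1(ref, t) for t in train_refs)
        (high if score >= threshold else low).append(ref)
    return low, high
```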
- CoheSentia: A Novel Benchmark of Incremental versus Holistic Assessment of Coherence in Generated Texts
We introduce CoheSentia, a novel benchmark of human-perceived coherence of automatically generated texts.
Our benchmark contains 500 automatically-generated and human-annotated paragraphs, each annotated using both methods (incremental and holistic).
Our analysis shows that the inter-annotator agreement in the incremental mode is higher than in the holistic alternative.
arXiv Detail & Related papers (2023-10-25T03:21:20Z)
- SummIt: Iterative Text Summarization via ChatGPT
We propose SummIt, an iterative text summarization framework based on large language models like ChatGPT.
Our framework enables the model to refine the generated summary iteratively through self-evaluation and feedback.
We also conduct a human evaluation to validate the effectiveness of the iterative refinements and identify a potential issue of over-correction.
arXiv Detail & Related papers (2023-05-24T07:40:06Z)
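A minimal sketch of the SummIt-style summarize-evaluate-revise loop from the entry above. `llm` stands in for any text-in/text-out chat model; the prompts and naive stopping rule are illustrative assumptions, not the paper's exact design.

```python
# Iterative refinement of a summary through self-evaluation and feedback.
from typing import Callable

def iterative_summarize(document: str, llm: Callable[[str], str],
                        max_rounds: int = 3) -> str:
    summary = llm(f"Summarize the following text:\n\n{document}")
    for _ in range(max_rounds):
        feedback = llm(
            "Critique this summary for faithfulness and completeness "
            f"against the source.\n\nSource:\n{document}\n\nSummary:\n{summary}"
        )
        # Stop when the evaluator is satisfied; capping rounds also limits
        # the over-correction behavior the paper reports.
        if "no issues" in feedback.lower():
            break
        summary = llm(
            "Revise the summary to address the feedback.\n\n"
            f"Feedback:\n{feedback}\n\nSummary:\n{summary}"
        )
    return summary
```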
- Interpretable Automatic Fine-grained Inconsistency Detection in Text Summarization
We propose the task of fine-grained inconsistency detection, the goal of which is to predict the fine-grained types of factual errors in a summary.
Motivated by how humans inspect factual inconsistency in summaries, we propose an interpretable fine-grained inconsistency detection model, FineGrainFact.
arXiv Detail & Related papers (2023-05-23T22:11:47Z)
- SWING: Balancing Coverage and Faithfulness for Dialogue Summarization
We propose to utilize natural language inference (NLI) models to improve coverage while avoiding factual inconsistencies.
We use NLI to compute fine-grained training signals that encourage the model to generate content from the reference summaries that has not yet been covered.
Experiments on the DialogSum and SAMSum datasets confirm the effectiveness of the proposed approach.
arXiv Detail & Related papers (2023-01-25T09:33:11Z)
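A rough sketch of the NLI-based coverage idea behind SWING (entry above): find reference sentences that the generated summary does not entail. The checkpoint and threshold are assumptions, and the paper turns such signals into fine-grained training losses rather than a post-hoc filter.

```python
# Flag reference sentences not entailed by the generated summary.
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

def uncovered_sentences(generated_summary, reference_sentences, threshold=0.5):
    uncovered = []
    for sent in reference_sentences:
        # Premise: generated summary; hypothesis: one reference sentence.
        scores = nli({"text": generated_summary, "text_pair": sent}, top_k=None)
        entailment = next(s["score"] for s in scores
                          if s["label"] == "ENTAILMENT")
        if entailment < threshold:
            uncovered.append(sent)
    return uncovered
```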
- How to Find Strong Summary Coherence Measures? A Toolbox and a Comparative Study for Summary Coherence Measure Evaluation
We conduct a large-scale investigation of various methods for summary coherence modelling on a level playing field.
We introduce two novel analysis measures, intra-system correlation and bias matrices, that help identify biases in coherence measures and provide robustness against system-level confounders.
While none of the currently available automatic coherence measures are able to assign reliable coherence scores to system summaries across all evaluation metrics, large-scale language models show promising results, as long as fine-tuning takes into account that they need to generalize across different summary lengths.
arXiv Detail & Related papers (2022-09-14T09:42:19Z)
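One plausible reading of "intra-system correlation" from the entry above: correlate a coherence measure with human scores separately within each system's outputs, so that system-level confounders cannot inflate the correlation. This interpretation and the implementation below are assumptions based only on the abstract.

```python
# Per-system correlation between a coherence measure and human judgments.
from collections import defaultdict
from scipy.stats import spearmanr

def intra_system_correlation(records):
    """records: iterable of (system_id, measure_score, human_score)."""
    by_system = defaultdict(list)
    for system, measure, human in records:
        by_system[system].append((measure, human))
    corrs = {}
    for system, pairs in by_system.items():
        measures, humans = zip(*pairs)
        corrs[system] = spearmanr(measures, humans).correlation
    # Average per-system correlations instead of pooling all summaries.
    return sum(corrs.values()) / len(corrs), corrs
```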
- Dialogue Summarization with Supporting Utterance Flow Modeling and Fact Regularization
We propose an end-to-end neural model for dialogue summarization with two novel modules.
Supporting utterance flow modeling helps generate a coherent summary by smoothly shifting the focus from earlier utterances to later ones.
The fact regularization encourages the generated summary to be factually consistent with the ground-truth summary during model training.
arXiv Detail & Related papers (2021-08-03T03:09:25Z)
- Controllable Abstractive Dialogue Summarization with Sketch Supervision
Our model achieves state-of-the-art performance on SAMSum, the largest dialogue summarization corpus, with a ROUGE-L score as high as 50.79.
arXiv Detail & Related papers (2021-05-28T19:05:36Z)
- Generating (Factual?) Narrative Summaries of RCTs: Experiments with Neural Multi-Document Summarization
We evaluate modern neural models for abstractive summarization of relevant article abstracts from systematic reviews.
We find that modern summarization systems yield consistently fluent and relevant synopses, but that they are not always factual.
arXiv Detail & Related papers (2020-08-25T22:22:50Z)
- SueNes: A Weakly Supervised Approach to Evaluating Single-Document Summarization via Negative Sampling
We present a proof-of-concept study of a weakly supervised summary evaluation approach that operates without reference summaries.
Massive amounts of data in existing summarization datasets are transformed for training by pairing documents with corrupted reference summaries.
arXiv Detail & Related papers (2020-05-13T15:40:13Z)
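A small sketch of the negative-sampling idea behind SueNes (entry above): corrupt a reference summary to manufacture lower-quality training examples. The two mutations shown, sentence deletion and sentence swap, are illustrative; the paper studies several corruption strategies.

```python
# Corrupt a reference summary to build negative training examples.
import random

def corrupt_summary(sentences, seed=0):
    rng = random.Random(seed)
    corrupted = list(sentences)
    if len(corrupted) > 1:
        corrupted.pop(rng.randrange(len(corrupted)))  # delete one sentence
    if len(corrupted) > 1:
        i, j = rng.sample(range(len(corrupted)), 2)   # swap two sentences
        corrupted[i], corrupted[j] = corrupted[j], corrupted[i]
    return corrupted

# Training pairs: (document, reference, label=1) vs. (document, corrupted, 0).
```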