Consistency and Coherence from Points of Contextual Similarity
- URL: http://arxiv.org/abs/2112.11638v1
- Date: Wed, 22 Dec 2021 03:04:20 GMT
- Title: Consistency and Coherence from Points of Contextual Similarity
- Authors: Oleg Vasilyev, John Bohannon
- Abstract summary: The ESTIME measure, recently proposed specifically for factual consistency, achieves high correlations with human expert scores.
Its restriction to text-summary pairs with high dictionary overlap is not a problem for current styles of summarization, but it may become an obstacle for future summarization systems.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Factual consistency is one of the important summary evaluation dimensions,
especially as summary generation becomes more fluent and coherent. The ESTIME
measure, recently proposed specifically for factual consistency, achieves high
correlations with human expert scores both for consistency and fluency, while
in principle being restricted to evaluating text-summary pairs that have
high dictionary overlap. This restriction is not a problem for current styles of
summarization, but it may become an obstacle for future summarization systems,
or for evaluating arbitrary claims against the text. In this work we generalize
the method, making it applicable to any text-summary pair. As ESTIME uses
points of contextual similarity, it provides insights into the usefulness of
information taken from different BERT layers. We observe that useful
information exists in almost all of the layers except the lowest few.
For consistency and fluency, qualities focused on local text details, the
most useful layers are close to the top (but not at the top); for coherence and
relevance we found a more complicated and interesting picture.
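As a rough illustration of the idea, here is a minimal sketch of scoring a claim against a text by points of contextual similarity, assuming a Hugging Face BERT checkpoint; the layer choice, the token alignment, and the averaging are simplifications for illustration, not the exact ESTIME computation.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed checkpoint: bert-base-uncased has 12 layers, so layer 10 stands in
# for "close to the top, but not at the top".
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

def layer_embeddings(text: str, layer: int) -> torch.Tensor:
    """Contextual embedding of each token, taken from one chosen hidden layer."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden_states = model(**inputs).hidden_states  # tuple: embeddings + 12 layers
    return hidden_states[layer][0]  # (seq_len, dim)

def contextual_similarity_score(source: str, claim: str, layer: int = 10) -> float:
    """Align every claim token with its most similar source token by cosine
    similarity of contextual embeddings; a higher average suggests the claim
    is better grounded in the source. Only the gist, not ESTIME itself."""
    src = torch.nn.functional.normalize(layer_embeddings(source, layer), dim=-1)
    clm = torch.nn.functional.normalize(layer_embeddings(claim, layer), dim=-1)
    best_match = (clm @ src.T).max(dim=-1).values  # best source match per claim token
    return best_match.mean().item()
```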
Related papers
- Using Similarity to Evaluate Factual Consistency in Summaries
Abstractive summarisers generate fluent summaries, but the factuality of the generated text is not guaranteed.
We propose a new zero-shot factuality evaluation metric, Sentence-BERTScore (SBERTScore), which compares sentences between the summary and the source document (a rough sketch of this idea appears after this list).
Our experiments indicate that each technique has different strengths, with SBERTScore particularly effective in identifying correct summaries.
arXiv Detail & Related papers (2024-09-23T15:02:38Z)
- On Context Utilization in Summarization with Large Language Models
Large language models (LLMs) excel in abstractive summarization tasks, delivering fluent and pertinent summaries.
Recent advancements have extended their capabilities to handle long-input contexts, exceeding 100k tokens.
We conduct the first comprehensive study on context utilization and position bias in summarization.
arXiv Detail & Related papers (2023-10-16T16:45:12Z)
- SWING: Balancing Coverage and Faithfulness for Dialogue Summarization
We propose to utilize natural language inference (NLI) models to improve coverage while avoiding factual inconsistencies.
We use NLI to compute fine-grained training signals that encourage the model to generate content from the reference summaries that has not yet been covered (see the NLI sketch after this list).
Experiments on the DialogSum and SAMSum datasets confirm the effectiveness of the proposed approach.
arXiv Detail & Related papers (2023-01-25T09:33:11Z)
- Topic Segmentation Model Focusing on Local Context
We propose siamese sentence embedding layers which process two input sentences independently to extract an appropriate amount of information (a generic siamese sketch appears after this list).
We also adopt multi-task learning techniques including Same Topic Prediction (STP), Topic Classification (TC), and Next Sentence Prediction (NSP).
arXiv Detail & Related papers (2023-01-05T06:57:42Z)
- Evaluating the Factual Consistency of Large Language Models Through News Summarization
We propose a new benchmark called FIB (Factual Inconsistency Benchmark) that focuses on the task of summarization.
For factually consistent summaries, we use human-written reference summaries that we manually verify as factually consistent.
For factually inconsistent summaries, we generate summaries from a suite of summarization models and manually annotate them as factually inconsistent. (A sketch of the resulting binary preference test appears after this list.)
arXiv Detail & Related papers (2022-11-15T18:50:34Z)
- Toward the Understanding of Deep Text Matching Models for Information Retrieval
This paper tests whether existing deep text matching methods satisfy some fundamental heuristics of information retrieval.
Specifically, four constraints are examined, i.e., the term frequency constraint, term discrimination constraint, length normalization constraints, and TF-length constraint.
Experimental results on LETOR 4.0 and MS MARCO show that all the investigated deep text matching methods satisfy the above constraints with high probability.
arXiv Detail & Related papers (2021-08-16T13:33:15Z)
- Hierarchical Text Interaction for Rating Prediction
We propose a novel Hierarchical Text Interaction model (HTI) for rating prediction.
We exploit semantic correlations between each user-item pair at different hierarchies.
Experiments on five real-world datasets demonstrate that HTI outperforms state-of-the-art models by a large margin.
arXiv Detail & Related papers (2020-10-15T09:52:40Z)
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn ⟨sentiment, aspect⟩ joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)
- Dynamic Semantic Matching and Aggregation Network for Few-shot Intent Detection
Few-shot Intent Detection is challenging due to the scarcity of available annotated utterances.
Semantic components are distilled from utterances via multi-head self-attention.
Our method provides a comprehensive matching measure to enhance representations of both labeled and unlabeled instances.
arXiv Detail & Related papers (2020-10-06T05:16:38Z)
- Understanding Points of Correspondence between Sentences for Abstractive Summarization
We present an investigation into fusing sentences drawn from a document by introducing the notion of points of correspondence.
We create a dataset containing the documents, source and fusion sentences, and human annotations of points of correspondence between sentences.
arXiv Detail & Related papers (2020-06-10T02:42:38Z)
- Extending Text Informativeness Measures to Passage Interestingness Evaluation (Language Model vs. Word Embedding)
This paper defines the concept of Interestingness as a generalization of Informativeness.
We then study the ability of state-of-the-art Informativeness measures to cope with this generalization.
We find that the CLEF-INEX Tweet Contextualization 2012 Logarithm Similarity measure provides the best results.
arXiv Detail & Related papers (2020-04-14T18:22:48Z)
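For the SBERTScore paper above, a minimal sketch of sentence-level matching with the sentence-transformers library; the checkpoint and the max-over-source-sentences aggregation are assumptions for illustration, not necessarily the paper's exact formulation.

```python
from nltk import sent_tokenize  # requires: nltk.download("punkt")
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed checkpoint

def sbert_consistency(source: str, summary: str) -> float:
    """Score each summary sentence by its best-matching source sentence."""
    src_emb = model.encode(sent_tokenize(source), convert_to_tensor=True)
    smy_emb = model.encode(sent_tokenize(summary), convert_to_tensor=True)
    best = util.cos_sim(smy_emb, src_emb).max(dim=1).values
    return best.mean().item()
```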
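For the SWING paper, a rough sketch of how an off-the-shelf NLI model can flag reference content that a generated summary fails to cover; the checkpoint is an assumption, and SWING itself turns such signals into fine-grained training losses rather than the post-hoc filter shown here.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "roberta-large-mnli"  # assumed off-the-shelf NLI checkpoint
tok = AutoTokenizer.from_pretrained(name)
nli = AutoModelForSequenceClassification.from_pretrained(name)
nli.eval()

def entails(premise: str, hypothesis: str) -> bool:
    """True if the NLI model predicts that the premise entails the hypothesis."""
    inputs = tok(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = nli(**inputs).logits
    return nli.config.id2label[logits.argmax(dim=-1).item()] == "ENTAILMENT"

def uncovered_sentences(reference_sentences, generated_summary):
    """Reference sentences that the generated summary does not entail."""
    return [s for s in reference_sentences if not entails(generated_summary, s)]
```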
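For the topic segmentation paper, a generic sketch of a siamese setup in PyTorch: both sentences pass through the same encoder weights independently before their representations are compared. The bag-of-embeddings encoder and all sizes are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SiameseSentenceEncoder(nn.Module):
    def __init__(self, vocab_size: int = 30000, dim: int = 256):
        super().__init__()
        # Placeholder encoder: averaged word embeddings (mode="mean" is the default).
        self.embedding = nn.EmbeddingBag(vocab_size, dim)
        self.classifier = nn.Linear(2 * dim, 2)  # e.g., same topic vs. different topic

    def encode(self, token_ids: torch.Tensor) -> torch.Tensor:
        # The same weights encode both inputs, which is what makes it siamese.
        return self.embedding(token_ids)

    def forward(self, sent_a: torch.Tensor, sent_b: torch.Tensor) -> torch.Tensor:
        a, b = self.encode(sent_a), self.encode(sent_b)
        return self.classifier(torch.cat([a, b], dim=-1))
```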
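For the FIB paper, a hedged sketch of the binary preference test such a benchmark implies: does a model score the factually consistent summary above the inconsistent one? The length-normalized log-likelihood scoring and the GPT-2 checkpoint are common choices assumed here, not necessarily the benchmark's exact protocol.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # assumed scoring model
lm = AutoModelForCausalLM.from_pretrained("gpt2")
lm.eval()

def summary_logprob(document: str, summary: str) -> float:
    """Average log-probability per summary token, conditioned on the document.
    Assumes the document plus summary fit in the model's context window."""
    prompt = document + "\nSummary:"
    prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
    ids = tok(prompt + " " + summary, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = lm(ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)  # position i predicts token i+1
    token_lp = log_probs.gather(1, ids[0, 1:].unsqueeze(1)).squeeze(1)
    return token_lp[prompt_len - 1:].mean().item()  # summary tokens only

def prefers_consistent(document: str, consistent: str, inconsistent: str) -> bool:
    return summary_logprob(document, consistent) > summary_logprob(document, inconsistent)
```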
This list is automatically generated from the titles and abstracts of the papers in this site.