Source Identification in Abstractive Summarization
- URL: http://arxiv.org/abs/2402.04677v1
- Date: Wed, 7 Feb 2024 09:09:09 GMT
- Title: Source Identification in Abstractive Summarization
- Authors: Yoshi Suhara and Dimitris Alikaniotis
- Abstract summary: We define input sentences that contain essential information in the generated summary as "source sentences" and study how abstractive summaries are made by analyzing the source sentences.
We formulate automatic source sentence detection and compare multiple methods to establish a strong baseline for the task.
Experimental results show that the perplexity-based method performs well in highly abstractive settings, while similarity-based methods perform robustly in relatively extractive settings.
- Score: 0.8883733362171033
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural abstractive summarization models make summaries in an end-to-end
manner, and little is known about how the source information is actually
converted into summaries. In this paper, we define input sentences that contain
essential information in the generated summary as "source sentences"
and study how abstractive summaries are made by analyzing the source sentences.
To this end, we annotate source sentences for reference summaries and system
summaries generated by PEGASUS on document-summary pairs sampled from the
CNN/DailyMail and XSum datasets. We also formulate automatic source sentence
detection and compare multiple methods to establish a strong baseline for the
task. Experimental results show that the perplexity-based method performs well
in highly abstractive settings, while similarity-based methods perform robustly
in relatively extractive settings. Our code and data are available at
https://github.com/suhara/sourcesum.
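The abstract contrasts two families of source sentence detectors. The following is a minimal sketch of each, under stated assumptions, and it is not the authors' implementation (their code is in the repository above). The similarity-based scorer ranks input sentences by ROUGE-1-style unigram F1 against the summary. The perplexity-based scorer uses an assumed leave-one-out formulation: a sentence scores higher when removing it raises the summary's log-perplexity under a conditional model. The `nll` callable, `toy_unigram_nll`, and all other helper names are hypothetical. The toy add-one-smoothed unigram model is only a placeholder for a real conditional LM such as PEGASUS.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def tokenize(text: str) -> List[str]:
    # Simplifying assumption: lowercase alphanumeric tokens only.
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram_f1(a: str, b: str) -> float:
    """ROUGE-1-style F1 between two texts, using clipped unigram counts."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    overlap = sum((ca & cb).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(ca.values())
    recall = overlap / sum(cb.values())
    return 2 * precision * recall / (precision + recall)


def similarity_scores(sentences: List[str], summary: str) -> List[float]:
    # Similarity-based detection: score each input sentence by its lexical
    # overlap with the summary; higher means "more likely a source sentence".
    return [unigram_f1(s, summary) for s in sentences]


# Hypothetical interface: nll(context_sentences, summary) returns the average
# negative log-likelihood per summary token (i.e. log-perplexity).
NLLFn = Callable[[List[str], str], float]


def perplexity_scores(sentences: List[str], summary: str, nll: NLLFn) -> List[float]:
    # Leave-one-out variant (an assumption, not necessarily the paper's exact
    # formulation): score sentence i by how much the summary's log-perplexity
    # rises when sentence i is removed from the input.
    base = nll(sentences, summary)
    return [nll(sentences[:i] + sentences[i + 1:], summary) - base
            for i in range(len(sentences))]


def toy_unigram_nll(context: List[str], summary: str) -> float:
    # Toy stand-in for a real conditional LM: an add-one smoothed unigram
    # model estimated from the context sentences.
    counts = Counter(t for s in context for t in tokenize(s))
    summ = tokenize(summary)
    vocab = set(counts) | set(summ)
    total = sum(counts.values()) + len(vocab)
    return -sum(math.log((counts[t] + 1) / total) for t in summ) / max(len(summ), 1)


def rank(scores: List[float]) -> List[int]:
    # Sentence indices ordered from most to least likely source sentence.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    doc = [
        "The storm hit the coast on Monday.",
        "Officials said schools will reopen Friday.",
        "The local team won its match.",
    ]
    summary = "A storm hit the coast; schools reopen Friday."
    print(rank(similarity_scores(doc, summary)))                     # [0, 1, 2]
    print(rank(perplexity_scores(doc, summary, toy_unigram_nll)))    # [0, 1, 2]
```

Passing `nll` as a parameter keeps the scoring logic independent of any particular model. With a real summarizer, one would swap in a function that returns the model's per-token loss on the summary. The unigram toy captures only lexical signal, so on this example it agrees with the similarity scorer by construction.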
Related papers
- Towards Enhancing Coherence in Extractive Summarization: Dataset and Experiments with LLMs [70.15262704746378]
We propose a systematically created human-annotated dataset consisting of coherent summaries for five publicly available datasets and natural language user feedback.
Preliminary experiments with Falcon-40B and Llama-2-13B show significant performance improvements (10% Rouge-L) in terms of producing coherent summaries.
Date: 2024-07-05T20:25:04Z
- AugSumm: towards generalizable speech summarization using synthetic labels from large language model [61.73741195292997]
Abstractive speech summarization (SSUM) aims to generate human-like summaries from speech.
Conventional SSUM models are mostly trained and evaluated with a single ground-truth (GT) human-annotated deterministic summary.
We propose AugSumm, a method to leverage large language models (LLMs) as a proxy for human annotators to generate augmented summaries.
Date: 2024-01-10T18:39:46Z
- On Context Utilization in Summarization with Large Language Models [83.84459732796302]
Large language models (LLMs) excel in abstractive summarization tasks, delivering fluent and pertinent summaries.
Recent advancements have extended their capabilities to handle long-input contexts, exceeding 100k tokens.
We conduct the first comprehensive study on context utilization and position bias in summarization.
Date: 2023-10-16T16:45:12Z
- Salience Allocation as Guidance for Abstractive Summarization [61.31826412150143]
We propose a novel summarization approach with flexible and reliable salience guidance, namely SEASON (SaliencE Allocation as Guidance for Abstractive SummarizatiON).
SEASON utilizes the allocation of salience expectation to guide abstractive summarization and adapts well to articles in different abstractiveness.
Date: 2022-10-22T02:13:44Z
- A Survey on Neural Abstractive Summarization Methods and Factual Consistency of Summarization [18.763290930749235]
Summarization is the process of computationally shortening a set of textual data to create a subset (a summary).
Existing summarization methods can be roughly divided into two types: extractive and abstractive.
An extractive summarizer explicitly selects text snippets from the source document, while an abstractive summarizer generates novel text snippets to convey the most salient concepts prevalent in the source.
Date: 2022-04-20T14:56:36Z
- Reinforcing Semantic-Symmetry for Document Summarization [15.113768658584979]
Document summarization condenses a long document into a short version with salient information and accurate semantic descriptions.
This paper proposes a new reinforcing semantic-symmetry learning model for document summarization.
A series of experiments has been conducted on two widely used benchmark datasets, CNN/Daily Mail and BigPatent.
Date: 2021-12-14T17:41:37Z
- Topic Modeling Based Extractive Text Summarization [0.0]
We propose a novel method to summarize a text document by clustering its contents based on latent topics.
We use the less commonly used and challenging WikiHow dataset in our approach to text summarization.
Date: 2021-06-29T12:28:19Z
- Automated News Summarization Using Transformers [4.932130498861987]
We present a comprehensive comparison of several pre-trained transformer-based models for text summarization.
For analysis and comparison, we use the BBC News dataset, which contains news texts suitable for summarization along with human-generated summaries.
Date: 2021-04-23T04:22:33Z
- Extractive Summarization as Text Matching [123.09816729675838]
This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems.
We formulate the extractive summarization task as a semantic text matching problem.
We have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 ROUGE-1).
Date: 2020-04-19T08:27:57Z
- At Which Level Should We Extract? An Empirical Analysis on Extractive Document Summarization [110.54963847339775]
We show that extracting full sentences introduces unnecessary content and redundancy.
We propose extracting sub-sentential units based on the constituency parsing tree.
Date: 2020-04-06T13:35:10Z
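Several entries above report ROUGE results, such as the ROUGE-L improvement for coherent summaries and the 44.41 ROUGE-1 on CNN/DailyMail. The following is a minimal sketch of the F1 variants of ROUGE-N (clipped n-gram overlap) and sentence-level ROUGE-L (longest common subsequence, beta = 1). It assumes simple lowercase whitespace tokenization. Official scorers also normalize punctuation, may apply stemming, and report summary-level ROUGE-Lsum, so this sketch will not reproduce published numbers. The function names are illustrative, not a library API.

```python
from collections import Counter
from typing import List, Sequence


def tokens(text: str) -> List[str]:
    # Simplified tokenization (lowercase, whitespace split); official ROUGE
    # toolkits also strip punctuation and may apply stemming.
    return text.lower().split()


def f1(precision: float, recall: float) -> float:
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)


def rouge_n(candidate: str, reference: str, n: int = 1) -> float:
    """ROUGE-N F1 from clipped n-gram overlap counts."""
    def ngrams(ts: List[str]) -> Counter:
        return Counter(tuple(ts[i:i + n]) for i in range(len(ts) - n + 1))

    cand, ref = ngrams(tokens(candidate)), ngrams(tokens(reference))
    overlap = sum((cand & ref).values())  # Counter & keeps the minimum count
    if overlap == 0:
        return 0.0
    return f1(overlap / sum(cand.values()), overlap / sum(ref.values()))


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    # Classic longest-common-subsequence dynamic program, one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> float:
    """Sentence-level ROUGE-L F1 (beta = 1) based on the LCS length."""
    cand, ref = tokens(candidate), tokens(reference)
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    return f1(lcs / len(cand), lcs / len(ref))


if __name__ == "__main__":
    cand = "the cat sat on the mat"
    ref = "the cat lay on the mat"
    print(round(rouge_n(cand, ref, 1), 4))  # 0.8333
    print(round(rouge_n(cand, ref, 2), 4))  # 0.6
    print(round(rouge_l(cand, ref), 4))     # 0.8333
```

ROUGE-1 and ROUGE-L coincide on this pair because the five shared unigrams also form the longest common subsequence. They diverge when shared words appear in a different order, which is what makes ROUGE-L sensitive to word order.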