OTExtSum: Extractive Text Summarisation with Optimal Transport
- URL: http://arxiv.org/abs/2204.10086v1
- Date: Thu, 21 Apr 2022 13:25:34 GMT
- Title: OTExtSum: Extractive Text Summarisation with Optimal Transport
- Authors: Peggy Tang, Kun Hu, Rui Yan, Lei Zhang, Junbin Gao, Zhiyong Wang
- Abstract summary: We propose a novel non-learning-based method that, for the first time, formulates text summarisation as an Optimal Transport (OT) problem.
Our proposed method outperforms the state-of-the-art non-learning-based methods and several recent learning-based methods in terms of the ROUGE metric.
- Score: 45.78604902572955
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Extractive text summarisation aims to select salient sentences from a
document to form a short yet informative summary. While learning-based methods
have achieved promising results, they have several limitations, such as
dependence on expensive training and lack of interpretability. Therefore, in
this paper, we propose a novel non-learning-based method that, for the first
time, formulates text summarisation as an Optimal Transport (OT) problem,
namely the Optimal Transport Extractive Summariser (OTExtSum). Optimal sentence extraction
is conceptualised as obtaining an optimal summary that minimises the
transportation cost to a given document regarding their semantic distributions.
Such a cost is defined by the Wasserstein distance and used to measure the
summary's semantic coverage of the original document. Comprehensive experiments
on four challenging and widely used datasets (MultiNews, PubMed, BillSum, and
CNN/DM) demonstrate that our proposed method outperforms the state-of-the-art
non-learning-based methods and several recent learning-based methods in terms
of the ROUGE metric.
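To make the formulation concrete, the sketch below scores a candidate summary by the OT (Wasserstein) cost between its semantic distribution and the document's, computed with the POT library (pip install pot), and selects sentences greedily. The bag-of-words distributions, the greedy loop, and the word-level cost_matrix (e.g. pairwise word-embedding distances) are simplifying assumptions for illustration, not the authors' exact formulation.

```python
# Minimal sketch: greedy extractive summarisation by minimising the
# Wasserstein distance between summary and document distributions.
# Assumes bag-of-words semantic distributions over a fixed vocabulary and
# a precomputed ground-cost matrix (e.g. word-embedding distances).
import numpy as np
import ot  # POT: Python Optimal Transport


def bow_distribution(sentences, vocab):
    """Normalised bag-of-words distribution over a fixed vocabulary."""
    counts = np.zeros(len(vocab))
    for sentence in sentences:
        for token in sentence.lower().split():
            if token in vocab:
                counts[vocab[token]] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts


def ot_extract(doc_sentences, vocab, cost_matrix, budget=3):
    """Greedily add the sentence that most reduces the OT cost between
    the summary's distribution and the document's distribution."""
    doc_dist = bow_distribution(doc_sentences, vocab)
    selected = []
    for _ in range(budget):
        best_idx, best_cost = None, float("inf")
        for i in range(len(doc_sentences)):
            if i in selected:
                continue
            candidate = [doc_sentences[j] for j in selected + [i]]
            cand_dist = bow_distribution(candidate, vocab)
            # ot.emd2 returns the optimal transportation cost between
            # two histograms under the given ground-cost matrix.
            cost = ot.emd2(cand_dist, doc_dist, cost_matrix)
            if cost < best_cost:
                best_idx, best_cost = i, cost
        selected.append(best_idx)
    return [doc_sentences[i] for i in sorted(selected)]
```

This greedy loop solves one OT problem per candidate per step; the paper's actual optimisation strategy over sentence subsets may differ.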
Related papers
- Text-Video Retrieval with Global-Local Semantic Consistent Learning [122.15339128463715]
We propose a simple yet effective method, Global-Local Semantic Consistent Learning (GLSCL)
GLSCL capitalizes on latent shared semantics across modalities for text-video retrieval.
Our method achieves performance comparable to the SOTA while being nearly 220 times faster in terms of computational cost.
arXiv Detail & Related papers (2024-05-21T11:59:36Z)
- Source Identification in Abstractive Summarization [0.8883733362171033]
We define input sentences that contain essential information in the generated summary as "source sentences" and study how abstractive summaries are made by analyzing the source sentences.
We formulate automatic source sentence detection and compare multiple methods to establish a strong baseline for the task.
Experimental results show that the perplexity-based method performs well in highly abstractive settings, while similarity-based methods perform robustly in relatively extractive settings.
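As a rough, hedged illustration of the perplexity-based idea: score each input sentence by how much conditioning on it lowers a language model's negative log-likelihood of the generated summary. The helper summary_nll is a hypothetical stand-in for any LM scoring function, and this scoring rule is one plausible reading of the abstract, not necessarily the paper's exact method.

```python
# Hedged sketch of perplexity-based source sentence detection.
# summary_nll(summary, context) is a hypothetical helper returning the
# per-token negative log-likelihood of `summary` given `context` under
# some language model; any LM scoring function could fill this role.
def source_sentence_scores(input_sentences, summary, summary_nll):
    baseline = summary_nll(summary, context="")
    # A sentence is a likely "source sentence" if conditioning on it
    # makes the summary substantially easier to predict.
    return {s: baseline - summary_nll(summary, context=s)
            for s in input_sentences}
```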
arXiv Detail & Related papers (2024-02-07T09:09:09Z)
- Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z)
- Towards Abstractive Timeline Summarisation using Preference-based Reinforcement Learning [3.6640004265358477]
This paper introduces a novel pipeline for summarising timelines of events reported by multiple news sources.
Transformer-based models for abstractive summarisation generate coherent and concise summaries of long documents.
While extractive summaries are more faithful to their sources, they may be less readable and contain redundant or unnecessary information.
arXiv Detail & Related papers (2022-11-14T18:24:13Z)
- Salience Allocation as Guidance for Abstractive Summarization [61.31826412150143]
We propose a novel summarization approach with flexible and reliable salience guidance, namely SEASON (SaliencE Allocation as Guidance for Abstractive SummarizatiON).
SEASON utilizes the allocation of salience expectation to guide abstractive summarization and adapts well to articles with different levels of abstractiveness.
arXiv Detail & Related papers (2022-10-22T02:13:44Z)
- Comparing Methods for Extractive Summarization of Call Centre Dialogue [77.34726150561087]
We experimentally compare several extractive summarisation methods by using them to produce summaries of calls and evaluating these summaries objectively.
We found that TopicSum and Lead-N outperform the other summarisation methods, whilst BERTSum received comparatively lower scores in both subjective and objective evaluations.
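For reference, the Lead-N baseline mentioned above is simple enough to state in full: it returns the first N sentences of the input. The naive full-stop splitting below is an assumption for brevity; a real system would use a proper sentence segmenter.

```python
# The Lead-N baseline: take the first n sentences of the document.
# Naive full-stop splitting is used here for brevity.
def lead_n(document: str, n: int = 3) -> str:
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    return ". ".join(sentences[:n]) + "."
```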
arXiv Detail & Related papers (2022-09-06T13:16:02Z)
- A Survey on Neural Abstractive Summarization Methods and Factual Consistency of Summarization [18.763290930749235]
Summarization is the process of computationally shortening a set of textual data to create a subset (a summary).
Existing summarization methods can be roughly divided into two types: extractive and abstractive.
An extractive summarizer explicitly selects text snippets from the source document, while an abstractive summarizer generates novel text snippets to convey the most salient concepts prevalent in the source.
arXiv Detail & Related papers (2022-04-20T14:56:36Z)
- A New Sentence Extraction Strategy for Unsupervised Extractive Summarization Methods [26.326800624948344]
We model the task of extractive text summarization from the perspective of information theory.
To improve the feature distribution and to decrease the mutual information of summarization sentences, we propose a new sentence extraction strategy.
arXiv Detail & Related papers (2021-12-06T18:00:02Z)
- Unsupervised Extractive Summarization using Pointwise Mutual Information [5.544401446569243]
We propose new metrics of relevance and redundancy using pointwise mutual information (PMI) between sentences.
We show that our method outperforms similarity-based methods on datasets in a range of domains including news, medical journal articles, and personal anecdotes.
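A hedged sketch of how PMI-based selection might look: relevance is the PMI between a candidate sentence and the document, redundancy the PMI between the candidate and sentences already selected. The helpers log_p and log_p_cond are hypothetical stand-ins for language-model estimates of log p(x) and log p(x | y), and the greedy relevance-minus-redundancy objective is an illustrative reading of the abstract rather than the paper's exact criterion.

```python
# Hedged sketch of PMI-based extractive selection.
# log_p(x) and log_p_cond(x, y) are hypothetical helpers returning a
# language model's estimates of log p(x) and log p(x | y).
def pmi(x, y, log_p, log_p_cond):
    # PMI(x; y) = log p(x | y) - log p(x)
    return log_p_cond(x, y) - log_p(x)


def select_sentences(sentences, document, k, log_p, log_p_cond):
    """Greedily pick sentences with high relevance to the document and
    low redundancy with the sentences already chosen."""
    summary = []
    while len(summary) < k:
        def score(s):
            relevance = pmi(s, document, log_p, log_p_cond)
            redundancy = max(
                (pmi(s, t, log_p, log_p_cond) for t in summary),
                default=0.0,
            )
            return relevance - redundancy
        remaining = [s for s in sentences if s not in summary]
        summary.append(max(remaining, key=score))
    return summary
```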
arXiv Detail & Related papers (2021-02-11T21:05:50Z)
- SummPip: Unsupervised Multi-Document Summarization with Sentence Graph Compression [61.97200991151141]
SummPip is an unsupervised method for multi-document summarization.
We convert the original documents to a sentence graph, taking both linguistic and deep representations into account.
We then apply spectral clustering to obtain multiple clusters of sentences, and finally compress each cluster to generate the final summary.
arXiv Detail & Related papers (2020-07-17T13:01:15Z)
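To make the SummPip-style pipeline above concrete, here is a minimal sketch: embed sentences, cluster them with spectral clustering, and keep one representative per cluster. TF-IDF vectors and centroid-nearest selection stand in for the paper's sentence-graph construction and cluster compression, so this approximates the described pipeline rather than reimplementing it.

```python
# Minimal SummPip-style sketch: embed sentences, spectral-cluster them,
# and pick one representative sentence per cluster. TF-IDF vectors and
# centroid-nearest selection are stand-ins for the paper's sentence
# graph construction and cluster compression steps.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.feature_extraction.text import TfidfVectorizer


def summpip_like(sentences, n_clusters=3):
    X = TfidfVectorizer().fit_transform(sentences).toarray()
    labels = SpectralClustering(
        n_clusters=n_clusters,
        affinity="nearest_neighbors",
        n_neighbors=min(10, len(sentences) - 1),
    ).fit_predict(X)
    summary = []
    for c in range(n_clusters):
        members = np.where(labels == c)[0]
        centroid = X[members].mean(axis=0)
        # Representative = the sentence closest to the cluster centroid.
        closest = members[np.argmin(np.linalg.norm(X[members] - centroid, axis=1))]
        summary.append(sentences[closest])
    return summary
```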