Unsupervised Opinion Summarisation in the Wasserstein Space
- URL: http://arxiv.org/abs/2211.14923v1
- Date: Sun, 27 Nov 2022 19:45:38 GMT
- Title: Unsupervised Opinion Summarisation in the Wasserstein Space
- Authors: Jiayu Song, Iman Munire Bilal, Adam Tsakalidis, Rob Procter, Maria Liakata
- Abstract summary: We present WassOS, an unsupervised abstractive summarization model which makes use of the Wasserstein distance.
We show that WassOS almost always outperforms the state-of-the-art on ROUGE metrics and consistently produces the best summaries according to human evaluations.
- Score: 22.634245146129857
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Opinion summarisation synthesises opinions expressed in a group of documents
discussing the same topic to produce a single summary. Recent work has looked
at opinion summarisation of clusters of social media posts. Such posts are
noisy and have unpredictable structure, posing additional challenges for the
construction of the summary distribution and the preservation of meaning
compared to online reviews, which have so far been the focus of opinion
summarisation. To address these challenges we present WassOS, an
unsupervised abstractive summarization model which makes use of the Wasserstein
distance. A Variational Autoencoder is used to get the distribution of
documents/posts, and the distributions are disentangled into separate semantic
and syntactic spaces. The summary distribution is obtained using the
Wasserstein barycenter of the semantic and syntactic distributions. A latent
variable sampled from the summary distribution is fed into a GRU decoder with a
transformer layer to produce the final summary. Our experiments on multiple
datasets including Twitter clusters, Reddit threads, and reviews show that
WassOS almost always outperforms the state-of-the-art on ROUGE metrics and
consistently produces the best summaries with respect to meaning preservation
according to human evaluations.
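
The barycenter step described in the abstract can be illustrated with a minimal sketch. It assumes each post is encoded as a diagonal Gaussian (mean and standard deviation vectors), as is common for Variational Autoencoders; the abstract does not state the exact parameterisation, so this is an assumption rather than the authors' implementation. Under that assumption the squared 2-Wasserstein distance and the barycenter both have closed forms. The names `w2_squared` and `barycenter` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Assumption: a post's latent distribution is a diagonal Gaussian,
# stored as (mean vector, standard deviation vector).
Gaussian = Tuple[List[float], List[float]]


def w2_squared(a: Gaussian, b: Gaussian) -> float:
    """Squared 2-Wasserstein distance between two diagonal Gaussians.

    For diagonal covariances this reduces to the squared Euclidean
    distance between the means plus that between the std vectors.
    """
    (mu_a, sd_a), (mu_b, sd_b) = a, b
    mean_term = sum((x - y) ** 2 for x, y in zip(mu_a, mu_b))
    std_term = sum((x - y) ** 2 for x, y in zip(sd_a, sd_b))
    return mean_term + std_term


def barycenter(posts: Sequence[Gaussian],
               weights: Optional[Sequence[float]] = None) -> Gaussian:
    """2-Wasserstein barycenter of diagonal Gaussians.

    With diagonal covariances the barycenter is again a diagonal
    Gaussian whose mean and std are the weighted averages of the
    input means and stds. Weights default to uniform.
    """
    n = len(posts)
    if weights is None:
        weights = [1.0 / n] * n
    dim = len(posts[0][0])
    mean = [sum(w * p[0][i] for w, p in zip(weights, posts)) for i in range(dim)]
    std = [sum(w * p[1][i] for w, p in zip(weights, posts)) for i in range(dim)]
    return mean, std


# Two toy "post" distributions in a 2-dimensional latent space.
post_a: Gaussian = ([0.0, 0.0], [1.0, 1.0])
post_b: Gaussian = ([2.0, 4.0], [3.0, 1.0])

summary = barycenter([post_a, post_b])
print(summary)
print(w2_squared(post_a, post_b))
```

In the model described above, one such barycenter would be computed for the semantic space and one for the syntactic space, and a latent variable sampled from the resulting summary distribution would be passed to the decoder.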
Related papers
- Hierarchical Indexing for Retrieval-Augmented Opinion Summarization [60.5923941324953]
We propose a method for unsupervised abstractive opinion summarization that combines the attributability and scalability of extractive approaches with the coherence and fluency of Large Language Models (LLMs).
Our method, HIRO, learns an index structure that maps sentences to a path through a semantically organized discrete hierarchy.
At inference time, we populate the index and use it to identify and retrieve clusters of sentences containing popular opinions from input reviews.
arXiv Detail & Related papers (2024-03-01T10:38:07Z)
- Incremental Extractive Opinion Summarization Using Cover Trees [81.59625423421355]
In online marketplaces user reviews accumulate over time, and opinion summaries need to be updated periodically.
In this work, we study the task of extractive opinion summarization in an incremental setting.
We present an efficient algorithm for accurately computing the CentroidRank summaries in an incremental setting.
arXiv Detail & Related papers (2024-01-16T02:00:17Z)
- AugSumm: towards generalizable speech summarization using synthetic labels from large language model [61.73741195292997]
Abstractive speech summarization (SSUM) aims to generate human-like summaries from speech.
Conventional SSUM models are mostly trained and evaluated with a single ground-truth (GT) human-annotated deterministic summary.
We propose AugSumm, a method to leverage large language models (LLMs) as a proxy for human annotators to generate augmented summaries.
arXiv Detail & Related papers (2024-01-10T18:39:46Z)
- Efficient and Interpretable Compressive Text Summarisation with Unsupervised Dual-Agent Reinforcement Learning [36.93582300019002]
We propose an efficient and interpretable compressive summarisation method using unsupervised dual-agent reinforcement learning.
Our model achieves promising performance and a significant improvement on Newsroom in terms of the ROUGE metric.
arXiv Detail & Related papers (2023-06-06T05:30:49Z)
- SNaC: Coherence Error Detection for Narrative Summarization [73.48220043216087]
We introduce SNaC, a narrative coherence evaluation framework rooted in fine-grained annotations for long summaries.
We develop a taxonomy of coherence errors in generated narrative summaries and collect span-level annotations for 6.6k sentences across 150 book and movie screenplay summaries.
Our work provides the first characterization of coherence errors generated by state-of-the-art summarization models and a protocol for eliciting coherence judgments from crowd annotators.
arXiv Detail & Related papers (2022-05-19T16:01:47Z)
- Unsupervised Extractive Opinion Summarization Using Sparse Coding [19.598936651505067]
We present Semantic Autoencoder (SemAE) to perform extractive opinion summarization in an unsupervised manner.
SemAE uses dictionary learning to implicitly capture semantic information from the review and learns a latent representation of each sentence over semantic units.
We report strong performance on SPACE and AMAZON datasets, and perform experiments to investigate the functioning of our model.
arXiv Detail & Related papers (2022-03-15T14:03:35Z)
- Unsupervised Summarization with Customized Granularities [76.26899748972423]
We propose the first unsupervised multi-granularity summarization framework, GranuSum.
By inputting different numbers of events, GranuSum is capable of producing multi-granular summaries in an unsupervised manner.
arXiv Detail & Related papers (2022-01-29T05:56:35Z)
- Reinforcing Semantic-Symmetry for Document Summarization [15.113768658584979]
Document summarization condenses a long document into a short version with salient information and accurate semantic descriptions.
This paper introduces a new reinforcing semantic-symmetry learning model for document summarization.
A series of experiments have been conducted on two widely used benchmark datasets, CNN/Daily Mail and BigPatent.
arXiv Detail & Related papers (2021-12-14T17:41:37Z)
- AnswerSumm: A Manually-Curated Dataset and Pipeline for Answer Summarization [73.91543616777064]
Community Question Answering (CQA) fora such as Stack Overflow and Yahoo! Answers contain a rich resource of answers to a wide range of community-based questions.
One goal of answer summarization is to produce a summary that reflects the range of answer perspectives.
This work introduces a novel dataset of 4,631 CQA threads for answer summarization, curated by professional linguists.
arXiv Detail & Related papers (2021-11-11T21:48:02Z)
- Semantic Extractor-Paraphraser based Abstractive Summarization [40.05739160204135]
We propose an extractor-paraphraser based abstractive summarization system that exploits semantic overlap.
Our model outperforms the state-of-the-art baselines in terms of ROUGE, METEOR and word similarity (WMS).
arXiv Detail & Related papers (2021-05-04T05:24:28Z)