Unsupervised Summarization by Jointly Extracting Sentences and Keywords
- URL: http://arxiv.org/abs/2009.07481v2
- Date: Mon, 24 Jul 2023 03:26:17 GMT
- Title: Unsupervised Summarization by Jointly Extracting Sentences and Keywords
- Authors: Zongyi Li, Xiaoqing Zheng, Jun He
- Abstract summary: RepRank is an unsupervised graph-based ranking model for extractive multi-document summarization.
We show that salient sentences and keywords can be extracted in a joint and mutual reinforcement process using our learned representations.
Experimental results on multiple benchmark datasets show that RepRank achieved the best or comparable performance in ROUGE.
- Score: 12.387378783627762
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present RepRank, an unsupervised graph-based ranking model for extractive
multi-document summarization in which word-to-word, sentence-to-sentence, and
word-to-sentence similarities can be estimated by the distances between their
vector representations in a unified vector space. To obtain suitable
representations, we propose a self-attention-based learning method that
represents a sentence by the weighted sum of its word embeddings, with the
weights concentrated on the words expected to better reflect the content
of a document. We show that salient sentences and keywords can be extracted in
a joint and mutual reinforcement process using our learned representations, and
prove that this process always converges to a unique solution leading to
improvement in performance. A variant of absorbing random walk and the
corresponding sampling-based algorithm are also described to avoid redundancy
and increase diversity in the summaries. Experimental results on multiple
benchmark datasets show that RepRank achieved the best or comparable
performance in ROUGE.
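The two ideas in the abstract, a sentence represented as an attention-weighted sum of its word embeddings and sentences and keywords scored by mutual reinforcement, can be illustrated with a minimal sketch. This is not the RepRank algorithm itself: the abstract does not give the update equations, so the sketch below substitutes a generic HITS-style iteration over word-to-sentence cosine similarities only. The function names, the toy 2-D embeddings, and the uniform attention scores (standing in for learned ones) are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def sentence_vector(word_vecs, attn_scores):
    """Softmax-weighted sum of word vectors (attn_scores are hypothetical inputs)."""
    m = max(attn_scores)
    exps = [math.exp(s - m) for s in attn_scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(word_vecs[0])
    return [sum(w * vec[d] for w, vec in zip(weights, word_vecs)) for d in range(dim)]

def normalize(xs):
    """Rescale non-negative scores so they sum to 1."""
    total = sum(xs)
    return [x / total for x in xs] if total > 0 else list(xs)

def mutual_rank(sent_vecs, word_vecs, iters=200, tol=1e-12):
    """HITS-style mutual reinforcement (an assumed stand-in, not the paper's update).

    A word scores highly if it is similar to high-scoring sentences, and a
    sentence scores highly if it is similar to high-scoring words.
    """
    n_s, n_w = len(sent_vecs), len(word_vecs)
    # Non-negative sentence-to-word affinities from cosine similarity.
    aff = [[max(cosine(s, w), 0.0) for w in word_vecs] for s in sent_vecs]
    s_score = [1.0 / n_s] * n_s
    w_score = [1.0 / n_w] * n_w
    for _ in range(iters):
        new_w = normalize([sum(aff[i][j] * s_score[i] for i in range(n_s))
                           for j in range(n_w)])
        new_s = normalize([sum(aff[i][j] * new_w[j] for j in range(n_w))
                           for i in range(n_s)])
        delta = sum(abs(a - b) for a, b in zip(new_s, s_score))
        s_score, w_score = new_s, new_w
        if delta < tol:
            break
    return s_score, w_score

# Toy data: hypothetical 2-D embeddings, not taken from the paper.
vocab = {"graph": [1.0, 0.0], "rank": [0.9, 0.1], "cat": [0.0, 1.0]}
sentences = [["graph", "rank"], ["cat"], ["graph", "cat"]]

# Uniform attention scores are a placeholder for learned self-attention weights.
sent_vecs = [sentence_vector([vocab[t] for t in s], [0.0] * len(s))
             for s in sentences]
words = list(vocab)
sent_scores, word_scores = mutual_rank(sent_vecs, [vocab[t] for t in words])
```

In this toy setup the off-topic sentence and its keyword end up with the lowest scores, because they have little affinity to the rest of the collection. The redundancy-avoiding absorbing random walk mentioned in the abstract is not covered by this sketch.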
Related papers
- Span-Aggregatable, Contextualized Word Embeddings for Effective Phrase Mining [0.22499166814992438]
We show that when target phrases reside inside noisy context, representing the full sentence with a single dense vector is not sufficient for effective phrase retrieval.
We show that this technique is much more effective for phrase mining, yet requires considerable compute to obtain useful span representations.
arXiv Detail & Related papers (2024-05-12T12:08:05Z)
- Hierarchical Indexing for Retrieval-Augmented Opinion Summarization [60.5923941324953]
We propose a method for unsupervised abstractive opinion summarization that combines the attributability and scalability of extractive approaches with the coherence and fluency of Large Language Models (LLMs).
Our method, HIRO, learns an index structure that maps sentences to a path through a semantically organized discrete hierarchy.
At inference time, we populate the index and use it to identify and retrieve clusters of sentences containing popular opinions from input reviews.
arXiv Detail & Related papers (2024-03-01T10:38:07Z)
- Enhancing Coherence of Extractive Summarization with Multitask Learning [40.349019691412465]
This study proposes a multitask learning architecture for extractive summarization with coherence boosting.
The architecture contains an extractive summarizer and a coherence discriminator module.
Experiments show that our proposed method significantly improves the proportion of consecutive sentences in the extracted summaries.
arXiv Detail & Related papers (2023-05-22T09:20:58Z)
- Learning to Rank Utterances for Query-Focused Meeting Summarization [0.7868449549351486]
We propose a Ranker-Generator framework to rank utterances.
We show that learning to rank utterances helps to select utterances related to the query effectively.
Experimental results on QMSum show that the proposed model outperforms all existing multi-stage models with fewer parameters.
arXiv Detail & Related papers (2023-05-22T06:25:09Z)
- BERM: Training the Balanced and Extractable Representation for Matching to Improve Generalization Ability of Dense Retrieval [54.66399120084227]
We propose BERM, a novel method that improves the generalization of dense retrieval by capturing the matching signal.
Dense retrieval has shown promise in the first-stage retrieval process when trained on in-domain labeled datasets.
arXiv Detail & Related papers (2023-05-18T15:43:09Z)
- Relational Sentence Embedding for Flexible Semantic Matching [86.21393054423355]
We present Relational Sentence Embedding (RSE), a new paradigm to further explore the potential of sentence embeddings.
RSE is effective and flexible in modeling sentence relations and outperforms a series of state-of-the-art embedding methods.
arXiv Detail & Related papers (2022-12-17T05:25:17Z)
- Retrofitting Multilingual Sentence Embeddings with Abstract Meaning Representation [70.58243648754507]
We introduce a new method to improve existing multilingual sentence embeddings with Abstract Meaning Representation (AMR).
Compared with the original textual input, AMR is a structured semantic representation that presents the core concepts and relations in a sentence explicitly and unambiguously.
Experiment results show that retrofitting multilingual sentence embeddings with AMR leads to better state-of-the-art performance on both semantic similarity and transfer tasks.
arXiv Detail & Related papers (2022-10-18T11:37:36Z)
- Text Summarization with Oracle Expectation [88.39032981994535]
Extractive summarization produces summaries by identifying and concatenating the most important sentences in a document.
Most summarization datasets do not come with gold labels indicating whether document sentences are summary-worthy.
We propose a simple yet effective labeling algorithm that creates soft, expectation-based sentence labels.
arXiv Detail & Related papers (2022-09-26T14:10:08Z)
- A General Contextualized Rewriting Framework for Text Summarization [15.311467109946571]
Existing rewriting systems take each extractive sentence as the only input, which is relatively focused but can lose necessary background knowledge and discourse context.
We formalize contextualized rewriting as a seq2seq with group-tag alignments, identifying extractive sentences through content-based addressing.
Results show that our approach significantly outperforms non-contextualized rewriting systems without requiring reinforcement learning.
arXiv Detail & Related papers (2022-07-13T03:55:57Z)
- Extractive Summarization as Text Matching [123.09816729675838]
This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems.
We formulate the extractive summarization task as a semantic text matching problem.
We have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 in ROUGE-1).
arXiv Detail & Related papers (2020-04-19T08:27:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.