RankSum: An unsupervised extractive text summarization based on rank fusion
- URL: http://arxiv.org/abs/2402.05976v1
- Date: Wed, 7 Feb 2024 22:24:09 GMT
- Title: RankSum: An unsupervised extractive text summarization based on rank fusion
- Authors: A. Joshi, E. Fidalgo, E. Alegre, and R. Alaiz-Rodriguez
- Abstract summary: We propose Ranksum, an approach for extractive text summarization of single documents.
Ranksum obtains the sentence saliency rankings corresponding to each feature in an unsupervised way.
We evaluate our approach on publicly available summarization datasets CNN/DailyMail and DUC 2002.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose Ranksum, an approach for extractive text
summarization of single documents based on the rank fusion of four
multi-dimensional sentence features extracted for each sentence: topic
information, semantic content, significant keywords, and position. Ranksum
obtains the sentence saliency rankings corresponding to each feature in an
unsupervised way, followed by a weighted fusion of the four scores to rank the
sentences according to their significance. The scores themselves are generated
in a completely unsupervised way; a labeled document set is required only to
learn the fusion weights. Since we found that the fusion weights generalize to
other datasets, we consider Ranksum an unsupervised approach. To determine the
topic rank, we employ probabilistic topic models, whereas semantic information
is captured using sentence embeddings. To derive rankings from sentence
embeddings, we utilize Siamese networks to produce abstractive sentence
representations and then formulate a novel strategy to arrange them in order
of importance. A graph-based strategy is applied to find the significant
keywords and the related sentence rankings in the document. We also formulate
a sentence novelty measure based on bigrams, trigrams, and sentence embeddings
to eliminate redundant sentences from the summary. The ranks computed for each
feature are finally fused to obtain a final score for every sentence in the
document. We evaluate our approach on the publicly available summarization
datasets CNN/DailyMail and DUC 2002. Experimental results show that our
approach outperforms existing state-of-the-art summarization methods.
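To make the fusion step concrete, the following is a minimal Python sketch of weighted rank fusion over per-feature sentence rankings, with a plain bigram-overlap filter standing in for the novelty measure when picking summary sentences. The reciprocal-rank scoring, the example weights, and the overlap threshold are illustrative assumptions for this sketch, not the exact formulation used in the paper.

```python
def fuse_rankings(feature_ranks, weights):
    """Weighted fusion of per-feature sentence rankings.

    feature_ranks: dict mapping feature name -> list of sentence indices,
                   ordered from most to least salient for that feature.
    weights:       dict mapping feature name -> fusion weight.
    Returns sentence indices sorted by fused score (highest first).
    The reciprocal-rank scoring below is an illustrative choice, not
    necessarily the scoring used in the paper.
    """
    n = len(next(iter(feature_ranks.values())))
    scores = [0.0] * n
    for feature, ranking in feature_ranks.items():
        w = weights[feature]
        for rank, sent_idx in enumerate(ranking):
            scores[sent_idx] += w / (rank + 1)  # better rank -> larger contribution
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


def bigrams(tokens):
    """Set of adjacent token pairs in a sentence."""
    return set(zip(tokens, tokens[1:]))


def select_summary(sentences, fused_order, k=3, max_overlap=0.5):
    """Pick the top-k sentences, skipping candidates that overlap too much with
    already-chosen ones (the paper's novelty measure also uses trigrams and
    sentence embeddings; bigram overlap stands in for it here)."""
    chosen = []
    for idx in fused_order:
        cand = bigrams(sentences[idx].lower().split())
        redundant = any(
            cand and len(cand & bigrams(sentences[j].lower().split())) / len(cand) > max_overlap
            for j in chosen
        )
        if redundant:
            continue
        chosen.append(idx)
        if len(chosen) == k:
            break
    return [sentences[i] for i in sorted(chosen)]  # restore document order


if __name__ == "__main__":
    sents = [
        "The new model improves summarization quality on benchmark datasets.",
        "The new model improves summarization quality on standard benchmark datasets.",
        "Training uses only unlabeled documents and topic information.",
        "The position of a sentence in the document also signals its importance.",
    ]
    # Hypothetical per-feature rankings and fusion weights, for illustration only.
    ranks = {
        "topic":    [0, 2, 1, 3],
        "semantic": [1, 0, 3, 2],
        "keyword":  [0, 1, 2, 3],
        "position": [0, 3, 2, 1],
    }
    weights = {"topic": 0.3, "semantic": 0.3, "keyword": 0.2, "position": 0.2}
    print(select_summary(sents, fuse_rankings(ranks, weights), k=2))
```

In the actual pipeline, the per-feature rankings would come from the topic model, the Siamese sentence embeddings, the keyword graph, and sentence position, and the fusion weights would be learned on a labeled set as described above.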
Related papers
- Hierarchical Indexing for Retrieval-Augmented Opinion Summarization [60.5923941324953]
We propose a method for unsupervised abstractive opinion summarization that combines the attributability and scalability of extractive approaches with the coherence and fluency of Large Language Models (LLMs).
Our method, HIRO, learns an index structure that maps sentences to a path through a semantically organized discrete hierarchy.
At inference time, we populate the index and use it to identify and retrieve clusters of sentences containing popular opinions from input reviews.
arXiv Detail & Related papers (2024-03-01T10:38:07Z)
- Incremental Extractive Opinion Summarization Using Cover Trees [81.59625423421355]
In online marketplaces, user reviews accumulate over time, and opinion summaries need to be updated periodically.
In this work, we study the task of extractive opinion summarization in an incremental setting.
We present an efficient algorithm for accurately computing the CentroidRank summaries in an incremental setting.
arXiv Detail & Related papers (2024-01-16T02:00:17Z)
- On Context Utilization in Summarization with Large Language Models [83.84459732796302]
Large language models (LLMs) excel in abstractive summarization tasks, delivering fluent and pertinent summaries.
Recent advancements have extended their capabilities to handle long-input contexts, exceeding 100k tokens.
We conduct the first comprehensive study on context utilization and position bias in summarization.
arXiv Detail & Related papers (2023-10-16T16:45:12Z)
- Text Summarization with Oracle Expectation [88.39032981994535]
Extractive summarization produces summaries by identifying and concatenating the most important sentences in a document.
Most summarization datasets do not come with gold labels indicating whether document sentences are summary-worthy.
We propose a simple yet effective labeling algorithm that creates soft, expectation-based sentence labels.
arXiv Detail & Related papers (2022-09-26T14:10:08Z)
- Unsupervised Extractive Summarization by Pre-training Hierarchical Transformers [107.12125265675483]
Unsupervised extractive document summarization aims to select important sentences from a document without using labeled summaries during training.
Existing methods are mostly graph-based with sentences as nodes and edge weights measured by sentence similarities.
We find that transformer attentions can be used to rank sentences for unsupervised extractive summarization.
arXiv Detail & Related papers (2020-10-16T08:44:09Z)
- Unsupervised Summarization by Jointly Extracting Sentences and Keywords [12.387378783627762]
RepRank is an unsupervised graph-based ranking model for extractive multi-document summarization.
We show that salient sentences and keywords can be extracted in a joint and mutual reinforcement process using our learned representations.
Experimental results on multiple benchmark datasets show that RepRank achieves the best or comparable performance in ROUGE.
arXiv Detail & Related papers (2020-09-16T05:58:00Z)
- Understanding Points of Correspondence between Sentences for Abstractive Summarization [39.7404761923196]
We present an investigation into fusing sentences drawn from a document by introducing the notion of points of correspondence.
We create a dataset containing the documents, source and fusion sentences, and human annotations of points of correspondence between sentences.
arXiv Detail & Related papers (2020-06-10T02:42:38Z)
- Discrete Optimization for Unsupervised Sentence Summarization with Word-Level Extraction [31.648764677078837]
Automatic sentence summarization produces a shorter version of a sentence, while preserving its most important information.
We model these two aspects in an unsupervised objective function, consisting of language modeling and semantic similarity metrics.
Our proposed method achieves a new state of the art for unsupervised sentence summarization according to ROUGE scores.
arXiv Detail & Related papers (2020-05-04T19:01:55Z)
- An Unsupervised Semantic Sentence Ranking Scheme for Text Documents [9.272728720669846]
Semantic SentenceRank (SSR) is an unsupervised scheme for ranking sentences in a single document according to their relative importance.
It extracts essential words and phrases from a text document, and uses semantic measures to construct a semantic phrase graph over words and phrases and a semantic sentence graph over sentences.
arXiv Detail & Related papers (2020-04-28T20:17:51Z)
- Extractive Summarization as Text Matching [123.09816729675838]
This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems.
We formulate the extractive summarization task as a semantic text matching problem.
We have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 in ROUGE-1).
arXiv Detail & Related papers (2020-04-19T08:27:57Z)