An Unsupervised Semantic Sentence Ranking Scheme for Text Documents
- URL: http://arxiv.org/abs/2005.02158v1
- Date: Tue, 28 Apr 2020 20:17:51 GMT
- Title: An Unsupervised Semantic Sentence Ranking Scheme for Text Documents
- Authors: Hao Zhang, Jie Wang
- Abstract summary: Semantic SentenceRank (SSR) is an unsupervised scheme for ranking sentences in a single document according to their relative importance.
It extracts essential words and phrases from a text document, and uses semantic measures to construct, respectively, a semantic phrase graph over phrases and words, and a semantic sentence graph over sentences.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents Semantic SentenceRank (SSR), an unsupervised scheme for
automatically ranking sentences in a single document according to their
relative importance. In particular, SSR extracts essential words and phrases
from a text document, and uses semantic measures to construct, respectively, a
semantic phrase graph over phrases and words, and a semantic sentence graph
over sentences. It applies two variants of article-structure-biased PageRank to
score phrases and words on the first graph and sentences on the second graph.
It then combines these scores to generate the final score for each sentence.
Finally, SSR solves a multi-objective optimization problem for ranking
sentences based on their final scores and topic diversity through semantic
subtopic clustering. An implementation of SSR that runs in quadratic time is
presented, and it outperforms, on the SummBank benchmarks, each individual
judge's ranking and compares favorably with the combined ranking of all judges.
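The abstract names "article-structure-biased PageRank" without giving its formula. The sketch below shows only the general technique: PageRank computed by power iteration, where the random-jump (teleport) distribution is a bias vector instead of a uniform one. It is not the authors' implementation. Several pieces are assumptions chosen for illustration: the 1/(i+1) position bias (earlier sentences weigh more), the damping factor 0.85, redistributing dangling-node mass according to the bias, and the toy similarity matrix. The helper names `position_bias` and `biased_pagerank` are hypothetical. The same routine could score nodes of either a phrase graph or a sentence graph, as long as the edge weights are non-negative.

```python
def position_bias(n: int) -> list[float]:
    """Hypothetical structure bias: node i gets weight 1/(i+1), normalized to sum to 1."""
    raw = [1.0 / (i + 1) for i in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


def biased_pagerank(
    weights: list[list[float]],
    bias: list[float],
    damping: float = 0.85,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> list[float]:
    """PageRank by power iteration with a biased teleport distribution.

    weights[i][j] >= 0 is the weight of the edge i -> j (e.g. a semantic
    similarity); bias is a probability vector used for random jumps.
    """
    n = len(weights)
    out_sums = [sum(row) for row in weights]
    scores = list(bias)  # start from the bias distribution
    for _ in range(max_iter):
        # Random-jump mass is spread according to the bias, not uniformly.
        new = [(1.0 - damping) * bias[j] for j in range(n)]
        # Mass held by nodes with no outgoing edges (assumption: reassign it by the bias).
        dangling = sum(scores[i] for i in range(n) if out_sums[i] == 0)
        for i in range(n):
            if out_sums[i] == 0:
                continue
            for j in range(n):
                if weights[i][j]:
                    # Follow edge i -> j with probability proportional to its weight.
                    new[j] += damping * scores[i] * weights[i][j] / out_sums[i]
        for j in range(n):
            new[j] += damping * dangling * bias[j]
        delta = sum(abs(a - b) for a, b in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores


if __name__ == "__main__":
    # Toy symmetric sentence-similarity matrix (made-up values).
    sim = [
        [0.0, 0.6, 0.1, 0.3],
        [0.6, 0.0, 0.2, 0.4],
        [0.1, 0.2, 0.0, 0.5],
        [0.3, 0.4, 0.5, 0.0],
    ]
    scores = biased_pagerank(sim, position_bias(4))
    ranking = sorted(range(4), key=lambda i: scores[i], reverse=True)
    print(ranking)
```

Because every iteration preserves total probability mass, the scores stay a distribution summing to 1. The bias vector is where document structure enters, so swapping in a different bias changes the ranking without touching the graph.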
Related papers
- Hierarchical Indexing for Retrieval-Augmented Opinion Summarization
We propose a method for unsupervised abstractive opinion summarization that combines the attributability and scalability of extractive approaches with the coherence and fluency of Large Language Models (LLMs).
Our method, HIRO, learns an index structure that maps sentences to a path through a semantically organized discrete hierarchy.
At inference time, we populate the index and use it to identify and retrieve clusters of sentences containing popular opinions from input reviews.
Date: 2024-03-01
- RankSum: An unsupervised extractive text summarization based on rank fusion
We propose Ranksum, an approach for extractive text summarization of single documents.
Ranksum obtains the sentence saliency rankings corresponding to each feature in an unsupervised way.
We evaluate our approach on publicly available summarization datasets CNN/DailyMail and DUC 2002.
Date: 2024-02-07
- Bipartite Graph Pre-training for Unsupervised Extractive Summarization with Graph Convolutional Auto-Encoders
We argue that utilizing pre-trained embeddings derived from a process specifically designed to optimize cohesive and distinctive sentence representations helps rank significant sentences.
We propose a novel graph pre-training auto-encoder to obtain sentence embeddings by explicitly modelling intra-sentential distinctive features and inter-sentential cohesive features.
Date: 2023-10-29
- RankCSE: Unsupervised Sentence Representations Learning via Learning to Rank
We propose a novel approach, RankCSE, for unsupervised sentence representation learning.
It incorporates ranking consistency and ranking distillation with contrastive learning into a unified framework.
An extensive set of experiments is conducted on both semantic textual similarity (STS) and transfer (TR) tasks.
Date: 2023-05-26
- Text Summarization with Oracle Expectation
Extractive summarization produces summaries by identifying and concatenating the most important sentences in a document.
Most summarization datasets do not come with gold labels indicating whether document sentences are summary-worthy.
We propose a simple yet effective labeling algorithm that creates soft, expectation-based sentence labels.
Date: 2022-09-26
- Contextual Networks and Unsupervised Ranking of Sentences
We devise an unsupervised algorithm called CNATAR (Contextual Network Text Analysis Rank) to score sentences.
We show that CNATAR outperforms the combined ranking of the three human judges provided on the SummBank dataset.
We also compare the performance of CNATAR and the latest supervised neural-network summarization models and compute oracle results.
Date: 2022-03-09
- Unsupervised Summarization for Chat Logs with Topic-Oriented Ranking and Context-Aware Auto-Encoders
We propose a novel framework called RankAE to perform chat summarization without employing manually labeled data.
RankAE consists of a topic-oriented ranking strategy that selects topic utterances according to centrality and diversity simultaneously.
A denoising auto-encoder is designed to generate succinct but context-informative summaries based on the selected utterances.
Date: 2020-12-14
- Discrete Optimization for Unsupervised Sentence Summarization with Word-Level Extraction
Automatic sentence summarization produces a shorter version of a sentence, while preserving its most important information.
We model these two aspects in an unsupervised objective function, consisting of language modeling and semantic similarity metrics.
Our proposed method achieves a new state of the art for unsupervised sentence summarization according to ROUGE scores.
Date: 2020-05-04
- Extractive Summarization as Text Matching
This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems.
We formulate the extractive summarization task as a semantic text matching problem.
We have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 in ROUGE-1).
Date: 2020-04-19
- Pseudo-Convolutional Policy Gradient for Sequence-to-Sequence Lip-Reading
Lip-reading aims to infer the speech content from the lip movement sequence.
The traditional learning process of seq2seq models suffers from two problems.
We propose a novel pseudo-convolutional policy gradient (PCPG) based method to address these two problems.
Date: 2020-03-09