Combining Word Embeddings and N-grams for Unsupervised Document
Summarization
- URL: http://arxiv.org/abs/2004.14119v1
- Date: Sat, 25 Apr 2020 00:22:46 GMT
- Authors: Zhuolin Jiang, Manaj Srivastava, Sanjay Krishna, David Akodes, Richard
Schwartz
- Abstract summary: Graph-based extractive document summarization relies on the quality of the sentence similarity graph.
We employ off-the-shelf deep embedding features and tf-idf features, and introduce a new text similarity metric.
Our approach can outperform the tf-idf based approach and achieve state-of-the-art performance on the DUC04 dataset.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph-based extractive document summarization relies on the quality of the
sentence similarity graph. Bag-of-words or tf-idf based sentence similarity
uses exact word matching, but fails to measure the semantic similarity between
individual words or to consider the semantic structure of sentences. In order
to improve the similarity measure between sentences, we employ off-the-shelf
deep embedding features and tf-idf features, and introduce a new text
similarity metric. An improved sentence similarity graph is built and used in a
submodular objective function for extractive summarization, which consists of a
weighted coverage term and a diversity term. A Transformer-based compression
model is developed for sentence compression to aid in document summarization.
Our summarization approach is extractive and unsupervised. Experiments
demonstrate that our approach can outperform the tf-idf based approach and
achieve state-of-the-art performance on the DUC04 dataset, and comparable
performance to the fully supervised learning methods on the CNN/DM and NYT
datasets.
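The abstract's pipeline can be sketched in a few lines: build a sentence similarity graph, then greedily maximize a submodular objective with a weighted coverage term and a diversity term. The sketch below is a minimal illustration, not the paper's implementation: it uses plain tf-idf cosine similarity in place of the combined deep-embedding features, and implements the diversity term as a simple redundancy penalty; the weight `lam` is an assumption.

```python
import math
from collections import Counter

def tfidf_vectors(sentences):
    """tf-idf vectors as sparse dicts; a stand-in for the paper's
    combined tf-idf and deep-embedding features."""
    docs = [s.lower().split() for s in sentences]
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    return [{w: tf * math.log(n / df[w] + 1.0) for w, tf in Counter(doc).items()}
            for doc in docs]

def cosine(u, v):
    dot = sum(x * v[w] for w, x in u.items() if w in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def summarize(sentences, k=2, lam=0.5):
    """Greedily maximize coverage(S) - lam * redundancy(S)."""
    vecs = tfidf_vectors(sentences)
    n = len(vecs)
    sim = [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]

    def objective(S):
        # weighted coverage: how well the selected set covers every sentence
        cov = sum(max(sim[i][j] for j in S) for i in range(n))
        # redundancy penalty standing in for the paper's diversity term
        red = sum(sim[a][b] for a in S for b in S if a < b)
        return cov - lam * red

    selected = []
    while len(selected) < min(k, n):
        best = max((i for i in range(n) if i not in selected),
                   key=lambda i: objective(selected + [i]))
        selected.append(best)
    return [sentences[i] for i in sorted(selected)]
```

Because coverage is monotone submodular, greedy selection of this kind carries the classic (1 - 1/e)-style approximation guarantee, which is why such objectives are popular for unsupervised extractive summarization.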
Related papers
- DiffuSum: Generation Enhanced Extractive Summarization with Diffusion [14.930704950433324]
Extractive summarization aims to form a summary by directly extracting sentences from the source document.
This paper proposes DiffuSum, a novel paradigm for extractive summarization.
Experimental results show that DiffuSum achieves new state-of-the-art extractive results on CNN/DailyMail with ROUGE scores of 44.83/22.56/40.56.
arXiv Detail & Related papers (2023-05-02T19:09:16Z)
- Improving Sentence Similarity Estimation for Unsupervised Extractive Summarization [21.602394765472386]
We propose two novel strategies to improve sentence similarity estimation for unsupervised extractive summarization.
We use contrastive learning to optimize a document-level objective that sentences from the same document are more similar than those from different documents.
We also use mutual learning to enhance the relationship between sentence similarity estimation and sentence salience ranking.
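The document-level contrastive objective described above can be illustrated with a toy hinge loss: a sentence pair from the same document should score higher similarity than a pair from different documents. The hinge form and the margin value are assumptions for illustration, not the paper's exact loss.

```python
def contrastive_hinge(sim_same_doc, sim_cross_doc, margin=0.2):
    """Zero loss once a same-document pair beats a cross-document pair
    by at least `margin`; positive loss otherwise."""
    return max(0.0, margin + sim_cross_doc - sim_same_doc)
```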
arXiv Detail & Related papers (2023-02-24T07:10:33Z)
- Relational Sentence Embedding for Flexible Semantic Matching [86.21393054423355]
We present Relational Sentence Embedding (RSE), a new paradigm that further explores the potential of sentence embeddings.
RSE is effective and flexible in modeling sentence relations and outperforms a series of state-of-the-art embedding methods.
arXiv Detail & Related papers (2022-12-17T05:25:17Z)
- Document-Level Relation Extraction with Sentences Importance Estimation and Focusing [52.069206266557266]
Document-level relation extraction (DocRE) aims to determine the relation between two entities from a document of multiple sentences.
We propose a Sentence Importance Estimation and Focusing (SIEF) framework for DocRE, where we design a sentence importance score and a sentence focusing loss.
Experimental results on two domains show that our SIEF not only improves overall performance, but also makes DocRE models more robust.
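One way to picture a sentence importance score in the spirit of SIEF: a sentence matters to the extent that removing it changes the document-level prediction. The leave-one-out scheme and the toy keyword-count scorer below are assumptions standing in for a trained DocRE model, not the paper's actual score.

```python
def importance_scores(sentences, score_doc):
    """Leave-one-out importance: |score(doc) - score(doc without sentence i)|."""
    base = score_doc(sentences)
    return [abs(base - score_doc(sentences[:i] + sentences[i + 1:]))
            for i in range(len(sentences))]
```

With a toy scorer that counts relation-bearing keywords, the sentence carrying the relation gets a nonzero score while filler sentences get zero.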
arXiv Detail & Related papers (2022-04-27T03:20:07Z)
- HETFORMER: Heterogeneous Transformer with Sparse Attention for Long-Text Extractive Summarization [57.798070356553936]
HETFORMER is a Transformer-based pre-trained model with multi-granularity sparse attentions for extractive summarization.
Experiments on both single- and multi-document summarization tasks show that HETFORMER achieves state-of-the-art performance in ROUGE F1.
arXiv Detail & Related papers (2021-10-12T22:42:31Z)
- Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlapping frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common subsequence as neighboring words and use masked language modeling (MLM) to predict the distributions at their positions.
Experiments on Semantic Textual Similarity show the resulting metric, neighboring distribution divergence (NDD), to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
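The mask-and-predict skeleton can be sketched as: align the two texts by their longest common subsequence, predict a word distribution at each aligned position, and accumulate the divergence between the two predictions. The sketch below is an assumption-laden illustration: `stub_mlm` is a smoothed unigram stand-in for a real masked language model, and the KL accumulation is a simplification of the paper's metric.

```python
import math
from collections import Counter

def lcs_pairs(a, b):
    """Aligned index pairs of the longest common subsequence (DP + backtrack)."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if a[i] == b[j]
                                else max(dp[i][j + 1], dp[i + 1][j]))
    pairs, i, j = [], m, n
    while i and j:
        if a[i - 1] == b[j - 1]:
            pairs.append((i - 1, j - 1)); i -= 1; j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]

def stub_mlm(tokens, pos):
    """Stand-in for a real masked LM: predict the masked position from the
    remaining context as a smoothed unigram distribution (illustration only)."""
    ctx = Counter(t for k, t in enumerate(tokens) if k != pos)
    total = sum(ctx.values()) + len(ctx)
    return {w: (c + 1) / total for w, c in ctx.items()}

def kl(p, q, eps=1e-8):
    keys = set(p) | set(q)
    return sum(p.get(w, eps) * math.log(p.get(w, eps) / q.get(w, eps))
               for w in keys)

def ndd(text_a, text_b):
    """Accumulated divergence at the aligned ('neighboring') positions."""
    a, b = text_a.lower().split(), text_b.lower().split()
    return sum(kl(stub_mlm(a, i), stub_mlm(b, j)) for i, j in lcs_pairs(a, b))
```

Identical texts yield identical predicted distributions at every aligned position, so the divergence is zero; a single substituted word shifts the context distributions and produces a positive score.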
arXiv Detail & Related papers (2021-10-04T03:59:15Z)
- A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)
- Unsupervised Summarization by Jointly Extracting Sentences and Keywords [12.387378783627762]
RepRank is an unsupervised graph-based ranking model for extractive multi-document summarization.
We show that salient sentences and keywords can be extracted in a joint and mutual reinforcement process using our learned representations.
Experiment results with multiple benchmark datasets show that RepRank achieved the best or comparable performance in ROUGE.
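The joint, mutually reinforcing extraction described above can be sketched as a bipartite power iteration: a keyword is salient if salient sentences contain it, and a sentence is salient if it contains salient keywords. This HITS-style iteration is an assumption standing in for RepRank's learned-representation model, not the paper's actual algorithm.

```python
import math

def joint_rank(sentences, iters=50):
    """Return the top sentence and keywords ranked by mutual reinforcement."""
    docs = [s.lower().split() for s in sentences]
    vocab = sorted({w for d in docs for w in d})
    s_score = [1.0] * len(docs)
    k_score = {w: 1.0 for w in vocab}
    for _ in range(iters):
        # a keyword is salient if salient sentences contain it
        k_score = {w: sum(s_score[i] for i, d in enumerate(docs) if w in d)
                   for w in vocab}
        norm = math.sqrt(sum(v * v for v in k_score.values())) or 1.0
        k_score = {w: v / norm for w, v in k_score.items()}
        # a sentence is salient if it contains salient keywords
        s_score = [sum(k_score[w] for w in d) for d in docs]
        norm = math.sqrt(sum(v * v for v in s_score)) or 1.0
        s_score = [v / norm for v in s_score]
    top_sentence = max(range(len(docs)), key=lambda i: s_score[i])
    keywords = sorted(vocab, key=k_score.get, reverse=True)
    return sentences[top_sentence], keywords
```

The normalization after each half-step keeps the scores bounded, so the iteration converges to the principal singular directions of the sentence-keyword incidence matrix.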
arXiv Detail & Related papers (2020-09-16T05:58:00Z)
- Extractive Summarization as Text Matching [123.09816729675838]
This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems.
We formulate the extractive summarization task as a semantic text matching problem.
We have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 in ROUGE-1).
arXiv Detail & Related papers (2020-04-19T08:27:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.