An Efficient Coarse-to-Fine Facet-Aware Unsupervised Summarization
Framework based on Semantic Blocks
- URL: http://arxiv.org/abs/2208.08253v1
- Date: Wed, 17 Aug 2022 12:18:36 GMT
- Title: An Efficient Coarse-to-Fine Facet-Aware Unsupervised Summarization
Framework based on Semantic Blocks
- Authors: Xinnian Liang, Jing Li, Shuangzhi Wu, Jiali Zeng, Yufan Jiang, Mu Li,
Zhoujun Li
- Abstract summary: We propose an efficient Coarse-to-Fine Facet-Aware Ranking (C2F-FAR) framework for unsupervised long document summarization.
In the coarse-level stage, we propose a new segmentation algorithm to split the document into facet-aware semantic blocks and then filter insignificant blocks.
In the fine-level stage, we select salient sentences in each block and then extract the final summary from selected sentences.
- Score: 27.895044398724664
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unsupervised summarization methods have achieved remarkable results by
incorporating representations from pre-trained language models. However,
existing methods fail to consider efficiency and effectiveness at the same time
when the input document is extremely long. To tackle this problem, in this
paper, we propose an efficient Coarse-to-Fine Facet-Aware Ranking (C2F-FAR)
framework for unsupervised long document summarization, which is based on
semantic blocks. A semantic block is a group of continuous sentences in the
document that describe the same facet. Specifically, we address this problem by
converting the one-step ranking method into a hierarchical multi-granularity
two-stage ranking. In the coarse-level stage, we propose a new segmentation
algorithm to split the document into facet-aware semantic blocks and then
filter insignificant blocks. In the fine-level stage, we select salient
sentences in each block and then extract the final summary from selected
sentences. We evaluate our framework on four long document summarization
datasets: Gov-Report, BillSum, arXiv, and PubMed. Our C2F-FAR can achieve new
state-of-the-art unsupervised summarization results on Gov-Report and BillSum.
In addition, our method is 4-28 times faster than previous
methods.\footnote{\url{https://github.com/xnliang98/c2f-far}}
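The two-stage idea can be sketched as follows, assuming precomputed sentence embeddings. The similarity threshold, centroid-based block scoring, and per-block selection below are illustrative stand-ins, not the authors' actual segmentation and ranking algorithms:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def segment_blocks(sent_embs, threshold=0.5):
    """Coarse stage, part 1 (sketch): start a new semantic block whenever
    the similarity between adjacent sentence embeddings drops below
    `threshold`."""
    blocks, current = [], [0]
    for i in range(1, len(sent_embs)):
        if cosine(sent_embs[i - 1], sent_embs[i]) < threshold:
            blocks.append(current)
            current = []
        current.append(i)
    blocks.append(current)
    return blocks

def summarize(sent_embs, threshold=0.5, keep_blocks=2, per_block=1):
    """Coarse stage, part 2: keep the blocks whose centroid is most similar
    to the document centroid (filtering insignificant blocks).
    Fine stage: within each kept block, pick the sentences closest to the
    block centroid, then return them in document order."""
    doc_centroid = np.mean(sent_embs, axis=0)
    blocks = segment_blocks(sent_embs, threshold)
    kept = sorted(
        blocks,
        key=lambda b: cosine(np.mean([sent_embs[i] for i in b], axis=0),
                             doc_centroid),
        reverse=True)[:keep_blocks]
    summary = []
    for b in kept:
        centroid = np.mean([sent_embs[i] for i in b], axis=0)
        ranked = sorted(b, key=lambda i: cosine(sent_embs[i], centroid),
                        reverse=True)
        summary.extend(ranked[:per_block])
    return sorted(summary)  # restore document order
```

Because block filtering discards most sentences before the fine-grained ranking runs, the expensive pairwise comparisons happen only inside small blocks, which is where the efficiency gain on long documents comes from.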
Related papers
- Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv Detail & Related papers (2024-10-03T14:33:34Z) - RankSum: An unsupervised extractive text summarization based on rank
fusion [0.0]
We propose Ranksum, an approach for extractive text summarization of single documents.
Ranksum obtains the sentence saliency rankings corresponding to each feature in an unsupervised way.
We evaluate our approach on publicly available summarization datasets CNN/DailyMail and DUC 2002.
arXiv Detail & Related papers (2024-02-07T22:24:09Z) - Text Summarization with Oracle Expectation [88.39032981994535]
Extractive summarization produces summaries by identifying and concatenating the most important sentences in a document.
Most summarization datasets do not come with gold labels indicating whether document sentences are summary-worthy.
We propose a simple yet effective labeling algorithm that creates soft, expectation-based sentence labels.
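The soft-labeling idea in the Oracle Expectation entry above can be illustrated with a much simpler stand-in: score each sentence by its unigram overlap with a reference summary instead of assigning a hard 0/1 "summary-worthy" label. The paper's actual estimator is an expectation over possible oracle summaries; the function below is only an illustrative simplification:

```python
def soft_labels(sentences, reference):
    """Assign each sentence a soft label in [0, 1]: the fraction of its
    tokens that also appear in the reference summary (a crude stand-in
    for an expectation-based label)."""
    ref = set(reference.lower().split())
    scores = []
    for s in sentences:
        toks = s.lower().split()
        scores.append(sum(t in ref for t in toks) / max(len(toks), 1))
    return scores
```

Soft labels of this kind let a model learn that several different sentences are partially summary-worthy, instead of forcing a single arbitrary oracle extraction to be the only positive example.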
arXiv Detail & Related papers (2022-09-26T14:10:08Z) - Sparse Optimization for Unsupervised Extractive Summarization of Long
Documents with the Frank-Wolfe Algorithm [4.786337974720721]
We address the problem of unsupervised extractive document summarization, especially for long documents.
We model the unsupervised problem as a sparse auto-regression one and approximate the resulting problem via a convex, norm-constrained problem.
To generate a summary with $k$ sentences, the algorithm only needs to execute $\approx k$ iterations, making it very efficient.
arXiv Detail & Related papers (2022-08-19T17:17:43Z) - Long Document Summarization with Top-down and Bottom-up Inference [113.29319668246407]
We propose a principled inference framework to improve summarization models on two aspects.
Our framework assumes a hierarchical latent structure of a document where the top-level captures the long range dependency.
We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
arXiv Detail & Related papers (2022-03-15T01:24:51Z) - Reinforcing Semantic-Symmetry for Document Summarization [15.113768658584979]
Document summarization condenses a long document into a short version with salient information and accurate semantic descriptions.
This paper introduces a new reinforcing semantic-symmetry learning model for document summarization.
A series of experiments have been conducted on two widely used benchmark datasets, CNN/Daily Mail and BigPatent.
arXiv Detail & Related papers (2021-12-14T17:41:37Z) - Unsupervised Extractive Summarization by Pre-training Hierarchical
Transformers [107.12125265675483]
Unsupervised extractive document summarization aims to select important sentences from a document without using labeled summaries during training.
Existing methods are mostly graph-based with sentences as nodes and edge weights measured by sentence similarities.
We find that transformer attentions can be used to rank sentences for unsupervised extractive summarization.
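The graph-based baseline described in the entry above (sentences as nodes, edge weights given by sentence similarities) can be sketched with simple degree centrality. The similarity matrix is assumed given, e.g. cosine similarities of sentence embeddings or, as in the paper above, scores derived from transformer attentions:

```python
import numpy as np

def rank_sentences(sim):
    """Rank sentences by graph centrality: treat each sentence as a node,
    pairwise similarities as edge weights, and score each sentence by the
    sum of its edges (degree centrality), a common graph-based baseline
    for unsupervised extractive summarization."""
    sim = np.array(sim, dtype=float)
    np.fill_diagonal(sim, 0.0)   # ignore self-similarity
    scores = sim.sum(axis=1)
    return [int(i) for i in np.argsort(-scores)]  # most central first
```

A summary is then extracted by taking the top-ranked sentences; PageRank-style iteration is a common refinement of this plain degree-centrality scoring.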
arXiv Detail & Related papers (2020-10-16T08:44:09Z) - SummPip: Unsupervised Multi-Document Summarization with Sentence Graph
Compression [61.97200991151141]
SummPip is an unsupervised method for multi-document summarization.
We convert the original documents to a sentence graph, taking both linguistic and deep representation into account.
We then apply spectral clustering to obtain multiple clusters of sentences, and finally compress each cluster to generate the final summary.
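The clustering step of the SummPip pipeline above can be illustrated with a minimal two-cluster spectral partition of a sentence-similarity graph. This is only a sketch: full spectral clustering uses the first $k$ eigenvectors plus k-means to get $k$ clusters, and SummPip additionally compresses each cluster into a summary sentence:

```python
import numpy as np

def spectral_bipartition(sim):
    """Split sentences into two clusters via the graph Laplacian of the
    similarity matrix: compute L = D - S and assign each node a cluster
    by the sign of the Fiedler vector (the eigenvector of the
    second-smallest eigenvalue)."""
    sim = np.array(sim, dtype=float)
    deg = np.diag(sim.sum(axis=1))
    lap = deg - sim
    _, vecs = np.linalg.eigh(lap)   # eigenvalues in ascending order
    fiedler = vecs[:, 1]
    return [int(v >= 0) for v in fiedler]
```

Sentences in the same cluster are assumed to cover the same content, so each cluster contributes (after compression) at most one sentence to the final summary.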
arXiv Detail & Related papers (2020-07-17T13:01:15Z) - Discrete Optimization for Unsupervised Sentence Summarization with
Word-Level Extraction [31.648764677078837]
Automatic sentence summarization produces a shorter version of a sentence, while preserving its most important information.
We model these two aspects in an unsupervised objective function, consisting of language modeling and semantic similarity metrics.
Our proposed method achieves a new state-of-the-art for unsupervised sentence summarization according to ROUGE scores.
arXiv Detail & Related papers (2020-05-04T19:01:55Z) - Extractive Summarization as Text Matching [123.09816729675838]
This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems.
We formulate the extractive summarization task as a semantic text matching problem.
We have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 in ROUGE-1).
arXiv Detail & Related papers (2020-04-19T08:27:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.