At Which Level Should We Extract? An Empirical Analysis on Extractive Document Summarization
- URL: http://arxiv.org/abs/2004.02664v2
- Date: Mon, 26 Oct 2020 08:35:19 GMT
- Title: At Which Level Should We Extract? An Empirical Analysis on Extractive Document Summarization
- Authors: Qingyu Zhou, Furu Wei, Ming Zhou
- Abstract summary: We show that unnecessary content and redundancy issues exist when extracting full sentences.
We propose extracting sub-sentential units based on the constituency parsing tree.
- Score: 110.54963847339775
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Extractive methods have proven effective in automatic document
summarization. Previous works perform this task by identifying informative
content at the sentence level. However, it is unclear whether sentence-level
extraction is the best solution. In this work, we show that unnecessary
content and redundancy issues arise when extracting full sentences, and that
extracting sub-sentential units is a promising alternative. Specifically, we
propose extracting sub-sentential units based on the constituency parsing
tree, and present a neural extractive model that leverages sub-sentential
information to extract these units. Extensive experiments and analyses show
that extracting sub-sentential units performs competitively with full-sentence
extraction under both automatic and human evaluation. We hope our work
provides inspiration for the choice of basic extraction units in extractive
summarization in future research.
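The abstract above proposes taking extraction units from nodes of a constituency parse rather than whole sentences. A minimal sketch of that idea, using a toy parse represented as nested tuples (the helper names, phrase labels, and example sentence are illustrative, not from the paper):

```python
# Each node: (label, child, child, ...) where a child is a node or a word.
# A real system would obtain the parse from a constituency parser.
parse = ("S",
         ("NP", ("DT", "The"), ("NN", "model")),
         ("VP", ("VBZ", "extracts"),
                ("NP", ("JJ", "sub-sentential"), ("NNS", "units"))))

def leaves(node):
    """Return the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words

def candidate_units(node, labels=("NP", "VP", "S")):
    """Collect spans of constituents whose labels suggest they could
    serve as stand-alone extraction units."""
    if isinstance(node, str):
        return []
    units = []
    if node[0] in labels:
        units.append(" ".join(leaves(node)))
    for child in node[1:]:
        units.extend(candidate_units(child, labels))
    return units

# Lists the full clause, both NPs, and the VP as candidate units.
print(candidate_units(parse))
```

An extractive model would then score these candidate spans instead of scoring whole sentences, which is what lets it drop unnecessary or redundant sub-parts of a sentence.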
Related papers
- Source Identification in Abstractive Summarization [0.8883733362171033]
We define input sentences that contain essential information in the generated summary as "source sentences" and study how abstractive summaries are made by analyzing the source sentences.
We formulate automatic source sentence detection and compare multiple methods to establish a strong baseline for the task.
Experimental results show that the perplexity-based method performs well in highly abstractive settings, while similarity-based methods perform robustly in relatively extractive settings.
arXiv Detail & Related papers (2024-02-07T09:09:09Z)
- Incremental Extractive Opinion Summarization Using Cover Trees [81.59625423421355]
In online marketplaces user reviews accumulate over time, and opinion summaries need to be updated periodically.
In this work, we study the task of extractive opinion summarization in an incremental setting.
We present an efficient algorithm for accurately computing the CentroidRank summaries in an incremental setting.
arXiv Detail & Related papers (2024-01-16T02:00:17Z)
- DiffuSum: Generation Enhanced Extractive Summarization with Diffusion [14.930704950433324]
Extractive summarization aims to form a summary by directly extracting sentences from the source document.
This paper proposes DiffuSum, a novel paradigm for extractive summarization.
Experimental results show that DiffuSum achieves the new state-of-the-art extractive results on CNN/DailyMail with ROUGE scores of $44.83/22.56/40.56$.
arXiv Detail & Related papers (2023-05-02T19:09:16Z)
- Novel Chapter Abstractive Summarization using Spinal Tree Aware Sub-Sentential Content Selection [29.30939223344407]
We present a pipelined extractive-abstractive approach to summarizing novel chapters.
We show an improvement of 3.71 Rouge-1 points over best results reported in prior work on an existing novel chapter dataset.
arXiv Detail & Related papers (2022-11-09T14:12:09Z)
- Salience Allocation as Guidance for Abstractive Summarization [61.31826412150143]
We propose a novel summarization approach with a flexible and reliable salience guidance, namely SEASON (SaliencE Allocation as Guidance for Abstractive SummarizatiON).
SEASON utilizes the allocation of salience expectation to guide abstractive summarization and adapts well to articles in different abstractiveness.
arXiv Detail & Related papers (2022-10-22T02:13:44Z)
- Improving Multi-Document Summarization through Referenced Flexible Extraction with Credit-Awareness [21.037841262371355]
A notable challenge in Multi-Document Summarization (MDS) is the extremely-long length of the input.
We present an extract-then-abstract Transformer framework to overcome the problem.
We propose a loss weighting mechanism that makes the model aware of the unequal importance for the sentences not in the pseudo extraction oracle.
arXiv Detail & Related papers (2022-05-04T04:40:39Z)
- Reinforcing Semantic-Symmetry for Document Summarization [15.113768658584979]
Document summarization condenses a long document into a short version with salient information and accurate semantic descriptions.
This paper proposes a new reinforcing semantic-symmetry learning model for document summarization.
A series of experiments have been conducted on two widely used benchmark datasets, CNN/Daily Mail and BigPatent.
arXiv Detail & Related papers (2021-12-14T17:41:37Z)
- Unsupervised Extractive Summarization by Pre-training Hierarchical Transformers [107.12125265675483]
Unsupervised extractive document summarization aims to select important sentences from a document without using labeled summaries during training.
Existing methods are mostly graph-based with sentences as nodes and edge weights measured by sentence similarities.
We find that transformer attentions can be used to rank sentences for unsupervised extractive summarization.
arXiv Detail & Related papers (2020-10-16T08:44:09Z)
- Few-Shot Learning for Opinion Summarization [117.70510762845338]
Opinion summarization is the automatic creation of text reflecting subjective information expressed in multiple documents.
In this work, we show that even a handful of summaries is sufficient to bootstrap generation of the summary text.
Our approach substantially outperforms previous extractive and abstractive methods in automatic and human evaluation.
arXiv Detail & Related papers (2020-04-30T15:37:38Z)
- Extractive Summarization as Text Matching [123.09816729675838]
This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems.
We formulate the extractive summarization task as a semantic text matching problem.
We have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 in ROUGE-1).
arXiv Detail & Related papers (2020-04-19T08:27:57Z)
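Several entries above report results as ROUGE scores (e.g. 44.83/22.56/40.56 and 44.41 ROUGE-1). As background, a minimal sketch of ROUGE-1 F1, i.e. unigram overlap with clipped counts; this is a simplification, and the official toolkit adds options such as stemming:

```python
from collections import Counter

def rouge1_f1(candidate, reference):
    """ROUGE-1 F1 between a candidate summary and one reference:
    overlapping unigram counts (clipped), combined via harmonic mean
    of precision and recall."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(rouge1_f1("the cat sat on the mat",
                "the cat lay on the mat"))  # 5 of 6 unigrams match, F1 = 5/6
```

Published scores are typically reported against multiple references and scaled by 100, which is why the papers quote values like 44.41 rather than fractions.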
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.