A New Sentence Extraction Strategy for Unsupervised Extractive
Summarization Methods
- URL: http://arxiv.org/abs/2112.03203v5
- Date: Wed, 24 Jan 2024 13:47:03 GMT
- Title: A New Sentence Extraction Strategy for Unsupervised Extractive
Summarization Methods
- Authors: Dehao Tao, Yingzhu Xiong, Zhongliang Yang, and Yongfeng Huang
- Abstract summary: We model the task of extractive text summarization from the perspective of Information Theory.
To improve the feature distribution and to decrease the mutual information of summarization sentences, we propose a new sentence extraction strategy.
- Score: 26.326800624948344
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, text summarization methods have attracted much attention
again thanks to research on neural network models. Most current text
summarization methods based on neural network models are supervised and
require large-scale datasets, which are difficult to obtain in practical
applications. In this paper, we model the task of extractive text
summarization from the perspective of Information Theory, and then describe
unsupervised extractive methods within a uniform
framework. To improve the feature distribution and to decrease the mutual
information of summarization sentences, we propose a new sentence extraction
strategy which can be applied to existing unsupervised extractive methods.
Experiments are carried out on different datasets, and results show that our
strategy is indeed effective and in line with expectations.
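The abstract's core idea, selecting sentences that are informative about the document while keeping the mutual information between selected sentences low, can be illustrated with a minimal greedy sketch. This is not the paper's actual strategy: lexical cosine overlap stands in as a crude proxy for mutual information, and the `redundancy_penalty` weight is an assumed illustrative parameter.

```python
import math
from collections import Counter

def _bow(sentence):
    """Bag-of-words term counts for one sentence."""
    return Counter(sentence.lower().split())

def _cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    num = sum(a[w] * b[w] for w in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def extract_summary(sentences, k=2, redundancy_penalty=0.7):
    """Greedily pick sentences representative of the whole document,
    penalizing lexical overlap (a stand-in for mutual information)
    with sentences already chosen."""
    bows = [_bow(s) for s in sentences]
    doc = _bow(" ".join(sentences))
    relevance = [_cosine(b, doc) for b in bows]
    selected = []
    while len(selected) < min(k, len(sentences)):
        best_i, best_score = None, float("-inf")
        for i, b in enumerate(bows):
            if i in selected:
                continue
            # highest overlap with any already-selected sentence
            overlap = max((_cosine(b, bows[j]) for j in selected),
                          default=0.0)
            score = relevance[i] - redundancy_penalty * overlap
            if score > best_score:
                best_i, best_score = i, score
        selected.append(best_i)
    # restore document order
    return [sentences[i] for i in sorted(selected)]
```

With two near-duplicate sentences and one distinct one, the redundancy penalty pushes the second pick toward the distinct sentence even though the duplicate scores higher on relevance alone.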
Related papers
- Model-Free Active Exploration in Reinforcement Learning [53.786439742572995]
We study the problem of exploration in Reinforcement Learning and present a novel model-free solution.
Our strategy is able to identify efficient policies faster than state-of-the-art exploration approaches.
arXiv Detail & Related papers (2024-06-30T19:00:49Z) - Building blocks for complex tasks: Robust generative event extraction
for radiology reports under domain shifts [11.845850292404768]
We show that multi-pass T5-based text-to-text generative models exhibit better generalization across exam modalities compared to approaches that employ BERT-based task-specific classification layers.
We then develop methods that reduce the inference cost of the model, making large-scale corpus processing more feasible for clinical applications.
arXiv Detail & Related papers (2023-06-15T23:16:58Z) - Recent Trends in Unsupervised Summarization [0.6752538702870792]
Unsupervised summarization is a powerful technique that enables training summarizing models without requiring labeled datasets.
This survey covers different recent techniques and models used for unsupervised summarization.
arXiv Detail & Related papers (2023-05-18T18:00:44Z) - Boosting Event Extraction with Denoised Structure-to-Text Augmentation [52.21703002404442]
Event extraction aims to recognize pre-defined event triggers and arguments from texts.
Recent data augmentation methods often neglect the problem of grammatical incorrectness.
We propose a denoised structure-to-text augmentation framework for event extraction DAEE.
arXiv Detail & Related papers (2023-05-16T16:52:07Z) - Towards Abstractive Timeline Summarisation using Preference-based
Reinforcement Learning [3.6640004265358477]
This paper introduces a novel pipeline for summarising timelines of events reported by multiple news sources.
Transformer-based models for abstractive summarisation generate coherent and concise summaries of long documents.
While extractive summaries are more faithful to their sources, they may be less readable and contain redundant or unnecessary information.
arXiv Detail & Related papers (2022-11-14T18:24:13Z) - Towards Open-World Feature Extrapolation: An Inductive Graph Learning
Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning.
Our framework contains two modules: 1) a backbone network (e.g., feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
arXiv Detail & Related papers (2021-10-09T09:02:45Z) - The Summary Loop: Learning to Write Abstractive Summaries Without
Examples [21.85348918324668]
This work presents a new approach to unsupervised abstractive summarization based on maximizing a combination of coverage and fluency for a given length constraint.
Key terms are masked out of the original document and must be filled in by a coverage model using the current generated summary.
When tested on popular news summarization datasets, the method outperforms previous unsupervised methods by more than 2 R-1 points.
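The Summary Loop's coverage signal can be sketched in miniature: mask salient terms of the document and ask how many are recoverable from the summary. The real system uses a trained fill-in model; here exact word match and a frequency-based notion of "key terms" are simplifying assumptions, and the stopword list is illustrative.

```python
from collections import Counter

# illustrative stopword list; the real system is model-based
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "on"}

def coverage_score(document, summary, n_masked=5):
    """Simplified proxy for a coverage reward: mask the most frequent
    content words of the document and count how many appear in the
    summary (exact match stands in for a learned fill-in model)."""
    words = [w for w in document.lower().split() if w not in STOPWORDS]
    masked = [w for w, _ in Counter(words).most_common(n_masked)]
    if not masked:
        return 0.0
    summary_words = set(summary.lower().split())
    return sum(w in summary_words for w in masked) / len(masked)
```

A summary mentioning the document's dominant terms scores high; an off-topic one scores zero, which is the gradient-free intuition behind rewarding coverage.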
arXiv Detail & Related papers (2021-05-11T23:19:46Z) - CDEvalSumm: An Empirical Study of Cross-Dataset Evaluation for Neural
Summarization Systems [121.78477833009671]
We investigate the performance of different summarization models under a cross-dataset setting.
A comprehensive study of 11 representative summarization systems on 5 datasets from different domains reveals the effect of model architectures and generation ways.
arXiv Detail & Related papers (2020-10-11T02:19:15Z) - Multi-Fact Correction in Abstractive Text Summarization [98.27031108197944]
Span-Fact is a suite of two factual correction models that leverages knowledge learned from question answering models to make corrections in system-generated summaries via span selection.
Our models employ single or multi-masking strategies to either iteratively or auto-regressively replace entities in order to ensure semantic consistency w.r.t. the source text.
Experiments show that our models significantly boost the factual consistency of system-generated summaries without sacrificing summary quality in terms of both automatic metrics and human evaluation.
arXiv Detail & Related papers (2020-10-06T02:51:02Z) - SummPip: Unsupervised Multi-Document Summarization with Sentence Graph
Compression [61.97200991151141]
SummPip is an unsupervised method for multi-document summarization.
We convert the original documents to a sentence graph, taking both linguistic and deep representation into account.
We then apply spectral clustering to obtain multiple clusters of sentences, and finally compress each cluster to generate the final summary.
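The three-stage pipeline described above (sentence graph, clustering, per-cluster compression) can be sketched end to end. This is a deliberately simplified stand-in: lexical similarity replaces SummPip's linguistic and deep representations, connected components replace spectral clustering, and "compression" just keeps each cluster's most central sentence; the `threshold` value is an assumption.

```python
import math
from collections import Counter

def _cos(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[w] * b[w] for w in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def summarize_pipeline(sentences, threshold=0.3):
    """Sketch of a SummPip-style pipeline: build a sentence graph by
    similarity, cluster it (connected components stand in for spectral
    clustering), and keep each cluster's most central sentence."""
    bows = [Counter(s.lower().split()) for s in sentences]
    n = len(sentences)
    # 1) sentence graph: edge when similarity clears the threshold
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if _cos(bows[i], bows[j]) >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    # 2) clusters via connected components (depth-first search)
    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, comp = [start], []
        while stack:
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            comp.append(u)
            stack.extend(adj[u] - seen)
        clusters.append(sorted(comp))
    # 3) "compress" each cluster: sentence closest to the cluster centroid
    summary = []
    for comp in clusters:
        centroid = Counter()
        for u in comp:
            centroid.update(bows[u])
        best = max(comp, key=lambda u: _cos(bows[u], centroid))
        summary.append(sentences[best])
    return summary
```

Two near-duplicate sentences collapse into one cluster and contribute a single summary sentence, while a distinct sentence survives as its own cluster.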
arXiv Detail & Related papers (2020-07-17T13:01:15Z) - Improving unsupervised neural aspect extraction for online discussions
using out-of-domain classification [11.746330029375745]
We introduce a simple approach based on sentence filtering to improve topical aspects learned from newsgroups-based content.
The positive effect of sentence filtering on topic coherence is demonstrated in comparison to aspect extraction models trained on unfiltered texts.
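The filtering step itself is straightforward to sketch: score each sentence with an out-of-domain classifier and train the aspect model only on sentences that pass a threshold. The `in_domain_score` callable here is hypothetical, standing in for the paper's trained classifier, and the threshold is an assumed parameter.

```python
def filter_sentences(sentences, in_domain_score, threshold=0.5):
    """Keep only sentences an out-of-domain classifier rates as
    on-topic, before feeding them to an aspect-extraction model.
    `in_domain_score` is a hypothetical callable returning a
    probability in [0, 1]."""
    return [s for s in sentences if in_domain_score(s) >= threshold]
```

In practice the scorer would be a trained classifier; any callable with the same signature (including a keyword heuristic for testing) slots in unchanged.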
arXiv Detail & Related papers (2020-06-17T10:34:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences.