Topic-Aware Encoding for Extractive Summarization
- URL: http://arxiv.org/abs/2112.09572v1
- Date: Fri, 17 Dec 2021 15:26:37 GMT
- Title: Topic-Aware Encoding for Extractive Summarization
- Authors: Mingyang Song, Liping Jing
- Abstract summary: We propose topic-aware encoding for extractive document summarization to address the neglect of central topic information in existing models.
A neural topic model is incorporated into neural sentence-level representation learning to adequately capture the central topic information.
The experimental results on three public datasets show that our model outperforms the state-of-the-art models.
- Score: 15.113768658584979
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Document summarization provides an instrument for quickly understanding a collection of text documents and has several real-life applications. With the growth of online text data, numerous summarization models have been proposed recently. Sequence-to-Sequence (Seq2Seq) based neural summarization models are the most widely used in the summarization field thanks to their strong performance, which comes from adequately encoding both the semantic and the structural information of the text. However, existing extractive summarization models pay little attention to the central topic information when generating summaries, so they cannot ensure that a generated summary stays on the primary topic. A lengthy document can span several topics, and a single summary cannot do justice to all of them. Therefore, the key to generating a high-quality summary, especially for a long document, is determining the central topic and building the summary around it. To address this issue, we propose a topic-aware encoding for document summarization. The model effectively combines syntactic-level and topic-level information to build a comprehensive sentence representation. Specifically, a neural topic model is incorporated into neural sentence-level representation learning so that the central topic information is adequately considered when capturing the critical content of the original document. Experimental results on three public datasets show that our model outperforms state-of-the-art models.
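The abstract describes the architecture only at a high level. Below is a minimal, hypothetical PyTorch sketch of the general idea: a VAE-style neural topic model infers a document-level topic mixture, which is concatenated onto each sentence embedding to form a topic-aware representation for extractive scoring. All class, function, and parameter names are invented for illustration; the authors' actual model may fuse the two information sources differently.

```python
# Hypothetical sketch only: names and architecture details are invented,
# not taken from the paper.
import torch
import torch.nn as nn


class NeuralTopicModel(nn.Module):
    """VAE-style neural topic model: bag-of-words -> topic mixture theta."""

    def __init__(self, vocab_size: int, num_topics: int, hidden: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(vocab_size, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, num_topics)
        self.logvar = nn.Linear(hidden, num_topics)

    def forward(self, bow: torch.Tensor) -> torch.Tensor:
        h = self.encoder(bow)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick, then normalize into a topic distribution.
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return torch.softmax(z, dim=-1)


class TopicAwareExtractor(nn.Module):
    """Scores sentences from embeddings fused with the document's topics."""

    def __init__(self, sent_dim: int, vocab_size: int, num_topics: int = 50):
        super().__init__()
        self.ntm = NeuralTopicModel(vocab_size, num_topics)
        self.scorer = nn.Linear(sent_dim + num_topics, 1)

    def forward(self, sent_embs: torch.Tensor, doc_bow: torch.Tensor) -> torch.Tensor:
        # sent_embs: (num_sents, sent_dim); doc_bow: (1, vocab_size)
        theta = self.ntm(doc_bow)                      # (1, num_topics)
        theta = theta.expand(sent_embs.size(0), -1)    # broadcast to all sentences
        fused = torch.cat([sent_embs, theta], dim=-1)  # topic-aware representation
        return self.scorer(fused).squeeze(-1)          # one extraction score per sentence
```

Concatenation is the simplest possible fusion; a "comprehensive sentence representation" as described in the abstract could equally use gating or attention over the topic vector.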
Related papers
- Improving Sequence-to-Sequence Models for Abstractive Text Summarization Using Meta Heuristic Approaches [0.0]
Humans have a unique ability to create abstractions.
The use of sequence-to-sequence (seq2seq) models for neural abstractive text summarization has been growing in prevalence.
In this article, we aim to enhance existing architectures and models for abstractive text summarization.
arXiv Detail & Related papers (2024-03-24T17:39:36Z) - Let the Pretrained Language Models "Imagine" for Short Texts Topic Modeling [29.87929724277381]
In short texts, co-occurrence information is minimal, which results in feature sparsity in document representation.
Existing topic models (probabilistic or neural) mostly fail to mine patterns from them to generate coherent topics.
We extend short texts into longer sequences using existing pre-trained language models (PLMs).
arXiv Detail & Related papers (2023-10-24T00:23:30Z) - Long Document Summarization with Top-down and Bottom-up Inference [113.29319668246407]
We propose a principled inference framework to improve summarization models on two aspects.
Our framework assumes a hierarchical latent structure of a document where the top level captures long-range dependencies.
We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
arXiv Detail & Related papers (2022-03-15T01:24:51Z) - Unsupervised Summarization with Customized Granularities [76.26899748972423]
We propose the first unsupervised multi-granularity summarization framework, GranuSum.
By inputting different numbers of events, GranuSum is capable of producing multi-granular summaries in an unsupervised manner.
arXiv Detail & Related papers (2022-01-29T05:56:35Z) - TopNet: Learning from Neural Topic Model to Generate Long Stories [43.5564336855688]
Long story generation (LSG) is one of the coveted goals in natural language processing.
We propose TopNet to obtain high-quality skeleton words to complement the short input.
Our proposed framework is highly effective in skeleton word selection and significantly outperforms state-of-the-art models in both automatic evaluation and human evaluation.
arXiv Detail & Related papers (2021-12-14T09:47:53Z) - Topic Modeling Based Extractive Text Summarization [0.0]
We propose a novel method to summarize a text document by clustering its contents based on latent topics; a rough sketch of this idea appears after this list.
We evaluate our approach on the less commonly used and challenging WikiHow dataset.
arXiv Detail & Related papers (2021-06-29T12:28:19Z) - Unsupervised Summarization for Chat Logs with Topic-Oriented Ranking and Context-Aware Auto-Encoders [59.038157066874255]
We propose a novel framework called RankAE to perform chat summarization without employing manually labeled data.
RankAE consists of a topic-oriented ranking strategy that selects topic utterances according to centrality and diversity simultaneously (an illustrative selection sketch appears after this list).
A denoising auto-encoder is designed to generate succinct but context-informative summaries based on the selected utterances.
arXiv Detail & Related papers (2020-12-14T07:31:17Z) - Enhancing Extractive Text Summarization with Topic-Aware Graph Neural Networks [21.379555672973975]
This paper proposes a graph neural network (GNN)-based extractive summarization model.
Our model integrates a joint neural topic model (NTM) to discover latent topics, which can provide document-level features for sentence selection.
The experimental results demonstrate that our model achieves state-of-the-art results on the CNN/DM and NYT datasets.
arXiv Detail & Related papers (2020-10-13T09:30:04Z) - Topic Adaptation and Prototype Encoding for Few-Shot Visual Storytelling [81.33107307509718]
We propose a topic adaptive storyteller to model the ability of inter-topic generalization.
We also propose a prototype encoding structure to model the ability of intra-topic derivation.
Experimental results show that the topic adaptation and prototype encoding structures mutually benefit the few-shot model.
arXiv Detail & Related papers (2020-08-11T03:55:11Z) - Screenplay Summarization Using Latent Narrative Structure [78.45316339164133]
We propose to explicitly incorporate the underlying structure of narratives into general unsupervised and supervised extractive summarization models.
We formalize narrative structure in terms of key narrative events (turning points) and treat it as latent in order to summarize screenplays.
Experimental results on the CSI corpus of TV screenplays, which we augment with scene-level summarization labels, show that latent turning points correlate with important aspects of a CSI episode.
arXiv Detail & Related papers (2020-04-27T11:54:19Z) - Pre-training for Abstractive Document Summarization by Reinstating Source Text [105.77348528847337]
This paper presents three pre-training objectives which allow us to pre-train a Seq2Seq based abstractive summarization model on unlabeled text.
Experiments on two benchmark summarization datasets show that all three objectives can improve performance upon baselines.
arXiv Detail & Related papers (2020-04-04T05:06:26Z)
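The "Topic Modeling Based Extractive Summarization" entry above summarizes a document by clustering its contents over latent topics. Here is a rough sketch of that idea using scikit-learn's LDA as a stand-in topic model (the paper's actual method, preprocessing, and hyperparameters are not specified here): fit LDA over sentence bag-of-words vectors, then extract the most representative sentence per topic.

```python
# Illustration only: scikit-learn LDA stands in for the paper's topic model;
# num_topics and the preprocessing are arbitrary choices, not the paper's.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation


def summarize_by_topics(sentences: list[str], num_topics: int = 3) -> list[str]:
    """Pick the most topic-representative sentence for each latent topic."""
    vec = CountVectorizer(stop_words="english")
    bow = vec.fit_transform(sentences)                 # (n_sents, vocab)
    lda = LatentDirichletAllocation(n_components=num_topics, random_state=0)
    sent_topics = lda.fit_transform(bow)               # (n_sents, num_topics)
    # For each topic, take the sentence weighted most heavily on that topic,
    # then deduplicate while preserving document order.
    picks = sorted(int(sent_topics[:, t].argmax()) for t in range(num_topics))
    return [sentences[i] for i in dict.fromkeys(picks)]
```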
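The RankAE entry above mentions selecting topic utterances by centrality and diversity simultaneously. As an illustration only, the following MMR-style greedy heuristic trades a centrality score against redundancy with already-selected utterances; RankAE's actual ranking strategy may differ substantially, and the trade-off weight `lam` is an invented parameter.

```python
# Illustration only: an MMR-style greedy heuristic; neither the exact scoring
# nor the weight `lam` is taken from the RankAE paper.
import numpy as np


def select_central_diverse(emb: np.ndarray, k: int, lam: float = 0.7) -> list[int]:
    """emb: (n, d) unit-normalized utterance embeddings; returns k indices."""
    sim = emb @ emb.T               # pairwise cosine similarities
    centrality = sim.mean(axis=1)   # how typical each utterance is overall
    selected: list[int] = []
    candidates = list(range(len(emb)))
    while candidates and len(selected) < k:
        def score(i: int) -> float:
            # Penalize similarity to anything already chosen (diversity).
            redundancy = max(sim[i, j] for j in selected) if selected else 0.0
            return lam * centrality[i] - (1.0 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```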
This list is automatically generated from the titles and abstracts of the papers on this site.