Learning Cluster Patterns for Abstractive Summarization
- URL: http://arxiv.org/abs/2202.10967v2
- Date: Wed, 23 Feb 2022 02:38:10 GMT
- Title: Learning Cluster Patterns for Abstractive Summarization
- Authors: Sung-Guk Jo, Jeong-Jae Kim and Byung-Won On
- Abstract summary: We consider two clusters of salient and non-salient context vectors, so that the decoder can attend more to the salient context vectors during summary generation.
Our experimental results show that the proposed model outperforms the existing BART model by learning these distinct cluster patterns.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nowadays, pre-trained sequence-to-sequence models such as BERTSUM and BART
have shown state-of-the-art results in abstractive summarization. In these
models, during fine-tuning, the encoder transforms sentences to context vectors
in the latent space and the decoder learns the summary generation task based on
the context vectors. In our approach, we consider two clusters of salient and
non-salient context vectors, so that the decoder can attend more to the salient
context vectors during summary generation. To this end, we propose a novel clustering
transformer layer between the encoder and the decoder, which first generates
two clusters of salient and non-salient vectors, and then normalizes and
shrinks the clusters to push them apart in the latent space. Our experimental
results show that the proposed model outperforms the existing BART model by
learning these distinct cluster patterns, improving ROUGE by up to 4% and
BERTScore by 0.3% on average on the CNN/DailyMail and XSum datasets.
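The abstract describes the layer only at this level; as a rough illustration, the following PyTorch sketch shows one way such a clustering layer between encoder and decoder could be wired up. The saliency scorer, the hard 0.5 split, and the shrinkage factor are assumptions for illustration, not the authors' implementation (a differentiable cluster assignment would be needed for end-to-end training).

```python
# Minimal PyTorch sketch of a clustering layer placed between a seq2seq
# encoder and decoder. The saliency scorer, the hard 0.5 split, and the
# shrinkage factor are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClusteringLayer(nn.Module):
    def __init__(self, d_model: int, shrink: float = 0.5):
        super().__init__()
        self.saliency = nn.Linear(d_model, 1)  # scores each context vector
        self.shrink = shrink                   # pull strength toward centroids

    def forward(self, enc_states: torch.Tensor) -> torch.Tensor:
        # enc_states: (batch, seq_len, d_model) context vectors from the encoder.
        normed = F.layer_norm(enc_states, enc_states.shape[-1:])
        mask = (torch.sigmoid(self.saliency(normed)) > 0.5).float()  # (B, T, 1)

        def centroid(m: torch.Tensor) -> torch.Tensor:
            # Mean of the vectors selected by m; clamp avoids division by zero.
            return (normed * m).sum(1, keepdim=True) / m.sum(1, keepdim=True).clamp(min=1e-6)

        # Shrink each vector toward its own cluster centroid, which pushes
        # the salient and non-salient clusters apart in the latent space.
        target = mask * centroid(mask) + (1.0 - mask) * centroid(1.0 - mask)
        return (1.0 - self.shrink) * normed + self.shrink * target

layer = ClusteringLayer(d_model=768)
out = layer(torch.randn(2, 16, 768))  # same shape in and out: (2, 16, 768)
```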
Related papers
- Vector Quantized Wasserstein Auto-Encoder [57.29764749855623]
We study learning deep discrete representations from the generative viewpoint.
We impose discrete distributions over sequences of codewords and learn a deterministic decoder that transports the distribution over codeword sequences to the data distribution.
We develop further theory to connect this with the clustering viewpoint of the Wasserstein (WS) distance, allowing a better and more controllable clustering solution.
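As background for the codeword machinery, here is a minimal NumPy sketch of nearest-codeword quantization, the generic building block behind VQ-style auto-encoders; the codebook size and distance are illustrative, and this does not implement the paper's Wasserstein objective.

```python
# Generic nearest-codeword quantization, the building block behind
# VQ-style auto-encoders. Codebook size and shapes are illustrative;
# this is not the paper's Wasserstein-based training objective.
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(32, 8))   # 32 codewords, 8-dim latent space
z = rng.normal(size=(100, 8))         # 100 continuous encoder outputs

# Assign each latent to its nearest codeword (squared Euclidean distance).
dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (100, 32)
codes = dists.argmin(axis=1)                                   # discrete indices
z_q = codebook[codes]                                          # quantized latents

print(codes[:10], z_q.shape)  # discrete code sequence the decoder consumes
```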
arXiv Detail & Related papers (2023-02-12T13:51:36Z)
- Deep Temporal Contrastive Clustering [21.660509622172274]
This paper presents a deep temporal contrastive clustering approach.
It incorporates the contrastive learning paradigm into deep time series clustering.
Experiments on a variety of time series datasets demonstrate the superiority of our approach over the state-of-the-art.
arXiv Detail & Related papers (2022-12-29T16:43:34Z)
- ClusTR: Exploring Efficient Self-attention via Clustering for Vision Transformers [70.76313507550684]
We propose a content-based sparse attention method, as an alternative to dense self-attention.
Specifically, we cluster and then aggregate key and value tokens, as a content-based method of reducing the total token count.
The resulting clustered-token sequence retains the semantic diversity of the original signal, but can be processed at a lower computational cost.
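A hedged sketch of this idea: cluster the key/value tokens with k-means and attend over the centroids instead of all tokens. The cluster count and the use of scikit-learn's KMeans are illustrative assumptions, not the ClusTR implementation.

```python
# Sketch: reduce attention cost by clustering key/value tokens and
# attending over centroids. Cluster count and library choice are
# illustrative; this is not the ClusTR reference implementation.
import numpy as np
from sklearn.cluster import KMeans

def clustered_attention(q, k, v, n_clusters=16):
    # q: (Tq, d), k/v: (Tk, d) -- single head, no batching, for clarity.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(k)
    k_c = km.cluster_centers_                                   # (C, d)
    # Aggregate values per cluster (mean of each cluster's members).
    v_c = np.stack([v[km.labels_ == c].mean(axis=0) for c in range(n_clusters)])
    scores = q @ k_c.T / np.sqrt(q.shape[-1])                   # (Tq, C) not (Tq, Tk)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)                   # softmax over clusters
    return weights @ v_c                                        # (Tq, d)

rng = np.random.default_rng(0)
out = clustered_attention(rng.normal(size=(64, 32)),
                          rng.normal(size=(256, 32)),
                          rng.normal(size=(256, 32)))
print(out.shape)  # (64, 32): same output shape, far fewer attention scores
```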
arXiv Detail & Related papers (2022-08-28T04:18:27Z)
- Large-Margin Representation Learning for Texture Classification [67.94823375350433]
This paper presents a novel approach combining convolutional layers (CLs) and large-margin metric learning for training supervised models on small datasets for texture classification.
The experimental results on texture and histopathologic image datasets have shown that the proposed approach achieves competitive accuracy with lower computational cost and faster convergence when compared to equivalent CNNs.
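For readers unfamiliar with the term, one common instantiation of large-margin metric learning is a triplet loss that pulls same-class embeddings together and pushes different-class embeddings at least a margin apart; the sketch below uses PyTorch's built-in loss and is a generic illustration, not this paper's exact objective.

```python
# Generic large-margin metric learning via a triplet margin loss; a
# common instantiation, not necessarily the paper's exact objective.
import torch
import torch.nn as nn

loss_fn = nn.TripletMarginLoss(margin=1.0)

anchor   = torch.randn(8, 128, requires_grad=True)  # embeddings of anchor images
positive = torch.randn(8, 128)                      # same-class embeddings
negative = torch.randn(8, 128)                      # different-class embeddings

loss = loss_fn(anchor, positive, negative)
loss.backward()  # gradients pull positives in, push negatives past the margin
print(float(loss))
```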
arXiv Detail & Related papers (2022-06-17T04:07:45Z)
- Enhancing Latent Space Clustering in Multi-filter Seq2Seq Model: A Reinforcement Learning Approach [0.0]
We design a latent-enhanced multi-filter seq2seq model (LMS2S) that analyzes the latent space representations using a clustering algorithm.
Our experiments on semantic parsing and machine translation demonstrate a positive correlation between clustering quality and the model's performance.
arXiv Detail & Related papers (2021-09-25T16:36:31Z)
- Latent Space Energy-Based Model of Symbol-Vector Coupling for Text Generation and Classification [42.5461221309397]
We propose a latent space energy-based prior model for text generation and classification.
The model stands on a generator network that generates the text sequence based on a continuous latent vector.
Our experiments demonstrate that the proposed model learns a well-structured and meaningful latent space.
arXiv Detail & Related papers (2021-08-26T02:31:18Z)
- Clustering by Maximizing Mutual Information Across Views [62.21716612888669]
We propose a novel framework for image clustering that incorporates joint representation learning and clustering.
Our method significantly outperforms state-of-the-art single-stage clustering methods across a variety of image datasets.
arXiv Detail & Related papers (2021-07-24T15:36:49Z)
- Neighborhood Contrastive Learning for Novel Class Discovery [79.14767688903028]
We build a new framework, named Neighborhood Contrastive Learning, to learn discriminative representations that are important to clustering performance.
We experimentally demonstrate that these two ingredients significantly contribute to clustering performance and lead our model to outperform state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2021-06-20T17:34:55Z)
- Variational Auto Encoder Gradient Clustering [0.0]
Clustering using deep neural network models has been extensively studied in recent years.
This article investigates how probability function gradient ascent can be used to process data in order to achieve better clustering.
We propose a simple yet effective method for investigating a suitable number of clusters for data, based on the DBSCAN clustering algorithm.
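One simple way to probe a suitable cluster count with DBSCAN is to sweep its neighborhood radius and record how many clusters emerge; the sketch below illustrates that general idea with scikit-learn and is not the paper's exact procedure.

```python
# Probe a plausible number of clusters by sweeping DBSCAN's eps and
# counting the clusters found (label -1 marks noise). Illustrative only.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

for eps in (0.3, 0.5, 0.8, 1.2):
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    print(f"eps={eps}: {n_clusters} clusters")
```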
arXiv Detail & Related papers (2021-05-11T08:00:36Z)
- Graph Contrastive Clustering [131.67881457114316]
We propose a novel graph contrastive learning framework and apply it to the clustering task, yielding the Graph Contrastive Clustering (GCC) method.
Specifically, on the one hand, a graph Laplacian-based contrastive loss is proposed to learn more discriminative and clustering-friendly features.
On the other hand, a novel graph-based contrastive learning strategy is proposed to learn more compact clustering assignments.
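To make the Laplacian term concrete: for a symmetric adjacency matrix A with Laplacian L, the quadratic form tr(Zᵀ L Z) equals half the sum of A_ij ||z_i - z_j||², so minimizing it pulls connected samples' features together. The sketch below shows this standard term only; it is not GCC's actual loss.

```python
# Generic graph-Laplacian smoothness term: tr(Z^T L Z) is small when
# connected nodes have similar features. Illustration only, not GCC's loss.
import numpy as np

rng = np.random.default_rng(0)
A = (rng.random((10, 10)) > 0.7).astype(float)
A = np.maximum(A, A.T)            # symmetric adjacency matrix
np.fill_diagonal(A, 0)            # no self-loops
L = np.diag(A.sum(1)) - A         # unnormalized graph Laplacian

Z = rng.normal(size=(10, 4))      # node features / embeddings
smoothness = np.trace(Z.T @ L @ Z)  # = 0.5 * sum_ij A_ij * ||z_i - z_j||^2
print(smoothness)
```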
arXiv Detail & Related papers (2021-04-03T15:32:49Z)
- Event-Driven News Stream Clustering using Entity-Aware Contextual Embeddings [14.225334321146779]
We propose a method for online news stream clustering that is a variant of the non-parametric streaming K-means algorithm.
Our model uses a combination of sparse and dense document representations and aggregates document-cluster similarity along these multiple representations.
We show that the use of a suitable fine-tuning objective and external knowledge in pre-trained transformer models yields significant improvements in the effectiveness of contextual embeddings.
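The non-parametric streaming flavor can be illustrated in a few lines: assign each incoming document to its nearest centroid when similarity clears a threshold, otherwise open a new cluster. The threshold and cosine similarity below are illustrative assumptions, not the paper's tuned model.

```python
# Sketch of non-parametric streaming clustering: assign to the nearest
# centroid if similar enough, else start a new cluster. The threshold
# and cosine similarity are illustrative, not the paper's tuned model.
import numpy as np

def stream_cluster(docs, threshold=0.8):
    centroids, counts, labels = [], [], []
    for x in docs:
        x = x / np.linalg.norm(x)
        if centroids:
            sims = [c @ x / np.linalg.norm(c) for c in centroids]
            best = int(np.argmax(sims))
            if sims[best] >= threshold:
                # Running-mean centroid update, as in streaming K-means.
                counts[best] += 1
                centroids[best] += (x - centroids[best]) / counts[best]
                labels.append(best)
                continue
        centroids.append(x.copy())  # open a new cluster for this document
        counts.append(1)
        labels.append(len(centroids) - 1)
    return labels

rng = np.random.default_rng(0)
print(stream_cluster(rng.normal(size=(20, 16)), threshold=0.2))
```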
arXiv Detail & Related papers (2021-01-26T19:58:30Z)