Diversity-Aware Coherence Loss for Improving Neural Topic Models
- URL: http://arxiv.org/abs/2305.16199v2
- Date: Fri, 26 May 2023 09:59:49 GMT
- Title: Diversity-Aware Coherence Loss for Improving Neural Topic Models
- Authors: Raymond Li, Felipe González-Pizarro, Linzi Xing, Gabriel Murray and Giuseppe Carenini
- Abstract summary: We propose a novel diversity-aware coherence loss that encourages the model to learn corpus-level coherence scores.
Experimental results on multiple datasets show that our method significantly improves the performance of neural topic models.
- Score: 20.98172300869239
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The standard approach for neural topic modeling uses a variational
autoencoder (VAE) framework that jointly minimizes the KL divergence between
the estimated posterior and prior, in addition to the reconstruction loss.
Since neural topic models are trained by recreating individual input documents,
they do not explicitly capture the coherence between topic words on the corpus
level. In this work, we propose a novel diversity-aware coherence loss that
encourages the model to learn corpus-level coherence scores while maintaining a
high diversity between topics. Experimental results on multiple datasets show
that our method significantly improves the performance of neural topic models
without requiring any pretraining or additional parameters.
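To situate the proposed loss, the standard neural topic model objective and the point where the new term enters can be sketched as follows. The weighting coefficient $\lambda$ and the exact decomposition are illustrative assumptions; the abstract does not specify the form of the coherence term.

```latex
\mathcal{L}(x) =
\underbrace{-\,\mathbb{E}_{q_\phi(\theta \mid x)}\bigl[\log p_\theta(x \mid \theta)\bigr]}_{\text{reconstruction loss}}
+ \underbrace{\mathrm{KL}\bigl(q_\phi(\theta \mid x) \,\|\, p(\theta)\bigr)}_{\text{posterior--prior divergence}}
+ \lambda\,\mathcal{L}_{\text{div-coh}}
```

Here $\mathcal{L}_{\text{div-coh}}$ stands for the proposed diversity-aware coherence loss, computed at the corpus level rather than per document; per the abstract, it adds no pretraining stage and no extra parameters.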
Related papers
- Dynamic Post-Hoc Neural Ensemblers [55.15643209328513]
In this study, we explore employing neural networks as ensemble methods.
Motivated by the risk of learning low-diversity ensembles, we propose regularizing the model by randomly dropping base model predictions.
We demonstrate that this approach lower-bounds the diversity within the ensemble, reducing overfitting and improving generalization capabilities.
arXiv Detail & Related papers (2024-10-06T15:25:39Z)
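A minimal sketch of the prediction-dropping regularizer described in the entry above, assuming stacked softmax outputs from the base models; the function name, shapes, and drop probability are illustrative and not taken from the paper:

```python
import numpy as np

def aggregate_with_prediction_dropout(base_preds, drop_prob=0.3, rng=None):
    """Randomly mask base-model predictions before aggregation so the
    ensembler cannot come to rely on any single low-diversity subset.
    base_preds: array of shape (n_models, n_samples, n_classes).
    Hypothetical helper; the paper's ensembler is a learned neural network."""
    if rng is None:
        rng = np.random.default_rng()
    n_models = base_preds.shape[0]
    # Keep each base model's prediction with probability 1 - drop_prob,
    # but never drop every model at once.
    keep = rng.random(n_models) >= drop_prob
    if not keep.any():
        keep[rng.integers(n_models)] = True
    # Aggregate over the surviving predictions only; at inference time,
    # all predictions would be used (as with standard dropout).
    return base_preds[keep].mean(axis=0)
```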
- Topic Modeling as Multi-Objective Contrastive Optimization [46.24876966674759]
Recent representation learning approaches enhance neural topic models by optimizing the weighted linear combination of the evidence lower bound (ELBO) of the log-likelihood and the contrastive learning objective that contrasts pairs of input documents.
We introduce a novel contrastive learning method oriented towards sets of topic vectors to capture useful semantics that are shared among a set of input documents.
Our framework consistently produces higher-performing neural topic models in terms of topic coherence, topic diversity, and downstream performance.
arXiv Detail & Related papers (2024-02-12T11:18:32Z)
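The weighted linear combination mentioned above can be written compactly, with $\beta$ a hypothetical trade-off weight:

```latex
\mathcal{L} = \mathcal{L}_{\text{ELBO}} + \beta\,\mathcal{L}_{\text{contrastive}}
```

As the title suggests, the paper moves away from tuning a single fixed $\beta$ and instead treats topic modeling as a multi-objective optimization over these competing terms.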
- Dynamical Hyperspectral Unmixing with Variational Recurrent Neural Networks [25.051918587650636]
Multitemporal hyperspectral unmixing (MTHU) is a fundamental tool in the analysis of hyperspectral image sequences.
We propose an unsupervised MTHU algorithm based on variational recurrent neural networks.
arXiv Detail & Related papers (2023-03-19T04:51:34Z)
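As standard background (not specific to this paper), unmixing is usually posed under a time-indexed linear mixing model:

```latex
\mathbf{y}_t = \mathbf{M}_t \mathbf{a}_t + \mathbf{e}_t,
\qquad \mathbf{a}_t \ge \mathbf{0},
\qquad \mathbf{1}^{\top}\mathbf{a}_t = 1
```

where $\mathbf{y}_t$ is an observed pixel spectrum at time $t$, the columns of $\mathbf{M}_t$ are endmember signatures, $\mathbf{a}_t$ holds the abundances, and $\mathbf{e}_t$ is noise; the variational recurrent network is presumably used to model how these quantities evolve across the sequence.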
- Towards Better Understanding with Uniformity and Explicit Regularization of Embeddings in Embedding-based Neural Topic Models [16.60033525943772]
Embedding-based neural topic models can explicitly represent words and topics by embedding them into a homogeneous feature space.
However, there are no explicit constraints on the training of these embeddings, leaving a large optimization space.
We propose an embedding-regularized neural topic model, which applies specially designed training constraints to the word and topic embeddings.
arXiv Detail & Related papers (2022-06-16T07:02:55Z)
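One concrete instance of an explicit embedding constraint of the kind described above is a uniformity regularizer that spreads embeddings over the unit hypersphere. This is a hedged sketch following the common log-mean-exp uniformity loss; the paper's exact constraints may differ:

```python
import torch

def uniformity_loss(emb: torch.Tensor, t: float = 2.0) -> torch.Tensor:
    """Uniformity regularizer for (word or topic) embeddings.
    emb: (n, d) tensor, assumed L2-normalized onto the unit hypersphere.
    Lower values mean the embeddings are spread more uniformly."""
    sq_dists = torch.cdist(emb, emb, p=2).pow(2)  # pairwise squared distances
    off_diag = ~torch.eye(emb.size(0), dtype=torch.bool, device=emb.device)
    # log-mean-exp of -t * squared distance, over distinct pairs only;
    # minimized when the points spread out uniformly.
    return torch.log(torch.exp(-t * sq_dists[off_diag]).mean())
```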
- Dynamically-Scaled Deep Canonical Correlation Analysis [77.34726150561087]
Canonical Correlation Analysis (CCA) is a method for feature extraction from two views by finding maximally correlated linear projections of them.
We introduce a novel dynamic scaling method for training an input-dependent canonical correlation model.
arXiv Detail & Related papers (2022-03-23T12:52:49Z)
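For reference, the classical CCA objective described above, for views $X$ and $Y$ with covariances $\Sigma_{xx}$, $\Sigma_{yy}$ and cross-covariance $\Sigma_{xy}$:

```latex
\max_{\mathbf{w}_x,\,\mathbf{w}_y}\;
\frac{\mathbf{w}_x^{\top} \Sigma_{xy} \mathbf{w}_y}
     {\sqrt{\mathbf{w}_x^{\top} \Sigma_{xx} \mathbf{w}_x}\,
      \sqrt{\mathbf{w}_y^{\top} \Sigma_{yy} \mathbf{w}_y}}
```

The dynamic-scaling contribution makes the learned projections input-dependent rather than fixed across all samples.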
- Firearm Detection via Convolutional Neural Networks: Comparing a Semantic Segmentation Model Against End-to-End Solutions [68.8204255655161]
Detecting threats such as weapons and aggressive behavior in live video can enable the rapid prevention of potentially deadly incidents.
One way of achieving this is through the use of artificial intelligence and, in particular, machine learning for image analysis.
We compare a traditional monolithic end-to-end deep learning model against a previously proposed model based on an ensemble of simpler neural networks that detect firearms via semantic segmentation.
arXiv Detail & Related papers (2020-12-17T15:19:29Z)
- Sparsely constrained neural networks for model discovery of PDEs [0.0]
We present a modular framework that determines the sparsity pattern of a deep-learning-based surrogate using any sparse regression technique.
We show how different network architectures and sparsity estimators improve model discovery accuracy and convergence on several benchmark examples.
arXiv Detail & Related papers (2020-11-09T11:02:40Z)
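Because the framework above accepts any sparse regression technique, one standard plug-in choice is sequentially thresholded least squares (popularized by SINDy). The sketch below is illustrative, not the paper's specific estimator:

```python
import numpy as np

def stlsq(theta, dudt, threshold=0.1, iters=10):
    """Sequentially thresholded least squares for model discovery.
    theta: (n_samples, n_terms) library of candidate PDE terms.
    dudt:  (n_samples,) time derivatives from the surrogate.
    Returns a sparse coefficient vector selecting the active terms."""
    xi, *_ = np.linalg.lstsq(theta, dudt, rcond=None)
    for _ in range(iters):
        small = np.abs(xi) < threshold   # prune weak coefficients
        xi[small] = 0.0
        big = ~small
        if big.any():
            # refit using only the surviving candidate terms
            xi[big], *_ = np.linalg.lstsq(theta[:, big], dudt, rcond=None)
    return xi
```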
- Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
The resulting low-quality reconstruction is then improved with a second deep generative model trained to capture the correlated latent variables that the first stage misses.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
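A common instance of the aggregate-posterior penalty mentioned above is a total-correlation term (as in beta-TC-VAE), shown here as background since the first stage can use any penalty-based method:

```latex
\mathcal{L} = \mathcal{L}_{\text{VAE}}
+ \gamma\,\mathrm{KL}\Bigl(q(\mathbf{z}) \,\Big\|\, \prod_{j} q(z_j)\Bigr)
```

where $q(\mathbf{z})$ is the aggregate posterior. Increasing $\gamma$ encourages statistical independence of the latent factors at the cost of reconstruction quality, which is precisely the gap the second-stage generative model is meant to close.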
- A Discrete Variational Recurrent Topic Model without the Reparametrization Trick [16.54912614895861]
We show how to learn a neural topic model with discrete random variables.
We show improved perplexity and document understanding across multiple corpora.
arXiv Detail & Related papers (2020-10-22T20:53:44Z)
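Forgoing the reparametrization trick with discrete latent variables typically means relying on a score-function (REINFORCE-style) gradient estimator, shown here as standard background rather than as the paper's exact estimator:

```latex
\nabla_{\phi}\, \mathbb{E}_{q_{\phi}(z)}\bigl[f(z)\bigr]
= \mathbb{E}_{q_{\phi}(z)}\bigl[f(z)\, \nabla_{\phi} \log q_{\phi}(z)\bigr]
```

The estimator is unbiased but high-variance, so variance-reduction techniques such as control variates are usually needed in practice.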
- Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, truncated max-product belief propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs).
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
arXiv Detail & Related papers (2020-03-13T13:11:35Z)
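The operation a BP-Layer of this kind unrolls is the max-product message update, written here in the log domain; "truncated" means running only a fixed, small number of such updates:

```latex
m_{i \to j}(x_j) = \max_{x_i} \Bigl[
\theta_i(x_i) + \theta_{ij}(x_i, x_j)
+ \sum_{k \in \mathcal{N}(i) \setminus \{j\}} m_{k \to i}(x_i)
\Bigr]
```

where $\theta_i$ and $\theta_{ij}$ are the unary and pairwise potentials and $\mathcal{N}(i)$ denotes the neighbors of node $i$.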
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.