Probabilistic Topic Modelling with Transformer Representations
- URL: http://arxiv.org/abs/2403.03737v1
- Date: Wed, 6 Mar 2024 14:27:29 GMT
- Title: Probabilistic Topic Modelling with Transformer Representations
- Authors: Arik Reuter, Anton Thielmann, Christoph Weisser, Benjamin S\"afken,
Thomas Kneib
- Abstract summary: We propose the Transformer-Representation Neural Topic Model (TNTM)
This approach unifies the powerful and versatile notion of topics based on transformer embeddings with fully probabilistic modelling.
Experimental results show that our proposed model achieves results on par with various state-of-the-art approaches in terms of embedding coherence.
- Score: 0.9999629695552195
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Topic modelling was mostly dominated by Bayesian graphical models during the
last decade. With the rise of transformers in Natural Language Processing,
however, several successful models that rely on straightforward clustering
approaches in transformer-based embedding spaces have emerged and consolidated
the notion of topics as clusters of embedding vectors. We propose the
Transformer-Representation Neural Topic Model (TNTM), which combines the
benefits of topic representations in transformer-based embedding spaces and
probabilistic modelling. Therefore, this approach unifies the powerful and
versatile notion of topics based on transformer embeddings with fully
probabilistic modelling, as in models such as Latent Dirichlet Allocation
(LDA). We utilize the variational autoencoder (VAE) framework for improved
inference speed and modelling flexibility. Experimental results show that our
proposed model achieves results on par with various state-of-the-art approaches
in terms of embedding coherence while maintaining almost perfect topic
diversity. The corresponding source code is available at
https://github.com/ArikReuter/TNTM.
Related papers
- Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity [11.302828987873497]
We present a Cross-Architecture Layerwise Distillation (CALD) approach that jointly converts a transformer model to a linear time substitute and fine-tunes it to a target task.
We show that CALD can effectively recover the result of the original model, and that the guiding strategy contributes to the result.
arXiv Detail & Related papers (2024-10-09T13:06:43Z) - FASTopic: Pretrained Transformer is a Fast, Adaptive, Stable, and Transferable Topic Model [76.509837704596]
We propose FASTopic, a fast, adaptive, stable, and transferable topic model.
We use Dual Semantic-relation Reconstruction (DSR) to model latent topics.
We also propose Embedding Transport Plan (ETP) to regularize semantic relations as optimal transport plans.
arXiv Detail & Related papers (2024-05-28T09:06:38Z) - Probabilistic Transformer: A Probabilistic Dependency Model for
Contextual Word Representation [52.270712965271656]
We propose a new model of contextual word representation, not from a neural perspective, but from a purely syntactic and probabilistic perspective.
We find that the graph of our model resembles transformers, with correspondences between dependencies and self-attention.
Experiments show that our model performs competitively to transformers on small to medium sized datasets.
arXiv Detail & Related papers (2023-11-26T06:56:02Z) - Converting Transformers to Polynomial Form for Secure Inference Over
Homomorphic Encryption [45.00129952368691]
Homomorphic Encryption (HE) has emerged as one of the most promising approaches in deep learning.
We introduce the first transformer, providing the first demonstration of secure inference over HE with transformers.
Our models yield results comparable to traditional methods, bridging the performance gap with transformers of similar scale and underscoring the viability of HE for state-of-the-art applications.
arXiv Detail & Related papers (2023-11-15T00:23:58Z) - VTAE: Variational Transformer Autoencoder with Manifolds Learning [144.0546653941249]
Deep generative models have demonstrated successful applications in learning non-linear data distributions through a number of latent variables.
The nonlinearity of the generator implies that the latent space shows an unsatisfactory projection of the data space, which results in poor representation learning.
We show that geodesics and accurate computation can substantially improve the performance of deep generative models.
arXiv Detail & Related papers (2023-04-03T13:13:19Z) - GAMMT: Generative Ambiguity Modeling Using Multiple Transformers [0.0]
We introduce a novel model called GAMMT (Generative Ambiguity Models using Multiple Transformers) for sequential data that is based on sets of probabilities.
Our approach acknowledges that the data generation process of a sequence is not deterministic, but rather ambiguous and influenced by a set of probabilities.
arXiv Detail & Related papers (2022-11-16T06:24:26Z) - N-Grammer: Augmenting Transformers with latent n-grams [35.39961549040385]
We propose a simple yet effective modification to the Transformer architecture inspired by the literature in statistical language modeling, by augmenting the model with n-grams that are constructed from a discrete latent representation of the text sequence.
We evaluate our model, the N-Grammer on language modeling on the C4 data-set as well as text classification on the SuperGLUE data-set, and find that it outperforms several strong baselines such as the Transformer and the Primer.
arXiv Detail & Related papers (2022-07-13T17:18:02Z) - Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the ( aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z) - A Discrete Variational Recurrent Topic Model without the
Reparametrization Trick [16.54912614895861]
We show how to learn a neural topic model with discrete random variables.
We show improved perplexity and document understanding across multiple corpora.
arXiv Detail & Related papers (2020-10-22T20:53:44Z) - Improving Neural Topic Models using Knowledge Distillation [84.66983329587073]
We use knowledge distillation to combine the best attributes of probabilistic topic models and pretrained transformers.
Our modular method can be straightforwardly applied with any neural topic model to improve topic quality.
arXiv Detail & Related papers (2020-10-05T22:49:16Z) - Variational Transformers for Diverse Response Generation [71.53159402053392]
Variational Transformer (VT) is a variational self-attentive feed-forward sequence model.
VT combines the parallelizability and global receptive field computation of the Transformer with the variational nature of the CVAE.
We explore two types of VT: 1) modeling the discourse-level diversity with a global latent variable; and 2) augmenting the Transformer decoder with a sequence of finegrained latent variables.
arXiv Detail & Related papers (2020-03-28T07:48:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.