Is Neural Topic Modelling Better than Clustering? An Empirical Study on Clustering with Contextual Embeddings for Topics
- URL: http://arxiv.org/abs/2204.09874v1
- Date: Thu, 21 Apr 2022 04:26:51 GMT
- Authors: Zihan Zhang, Meng Fang, Ling Chen, Mohammad-Reza Namazi-Rad
- Abstract summary: Recent work incorporates pre-trained word embeddings into Neural Topic Models (NTMs). In this paper, we conduct thorough experiments showing that directly clustering high-quality sentence embeddings with an appropriate word selection method can generate more coherent and diverse topics than NTMs.
- Score: 28.13990734234436
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work incorporates pre-trained word embeddings such as BERT embeddings into Neural Topic Models (NTMs), generating highly coherent topics. However, with high-quality contextualized document representations, do we really need sophisticated neural models to obtain coherent and interpretable topics? In this paper, we conduct thorough experiments showing that directly clustering high-quality sentence embeddings with an appropriate word selection method can generate more coherent and diverse topics than NTMs, while also being simpler and more efficient.
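
As a rough illustration of the recipe the abstract describes, the sketch below clusters sentence embeddings with k-means and selects topic words with a c-TF-IDF-style heuristic. It assumes the sentence-transformers and scikit-learn libraries; the encoder name and the word-selection heuristic are illustrative assumptions, not necessarily the paper's exact configuration.

```python
# Sketch only: topics via clustered sentence embeddings; details are assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def cluster_topics(docs, n_topics=10, top_n=10):
    # 1. Embed each document with a pre-trained sentence encoder.
    embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(docs)
    # 2. Cluster the embeddings; each cluster is treated as one topic.
    labels = KMeans(n_clusters=n_topics, n_init=10).fit_predict(embeddings)
    # 3. Word selection: merge each cluster's documents into one pseudo-document
    #    and score words by TF-IDF over those pseudo-documents, so a topic's
    #    words are frequent in its own cluster but rare in the others.
    merged = [" ".join(d for d, l in zip(docs, labels) if l == k)
              for k in range(n_topics)]
    vec = TfidfVectorizer(stop_words="english")
    scores = vec.fit_transform(merged).toarray()
    vocab = np.array(vec.get_feature_names_out())
    return [vocab[np.argsort(scores[k])[::-1][:top_n]].tolist()
            for k in range(n_topics)]
```

The paper's point is that a pipeline of roughly this shape, given a good encoder and word selection method, already rivals NTMs on topic coherence and diversity.
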
Related papers
- Enhanced Short Text Modeling: Leveraging Large Language Models for Topic Refinement [7.6115889231452964]
We introduce a novel approach termed "Topic Refinement".
This approach does not take part in the initial modeling of topics; instead, it focuses on improving topics after they have been mined.
By employing prompt engineering, we direct LLMs to eliminate off-topic words within a given topic, ensuring that only contextually relevant words are preserved or substituted with ones that fit better semantically.
arXiv Detail & Related papers (2024-03-26T13:50:34Z)
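
To make the prompt-engineering idea in the entry above concrete, here is a minimal, hypothetical sketch; the prompt wording and the call_llm hook are illustrative assumptions, not the paper's actual prompts or interface.

```python
# Hypothetical sketch of LLM-based topic refinement; not the paper's prompts.
def refine_topic(topic_words, call_llm):
    """call_llm: any function str -> str that queries an LLM of your choice."""
    prompt = (
        "These words were mined as one topic: " + ", ".join(topic_words) + ".\n"
        "Remove words that are off-topic and, where possible, replace them "
        "with semantically better-fitting ones. Reply with a comma-separated "
        "word list only."
    )
    return [w.strip() for w in call_llm(prompt).split(",") if w.strip()]

# e.g. refine_topic(["apple", "pear", "carburetor", "banana"], my_llm)
```
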
- HyperMiner: Topic Taxonomy Mining with Hyperbolic Embedding [54.52651110749165]
We present a novel framework that introduces hyperbolic embeddings to represent words and topics.
With the tree-likeness property of hyperbolic space, the underlying semantic hierarchy can be better exploited to mine more interpretable topics.
arXiv Detail & Related papers (2022-10-16T02:54:17Z)
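
For background on the entry above: the "tree-likeness" of hyperbolic space shows up in the Poincaré-ball distance, which blows up near the boundary of the ball, letting hierarchies embed with low distortion. A minimal sketch of that standard distance (not HyperMiner's full model):

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between points u, v in the Poincare ball (norms < 1).
    General topics can sit near the origin and specific ones near the
    boundary, mirroring a topic taxonomy."""
    sq_dist = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + 2.0 * sq_dist / (denom + eps))
```
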
- Knowledge-Aware Bayesian Deep Topic Model [50.58975785318575]
We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling.
Our proposed model efficiently integrates the prior knowledge and improves both hierarchical topic discovery and document representation.
arXiv Detail & Related papers (2022-09-20T09:16:05Z)
- TopicNet: Semantic Graph-Guided Topic Discovery [51.71374479354178]
Existing deep hierarchical topic models are able to extract semantically meaningful topics from a text corpus in an unsupervised manner.
We introduce TopicNet as a deep hierarchical topic model that can inject prior structural knowledge as an inductive bias to influence learning.
arXiv Detail & Related papers (2021-10-27T09:07:14Z)
- Obtaining Better Static Word Embeddings Using Contextual Embedding Models [53.86080627007695]
Our proposed distillation method is a simple extension of CBOW-based training.
As a side-effect, our approach also allows a fair comparison of both contextual and static embeddings.
arXiv Detail & Related papers (2021-06-08T12:59:32Z)
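
The summary above is terse, so as background: a simple way to distil a static vector from a contextual model is to average a word's contextual embeddings over many sentences. The sketch below shows that common baseline, not the paper's CBOW-based extension; the model and tokenizer names are ordinary Hugging Face defaults.

```python
# Baseline sketch: static vector = average of a word's contextual embeddings.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def static_vector(word, contexts):
    """Average the word's contextual embeddings over all given sentences."""
    piece_ids = tok(word, add_special_tokens=False)["input_ids"]
    vecs = []
    for sent in contexts:
        enc = tok(sent, return_tensors="pt")
        ids = enc["input_ids"][0].tolist()
        for i in range(len(ids) - len(piece_ids) + 1):
            if ids[i:i + len(piece_ids)] == piece_ids:  # locate the word's pieces
                with torch.no_grad():
                    hidden = model(**enc).last_hidden_state[0]
                vecs.append(hidden[i:i + len(piece_ids)].mean(dim=0))
                break
    return torch.stack(vecs).mean(dim=0) if vecs else None
```
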
- Be More with Less: Hypergraph Attention Networks for Inductive Text Classification [56.98218530073927]
Graph neural networks (GNNs) have received increasing attention in the research community and have demonstrated promising results on this canonical task.
Despite this success, their performance can be largely jeopardized in practice because they cannot capture high-order interactions between words.
We propose a principled model, hypergraph attention networks (HyperGAT), which obtains more expressive power with less computational cost for text representation learning.
arXiv Detail & Related papers (2020-11-01T00:21:59Z)
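
A heavily simplified sketch of the two attention steps the HyperGAT entry describes (words attend into hyperedges, then hyperedges attend back into words); the layer sizes and scoring functions here are assumptions, not the paper's exact architecture.

```python
# Simplified hypergraph attention layer; assumes every word is in at least
# one hyperedge and every hyperedge is non-empty (else softmax yields NaNs).
import torch
import torch.nn as nn

class SimpleHyperGATLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.node_proj = nn.Linear(dim, dim)
        self.edge_proj = nn.Linear(dim, dim)
        self.node_att = nn.Linear(dim, 1)       # word -> hyperedge relevance
        self.edge_att = nn.Linear(2 * dim, 1)   # hyperedge -> word relevance

    def forward(self, x, incidence):
        # x: (N, dim) word features; incidence: (E, N) binary membership.
        mask = incidence == 0
        h = torch.tanh(self.node_proj(x))
        # Step 1: each hyperedge aggregates its member words by attention.
        s = self.node_att(h).squeeze(-1).unsqueeze(0).expand_as(mask)
        alpha = s.masked_fill(mask, float("-inf")).softmax(dim=1)    # (E, N)
        edges = torch.tanh(self.edge_proj(alpha @ h))                # (E, dim)
        # Step 2: each word aggregates its hyperedges by attention.
        pair = torch.cat([edges.unsqueeze(1).expand(-1, x.size(0), -1),
                          h.unsqueeze(0).expand(edges.size(0), -1, -1)], dim=-1)
        beta = self.edge_att(pair).squeeze(-1).masked_fill(mask, float("-inf"))
        beta = beta.softmax(dim=0)                                   # (E, N)
        return (beta.unsqueeze(-1) * edges.unsqueeze(1)).sum(dim=0)  # (N, dim)
```
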
- Improving Neural Topic Models using Knowledge Distillation [84.66983329587073]
We use knowledge distillation to combine the best attributes of probabilistic topic models and pretrained transformers.
Our modular method can be straightforwardly applied with any neural topic model to improve topic quality.
arXiv Detail & Related papers (2020-10-05T22:49:16Z)
- Neural Topic Model via Optimal Transport [24.15046280736009]
We present a new neural topic model based on the theory of optimal transport (OT).
Specifically, we propose to learn the topic distribution of a document by directly minimising its OT distance to the document's word distributions.
Our proposed model can be trained efficiently with a differentiable loss.
arXiv Detail & Related papers (2020-08-12T06:37:09Z)
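
As a worked illustration of the OT objective described above: the OT distance between a document's topic distribution (length K) and its word distribution (length V) can be approximated with entropic regularisation via standard Sinkhorn iterations. The sketch shows that generic algorithm, not the paper's training loop; in the paper's setting the cost matrix would encode topic-word dissimilarities.

```python
import numpy as np

def sinkhorn_ot(a, b, cost, eps=0.05, n_iters=200):
    """Entropic-regularised OT distance between histograms a (K,) and b (V,)
    under cost matrix cost (K, V)."""
    G = np.exp(-cost / eps)              # Gibbs kernel
    v = np.ones_like(b, dtype=float)
    for _ in range(n_iters):
        u = a / (G @ v)                  # rescale rows to match marginal a
        v = b / (G.T @ u)                # rescale columns to match marginal b
    plan = u[:, None] * G * v[None, :]   # approximate optimal transport plan
    return float((plan * cost).sum())    # total transport cost
```
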
- Context Reinforced Neural Topic Modeling over Short Texts [15.487822291146689]
We propose a Context Reinforced Neural Topic Model (CRNTM).
CRNTM infers the topic for each word in a narrow range by assuming that each short text covers only a few salient topics.
Experiments on two benchmark datasets validate the effectiveness of the proposed model on both topic discovery and text classification.
arXiv Detail & Related papers (2020-08-11T06:41:53Z)
- Tired of Topic Models? Clusters of Pretrained Word Embeddings Make for Fast and Good Topics too! [5.819224524813161]
We propose an alternative way to obtain topics: clustering pre-trained word embeddings while incorporating document information for weighted clustering and reranking top words.
The best-performing combination for our approach performs as well as classical topic models, but with lower runtime and computational complexity.
arXiv Detail & Related papers (2020-04-30T16:18:18Z)
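
A rough sketch of the recipe in the entry above, with random vectors standing in for real pre-trained word embeddings and raw corpus frequencies standing in for the document information; the reranking heuristic is an assumption, and the paper's weighting scheme is richer than this.

```python
# Sketch: topics from clustered word embeddings plus frequency-based reranking.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
vocab = np.array([f"word{i}" for i in range(500)])  # placeholder vocabulary
emb = rng.normal(size=(500, 100))                   # stand-in for pre-trained vectors
freq = rng.integers(1, 1000, size=500)              # stand-in corpus frequencies

km = KMeans(n_clusters=20, n_init=10).fit(emb)
for k in range(20):
    members = np.where(km.labels_ == k)[0]
    # Rerank a cluster's words: close to the centroid AND frequent in the corpus.
    dist = np.linalg.norm(emb[members] - km.cluster_centers_[k], axis=1)
    score = freq[members] / (1.0 + dist)
    print(k, vocab[members[np.argsort(score)[::-1][:10]]].tolist())
```
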
- Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence [29.874072827824627]
We find that our approach produces more meaningful and coherent topics than traditional bag-of-words topic models and recent neural models.
Our results indicate that future improvements in language models will translate into better topic models.
arXiv Detail & Related papers (2020-04-08T12:37:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.