Enhancing Topic Interpretability for Neural Topic Modeling through Topic-wise Contrastive Learning
- URL: http://arxiv.org/abs/2412.17338v1
- Date: Mon, 23 Dec 2024 07:07:06 GMT
- Title: Enhancing Topic Interpretability for Neural Topic Modeling through Topic-wise Contrastive Learning
- Authors: Xin Gao, Yang Lin, Ruiqing Li, Yasha Wang, Xu Chu, Xinyu Ma, Hailong Yu
- Abstract summary: Overemphasizing likelihood maximization without incorporating topic regularization can lead to an overly expansive latent space for topic modeling.
We propose a novel NTM framework, named ContraTopic, that integrates a differentiable regularizer capable of evaluating multiple facets of topic interpretability.
Our approach consistently produces topics with superior interpretability compared to state-of-the-art NTMs.
- Score: 23.816433328623397
- License:
- Abstract: Data mining and knowledge discovery are essential aspects of extracting valuable insights from vast datasets. Neural topic models (NTMs) have emerged as a valuable unsupervised tool in this field. However, the predominant objective in NTMs, which aims to discover topics maximizing data likelihood, often lacks alignment with the central goals of data mining and knowledge discovery which is to reveal interpretable insights from large data repositories. Overemphasizing likelihood maximization without incorporating topic regularization can lead to an overly expansive latent space for topic modeling. In this paper, we present an innovative approach to NTMs that addresses this misalignment by introducing contrastive learning measures to assess topic interpretability. We propose a novel NTM framework, named ContraTopic, that integrates a differentiable regularizer capable of evaluating multiple facets of topic interpretability throughout the training process. Our regularizer adopts a unique topic-wise contrastive methodology, fostering both internal coherence within topics and clear external distinctions among them. Comprehensive experiments conducted on three diverse datasets demonstrate that our approach consistently produces topics with superior interpretability compared to state-of-the-art NTMs.
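As an illustrative sketch only (not ContraTopic's exact objective), a topic-wise contrastive regularizer of this kind might score internal coherence as similarity among each topic's top words and external distinction as an InfoNCE-style contrast between topic vectors. All names and parameters below (`beta`, `word_emb`, `top_k`, `tau`) are assumptions for the sketch:

```python
import numpy as np

def topic_regularizer(beta, word_emb, top_k=5, tau=0.5):
    """Illustrative topic-wise contrastive regularizer (a sketch, not
    ContraTopic's exact formulation).

    beta:     (K, V) topic-word weight matrix (K topics, V vocab words).
    word_emb: (V, D) word embeddings used to judge coherence.
    """
    K, _ = beta.shape
    # Internal coherence: mean pairwise cosine similarity among each
    # topic's top-k words -- higher means a more coherent topic.
    coherence = 0.0
    for k in range(K):
        idx = np.argsort(beta[k])[-top_k:]
        E = word_emb[idx]
        E = E / np.linalg.norm(E, axis=1, keepdims=True)
        coherence += (E @ E.T).mean()
    coherence /= K
    # External distinction: InfoNCE-style contrast over topic vectors,
    # treating each topic as its own positive and all others as negatives.
    T = beta / np.linalg.norm(beta, axis=1, keepdims=True)
    logits = (T @ T.T) / tau
    log_z = np.log(np.exp(logits).sum(axis=1))
    distinction = (log_z - np.diag(logits)).mean()
    # Minimizing this pushes topics to be internally coherent and
    # mutually distinct.
    return distinction - coherence
```

Minimizing the distinction term shrinks off-diagonal topic similarities, so near-duplicate topics incur a larger penalty than well-separated ones; because the term is differentiable, it can be added to a neural topic model's training loss as the abstract describes.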
Related papers
- Self-Supervised Learning for Neural Topic Models with Variance-Invariance-Covariance Regularization [12.784397404903142]
We propose a self-supervised neural topic model (NTM) that combines the power of NTMs and regularized self-supervised learning methods to improve performance.
NTMs use neural networks to learn latent topics hidden behind the words in documents.
Our models outperformed baselines and state-of-the-art models both quantitatively and qualitatively.
arXiv Detail & Related papers (2025-02-14T06:47:37Z)
- Bidirectional Topic Matching: Quantifying Thematic Overlap Between Corpora Through Topic Modelling [0.0]
Bidirectional Topic Matching (BTM) is a novel method for cross-corpus topic modeling that quantifies thematic overlap and divergence between corpora.
BTM employs a dual-model approach, training separate topic models for each corpus and applying them reciprocally to enable comprehensive cross-corpus comparisons.
arXiv Detail & Related papers (2024-12-24T12:02:43Z)
- Coding for Intelligence from the Perspective of Category [66.14012258680992]
Coding targets compressing and reconstructing data, while intelligence centers on model learning and prediction.
Recent trends demonstrate the potential homogeneity of these two fields.
We propose a novel problem of Coding for Intelligence from the perspective of category theory.
arXiv Detail & Related papers (2024-07-01T07:05:44Z)
- GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models [56.63218531256961]
We introduce GenBench, a benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models.
GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies.
We provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance.
arXiv Detail & Related papers (2024-06-01T08:01:05Z)
- Topic Modeling as Multi-Objective Contrastive Optimization [46.24876966674759]
Recent representation learning approaches enhance neural topic models by optimizing the weighted linear combination of the evidence lower bound (ELBO) of the log-likelihood and the contrastive learning objective that contrasts pairs of input documents.
We introduce a novel contrastive learning method oriented towards sets of topic vectors to capture useful semantics that are shared among a set of input documents.
Our framework consistently produces higher-performing neural topic models in terms of topic coherence, topic diversity, and downstream performance.
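The weighted linear combination that this line of work starts from can be written as a one-line loss; the function and parameter names below are illustrative, not from the paper:

```python
def total_loss(elbo, contrastive, lam=1.0):
    # The ELBO is maximized, so its negative is minimized alongside the
    # weighted contrastive term; lam trades off the two objectives.
    return -elbo + lam * contrastive
```

A fixed `lam` can let one objective dominate the other, which is what motivates treating topic modeling as multi-objective optimization over sets of topic vectors instead.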
arXiv Detail & Related papers (2024-02-12T11:18:32Z)
- Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset and Comprehensive Framework [51.44863255495668]
Multimodal reasoning is a critical component in the pursuit of artificial intelligence systems that exhibit human-like intelligence.
We present the Multi-Modal Reasoning (COCO-MMR) dataset, a novel dataset that encompasses an extensive collection of open-ended questions.
We propose innovative techniques, including multi-hop cross-modal attention and sentence-level contrastive learning, to enhance the image and text encoders.
arXiv Detail & Related papers (2023-07-24T08:58:25Z)
- vONTSS: vMF based semi-supervised neural topic modeling with optimal transport [6.874745415692134]
This work presents a semi-supervised neural topic modeling method, vONTSS, which uses von Mises-Fisher (vMF) based variational autoencoders and optimal transport.
Experiments show that vONTSS outperforms existing semi-supervised topic modeling methods in classification accuracy and diversity.
It is also much faster than the state-of-the-art weakly supervised text classification method while achieving similar classification performance.
arXiv Detail & Related papers (2023-07-03T04:23:41Z)
- Knowledge-Aware Bayesian Deep Topic Model [50.58975785318575]
We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling.
Our proposed model efficiently integrates the prior knowledge and improves both hierarchical topic discovery and document representation.
arXiv Detail & Related papers (2022-09-20T09:16:05Z)
- Variational Distillation for Multi-View Learning [104.17551354374821]
We design several variational information bottlenecks to exploit two key characteristics for multi-view representation learning.
Under rigorous theoretical guarantees, our approach enables IB to grasp the intrinsic correlation between observations and semantic labels.
arXiv Detail & Related papers (2022-06-20T03:09:46Z)
- Neural Topic Modeling with Deep Mutual Information Estimation [23.474848535821994]
We propose a neural topic model which incorporates deep mutual information estimation.
NTM-DMIE is a neural network method for topic learning.
We evaluate NTM-DMIE on several metrics, including text clustering accuracy, topic representation, topic uniqueness, and topic coherence.
arXiv Detail & Related papers (2022-03-12T01:08:10Z)
- A Variational Information Bottleneck Approach to Multi-Omics Data Integration [98.6475134630792]
We propose a deep variational information bottleneck (IB) approach for incomplete multi-view observations.
Our method applies the IB framework on marginal and joint representations of the observed views to focus on intra-view and inter-view interactions that are relevant for the target.
Experiments on real-world datasets show that our method consistently achieves gain from data integration and outperforms state-of-the-art benchmarks.
arXiv Detail & Related papers (2021-02-05T06:05:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.