Topic Analysis for Text with Side Data
- URL: http://arxiv.org/abs/2203.00762v1
- Date: Tue, 1 Mar 2022 22:06:30 GMT
- Title: Topic Analysis for Text with Side Data
- Authors: Biyi Fang, Kripa Rajshekhar, Diego Klabjan
- Abstract summary: We introduce a hybrid generative probabilistic model that combines a neural network with a latent topic model.
In the model, each document is modeled as a finite mixture over an underlying set of topics.
Each topic is modeled as an infinite mixture over an underlying set of topic probabilities.
- Score: 18.939336393665553
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Although latent factor models (e.g., matrix factorization) obtain good
performance in predictions, they suffer from several problems including
cold-start, non-transparency, and suboptimal recommendations. In this paper, we
employ text with side data to tackle these limitations. We introduce a hybrid
generative probabilistic model that combines a neural network with a latent
topic model, which is a four-level hierarchical Bayesian model. In the model,
each document is modeled as a finite mixture over an underlying set of topics
and each topic is modeled as an infinite mixture over an underlying set of
topic probabilities. Furthermore, each topic probability is modeled as a finite
mixture over side data. In the context of text, the neural network produces a
distribution over the side data for the corresponding text, which serves as the
prior distribution in LDA and guides topic grouping. The approach is
evaluated on several different datasets, where the model is shown to outperform
standard LDA and Dirichlet-multinomial regression (DMR) in terms of topic
grouping, model perplexity, classification and comment generation.
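The four-level generative process described in the abstract can be illustrated with a minimal sketch. All names, shapes, and the single linear layer standing in for the neural network are assumptions for illustration, not the paper's exact architecture; the key idea shown is that side data determines the Dirichlet prior over a document's topic mixture.

```python
import numpy as np

rng = np.random.default_rng(0)
n_topics, vocab_size, side_dim = 5, 100, 8

# Hypothetical stand-in for the neural network: one linear layer mapping
# a document's side data to positive Dirichlet concentration parameters.
W = rng.normal(size=(side_dim, n_topics))

def side_data_prior(side_data):
    """Map side data to a positive vector used as the LDA prior (alpha)."""
    logits = side_data @ W
    return np.exp(logits - logits.max())  # strictly positive

def generate_document(side_data, doc_len=50):
    alpha = side_data_prior(side_data)
    theta = rng.dirichlet(alpha)                          # doc-topic mixture
    beta = rng.dirichlet(np.ones(vocab_size), n_topics)   # topic-word dists
    words = []
    for _ in range(doc_len):
        z = rng.choice(n_topics, p=theta)      # sample a topic for the word
        words.append(rng.choice(vocab_size, p=beta[z]))  # sample the word
    return words

doc = generate_document(rng.normal(size=side_dim))
```

Compared with standard LDA, the only change in this sketch is that `alpha` is a function of the side data rather than a fixed hyperparameter.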
Related papers
- Latent Semantic Consensus For Deterministic Geometric Model Fitting [109.44565542031384]
We propose an effective method called Latent Semantic Consensus (LSC).
LSC formulates model fitting as a problem in two latent semantic spaces built from the data points and the model hypotheses.
LSC is able to provide consistent and reliable solutions within only a few milliseconds for general multi-structural model fitting.
arXiv Detail & Related papers (2024-03-11T05:35:38Z) - Let the Pretrained Language Models "Imagine" for Short Texts Topic Modeling [29.87929724277381]
In short texts, co-occurrence information is minimal, which results in feature sparsity in document representation.
Existing topic models (probabilistic or neural) mostly fail to mine patterns from them to generate coherent topics.
We extend short texts into longer sequences using existing pre-trained language models (PLMs).
arXiv Detail & Related papers (2023-10-24T00:23:30Z) - Neural Dynamic Focused Topic Model [2.9005223064604078]
We leverage recent advances in neural variational inference and present an alternative neural approach to the dynamic Focused Topic Model.
We develop a neural model for topic evolution which exploits sequences of Bernoulli random variables in order to track the appearances of topics.
arXiv Detail & Related papers (2023-01-26T08:37:34Z) - Data-IQ: Characterizing subgroups with heterogeneous outcomes in tabular data [81.43750358586072]
We propose Data-IQ, a framework to systematically stratify examples into subgroups with respect to their outcomes.
We experimentally demonstrate the benefits of Data-IQ on four real-world medical datasets.
arXiv Detail & Related papers (2022-10-24T08:57:55Z) - Sawtooth Factorial Topic Embeddings Guided Gamma Belief Network [49.458250193768826]
We propose sawtooth factorial topic embedding guided GBN, a deep generative model of documents.
Both the words and topics are represented as embedding vectors of the same dimension.
Our models outperform other neural topic models on extracting deeper interpretable topics.
arXiv Detail & Related papers (2021-06-30T10:14:57Z) - Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z) - Context Reinforced Neural Topic Modeling over Short Texts [15.487822291146689]
We propose a Context Reinforced Neural Topic Model (CRNTM).
CRNTM infers the topic for each word in a narrow range by assuming that each short text covers only a few salient topics.
Experiments on two benchmark datasets validate the effectiveness of the proposed model on both topic discovery and text classification.
arXiv Detail & Related papers (2020-08-11T06:41:53Z) - Topic Adaptation and Prototype Encoding for Few-Shot Visual Storytelling [81.33107307509718]
We propose a topic adaptive storyteller to model the ability of inter-topic generalization.
We also propose a prototype encoding structure to model the ability of intra-topic derivation.
Experimental results show that topic adaptation and prototype encoding structure mutually bring benefit to the few-shot model.
arXiv Detail & Related papers (2020-08-11T03:55:11Z) - Model Fusion with Kullback--Leibler Divergence [58.20269014662046]
We propose a method to fuse posterior distributions learned from heterogeneous datasets.
Our algorithm relies on a mean field assumption for both the fused model and the individual dataset posteriors.
arXiv Detail & Related papers (2020-07-13T03:27:45Z) - Tired of Topic Models? Clusters of Pretrained Word Embeddings Make for Fast and Good Topics too! [5.819224524813161]
We propose an alternative way to obtain topics: clustering pre-trained word embeddings while incorporating document information for weighted clustering and reranking top words.
The best performing combination for our approach performs as well as classical topic models, but with lower runtime and computational complexity.
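The clustering recipe summarized above can be sketched with toy data. The random vectors here stand in for real pretrained word embeddings, and the plain k-means plus distance-to-center reranking is an illustrative assumption, not that paper's exact pipeline.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins: in practice these would be pretrained embeddings
# (e.g. from word2vec or a PLM); here they are random vectors.
vocab = [f"word{i}" for i in range(30)]
emb = rng.normal(size=(30, 16))

def kmeans(X, k, iters=20):
    """Plain k-means; each cluster of word vectors acts as one 'topic'."""
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

labels, centers = kmeans(emb, k=3)

def top_words(topic, n=5):
    """Rank a cluster's words by closeness to its center."""
    idx = np.where(labels == topic)[0]
    d = np.linalg.norm(emb[idx] - centers[topic], axis=1)
    return [vocab[i] for i in idx[np.argsort(d)][:n]]
```

The appeal of this route is that no generative model is trained at all: topic quality comes entirely from the pretrained embedding geometry.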
arXiv Detail & Related papers (2020-04-30T16:18:18Z) - A Gamma-Poisson Mixture Topic Model for Short Text [0.0]
Most topic models are constructed under the assumption that documents follow a multinomial distribution.
For topic modelling, the Poisson distribution describes the number of occurrences of a word in documents of fixed length.
The few Poisson topic models in the literature are admixture models, assuming that each document is generated from a mixture of topics.
arXiv Detail & Related papers (2020-04-23T21:13:53Z)
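The Gamma-Poisson idea above can be sketched as follows: per-topic word rates are drawn from a Gamma prior, a document's rate vector is a mixture of those rates, and observed word counts are Poisson. The shape and scale values here are arbitrary assumptions, not that paper's fitted priors.

```python
import numpy as np

rng = np.random.default_rng(2)
n_topics, vocab_size = 3, 50

# Per-topic word rates drawn from a Gamma prior (hypothetical values).
rates = rng.gamma(shape=0.5, scale=1.0, size=(n_topics, vocab_size))

def generate_counts(topic_weights):
    """Word counts for one document under a Gamma-Poisson admixture:
    the document's rate vector is a mixture of the per-topic rates."""
    doc_rate = topic_weights @ rates          # length-vocab_size rates
    return rng.poisson(doc_rate)              # one count per vocab word

counts = generate_counts(rng.dirichlet(np.ones(n_topics)))
```

Unlike a multinomial model, this formulation models raw counts directly, so document length is itself generated rather than conditioned on.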
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the summaries (including all information) and is not responsible for any consequences of their use.