Semiparametric Latent Topic Modeling on Consumer-Generated Corpora
- URL: http://arxiv.org/abs/2107.10651v1
- Date: Tue, 13 Jul 2021 00:22:02 GMT
- Title: Semiparametric Latent Topic Modeling on Consumer-Generated Corpora
- Authors: Dominic B. Dayta and Erniel B. Barrios
- Abstract summary: This paper proposes the semiparametric topic model, a two-step approach utilizing nonnegative matrix factorization and semiparametric regression for topic modeling.
The model enables the reconstruction of sparse topic structures in the corpus and provides a generative model for predicting topics in new documents entering the corpus.
In an actual consumer feedback corpus, the model also demonstrably provides interpretable and useful topic definitions comparable with those produced by other methods.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Legacy procedures for topic modeling have generally suffered from
overfitting and a weakness in reconstructing sparse topic structures. Motivated
by a consumer-generated corpus, this paper proposes the semiparametric topic
model, a two-step approach utilizing nonnegative matrix factorization and
semiparametric regression. The model enables the reconstruction of sparse topic
structures in the corpus and provides a generative model for predicting topics
in new documents entering the corpus. Assuming the presence of auxiliary
information related to the topics, this approach performs better at discovering
underlying topic structures in cases where the corpora are small and limited in
vocabulary. On an actual consumer feedback corpus, the model also demonstrably
provides interpretable and useful topic definitions comparable with those
produced by other methods.
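As a concrete reading of the two-step approach described above (nonnegative matrix factorization followed by a semiparametric regression of topic weights on auxiliary information), here is a minimal sketch in Python with scikit-learn. The abstract does not specify the exact estimators, so the spline-based regression and all names (fit_semiparametric_topic_model, dtm, aux) are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of the two-step pipeline, assuming scikit-learn.
# Step 1: NMF factorizes the document-term matrix into document-topic
# weights and topic-term loadings. Step 2: each topic's weights are
# regressed on auxiliary covariates through a spline basis (one possible
# semiparametric choice; the paper's exact estimator may differ).
import numpy as np
from sklearn.decomposition import NMF
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer

def fit_semiparametric_topic_model(dtm, aux, n_topics=5):
    # Step 1: nonnegative factorization of the document-term matrix.
    nmf = NMF(n_components=n_topics, init="nndsvda", random_state=0)
    doc_topic = nmf.fit_transform(dtm)  # shape: (n_docs, n_topics)

    # Step 2: one spline regression per topic, mapping auxiliary
    # covariates to that topic's weights across documents.
    regressors = []
    for k in range(n_topics):
        reg = make_pipeline(SplineTransformer(n_knots=5, degree=3),
                            LinearRegression())
        reg.fit(aux, doc_topic[:, k])
        regressors.append(reg)
    return nmf, regressors

def predict_topic_weights(regressors, aux_new):
    # Predict topic weights for incoming documents from auxiliary
    # information alone; clip at zero to keep weights nonnegative.
    pred = np.column_stack([r.predict(aux_new) for r in regressors])
    return np.clip(pred, 0.0, None)
```

Any other smoother (kernel regression, a GAM) could stand in for the spline pipeline in step 2; the essential point is that topic weights for a new document are predicted from its auxiliary information, which is how the abstract's generative claim is operationalized in this sketch.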
Related papers
- SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
- Interactive Topic Models with Optimal Transport [75.26555710661908]
We present EdTM, as an approach for label name supervised topic modeling.
EdTM models topic modeling as an assignment problem while leveraging LM/LLM based document-topic affinities.
arXiv Detail & Related papers (2024-06-28T13:57:27Z)
- The Geometric Structure of Topic Models [0.0]
Despite their widespread use in research and application, an in-depth analysis of topic models is still an open research topic.
We propose an incidence-geometric method for deriving an ordinal structure from flat topic models.
We present a new visualization paradigm for concept hierarchies based on ordinal motifs.
arXiv Detail & Related papers (2024-03-06T10:53:51Z)
- Let the Pretrained Language Models "Imagine" for Short Texts Topic Modeling [29.87929724277381]
In short texts, co-occurrence information is minimal, which results in feature sparsity in document representation.
Existing topic models (probabilistic or neural) mostly fail to mine patterns from them to generate coherent topics.
We extend short texts into longer sequences using existing pre-trained language models (PLMs).
arXiv Detail & Related papers (2023-10-24T00:23:30Z)
- TopicAdapt- An Inter-Corpora Topics Adaptation Approach [27.450275637652418]
This paper proposes a neural topic model, TopicAdapt, that can adapt relevant topics from a related source corpus and also discover new topics in a target corpus that are absent in the source corpus.
Experiments over multiple datasets from diverse domains show the superiority of the proposed model against the state-of-the-art topic models.
arXiv Detail & Related papers (2023-10-08T02:56:44Z)
- A modified model for topic detection from a corpus and a new metric evaluating the understandability of topics [0.0]
The new model builds upon the embedded topic model incorporating some modifications such as document clustering.
Numerical experiments suggest that the new model performs favourably regardless of the document's length.
The new metric, which can be computed more efficiently than widely used metrics such as topic coherence, provides valuable information regarding the understandability of the detected topics.
arXiv Detail & Related papers (2023-06-08T05:17:03Z)
- Are Neural Topic Models Broken? [81.15470302729638]
We study the relationship between automated and human evaluation of topic models.
We find that neural topic models fare worse in both respects compared to an established classical method.
arXiv Detail & Related papers (2022-10-28T14:38:50Z)
- Model Criticism for Long-Form Text Generation [113.13900836015122]
We apply a statistical tool, model criticism in latent space, to evaluate the high-level structure of generated text.
We perform experiments on three representative aspects of high-level discourse -- coherence, coreference, and topicality.
We find that transformer-based language models are able to capture topical structures but have a harder time maintaining structural coherence or modeling coreference.
arXiv Detail & Related papers (2022-10-16T04:35:58Z)
- Author Clustering and Topic Estimation for Short Texts [69.54017251622211]
We propose a novel model that expands on Latent Dirichlet Allocation by modeling strong dependence among the words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as, or better than, traditional approaches to problems arising in short text.
arXiv Detail & Related papers (2021-06-15T20:55:55Z)
- Improving Neural Topic Models using Knowledge Distillation [84.66983329587073]
We use knowledge distillation to combine the best attributes of probabilistic topic models and pretrained transformers.
Our modular method can be straightforwardly applied with any neural topic model to improve topic quality.
arXiv Detail & Related papers (2020-10-05T22:49:16Z)