SimLDA: A tool for topic model evaluation
- URL: http://arxiv.org/abs/2208.09299v1
- Date: Fri, 19 Aug 2022 12:25:53 GMT
- Title: SimLDA: A tool for topic model evaluation
- Authors: Rebecca M.C. Taylor, Johan A. du Preez
- Abstract summary: We present a novel variational message passing algorithm as applied to Latent Dirichlet Allocation (LDA).
We compare it with the gold standard VB and collapsed Gibbs sampling algorithms.
Using coherence measures we show that ALBU learns latent distributions more accurately than does VB, especially for smaller data sets.
- Score: 2.6397379133308214
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Variational Bayes (VB) applied to latent Dirichlet allocation (LDA) has
become the most popular algorithm for aspect modeling. While sufficiently
successful in text topic extraction from large corpora, VB is less successful
in identifying aspects in the presence of limited data. We present a novel
variational message passing algorithm as applied to Latent Dirichlet Allocation
(LDA) and compare it with the gold standard VB and collapsed Gibbs sampling. In
situations where marginalisation leads to non-conjugate messages, we use ideas
from sampling to derive approximate update equations. In cases where conjugacy
holds, Loopy Belief update (LBU) (also known as Lauritzen-Spiegelhalter) is
used. Our algorithm, ALBU (approximate LBU), has strong similarities with
Variational Message Passing (VMP) (which is the message passing variant of VB).
To compare the performance of the algorithms in the presence of limited data,
we use data sets consisting of tweets and news groups. Using coherence measures
we show that ALBU learns latent distributions more accurately than does VB,
especially for smaller data sets.
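The abstract's coherence-based comparison can be illustrated with a minimal stdlib-Python sketch of the UMass coherence score, one common topic-coherence measure (the abstract does not say which variant the paper uses); the corpus and word lists below are hypothetical toy data:

```python
from math import log

def umass_coherence(top_words, documents):
    """UMass coherence for one topic: sum of log((D(wi, wj) + 1) / D(wj))
    over ordered pairs of top words, where D counts documents containing
    the given word(s). Higher scores mean the words co-occur more often."""
    docs = [set(d) for d in documents]
    def d_count(*words):
        return sum(all(w in doc for w in words) for doc in docs)
    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            wi, wj = top_words[i], top_words[j]
            score += log((d_count(wi, wj) + 1) / d_count(wj))
    return score

docs = [["topic", "model", "lda"],
        ["topic", "model", "inference"],
        ["soccer", "goal", "match"]]
coherent = umass_coherence(["topic", "model"], docs)    # words co-occur
incoherent = umass_coherence(["topic", "soccer"], docs) # words never co-occur
```

A topic whose top words frequently appear in the same documents scores higher than one mixing unrelated words, which is the property the paper relies on when ranking ALBU against VB.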
Related papers
- Substance or Style: What Does Your Image Embedding Know? [55.676463077772866]
Image foundation models have primarily been evaluated for semantic content.
We measure the visual content of embeddings along many axes, including image style, quality, and a range of natural and artificial transformations.
We find that image-text models (CLIP and ALIGN) are better at recognizing new examples of style transfer than masking-based models (CAN and MAE).
arXiv Detail & Related papers (2023-07-10T22:40:10Z) - Evaluating Graph Neural Networks for Link Prediction: Current Pitfalls and New Benchmarking [66.83273589348758]
Link prediction attempts to predict whether an unseen edge exists based on only a portion of edges of a graph.
A flurry of methods have been introduced in recent years that attempt to make use of graph neural networks (GNNs) for this task.
New and diverse datasets have also been created to better evaluate the effectiveness of these new models.
arXiv Detail & Related papers (2023-06-18T01:58:59Z) - MoBYv2AL: Self-supervised Active Learning for Image Classification [57.4372176671293]
We present MoBYv2AL, a novel self-supervised active learning framework for image classification.
Our contribution lies in lifting MoBY, one of the most successful self-supervised learning algorithms, to the AL pipeline.
We achieve state-of-the-art results when compared to recent AL methods.
arXiv Detail & Related papers (2023-01-04T10:52:02Z) - A Bayesian Bradley-Terry model to compare multiple ML algorithms on multiple data sets [4.394728504061753]
This paper proposes a Bayesian model to compare multiple algorithms on multiple data sets, on any metric.
The model is based on the Bradley-Terry model, which counts the number of times one algorithm performs better than another on different data sets.
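The Bradley-Terry win-count idea can be sketched in a few lines of stdlib Python. Note the hedge: the paper's model is Bayesian, while this sketch fits the classical maximum-likelihood strengths via the standard MM updates, and the win counts below are hypothetical:

```python
def fit_bradley_terry(wins, iters=100):
    """Fit Bradley-Terry strengths p from a win-count matrix using the
    classic MM updates, under the model P(i beats j) = p[i] / (p[i] + p[j]).
    wins[i][j] = number of data sets on which algorithm i beat algorithm j."""
    n = len(wins)
    p = [1.0] * n
    for _ in range(iters):
        new_p = []
        for i in range(n):
            w_i = sum(wins[i])  # total wins of algorithm i
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            new_p.append(w_i / denom if denom else p[i])
        total = sum(new_p)
        p = [x / total for x in new_p]  # normalize strengths to sum to 1
    return p

# Hypothetical counts: algorithm 0 beats algorithm 1 on 8 of 10 data sets.
wins = [[0, 8], [2, 0]]
p = fit_bradley_terry(wins)
# p[0] / (p[0] + p[1]) ≈ 0.8, the empirical win rate
```

The Bayesian version in the paper would place a prior on the strengths and report posterior uncertainty rather than a point estimate.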
arXiv Detail & Related papers (2022-08-09T17:59:06Z) - A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflating evaluation caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z) - Variational message passing (VMP) applied to LDA [3.5027291542274366]
Variational message passing (VMP) is the message passing equivalent of VB.
In this article we present the VMP equations for latent Dirichlet allocation (LDA).
arXiv Detail & Related papers (2021-11-02T10:32:15Z) - ALBU: An approximate Loopy Belief message passing algorithm for LDA to improve performance on small data sets [3.5027291542274366]
We present a novel variational message passing algorithm as applied to Latent Dirichlet Allocation (LDA).
We compare it with the gold standard VB and collapsed Gibbs sampling algorithms.
Using coherence measures for the text corpora and KLD with the simulations we show that ALBU learns latent distributions more accurately than does VB.
arXiv Detail & Related papers (2021-10-01T19:55:12Z) - Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097]
Dataset bias is one of the prevailing causes of unfairness in machine learning.
We study whether models trained with uncertainty-based active learning are fairer in their decisions with respect to a protected class.
We also explore the interaction of algorithmic fairness methods such as gradient reversal (GRAD) and BALD.
arXiv Detail & Related papers (2021-04-14T14:20:22Z) - Spike and slab variational Bayes for high dimensional logistic regression [5.371337604556311]
Variational Bayes (VB) is a popular scalable alternative to Markov chain Monte Carlo for Bayesian inference.
We provide non-asymptotic theoretical guarantees for the VB posterior in both $\ell_2$ and prediction loss for a sparse truth.
We confirm the improved performance of our VB algorithm over common sparse VB approaches in a numerical study.
arXiv Detail & Related papers (2020-10-22T12:49:58Z) - Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We evaluate a method we call prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
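The core idea of prediction-time batch normalization can be shown in a toy one-dimensional sketch in stdlib Python: instead of normalizing test activations with stale training-set statistics, recompute the mean and variance from the test batch itself. This is an illustrative simplification, not the paper's implementation, and the training statistics below are hypothetical:

```python
from statistics import fmean

def batch_norm(batch, mean, var, eps=1e-5):
    """Normalize a batch of scalar activations with the given statistics."""
    return [(x - mean) / (var + eps) ** 0.5 for x in batch]

def prediction_time_batch_norm(batch, eps=1e-5):
    """Recompute mean/variance from the test batch itself, so the
    normalized activations track the shifted test distribution."""
    mean = fmean(batch)
    var = fmean([(x - mean) ** 2 for x in batch])
    return batch_norm(batch, mean, var, eps)

# Hypothetical training-time statistics: mean 0, variance 1.
shifted = [5.0, 6.0, 7.0]                       # covariate-shifted test batch
stale = batch_norm(shifted, mean=0.0, var=1.0)  # stays far from zero mean
fresh = prediction_time_batch_norm(shifted)     # re-centred around zero
```

Under covariate shift the stale statistics leave the activations badly mis-centred, while the batch's own statistics restore the zero-mean, unit-variance regime the downstream layers were trained on.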
arXiv Detail & Related papers (2020-06-19T05:08:43Z) - Improving Reliability of Latent Dirichlet Allocation by Assessing Its Stability Using Clustering Techniques on Replicated Runs [0.3499870393443268]
We study the stability of LDA by comparing assignments from replicated runs.
We propose to quantify the similarity of two generated topics by a modified Jaccard coefficient.
We show that the measure S-CLOP is useful for assessing the stability of LDA models.
arXiv Detail & Related papers (2020-02-14T07:10:18Z)
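The topic-similarity idea in the last entry can be sketched with plain Jaccard similarity over top-word sets. Note that the paper uses a *modified* Jaccard coefficient whose exact modification is not described in the summary above; this stdlib-Python sketch shows only the unmodified baseline, on hypothetical word lists:

```python
def topic_jaccard(top_words_a, top_words_b):
    """Plain Jaccard similarity between two topics, represented by their
    top-word sets: |intersection| / |union|. Identical topics score 1.0,
    disjoint topics 0.0."""
    a, b = set(top_words_a), set(top_words_b)
    return len(a & b) / len(a | b)

t1 = ["model", "topic", "word", "document", "corpus"]
t2 = ["model", "topic", "latent", "dirichlet", "allocation"]
sim = topic_jaccard(t1, t2)  # 2 shared words / 8 distinct words = 0.25
```

Comparing every topic of one LDA run against every topic of a replicated run with such a similarity, and then clustering, is the kind of stability analysis the S-CLOP measure builds on.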
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.