Are Neural Topic Models Broken?
- URL: http://arxiv.org/abs/2210.16162v1
- Date: Fri, 28 Oct 2022 14:38:50 GMT
- Title: Are Neural Topic Models Broken?
- Authors: Alexander Hoyle, Pranav Goel, Rupak Sarkar, Philip Resnik
- Abstract summary: We study the relationship between automated and human evaluation of topic models.
We find that neural topic models fare worse in both respects compared to an established classical method.
- Score: 81.15470302729638
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the relationship between automated and human evaluation of topic
models has been called into question. Method developers have staked the
efficacy of new topic model variants on automated measures, and their failure
to approximate human preferences places these models on uncertain ground.
Moreover, existing evaluation paradigms are often divorced from real-world use.
Motivated by content analysis as a dominant real-world use case for topic
modeling, we analyze two related aspects of topic models that affect their
effectiveness and trustworthiness in practice for that purpose: the stability
of their estimates and the extent to which the model's discovered categories
align with human-determined categories in the data. We find that neural topic
models fare worse in both respects compared to an established classical method.
We take a step toward addressing both issues in tandem by demonstrating that a
straightforward ensembling method can reliably outperform the members of the
ensemble.
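As a rough illustration of the quantities the abstract refers to, the sketch below estimates stability (agreement of topic assignments across random restarts), alignment with human-determined categories, and a simple document-level ensemble. It assumes scikit-learn's LDA and normalized mutual information as stand-ins; the clustering-based ensemble is an illustrative heuristic, not the ensembling method from the paper.

```python
# Hedged sketch: stability, label alignment, and a simple ensemble over several
# LDA restarts. Assumes scikit-learn; not the paper's exact procedure.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score as nmi


def fit_runs(docs, n_topics=20, seeds=(0, 1, 2, 3, 4)):
    """Fit one LDA model per random seed and return doc-topic posteriors."""
    X = CountVectorizer(max_features=5000, stop_words="english").fit_transform(docs)
    return [
        LatentDirichletAllocation(n_components=n_topics, random_state=s).fit_transform(X)
        for s in seeds
    ]


def stability(runs):
    """Mean pairwise NMI between hard topic assignments from different restarts."""
    labels = [r.argmax(axis=1) for r in runs]
    pairs = [(i, j) for i in range(len(labels)) for j in range(i + 1, len(labels))]
    return float(np.mean([nmi(labels[i], labels[j]) for i, j in pairs]))


def alignment(runs, human_labels):
    """Mean NMI between each run's assignments and human-determined categories."""
    return float(np.mean([nmi(r.argmax(axis=1), human_labels) for r in runs]))


def ensemble_labels(runs, n_topics=20, seed=0):
    """Illustrative ensembling heuristic: cluster documents in the concatenated
    doc-topic space of all restarts (not the method proposed in the paper)."""
    return KMeans(n_clusters=n_topics, random_state=seed, n_init=10).fit_predict(np.hstack(runs))
```

Higher pairwise NMI across restarts indicates more stable estimates; higher NMI against human labels indicates closer alignment with human-determined categories.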
Related papers
- Historia Magistra Vitae: Dynamic Topic Modeling of Roman Literature using Neural Embeddings [10.095706051685665]
We compare topic models built using traditional statistical models (LDA and NMF) and the BERT-based model.
We find that while quantitative metrics prefer statistical models, qualitative evaluation finds better insights from the neural model.
arXiv Detail & Related papers (2024-06-27T05:38:49Z) - Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv Detail & Related papers (2024-06-19T08:07:14Z) - QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate that leveraging its insights, for example, improves the absolute performance of the Llama 2 model by up to 15 percentage points.
arXiv Detail & Related papers (2023-11-06T00:21:44Z) - Temporal Relevance Analysis for Video Action Models [70.39411261685963]
We first propose a new approach to quantify the temporal relationships between frames captured by CNN-based action models.
We then conduct comprehensive experiments and in-depth analysis to provide a better understanding of how temporal modeling is affected.
arXiv Detail & Related papers (2022-04-25T19:06:48Z) - Distributional Depth-Based Estimation of Object Articulation Models [21.046351215949525]
We propose a method that efficiently learns distributions over articulation model parameters directly from depth images.
Our core contributions include a novel representation for distributions over rigid body transformations.
We introduce a novel deep learning-based approach, DUST-net, that performs category-independent articulation model estimation.
arXiv Detail & Related papers (2021-08-12T17:44:51Z) - Is Automated Topic Model Evaluation Broken?: The Incoherence of Coherence [62.826466543958624]
We look at the standardization gap and the validation gap in topic model evaluation.
Recent models relying on neural components surpass classical topic models according to these metrics.
We use automatic coherence along with the two most widely accepted human judgment tasks, namely, topic rating and word intrusion (a minimal coherence-computation sketch follows this list).
arXiv Detail & Related papers (2021-07-05T17:58:52Z) - User Ex Machina : Simulation as a Design Probe in Human-in-the-Loop Text Analytics [29.552736183006672]
We conduct a simulation-based analysis of human-centered interactions with topic models.
We find that user interactions have impacts that differ in magnitude but often negatively affect the quality of the resulting topic models.
arXiv Detail & Related papers (2021-01-06T19:44:11Z) - Firearm Detection via Convolutional Neural Networks: Comparing a Semantic Segmentation Model Against End-to-End Solutions [68.8204255655161]
Threat detection of weapons and aggressive behavior from live video can be used for rapid detection and prevention of potentially deadly incidents.
One way for achieving this is through the use of artificial intelligence and, in particular, machine learning for image analysis.
We compare a traditional monolithic end-to-end deep learning model and a previously proposed model based on an ensemble of simpler neural networks detecting firearms via semantic segmentation.
arXiv Detail & Related papers (2020-12-17T15:19:29Z) - Recipes for Safety in Open-domain Chatbots [32.31067267979087]
We introduce a new human-and-model-in-the-loop framework for both training safer models and for evaluating them.
We conduct experiments comparing these methods and find that our new techniques are safer than existing models, as measured by automatic and human evaluations.
We then discuss the limitations of this work by analyzing failure cases of our models.
arXiv Detail & Related papers (2020-10-14T13:26:39Z)
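The "Is Automated Topic Model Evaluation Broken?" entry above refers to automated coherence metrics. As a minimal, assumed illustration of the automated side of that evaluation (the human judgment tasks, topic rating and word intrusion, have no such shortcut), NPMI coherence can be computed with gensim:

```python
# Minimal sketch: NPMI topic coherence with gensim. An assumed illustration of
# the automated metric mentioned above, not any paper's exact pipeline.
from gensim.corpora import Dictionary
from gensim.models import CoherenceModel, LdaModel


def npmi_coherence(raw_docs, num_topics=20, seed=0):
    texts = [doc.lower().split() for doc in raw_docs]  # naive whitespace tokenization
    dictionary = Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]
    lda = LdaModel(corpus=corpus, id2word=dictionary,
                   num_topics=num_topics, random_state=seed)
    cm = CoherenceModel(model=lda, texts=texts,
                        dictionary=dictionary, coherence="c_npmi")
    return cm.get_coherence()  # mean NPMI over topics; higher reads as "more coherent"
```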
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.