Revisiting Automated Topic Model Evaluation with Large Language Models
- URL: http://arxiv.org/abs/2305.12152v2
- Date: Sun, 22 Oct 2023 09:46:13 GMT
- Title: Revisiting Automated Topic Model Evaluation with Large Language Models
- Authors: Dominik Stammbach, Vilém Zouhar, Alexander Hoyle, Mrinmaya Sachan,
Elliott Ash
- Abstract summary: We find that large language models appropriately assess the resulting topics.
We then investigate whether we can use large language models to automatically determine the optimal number of topics.
- Score: 82.93251466435208
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Topic models are used to make sense of large text collections. However,
automatically evaluating topic model output and determining the optimal number
of topics both have been longstanding challenges, with no effective automated
solutions to date. This paper proposes using large language models to evaluate
such output. We find that large language models appropriately assess the
resulting topics, correlating more strongly with human judgments than existing
automated metrics. We then investigate whether we can use large language models
to automatically determine the optimal number of topics. Automatically
assigning labels to documents and choosing the configuration with the purest
labels yields reasonable values for the optimal number of topics.
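The label-purity selection described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `purity` function, the toy labels, and the candidate topic counts are all hypothetical stand-ins.

```python
from collections import Counter

def purity(topic_assignments, llm_labels):
    """Purity of a topic assignment against LLM-assigned document labels.

    For each topic, count the most common label among its documents;
    purity is the fraction of documents carrying their topic's
    majority label.
    """
    by_topic = {}
    for topic, label in zip(topic_assignments, llm_labels):
        by_topic.setdefault(topic, []).append(label)
    majority_total = sum(Counter(labels).most_common(1)[0][1]
                         for labels in by_topic.values())
    return majority_total / len(topic_assignments)

# Pick the number of topics whose assignment yields the purest labels
# (toy data: two candidate configurations over four documents).
candidates = {
    10: purity([0, 0, 1, 1], ["a", "a", "b", "b"]),  # perfectly pure
    50: purity([0, 0, 1, 1], ["a", "b", "a", "b"]),  # mixed topics
}
best_k = max(candidates, key=candidates.get)
```

With perfectly pure topics the score is 1.0, and the configuration with the higher purity (here, 10 topics) is selected.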
Related papers
- LLM-Assisted Topic Reduction for BERTopic on Social Media Data [0.22940141855172028]
We propose a framework that combines BERTopic for topic generation with large language models for topic reduction.
We evaluate the approach across three Twitter/X datasets and four different language models.
arXiv Detail & Related papers (2025-09-18T20:59:11Z)
- Combining Autoregressive and Autoencoder Language Models for Text Classification [1.0878040851638]
CAALM-TC is a novel method that enhances text classification by integrating autoregressive and autoencoder language models.
Experimental results on four benchmark datasets demonstrate that CAALM consistently outperforms existing methods.
arXiv Detail & Related papers (2024-11-20T12:49:42Z)
- LLM Reading Tea Leaves: Automatically Evaluating Topic Models with Large Language Models [12.500091504010067]
We propose WALM (Words Agreement with Language Model), a new evaluation method for topic modeling.
With extensive experiments involving different types of topic models, WALM is shown to align with human judgment.
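A word-agreement score of this flavor can be sketched with a simple set overlap. This Jaccard-style proxy is only inspired by the idea of comparing a topic model's words against an LLM's words for the same document; it is not WALM's actual formulation, and the function name and inputs are illustrative.

```python
def word_agreement(topic_words, llm_words):
    """Toy word-set agreement: Jaccard overlap between the topic model's
    top words for a document and the keywords an LLM produces for the
    same document. Higher overlap suggests the topics describe the
    documents the way a language model (or a human) would."""
    a, b = set(topic_words), set(llm_words)
    return len(a & b) / len(a | b)
```

For example, `word_agreement(["cat", "dog"], ["cat", "fish"])` shares one of three distinct words, giving 1/3.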
arXiv Detail & Related papers (2024-06-13T11:19:50Z)
- Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation [65.16137964758612]
We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books.
Our objective is to test the capabilities of LLMs to analyze, understand, and reason over problems that require a detailed comprehension of long spans of text.
arXiv Detail & Related papers (2024-05-31T20:15:10Z)
- Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings.
An advantage of these models over more standard approaches is their ability to understand instructions written in natural language (prompts).
This makes them suitable for addressing text classification problems for domains with limited amounts of annotated instances.
arXiv Detail & Related papers (2024-03-26T12:47:39Z)
- Label-Efficient Model Selection for Text Generation [14.61636207880449]
We introduce DiffUse, a method to make an informed decision between candidate text generation models based on preference annotations.
In a series of experiments over hundreds of model pairs, we demonstrate that DiffUse can dramatically reduce the required number of annotations.
arXiv Detail & Related papers (2024-02-12T18:54:02Z)
- Multi-Candidate Speculative Decoding [82.05519287513444]
Large language models have shown impressive capabilities across a variety of NLP tasks, yet generating text autoregressively is time-consuming.
One way to speed them up is speculative decoding, which generates candidate segments from a fast draft model that are then verified in parallel by the target model.
This paper proposes sampling multiple candidates from a draft model and then organising them in batches for verification.
We design algorithms for efficient multi-candidate verification while maintaining the distribution of the target model.
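The verification step can be sketched with a simplified greedy-match acceptance rule. This toy version accepts a draft token only if it matches the target model's own continuation, whereas the paper's algorithms use sampling-based acceptance that exactly preserves the target distribution; the function names and token sequences here are illustrative.

```python
def accept_prefix(candidate, target_tokens):
    """Longest prefix of a draft candidate that matches the target
    model's continuation, token by token."""
    n = 0
    for draft_tok, target_tok in zip(candidate, target_tokens):
        if draft_tok != target_tok:
            break
        n += 1
    return candidate[:n]

def best_candidate(candidates, target_tokens):
    """All draft candidates are checked against the target continuation
    (in practice, in one batched forward pass); keep the candidate that
    yields the most accepted tokens per target-model call."""
    return max((accept_prefix(c, target_tokens) for c in candidates), key=len)

target_continuation = ["the", "cat", "sat", "down"]
draft_candidates = [["the", "cat", "ran"], ["the", "cat", "sat", "still"]]
accepted = best_candidate(draft_candidates, target_continuation)
```

Sampling several candidates raises the chance that at least one survives verification for many tokens, which is where the speedup over single-candidate speculative decoding comes from.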
arXiv Detail & Related papers (2024-01-12T17:15:23Z)
- EvalCrafter: Benchmarking and Evaluating Large Video Generation Models [70.19437817951673]
We argue that large conditional generative models are hard to judge with simple metrics, since these models are often trained on very large datasets and exhibit multi-aspect abilities.
Our approach involves generating a diverse and comprehensive list of 700 prompts for text-to-video generation.
Then, we evaluate the state-of-the-art video generative models on our carefully designed benchmark, in terms of visual qualities, content qualities, motion qualities, and text-video alignment with 17 well-selected objective metrics.
arXiv Detail & Related papers (2023-10-17T17:50:46Z)
- Large Language Models as Zero-Shot Conversational Recommenders [52.57230221644014]
We present empirical studies on conversational recommendation tasks using representative large language models in a zero-shot setting.
We construct a new dataset of recommendation-related conversations by scraping a popular discussion website.
We observe that even without fine-tuning, large language models can outperform existing fine-tuned conversational recommendation models.
arXiv Detail & Related papers (2023-08-19T15:29:45Z)
- Topic Discovery via Latent Space Clustering of Pretrained Language Model Representations [35.74225306947918]
We propose a joint latent space learning and clustering framework built upon PLM embeddings.
Our model effectively leverages the strong representation power and superb linguistic features brought by PLMs for topic discovery.
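The clustering side of this idea can be sketched with plain k-means over precomputed document embeddings, where each centroid acts as a topic. This is only a rough stand-in for the paper's joint latent-space learning and clustering framework; the 2-D toy embeddings and function signature are illustrative.

```python
import numpy as np

def kmeans(embeddings, k, iters=20, seed=0):
    """Vanilla k-means over document embeddings: repeatedly assign each
    document to its nearest centroid, then move each centroid to the
    mean of its members. Each final cluster plays the role of a topic."""
    rng = np.random.default_rng(seed)
    centroids = embeddings[rng.choice(len(embeddings), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(embeddings[:, None] - centroids[None], axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = embeddings[assign == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return assign

# Two well-separated "document" groups in a toy 2-D embedding space.
docs = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
topics = kmeans(docs, k=2)
```

In practice the embeddings would come from a pretrained language model, and the paper's contribution is learning a latent space in which such clusters form cleanly rather than clustering raw embeddings directly.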
arXiv Detail & Related papers (2022-02-09T17:26:08Z)
- Generating Usage-related Questions for Preference Elicitation in Conversational Recommender Systems [19.950705852361565]
We propose a novel approach to preference elicitation by asking implicit questions based on item usage.
We develop a high-quality labeled training dataset using crowdsourcing.
We show that our approaches are effective in generating elicitation questions, even with limited training data.
arXiv Detail & Related papers (2021-11-26T12:23:14Z)
- Model LineUpper: Supporting Interactive Model Comparison at Multiple Levels for AutoML [29.04776652873194]
In current AutoML systems, selection is supported only by performance metrics.
We develop a tool to support interactive model comparison for AutoML by integrating multiple Explainable AI (XAI) and visualization techniques.
arXiv Detail & Related papers (2021-04-09T14:06:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.