Comprehensive Evaluation of Large Language Models for Topic Modeling
- URL: http://arxiv.org/abs/2406.00697v2
- Date: Tue, 25 Jun 2024 08:42:53 GMT
- Title: Comprehensive Evaluation of Large Language Models for Topic Modeling
- Authors: Tomoki Doi, Masaru Isonuma, Hitomi Yanaka
- Abstract summary: We quantitatively evaluate Large Language Models (LLMs) for topic modeling.
We show that LLMs can identify coherent and diverse topics with few hallucinations but may take shortcuts by focusing only on parts of documents.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work utilizes Large Language Models (LLMs) for topic modeling, generating comprehensible topic labels for given documents. However, their performance has mainly been evaluated qualitatively, and there remains room for quantitative investigation of their capabilities. In this paper, we quantitatively evaluate LLMs from multiple perspectives: the quality of topics, the impact of LLM-specific concerns, such as hallucination and shortcuts for limited documents, and LLMs' controllability of topic categories via prompts. Our findings show that LLMs can identify coherent and diverse topics with few hallucinations but may take shortcuts by focusing only on parts of documents. We also found that their controllability is limited.
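The prompting setup evaluated here can be illustrated with a minimal sketch: batch a few documents into a single prompt, ask the model for short topic labels, and parse the reply. The prompt wording, the `build_topic_prompt`/`label_topics` names, and the `stub` model callable below are hypothetical stand-ins for illustration, not the authors' actual protocol.

```python
def build_topic_prompt(documents, max_topics=5):
    """Assemble a topic-modeling prompt from a batch of documents.

    The instruction wording is a hypothetical example, not the exact
    prompt used in the paper.
    """
    joined = "\n".join(f"Document {i + 1}: {d}" for i, d in enumerate(documents))
    return (
        f"Read the documents below and list up to {max_topics} short "
        "topic labels covering their main themes, one per line.\n\n"
        f"{joined}\n\nTopics:"
    )


def label_topics(documents, llm):
    """Send the prompt to an LLM callable (str -> str) and parse the reply."""
    reply = llm(build_topic_prompt(documents))
    # Strip list bullets and whitespace; drop empty lines.
    return [line.strip("- ").strip() for line in reply.splitlines() if line.strip()]


# Example with a stub standing in for a real model call:
docs = ["The central bank raised interest rates.", "A new vaccine trial began."]
stub = lambda prompt: "- monetary policy\n- public health"
print(label_topics(docs, stub))  # → ['monetary policy', 'public health']
```

Because the whole batch of documents is packed into one prompt, the model may attend to only part of the input, which is the "shortcut" behavior the evaluation probes.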
Related papers
- The LLM Effect: Are Humans Truly Using LLMs, or Are They Being Influenced By Them Instead? [60.01746782465275]
Large Language Models (LLMs) have shown capabilities close to human performance in various analytical tasks.
This paper investigates the efficiency and accuracy of LLMs in specialized tasks through a structured user study focusing on Human-LLM partnership.
arXiv Detail & Related papers (2024-10-07T02:30:18Z)
- Addressing Topic Granularity and Hallucination in Large Language Models for Topic Modelling
Large language models (LLMs) with their strong zero-shot topic extraction capabilities offer an alternative to probabilistic topic modelling.
This paper focuses on addressing the issues of topic granularity and hallucinations for better LLM-based topic modelling.
Our approach does not rely on traditional human annotation to rank preferred answers but employs a reconstruction pipeline to modify raw topics.
arXiv Detail & Related papers (2024-05-01T16:32:07Z)
- Large Language Models Offer an Alternative to the Traditional Approach of Topic Modelling
We investigate the untapped potential of large language models (LLMs) as an alternative for uncovering the underlying topics within extensive text corpora.
Our findings indicate that LLMs with appropriate prompts can stand out as a viable alternative, capable of generating relevant topic titles and adhering to human guidelines to refine and merge topics.
arXiv Detail & Related papers (2024-03-24T17:39:51Z)
- Exploring Perceptual Limitation of Multimodal Large Language Models [57.567868157293994]
We quantitatively study the perception of small visual objects in several state-of-the-art MLLMs.
We identify four independent factors that can contribute to this limitation.
Lower object quality and smaller object size can both independently reduce MLLMs' ability to answer visual questions.
arXiv Detail & Related papers (2024-02-12T03:04:42Z)
- Large Language Models: A Survey [69.72787936480394]
Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks.
LLMs acquire their general-purpose language understanding and generation abilities by training billions of model parameters on massive amounts of text data.
arXiv Detail & Related papers (2024-02-09T05:37:09Z)
- Can Large Language Model Summarizers Adapt to Diverse Scientific Communication Goals? [19.814974042343028]
We investigate the controllability of large language models (LLMs) on scientific summarization tasks.
We find that non-fine-tuned LLMs outperform humans in the MuP review generation task.
arXiv Detail & Related papers (2024-01-18T23:00:54Z)
- Behind the Magic, MERLIM: Multi-modal Evaluation Benchmark for Large Image-Language Models [50.653838482083614]
This paper introduces a scalable test-bed to assess the capabilities of IT-LVLMs on fundamental computer vision tasks.
MERLIM contains over 300K image-question pairs and has a strong focus on detecting cross-modal "hallucination" events in IT-LVLMs.
arXiv Detail & Related papers (2023-12-03T16:39:36Z)
- BLESS: Benchmarking Large Language Models on Sentence Simplification [55.461555829492866]
We present BLESS, a performance benchmark of the most recent state-of-the-art large language models (LLMs) on the task of text simplification (TS).
We assess a total of 44 models, differing in size, architecture, pre-training methods, and accessibility, on three test sets from different domains (Wikipedia, news, and medical) under a few-shot setting.
Our evaluation indicates that the best LLMs, despite not being trained on TS, perform comparably with state-of-the-art TS baselines.
arXiv Detail & Related papers (2023-10-24T12:18:17Z)
- Sentiment Analysis in the Era of Large Language Models: A Reality Check [69.97942065617664]
This paper investigates the capabilities of large language models (LLMs) in performing various sentiment analysis tasks.
We evaluate performance across 13 tasks on 26 datasets and compare the results against small language models (SLMs) trained on domain-specific datasets.
arXiv Detail & Related papers (2023-05-24T10:45:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.