Related papers: Exploring the Power of Topic Modeling Techniques in Analyzing Customer Reviews: A Comparative Analysis

URL: http://arxiv.org/abs/2308.11520v1
Date: Sat, 19 Aug 2023 08:18:04 GMT
Title: Exploring the Power of Topic Modeling Techniques in Analyzing Customer Reviews: A Comparative Analysis
Authors: Anusuya Krishnan
Abstract summary: Machine learning and natural language processing algorithms have been deployed to analyze the vast amount of textual data available online. In this study, we examine and compare five frequently used topic modeling methods specifically applied to customer reviews. Our findings reveal that BERTopic consistently yields more meaningful extracted topics and achieves favorable results.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The exponential growth of online social network platforms and applications has led to a staggering volume of user-generated textual content, including comments and reviews. Consequently, users often face difficulties in extracting valuable insights or relevant information from such content. To address this challenge, machine learning and natural language processing algorithms have been deployed to analyze the vast amount of textual data available online. In recent years, topic modeling techniques have gained significant popularity in this domain. In this study, we comprehensively examine and compare five frequently used topic modeling methods specifically applied to customer reviews. The methods under investigation are latent semantic analysis (LSA), latent Dirichlet allocation (LDA), non-negative matrix factorization (NMF), pachinko allocation model (PAM), Top2Vec, and BERTopic. By practically demonstrating their benefits in detecting important topics, we aim to highlight their efficacy in real-world scenarios. To evaluate the performance of these topic modeling methods, we carefully select two textual datasets. The evaluation is based on standard statistical evaluation metrics such as topic coherence score. Our findings reveal that BERTopic consistently yields more meaningful extracted topics and achieves favorable results.
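
Illustrative sketch (not from the paper): the abstract names topic coherence as the evaluation metric but does not say which variant is used. The snippet below assumes the UMass variant purely because it is simple to compute from document co-occurrence counts. The function name umass_coherence, the toy reviews, and the two example topics are all hypothetical.

```python
import math


def umass_coherence(topic_words, documents, eps=1.0):
    """UMass coherence of one topic over a corpus.

    topic_words: top words of the topic, most probable first.
    documents:   list of raw strings (whitespace-tokenized here).
    Sums log((D(w_m, w_l) + eps) / D(w_l)) over all pairs with l < m,
    where D counts the documents containing the given word(s).
    """
    doc_sets = [set(doc.lower().split()) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(topic_words)):
        for l in range(m):
            d_l = doc_freq(topic_words[l])
            if d_l == 0:
                continue  # word never occurs in the corpus; skip the pair
            d_ml = doc_freq(topic_words[m], topic_words[l])
            score += math.log((d_ml + eps) / d_l)
    return score


# Hypothetical toy corpus of customer reviews.
reviews = [
    "battery life is great and the screen is bright",
    "battery drains fast but screen is sharp",
    "delivery was late and the box was damaged",
    "fast delivery and the box arrived intact",
]

coherent_topic = ["battery", "screen"]  # words that co-occur in reviews
mixed_topic = ["battery", "box"]        # words that never co-occur

print(umass_coherence(coherent_topic, reviews))
print(umass_coherence(mixed_topic, reviews))
```

A higher (less negative) score indicates that a topic's top words tend to appear in the same documents. Comparing methods such as LDA or BERTopic in this way would mean averaging the score over each model's topics.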

Related papers

Toward Purpose-oriented Topic Model Evaluation enabled by Large Language Models [0.8193467416247519]
We introduce a purpose-oriented evaluation framework that employs nine Large Language Model (LLM)-based metrics spanning four key dimensions of topic quality. The framework is validated through adversarial and sampling-based protocols, and is applied across datasets spanning news articles, scholarly publications, and social media posts.
arXiv Detail & Related papers (2025-09-08T18:46:08Z)
MLego: Interactive and Scalable Topic Exploration Through Model Reuse [12.133380833451573]
We present MLego, an interactive query framework designed to support real-time topic modeling analysis. Instead of retraining models from scratch, MLego efficiently merges materialized topic models to construct approximate results at interactive speeds. We integrate MLego into a visual analytics prototype system, enabling users to explore large-scale textual datasets through interactive queries.
arXiv Detail & Related papers (2025-08-11T06:06:26Z)
Bridging the Evaluation Gap: Leveraging Large Language Models for Topic Model Evaluation [0.0]
This study presents a framework for automated evaluation of dynamically evolving topic in scientific literature using Large Language Models (LLMs). The proposed approach harnesses LLMs to measure key quality dimensions, such as coherence, repetitiveness, diversity, and topic-document alignment, without heavy reliance on expert annotators or narrow statistical metrics.
arXiv Detail & Related papers (2025-02-11T08:23:56Z)
Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a time series forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context. We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters. Our experiments highlight the importance of incorporating contextual information, demonstrate surprising performance when using LLM-based forecasting models, and also reveal some of their critical shortcomings.
arXiv Detail & Related papers (2024-10-24T17:56:08Z)
Why do you cite? An investigation on citation intents and decision-making classification processes [1.7812428873698407]
This study emphasizes the importance of trustfully classifying citation intents. We present a study utilizing advanced Ensemble Strategies for Citation Intent Classification (CIC). One of our models sets a new state of the art (SOTA) with an 89.46% Macro-F1 score on the SciCite benchmark.
arXiv Detail & Related papers (2024-07-18T09:29:33Z)
QuaLLM: An LLM-based Framework to Extract Quantitative Insights from Online Forums [10.684484559041284]
This study introduces QuaLLM, a novel framework to analyze and extract quantitative insights from text data on online forums. We applied this framework to analyze over one million comments from two of Reddit's rideshare worker communities.
arXiv Detail & Related papers (2024-05-08T18:20:03Z)
A Literature Review of Literature Reviews in Pattern Analysis and Machine Intelligence [51.26815896167173]
We present a comprehensive tertiary analysis of PAMI reviews along three complementary dimensions. Our analyses reveal distinctive organizational patterns as well as persistent gaps in current review practices. Finally, our evaluation of state-of-the-art AI-generated reviews indicates encouraging advances in coherence and organization.
arXiv Detail & Related papers (2024-02-20T11:28:50Z)
Multi-Dimensional Evaluation of Text Summarization with In-Context Learning [79.02280189976562]
In this paper, we study the efficacy of large language models as multi-dimensional evaluators using in-context learning. Our experiments show that in-context learning-based evaluators are competitive with learned evaluation frameworks for the task of text summarization. We then analyze the effects of factors such as the selection and number of in-context examples on performance.
arXiv Detail & Related papers (2023-06-01T23:27:49Z)
Investigating Fairness Disparities in Peer Review: A Language Model Enhanced Approach [77.61131357420201]
We conduct a thorough and rigorous study on fairness disparities in peer review with the help of large language models (LMs). We collect, assemble, and maintain a comprehensive relational database for the International Conference on Learning Representations (ICLR) conference from 2017 to date. We postulate and study fairness disparities on multiple protective attributes of interest, including author gender, geography, author, and institutional prestige.
arXiv Detail & Related papers (2022-11-07T16:19:42Z)
A Data-driven Latent Semantic Analysis for Automatic Text Summarization using LDA Topic Modelling [0.0]
This study presents the Latent Dirichlet Allocation (LDA) approach used to perform topic modelling. The visualisation provides an overarching view of the main topics while allowing and attributing deep meaning to the prevalence of each individual topic. The results suggest the terms are ranked purely by considering their probability of the topic prevalence within the processed document.
arXiv Detail & Related papers (2022-07-23T11:04:03Z)
Enhance Topics Analysis based on Keywords Properties [0.0]
We present a specificity score based on keywords properties that is able to select the most informative topics. In the experiments, we show that we are able to compress the state-of-the-art topic modelling results of different factors with an information loss that is much lower than the solution based on the recent coherence score presented in literature.
arXiv Detail & Related papers (2022-03-09T15:10:12Z)
ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive Summarization with Argument Mining [61.82562838486632]
We crowdsource four new datasets on diverse online conversation forms of news comments, discussion forums, community question answering forums, and email threads. We benchmark state-of-the-art models on our datasets and analyze characteristics associated with the data.
arXiv Detail & Related papers (2021-06-01T22:17:13Z)
Why model why? Assessing the strengths and limitations of LIME [0.0]
This paper examines the effectiveness of the Local Interpretable Model-Agnostic Explanations (LIME) xAI framework. LIME is one of the most popular model agnostic frameworks found in the literature. We show how LIME can be used to supplement conventional performance assessment methods.
arXiv Detail & Related papers (2020-11-30T21:08:07Z)
Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task. The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them. By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z)
A Survey on Text Classification: From Shallow to Deep Learning [83.47804123133719]
The last decade has seen a surge of research in this area due to the unprecedented success of deep learning. This paper fills the gap by reviewing the state-of-the-art approaches from 1961 to 2021. We create a taxonomy for text classification according to the text involved and the models used for feature extraction and classification.
arXiv Detail & Related papers (2020-08-02T00:09:03Z)

This list is automatically generated from the titles and abstracts of the papers on this site.