Exploring the Power of Topic Modeling Techniques in Analyzing Customer Reviews: A Comparative Analysis
- URL: http://arxiv.org/abs/2308.11520v1
- Date: Sat, 19 Aug 2023 08:18:04 GMT
- Title: Exploring the Power of Topic Modeling Techniques in Analyzing Customer Reviews: A Comparative Analysis
- Authors: Anusuya Krishnan
- Abstract summary: Machine learning and natural language processing algorithms have been deployed to analyze the vast amount of textual data available online.
In this study, we examine and compare six frequently used topic modeling methods specifically applied to customer reviews.
Our findings reveal that BERTopic consistently yields more meaningful extracted topics and achieves favorable results.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The exponential growth of online social network platforms and applications
has led to a staggering volume of user-generated textual content, including
comments and reviews. Consequently, users often face difficulties in extracting
valuable insights or relevant information from such content. To address this
challenge, machine learning and natural language processing algorithms have
been deployed to analyze the vast amount of textual data available online. In
recent years, topic modeling techniques have gained significant popularity in
this domain. In this study, we comprehensively examine and compare six
frequently used topic modeling methods specifically applied to customer
reviews. The methods under investigation are latent semantic analysis (LSA),
latent Dirichlet allocation (LDA), non-negative matrix factorization (NMF),
pachinko allocation model (PAM), Top2Vec, and BERTopic. By practically
demonstrating their benefits in detecting important topics, we aim to highlight
their efficacy in real-world scenarios. To evaluate the performance of these
topic modeling methods, we carefully select two textual datasets. The
evaluation is based on standard statistical evaluation metrics such as topic
coherence score. Our findings reveal that BERTopic consistently yields more
meaningful extracted topics and achieves favorable results.
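The abstract says the methods are scored by topic coherence, but it does not say which coherence variant is used. The sketch below assumes the UMass measure (Mimno et al., 2011). For a topic's words ordered by decreasing probability, UMass sums log((D(w_i, w_j) + 1) / D(w_j)) over all pairs i > j, where D counts the reference documents that contain the given words. The function names and toy reviews are illustrative, not taken from the paper. Scores closer to zero indicate words that co-occur more consistently.

```python
import math


def umass_coherence(top_words, documents):
    """UMass topic coherence (Mimno et al., 2011) for a single topic.

    top_words: the topic's words, ordered by decreasing probability.
    documents: tokenized reference documents (lists of strings).
    Returns the sum over word pairs i > j of log((D(w_i, w_j) + 1) / D(w_j)),
    where D counts documents containing all of the given words.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every word in `words`
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            d_j = doc_freq(top_words[j])
            if d_j == 0:
                raise ValueError(f"{top_words[j]!r} never occurs in the reference corpus")
            score += math.log((doc_freq(top_words[i], top_words[j]) + 1) / d_j)
    return score


def mean_coherence(topics, documents):
    # Model-level score: average the per-topic coherence values
    return sum(umass_coherence(t, documents) for t in topics) / len(topics)


# Hypothetical tokenized reviews, for illustration only
reviews = [
    ["battery", "charge", "phone"],
    ["battery", "screen"],
    ["delivery", "late"],
    ["battery", "charge"],
]
print(umass_coherence(["battery", "charge"], reviews))    # 0.0
print(umass_coherence(["battery", "delivery"], reviews))  # log(1/3), about -1.099
```

Some libraries average over word pairs rather than summing, so absolute values are not comparable across tools. Only the relative ranking of models under one fixed measure and reference corpus is meaningful.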
Related papers
- Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a time series forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context.
We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters.
Our experiments highlight the importance of incorporating contextual information, demonstrate surprising performance when using LLM-based forecasting models, and also reveal some of their critical shortcomings.
Date: 2024-10-24T17:56:08Z
- Why do you cite? An investigation on citation intents and decision-making classification processes [1.7812428873698407]
This study emphasizes the importance of reliably classifying citation intents.
We present a study utilizing advanced Ensemble Strategies for Citation Intent Classification (CIC).
One of our models sets a new state of the art (SOTA) with an 89.46% Macro-F1 score on the SciCite benchmark.
Date: 2024-07-18T09:29:33Z
- QuaLLM: An LLM-based Framework to Extract Quantitative Insights from Online Forums [10.684484559041284]
This study introduces QuaLLM, a novel framework to analyze and extract quantitative insights from text data on online forums.
We applied this framework to analyze over one million comments from two Reddit rideshare worker communities.
Date: 2024-05-08T18:20:03Z
- Multi-Dimensional Evaluation of Text Summarization with In-Context Learning [79.02280189976562]
In this paper, we study the efficacy of large language models as multi-dimensional evaluators using in-context learning.
Our experiments show that in-context learning-based evaluators are competitive with learned evaluation frameworks for the task of text summarization.
We then analyze the effects of factors such as the selection and number of in-context examples on performance.
Date: 2023-06-01T23:27:49Z
- Investigating Fairness Disparities in Peer Review: A Language Model Enhanced Approach [77.61131357420201]
We conduct a thorough and rigorous study on fairness disparities in peer review with the help of large language models (LMs).
We collect, assemble, and maintain a comprehensive relational database for the International Conference on Learning Representations (ICLR) conference from 2017 to date.
We postulate and study fairness disparities on multiple protected attributes of interest, including author gender, geography, and author and institutional prestige.
Date: 2022-11-07T16:19:42Z
- A Data-driven Latent Semantic Analysis for Automatic Text Summarization using LDA Topic Modelling [0.0]
This study presents the Latent Dirichlet Allocation (LDA) approach used to perform topic modelling.
The visualisation provides an overarching view of the main topics while attributing deeper meaning to the prevalence of each individual topic.
The results suggest ranking terms purely by their probability of topic prevalence within the processed document.
Date: 2022-07-23T11:04:03Z
- Enhance Topics Analysis based on Keywords Properties [0.0]
We present a specificity score based on keywords properties that is able to select the most informative topics.
In the experiments, we show that we can compress state-of-the-art topic modelling results by different factors with much lower information loss than a solution based on a recently proposed coherence score from the literature.
Date: 2022-03-09T15:10:12Z
- ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive Summarization with Argument Mining [61.82562838486632]
We crowdsource four new datasets covering diverse forms of online conversation: news comments, discussion forums, community question answering forums, and email threads.
We benchmark state-of-the-art models on our datasets and analyze characteristics associated with the data.
Date: 2021-06-01T22:17:13Z
- Why model why? Assessing the strengths and limitations of LIME [0.0]
This paper examines the effectiveness of the Local Interpretable Model-Agnostic Explanations (LIME) xAI framework.
LIME is one of the most popular model-agnostic frameworks found in the literature.
We show how LIME can be used to supplement conventional performance assessment methods.
Date: 2020-11-30T21:08:07Z
- Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
Date: 2020-11-13T10:53:27Z
- A Survey on Text Classification: From Shallow to Deep Learning [83.47804123133719]
The last decade has seen a surge of research in text classification due to the unprecedented success of deep learning.
This paper fills the gap by reviewing the state-of-the-art approaches from 1961 to 2021.
We create a taxonomy for text classification according to the text involved and the models used for feature extraction and classification.
Date: 2020-08-02T00:09:03Z
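Latent Dirichlet allocation appears both among the methods compared in the main paper and in the LDA-based summarization entry in this list. Neither summary says how the LDA models were fitted. The sketch below uses collapsed Gibbs sampling (Griffiths and Steyvers, 2004), one standard inference method. Libraries such as scikit-learn and gensim use variational inference instead. Each token's topic is resampled from p(z = k | rest) ∝ (n_dk + α)(n_kw + β) / (n_k + Vβ), with the token itself excluded from the counts. The function names, hyperparameter defaults, and toy corpus are illustrative assumptions.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of tokenized documents (lists of strings).
    Returns (vocab, doc_topic_counts, topic_word_counts).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: n_dk (doc-topic), n_kw (topic-word), n_k (tokens per topic)
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K

    # Random initial topic assignment for every token
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                v, k = word_id[w], z[d][n]
                # Remove the current token from all counts
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic
                z[d][n] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    return vocab, n_dk, n_kw


def top_words(vocab, n_kw, k, n=3):
    # Words with the highest counts in topic k
    ranked = sorted(range(len(vocab)), key=lambda v: n_kw[k][v], reverse=True)
    return [vocab[v] for v in ranked[:n]]


# Hypothetical tokenized reviews, for illustration only
reviews = [
    ["battery", "charge", "battery", "screen"],
    ["delivery", "late", "refund", "delivery"],
    ["screen", "battery", "charge"],
    ["refund", "late", "delivery"],
]
vocab, n_dk, n_kw = lda_gibbs(reviews, num_topics=2, iterations=100, seed=1)
for k in range(2):
    print(k, top_words(vocab, n_kw, k))
```

The top words per topic can be passed to a coherence function to compare runs with different numbers of topics. Results on a corpus this small vary with the seed.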