Comparison of Topic Modelling Approaches in the Banking Context
- URL: http://arxiv.org/abs/2402.03176v1
- Date: Mon, 5 Feb 2024 16:43:53 GMT
- Title: Comparison of Topic Modelling Approaches in the Banking Context
- Authors: Bayode Ogunleye, Tonderai Maswera, Laurence Hirsch, Jotham Gaudoin, and Teresa Brunsdon
- Abstract summary: This study presents the use of Kernel Principal Component Analysis (KernelPCA) and K-means clustering in the BERTopic architecture.
We have prepared a new dataset using tweets from customers of Nigerian banks and we use this to compare the topic modelling approaches.
Our findings showed that KernelPCA and K-means in the BERTopic architecture produced coherent topics with a coherence score of 0.8463.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Topic modelling is a prominent task for automatic topic extraction in many
applications such as sentiment analysis and recommendation systems. The
approach is vital for service industries to monitor their customer discussions.
The use of traditional approaches such as Latent Dirichlet Allocation (LDA) for
topic discovery has shown great performance; however, these approaches are not
consistent in their results because they suffer from data sparseness and an
inability to model the word order in a document. Thus, this study presents the use of
Kernel Principal Component Analysis (KernelPCA) and K-means Clustering in the
BERTopic architecture. We have prepared a new dataset using tweets from
customers of Nigerian banks and we use this to compare the topic modelling
approaches. Our findings showed that KernelPCA and K-means in the BERTopic
architecture produced coherent topics with a coherence score of 0.8463.
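The abstract names the components but not how they are wired together. As BERTopic is commonly described, documents are embedded, the embeddings are reduced in dimension, the reduced vectors are clustered, and each cluster's top words are scored with class-based TF-IDF (c-TF-IDF); the study swaps KernelPCA and K-means into the reduction and clustering steps (BERTopic's defaults are UMAP and HDBSCAN). The NumPy sketch below illustrates that flow under stated assumptions: L2-normalised bag-of-words vectors stand in for the sentence-transformer embeddings BERTopic normally uses; the RBF kernel, `gamma=1.0`, two components, and farthest-point initialisation for K-means are illustrative choices, not the authors' settings; and the toy documents and every function name are hypothetical.

```python
import numpy as np

def bag_of_words(docs):
    """ASSUMPTION: stand-in for sentence embeddings (L2-normalised word counts)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for r, d in enumerate(docs):
        for w in d.lower().split():
            X[r, index[w]] += 1.0
    return X / np.linalg.norm(X, axis=1, keepdims=True), vocab

def rbf_kernel_pca(X, n_components, gamma):
    """Kernel PCA with an RBF kernel; returns the projected training points."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-gamma * d2)
    n = K.shape[0]
    J = np.full((n, n), 1.0 / n)
    Kc = K - J @ K - K @ J + J @ K @ J            # centre the data in feature space
    vals, vecs = np.linalg.eigh(Kc)               # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:n_components]
    # Projection of training point i on component k is sqrt(lambda_k) * v_ik.
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

def kmeans(Z, k, n_iter=100):
    """Lloyd's K-means with deterministic farthest-point initialisation."""
    centers = [Z[0]]
    while len(centers) < k:
        dist = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def c_tf_idf_top_words(docs, labels, top_n=3):
    """c-TF-IDF weight W[t, c] = tf[t, c] * log(1 + A / f[t]), where A is the
    average number of words per class and f[t] is term t's total frequency."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    classes = sorted(set(labels.tolist()))
    tf = np.zeros((len(classes), len(vocab)))
    for d, lab in zip(docs, labels):
        for w in d.lower().split():
            tf[classes.index(lab), vocab.index(w)] += 1.0
    avg_words = tf.sum() / len(classes)
    weights = tf * np.log(1.0 + avg_words / tf.sum(axis=0))
    return {c: [vocab[i] for i in np.argsort(weights[r])[::-1][:top_n]]
            for r, c in enumerate(classes)}

docs = [  # hypothetical toy "tweets", not the paper's dataset
    "app login error", "app login error again", "mobile app login error",
    "money transfer delayed", "money transfer delayed today", "bank money transfer delayed",
]
X, _ = bag_of_words(docs)
Z = rbf_kernel_pca(X, n_components=2, gamma=1.0)
labels = kmeans(Z, k=2)
topics = c_tf_idf_top_words(docs, labels)
print(labels, topics)
```

On these toy documents the login-related and transfer-related groups should fall into separate clusters, with {app, login, error} and {money, transfer, delayed} as the top c-TF-IDF words. With the bertopic library itself, the documented way to make this kind of swap is to pass scikit-learn-style models through the `umap_model` and `hdbscan_model` arguments of `BERTopic(...)`; which hyperparameters the authors chose is not stated in this summary.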
Related papers
- Interactive Topic Models with Optimal Transport [75.26555710661908]
We present EdTM, an approach for label-name-supervised topic modeling.
EdTM models topic modeling as an assignment problem while leveraging LM/LLM-based document-topic affinities.
arXiv Detail & Related papers (2024-06-28T13:57:27Z)
- Navigating Public Sentiment in the Circular Economy through Topic Modelling and Hyperparameter Optimisation [3.73232429960464]
This study is pioneering in investigating various levels of public opinion concerning the circular economy (CE) through topic modelling.
We collected data related to the circular economy from diverse platforms including Twitter, Reddit, and The Guardian.
The results of this study indicate that concerns about sustainability and economic impact persist across all three datasets.
arXiv Detail & Related papers (2024-05-16T21:38:21Z)
- A Machine Learning-Based Framework for Clustering Residential Electricity Load Profiles to Enhance Demand Response Programs [0.0]
We present a novel machine-learning-based framework to achieve optimal load profiling through a real case study.
arXiv Detail & Related papers (2023-10-31T11:23:26Z)
- Exploring the Power of Topic Modeling Techniques in Analyzing Customer Reviews: A Comparative Analysis [0.0]
Machine learning and natural language processing algorithms have been deployed to analyze the vast amount of textual data available online.
In this study, we examine and compare five frequently used topic modeling methods specifically applied to customer reviews.
Our findings reveal that BERTopic consistently yields more meaningful extracted topics and achieves favorable results.
arXiv Detail & Related papers (2023-08-19T08:18:04Z)
- Are Neural Topic Models Broken? [81.15470302729638]
We study the relationship between automated and human evaluation of topic models.
We find that neural topic models fare worse in both respects compared to an established classical method.
arXiv Detail & Related papers (2022-10-28T14:38:50Z)
- Temporal Relevance Analysis for Video Action Models [70.39411261685963]
We first propose a new approach to quantify the temporal relationships between frames captured by CNN-based action models.
We then conduct comprehensive experiments and in-depth analysis to provide a better understanding of how temporal modeling is affected.
arXiv Detail & Related papers (2022-04-25T19:06:48Z)
- Author Clustering and Topic Estimation for Short Texts [69.54017251622211]
We propose a novel model that expands on Latent Dirichlet Allocation by modeling strong dependence among the words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as or better than traditional approaches to problems arising in short text.
arXiv Detail & Related papers (2021-06-15T20:55:55Z)
- Modeling Topical Relevance for Multi-Turn Dialogue Generation [61.87165077442267]
We propose a new model, named STAR-BTM, to tackle the problem of topic drift in multi-turn dialogue.
The Biterm Topic Model is pre-trained on the whole training dataset. Then, the topic-level attention weights are computed based on the topic representation of each context.
Experimental results on both Chinese customer services data and English Ubuntu dialogue data show that STAR-BTM significantly outperforms several state-of-the-art methods.
arXiv Detail & Related papers (2020-09-27T03:33:22Z)
- Topology-based Clusterwise Regression for User Segmentation and Demand Forecasting [63.78344280962136]
Using a public and a novel proprietary data set of commercial data, this research shows that the proposed system enables analysts to both cluster their user base and plan demand at a granular level.
This work seeks to introduce TDA-based clustering of time series and clusterwise regression with matrix factorization methods as viable tools for the practitioner.
arXiv Detail & Related papers (2020-09-08T12:10:10Z)
- Critically Examining the Claimed Value of Convolutions over User-Item Embedding Maps for Recommender Systems [14.414055798999764]
In recent years, algorithm research in the area of recommender systems has shifted from matrix factorization techniques to neural approaches.
We show through analytical considerations and empirical evaluations that the claimed gains reported in the literature cannot be attributed to the ability of CNNs to model embedding correlations.
arXiv Detail & Related papers (2020-07-23T10:03:47Z)
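The main abstract reports a coherence score of 0.8463, and one of the related papers examines how such automated scores relate to human judgement, but this summary does not say which coherence measure was used. As a hedged illustration of what a coherence score computes, the sketch below implements one simple variant: the mean pairwise normalised pointwise mutual information (NPMI) of a topic's top words, NPMI(w_i, w_j) = log[P(w_i, w_j) / (P(w_i) P(w_j))] / (-log P(w_i, w_j)), with probabilities estimated from document-level co-occurrence. This is a simplification: measures such as C_v (available, along with `u_mass`, `c_uci` and `c_npmi`, through gensim's `CoherenceModel`) use sliding windows and further aggregation, so values from this function are not comparable with the reported 0.8463. The function name and toy documents are hypothetical.

```python
import math
from itertools import combinations

def npmi_coherence(topic_words, docs):
    """Mean pairwise NPMI of a topic's words, using document co-occurrence."""
    doc_sets = [set(d.lower().split()) for d in docs]
    n = len(doc_sets)

    def p(*words):
        # Fraction of documents containing every word in `words`.
        return sum(all(w in s for w in words) for s in doc_sets) / n

    scores = []
    for wi, wj in combinations(topic_words, 2):
        p_ij = p(wi, wj)
        if p_ij == 0.0:
            scores.append(-1.0)   # never co-occur: minimum NPMI by convention
        elif p_ij == 1.0:
            scores.append(1.0)    # co-occur in every document: maximum NPMI
        else:
            pmi = math.log(p_ij / (p(wi) * p(wj)))
            scores.append(pmi / -math.log(p_ij))
    return sum(scores) / len(scores)

toy_docs = [  # hypothetical toy "tweets", not the paper's dataset
    "app login error", "app login error again", "mobile app login error",
    "money transfer delayed", "money transfer delayed today", "bank money transfer delayed",
]
print(npmi_coherence(["app", "login", "error"], toy_docs))   # words always co-occur: 1.0
print(npmi_coherence(["app", "login", "money"], toy_docs))   # mixed topic: about -0.333
```

Higher values mean a topic's top words tend to appear in the same documents; a topic that mixes words from unrelated discussions is penalised.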