Related papers: Comparison of Topic Modelling Approaches in the Banking Context

URL: http://arxiv.org/abs/2402.03176v1
Date: Mon, 5 Feb 2024 16:43:53 GMT
Title: Comparison of Topic Modelling Approaches in the Banking Context
Authors: Bayode Ogunleye, Tonderai Maswera, Laurence Hirsch, Jotham Gaudoin, and Teresa Brunsdon
Abstract summary: This study presents the use of Kernel Principal Component Analysis (KernelPCA) and K-means Clustering in the BERTopic architecture. We have prepared a new dataset using tweets from customers of Nigerian banks and we use this to compare the topic modelling approaches. Our findings showed that KernelPCA and K-means in the BERTopic architecture produced coherent topics with a coherence score of 0.8463.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Topic modelling is a prominent task for automatic topic extraction in many applications such as sentiment analysis and recommendation systems. The approach is vital for service industries to monitor their customer discussions. Traditional approaches such as Latent Dirichlet Allocation (LDA) have shown great performance for topic discovery; however, their results are not consistent, as these approaches suffer from data sparseness and an inability to model the word order in a document. Thus, this study presents the use of Kernel Principal Component Analysis (KernelPCA) and K-means Clustering in the BERTopic architecture. We have prepared a new dataset using tweets from customers of Nigerian banks and we use this to compare the topic modelling approaches. Our findings showed that KernelPCA and K-means in the BERTopic architecture produced coherent topics with a coherence score of 0.8463.

Related papers

Toward Purpose-oriented Topic Model Evaluation enabled by Large Language Models [0.8193467416247519]
We introduce a purpose-oriented evaluation framework that employs nine Large Language Model (LLM)-based metrics spanning four key dimensions of topic quality. The framework is validated through adversarial and sampling-based protocols, and is applied across datasets spanning news articles, scholarly publications, and social media posts.
arXiv Detail & Related papers (2025-09-08T18:46:08Z)
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability [53.51560766150442]
Critical tokens are elements within reasoning trajectories that significantly influence incorrect outcomes. We present a novel framework for identifying these tokens through rollout sampling. We show that identifying and replacing critical tokens significantly improves model accuracy.
arXiv Detail & Related papers (2024-11-29T18:58:22Z)
Semantic Component Analysis: Introducing Multi-Topic Distributions to Clustering-Based Topic Modeling [8.834228408033896]
We introduce Semantic Component Analysis (SCA), a topic modeling technique that discovers multiple topics per sample. We evaluate SCA on Twitter datasets in English, Hausa and Chinese.
arXiv Detail & Related papers (2024-10-28T14:09:52Z)
Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a time series forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context. We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters. Our experiments highlight the importance of incorporating contextual information, demonstrate surprising performance when using LLM-based forecasting models, and also reveal some of their critical shortcomings.
arXiv Detail & Related papers (2024-10-24T17:56:08Z)
Deep Model Interpretation with Limited Data : A Coreset-based Approach [0.810304644344495]
We propose a coreset-based interpretation framework that utilizes coreset selection methods to sample a representative subset of the large dataset for the interpretation task. We propose a similarity-based evaluation protocol to assess the robustness of model interpretation methods towards the amount of data they take as input.
arXiv Detail & Related papers (2024-10-01T09:07:24Z)
High-Performance Few-Shot Segmentation with Foundation Models: An Empirical Study [64.06777376676513]
We develop a few-shot segmentation (FSS) framework based on foundation models. To be specific, we propose a simple approach to extract implicit knowledge from foundation models to construct coarse correspondence. Experiments on two widely used datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-10T08:04:11Z)
Interactive Topic Models with Optimal Transport [75.26555710661908]
We present EdTM, an approach for label name supervised topic modeling. EdTM models topic modeling as an assignment problem while leveraging LM/LLM based document-topic affinities.
arXiv Detail & Related papers (2024-06-28T13:57:27Z)
A Machine Learning-Based Framework for Clustering Residential Electricity Load Profiles to Enhance Demand Response Programs [0.0]
We present a novel machine learning based framework in order to achieve optimal load profiling through a real case study.
arXiv Detail & Related papers (2023-10-31T11:23:26Z)
Exploring the Power of Topic Modeling Techniques in Analyzing Customer Reviews: A Comparative Analysis [0.0]
Machine learning and natural language processing algorithms have been deployed to analyze the vast amount of textual data available online. In this study, we examine and compare five frequently used topic modeling methods specifically applied to customer reviews. Our findings reveal that BERTopic consistently yields more meaningful extracted topics and achieves favorable results.
arXiv Detail & Related papers (2023-08-19T08:18:04Z)
Are Neural Topic Models Broken? [81.15470302729638]
We study the relationship between automated and human evaluation of topic models. We find that neural topic models fare worse in both respects compared to an established classical method.
arXiv Detail & Related papers (2022-10-28T14:38:50Z)
Author Clustering and Topic Estimation for Short Texts [69.54017251622211]
We propose a novel model that expands on Latent Dirichlet Allocation by modeling strong dependence among the words in the same document. We also simultaneously cluster users, removing the need for post-hoc cluster estimation. Our method performs as well as, or better than, traditional approaches to problems arising in short text.
arXiv Detail & Related papers (2021-06-15T20:55:55Z)
Topology-based Clusterwise Regression for User Segmentation and Demand Forecasting [63.78344280962136]
Using a public and a novel proprietary data set of commercial data, this research shows that the proposed system enables analysts to both cluster their user base and plan demand at a granular level. This work seeks to introduce TDA-based clustering of time series and clusterwise regression with matrix factorization methods as viable tools for the practitioner.
arXiv Detail & Related papers (2020-09-08T12:10:10Z)
Critically Examining the Claimed Value of Convolutions over User-Item Embedding Maps for Recommender Systems [14.414055798999764]
In recent years, algorithm research in the area of recommender systems has shifted from matrix factorization techniques to neural approaches. We show through analytical considerations and empirical evaluations that the claimed gains reported in the literature cannot be attributed to the ability of CNNs to model embedding correlations.
arXiv Detail & Related papers (2020-07-23T10:03:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.