Topic Modelling on Consumer Financial Protection Bureau Data: An Approach Using BERT Based Embeddings
- URL: http://arxiv.org/abs/2205.07259v1
- Date: Sun, 15 May 2022 11:14:47 GMT
- Authors: Vasudeva Raju Sangaraju, Bharath Kumar Bolla, Deepak Kumar Nayak, Jyothsna Kh
- Abstract summary: We evaluate BERTopic, a novel method that generates topics using sentence embeddings on Consumer Financial Protection Bureau (CFPB) data.
Our work shows that BERTopic is flexible and yet provides meaningful and diverse topics compared to LDA and LSA.
Furthermore, domain-specific pre-trained embeddings (FinBERT) yield even better topics.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Customers' reviews and comments are important for businesses to understand
users' sentiment about products and services. However, this data must be
analyzed to assess the sentiment associated with specific topics or aspects in
order to provide efficient customer assistance. Classical topic models such as
LDA (Latent Dirichlet Allocation) and LSA (Latent Semantic Analysis) fail to
capture semantic relationships and are not specific to any domain. In this
study, we evaluate BERTopic, a novel method that generates topics using sentence
embeddings, on Consumer Financial Protection Bureau (CFPB) data. Our work shows
that BERTopic is flexible and provides more meaningful and diverse topics than
LDA and LSA. Furthermore, domain-specific pre-trained embeddings (FinBERT) yield
even better topics. We evaluated the topics using two coherence measures, c_v
and UMass.
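The abstract reports topic quality with the c_v and UMass coherence measures. As a rough illustration of what UMass rewards, the Python sketch below scores a topic's ranked top words by how often each lower-ranked word co-occurs, at document level, with each higher-ranked one. It is a minimal sketch, not the authors' evaluation code: it assumes pre-tokenized documents, uses the usual +1 smoothing, and averages over word pairs (the original UMass definition sums them). The function name `umass_coherence` and the toy complaint texts are hypothetical.

```python
import math
from itertools import combinations
from typing import Iterable, List, Set


def umass_coherence(top_words: List[str], documents: Iterable[Iterable[str]]) -> float:
    """Average UMass coherence of one topic (illustrative sketch).

    top_words must be ordered from most to least probable in the topic.
    For each pair where w_j is ranked above w_i, the pair score is
    log((D(w_i, w_j) + 1) / D(w_j)), with D counting documents that
    contain the given word(s). Higher (closer to zero) is more coherent.
    """
    # Document-level co-occurrence only needs the set of words per document.
    doc_sets: List[Set[str]] = [set(doc) for doc in documents]

    def doc_freq(*words: str) -> int:
        # Number of documents containing every one of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    scores = []
    # combinations yields (j, i) with j < i, so w_j is the higher-ranked word.
    for j, i in combinations(range(len(top_words)), 2):
        w_i, w_j = top_words[i], top_words[j]
        d_j = doc_freq(w_j)
        if d_j == 0:
            continue  # skip words that never occur in the reference corpus
        scores.append(math.log((doc_freq(w_i, w_j) + 1) / d_j))
    # A topic with fewer than two usable words has no pairs to score.
    return sum(scores) / len(scores) if scores else float("nan")


if __name__ == "__main__":
    # Hypothetical, whitespace-tokenized complaint snippets.
    docs = [
        "credit card charge dispute".split(),
        "credit report error dispute".split(),
        "mortgage payment late fee".split(),
        "credit card late fee".split(),
    ]
    print(umass_coherence(["credit", "dispute"], docs))             # 0.0
    print(round(umass_coherence(["credit", "mortgage"], docs), 4))  # -1.0986
```

Because "credit" and "dispute" appear together in most documents that contain "credit", that pair scores higher than "credit" and "mortgage", which never co-occur. Real evaluations run this over each topic's top-N words on the full corpus, typically with library implementations.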
Related papers
- Exploratory Data Analysis for Banking and Finance: Unveiling Insights and Patterns
The study examines transaction patterns, credit limits, and usage across merchant categories.
It also considers the influence of demographic factors such as age, gender, and income on usage patterns.
The report addresses customer churn, analyzing churn rates and factors such as demographics, transaction history, and satisfaction levels.
(arXiv, 2024-05-25)
- CLAMBER: A Benchmark of Identifying and Clarifying Ambiguous Information Needs in Large Language Models
We introduce CLAMBER, a benchmark for evaluating large language models (LLMs).
Building upon the taxonomy, we construct 12K high-quality data samples to assess the strengths, weaknesses, and potential risks of various off-the-shelf LLMs.
Our findings indicate the limited practical utility of current LLMs in identifying and clarifying ambiguous user queries.
(arXiv, 2024-05-20)
- Service Level Agreements and Security SLA: A Comprehensive Survey
This survey paper identifies the state of the art, covering concepts, approaches, and open problems in SLA management.
It contributes a comprehensive review that bridges the gap between the analyses in existing surveys and the most recent literature on this topic.
It proposes a novel classification criterion that organizes the analysis according to SLA life-cycle phases.
(arXiv, 2024-01-31)
- OATS: Opinion Aspect Target Sentiment Quadruple Extraction Dataset for Aspect-Based Sentiment Analysis
Aspect-based sentiment analysis (ABSA) seeks to understand the sentiments tied to distinct elements within a user-generated review.
We introduce the OATS dataset, which covers three new domains and consists of 27,470 sentence-level quadruples and 17,092 review-level tuples.
Our initiative seeks to address several observed gaps: the recurrent focus on familiar domains such as restaurants and laptops, limited data for complex quadruple extraction tasks, and the occasional neglect of the synergy between sentence-level and review-level sentiments.
(arXiv, 2023-09-23)
- Exploring the Power of Topic Modeling Techniques in Analyzing Customer Reviews: A Comparative Analysis
Machine learning and natural language processing algorithms have been deployed to analyze the vast amount of textual data available online.
In this study, we examine and compare five frequently used topic modeling methods specifically applied to customer reviews.
Our findings reveal that BERTopic consistently yields more meaningful topics and achieves favorable results.
(arXiv, 2023-08-19)
- Proactive Detractor Detection Framework Based on Message-Wise Sentiment Analysis Over Customer Support Interactions
We propose a framework relying solely on chat-based customer support interactions for predicting the recommendation decision of individual users.
For our case study, we analyzed a total of 16.4k users and 48.7k customer support conversations within the financial vertical of a large e-commerce company in Latin America.
Our results show that, with feature interpretability, it is possible to predict the likelihood that a user will recommend a product or service in a fully automated way, based solely on the message-wise sentiment evolution of their customer support (CS) conversations.
(arXiv, 2022-11-08)
- Algorithmic Fairness Datasets: the Story so Far
Data-driven algorithms are studied in diverse domains to support critical decisions, directly impacting people's well-being.
A growing community of researchers has been investigating the equity of existing algorithms and proposing novel ones, advancing the understanding of risks and opportunities of automated decision-making for historically disadvantaged populations.
Progress in fair Machine Learning hinges on data, which can be appropriately used only if adequately documented.
Unfortunately, the algorithmic fairness community suffers from a collective data documentation debt caused by a lack of information on specific resources (opacity) and the scattering of available information (sparsity).
(arXiv, 2022-02-03)
- Privacy enabled Financial Text Classification using Differential Privacy and Federated Learning
We propose a contextualized text classification model integrated with privacy features such as Differential Privacy (DP) and Federated Learning (FL).
We show how to train NLP models privately, examine desirable privacy-utility trade-offs, and evaluate the models on the Financial PhraseBank dataset.
(arXiv, 2021-10-04)
- A Comparative Study of Sentiment Analysis Using NLP and Different Machine Learning Techniques on US Airline Twitter Data
Sentiment Analysis is a technique of Natural Language Processing (NLP) and Machine Learning (ML).
In this paper, we apply two NLP techniques (Bag-of-Words and TF-IDF) together with various ML classification algorithms.
Our best approaches achieve 77% accuracy, using Support Vector Machine and Logistic Regression with the Bag-of-Words technique.
(arXiv, 2021-10-02)
- Improved Customer Transaction Classification using Semi-Supervised Knowledge Distillation
We propose a cost-effective transaction classification approach based on semi-supervision and knowledge distillation frameworks.
The approach identifies the category of a transaction using free text input given by the customer.
We use weak labelling and observe performance gains similar to those obtained with human-annotated samples.
(arXiv, 2021-02-15)
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
(arXiv, 2020-10-13)