Towards Theme Detection in Personal Finance Questions
- URL: http://arxiv.org/abs/2110.01550v1
- Date: Mon, 4 Oct 2021 16:44:16 GMT
- Title: Towards Theme Detection in Personal Finance Questions
- Authors: John Xi Qiu, Adam Faulkner, Aysu Ezen Can
- Abstract summary: We present an approach to call center theme detection that captures the occurrence of multiple themes in a question.
To capture the occurrence of multiple themes in a single question, the approach encodes and clusters at the sentence- rather than question-level.
Our highest performing approach achieves a Micro-F1 of 0.46 for this task and we show that the resulting clusters, even when slightly noisy, contain sentences that are topically consistent with the label associated with the cluster.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Banking call centers receive millions of calls annually, with much of the
information in these calls unavailable to analysts interested in tracking new
and emerging call center trends. In this study we present an approach to call
center theme detection that captures the occurrence of multiple themes in a
question, using a publicly available corpus of StackExchange personal finance
questions, labeled by users with topic tags, as a testbed. To capture the
occurrence of multiple themes in a single question, the approach encodes and
clusters at the sentence- rather than question-level. We also present a
comparison of state-of-the-art sentence encoding models, including the SBERT
family of sentence encoders. We frame our evaluation as a multiclass
classification task and show that a simple combination of the original sentence
text, Universal Sentence Encoder, and KMeans outperforms more sophisticated
techniques that involve semantic parsing, SBERT-family models, and HDBSCAN. Our
highest performing approach achieves a Micro-F1 of 0.46 for this task and we
show that the resulting clusters, even when slightly noisy, contain sentences
that are topically consistent with the label associated with the cluster.
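The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: split each question into sentences, encode and cluster the sentences, collect the clusters found in each question as its themes, and score the result with Micro-F1. Everything concrete here is an assumption made for illustration. The toy bag-of-words `encode` function stands in for a sentence encoder such as Universal Sentence Encoder, the small `kmeans` function stands in for a library KMeans, and the questions, the `cluster_label` mapping, and the gold tags are made up.

```python
import math
import re
from collections import Counter


def split_sentences(question):
    """Naive split after '.', '?' or '!' (stand-in for a real sentence tokenizer)."""
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", question) if s.strip()]


def encode(sentence, vocab):
    """Toy unit-length bag-of-words vector (stand-in for a sentence encoder)."""
    counts = Counter(re.findall(r"[a-z]+", sentence.lower()))
    vec = [float(counts[word]) for word in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def kmeans(vectors, k, iters=10):
    """Lloyd's algorithm; centroids start at the first k vectors for determinism."""
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its nearest centroid (squared Euclidean distance).
        assignment = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Move each centroid to the mean of its current members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def micro_f1(gold, predicted):
    """Micro-averaged F1 over per-question label sets."""
    tp = sum(len(g & p) for g, p in zip(gold, predicted))
    fp = sum(len(p - g) for g, p in zip(gold, predicted))
    fn = sum(len(g - p) for g, p in zip(gold, predicted))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up questions, each mixing two themes.
questions = [
    "How do I file taxes on stock dividends? Should I refinance my mortgage loan?",
    "Are taxes due on dividends? Is a mortgage loan refinance worth it?",
]

# Flatten to sentences, remembering which question each sentence came from.
sentences, owner = [], []
for q_index, question in enumerate(questions):
    for sentence in split_sentences(question):
        sentences.append(sentence)
        owner.append(q_index)

vocab = sorted({w for s in sentences for w in re.findall(r"[a-z]+", s.lower())})
clusters = kmeans([encode(s, vocab) for s in sentences], k=2)

# Hypothetical hand-assigned cluster names; the abstract does not say how labels are attached.
cluster_label = {0: "taxes", 1: "mortgage"}
predicted = [set() for _ in questions]
for q_index, cluster in zip(owner, clusters):
    predicted[q_index].add(cluster_label[cluster])

gold = [{"taxes", "mortgage"}, {"taxes", "investing"}]  # made-up user tags
score = micro_f1(gold, predicted)
```

Because clustering runs on sentences rather than whole questions, a single question can land in more than one cluster, which is what lets it carry several themes at once. Micro-averaging pools true positives, false positives, and false negatives across all questions before computing precision and recall, so frequent labels weigh more than rare ones.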
Related papers
- Retrieval-augmented Multi-label Text Classification [20.100081284294973]
Multi-label text classification is a challenging task in settings of large label sets.
Retrieval augmentation aims to improve the sample efficiency of classification models.
We evaluate this approach on four datasets from the legal and biomedical domains.
arXiv Detail & Related papers (2023-05-22T14:16:23Z)
- CLIP-GCD: Simple Language Guided Generalized Category Discovery [21.778676607030253]
Generalized Category Discovery (GCD) requires a model to both classify known categories and cluster unknown categories in unlabeled data.
Prior methods leveraged self-supervised pre-training combined with supervised fine-tuning on the labeled data, followed by simple clustering methods.
We propose to leverage multi-modal (vision and language) models, in two complementary ways.
arXiv Detail & Related papers (2023-05-17T17:55:33Z)
- Providing Insights for Open-Response Surveys via End-to-End Context-Aware Clustering [2.6094411360258185]
In this work, we present a novel end-to-end context-aware framework that extracts, aggregates, and abbreviates embedded semantic patterns in open-response survey data.
Our framework relies on a pre-trained natural language model in order to encode the textual data into semantic vectors.
Our framework reduces the costs at-scale by automating the process of extracting the most insightful information pieces from survey data.
arXiv Detail & Related papers (2022-03-02T18:24:10Z)
- Author Clustering and Topic Estimation for Short Texts [69.54017251622211]
We propose a novel model that expands on Latent Dirichlet Allocation by modeling strong dependence among the words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as, or better than, traditional approaches to problems arising in short text.
arXiv Detail & Related papers (2021-06-15T20:55:55Z)
- Multitask Learning for Class-Imbalanced Discourse Classification [74.41900374452472]
We show that a multitask approach can improve Micro F1-score by 7% over current state-of-the-art benchmarks.
We also offer a comparative review of additional techniques proposed to address resource-poor problems in NLP.
arXiv Detail & Related papers (2021-01-02T07:13:41Z)
- Unsupervised Summarization for Chat Logs with Topic-Oriented Ranking and Context-Aware Auto-Encoders [59.038157066874255]
We propose a novel framework called RankAE to perform chat summarization without employing manually labeled data.
RankAE consists of a topic-oriented ranking strategy that selects topic utterances according to centrality and diversity simultaneously.
A denoising auto-encoder is designed to generate succinct but context-informative summaries based on the selected utterances.
arXiv Detail & Related papers (2020-12-14T07:31:17Z)
- The Influence of Domain-Based Preprocessing on Subject-Specific Clustering [55.41644538483948]
The sudden move of the majority of teaching online at universities has increased the workload for academics.
One way to deal with this problem is to cluster these questions depending on their topic.
In this paper, we explore the realms of tagging data sets, focusing on identifying code excerpts and providing empirical results.
arXiv Detail & Related papers (2020-11-16T17:47:19Z)
- Detecting and Classifying Malevolent Dialogue Responses: Taxonomy, Data and Methodology [68.8836704199096]
Corpus-based conversational interfaces are able to generate more diverse and natural responses than template-based or retrieval-based agents.
With the increased generative capacity of corpus-based conversational agents comes the need to classify and filter out malevolent responses.
Previous studies on the topic of recognizing and classifying inappropriate content are mostly focused on a certain category of malevolence.
arXiv Detail & Related papers (2020-08-21T22:43:27Z)
- Global Multiclass Classification and Dataset Construction via Heterogeneous Local Experts [37.27708297562079]
We show how to minimize the number of labelers while ensuring the reliability of the resulting dataset.
Experiments with the MNIST and CIFAR-10 datasets demonstrate the favorable accuracy of our aggregation scheme.
arXiv Detail & Related papers (2020-05-21T18:07:42Z)
- Interaction Matching for Long-Tail Multi-Label Classification [57.262792333593644]
We present an elegant and effective approach for addressing limitations in existing multi-label classification models.
By performing soft n-gram interaction matching, we match labels with natural language descriptions.
arXiv Detail & Related papers (2020-05-18T15:27:55Z)