A High-Performance Customer Churn Prediction System based on
Self-Attention
- URL: http://arxiv.org/abs/2206.01523v1
- Date: Fri, 3 Jun 2022 12:16:24 GMT
- Title: A High-Performance Customer Churn Prediction System based on
Self-Attention
- Authors: Haotian Wu
- Abstract summary: This work conducts experiments on a publicly available dataset of commercial bank customers.
A novel algorithm, a hybrid neural network with self-attention enhancement (HNNSAE), is proposed in this paper.
- Score: 9.83578821760002
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Customer churn prediction is a challenging domain of research that
contributes to customer retention strategy. The predictive performance of
the machine learning models commonly adopted by the churn-prediction community
appears to have reached a bottleneck, partly due to those models' poor feature
extraction capability. Therefore, a novel algorithm, a hybrid neural network with
self-attention enhancement (HNNSAE), is proposed in this paper to improve the
efficiency of feature screening and feature extraction, consequently improving
the model's predictive performance. This model consists of three main blocks.
The first block is the entity embedding layer, which processes the
categorical variables after they have been encoded as 0-1 codes. The second block is the
feature extractor, which extracts the significant features through the
multi-head self-attention mechanism. In addition, to improve the feature
extraction effect, we stack the residual connection neural network on
multi-head self-attention modules. The third block is a classifier, which is a
three-layer multilayer perceptron. This work conducts experiments on a publicly
available dataset of commercial bank customers. The results demonstrate
that HNNSAE significantly outperforms the other Individual Machine Learning
(IML), Ensemble Machine Learning (EML), and Deep Learning (DL) methods tested
in this paper. Furthermore, we compare the proposed feature extractor with
three other feature extractors and find that it significantly outperforms them.
In addition, four hypotheses about model prediction performance and overfitting
risk are tested on the publicly available dataset.
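For concreteness, the three-block design described in the abstract can be sketched in a few lines of PyTorch. This is a minimal illustrative sketch under our own assumptions, not the paper's implementation: the embedding size, head count, layer widths, the handling of numeric features, and all identifiers (HNNSAE, cat_cardinalities, n_numeric) are ours.

```python
# Minimal sketch of the three-block HNNSAE architecture described above.
# All hyperparameters and names are illustrative assumptions, not the
# paper's reported configuration.
import torch
import torch.nn as nn


class HNNSAE(nn.Module):
    def __init__(self, cat_cardinalities, n_numeric, d_model=32, n_heads=4, n_blocks=2):
        super().__init__()
        # Block 1: entity embedding layer. The paper feeds 0-1 (one-hot)
        # codes; looking up integer indices in nn.Embedding is the
        # equivalent operation.
        self.embeddings = nn.ModuleList(
            [nn.Embedding(card, d_model) for card in cat_cardinalities]
        )
        self.num_proj = nn.Linear(1, d_model)  # project each numeric feature to d_model
        # Block 2: multi-head self-attention feature extractor, with a
        # residual connection stacked around each attention module
        # (one plausible reading of the abstract's description).
        self.attn_blocks = nn.ModuleList(
            [nn.MultiheadAttention(d_model, n_heads, batch_first=True) for _ in range(n_blocks)]
        )
        self.norms = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(n_blocks)])
        # Block 3: a three-layer MLP classifier producing a churn probability.
        n_fields = len(cat_cardinalities) + n_numeric
        self.classifier = nn.Sequential(
            nn.Linear(n_fields * d_model, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, x_cat, x_num):
        # x_cat: (batch, n_cat) integer codes; x_num: (batch, n_numeric) floats.
        cat_tokens = [emb(x_cat[:, i]) for i, emb in enumerate(self.embeddings)]
        num_tokens = [self.num_proj(x_num[:, i : i + 1]) for i in range(x_num.size(1))]
        h = torch.stack(cat_tokens + num_tokens, dim=1)  # (batch, n_fields, d_model)
        for attn, norm in zip(self.attn_blocks, self.norms):
            out, _ = attn(h, h, h)
            h = norm(h + out)  # residual connection around self-attention
        logits = self.classifier(h.flatten(1))
        return torch.sigmoid(logits).squeeze(-1)  # churn probability
```

In this sketch, each categorical field gets its own entity embedding, stacked self-attention layers with residual connections serve as the feature extractor, and the three-layer MLP maps the flattened features to a churn probability.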
Related papers
- Few-Shot Medical Image Segmentation with Large Kernel Attention [5.630842216128902]
We propose a few-shot medical segmentation model that acquires comprehensive feature representation capabilities.
Our model comprises four key modules: a dual-path feature extractor, an attention module, an adaptive prototype prediction module, and a multi-scale prediction fusion module.
The results demonstrate that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-07-27T02:28:30Z) - DiTMoS: Delving into Diverse Tiny-Model Selection on Microcontrollers [34.282971510732736]
We introduce DiTMoS, a novel DNN training and inference framework with a selector-classifiers architecture.
A composition of weak models can exhibit high diversity, and their union can significantly raise the accuracy upper bound.
We deploy DiTMoS on the Nucleo STM32F767ZI board and evaluate it on three time-series datasets for human activity recognition, keyword spotting, and emotion recognition.
arXiv Detail & Related papers (2024-03-14T02:11:38Z) - Learning Active Subspaces and Discovering Important Features with Gaussian Radial Basis Functions Neural Networks [0.0]
We show that precious information is contained in the spectrum of the precision matrix that can be extracted once the training of the model is completed.
We conducted numerical experiments for regression, classification, and feature selection tasks.
Our results demonstrate that the proposed model yields attractive prediction performance compared with its competitors.
arXiv Detail & Related papers (2023-07-11T09:54:30Z) - Predictability of Machine Learning Algorithms and Related Feature
Extraction Techniques [0.0]
This thesis designs a prediction system based on matrix factorization to predict the classification accuracy of a specific model on a particular dataset.
We study the performance prediction of three fundamental machine learning algorithms, namely random forest, XGBoost, and the multilayer perceptron (MLP).
arXiv Detail & Related papers (2023-04-30T11:21:48Z) - Boosting Low-Data Instance Segmentation by Unsupervised Pre-training
with Saliency Prompt [103.58323875748427]
This work offers a novel unsupervised pre-training solution for low-data regimes.
Inspired by the recent success of the Prompting technique, we introduce a new pre-training method that boosts QEIS models.
Experimental results show that our method significantly boosts several QEIS models on three datasets.
arXiv Detail & Related papers (2023-02-02T15:49:03Z) - Meta-Wrapper: Differentiable Wrapping Operator for User Interest
Selection in CTR Prediction [97.99938802797377]
Click-through rate (CTR) prediction, whose goal is to predict the probability of the user to click on an item, has become increasingly significant in recommender systems.
Recent deep learning models that automatically extract user interest from behavior have achieved great success.
We propose a novel approach under the framework of the wrapper method, which is named Meta-Wrapper.
arXiv Detail & Related papers (2022-06-28T03:28:15Z) - Understanding Interlocking Dynamics of Cooperative Rationalization [90.6863969334526]
Selective rationalization explains the prediction of complex neural networks by finding a small subset of the input that is sufficient to predict the neural model output.
We reveal a major problem with such a cooperative rationalization paradigm: model interlocking.
We propose a new rationalization framework, called A2R, which introduces a third component into the architecture, a predictor driven by soft attention as opposed to selection.
arXiv Detail & Related papers (2021-10-26T17:39:18Z) - Sparse MoEs meet Efficient Ensembles [49.313497379189315]
We study the interplay of two popular classes of such models: ensembles of neural networks and sparse mixtures of experts (sparse MoEs).
We present Efficient Ensemble of Experts (E³), a scalable and simple ensemble of sparse MoEs that takes the best of both classes of models while using up to 45% fewer FLOPs than a deep ensemble.
arXiv Detail & Related papers (2021-10-07T11:58:35Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We achieve new state-of-the-art results in both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)