Classifying variety of customer's online engagement for churn prediction
with mixed-penalty logistic regression
- URL: http://arxiv.org/abs/2105.07671v1
- Date: Mon, 17 May 2021 08:40:34 GMT
- Title: Classifying variety of customer's online engagement for churn prediction
with mixed-penalty logistic regression
- Authors: Petra Posedel Šimović, Davor Horvatic, Edward W. Sun
- Abstract summary: We provide new predictive analytics of customer churn rate based on a machine learning method that enhances the classification of logistic regression by adding a mixed penalty term.
We show the analytical properties of the proposed method and its computational advantage in this research.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Using big data to analyze consumer behavior can provide effective
decision-making tools for preventing customer attrition (churn) in customer
relationship management (CRM). Focusing on a CRM dataset with several different
categories of factors that impact customer heterogeneity (i.e., usage of
self-care service channels, duration of service, and responsiveness to
marketing actions), we provide new predictive analytics of customer churn rate
based on a machine learning method that enhances the classification of logistic
regression by adding a mixed penalty term. The proposed penalized logistic
regression can prevent overfitting when dealing with big data and minimize the
loss function when balancing the cost from the median (absolute value) and mean
(squared value) regularization. We show the analytical properties of the
proposed method and its computational advantage in this research. In addition,
we investigate the performance of the proposed method with a CRM data set (that
has a large number of features) under different settings by efficiently
eliminating the disturbance of (1) least important features and (2) sensitivity
from the minority (churn) class. Our empirical results confirm the expected
performance of the proposed method in full compliance with the common
classification criteria (i.e., accuracy, precision, and recall) for evaluating
machine learning methods.
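
The mixed penalty described in the abstract combines an absolute-value (L1) term with a squared-value (L2) term. The following is a minimal sketch of that idea, assuming an elastic-net style objective, mean log-loss + lam * (alpha * ||w||_1 + (1 - alpha)/2 * ||w||_2^2), solved by proximal gradient descent. The function names, the parameters `lam` and `alpha`, and the synthetic data are illustrative assumptions and are not taken from the paper, whose exact formulation may differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_threshold(w, t):
    # Proximal operator of t * ||w||_1: shrinks toward zero, yields exact zeros.
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def fit_mixed_penalty_logreg(X, y, lam=0.05, alpha=0.5, lr=0.5, n_iter=1000):
    """Hypothetical helper: logistic regression with an L1 + L2 penalty.

    alpha = 1 gives a pure L1 penalty, alpha = 0 a pure L2 penalty.
    The intercept b is left unpenalized.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)
        # Gradient of the smooth part: log-loss plus the squared (L2) term.
        grad_w = X.T @ (p - y) / n + lam * (1.0 - alpha) * w
        grad_b = np.mean(p - y)
        # Gradient step, then the proximal step for the absolute-value (L1) term.
        w = soft_threshold(w - lr * grad_w, lr * lam * alpha)
        b -= lr * grad_b
    return w, b

# Synthetic stand-in for a CRM table (assumption): 20 features, 3 informative,
# with a negative intercept so that churn (y = 1) is the minority class.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
true_w = np.zeros(20)
true_w[:3] = [2.0, -2.0, 1.5]
y = (rng.random(500) < sigmoid(X @ true_w - 1.5)).astype(float)

w, b = fit_mixed_penalty_logreg(X, y)
y_pred = (sigmoid(X @ w + b) >= 0.5).astype(float)

# The classification criteria named in the abstract.
tp = np.sum((y_pred == 1) & (y == 1))
fp = np.sum((y_pred == 1) & (y == 0))
fn = np.sum((y_pred == 0) & (y == 1))
accuracy = np.mean(y_pred == y)
precision = tp / (tp + fp) if tp + fp > 0 else 0.0
recall = tp / (tp + fn) if tp + fn > 0 else 0.0

print("coefficients set exactly to zero:", int(np.sum(w == 0.0)))
print("accuracy, precision, recall:", accuracy, precision, recall)
```

In this sketch the L1 part is what removes the least important features (their coefficients become exactly zero), while the L2 part keeps the remaining coefficients from growing large. The metrics are computed on the training data only to keep the example short; a held-out split would be needed for a meaningful evaluation.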
Related papers
- Quantifying User Coherence: A Unified Framework for Cross-Domain Recommendation Analysis [69.37718774071793]
This paper introduces novel information-theoretic measures for understanding recommender systems.
We evaluate 7 recommendation algorithms across 9 datasets, revealing the relationships between our measures and standard performance metrics.
arXiv Detail & Related papers (2024-10-03T13:02:07Z)
- Submodular Maximization Approaches for Equitable Client Selection in Federated Learning [4.167345675621377]
In a conventional Federated Learning framework, client selection for training typically involves the random sampling of a subset of clients in each iteration.
This paper introduces two novel methods, namely SUBTRUNC and UNIONFL, designed to address the limitations of random client selection.
arXiv Detail & Related papers (2024-08-24T22:40:31Z)
- Causal Customer Churn Analysis with Low-rank Tensor Block Hazard Model [4.694536172504849]
This study introduces an innovative method for analyzing the impact of various interventions on customer churn, using the potential outcomes framework.
We present a new causal model, the tensorized latent factor block hazard model, which incorporates tensor completion methods for a principled causal analysis of customer churn.
arXiv Detail & Related papers (2024-05-18T19:54:14Z)
- FedADMM-InSa: An Inexact and Self-Adaptive ADMM for Federated Learning [1.802525429431034]
We propose an inexact and self-adaptive FedADMM algorithm, termed FedADMM-InSa.
The convergence of the resulting inexact ADMM is proved under the assumption of strongly convex loss functions.
Our proposed algorithm can reduce the clients' local computational load significantly and also accelerate the learning process compared to the vanilla FedADMM.
arXiv Detail & Related papers (2024-02-21T18:19:20Z)
- TRIAGE: Characterizing and auditing training data for improved regression [80.11415390605215]
We introduce TRIAGE, a novel data characterization framework tailored to regression tasks and compatible with a broad class of regressors.
TRIAGE utilizes conformal predictive distributions to provide a model-agnostic scoring method, the TRIAGE score.
We show that TRIAGE's characterization is consistent and highlight its utility to improve performance via data sculpting/filtering, in multiple regression settings.
arXiv Detail & Related papers (2023-10-29T10:31:59Z)
- Personalized Federated Learning under Mixture of Distributions [98.25444470990107]
We propose a novel approach to Personalized Federated Learning (PFL), which utilizes Gaussian mixture models (GMM) to fit the input data distributions across diverse clients.
FedGMM possesses an additional advantage of adapting to new clients with minimal overhead, and it also enables uncertainty quantification.
Empirical evaluations on synthetic and benchmark datasets demonstrate the superior performance of our method in both PFL classification and novel sample detection.
arXiv Detail & Related papers (2023-05-01T20:04:46Z)
- Fed-CBS: A Heterogeneity-Aware Client Sampling Mechanism for Federated Learning via Class-Imbalance Reduction [76.26710990597498]
We show that the class-imbalance of the grouped data from randomly selected clients can lead to significant performance degradation.
Based on this observation, we design an efficient client sampling mechanism, Federated Class-balanced Sampling (Fed-CBS).
In particular, we propose a measure of class-imbalance and then employ homomorphic encryption to derive this measure in a privacy-preserving way.
arXiv Detail & Related papers (2022-09-30T05:42:56Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples for which model confidence exceeds that threshold.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Topology-based Clusterwise Regression for User Segmentation and Demand Forecasting [63.78344280962136]
Using a public and a novel proprietary data set of commercial data, this research shows that the proposed system enables analysts to both cluster their user base and plan demand at a granular level.
This work seeks to introduce TDA-based clustering of time series and clusterwise regression with matrix factorization methods as viable tools for the practitioner.
arXiv Detail & Related papers (2020-09-08T12:10:10Z)
- Counterfactual Learning of Stochastic Policies with Continuous Actions: from Models to Offline Evaluation [41.21447375318793]
We introduce a modelling strategy based on a joint kernel embedding of contexts and actions.
We empirically show that the optimization aspect of counterfactual learning is important.
We propose an evaluation protocol for offline policies in real-world logged systems.
arXiv Detail & Related papers (2020-04-22T07:42:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.