Efficient Commercial Bank Customer Credit Risk Assessment Based on
LightGBM and Feature Engineering
- URL: http://arxiv.org/abs/2308.08762v1
- Date: Thu, 17 Aug 2023 03:32:38 GMT
- Title: Efficient Commercial Bank Customer Credit Risk Assessment Based on
LightGBM and Feature Engineering
- Authors: Yanjie Sun, Zhike Gong, Quan Shi, Lin Chen
- Abstract summary: This paper is based on the customer information dataset of a foreign commercial bank on Kaggle.
We use the LightGBM algorithm to build a classifier that classifies customers, helping the bank judge the possibility of customer credit default.
- Score: 5.6081706361236865
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Effective control of credit risk is a key link in the steady operation of
commercial banks. This paper is based mainly on the customer information
dataset of a foreign commercial bank on Kaggle, and we use the LightGBM algorithm
to build a classifier that classifies customers, helping the bank judge the
possibility of customer credit default. This paper deals mainly with
feature engineering, such as missing value processing, encoding, and
handling imbalanced samples, which greatly improves the model's performance.
The main innovation of this paper is to construct new feature attributes on the
basis of the original dataset so that the accuracy of the classifier reaches
0.734 and the AUC reaches 0.772, which is higher than many classifiers based on
the same dataset. The model can provide a reference for commercial banks'
credit granting and also provide feature processing ideas for other
similar studies.
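The feature-processing steps named in the abstract (missing value processing, encoding, imbalanced samples, and constructed features) can be illustrated with a minimal sketch. The abstract does not say which specific techniques or features the authors used, so everything here is an assumption: median imputation, one-hot encoding, a negative-to-positive class weight, and a hypothetical credit-to-income ratio feature, all applied to made-up toy data using only the Python standard library.

```python
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def one_hot(values):
    """Encode a categorical column as one 0/1 column per sorted category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def positive_class_weight(labels):
    """Ratio of negatives to positives, a common weight for the rare class."""
    positives = sum(labels)
    negatives = len(labels) - positives
    return negatives / positives


# Hypothetical toy columns; none of these values come from the paper's dataset.
income = [30.0, None, 50.0, 40.0, None, 80.0]
credit = [60.0, 90.0, 50.0, 120.0, 45.0, 80.0]
contract = ["cash", "revolving", "cash", "cash", "revolving", "cash"]
default = [0, 0, 1, 0, 0, 0]  # 1 = customer defaulted

# Missing value processing: fill gaps with the column median.
income_filled = impute_median(income)

# Encoding: turn the categorical column into 0/1 indicator columns.
categories, contract_encoded = one_hot(contract)

# Imbalanced samples: weight the rare positive class more heavily.
weight = positive_class_weight(default)

# Constructed feature (assumed example): credit amount relative to income.
credit_to_income = [c / i for c, i in zip(credit, income_filled)]
```

A weight computed this way is the kind of value that could be passed to LightGBM's `scale_pos_weight` parameter, although whether the authors handled imbalance by weighting or by resampling is not stated in the abstract.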
Related papers
- Bayesian Regression for Predicting Subscription to Bank Term Deposits in Direct Marketing Campaigns [0.0]
The purpose of this research is to examine the efficacy of logit and probit models in predicting term deposit subscriptions.
The target variable was balanced, considering the inherent imbalance in the dataset.
The logit model performed better than the probit model in handling this classification problem.
arXiv Detail & Related papers (2024-10-28T21:04:58Z)
- Empowering Many, Biasing a Few: Generalist Credit Scoring through Large Language Models [53.620827459684094]
Large Language Models (LLMs) have great potential for credit scoring tasks, with strong generalization ability across multiple tasks.
We propose the first open-source comprehensive framework for exploring LLMs for credit scoring.
We then propose the first Credit and Risk Assessment Large Language Model (CALM) by instruction tuning, tailored to the nuanced demands of various financial risk assessment tasks.
arXiv Detail & Related papers (2023-10-01T03:50:34Z)
- Fed-CBS: A Heterogeneity-Aware Client Sampling Mechanism for Federated Learning via Class-Imbalance Reduction [76.26710990597498]
We show that the class-imbalance of the grouped data from randomly selected clients can lead to significant performance degradation.
Based on this key observation, we design an efficient client sampling mechanism, i.e., Federated Class-balanced Sampling (Fed-CBS).
In particular, we propose a measure of class-imbalance and then employ homomorphic encryption to derive this measure in a privacy-preserving way.
arXiv Detail & Related papers (2022-09-30T05:42:56Z)
- Machine Learning Models Evaluation and Feature Importance Analysis on NPL Dataset [0.0]
We evaluate how different machine learning models perform on the dataset provided by a private bank in Ethiopia.
XGBoost achieves the highest F1 score on the KMeans SMOTE over-sampled data.
arXiv Detail & Related papers (2022-08-28T17:09:44Z)
- Feature-Level Fusion of Super-App and Telecommunication Alternative Data Sources for Credit Card Fraud Detection [106.33204064461802]
We review the effectiveness of a feature-level fusion of super-app customer information, mobile phone line data, and traditional credit risk variables for the early detection of identity theft credit card fraud.
We evaluate our approach over approximately 90,000 users from a credit lender's digital platform database.
arXiv Detail & Related papers (2021-11-05T19:10:35Z)
- Bagging Supervised Autoencoder Classifier for Credit Scoring [3.5977219275318166]
The imbalanced nature of credit scoring datasets, as well as the heterogeneous nature of features in credit scoring datasets, pose difficulties in developing and implementing effective credit scoring models.
We propose the Bagging Supervised Autoencoder Classifier (BSAC), which mainly leverages the superior performance of the Supervised Autoencoder.
BSAC also addresses the data imbalance problem by employing a variant of the Bagging process based on the undersampling of the majority class.
arXiv Detail & Related papers (2021-08-12T17:49:08Z)
- Explanations of Machine Learning predictions: a mandatory step for its application to Operational Processes [61.20223338508952]
Credit Risk Modelling plays a paramount role.
Recent machine and deep learning techniques have been applied to the task.
We suggest using the LIME technique to tackle the explainability problem in this field.
arXiv Detail & Related papers (2020-12-30T10:27:59Z)
- A Novel Classification Approach for Credit Scoring based on Gaussian Mixture Models [0.0]
This paper introduces a new method for credit scoring based on Gaussian Mixture Models.
Our algorithm classifies consumers into groups which are labeled as positive or negative.
We apply our model to real-world databases from Australia, Japan, and Germany.
arXiv Detail & Related papers (2020-10-26T07:34:27Z)
- PCAL: A Privacy-preserving Intelligent Credit Risk Modeling Framework Based on Adversarial Learning [111.19576084222345]
This paper proposes a framework for Privacy-preserving Credit risk modeling based on Adversarial Learning (PCAL).
PCAL aims to mask the private information inside the original dataset, while maintaining the important utility information for the target prediction task performance.
Results indicate that PCAL can learn an effective, privacy-free representation from user data, providing a solid foundation towards privacy-preserving machine learning for credit risk analysis.
arXiv Detail & Related papers (2020-10-06T07:04:59Z)
- Intelligent Credit Limit Management in Consumer Loans Based on Causal Inference [5.292270534252169]
Credit limits are adjusted based on limited strategies, which are developed by experienced professionals.
In this paper, we present a data-driven approach to manage the credit limit intelligently.
arXiv Detail & Related papers (2020-07-10T06:22:44Z)
- Super-App Behavioral Patterns in Credit Risk Models: Financial, Statistical and Regulatory Implications [110.54266632357673]
We present the impact of alternative data that originates from an app-based marketplace, in contrast to traditional bureau data, upon credit scoring models.
Our results, validated across two countries, show that these new sources of data are particularly useful for predicting financial behavior in low-wealth and young individuals.
arXiv Detail & Related papers (2020-05-09T01:32:03Z)