Machine Learning Models Evaluation and Feature Importance Analysis on
NPL Dataset
- URL: http://arxiv.org/abs/2209.09638v1
- Date: Sun, 28 Aug 2022 17:09:44 GMT
- Title: Machine Learning Models Evaluation and Feature Importance Analysis on
NPL Dataset
- Authors: Rufael Fekadu, Anteneh Getachew, Yishak Tadele, Nuredin Ali, Israel
Goytom
- Abstract summary: We evaluate how different machine learning models perform on the dataset provided by a private bank in Ethiopia.
XGBoost achieves the highest F1 score on the KMeans SMOTE over-sampled data.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Predicting the probability of non-performing loans for individuals plays a
vital role in helping banks decrease credit risk and make the right
decisions before granting a loan. These decisions are typically based
on credit study in accordance with generally accepted standards, loan
payment history, and demographic data of the clients. In this work, we evaluate
how different machine learning models such as Random Forest, Decision Tree,
KNN, SVM, and XGBoost perform on the dataset provided by a private bank in
Ethiopia. Further, motivated by this evaluation, we explore different feature
selection methods to identify the important features for the bank. Our findings
show that XGBoost achieves the highest F1 score on the KMeans SMOTE
over-sampled data. We also found that the most important features are the age
of the applicant, years of employment, and total income of the applicant rather
than collateral-related features in evaluating credit risk.
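The abstract ranks models by F1 score on over-sampled data. As a minimal sketch of why F1 rather than plain accuracy is informative when non-performing loans are the minority class, the example below compares the two metrics on invented labels. The class ratio, the predictions, and the helper names `f1_score` and `accuracy` are assumptions made for illustration; none of them come from the paper or its dataset.

```python
# Hypothetical illustration: labels and predictions are invented,
# not taken from the paper's (private) bank dataset.

def f1_score(y_true, y_pred, positive=1):
    """Return the F1 score of the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so F1 is reported as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# 1 = non-performing loan (minority), 0 = performing loan (majority).
y_true = [0] * 90 + [1] * 10

# A model that never flags a loan as non-performing.
always_performing = [0] * 100

# A model with 5 false alarms that catches 6 of the 10 bad loans.
flags_some = [0] * 85 + [1] * 5 + [1] * 6 + [0] * 4

for name, y_pred in [("always_performing", always_performing),
                     ("flags_some", flags_some)]:
    print(f"{name}: accuracy={accuracy(y_true, y_pred):.2f}, "
          f"F1={f1_score(y_true, y_pred):.2f}")
```

On this toy data the two models have nearly identical accuracy (0.90 and 0.91), while F1 separates them clearly (0.00 versus about 0.57), which is the kind of gap that motivates reporting F1 on imbalanced credit data.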
Related papers
- Deep Bayesian Active Learning for Preference Modeling in Large Language Models [84.817400962262]
We propose the Bayesian Active Learner for Preference Modeling (BAL-PM).
Our experiments demonstrate that BAL-PM requires 33% to 68% fewer preference labels in two popular human preference datasets and exceeds previous Bayesian acquisition policies.
arXiv Detail & Related papers (2024-06-14T13:32:43Z)
- Empowering Many, Biasing a Few: Generalist Credit Scoring through Large Language Models [53.620827459684094]
Large Language Models (LLMs) have great potential for credit scoring tasks, with strong generalization ability across multiple tasks.
We propose the first open-source comprehensive framework for exploring LLMs for credit scoring.
We then propose the first Credit and Risk Assessment Large Language Model (CALM) by instruction tuning, tailored to the nuanced demands of various financial risk assessment tasks.
arXiv Detail & Related papers (2023-10-01T03:50:34Z)
- Inclusive FinTech Lending via Contrastive Learning and Domain Adaptation [9.75150920742607]
FinTech lending has played a significant role in facilitating financial inclusion.
There are concerns about the potentially biased algorithmic decision-making during loan screening.
We propose a new Transformer-based sequential loan screening model with self-supervised contrastive learning and domain adaptation.
arXiv Detail & Related papers (2023-05-10T01:11:35Z)
- SF-PATE: Scalable, Fair, and Private Aggregation of Teacher Ensembles [50.90773979394264]
This paper studies a model that protects the privacy of individuals' sensitive information while also allowing it to learn non-discriminatory predictors.
A key characteristic of the proposed model is that it enables the adoption of off-the-shelf, non-private fair models to create a privacy-preserving and fair model.
arXiv Detail & Related papers (2022-04-11T14:42:54Z)
- Bagging Supervised Autoencoder Classifier for Credit Scoring [3.5977219275318166]
The imbalanced nature of credit scoring datasets, as well as the heterogeneous nature of features in credit scoring datasets, pose difficulties in developing and implementing effective credit scoring models.
We propose the Bagging Supervised Autoencoder (BSAC) that mainly leverages the superior performance of the Supervised Autoencoder.
BSAC also addresses the data imbalance problem by employing a variant of the Bagging process based on the undersampling of the majority class.
arXiv Detail & Related papers (2021-08-12T17:49:08Z)
- Differential Privacy for Credit Risk Model [0.0]
We assess differential privacy as a solution to address privacy problems.
We evaluate one such tool from LeapYear as applied to the Credit Risk modeling domain.
arXiv Detail & Related papers (2021-06-24T09:58:49Z)
- Explanations of Machine Learning predictions: a mandatory step for its application to Operational Processes [61.20223338508952]
Credit Risk Modelling plays a paramount role.
Recent machine and deep learning techniques have been applied to the task.
We suggest using the LIME technique to tackle the explainability problem in this field.
arXiv Detail & Related papers (2020-12-30T10:27:59Z)
- PCAL: A Privacy-preserving Intelligent Credit Risk Modeling Framework Based on Adversarial Learning [111.19576084222345]
This paper proposes a framework of Privacy-preserving Credit risk modeling based on Adversarial Learning (PCAL).
PCAL aims to mask the private information inside the original dataset, while maintaining the important utility information for the target prediction task performance.
Results indicate that PCAL can learn an effective, privacy-free representation from user data, providing a solid foundation towards privacy-preserving machine learning for credit risk analysis.
arXiv Detail & Related papers (2020-10-06T07:04:59Z)
- Determining Secondary Attributes for Credit Evaluation in P2P Lending [0.0]
We utilize machine learning classification and clustering algorithms to accurately predict a borrower's creditworthiness.
We achieved 65% F1 and 73% AUC on the LendingClub data while identifying key secondary attributes.
arXiv Detail & Related papers (2020-06-08T16:12:00Z)
- Super-App Behavioral Patterns in Credit Risk Models: Financial, Statistical and Regulatory Implications [110.54266632357673]
We present the impact of alternative data that originates from an app-based marketplace, in contrast to traditional bureau data, upon credit scoring models.
Our results, validated across two countries, show that these new sources of data are particularly useful for predicting financial behavior in low-wealth and young individuals.
arXiv Detail & Related papers (2020-05-09T01:32:03Z)
- Predicting Bank Loan Default with Extreme Gradient Boosting [0.0]
We use an Extreme Gradient Boosting algorithm called XGBoost for loan default prediction.
The prediction is based on loan data from a leading bank, taking into consideration data sets from both the loan application and the demographics of the applicant.
arXiv Detail & Related papers (2020-01-18T18:52:10Z)