Related papers: Credit card score prediction using machine learning models: A new dataset

URL: http://arxiv.org/abs/2310.02956v2
Date: Sun, 15 Oct 2023 06:27:58 GMT
Title: Credit card score prediction using machine learning models: A new dataset
Authors: Anas Arram, Masri Ayob, Musatafa Abbas Abbood Albadr, Alaa Sulaiman, Dheeb Albashish
Abstract summary: This study investigates the use of machine learning (ML) models for a credit card default prediction system. The main goal is to identify the best-performing ML model for a newly proposed credit card scoring dataset.
Score: 2.099922236065961
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The use of credit cards has recently increased, creating an essential need for credit card assessment methods to minimize potential risks. This study investigates the use of machine learning (ML) models for a credit card default prediction system. The main goal is to identify the best-performing ML model for a newly proposed credit card scoring dataset. This new dataset, which includes credit card transaction histories and customer profiles, is proposed and tested using a variety of machine learning algorithms, including logistic regression, decision trees, random forests, multi-layer perceptron (MLP) neural network, XGBoost, and LightGBM. To prepare the data for machine learning models, we perform data pre-processing, feature extraction, feature selection, and data balancing techniques. Experimental results demonstrate that MLP outperforms logistic regression, decision trees, random forests, LightGBM, and XGBoost in terms of predictive performance in true positive rate, achieving an impressive area under the curve (AUC) of 86.7% and an accuracy rate of 91.6%, with a recall rate exceeding 80%. These results indicate the superiority of MLP in predicting defaulting customers and assessing the potential risks. Furthermore, they help banks and other financial institutions predict loan defaults at an earlier stage.
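
The abstract reports accuracy, recall (true positive rate), and AUC for a default classifier. As a rough illustration of how these three metrics relate on imbalanced data, here is a minimal pure-Python sketch. The labels, scores, and the 0.5 threshold are made-up toy values, and the function names are hypothetical; none of this is taken from the paper or its dataset.

```python
# Toy sketch: accuracy, recall, and AUC for a binary default classifier.
# All data below is invented for illustration; it is not from the paper.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def recall(y_true, y_pred):
    """True positive rate: share of actual defaulters (label 1) that are caught."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("recall is undefined without positive examples")
    return tp / (tp + fn)

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count as half), i.e. the Mann-Whitney formulation."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# 1 = default, 0 = no default; defaults are the minority class.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 0, 1]
scores = [0.1, 0.2, 0.15, 0.3, 0.6, 0.05, 0.8, 0.4, 0.25, 0.9]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print("accuracy:", accuracy(y_true, y_pred))
print("recall:  ", recall(y_true, y_pred))
print("AUC:     ", roc_auc(y_true, scores))

# A trivial model that never predicts default still gets 70% accuracy here,
# but catches no defaulters, which is why recall and AUC are reported too.
always_zero = [0] * len(y_true)
print("baseline accuracy:", accuracy(y_true, always_zero))
print("baseline recall:  ", recall(y_true, always_zero))
```

Accuracy depends on the chosen threshold and on class balance, whereas AUC summarizes ranking quality across all thresholds, so the two can diverge considerably on skewed credit data.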

Related papers

Improving Credit Card Fraud Detection with an Optimized Explainable Boosting Machine [0.0]
The study proposes an enhanced workflow based on the Explainable Boosting Machine (EBM). The optimized EBM achieves an effective balance between accuracy and interpretability, enabling precise detection of fraudulent transactions. Experimental evaluation on benchmark credit card data yields an ROC-AUC of 0.983, surpassing prior EBM baselines.
arXiv Detail & Related papers (2026-02-06T18:56:17Z)
Enhancing Credit Default Prediction Using Boruta Feature Selection and DBSCAN Algorithm with Different Resampling Techniques [0.0]
This study examines credit default prediction by comparing three techniques, namely SMOTE, SMOTE-Tomek, and ADASYN. Recognizing that credit default datasets are typically skewed, we began our analysis by evaluating machine learning (ML) models on the imbalanced data.
arXiv Detail & Related papers (2025-09-23T13:43:18Z)
A comparative analysis of machine learning algorithms for predicting probabilities of default [1.534667887016089]
Predicting the probability of default (PD) of prospective loans is a critical objective for financial institutions. In recent years, machine learning (ML) algorithms have achieved remarkable success across a wide variety of prediction tasks. This paper highlights the opportunities that ML algorithms offer to this field by comparing the performance of five predictive models.
arXiv Detail & Related papers (2025-06-24T16:56:07Z)
DUPRE: Data Utility Prediction for Efficient Data Valuation [49.60564885180563]
Cooperative game theory-based data valuation, such as Data Shapley, requires evaluating the data utility and retraining the ML model for multiple data subsets. Our framework, DUPRE, takes an alternative yet complementary approach that reduces the cost per subset evaluation by predicting data utilities instead of evaluating them by model retraining. Specifically, given the evaluated data utilities of some data subsets, DUPRE fits a Gaussian process (GP) regression model to predict the utility of every other data subset.
arXiv Detail & Related papers (2025-02-22T08:53:39Z)
Leveraging Convolutional Neural Network-Transformer Synergy for Predictive Modeling in Risk-Based Applications [5.914777314371152]
This paper proposes a deep learning model based on the combination of convolutional neural networks (CNN) and Transformer for credit user default prediction. The results show that the CNN+Transformer model outperforms traditional machine learning models, such as random forests and XGBoost. This study provides a new idea for credit default prediction and provides strong support for risk assessment and intelligent decision-making in the financial field.
arXiv Detail & Related papers (2024-12-24T07:07:14Z)
Bank Loan Prediction Using Machine Learning Techniques [0.0]
We have worked on a dataset of 148,670 instances and 37 attributes using machine learning methods. The best-performing algorithm was AdaBoosting, which achieved an incredible accuracy of 99.99%.
arXiv Detail & Related papers (2024-10-11T15:01:47Z)
Learning Augmentation Policies from A Model Zoo for Time Series Forecasting [58.66211334969299]
We introduce AutoTSAug, a learnable data augmentation method based on reinforcement learning. By augmenting the marginal samples with a learnable policy, AutoTSAug substantially improves forecasting performance.
arXiv Detail & Related papers (2024-09-10T07:34:19Z)
F-FOMAML: GNN-Enhanced Meta-Learning for Peak Period Demand Forecasting with Proxy Data [65.6499834212641]
We formulate the demand prediction as a meta-learning problem and develop the Feature-based First-Order Model-Agnostic Meta-Learning (F-FOMAML) algorithm. By considering domain similarities through task-specific metadata, our model improved generalization, where the excess risk decreases as the number of training tasks increases. Compared to existing state-of-the-art models, our method demonstrates a notable improvement in demand prediction accuracy, reducing the Mean Absolute Error by 26.24% on an internal vending machine dataset and by 1.04% on the publicly accessible JD.com dataset.
arXiv Detail & Related papers (2024-06-23T21:28:50Z)
Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning [55.96599486604344]
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process. We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals. The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data.
arXiv Detail & Related papers (2024-05-01T11:10:24Z)
Improving Fairness in Credit Lending Models using Subgroup Threshold Optimization [0.0]
We introduce a new fairness technique called Subgroup Threshold (STO). STO works by optimizing the classification thresholds for individual subgroups in order to minimize the overall discrimination score between them. Our experiments on a real-world credit lending dataset show that STO can reduce gender discrimination by over 90%.
arXiv Detail & Related papers (2024-03-15T19:36:56Z)
Personalized Federated Learning under Mixture of Distributions [98.25444470990107]
We propose a novel approach to Personalized Federated Learning (PFL), which utilizes Gaussian mixture models (GMM) to fit the input data distributions across diverse clients. FedGMM possesses an additional advantage of adapting to new clients with minimal overhead, and it also enables uncertainty quantification. Empirical evaluations on synthetic and benchmark datasets demonstrate the superior performance of our method in both PFL classification and novel sample detection.
arXiv Detail & Related papers (2023-05-01T20:04:46Z)
Feature Selection with Annealing for Forecasting Financial Time Series [2.44755919161855]
This study provides a comprehensive method for forecasting financial time series based on tactical input output feature mapping techniques using machine learning (ML) models. Experiments indicate that the FSA algorithm increased the performance of ML models, regardless of problem type.
arXiv Detail & Related papers (2023-03-03T21:33:38Z)
Machine Learning Models Evaluation and Feature Importance Analysis on NPL Dataset [0.0]
We evaluate how different Machine learning models perform on the dataset provided by a private bank in Ethiopia. XGBoost achieves the highest F1 score on the KMeans SMOTE over-sampled data.
arXiv Detail & Related papers (2022-08-28T17:09:44Z)
Predicting Credit Risk for Unsecured Lending: A Machine Learning Approach [0.0]
This research paper aims to build a contemporary credit scoring model to forecast credit defaults for unsecured lending (credit cards). Our research indicates that the Light Gradient Boosting Machine (LGBM) model is better equipped to deliver higher learning speeds, better efficiencies and manage larger data volumes. We expect that deployment of this model will enable better and timely prediction of credit defaults for decision-makers in commercial lending institutions and banks.
arXiv Detail & Related papers (2021-10-05T17:54:56Z)
Explanations of Machine Learning predictions: a mandatory step for its application to Operational Processes [61.20223338508952]
Credit Risk Modelling plays a paramount role. Recent machine and deep learning techniques have been applied to the task. We suggest using the LIME technique to tackle the explainability problem in this field.
arXiv Detail & Related papers (2020-12-30T10:27:59Z)
Transfer Learning without Knowing: Reprogramming Black-box Machine Learning Models with Scarce Data and Limited Resources [78.72922528736011]
We propose a novel approach, black-box adversarial reprogramming (BAR), that repurposes a well-trained black-box machine learning model. Using zeroth order optimization and multi-label mapping techniques, BAR can reprogram a black-box ML model solely based on its input-output responses. BAR outperforms state-of-the-art methods and yields comparable performance to the vanilla adversarial reprogramming method.
arXiv Detail & Related papers (2020-07-17T01:52:34Z)

This list is automatically generated from the titles and abstracts of the papers on this site.