A comparative analysis of machine learning algorithms for predicting probabilities of default
- URL: http://arxiv.org/abs/2506.19789v1
- Date: Tue, 24 Jun 2025 16:56:07 GMT
- Title: A comparative analysis of machine learning algorithms for predicting probabilities of default
- Authors: Adrian Iulian Cristescu, Matteo Giordano
- Abstract summary: Predicting the probability of default (PD) of prospective loans is a critical objective for financial institutions. In recent years, machine learning (ML) algorithms have achieved remarkable success across a wide variety of prediction tasks. This paper highlights the opportunities that ML algorithms offer to this field by comparing the performance of five predictive models.
- Score: 1.534667887016089
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Predicting the probability of default (PD) of prospective loans is a critical objective for financial institutions. In recent years, machine learning (ML) algorithms have achieved remarkable success across a wide variety of prediction tasks; yet, they remain relatively underutilised in credit risk analysis. This paper highlights the opportunities that ML algorithms offer to this field by comparing the performance of five predictive models (Random Forests, Decision Trees, XGBoost, Gradient Boosting and AdaBoost) to the predominantly used logistic regression, over a benchmark dataset from Scheule et al. (Credit Risk Analytics: The R Companion). Our findings underscore the strengths and weaknesses of each method, providing valuable insights into the most effective ML algorithms for PD prediction in the context of loan portfolios.
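To make the benchmark concrete, below is a minimal sketch of such a model comparison in Python with scikit-learn and xgboost. The Scheule et al. dataset is not bundled here, so a synthetic imbalanced dataset stands in for a loan portfolio; the hyperparameters and the cross-validated AUC protocol are illustrative assumptions, not the paper's actual settings.

```python
# Minimal sketch, not the paper's pipeline: a synthetic imbalanced dataset
# stands in for the Scheule et al. loan data; hyperparameters are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

# Roughly 5% positives, mimicking the class imbalance of default events.
X, y = make_classification(n_samples=10_000, n_features=20,
                           weights=[0.95], random_state=0)

models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Decision tree": DecisionTreeClassifier(max_depth=5),
    "Random forest": RandomForestClassifier(n_estimators=300),
    "Gradient boosting": GradientBoostingClassifier(),
    "AdaBoost": AdaBoostClassifier(),
    "XGBoost": XGBClassifier(eval_metric="logloss"),
}

# Cross-validated AUC measures how well each model ranks defaulters
# above non-defaulters, a standard discrimination check for PD models.
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name:20s} AUC = {auc.mean():.3f} +/- {auc.std():.3f}")
```

In a real credit setting, calibration of the predicted PDs (e.g. via reliability curves or the Brier score) would matter as much as this ranking-based comparison.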
Related papers
- Efficient distributional regression trees learning algorithms for calibrated non-parametric probabilistic forecasts [0.0]
In the context of regression, instead of estimating a conditional mean, this can be achieved by producing a predictive interval for the output. This paper introduces novel algorithms for learning probabilistic regression trees for the WIS or CRPS loss functions (a sample-based CRPS estimator is sketched after this list).
arXiv Detail & Related papers (2025-02-07T18:39:35Z)
- Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
- Beyond Probabilities: Unveiling the Misalignment in Evaluating Large Language Models [24.445829787297658]
Large Language Models (LLMs) have demonstrated remarkable capabilities across various applications.
This study aims to scrutinize the validity of such probability-based evaluation methods within the context of using LLMs for Multiple Choice Questions (MCQs).
Our empirical investigation reveals that the prevalent probability-based evaluation method inadequately aligns with generation-based prediction.
arXiv Detail & Related papers (2024-02-21T15:58:37Z)
- Weak Supervision Performance Evaluation via Partial Identification [46.73061437177238]
Programmatic Weak Supervision (PWS) enables supervised model training without direct access to ground truth labels.
We present a novel method to address this challenge by framing model evaluation as a partial identification problem.
Our approach derives reliable bounds on key metrics without requiring labeled data, overcoming core limitations in current weak supervision evaluation techniques.
arXiv Detail & Related papers (2023-12-07T07:15:11Z)
- Credit card score prediction using machine learning models: A new dataset [2.099922236065961]
This study investigates the utilization of machine learning (ML) models for a credit card default prediction system.
The main goal is to identify the best-performing ML model for the newly proposed credit card scoring dataset.
arXiv Detail & Related papers (2023-10-04T16:46:26Z)
- Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models [51.3422222472898]
We document the capability of large language models (LLMs) like ChatGPT to predict stock price movements using news headlines.
We develop a theoretical model incorporating information capacity constraints, underreaction, limits-to-arbitrage, and LLMs.
arXiv Detail & Related papers (2023-04-15T19:22:37Z)
- Algorithmic Foundations of Empirical X-risk Minimization [51.58884973792057]
This manuscript introduces a new optimization framework for machine learning and AI, named empirical X-risk minimization (EXM).
X-risk is a term introduced to represent a family of compositional measures or objectives.
arXiv Detail & Related papers (2022-06-01T12:22:56Z)
- An Explainable Regression Framework for Predicting Remaining Useful Life of Machines [6.374451442486538]
This paper proposes an explainable regression framework for the prediction of machines' Remaining Useful Life (RUL).
We also evaluate several Machine Learning (ML) algorithms, including classical and Neural Network (NN)-based solutions, for the task.
arXiv Detail & Related papers (2022-04-28T15:44:12Z)
- Probabilistic Gradient Boosting Machines for Large-Scale Probabilistic Regression [51.770998056563094]
Probabilistic Gradient Boosting Machines (PGBM) is a method to create probabilistic predictions with a single ensemble of decision trees.
We empirically demonstrate the advantages of PGBM compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-03T08:32:13Z)
- Machine Learning approach for Credit Scoring [0.0]
We build a stack of machine learning models aimed at composing a state-of-the-art credit rating and default prediction system.
Our approach is an excursion through the most recent ML / AI concepts.
arXiv Detail & Related papers (2020-07-20T21:29:06Z)
- A Dynamical Systems Approach for Convergence of the Bayesian EM Algorithm [59.99439951055238]
We show how (discrete-time) Lyapunov stability theory can serve as a powerful tool to aid, or even lead, in the analysis (and potential design) of optimization algorithms that are not necessarily gradient-based.
The particular ML problem that this paper focuses on is that of parameter estimation in an incomplete-data Bayesian framework via the popular optimization algorithm known as maximum a posteriori expectation-maximization (MAP-EM).
We show that fast convergence (linear or quadratic) is achieved, which could have been difficult to unveil without our adopted S&C approach.
arXiv Detail & Related papers (2020-06-23T01:34:18Z)
- Reparameterized Variational Divergence Minimization for Stable Imitation [57.06909373038396]
We study the extent to which variations in the choice of probabilistic divergence may yield more performant ILO algorithms.
We contribute a reparameterization trick for adversarial imitation learning to alleviate the challenges of the promising $f$-divergence minimization framework.
Empirically, we demonstrate that our design choices allow for ILO algorithms that outperform baseline approaches and more closely match expert performance in low-dimensional continuous-control tasks.
arXiv Detail & Related papers (2020-06-18T19:04:09Z)
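As an aside on the CRPS loss mentioned in the distributional regression trees entry above: when a forecast is represented by Monte Carlo draws, a common sample-based estimator is CRPS(F, y) ≈ mean_i |x_i - y| - 0.5 * mean_{i,j} |x_i - x_j|. The sketch below implements this in plain numpy; the function name is hypothetical and the draws are simulated, not produced by that paper's trees.

```python
import numpy as np

def crps_from_samples(samples, y):
    """Empirical CRPS: E|X - y| - 0.5 * E|X - X'|, estimated from draws.

    Lower is better; 0 would mean a point mass exactly on the outcome y.
    """
    samples = np.asarray(samples, dtype=float)
    term1 = np.mean(np.abs(samples - y))                                 # E|X - y|
    term2 = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))   # E|X - X'|
    return term1 - term2

# Simulated predictive draws standing in for a probabilistic forecast.
rng = np.random.default_rng(0)
forecast = rng.normal(loc=1.0, scale=0.5, size=1000)
print(crps_from_samples(forecast, y=1.2))
```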
This list is automatically generated from the titles and abstracts of the papers on this site.