The Value of Big Data for Credit Scoring: Enhancing Financial Inclusion
using Mobile Phone Data and Social Network Analytics
- URL: http://arxiv.org/abs/2002.09931v1
- Date: Sun, 23 Feb 2020 16:13:56 GMT
- Title: The Value of Big Data for Credit Scoring: Enhancing Financial Inclusion
using Mobile Phone Data and Social Network Analytics
- Authors: María Óskarsdóttir, Cristián Bravo, Carlos Sarraute, Jan Vanthienen, Bart Baesens
- Abstract summary: This paper leverages alternative data sources to enhance both statistical and economic model performance.
A unique combination of datasets, including call-detail records and customers' credit and debit account information, is used to create scorecards for credit card applicants.
The results have an impact in terms of ethical use of call-detail records, regulatory implications, financial inclusion, as well as data sharing and privacy.
- Score: 6.919243767837341
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Credit scoring is without a doubt one of the oldest applications of
analytics. In recent years, a multitude of sophisticated classification
techniques have been developed to improve the statistical performance of credit
scoring models. Instead of focusing on the techniques themselves, this paper
leverages alternative data sources to enhance both statistical and economic
model performance. The study demonstrates how including call networks, in the
context of positive credit information, as a new Big Data source has added
value in terms of profit by applying a profit measure and profit-based feature
selection. A unique combination of datasets, including call-detail records,
credit and debit account information of customers is used to create scorecards
for credit card applicants. Call-detail records are used to build call networks
and advanced social network analytics techniques are applied to propagate
influence from prior defaulters throughout the network to produce influence
scores. The results show that combining call-detail records with traditional
data in credit scoring models significantly increases their performance when
measured in AUC. In terms of profit, the best model is the one built with only
calling behavior features. In addition, the calling behavior features are the
most predictive in other models, both in terms of statistical and economic
performance. The results have an impact in terms of ethical use of call-detail
records, regulatory implications, financial inclusion, as well as data sharing
and privacy.
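As a rough illustration of the influence propagation described in the abstract, the sketch below spreads influence from known defaulters across a small call network using a personalized-PageRank-style iteration. The abstract does not name the propagation algorithm, so this choice, the function name influence_scores, the damping value, and the toy edge list are all assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict


def influence_scores(edges, defaulters, damping=0.85, iterations=50):
    """Propagate influence from prior defaulters over an undirected call graph.

    edges      -- iterable of (person_a, person_b, weight), weight > 0
                  (hypothetical: e.g. number of calls between the two)
    defaulters -- set of people with a known prior default (the restart set)
    Returns a dict mapping each person to an influence score; scores sum to 1.
    """
    # Build a symmetric weighted adjacency map from the call records.
    neighbors = defaultdict(dict)
    for a, b, weight in edges:
        neighbors[a][b] = neighbors[a].get(b, 0.0) + weight
        neighbors[b][a] = neighbors[b].get(a, 0.0) + weight

    nodes = sorted(neighbors)
    seeds = [n for n in nodes if n in defaulters]
    if not seeds:
        raise ValueError("at least one defaulter must appear in the network")

    # Restart distribution: uniform over known defaulters, zero elsewhere.
    restart = {n: (1.0 / len(seeds) if n in defaulters else 0.0) for n in nodes}
    scores = dict(restart)

    for _ in range(iterations):
        # A fixed share of the mass always returns to the defaulters.
        updated = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            total = sum(neighbors[n].values())
            # The rest flows to neighbors in proportion to call weight.
            for m, weight in neighbors[n].items():
                updated[m] += damping * scores[n] * weight / total
        scores = updated
    return scores


# Toy call network (assumed data): A is a prior defaulter.
edges = [("A", "B", 5), ("B", "C", 1), ("C", "D", 3), ("A", "C", 2)]
scores = influence_scores(edges, defaulters={"A"})
for person in sorted(scores, key=scores.get, reverse=True):
    print(person, round(scores[person], 3))
```

In this sketch, people who are close to a prior defaulter, or linked to one by heavier call volume, receive higher scores. Such scores could then serve as candidate features in a scorecard alongside traditional data.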
Related papers
- LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection.
We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks.
Our method goes beyond surface form cues to identify data that exemplifies the necessary reasoning skills for the intended downstream application.
arXiv Detail & Related papers (2024-02-06T19:18:04Z)
- Privacy-Preserving Financial Anomaly Detection via Federated Learning & Multi-Party Computation [17.314619091307343]
We describe a privacy-preserving framework that allows financial institutions to jointly train highly accurate anomaly detection models.
We show that our solution enables the network to train a highly accurate anomaly detection model while preserving privacy of customer data.
arXiv Detail & Related papers (2023-10-06T19:16:41Z)
- Challenging the Myth of Graph Collaborative Filtering: a Reasoned and Reproducibility-driven Analysis [50.972595036856035]
We present code that successfully replicates results from six popular and recent graph recommendation models.
We compare these graph models with traditional collaborative filtering models that historically performed well in offline evaluations.
By investigating the information flow from users' neighborhoods, we aim to identify which models are influenced by intrinsic features in the dataset structure.
arXiv Detail & Related papers (2023-08-01T09:31:44Z)
- Assessment of creditworthiness models privacy-preserving training with synthetic data [4.014524824655106]
We evaluate the performance of models trained with synthetic data when applied to real-world data.
Creditworthiness assessment models trained with synthetic data show a 3% reduction in AUC and a 6% reduction in KS compared with models trained with real data.
arXiv Detail & Related papers (2022-12-31T19:13:14Z)
- On the dynamics of credit history and social interaction features, and their impact on creditworthiness assessment performance [3.6748639131154315]
This study aims to understand the dynamics of creditworthiness assessment performance and how they are influenced by credit history, repayment behavior, and social network features.
Our research shows that borrowers' history increases performance at a decreasing rate during the first six months and then stabilizes.
The most notable effect of social network features on performance occurs at loan application.
arXiv Detail & Related papers (2022-04-13T00:42:27Z)
- On the combination of graph data for assessing thin-file borrowers' creditworthiness [0.0]
We introduce a framework to improve credit scoring models by blending several Graph Representation Learning methods.
We validated this framework using a unique dataset that characterizes the relationships and credit history for the entire population of a Latin American country.
In corporate lending, where the gain is much higher, it confirms that evaluating an unbanked company cannot solely consider its features.
arXiv Detail & Related papers (2021-11-26T18:45:23Z)
- Relational Graph Neural Networks for Fraud Detection in a Super-App environment [53.561797148529664]
We propose a framework of relational graph convolutional networks methods for fraudulent behaviour prevention in the financial services of a Super-App.
We use an interpretability algorithm for graph neural networks to determine the most important relations to the classification task of the users.
Our results show that there is an added value when considering models that take advantage of the alternative data of the Super-App and the interactions found in their high connectivity.
arXiv Detail & Related papers (2021-07-29T00:02:06Z)
- Enhancing User's Income Estimation with Super-App Alternative Data [59.60094442546867]
This paper compares the performance of these alternative data sources with that of industry-accepted bureau income estimators.
Ultimately, this paper shows the incentive for financial institutions to seek to incorporate alternative data into constructing their risk profiles.
arXiv Detail & Related papers (2021-04-12T21:34:44Z)
- Supporting Financial Inclusion with Graph Machine Learning and Super-App Alternative Data [63.942632088208505]
Super-Apps have changed the way we think about the interactions between users and commerce.
This paper investigates how different interactions between users within a Super-App provide a new source of information to predict borrower behavior.
arXiv Detail & Related papers (2021-02-19T15:13:06Z)
- A comparative study of forecasting Corporate Credit Ratings using Neural Networks, Support Vector Machines, and Decision Trees [0.0]
Credit ratings are one of the primary keys that reflect the level of riskiness and reliability of corporations to meet their financial obligations.
Successful machine learning methods can provide rapid analysis of credit scores while updating older ones on a daily time scale.
arXiv Detail & Related papers (2020-07-13T18:47:20Z)
- Super-App Behavioral Patterns in Credit Risk Models: Financial, Statistical and Regulatory Implications [110.54266632357673]
We present the impact of alternative data that originates from an app-based marketplace, in contrast to traditional bureau data, upon credit scoring models.
Our results, validated across two countries, show that these new sources of data are particularly useful for predicting financial behavior in low-wealth and young individuals.
arXiv Detail & Related papers (2020-05-09T01:32:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.