The value of text for small business default prediction: A deep learning
approach
- URL: http://arxiv.org/abs/2003.08964v4
- Date: Wed, 7 Jul 2021 19:19:33 GMT
- Title: The value of text for small business default prediction: A deep learning
approach
- Authors: Matthew Stevenson, Christophe Mues and Cristián Bravo
- Abstract summary: It is standard policy for a loan officer to provide a textual loan assessment to mitigate limited data availability.
We exploit recent advances from the field of Deep Learning and Natural Language Processing to extract information from 60 000 textual assessments provided by a lender.
We find that the text alone is surprisingly effective for predicting default, but when combined with traditional data, it yields no additional predictive capability.
Our proposed deep learning model does, however, appear to be robust to the quality of the text and therefore suitable for partly automating the mSME lending process.
- Score: 9.023847175654602
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Compared to consumer lending, Micro, Small and Medium Enterprise (mSME)
credit risk modelling is particularly challenging, as, often, the same sources
of information are not available. Therefore, it is standard policy for a loan
officer to provide a textual loan assessment to mitigate limited data
availability. In turn, this statement is analysed by a credit expert alongside
any available standard credit data. In our paper, we exploit recent advances
from the field of Deep Learning and Natural Language Processing (NLP),
including the BERT (Bidirectional Encoder Representations from Transformers)
model, to extract information from 60 000 textual assessments provided by a
lender. We consider the performance in terms of the AUC (Area Under the
receiver operating characteristic Curve) and Brier Score metrics and find that
the text alone is surprisingly effective for predicting default. However, when
combined with traditional data, it yields no additional predictive capability,
with performance dependent on the text's length. Our proposed deep learning
model does, however, appear to be robust to the quality of the text and
therefore suitable for partly automating the mSME lending process. We also
demonstrate how the content of loan assessments influences performance, leading
us to a series of recommendations on a new strategy for collecting future mSME
loan assessments.
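The two metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `brier_score` and `auc`, and the toy labels and scores, are assumptions made purely for illustration. AUC is computed here in its pairwise form (the probability that a randomly chosen defaulter is scored above a randomly chosen non-defaulter), and the Brier Score as the mean squared error of the predicted default probabilities.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Pairwise AUC: share of (defaulter, non-defaulter) pairs ranked correctly.

    Tied scores count as half a correct pair. This O(n^2) form is meant for
    clarity on small samples, not for large datasets.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return correct / (len(pos) * len(neg))


# Hypothetical toy data: 1 = default, 0 = no default.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

toy_auc = auc(labels, scores)
toy_brier = brier_score(labels, scores)
print(f"AUC = {toy_auc:.3f}, Brier Score = {toy_brier:.4f}")
```

A higher AUC indicates better ranking of defaulters over non-defaulters, while a lower Brier Score indicates better-calibrated probabilities, so the two metrics capture different aspects of model quality.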
Related papers
- Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a time series forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context.
We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters.
Our experiments highlight the importance of incorporating contextual information, demonstrate surprising performance when using LLM-based forecasting models, and also reveal some of their critical shortcomings.
arXiv Detail & Related papers (2024-10-24T17:56:08Z)
- RepEval: Effective Text Evaluation with LLM Representation [55.26340302485898]
RepEval is a metric that leverages the projection of Large Language Models (LLMs) representations for evaluation.
Our work underscores the richness of information regarding text quality embedded within LLM representations, offering insights for the development of new metrics.
arXiv Detail & Related papers (2024-04-30T13:50:55Z)
- Exploring Precision and Recall to assess the quality and diversity of LLMs [82.21278402856079]
We introduce a novel evaluation framework for Large Language Models (LLMs) such as Llama-2 and Mistral.
This approach allows for a nuanced assessment of the quality and diversity of generated text without the need for aligned corpora.
arXiv Detail & Related papers (2024-02-16T13:53:26Z)
- Credit Risk Meets Large Language Models: Building a Risk Indicator from Loan Descriptions in P2P Lending [1.1970409518725493]
Peer-to-peer (P2P) lending has emerged as a distinctive financing mechanism, linking borrowers with lenders through online platforms.
However, P2P lending faces the challenge of information asymmetry, as lenders often lack sufficient data to assess the creditworthiness of borrowers.
This paper proposes a novel approach to address this issue by leveraging the textual descriptions provided by borrowers during the loan application process.
arXiv Detail & Related papers (2024-01-29T10:11:05Z)
- Empowering Many, Biasing a Few: Generalist Credit Scoring through Large Language Models [53.620827459684094]
Large Language Models (LLMs) have great potential for credit scoring tasks, with strong generalization ability across multiple tasks.
We propose the first open-source comprehensive framework for exploring LLMs for credit scoring.
We then propose the first Credit and Risk Assessment Large Language Model (CALM) by instruction tuning, tailored to the nuanced demands of various financial risk assessment tasks.
arXiv Detail & Related papers (2023-10-01T03:50:34Z)
- Inclusive FinTech Lending via Contrastive Learning and Domain Adaptation [9.75150920742607]
FinTech lending has played a significant role in facilitating financial inclusion.
There are concerns about the potentially biased algorithmic decision-making during loan screening.
We propose a new Transformer-based sequential loan screening model with self-supervised contrastive learning and domain adaptation.
arXiv Detail & Related papers (2023-05-10T01:11:35Z)
- Machine Learning Models Evaluation and Feature Importance Analysis on NPL Dataset [0.0]
We evaluate how different Machine learning models perform on the dataset provided by a private bank in Ethiopia.
XGBoost achieves the highest F1 score on the KMeans SMOTE over-sampled data.
arXiv Detail & Related papers (2022-08-28T17:09:44Z)
- Explanations of Machine Learning predictions: a mandatory step for its application to Operational Processes [61.20223338508952]
Credit Risk Modelling plays a paramount role.
Recent machine and deep learning techniques have been applied to the task.
We suggest using the LIME technique to tackle the explainability problem in this field.
arXiv Detail & Related papers (2020-12-30T10:27:59Z)
- Super-App Behavioral Patterns in Credit Risk Models: Financial, Statistical and Regulatory Implications [110.54266632357673]
We present the impact of alternative data that originates from an app-based marketplace, in contrast to traditional bureau data, upon credit scoring models.
Our results, validated across two countries, show that these new sources of data are particularly useful for predicting financial behavior in low-wealth and young individuals.
arXiv Detail & Related papers (2020-05-09T01:32:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.