Empowering Many, Biasing a Few: Generalist Credit Scoring through Large
Language Models
- URL: http://arxiv.org/abs/2310.00566v3
- Date: Sun, 18 Feb 2024 01:24:17 GMT
- Title: Empowering Many, Biasing a Few: Generalist Credit Scoring through Large
Language Models
- Authors: Duanyu Feng, Yongfu Dai, Jimin Huang, Yifang Zhang, Qianqian Xie,
Weiguang Han, Zhengyu Chen, Alejandro Lopez-Lira, Hao Wang
- Abstract summary: Large Language Models (LLMs) have great potential for credit scoring tasks, with strong generalization ability across multiple tasks.
We propose the first open-source comprehensive framework for exploring LLMs for credit scoring.
We then propose the first Credit and Risk Assessment Large Language Model (CALM) by instruction tuning, tailored to the nuanced demands of various financial risk assessment tasks.
- Score: 53.620827459684094
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the financial industry, credit scoring is a fundamental element, shaping
access to credit and determining the terms of loans for individuals and
businesses alike. Traditional credit scoring methods, however, often grapple
with challenges such as narrow knowledge scope and isolated evaluation of
credit tasks. Our work posits that Large Language Models (LLMs) have great
potential for credit scoring tasks, with strong generalization ability across
multiple tasks. To systematically explore LLMs for credit scoring, we propose
the first open-source comprehensive framework. We curate a novel benchmark
covering 9 datasets with 14K samples, tailored for credit assessment and a
critical examination of potential biases within LLMs, along with novel
instruction-tuning data of over 45K samples. We then propose the first Credit and Risk
Assessment Large Language Model (CALM) by instruction tuning, tailored to the
nuanced demands of various financial risk assessment tasks. We evaluate CALM,
existing state-of-the-art (SOTA) methods, and open-source and closed-source LLMs on the
built benchmark. Our empirical results illuminate the capability of LLMs to not
only match but surpass conventional models, pointing towards a future where
credit scoring can be more inclusive, comprehensive, and unbiased. We
contribute to the industry's transformation by sharing our pioneering
instruction-tuning datasets, credit and risk assessment LLM, and benchmarks
with the research community and the financial industry.
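To make the recipe in the abstract concrete, here is a minimal sketch of how a tabular credit-assessment record could be cast as one instruction-tuning sample of the kind CALM is trained on. The field names, prompt wording, and label mapping are illustrative assumptions, not the actual schema or prompts used in the paper.

```python
# Hedged sketch: turning a tabular credit-scoring record into an
# instruction-tuning sample (prompt/response pair). The feature names,
# prompt wording, and label mapping below are illustrative assumptions,
# not the dataset schema or prompts used by CALM.
import json

def make_instruction_sample(record: dict, label: int) -> dict:
    """Serialize one applicant record into an instruction/response pair."""
    features = ", ".join(f"{k}: {v}" for k, v in record.items())
    return {
        "instruction": (
            "Assess the credit risk of the following loan applicant. "
            "Answer with 'good' or 'bad'."
        ),
        "input": features,
        "output": "good" if label == 0 else "bad",
    }

if __name__ == "__main__":
    example = {"age": 42, "income": 52000, "loan_amount": 8000,
               "credit_history": "no prior defaults"}
    print(json.dumps(make_instruction_sample(example, label=0), indent=2))
```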
Related papers
- Evaluating Large Language Models on Financial Report Summarization: An Empirical Study [9.28042182186057]
We conduct a comparative study on three state-of-the-art Large Language Models (LLMs).
Our primary motivation is to explore how these models can be harnessed within finance, a field demanding precision, contextual relevance, and robustness against erroneous or misleading information.
We introduce an innovative evaluation framework that integrates both quantitative metrics (e.g., precision, recall) and qualitative analyses (e.g., contextual fit, consistency) to provide a holistic view of each model's output quality.
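The summary does not spell the framework out; as a hedged illustration of the quantitative side only, the sketch below scores a generated financial-report summary against a reference by precision and recall over crudely extracted numeric facts. The fact extractor and the example strings are assumptions, not the paper's implementation.

```python
# Hedged sketch: one way to compute precision/recall for a generated
# financial-report summary, scored over extracted numeric facts. This is
# an illustrative assumption, not the paper's actual evaluation code.
import re

def numeric_facts(text: str) -> set[str]:
    """Extract numeric tokens (revenues, margins, etc.) as a crude fact set."""
    return set(re.findall(r"\d+(?:\.\d+)?%?", text))

def precision_recall(summary: str, reference: str) -> tuple[float, float]:
    pred, gold = numeric_facts(summary), numeric_facts(reference)
    if not pred or not gold:
        return 0.0, 0.0
    tp = len(pred & gold)
    return tp / len(pred), tp / len(gold)

print(precision_recall("Revenue grew 12% to 3.4", "Revenue rose 12% to 3.4, margin 8%"))
```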
arXiv Detail & Related papers (2024-11-11T10:36:04Z)
- Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge [84.34545223897578]
Despite the excellence of LLM-as-a-Judge methods in many domains, potential issues remain under-explored, undermining their reliability and the scope of their utility.
We identify 12 key potential biases and propose a new automated bias quantification framework, CALM, which quantifies and analyzes each type of bias in LLM-as-a-Judge.
Our work highlights the need for stakeholders to address these issues and reminds users to exercise caution in LLM-as-a-Judge applications.
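As a concrete, hedged example of what automated bias quantification can look like, the sketch below measures one commonly studied judge bias (position bias) by swapping the order of two candidate answers and counting verdict flips; `judge` is a hypothetical callable wrapping an LLM-as-a-Judge prompt, and this is not the CALM framework itself.

```python
# Hedged sketch: quantifying position bias in an LLM-as-a-Judge setup by
# swapping answer order. `judge(question, first, second)` is a hypothetical
# callable that returns "first" or "second"; it is not the CALM framework.
from typing import Callable, Sequence

def position_bias_rate(judge: Callable[[str, str, str], str],
                       questions: Sequence[str],
                       answers_a: Sequence[str],
                       answers_b: Sequence[str]) -> float:
    """Fraction of items where swapping answer order flips the winner."""
    flips = 0
    for q, a, b in zip(questions, answers_a, answers_b):
        verdict = judge(q, a, b)
        swapped = judge(q, b, a)
        # A consistent judge keeps the same underlying winner after the swap,
        # so identical positional verdicts ("first" both times) indicate a flip.
        if verdict == swapped:
            flips += 1
    return flips / max(len(questions), 1)
```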
arXiv Detail & Related papers (2024-10-03T17:53:30Z)
- CLAMBER: A Benchmark of Identifying and Clarifying Ambiguous Information Needs in Large Language Models [60.59638232596912]
We introduce CLAMBER, a benchmark for evaluating how large language models (LLMs) identify and clarify ambiguous information needs.
Building upon the proposed taxonomy, we construct 12K high-quality samples to assess the strengths, weaknesses, and potential risks of various off-the-shelf LLMs.
Our findings indicate the limited practical utility of current LLMs in identifying and clarifying ambiguous user queries.
arXiv Detail & Related papers (2024-05-20T14:34:01Z)
- Credit Risk Meets Large Language Models: Building a Risk Indicator from Loan Descriptions in P2P Lending [1.1970409518725493]
Peer-to-peer (P2P) lending has emerged as a distinctive financing mechanism, linking borrowers with lenders through online platforms.
However, P2P lending faces the challenge of information asymmetry, as lenders often lack sufficient data to assess the creditworthiness of borrowers.
This paper proposes a novel approach to address this issue by leveraging the textual descriptions provided by borrowers during the loan application process.
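As a rough, hedged illustration of the pipeline shape (not the cited paper's method, which relies on LLMs), the sketch below derives a text-based risk indicator from loan descriptions with TF-IDF features and logistic regression; the example descriptions and outcomes are made up.

```python
# Hedged sketch: deriving a text-based risk indicator from loan descriptions.
# The cited paper uses LLMs for this; TF-IDF + logistic regression below is a
# deliberately simplified stand-in to show the overall shape of the pipeline.
# The example descriptions and labels are made up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

descriptions = [
    "Consolidating credit card debt, stable job for 10 years",
    "Need cash fast, between jobs right now",
    "Financing a small home renovation, good repayment history",
    "Second loan this year, paying off other loans",
]
defaulted = [0, 1, 0, 1]  # hypothetical outcomes

risk_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
risk_model.fit(descriptions, defaulted)

# Probability of default acts as the text-derived risk indicator.
print(risk_model.predict_proba(["Steady income, consolidating existing debt"])[:, 1])
```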
arXiv Detail & Related papers (2024-01-29T10:11:05Z)
- Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity [61.54815512469125]
This survey addresses the crucial issue of factuality in Large Language Models (LLMs).
As LLMs find applications across diverse domains, the reliability and accuracy of their outputs become vital.
arXiv Detail & Related papers (2023-10-11T14:18:03Z)
- Through the Lens of Core Competency: Survey on Evaluation of Large Language Models [27.271533306818732]
Large language models (LLMs) have excellent performance and wide practical uses.
Existing evaluation tasks struggle to keep up with the wide range of applications in real-world scenarios.
We summarize 4 core competencies of LLM, including reasoning, knowledge, reliability, and safety.
Under this competency architecture, similar tasks are combined to reflect corresponding ability, while new tasks can also be easily added into the system.
arXiv Detail & Related papers (2023-08-15T17:40:34Z)
- A Survey on Evaluation of Large Language Models [87.60417393701331]
Large language models (LLMs) are gaining increasing popularity in both academia and industry.
This paper focuses on three key dimensions: what to evaluate, where to evaluate, and how to evaluate.
arXiv Detail & Related papers (2023-07-06T16:28:35Z)
- KoLA: Carefully Benchmarking World Knowledge of Large Language Models [87.96683299084788]
We construct a Knowledge-oriented LLM Assessment benchmark (KoLA).
We mimic human cognition to form a four-level taxonomy of knowledge-related abilities, covering 19 tasks.
We use both Wikipedia, a corpus prevalently pre-trained by LLMs, along with continuously collected emerging corpora, to evaluate the capacity to handle unseen data and evolving knowledge.
arXiv Detail & Related papers (2023-06-15T17:20:46Z)
- Bagging Supervised Autoencoder Classifier for Credit Scoring [3.5977219275318166]
The imbalanced nature of credit scoring datasets, as well as the heterogeneity of their features, poses difficulties in developing and implementing effective credit scoring models.
We propose the Bagging Supervised Autoencoder Classifier (BSAC), which mainly leverages the superior performance of the Supervised Autoencoder.
BSAC also addresses the data imbalance problem by employing a variant of the Bagging process based on the undersampling of the majority class.
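To illustrate the ensemble structure described above, here is a hedged sketch of bagging with majority-class undersampling; a real BSAC uses a supervised autoencoder as the base learner, for which the scikit-learn MLP below is only a simplified stand-in, and all names are illustrative.

```python
# Hedged sketch of the BSAC idea: an ensemble where each member is trained on
# the full minority class plus an undersampled slice of the majority class.
# A real BSAC uses a supervised autoencoder as the base learner; the MLP here
# is a simplified stand-in, and all names below are illustrative.
import numpy as np
from sklearn.neural_network import MLPClassifier

def fit_bagged_undersampled(X, y, n_members=5, seed=0):
    """Train n_members classifiers, each on a rebalanced bootstrap slice."""
    rng = np.random.default_rng(seed)
    minority, majority = np.where(y == 1)[0], np.where(y == 0)[0]
    members = []
    for _ in range(n_members):
        sampled = rng.choice(majority, size=len(minority), replace=False)
        idx = np.concatenate([minority, sampled])
        clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500)
        members.append(clf.fit(X[idx], y[idx]))
    return members

def predict_proba_bagged(members, X):
    # Average member probabilities of the positive (default) class.
    return np.mean([m.predict_proba(X)[:, 1] for m in members], axis=0)
```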
arXiv Detail & Related papers (2021-08-12T17:49:08Z)