Contrastive Pre-training for Imbalanced Corporate Credit Ratings
- URL: http://arxiv.org/abs/2102.12580v1
- Date: Thu, 18 Feb 2021 08:14:46 GMT
- Title: Contrastive Pre-training for Imbalanced Corporate Credit Ratings
- Authors: Bojing Feng, Wenfang Xue
- Abstract summary: We propose Contrastive Pre-training for Corporate Credit Rating (CP4CCR), which utilizes self-supervision to overcome class imbalance.
Experiments conducted on a Chinese public-listed corporate rating dataset show that CP4CCR can improve the performance of standard corporate credit rating models.
- Score: 1.90365714903665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Corporate credit rating reflects the level of corporate credit and plays a
crucial role in modern financial risk control. However, real-world credit
rating data usually show long-tail distributions, and the resulting heavy
class imbalance greatly challenges corporate credit rating systems. To tackle
this, inspired by recent advances in pre-training techniques for
self-supervised representation learning, we propose a novel framework named
Contrastive Pre-training for Corporate Credit Rating (CP4CCR), which utilizes
self-supervision to overcome class imbalance. Specifically, in the first
phase we perform contrastive self-supervised pre-training without label
information, aiming to learn a better class-agnostic initialization. During
this phase, two self-supervised tasks are developed within CP4CCR: (i)
Feature Masking (FM) and (ii) Feature Swapping (FS). In the second phase, any
standard corporate credit rating model can be trained, initialized from the
pre-trained network. Extensive experiments conducted on a Chinese
public-listed corporate rating dataset show that CP4CCR can improve the
performance of standard corporate credit rating models, especially for
classes with few samples.
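The abstract names two self-supervised tasks, Feature Masking (FM) and Feature Swapping (FS), but does not spell out their details. The sketch below shows one plausible formulation for tabular credit features; the function names, masking rate, and swapping scheme are assumptions for illustration, not the paper's exact method:

```python
import numpy as np

def feature_masking(x, mask_rate=0.3, rng=None):
    """FM sketch: randomly zero out a fraction of feature entries."""
    rng = rng or np.random.default_rng(0)
    keep = rng.random(x.shape) >= mask_rate  # keep ~(1 - mask_rate) of entries
    return x * keep

def feature_swapping(x, swap_rate=0.3, rng=None):
    """FS sketch: for a fraction of entries, replace a sample's feature value
    with the same feature taken from another randomly chosen sample."""
    rng = rng or np.random.default_rng(0)
    x_aug = x.copy()
    n, d = x.shape
    for j in range(d):
        rows = rng.random(n) < swap_rate          # which samples to perturb
        donors = rng.integers(0, n, size=rows.sum())  # donor samples, same column
        x_aug[rows, j] = x[donors, j]
    return x_aug

# Two augmented "views" of the same batch, as used in contrastive pre-training
X = np.arange(12, dtype=float).reshape(4, 3)
view_a, view_b = feature_masking(X), feature_swapping(X)
```

In a contrastive setup, the two views of each sample would form a positive pair, with other samples in the batch serving as negatives.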
Related papers
- Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study [61.65123150513683]
Multimodal foundation models, such as CLIP, produce state-of-the-art zero-shot results.
It is reported that these models close the robustness gap by matching the performance of supervised models trained on ImageNet.
We show that CLIP leads to a significant robustness drop compared to supervised ImageNet models on our benchmark.
arXiv Detail & Related papers (2024-03-15T17:33:49Z) - The Effects of Data Imbalance Under a Federated Learning Approach for
Credit Risk Forecasting [0.0]
Credit risk forecasting plays a crucial role for commercial banks and other financial institutions in granting loans to customers.
Traditional machine learning methods require the sharing of sensitive client information with an external server to build a global model.
A newly developed privacy-preserving distributed machine learning technique known as Federated Learning (FL) allows the training of a global model without the necessity of accessing private local data directly.
arXiv Detail & Related papers (2024-01-14T09:15:10Z) - Empowering Many, Biasing a Few: Generalist Credit Scoring through Large
Language Models [53.620827459684094]
Large Language Models (LLMs) have great potential for credit scoring tasks, with strong generalization ability across multiple tasks.
We propose the first open-source comprehensive framework for exploring LLMs for credit scoring.
We then propose the first Credit and Risk Assessment Large Language Model (CALM) by instruction tuning, tailored to the nuanced demands of various financial risk assessment tasks.
arXiv Detail & Related papers (2023-10-01T03:50:34Z) - Client-side Gradient Inversion Against Federated Learning from Poisoning [59.74484221875662]
Federated Learning (FL) enables distributed participants to train a global model without sharing data directly with a central server.
Recent studies have revealed that FL is vulnerable to gradient inversion attack (GIA), which aims to reconstruct the original training samples.
We propose Client-side poisoning Gradient Inversion (CGI), a novel attack method that can be launched from clients.
arXiv Detail & Related papers (2023-09-14T03:48:27Z) - Stabilizing and Improving Federated Learning with Non-IID Data and
Client Dropout [15.569507252445144]
Data heterogeneity induced by label distribution skew has been shown to be a significant obstacle that limits model performance in federated learning.
We propose a simple yet effective framework by introducing a prior-calibrated softmax function for computing the cross-entropy loss.
The improved model performance over existing baselines in the presence of non-IID data and client dropout is demonstrated.
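The prior-calibrated softmax mentioned above resembles logit adjustment for class imbalance. A minimal sketch, assuming the calibration simply adds log class priors to the logits before the softmax (the paper's exact formulation may differ):

```python
import numpy as np

def prior_calibrated_log_softmax(logits, class_priors):
    """Adjust logits by log class priors, then take a numerically
    stable log-softmax (a logit-adjustment-style sketch)."""
    adjusted = logits + np.log(class_priors)
    adjusted = adjusted - adjusted.max(axis=-1, keepdims=True)
    return adjusted - np.log(np.exp(adjusted).sum(axis=-1, keepdims=True))

def calibrated_cross_entropy(logits, labels, class_priors):
    """Cross-entropy loss computed on the prior-calibrated log-probabilities."""
    log_probs = prior_calibrated_log_softmax(logits, class_priors)
    return -log_probs[np.arange(len(labels)), labels].mean()

# With uniform priors this reduces to the standard cross-entropy
logits = np.array([[0.0, 0.0]])
loss = calibrated_cross_entropy(logits, np.array([0]), np.array([0.5, 0.5]))
```

Skewed priors shift the calibrated probabilities toward the majority class during training, which in turn discourages the model from over-predicting it at test time.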
arXiv Detail & Related papers (2023-03-11T05:17:59Z) - FedABC: Targeting Fair Competition in Personalized Federated Learning [76.9646903596757]
Federated learning aims to collaboratively train models without accessing clients' local private data.
We propose a novel and generic PFL framework termed Federated Averaging via Binary Classification, dubbed FedABC.
In particular, we adopt the "one-vs-all" training strategy in each client to alleviate the unfair competition between classes.
arXiv Detail & Related papers (2023-02-15T03:42:59Z) - Multi-task Envisioning Transformer-based Autoencoder for Corporate
Credit Rating Migration Early Prediction [18.374597213278626]
Being able to predict rating changes will greatly benefit both investors and regulators alike.
In this paper, we consider the corporate credit rating migration early prediction problem.
We propose a new Multi-task Envisioning Transformer-based Autoencoder model to tackle this problem.
arXiv Detail & Related papers (2022-07-10T21:12:04Z) - Cooperative Multi-Agent Actor-Critic for Privacy-Preserving Load
Scheduling in a Residential Microgrid [71.17179010567123]
We propose a privacy-preserving multi-agent actor-critic framework where the decentralized actors are trained with distributed critics.
The proposed framework can preserve the privacy of the households while simultaneously learning the multi-agent credit assignment mechanism implicitly.
arXiv Detail & Related papers (2021-10-06T14:05:26Z) - Bagging Supervised Autoencoder Classifier for Credit Scoring [3.5977219275318166]
The imbalanced nature of credit scoring datasets, as well as the heterogeneous nature of features in credit scoring datasets, pose difficulties in developing and implementing effective credit scoring models.
We propose the Bagging Supervised Autoencoder (BSAC) that mainly leverages the superior performance of the Supervised Autoencoder.
BSAC also addresses the data imbalance problem by employing a variant of the Bagging process based on the undersampling of the majority class.
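The undersampling-based Bagging variant described for BSAC can be sketched generically as follows; this is an illustration of the idea, not the paper's exact procedure, and the helper names are assumptions:

```python
import numpy as np

def undersample_majority(X, y, rng=None):
    """Balance a dataset by undersampling every class down to the
    size of the smallest class (sketch)."""
    rng = rng or np.random.default_rng(0)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    idx = np.concatenate([
        rng.choice(np.where(y == c)[0], size=n_min, replace=False)
        for c in classes
    ])
    return X[idx], y[idx]

def bagged_undersampled_sets(X, y, n_bags=5):
    """Each bag is an independently undersampled, balanced training set,
    one per base learner in the ensemble."""
    return [undersample_majority(X, y, np.random.default_rng(s))
            for s in range(n_bags)]

# Toy imbalanced dataset: 8 majority-class samples, 2 minority-class samples
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)
bags = bagged_undersampled_sets(X, y, n_bags=3)
```

Because each bag draws a different majority-class subset, the ensemble sees most of the majority data across bags while each base learner trains on a balanced set.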
arXiv Detail & Related papers (2021-08-12T17:49:08Z) - Adversarial Semi-supervised Learning for Corporate Credit Ratings [1.90365714903665]
In this work, we consider the problem of adversarial semi-supervised learning for corporate credit rating.
In the first phase, we train a normal rating system via a standard machine-learning algorithm to assign pseudo rating labels to unlabeled data.
In the second phase, adversarial semi-supervised learning is applied, uniting labeled and pseudo-labeled data.
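The first-phase pseudo-labeling step can be illustrated with a confidence-thresholded sketch; the threshold and the filtering are assumptions for illustration, as the paper may keep all pseudo-labels:

```python
import numpy as np

def pseudo_label(predict_proba, X_unlabeled, threshold=0.9):
    """Keep unlabeled samples whose top predicted rating probability
    meets a confidence threshold; use the argmax class as a pseudo label."""
    probs = predict_proba(X_unlabeled)
    confident = probs.max(axis=1) >= threshold
    return X_unlabeled[confident], probs.argmax(axis=1)[confident]

# Toy rating model: fixed class probabilities for two unlabeled firms
toy_proba = lambda X: np.array([[0.95, 0.05], [0.60, 0.40]])
X_u = np.array([[1.0], [2.0]])
X_pseudo, y_pseudo = pseudo_label(toy_proba, X_u)
```

The pseudo-labeled pairs would then be merged with the labeled data for the second, adversarial training phase.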
arXiv Detail & Related papers (2021-04-04T09:05:53Z) - Explanations of Machine Learning predictions: a mandatory step for its
application to Operational Processes [61.20223338508952]
Credit risk modelling plays a paramount role in operational processes.
Recent machine and deep learning techniques have been applied to the task.
We suggest using the LIME technique to tackle the explainability problem in this field.
arXiv Detail & Related papers (2020-12-30T10:27:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.