The Effects of Data Imbalance Under a Federated Learning Approach for
Credit Risk Forecasting
- URL: http://arxiv.org/abs/2401.07234v1
- Date: Sun, 14 Jan 2024 09:15:10 GMT
- Title: The Effects of Data Imbalance Under a Federated Learning Approach for
Credit Risk Forecasting
- Authors: Shuyao Zhang, Jordan Tay, Pedro Baiz
- Abstract summary: Credit risk forecasting plays a crucial role for commercial banks and other financial institutions in granting loans to customers.
Traditional machine learning methods require the sharing of sensitive client information with an external server to build a global model.
A newly developed privacy-preserving distributed machine learning technique known as Federated Learning (FL) allows the training of a global model without the necessity of accessing private local data directly.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Credit risk forecasting plays a crucial role for commercial banks and other
financial institutions in granting loans to customers and minimising potential
losses. However, traditional machine learning methods require the
sharing of sensitive client information with an external server to build a
global model, potentially posing a risk of security threats and privacy
leakage. A newly developed privacy-preserving distributed machine learning
technique known as Federated Learning (FL) allows the training of a global
model without the necessity of accessing private local data directly. This
investigation examined the feasibility of federated learning in credit risk
assessment and showed the effects of data imbalance on model performance. Two
neural network architectures, Multilayer Perceptron (MLP) and Long Short-Term
Memory (LSTM), and one tree ensemble architecture, Extreme Gradient Boosting
(XGBoost), were explored across three different datasets under various
scenarios involving different numbers of clients and data distribution
configurations. We demonstrate that federated models consistently outperform
local models on non-dominant clients with smaller datasets. This trend is
especially pronounced in highly imbalanced data scenarios, yielding a
remarkable average improvement of 17.92% in model performance. However, for
dominant clients (clients with more data), federated models may not exhibit
superior performance, suggesting that special incentives may be needed to
encourage such clients to participate.
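
To make the experimental setup concrete, here is a minimal sketch of the kind of federated training loop the abstract describes: a few clients holding very different amounts of data train locally, and a server aggregates their parameters FedAvg-style, weighted by sample counts. The logistic-regression model, the synthetic data, and the helper names (make_client, local_sgd) are illustrative assumptions, not the paper's actual implementation; the study itself evaluates MLP, LSTM, and XGBoost models on real credit datasets.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_client(n, d=8):
    """Synthetic binary credit-default data for one client (hypothetical)."""
    X = rng.normal(size=(n, d))
    w_true = rng.normal(size=d)              # per-client ground truth -> non-IID
    y = (X @ w_true + rng.normal(scale=0.5, size=n) > 0).astype(float)
    return X, y

# Imbalanced partition: one dominant client and two small ones.
clients = [make_client(n) for n in (5000, 300, 200)]
sizes = np.array([len(y) for _, y in clients], dtype=float)

def local_sgd(w, X, y, epochs=5, lr=0.1):
    """A few epochs of full-batch gradient descent on the logistic loss."""
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))     # sigmoid predictions
        w = w - lr * X.T @ (p - y) / len(y)  # gradient of the log-loss
    return w

w_global = np.zeros(8)
for _ in range(20):                          # communication rounds
    local_ws = [local_sgd(w_global.copy(), X, y) for X, y in clients]
    # FedAvg: weight each client's update by its sample count, so the
    # dominant client pulls the global model toward its own distribution.
    w_global = np.average(local_ws, axis=0, weights=sizes / sizes.sum())
```

Comparing each client's locally trained model against w_global on held-out data is the kind of experiment summarized above: under this weighting, the small clients stand to gain the most from the shared model, while the dominant client may see little benefit.
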
Related papers
- Personalized Federated Learning with Mixture of Models for Adaptive Prediction and Model Fine-Tuning [22.705411388403036]
This paper develops a novel personalized federated learning algorithm.
Each client constructs a personalized model by combining a locally fine-tuned model with multiple federated models (see the first sketch after this list).
Theoretical analysis and experiments on real datasets corroborate the effectiveness of this approach.
arXiv Detail & Related papers (2024-10-28T21:20:51Z)
- Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling the data issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z)
- One-Shot Federated Learning with Classifier-Guided Diffusion Models [44.604485649167216]
One-shot federated learning (OSFL) has gained attention in recent years due to its low communication cost.
In this paper, we explore the novel opportunities that diffusion models bring to OSFL and propose FedCADO.
FedCADO generates data that complies with clients' distributions, which is then used to train the aggregated model on the server.
arXiv Detail & Related papers (2023-11-15T11:11:25Z)
- Efficient Personalized Federated Learning via Sparse Model-Adaptation [47.088124462925684]
Federated Learning (FL) aims to train machine learning models for multiple clients without sharing their own private data.
We propose pFedGate for efficient personalized FL by adaptively and efficiently learning sparse local models.
We show that pFedGate achieves superior global accuracy, individual accuracy and efficiency simultaneously over state-of-the-art methods.
arXiv Detail & Related papers (2023-05-04T12:21:34Z)
- Model Pruning Enables Localized and Efficient Federated Learning for Yield Forecasting and Data Sharing [6.4742178124596625]
Federated Learning (FL) presents a decentralized approach to model training in the agri-food sector.
This paper proposes a new technical solution that utilizes network pruning on client models and aggregates the pruned models.
We experiment with a soybean yield forecasting dataset and find that this approach can improve inference performance by 15.5% to 20% compared to FedAvg.
arXiv Detail & Related papers (2023-04-19T17:53:43Z)
- FedDM: Iterative Distribution Matching for Communication-Efficient Federated Learning [87.08902493524556]
Federated learning(FL) has recently attracted increasing attention from academia and industry.
We propose FedDM to build the global training objective from multiple local surrogate functions.
In detail, we construct synthetic sets of data on each client to locally match the loss landscape from original data.
arXiv Detail & Related papers (2022-07-20T04:55:18Z)
- Federated Learning with Unreliable Clients: Performance Analysis and Mechanism Design [76.29738151117583]
Federated Learning (FL) has become a promising tool for training effective machine learning models among distributed clients.
However, low quality models could be uploaded to the aggregator server by unreliable clients, leading to a degradation or even a collapse of training.
We model these unreliable behaviors of clients and propose a defensive mechanism to mitigate such a security risk.
arXiv Detail & Related papers (2021-05-10T08:02:27Z)
- Towards Fair Federated Learning with Zero-Shot Data Augmentation [123.37082242750866]
Federated learning has emerged as an important distributed learning paradigm, where a server aggregates a global model from many client-trained models while having no access to the client data.
We propose a novel federated learning system that employs zero-shot data augmentation on under-represented data to mitigate statistical heterogeneity and encourage more uniform accuracy performance across clients in federated networks.
We study two variants of this scheme, Fed-ZDAC (federated learning with zero-shot data augmentation at the clients) and Fed-ZDAS (federated learning with zero-shot data augmentation at the server).
arXiv Detail & Related papers (2021-04-27T18:23:54Z)
- Auto-weighted Robust Federated Learning with Corrupted Data Sources [7.475348174281237]
Federated learning provides a communication-efficient and privacy-preserving training process.
Standard federated learning techniques that naively minimize an average loss function are vulnerable to data corruptions.
We propose Auto-weighted Robust Federated Learning (ARFL) to provide robustness against corrupted data sources (see the second sketch after this list).
arXiv Detail & Related papers (2021-01-14T21:54:55Z)
- Toward Understanding the Influence of Individual Clients in Federated Learning [52.07734799278535]
Federated learning allows clients to jointly train a global model without sending their private data to a central server.
We define a new notion called Influence, quantify this influence over parameters, and propose an effective and efficient model to estimate this metric.
arXiv Detail & Related papers (2020-12-20T14:34:36Z)
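
Two entries in the list above describe their mechanisms concretely enough to sketch. First, the mixture-of-models idea from "Personalized Federated Learning with Mixture of Models for Adaptive Prediction and Model Fine-Tuning": each client blends a locally fine-tuned model with several federated models. A minimal sketch, assuming fixed hand-picked mixture weights (the paper learns them adaptively; all names and values here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
w_local = rng.normal(size=d)                     # locally fine-tuned model
w_feds = [rng.normal(size=d) for _ in range(2)]  # federated models from the server

# Convex combination of parameters; the weights below are an assumption,
# not the paper's learned coefficients.  alpha + sum(betas) == 1.
alpha, betas = 0.5, [0.3, 0.2]
w_personal = alpha * w_local + sum(b * w for b, w in zip(betas, w_feds))
```
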
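Second, a loss-aware reweighting in the spirit of "Auto-weighted Robust Federated Learning with Corrupted Data Sources": clients whose reported losses look anomalous are down-weighted during aggregation. The inverse-loss rule below is a stand-in assumption; ARFL derives its weights from a joint optimization over a weighted empirical risk:

```python
import numpy as np

# Reported local training losses; the third client looks corrupted.
client_losses = np.array([0.32, 0.29, 1.85])

weights = 1.0 / (client_losses + 1e-8)   # down-weight high-loss clients
weights /= weights.sum()

# These weights would then replace plain sample counts in the FedAvg step:
#   w_global = np.average(local_ws, axis=0, weights=weights)
print(np.round(weights, 3))              # -> [0.439 0.485 0.076]
```
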