CADRE: Customizable Assurance of Data Readiness in Privacy-Preserving Federated Learning
- URL: http://arxiv.org/abs/2505.23849v1
- Date: Wed, 28 May 2025 21:24:46 GMT
- Title: CADRE: Customizable Assurance of Data Readiness in Privacy-Preserving Federated Learning
- Authors: Kaveen Hiniduma, Zilinghan Li, Aditya Sinha, Ravi Madduri, Suren Byna
- Abstract summary: CADRE (Customizable Assurance of Data REadiness) for FL is a novel framework that allows users to define custom data readiness standards. We demonstrate the framework's practical application by integrating it into an existing PPFL framework.
- Score: 2.9208761770445157
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Privacy-Preserving Federated Learning (PPFL) is a decentralized machine learning approach where multiple clients train a model collaboratively. PPFL preserves the privacy and security of clients' data by never exchanging it. However, ensuring that the data at each client is of high quality and ready for federated learning (FL) is a challenge due to restricted data access. In this paper, we introduce CADRE (Customizable Assurance of Data REadiness) for FL, a novel framework that allows users to define custom data readiness (DR) standards, metrics, rules, and remedies tailored to specific FL tasks. Our framework generates comprehensive DR reports based on the user-defined metrics, rules, and remedies to ensure datasets are optimally prepared for FL while preserving privacy. We demonstrate the framework's practical application by integrating it into an existing PPFL framework. We conducted experiments across six diverse datasets, addressing seven different DR issues. The results illustrate the framework's versatility and effectiveness in ensuring DR across various dimensions, including data quality, privacy, and fairness. This approach enhances the performance and reliability of FL models and conserves valuable resources by identifying and addressing data-related issues before the training phase.
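The abstract's metric/rule/remedy pattern maps naturally onto a small plugin interface. The sketch below is a hypothetical illustration of that pattern, not CADRE's actual API; `DRCheck`, `run_dr_checks`, and the report fields are all assumed names.

```python
# Hypothetical sketch of the metric/rule/remedy pattern; not CADRE's real API.
from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass
class DRCheck:
    name: str
    metric: Callable[[Any], float]                 # computed locally on client data
    rule: Callable[[float], bool]                  # True if the dataset passes
    remedy: Optional[Callable[[Any], Any]] = None  # optional local fix

def run_dr_checks(dataset, checks):
    """Evaluate each check on the client's local data and build a DR report.

    Only the report leaves the client; the data itself never does.
    """
    report = {}
    for check in checks:
        value = check.metric(dataset)
        passed = check.rule(value)
        if not passed and check.remedy is not None:
            dataset = check.remedy(dataset)        # remediate locally
            value = check.metric(dataset)
            passed = check.rule(value)
        report[check.name] = {"value": value, "passed": passed}
    return dataset, report
```

A concrete check might, for example, measure the fraction of missing values and drop incomplete rows as its remedy, with the passing threshold chosen per FL task.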
Related papers
- Blockchain-Enabled Privacy-Preserving Second-Order Federated Edge Learning in Personalized Healthcare [1.859970493489417]
Federated learning (FL) has attracted increasing attention in response to the security and privacy challenges of traditional cloud-centric machine learning models. First-order FL approaches face several challenges in personalized model training due to heterogeneous, non-independent and identically distributed (non-iid) data. Recent second-order FL approaches maintain stability and consistency on non-iid datasets while improving personalized model training.
arXiv Detail & Related papers (2025-05-31T06:41:04Z)
- Privacy-Preserving Federated Embedding Learning for Localized Retrieval-Augmented Generation [60.81109086640437]
We propose a novel framework called Federated Retrieval-Augmented Generation (FedE4RAG). FedE4RAG facilitates collaborative training of client-side RAG retrieval models. We apply homomorphic encryption within federated learning to safeguard model parameters (see the sketch below).
arXiv Detail & Related papers (2025-04-27T04:26:02Z)
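The abstract above does not name the encryption scheme, so the sketch below illustrates the general idea with the additively homomorphic Paillier cryptosystem from the `phe` (python-paillier) package: the server can sum encrypted client updates without decrypting them. This is a minimal illustration, not FedE4RAG's actual protocol.

```python
# Minimal illustration of homomorphically aggregating client updates.
# Uses the additively homomorphic Paillier scheme from the `phe` package
# (pip install phe); the real FedE4RAG protocol may differ.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# Each client encrypts its (flattened) model update before upload.
client_updates = [[0.12, -0.30, 0.05], [0.08, -0.10, 0.01]]
encrypted = [[public_key.encrypt(v) for v in upd] for upd in client_updates]

# The server adds ciphertexts element-wise without ever decrypting them.
agg = encrypted[0]
for upd in encrypted[1:]:
    agg = [a + b for a, b in zip(agg, upd)]

# Only the key holder can recover the averaged update.
avg = [private_key.decrypt(c) / len(client_updates) for c in agg]
print(avg)  # approximately [0.1, -0.2, 0.03]
```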
- An Interactive Framework for Implementing Privacy-Preserving Federated Learning: Experiments on Large Language Models [7.539653242367701]
Federated learning (FL) enhances privacy by keeping user data on local devices. Recent attacks have demonstrated that updates shared by users during training can reveal significant information about their data. We propose a framework that integrates a human entity as a privacy practitioner to determine an optimal trade-off between the model's privacy and utility.
arXiv Detail & Related papers (2025-02-11T23:07:14Z)
- FewFedPIT: Towards Privacy-preserving and Few-shot Federated Instruction Tuning [54.26614091429253]
Federated instruction tuning (FedIT) is a promising solution that consolidates collaborative training across multiple data owners.
FedIT encounters limitations such as the scarcity of instructional data and the risk of exposure to training-data extraction attacks.
We propose FewFedPIT, designed to simultaneously enhance privacy protection and model performance of federated few-shot learning.
arXiv Detail & Related papers (2024-03-10T08:41:22Z)
- Privacy-Enhancing Collaborative Information Sharing through Federated Learning -- A Case of the Insurance Industry [1.8092553911119764]
The report demonstrates the benefits of harnessing Federated Learning (FL) to learn a single model across multiple insurance industry datasets.
FL addresses two of the industry's most pressing issues: limited data volume and limited data variety, both caused by privacy constraints.
During each round of FL, collaborators compute improvements on the model using their local private data, and these insights are combined to update a global model (a FedAvg-style round, sketched below).
arXiv Detail & Related papers (2024-02-22T21:46:24Z)
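The round described in the insurance paper above is the classic FedAvg pattern: local SGD on private data, followed by a parameter-wise average of the client models. Below is a minimal PyTorch sketch under simplifying assumptions (one batch per client, equal weighting, no secure aggregation).

```python
# Minimal FedAvg round (hedged sketch): each client runs local SGD on its
# private batch, and only model parameters are averaged into the global model.
import copy
import torch
import torch.nn.functional as F

def fedavg_round(global_model, client_batches, lr=0.01, local_steps=5):
    client_states = []
    for x, y in client_batches:              # one (data, label) batch per client
        local = copy.deepcopy(global_model)  # client starts from the global model
        opt = torch.optim.SGD(local.parameters(), lr=lr)
        for _ in range(local_steps):
            opt.zero_grad()
            F.cross_entropy(local(x), y).backward()
            opt.step()
        client_states.append(local.state_dict())
    # Combine the clients' "insights": a parameter-wise mean of local models.
    merged = {k: torch.stack([s[k] for s in client_states]).mean(dim=0)
              for k in client_states[0]}
    global_model.load_state_dict(merged)
    return global_model
```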
- Federated Learning Empowered by Generative Content [55.576885852501775]
Federated learning (FL) enables leveraging distributed private data for model training in a privacy-preserving way.
We propose a novel FL framework termed FedGC, designed to mitigate data heterogeneity issues by diversifying private data with generative content.
We conduct a systematic empirical study on FedGC, covering diverse baselines, datasets, scenarios, and modalities.
arXiv Detail & Related papers (2023-12-10T07:38:56Z)
- Privacy Preserving Bayesian Federated Learning in Heterogeneous Settings [20.33482170846688]
This paper presents a unified federated learning framework based on customized local Bayesian models that learn well even in the absence of large local datasets.
We use priors in the functional (output) space of the networks to facilitate collaboration across heterogeneous clients (sketched below).
Experiments on standard FL datasets demonstrate that our approach outperforms strong baselines in both homogeneous and heterogeneous settings.
arXiv Detail & Related papers (2023-06-13T17:55:30Z)
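One common way to realize a prior in the functional (output) space is to regularize each client's personalized model toward a shared reference model's predictions on the client's own inputs. The PyTorch sketch below is a hedged reading of that idea; the paper's actual Bayesian formulation is richer than this single KL term.

```python
# Hedged sketch: a functional (output-space) prior as a KL penalty toward a
# shared reference model, evaluated on the client's own inputs.
import torch
import torch.nn.functional as F

def local_loss_with_functional_prior(local_model, prior_model, x, y, beta=0.1):
    logits = local_model(x)
    task = F.cross_entropy(logits, y)
    with torch.no_grad():
        prior_logits = prior_model(x)      # reference predictions, not weights
    # KL divergence in function (output) space rather than weight space.
    reg = F.kl_div(F.log_softmax(logits, dim=-1),
                   F.softmax(prior_logits, dim=-1),
                   reduction="batchmean")
    return task + beta * reg
```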
- PS-FedGAN: An Efficient Federated Learning Framework Based on Partially Shared Generative Adversarial Networks For Data Privacy [56.347786940414935]
Federated Learning (FL) has emerged as an effective learning paradigm for distributed computation.
This work proposes a novel FL framework that requires only partial GAN model sharing.
Named PS-FedGAN, this new framework enhances the GAN releasing and training mechanism to address heterogeneous data distributions.
arXiv Detail & Related papers (2023-05-19T05:39:40Z)
- Towards Building the Federated GPT: Federated Instruction Tuning [66.7900343035733]
This paper introduces Federated Instruction Tuning (FedIT) as the learning framework for the instruction tuning of large language models (LLMs).
We demonstrate that by exploiting the heterogeneous and diverse sets of instructions on clients' ends, FedIT improves the performance of LLMs compared to centralized training with only limited local instructions.
arXiv Detail & Related papers (2023-05-09T17:42:34Z)
- Personalized Privacy-Preserving Framework for Cross-Silo Federated Learning [0.0]
Federated learning (FL) is a promising decentralized deep learning (DL) framework that enables DL-based approaches to be trained collaboratively across clients without sharing private data.
In this paper, we propose a novel framework, namely Personalized Privacy-Preserving Federated Learning (PPPFL).
Our proposed framework outperforms multiple FL baselines on different datasets, including MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100.
arXiv Detail & Related papers (2023-02-22T07:24:08Z)
- Improving Privacy-Preserving Vertical Federated Learning by Efficient Communication with ADMM [62.62684911017472]
Federated learning (FL) enables devices to jointly train shared models while keeping the training data local for privacy purposes.
We introduce a VFL framework with multiple heads (VIM), which takes the separate contribution of each client into account (sketched below).
VIM achieves significantly higher performance and faster convergence compared with the state-of-the-art.
arXiv Detail & Related papers (2022-07-20T23:14:33Z)
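In vertical FL, clients hold different feature slices of the same samples and upload embeddings. The "multiple heads" idea can be read as one prediction head per client plus a joint head, so each client's separate contribution enters the loss. The sketch below follows that reading and omits VIM's ADMM-based communication entirely, so treat it as an assumption-laden illustration rather than the paper's architecture.

```python
# Hedged sketch of a multi-head VFL server: one head per client embedding
# plus a joint head over the concatenation. ADMM communication is omitted.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadVFLServer(nn.Module):
    def __init__(self, emb_dim, num_clients, num_classes):
        super().__init__()
        self.client_heads = nn.ModuleList(
            nn.Linear(emb_dim, num_classes) for _ in range(num_clients))
        self.joint_head = nn.Linear(emb_dim * num_clients, num_classes)

    def forward(self, client_embeddings):      # list of per-client embeddings
        per_client = [head(emb) for head, emb
                      in zip(self.client_heads, client_embeddings)]
        joint = self.joint_head(torch.cat(client_embeddings, dim=-1))
        return joint, per_client

def vfl_loss(server, client_embeddings, labels, alpha=0.5):
    joint, per_client = server(client_embeddings)
    # Joint objective plus a term crediting each client's separate contribution.
    return (F.cross_entropy(joint, labels)
            + alpha * sum(F.cross_entropy(p, labels) for p in per_client))
```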
- Do Gradient Inversion Attacks Make Federated Learning Unsafe? [70.0231254112197]
Federated learning (FL) allows the collaborative training of AI models without needing to share raw data.
Recent works on the inversion of deep neural networks from model gradients have raised concerns about the security of FL in preventing the leakage of training data (a generic reconstruction loop is sketched below).
In this work, we show that the attacks presented in the literature are impractical in real FL use cases, and we provide a new baseline attack.
arXiv Detail & Related papers (2022-02-14T18:33:12Z)
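Gradient inversion attacks optimize dummy data so that its gradients on the shared model match the gradients a client uploaded; a close match means the dummy data approximates the private data. The sketch below is a generic "deep leakage from gradients"-style loop, not the improved baseline attack the paper proposes.

```python
# Generic gradient-inversion loop ("deep leakage from gradients" style).
# Not the paper's baseline attack; a minimal illustration of the threat.
import torch
import torch.nn.functional as F

def invert_gradients(model, observed_grads, input_shape, num_classes, steps=300):
    x = torch.randn(1, *input_shape, requires_grad=True)   # dummy input
    y = torch.randn(1, num_classes, requires_grad=True)    # dummy soft label
    opt = torch.optim.LBFGS([x, y], lr=0.1)

    def closure():
        opt.zero_grad()
        logits = model(x)
        loss = torch.sum(-F.softmax(y, dim=-1) * F.log_softmax(logits, dim=-1))
        grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
        # Drive the dummy data's gradients toward the observed client gradients.
        diff = sum(((g - o) ** 2).sum() for g, o in zip(grads, observed_grads))
        diff.backward()
        return diff

    for _ in range(steps):
        opt.step(closure)
    return x.detach(), F.softmax(y, dim=-1).detach()
```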