Related papers: Secure Federated Data Distillation

Secure Federated Data Distillation

URL: http://arxiv.org/abs/2502.13728v2
Date: Thu, 06 Mar 2025 14:07:57 GMT
Title: Secure Federated Data Distillation
Authors: Marco Arazzi, Mert Cihangiroglu, Serena Nicolazzo, Antonino Nocera,
Abstract summary: We propose a Secure Federated Data Distillation (SFDD) framework to decentralize the distillation process while preserving privacy.<n>We leverage the gradient-matching-based distillation method, adapting it for a distributed setting where clients contribute to the distillation process without sharing raw data.<n>We create an optimized Local Differential Privacy approach, called LDPO-RLD, to make our approach resilient to inference attacks.
Score: 2.5311562666866494
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Dataset Distillation (DD) is a powerful technique for reducing large datasets into compact, representative synthetic datasets, accelerating Machine Learning training. However, traditional DD methods operate in a centralized manner, which poses significant privacy threats and reduces its applicability. To mitigate these risks, we propose a Secure Federated Data Distillation (SFDD) framework to decentralize the distillation process while preserving privacy. Unlike existing Federated Distillation techniques that focus on training global models with distilled knowledge, our approach aims to produce a distilled dataset without exposing local contributions. We leverage the gradient-matching-based distillation method, adapting it for a distributed setting where clients contribute to the distillation process without sharing raw data. The central aggregator iteratively refines a synthetic dataset by integrating client-side updates while ensuring data confidentiality. To make our approach resilient to inference attacks perpetrated by the server that could exploit gradient updates to reconstruct private data, we create an optimized Local Differential Privacy approach, called LDPO-RLD. Furthermore, we assess the framework's resilience against malicious clients executing backdoor attacks (such as Doorping) and demonstrate robustness under the assumption of a sufficient number of participating clients. Our experimental results demonstrate the effectiveness of SFDD and that the proposed defense concretely mitigates the identified vulnerabilities, with minimal impact on the performance of the distilled dataset. By addressing the interplay between privacy and federation in dataset distillation, this work advances the field of privacy-preserving Machine Learning making our SFDD framework a viable solution for sensitive data-sharing applications.

Related papers

Federated Distillation on Edge Devices: Efficient Client-Side Filtering for Non-IID Data [4.758807762853925]
Federated learning has emerged as a promising collaborative machine learning approach.<n>We propose a robust, resource-efficient EdgeFD method that reduces the complexity of the client-side density ratio estimation.<n>We evaluate EdgeFD across diverse practical scenarios, including strong non-IID, weak non-IID, and IID data distributions on clients, without requiring a pre-trained teacher model on the server for knowledge distillation.
arXiv Detail & Related papers (2025-08-20T15:17:59Z)
Improving Noise Efficiency in Privacy-preserving Dataset Distillation [59.57846442477106]
We introduce a novel framework that decouples sampling from optimization for better convergence and improves signal quality.<n>On CIFAR-10, our method achieves a textbf10.0% improvement with 50 images per class and textbf8.3% increase with just textbfone-fifth the distilled set size of previous state-of-the-art methods.
arXiv Detail & Related papers (2025-08-03T13:15:52Z)
Privacy-Preserving Federated Embedding Learning for Localized Retrieval-Augmented Generation [60.81109086640437]
We propose a novel framework called Federated Retrieval-Augmented Generation (FedE4RAG) FedE4RAG facilitates collaborative training of client-side RAG retrieval models. We apply homomorphic encryption within federated learning to safeguard model parameters.
arXiv Detail & Related papers (2025-04-27T04:26:02Z)
Robust Federated Learning Against Poisoning Attacks: A GAN-Based Defense Framework [0.6554326244334868]
Federated Learning (FL) enables collaborative model training across decentralized devices without sharing raw data. We propose a privacy-preserving defense framework that leverages a Conditional Generative Adversarial Network (cGAN) to generate synthetic data at the server for authenticating client updates.
arXiv Detail & Related papers (2025-03-26T18:00:56Z)
FedEM: A Privacy-Preserving Framework for Concurrent Utility Preservation in Federated Learning [17.853502904387376]
Federated Learning (FL) enables collaborative training of models across distributed clients without sharing local data, addressing privacy concerns in decentralized systems. We propose Federated Error Minimization (FedEM), a novel algorithm that incorporates controlled perturbations through adaptive noise injection. Experimental results on benchmark datasets demonstrate that FedEM significantly reduces privacy risks and preserves model accuracy, achieving a robust balance between privacy protection and utility preservation.
arXiv Detail & Related papers (2025-03-08T02:48:00Z)
Dark Distillation: Backdooring Distilled Datasets without Accessing Raw Data [48.69361050757504]
This work is the first to address a more realistic and concerning threat: attackers may intercept the dataset distribution process, inject backdoors into the distilled datasets, and redistribute them to users.<n>While distilled datasets were previously considered resistant to backdoor attacks, we demonstrate that they remain vulnerable to such attacks.<n>Our attack method is efficient, capable of a malicious distilled dataset in under one minute in certain cases.
arXiv Detail & Related papers (2025-02-06T17:14:17Z)
Federated Instruction Tuning of LLMs with Domain Coverage Augmentation [87.49293964617128]
Federated Domain-specific Instruction Tuning (FedDIT) utilizes limited cross-client private data together with various strategies of instruction augmentation.<n>We propose FedDCA, which optimize domain coverage through greedy client center selection and retrieval-based augmentation.<n>For client-side computational efficiency and system scalability, FedDCA$*$, the variant of FedDCA, utilizes heterogeneous encoders with server-side feature alignment.
arXiv Detail & Related papers (2024-09-30T09:34:31Z)
FewFedPIT: Towards Privacy-preserving and Few-shot Federated Instruction Tuning [54.26614091429253]
Federated instruction tuning (FedIT) is a promising solution, by consolidating collaborative training across multiple data owners. FedIT encounters limitations such as scarcity of instructional data and risk of exposure to training data extraction attacks. We propose FewFedPIT, designed to simultaneously enhance privacy protection and model performance of federated few-shot learning.
arXiv Detail & Related papers (2024-03-10T08:41:22Z)
Logits Poisoning Attack in Federated Distillation [8.728629314547248]
We introduce FDLA, a poisoning attack method tailored for Federated Distillation (FD) We demonstrate that LPA effectively compromises client model accuracy, outperforming established baseline algorithms in this regard. Our findings underscore the critical need for robust defense mechanisms in FD settings to mitigate such adversarial threats.
arXiv Detail & Related papers (2024-01-08T06:18:46Z)
PS-FedGAN: An Efficient Federated Learning Framework Based on Partially Shared Generative Adversarial Networks For Data Privacy [56.347786940414935]
Federated Learning (FL) has emerged as an effective learning paradigm for distributed computation. This work proposes a novel FL framework that requires only partial GAN model sharing. Named as PS-FedGAN, this new framework enhances the GAN releasing and training mechanism to address heterogeneous data distributions.
arXiv Detail & Related papers (2023-05-19T05:39:40Z)
Active Membership Inference Attack under Local Differential Privacy in Federated Learning [18.017082794703555]
Federated learning (FL) was originally regarded as a framework for collaborative learning among clients with data privacy protection. We propose a new active membership inference (AMI) attack carried out by a dishonest server in FL.
arXiv Detail & Related papers (2023-02-24T15:21:39Z)
FedLAP-DP: Federated Learning by Sharing Differentially Private Loss Approximations [53.268801169075836]
We propose FedLAP-DP, a novel privacy-preserving approach for federated learning. A formal privacy analysis demonstrates that FedLAP-DP incurs the same privacy costs as typical gradient-sharing schemes. Our approach presents a faster convergence speed compared to typical gradient-sharing methods.
arXiv Detail & Related papers (2023-02-02T12:56:46Z)
Federated Learning with Privacy-Preserving Ensemble Attention Distillation [63.39442596910485]
Federated Learning (FL) is a machine learning paradigm where many local nodes collaboratively train a central model while keeping the training data decentralized. We propose a privacy-preserving FL framework leveraging unlabeled public data for one-way offline knowledge distillation. Our technique uses decentralized and heterogeneous local data like existing FL approaches, but more importantly, it significantly reduces the risk of privacy leakage.
arXiv Detail & Related papers (2022-10-16T06:44:46Z)
Stochastic Coded Federated Learning with Convergence and Privacy Guarantees [8.2189389638822]
Federated learning (FL) has attracted much attention as a privacy-preserving distributed machine learning framework. This paper proposes a coded federated learning framework, namely coded federated learning (SCFL) to mitigate the straggler issue. We characterize the privacy guarantee by the mutual information differential privacy (MI-DP) and analyze the convergence performance in federated learning.
arXiv Detail & Related papers (2022-01-25T04:43:29Z)
FedDTG:Federated Data-Free Knowledge Distillation via Three-Player Generative Adversarial Networks [4.1693122614074785]
We introduce a distributed three-player Generative Adversarial Network (GAN) to implement data-free mutual distillation. Our experiments on benchmark datasets vision demonstrate that our method outperforms other federated distillation algorithms in terms of generalization.
arXiv Detail & Related papers (2022-01-10T05:26:33Z)
SPEED: Secure, PrivatE, and Efficient Deep learning [2.283665431721732]
We introduce a deep learning framework able to deal with strong privacy constraints. Based on collaborative learning, differential privacy and homomorphic encryption, the proposed approach advances state-of-the-art.
arXiv Detail & Related papers (2020-06-16T19:31:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.