CAFE: Catastrophic Data Leakage in Vertical Federated Learning
- URL: http://arxiv.org/abs/2110.15122v1
- Date: Tue, 26 Oct 2021 23:22:58 GMT
- Title: CAFE: Catastrophic Data Leakage in Vertical Federated Learning
- Authors: Xiao Jin, Pin-Yu Chen, Chia-Yi Hsu, Chia-Mu Yu, Tianyi Chen
- Abstract summary: Recent studies show that private training data can be leaked through the gradient-sharing mechanism deployed in distributed machine learning systems.
We propose an advanced data leakage attack with theoretical justification to efficiently recover batch data from the shared aggregated gradients.
- Score: 65.56360219908142
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent studies show that private training data can be leaked through the
gradient-sharing mechanism deployed in distributed machine learning systems,
such as federated learning (FL). Increasing batch size to complicate data
recovery is often viewed as a promising defense strategy against data leakage.
In this paper, we revisit this defense premise and propose an advanced data
leakage attack with theoretical justification to efficiently recover batch data
from the shared aggregated gradients. We name our proposed method
catastrophic data leakage in vertical federated learning (CAFE). Compared to
existing data
leakage attacks, our extensive experimental results on vertical FL settings
demonstrate the effectiveness of CAFE in performing large-batch data leakage
attacks with improved data recovery quality. We also propose a practical
countermeasure to mitigate CAFE. Our results suggest that private data
participating in standard FL, especially in the vertical case, is at high risk
of being leaked from the training gradients. Our analysis implies unprecedented
and practical data leakage risks in those learning settings. The code of our
work is available at
https://github.com/DeRafael/CAFE.
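The mechanism underlying this family of attacks, and the starting point that CAFE extends to large batches and the vertical FL setting, is gradient matching: the attacker optimizes dummy inputs and labels until their gradients reproduce the shared ones. The following is a minimal sketch of that generic idea in PyTorch, not the authors' CAFE implementation (see the repository above for that); the toy model, batch size, and optimizer settings are illustrative assumptions.

import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)

# Toy model and a "victim" batch that the attacker never observes directly.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x_true = torch.rand(4, 1, 28, 28)
y_true = F.one_hot(torch.randint(0, 10, (4,)), 10).float()

def ce(logits, targets):
    # Cross-entropy against (possibly soft) target distributions.
    return (-targets * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()

# The only information shared in FL: the batch-aggregated gradients.
shared_grads = [g.detach() for g in
                torch.autograd.grad(ce(model(x_true), y_true), model.parameters())]

# The attacker tunes dummy data and labels so their gradients match the shared ones.
x_dummy = torch.rand(4, 1, 28, 28, requires_grad=True)
y_dummy = torch.randn(4, 10, requires_grad=True)
opt = torch.optim.LBFGS([x_dummy, y_dummy], lr=0.5, max_iter=20)

def closure():
    opt.zero_grad()
    dummy_grads = torch.autograd.grad(
        ce(model(x_dummy), F.softmax(y_dummy, dim=-1)),
        model.parameters(), create_graph=True)
    loss = sum(((dg - sg) ** 2).sum() for dg, sg in zip(dummy_grads, shared_grads))
    loss.backward()
    return loss

for _ in range(30):
    opt.step(closure)

print("reconstruction MSE:", F.mse_loss(x_dummy.detach(), x_true).item())

As the abstract notes, increasing the batch size is often assumed to defeat this kind of matching; CAFE revisits that premise and reports effective large-batch recovery in vertical FL settings.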
Related papers
- Optimal Defenses Against Gradient Reconstruction Attacks [13.728704430883987]
Federated Learning (FL) is designed to prevent data leakage through collaborative model training without centralized data storage.
It remains vulnerable to gradient reconstruction attacks that recover original training data from shared gradients.
arXiv Detail & Related papers (2024-11-06T08:22:20Z)
- Understanding Deep Gradient Leakage via Inversion Influence Functions [53.1839233598743]
Deep Gradient Leakage (DGL) is a highly effective attack that recovers private training images from gradient vectors.
We propose a novel Inversion Influence Function (I²F) that establishes a closed-form connection between the recovered images and the private gradients.
We empirically demonstrate that I²F effectively approximates DGL across different model architectures, datasets, attack implementations, and perturbation-based defenses.
arXiv Detail & Related papers (2023-09-22T17:26:24Z)
- Concealing Sensitive Samples against Gradient Leakage in Federated Learning [41.43099791763444]
Federated Learning (FL) is a distributed learning paradigm that enhances users' privacy by eliminating the need for clients to share raw, private data with the server.
Recent studies expose the vulnerability of FL to model inversion attacks, where adversaries reconstruct users' private data by eavesdropping on the shared gradient information.
We present a simple, yet effective defense strategy that obfuscates the gradients of the sensitive data with concealed samples.
arXiv Detail & Related papers (2022-09-13T04:19:35Z)
- Few-Shot Class-Incremental Learning via Entropy-Regularized Data-Free Replay [52.251188477192336]
Few-shot class-incremental learning (FSCIL) aims to enable a deep learning system to incrementally learn new classes with limited data.
We show through empirical results that adopting data replay is surprisingly favorable.
We propose using data-free replay that can synthesize data by a generator without accessing real data.
arXiv Detail & Related papers (2022-07-22T17:30:51Z)
- Do Gradient Inversion Attacks Make Federated Learning Unsafe? [70.0231254112197]
Federated learning (FL) allows the collaborative training of AI models without needing to share raw data.
Recent works on inverting deep neural networks from model gradients have raised concerns about the security of FL in preventing the leakage of training data.
In this work, we show that the attacks presented in the literature are impractical in real FL use cases and provide a new baseline attack.
arXiv Detail & Related papers (2022-02-14T18:33:12Z)
- A Novel Attribute Reconstruction Attack in Federated Learning [7.426857207652392]
Federated learning (FL) emerged as a promising learning paradigm to enable a multitude of participants to construct a joint ML model without exposing their private training data.
Existing FL designs have been shown to exhibit vulnerabilities which can be exploited by adversaries both within and outside of the system to compromise data privacy.
We develop a more effective and efficient gradient-matching-based method, called cos-matching, to reconstruct the training data attributes.
arXiv Detail & Related papers (2021-08-16T05:57:01Z)
- Always Be Dreaming: A New Approach for Data-Free Class-Incremental Learning [73.24988226158497]
We consider the high-impact problem of Data-Free Class-Incremental Learning (DFCIL).
We propose a novel incremental distillation strategy for DFCIL, contributing a modified cross-entropy training and importance-weighted feature distillation.
Our method results in up to a 25.1% increase in final task accuracy (absolute difference) compared to SOTA DFCIL methods for common class-incremental benchmarks.
arXiv Detail & Related papers (2021-06-17T17:56:08Z)
- User Label Leakage from Gradients in Federated Learning [12.239472997714804]
Federated learning enables multiple users to build a joint model by sharing their model updates (gradients).
We propose Label Leakage from Gradients (LLG), a novel attack to extract the labels of the users' training data from their shared gradients (a sketch of the underlying last-layer signal is given after this list).
arXiv Detail & Related papers (2021-05-19T19:21:05Z)
- Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)
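The label-leakage entry above (LLG) builds on a property of cross-entropy training: the gradient with respect to the last layer's bias is the batch mean of softmax(logits) minus the one-hot label, so classes that occur in the batch tend to receive negative entries. The sketch below illustrates only this underlying signal, not LLG's full attack (which adds further estimation steps to handle larger batches and trained models); the toy model and batch are illustrative assumptions.

import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)

num_classes = 10
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, num_classes))  # toy model

x = torch.rand(5, 1, 28, 28)
y = torch.tensor([0, 2, 2, 7, 9])   # private labels the server should not learn

loss = F.cross_entropy(model(x), y)
grads = torch.autograd.grad(loss, model.parameters())
bias_grad = grads[-1]               # gradient w.r.t. the last-layer bias

# d(loss)/d(bias_j) = mean over the batch of softmax(logits)_j - 1[y == j],
# so classes present in the batch tend to get negative entries (this simple
# heuristic can misfire for large batches or well-trained models, which is
# what attacks such as LLG refine).
leaked = (bias_grad < 0).nonzero(as_tuple=True)[0].tolist()
print("classes inferred to be present in the batch:", leaked)
print("true label set:", sorted(set(y.tolist())))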