CAFE: Catastrophic Data Leakage in Vertical Federated Learning
- URL: http://arxiv.org/abs/2110.15122v1
- Date: Tue, 26 Oct 2021 23:22:58 GMT
- Title: CAFE: Catastrophic Data Leakage in Vertical Federated Learning
- Authors: Xiao Jin, Pin-Yu Chen, Chia-Yi Hsu, Chia-Mu Yu, Tianyi Chen
- Abstract summary: Recent studies show that private training data can be leaked through the gradient-sharing mechanism deployed in distributed machine learning systems.
We propose an advanced data leakage attack with theoretical justification to efficiently recover batch data from the shared aggregated gradients.
- Score: 65.56360219908142
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent studies show that private training data can be leaked through the
gradient-sharing mechanism deployed in distributed machine learning systems,
such as federated learning (FL). Increasing batch size to complicate data
recovery is often viewed as a promising defense strategy against data leakage.
In this paper, we revisit this defense premise and propose an advanced data
leakage attack with theoretical justification to efficiently recover batch data
from the shared aggregated gradients. We name our proposed method
catastrophic data leakage in vertical federated learning (CAFE). Compared to
existing data
leakage attacks, our extensive experimental results on vertical FL settings
demonstrate the effectiveness of CAFE in performing large-batch data leakage
attacks with improved data recovery quality. We also propose a practical
countermeasure to mitigate CAFE. Our results suggest that private data
participating in standard FL, especially in the vertical case, is at high risk
of being leaked from the training gradients. Our analysis implies unprecedented
and practical data leakage risks in those learning settings. The code of our
work is available at
https://github.com/DeRafael/CAFE.
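The mechanism underlying this family of attacks, and the starting point that CAFE extends to large batches and the vertical FL setting, is gradient matching: the attacker optimizes dummy inputs and labels until their gradients reproduce the shared ones. The following is a minimal sketch of that generic idea in PyTorch, not the authors' CAFE implementation (see the repository above for that); the toy model, batch size, and optimizer settings are illustrative assumptions.

import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)

# Toy model and a "victim" batch that the attacker never observes directly.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x_true = torch.rand(4, 1, 28, 28)
y_true = F.one_hot(torch.randint(0, 10, (4,)), 10).float()

def ce(logits, targets):
    # Cross-entropy against (possibly soft) target distributions.
    return (-targets * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()

# The only information shared in FL: the batch-aggregated gradients.
shared_grads = [g.detach() for g in
                torch.autograd.grad(ce(model(x_true), y_true), model.parameters())]

# The attacker tunes dummy data and labels so their gradients match the shared ones.
x_dummy = torch.rand(4, 1, 28, 28, requires_grad=True)
y_dummy = torch.randn(4, 10, requires_grad=True)
opt = torch.optim.LBFGS([x_dummy, y_dummy], lr=0.5, max_iter=20)

def closure():
    opt.zero_grad()
    dummy_grads = torch.autograd.grad(
        ce(model(x_dummy), F.softmax(y_dummy, dim=-1)),
        model.parameters(), create_graph=True)
    loss = sum(((dg - sg) ** 2).sum() for dg, sg in zip(dummy_grads, shared_grads))
    loss.backward()
    return loss

for _ in range(30):
    opt.step(closure)

print("reconstruction MSE:", F.mse_loss(x_dummy.detach(), x_true).item())

As the abstract notes, increasing the batch size is often assumed to defeat this kind of matching; CAFE revisits that premise and reports effective large-batch recovery in vertical FL settings.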
Related papers
- Optimal Defenses Against Gradient Reconstruction Attacks [13.728704430883987]
Federated Learning (FL) is designed to prevent data leakage through collaborative model training without centralized data storage.
It remains vulnerable to gradient reconstruction attacks that recover original training data from shared gradients.
arXiv Detail & Related papers (2024-11-06T08:22:20Z)
- Understanding Deep Gradient Leakage via Inversion Influence Functions [53.1839233598743]
Deep Gradient Leakage (DGL) is a highly effective attack that recovers private training images from gradient vectors.
We propose a novel Inversion Influence Function (I²F) that establishes a closed-form connection between the recovered images and the private gradients.
We empirically demonstrate that I²F effectively approximates DGL across different model architectures, datasets, attack implementations, and perturbation-based defenses.
arXiv Detail & Related papers (2023-09-22T17:26:24Z)
- Concealing Sensitive Samples against Gradient Leakage in Federated Learning [41.43099791763444]
Federated Learning (FL) is a distributed learning paradigm that enhances users' privacy by eliminating the need for clients to share raw, private data with the server.
Recent studies expose the vulnerability of FL to model inversion attacks, where adversaries reconstruct users' private data by eavesdropping on the shared gradient information.
We present a simple, yet effective defense strategy that obfuscates the gradients of the sensitive data with concealed samples.
arXiv Detail & Related papers (2022-09-13T04:19:35Z)
- Few-Shot Class-Incremental Learning via Entropy-Regularized Data-Free Replay [52.251188477192336]
Few-shot class-incremental learning (FSCIL) aims to enable a deep learning system to incrementally learn new classes with limited data.
We show through empirical results that adopting data replay is surprisingly favorable.
We propose using data-free replay that can synthesize data by a generator without accessing real data.
arXiv Detail & Related papers (2022-07-22T17:30:51Z)
- Do Gradient Inversion Attacks Make Federated Learning Unsafe? [70.0231254112197]
Federated learning (FL) allows the collaborative training of AI models without needing to share raw data.
Recent works on inverting deep neural networks from model gradients have raised concerns about the security of FL in preventing the leakage of training data.
In this work, we show that the attacks presented in the literature are impractical in real FL use cases and provide a new baseline attack.
arXiv Detail & Related papers (2022-02-14T18:33:12Z)
- A Novel Attribute Reconstruction Attack in Federated Learning [7.426857207652392]
Federated learning (FL) emerged as a promising learning paradigm to enable a multitude of participants to construct a joint ML model without exposing their private training data.
Existing FL designs have been shown to exhibit vulnerabilities which can be exploited by adversaries both within and outside of the system to compromise data privacy.
We develop a more effective and efficient gradient-matching-based method, called cos-matching, to reconstruct the training data attributes.
arXiv Detail & Related papers (2021-08-16T05:57:01Z)
- Always Be Dreaming: A New Approach for Data-Free Class-Incremental Learning [73.24988226158497]
We consider the high-impact problem of Data-Free Class-Incremental Learning (DFCIL).
We propose a novel incremental distillation strategy for DFCIL, contributing a modified cross-entropy training and importance-weighted feature distillation.
Our method results in up to a 25.1% increase in final task accuracy (absolute difference) compared to SOTA DFCIL methods for common class-incremental benchmarks.
arXiv Detail & Related papers (2021-06-17T17:56:08Z)
- User Label Leakage from Gradients in Federated Learning [12.239472997714804]
Federated learning enables multiple users to build a joint model by sharing their model updates (gradients).
We propose Label Leakage from Gradients (LLG), a novel attack to extract the labels of the users' training data from their shared gradients (a sketch of the underlying last-layer signal is given after this list).
arXiv Detail & Related papers (2021-05-19T19:21:05Z)
- Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)
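The label-leakage entry above (LLG) builds on a property of cross-entropy training: the gradient with respect to the last layer's bias is the batch mean of softmax(logits) minus the one-hot label, so classes that occur in the batch tend to receive negative entries. The sketch below illustrates only this underlying signal, not LLG's full attack (which adds further estimation steps to handle larger batches and trained models); the toy model and batch are illustrative assumptions.

import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)

num_classes = 10
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, num_classes))  # toy model

x = torch.rand(5, 1, 28, 28)
y = torch.tensor([0, 2, 2, 7, 9])   # private labels the server should not learn

loss = F.cross_entropy(model(x), y)
grads = torch.autograd.grad(loss, model.parameters())
bias_grad = grads[-1]               # gradient w.r.t. the last-layer bias

# d(loss)/d(bias_j) = mean over the batch of softmax(logits)_j - 1[y == j],
# so classes present in the batch tend to get negative entries (this simple
# heuristic can misfire for large batches or well-trained models, which is
# what attacks such as LLG refine).
leaked = (bias_grad < 0).nonzero(as_tuple=True)[0].tolist()
print("classes inferred to be present in the batch:", leaked)
print("true label set:", sorted(set(y.tolist())))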