Subject Membership Inference Attacks in Federated Learning
- URL: http://arxiv.org/abs/2206.03317v3
- Date: Fri, 2 Jun 2023 13:38:47 GMT
- Title: Subject Membership Inference Attacks in Federated Learning
- Authors: Anshuman Suri, Pallika Kanani, Virendra J. Marathe, Daniel W. Peterson
- Abstract summary: We propose two black-box attacks for subject membership inference.
We find our attacks to be extremely potent, even without access to exact training records.
We also investigate the effectiveness of Differential Privacy in mitigating this threat.
- Score: 4.377743737361996
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Privacy attacks on Machine Learning (ML) models often focus on inferring the
existence of particular data points in the training data. However, what the
adversary really wants to know is if a particular individual's (subject's) data
was included during training. In such scenarios, the adversary is more likely
to have access to the distribution of a particular subject than actual records.
Furthermore, in settings like cross-silo Federated Learning (FL), a subject's
data can be embodied by multiple data records that are spread across multiple
organizations. Nearly all of the existing private FL literature is dedicated to
studying privacy at two granularities -- item-level (individual data records),
and user-level (participating user in the federation), neither of which applies
to data subjects in cross-silo FL. This insight motivates us to shift our
attention from the privacy of data records to the privacy of data subjects,
also known as subject-level privacy. We propose two novel black-box attacks for
subject membership inference, of which one assumes access to a model after each
training round. Using these attacks, we estimate subject membership inference
risk on real-world data for single-party models as well as FL scenarios. We
find our attacks to be extremely potent, even without access to exact training
records, and using the knowledge of membership for a handful of subjects. To
better understand the various factors that may influence subject privacy risk
in cross-silo FL settings, we systematically generate several hundred synthetic
federation configurations, varying properties of the data, model design and
training, and the federation itself. Finally, we investigate the effectiveness
of Differential Privacy in mitigating this threat.
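The abstract describes the attacks only at a high level. As a rough illustration of subject-level membership inference against per-round model snapshots, the minimal Python sketch below tracks how a subject's average loss evolves across federated training rounds. The function names, the loss-drop statistic, and the percentile threshold are illustrative assumptions, not the attacks proposed in the paper.

```python
import numpy as np

def subject_membership_score(round_models, subject_samples, loss_fn):
    """Loss-drop statistic for one subject (hypothetical; not the paper's exact attack).

    round_models: black-box model snapshots, one per FL training round.
    subject_samples: samples drawn from the subject's data distribution
        (the adversary need not hold the exact training records).
    loss_fn: callable (model, samples) -> mean loss on those samples.
    """
    losses = np.array([loss_fn(m, subject_samples) for m in round_models])
    # A large drop from the first to the last round suggests the subject's
    # data influenced training.
    return losses[0] - losses[-1]

def infer_subject_membership(round_models, subject_samples,
                             nonmember_subject_samples, loss_fn,
                             percentile=95):
    """Calibrate a decision threshold with subjects known to be non-members
    (the 'knowledge of membership for a handful of subjects')."""
    target_score = subject_membership_score(round_models, subject_samples, loss_fn)
    reference_scores = [
        subject_membership_score(round_models, samples, loss_fn)
        for samples in nonmember_subject_samples
    ]
    # Threshold choice (95th percentile of non-member scores) is an assumption.
    threshold = np.percentile(reference_scores, percentile)
    return target_score > threshold
```

In the paper's setting, loss_fn would query each black-box snapshot on samples from the subject's distribution; the loss-drop statistic and percentile calibration above are only one plausible instantiation of such an attack.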
Related papers
- FewFedPIT: Towards Privacy-preserving and Few-shot Federated Instruction Tuning [54.26614091429253]
Federated instruction tuning (FedIT) is a promising solution that consolidates collaborative training across multiple data owners.
FedIT encounters limitations such as scarcity of instructional data and risk of exposure to training data extraction attacks.
We propose FewFedPIT, designed to simultaneously enhance privacy protection and model performance of federated few-shot learning.
arXiv Detail & Related papers (2024-03-10T08:41:22Z) - Federated Learning Empowered by Generative Content [55.576885852501775]
Federated learning (FL) enables leveraging distributed private data for model training in a privacy-preserving way.
We propose a novel FL framework termed FedGC, designed to mitigate data heterogeneity issues by diversifying private data with generative content.
We conduct a systematic empirical study on FedGC, covering diverse baselines, datasets, scenarios, and modalities.
arXiv Detail & Related papers (2023-12-10T07:38:56Z) - FLTrojan: Privacy Leakage Attacks against Federated Language Models Through Selective Weight Tampering [2.2194815687410627]
We show how a malicious client can leak the privacy-sensitive data of some other users in FL even without any cooperation from the server.
Our best-performing method improves the membership inference recall by 29% and achieves up to 71% private data reconstruction.
arXiv Detail & Related papers (2023-10-24T19:50:01Z) - PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z) - Benchmarking FedAvg and FedCurv for Image Classification Tasks [1.376408511310322]
This paper focuses on the problem of statistical heterogeneity of the data in the same federated network.
Several Federated Learning algorithms, such as FedAvg, FedProx and Federated Curvature (FedCurv) have already been proposed.
As a side product of this work, we release the non-IID versions of the datasets we used to facilitate further comparisons by the FL community.
arXiv Detail & Related papers (2023-03-31T10:13:01Z) - Sotto Voce: Federated Speech Recognition with Differential Privacy Guarantees [0.761963751158349]
Speech data is expensive to collect, and incredibly sensitive to its sources.
Organizations often independently collect small datasets for their own use, but these are frequently insufficient for the demands of machine learning.
Organizations could pool these datasets together and jointly build a strong ASR system; sharing data in the clear, however, comes with tremendous risk, in terms of intellectual property loss as well as loss of privacy of the individuals who exist in the dataset.
arXiv Detail & Related papers (2022-07-16T02:48:54Z) - Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets [53.866927712193416]
We show that an adversary who can poison a training dataset can cause models trained on this dataset to leak private details belonging to other parties.
Our attacks are effective across membership inference, attribute inference, and data extraction.
Our results cast doubts on the relevance of cryptographic privacy guarantees in multiparty protocols for machine learning.
arXiv Detail & Related papers (2022-03-31T18:06:28Z) - Towards Multi-Objective Statistically Fair Federated Learning [1.2687030176231846]
Federated Learning (FL) has emerged as a result of data ownership and privacy concerns.
We propose a new FL framework that is able to satisfy multiple objectives including various statistical fairness metrics.
arXiv Detail & Related papers (2022-01-24T19:22:01Z) - Privacy and Robustness in Federated Learning: Attacks and Defenses [74.62641494122988]
We conduct the first comprehensive survey on this topic.
Through a concise introduction to the concept of FL and a unique taxonomy covering: 1) threat models; 2) poisoning attacks on robustness and their defenses; 3) inference attacks on privacy and their defenses, we provide an accessible review of this important topic.
arXiv Detail & Related papers (2020-12-07T12:11:45Z) - WAFFLe: Weight Anonymized Factorization for Federated Learning [88.44939168851721]
In domains where data are sensitive or private, there is great value in methods that can learn in a distributed manner without the data ever leaving the local devices.
We propose Weight Anonymized Factorization for Federated Learning (WAFFLe), an approach that combines the Indian Buffet Process with a shared dictionary of weight factors for neural networks.
arXiv Detail & Related papers (2020-08-13T04:26:31Z)