Leveraging Randomness in Model and Data Partitioning for Privacy Amplification
- URL: http://arxiv.org/abs/2503.03043v1
- Date: Tue, 04 Mar 2025 22:49:59 GMT
- Title: Leveraging Randomness in Model and Data Partitioning for Privacy Amplification
- Authors: Andy Dong, Wei-Ning Chen, Ayfer Ozgur
- Abstract summary: We study how inherent randomness in the training process can be leveraged for privacy amplification. This includes (1) data partitioning, where a sample participates in only a subset of training iterations, and (2) model partitioning, where a sample updates only a subset of the model parameters. Our results demonstrate that randomness in the training process, which is structured rather than i.i.d. and interacts with data in complex ways, can be systematically leveraged for significant privacy amplification.
- Score: 8.52745154080651
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study how inherent randomness in the training process -- where each sample (or client in federated learning) contributes only to a randomly selected portion of training -- can be leveraged for privacy amplification. This includes (1) data partitioning, where a sample participates in only a subset of training iterations, and (2) model partitioning, where a sample updates only a subset of the model parameters. We apply our framework to model parallelism in federated learning, where each client updates a randomly selected subnetwork to reduce memory and computational overhead, and show that existing methods, e.g. model splitting or dropout, provide a significant privacy amplification gain not captured by previous privacy analysis techniques. Additionally, we introduce Balanced Iteration Subsampling, a new data partitioning method where each sample (or client) participates in a fixed number of training iterations. We show that this method yields stronger privacy amplification than Poisson (i.i.d.) sampling of data (or clients). Our results demonstrate that randomness in the training process, which is structured rather than i.i.d. and interacts with data in complex ways, can be systematically leveraged for significant privacy amplification.
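The two sources of randomness named in the abstract are concrete enough to sketch. Below is a minimal Python illustration, assuming uniform random choices and illustrative function names (none of this is the paper's implementation): `poisson_schedule` is standard Poisson (i.i.d.) subsampling, `balanced_schedule` mimics Balanced Iteration Subsampling as described above (each sample participates in exactly k iterations), and `random_subnetwork_mask` mimics model partitioning, where a client updates only a random subset of the parameters.

```python
import random

def poisson_schedule(n_samples, n_iters, q, seed=0):
    """Poisson (i.i.d.) subsampling: each sample joins each iteration
    independently with probability q, so its participation count varies."""
    rng = random.Random(seed)
    return [{t for t in range(n_iters) if rng.random() < q}
            for _ in range(n_samples)]

def balanced_schedule(n_samples, n_iters, k, seed=0):
    """Balanced Iteration Subsampling (as described in the abstract):
    each sample participates in exactly k of n_iters iterations. The
    uniform choice of the k iterations is an illustrative assumption."""
    rng = random.Random(seed)
    return [set(rng.sample(range(n_iters), k)) for _ in range(n_samples)]

def random_subnetwork_mask(n_params, p, seed=0):
    """Model partitioning: a client updates only a random subset of the
    model parameters (a dropout-style mask; a hypothetical simplification)."""
    rng = random.Random(seed)
    return [rng.random() < p for _ in range(n_params)]

# With k = q * n_iters the two data-partitioning schemes match in
# expectation, but only the balanced one fixes each sample's count.
T, q = 10, 0.3
poisson = poisson_schedule(n_samples=4, n_iters=T, q=q)
balanced = balanced_schedule(n_samples=4, n_iters=T, k=round(q * T))
print([len(s) for s in poisson])   # e.g. [2, 4, 3, 1] -- random counts
print([len(s) for s in balanced])  # [3, 3, 3, 3] -- always exactly k
```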
Related papers
- Probabilistic Federated Prompt-Tuning with Non-IID and Imbalanced Data [35.47385526394076]
Fine-tuning pre-trained models is a popular approach in machine learning for solving complex tasks with moderate data. Fine-tuning the entire pre-trained model is ineffective in federated data scenarios where local data distributions are diversely skewed. Our approach transforms federated learning into a distributed set modeling task, aggregating diverse sets of prompts to globally fine-tune the pre-trained model.
arXiv Detail & Related papers (2025-02-27T04:31:34Z) - Test-Time Alignment via Hypothesis Reweighting [56.71167047381817]
Large pretrained models often struggle with underspecified tasks. We propose a novel framework to address the challenge of aligning models to test-time user intent.
arXiv Detail & Related papers (2024-12-11T23:02:26Z) - Balanced Data Sampling for Language Model Training with Clustering [96.46042695333655]
We propose ClusterClip Sampling to balance the text distribution of training data for better model training.
Extensive experiments validate the effectiveness of ClusterClip Sampling.
arXiv Detail & Related papers (2024-02-22T13:20:53Z) - Personalized Privacy Amplification via Importance Sampling [3.0636509793595548]
In this paper, we examine the privacy properties of importance sampling, focusing on an individualized privacy analysis.
We find that, in importance sampling, privacy is well aligned with utility but at odds with sample size.
We propose two approaches for constructing sampling distributions: one that optimizes the privacy-efficiency trade-off, and one based on a utility guarantee in the form of coresets.
arXiv Detail & Related papers (2023-07-05T17:09:10Z) - FedSampling: A Better Sampling Strategy for Federated Learning [81.85411484302952]
Federated learning (FL) is an important technique for learning models from decentralized data in a privacy-preserving way.
Existing FL methods usually uniformly sample clients for local model learning in each round.
We propose a novel data uniform sampling strategy for federated learning (FedSampling).
arXiv Detail & Related papers (2023-06-25T13:38:51Z) - AdaSelection: Accelerating Deep Learning Training through Data Subsampling [27.46630703428186]
We introduce AdaSelection, an adaptive sub-sampling method to identify the most informative sub-samples within each minibatch.
Compared with industry-standard baselines, AdaSelection consistently displays superior performance.
arXiv Detail & Related papers (2023-06-19T07:01:28Z) - Client-specific Property Inference against Secure Aggregation in Federated Learning [52.8564467292226]
Federated learning has become a widely used paradigm for collaboratively training a common model among different participants.
Many attacks have shown that it is still possible to infer sensitive information, such as membership or properties of participant data, or even to reconstruct it outright.
We show that simple linear models can effectively capture client-specific properties only from the aggregated model updates.
arXiv Detail & Related papers (2023-03-07T14:11:01Z) - Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z) - Privacy Amplification via Shuffled Check-Ins [2.3333090554192615]
We study a protocol for distributed computation called shuffled check-in.
It achieves strong privacy guarantees without requiring any further trust assumptions beyond a trusted shuffler.
We show that shuffled check-in achieves tight privacy guarantees through privacy amplification.
arXiv Detail & Related papers (2022-06-07T09:55:15Z) - Renyi Differential Privacy of the Subsampled Shuffle Model in Distributed Learning [7.197592390105457]
We study privacy in a distributed learning framework, where clients collaboratively build a learning model iteratively through interactions with a server from whom we need privacy.
Motivated by optimization and the federated learning (FL) paradigm, we focus on the case where a small fraction of data samples are randomly sub-sampled in each round.
To obtain even stronger local privacy guarantees, we study this in the shuffle privacy model, where each client randomizes its response using a local differentially private (LDP) mechanism.
arXiv Detail & Related papers (2021-07-19T11:43:24Z) - One for More: Selecting Generalizable Samples for Generalizable ReID Model [92.40951770273972]
This paper proposes a one-for-more training objective that takes the generalization ability of selected samples as a loss function.
Our proposed one-for-more based sampler can be seamlessly integrated into the ReID training framework.
arXiv Detail & Related papers (2020-12-10T06:37:09Z) - Privacy Amplification via Random Check-Ins [38.72327434015975]
Differentially Private Stochastic Gradient Descent (DP-SGD) forms a fundamental building block in many applications for learning over sensitive data.
In this paper, we focus on conducting iterative methods like DP-SGD in the setting of federated learning (FL), wherein the data is distributed among many devices (clients).
Our main contribution is the random check-in distributed protocol, which crucially relies only on randomized participation decisions made locally and independently by each client (see the sketch after this entry).
arXiv Detail & Related papers (2020-07-13T18:14:09Z)
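To make the last entry concrete, here is a hedged Python sketch of the random check-in idea: each client decides locally and independently whether to participate and, if so, which round to join, and the server only sees who checked in. The parameter names and the uniform slot choice are assumptions for illustration, and a single RNG stands in for each client's private randomness.

```python
import random

def random_check_in(client_ids, n_rounds, participation_prob, seed=0):
    """Each client flips its own coin, locally and independently:
    with probability participation_prob it checks in, and if so it
    picks one training round uniformly at random. (In the real
    protocol every client uses its own private randomness; one RNG
    here is a simplification to keep the example self-contained.)"""
    rng = random.Random(seed)
    rounds = {t: [] for t in range(n_rounds)}
    for cid in client_ids:
        if rng.random() < participation_prob:            # local participation decision
            rounds[rng.randrange(n_rounds)].append(cid)  # local slot choice
    return rounds

schedule = random_check_in(client_ids=range(20), n_rounds=5,
                           participation_prob=0.5)
for t, clients in sorted(schedule.items()):
    print(f"round {t}: {clients}")  # per-round check-ins; sizes vary randomly
```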
This list is automatically generated from the titles and abstracts of the papers on this site.