Machine Learning with Privacy for Protected Attributes
- URL: http://arxiv.org/abs/2506.19836v1
- Date: Tue, 24 Jun 2025 17:53:28 GMT
- Title: Machine Learning with Privacy for Protected Attributes
- Authors: Saeed Mahloujifar, Chuan Guo, G. Edward Suh, Kamalika Chaudhuri,
- Abstract summary: We refine the definition of differential privacy (DP) to create a more general and flexible framework that we call feature differential privacy (FDP)<n>Our definition is simulation-based and allows for both addition/removal and replacement variants of privacy, and can handle arbitrary separation of protected and non-protected features.<n>We apply our framework to various machine learning tasks and show that it can significantly improve the utility of DP-trained models when public features are available.
- Score: 56.44253915927481
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Differential privacy (DP) has become the standard for private data analysis. Certain machine learning applications only require privacy protection for specific protected attributes. Using naive variants of differential privacy in such use cases can result in unnecessary degradation of utility. In this work, we refine the definition of DP to create a more general and flexible framework that we call feature differential privacy (FDP). Our definition is simulation-based and allows for both addition/removal and replacement variants of privacy, and can handle arbitrary and adaptive separation of protected and non-protected features. We prove the properties of FDP, such as adaptive composition, and demonstrate its implications for limiting attribute inference attacks. We also propose a modification of the standard DP-SGD algorithm that satisfies FDP while leveraging desirable properties such as amplification via sub-sampling. We apply our framework to various machine learning tasks and show that it can significantly improve the utility of DP-trained models when public features are available. For example, we train diffusion models on the AFHQ dataset of animal faces and observe a drastic improvement in FID compared to DP, from 286.7 to 101.9 at $\epsilon=8$, assuming that the blurred version of a training image is available as a public feature. Overall, our work provides a new approach to private data analysis that can help reduce the utility cost of DP while still providing strong privacy guarantees.
Related papers
- Differentially Private Adaptation of Diffusion Models via Noisy Aggregated Embeddings [23.687702204151872]
Textual Inversion (TI) learns an embedding vector for an image or set of images, to enable adaptation under differential privacy constraints.<n>We show DPAgg-TI outperforms DP-SGD finetuning in both utility and robustness under the same privacy budget.
arXiv Detail & Related papers (2024-11-22T00:09:49Z) - Activity Recognition on Avatar-Anonymized Datasets with Masked Differential Privacy [64.32494202656801]
Privacy-preserving computer vision is an important emerging problem in machine learning and artificial intelligence.<n>We present anonymization pipeline that replaces sensitive human subjects in video datasets with synthetic avatars within context.<n>We also proposeMaskDP to protect non-anonymized but privacy sensitive background information.
arXiv Detail & Related papers (2024-10-22T15:22:53Z) - Privacy Amplification for the Gaussian Mechanism via Bounded Support [64.86780616066575]
Data-dependent privacy accounting frameworks such as per-instance differential privacy (pDP) and Fisher information loss (FIL) confer fine-grained privacy guarantees for individuals in a fixed training dataset.
We propose simple modifications of the Gaussian mechanism with bounded support, showing that they amplify privacy guarantees under data-dependent accounting.
arXiv Detail & Related papers (2024-03-07T21:22:07Z) - Personalized Differential Privacy for Ridge Regression [3.4751583941317166]
We introduce our novel Personalized-DP Output Perturbation method ( PDP-OP) that enables to train Ridge regression models with individual per data point privacy levels.
We provide rigorous privacy proofs for our PDP-OP as well as accuracy guarantees for the resulting model.
We show that PDP-OP outperforms the personalized privacy techniques of Jorgensen et al.
arXiv Detail & Related papers (2024-01-30T16:00:14Z) - Private Fine-tuning of Large Language Models with Zeroth-order Optimization [51.19403058739522]
Differentially private gradient descent (DP-SGD) allows models to be trained in a privacy-preserving manner.<n>We introduce DP-ZO, a private fine-tuning framework for large language models by privatizing zeroth order optimization methods.
arXiv Detail & Related papers (2024-01-09T03:53:59Z) - Probing the Transition to Dataset-Level Privacy in ML Models Using an
Output-Specific and Data-Resolved Privacy Profile [23.05994842923702]
We study a privacy metric that quantifies the extent to which a model trained on a dataset using a Differential Privacy mechanism is covered" by each of the distributions resulting from training on neighboring datasets.
We show that the privacy profile can be used to probe an observed transition to indistinguishability that takes place in the neighboring distributions as $epsilon$ decreases.
arXiv Detail & Related papers (2023-06-27T20:39:07Z) - Personalized DP-SGD using Sampling Mechanisms [5.50042037663784]
We extend Differentially Private Gradient Descent (DP-SGD) to support a recent privacy notion called ($Phi$,$Delta$)- Personalized Differential Privacy (($Phi$,$Delta$)- PDP.
Our algorithm uses a multi-round personalized sampling mechanism and embeds it within the DP-SGD iteration.
Experiments on real datasets show that our algorithm outperforms DP-SGD and simple combinations of DP-SGD with existing PDP mechanisms.
arXiv Detail & Related papers (2023-05-24T13:56:57Z) - Large Scale Transfer Learning for Differentially Private Image
Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients.
While this result is quite appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private training.
arXiv Detail & Related papers (2022-05-06T01:22:20Z) - Just Fine-tune Twice: Selective Differential Privacy for Large Language
Models [69.66654761324702]
We propose a simple yet effective just-fine-tune-twice privacy mechanism to achieve SDP for large Transformer-based language models.
Experiments show that our models achieve strong performance while staying robust to the canary insertion attack.
arXiv Detail & Related papers (2022-04-15T22:36:55Z) - Smoothed Differential Privacy [55.415581832037084]
Differential privacy (DP) is a widely-accepted and widely-applied notion of privacy based on worst-case analysis.
In this paper, we propose a natural extension of DP following the worst average-case idea behind the celebrated smoothed analysis.
We prove that any discrete mechanism with sampling procedures is more private than what DP predicts, while many continuous mechanisms with sampling procedures are still non-private under smoothed DP.
arXiv Detail & Related papers (2021-07-04T06:55:45Z) - Optimal Accounting of Differential Privacy via Characteristic Function [25.78065563380023]
We propose a unification of recent advances (Renyi DP, privacy profiles, $f$-DP and the PLD formalism) via the characteristic function ($phi$-function) of a certain worst-case'' privacy loss random variable.
We show that our approach allows natural adaptive composition like Renyi DP, provides exactly tight privacy accounting like PLD, and can be (often losslessly) converted to privacy profile and $f$-DP.
arXiv Detail & Related papers (2021-06-16T06:13:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.