Empirical Calibration and Metric Differential Privacy in Language Models
- URL: http://arxiv.org/abs/2503.13872v1
- Date: Tue, 18 Mar 2025 03:52:12 GMT
- Title: Empirical Calibration and Metric Differential Privacy in Language Models
- Authors: Pedro Faustini, Natasha Fernandes, Annabelle McIver, Mark Dras,
- Abstract summary: We show how to empirically calibrate noise across DP frameworks, finding that reconstruction attacks are more useful for this purpose than Membership Inference Attacks (MIAs). We define a novel kind of directional privacy based on the von Mises-Fisher (VMF) distribution, a metric DP mechanism that perturbs angular distance rather than adding (isotropic) Gaussian noise. We show that, even though formal guarantees are incomparable, empirical privacy calibration reveals that each mechanism has different areas of strength with respect to utility-privacy trade-offs.
- Score: 2.826489388853448
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: NLP models trained with differential privacy (DP) usually adopt the DP-SGD framework, and privacy guarantees are often reported in terms of the privacy budget $\epsilon$. However, $\epsilon$ does not have any intrinsic meaning, and it is generally not possible to compare across variants of the framework. Work in image processing has therefore explored how to empirically calibrate noise across frameworks using Membership Inference Attacks (MIAs). However, this kind of calibration has not been established for NLP. In this paper, we show that MIAs offer little help in calibrating privacy, whereas reconstruction attacks are more useful. As a use case, we define a novel kind of directional privacy based on the von Mises-Fisher (VMF) distribution, a metric DP mechanism that perturbs angular distance rather than adding (isotropic) Gaussian noise, and apply this to NLP architectures. We show that, even though formal guarantees are incomparable, empirical privacy calibration reveals that each mechanism has different areas of strength with respect to utility-privacy trade-offs.
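To make the VMF idea concrete, here is a minimal, illustrative sketch (not the authors' implementation) of perturbing the direction of an embedding vector by sampling from a von Mises-Fisher distribution centred on it, using Wood's rejection sampler; the concentration parameter `kappa` and all function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_vmf(mu, kappa):
    """One draw from a von Mises-Fisher distribution on the unit sphere in R^d
    with mean direction `mu` and concentration `kappa` (Wood's rejection sampler)."""
    d = mu.shape[0]
    # Sample the component w of the output along mu.
    b = (-2 * kappa + np.sqrt(4 * kappa**2 + (d - 1) ** 2)) / (d - 1)
    x0 = (1 - b) / (1 + b)
    c = kappa * x0 + (d - 1) * np.log(1 - x0**2)
    while True:
        z = rng.beta((d - 1) / 2, (d - 1) / 2)
        w = (1 - (1 + b) * z) / (1 - (1 - b) * z)
        if kappa * w + (d - 1) * np.log(1 - x0 * w) - c >= np.log(rng.uniform()):
            break
    # Sample a unit direction v orthogonal to mu (uniform on the tangent sphere).
    v = rng.standard_normal(d)
    v -= v.dot(mu) * mu
    v /= np.linalg.norm(v)
    # The result lies at angle arccos(w) from mu: angular, not additive, noise.
    return w * mu + np.sqrt(1 - w**2) * v

def perturb_direction(x, kappa):
    """Noise the direction of a vector (e.g. an embedding or gradient), keeping its norm.
    Smaller kappa = more angular noise = stronger (metric) privacy."""
    norm = np.linalg.norm(x)
    return norm * sample_vmf(x / norm, kappa)

# Example: perturb a 300-dimensional embedding with concentration kappa = 50.
embedding = rng.standard_normal(300)
noisy_embedding = perturb_direction(embedding, kappa=50.0)
```

Smaller `kappa` spreads samples more widely over the sphere, i.e. more angular noise; the paper's contribution is calibrating such a mechanism against the Gaussian mechanism empirically rather than via the (incomparable) formal budgets.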
Related papers
- Empirical Privacy Variance [32.41387301450962]
We show that models calibrated to the same $(\varepsilon, \delta)$-DP guarantee can exhibit significant variations in empirical privacy. We investigate the generality of this phenomenon across multiple dimensions and discuss why it is surprising and relevant. We propose two hypotheses, identify limitations in existing techniques like privacy auditing, and outline open questions for future research.
arXiv Detail & Related papers (2025-03-16T01:43:49Z)
- Comparing privacy notions for protection against reconstruction attacks in machine learning [10.466570297146953]
In the machine learning community, reconstruction attacks are a principal concern and have been identified even in federated learning (FL). In response to these threats, the privacy community recommends the use of differential privacy (DP) in the stochastic gradient descent algorithm, termed DP-SGD. In this paper, we lay a foundational framework for comparing mechanisms with differing notions of privacy guarantees.
arXiv Detail & Related papers (2025-02-06T13:04:25Z)
- Differentially Private Random Feature Model [52.468511541184895]
We produce a differentially private random feature model for privacy-preserving kernel machines. We show that our method preserves privacy and derive a generalization error bound for the method.
arXiv Detail & Related papers (2024-12-06T05:31:08Z)
- Private Language Models via Truncated Laplacian Mechanism [18.77713904999236]
We propose a novel private embedding method called the high dimensional truncated Laplacian mechanism.
We show that our method has a lower variance compared to the previous private word embedding methods.
Remarkably, even in the high privacy regime, our approach only incurs a slight decrease in utility compared to the non-private scenario.
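As a rough sketch of the general idea, the snippet below adds coordinate-wise truncated Laplace noise to embeddings; the parameter names are hypothetical, and the paper's high-dimensional construction and its calibration to $\epsilon$ differ in the details.

```python
import numpy as np

rng = np.random.default_rng(0)

def truncated_laplace_noise(shape, scale, bound):
    """Coordinate-wise Laplace noise resampled until every coordinate lies in
    [-bound, bound] (a generic truncated Laplacian; not the paper's exact
    high-dimensional construction or its epsilon calibration)."""
    noise = rng.laplace(loc=0.0, scale=scale, size=shape)
    outside = np.abs(noise) > bound
    while outside.any():
        noise[outside] = rng.laplace(loc=0.0, scale=scale, size=outside.sum())
        outside = np.abs(noise) > bound
    return noise

# Example: privatise a batch of 32 word embeddings of dimension 300.
embeddings = rng.standard_normal((32, 300))
private_embeddings = embeddings + truncated_laplace_noise(embeddings.shape, scale=0.1, bound=0.5)
```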
arXiv Detail & Related papers (2024-10-10T15:25:02Z)
- Uncertainty quantification by block bootstrap for differentially private stochastic gradient descent [1.0742675209112622]
Stochastic Gradient Descent (SGD) is a widely used tool in machine learning.
Uncertainty quantification (UQ) for SGD by bootstrap has been addressed by several authors.
We propose a novel block bootstrap for SGD under local differential privacy.
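For intuition, a generic non-overlapping block bootstrap over SGD iterates might look like the sketch below; this is illustrative only, and the paper's estimator for the local-DP setting is more involved.

```python
import numpy as np

rng = np.random.default_rng(0)

def block_bootstrap_ci(iterates, n_blocks=20, n_boot=1000, alpha=0.05):
    """Non-overlapping block bootstrap over a sequence of (noisy) SGD iterates:
    resample block means with replacement and return a percentile confidence
    interval for each parameter coordinate."""
    iterates = np.asarray(iterates)                      # shape (T, d)
    block_means = np.stack([b.mean(axis=0) for b in np.array_split(iterates, n_blocks)])
    boot = np.stack([
        block_means[rng.integers(0, n_blocks, size=n_blocks)].mean(axis=0)
        for _ in range(n_boot)
    ])
    lower, upper = np.percentile(boot, [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=0)
    return lower, upper

# Example: confidence intervals from 2000 iterates of a 5-parameter model.
lo, hi = block_bootstrap_ci(rng.standard_normal((2000, 5)))
```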
arXiv Detail & Related papers (2024-05-21T07:47:21Z)
- Provable Privacy with Non-Private Pre-Processing [56.770023668379615]
We propose a general framework to evaluate the additional privacy cost incurred by non-private data-dependent pre-processing algorithms.
Our framework establishes upper bounds on the overall privacy guarantees by utilising two new technical notions.
arXiv Detail & Related papers (2024-03-19T17:54:49Z)
- Privacy Amplification for the Gaussian Mechanism via Bounded Support [64.86780616066575]
Data-dependent privacy accounting frameworks such as per-instance differential privacy (pDP) and Fisher information loss (FIL) confer fine-grained privacy guarantees for individuals in a fixed training dataset.
We propose simple modifications of the Gaussian mechanism with bounded support, showing that they amplify privacy guarantees under data-dependent accounting.
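A minimal sketch of the bounded-support idea, here via simple clipping of the Gaussian output to an interval; the parameter names are illustrative, and the paper analyses such variants and their amplified guarantees under data-dependent accounting.

```python
import numpy as np

rng = np.random.default_rng(0)

def bounded_gaussian(value, sigma, lower, upper):
    """Gaussian mechanism whose output is forced into [lower, upper] by clipping
    (a 'rectified' Gaussian): a sketch of the bounded-support idea only."""
    noisy = value + rng.normal(0.0, sigma, size=np.shape(value))
    return np.clip(noisy, lower, upper)

# Example: release a statistic known to lie in [-1, 1].
print(bounded_gaussian(0.3, sigma=0.5, lower=-1.0, upper=1.0))
```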
arXiv Detail & Related papers (2024-03-07T21:22:07Z)
- Directional Privacy for Deep Learning [2.826489388853448]
Differentially Private Gradient Descent (DP-SGD) is a key method for applying privacy in the training of deep learning models.
Metric DP, however, can provide alternative mechanisms based on arbitrary metrics that might be more suitable for preserving utility.
We show that this provides both $\epsilon$-DP and $\epsilon d$-privacy for deep learning training, rather than the $(\epsilon, \delta)$-privacy of the Gaussian mechanism.
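For context, a plain DP-SGD update with the (isotropic) Gaussian mechanism looks roughly like the sketch below; the directional (VMF) variant discussed here would instead perturb the angle of the aggregated gradient. Names and constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def dpsgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0, noise_multiplier=1.0):
    """One standard DP-SGD update: clip each per-example gradient to clip_norm,
    average, and add isotropic Gaussian noise."""
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    grad = np.mean(clipped, axis=0)
    noise_std = noise_multiplier * clip_norm / len(clipped)   # noise scale on the mean
    grad = grad + rng.normal(0.0, noise_std, size=grad.shape)
    return params - lr * grad

# Example: one update on 10 parameters with a batch of 8 per-example gradients.
params = np.zeros(10)
grads = rng.standard_normal((8, 10))
params = dpsgd_step(params, grads)
```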
arXiv Detail & Related papers (2022-11-09T05:18:08Z)
- Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent [69.14164921515949]
We characterize privacy guarantees for individual examples when releasing models trained by DP-SGD.
We find that most examples enjoy stronger privacy guarantees than the worst-case bound.
This implies groups that are underserved in terms of model utility simultaneously experience weaker privacy guarantees.
arXiv Detail & Related papers (2022-06-06T13:49:37Z)
- Smoothed Differential Privacy [55.415581832037084]
Differential privacy (DP) is a widely-accepted and widely-applied notion of privacy based on worst-case analysis.
In this paper, we propose a natural extension of DP following the worst average-case idea behind the celebrated smoothed analysis.
We prove that any discrete mechanism with sampling procedures is more private than what DP predicts, while many continuous mechanisms with sampling procedures are still non-private under smoothed DP.
arXiv Detail & Related papers (2021-07-04T06:55:45Z)
- Local Differential Privacy Is Equivalent to Contraction of $E_\gamma$-Divergence [7.807294944710216]
We show that LDP constraints can be equivalently cast in terms of the contraction coefficient of the $E_\gamma$-divergence.
We then use this equivalent formula to express LDP guarantees of privacy mechanisms in terms of contraction coefficients of arbitrary $f$-divergences.
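For reference, the $E_\gamma$ (hockey-stick) divergence and the contraction coefficient it induces for a channel $K$ are recalled below; notation may differ slightly from the paper.

```latex
% Hockey-stick divergence and its contraction coefficient for a channel K.
E_\gamma(P \,\|\, Q) = \sup_{A \subseteq \mathcal{X}} \bigl( P(A) - \gamma\, Q(A) \bigr), \quad \gamma \ge 1,
\qquad
\eta_\gamma(K) = \sup_{P \neq Q} \frac{E_\gamma(PK \,\|\, QK)}{E_\gamma(P \,\|\, Q)} .
```

In particular, a mechanism satisfies $(\epsilon, \delta)$-DP on neighbouring inputs exactly when $E_{e^\epsilon}$ between the corresponding output distributions is at most $\delta$, which is what links $E_\gamma$ to (local) DP guarantees.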
arXiv Detail & Related papers (2021-02-02T02:18:12Z)
- Differentially Private Federated Learning with Laplacian Smoothing [72.85272874099644]
Federated learning aims to protect data privacy by collaboratively learning a model without sharing private data among users.
An adversary may still be able to infer the private training data by attacking the released model.
Differential privacy provides a statistical protection against such attacks at the price of significantly degrading the accuracy or utility of the trained models.
arXiv Detail & Related papers (2020-05-01T04:28:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.