De-amplifying Bias from Differential Privacy in Language Model
Fine-tuning
- URL: http://arxiv.org/abs/2402.04489v1
- Date: Wed, 7 Feb 2024 00:30:58 GMT
- Title: De-amplifying Bias from Differential Privacy in Language Model
Fine-tuning
- Authors: Sanjari Srivastava, Piotr Mardziel, Zhikun Zhang, Archana Ahlawat,
Anupam Datta, John C. Mitchell
- Abstract summary: Fairness and privacy are two important values machine learning (ML) practitioners often seek to operationalize in models.
We show that DP amplifies gender, racial, and religious bias when fine-tuning large language models.
We demonstrate that Counterfactual Data Augmentation, a known method for addressing bias, also mitigates bias amplification by DP.
- Score: 10.847913815093179
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fairness and privacy are two important values machine learning (ML)
practitioners often seek to operationalize in models. Fairness aims to reduce
model bias for social/demographic sub-groups. Privacy via differential privacy
(DP) mechanisms, on the other hand, limits the impact of any individual's
training data on the resulting model. The trade-offs between privacy and
fairness goals of trustworthy ML pose a challenge to those wishing to address
both. We show that DP amplifies gender, racial, and religious bias when
fine-tuning large language models (LLMs), producing models more biased than
ones fine-tuned without DP. We find the cause of the amplification to be a
disparity in convergence of gradients across sub-groups. Through the case of
binary gender bias, we demonstrate that Counterfactual Data Augmentation (CDA),
a known method for addressing bias, also mitigates bias amplification by DP. As
a consequence, DP and CDA together can be used to fine-tune models while
maintaining both fairness and privacy.
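As a brief illustration of how Counterfactual Data Augmentation works in the binary-gender setting described above, the sketch below duplicates each training example with gendered terms swapped before fine-tuning. The word-pair list and helper names are illustrative assumptions, not the authors' implementation.

```python
import re

# Illustrative word pairs; a real CDA list is larger, curated, and handles
# ambiguous pronouns (e.g., him/her/his) more carefully.
GENDER_PAIRS = [
    ("he", "she"), ("his", "her"),
    ("man", "woman"), ("men", "women"),
    ("father", "mother"), ("son", "daughter"),
]
SWAP = {a: b for a, b in GENDER_PAIRS} | {b: a for a, b in GENDER_PAIRS}

def counterfactual(text: str) -> str:
    """Return a copy of `text` with gendered terms swapped."""
    def swap_token(match: re.Match) -> str:
        word = match.group(0)
        swapped = SWAP.get(word.lower(), word)
        return swapped.capitalize() if word[0].isupper() else swapped
    return re.sub(r"[A-Za-z]+", swap_token, text)

def augment(corpus: list[str]) -> list[str]:
    """Return the original corpus plus one counterfactual copy per example."""
    return corpus + [counterfactual(t) for t in corpus]

# The augmented corpus would then be fine-tuned with a DP optimizer such as
# DP-SGD, so that privacy and bias mitigation are applied together.
print(augment(["He is a doctor and his sister is a nurse."]))
# ['He is a doctor and his sister is a nurse.',
#  'She is a doctor and her sister is a nurse.']
```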
Related papers
- Does Differential Privacy Impact Bias in Pretrained NLP Models? [24.63118058112066]
Differential privacy (DP) is applied when fine-tuning pre-trained large language models (LLMs) to limit leakage of training examples.
We show the impact of DP on bias in LLMs through empirical analysis.
Our results also show that the impact of DP on bias is not only affected by the privacy protection level but also the underlying distribution of the dataset.
arXiv Detail & Related papers (2024-10-24T13:59:03Z) - Privacy at a Price: Exploring its Dual Impact on AI Fairness [24.650648702853903]
We show that differential privacy in machine learning models can unequally impact separate demographic subgroups regarding prediction accuracy.
This raises a fairness concern that manifests as biased performance.
We find that implementing gradient clipping in differentially private gradient descent (DP-SGD) can mitigate the negative impact of DP noise on fairness (see the DP-SGD sketch after this list).
arXiv Detail & Related papers (2024-04-15T00:23:41Z) - Incentives in Private Collaborative Machine Learning [56.84263918489519]
Collaborative machine learning involves training models on data from multiple parties.
We introduce differential privacy (DP) as an incentive.
We empirically demonstrate the effectiveness and practicality of our approach on synthetic and real-world datasets.
arXiv Detail & Related papers (2024-04-02T06:28:22Z) - Non-Invasive Fairness in Learning through the Lens of Data Drift [88.37640805363317]
We show how to improve the fairness of Machine Learning models without altering the data or the learning algorithm.
We use a simple but key insight: the divergence of trends between different populations, and, consequently, between a learned model and minority populations, is analogous to data drift.
We explore two strategies (model-splitting and reweighing) to resolve this drift, aiming to improve the overall conformance of models to the underlying data.
arXiv Detail & Related papers (2023-03-30T17:30:42Z) - Large Scale Transfer Learning for Differentially Private Image
Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients.
While this result is quite appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private training.
arXiv Detail & Related papers (2022-05-06T01:22:20Z) - Just Fine-tune Twice: Selective Differential Privacy for Large Language
Models [69.66654761324702]
We propose a simple yet effective just-fine-tune-twice privacy mechanism to achieve SDP for large Transformer-based language models.
Experiments show that our models achieve strong performance while staying robust to the canary insertion attack.
arXiv Detail & Related papers (2022-04-15T22:36:55Z) - DP-SGD vs PATE: Which Has Less Disparate Impact on GANs? [0.0]
We compare GANs trained with the two best-known DP frameworks for deep learning, DP-SGD and PATE, in different data imbalance settings.
Our experiments consistently show that for PATE, unlike DP-SGD, the privacy-utility trade-off is not monotonically decreasing.
arXiv Detail & Related papers (2021-11-26T17:25:46Z) - Don't Generate Me: Training Differentially Private Generative Models
with Sinkhorn Divergence [73.14373832423156]
We propose DP-Sinkhorn, a novel optimal transport-based generative method for learning data distributions from private data with differential privacy.
Unlike existing approaches for training differentially private generative models, we do not rely on adversarial objectives.
arXiv Detail & Related papers (2021-11-01T18:10:21Z) - DP-SGD vs PATE: Which Has Less Disparate Impact on Model Accuracy? [1.3238373064156095]
We show that application of differential privacy, specifically the DP-SGD algorithm, has a disparate impact on different sub-groups in the population.
We compare PATE, another mechanism for training deep learning models using differential privacy, with DP-SGD in terms of fairness.
arXiv Detail & Related papers (2021-06-22T20:37:12Z) - Balancing Biases and Preserving Privacy on Balanced Faces in the Wild [50.915684171879036]
There are demographic biases present in current facial recognition (FR) models.
We introduce our Balanced Faces in the Wild dataset to measure these biases across different ethnic and gender subgroups.
We find that relying on a single score threshold to differentiate between genuine and impostor sample pairs leads to suboptimal results.
We propose a novel domain adaptation learning scheme that uses facial features extracted from state-of-the-art neural networks.
arXiv Detail & Related papers (2021-03-16T15:05:49Z)
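Several of the entries above refer to DP-SGD's per-example gradient clipping and noise injection. The NumPy sketch below illustrates a single DP-SGD step on a toy logistic-regression model; the model, clipping norm C, and noise multiplier sigma are illustrative assumptions, not values from any of the cited papers.

```python
import numpy as np

def per_example_grads(w, X, y):
    """Per-example logistic-regression loss gradients, shape (n, d)."""
    p = 1.0 / (1.0 + np.exp(-X @ w))      # predicted probabilities
    return (p - y)[:, None] * X           # gradient of loss_i w.r.t. w, one row per example

def dp_sgd_step(w, X, y, lr=0.1, C=1.0, sigma=1.0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    grads = per_example_grads(w, X, y)
    # 1. Clip each example's gradient to L2 norm at most C.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads * np.minimum(1.0, C / np.maximum(norms, 1e-12))
    # 2. Sum, add Gaussian noise calibrated to C, and average over the batch.
    noisy_sum = grads.sum(axis=0) + rng.normal(0.0, sigma * C, size=w.shape)
    return w - lr * noisy_sum / len(X)

# Toy usage on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))
y = (X[:, 0] > 0).astype(float)
w = np.zeros(2)
for _ in range(100):
    w = dp_sgd_step(w, X, y, rng=rng)
print(w)
```

Because the noise scale is tied to the clipping norm C rather than to any sub-group, groups whose gradients converge more slowly are affected disproportionately, which is the disparity discussed in the main abstract.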
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality or accuracy of the information presented and is not responsible for any consequences of its use.