To Be Forgotten or To Be Fair: Unveiling Fairness Implications of
Machine Unlearning Methods
- URL: http://arxiv.org/abs/2302.03350v2
- Date: Wed, 10 Jan 2024 23:40:39 GMT
- Title: To Be Forgotten or To Be Fair: Unveiling Fairness Implications of
Machine Unlearning Methods
- Authors: Dawen Zhang, Shidong Pan, Thong Hoang, Zhenchang Xing, Mark Staples,
Xiwei Xu, Lina Yao, Qinghua Lu, Liming Zhu
- Abstract summary: We present the first study on machine unlearning methods to reveal their fairness implications.
Under non-uniform data deletion, SISA leads to better fairness compared with ORTR and AmnesiacML, while initial training and uniform data deletion do not necessarily affect the fairness of all three methods.
- Score: 31.83413902054534
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The right to be forgotten (RTBF) is motivated by the desire of people not to
be perpetually disadvantaged by their past deeds. For this, data deletion needs
to be deep and permanent: data should be removed not only from storage but also
from the machine learning models trained on it.
Researchers have proposed machine unlearning algorithms which aim to erase
specific data from trained models more efficiently. However, these methods
modify how data is fed into the model and how training is done, which may
subsequently compromise AI ethics from the fairness perspective. To help
software engineers make responsible decisions when adopting these unlearning
methods, we present the first study on machine unlearning methods to reveal
their fairness implications. We designed and conducted experiments on two
typical machine unlearning methods (SISA and AmnesiacML) along with a
retraining method (ORTR) as baseline using three fairness datasets under three
different deletion strategies. Experimental results show that under non-uniform
data deletion, SISA leads to better fairness compared with ORTR and AmnesiacML,
while initial training and uniform data deletion do not necessarily affect the
fairness of all three methods. These findings have exposed an important
research problem in software engineering, and can help practitioners better
understand the potential trade-offs on fairness when considering solutions for
RTBF.
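To make the comparison concrete, here is a minimal sketch of a SISA-style pipeline with a fairness check. It assumes scikit-learn, a tabular dataset with a binary protected attribute, and hypothetical helper names (make_shards, train_constituents, predict_vote, unlearn, dp_gap); SISA's slice-level checkpointing and the paper's exact datasets and metrics are not reproduced here.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def make_shards(X, n_shards, rng):
    # SISA: partition the training set into disjoint shards.
    idx = rng.permutation(len(X))
    return [idx[i::n_shards] for i in range(n_shards)]

def train_constituents(X, y, shards):
    # One constituent model per shard.
    return [LogisticRegression(max_iter=1000).fit(X[s], y[s]) for s in shards]

def predict_vote(models, X):
    # Aggregate constituent predictions by majority vote.
    votes = np.stack([m.predict(X) for m in models])
    return (votes.mean(axis=0) >= 0.5).astype(int)

def unlearn(X, y, shards, models, delete_idx):
    # Honor a deletion request by retraining only the affected shards --
    # the efficiency gain over full retraining (the ORTR-style baseline).
    gone = set(int(i) for i in delete_idx)
    for k, s in enumerate(shards):
        if gone & set(int(i) for i in s):
            shards[k] = np.array([i for i in s if int(i) not in gone])
            models[k] = LogisticRegression(max_iter=1000).fit(
                X[shards[k]], y[shards[k]])
    return shards, models

def dp_gap(y_pred, group):
    # Demographic-parity gap, one plausible fairness metric:
    # |P(y_hat = 1 | g = 0) - P(y_hat = 1 | g = 1)|.
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())
```
A non-uniform deletion strategy can be simulated by drawing delete_idx mostly from one protected group, then comparing dp_gap after unlearn against a full retrain on the retained data (the ORTR-style baseline).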
Related papers
- RESTOR: Knowledge Recovery through Machine Unlearning [71.75834077528305]
Large language models trained on web-scale corpora can memorize undesirable datapoints.
Many machine unlearning methods have been proposed that aim to 'erase' these datapoints from trained models.
We propose the RESTOR framework for evaluating machine unlearning along several dimensions.
arXiv Detail & Related papers (2024-10-31T20:54:35Z)
- Debiasing Machine Unlearning with Counterfactual Examples [31.931056076782202]
We analyze the causal factors behind the unlearning process and mitigate biases at both data and algorithmic levels.
We introduce an intervention-based approach, where knowledge to forget is erased with a debiased dataset.
Our method outperforms existing machine unlearning baselines on evaluation metrics.
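One plausible reading of this intervention idea, sketched below under loud assumptions: counterfactual copies of the forget set are built by flipping a binary protected attribute (the column index prot_col is hypothetical), and the forget loss is then ascended on this debiased set. The paper's actual procedure may differ substantially; this sketch uses PyTorch and a generic gradient-ascent forgetting step.
```python
import torch
import torch.nn.functional as F

def counterfactual_augment(x_forget, prot_col):
    # Intervene on the protected attribute: flip a binary column.
    x_cf = x_forget.clone()
    x_cf[:, prot_col] = 1.0 - x_cf[:, prot_col]
    return torch.cat([x_forget, x_cf])

def debiased_unlearn(model, x_forget, y_forget, prot_col, steps=50, lr=1e-3):
    # Erase the forget set together with its counterfactual twins, so what
    # is unlearned is not entangled with the protected attribute.
    x_aug = counterfactual_augment(x_forget, prot_col)
    y_aug = torch.cat([y_forget, y_forget])  # labels unchanged under intervention
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        loss = -F.cross_entropy(model(x_aug), y_aug)  # gradient ascent
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```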
arXiv Detail & Related papers (2024-04-24T09:33:10Z)
- The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements.
LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information.
Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z)
- Fair Machine Unlearning: Data Removal while Mitigating Disparities [5.724350004671127]
The right to be forgotten is a core principle outlined by the EU's General Data Protection Regulation (GDPR).
"Forgetting" can be naively achieved by retraining on the remaining data (sketched below).
"Unlearning", however, can impact other properties critical to real-world applications, such as fairness.
arXiv Detail & Related papers (2023-07-27T10:26:46Z)
- Random Relabeling for Efficient Machine Unlearning [8.871042314510788]
Individuals' right to retract personal data and relevant data privacy regulations pose great challenges to machine learning.
We propose an unlearning scheme, random relabeling, to efficiently handle sequential data removal requests.
A less constraining removal certification method, based on the similarity of the model's output distribution to that of naive unlearning, is also proposed.
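A minimal sketch of the random-relabeling idea as summarized here, assuming PyTorch and an already trained classifier model; the fine-tuning loop (steps, lr, resampling labels each step) is illustrative, not the paper's exact procedure.
```python
import torch
import torch.nn.functional as F

def random_relabel_unlearn(model, x_forget, num_classes, steps=50, lr=1e-3):
    # Overwrite what the model learned from x_forget by fine-tuning on the
    # same inputs paired with uniformly random labels; call once per
    # request to serve a sequential stream of removals.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        rand_y = torch.randint(0, num_classes, (x_forget.shape[0],))
        loss = F.cross_entropy(model(x_forget), rand_y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```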
arXiv Detail & Related papers (2023-05-21T02:37:26Z)
- Machine Unlearning of Features and Labels [72.81914952849334]
We propose the first scenarios for unlearning features and labels in machine learning models.
Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of model parameters.
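A minimal sketch of an influence-function-style closed-form update for an L2-regularized logistic model, where a point's label is corrected from y_old to y_new; this is the generic second-order approximation, not necessarily the paper's exact derivation.
```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def grad_point(theta, x, y):
    # Gradient of the logistic loss at a single point (y in {0, 1}).
    return (sigmoid(x @ theta) - y) * x

def hessian(theta, X, lam):
    # Hessian of the regularized empirical risk (mean loss + ridge term).
    p = sigmoid(X @ theta)
    return (X.T * (p * (1 - p))) @ X / len(X) + lam * np.eye(X.shape[1])

def unlearn_label(theta, X, lam, x, y_old, y_new):
    # Closed-form update replacing (x, y_old) by (x, y_new):
    # theta' ~= theta + H^{-1} (grad(x, y_old) - grad(x, y_new)) / n.
    H = hessian(theta, X, lam)
    delta = (grad_point(theta, x, y_old) - grad_point(theta, x, y_new)) / len(X)
    return theta + np.linalg.solve(H, delta)
```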
arXiv Detail & Related papers (2021-08-26T04:42:24Z)
- SSSE: Efficiently Erasing Samples from Trained Machine Learning Models [103.43466657962242]
We propose an efficient and effective algorithm, SSSE, for sample erasure.
In certain cases SSSE can erase samples almost as well as the optimal, yet impractical, gold standard of training a new model from scratch with only the permitted data.
arXiv Detail & Related papers (2021-07-08T14:17:24Z)
- Certifiable Machine Unlearning for Linear Models [1.484852576248587]
Machine unlearning is the task of updating machine learning (ML) models after a subset of the training data they were trained on is deleted.
We present an experimental study of three state-of-the-art approximate unlearning methods for linear models.
arXiv Detail & Related papers (2021-06-29T05:05:58Z)
- Fairness in Semi-supervised Learning: Unlabeled Data Help to Reduce Discrimination [53.3082498402884]
A growing specter in the rise of machine learning is whether the decisions made by machine learning models are fair.
We present a framework of fair semi-supervised learning in the pre-processing phase, including pseudo labeling to predict labels for unlabeled data.
A theoretical decomposition analysis of bias, variance and noise highlights the different sources of discrimination and the impact they have on fairness in semi-supervised learning.
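A minimal sketch of the pseudo-labeling pre-processing step, assuming scikit-learn and a simple confidence threshold; the framework's fairness corrections and the bias/variance/noise decomposition are not reproduced here.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pseudo_label_preprocess(X_lab, y_lab, X_unlab, threshold=0.9):
    # Fit on the labeled data, pseudo-label the unlabeled pool, and keep
    # only confident predictions before merging into the training set.
    base = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    conf = base.predict_proba(X_unlab).max(axis=1)
    keep = conf >= threshold
    X_aug = np.vstack([X_lab, X_unlab[keep]])
    y_aug = np.concatenate([y_lab, base.predict(X_unlab)[keep]])
    return X_aug, y_aug
```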
arXiv Detail & Related papers (2020-09-25T05:48:56Z)
- Leveraging Semi-Supervised Learning for Fairness using Neural Networks [49.604038072384995]
There has been a growing concern about the fairness of decision-making systems based on machine learning.
In this paper, we propose a semi-supervised algorithm using neural networks benefiting from unlabeled data.
The proposed model, called SSFair, exploits the information in the unlabeled data to mitigate the bias in the training data.
arXiv Detail & Related papers (2019-12-31T09:11:26Z)