Fast Model Debias with Machine Unlearning
- URL: http://arxiv.org/abs/2310.12560v3
- Date: Fri, 3 Nov 2023 08:03:39 GMT
- Title: Fast Model Debias with Machine Unlearning
- Authors: Ruizhe Chen, Jianfei Yang, Huimin Xiong, Jianhong Bai, Tianxiang Hu,
Jin Hao, Yang Feng, Joey Tianyi Zhou, Jian Wu, Zuozhu Liu
- Abstract summary: Deep neural networks might behave in a biased manner in many real-world scenarios.
Existing debiasing methods suffer from high costs in bias labeling or model re-training.
We propose a fast model debiasing framework (FMD) which offers an efficient approach to identify, evaluate and remove biases.
- Score: 54.32026474971696
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent discoveries have revealed that deep neural networks might behave in a
biased manner in many real-world scenarios. For instance, deep networks trained
on a large-scale face recognition dataset CelebA tend to predict blonde hair
for females and black hair for males. Such biases not only jeopardize the
robustness of models but also perpetuate and amplify social biases, which is
especially concerning for automated decision-making processes in healthcare,
recruitment, etc., as they could exacerbate unfair economic and social
inequalities among different groups. Existing debiasing methods suffer from
high costs in bias labeling or model re-training, while also exhibiting a
deficiency in terms of elucidating the origins of biases within the model. To
this end, we propose a fast model debiasing framework (FMD) which offers an
efficient approach to identify, evaluate and remove biases inherent in trained
models. The FMD identifies biased attributes through an explicit counterfactual
concept and quantifies the influence of data samples with influence functions.
Moreover, we design a machine unlearning-based strategy to efficiently and
effectively remove the bias in a trained model with a small counterfactual
dataset. Experiments on the Colored MNIST, CelebA, and Adult Income datasets
along with experiments with large language models demonstrate that our method
achieves superior or competing accuracies compared with state-of-the-art
methods while attaining significantly fewer biases and requiring much less
debiasing cost. Notably, our method requires only a small external dataset and
updating a minimal amount of model parameters, without the requirement of
access to training data that may be too large or unavailable in practice.
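The abstract names three ingredients: a counterfactual bias measure, influence functions, and an unlearning-style parameter update. The toy sketch below illustrates how those pieces can fit together on a small logistic regression. It is not the paper's algorithm. The synthetic data, the squared counterfactual gap used as the bias measure, the choice to delete training samples, and every function name are assumptions made for illustration, and the standard first-order influence approximation (inverse Hessian times per-sample gradient) stands in for whatever update FMD actually uses.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def make_toy_data(n=1000, seed=0):
    """Hypothetical data: bias attribute b agrees with label y 90% of the time."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n).astype(float)
    b = np.where(rng.random(n) < 0.9, y, 1.0 - y)
    core = y + rng.normal(size=n)                  # noisy but legitimate signal
    return np.column_stack([core, b, np.ones(n)]), y  # columns: core, bias, intercept

def hessian(X, theta, lam):
    """Hessian of the mean logistic loss plus an L2 penalty."""
    p = sigmoid(X @ theta)
    return (X.T * (p * (1.0 - p))) @ X / len(X) + lam * np.eye(X.shape[1])

def fit(X, y, lam=1e-2, steps=50):
    """Newton's method for L2-regularised logistic regression."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (sigmoid(X @ theta) - y) / len(X) + lam * theta
        theta = theta - np.linalg.solve(hessian(X, theta, lam), grad)
    return theta

def counterfactual(X):
    """Flip the bias attribute (column 1) and leave everything else unchanged."""
    X_cf = X.copy()
    X_cf[:, 1] = 1.0 - X_cf[:, 1]
    return X_cf

def bias(theta, X):
    """Mean squared prediction gap between samples and their counterfactuals."""
    gap = sigmoid(X @ theta) - sigmoid(counterfactual(X) @ theta)
    return np.mean(gap ** 2)

def bias_grad(theta, X):
    """Gradient of bias() with respect to theta."""
    X_cf = counterfactual(X)
    p, p_cf = sigmoid(X @ theta), sigmoid(X_cf @ theta)
    d_gap = X * (p * (1 - p))[:, None] - X_cf * (p_cf * (1 - p_cf))[:, None]
    return 2.0 * ((p - p_cf)[:, None] * d_gap).mean(axis=0)

def removal_steps(X, y, theta, lam):
    """Row i: first-order change in theta if training sample i were deleted."""
    per_sample_grads = (sigmoid(X @ theta) - y)[:, None] * X
    return np.linalg.solve(hessian(X, theta, lam), per_sample_grads.T).T / len(X)

def unlearn(X, y, theta, lam=1e-2, k=10):
    """Apply the removal steps of the k samples predicted to reduce bias most."""
    steps = removal_steps(X, y, theta, lam)
    effect = steps @ bias_grad(theta, X)   # predicted bias change per deletion
    worst = np.argsort(effect)[:k]         # most negative predicted change
    return theta + steps[worst].sum(axis=0), effect, worst

X, y = make_toy_data()
theta = fit(X, y)
theta_new, effect, worst = unlearn(X, y, theta)
print("bias before:", bias(theta, X), "bias after:", bias(theta_new, X))
```

The update touches the parameters once and never retrains, which is the kind of cost saving the abstract describes. For simplicity this sketch measures bias on the training inputs themselves, whereas the abstract describes using a small external counterfactual dataset.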
Related papers
- Understanding trade-offs in classifier bias with quality-diversity optimization: an application to talent management [2.334978724544296]
A major struggle for the development of fair AI models lies in the bias implicit in the data available to train such models.
We propose a method for visualizing the biases inherent in a dataset and understanding the potential trade-offs between fairness and accuracy.
arXiv Detail & Related papers (2024-11-25T22:14:02Z)
- Addressing Bias Through Ensemble Learning and Regularized Fine-Tuning [0.2812395851874055]
This paper proposes a comprehensive approach using multiple methods to remove bias in AI models.
We train multiple models with the counter-bias of the pre-trained model through data splitting, local training, and regularized fine-tuning.
We conclude our solution with knowledge distillation that results in a single unbiased neural network.
arXiv Detail & Related papers (2024-02-01T09:24:36Z)
- Improving Bias Mitigation through Bias Experts in Natural Language Understanding [10.363406065066538]
We propose a new debiasing framework that introduces binary classifiers between the auxiliary model and the main model.
Our proposed strategy improves the bias identification ability of the auxiliary model.
arXiv Detail & Related papers (2023-12-06T16:15:00Z)
- Debiasing Multimodal Models via Causal Information Minimization [65.23982806840182]
We study bias arising from confounders in a causal graph for multimodal data.
Robust predictive features contain diverse information that helps a model generalize to out-of-distribution data.
We use these features as confounder representations and use them via methods motivated by causal theory to remove bias from models.
arXiv Detail & Related papers (2023-11-28T16:46:14Z)
- Non-Invasive Fairness in Learning through the Lens of Data Drift [88.37640805363317]
We show how to improve the fairness of Machine Learning models without altering the data or the learning algorithm.
We use a simple but key insight: the divergence of trends between different populations, and, consecutively, between a learned model and minority populations, is analogous to data drift.
We explore two strategies (model-splitting and reweighing) to resolve this drift, aiming to improve the overall conformance of models to the underlying data.
arXiv Detail & Related papers (2023-03-30T17:30:42Z)
- Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
- BLIND: Bias Removal With No Demographics [29.16221451643288]
We introduce BLIND, a method for bias removal with no prior knowledge of the demographics in the dataset.
While training a model on a downstream task, BLIND detects biased samples using an auxiliary model that predicts the main model's success, and down-weights those samples during the training process.
Experiments with racial and gender biases in sentiment classification and occupation classification tasks demonstrate that BLIND mitigates social biases without relying on a costly demographic annotation process.
arXiv Detail & Related papers (2022-12-20T18:59:42Z)
- D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z)
- Learning from others' mistakes: Avoiding dataset biases without modeling them [111.17078939377313]
State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended task.
Previous work has demonstrated effective methods to circumvent these issues when knowledge of the bias is available.
We show a method for training models that learn to ignore these problematic correlations.
arXiv Detail & Related papers (2020-12-02T16:10:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.