Controlling Bias Exposure for Fair Interpretable Predictions
- URL: http://arxiv.org/abs/2210.07455v1
- Date: Fri, 14 Oct 2022 01:49:01 GMT
- Title: Controlling Bias Exposure for Fair Interpretable Predictions
- Authors: Zexue He, Yu Wang, Julian McAuley, Bodhisattwa Prasad Majumder
- Abstract summary: We argue that a favorable debiasing method should use sensitive information 'fairly' rather than blindly eliminating it.
Our model achieves a desirable trade-off between debiasing and task performance along with producing debiased rationales as evidence.
- Score: 11.364105288235308
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work on reducing bias in NLP models usually focuses on protecting or
isolating information related to a sensitive attribute (like gender or race).
However, when sensitive information is semantically entangled with the task
information of the input, e.g., the gender information is predictive for a
profession, a fair trade-off between task performance and bias mitigation is
difficult to achieve. Existing approaches perform this trade-off by eliminating
bias information from the latent space, lacking control over how much bias is
necessarily required to be removed. We argue that a favorable debiasing method
should use sensitive information 'fairly' rather than blindly eliminating it
(Caliskan et al., 2017; Sun et al., 2019). In this work, we provide a novel
debiasing algorithm by adjusting the predictive model's belief to (1) ignore
the sensitive information if it is not useful for the task; (2) use sensitive
information minimally as necessary for the prediction (while also incurring a
penalty). Experimental results on two text classification tasks (influenced by
gender) and an open-ended generation task (influenced by race) indicate that
our model achieves a desirable trade-off between debiasing and task performance
along with producing debiased rationales as evidence.
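As a rough illustration of the trade-off described in the abstract, the sketch below combines a task loss with a penalty on how much importance the predictor assigns to sensitive tokens, so sensitive information is used only when it pays for itself. The function name, tensor shapes, and penalty form are assumptions for illustration, not the paper's actual formulation.

```python
# Hypothetical sketch (assumed formulation, not the authors' method): the task loss is
# combined with a penalty on the importance mass the model places on sensitive tokens,
# so the model may still use sensitive information, but only at a cost.
import torch
import torch.nn.functional as F

def debias_loss(logits, labels, token_importance, sensitive_mask, penalty_weight=1.0):
    """logits: [B, C] task predictions; labels: [B] gold labels;
    token_importance: [B, T] per-token importance scores (e.g. rationale or attention weights);
    sensitive_mask: [B, T] 1 for tokens carrying the sensitive attribute, 0 otherwise."""
    task_loss = F.cross_entropy(logits, labels)
    # Fraction of importance spent on sensitive tokens: 0 when they are ignored entirely.
    exposure = (token_importance * sensitive_mask).sum(dim=-1) / \
               token_importance.sum(dim=-1).clamp(min=1e-8)
    # Case (1): if sensitive tokens are not useful, the model is pushed to ignore them.
    # Case (2): if they are needed for the prediction, they can be used, at a penalty.
    return task_loss + penalty_weight * exposure.mean()
```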
Related papers
- Unlabeled Debiasing in Downstream Tasks via Class-wise Low Variance Regularization [13.773597081543185]
We introduce a novel debiasing regularization technique based on the class-wise variance of embeddings.
Our method does not require attribute labels and targets any attribute, thus addressing the shortcomings of existing debiasing methods.
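A minimal sketch of what such a class-wise variance regulariser could look like, assuming embeddings are penalised for spreading out within each task class; the names and weighting are illustrative, not the cited paper's implementation.

```python
# Illustrative class-wise low-variance regulariser (assumed details): penalising the
# within-class spread of embeddings discourages the encoder from keeping directions
# that vary with unlabeled attributes such as gender or race.
import torch

def class_wise_variance_penalty(embeddings, labels):
    """embeddings: [B, D] encoder outputs; labels: [B] task labels."""
    penalty = embeddings.new_zeros(())
    for c in labels.unique():
        class_emb = embeddings[labels == c]
        if class_emb.size(0) > 1:
            penalty = penalty + class_emb.var(dim=0, unbiased=False).mean()
    return penalty  # added to the task loss with a tunable weight
```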
arXiv Detail & Related papers (2024-09-29T03:56:50Z)
- Is There a One-Model-Fits-All Approach to Information Extraction? Revisiting Task Definition Biases [62.806300074459116]
Definition bias is a negative phenomenon that can mislead models.
We identify two types of definition bias in IE: bias among information extraction datasets and bias between information extraction datasets and instruction tuning datasets.
We propose a multi-stage framework consisting of definition bias measurement, bias-aware fine-tuning, and task-specific bias mitigation.
arXiv Detail & Related papers (2024-03-25T03:19:20Z)
- Fast Model Debias with Machine Unlearning [54.32026474971696]
Deep neural networks might behave in a biased manner in many real-world scenarios.
Existing debiasing methods suffer from high costs in bias labeling or model re-training.
We propose a fast model debiasing framework (FMD) which offers an efficient approach to identify, evaluate and remove biases.
arXiv Detail & Related papers (2023-10-19T08:10:57Z)
- The Impact of Debiasing on the Performance of Language Models in Downstream Tasks is Underestimated [70.23064111640132]
We compare the impact of debiasing on performance across multiple downstream tasks using a wide range of benchmark datasets.
Experiments show that the effects of debiasing are consistently underestimated across all tasks.
arXiv Detail & Related papers (2023-09-16T20:25:34Z)
- Targeted Data Augmentation for bias mitigation [0.0]
We introduce a novel and efficient approach for addressing biases called Targeted Data Augmentation (TDA).
Rather than undertaking the laborious task of removing biases, our method inserts biases during training instead, resulting in improved performance.
To identify biases, we annotated two diverse datasets: a dataset of clinical skin lesions and a dataset of male and female faces.
arXiv Detail & Related papers (2023-08-22T12:25:49Z)
- Fair Visual Recognition via Intervention with Proxy Features [13.280828458515062]
Existing work minimizes information about social attributes in models for debiasing.
However, the high correlation between the target task and social attributes makes bias mitigation incompatible with target task accuracy.
We propose Proxy Debiasing: first transfer the target task's learning of bias information from bias features to artificial proxy features, and then employ causal intervention to eliminate the proxy features during inference.
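The general "route bias into auxiliary features, then intervene at inference" idea could be sketched as below; the module, heads, and zeroing-out intervention are assumptions for illustration, not the cited paper's implementation.

```python
# Illustrative sketch (assumed design, not the authors' code): bias information is
# encouraged to flow into proxy features via an attribute-prediction head during
# training; at inference the proxy path is cut as the intervention.
import torch
import torch.nn as nn

class ProxyDebiasClassifier(nn.Module):
    def __init__(self, feat_dim, proxy_dim, num_classes, num_attr_classes):
        super().__init__()
        self.proxy_proj = nn.Linear(feat_dim, proxy_dim)          # produces proxy features
        self.attr_head = nn.Linear(proxy_dim, num_attr_classes)   # pushes bias info into them
        self.task_head = nn.Linear(feat_dim + proxy_dim, num_classes)

    def forward(self, features, intervene=False):
        proxy = self.proxy_proj(features)
        if intervene:
            # Inference-time intervention: remove the proxy (and the bias routed into it).
            proxy = torch.zeros_like(proxy)
        task_logits = self.task_head(torch.cat([features, proxy], dim=-1))
        attr_logits = self.attr_head(proxy)  # trained to predict the sensitive attribute
        return task_logits, attr_logits
```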
arXiv Detail & Related papers (2022-11-02T16:33:49Z)
- InterFair: Debiasing with Natural Language Feedback for Fair Interpretable Predictions [30.246111670391375]
We argue that a favorable debiasing method should use sensitive information 'fairly,' with explanations, rather than blindly eliminating it.
We explore two interactive setups with a frozen predictive model and show that users who are able to provide feedback can achieve a better and fairer balance between task performance and bias mitigation.
arXiv Detail & Related papers (2022-10-14T00:54:12Z)
- Optimising Equal Opportunity Fairness in Model Training [60.0947291284978]
Existing debiasing methods, such as adversarial training and removing protected information from representations, have been shown to reduce bias.
We propose two novel training objectives which directly optimise for the widely-used criterion of equal opportunity, and show that they are effective in reducing bias while maintaining high performance over two classification tasks.
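A minimal sketch, assuming equal opportunity is softened into a differentiable gap between groups' true-positive rates that can be added to the training loss; this is an assumed form, not the paper's exact objective.

```python
# Assumed soft equal-opportunity penalty: the gap in predicted positive probability
# between protected groups, measured only on positively-labelled examples.
import torch

def equal_opportunity_gap(probs, labels, groups):
    """probs: [B] predicted probability of the positive class; labels, groups: [B] in {0, 1}.
    Assumes each batch contains positive examples from both groups."""
    pos = labels == 1
    tpr_g0 = probs[pos & (groups == 0)].mean()
    tpr_g1 = probs[pos & (groups == 1)].mean()
    return (tpr_g0 - tpr_g1).abs()  # added to the task loss as a soft constraint
```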
arXiv Detail & Related papers (2022-05-05T01:57:58Z)
- Semi-FairVAE: Semi-supervised Fair Representation Learning with Adversarial Variational Autoencoder [92.67156911466397]
We propose a semi-supervised fair representation learning approach based on an adversarial variational autoencoder.
We use a bias-aware model to capture inherent bias information on the sensitive attribute.
We also use a bias-free model to learn debiased fair representations, using adversarial learning to remove bias information from the representations.
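The adversarial removal in the bias-free branch can be illustrated with a standard gradient-reversal adversary; this is a generic sketch, not the Semi-FairVAE architecture itself.

```python
# Generic adversarial-debiasing sketch: an adversary predicts the sensitive attribute
# from the representation, and a gradient-reversal layer pushes the encoder to remove
# exactly the information the adversary relies on.
import torch
from torch import nn
from torch.autograd import Function

class GradReverse(Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None  # reversed gradient reaches the encoder

class AttributeAdversary(nn.Module):
    def __init__(self, dim, num_attr_classes, lam=1.0):
        super().__init__()
        self.lam = lam
        self.clf = nn.Linear(dim, num_attr_classes)

    def forward(self, representation):
        return self.clf(GradReverse.apply(representation, self.lam))
```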
arXiv Detail & Related papers (2022-04-01T15:57:47Z)
- Gradient Based Activations for Accurate Bias-Free Learning [22.264226961225003]
We show that a biased discriminator can actually be used to improve the bias-accuracy tradeoff.
Specifically, this is achieved by masking features according to the discriminator's gradients.
We show that this simple approach works well to reduce bias as well as improve accuracy significantly.
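A minimal sketch of gradient-based feature masking, assuming features that the bias discriminator's loss is most sensitive to are zeroed out before the task classifier; the details are illustrative, not the cited paper's exact procedure.

```python
# Illustrative gradient-based masking (assumed details): features with the largest
# discriminator-loss gradients are treated as bias-carrying and suppressed.
import torch
import torch.nn.functional as F

def gradient_feature_mask(features, discriminator, attr_labels, keep_ratio=0.8):
    """features: [B, D]; discriminator: module predicting the protected attribute; attr_labels: [B]."""
    detached = features.detach().requires_grad_(True)
    loss = F.cross_entropy(discriminator(detached), attr_labels)
    grads, = torch.autograd.grad(loss, detached)
    # Keep the features the discriminator depends on least (smallest |gradient|).
    k = max(1, int(keep_ratio * features.size(1)))
    threshold = grads.abs().kthvalue(k, dim=1, keepdim=True).values
    mask = (grads.abs() <= threshold).float()
    return features * mask  # masked features are passed on to the task classifier
```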
arXiv Detail & Related papers (2022-02-17T00:30:40Z)
- Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
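A minimal sketch of instance reweighting, assuming each example is weighted by the inverse frequency of its (author demographic, label) pair so that no combination dominates training; the weighting scheme is an assumption, not necessarily the paper's.

```python
# Assumed reweighting scheme: balance (demographic, label) cells so the model cannot
# exploit correlations between author demographics and labels.
from collections import Counter

def balanced_instance_weights(labels, demographics):
    """labels, demographics: equal-length lists of per-example values."""
    pair_counts = Counter(zip(demographics, labels))
    n = len(labels)
    # Each (demographic, label) cell contributes equally in expectation.
    return [n / (len(pair_counts) * pair_counts[(d, y)])
            for d, y in zip(demographics, labels)]

# Usage: feed the weights to a weighted loss term or a WeightedRandomSampler.
```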
arXiv Detail & Related papers (2021-09-16T23:40:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.