SABAF: Removing Strong Attribute Bias from Neural Networks with
Adversarial Filtering
- URL: http://arxiv.org/abs/2311.07141v2
- Date: Thu, 16 Nov 2023 07:23:17 GMT
- Title: SABAF: Removing Strong Attribute Bias from Neural Networks with
Adversarial Filtering
- Authors: Jiazhi Li, Mahyar Khayatkhoei, Jiageng Zhu, Hanchen Xie, Mohamed E.
Hussein, Wael AbdAlmageed
- Abstract summary: We propose a new method for removing attribute bias in neural networks.
The proposed method achieves state-of-the-art performance in both strong and moderate bias settings.
- Score: 20.7209867191915
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ensuring a neural network is not relying on protected attributes (e.g., race,
sex, age) for prediction is crucial in advancing fair and trustworthy AI. While
several promising methods for removing attribute bias in neural networks have
been proposed, their limitations remain under-explored. To that end, in this
work, we mathematically and empirically reveal the limitation of existing
attribute bias removal methods in the presence of strong bias and propose a new
method that can mitigate this limitation. Specifically, we first derive a
general non-vacuous information-theoretic upper bound on the performance of
any attribute bias removal method in terms of the bias strength, revealing that
they are effective only when the inherent bias in the dataset is relatively
weak. Next, we derive a necessary condition for the existence of any method
that can remove attribute bias regardless of the bias strength. Inspired by
this condition, we then propose a new method using an adversarial objective
that directly filters out protected attributes in the input space while
maximally preserving all other attributes, without requiring any specific
target label. The proposed method achieves state-of-the-art performance in both
strong and moderate bias settings. We provide extensive experiments on
synthetic, image, and census datasets to verify the derived theoretical bound
and its consequences in practice, and evaluate the effectiveness of the
proposed method in removing strong attribute bias.
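The abstract describes the approach only at a high level, so two hedged
illustrations follow. On the theory side, one illustrative inequality (not the
paper's exact bound) captures why strong bias is limiting: for any learned
representation Z, I(Z;Y) <= I(Z;Y|A) + I(Z;A), where Y is the target and A the
protected attribute; if A is fully removed (I(Z;A) ~= 0) while the training
data are strongly biased (Y nearly determined by A, so H(Y|A) and hence
I(Z;Y|A) are small), then Z can retain little information about Y. On the
method side, the following is a minimal sketch of an adversarial input-space
filter in the spirit of the abstract, assuming PyTorch, simple MLPs, flattened
inputs, a binary protected attribute, and an illustrative loss weighting; none
of these choices come from the paper.

    # Minimal sketch (not the authors' code): train a filter network so that
    # its output reconstructs x (preserving all other attributes) while an
    # adversary cannot recover the protected attribute from the filtered
    # input. No target label is used.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    in_dim = 64 * 64  # flattened input size (assumed)
    filter_net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                               nn.Linear(256, in_dim))
    adversary = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                              nn.Linear(128, 2))
    opt_f = torch.optim.Adam(filter_net.parameters(), lr=1e-4)
    opt_a = torch.optim.Adam(adversary.parameters(), lr=1e-4)

    def train_step(x, protected, lam=1.0):
        # 1) Adversary: learn to predict the protected attribute from the
        #    filtered input (filter is frozen via detach here).
        x_filt = filter_net(x).detach()
        loss_a = F.cross_entropy(adversary(x_filt), protected)
        opt_a.zero_grad(); loss_a.backward(); opt_a.step()

        # 2) Filter: reconstruct the input while pushing the adversary's
        #    predictions toward chance, i.e. filter out the protected
        #    attribute while preserving everything else.
        x_filt = filter_net(x)
        logits = adversary(x_filt)
        uniform = torch.full_like(logits, 1.0 / logits.size(1))
        loss_f = F.mse_loss(x_filt, x) + lam * F.kl_div(
            logits.log_softmax(dim=1), uniform, reduction="batchmean")
        opt_f.zero_grad(); loss_f.backward(); opt_f.step()
        return loss_a.item(), loss_f.item()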
Related papers
- TaCo: Targeted Concept Erasure Prevents Non-Linear Classifiers From Detecting Protected Attributes [4.2560452339165895]
Targeted Concept Erasure (TaCo) is a novel approach that removes sensitive information from final latent representations.
Our experiments show that TaCo outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-12-11T16:22:37Z) - Causality and Independence Enhancement for Biased Node Classification [56.38828085943763]
We propose a novel Causality and Independence Enhancement (CIE) framework, applicable to various graph neural networks (GNNs).
Our approach estimates causal and spurious features at the node representation level and mitigates the influence of spurious correlations.
Our CIE approach not only significantly enhances the performance of GNNs but also outperforms state-of-the-art debiased node classification methods.
arXiv Detail & Related papers (2023-10-14T13:56:24Z) - Information-Theoretic Bounds on The Removal of Attribute-Specific Bias
From Neural Networks [20.7209867191915]
We show that existing attribute bias removal methods are effective only when the inherent bias in the dataset is relatively weak.
arXiv Detail & Related papers (2023-10-08T00:39:11Z) - Shielded Representations: Protecting Sensitive Attributes Through
Iterative Gradient-Based Projection [39.16319169760823]
Iterative Gradient-Based Projection is a novel method for removing non-linear encoded concepts from neural representations.
Our results demonstrate that IGBP is effective in mitigating bias through intrinsic and extrinsic evaluations.
arXiv Detail & Related papers (2023-05-17T13:26:57Z) - Self-supervised debiasing using low rank regularization [59.84695042540525]
Spurious correlations can cause strong biases in deep neural networks, impairing generalization ability.
We propose a self-supervised debiasing framework potentially compatible with unlabeled samples.
Remarkably, the proposed debiasing framework significantly improves the generalization performance of self-supervised learning baselines.
arXiv Detail & Related papers (2022-10-11T08:26:19Z) - Mitigating Algorithmic Bias with Limited Annotations [65.060639928772]
When sensitive attributes are not disclosed or available, a small part of the training data must be manually annotated to mitigate bias.
We propose Active Penalization Of Discrimination (APOD), an interactive framework to guide the limited annotations towards maximally eliminating the effect of algorithmic bias.
APOD shows comparable performance to fully annotated bias mitigation, which demonstrates that APOD could benefit real-world applications when sensitive information is limited.
arXiv Detail & Related papers (2022-07-20T16:31:19Z) - Linear Adversarial Concept Erasure [108.37226654006153]
We formulate the problem of identifying and erasing a linear subspace that corresponds to a given concept.
We show that the method is highly expressive, effectively mitigating bias in deep nonlinear classifiers while maintaining tractability and interpretability.
arXiv Detail & Related papers (2022-01-28T13:00:17Z) - Marked Attribute Bias in Natural Language Inference [0.0]
We present a new observation of gender bias in a downstream NLP application: marked attribute bias in natural language inference.
Bias in downstream applications can stem from training data, word embeddings, or be amplified by the model in use.
Here we seek to understand how the intrinsic properties of word embeddings contribute to this observed marked attribute effect.
arXiv Detail & Related papers (2021-09-28T20:45:02Z) - Evaluating Debiasing Techniques for Intersectional Biases [53.41549919978481]
Bias is pervasive in NLP models, motivating the development of automatic debiasing techniques.
In this paper we argue that a truly fair model must consider 'gerrymandering' groups which comprise not only single attributes, but also intersectional groups.
arXiv Detail & Related papers (2021-09-21T22:01:28Z) - Fairness via Representation Neutralization [60.90373932844308]
We propose a new mitigation technique, Representation Neutralization for Fairness (RNF).
RNF achieves fairness by debiasing only the task-specific classification head of DNN models.
Experimental results over several benchmark datasets demonstrate our RNF framework to effectively reduce discrimination of DNN models.
arXiv Detail & Related papers (2021-06-23T22:26:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.