Null It Out: Guarding Protected Attributes by Iterative Nullspace
Projection
- URL: http://arxiv.org/abs/2004.07667v2
- Date: Tue, 28 Apr 2020 21:09:39 GMT
- Title: Null It Out: Guarding Protected Attributes by Iterative Nullspace
Projection
- Authors: Shauli Ravfogel, Yanai Elazar, Hila Gonen, Michael Twiton, Yoav
Goldberg
- Abstract summary: Iterative Null-space Projection (INLP) is a novel method for removing information from neural representations.
We show that our method is able to mitigate bias in word embeddings, as well as to increase fairness in a setting of multi-class classification.
- Score: 51.041763676948705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ability to control for the kinds of information encoded in neural
representations has a variety of use cases, especially in light of the challenge
of interpreting these models. We present Iterative Null-space Projection
(INLP), a novel method for removing information from neural representations.
Our method is based on repeated training of linear classifiers that predict a
certain property we aim to remove, followed by projection of the
representations on their null-space. By doing so, the classifiers become
oblivious to that target property, making it hard to linearly separate the data
according to it. While applicable for multiple uses, we evaluate our method on
bias and fairness use-cases, and show that our method is able to mitigate bias
in word embeddings, as well as to increase fairness in a setting of multi-class
classification.
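
As a rough illustration of the procedure described in the abstract, the sketch below iterates linear-classifier training and nullspace projection. It is a minimal sketch assuming NumPy and scikit-learn; the function names (`nullspace_projection_matrix`, `inlp`), the logistic-regression choice, and the iteration count are illustrative placeholders, not the authors' released implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def nullspace_projection_matrix(W: np.ndarray) -> np.ndarray:
    """Projection onto the nullspace of W; the rows of W span the directions to remove."""
    # Orthonormal basis of the row space of W via SVD.
    _, s, Vt = np.linalg.svd(W, full_matrices=False)
    rank = int(np.sum(s > 1e-10))
    B = Vt[:rank]                          # (rank, d) orthonormal basis of the row space
    return np.eye(W.shape[1]) - B.T @ B    # P = I - B^T B


def inlp(X: np.ndarray, z: np.ndarray, n_iters: int = 10) -> np.ndarray:
    """Iteratively remove linearly decodable information about labels z from the rows of X."""
    d = X.shape[1]
    P = np.eye(d)                          # accumulated guarding projection
    X_proj = X.copy()
    for _ in range(n_iters):
        # Train a linear classifier to predict the protected attribute from the current representations.
        clf = LogisticRegression(max_iter=1000).fit(X_proj, z)
        P_null = nullspace_projection_matrix(clf.coef_)
        P = P_null @ P                     # compose with earlier projections
        X_proj = X_proj @ P_null           # P_null is symmetric, so this projects each row
    return P


# Toy usage (hypothetical data): 1000 random 50-d "embeddings" with a binary protected attribute.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))
z = rng.integers(0, 2, size=1000)
P = inlp(X, z, n_iters=5)
X_guarded = X @ P.T                        # guarded representations
```

Applying the accumulated projection to held-out representations removes the linearly decodable signal for the protected labels; information encoded non-linearly can remain, which is the gap that gradient-based and adversarial follow-up methods in the list below aim to address.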
Related papers
- XAL: EXplainable Active Learning Makes Classifiers Better Low-resource Learners [71.8257151788923]
We propose a novel Explainable Active Learning framework (XAL) for low-resource text classification.
XAL encourages classifiers to justify their inferences and delve into unlabeled data for which they cannot provide reasonable explanations.
Experiments on six datasets show that XAL achieves consistent improvement over 9 strong baselines.
arXiv Detail & Related papers (2023-10-09T08:07:04Z)
- Shielded Representations: Protecting Sensitive Attributes Through Iterative Gradient-Based Projection [39.16319169760823]
Iterative Gradient-Based Projection is a novel method for removing non-linear encoded concepts from neural representations.
Our results demonstrate that IGBP is effective in mitigating bias through intrinsic and extrinsic evaluations.
arXiv Detail & Related papers (2023-05-17T13:26:57Z)
- Learning to Unlearn: Instance-wise Unlearning for Pre-trained Classifiers [71.70205894168039]
We consider instance-wise unlearning, whose goal is to delete information about a set of instances from a pre-trained model.
We propose two methods that reduce forgetting on the remaining data: 1) utilizing adversarial examples to overcome forgetting at the representation-level and 2) leveraging weight importance metrics to pinpoint network parameters guilty of propagating unwanted information.
arXiv Detail & Related papers (2023-01-27T07:53:50Z)
- Linear Adversarial Concept Erasure [108.37226654006153]
We formulate the problem of identifying and erasing a linear subspace that corresponds to a given concept.
We show that the method is highly expressive, effectively mitigating bias in deep nonlinear classifiers while maintaining tractability and interpretability.
arXiv Detail & Related papers (2022-01-28T13:00:17Z)
- Learning Debiased and Disentangled Representations for Semantic Segmentation [52.35766945827972]
We propose a model-agnostic training scheme for semantic segmentation.
By randomly eliminating certain class information in each training iteration, we effectively reduce feature dependencies among classes.
Models trained with our approach demonstrate strong results on multiple semantic segmentation benchmarks.
arXiv Detail & Related papers (2021-10-31T16:15:09Z)
- Discriminative Attribution from Counterfactuals [64.94009515033984]
We present a method for neural network interpretability by combining feature attribution with counterfactual explanations.
We show that this method can be used to quantitatively evaluate the performance of feature attribution methods in an objective manner.
arXiv Detail & Related papers (2021-09-28T00:53:34Z)
- Unsupervised Embedding Learning from Uncertainty Momentum Modeling [37.674449317054716]
We propose a novel solution to explicitly model and explore the uncertainty of the given unlabeled learning samples.
We leverage this uncertainty-modeling momentum during learning, which helps handle outliers.
arXiv Detail & Related papers (2021-07-19T14:06:19Z)
- Bias-Awareness for Zero-Shot Learning the Seen and Unseen [47.09887661463657]
Generalized zero-shot learning recognizes inputs from both seen and unseen classes.
We propose a bias-aware learner to map inputs to a semantic embedding space for generalized zero-shot learning.
arXiv Detail & Related papers (2020-08-25T17:38:40Z)
- Null-sampling for Interpretable and Fair Representations [8.654168514863649]
We learn invariant representations, in the data domain, to achieve interpretability in algorithmic fairness.
Because the representations lie in the data domain, the changes made by the model are easily examinable by human auditors.
arXiv Detail & Related papers (2020-08-12T11:49:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.