Null It Out: Guarding Protected Attributes by Iterative Nullspace
Projection
- URL: http://arxiv.org/abs/2004.07667v2
- Date: Tue, 28 Apr 2020 21:09:39 GMT
- Title: Null It Out: Guarding Protected Attributes by Iterative Nullspace
Projection
- Authors: Shauli Ravfogel, Yanai Elazar, Hila Gonen, Michael Twiton, Yoav
Goldberg
- Abstract summary: Iterative Null-space Projection (INLP) is a novel method for removing information from neural representations.
We show that our method is able to mitigate bias in word embeddings, as well as to increase fairness in a setting of multi-class classification.
- Score: 51.041763676948705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ability to control for the kinds of information encoded in neural
representations has a variety of use cases, especially in light of the challenge
of interpreting these models. We present Iterative Null-space Projection
(INLP), a novel method for removing information from neural representations.
Our method is based on repeated training of linear classifiers that predict a
certain property we aim to remove, followed by projection of the
representations on their null-space. By doing so, the classifiers become
oblivious to that target property, making it hard to linearly separate the data
according to it. While applicable for multiple uses, we evaluate our method on
bias and fairness use-cases, and show that our method is able to mitigate bias
in word embeddings, as well as to increase fairness in a setting of multi-class
classification.
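
As a rough illustration of the procedure described in the abstract, the sketch below iterates linear-classifier training and nullspace projection. It is a minimal sketch assuming NumPy and scikit-learn; the function names (`nullspace_projection_matrix`, `inlp`), the logistic-regression choice, and the iteration count are illustrative placeholders, not the authors' released implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def nullspace_projection_matrix(W: np.ndarray) -> np.ndarray:
    """Projection onto the nullspace of W; the rows of W span the directions to remove."""
    # Orthonormal basis of the row space of W via SVD.
    _, s, Vt = np.linalg.svd(W, full_matrices=False)
    rank = int(np.sum(s > 1e-10))
    B = Vt[:rank]                          # (rank, d) orthonormal basis of the row space
    return np.eye(W.shape[1]) - B.T @ B    # P = I - B^T B


def inlp(X: np.ndarray, z: np.ndarray, n_iters: int = 10) -> np.ndarray:
    """Iteratively remove linearly decodable information about labels z from the rows of X."""
    d = X.shape[1]
    P = np.eye(d)                          # accumulated guarding projection
    X_proj = X.copy()
    for _ in range(n_iters):
        # Train a linear classifier to predict the protected attribute from the current representations.
        clf = LogisticRegression(max_iter=1000).fit(X_proj, z)
        P_null = nullspace_projection_matrix(clf.coef_)
        P = P_null @ P                     # compose with earlier projections
        X_proj = X_proj @ P_null           # P_null is symmetric, so this projects each row
    return P


# Toy usage (hypothetical data): 1000 random 50-d "embeddings" with a binary protected attribute.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))
z = rng.integers(0, 2, size=1000)
P = inlp(X, z, n_iters=5)
X_guarded = X @ P.T                        # guarded representations
```

Applying the accumulated projection to held-out representations removes the linearly decodable signal for the protected labels; information encoded non-linearly can remain, which is the gap that gradient-based and adversarial follow-up methods in the list below aim to address.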
Related papers
- XAL: EXplainable Active Learning Makes Classifiers Better Low-resource Learners [71.8257151788923]
We propose a novel Explainable Active Learning framework (XAL) for low-resource text classification.
XAL encourages classifiers to justify their inferences and delve into unlabeled data for which they cannot provide reasonable explanations.
Experiments on six datasets show that XAL achieves consistent improvement over 9 strong baselines.
arXiv Detail & Related papers (2023-10-09T08:07:04Z)
- Shielded Representations: Protecting Sensitive Attributes Through Iterative Gradient-Based Projection [39.16319169760823]
Iterative Gradient-Based Projection is a novel method for removing non-linear encoded concepts from neural representations.
Our results demonstrate that IGBP is effective in mitigating bias through intrinsic and extrinsic evaluations.
arXiv Detail & Related papers (2023-05-17T13:26:57Z)
- Learning to Unlearn: Instance-wise Unlearning for Pre-trained Classifiers [71.70205894168039]
We consider instance-wise unlearning, whose goal is to delete information about a set of instances from a pre-trained model.
We propose two methods that reduce forgetting on the remaining data: 1) utilizing adversarial examples to overcome forgetting at the representation-level and 2) leveraging weight importance metrics to pinpoint network parameters guilty of propagating unwanted information.
arXiv Detail & Related papers (2023-01-27T07:53:50Z)
- Linear Adversarial Concept Erasure [108.37226654006153]
We formulate the problem of identifying and erasing a linear subspace that corresponds to a given concept.
We show that the method is highly expressive, effectively mitigating bias in deep nonlinear classifiers while maintaining tractability and interpretability.
arXiv Detail & Related papers (2022-01-28T13:00:17Z)
- Learning Debiased and Disentangled Representations for Semantic Segmentation [52.35766945827972]
We propose a model-agnostic training scheme for semantic segmentation.
By randomly eliminating certain class information in each training iteration, we effectively reduce feature dependencies among classes.
Models trained with our approach demonstrate strong results on multiple semantic segmentation benchmarks.
arXiv Detail & Related papers (2021-10-31T16:15:09Z)
- Discriminative Attribution from Counterfactuals [64.94009515033984]
We present a method for neural network interpretability by combining feature attribution with counterfactual explanations.
We show that this method can be used to quantitatively evaluate the performance of feature attribution methods in an objective manner.
arXiv Detail & Related papers (2021-09-28T00:53:34Z)
- Unsupervised Embedding Learning from Uncertainty Momentum Modeling [37.674449317054716]
We propose a novel solution to explicitly model and explore the uncertainty of the given unlabeled learning samples.
We leverage this uncertainty-modeling momentum during learning, which helps handle outliers.
arXiv Detail & Related papers (2021-07-19T14:06:19Z)
- Bias-Awareness for Zero-Shot Learning the Seen and Unseen [47.09887661463657]
Generalized zero-shot learning recognizes inputs from both seen and unseen classes.
We propose a bias-aware learner to map inputs to a semantic embedding space for generalized zero-shot learning.
arXiv Detail & Related papers (2020-08-25T17:38:40Z)
- Null-sampling for Interpretable and Fair Representations [8.654168514863649]
We learn invariant representations, in the data domain, to achieve interpretability in algorithmic fairness.
Because the representations lie in the data domain, the changes made by the model are easily examinable by human auditors.
arXiv Detail & Related papers (2020-08-12T11:49:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.