Linear Adversarial Concept Erasure
- URL: http://arxiv.org/abs/2201.12091v6
- Date: Wed, 30 Oct 2024 13:50:02 GMT
- Title: Linear Adversarial Concept Erasure
- Authors: Shauli Ravfogel, Michael Twiton, Yoav Goldberg, Ryan Cotterell
- Abstract summary: We formulate the problem of identifying and erasing a linear subspace that corresponds to a given concept.
We show that the method is highly expressive, effectively mitigating bias in deep nonlinear classifiers while maintaining tractability and interpretability.
- Score: 108.37226654006153
- Abstract: Modern neural models trained on textual data rely on pre-trained representations that emerge without direct supervision. As these representations are increasingly being used in real-world applications, the inability to *control* their content becomes an increasingly important problem. We formulate the problem of identifying and erasing a linear subspace that corresponds to a given concept, in order to prevent linear predictors from recovering the concept. We model this problem as a constrained, linear maximin game, and show that existing solutions are generally not optimal for this task. We derive a closed-form solution for certain objectives, and propose a convex relaxation, R-LACE, that works well for others. When evaluated in the context of binary gender removal, the method recovers a low-dimensional subspace whose removal mitigates bias by intrinsic and extrinsic evaluation. We show that the method is highly expressive, effectively mitigating bias in deep nonlinear classifiers while maintaining tractability and interpretability.
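To make the formulation concrete: the game maximizes, over rank-constrained orthogonal projections P = I - W Wᵀ, the minimum loss any linear adversary can achieve when predicting the concept from P x. Below is a minimal NumPy sketch of a rank-1 version of such a game using naive alternating gradient steps. This is an illustration written for this summary, not the paper's R-LACE solver; the toy data and hyperparameters are assumptions.

```python
# Sketch: rank-1 linear adversarial concept erasure as a maximin game.
# Inner player: a logistic-regression adversary minimizes its loss on the
# projected data. Outer player: the erased direction w maximizes that loss,
# renormalized so P = I - w w^T stays an orthogonal projection.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 500 points in 10-d whose first axis leaks a binary concept z.
n, d = 500, 10
z = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, d))
X[:, 0] += 3.0 * (2 * z - 1)                 # concept is linearly encoded

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

w = rng.normal(size=d)
w /= np.linalg.norm(w)                       # direction to erase
theta = np.zeros(d)                          # adversary's weights

for step in range(200):
    P = np.eye(d) - np.outer(w, w)           # orthogonal projection
    Xp = X @ P                               # erased representations
    # Inner player: adversary takes gradient steps on the logistic loss.
    for _ in range(20):
        p = sigmoid(Xp @ theta)
        theta -= 0.1 * Xp.T @ (p - z) / n
    # Outer player: ascend the adversary's loss in w, then renormalize.
    p = sigmoid(Xp @ theta)
    G = (p - z)[:, None] * theta[None, :] / n        # dLoss/dXp, shape (n, d)
    grad_w = -(X.T @ G) @ w - (G.T @ X) @ w          # chain rule through Xp
    w += 0.5 * grad_w
    w /= np.linalg.norm(w)

P = np.eye(d) - np.outer(w, w)
acc = ((sigmoid((X @ P) @ theta) > 0.5) == z).mean()
print(f"adversary accuracy after erasure: {acc:.2f} (should approach chance, 0.5)")
```

The paper's convex relaxation instead optimizes over a relaxed convex set of projection matrices rather than relying on the renormalization trick used above.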
Related papers
- Unlearning-based Neural Interpretations [51.99182464831169]
We show that current baselines defined using static functions are biased, fragile and manipulable.
We propose UNI to compute an (un)learnable, debiased and adaptive baseline by perturbing the input towards an unlearning direction of steepest ascent.
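As a rough illustration of that idea (not the authors' code; the toy logistic model, step size, and step count below are all assumptions), an adaptive baseline can be obtained by gradient ascent on the loss with respect to the input:

```python
# Sketch: compute an "unlearning" baseline by perturbing the input in the
# direction of steepest loss ascent, then attribute relative to it.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def unlearning_baseline(x, y, theta, steps=50, lr=0.1):
    """Perturb x towards the direction of steepest loss ascent."""
    x = x.copy()
    for _ in range(steps):
        p = sigmoid(x @ theta)
        grad_x = (p - y) * theta        # d(BCE)/dx for a logistic model
        x += lr * grad_x                # ascend: "unlearn" the prediction
    return x

rng = np.random.default_rng(0)
theta = rng.normal(size=5)
x, y = rng.normal(size=5), 1.0
baseline = unlearning_baseline(x, y, theta)
print(x - baseline)                     # attribution relative to the baseline
```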
arXiv Detail & Related papers (2024-10-10T16:02:39Z)
- Shielded Representations: Protecting Sensitive Attributes Through Iterative Gradient-Based Projection [39.16319169760823]
Iterative Gradient-Based Projection (IGBP) is a novel method for removing non-linearly encoded concepts from neural representations.
Our results demonstrate that IGBP is effective in mitigating bias through intrinsic and extrinsic evaluations.
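A rough sketch of what such an iterative projection loop might look like, using a linear probe as a stand-in for the paper's nonlinear adversary (the toy data and stopping threshold are our assumptions):

```python
# Sketch: repeatedly fit a probe for the concept, project representations
# onto its decision boundary, and stop once a fresh probe is near chance.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def fit_probe(X, z, lr=0.5, steps=300):
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        theta -= lr * X.T @ (sigmoid(X @ theta) - z) / len(z)
    return theta

rng = np.random.default_rng(0)
n, d = 400, 8
z = rng.integers(0, 2, size=n).astype(float)
X = rng.normal(size=(n, d))
X[:, 0] += 2.0 * (2 * z - 1)                  # concept leaks along axis 0

for _ in range(10):
    theta = fit_probe(X, z)
    if ((sigmoid(X @ theta) > 0.5) == z).mean() < 0.55:
        break                                 # a fresh probe is near chance
    # Project each point onto the probe's boundary theta . x = 0; for a
    # linear probe, the steepest path to the boundary runs along theta.
    X = X - np.outer((X @ theta) / (theta @ theta + 1e-12), theta)

theta = fit_probe(X, z)
acc = ((sigmoid(X @ theta) > 0.5) == z).mean()
print(f"final probe accuracy: {acc:.2f} (chance = 0.5)")
```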
arXiv Detail & Related papers (2023-05-17T13:26:57Z)
- Log-linear Guardedness and its Implications [116.87322784046926]
Methods that assume linearity when erasing human-interpretable concepts from neural representations have been found to be tractable and useful.
This work formally defines the notion of log-linear guardedness as the inability of an adversary to predict the concept directly from the representation.
We show that, in the binary case, under certain assumptions, a downstream log-linear model cannot recover the erased concept.
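One way to formalize that definition in the binary case (our paraphrase, with σ the logistic function and h(x) the representation; the paper's exact statement may differ):

```latex
% Log-linear guardedness, binary case (paraphrased): no logistic adversary
% (theta, b) predicts z from h(x) better than the majority class.
\[
\max_{\theta,\, b}\;
\Pr\!\big[\,\mathbb{1}\{\sigma(\theta^{\top} h(x) + b) > \tfrac{1}{2}\} = z\,\big]
\;\le\; \max\{\Pr[z = 0],\, \Pr[z = 1]\}
\]
```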
arXiv Detail & Related papers (2022-10-18T17:30:02Z)
- Near-optimal Offline Reinforcement Learning with Linear Representation: Leveraging Variance Information with Pessimism [65.46524775457928]
Offline reinforcement learning seeks to use offline/historical data to optimize sequential decision-making strategies.
We study the statistical limits of offline reinforcement learning with linear model representations.
arXiv Detail & Related papers (2022-03-11T09:00:12Z)
- Kernelized Concept Erasure [108.65038124096907]
We propose a kernelization of a linear minimax game for concept erasure.
It is possible to prevent specific non-linear adversaries from predicting the concept.
However, the protection does not transfer to different nonlinear adversaries.
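To fix intuition, here is a loose, self-contained illustration (not the paper's construction): lift the representations with random Fourier features approximating an RBF kernel (bandwidth 1 assumed) and erase the concept's linear direction in that feature space. As the summary notes, guarding against one kernel adversary need not transfer to adversaries with a different kernel.

```python
# Sketch: linear erasure in an approximate RBF feature space. Removing the
# class-mean direction is a cheap stand-in for solving the minimax game there.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 400, 6, 256
z = rng.integers(0, 2, size=n).astype(float)
X = rng.normal(size=(n, d))
X[:, 0] += 2.0 * (2 * z - 1)                 # concept leaks along axis 0

W = rng.normal(size=(d, m))                  # random Fourier frequencies
b = rng.uniform(0, 2 * np.pi, size=m)
Phi = np.sqrt(2.0 / m) * np.cos(X @ W + b)   # feature-space representations

v = Phi[z == 1].mean(0) - Phi[z == 0].mean(0)
v /= np.linalg.norm(v)                       # concept direction in feature space
Phi_erased = Phi - np.outer(Phi @ v, v)      # project it out
print(abs(Phi_erased @ v).max())             # ~0: direction removed
```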
arXiv Detail & Related papers (2022-01-28T15:45:13Z)
- Online and Distribution-Free Robustness: Regression and Contextual Bandits with Huber Contamination [29.85468294601847]
We revisit two classic high-dimensional online learning problems, namely linear regression and contextual bandits.
We show that our algorithms succeed where conventional methods fail.
arXiv Detail & Related papers (2020-10-08T17:59:05Z)
- Deep Dimension Reduction for Supervised Representation Learning [51.10448064423656]
We propose a deep dimension reduction approach to learning representations with essential characteristics.
The proposed approach is a nonparametric generalization of the sufficient dimension reduction method.
We show that the estimated deep nonparametric representation is consistent in the sense that its excess risk converges to zero.
arXiv Detail & Related papers (2020-06-10T14:47:43Z)
- Implicit Geometric Regularization for Learning Shapes [34.052738965233445]
We offer a new paradigm for computing high-fidelity implicit neural representations directly from raw data.
We show that our method leads to state-of-the-art implicit neural representations with a higher level of detail and fidelity than previous methods.
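For context, the core recipe as commonly implemented (this PyTorch sketch is our assumption-laden illustration, not the paper's code) fits a network to vanish on raw surface points while an eikonal penalty pushes the gradient norm toward 1, biasing the network toward a signed distance function:

```python
# Sketch: implicit geometric regularization on 2-d points from a unit circle.
import torch

mlp = torch.nn.Sequential(
    torch.nn.Linear(2, 64), torch.nn.Softplus(beta=100),
    torch.nn.Linear(64, 64), torch.nn.Softplus(beta=100),
    torch.nn.Linear(64, 1),
)
opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)

angles = torch.rand(256, 1) * 2 * torch.pi            # raw "scan" points
surface = torch.cat([torch.cos(angles), torch.sin(angles)], dim=1)

for step in range(500):
    opt.zero_grad()
    on_surface = mlp(surface).abs().mean()             # f = 0 on the data
    x = torch.rand(256, 2, requires_grad=True) * 4 - 2
    grad = torch.autograd.grad(mlp(x).sum(), x, create_graph=True)[0]
    eikonal = ((grad.norm(dim=1) - 1) ** 2).mean()     # |grad f| ~ 1 everywhere
    (on_surface + 0.1 * eikonal).backward()
    opt.step()

# Magnitude at the origin should approach 1 (distance to the circle);
# the sign depends on the learned orientation.
print(mlp(torch.zeros(1, 2)).item())
```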
arXiv Detail & Related papers (2020-02-24T07:36:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.