Linear Adversarial Concept Erasure
- URL: http://arxiv.org/abs/2201.12091v1
- Date: Fri, 28 Jan 2022 13:00:17 GMT
- Title: Linear Adversarial Concept Erasure
- Authors: Shauli Ravfogel, Michael Twiton, Yoav Goldberg and Ryan Cotterell
- Abstract summary: We formulate the problem of identifying and erasing a linear subspace that corresponds to a given concept.
We model this problem as a constrained, linear minimax game, and show that existing solutions are generally not optimal for this task.
We show that the method -- despite being linear -- is highly expressive, effectively mitigating bias in deep nonlinear classifiers while maintaining tractability and interpretability.
- Score: 98.14246446690282
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern neural models trained on textual data rely on pre-trained
representations that emerge without direct supervision. As these
representations are increasingly being used in real-world applications, the
inability to control their content becomes an increasingly important
problem.
We formulate the problem of identifying and erasing a linear subspace that
corresponds to a given concept, in order to prevent linear predictors from
recovering the concept. We model this problem as a constrained, linear minimax
game, and show that existing solutions are generally not optimal for this task.
We derive a closed-form solution for certain objectives, and propose a convex
relaxation, R-LACE, that works well for others. When evaluated in the context
of binary gender removal, the method recovers a low-dimensional subspace whose
removal mitigates bias by intrinsic and extrinsic evaluation. We show that the
method -- despite being linear -- is highly expressive, effectively mitigating
bias in deep nonlinear classifiers while maintaining tractability and
interpretability.
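The core operation described in the abstract, removing a linear subspace so that a linear predictor can no longer read the concept from it, can be illustrated with a small NumPy sketch. This is not R-LACE and not the paper's closed-form solution: as a simplifying assumption, the concept direction is taken to be the difference between the two class means, and the data are synthetic. The helper names `mean_difference_direction` and `projection_removing` are hypothetical and exist only for this illustration.

```python
import numpy as np


def mean_difference_direction(X, y):
    """Unit vector along the difference of the two class means (assumed stand-in
    for a learned concept direction; not the paper's minimax solution)."""
    d = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    return d / np.linalg.norm(d)


def projection_removing(directions):
    """Orthogonal projection P = I - B B^T that zeroes out span(directions).

    `directions` has shape (k, d): k row vectors spanning the subspace to erase.
    """
    # Orthonormalize the spanning vectors; B has shape (d, k).
    B, _ = np.linalg.qr(np.asarray(directions, dtype=float).T)
    return np.eye(B.shape[0]) - B @ B.T


# Synthetic data: a binary concept is linearly encoded along the first axis.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 5))
X[:, 0] += 3.0 * y

v = mean_difference_direction(X, y)
P = projection_removing(v[None, :])   # rank-1 subspace to erase
X_erased = X @ P                      # P is symmetric, so X @ P applies it row-wise

# After projection the two class means coincide (up to rounding error).
gap = X_erased[y == 1].mean(axis=0) - X_erased[y == 0].mean(axis=0)
print(np.abs(gap).max())
```

In this sketch the direction is fixed in advance. The abstract instead describes choosing the subspace through a constrained minimax game against a linear predictor, which is the part this example does not implement.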
Related papers
- Shielded Representations: Protecting Sensitive Attributes Through Iterative Gradient-Based Projection [39.16319169760823]
Iterative Gradient-Based Projection (IGBP) is a novel method for removing nonlinearly encoded concepts from neural representations.
Our results demonstrate that IGBP is effective in mitigating bias through intrinsic and extrinsic evaluations.
Date: 2023-05-17T13:26:57Z
- Log-linear Guardedness and its Implications [116.87322784046926]
Methods for erasing human-interpretable concepts from neural representations that assume linearity have been found to be tractable and useful.
This work formally defines the notion of log-linear guardedness as the inability of an adversary to predict the concept directly from the representation.
We show that, in the binary case, under certain assumptions, a downstream log-linear model cannot recover the erased concept.
Date: 2022-10-18T17:30:02Z
- Near-optimal Offline Reinforcement Learning with Linear Representation: Leveraging Variance Information with Pessimism [65.46524775457928]
Offline reinforcement learning seeks to utilize offline/historical data to optimize sequential decision-making strategies.
We study the statistical limits of offline reinforcement learning with linear model representations.
Date: 2022-03-11T09:00:12Z
- Kernelized Concept Erasure [108.65038124096907]
We propose a kernelization of a linear minimax game for concept erasure.
It is possible to prevent specific non-linear adversaries from predicting the concept.
However, the protection does not transfer to different nonlinear adversaries.
Date: 2022-01-28T15:45:13Z
- Deep learning: a statistical viewpoint [120.94133818355645]
Deep learning has revealed some major surprises from a theoretical perspective.
In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems.
We conjecture that specific principles underlie these phenomena.
Date: 2021-03-16T16:26:36Z
- Online and Distribution-Free Robustness: Regression and Contextual Bandits with Huber Contamination [29.85468294601847]
We revisit two classic high-dimensional online learning problems, namely linear regression and contextual bandits.
We show that our algorithms succeed where conventional methods fail.
Date: 2020-10-08T17:59:05Z
- Deep Dimension Reduction for Supervised Representation Learning [51.10448064423656]
We propose a deep dimension reduction approach to learning representations with essential characteristics.
The proposed approach is a nonparametric generalization of the sufficient dimension reduction method.
We show that the estimated deep nonparametric representation is consistent in the sense that its excess risk converges to zero.
Date: 2020-06-10T14:47:43Z
- Implicit Geometric Regularization for Learning Shapes [34.052738965233445]
We offer a new paradigm for computing high fidelity implicit neural representations directly from raw data.
We show that our method leads to state-of-the-art implicit neural representations with a higher level of detail and fidelity than previous methods.
Date: 2020-02-24T07:36:32Z
This list is automatically generated from the titles and abstracts of the papers in this site.