Nonlinear Concept Erasure: a Density Matching Approach
- URL: http://arxiv.org/abs/2507.12341v1
- Date: Wed, 16 Jul 2025 15:36:15 GMT
- Title: Nonlinear Concept Erasure: a Density Matching Approach
- Authors: Antoine Saillenfest, Pirmin Lemberger
- Abstract summary: We propose a process that removes information related to a specific concept from distributed representations while preserving as much of the remaining semantic information as possible. Our approach involves learning an orthogonal projection in the embedding space, designed to make the class-conditional feature distributions of the discrete concept to be erased indistinguishable after projection. Our method, termed $\overline{\mathrm{L}}$EOPARD, achieves state-of-the-art performance in nonlinear erasure of a discrete attribute on classic natural language processing benchmarks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ensuring that neural models used in real-world applications cannot infer sensitive information, such as demographic attributes like gender or race, from text representations is a critical challenge when fairness is a concern. We address this issue through concept erasure, a process that removes information related to a specific concept from distributed representations while preserving as much of the remaining semantic information as possible. Our approach involves learning an orthogonal projection in the embedding space, designed to make the class-conditional feature distributions of the discrete concept to be erased indistinguishable after projection. By adjusting the rank of the projector, we control the extent of information removal, while its orthogonality ensures strict preservation of the local structure of the embeddings. Our method, termed $\overline{\mathrm{L}}$EOPARD, achieves state-of-the-art performance in nonlinear erasure of a discrete attribute on classic natural language processing benchmarks. Furthermore, we demonstrate that $\overline{\mathrm{L}}$EOPARD effectively mitigates bias in deep nonlinear classifiers, thereby promoting fairness.
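To make the projector formulation in the abstract concrete, here is a minimal PyTorch sketch, not the authors' released code: it learns a rank-k orthonormal basis U and erases that subspace with the orthogonal projector P = I - U U^T, training U so that the class-conditional distributions of the projected embeddings become indistinguishable. An RBF-kernel MMD is used as an illustrative stand-in for the paper's density-matching objective, and names such as `OrthogonalEraser`, `erase_rank`, and `mmd_rbf` are hypothetical.

```python
# Illustrative sketch of density-matching concept erasure with an orthogonal projector.
# Not the paper's implementation; the MMD loss is an assumed stand-in for its objective.
import itertools
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal


def mmd_rbf(x, y, sigma=1.0):
    """Biased RBF-kernel MMD^2 between two samples (kept simple for the sketch)."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()


class OrthogonalEraser(nn.Module):
    """Rank-(d - k) orthogonal projector P = I - U U^T with a learned d x k orthonormal basis U."""
    def __init__(self, dim, erase_rank):
        super().__init__()
        # Parametrize a (k x d) matrix with orthonormal rows; U is its transpose.
        self.basis = orthogonal(nn.Linear(dim, erase_rank, bias=False))

    def forward(self, x):
        u = self.basis.weight.T              # (dim, erase_rank), orthonormal columns
        return x - (x @ u) @ u.T             # remove the erased subspace


def erasure_loss(eraser, embeddings, labels):
    """Match the class-conditional distributions of projected embeddings (pairwise MMD)."""
    z = eraser(embeddings)
    loss = embeddings.new_zeros(())
    for a, b in itertools.combinations(labels.unique().tolist(), 2):
        loss = loss + mmd_rbf(z[labels == a], z[labels == b])
    return loss


# Toy usage: a binary concept shifts 2 of 16 dimensions; erase it with a rank-2 projector.
torch.manual_seed(0)
x = torch.randn(512, 16)
y = torch.randint(0, 2, (512,))
x[:, :2] += 3.0 * y.float().unsqueeze(1)     # concept signal to remove
eraser = OrthogonalEraser(dim=16, erase_rank=2)
opt = torch.optim.Adam(eraser.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    erasure_loss(eraser, x, y).backward()
    opt.step()
```

Because U has orthonormal columns, I - U U^T is an exact orthogonal projector, so the geometry of the retained subspace is untouched; varying `erase_rank` trades off how much concept information is removed, mirroring the rank control described in the abstract.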
Related papers
- Deep Fair Learning: A Unified Framework for Fine-tuning Representations with Sufficient Networks [8.616743904155419]
We propose a framework that integrates sufficient dimension reduction with deep learning to construct fair and informative representations. By introducing a novel penalty term during fine-tuning, our method enforces conditional independence between sensitive attributes and learned representations. Our approach achieves a superior balance between fairness and utility, significantly outperforming state-of-the-art baselines.
arXiv Detail & Related papers (2025-04-08T22:24:22Z) - Studying Classifier(-Free) Guidance From a Classifier-Centric Perspective [100.54185280153753]
We find that both classifier guidance and classifier-free guidance achieve conditional generation by pushing the denoising diffusion trajectories away from decision boundaries. We propose a generic postprocessing step built upon flow-matching to shrink the gap between the learned distribution for a pretrained denoising diffusion model and the real data distribution.
arXiv Detail & Related papers (2025-03-13T17:59:59Z) - Unlearning-based Neural Interpretations [51.99182464831169]
We show that current baselines defined using static functions are biased, fragile and manipulable. We propose UNI to compute an (un)learnable, debiased and adaptive baseline by perturbing the input towards an unlearning direction of steepest ascent.
arXiv Detail & Related papers (2024-10-10T16:02:39Z) - TaCo: Targeted Concept Erasure Prevents Non-Linear Classifiers From Detecting Protected Attributes [4.2560452339165895]
Targeted Concept Erasure (TaCo) is a novel approach that removes sensitive information from final latent representations.
Our experiments show that TaCo outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-12-11T16:22:37Z) - Shielded Representations: Protecting Sensitive Attributes Through Iterative Gradient-Based Projection [39.16319169760823]
Iterative Gradient-Based Projection is a novel method for removing non-linear encoded concepts from neural representations.
Our results demonstrate that IGBP is effective in mitigating bias through intrinsic and extrinsic evaluations.
arXiv Detail & Related papers (2023-05-17T13:26:57Z) - Linear Adversarial Concept Erasure [108.37226654006153]
We formulate the problem of identifying and erasing a linear subspace that corresponds to a given concept. We show that the method is highly expressive, effectively mitigating bias in deep nonlinear classifiers while maintaining tractability and interpretability.
arXiv Detail & Related papers (2022-01-28T13:00:17Z) - A Curriculum-style Self-training Approach for Source-Free Semantic Segmentation [91.13472029666312]
We propose a curriculum-style self-training approach for source-free domain adaptive semantic segmentation.
Our method yields state-of-the-art performance on source-free semantic segmentation tasks for both synthetic-to-real and adverse conditions.
arXiv Detail & Related papers (2021-06-22T10:21:39Z) - Deep Clustering by Semantic Contrastive Learning [67.28140787010447]
We introduce a novel variant called Semantic Contrastive Learning (SCL)
It explores the characteristics of both conventional contrastive learning and deep clustering.
It can amplify the strengths of contrastive learning and deep clustering in a unified approach.
arXiv Detail & Related papers (2021-03-03T20:20:48Z) - Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection [51.041763676948705]
Iterative Null-space Projection (INLP) is a novel method for removing information from neural representations.
We show that our method is able to mitigate bias in word embeddings, as well as to increase fairness in a setting of multi-class classification; a minimal sketch of the INLP procedure appears after this list.
arXiv Detail & Related papers (2020-04-16T14:02:50Z)
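As a point of reference for the nullspace-based baselines above, here is a minimal sketch of the INLP idea, assuming a scikit-learn-style workflow; it is illustrative rather than the authors' released implementation, and the helper names (`nullspace_projection`, `inlp`, `n_iters`) are hypothetical. The loop trains a linear probe for the protected attribute, projects the representations onto the probe's nullspace, and repeats until the attribute is no longer linearly predictable.

```python
# Minimal sketch of Iterative Null-space Projection (INLP); illustrative, not the official code.
import numpy as np
from sklearn.linear_model import LogisticRegression


def nullspace_projection(w):
    """Orthogonal projection onto the nullspace of the rows of w (shape: n_classes x dim)."""
    basis = np.linalg.svd(w, full_matrices=False)[2]   # orthonormal rows spanning rowspace(w)
    return np.eye(w.shape[1]) - basis.T @ basis


def inlp(x, y, n_iters=10):
    """Compose nullspace projections until the attribute y is hard to predict linearly."""
    p = np.eye(x.shape[1])
    for _ in range(n_iters):
        clf = LogisticRegression(max_iter=1000).fit(x @ p, y)
        p = nullspace_projection(clf.coef_) @ p        # remove the newly found direction(s)
    return p


# Toy usage: the protected signal lives in one direction of a 16-d embedding.
rng = np.random.default_rng(0)
x = rng.normal(size=(500, 16))
y = (x[:, 0] > 0).astype(int)
p = inlp(x, y)
# A fresh probe on the projected data should fall toward chance accuracy (~0.5).
print(LogisticRegression(max_iter=1000).fit(x @ p, y).score(x @ p, y))
```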