Related papers: TaCo: Targeted Concept Erasure Prevents Non-Linear Classifiers From Detecting Protected Attributes

TaCo: Targeted Concept Erasure Prevents Non-Linear Classifiers From Detecting Protected Attributes

URL: http://arxiv.org/abs/2312.06499v4
Date: Wed, 16 Oct 2024 08:53:23 GMT
Title: TaCo: Targeted Concept Erasure Prevents Non-Linear Classifiers From Detecting Protected Attributes
Authors: Fanny Jourdan, Louis Béthune, Agustin Picard, Laurent Risser, Nicholas Asher,
Abstract summary: Targeted Concept Erasure (TaCo) is a novel approach that removes sensitive information from final latent representations. Our experiments show that TaCo outperforms state-of-the-art methods.
Score: 4.2560452339165895
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Ensuring fairness in NLP models is crucial, as they often encode sensitive attributes like gender and ethnicity, leading to biased outcomes. Current concept erasure methods attempt to mitigate this by modifying final latent representations to remove sensitive information without retraining the entire model. However, these methods typically rely on linear classifiers, which leave models vulnerable to non-linear adversaries capable of recovering sensitive information. We introduce Targeted Concept Erasure (TaCo), a novel approach that removes sensitive information from final latent representations, ensuring fairness even against non-linear classifiers. Our experiments show that TaCo outperforms state-of-the-art methods, achieving greater reductions in the prediction accuracy of sensitive attributes by non-linear classifier while preserving overall task performance. Code is available on https://github.com/fanny-jourdan/TaCo.

Related papers

Nonlinear Concept Erasure: a Density Matching Approach [0.0]
We propose a process that removes information related to a specific concept from distributed representations while preserving as much of the remaining semantic information as possible.<n>Our approach involves learning an projection in the embedding space, designed to make the class-conditional feature distributions of the discrete concept to erase indistinguishable after projection.<n>Our method, termed $overlinemathrmL$EOPARD, achieves state-of-the-art performance in nonlinear erasure of a discrete attribute on classic natural language processing benchmarks.
arXiv Detail & Related papers (2025-07-16T15:36:15Z)
Fairness-Aware Low-Rank Adaptation Under Demographic Privacy Constraints [4.647881572951815]
Pre-trained foundation models can be adapted for specific tasks using Low-Rank Adaptation (LoRA) Existing fairness-aware fine-tuning methods rely on direct access to sensitive attributes or their predictors. We introduce a set of LoRA-based fine-tuning methods that can be trained in a distributed fashion.
arXiv Detail & Related papers (2025-03-07T18:49:57Z)
Adversarial Purification by Consistency-aware Latent Space Optimization on Data Manifolds [48.37843602248313]
Deep neural networks (DNNs) are vulnerable to adversarial samples crafted by adding imperceptible perturbations to clean data, potentially leading to incorrect and dangerous predictions. We propose Consistency Model-based Adversarial Purification (CMAP), which optimize vectors within the latent space of a pre-trained consistency model to generate samples for restoring clean data. CMAP significantly enhances robustness against strong adversarial attacks while preserving high natural accuracy.
arXiv Detail & Related papers (2024-12-11T14:14:02Z)
Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models [76.39651111467832]
We introduce Reliable and Efficient Concept Erasure (RECE), a novel approach that modifies the model in 3 seconds without necessitating additional fine-tuning. To mitigate inappropriate content potentially represented by derived embeddings, RECE aligns them with harmless concepts in cross-attention layers. The derivation and erasure of new representation embeddings are conducted iteratively to achieve a thorough erasure of inappropriate concepts.
arXiv Detail & Related papers (2024-07-17T08:04:28Z)
Nonlinear Transformations Against Unlearnable Datasets [4.876873339297269]
Automated scraping stands out as a common method for collecting data in deep learning models without the authorization of data owners. Recent studies have begun to tackle the privacy concerns associated with this data collection method. The data generated by those approaches, called "unlearnable" examples, are prevented "learning" by deep learning models.
arXiv Detail & Related papers (2024-06-05T03:00:47Z)
Latent Enhancing AutoEncoder for Occluded Image Classification [2.6217304977339473]
We introduce LEARN: Latent Enhancing feAture Reconstruction Network. An auto-encoder based network that can be incorporated into the classification model before its head. On the OccludedPASCAL3D+ dataset, the proposed LEARN outperforms standard classification models.
arXiv Detail & Related papers (2024-02-10T12:22:31Z)
Credible Teacher for Semi-Supervised Object Detection in Open Scene [106.25850299007674]
In Open Scene Semi-Supervised Object Detection (O-SSOD), unlabeled data may contain unknown objects not observed in the labeled data. It is detrimental to the current methods that mainly rely on self-training, as more uncertainty leads to the lower localization and classification precision of pseudo labels. We propose Credible Teacher, an end-to-end framework to prevent uncertain pseudo labels from misleading the model.
arXiv Detail & Related papers (2024-01-01T08:19:21Z)
XAL: EXplainable Active Learning Makes Classifiers Better Low-resource Learners [71.8257151788923]
We propose a novel Explainable Active Learning framework (XAL) for low-resource text classification. XAL encourages classifiers to justify their inferences and delve into unlabeled data for which they cannot provide reasonable explanations. Experiments on six datasets show that XAL achieves consistent improvement over 9 strong baselines.
arXiv Detail & Related papers (2023-10-09T08:07:04Z)
Model Debiasing via Gradient-based Explanation on Representation [14.673988027271388]
We propose a novel fairness framework that performs debiasing with regard to sensitive attributes and proxy attributes. Our framework achieves better fairness-accuracy trade-off on unstructured and structured datasets than previous state-of-the-art approaches.
arXiv Detail & Related papers (2023-05-20T11:57:57Z)
Shielded Representations: Protecting Sensitive Attributes Through Iterative Gradient-Based Projection [39.16319169760823]
Iterative Gradient-Based Projection is a novel method for removing non-linear encoded concepts from neural representations. Our results demonstrate that IGBP is effective in mitigating bias through intrinsic and extrinsic evaluations.
arXiv Detail & Related papers (2023-05-17T13:26:57Z)
Linear Adversarial Concept Erasure [108.37226654006153]
We formulate the problem of identifying and erasing a linear subspace that corresponds to a given concept. We show that the method is highly expressive, effectively mitigating bias in deep nonlinear classifiers while maintaining tractability and interpretability.
arXiv Detail & Related papers (2022-01-28T13:00:17Z)
Fairness via Representation Neutralization [60.90373932844308]
We propose a new mitigation technique, namely, Representation Neutralization for Fairness (RNF) RNF achieves that fairness by debiasing only the task-specific classification head of DNN models. Experimental results over several benchmark datasets demonstrate our RNF framework to effectively reduce discrimination of DNN models.
arXiv Detail & Related papers (2021-06-23T22:26:29Z)
Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection [51.041763676948705]
Iterative Null-space Projection (INLP) is a novel method for removing information from neural representations. We show that our method is able to mitigate bias in word embeddings, as well as to increase fairness in a setting of multi-class classification.
arXiv Detail & Related papers (2020-04-16T14:02:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.