TaCo: Targeted Concept Erasure Prevents Non-Linear Classifiers From Detecting Protected Attributes
- URL: http://arxiv.org/abs/2312.06499v4
- Date: Wed, 16 Oct 2024 08:53:23 GMT
- Title: TaCo: Targeted Concept Erasure Prevents Non-Linear Classifiers From Detecting Protected Attributes
- Authors: Fanny Jourdan, Louis Béthune, Agustin Picard, Laurent Risser, Nicholas Asher,
- Abstract summary: Targeted Concept Erasure (TaCo) is a novel approach that removes sensitive information from final latent representations.
Our experiments show that TaCo outperforms state-of-the-art methods.
- Score: 4.2560452339165895
- License:
- Abstract: Ensuring fairness in NLP models is crucial, as they often encode sensitive attributes like gender and ethnicity, leading to biased outcomes. Current concept erasure methods attempt to mitigate this by modifying final latent representations to remove sensitive information without retraining the entire model. However, these methods typically rely on linear classifiers, which leave models vulnerable to non-linear adversaries capable of recovering sensitive information. We introduce Targeted Concept Erasure (TaCo), a novel approach that removes sensitive information from final latent representations, ensuring fairness even against non-linear classifiers. Our experiments show that TaCo outperforms state-of-the-art methods, achieving greater reductions in the prediction accuracy of sensitive attributes by non-linear classifier while preserving overall task performance. Code is available on https://github.com/fanny-jourdan/TaCo.
Related papers
- Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models [76.39651111467832]
We introduce Reliable and Efficient Concept Erasure (RECE), a novel approach that modifies the model in 3 seconds without necessitating additional fine-tuning.
To mitigate inappropriate content potentially represented by derived embeddings, RECE aligns them with harmless concepts in cross-attention layers.
The derivation and erasure of new representation embeddings are conducted iteratively to achieve a thorough erasure of inappropriate concepts.
arXiv Detail & Related papers (2024-07-17T08:04:28Z) - Nonlinear Transformations Against Unlearnable Datasets [4.876873339297269]
Automated scraping stands out as a common method for collecting data in deep learning models without the authorization of data owners.
Recent studies have begun to tackle the privacy concerns associated with this data collection method.
The data generated by those approaches, called "unlearnable" examples, are prevented "learning" by deep learning models.
arXiv Detail & Related papers (2024-06-05T03:00:47Z) - Latent Enhancing AutoEncoder for Occluded Image Classification [2.6217304977339473]
We introduce LEARN: Latent Enhancing feAture Reconstruction Network.
An auto-encoder based network that can be incorporated into the classification model before its head.
On the OccludedPASCAL3D+ dataset, the proposed LEARN outperforms standard classification models.
arXiv Detail & Related papers (2024-02-10T12:22:31Z) - Credible Teacher for Semi-Supervised Object Detection in Open Scene [106.25850299007674]
In Open Scene Semi-Supervised Object Detection (O-SSOD), unlabeled data may contain unknown objects not observed in the labeled data.
It is detrimental to the current methods that mainly rely on self-training, as more uncertainty leads to the lower localization and classification precision of pseudo labels.
We propose Credible Teacher, an end-to-end framework to prevent uncertain pseudo labels from misleading the model.
arXiv Detail & Related papers (2024-01-01T08:19:21Z) - XAL: EXplainable Active Learning Makes Classifiers Better Low-resource Learners [71.8257151788923]
We propose a novel Explainable Active Learning framework (XAL) for low-resource text classification.
XAL encourages classifiers to justify their inferences and delve into unlabeled data for which they cannot provide reasonable explanations.
Experiments on six datasets show that XAL achieves consistent improvement over 9 strong baselines.
arXiv Detail & Related papers (2023-10-09T08:07:04Z) - Model Debiasing via Gradient-based Explanation on Representation [14.673988027271388]
We propose a novel fairness framework that performs debiasing with regard to sensitive attributes and proxy attributes.
Our framework achieves better fairness-accuracy trade-off on unstructured and structured datasets than previous state-of-the-art approaches.
arXiv Detail & Related papers (2023-05-20T11:57:57Z) - Shielded Representations: Protecting Sensitive Attributes Through
Iterative Gradient-Based Projection [39.16319169760823]
Iterative Gradient-Based Projection is a novel method for removing non-linear encoded concepts from neural representations.
Our results demonstrate that IGBP is effective in mitigating bias through intrinsic and extrinsic evaluations.
arXiv Detail & Related papers (2023-05-17T13:26:57Z) - Linear Adversarial Concept Erasure [108.37226654006153]
We formulate the problem of identifying and erasing a linear subspace that corresponds to a given concept.
We show that the method is highly expressive, effectively mitigating bias in deep nonlinear classifiers while maintaining tractability and interpretability.
arXiv Detail & Related papers (2022-01-28T13:00:17Z) - Fairness via Representation Neutralization [60.90373932844308]
We propose a new mitigation technique, namely, Representation Neutralization for Fairness (RNF)
RNF achieves that fairness by debiasing only the task-specific classification head of DNN models.
Experimental results over several benchmark datasets demonstrate our RNF framework to effectively reduce discrimination of DNN models.
arXiv Detail & Related papers (2021-06-23T22:26:29Z) - Null It Out: Guarding Protected Attributes by Iterative Nullspace
Projection [51.041763676948705]
Iterative Null-space Projection (INLP) is a novel method for removing information from neural representations.
We show that our method is able to mitigate bias in word embeddings, as well as to increase fairness in a setting of multi-class classification.
arXiv Detail & Related papers (2020-04-16T14:02:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.