From Hope to Safety: Unlearning Biases of Deep Models via Gradient
Penalization in Latent Space
- URL: http://arxiv.org/abs/2308.09437v3
- Date: Mon, 18 Dec 2023 15:36:13 GMT
- Title: From Hope to Safety: Unlearning Biases of Deep Models via Gradient
Penalization in Latent Space
- Authors: Maximilian Dreyer, Frederik Pahde, Christopher J. Anders, Wojciech
Samek, Sebastian Lapuschkin
- Abstract summary: Deep Neural Networks are prone to learning spurious correlations embedded in the training data, leading to potentially biased predictions.
This poses risks when deploying these models for high-stakes decision-making, such as in medical applications.
We present a novel method for model correction on the concept level that explicitly reduces model sensitivity towards biases via gradient penalization.
- Score: 13.763716495058294
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Neural Networks are prone to learning spurious correlations embedded in
the training data, leading to potentially biased predictions. This poses risks
when deploying these models for high-stakes decision-making, such as in medical
applications. Current methods for post-hoc model correction either require
input-level annotations which are only possible for spatially localized biases,
or augment the latent feature space, thereby hoping to enforce the right
reasons. We present a novel method for model correction on the concept level
that explicitly reduces model sensitivity towards biases via gradient
penalization. When modeling biases via Concept Activation Vectors, we highlight
the importance of choosing robust directions, as traditional regression-based
approaches such as Support Vector Machines tend to result in diverging
directions. We effectively mitigate biases in controlled and real-world
settings on the ISIC, Bone Age, ImageNet and CelebA datasets using VGG, ResNet
and EfficientNet architectures. Code is available at
https://github.com/frederikpahde/rrclarc.
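As a rough illustration of the gradient-penalization idea described in the abstract, the sketch below penalizes the alignment between the latent-space gradient of the target logit and a precomputed Concept Activation Vector (CAV) for the bias concept. The split into a feature extractor and head, the difference-of-means CAV, and the penalty weight are illustrative assumptions, not the authors' exact RR-ClArC implementation (see the linked repository for that).

```python
# Illustrative sketch only; see https://github.com/frederikpahde/rrclarc for the
# authors' actual implementation. Assumes a model split into a trainable feature
# extractor and a classification head, with flat latent activations of shape (B, D).
import torch
import torch.nn.functional as F


def estimate_cav(z_biased, z_clean):
    """Toy CAV: difference of latent means between biased and clean samples
    (an illustrative, robust choice; not necessarily the paper's estimator)."""
    cav = z_biased.mean(dim=0) - z_clean.mean(dim=0)
    return cav / cav.norm()


def corrected_loss(feature_extractor, head, x, y, cav, lam=10.0):
    """Task loss plus a penalty on the model's latent-space sensitivity to the bias."""
    z = feature_extractor(x)                      # latent activations, shape (B, D)
    logits = head(z)
    task_loss = F.cross_entropy(logits, y)

    # Gradient of each sample's target logit w.r.t. its latent activations.
    # Requires the feature extractor to be trainable so z is part of the graph.
    target_logits = logits.gather(1, y.unsqueeze(1)).sum()
    (grad_z,) = torch.autograd.grad(target_logits, z, create_graph=True)

    # Penalize sensitivity along the bias (CAV) direction in latent space.
    penalty = (grad_z @ cav).pow(2).mean()
    return task_loss + lam * penalty
```

During fine-tuning, this combined loss is backpropagated into the model parameters, so the model retains task performance while its predictions become approximately insensitive to movements along the bias direction.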
Related papers
- Steering Without Side Effects: Improving Post-Deployment Control of Language Models [61.99293520621248]
Language models (LMs) have been shown to behave unexpectedly post-deployment.
We present KL-then-steer (KTS), a technique that decreases the side effects of steering while retaining its benefits.
Our best method prevents 44% of jailbreak attacks compared to the original Llama-2-chat-7B model.
arXiv Detail & Related papers (2024-06-21T01:37:39Z) - GRANP: A Graph Recurrent Attentive Neural Process Model for Vehicle Trajectory Prediction [3.031375888004876]
We propose a novel model named Graph Recurrent Attentive Neural Process (GRANP) for vehicle trajectory prediction.
GRANP contains an encoder with deterministic and latent paths, and a decoder for prediction.
We show that GRANP achieves state-of-the-art results and can efficiently quantify uncertainties.
arXiv Detail & Related papers (2024-04-09T05:51:40Z) - Diffusion-Model-Assisted Supervised Learning of Generative Models for
Density Estimation [10.793646707711442]
We present a framework for training generative models for density estimation.
We use the score-based diffusion model to generate labeled data.
Once the labeled data are generated, we can train a simple fully connected neural network to learn the generative model in a supervised manner.
arXiv Detail & Related papers (2023-10-22T23:56:19Z) - Studying How to Efficiently and Effectively Guide Models with Explanations [52.498055901649025]
'Model guidance' is the idea of regularizing the models' explanations to ensure that they are "right for the right reasons".
We conduct an in-depth evaluation across various loss functions, attribution methods, models, and 'guidance depths' on the PASCAL VOC 2007 and MS COCO 2014 datasets.
Specifically, we guide the models via bounding box annotations, which are much cheaper to obtain than the commonly used segmentation masks.
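As a loose sketch of what such guidance can look like (one common loss form, not necessarily this paper's exact configuration), the example below penalizes input-gradient attribution mass that falls outside the annotated bounding box; the mask convention and weighting are assumptions.

```python
# Illustrative sketch of attribution-based model guidance with bounding-box masks;
# the loss form and weighting are assumptions, not the paper's exact setup.
import torch
import torch.nn.functional as F


def guidance_loss(model, x, y, box_mask, lam=1.0):
    """box_mask: (B, 1, H, W) tensor, 1 inside the annotated box, 0 outside."""
    x = x.detach().requires_grad_(True)
    logits = model(x)
    task_loss = F.cross_entropy(logits, y)

    # Simple input-gradient attribution for the target class.
    target = logits.gather(1, y.unsqueeze(1)).sum()
    (attr,) = torch.autograd.grad(target, x, create_graph=True)
    attr = attr.abs().sum(dim=1, keepdim=True)     # aggregate over channels

    # Penalize attribution "energy" that falls outside the bounding box.
    outside = (attr * (1.0 - box_mask)).sum(dim=(1, 2, 3))
    total = attr.sum(dim=(1, 2, 3)) + 1e-8
    guidance = (outside / total).mean()
    return task_loss + lam * guidance
```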
arXiv Detail & Related papers (2023-03-21T15:34:50Z) - Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
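A minimal sketch of the core idea follows, with a plain orthogonal projection standing in for the paper's calibrated projection matrix; the function name and shapes are illustrative assumptions.

```python
# Illustrative sketch: remove biased directions from text embeddings via orthogonal
# projection. The paper uses a calibrated projection matrix; this is a simplified stand-in.
import torch
import torch.nn.functional as F


def debias_embeddings(text_emb, bias_dirs):
    """text_emb: (N, D) embeddings; bias_dirs: (K, D) biased directions
    (e.g. derived from prompts describing the spurious attribute)."""
    # Orthonormalize the biased directions.
    q, _ = torch.linalg.qr(bias_dirs.T)            # (D, K), orthonormal columns
    eye = torch.eye(text_emb.shape[1], device=text_emb.device, dtype=text_emb.dtype)
    proj = eye - q @ q.T                           # projector onto the complement
    return F.normalize(text_emb @ proj, dim=-1)
```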
arXiv Detail & Related papers (2023-01-31T20:09:33Z) - Scaling Laws Beyond Backpropagation [64.0476282000118]
We study the ability of Direct Feedback Alignment to train causal decoder-only Transformers efficiently.
We find that DFA fails to offer more efficient scaling than backpropagation.
arXiv Detail & Related papers (2022-10-26T10:09:14Z) - Rethinking Bias Mitigation: Fairer Architectures Make for Fairer Face
Recognition [107.58227666024791]
Face recognition systems are widely deployed in safety-critical applications, including law enforcement.
They exhibit bias across a range of socio-demographic dimensions, such as gender and race.
Previous works on bias mitigation largely focused on pre-processing the training data.
arXiv Detail & Related papers (2022-10-18T15:46:05Z) - Anomaly Localization in Model Gradients Under Backdoor Attacks Against
Federated Learning [0.6091702876917281]
In this study, we conduct a deep gradient-level analysis of the expected variations in model gradients under several backdoor attack scenarios.
Our main novel finding is that backdoor-induced anomalies in local model updates (weights or gradients) appear in the final layer bias weights of the malicious local models.
arXiv Detail & Related papers (2021-11-29T16:46:01Z) - PRECODE - A Generic Model Extension to Prevent Deep Gradient Leakage [0.8029049649310213]
Collaborative training of neural networks leverages distributed data by exchanging gradient information between different clients.
Gradient perturbation techniques have been proposed to enhance privacy, but they come at the cost of reduced model performance, increased convergence time, or increased data demand.
We introduce PRECODE, a PRivacy EnhanCing mODulE that can be used as a generic extension for arbitrary model architectures.
arXiv Detail & Related papers (2021-08-10T14:43:17Z) - A Deep Latent Space Model for Graph Representation Learning [10.914558012458425]
We propose a Deep Latent Space Model (DLSM) for directed graphs to incorporate the traditional latent variable based generative model into deep learning frameworks.
Our proposed model consists of a graph convolutional network (GCN) encoder and a decoder, which are layer-wise connected by a hierarchical variational auto-encoder architecture.
Experiments on real-world datasets show that the proposed model achieves state-of-the-art performance on both link prediction and community detection tasks.
arXiv Detail & Related papers (2021-06-22T12:41:19Z) - A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
arXiv Detail & Related papers (2020-10-27T17:56:14Z)