Utilizing Adversarial Examples for Bias Mitigation and Accuracy Enhancement
- URL: http://arxiv.org/abs/2404.11819v2
- Date: Thu, 27 Jun 2024 23:16:58 GMT
- Title: Utilizing Adversarial Examples for Bias Mitigation and Accuracy Enhancement
- Authors: Pushkar Shukla, Dhruv Srikanth, Lee Cohen, Matthew Turk
- Abstract summary: We propose a novel approach to mitigate biases in computer vision models by utilizing counterfactual generation and fine-tuning.
Our approach leverages a curriculum learning framework combined with a fine-grained adversarial loss to fine-tune the model using adversarial examples.
We validate our approach through both qualitative and quantitative assessments, demonstrating improved bias mitigation and accuracy compared to existing methods.
- Score: 3.0820287240219795
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a novel approach to mitigating biases in computer vision models by utilizing counterfactual generation and fine-tuning. While counterfactuals have been used to analyze and address biases in DNN models, the counterfactuals themselves are often generated from biased generative models, which can introduce additional biases or spurious correlations. To address this issue, we propose using adversarial images, that is, images that deceive a deep neural network but not humans, as counterfactuals for fair model training. Our approach leverages a curriculum learning framework combined with a fine-grained adversarial loss to fine-tune the model using adversarial examples. By incorporating adversarial images into the training data, we aim to prevent biases from propagating through the pipeline. We validate our approach through both qualitative and quantitative assessments, demonstrating improved bias mitigation and accuracy compared to existing methods. Qualitatively, our results indicate that, post-training, the decisions made by the model are less dependent on the sensitive attribute, and the model better disentangles the relationship between sensitive attributes and classification variables.
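To make the general mechanism concrete, the sketch below illustrates the two ingredients named in the abstract: generating adversarial images (here with a standard FGSM perturbation, used as a stand-in for the paper's counterfactual generation) and fine-tuning on progressively stronger examples as a crude curriculum. The model, the epsilon schedule, and the loss weight are assumptions for illustration; this is not the authors' fine-grained adversarial loss or exact curriculum.

```python
# Minimal PyTorch sketch: FGSM-style adversarial images used as training
# counterfactuals, with a naive epsilon curriculum. Illustrative only; it
# does not implement the paper's fine-grained adversarial loss.
import torch
import torch.nn.functional as F

def fgsm_adversarial(model, images, labels, epsilon=0.03):
    """Perturb images so the model is fooled while the change stays small
    (standard FGSM, assumed here as a stand-in for counterfactual generation)."""
    images = images.clone().detach().requires_grad_(True)
    F.cross_entropy(model(images), labels).backward()
    adv = images + epsilon * images.grad.sign()
    return adv.clamp(0.0, 1.0).detach()

def finetune_with_adversarials(model, loader, optimizer,
                               epsilons=(0.01, 0.02, 0.03), adv_weight=0.5):
    """Fine-tune on clean plus adversarial batches, increasing epsilon each
    pass as a crude curriculum (the schedule and weight are assumptions)."""
    model.train()
    for eps in epsilons:                       # easy -> hard examples
        for images, labels in loader:
            adv = fgsm_adversarial(model, images, labels, epsilon=eps)
            optimizer.zero_grad()              # clear grads left by the FGSM pass
            loss = F.cross_entropy(model(images), labels) \
                 + adv_weight * F.cross_entropy(model(adv), labels)
            loss.backward()
            optimizer.step()
```

Mixing the clean and adversarial terms is intended to preserve natural accuracy while reducing the model's reliance on the features the perturbation exploits; the relative weighting is a design choice, not a value taken from the paper.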
Related papers
- Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has conventionally been considered a challenging property to encode into neural networks.
We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z) - Improving Bias Mitigation through Bias Experts in Natural Language Understanding [10.363406065066538]
We propose a new debiasing framework that introduces binary classifiers between the auxiliary model and the main model.
Our proposed strategy improves the bias identification ability of the auxiliary model.
arXiv Detail & Related papers (2023-12-06T16:15:00Z) - Fast Model Debias with Machine Unlearning [54.32026474971696]
Deep neural networks might behave in a biased manner in many real-world scenarios.
Existing debiasing methods suffer from high costs in bias labeling or model re-training.
We propose a fast model debiasing framework (FMD) which offers an efficient approach to identify, evaluate and remove biases.
arXiv Detail & Related papers (2023-10-19T08:10:57Z) - Toward Fair Facial Expression Recognition with Improved Distribution Alignment [19.442685015494316]
We present a novel approach to mitigate bias in facial expression recognition (FER) models.
Our method aims to reduce sensitive attribute information such as gender, age, or race, in the embeddings produced by FER models.
For the first time, we analyze the notion of attractiveness as an important sensitive attribute in FER models and demonstrate that FER models can indeed exhibit biases towards more attractive faces.
arXiv Detail & Related papers (2023-06-11T14:59:20Z) - Through a fair looking-glass: mitigating bias in image datasets [1.0323063834827415]
We present a fast and effective model to de-bias an image dataset through reconstruction and minimization of the statistical dependence between intended variables.
We evaluate our proposed model on CelebA dataset, compare the results with a state-of-the-art de-biasing method, and show that the model achieves a promising fairness-accuracy combination.
arXiv Detail & Related papers (2022-09-18T20:28:36Z) - Latent Boundary-guided Adversarial Training [61.43040235982727]
Adversarial training, which injects adversarial examples into model training, has proven to be the most effective defense strategy.
We propose a novel adversarial training framework called LAtent bounDary-guided aDvErsarial tRaining.
arXiv Detail & Related papers (2022-06-08T07:40:55Z) - Robust Sensible Adversarial Learning of Deep Neural Networks for Image Classification [6.594522185216161]
We introduce sensible adversarial learning and demonstrate the synergistic effect between pursuits of standard natural accuracy and robustness.
Specifically, we define a sensible adversary which is useful for learning a robust model while keeping high natural accuracy.
We propose a novel and efficient algorithm that trains a robust model using implicit loss truncation.
arXiv Detail & Related papers (2022-05-20T22:57:44Z) - General Greedy De-bias Learning [163.65789778416172]
We propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model, analogous to gradient descent in functional space.
GGD can learn a more robust base model in both settings: task-specific biased models with prior knowledge and self-ensemble biased models without prior knowledge.
arXiv Detail & Related papers (2021-12-20T14:47:32Z) - A Relational Model for One-Shot Classification [80.77724423309184]
We show that a deep learning model with built-in inductive bias can bring benefits to sample-efficient learning, without relying on extensive data augmentation.
The proposed one-shot classification model performs relational matching of a pair of inputs in the form of local and pairwise attention.
arXiv Detail & Related papers (2021-11-08T07:53:12Z) - Harnessing Perceptual Adversarial Patches for Crowd Counting [92.79051296850405]
Crowd counting is vulnerable to adversarial examples in the physical world.
This paper proposes the Perceptual Adversarial Patch (PAP) generation framework to learn the shared perceptual features between models.
arXiv Detail & Related papers (2021-09-16T13:51:39Z) - Evaluating and Mitigating Bias in Image Classifiers: A Causal Perspective Using Counterfactuals [27.539001365348906]
We present a method for generating counterfactuals by incorporating a structural causal model (SCM) into an improved variant of Adversarially Learned Inference (ALI).
We show how to explain a pre-trained machine learning classifier, evaluate its bias, and mitigate the bias using a counterfactual regularizer (a generic sketch of such a regularizer follows this list).
arXiv Detail & Related papers (2020-09-17T13:19:31Z)
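To illustrate the counterfactual-regularizer idea in the last entry, here is a minimal sketch of a generic counterfactual-consistency penalty: the classifier is discouraged from changing its prediction between an image and a counterfactual version of it in which the sensitive attribute is altered. The function names, the KL-divergence form, and the weight `lam` are assumptions introduced for illustration; the cited paper builds its counterfactuals with an SCM inside an improved ALI variant, which is not reproduced here.

```python
# Generic counterfactual-consistency regularizer: penalize the classifier
# when its prediction shifts between an image and a counterfactual of the
# same image with the sensitive attribute changed. The KL form and the
# weight `lam` are assumptions; the cited paper's regularizer may differ.
import torch.nn.functional as F

def counterfactual_penalty(model, x, x_cf, lam=1.0):
    """KL divergence between predictions on an image and its counterfactual."""
    log_p = F.log_softmax(model(x), dim=1)   # log-probabilities on the original
    p_cf = F.softmax(model(x_cf), dim=1)     # probabilities on the counterfactual
    return lam * F.kl_div(log_p, p_cf, reduction="batchmean")

def debias_step(model, optimizer, x, y, x_cf, lam=1.0):
    """One training step: task loss plus the counterfactual consistency term."""
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y) + counterfactual_penalty(model, x, x_cf, lam)
    loss.backward()
    optimizer.step()
    return loss.item()
```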