Biased Attention: Do Vision Transformers Amplify Gender Bias More than
Convolutional Neural Networks?
- URL: http://arxiv.org/abs/2309.08760v1
- Date: Fri, 15 Sep 2023 20:59:12 GMT
- Title: Biased Attention: Do Vision Transformers Amplify Gender Bias More than
Convolutional Neural Networks?
- Authors: Abhishek Mandal, Susan Leavy, and Suzanne Little
- Abstract summary: Deep neural networks used in computer vision have been shown to exhibit many social biases such as gender bias.
Vision Transformers (ViTs) have become increasingly popular in computer vision applications, outperforming Convolutional Neural Networks (CNNs) in many tasks such as image classification.
This research found that ViTs amplified gender bias to a greater extent than CNNs.
- Score: 2.8391805742728553
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks used in computer vision have been shown to exhibit many
social biases such as gender bias. Vision Transformers (ViTs) have become
increasingly popular in computer vision applications, outperforming
Convolutional Neural Networks (CNNs) in many tasks such as image
classification. However, given that research on mitigating bias in computer
vision has primarily focused on CNNs, it is important to evaluate the effect of
a different network architecture on the potential for bias amplification. In
this paper we therefore introduce a novel metric to measure bias in
architectures, Accuracy Difference. We examine bias amplification when models
belonging to these two architectures are used as part of large multimodal
models, evaluating the different image encoders of Contrastive Language-Image
Pretraining (CLIP), an important model used in many generative systems such as
DALL-E and Stable Diffusion. Our experiments demonstrate that architecture can
play a role in amplifying social biases due to the different techniques
employed by the models for feature extraction and embedding as well as their
different learning properties. This research found that ViTs amplified gender
bias to a greater extent than CNNs.
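The summary does not spell out how Accuracy Difference is computed; a minimal sketch, assuming it is the absolute gap in downstream classification accuracy between gender subgroups for a given image encoder (all names and data below are hypothetical stand-ins, not the paper's implementation):

```python
import numpy as np

def accuracy_difference(preds, labels, groups):
    """Hypothetical Accuracy Difference bias metric: the absolute gap
    in classification accuracy between two gender subgroups.
    Assumes `groups` holds 0/1 gender annotations per sample."""
    preds, labels, groups = map(np.asarray, (preds, labels, groups))
    acc = lambda mask: (preds[mask] == labels[mask]).mean()
    return abs(acc(groups == 0) - acc(groups == 1))

# Hypothetical usage: compare bias for a ViT vs. a CNN CLIP image encoder,
# given each encoder's predictions on the same annotated evaluation set.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 1000)      # ground-truth class labels
groups = rng.integers(0, 2, 1000)      # gender annotation per image
vit_preds = rng.integers(0, 2, 1000)   # stand-in for ViT encoder predictions
cnn_preds = rng.integers(0, 2, 1000)   # stand-in for CNN encoder predictions

print("ViT accuracy difference:", accuracy_difference(vit_preds, labels, groups))
print("CNN accuracy difference:", accuracy_difference(cnn_preds, labels, groups))
```

Under this reading, a larger value for one encoder would indicate greater bias amplification by that architecture.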
Related papers
- Combined CNN and ViT features off-the-shelf: Another astounding baseline for recognition [49.14350399025926]
We apply pre-trained architectures, originally developed for the ImageNet Large Scale Visual Recognition Challenge, to periocular recognition.
Middle-layer features from CNNs and ViTs are a suitable way to recognize individuals based on periocular images.
arXiv Detail & Related papers (2024-07-28T11:52:36Z)
- Super Consistency of Neural Network Landscapes and Learning Rate Transfer [72.54450821671624]
We study the landscape through the lens of the loss Hessian.
We find that certain spectral properties under $\mu$P are largely independent of the size of the network.
We show that in the Neural Tangent Kernel (NTK) and other scaling regimes, the sharpness exhibits very different dynamics at different scales.
arXiv Detail & Related papers (2024-02-27T12:28:01Z)
- ViTs are Everywhere: A Comprehensive Study Showcasing Vision Transformers in Different Domain [0.0]
Vision Transformers (ViTs) are becoming more popular and dominant solutions for many vision problems.
ViTs can overcome several possible difficulties with convolutional neural networks (CNNs).
arXiv Detail & Related papers (2023-10-09T12:31:30Z)
- A Multidimensional Analysis of Social Biases in Vision Transformers [15.98510071115958]
We measure the impact of training data, model architecture, and training objectives on social biases in Vision Transformers (ViTs).
Our findings indicate that counterfactual augmentation training using diffusion-based image editing can mitigate biases, but does not eliminate them.
We find that larger models are less biased than smaller models, and that models trained using discriminative objectives are less biased than those trained using generative objectives.
arXiv Detail & Related papers (2023-08-03T09:03:40Z)
- Connecting metrics for shape-texture knowledge in computer vision [1.7785095623975342]
Deep neural networks remain brittle, susceptible to many changes in an image that do not cause humans to misclassify it.
Part of this different behavior may be explained by the type of features humans and deep neural networks use in vision tasks.
arXiv Detail & Related papers (2023-01-25T14:37:42Z)
- Prune and distill: similar reformatting of image information along rat visual cortex and deep neural networks [61.60177890353585]
Deep convolutional neural networks (CNNs) have been shown to provide excellent models for their functional analogue in the brain, the ventral stream in visual cortex.
Here we consider some prominent statistical patterns that are known to exist in the internal representations of either CNNs or the visual cortex.
We show that CNNs and visual cortex share a similarly tight relationship between dimensionality expansion/reduction of object representations and reformatting of image information.
arXiv Detail & Related papers (2022-05-27T08:06:40Z)
- Do Vision Transformers See Like Convolutional Neural Networks? [45.69780772718875]
Recent work has shown that (Vision) Transformer models (ViT) can achieve comparable or even superior performance on image classification tasks.
Are they acting like convolutional networks, or learning entirely different visual representations?
We find striking differences between the two architectures, such as ViT having more uniform representations across all layers; a sketch of this kind of layer-wise representation comparison follows the list below.
arXiv Detail & Related papers (2021-08-19T17:27:03Z)
- Are Convolutional Neural Networks or Transformers more like human vision? [9.83454308668432]
We show that attention-based networks can achieve higher accuracy than CNNs on vision tasks.
These results have implications both for building more human-like vision models, as well as for understanding visual object recognition in humans.
arXiv Detail & Related papers (2021-05-15T10:33:35Z)
- Transformers in Vision: A Survey [101.07348618962111]
Transformers enable modeling long-range dependencies between input sequence elements and support parallel processing of sequences.
Transformers require minimal inductive biases for their design and are naturally suited as set-functions.
This survey aims to provide a comprehensive overview of the Transformer models in the computer vision discipline.
arXiv Detail & Related papers (2021-01-04T18:57:24Z)
- A Survey on Visual Transformer [126.56860258176324]
Transformer is a type of deep neural network mainly based on the self-attention mechanism.
In this paper, we review these vision transformer models by categorizing them in different tasks and analyzing their advantages and disadvantages.
arXiv Detail & Related papers (2020-12-23T09:37:54Z)
- Learning from Failure: Training Debiased Classifier from Biased Classifier [76.52804102765931]
We show that neural networks learn to rely on spurious correlation only when it is "easier" to learn than the desired knowledge.
We propose a failure-based debiasing scheme by training a pair of neural networks simultaneously.
Our method significantly improves the training of the network against various types of biases in both synthetic and real-world datasets.
arXiv Detail & Related papers (2020-07-06T07:20:29Z)
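As referenced in the entry on "Do Vision Transformers See Like Convolutional Neural Networks?" above, layer-wise representation comparisons of this kind are commonly done with linear centered kernel alignment (CKA). A minimal NumPy sketch of linear CKA, not that paper's exact implementation; the activation matrices below are hypothetical stand-ins:

```python
import numpy as np

def linear_cka(x, y):
    """Linear CKA similarity between two activation matrices of shape
    (n_samples, n_features). Values near 1 indicate similar representations;
    feature dimensions may differ between the two matrices."""
    x = x - x.mean(axis=0)               # center each feature column
    y = y - y.mean(axis=0)
    hsic = np.linalg.norm(y.T @ x) ** 2  # squared Frobenius norm of cross-covariance
    norm_x = np.linalg.norm(x.T @ x)     # Frobenius norms of self-covariances
    norm_y = np.linalg.norm(y.T @ y)
    return hsic / (norm_x * norm_y)

# Hypothetical usage: compare a ViT layer's pooled patch features with a
# CNN layer's spatially pooled features over the same batch of images.
rng = np.random.default_rng(0)
vit_feats = rng.standard_normal((256, 768))   # stand-in ViT activations
cnn_feats = rng.standard_normal((256, 512))   # stand-in CNN activations
print("CKA(ViT layer, CNN layer):", linear_cka(vit_feats, cnn_feats))
```

Computing this score for every pair of layers yields the similarity maps on which findings such as "ViTs have more uniform representations across layers" are based.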
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.