Fairness-aware Vision Transformer via Debiased Self-Attention
- URL: http://arxiv.org/abs/2301.13803v3
- Date: Thu, 11 Jul 2024 02:11:49 GMT
- Title: Fairness-aware Vision Transformer via Debiased Self-Attention
- Authors: Yao Qiang, Chengyin Li, Prashant Khanduri, Dongxiao Zhu
- Abstract summary: Debiased Self-Attention (DSA) is a fairness-through-blindness approach that forces the Vision Transformer (ViT) to eliminate spurious features correlated with the sensitive label, thereby mitigating bias.
Our framework achieves stronger fairness guarantees than prior work on multiple prediction tasks without compromising target prediction performance.
- Score: 12.406960223371959
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision Transformer (ViT) has recently gained significant attention in solving computer vision (CV) problems due to its capability of extracting informative features and modeling long-range dependencies through the attention mechanism. While recent works have explored the trustworthiness of ViT, including its robustness and explainability, the issue of fairness has not yet been adequately addressed. We establish that the existing fairness-aware algorithms designed for CNNs do not perform well on ViT, which highlights the need for our novel framework via Debiased Self-Attention (DSA). DSA is a fairness-through-blindness approach that forces the ViT to eliminate spurious features correlated with the sensitive label for bias mitigation while simultaneously retaining real features for target prediction. Notably, DSA leverages adversarial examples to locate and mask the spurious features in the input image patches, with an additional attention-weights alignment regularizer in the training objective to encourage learning real features for target prediction. Importantly, our DSA framework leads to improved fairness guarantees over prior work on multiple prediction tasks without compromising target prediction performance. Code is available at https://github.com/qiangyao1988/DSA.
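The abstract describes a training objective with two parts: a target-prediction loss and an attention-weights alignment regularizer computed between the original input and its spurious-patch-masked counterpart. The snippet below is a minimal PyTorch sketch of that combination; the mean-squared-error distance and the weight `lam` are illustrative assumptions rather than details confirmed by the abstract, so the linked repository remains the authoritative implementation.

```python
import torch
import torch.nn.functional as F


def attention_alignment_loss(attn_clean: torch.Tensor,
                             attn_masked: torch.Tensor) -> torch.Tensor:
    """Penalize divergence between attention weights computed on the
    original image and on its copy with spurious patches masked out.
    Inputs are (batch, heads, tokens, tokens) attention maps from the
    same ViT layer; MSE is an assumed choice of distance."""
    return F.mse_loss(attn_masked, attn_clean)


def dsa_style_objective(logits, targets, attn_clean, attn_masked, lam=0.1):
    """Target-prediction loss plus the alignment regularizer.
    `lam` is a hypothetical trade-off weight, not the paper's value."""
    task_loss = F.cross_entropy(logits, targets)
    return task_loss + lam * attention_alignment_loss(attn_clean, attn_masked)
```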
Related papers
- FairViT: Fair Vision Transformer via Adaptive Masking [12.623905443515802]
Vision Transformer (ViT) has achieved excellent performance and demonstrated its promising potential in various computer vision tasks.
However, most ViT-based works do not take fairness into account, and it is unclear whether CNN-oriented debiasing algorithms can be applied directly to ViT.
We propose FairViT, a novel, accurate, and fair ViT framework.
arXiv Detail & Related papers (2024-07-20T08:10:37Z)
- Uncertainty-boosted Robust Video Activity Anticipation [72.14155465769201]
Video activity anticipation aims to predict what will happen in the future, embracing a broad application prospect ranging from robot vision to autonomous driving.
Despite recent progress, the data uncertainty issue, reflected in the content evolution process and the dynamic correlation among event labels, has been largely ignored.
We propose an uncertainty-boosted robust video activity anticipation framework, which generates uncertainty values to indicate the credibility of the anticipation results.
arXiv Detail & Related papers (2024-04-29T12:31:38Z)
- Interpretability-Aware Vision Transformer [13.310757078491916]
Vision Transformers (ViTs) have become prominent models for solving various vision tasks.
We introduce a novel training procedure that inherently enhances model interpretability.
IA-ViT is composed of a feature extractor, a predictor, and an interpreter, which are trained jointly with an interpretability-aware training objective.
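Since the entry spells out a three-component architecture trained under a joint objective, a toy sketch may clarify the shape of such a setup. Everything here (the linear heads, the KL matching term, the weight `beta`) is a hypothetical illustration, not the paper's implementation.

```python
import torch.nn as nn
import torch.nn.functional as F


class IAViTSketch(nn.Module):
    """Toy three-part layout: a shared feature extractor feeding a
    predictor head and an interpreter head."""

    def __init__(self, extractor: nn.Module, dim: int, num_classes: int):
        super().__init__()
        self.extractor = extractor                      # e.g. a ViT backbone
        self.predictor = nn.Linear(dim, num_classes)    # task head
        self.interpreter = nn.Linear(dim, num_classes)  # explanation head

    def forward(self, x):
        feats = self.extractor(x)                       # (batch, dim) pooled features
        return self.predictor(feats), self.interpreter(feats)


def interpretability_aware_loss(pred_logits, interp_logits, targets, beta=0.5):
    """Task loss plus a term pulling the interpreter's distribution
    toward the predictor's; `beta` is a hypothetical weight."""
    task = F.cross_entropy(pred_logits, targets)
    # Detach the predictor's distribution so the match term only
    # trains the interpreter, a common design choice in such setups.
    match = F.kl_div(F.log_softmax(interp_logits, dim=-1),
                     F.softmax(pred_logits, dim=-1).detach(),
                     reduction="batchmean")
    return task + beta * match
```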
arXiv Detail & Related papers (2023-09-14T21:50:49Z)
- ARBEx: Attentive Feature Extraction with Reliability Balancing for Robust Facial Expression Learning [1.9844265130823329]
ARBEx is a novel attentive feature extraction framework driven by a Vision Transformer.
We employ learnable anchor points in the embedding space, together with label distributions and a multi-head self-attention mechanism, to optimize performance against weak predictions.
Our strategy outperforms current state-of-the-art methodologies, according to extensive experiments conducted in a variety of contexts.
arXiv Detail & Related papers (2023-05-02T15:10:01Z)
- VISION DIFFMASK: Faithful Interpretation of Vision Transformers with Differentiable Patch Masking [10.345616883018296]
We propose a post-hoc interpretability method called VISION DIFFMASK.
It uses the activations of the model's hidden layers to predict the relevant parts of the input that contribute to its final predictions.
Our approach uses a gating mechanism to identify the minimal subset of the original input that preserves the predicted distribution over classes.
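As a rough illustration of the gating idea described above, the sketch below scores patches from a hidden layer's activations and trains the gates to keep the class distribution intact while keeping as few patches as possible. The plain sigmoid gate and the loss weights are simplifying assumptions; the actual method uses a differentiable stochastic masking distribution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PatchGate(nn.Module):
    """Maps a hidden layer's patch activations to keep-probabilities.
    A plain sigmoid stands in for the differentiable masking
    distribution used by the method; this is a simplification."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.scorer = nn.Linear(hidden_dim, 1)

    def forward(self, hidden_states):  # (batch, patches, hidden_dim)
        return torch.sigmoid(self.scorer(hidden_states)).squeeze(-1)


def gate_training_loss(logits_full, logits_masked, gates, sparsity=0.05):
    """Keep the masked input's prediction close to the original
    distribution while penalizing how much of the input is kept;
    `sparsity` is a hypothetical trade-off weight."""
    faithfulness = F.kl_div(F.log_softmax(logits_masked, dim=-1),
                            F.softmax(logits_full, dim=-1),
                            reduction="batchmean")
    return faithfulness + sparsity * gates.mean()
```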
arXiv Detail & Related papers (2023-04-13T10:49:26Z)
- Top-Down Visual Attention from Analysis by Synthesis [87.47527557366593]
We consider top-down attention from a classic Analysis-by-Synthesis (AbS) perspective of vision.
We propose the Analysis-by-Synthesis Vision Transformer (AbSViT), a top-down modulated ViT model that variationally approximates AbS and achieves controllable top-down attention.
arXiv Detail & Related papers (2023-03-23T05:17:05Z)
- Function Composition in Trustworthy Machine Learning: Implementation Choices, Insights, and Questions [28.643482049799477]
This paper focuses on compositions of functions arising from the different 'pillars' of trustworthiness.
We report initial empirical results and new insights on 7 real-world compositions involving the trustworthiness dimensions of fairness and explainability.
We also report progress and implementation choices for a composer tool that encourages combining functionalities from multiple pillars.
arXiv Detail & Related papers (2023-02-17T23:49:16Z)
- Understanding The Robustness in Vision Transformers [140.1090560977082]
Self-attention may promote robustness through improved mid-level representations.
We propose a family of fully attentional networks (FANs) that strengthen this capability.
Our model achieves state-of-the-art results of 87.1% accuracy on ImageNet-1k and 35.8% mCE on ImageNet-C with 76.8M parameters.
arXiv Detail & Related papers (2022-04-26T17:16:32Z)
- On Exploring Pose Estimation as an Auxiliary Learning Task for Visible-Infrared Person Re-identification [66.58450185833479]
In this paper, we exploit pose estimation as an auxiliary learning task to assist the VI-ReID task in an end-to-end framework.
By jointly training the two tasks in a mutually beneficial manner, our model learns higher-quality modality-shared and ID-related features.
Experimental results on two benchmark VI-ReID datasets show that the proposed method consistently improves state-of-the-art methods by significant margins.
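The joint-training recipe described here is the standard auxiliary-task pattern: one shared backbone, two heads, and a weighted sum of losses. Below is a generic sketch under those assumptions; the specific loss forms and the weight `alpha` are illustrative, not taken from the paper.

```python
import torch.nn.functional as F


def joint_reid_pose_loss(reid_logits, person_ids,
                         keypoints_pred, keypoints_gt, alpha=0.5):
    """Sum of the main-task and auxiliary-task losses computed over a
    shared backbone's features; `alpha` is a hypothetical weight."""
    reid_loss = F.cross_entropy(reid_logits, person_ids)  # identity classification
    pose_loss = F.mse_loss(keypoints_pred, keypoints_gt)  # keypoint regression
    return reid_loss + alpha * pose_loss
```

Because both heads backpropagate through the shared backbone, the auxiliary pose gradients shape the features used for re-identification, which is the mutual benefit the entry refers to.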
arXiv Detail & Related papers (2022-01-11T09:44:00Z)
- Intriguing Properties of Vision Transformers [114.28522466830374]
Vision transformers (ViT) have demonstrated impressive performance across various machine vision problems.
We systematically study these properties via an extensive set of experiments and comparisons with a high-performing convolutional neural network (CNN).
We show that the effective features of ViTs are due to the flexible and dynamic receptive fields made possible by the self-attention mechanism.
arXiv Detail & Related papers (2021-05-21T17:59:18Z)
- Self-Supervision by Prediction for Object Discovery in Videos [62.87145010885044]
In this paper, we use the prediction task as self-supervision and build a novel object-centric model for image sequence representation.
Our framework can be trained without the help of any manual annotation or pretrained network.
Initial experiments confirm that the proposed pipeline is a promising step towards object-centric video prediction.
arXiv Detail & Related papers (2021-03-09T19:14:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.