Fairness-aware Vision Transformer via Debiased Self-Attention
- URL: http://arxiv.org/abs/2301.13803v3
- Date: Thu, 11 Jul 2024 02:11:49 GMT
- Title: Fairness-aware Vision Transformer via Debiased Self-Attention
- Authors: Yao Qiang, Chengyin Li, Prashant Khanduri, Dongxiao Zhu
- Abstract summary: Debiased Self-Attention (DSA) is a fairness-through-blindness approach that forces the Vision Transformer (ViT) to eliminate spurious features correlated with the sensitive label for bias mitigation.
Our framework leads to improved fairness guarantees over prior works on multiple prediction tasks without compromising target prediction performance.
- Score: 12.406960223371959
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision Transformer (ViT) has recently gained significant attention for solving computer vision (CV) problems due to its capability of extracting informative features and modeling long-range dependencies through the attention mechanism. While recent works have explored the trustworthiness of ViT, including its robustness and explainability, the issue of fairness has not yet been adequately addressed. We establish that existing fairness-aware algorithms designed for CNNs do not perform well on ViT, which highlights the need for our novel framework via Debiased Self-Attention (DSA). DSA is a fairness-through-blindness approach that forces ViT to eliminate spurious features correlated with the sensitive label for bias mitigation while simultaneously retaining real features for target prediction. Notably, DSA leverages adversarial examples to locate and mask the spurious features in the input image patches, with an additional attention-weights alignment regularizer in the training objective to encourage learning real features for target prediction. Importantly, our DSA framework leads to improved fairness guarantees over prior works on multiple prediction tasks without compromising target prediction performance. Code is available at https://github.com/qiangyao1988/DSA.
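The abstract sketches a three-part training recipe: adversarially locate the input patches that carry sensitive-attribute information, mask them, and add an attention-weights alignment term so the model still attends to task-relevant features. Below is a minimal, hedged sketch of what such a training step could look like in PyTorch. The model interface (`forward_sensitive`, `return_attention=True`), the gradient-based patch scoring, and all helper names are assumptions made for illustration, not the authors' released implementation; consult the linked repository for the exact formulation.

```python
import torch
import torch.nn.functional as F


def mask_patches(images, patch_idx, patch=16):
    """Zero out the selected patches (a simple masking scheme assumed for illustration)."""
    out = images.clone()
    n_cols = images.shape[-1] // patch
    for b in range(images.shape[0]):
        for p in patch_idx[b].tolist():
            r, c = divmod(p, n_cols)
            out[b, :, r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = 0.0
    return out


def find_spurious_patches(model, images, sensitive_labels, top_k=8, patch=16):
    """Score patches by the gradient of a sensitive-attribute loss and keep the top-k;
    a stand-in for the adversarial localization described in the abstract."""
    images = images.clone().requires_grad_(True)
    logits_s = model.forward_sensitive(images)   # assumed auxiliary head for the sensitive label
    loss = F.cross_entropy(logits_s, sensitive_labels)
    (grad,) = torch.autograd.grad(loss, images)
    scores = (grad.abs()
              .unfold(2, patch, patch).unfold(3, patch, patch)  # [B, C, H/p, W/p, p, p]
              .sum(dim=(1, 4, 5))                               # [B, H/p, W/p]
              .flatten(1))                                      # [B, num_patches]
    return scores.topk(top_k, dim=1).indices


def dsa_step(model, images, targets, sensitive_labels, lambda_align=0.1):
    """One hypothetical training step combining masking and attention alignment."""
    # 1) locate and mask patches correlated with the sensitive attribute
    idx = find_spurious_patches(model, images, sensitive_labels)
    masked = mask_patches(images, idx)

    # 2) target-prediction loss on the masked input
    logits, attn_masked = model(masked, return_attention=True)  # assumed interface
    task_loss = F.cross_entropy(logits, targets)

    # 3) attention-weights alignment: keep attention on the masked input close to
    #    attention on the clean input so the model retains real, task-relevant features
    with torch.no_grad():
        _, attn_clean = model(images, return_attention=True)
    align_loss = F.mse_loss(attn_masked, attn_clean)

    return task_loss + lambda_align * align_loss
```

In practice the number of masked patches, the masking scheme, and `lambda_align` would be tuned per task; this sketch only illustrates how the loss terms named in the abstract could fit together.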
Related papers
- Object-Centric Temporal Consistency via Conditional Autoregressive Inductive Biases [69.46487306858789]
Conditional Autoregressive Slot Attention (CA-SA) is a framework that enhances the temporal consistency of extracted object-centric representations in video-centric vision tasks.
We present qualitative and quantitative results showing that our proposed method outperforms the considered baselines on downstream tasks.
arXiv Detail & Related papers (2024-10-21T07:44:44Z) - FairViT: Fair Vision Transformer via Adaptive Masking [12.623905443515802]
Vision Transformer (ViT) has achieved excellent performance and demonstrated its promising potential in various computer vision tasks.
However, most ViT-based works do not take fairness into account, and it is unclear whether directly applying CNN-oriented debiasing algorithms to ViT is feasible.
We propose FairViT, a novel accurate and fair ViT framework.
arXiv Detail & Related papers (2024-07-20T08:10:37Z) - Uncertainty-boosted Robust Video Activity Anticipation [72.14155465769201]
Video activity anticipation aims to predict what will happen in the future, embracing a broad application prospect ranging from robot vision to autonomous driving.
Despite recent progress, the data uncertainty issue, reflected in the content evolution process and the dynamic correlation of event labels, has been largely overlooked.
We propose an uncertainty-boosted robust video activity anticipation framework, which generates uncertainty values to indicate the credibility of the anticipation results.
arXiv Detail & Related papers (2024-04-29T12:31:38Z) - Interpretability-Aware Vision Transformer [13.310757078491916]
Vision Transformers (ViTs) have become prominent models for solving various vision tasks.
We introduce a novel training procedure that inherently enhances model interpretability.
IA-ViT is composed of a feature extractor, a predictor, and an interpreter, which are trained jointly with an interpretability-aware training objective.
arXiv Detail & Related papers (2023-09-14T21:50:49Z) - ARBEx: Attentive Feature Extraction with Reliability Balancing for Robust Facial Expression Learning [5.648318448953635]
ARBEx is a novel attentive feature extraction framework driven by Vision Transformer.
We employ learnable anchor points in the embedding space with label distributions and multi-head self-attention mechanism to optimize performance against weak predictions.
Our strategy outperforms current state-of-the-art methodologies, according to extensive experiments conducted in a variety of contexts.
arXiv Detail & Related papers (2023-05-02T15:10:01Z) - Top-Down Visual Attention from Analysis by Synthesis [87.47527557366593]
We consider top-down attention from a classic Analysis-by-Synthesis (AbS) perspective of vision.
We propose the Analysis-by-Synthesis Vision Transformer (AbSViT), a top-down modulated ViT model that variationally approximates AbS and achieves controllable top-down attention.
arXiv Detail & Related papers (2023-03-23T05:17:05Z) - Function Composition in Trustworthy Machine Learning: Implementation Choices, Insights, and Questions [28.643482049799477]
This paper focuses on compositions of functions arising from the different 'pillars' of trustworthiness.
We report initial empirical results and new insights on 7 real-world trustworthiness dimensions, including fairness and explainability.
We also report progress and implementation choices for a composer tool that encourages combining functionalities from multiple pillars.
arXiv Detail & Related papers (2023-02-17T23:49:16Z) - Understanding The Robustness in Vision Transformers [140.1090560977082]
Self-attention may promote robustness through improved mid-level representations.
We propose a family of fully attentional networks (FANs) that strengthen this capability.
Our model achieves state-of-the-art results of 87.1% accuracy on ImageNet-1k and 35.8% mCE on ImageNet-C with 76.8M parameters.
arXiv Detail & Related papers (2022-04-26T17:16:32Z) - On Exploring Pose Estimation as an Auxiliary Learning Task for Visible-Infrared Person Re-identification [66.58450185833479]
In this paper, we exploit Pose Estimation as an auxiliary learning task to assist the VI-ReID task in an end-to-end framework.
By jointly training these two tasks in a mutually beneficial manner, our model learns higher quality modality-shared and ID-related features.
Experimental results on two benchmark VI-ReID datasets show that the proposed method consistently improves state-of-the-art methods by significant margins.
arXiv Detail & Related papers (2022-01-11T09:44:00Z) - Self-Supervision by Prediction for Object Discovery in Videos [62.87145010885044]
In this paper, we use the prediction task as self-supervision and build a novel object-centric model for image sequence representation.
Our framework can be trained without the help of any manual annotation or pretrained network.
Initial experiments confirm that the proposed pipeline is a promising step towards object-centric video prediction.
arXiv Detail & Related papers (2021-03-09T19:14:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.