Causal Attention for Unbiased Visual Recognition
- URL: http://arxiv.org/abs/2108.08782v1
- Date: Thu, 19 Aug 2021 16:45:51 GMT
- Title: Causal Attention for Unbiased Visual Recognition
- Authors: Tan Wang, Chang Zhou, Qianru Sun, Hanwang Zhang
- Abstract summary: The attention module does not always help deep models learn causal features that are robust in any confounding context.
We propose a causal attention module (CaaM) that self-annotates the confounders in an unsupervised fashion.
In OOD settings, deep models with CaaM significantly outperform those without it.
- Score: 76.87114090435618
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The attention module does not always help deep models learn causal features that
are robust in any confounding context, e.g., a foreground object feature that is
invariant to different backgrounds. This is because the confounders trick the
attention into capturing spurious correlations that benefit the prediction when the
training and testing data are IID (independent and identically distributed) but
harm the prediction when the data are OOD (out-of-distribution). The only
fundamental solution for learning causal attention is causal intervention, which
requires additional annotations of the confounders, e.g., a "dog" model is
learned within "grass+dog" and "road+dog" respectively, so the "grass" and
"road" contexts will no longer confound the "dog" recognition. However, such
annotation is not only prohibitively expensive, but also inherently
problematic, as the confounders are elusive in nature. In this paper, we
propose a causal attention module (CaaM) that self-annotates the confounders in
an unsupervised fashion. In particular, multiple CaaMs can be stacked and
integrated into conventional attention CNNs and self-attention Vision Transformers.
In OOD settings, deep models with CaaM significantly outperform those
without it; even in IID settings, attention localization is also
improved by CaaM, showing great potential in applications that require robust
visual saliency. Code is available at https://github.com/Wangt-CN/CaaM.
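To make the intervention concrete: backdoor adjustment replaces P(Y|X) with P(Y|do(X)) = sum_t P(Y|X, t) P(t), i.e., the prediction is averaged over confounder strata t (the "grass" and "road" contexts above) rather than weighted by how often each context co-occurs with the class, and CaaM learns such strata instead of requiring annotations. Below is a minimal sketch of one way an attention block can split a feature map into an attended (candidate causal) part and a complementary (candidate confounder) part; the module name, shapes, and 1x1-convolution attention are illustrative assumptions, not the authors' released implementation (see the repository above for that).

```python
# Minimal sketch of a CaaM-style block (assumptions, not the authors' code):
# a spatial attention map picks out the presumed-causal foreground feature,
# and its complement collects the presumed-confounding context, so the two
# branches can later be trained with different objectives.
import torch
import torch.nn as nn

class CausalAttentionBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 conv producing a single-channel spatial attention map
        self.attn = nn.Sequential(nn.Conv2d(channels, 1, kernel_size=1),
                                  nn.Sigmoid())

    def forward(self, x: torch.Tensor):
        a = self.attn(x)              # (B, 1, H, W) attention weights in [0, 1]
        causal = x * a                # attended branch: candidate causal feature
        confounder = x * (1.0 - a)    # complementary branch: candidate confounder
        return causal, confounder

# Usage: stack several blocks inside a CNN backbone and pool each branch
# for separate classification heads.
feat = torch.randn(2, 256, 14, 14)
causal, conf = CausalAttentionBlock(256)(feat)
print(causal.shape, conf.shape)   # torch.Size([2, 256, 14, 14]) twice
```

The abstract's "grass+dog" versus "road+dog" example then corresponds to training the two branches over context partitions that are learned rather than annotated, so that the causal branch stays predictive across contexts.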
Related papers
- Seeing Through VisualBERT: A Causal Adventure on Memetic Landscapes [35.36331164446824]
We propose a framework based on a Structural Causal Model (SCM)
In this framework, VisualBERT is trained to predict the class of an input meme based on both meme input and causal concepts.
We find that input attribution methods do not guarantee causality within our framework, raising questions about their reliability in safety-critical applications.
arXiv Detail & Related papers (2024-10-17T12:32:00Z)
- When Attention Sink Emerges in Language Models: An Empirical View [39.36282162213973]
Language Models (LMs) assign significant attention to the first token, even if it is not semantically important.
This phenomenon has been widely adopted in applications such as streaming/long context generation, KV cache optimization, inference acceleration, model quantization, and others.
We first demonstrate that attention sinks exist universally in LMs with various inputs, even in small models.
arXiv Detail & Related papers (2024-10-14T17:50:28Z)
- Guiding Visual Question Answering with Attention Priors [76.21671164766073]
We propose to guide the attention mechanism using explicit linguistic-visual grounding.
This grounding is derived by connecting structured linguistic concepts in the query to their referents among the visual objects.
The resultant algorithm is capable of probing attention-based reasoning models, injecting relevant associative knowledge, and regulating the core reasoning process.
arXiv Detail & Related papers (2022-05-25T09:53:47Z)
- Learning Target-aware Representation for Visual Tracking via Informative Interactions [49.552877881662475]
We introduce a novel backbone architecture to improve the target-perception ability of feature representations for tracking.
The proposed GIM module and InBN mechanism are general and applicable to different backbone types including CNN and Transformer.
arXiv Detail & Related papers (2022-01-07T16:22:27Z)
- Vision Transformer with Deformable Attention [29.935891419574602]
A large, sometimes even global, receptive field endows Transformer models with higher representation power than their CNN counterparts.
We propose a novel deformable self-attention module, where the positions of key and value pairs in self-attention are selected in a data-dependent way.
We present Deformable Attention Transformer, a general backbone model with deformable attention for both image classification and dense prediction tasks (a rough sketch of the data-dependent sampling idea appears after this list).
arXiv Detail & Related papers (2022-01-03T08:29:01Z)
- Deconfounded Video Moment Retrieval with Causal Intervention [80.90604360072831]
We tackle the task of video moment retrieval (VMR), which aims to localize a specific moment in a video according to a textual query.
Existing methods primarily model the matching relationship between query and moment by complex cross-modal interactions.
We propose a causality-inspired VMR framework that builds a structural causal model to capture the true effect of query and video content on the prediction.
arXiv Detail & Related papers (2021-06-03T01:33:26Z)
- Causal Attention for Vision-Language Tasks [142.82608295995652]
We present a novel attention mechanism: Causal Attention (CATT).
CATT removes the ever-elusive confounding effect in existing attention-based vision-language models.
In particular, we show that CATT has great potential in large-scale pre-training.
arXiv Detail & Related papers (2021-03-05T06:38:25Z)
- SparseBERT: Rethinking the Importance Analysis in Self-attention [107.68072039537311]
Transformer-based models are popular for natural language processing (NLP) tasks due to their powerful capacity.
Attention map visualization of a pre-trained model is one direct method for understanding the self-attention mechanism.
We propose a Differentiable Attention Mask (DAM) algorithm, which can also be applied to guide the design of SparseBERT.
arXiv Detail & Related papers (2021-02-25T14:13:44Z)
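Referring back to the Deformable Attention Transformer entry above, the following is a rough sketch of the general idea of data-dependent key/value sampling only: a small network predicts normalized sampling locations, key/value features are bilinearly sampled there, and ordinary attention follows. The class name, shapes, and the globally pooled location predictor are simplifying assumptions, not that paper's architecture.

```python
# Rough illustration of data-dependent key/value sampling (assumptions only;
# not the Deformable Attention Transformer implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableSelfAttention(nn.Module):
    def __init__(self, dim: int, n_samples: int = 16):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)
        # predicts (x, y) sampling locations in [-1, 1] for n_samples points
        self.loc = nn.Linear(dim, 2 * n_samples)
        self.n_samples = n_samples

    def forward(self, x: torch.Tensor):
        # x: (B, C, H, W) feature map
        B, C, H, W = x.shape
        tokens = x.flatten(2).transpose(1, 2)                  # (B, H*W, C)
        q = self.q(tokens)                                     # (B, H*W, C)
        # data-dependent sampling locations from a pooled descriptor
        grid = torch.tanh(self.loc(tokens.mean(dim=1)))        # (B, 2*n)
        grid = grid.view(B, self.n_samples, 1, 2)              # (B, n, 1, 2)
        sampled = F.grid_sample(x, grid, align_corners=False)  # (B, C, n, 1)
        kv = sampled.squeeze(-1).transpose(1, 2)               # (B, n, C)
        k, v = self.kv(kv).chunk(2, dim=-1)                    # (B, n, C) each
        attn = torch.softmax(q @ k.transpose(1, 2) / C ** 0.5, dim=-1)
        out = attn @ v                                         # (B, H*W, C)
        return out.transpose(1, 2).reshape(B, C, H, W)

# Usage on a dummy feature map:
y = DeformableSelfAttention(64)(torch.randn(2, 64, 8, 8))
print(y.shape)  # torch.Size([2, 64, 8, 8])
```

Predicting a single set of sampling locations per image keeps the example short; the paper's actual design is more elaborate.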
This list is automatically generated from the titles and abstracts of the papers in this site.