Explainability of Vision Transformers: A Comprehensive Review and New
Perspectives
- URL: http://arxiv.org/abs/2311.06786v1
- Date: Sun, 12 Nov 2023 09:23:40 GMT
- Title: Explainability of Vision Transformers: A Comprehensive Review and New
Perspectives
- Authors: Rojina Kashefi, Leili Barekatain, Mohammad Sabokrou, Fatemeh
Aghaeipoor
- Abstract summary: Transformers have had a significant impact on natural language processing and have recently demonstrated their potential in computer vision.
This study explores different explainability methods proposed for visual transformers and presents a taxonomy for organizing them.
It provides a comprehensive review of evaluation criteria that can be used for comparing explanation results.
- Score: 11.853186902106067
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Transformers have had a significant impact on natural language processing and
have recently demonstrated their potential in computer vision. They have shown
promising results over convolutional neural networks in fundamental computer
vision tasks. However, the scientific community has not fully grasped the inner
workings of vision transformers, nor the basis for their decision-making, which
underscores the importance of explainability methods. Understanding how these
models arrive at their decisions not only improves their performance but also
builds trust in AI systems. This study explores different explainability
methods proposed for visual transformers and presents a taxonomy for organizing
them according to their motivations, structures, and application scenarios. In
addition, it provides a comprehensive review of evaluation criteria that can be
used for comparing explanation results, as well as explainability tools and
frameworks. Finally, the paper highlights essential but unexplored aspects that
can enhance the explainability of visual transformers, and promising research
directions are suggested for future investigation.
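To make "explainability methods" concrete, a widely cited attention-based technique covered by surveys of this kind is attention rollout (Abnar & Zuidema, 2020), which aggregates a transformer's per-layer attention maps into token-level relevance scores. The sketch below is illustrative, not code from the paper: the function name, the 1 CLS + 196 patch token layout, and the random stand-in attention maps are assumptions for demonstration.

```python
import numpy as np

def attention_rollout(attentions):
    """Aggregate per-layer attention maps into token-level relevance.

    attentions: list of (num_tokens, num_tokens) arrays, one per layer,
    each averaged over heads with rows summing to 1. Following Abnar &
    Zuidema (2020), residual connections are modeled by adding the
    identity before re-normalizing, and layers are combined by matrix
    multiplication.
    """
    num_tokens = attentions[0].shape[0]
    rollout = np.eye(num_tokens)
    for attn in attentions:
        attn_res = attn + np.eye(num_tokens)               # account for the residual stream
        attn_res /= attn_res.sum(axis=-1, keepdims=True)   # re-normalize rows
        rollout = attn_res @ rollout                       # propagate relevance through the layer
    return rollout

# Toy usage with random stand-in attention maps (assumed: 1 CLS + 196 patch tokens, 12 layers).
rng = np.random.default_rng(0)
layers = []
for _ in range(12):
    a = rng.random((197, 197))
    layers.append(a / a.sum(axis=-1, keepdims=True))       # rows sum to 1
rollout = attention_rollout(layers)
cls_relevance = rollout[0, 1:]   # relevance of each patch to the CLS token
print(cls_relevance.shape)       # (196,)
```

In a real ViT, the CLS row of the rolled-out matrix would be reshaped to the 14x14 patch grid and upsampled to overlay a relevance heatmap on the input image.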
Related papers
- A Review of Transformer-Based Models for Computer Vision Tasks: Capturing Global Context and Spatial Relationships [0.5639904484784127]
Transformer-based models have transformed the landscape of natural language processing (NLP).
These models are renowned for their ability to capture long-range dependencies and contextual information.
We discuss potential research directions and applications of transformer-based models in computer vision.
arXiv Detail & Related papers (2024-08-27T16:22:18Z) - Adventures of Trustworthy Vision-Language Models: A Survey [54.76511683427566]
This paper conducts a thorough examination of vision-language transformers, employing three fundamental principles of responsible AI: Bias, Robustness, and Interpretability.
The paper's primary objective is to examine the intricacies of using transformers in practice, with the overarching goal of advancing our understanding of how to enhance their reliability and accountability.
arXiv Detail & Related papers (2023-12-07T11:31:20Z) - Interpret Vision Transformers as ConvNets with Dynamic Convolutions [70.59235381143831]
We interpret vision Transformers as ConvNets with dynamic convolutions, which enables us to characterize existing Transformers and dynamic ConvNets in a unified framework.
Our interpretation can also guide network design, as researchers can now consider vision Transformers from the design space of ConvNets.
arXiv Detail & Related papers (2023-09-19T16:00:49Z) - What Makes for Good Tokenizers in Vision Transformer? [62.44987486771936]
Transformers are capable of extracting pairwise relationships among tokens using self-attention.
What makes for a good tokenizer has not been well understood in computer vision.
Modulation across Tokens (MoTo) incorporates an inter-token modeling capability through normalization.
The regularization objective TokenProp is adopted within the standard training regime.
arXiv Detail & Related papers (2022-12-21T15:51:43Z) - Vision Transformers: State of the Art and Research Challenges [26.462994554165697]
This paper presents a comprehensive overview of the literature on different architecture designs and training tricks for vision transformers.
Our goal is to provide a systematic review along with open research opportunities.
arXiv Detail & Related papers (2022-07-07T02:01:56Z) - Visualizing and Understanding Patch Interactions in Vision Transformer [96.70401478061076]
Vision Transformer (ViT) has become a leading tool in various computer vision tasks.
We propose a novel explainable visualization approach to analyze and interpret the crucial attention interactions among patches in vision transformers.
arXiv Detail & Related papers (2022-03-11T13:48:11Z) - Transformers in Vision: A Survey [101.07348618962111]
Transformers enable modeling long dependencies between input sequence elements and support parallel processing of sequences.
Transformers require minimal inductive biases for their design and are naturally suited as set-functions (a minimal self-attention sketch follows this list).
This survey aims to provide a comprehensive overview of the Transformer models in the computer vision discipline.
arXiv Detail & Related papers (2021-01-04T18:57:24Z) - A Survey on Visual Transformer [126.56860258176324]
The Transformer is a type of deep neural network based mainly on the self-attention mechanism.
In this paper, we review these vision transformer models by categorizing them in different tasks and analyzing their advantages and disadvantages.
arXiv Detail & Related papers (2020-12-23T09:37:54Z)
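As the last two entries note, the core of these models is self-attention, which computes pairwise relationships between all sequence elements in parallel. Below is a minimal sketch of single-head scaled dot-product self-attention; the token count, model width, and random weights are illustrative assumptions, not values from any of the papers above.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: (num_tokens, d_model) token embeddings (e.g., image patches in a ViT).
    Every token attends to every other token, so long-range dependencies
    are captured in one step and all tokens are processed in parallel.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = k.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)      # pairwise token-token similarities
    weights = softmax(scores, axis=-1)   # each row: attention over all tokens
    return weights @ v                   # weighted mix of value vectors

# Toy usage: 197 tokens (1 CLS + 196 patches), model width 64 (assumed sizes).
rng = np.random.default_rng(0)
x = rng.standard_normal((197, 64))
w_q, w_k, w_v = (rng.standard_normal((64, 64)) * 0.1 for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (197, 64)
```

Note that without positional encodings this operation is permutation-equivariant, which is why Transformers are natural set-functions and why ViTs add position embeddings to their patch tokens.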
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.