Towards Evaluating Explanations of Vision Transformers for Medical
Imaging
- URL: http://arxiv.org/abs/2304.06133v1
- Date: Wed, 12 Apr 2023 19:37:28 GMT
- Title: Towards Evaluating Explanations of Vision Transformers for Medical
Imaging
- Authors: Piotr Komorowski, Hubert Baniecki, Przemysław Biecek
- Abstract summary: Vision Transformer (ViT) is a promising alternative to convolutional neural networks for image classification.
This paper investigates the performance of various interpretation methods on a ViT applied to classify chest X-ray images.
- Score: 7.812073412066698
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As deep learning models increasingly find applications in critical domains
such as medical imaging, the need for transparent and trustworthy
decision-making becomes paramount. Many explainability methods provide insights
into how these models make predictions by attributing importance to input
features. As Vision Transformer (ViT) becomes a promising alternative to
convolutional neural networks for image classification, its interpretability
remains an open research question. This paper investigates the performance of
various interpretation methods on a ViT applied to classify chest X-ray images.
We introduce the notion of evaluating faithfulness, sensitivity, and complexity
of ViT explanations. The obtained results indicate that Layerwise relevance
propagation for transformers outperforms Local interpretable model-agnostic
explanations and Attention visualization, providing a more accurate and
reliable representation of what a ViT has actually learned. Our findings
provide insights into the applicability of ViT explanations in medical imaging
and highlight the importance of using appropriate evaluation criteria for
comparing them.
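To make the evaluation criteria concrete, below is a minimal, hedged sketch of a deletion-style faithfulness check for a ViT attribution map. It is not the authors' implementation; the model, preprocessing, and the `attribution` map (which could come from LRP for transformers, LIME, or attention visualization) are placeholders.

```python
import torch


def deletion_faithfulness(model, image, attribution, target_class,
                          steps=20, baseline=0.0):
    """Deletion-style faithfulness: mask the most-attributed pixels first and
    track how fast the target-class probability drops. A steeper drop (lower
    area under the deletion curve) suggests a more faithful explanation.
    `image` is a preprocessed (C, H, W) tensor, `attribution` is (H, W)."""
    model.eval()
    c, h, w = image.shape
    # Rank pixels from most to least important according to the explanation.
    order = attribution.flatten().argsort(descending=True)
    pixels_per_step = max(1, order.numel() // steps)
    masked = image.clone()
    probs = []
    with torch.no_grad():
        for step in range(steps + 1):
            logits = model(masked.unsqueeze(0))
            probs.append(torch.softmax(logits, dim=-1)[0, target_class].item())
            # Delete the next chunk of highest-attribution pixels (all channels).
            idx = order[step * pixels_per_step:(step + 1) * pixels_per_step]
            masked.view(c, -1)[:, idx] = baseline
    # Mean probability over the curve approximates the area under it.
    return sum(probs) / len(probs), probs
```
In the same spirit, sensitivity can be probed by comparing explanations of slightly perturbed copies of the input, and complexity by the entropy of the normalized attribution map; toolkits such as Quantus package standardized versions of these metrics.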
Related papers
- Evaluating Visual Explanations of Attention Maps for Transformer-based Medical Imaging [2.6505619784178047]
We compare visual explanations of attention maps to other commonly used methods for medical imaging problems.
We find that attention maps show promise under certain conditions and generally surpass GradCAM in explainability.
Our findings indicate that the efficacy of attention maps as a method of interpretability is context-dependent and may be limited as they do not consistently provide the comprehensive insights required for robust medical decision-making.
arXiv Detail & Related papers (2025-03-12T16:52:52Z)
- Hierarchical Vision Transformer with Prototypes for Interpretable Medical Image Classification [0.0]
We present HierViT, a Vision Transformer that is inherently interpretable and adapts its reasoning to that of humans.
It is evaluated on two medical benchmark datasets, LIDC-IDRI for lung assessment and derm7pt for skin lesion classification.
arXiv Detail & Related papers (2025-02-13T06:24:07Z)
- Adaptive Knowledge Distillation for Classification of Hand Images using Explainable Vision Transformers [2.140951338124305]
This paper investigates the use of vision transformers (ViTs) for classification of hand images.
We use explainability tools to explore the internal representations of ViTs and assess their impact on the model outputs.
arXiv Detail & Related papers (2024-08-20T03:03:56Z)
- A Recent Survey of Vision Transformers for Medical Image Segmentation [2.4895533667182703]
Vision Transformers (ViTs) have emerged as a promising technique for addressing the challenges in medical image segmentation.
Their multi-scale attention mechanism enables effective modeling of long-range dependencies between distant structures.
Recently, researchers have proposed various ViT-based approaches that incorporate CNNs in their architectures, known as Hybrid Vision Transformers (HVTs).
arXiv Detail & Related papers (2023-12-01T14:54:44Z)
- ViT-DAE: Transformer-driven Diffusion Autoencoder for Histopathology Image Analysis [4.724009208755395]
We present ViT-DAE, which integrates vision transformers (ViT) and diffusion autoencoders for high-quality histopathology image synthesis.
Our approach outperforms recent GAN-based and vanilla DAE methods in generating realistic images.
arXiv Detail & Related papers (2023-04-03T15:00:06Z)
- Data-Efficient Vision Transformers for Multi-Label Disease Classification on Chest Radiographs [55.78588835407174]
Vision Transformers (ViTs) have not been applied to this task despite their high classification performance on generic images.
ViTs do not rely on convolutions but on patch-based self-attention and, in contrast to CNNs, encode no prior knowledge of local connectivity.
Our results show that while ViTs and CNNs perform on par, with a small benefit for ViTs, DeiTs outperform the former if a reasonably large data set is available for training.
arXiv Detail & Related papers (2022-08-17T09:07:45Z)
- Towards Trustworthy Healthcare AI: Attention-Based Feature Learning for COVID-19 Screening With Chest Radiography [70.37371604119826]
Building AI models with trustworthiness is important especially in regulated areas such as healthcare.
Previous work uses convolutional neural networks as the backbone architecture, which have been shown to be prone to over-caution and overconfidence in making decisions.
We propose a feature learning approach using Vision Transformers, which use an attention-based mechanism.
arXiv Detail & Related papers (2022-07-19T14:55:42Z)
- Self-Supervised Vision Transformers Learn Visual Concepts in Histopathology [5.164102666113966]
We conduct a search for good representations in pathology by training a variety of self-supervised models with validation on a variety of weakly-supervised and patch-level tasks.
Our key finding is in discovering that Vision Transformers using DINO-based knowledge distillation are able to learn data-efficient and interpretable features in histology images.
arXiv Detail & Related papers (2022-03-01T16:14:41Z)
- Intriguing Properties of Vision Transformers [114.28522466830374]
Vision transformers (ViT) have demonstrated impressive performance across various machine vision problems.
We systematically study these properties via an extensive set of experiments and comparisons with a high-performing convolutional neural network (CNN).
We show that the effective features of ViTs are due to flexible and dynamic receptive fields made possible by the self-attention mechanism.
arXiv Detail & Related papers (2021-05-21T17:59:18Z)
- Semantic segmentation of multispectral photoacoustic images using deep learning [53.65837038435433]
Photoacoustic imaging has the potential to revolutionise healthcare.
Clinical translation of the technology requires conversion of the high-dimensional acquired data into clinically relevant and interpretable information.
We present a deep learning-based approach to semantic segmentation of multispectral photoacoustic images.
arXiv Detail & Related papers (2021-05-20T09:33:55Z)
- Vision Transformers are Robust Learners [65.91359312429147]
We study the robustness of the Vision Transformer (ViT) against common corruptions and perturbations, distribution shifts, and natural adversarial examples.
We present analyses that provide both quantitative and qualitative indications to explain why ViTs are indeed more robust learners.
arXiv Detail & Related papers (2021-05-17T02:39:22Z)
- Deep Co-Attention Network for Multi-View Subspace Learning [73.3450258002607]
We propose a deep co-attention network for multi-view subspace learning.
It aims to extract both the common information and the complementary information in an adversarial setting.
In particular, it uses a novel cross reconstruction loss and leverages the label information to guide the construction of the latent representation.
arXiv Detail & Related papers (2021-02-15T18:46:44Z)
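Attention visualization, which the main abstract compares against LRP for transformers and LIME, and which several of the papers above rely on, is often realized as attention rollout. The following is a minimal, generic sketch, not code from any of the listed papers; how the per-layer attention matrices are collected from a concrete ViT implementation (e.g. via forward hooks) is assumed.

```python
import torch


def attention_rollout(attentions, head_fusion="mean"):
    """Attention rollout (Abnar & Zuidema, 2020): propagate attention through
    all layers while accounting for residual connections. `attentions` is a
    list of per-layer tensors of shape (num_heads, num_tokens, num_tokens),
    e.g. collected with forward hooks on a ViT."""
    num_tokens = attentions[0].shape[-1]
    rollout = torch.eye(num_tokens)
    for attn in attentions:
        # Fuse heads, then add the identity to model the residual branch.
        fused = attn.mean(dim=0) if head_fusion == "mean" else attn.max(dim=0).values
        fused = fused + torch.eye(num_tokens)
        fused = fused / fused.sum(dim=-1, keepdim=True)  # keep rows stochastic
        rollout = fused @ rollout
    # Row 0 (the CLS token), excluding the CLS column, gives per-patch relevance
    # that can be reshaped to the patch grid (e.g. 14x14 for ViT-B/16 at 224px).
    return rollout
```
The identity term and the row re-normalization are what distinguish rollout from simply visualizing the last layer's attention weights.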