Hierarchical Vision Transformer with Prototypes for Interpretable Medical Image Classification
- URL: http://arxiv.org/abs/2502.08997v1
- Date: Thu, 13 Feb 2025 06:24:07 GMT
- Title: Hierarchical Vision Transformer with Prototypes for Interpretable Medical Image Classification
- Authors: Luisa Gallée, Catharina Silvia Lisson, Meinrad Beer, Michael Götz
- Abstract summary: We present HierViT, a Vision Transformer that is inherently interpretable and adapts its reasoning to that of humans. It is evaluated on two medical benchmark datasets, LIDC-IDRI for lung nodule assessment and derm7pt for skin lesion classification.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Explainability is a highly demanded requirement for applications in high-risk areas such as medicine. Vision Transformers have mainly been limited to attention extraction to provide insight into the model's reasoning. Our approach combines the high performance of Vision Transformers with the introduction of new explainability capabilities. We present HierViT, a Vision Transformer that is inherently interpretable and adapts its reasoning to that of humans. A hierarchical structure is used to process domain-specific features for prediction. It is interpretable by design, as it derives the target output from human-defined features that are visualized by exemplary images (prototypes). By incorporating domain knowledge about these decisive features, the reasoning is semantically similar to human reasoning and therefore intuitive. Moreover, attention heatmaps visualize the crucial regions for identifying each feature, thereby providing HierViT with a versatile tool for validating predictions. Evaluated on two medical benchmark datasets, LIDC-IDRI for lung nodule assessment and derm7pt for skin lesion classification, HierViT achieves superior prediction accuracy on the former and comparable accuracy on the latter, while offering explanations that align with human reasoning.
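The core design can be illustrated with a short sketch: each human-defined attribute is scored by similarity to learnable prototype vectors, and the target is then derived from those attribute scores alone. This is a minimal illustration assuming a ViT backbone that produces patch embeddings; the module name, pooling choice, and all sizes are placeholders, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierPrototypeHead(nn.Module):
    """Scores human-defined attributes by prototype similarity, then derives the target."""
    def __init__(self, embed_dim=768, num_attrs=8, protos_per_attr=5, num_classes=2):
        super().__init__()
        # One learnable prototype bank per attribute; each prototype can later be
        # visualized by its nearest training image, which is what makes it interpretable.
        self.prototypes = nn.Parameter(torch.randn(num_attrs, protos_per_attr, embed_dim))
        self.classifier = nn.Linear(num_attrs, num_classes)

    def forward(self, tokens):                      # tokens: (B, N, D) ViT patch embeddings
        feat = F.normalize(tokens.mean(dim=1), dim=-1)       # (B, D) pooled image feature
        protos = F.normalize(self.prototypes, dim=-1)        # (A, P, D)
        sim = torch.einsum('bd,apd->bap', feat, protos)      # cosine similarity to prototypes
        attr_scores = sim.max(dim=-1).values                 # (B, A) best prototype per attribute
        return self.classifier(attr_scores), attr_scores     # target logits + explanation
```

Because the final classifier sees only the attribute scores, the prediction is traceable to features a clinician can inspect.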
Related papers
- Evaluating Visual Explanations of Attention Maps for Transformer-based Medical Imaging [2.6505619784178047]
We compare visual explanations of attention maps to other commonly used methods for medical imaging problems.
We find that attention maps show promise under certain conditions and generally surpass GradCAM in explainability.
Our findings indicate that the efficacy of attention maps as a method of interpretability is context-dependent and may be limited as they do not consistently provide the comprehensive insights required for robust medical decision-making.
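For context, the Grad-CAM side of such a comparison can be computed for any differentiable backbone with two hooks. A hedged sketch, where the model and target layer are placeholders:

```python
import torch

def grad_cam(model, layer, image, class_idx):
    """Grad-CAM heatmap for one image; `layer` is the conv layer to explain."""
    acts, grads = {}, {}
    h1 = layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
    model(image.unsqueeze(0))[0, class_idx].backward()
    h1.remove(); h2.remove()
    weights = grads['g'].mean(dim=(2, 3), keepdim=True)   # pool gradients spatially
    cam = torch.relu((weights * acts['a']).sum(dim=1))    # weighted sum of channel maps
    return cam / cam.max().clamp(min=1e-8)                # normalized (1, H, W) heatmap
```

The attention-map counterpart for a ViT would instead pool the last block's CLS-token attention over heads, which requires no gradients at all.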
arXiv Detail & Related papers (2025-03-12T16:52:52Z)
- DAPE V2: Process Attention Score as Feature Map for Length Extrapolation [63.87956583202729]
We conceptualize attention as a feature map and apply the convolution operator to mimic the processing methods in computer vision.
The novel insight, which can be adapted to various attention-related models, reveals that the current Transformer architecture has the potential for further evolution.
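The idea can be sketched roughly as follows, assuming multi-head attention with a score matrix of shape (B, heads, N, N); the depthwise convolution and its kernel size are illustrative choices, not the paper's exact design:

```python
import torch
import torch.nn as nn

class ConvScoreAttention(nn.Module):
    """Attention whose score matrix is refined by a depthwise 2D convolution."""
    def __init__(self, num_heads=8, kernel_size=3):
        super().__init__()
        # One 2D filter per head: the (N x N) score matrix is treated as an image.
        self.conv = nn.Conv2d(num_heads, num_heads, kernel_size,
                              padding=kernel_size // 2, groups=num_heads)

    def forward(self, q, k, v):                    # q, k, v: (B, heads, N, head_dim)
        scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5   # (B, heads, N, N)
        scores = scores + self.conv(scores)        # convolve scores like a feature map
        return torch.softmax(scores, dim=-1) @ v
```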
arXiv Detail & Related papers (2024-10-07T07:21:49Z)
- Interpretable Medical Image Classification using Prototype Learning and Privileged Information [0.0]
Interpretability is often an essential requirement in medical imaging.
In this work, we investigate whether additional information available during the training process can be used to create an understandable and powerful model.
We propose an innovative solution called Proto-Caps that leverages the benefits of capsule networks, prototype learning and the use of privileged information.
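The privileged-information ingredient can be sketched as an auxiliary loss on expert-annotated attributes that is needed only during training; `attr_scores` is assumed to come from a prototype head such as the one sketched earlier, and the weight `lam` is arbitrary:

```python
import torch.nn.functional as F

def combined_loss(logits, attr_scores, target, attr_labels, lam=0.5):
    # Target supervision, available at both train and test time.
    cls_loss = F.cross_entropy(logits, target)
    # Privileged supervision: expert attribute annotations, used only while training.
    attr_loss = F.mse_loss(attr_scores, attr_labels)
    return cls_loss + lam * attr_loss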
arXiv Detail & Related papers (2023-10-24T11:28:59Z)
- Robust and Interpretable Medical Image Classifiers via Concept Bottleneck Models [49.95603725998561]
We propose a new paradigm to build robust and interpretable medical image classifiers with natural language concepts.
Specifically, we first query clinical concepts from GPT-4, then transform latent image features into explicit concepts with a vision-language model.
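A minimal concept-bottleneck sketch, assuming image embeddings and concept-text embeddings from a CLIP-style vision-language model live in the same space; the GPT-4 querying step is omitted and all names are placeholders:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConceptBottleneck(nn.Module):
    """Classifies from explicit concept scores instead of opaque features."""
    def __init__(self, concept_feats, num_classes):   # concept_feats: (C, D) text embeddings
        super().__init__()
        self.register_buffer('concepts', F.normalize(concept_feats, dim=-1))
        self.head = nn.Linear(concept_feats.shape[0], num_classes)

    def forward(self, image_feats):                   # (B, D) image embeddings
        scores = F.normalize(image_feats, dim=-1) @ self.concepts.t()   # (B, C)
        return self.head(scores), scores              # prediction + concept evidence
```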
arXiv Detail & Related papers (2023-10-04T21:57:09Z)
- Improving Explainability of Disentangled Representations using Multipath-Attribution Mappings [12.145748796751619]
We propose a framework that utilizes interpretable disentangled representations for downstream-task prediction.
We demonstrate the effectiveness of our approach on a synthetic benchmark suite and two medical datasets.
arXiv Detail & Related papers (2023-06-15T10:52:29Z)
- Towards Evaluating Explanations of Vision Transformers for Medical Imaging [7.812073412066698]
Vision Transformer (ViT) is a promising alternative to convolutional neural networks for image classification.
This paper investigates the performance of various interpretation methods on a ViT applied to classify chest X-ray images.
arXiv Detail & Related papers (2023-04-12T19:37:28Z)
- Parameter-Efficient Transformer with Hybrid Axial-Attention for Medical Image Segmentation [10.441315305453504]
We propose a parameter-efficient transformer to explore intrinsic inductive bias via position information for medical image segmentation.
To this end, we present a novel Hybrid Axial-Attention (HAA) that incorporates spatial pixel-wise information and relative position information as inductive bias.
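Axial attention itself can be sketched as row-wise followed by column-wise self-attention; this illustration omits the relative-position bias the paper adds as inductive bias, and all shapes are assumptions:

```python
import torch
import torch.nn as nn

class AxialAttention(nn.Module):
    """Factorizes 2D self-attention into a row pass followed by a column pass."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.row = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.col = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                                  # x: (B, H, W, D) feature grid
        b, h, w, d = x.shape
        r = x.reshape(b * h, w, d)
        r, _ = self.row(r, r, r)                           # attend along each row
        x = r.reshape(b, h, w, d).permute(0, 2, 1, 3).reshape(b * w, h, d)
        c, _ = self.col(x, x, x)                           # attend along each column
        return c.reshape(b, w, h, d).permute(0, 2, 1, 3)   # back to (B, H, W, D)
```

The factorization reduces the attention cost from quadratic in H*W to quadratic in each axis, which is what makes it parameter- and compute-efficient.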
arXiv Detail & Related papers (2022-11-17T13:54:55Z)
- Towards Trustworthy Healthcare AI: Attention-Based Feature Learning for COVID-19 Screening With Chest Radiography [70.37371604119826]
Building AI models with trustworthiness is important especially in regulated areas such as healthcare.
Previous work uses convolutional neural networks as the backbone architecture, which have been shown to be prone to over-caution and overconfidence in decision-making.
We propose a feature learning approach using Vision Transformers, which use an attention-based mechanism.
arXiv Detail & Related papers (2022-07-19T14:55:42Z)
- Visualizing and Understanding Patch Interactions in Vision Transformer [96.70401478061076]
Vision Transformer (ViT) has become a leading tool in various computer vision tasks.
We propose a novel explainable visualization approach to analyze and interpret the crucial attention interactions among patches for vision transformer.
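One standard way to aggregate patch-to-patch interactions across layers is attention rollout; a sketch under the assumption that per-layer attention tensors have already been collected (this is a common baseline for such analyses, not necessarily the paper's method):

```python
import torch

def attention_rollout(attn_maps):
    """attn_maps: list of per-layer attention tensors, each (B, heads, N, N)."""
    result = None
    for attn in attn_maps:
        a = attn.mean(dim=1)                              # average over heads
        a = a + torch.eye(a.shape[-1], device=a.device)   # account for the residual path
        a = a / a.sum(dim=-1, keepdim=True)               # renormalize rows
        result = a if result is None else a @ result      # compose layer by layer
    return result                                         # (B, N, N) patch interactions
```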
arXiv Detail & Related papers (2022-03-11T13:48:11Z)
- Class-Aware Generative Adversarial Transformers for Medical Image Segmentation [39.14169989603906]
We present CA-GANformer, a novel type of generative adversarial transformers, for medical image segmentation.
First, we take advantage of the pyramid structure to construct multi-scale representations and handle multi-scale variations.
We then design a novel class-aware transformer module to better learn the discriminative regions of objects with semantic structures.
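A loose sketch of the two ingredients named above, a two-level feature pyramid and a class-aware attention module with one learnable query per class; the GAN discriminator and losses are omitted, and every design choice here is illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidClassAttention(nn.Module):
    """Class queries attend over a two-level feature pyramid."""
    def __init__(self, dim, num_classes, heads=4):
        super().__init__()
        self.class_queries = nn.Parameter(torch.randn(num_classes, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feat):                               # feat: (B, D, H, W)
        scales = [feat, F.avg_pool2d(feat, 2)]             # multi-scale pyramid
        tokens = torch.cat([s.flatten(2).transpose(1, 2) for s in scales], dim=1)
        q = self.class_queries.unsqueeze(0).expand(feat.shape[0], -1, -1)
        out, weights = self.attn(q, tokens, tokens)        # class-aware attention
        return out, weights                                # weights locate each class
```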
arXiv Detail & Related papers (2022-01-26T03:50:02Z)
- Deep Co-Attention Network for Multi-View Subspace Learning [73.3450258002607]
We propose a deep co-attention network for multi-view subspace learning.
It aims to extract both the common information and the complementary information in an adversarial setting.
In particular, it uses a novel cross reconstruction loss and leverages the label information to guide the construction of the latent representation.
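The common/complementary split can be illustrated with a cross reconstruction loss in which each view is reconstructed from the other view's latent code; the encoders and decoders are assumed, and the adversarial part is omitted:

```python
import torch.nn.functional as F

def cross_recon_loss(enc_a, enc_b, dec_a, dec_b, view_a, view_b):
    za, zb = enc_a(view_a), enc_b(view_b)      # one latent code per view
    # Reconstruct each view from the *other* view's code, so the information
    # shared by both views must be captured in both latents.
    return (F.mse_loss(dec_a(zb), view_a) +
            F.mse_loss(dec_b(za), view_b))
```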
arXiv Detail & Related papers (2021-02-15T18:46:44Z)
- Generative Counterfactuals for Neural Networks via Attribute-Informed Perturbation [51.29486247405601]
We design a framework to generate counterfactuals for raw data instances with the proposed Attribute-Informed Perturbation (AIP).
By utilizing generative models conditioned with different attributes, counterfactuals with desired labels can be obtained effectively and efficiently.
Experimental results on real-world texts and images demonstrate the effectiveness, sample quality, and efficiency of our designed framework.
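Attribute-informed counterfactual generation can be sketched as optimizing a conditional generator's attribute input until a classifier assigns the desired label; the generator, classifier, and hyperparameters below are all placeholders:

```python
import torch
import torch.nn.functional as F

def attribute_counterfactual(generator, classifier, attrs, target, steps=100, lr=0.05):
    attrs = attrs.clone().requires_grad_(True)        # start from the original attributes
    opt = torch.optim.Adam([attrs], lr=lr)
    for _ in range(steps):
        x = generator(attrs)                          # image conditioned on attributes
        loss = F.cross_entropy(classifier(x), target) # push toward the desired label
        opt.zero_grad(); loss.backward(); opt.step()
    return generator(attrs).detach(), attrs.detach()  # counterfactual and its attributes
```

Optimizing in attribute space, rather than pixel space, is what keeps the resulting counterfactuals on the data manifold and semantically meaningful.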
arXiv Detail & Related papers (2021-01-18T08:37:13Z)