Stable Vision Concept Transformers for Medical Diagnosis
- URL: http://arxiv.org/abs/2506.05286v1
- Date: Thu, 05 Jun 2025 17:43:27 GMT
- Title: Stable Vision Concept Transformers for Medical Diagnosis
- Authors: Lijie Hu, Songning Lai, Yuan Hua, Shu Yang, Jingfeng Zhang, Di Wang
- Abstract summary: Concept Bottleneck Models (CBMs) aim to restrict the model's latent space to human-understandable high-level concepts. However, existing methods rely solely on concept features to determine the model's predictions.
- Score: 14.082818181995776
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transparency is a paramount concern in the medical field, prompting researchers to delve into the realm of explainable AI (XAI). Among these XAI methods, Concept Bottleneck Models (CBMs) aim to restrict the model's latent space to human-understandable high-level concepts by generating a conceptual layer for extracting conceptual features, and they have drawn much attention recently. However, existing methods rely solely on concept features to determine the model's predictions, overlooking the intrinsic feature embeddings within medical images. To address this utility gap between the original models and concept-based models, we propose the Vision Concept Transformer (VCT). Furthermore, despite their benefits, CBMs have been found to negatively impact model performance and to fail to provide stable explanations under input perturbations, which limits their application in the medical field. To address this faithfulness issue, this paper further proposes the Stable Vision Concept Transformer (SVCT), based on VCT, which leverages the vision transformer (ViT) as its backbone and incorporates a conceptual layer. SVCT enhances decision-making by fusing conceptual features with image features, and it ensures model faithfulness through the integration of Denoised Diffusion Smoothing. Comprehensive experiments on four medical datasets demonstrate that VCT and SVCT maintain accuracy comparable to baselines while remaining interpretable. Furthermore, even when subjected to perturbations, SVCT consistently provides faithful explanations, meeting the needs of the medical field.
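The fusion the abstract describes (a conceptual layer on top of the backbone's image embedding, with the classifier reading both the concept scores and the raw image features, rather than concepts alone as in a plain CBM) can be sketched roughly as follows. This is a minimal NumPy illustration under assumed dimensions; the weights, sizes, and names are hypothetical stand-ins, not the paper's implementation, and the ViT backbone and Denoised Diffusion Smoothing steps are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): backbone feature size,
# number of human-interpretable concepts, number of diagnosis classes.
D_FEAT, N_CONCEPTS, N_CLASSES = 16, 8, 3

# Stand-in for the ViT backbone output: a pooled image embedding.
image_feat = rng.normal(size=D_FEAT)

# Conceptual layer: a linear map from image features to concept scores,
# squashed to (0, 1) so each score reads as a concept's presence.
W_concept = rng.normal(size=(N_CONCEPTS, D_FEAT))
concept_scores = 1.0 / (1.0 + np.exp(-(W_concept @ image_feat)))

# Fusion: unlike a plain CBM, the classifier sees the concept scores
# concatenated with the raw image features, closing the utility gap.
fused = np.concatenate([concept_scores, image_feat])

W_cls = rng.normal(size=(N_CLASSES, N_CONCEPTS + D_FEAT))
logits = W_cls @ fused
prediction = int(np.argmax(logits))

print(concept_scores.shape, fused.shape, prediction)
```

Because the concept scores sit on an explicit bottleneck, they remain inspectable per image, while the concatenated raw features let the classifier recover information the concepts alone would discard.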
Related papers
- GEMeX-ThinkVG: Towards Thinking with Visual Grounding in Medical VQA via Reinforcement Learning [50.94508930739623]
Medical visual question answering aims to support clinical decision-making by enabling models to answer natural language questions based on medical images. Current methods still suffer from limited answer reliability and poor interpretability, impairing the ability of clinicians and patients to understand and trust model-generated answers. This work first proposes a Thinking with Visual Grounding dataset wherein the answer generation is decomposed into intermediate reasoning steps. We introduce a novel verifiable reward mechanism for reinforcement learning to guide post-training, improving the alignment between the model's reasoning process and its final answer.
arXiv Detail & Related papers (2025-06-22T08:09:58Z) - Towards generating more interpretable counterfactuals via concept vectors: a preliminary study on chest X-rays [46.667021835430155]
We map clinical concepts into the latent space of generative models to identify Concept Activation Vectors (CAVs). The extracted concepts are stable across datasets, enabling visual explanations that highlight clinically relevant features. Preliminary results on chest X-rays show promise for large pathologies like cardiomegaly, while smaller pathologies remain challenging.
arXiv Detail & Related papers (2025-06-04T15:23:12Z) - Interactive Medical Image Analysis with Concept-based Similarity Reasoning [32.38056136570339]
The Concept-based Similarity Reasoning network (CSR) provides patch-level prototypes with intrinsic concept interpretation. CSR improves upon prior state-of-the-art interpretable methods by up to 4.5% across three biomedical datasets.
arXiv Detail & Related papers (2025-03-10T02:52:47Z) - Concept Complement Bottleneck Model for Interpretable Medical Image Diagnosis [8.252227380729188]
We propose a concept complement bottleneck model for interpretable medical image diagnosis, using concept adapters for specific concepts to mine concept differences and score concepts in their own attention channels. Our model outperforms state-of-the-art competitors in concept detection and disease diagnosis tasks.
arXiv Detail & Related papers (2024-10-20T16:52:09Z) - Evidential Concept Embedding Models: Towards Reliable Concept Explanations for Skin Disease Diagnosis [24.946148305384202]
Concept Bottleneck Models (CBM) have emerged as an active interpretable framework incorporating human-interpretable concepts into decision-making.
We propose an evidential Concept Embedding Model (evi-CEM) which employs evidential learning to model the concept uncertainty.
Our evaluation demonstrates that evi-CEM achieves superior performance in terms of concept prediction.
arXiv Detail & Related papers (2024-06-27T12:29:50Z) - Improving Intervention Efficacy via Concept Realignment in Concept Bottleneck Models [57.86303579812877]
Concept Bottleneck Models (CBMs) ground image classification on human-understandable concepts to allow for interpretable model decisions.
Existing approaches often require numerous human interventions per image to achieve strong performances.
We introduce a trainable concept realignment intervention module, which leverages concept relations to realign concept assignments post-intervention.
arXiv Detail & Related papers (2024-05-02T17:59:01Z) - MICA: Towards Explainable Skin Lesion Diagnosis via Multi-Level Image-Concept Alignment [4.861768967055006]
We propose a multi-modal explainable disease diagnosis framework that meticulously aligns medical images and clinically related concepts semantically at multiple strata.
Our method, while preserving model interpretability, attains high performance and label efficiency for concept detection and disease diagnosis.
arXiv Detail & Related papers (2024-01-16T17:45:01Z) - Robust and Interpretable Medical Image Classifiers via Concept Bottleneck Models [49.95603725998561]
We propose a new paradigm to build robust and interpretable medical image classifiers with natural language concepts.
Specifically, we first query clinical concepts from GPT-4, then transform latent image features into explicit concepts with a vision-language model.
arXiv Detail & Related papers (2023-10-04T21:57:09Z) - Coherent Concept-based Explanations in Medical Image and Its Application to Skin Lesion Diagnosis [0.0]
Existing deep learning approaches for melanoma skin lesion diagnosis are deemed black-box models.
We propose an inherently interpretable framework to improve the interpretability of concept-based models.
Our method outperforms existing black-box and concept-based models for skin lesion classification.
arXiv Detail & Related papers (2023-04-10T13:32:04Z) - MedSegDiff-V2: Diffusion based Medical Image Segmentation with Transformer [53.575573940055335]
We propose a novel Transformer-based Diffusion framework, called MedSegDiff-V2.
We verify its effectiveness on 20 medical image segmentation tasks with different image modalities.
arXiv Detail & Related papers (2023-01-19T03:42:36Z) - Explainable fetal ultrasound quality assessment with progressive concept bottleneck models [6.734637459963132]
We propose a holistic and explainable method for fetal ultrasound quality assessment. We introduce human-readable "concepts" into the task and imitate the sequential expert decision-making process. Experiments show that our model outperforms equivalent concept-free models on an in-house dataset.
arXiv Detail & Related papers (2022-11-19T09:31:19Z) - Towards Trustworthy Healthcare AI: Attention-Based Feature Learning for COVID-19 Screening With Chest Radiography [70.37371604119826]
Building AI models with trustworthiness is important especially in regulated areas such as healthcare.
Previous work uses convolutional neural networks as the backbone architecture, which have been shown to be prone to over-caution and overconfidence in making decisions.
We propose a feature learning approach using Vision Transformers, which use an attention-based mechanism.
arXiv Detail & Related papers (2022-07-19T14:55:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.