Mammo-CLIP Dissect: A Framework for Analysing Mammography Concepts in Vision-Language Models
- URL: http://arxiv.org/abs/2509.21102v1
- Date: Thu, 25 Sep 2025 12:47:27 GMT
- Title: Mammo-CLIP Dissect: A Framework for Analysing Mammography Concepts in Vision-Language Models
- Authors: Suaiba Amina Salahuddin, Teresa Dorszewski, Marit Almenning Martiniussen, Tone Hovda, Antonio Portaluri, Solveig Thrun, Michael Kampffmeyer, Elisabeth Wetzer, Kristoffer Wickstrøm, Robert Jenssen,
- Abstract summary: Mammo-CLIP Dissect is the first concept-based explainability framework for systematically dissecting deep learning vision models trained for mammography.<n>We investigate how concept learning differs between DL vision models trained on general image datasets versus mammography-specific datasets.<n>We show that models trained on mammography data capture more clinically relevant concepts and align more closely with radiologists' than models not trained on mammography data.
- Score: 18.596949417108423
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding what deep learning (DL) models learn is essential for the safe deployment of artificial intelligence (AI) in clinical settings. While previous work has focused on pixel-based explainability methods, less attention has been paid to the textual concepts learned by these models, which may better reflect the reasoning used by clinicians. We introduce Mammo-CLIP Dissect, the first concept-based explainability framework for systematically dissecting DL vision models trained for mammography. Leveraging a mammography-specific vision-language model (Mammo-CLIP) as a "dissector," our approach labels neurons at specified layers with human-interpretable textual concepts and quantifies their alignment to domain knowledge. Using Mammo-CLIP Dissect, we investigate three key questions: (1) how concept learning differs between DL vision models trained on general image datasets versus mammography-specific datasets; (2) how fine-tuning for downstream mammography tasks affects concept specialisation; and (3) which mammography-relevant concepts remain underrepresented. We show that models trained on mammography data capture more clinically relevant concepts and align more closely with radiologists' workflows than models not trained on mammography data. Fine-tuning for task-specific classification enhances the capture of certain concept categories (e.g., benign calcifications) but can reduce coverage of others (e.g., density-related features), indicating a trade-off between specialisation and generalisation. Our findings show that Mammo-CLIP Dissect provides insights into how convolutional neural networks (CNNs) capture mammography-specific knowledge. By comparing models across training data and fine-tuning regimes, we reveal how domain-specific training and task-specific adaptation shape concept learning. Code and concept set are available: https://github.com/Suaiba/Mammo-CLIP-Dissect.
Related papers
- Training-free Test-time Improvement for Explainable Medical Image Classification [11.320534249593171]
We propose a training-free confusion concept identification strategy for medical image classification.<n>Our approach enhances out-of-domain performance without sacrificing source domain accuracy.<n>Our method is validated on both skin and white blood cell images.
arXiv Detail & Related papers (2025-06-22T15:37:13Z) - Pathological Prior-Guided Multiple Instance Learning For Mitigating Catastrophic Forgetting in Breast Cancer Whole Slide Image Classification [50.899861205016265]
We propose a new framework PaGMIL to mitigate catastrophic forgetting in breast cancer WSI classification.<n>Our framework introduces two key components into the common MIL model architecture.<n>We evaluate the continual learning performance of PaGMIL across several public breast cancer datasets.
arXiv Detail & Related papers (2025-03-08T04:51:58Z) - MLIP: Enhancing Medical Visual Representation with Divergence Encoder
and Knowledge-guided Contrastive Learning [48.97640824497327]
We propose a novel framework leveraging domain-specific medical knowledge as guiding signals to integrate language information into the visual domain through image-text contrastive learning.
Our model includes global contrastive learning with our designed divergence encoder, local token-knowledge-patch alignment contrastive learning, and knowledge-guided category-level contrastive learning with expert knowledge.
Notably, MLIP surpasses state-of-the-art methods even with limited annotated data, highlighting the potential of multimodal pre-training in advancing medical representation learning.
arXiv Detail & Related papers (2024-02-03T05:48:50Z) - Robust and Interpretable Medical Image Classifiers via Concept
Bottleneck Models [49.95603725998561]
We propose a new paradigm to build robust and interpretable medical image classifiers with natural language concepts.
Specifically, we first query clinical concepts from GPT-4, then transform latent image features into explicit concepts with a vision-language model.
arXiv Detail & Related papers (2023-10-04T21:57:09Z) - LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical
Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z) - Pixel-Level Explanation of Multiple Instance Learning Models in
Biomedical Single Cell Images [52.527733226555206]
We investigate the use of four attribution methods to explain a multiple instance learning models.
We study two datasets of acute myeloid leukemia with over 100 000 single cell images.
We compare attribution maps with the annotations of a medical expert to see how the model's decision-making differs from the human standard.
arXiv Detail & Related papers (2023-03-15T14:00:11Z) - IAIA-BL: A Case-based Interpretable Deep Learning Model for
Classification of Mass Lesions in Digital Mammography [20.665935997959025]
Interpretability in machine learning models is important in high-stakes decisions.
We present a framework for interpretable machine learning-based mammography.
arXiv Detail & Related papers (2021-03-23T05:00:21Z) - Deep Co-Attention Network for Multi-View Subspace Learning [73.3450258002607]
We propose a deep co-attention network for multi-view subspace learning.
It aims to extract both the common information and the complementary information in an adversarial setting.
In particular, it uses a novel cross reconstruction loss and leverages the label information to guide the construction of the latent representation.
arXiv Detail & Related papers (2021-02-15T18:46:44Z) - Abstracting Deep Neural Networks into Concept Graphs for Concept Level
Interpretability [0.39635467316436124]
We attempt to understand the behavior of trained models that perform image processing tasks in the medical domain by building a graphical representation of the concepts they learn.
We show the application of our proposed implementation on two biomedical problems - brain tumor segmentation and fundus image classification.
arXiv Detail & Related papers (2020-08-14T16:34:32Z) - On Interpretability of Deep Learning based Skin Lesion Classifiers using
Concept Activation Vectors [6.188009802619095]
We use a well-trained and high performing neural network for classification of three skin tumours, i.e. Melanocytic Naevi, Melanoma and Seborrheic Keratosis.
Human understandable concepts are mapped to RECOD image classification model with the help of Concept Activation Vectors (CAVs)
arXiv Detail & Related papers (2020-05-05T08:27:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.