Toward explainable AI approaches for breast imaging: adapting foundation models to diverse populations
- URL: http://arxiv.org/abs/2511.17828v1
- Date: Fri, 21 Nov 2025 22:45:50 GMT
- Title: Toward explainable AI approaches for breast imaging: adapting foundation models to diverse populations
- Authors: Guilherme J. Cavalcante, José Gabriel A. Moreira, Gabriel A. B. do Nascimento, Vincent Dong, Alex Nguyen, Thaís G. do Rêgo, Yuri Malheiros, Telmo M. Silva Filho, Carla R. Zeballos Torrez, James C. Gee, Anne Marie McCarthy, Andrew D. A. Maidment, Bruno Barufaldi
- Abstract summary: Foundation models hold promise for specialized medical imaging tasks, though their effectiveness in breast imaging remains underexplored. This study leverages BiomedCLIP as a foundation model to address challenges in model generalization. Using 96,995 images, we compared single-modality (s2D only) and multi-modality training approaches, addressing class imbalance through weighted contrastive learning.
- Score: 4.505150709006532
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models hold promise for specialized medical imaging tasks, though their effectiveness in breast imaging remains underexplored. This study leverages BiomedCLIP as a foundation model to address challenges in model generalization. BiomedCLIP was adapted for automated BI-RADS breast density classification using multi-modality mammographic data (synthesized 2D images, digital mammography, and digital breast tomosynthesis). Using 96,995 images, we compared single-modality (s2D only) and multi-modality training approaches, addressing class imbalance through weighted contrastive learning. Both approaches achieved similar accuracy (multi-modality: 0.74, single-modality: 0.73), with the multi-modality model offering broader applicability across different imaging modalities and AUC values consistently above 0.84 across BI-RADS categories. External validation on the RSNA and EMBED datasets showed strong generalization capabilities (AUC range: 0.80-0.93). GradCAM visualizations confirmed consistent and clinically relevant attention patterns, highlighting the model's interpretability and robustness. This research underscores the potential of foundation models for breast imaging applications, paving the way for future extensions to diagnostic tasks.
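The abstract credits weighted contrastive learning with handling the BI-RADS class imbalance. Below is a minimal, hypothetical sketch of such a loss for a CLIP-style image/text encoder pair; the prompt wording, weighting scheme, and temperature are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of weighted contrastive learning for BI-RADS density
# classification with a CLIP-style model; the class weights and prompts are
# illustrative assumptions, not the authors' actual implementation.
import torch
import torch.nn.functional as F

def weighted_clip_loss(img_emb, txt_emb, labels, class_weights, temperature=0.07):
    """Symmetric InfoNCE loss where each pair is weighted by its BI-RADS class.

    img_emb, txt_emb: (B, D) L2-normalized embeddings of images and matching
        text prompts (e.g., "BI-RADS density category C").
    labels: (B,) integer BI-RADS class per image (0..3).
    class_weights: (4,) tensor of weights, e.g. inverse class frequency,
        to counter the imbalance across density categories.
    """
    logits = img_emb @ txt_emb.t() / temperature           # (B, B) similarities
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    w = class_weights[labels]                              # per-sample weight
    loss_i2t = F.cross_entropy(logits, targets, reduction="none")
    loss_t2i = F.cross_entropy(logits.t(), targets, reduction="none")
    return ((loss_i2t + loss_t2i) * 0.5 * w).sum() / w.sum()
```

A common choice is to set the weights inversely proportional to class frequency, so the rarer BI-RADS categories (A and D) contribute proportionally more to the gradient.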
Related papers
- MM-DINOv2: Adapting Foundation Models for Multi-Modal Medical Image Analysis [19.063517827476826]
We introduce MM-DINOv2, a novel framework that adapts the pre-trained vision foundation model DINOv2 for multi-modal medical imaging. Our approach incorporates multi-modal patch embeddings, enabling vision foundation models to effectively process multi-modal imaging data. Our method achieves a Matthews Correlation Coefficient (MCC) of 0.6 on an external test set, surpassing state-of-the-art supervised approaches by +11.1%.
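A hedged sketch of what multi-modal patch embeddings could look like in practice: one patch projection per modality plus a learned modality embedding, feeding a shared ViT encoder. The module names and modality list are assumptions; MM-DINOv2's actual design may differ.

```python
# Illustrative multi-modal patch embedding: a dedicated patch projection per
# imaging modality, with tokens concatenated for a shared transformer.
import torch
import torch.nn as nn

class MultiModalPatchEmbed(nn.Module):
    def __init__(self, modalities=("T1", "T2", "FLAIR"), in_ch=1,
                 patch=14, dim=768):
        super().__init__()
        # One patch projection per modality.
        self.proj = nn.ModuleDict({
            m: nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
            for m in modalities
        })
        # Learned modality embedding added to every token of that modality.
        self.mod_emb = nn.ParameterDict({
            m: nn.Parameter(torch.zeros(1, 1, dim)) for m in modalities
        })

    def forward(self, images: dict) -> torch.Tensor:
        """images: {modality: (B, C, H, W)} -> (B, total_tokens, dim)."""
        tokens = []
        for m, x in images.items():
            t = self.proj[m](x).flatten(2).transpose(1, 2)  # (B, N, dim)
            tokens.append(t + self.mod_emb[m])
        return torch.cat(tokens, dim=1)  # consumed by a shared ViT encoder
```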
arXiv Detail & Related papers (2025-09-08T12:34:15Z)
- On the effectiveness of multimodal privileged knowledge distillation in two vision transformer based diagnostic applications [42.19559765387761]
Multimodal privileged knowledge distillation (MMPKD) is a training strategy that utilizes additional modalities to guide a unimodal vision model. We show that MMPKD can improve the zero-shot ability of the resulting attention maps to localize ROIs in input images.
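One way privileged distillation is often realized is by matching the student's attention maps to those of a teacher trained with the extra modality. The sketch below assumes this formulation and an MSE matching term, neither of which is confirmed by the abstract.

```python
# Hedged sketch of privileged knowledge distillation: a teacher that saw a
# privileged modality during training guides a unimodal vision student by
# matching attention maps. The loss weighting is an assumption.
import torch.nn.functional as F

def mmpkd_loss(student_logits, labels, student_attn, teacher_attn, alpha=0.5):
    """student_attn / teacher_attn: (B, H, N, N) transformer attention maps.

    The teacher's attention (shaped by the privileged modality) acts as a
    soft target for where the unimodal student should look.
    """
    task = F.cross_entropy(student_logits, labels)
    # Average over heads before matching, a common simplification.
    distill = F.mse_loss(student_attn.mean(dim=1), teacher_attn.mean(dim=1))
    return task + alpha * distill
```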
arXiv Detail & Related papers (2025-08-06T14:14:54Z)
- A Hybrid CNN-VSSM model for Multi-View, Multi-Task Mammography Analysis: Robust Diagnosis with Attention-Based Fusion [5.15423063632115]
Early and accurate interpretation of screening mammograms is essential for effective breast cancer detection. Existing AI approaches fall short by focusing on single-view inputs or single-task outputs. We propose a novel multi-view, multi-task hybrid deep learning framework that processes all four standard mammography views.
arXiv Detail & Related papers (2025-07-22T18:52:18Z)
- Multimodal, Multi-Disease Medical Imaging Foundation Model (MerMED-FM) [22.690349928759986]
We developed MerMED-FM, a state-of-the-art multimodal foundation model trained using self-supervised learning and a memory module. MerMED-FM was trained on 3.3 million medical images from over ten specialties and seven modalities. Strong performance was achieved across all modalities, with AUROCs of 0.988 (skin), 0.982 (pathology), 0.951 (US), 0.943 (CT), 0.931 (CFP), and 0.894 (CXR).
arXiv Detail & Related papers (2025-06-30T18:50:31Z)
- Advancing Stroke Risk Prediction Using a Multi-modal Foundation Model [0.1671198589006117]
Predicting stroke risk is a complex challenge that can be enhanced by integrating diverse clinically available data modalities. This study introduces a self-supervised multimodal framework that combines 3D brain imaging, clinical data, and image-derived features to improve stroke risk prediction prior to onset.
arXiv Detail & Related papers (2024-11-14T22:00:37Z)
- PMT: Progressive Mean Teacher via Exploring Temporal Consistency for Semi-Supervised Medical Image Segmentation [51.509573838103854]
We propose a semi-supervised learning framework, termed Progressive Mean Teachers (PMT), for medical image segmentation.
Our PMT generates high-fidelity pseudo labels by learning robust and diverse features in the training process.
Experimental results on two datasets with different modalities, i.e., CT and MRI, demonstrate that our method outperforms the state-of-the-art medical image segmentation approaches.
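For context, a minimal mean-teacher loop looks like the following; the EMA decay, confidence threshold, and pseudo-label masking are generic assumptions and do not reproduce PMT's progressive, temporal-consistency design.

```python
# Minimal mean-teacher sketch for semi-supervised segmentation: the teacher
# is an EMA copy of the student and supplies pseudo labels on unlabeled scans.
import copy
import torch

# teacher = copy.deepcopy(student)  # initialize teacher from the student

@torch.no_grad()
def ema_update(teacher, student, decay=0.99):
    """Exponential moving average of student weights into the teacher."""
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(decay).add_(s_p, alpha=1.0 - decay)

def pseudo_labels(teacher, unlabeled, threshold=0.9):
    """Keep only voxels where the teacher is confident."""
    with torch.no_grad():
        probs = torch.softmax(teacher(unlabeled), dim=1)  # (B, C, ...)
        conf, labels = probs.max(dim=1)
    mask = conf >= threshold          # high-confidence pseudo-label mask
    return labels, mask
```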
arXiv Detail & Related papers (2024-09-08T15:02:25Z)
- Leveraging Medical Foundation Model Features in Graph Neural Network-Based Retrieval of Breast Histopathology Images [1.48419209885019]
We propose a novel attention-based adversarially regularized variational graph autoencoder model for breast histological image retrieval. Our top-performing model, trained with UNI features, achieved average mAP/mMV scores of 96.7%/91.5% and 97.6%/94.2% for the BreakHis and BACH datasets, respectively.
arXiv Detail & Related papers (2024-05-07T11:24:37Z)
- Multibranch Generative Models for Multichannel Imaging with an Application to PET/CT Synergistic Reconstruction [42.95604565673447]
This paper presents a novel approach for learned synergistic reconstruction of medical images using multibranch generative models. We demonstrate the efficacy of our approach on both Modified National Institute of Standards and Technology (MNIST) and positron emission tomography (PET)/computed tomography (CT) datasets.
arXiv Detail & Related papers (2024-04-12T18:21:08Z)
- Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
We train open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
LLaVA-Rad inference is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z)
- Classification of lung cancer subtypes on CT images with synthetic pathological priors [41.75054301525535]
Cross-scale associations exist between the image patterns of a case's CT images and its pathological images. We propose a self-generating hybrid feature network (SGHF-Net) for accurately classifying lung cancer subtypes on CT images.
arXiv Detail & Related papers (2023-08-09T02:04:05Z)
- Domain Generalization for Mammographic Image Analysis with Contrastive Learning [62.25104935889111]
Training an efficacious deep learning model requires large data with diverse styles and qualities. A novel contrastive learning scheme is developed to equip deep learning models with better style-generalization capability.
The proposed method has been evaluated extensively and rigorously with mammograms from various vendor style domains and several public datasets.
arXiv Detail & Related papers (2023-04-20T11:40:21Z)
- SEMPAI: a Self-Enhancing Multi-Photon Artificial Intelligence for prior-informed assessment of muscle function and pathology [48.54269377408277]
We introduce the Self-Enhancing Multi-Photon Artificial Intelligence (SEMPAI), which integrates hypothesis-driven priors into a data-driven deep learning approach.
SEMPAI performs joint learning of several tasks to enable prediction for small datasets.
SEMPAI outperforms state-of-the-art biomarkers in six of seven predictive tasks, including those with scarce data.
arXiv Detail & Related papers (2022-10-28T17:03:04Z)
- A multi-stage machine learning model on diagnosis of esophageal manometry [50.591267188664666]
The framework includes deep-learning models at the swallow-level stage and feature-based machine learning models at the study-level stage.
This is the first artificial-intelligence-style model to automatically predict the CC diagnosis of an HRM study from raw multi-swallow data.
arXiv Detail & Related papers (2021-06-25T20:09:23Z)
- Malignancy Prediction and Lesion Identification from Clinical Dermatological Images [65.1629311281062]
We consider machine-learning-based malignancy prediction and lesion identification from clinical dermatological images.
The method first identifies all lesions present in the image regardless of sub-type or likelihood of malignancy, then estimates each lesion's likelihood of malignancy and, through aggregation, generates an image-level likelihood of malignancy.
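The abstract does not say how per-lesion scores are aggregated into the image-level likelihood; one plausible, purely illustrative rule is a noisy-OR (the image is malignant if any lesion is).

```python
# Hypothetical noisy-OR aggregation from per-lesion to image-level
# malignancy; the paper does not specify its aggregation rule.
def image_malignancy(lesion_probs):
    """lesion_probs: iterable of per-lesion malignancy probabilities."""
    p_benign = 1.0
    for p in lesion_probs:
        p_benign *= (1.0 - p)          # all lesions benign simultaneously
    return 1.0 - p_benign

# e.g. image_malignancy([0.1, 0.6, 0.2]) -> 0.712
```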
arXiv Detail & Related papers (2021-04-02T20:52:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site. This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.