A Versatile Foundation Model for AI-enabled Mammogram Interpretation
- URL: http://arxiv.org/abs/2509.20271v1
- Date: Wed, 24 Sep 2025 16:04:39 GMT
- Title: A Versatile Foundation Model for AI-enabled Mammogram Interpretation
- Authors: Fuxiang Huang, Jiayi Zhu, Yunfang Yu, Yu Xie, Yuan Guo, Qingcong Kong, Mingxiang Wu, Xinrui Jiang, Shu Yang, Jiabo Ma, Ziyi Liu, Zhe Xu, Zhixuan Chen, Yujie Tan, Zifan He, Luhui Mao, Xi Wang, Junlin Hou, Lei Zhang, Qiong Luo, Zhenhui Li, Herui Yao, Hao Chen
- Abstract summary: VersaMammo is a versatile foundation model for mammograms designed to overcome limitations. First, a teacher model is trained via self-supervised learning to extract transferable features from unlabeled mammograms. Then, supervised learning combined with knowledge distillation transfers both features and clinical knowledge into VersaMammo.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Breast cancer is the most commonly diagnosed cancer and the leading cause of cancer-related mortality in women globally. Mammography is essential for the early detection and diagnosis of breast lesions. Despite recent progress in foundation models (FMs) for mammogram analysis, their clinical translation remains constrained by several fundamental limitations, including insufficient diversity in training data, limited model generalizability, and a lack of comprehensive evaluation across clinically relevant tasks. Here, we introduce VersaMammo, a versatile foundation model for mammograms, designed to overcome these limitations. We curated the largest multi-institutional mammogram dataset to date, comprising 706,239 images from 21 sources. To improve generalization, we propose a two-stage pre-training strategy for VersaMammo. First, a teacher model is trained via self-supervised learning to extract transferable features from unlabeled mammograms. Then, supervised learning combined with knowledge distillation transfers both features and clinical knowledge into VersaMammo. To ensure a comprehensive evaluation, we established a benchmark comprising 92 specific tasks, including 68 internal tasks and 24 external validation tasks, spanning 5 major clinical task categories: lesion detection, segmentation, classification, image retrieval, and visual question answering. VersaMammo achieves state-of-the-art performance, ranking first in 50 out of 68 specific internal tasks and 20 out of 24 external validation tasks, with average ranks of 1.5 and 1.2, respectively. These results demonstrate its superior generalization and clinical utility, offering a substantial advancement toward reliable and scalable breast cancer screening and diagnosis.
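The second pre-training stage described above combines a supervised objective with knowledge distillation from the self-supervised teacher. The paper does not publish its loss formulation here, but a common form of this combination is a weighted sum of hard-label cross-entropy and a temperature-softened KL divergence to the teacher's predictions. The sketch below illustrates that standard recipe with NumPy; the function names, the temperature `T=2.0`, and the weighting `alpha=0.5` are illustrative assumptions, not values from the paper.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled, numerically stable softmax.
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Weighted sum of supervised cross-entropy on ground-truth labels
    and KL divergence to the teacher's temperature-softened outputs."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    # KL(teacher || student); the T**2 factor keeps gradient scale
    # comparable across temperatures (Hinton et al.'s convention).
    kd = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(axis=-1).mean() * T * T
    p = softmax(student_logits)
    ce = -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()
    return alpha * ce + (1 - alpha) * kd
```

When student and teacher logits agree, the KL term vanishes and only the supervised term remains, so the loss degrades gracefully to ordinary supervised training.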
Related papers
- Beyond Benchmarks of IUGC: Rethinking Requirements of Deep Learning Methods for Intrapartum Ultrasound Biometry from Fetal Ultrasound Videos [58.71502465551297]
Intrapartum Ultrasound Grand Challenge (IUGC) co-hosted with MICCAI 2024 was launched. IUGC introduces a clinically oriented multi-task automatic measurement framework that integrates standard plane classification, fetal head-pubic symphysis segmentation, and biometry. The challenge releases the largest multi-center intrapartum ultrasound video dataset to date, comprising 774 videos (68,106 frames) collected from three hospitals.
arXiv Detail & Related papers (2026-02-13T13:28:22Z) - DermoGPT: Open Weights and Open Data for Morphology-Grounded Dermatological Reasoning MLLMs [54.8829900010621]
Multimodal Large Language Models (MLLMs) show promise for medical applications, yet progress in dermatology lags due to limited training data, narrow task coverage, and lack of clinically-grounded supervision. We present a comprehensive framework to address these gaps. First, we introduce DermoInstruct, a large-scale morphology-anchored instruction corpus comprising 211,243 images and 772,675 trajectories across five task formats. Second, we establish DermoBench, a rigorous benchmark evaluating 11 tasks across four clinical axes: Morphology, Diagnosis, Reasoning, and Fairness, including a challenging subset of 3,600
arXiv Detail & Related papers (2026-01-05T07:55:36Z) - Mammo-FM: Breast-specific foundational model for Integrated Mammographic Diagnosis, Prognosis, and Reporting [10.376219551996792]
Mammo-FM is the first foundation model specifically for mammography, pretrained on the largest and most diverse dataset to date. Mammo-FM provides a unified foundation for core clinical tasks in breast imaging, including cancer diagnosis, pathology localization, structured report generation, and cancer risk prognosis within a single framework.
arXiv Detail & Related papers (2025-11-28T20:41:14Z) - An Explainable Hybrid AI Framework for Enhanced Tuberculosis and Symptom Detection [55.35661671061754]
Tuberculosis remains a critical global health issue, particularly in resource-limited and remote areas. We propose a framework which enhances disease and symptom detection on chest X-rays by integrating two supervised heads and a self-supervised head. Our model achieves an accuracy of 98.85% for distinguishing between COVID-19, tuberculosis, and normal cases, and a macro-F1 score of 90.09% for multilabel symptom detection.
arXiv Detail & Related papers (2025-10-21T17:18:55Z) - A Disease-Centric Vision-Language Foundation Model for Precision Oncology in Kidney Cancer [54.58205672910646]
RenalCLIP is a visual-language foundation model for characterization, diagnosis and prognosis of renal mass. It achieved better performance and superior generalizability across 10 core tasks spanning the full clinical workflow of kidney cancer.
arXiv Detail & Related papers (2025-08-22T17:48:19Z) - DermINO: Hybrid Pretraining for a Versatile Dermatology Foundation Model [69.20140430678092]
DermNIO is a versatile foundation model for dermatology. It incorporates a novel hybrid pretraining framework that augments the self-supervised learning paradigm. It consistently outperforms state-of-the-art models across a wide range of tasks.
arXiv Detail & Related papers (2025-08-17T00:41:39Z) - Revisiting Invariant Learning for Out-of-Domain Generalization on Multi-Site Mammogram Datasets [8.080495390226115]
This paper reassesses the application of invariant learning for breast cancer risk estimation based on mammograms. Evaluation metrics include accuracy, average precision, and area under the curve. This research examines the advantages, limitations, and challenges of invariant learning for mammogram classification.
arXiv Detail & Related papers (2025-03-09T20:28:04Z) - Pathological Prior-Guided Multiple Instance Learning For Mitigating Catastrophic Forgetting in Breast Cancer Whole Slide Image Classification [50.899861205016265]
We propose a new framework PaGMIL to mitigate catastrophic forgetting in breast cancer WSI classification. Our framework introduces two key components into the common MIL model architecture. We evaluate the continual learning performance of PaGMIL across several public breast cancer datasets.
arXiv Detail & Related papers (2025-03-08T04:51:58Z) - Mammo-Clustering: A Multi-views Tri-level Information Fusion Context Clustering Framework for Localization and Classification in Mammography [13.581151516877238]
Mammography images typically have extremely high resolution, with lesions occupying only a very small area. Down-sampling in neural networks can easily lead to the loss of microcalcifications or subtle structures. We propose a Context Clustering Network with triple information fusion.
arXiv Detail & Related papers (2024-09-23T10:17:13Z) - GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI [67.09501109871351]
Large Vision-Language Models (LVLMs) are capable of handling diverse data types such as imaging, text, and physiological signals.
GMAI-MMBench is the most comprehensive general medical AI benchmark with well-categorized data structure and multi-perceptual granularity to date.
It is constructed from 284 datasets across 38 medical image modalities, 18 clinical-related tasks, 18 departments, and 4 perceptual granularities in a Visual Question Answering (VQA) format.
arXiv Detail & Related papers (2024-08-06T17:59:21Z) - Panoptic Segmentation of Mammograms with Text-To-Image Diffusion Model [1.2130800774416757]
Vision-language diffusion models demonstrated outstanding performance in image generation and transferability to various downstream tasks.
We propose leveraging pretrained features from a Stable Diffusion model as inputs to a state-of-the-art panoptic segmentation architecture.
arXiv Detail & Related papers (2024-07-19T14:04:05Z) - Mammo-CLIP: Leveraging Contrastive Language-Image Pre-training (CLIP) for Enhanced Breast Cancer Diagnosis with Multi-view Mammography [6.537171378333966]
Mammo-CLIP is the first multi-modal framework to process multi-view mammograms and corresponding simple texts.
To enhance learning efficiency, plug-and-play adapters are added into CLIP image and text encoders for fine-tuning parameters.
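The plug-and-play adapters mentioned above are small trainable modules inserted into otherwise frozen encoders. Mammo-CLIP's exact adapter design is not given here, but the widely used pattern is a residual bottleneck: down-project, nonlinearity, up-project, then add back the input. The NumPy sketch below shows that generic pattern under stated assumptions; the class name, bottleneck width, and zero initialization of the up-projection are illustrative choices, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

class Adapter:
    """Residual bottleneck adapter inserted into a frozen encoder layer.
    Only W_down and W_up would be trained; the backbone stays frozen."""
    def __init__(self, dim, bottleneck=16):
        self.W_down = rng.normal(0.0, 0.02, (dim, bottleneck))
        # Zero-initialized up-projection: the adapter starts as an
        # identity map, so inserting it does not perturb the pretrained model.
        self.W_up = np.zeros((bottleneck, dim))

    def __call__(self, x):
        h = np.maximum(x @ self.W_down, 0.0)  # down-project + ReLU
        return x + h @ self.W_up              # up-project + residual
```

Because of the zero-initialized up-projection, fine-tuning begins from the frozen encoder's original behavior and only gradually departs from it as the adapter weights are learned.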
Study results show that Mammo-CLIP outperforms the state-of-the-art cross-view transformer in AUC.
arXiv Detail & Related papers (2024-04-24T16:07:31Z) - MammoDG: Generalisable Deep Learning Breaks the Limits of Cross-Domain Multi-Center Breast Cancer Screening [4.587250201300601]
Mammography analysis poses challenges due to the high variability in appearance and tissue patterns across mammograms.
MammoDG is a novel deep-learning framework for generalisable and reliable analysis of cross-domain multi-center mammography data.
arXiv Detail & Related papers (2023-08-02T10:10:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.