Mammo-FM: Breast-specific foundational model for Integrated Mammographic Diagnosis, Prognosis, and Reporting
- URL: http://arxiv.org/abs/2512.00198v1
- Date: Fri, 28 Nov 2025 20:41:14 GMT
- Title: Mammo-FM: Breast-specific foundational model for Integrated Mammographic Diagnosis, Prognosis, and Reporting
- Authors: Shantanu Ghosh, Vedant Parthesh Joshi, Rayan Syed, Aya Kassem, Abhishek Varshney, Payel Basak, Weicheng Dai, Judy Wawira Gichoya, Hari M. Trivedi, Imon Banerjee, Shyam Visweswaran, Clare B. Poynton, Kayhan Batmanghelich,
- Abstract summary: Mammo-FM is the first foundation model specifically for mammography, pretrained on the largest and most diverse dataset to date.<n>Mammo-FM provides a unified foundation for core clinical tasks in breast imaging, including cancer diagnosis, pathology localization, structured report generation, and cancer risk prognosis within a single framework.
- Score: 10.376219551996792
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Breast cancer is one of the leading causes of death among women worldwide. We introduce Mammo-FM, the first foundation model specifically for mammography, pretrained on the largest and most diverse dataset to date - 140,677 patients (821,326 mammograms) across four U.S. institutions. Mammo-FM provides a unified foundation for core clinical tasks in breast imaging, including cancer diagnosis, pathology localization, structured report generation, and cancer risk prognosis within a single framework. Its alignment between images and text enables both visual and textual interpretability, improving transparency and clinical auditability, which are essential for real-world adoption. We rigorously evaluate Mammo-FM across diagnosis, prognosis, and report-generation tasks in in- and out-of-distribution datasets. Despite operating on native-resolution mammograms and using only one-third of the parameters of state-of-the-art generalist FMs, Mammo-FM consistently outperforms them across multiple public and private benchmarks. These results highlight the efficiency and value of domain-specific foundation models designed around the full spectrum of tasks within a clinical domain and emphasize the importance of rigorous, domain-aligned evaluation.
Related papers
- The MAMA-MIA Challenge: Advancing Generalizability and Fairness in Breast MRI Tumor Segmentation and Treatment Response Prediction [26.87720191540453]
The MAMA-MIA Challenge aims to develop a large-scale benchmark that jointly evaluates primary tumor segmentation and prediction of pathologic complete response.<n>The training cohort comprised 1,506 patients from multiple institutions in the United States, while evaluation was conducted on an external test set of 574 patients from three independent European centers.<n>Results demonstrate substantial performance variability under external testing and reveal trade-offs between overall accuracy and subgroup fairness.
arXiv Detail & Related papers (2026-03-01T20:06:30Z) - A Geometric Multimodal Foundation Model Integrating Bp-MRI and Clinical Reports in Prostate Cancer Classification [6.053648545114842]
Prostate cancer (PCa) is one of the most common cancers in men worldwide.<n>Most existing computer-aided diagnosis methods focus on imaging-based models.<n>We propose a multimodal geometric Foundation Model (FM) that learns representations from bp-MRI and clinical reports.
arXiv Detail & Related papers (2026-01-30T15:21:31Z) - DermoGPT: Open Weights and Open Data for Morphology-Grounded Dermatological Reasoning MLLMs [54.8829900010621]
Multimodal Large Language Models (MLLMs) show promise for medical applications, yet progress in dermatology lags due to limited training data, narrow task coverage, and lack of clinically-grounded supervision.<n>We present a comprehensive framework to address these gaps.<n>First, we introduce DermoInstruct, a large-scale morphology-anchored instruction corpus comprising 211,243 images and 772,675 trajectories across five task formats.<n>Second, we establish DermoBench, a rigorous benchmark evaluating 11 tasks across four clinical axes: Morphology, Diagnosis, Reasoning, and Fairness, including a challenging subset of 3,600
arXiv Detail & Related papers (2026-01-05T07:55:36Z) - Epistemic-aware Vision-Language Foundation Model for Fetal Ultrasound Interpretation [83.02147613524032]
We introduce FetalMind, a medical AI system tailored to fetal ultrasound for both report generation and diagnosis.<n>We propose Salient Epistemic Disentanglement (SED), which injects an expert-curated bipartite graph into the model to decouple view-disease associations.<n>FetalMind outperforms open- and closed-source baselines across all gestational stages, achieving +14% average gains and +61.2% higher accuracy on critical conditions.
arXiv Detail & Related papers (2025-10-14T19:57:03Z) - A Versatile Foundation Model for AI-enabled Mammogram Interpretation [36.48796962019666]
VersaMammo is a versatile foundation model for mammograms designed to overcome limitations.<n>First, a teacher model is trained via self-supervised learning to extract transferable features from unlabeled mammograms.<n>Then, supervised learning combined with knowledge distillation transfers both features and clinical knowledge into VersaMammo.
arXiv Detail & Related papers (2025-09-24T16:04:39Z) - ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification [57.22053411719822]
ChestX-Reasoner is a radiology diagnosis MLLM designed to leverage process supervision mined directly from clinical reports.<n>Our two-stage training framework combines supervised fine-tuning and reinforcement learning guided by process rewards to better align model reasoning with clinical standards.
arXiv Detail & Related papers (2025-04-29T16:48:23Z) - Integrating AI for Human-Centric Breast Cancer Diagnostics: A Multi-Scale and Multi-View Swin Transformer Framework [5.211860566766601]
The paper focuses on the integration of AI within a Human-Centric workflow to enhance breast cancer diagnostics.<n>We propose a hybrid, multi-scale and multi-view Swin Transformer-based framework (MSMV-Swin) that enhances diagnostic robustness and accuracy.
arXiv Detail & Related papers (2025-03-17T15:48:56Z) - Panoptic Segmentation of Mammograms with Text-To-Image Diffusion Model [1.2130800774416757]
Vision-language diffusion models demonstrated outstanding performance in image generation and transferability to various downstream tasks.
We propose leveraging pretrained features from a Stable Diffusion model as inputs to a state-of-the-art panoptic segmentation architecture.
arXiv Detail & Related papers (2024-07-19T14:04:05Z) - Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports [51.45762396192655]
Multimodal large language models (MLLMs) have recently transformed many domains, significantly affecting the medical field. Notably, Gemini-Vision-series (Gemini) and GPT-4-series (GPT-4) models have epitomized a paradigm shift in Artificial General Intelligence for computer vision.
This study evaluated the performance of the Gemini, GPT-4, and 4 popular large models for an exhaustive evaluation across 14 medical imaging datasets.
arXiv Detail & Related papers (2024-07-08T09:08:42Z) - RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis [56.57177181778517]
RadGenome-Chest CT is a large-scale, region-guided 3D chest CT interpretation dataset based on CT-RATE.
We leverage the latest powerful universal segmentation and large language models to extend the original datasets.
arXiv Detail & Related papers (2024-04-25T17:11:37Z) - Mask-Enhanced Segment Anything Model for Tumor Lesion Semantic Segmentation [48.107348956719775]
We introduce Mask-Enhanced SAM (M-SAM), an innovative architecture tailored for 3D tumor lesion segmentation.
We propose a novel Mask-Enhanced Adapter (MEA) within M-SAM that enriches the semantic information of medical images with positional data from coarse segmentation masks.
Our M-SAM achieves high segmentation accuracy and also exhibits robust generalization.
arXiv Detail & Related papers (2024-03-09T13:37:02Z) - C^2M-DoT: Cross-modal consistent multi-view medical report generation
with domain transfer network [67.97926983664676]
We propose a cross-modal consistent multi-view medical report generation with a domain transfer network (C2M-DoT)
C2M-DoT substantially outperforms state-of-the-art baselines in all metrics.
arXiv Detail & Related papers (2023-10-09T02:31:36Z) - MammoDG: Generalisable Deep Learning Breaks the Limits of Cross-Domain
Multi-Center Breast Cancer Screening [4.587250201300601]
Mammography poses challenges due to the high variability and patterns in mammograms.
MammoDG is a novel deep-learning framework for generalisable and reliable analysis of cross-domain multi-center mammography data.
arXiv Detail & Related papers (2023-08-02T10:10:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.