MV-MLM: Bridging Multi-View Mammography and Language for Breast Cancer Diagnosis and Risk Prediction
- URL: http://arxiv.org/abs/2510.26151v1
- Date: Thu, 30 Oct 2025 05:12:29 GMT
- Title: MV-MLM: Bridging Multi-View Mammography and Language for Breast Cancer Diagnosis and Risk Prediction
- Authors: Shunjie-Fabian Zheng, Hyeonjun Lee, Thijs Kooi, Ali Diba,
- Abstract summary: Vision-Language Models (VLMs) offer a promising solution by enhancing robustness and data efficiency in medical imaging tasks. This paper introduces a novel Multi-View Mammography and Language Model for breast cancer classification and risk prediction.
- Score: 2.7165660672916787
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large annotated datasets are essential for training robust Computer-Aided Diagnosis (CAD) models for breast cancer detection or risk prediction. However, acquiring such datasets with fine-grained annotation is both costly and time-consuming. Vision-Language Models (VLMs), such as CLIP, which are pre-trained on large image-text pairs, offer a promising solution by enhancing robustness and data efficiency in medical imaging tasks. This paper introduces a novel Multi-View Mammography and Language Model for breast cancer classification and risk prediction, trained on a dataset of paired mammogram images and synthetic radiology reports. Our MV-MLM leverages multi-view supervision to learn rich representations from extensive radiology data by employing cross-modal self-supervision across image-text pairs. This includes multiple views and the corresponding pseudo-radiology reports. We propose a novel joint visual-textual learning strategy to enhance generalization and accuracy across different data types and tasks, to distinguish breast tissues or cancer characteristics (calcification, mass), and to utilize these patterns to understand mammography images and predict cancer risk. We evaluated our method on both private and publicly available datasets, demonstrating that the proposed model achieves state-of-the-art performance in three classification tasks: (1) malignancy classification, (2) subtype classification, and (3) image-based cancer risk prediction. Furthermore, the model exhibits strong data efficiency, outperforming existing fully supervised or VLM baselines while being trained on synthetic text reports, without the need for actual radiology reports.
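The abstract describes cross-modal self-supervision between multiple mammogram views and a paired report, but it does not give the loss. As a rough illustration only, the sketch below uses a generic CLIP-style symmetric contrastive loss and fuses the views of one exam by averaging their embeddings. The fusion rule, the temperature of 0.07, the function names, and the random toy embeddings are all assumptions made for this example, not details taken from the paper.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit length (left unchanged if its norm is zero)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def fuse_views(views):
    """Assumed fusion rule: average the unit-length view embeddings, then renormalize."""
    unit = [normalize(v) for v in views]
    mean = [sum(col) / len(unit) for col in zip(*unit)]
    return normalize(mean)


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        top = max(row)
        log_z = top + math.log(sum(math.exp(x - top) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def multiview_contrastive_loss(exam_views, text_embs, temperature=0.07):
    """Symmetric image-to-text and text-to-image contrastive loss over a batch.

    exam_views: list of exams, each a list of per-view embedding vectors.
    text_embs:  list of report embedding vectors, one per exam.
    """
    exams = [fuse_views(views) for views in exam_views]
    texts = [normalize(t) for t in text_embs]
    logits = [
        [sum(a * b for a, b in zip(e, t)) / temperature for t in texts]
        for e in exams
    ]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))


# Toy batch: 4 exams, 2 views each (for instance CC and MLO), 16-dim embeddings.
random.seed(0)
dim, batch, n_views = 16, 4, 2
texts = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(batch)]
# Views that nearly coincide with their report embedding (well aligned).
matched = [
    [[x + 0.01 * random.gauss(0, 1) for x in t] for _ in range(n_views)]
    for t in texts
]
# Views unrelated to any report (unaligned).
unrelated = [
    [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_views)]
    for _ in range(batch)
]

loss_matched = multiview_contrastive_loss(matched, texts)
loss_unrelated = multiview_contrastive_loss(unrelated, texts)
print(loss_matched, loss_unrelated)
```

In this toy setup the aligned batch should give a much lower loss than the unrelated one, which is the signal such an objective pushes the encoders toward. A real system would produce the embeddings with trained image and text encoders rather than random vectors.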
Related papers
- DRIMV_TSK: An Interpretable Surgical Evaluation Model for Incomplete Multi-View Rectal Cancer Data [26.149387171274956]
More data about rectal cancer can be collected with the development of technology.
With the development of artificial intelligence, its application in rectal cancer treatment is becoming possible.
arXiv Detail & Related papers (2025-06-21T02:38:45Z)
- Optimizing Breast Cancer Detection in Mammograms: A Comprehensive Study of Transfer Learning, Resolution Reduction, and Multi-View Classification [0.0]
Mammography, an X-ray-based imaging technique, remains central to the early detection of breast cancer.
Recent advances in artificial intelligence have enabled increasingly sophisticated computer-aided diagnostic methods.
Despite this progress, several critical questions remain unanswered.
arXiv Detail & Related papers (2025-03-25T11:51:21Z)
- MRGen: Segmentation Data Engine for Underrepresented MRI Modalities [59.61465292965639]
Training medical image segmentation models for rare yet clinically important imaging modalities is challenging due to the scarcity of annotated data.
This paper investigates leveraging generative models to synthesize data, for training segmentation models for underrepresented modalities.
We present MRGen, a data engine for controllable medical image synthesis conditioned on text prompts and segmentation masks.
arXiv Detail & Related papers (2024-12-04T16:34:22Z)
- Deep BI-RADS Network for Improved Cancer Detection from Mammograms [3.686808512438363]
We introduce a novel multi-modal approach that combines textual BI-RADS lesion descriptors with visual mammogram content.
Our method employs iterative attention layers to effectively fuse these different modalities.
Experiments on the CBIS-DDSM dataset demonstrate substantial improvements across all metrics.
arXiv Detail & Related papers (2024-11-16T21:32:51Z)
- 3D-CT-GPT: Generating 3D Radiology Reports through Integration of Large Vision-Language Models [51.855377054763345]
This paper introduces 3D-CT-GPT, a Visual Question Answering (VQA)-based medical visual language model for generating radiology reports from 3D CT scans.
Experiments on both public and private datasets demonstrate that 3D-CT-GPT significantly outperforms existing methods in terms of report accuracy and quality.
arXiv Detail & Related papers (2024-09-28T12:31:07Z)
- ViKL: A Mammography Interpretation Framework via Multimodal Aggregation of Visual-knowledge-linguistic Features [54.37042005469384]
We announce MVKL, the first multimodal mammography dataset encompassing multi-view images, detailed manifestations and reports.
Based on this dataset, we focus on the challenging task of unsupervised pretraining.
We propose ViKL, a framework that synergizes Visual, Knowledge, and Linguistic features.
arXiv Detail & Related papers (2024-09-24T05:01:23Z)
- Mammo-CLIP: A Vision Language Foundation Model to Enhance Data Efficiency and Robustness in Mammography [12.159236541184754]
Mammo-CLIP is the first VLM pre-trained on a substantial amount of screening mammogram-report pairs.
Experiments on two public datasets demonstrate strong performance in classifying and localizing various mammographic attributes.
arXiv Detail & Related papers (2024-05-20T08:27:39Z)
- Enhancing Representation in Radiography-Reports Foundation Model: A Granular Alignment Algorithm Using Masked Contrastive Learning [26.425784890859738]
MaCo is a masked contrastive chest X-ray foundation model.
It simultaneously achieves fine-grained image understanding and zero-shot learning for a variety of medical imaging tasks.
It is shown to be superior over 10 state-of-the-art approaches across tasks such as classification, segmentation, detection, and phrase grounding.
arXiv Detail & Related papers (2023-09-12T01:29:37Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
- Medical Image Captioning via Generative Pretrained Transformers [57.308920993032274]
We combine two language models, the Show-Attend-Tell and the GPT-3, to generate comprehensive and descriptive radiology records.
The proposed model is tested on two medical datasets, Open-I and MIMIC-CXR, and on the general-purpose MS-COCO.
arXiv Detail & Related papers (2022-09-28T10:27:10Z)
- Metastatic Cancer Outcome Prediction with Injective Multiple Instance Pooling [1.0965065178451103]
We process two public datasets to set up a benchmark cohort of 341 patients in total for studying outcome prediction of metastatic cancer.
We propose two injective multiple instance pooling functions that are better suited to outcome prediction.
Our results show that multiple instance learning with injective pooling functions can achieve state-of-the-art performance in the non-small-cell lung cancer CT and head and neck CT outcome prediction benchmarking tasks.
arXiv Detail & Related papers (2022-03-09T16:58:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.