GLAM: Geometry-Guided Local Alignment for Multi-View VLP in Mammography
- URL: http://arxiv.org/abs/2509.10344v1
- Date: Fri, 12 Sep 2025 15:33:18 GMT
- Title: GLAM: Geometry-Guided Local Alignment for Multi-View VLP in Mammography
- Authors: Yuexi Du, Lihui Chen, Nicha C. Dvornek
- Abstract summary: We propose GLAM: Global and Local Alignment for Multi-view mammography for VLM pretraining using geometry guidance. Our model learns local cross-view alignments and fine-grained local features through joint global and local, visual-visual, and visual-language contrastive learning.
- Score: 5.308584108793016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mammography screening is an essential tool for early detection of breast cancer. The speed and accuracy of mammography interpretation have the potential to be improved with deep learning methods. However, the development of a foundation visual language model (VLM) is hindered by limited data and domain differences between natural and medical images. Existing mammography VLMs, adapted from natural images, often ignore domain-specific characteristics, such as multi-view relationships in mammography. Unlike radiologists who analyze both views together to process ipsilateral correspondence, current methods treat them as independent images or do not properly model the multi-view correspondence learning, losing critical geometric context and resulting in suboptimal prediction. We propose GLAM: Global and Local Alignment for Multi-view mammography for VLM pretraining using geometry guidance. By leveraging the prior knowledge about the multi-view imaging process of mammograms, our model learns local cross-view alignments and fine-grained local features through joint global and local, visual-visual, and visual-language contrastive learning. Pretrained on EMBED [14], one of the largest open mammography datasets, our model outperforms baselines across multiple datasets under different settings.
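The abstract's "visual-visual and visual-language contrastive learning" can be illustrated with a generic symmetric InfoNCE loss of the kind used in CLIP-style pretraining. This is a minimal sketch under stated assumptions, not the authors' implementation: it covers only a global alignment term and omits the geometry-guided local alignment entirely. The function names, the toy 2-D embeddings, the CC/MLO view labels, and the temperature of 0.07 are all hypothetical choices made for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.07):
    """Mean cross-entropy of matching anchors[i] to positives[i].

    Every other row of `positives` acts as a negative for anchors[i].
    The temperature value is an assumption, not taken from the paper.
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, a_i in enumerate(a):
        logits = [sum(x * y for x, y in zip(a_i, p_j)) / temperature for p_j in p]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(a)


def symmetric_contrastive_loss(x, y, temperature=0.07):
    """Average the x-to-y and y-to-x InfoNCE terms."""
    return 0.5 * (info_nce(x, y, temperature) + info_nce(y, x, temperature))


# Hypothetical toy embeddings: row i of each list belongs to the same study.
cc_view = [[1.0, 0.0], [0.0, 1.0]]    # one mammographic view
mlo_view = [[0.9, 0.1], [0.1, 0.9]]   # the other view of the same breast
report = [[1.0, 0.2], [0.2, 1.0]]     # text embedding of the paired report

visual_visual = symmetric_contrastive_loss(cc_view, mlo_view)
visual_language = symmetric_contrastive_loss(cc_view, report)
joint_loss = visual_visual + visual_language  # equal weighting is an assumption

# Swapping the pairing should raise the loss, since matches are now wrong.
mismatched = symmetric_contrastive_loss(cc_view, mlo_view[::-1])
```

In this sketch the loss is small when paired rows point in similar directions and grows when the pairing is shuffled, which is the behavior a contrastive objective relies on to pull corresponding views and reports together.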
Related papers
- Optimizing Breast Cancer Detection in Mammograms: A Comprehensive Study of Transfer Learning, Resolution Reduction, and Multi-View Classification [0.0]
Mammography, an X-ray-based imaging technique, plays a crucial role in the early detection of breast cancer. Computer-aided detection and diagnostic methods have been proposed, increasingly leveraging advancements in artificial intelligence and machine learning. In this paper, we evaluate and compare the effectiveness of single-view and multi-view mammogram classification techniques.
arXiv Detail & Related papers (2025-03-25T11:51:21Z)
- From Traditional to Deep Learning Approaches in Whole Slide Image Registration: A Methodological Review [7.441179174680556]
Whole slide image (WSI) registration is an essential task for analysing the tumour microenvironment (TME) in histopathology. It involves the alignment of spatial information between WSIs of the same section or serial sections of a tissue sample. The goal is to identify neighbouring nuclei along the Z-axis for creating a 3D image or identifying subclasses of cells in the TME.
arXiv Detail & Related papers (2025-02-26T13:24:16Z)
- Multi-View and Multi-Scale Alignment for Contrastive Language-Image Pre-training in Mammography [4.004641316826348]
We propose one of the first adaptations of the full CLIP model to mammography. We first develop a specialized supervision framework for mammography that leverages its multi-view nature. Lastly, we incorporate a parameter-efficient fine-tuning approach for large language models pre-trained with medical knowledge to address data limitations.
arXiv Detail & Related papers (2024-09-26T17:56:59Z)
- ViKL: A Mammography Interpretation Framework via Multimodal Aggregation of Visual-knowledge-linguistic Features [54.37042005469384]
We announce MVKL, the first multimodal mammography dataset encompassing multi-view images, detailed manifestations and reports.
Based on this dataset, we focus on the challenging task of unsupervised pretraining.
We propose ViKL, a framework that synergizes Visual, Knowledge, and Linguistic features.
arXiv Detail & Related papers (2024-09-24T05:01:23Z)
- Unlocking the Power of Spatial and Temporal Information in Medical Multimodal Pre-training [99.2891802841936]
We introduce the Med-ST framework for fine-grained spatial and temporal modeling.
For spatial modeling, Med-ST employs the Mixture of View Expert (MoVE) architecture to integrate different visual features from both frontal and lateral views.
For temporal modeling, we propose a novel cross-modal bidirectional cycle consistency objective by forward mapping classification (FMC) and reverse mapping regression (RMR).
arXiv Detail & Related papers (2024-05-30T03:15:09Z)
- Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z)
- MV-Swin-T: Mammogram Classification with Multi-view Swin Transformer [0.257133335028485]
We propose an innovative multi-view network based on transformers to address challenges in mammographic image classification.
Our approach introduces a novel shifted window-based dynamic attention block, facilitating the effective integration of multi-view information.
arXiv Detail & Related papers (2024-02-26T04:41:04Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- Domain Generalization for Mammographic Image Analysis with Contrastive Learning [62.25104935889111]
The training of an efficacious deep learning model requires large data with diverse styles and qualities.
A novel contrastive learning scheme is developed to equip deep learning models with better style generalization capability.
The proposed method has been evaluated extensively and rigorously with mammograms from various vendor style domains and several public datasets.
arXiv Detail & Related papers (2023-04-20T11:40:21Z)
- Domain Generalization for Mammography Detection via Multi-style and Multi-view Contrastive Learning [47.30824944649112]
A new contrastive learning scheme is developed to improve the generalization of deep learning models to various vendors with limited resources.
The backbone network is trained with a multi-style and multi-view unsupervised self-learning scheme for the embedding of invariant features to various vendor-styles.
The experimental results suggest that our approach can effectively improve detection performance on both seen and unseen domains.
arXiv Detail & Related papers (2021-11-21T14:29:50Z)
- Medical Image Harmonization Using Deep Learning Based Canonical Mapping: Toward Robust and Generalizable Learning in Imaging [4.396671464565882]
We propose a new paradigm in which data from a diverse range of acquisition conditions are "harmonized" to a common reference domain.
We test this approach on two example problems, namely MRI-based brain age prediction and classification of schizophrenia.
arXiv Detail & Related papers (2020-10-11T22:01:37Z)
- Dual Convolutional Neural Networks for Breast Mass Segmentation and Diagnosis in Mammography [18.979126709943085]
We introduce a novel deep learning framework for mammogram image processing, which computes mass segmentation and simultaneously predicts diagnosis results.
Our method is constructed in a dual-path architecture that solves the mapping in a dual-problem manner.
Experimental results show that DualCoreNet achieves the best mammography segmentation and classification simultaneously, outperforming recent state-of-the-art models.
arXiv Detail & Related papers (2020-08-07T02:23:36Z)