Progressive Local Alignment for Medical Multimodal Pre-training
- URL: http://arxiv.org/abs/2502.18047v1
- Date: Tue, 25 Feb 2025 10:13:13 GMT
- Title: Progressive Local Alignment for Medical Multimodal Pre-training
- Authors: Huimin Yan, Xian Yang, Liang Bai, Jiye Liang
- Abstract summary: We propose a contrastive learning-based approach for local alignment to establish meaningful word-pixel relationships. PLAN effectively improves soft region recognition while suppressing noise interference. PLAN surpasses state-of-the-art methods in phrase grounding, image-text retrieval, object detection, and zero-shot classification.
- Score: 24.56496333066882
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Local alignment between medical images and text is essential for accurate diagnosis, though it remains challenging due to the absence of natural local pairings and the limitations of rigid region recognition methods. Traditional approaches rely on hard boundaries, which introduce uncertainty, whereas medical imaging demands flexible soft region recognition to handle irregular structures. To overcome these challenges, we propose the Progressive Local Alignment Network (PLAN), which introduces a novel contrastive learning-based approach for local alignment that establishes meaningful word-pixel relationships, together with a progressive learning strategy that iteratively refines these relationships, enhancing alignment precision and robustness. By combining these techniques, PLAN effectively improves soft region recognition while suppressing noise interference. Extensive experiments on multiple medical datasets demonstrate that PLAN surpasses state-of-the-art methods in phrase grounding, image-text retrieval, object detection, and zero-shot classification, setting a new benchmark for medical image-text alignment.
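The abstract does not spell out PLAN's loss, so the following is only a hedged illustration of what contrastive word-pixel alignment with soft regions can look like. Each word attends over all pixel features with a temperature-scaled softmax, giving a soft mask instead of a hard box, and an InfoNCE loss asks each word to match its own attended region better than the regions of the other words. The function names (`soft_region`, `word_region_infonce`), the temperature of 0.1, and the choice to contrast words within a single image are assumptions for illustration; the progressive refinement strategy is not modeled.

```python
import math
from typing import List, Tuple

# Hypothetical sketch of generic word-pixel contrastive alignment.
# This is NOT PLAN's published loss; names and defaults are illustrative.
Vector = List[float]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def _normalize(v: Vector) -> Vector:
    # L2-normalize so dot products become cosine similarities; guard zero vectors.
    norm = math.sqrt(_dot(v, v)) or 1.0
    return [x / norm for x in v]


def _softmax(scores: Vector, temperature: float) -> Vector:
    # Numerically stable softmax with temperature scaling.
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_region(word: Vector, pixels: List[Vector],
                temperature: float = 0.1) -> Tuple[Vector, Vector]:
    """Return (attention weights over pixels, attention-pooled region feature).

    The weights act as a soft mask: every pixel receives a nonzero share
    instead of being strictly inside or outside a hard bounding box.
    """
    w = _normalize(word)
    ps = [_normalize(p) for p in pixels]
    weights = _softmax([_dot(w, p) for p in ps], temperature)
    pooled = [sum(a * p[d] for a, p in zip(weights, ps)) for d in range(len(w))]
    return weights, pooled


def word_region_infonce(words: List[Vector], pixels: List[Vector],
                        temperature: float = 0.1) -> float:
    """Mean InfoNCE loss: word i should score highest against its own region i."""
    ws = [_normalize(w) for w in words]
    regions = [_normalize(soft_region(w, pixels, temperature)[1]) for w in words]
    loss = 0.0
    for i, w in enumerate(ws):
        logits = [_dot(w, r) / temperature for r in regions]
        top = max(logits)
        log_z = top + math.log(sum(math.exp(l - top) for l in logits))
        loss += log_z - logits[i]  # equals -log softmax(logits)[i]
    return loss / len(ws)
```

With two words whose embeddings point at two different pixel clusters, the loss is close to zero; with two identical words it equals log 2, because the soft regions cannot be told apart.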
Related papers
- RL4Med-DDPO: Reinforcement Learning for Controlled Guidance Towards Diverse Medical Image Generation using Vision-Language Foundation Models [0.7165255458140439]
Vision-Language Foundation Models (VLFM) have shown a tremendous increase in performance in terms of generating high-resolution, photorealistic natural images.
We propose a multi-stage architecture where a pre-trained VLFM provides a cursory semantic understanding, while a reinforcement learning algorithm refines the alignment through an iterative process.
We demonstrate the effectiveness of our method on a medical imaging skin dataset, where the generated images exhibit improved generation quality and better alignment with the prompt compared with fine-tuned Stable Diffusion.
(2025-03-20T01:51:05Z)
- Multimodal self-supervised learning for lesion localization [41.7046184109176]
A new method is introduced that takes full sentences from textual reports as the basic units for local semantic alignment.
This approach combines chest X-ray images with their corresponding textual reports, performing contrastive learning at both global and local levels.
(2024-01-03T03:33:48Z)
- Improving Multiple Sclerosis Lesion Segmentation Across Clinical Sites: A Federated Learning Approach with Noise-Resilient Training [75.40980802817349]
Deep learning models have shown promise for automatically segmenting MS lesions, but the scarcity of accurately annotated data hinders progress in this area.
We introduce a Decoupled Hard Label Correction (DHLC) strategy that considers the imbalanced distribution and fuzzy boundaries of MS lesions.
We also introduce a Centrally Enhanced Label Correction (CELC) strategy, which leverages the aggregated central model as a correction teacher for all sites.
(2023-08-31T00:36:10Z)
- Local Contrastive Learning for Medical Image Recognition [0.0]
Local Region Contrastive Learning (LRCLR) is a flexible fine-tuning framework that adds layers for significant image region selection and cross-modality interaction.
Our results on an external validation set of chest X-rays suggest that LRCLR identifies significant local image regions and provides meaningful interpretation against radiology text.
(2023-03-24T17:04:26Z)
- Joint segmentation and discontinuity-preserving deformable registration: Application to cardiac cine-MR images [74.99415008543276]
Most deep learning-based registration methods assume that the deformation fields are smooth and continuous everywhere in the image domain.
We propose a novel discontinuity-preserving image registration method to tackle this challenge, which ensures globally discontinuous and locally smooth deformation fields.
A co-attention block is proposed in the segmentation component of the network to learn the structural correlations in the input images.
We evaluate our method on the task of intra-subject-temporal image registration using large-scale cinematic cardiac magnetic resonance image sequences.
(2022-11-24T23:45:01Z)
- Image-Specific Information Suppression and Implicit Local Alignment for Text-based Person Search [61.24539128142504]
Text-based person search (TBPS) is a challenging task that aims to search pedestrian images with the same identity from an image gallery given a query text.
Most existing methods rely on explicitly generated local parts to model fine-grained correspondence between modalities.
We propose an efficient joint Multi-level Alignment Network (MANet) for TBPS, which can learn aligned image/text feature representations between modalities at multiple levels.
(2022-08-30T16:14:18Z)
- Cross-level Contrastive Learning and Consistency Constraint for Semi-supervised Medical Image Segmentation [46.678279106837294]
We propose a cross-level contrastive learning scheme to enhance representation capacity for local features in semi-supervised medical image segmentation.
With the help of the cross-level contrastive learning and consistency constraint, the unlabelled data can be effectively explored to improve segmentation performance.
(2022-02-08T15:12:11Z)
- A Deep Discontinuity-Preserving Image Registration Network [73.03885837923599]
Most deep learning-based registration methods assume that the desired deformation fields are globally smooth and continuous.
We propose a weakly-supervised Deep Discontinuity-preserving Image Registration network (DDIR) to obtain better registration performance and realistic deformation fields.
In registration experiments on cardiac magnetic resonance (MR) images, we demonstrate that our method achieves significant improvements in registration accuracy and predicts more realistic deformations.
(2021-07-09T13:35:59Z)
- Explaining Clinical Decision Support Systems in Medical Imaging using Cycle-Consistent Activation Maximization [112.2628296775395]
Clinical decision support using deep neural networks has become a topic of steadily growing interest.
However, clinicians are often hesitant to adopt the technology because its underlying decision-making process is considered opaque and difficult to comprehend.
We propose a novel decision explanation scheme based on CycleGAN activation maximization, which generates high-quality visualizations of classifier decisions even in smaller data sets.
(2020-10-09T14:39:27Z)
- Interpreting Medical Image Classifiers by Optimization Based Counterfactual Impact Analysis [2.512212190779389]
We present a model saliency mapping framework tailored to medical imaging.
We replace heuristic techniques with a strong neighborhood-conditioned inpainting approach, which avoids implausible artefacts.
(2020-04-03T14:59:08Z)