MADPOT: Medical Anomaly Detection with CLIP Adaptation and Partial Optimal Transport
- URL: http://arxiv.org/abs/2507.06733v1
- Date: Wed, 09 Jul 2025 10:45:52 GMT
- Title: MADPOT: Medical Anomaly Detection with CLIP Adaptation and Partial Optimal Transport
- Authors: Mahshid Shiri, Cigdem Beyan, Vittorio Murino
- Abstract summary: We propose a novel approach combining visual adapters and prompt learning with Partial Optimal Transport (POT) and contrastive learning (CL) to improve CLIP's adaptability to medical images. Our method achieves state-of-the-art results in few-shot, zero-shot, and cross-dataset scenarios without synthetic data or memory banks.
- Score: 14.023527193608142
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Medical anomaly detection (AD) is challenging due to diverse imaging modalities, anatomical variations, and limited labeled data. We propose a novel approach combining visual adapters and prompt learning with Partial Optimal Transport (POT) and contrastive learning (CL) to improve CLIP's adaptability to medical images, particularly for AD. Unlike standard prompt learning, which often yields a single representation, our method employs multiple prompts aligned with local features via POT to capture subtle abnormalities. CL further enforces intra-class cohesion and inter-class separation. Our method achieves state-of-the-art results in few-shot, zero-shot, and cross-dataset scenarios without synthetic data or memory banks. The code is available at https://github.com/mahshid1998/MADPOT.
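The abstract describes aligning multiple learned prompts with local visual features through Partial Optimal Transport. Below is a minimal sketch of that idea under our own assumptions: the dustbin-row approximation of partial mass, the entropic Sinkhorn iterations, and all shapes and hyperparameters are illustrative, not the authors' formulation.

```python
# Minimal sketch (not the authors' code): align K learned prompt embeddings
# with N local patch features via entropic OT, approximating *partial*
# transport with a dustbin row that absorbs unmatched patch mass.
import torch

def partial_ot_align(prompts, patches, mass=0.7, eps=0.05, iters=50):
    """prompts: (K, d), patches: (N, d). Transports `mass` of the patch
    distribution onto the prompts; the remainder goes to the dustbin."""
    prompts = torch.nn.functional.normalize(prompts, dim=-1)
    patches = torch.nn.functional.normalize(patches, dim=-1)
    cost = 1.0 - prompts @ patches.T                  # (K, N) cosine cost
    dustbin = torch.full((1, cost.shape[1]), cost.max().item())
    cost = torch.cat([cost, dustbin])                 # (K+1, N)
    Kp1, N = cost.shape
    a = torch.full((Kp1,), mass / (Kp1 - 1))          # prompt marginals
    a[-1] = 1.0 - mass                                # dustbin absorbs the rest
    b = torch.full((N,), 1.0 / N)                     # uniform patch marginal
    Kmat = torch.exp(-cost / eps)
    u = torch.ones(Kp1)
    for _ in range(iters):                            # Sinkhorn iterations
        v = b / (Kmat.T @ u)
        u = a / (Kmat @ v)
    plan = u[:, None] * Kmat * v[None, :]             # (K+1, N) transport plan
    return plan[:-1]                                  # drop the dustbin row

plan = partial_ot_align(torch.randn(4, 512), torch.randn(196, 512))
print(plan.shape, plan.sum().item())                  # torch.Size([4, 196]), ~0.7
```

The returned plan can be read as soft assignments of image patches to prompts, so patches carrying subtle abnormalities can be matched to a dedicated prompt rather than averaged into a single global representation.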
Related papers
- AF-CLIP: Zero-Shot Anomaly Detection via Anomaly-Focused CLIP Adaptation [8.252046294696585]
We propose AF-CLIP (Anomaly-Focused CLIP), which dramatically enhances CLIP's visual representations to focus on local defects. Our approach introduces a lightweight adapter that emphasizes anomaly-relevant patterns in visual features. The method is further extended to few-shot scenarios through extra memory banks.
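As a rough illustration of the lightweight-adapter idea, here is a generic residual bottleneck module of our own choosing, not AF-CLIP's actual design:

```python
# Hedged sketch: a small bottleneck MLP that re-weights frozen CLIP visual
# features toward anomaly-relevant patterns; dimensions are assumptions.
import torch.nn as nn

class LightweightAdapter(nn.Module):
    def __init__(self, dim=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)   # compress
        self.up = nn.Linear(bottleneck, dim)     # expand back
        self.act = nn.GELU()

    def forward(self, x):                        # x: (B, N, dim) patch tokens
        # residual connection preserves the pre-trained CLIP priors
        return x + self.up(self.act(self.down(x)))
```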
arXiv Detail & Related papers (2025-07-26T13:34:38Z) - MadCLIP: Few-shot Medical Anomaly Detection with CLIP [14.023527193608142]
An innovative few-shot anomaly detection approach is presented, leveraging the pre-trained CLIP model for medical data. A dual-branch design is proposed to separately capture normal and abnormal features through learnable adapters. To improve semantic alignment, learnable text prompts are employed to align with the visual features.
arXiv Detail & Related papers (2025-06-30T12:56:17Z) - MR-CLIP: Efficient Metadata-Guided Learning of MRI Contrast Representations [0.8430273876996414]
We propose MR-CLIP, a multimodal contrastive learning framework that aligns MR images with their DICOM metadata to learn contrast-aware representations. We demonstrate its effectiveness in cross-modal retrieval and contrast classification, highlighting its scalability and potential for further clinical applications.
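A generic sketch of the kind of image-metadata contrastive alignment described, using a standard CLIP-style InfoNCE objective (our assumption, not MR-CLIP's code):

```python
# Symmetric InfoNCE over a batch of matched (MR image, DICOM metadata) pairs.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(img_emb, meta_emb, temperature=0.07):
    img = F.normalize(img_emb, dim=-1)     # (B, d)
    meta = F.normalize(meta_emb, dim=-1)   # (B, d)
    logits = img @ meta.T / temperature    # (B, B) pairwise similarities
    targets = torch.arange(img.shape[0])   # matched pairs sit on the diagonal
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2
```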
arXiv Detail & Related papers (2025-06-23T13:27:31Z) - Crane: Context-Guided Prompt Learning and Attention Refinement for Zero-Shot Anomaly Detections [50.343419243749054]
Anomaly Detection (AD) involves identifying deviations from normal data distributions. We propose a novel approach that conditions the prompts of the text encoder on image context extracted from the vision encoder. Our method achieves state-of-the-art performance, with improvements of 2% to 29% across different metrics on 14 datasets.
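One plausible reading of conditioning text prompts on image context, sketched under our own assumptions (the projector and shapes are hypothetical, not Crane's API):

```python
# Shared learnable prompt tokens plus a per-image offset projected from the
# global visual feature, so the text prompts adapt to image context.
import torch
import torch.nn as nn

class ContextPrompt(nn.Module):
    def __init__(self, n_tokens=8, dim=512):
        super().__init__()
        self.base = nn.Parameter(torch.randn(n_tokens, dim) * 0.02)
        self.proj = nn.Linear(dim, dim)    # image context -> prompt offset

    def forward(self, img_feat):           # img_feat: (B, dim)
        offset = self.proj(img_feat)       # (B, dim)
        return self.base + offset[:, None, :]   # (B, n_tokens, dim)
```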
arXiv Detail & Related papers (2025-04-15T10:42:25Z) - UniCrossAdapter: Multimodal Adaptation of CLIP for Radiology Report Generation [31.72930277939111]
We propose to transfer representations from CLIP, a large-scale pre-trained vision-language model, to better capture cross-modal semantics between images and texts. To enable efficient adaptation, we introduce UniCrossAdapter, lightweight adapter modules that are incorporated into CLIP and fine-tuned on the target task.
arXiv Detail & Related papers (2025-03-20T08:28:53Z) - Mitigating Hallucinations of Large Language Models in Medical Information Extraction via Contrastive Decoding [92.32881381717594]
We introduce ALternate Contrastive Decoding (ALCD) to solve hallucination issues in medical information extraction tasks.
ALCD demonstrates significant improvements in resolving hallucination issues compared to conventional decoding methods.
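A generic contrastive-decoding step in the spirit of ALCD; the alternating schedule and model pairing are simplified away here, so this is only an illustration of the core logit adjustment:

```python
# Penalize next-token logits from a hallucination-prone model against a
# contrast model's logits, then pick the adjusted argmax.
import torch

def contrastive_decode_step(expert_logits, contrast_logits, alpha=0.5):
    """Both inputs: (vocab,) next-token logits; returns the chosen token id."""
    adjusted = (1 + alpha) * expert_logits - alpha * contrast_logits
    return int(torch.argmax(adjusted))
```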
arXiv Detail & Related papers (2024-10-21T07:19:19Z) - Single-Shared Network with Prior-Inspired Loss for Parameter-Efficient Multi-Modal Imaging Skin Lesion Classification [6.195015783344803]
We introduce a multi-modal approach that efficiently integrates multi-scale clinical and dermoscopy features within a single network.
Our method surpasses currently advanced methods in both accuracy and parameter efficiency.
arXiv Detail & Related papers (2024-03-28T08:00:14Z) - Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
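A hedged sketch of the multi-level comparison idea: per-level patch-text similarities fused into one anomaly map. The shapes and the averaging rule are our assumptions, not the paper's exact framework:

```python
# Compare adapted patch features from several encoder stages against
# normal/abnormal text embeddings; average the per-level anomaly maps.
import torch
import torch.nn.functional as F

def anomaly_map(level_feats, text_normal, text_abnormal):
    """level_feats: list of (N, d) patch features; text_*: (d,) embeddings."""
    texts = F.normalize(torch.stack([text_normal, text_abnormal]), dim=-1)
    maps = []
    for feats in level_feats:
        sims = F.normalize(feats, dim=-1) @ texts.T      # (N, 2)
        maps.append(F.softmax(sims / 0.07, dim=-1)[:, 1])  # P(abnormal) per patch
    return torch.stack(maps).mean(0)                     # (N,) fused map
```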
arXiv Detail & Related papers (2024-03-19T09:28:19Z) - Disruptive Autoencoders: Leveraging Low-level features for 3D Medical Image Pre-training [51.16994853817024]
This work focuses on designing an effective pre-training framework for 3D radiology images.
We introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations.
The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance.
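A minimal sketch of the disruption step as we read it, with Gaussian noise standing in as one plausible low-level perturbation plus random 3D patch masking; the patch size, ratio, and noise choice are assumptions:

```python
# Disrupt a 3D volume by masking random patches and adding low-level noise;
# a network would then be trained to reconstruct the clean volume.
import torch

def disrupt(volume, mask_ratio=0.4, patch=8, noise_std=0.1):
    """volume: (C, D, H, W) with D, H, W divisible by `patch`."""
    x = volume + noise_std * torch.randn_like(volume)   # low-level perturbation
    d, h, w = (s // patch for s in volume.shape[1:])
    keep = torch.rand(d, h, w) > mask_ratio             # per-patch keep mask
    mask = keep.repeat_interleave(patch, 0)
    mask = mask.repeat_interleave(patch, 1).repeat_interleave(patch, 2)
    return x * mask.float()                             # target: original volume
```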
arXiv Detail & Related papers (2023-07-31T17:59:42Z) - Learnable Weight Initialization for Volumetric Medical Image Segmentation [66.3030435676252]
We propose a learnable weight-based hybrid medical image segmentation approach.
Our approach is easy to integrate into any hybrid model and requires no external training data.
Experiments on multi-organ and lung cancer segmentation tasks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-15T17:55:05Z) - Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
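An illustrative text-to-image retrieval evaluation of the sort used as the benchmark here, measuring generic Recall@k over cosine similarities (not the paper's evaluation code):

```python
# Rank images by cosine similarity to each report embedding; count how often
# the matched image appears in the top-k retrieved results.
import torch
import torch.nn.functional as F

def recall_at_k(text_emb, img_emb, k=5):
    """text_emb, img_emb: (N, d); row i of each is a matched report/X-ray pair."""
    sims = F.normalize(text_emb, dim=-1) @ F.normalize(img_emb, dim=-1).T
    topk = sims.topk(k, dim=-1).indices                 # (N, k) retrieved images
    hits = (topk == torch.arange(len(sims))[:, None]).any(-1)
    return hits.float().mean().item()
```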
arXiv Detail & Related papers (2023-03-30T18:20:00Z) - Generating and Weighting Semantically Consistent Sample Pairs for Ultrasound Contrastive Learning [10.631361618707214]
Well-annotated medical datasets enable deep neural networks (DNNs) to extract lesion-related features effectively.
Pre-training on ImageNet is a common practice to gain better generalization when the amount of data is limited.
In this work, we pre-train on the ultrasound (US) domain instead of ImageNet to reduce the domain gap in medical US applications.
arXiv Detail & Related papers (2022-12-08T06:24:08Z)