Towards Continual Visual Anomaly Detection in the Medical Domain
- URL: http://arxiv.org/abs/2508.18013v1
- Date: Mon, 25 Aug 2025 13:28:15 GMT
- Title: Towards Continual Visual Anomaly Detection in the Medical Domain
- Authors: Manuel Barusco, Francesco Borsatti, Nicola Beda, Davide Dalle Pezze, Gian Antonio Susto
- Abstract summary: Visual Anomaly Detection (VAD) seeks to identify abnormal images and precisely localize the corresponding anomalous regions. Continual Learning (CL) provides a framework to incrementally adapt models while preserving previously acquired knowledge. This study explores for the first time the application of VAD models in a CL scenario for the medical field.
- Score: 11.262875405161417
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual Anomaly Detection (VAD) seeks to identify abnormal images and precisely localize the corresponding anomalous regions, relying solely on normal data during training. This approach has proven essential in domains such as manufacturing and, more recently, in the medical field, where accurate and explainable detection is critical. Despite its importance, the impact of evolving input data distributions over time has received limited attention, even though such changes can significantly degrade model performance. In particular, given the dynamic and evolving nature of medical imaging data, Continual Learning (CL) provides a natural and effective framework to incrementally adapt models while preserving previously acquired knowledge. This study explores for the first time the application of VAD models in a CL scenario for the medical field. In this work, we utilize a CL version of the well-established PatchCore model, called PatchCoreCL, and evaluate its performance using BMAD, a real-world medical imaging dataset with both image-level and pixel-level annotations. Our results demonstrate that PatchCoreCL is an effective solution, achieving performance comparable to the task-specific models, with a forgetting value of less than 1%, highlighting the feasibility and potential of CL for adaptive VAD in medical imaging.
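The abstract reports a forgetting value but does not say how it is computed. As a rough illustration only, the sketch below uses one common definition from the continual learning literature: for each earlier task, take the best score reached before the final step minus the score after the final step, then average. This definition, the function name `average_forgetting`, and the numbers are assumptions for illustration; the paper may use a different formula.

```python
def average_forgetting(scores):
    """Average forgetting over all tasks except the last one.

    scores[i][j] is the metric (e.g. an AUROC-like value) on task j,
    measured after training on tasks 0..i, so row i has i + 1 entries.
    This is an assumed, commonly used definition, not the paper's.
    """
    num_tasks = len(scores)
    if num_tasks < 2:
        return 0.0  # nothing can be forgotten with a single task
    final = scores[-1]
    drops = []
    for j in range(num_tasks - 1):
        # Best score on task j at any step before the final one.
        best_earlier = max(scores[i][j] for i in range(j, num_tasks - 1))
        drops.append(best_earlier - final[j])
    return sum(drops) / len(drops)


# Hypothetical scores for three sequentially learned tasks.
scores = [
    [0.90],
    [0.89, 0.80],
    [0.88, 0.79, 0.70],
]
print(average_forgetting(scores))  # approximately 0.015
```

Under this definition, a value below 0.01 would mean that scores on earlier tasks drop by less than one point on average after the model has learned all tasks.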
Related papers
- NEURAL: Attention-Guided Pruning for Unified Multimodal Resource-Constrained Clinical Evaluation [6.253771639590563]
NEURAL is a novel framework that addresses the storage and transmission challenges of medical imaging data. Our approach repurposes cross-attention scores between the image and its radiological report to structurally prune chest X-rays. NEURAL achieves a 93.4-97.7% reduction in image data size while maintaining a high diagnostic performance of 0.88-0.95 AUC.
arXiv Detail & Related papers (2025-08-13T11:08:09Z)
- PRETI: Patient-Aware Retinal Foundation Model via Metadata-Guided Representation Learning [3.771396977579353]
PRETI is a retinal foundation model that integrates metadata-aware learning with robust self-supervised representation learning. We construct patient-level data pairs, associating images from the same individual to improve robustness against non-clinical variations. Experiments demonstrate PRETI achieves state-of-the-art results across diverse diseases and biomarker predictions.
arXiv Detail & Related papers (2025-05-18T04:59:03Z)
- Crane: Context-Guided Prompt Learning and Attention Refinement for Zero-Shot Anomaly Detection [50.343419243749054]
Anomaly detection is critical in fields such as medical diagnostics and industrial defect detection. CLIP's coarse-grained image-text alignment limits localization and detection performance for fine-grained anomalies. Crane improves the state-of-the-art ZSAD from 2% to 28%, at both image and pixel levels, while remaining competitive in inference speed.
arXiv Detail & Related papers (2025-04-15T10:42:25Z)
- Text-guided Foundation Model Adaptation for Long-Tailed Medical Image Classification [4.6651139122498]
In medical contexts, the imbalanced data distribution in long-tailed datasets, due to scarce labels for rare diseases, greatly impairs the diagnostic accuracy of deep learning models.
Recent multimodal text-image supervised foundation models offer new solutions to data scarcity through effective representation learning.
We propose a novel Text-guided Foundation model Adaptation for Long-Tailed medical image classification (TFA-LT).
Our method achieves an accuracy improvement of up to 27.1%, highlighting the substantial potential of foundation model adaptation in this area.
arXiv Detail & Related papers (2024-08-27T04:18:18Z)
- Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z)
- Enhancing and Adapting in the Clinic: Source-free Unsupervised Domain Adaptation for Medical Image Enhancement [34.11633495477596]
We propose an algorithm for source-free unsupervised domain adaptive medical image enhancement (SAME).
A structure-preserving enhancement network is first constructed to learn a robust source model from synthesized training data.
A pseudo-label picker is developed to boost the knowledge distillation of enhancement tasks.
arXiv Detail & Related papers (2023-12-03T10:01:59Z)
- On the Out of Distribution Robustness of Foundation Models in Medical Image Segmentation [47.95611203419802]
Foundation models for vision and language, pre-trained on extensive sets of natural image and text data, have emerged as a promising approach.
We compare the generalization performance to unseen domains of various pre-trained models after being fine-tuned on the same in-distribution dataset.
We further developed a new Bayesian uncertainty estimation for frozen models and used them as an indicator to characterize the model's performance on out-of-distribution data.
arXiv Detail & Related papers (2023-11-18T14:52:10Z)
- ArSDM: Colonoscopy Images Synthesis with Adaptive Refinement Semantic Diffusion Models [69.9178140563928]
Colonoscopy analysis is essential for assisting clinical diagnosis and treatment.
The scarcity of annotated data limits the effectiveness and generalization of existing methods.
We propose an Adaptive Refinement Semantic Diffusion Model (ArSDM) to generate colonoscopy images that benefit the downstream tasks.
arXiv Detail & Related papers (2023-09-03T07:55:46Z)
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
- Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing [53.89917396428747]
Self-supervised learning in vision-language processing exploits semantic alignment between imaging and text modalities.
We explicitly account for prior images and reports when available during both training and fine-tuning.
Our approach, named BioViL-T, uses a CNN-Transformer hybrid multi-image encoder trained jointly with a text model.
arXiv Detail & Related papers (2023-01-11T16:35:33Z)
- Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.