Related papers: MRAD: Zero-Shot Anomaly Detection with Memory-Driven Retrieval

URL: http://arxiv.org/abs/2602.00522v1
Date: Sat, 31 Jan 2026 05:30:57 GMT
Title: MRAD: Zero-Shot Anomaly Detection with Memory-Driven Retrieval
Authors: Chaoran Xu, Chengkan Lv, Qiyu Chen, Feng Zhang, Zhengtao Zhang
Abstract summary: The Memory-Retrieval Anomaly Detection method (MRAD) is a unified framework that replaces parametric fitting with direct memory retrieval. Across 16 industrial and medical datasets, the MRAD framework consistently demonstrates superior performance.
Score: 16.654541753670348
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Zero-shot anomaly detection (ZSAD) often leverages pretrained vision or vision-language models, but many existing methods use prompt learning or complex modeling to fit the data distribution, resulting in high training or inference cost and limited cross-domain stability. To address these limitations, we propose Memory-Retrieval Anomaly Detection method (MRAD), a unified framework that replaces parametric fitting with a direct memory retrieval. The train-free base model, MRAD-TF, freezes the CLIP image encoder and constructs a two-level memory bank (image-level and pixel-level) from auxiliary data, where feature-label pairs are explicitly stored as keys and values. During inference, anomaly scores are obtained directly by similarity retrieval over the memory bank. Based on the MRAD-TF, we further propose two lightweight variants as enhancements: (i) MRAD-FT fine-tunes the retrieval metric with two linear layers to enhance the discriminability between normal and anomaly; (ii) MRAD-CLIP injects the normal and anomalous region priors from the MRAD-FT as dynamic biases into CLIP's learnable text prompts, strengthening generalization to unseen categories. Across 16 industrial and medical datasets, the MRAD framework consistently demonstrates superior performance in anomaly classification and segmentation, under both train-free and training-based settings. Our work shows that fully leveraging the empirical distribution of raw data, rather than relying only on model fitting, can achieve stronger anomaly detection performance. The code will be publicly released at https://github.com/CROVO1026/MRAD.
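
The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes features have already been extracted (random vectors stand in for frozen CLIP image and patch features), and it assumes a softmax-weighted average of stored labels as the scoring rule, since the abstract does not specify one. The names build_memory, retrieve_scores, make_synthetic_bank and the temperature value are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)

def build_memory(features, labels):
    # Keys are normalized features; values are their labels (0 normal, 1 anomalous).
    keys = l2_normalize(np.asarray(features, dtype=np.float64))
    values = np.asarray(labels, dtype=np.float64)
    return keys, values

def retrieve_scores(queries, keys, values, temperature=0.07):
    # Assumed scoring rule: softmax over cosine similarities, then a weighted
    # average of the stored labels. The temperature is an illustrative choice.
    q = l2_normalize(np.asarray(queries, dtype=np.float64))
    logits = (q @ keys.T) / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ values

def make_synthetic_bank(rng, normal_dir, anomaly_dir, n=20, noise=0.05):
    # Stand-in for auxiliary data: noisy copies of two prototype directions.
    dim = normal_dir.shape[0]
    feats = np.vstack([
        normal_dir + noise * rng.normal(size=(n, dim)),
        anomaly_dir + noise * rng.normal(size=(n, dim)),
    ])
    labels = np.array([0] * n + [1] * n)
    return build_memory(feats, labels)

rng = np.random.default_rng(0)
dim = 8
normal_dir = np.eye(dim)[0]
anomaly_dir = np.eye(dim)[1]

# Two-level memory: one bank for global image features, one for patch features.
img_keys, img_values = make_synthetic_bank(rng, normal_dir, anomaly_dir)
pix_keys, pix_values = make_synthetic_bank(rng, normal_dir, anomaly_dir)

# Image-level score for one normal-like and one anomaly-like query.
image_scores = retrieve_scores(np.vstack([normal_dir, anomaly_dir]),
                               img_keys, img_values)

# Pixel-level scores for a 2x2 grid of patches; one patch is anomaly-like.
patches = np.vstack([normal_dir, normal_dir, anomaly_dir, normal_dir])
anomaly_map = retrieve_scores(patches, pix_keys, pix_values).reshape(2, 2)

print(image_scores)
print(anomaly_map)
```

In this toy setup no parameters are fitted: the score of a query depends only on which stored feature-label pairs it resembles, which is the train-free idea the abstract attributes to MRAD-TF. The fine-tuned variants (MRAD-FT, MRAD-CLIP) are not covered by the sketch.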

Related papers

Spatial Autoregressive Modeling of DINOv3 Embeddings for Unsupervised Anomaly Detection [15.896078006029475]
DINO models provide rich patch-level representations that have recently enabled strong performance in unsupervised anomaly detection (UAD). Most existing methods extract patch embeddings from "normal" images and model them independently, ignoring spatial and neighborhood relationships between patches. We propose a framework that explicitly models spatial and contextual dependencies between patch embeddings using a 2D autoregressive (AR) model.
arXiv Detail & Related papers (2026-03-03T13:30:33Z)
Is Training Necessary for Anomaly Detection? [12.22745989422548]
Current anomaly detection methods rely on training encoder-decoder models to reconstruct anomalies. We propose Retrieval-based Anomaly Detection (RAD). RAD is a training-free approach that stores anomaly-free features in a memory and detects anomalies through multi-level retrieval.
arXiv Detail & Related papers (2026-01-30T09:40:42Z)
Source-Free Object Detection with Detection Transformer [59.33653163035064]
Source-Free Object Detection (SFOD) enables knowledge transfer from a source domain to an unsupervised target domain for object detection without access to source data. Most existing SFOD approaches are either confined to conventional object detection (OD) models like Faster R-CNN or designed as general solutions without tailored adaptations for novel OD architectures, especially Detection Transformer (DETR). In this paper, we introduce Feature Reweighting ANd Contrastive Learning NetworK (FRANCK), a novel SFOD framework specifically designed to perform query-centric feature enhancement for DETRs.
arXiv Detail & Related papers (2025-10-13T07:35:04Z)
Hierarchical Self-Supervised Representation Learning for Depression Detection from Speech [51.14752758616364]
Speech-based depression detection (SDD) is a promising, non-invasive alternative to traditional clinical assessments. We propose HAREN-CTC, a novel architecture that integrates multi-layer SSL features using cross-attention within a multitask learning framework. The model achieves state-of-the-art macro F1-scores of 0.81 on DAIC-WOZ and 0.82 on MODMA, outperforming prior methods across both evaluation scenarios.
arXiv Detail & Related papers (2025-10-05T09:32:12Z)
AHDMIL: Asymmetric Hierarchical Distillation Multi-Instance Learning for Fast and Accurate Whole-Slide Image Classification [51.525891360380285]
AHDMIL is an Asymmetric Hierarchical Distillation Multi-Instance Learning framework. It eliminates irrelevant patches through a two-step training process. It consistently outperforms previous state-of-the-art methods in both classification performance and inference speed.
arXiv Detail & Related papers (2025-08-07T07:47:16Z)
MadCLIP: Few-shot Medical Anomaly Detection with CLIP [14.023527193608142]
An innovative few-shot anomaly detection approach is presented, leveraging the pre-trained CLIP model for medical data. A dual-branch design is proposed to separately capture normal and abnormal features through learnable adapters. To improve semantic alignment, learnable text prompts are employed to link visual features.
arXiv Detail & Related papers (2025-06-30T12:56:17Z)
Crane: Context-Guided Prompt Learning and Attention Refinement for Zero-Shot Anomaly Detection [50.343419243749054]
Anomaly detection is critical in fields such as medical diagnostics and industrial defect detection. CLIP's coarse-grained image-text alignment limits localization and detection performance for fine-grained anomalies. Crane improves the state-of-the-art ZSAD from 2% to 28%, at both image and pixel levels, while remaining competitive in inference speed.
arXiv Detail & Related papers (2025-04-15T10:42:25Z)
DMAD: Dual Memory Bank for Real-World Anomaly Detection [90.97573828481832]
We propose a new framework named Dual Memory bank enhanced representation learning for Anomaly Detection (DMAD). DMAD employs a dual memory bank to calculate feature distance and feature attention between normal and abnormal patterns. We evaluate DMAD on the MVTec-AD and VisA datasets.
arXiv Detail & Related papers (2024-03-19T02:16:32Z)
Multi-level Memory-augmented Appearance-Motion Correspondence Framework for Video Anomaly Detection [1.9511777443446219]
We propose a multi-level memory-augmented appearance-motion correspondence framework. The latent correspondence between appearance and motion is explored via appearance-motion semantics alignment and semantics replacement training. Our framework outperforms the state-of-the-art methods, achieving AUCs of 99.6%, 93.8%, and 76.3% on UCSD Ped2, CUHK Avenue, and ShanghaiTech datasets.
arXiv Detail & Related papers (2023-03-09T08:43:06Z)
Dual Memory Units with Uncertainty Regulation for Weakly Supervised Video Anomaly Detection [15.991784541576788]
Existing approaches, both video and segment-level label oriented, mainly focus on extracting representations for anomaly data. We propose an Uncertainty Regulated Dual Memory Units (UR-DMU) model to learn both the representations of normal data and discriminative features of abnormal data. Our method outperforms the state-of-the-art methods by a sizable margin.
arXiv Detail & Related papers (2023-02-10T10:39:40Z)
Adaptive Memory Networks with Self-supervised Learning for Unsupervised Anomaly Detection [54.76993389109327]
Unsupervised anomaly detection aims to build models to detect unseen anomalies by only training on the normal data. We propose a novel approach called Adaptive Memory Network with Self-supervised Learning (AMSL) to address these challenges. AMSL incorporates a self-supervised learning module to learn general normal patterns and an adaptive memory fusion module to learn rich feature representations.
arXiv Detail & Related papers (2022-01-03T03:40:21Z)
Discriminative-Generative Dual Memory Video Anomaly Detection [81.09977516403411]
Recently, people tried to use a few anomalies for video anomaly detection (VAD) instead of only normal data during the training process. We propose a DiscRiminative-gEnerative duAl Memory (DREAM) anomaly detection model to take advantage of a few anomalies and solve data imbalance.
arXiv Detail & Related papers (2021-04-29T15:49:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.