A SAM-guided Two-stream Lightweight Model for Anomaly Detection
- URL: http://arxiv.org/abs/2402.19145v1
- Date: Thu, 29 Feb 2024 13:29:10 GMT
- Title: A SAM-guided Two-stream Lightweight Model for Anomaly Detection
- Authors: Chenghao Li, Lei Qi, Xin Geng
- Abstract summary: We propose a SAM-guided Two-stream Lightweight Model for unsupervised anomaly detection (STLM).
Our experiments on the MVTec AD benchmark show that STLM, with about 16M parameters and an inference time of 20ms, competes effectively with state-of-the-art methods.
- Score: 50.28310943263051
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In industrial anomaly detection, model efficiency and mobile-friendliness
become the primary concerns in real-world applications. Simultaneously, the
impressive generalization capabilities of Segment Anything (SAM) have garnered
broad academic attention, making it an ideal choice for localizing unseen
anomalies and diverse real-world patterns. In this paper, considering these two
critical factors, we propose a SAM-guided Two-stream Lightweight Model for
unsupervised anomaly detection (STLM) that not only aligns with the two
practical application requirements but also harnesses the robust generalization
capabilities of SAM. We employ two lightweight image encoders, i.e., our
two-stream lightweight module, guided by SAM's knowledge. To be specific, one
stream is trained to generate discriminative and general feature
representations in both normal and anomalous regions, while the other stream
reconstructs the same images without anomalies, which effectively enhances the
differentiation of two-stream representations when facing anomalous regions.
Furthermore, we employ a shared mask decoder and a feature aggregation module
to generate anomaly maps. Our experiments on the MVTec AD benchmark show
that STLM, with about 16M parameters and an inference time of 20ms,
competes effectively with state-of-the-art methods, achieving 98.26%
pixel-level AUC and 94.92% PRO. We further experiment on more
difficult datasets, e.g., VisA and DAGM, to demonstrate the effectiveness and
generalizability of STLM.
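The architecture described in the abstract lends itself to a compact sketch. The following is a minimal PyTorch-style illustration, not the authors' code: the names TinyEncoder and TwoStreamAnomalyModel, the convolutional blocks, and the feature dimensions are all placeholders, and the SAM-guided training of the two streams is only hinted at in comments.
```python
# Illustrative two-stream anomaly detector in the spirit of STLM (NOT the
# authors' implementation). Two lightweight encoders process the same image,
# their features are aggregated, and a shared decoder produces an anomaly map.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyEncoder(nn.Module):
    """Small convolutional encoder standing in for one lightweight stream."""
    def __init__(self, in_ch: int = 3, dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)  # (B, dim, H/8, W/8)


class TwoStreamAnomalyModel(nn.Module):
    """One stream keeps discriminative features for normal and anomalous
    regions; the other is trained to produce anomaly-free features, so their
    disagreement highlights anomalies (training/distillation omitted here)."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.plain_stream = TinyEncoder(dim=dim)      # discriminative features
        self.denoising_stream = TinyEncoder(dim=dim)  # anomaly-free features
        self.aggregate = nn.Conv2d(2 * dim, dim, 1)   # simple feature aggregation
        self.mask_decoder = nn.Sequential(            # shared decoder -> 1-channel map
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, 1, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f_plain = self.plain_stream(x)
        f_clean = self.denoising_stream(x)
        fused = self.aggregate(torch.cat([f_plain, f_clean], dim=1))
        logits = self.mask_decoder(fused)
        # Upsample the anomaly map back to the input resolution.
        return F.interpolate(logits, size=x.shape[-2:], mode="bilinear",
                             align_corners=False)


if __name__ == "__main__":
    model = TwoStreamAnomalyModel()
    img = torch.randn(1, 3, 256, 256)
    anomaly_map = torch.sigmoid(model(img))
    print(anomaly_map.shape)  # torch.Size([1, 1, 256, 256])
```
In the paper, both streams are lightweight encoders guided by SAM's knowledge, with one stream trained to represent both normal and anomalous regions and the other trained toward anomaly-free reconstructions; the sketch omits that training procedure and shows only the forward pass through aggregation and the shared mask decoder.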
Related papers
- SurgeryV2: Bridging the Gap Between Model Merging and Multi-Task Learning with Deep Representation Surgery [54.866490321241905]
Model merging-based multitask learning (MTL) offers a promising approach for performing MTL by merging multiple expert models.
In this paper, we examine the merged model's representation distribution and uncover a critical issue of "representation bias".
This bias arises from a significant distribution gap between the representations of the merged and expert models, leading to the suboptimal performance of the merged MTL model.
arXiv Detail & Related papers (2024-10-18T11:49:40Z) - Adapt CLIP as Aggregation Instructor for Image Dehazing [17.29370328189668]
Most dehazing methods suffer from a limited receptive field and do not explore the rich semantic prior encapsulated in vision-language models.
We introduce CLIPHaze, a pioneering hybrid framework that synergizes the efficient global modeling of Mamba with the prior knowledge and zero-shot capabilities of CLIP.
Our method employs parallel state space model and window-based self-attention to obtain global contextual dependency and local fine-grained perception.
arXiv Detail & Related papers (2024-08-22T11:51:50Z) - SAM-PM: Enhancing Video Camouflaged Object Detection using Spatio-Temporal Attention [0.0]
The Segment Anything Model (SAM) has gained notable recognition for its exceptional performance in image segmentation.
Camouflaged objects typically blend into the background, making them difficult to distinguish in still images.
We propose a new method called the SAM Propagation Module (SAM-PM) to overcome these challenges.
Our method effectively incorporates temporal consistency and domain-specific expertise into the segmentation network with an addition of less than 1% of SAM's parameters.
arXiv Detail & Related papers (2024-06-09T14:33:38Z) - ASAM: Boosting Segment Anything Model with Adversarial Tuning [9.566046692165884]
This paper introduces ASAM, a novel methodology that amplifies a foundation model's performance through adversarial tuning.
We harness the potential of natural adversarial examples, inspired by their successful implementation in natural language processing.
Our approach maintains the photorealism of adversarial examples and ensures alignment with original mask annotations.
arXiv Detail & Related papers (2024-05-01T00:13:05Z) - DMAD: Dual Memory Bank for Real-World Anomaly Detection [90.97573828481832]
We propose a new framework named Dual Memory bank enhanced representation learning for Anomaly Detection (DMAD).
DMAD employs a dual memory bank to calculate feature distance and feature attention between normal and abnormal patterns.
We evaluate DMAD on the MVTec-AD and VisA datasets.
arXiv Detail & Related papers (2024-03-19T02:16:32Z) - WSI-SAM: Multi-resolution Segment Anything Model (SAM) for histopathology whole-slide images [8.179859593451285]
We present WSI-SAM, enhancing Segment Anything Model (SAM) with precise object segmentation capabilities for histopathology images.
To fully exploit pretrained knowledge while minimizing training overhead, we keep SAM frozen, introducing only minimal extra parameters.
Our model outperforms SAM by 4.1 and 2.5 percentage points on a ductal carcinoma in situ (DCIS) segmentation task and a breast cancer metastasis segmentation task, respectively.
arXiv Detail & Related papers (2024-03-14T10:30:43Z) - Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection [59.41026558455904]
We focus on multi-modal anomaly detection. Specifically, we investigate early multi-modal approaches that attempted to utilize models pre-trained on large-scale visual datasets.
We propose a Local-to-global Self-supervised Feature Adaptation (LSFA) method to finetune the adaptors and learn task-oriented representation toward anomaly detection.
arXiv Detail & Related papers (2024-01-06T07:30:41Z) - Dual Memory Units with Uncertainty Regulation for Weakly Supervised Video Anomaly Detection [15.991784541576788]
Existing approaches, both video and segment-level label oriented, mainly focus on extracting representations for anomaly data.
We propose an Uncertainty Regulated Dual Memory Units (UR-DMU) model to learn both the representations of normal data and discriminative features of abnormal data.
Our method outperforms the state-of-the-art methods by a sizable margin.
arXiv Detail & Related papers (2023-02-10T10:39:40Z) - Prototypical Residual Networks for Anomaly Detection and Localization [80.5730594002466]
We propose a framework called Prototypical Residual Network (PRN).
PRN learns feature residuals of varying scales and sizes between anomalous and normal patterns to accurately reconstruct the segmentation maps of anomalous regions.
We present a variety of anomaly generation strategies that consider both seen and unseen appearance variance to enlarge and diversify anomalies.
arXiv Detail & Related papers (2022-12-05T05:03:46Z) - MIST: Multiple Instance Self-Training Framework for Video Anomaly Detection [76.80153360498797]
We develop a multiple instance self-training framework (MIST) to efficiently refine task-specific discriminative representations.
MIST is composed of 1) a multiple instance pseudo label generator, which adapts a sparse continuous sampling strategy to produce more reliable clip-level pseudo labels, and 2) a self-guided attention boosted feature encoder.
Our method performs comparably to or even better than existing supervised and weakly supervised methods, specifically obtaining a frame-level AUC of 94.83% on ShanghaiTech.
arXiv Detail & Related papers (2021-04-04T15:47:14Z)
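Several of the related entries above, DMAD and UR-DMU in particular, rely on memory banks that compare test features against stored normal and abnormal prototypes. The sketch below is a generic illustration of that nearest-prototype distance idea only, not the code of either paper; the bank sizes, feature dimension, and the simple score subtraction are assumptions made for the example.
```python
# Generic memory-bank anomaly scoring sketch (illustrative only): query patch
# features are compared against stored normal and abnormal prototypes, and the
# nearest-neighbour cosine distances serve as an anomaly signal.
import torch
import torch.nn.functional as F


def memory_distance(query: torch.Tensor, bank: torch.Tensor) -> torch.Tensor:
    """Cosine distance from each query feature to its nearest memory entry.

    query: (N, D) patch features; bank: (M, D) stored prototype features.
    Returns an (N,) tensor of distances in [0, 2].
    """
    q = F.normalize(query, dim=-1)
    b = F.normalize(bank, dim=-1)
    sim = q @ b.t()                       # (N, M) cosine similarities
    return 1.0 - sim.max(dim=-1).values   # distance to the closest prototype


# Hypothetical usage: a patch that is far from the normal memory but close to
# the abnormal memory receives a high anomaly score.
normal_bank = torch.randn(512, 128)   # features collected from normal images
abnormal_bank = torch.randn(64, 128)  # features from (pseudo-)abnormal samples
patches = torch.randn(1024, 128)      # patch features of a test image
anomaly_score = (memory_distance(patches, normal_bank)
                 - memory_distance(patches, abnormal_bank))
```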
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.