Anomaly Detection via Reverse Distillation from One-Class Embedding
- URL: http://arxiv.org/abs/2201.10703v1
- Date: Wed, 26 Jan 2022 01:48:37 GMT
- Title: Anomaly Detection via Reverse Distillation from One-Class Embedding
- Authors: Hanqiu Deng, Xingyu Li
- Abstract summary: We propose a novel T-S model consisting of a teacher encoder and a student decoder.
Instead of receiving raw images directly, the student network takes the teacher model's one-class embedding as input.
In addition, we introduce a trainable one-class bottleneck embedding module in our T-S model.
- Score: 2.715884199292287
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge distillation (KD) achieves promising results on the challenging
problem of unsupervised anomaly detection (AD). The representation discrepancy
of anomalies in the teacher-student (T-S) model provides essential evidence for
AD. However, using similar or identical architectures to build the teacher and
student models in previous studies hinders the diversity of anomalous
representations. To tackle this problem, we propose a novel T-S model
consisting of a teacher encoder and a student decoder and introduce a simple
yet effective "reverse distillation" paradigm accordingly. Instead of receiving
raw images directly, the student network takes the teacher model's one-class
embedding as input and aims to restore the teacher's multiscale
representations. Inherently, knowledge distillation in this study proceeds from
abstract, high-level representations to low-level features. In addition, we
introduce a trainable one-class bottleneck embedding (OCBE) module in our T-S
model. The obtained compact embedding effectively preserves essential
information on normal patterns, but abandons anomaly perturbations. Extensive
experimentation on AD and one-class novelty detection benchmarks shows that our
method surpasses state-of-the-art performance, demonstrating the effectiveness
and generalizability of the proposed approach.
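To make the paradigm concrete, the sketch below is a minimal PyTorch-style rendering of the ideas stated in the abstract: a pre-trained teacher encoder yields multiscale features, a trainable one-class bottleneck embedding (OCBE) fuses them into a compact code fed to a student decoder, and the per-scale cosine discrepancy between teacher and student features serves both as the training objective and as the test-time anomaly map. The module layout, channel sizes, and function names are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn.functional as F
from torch import nn

class OCBE(nn.Module):
    # Assumed layout of the one-class bottleneck embedding: project each teacher
    # scale, pool to the coarsest resolution, and fuse into one compact code that
    # keeps information on normal patterns while discarding anomaly perturbations.
    def __init__(self, in_channels=(64, 128, 256), embed_dim=256):
        super().__init__()
        self.proj = nn.ModuleList([nn.Conv2d(c, embed_dim, 1) for c in in_channels])
        self.fuse = nn.Conv2d(embed_dim * len(in_channels), embed_dim, 3, padding=1)

    def forward(self, teacher_feats):
        h, w = teacher_feats[-1].shape[-2:]
        pooled = [F.adaptive_avg_pool2d(p(f), (h, w))
                  for p, f in zip(self.proj, teacher_feats)]
        return self.fuse(torch.cat(pooled, dim=1))

def reverse_distillation_loss(teacher_feats, student_feats):
    # Training objective: the student decoder must restore the teacher encoder's
    # multiscale representations (1 - cosine similarity, accumulated over scales).
    loss = 0.0
    for ft, fs in zip(teacher_feats, student_feats):
        cos = F.cosine_similarity(ft.flatten(2), fs.flatten(2), dim=1)  # (B, H*W)
        loss = loss + (1.0 - cos).mean()
    return loss

def anomaly_map(teacher_feats, student_feats, out_size):
    # Inference: per-pixel teacher-student discrepancy at each scale, upsampled
    # to the input resolution and accumulated, gives the anomaly score map.
    amap = 0.0
    for ft, fs in zip(teacher_feats, student_feats):
        d = 1.0 - F.cosine_similarity(ft, fs, dim=1)  # (B, H, W)
        amap = amap + F.interpolate(d.unsqueeze(1), size=out_size,
                                    mode="bilinear", align_corners=False)
    return amap

In this setup only the OCBE module and the student decoder would be optimized; the teacher encoder stays a fixed pre-trained network, so a large representation discrepancy at test time can be attributed to anomalies rather than to drift in the teacher.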
Related papers
- Applying the Lower-Biased Teacher Model in Semi-Supervised Object Detection [2.6852789176898613]
I present the Lower Biased Teacher model, an enhancement of the Unbiased Teacher model, specifically tailored for semi-supervised object detection tasks.
The primary innovation of this model is the integration of a localization loss into the teacher model, which significantly improves the accuracy of pseudo-label generation.
arXiv Detail & Related papers (2024-09-29T13:33:14Z) - Exploring and Enhancing the Transfer of Distribution in Knowledge Distillation for Autoregressive Language Models [62.5501109475725]
Knowledge distillation (KD) is a technique that compresses large teacher models by training smaller student models to mimic them.
This paper introduces Online Knowledge Distillation (OKD), where the teacher network integrates small online modules to concurrently train with the student model.
OKD achieves or exceeds the performance of leading methods in various model architectures and sizes, reducing training time by up to fourfold.
arXiv Detail & Related papers (2024-09-19T07:05:26Z) - Unleashing the Power of Generic Segmentation Models: A Simple Baseline for Infrared Small Target Detection [57.666055329221194]
We investigate the adaptation of generic segmentation models, such as the Segment Anything Model (SAM), to infrared small object detection tasks.
Our model demonstrates significantly improved performance in both accuracy and throughput compared to existing approaches.
arXiv Detail & Related papers (2024-09-07T05:31:24Z) - Dual-Modeling Decouple Distillation for Unsupervised Anomaly Detection [15.89869857998053]
Over-generalization of the student network to the teacher network may lead to negligible representation differences for anomalies.
Existing methods address this possible over-generalization by structurally differentiating the student and teacher networks.
We propose Dual-Modeling Decouple Distillation (DMDD) for unsupervised anomaly detection.
arXiv Detail & Related papers (2024-08-07T16:39:16Z) - Multi-Granularity Semantic Revision for Large Language Model Distillation [66.03746866578274]
We propose a multi-granularity semantic revision method for LLM distillation.
At the sequence level, we propose a sequence correction and re-generation strategy.
At the token level, we design a distribution adaptive clipping Kullback-Leibler loss as the distillation objective function.
At the span level, we leverage the span priors of a sequence to compute the probability correlations within spans, and constrain the teacher and student's probability correlations to be consistent.
arXiv Detail & Related papers (2024-07-14T03:51:49Z) - Advancing Pre-trained Teacher: Towards Robust Feature Discrepancy for Anomaly Detection [19.099643719358692]
We propose a simple yet effective two-stage industrial anomaly detection framework, termed as AAND.
In the first anomaly amplification stage, we propose a novel Residual Anomaly Amplification (RAA) module to advance the pre-trained teacher encoder.
We further employ a reverse distillation paradigm to train a student decoder, in which a novel Hard Knowledge Distillation (HKD) loss is built to better facilitate the reconstruction of normal patterns.
arXiv Detail & Related papers (2024-05-03T13:00:22Z) - Structural Teacher-Student Normality Learning for Multi-Class Anomaly Detection and Localization [17.543208086457234]
We introduce a novel approach known as Structural Teacher-Student Normality Learning (SNL).
We evaluate our proposed approach on two anomaly detection datasets, MVTecAD and VisA.
Our method surpasses state-of-the-art distillation-based algorithms by significant margins of 3.9% and 1.5% on MVTecAD, and 1.2% and 2.5% on VisA.
arXiv Detail & Related papers (2024-02-27T00:02:24Z) - Knowledge Diffusion for Distillation [53.908314960324915]
The representation gap between teacher and student is an emerging topic in knowledge distillation (KD).
We state that the essence of these methods is to discard the noisy information and distill the valuable information in the feature.
We propose a novel KD method dubbed DiffKD, to explicitly denoise and match features using diffusion models.
arXiv Detail & Related papers (2023-05-25T04:49:34Z) - EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval [83.79667141681418]
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR).
We propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model.
We show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to 1/10th size asymmetric students that can retain 95-97% of the teacher performance.
arXiv Detail & Related papers (2023-01-27T22:04:37Z) - Distilling Object Detectors with Task Adaptive Regularization [97.52935611385179]
Current state-of-the-art object detectors come at the expense of high computational costs and are hard to deploy on low-end devices.
Knowledge distillation, which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the promising solutions for model miniaturization.
arXiv Detail & Related papers (2020-06-23T15:58:22Z)