Learning to Be a Transformer to Pinpoint Anomalies
- URL: http://arxiv.org/abs/2407.04092v3
- Date: Thu, 26 Jun 2025 17:59:51 GMT
- Title: Learning to Be a Transformer to Pinpoint Anomalies
- Authors: Alex Costanzino, Pierluigi Zama Ramirez, Giuseppe Lisanti, Luigi Di Stefano
- Abstract summary: Recent Industrial Anomaly Detection and Segmentation (IADS) methods process low-resolution images, e.g., 224x224 pixels, obtained by downsampling the original input images. We propose a novel Teacher--Student paradigm to leverage strong pre-trained features while processing high-resolution input images very efficiently. Our method can spot anomalies from high-resolution images and runs way faster than competitors.
- Score: 12.442574943138794
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: To efficiently deploy strong, often pre-trained feature extractors, recent Industrial Anomaly Detection and Segmentation (IADS) methods process low-resolution images, e.g., 224x224 pixels, obtained by downsampling the original input images. However, while numerous industrial applications demand the identification of both large and small defects, downsampling the input image to a low resolution may hinder a method's ability to pinpoint tiny anomalies. We propose a novel Teacher--Student paradigm to leverage strong pre-trained features while processing high-resolution input images very efficiently. The core idea concerns training two shallow MLPs (the Students) by nominal images so as to mimic the mappings between the patch embeddings induced by the self-attention layers of a frozen vision Transformer (the Teacher). Indeed, learning these mappings sets forth a challenging pretext task that small-capacity models are unlikely to accomplish on out-of-distribution data such as anomalous images. Our method can spot anomalies from high-resolution images and runs way faster than competitors, achieving state-of-the-art performance on MVTec AD and the best segmentation results on VisA. We also propose novel evaluation metrics to capture robustness to defect size, i.e., the ability to preserve good localisation from large anomalies to tiny ones. Evaluating our method also by these metrics reveals its neatly superior performance.
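As a rough illustration of the core idea in the abstract, the sketch below (PyTorch) trains a shallow MLP Student on nominal images to mimic the patch-embedding mapping induced by one self-attention layer of a frozen ViT Teacher, and uses the per-patch regression error as an anomaly map at test time. The MLP width, the choice of a single layer, and the L2 discrepancy are illustrative assumptions rather than the authors' exact configuration.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StudentMLP(nn.Module):
    """Small-capacity MLP meant to reproduce one attention layer's token mapping."""

    def __init__(self, dim: int, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim)
        )

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_patches, dim) patch embeddings from the Teacher
        return self.net(tokens)


def training_step(student, frozen_attn_layer, tokens_in, optimizer):
    """One step on nominal images: regress the frozen layer's output from its input."""
    with torch.no_grad():
        target = frozen_attn_layer(tokens_in)   # Teacher's mapping (frozen ViT layer)
    pred = student(tokens_in)                   # Student's attempt at the same mapping
    loss = F.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


def patch_anomaly_map(student, frozen_attn_layer, tokens_in):
    """At test time, per-patch discrepancy flags out-of-distribution (anomalous) regions."""
    with torch.no_grad():
        target = frozen_attn_layer(tokens_in)
        pred = student(tokens_in)
        return (pred - target).pow(2).mean(dim=-1)  # (batch, num_patches) anomaly scores
```
The abstract mentions two such Students; extending the sketch to several mimicked mappings amounts to instantiating one StudentMLP per mapping and aggregating the per-patch scores.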
Related papers
- PETALface: Parameter Efficient Transfer Learning for Low-resolution Face Recognition [54.642714288448744]
PETALface is the first work to leverage the power of PEFT for low-resolution face recognition.
We introduce two low-rank adaptation modules to the backbone, with weights adjusted based on the input image quality to account for the difference in quality for the gallery and probe images.
Experiments demonstrate that the proposed method outperforms full fine-tuning on low-resolution datasets while preserving performance on high-resolution and mixed-quality datasets.
arXiv Detail & Related papers (2024-12-10T18:59:45Z) - One Step Diffusion-based Super-Resolution with Time-Aware Distillation [60.262651082672235]
Diffusion-based image super-resolution (SR) methods have shown promise in reconstructing high-resolution images with fine details from low-resolution counterparts.
Recent techniques have been devised to enhance the sampling efficiency of diffusion-based SR models via knowledge distillation.
We propose a time-aware diffusion distillation method, named TAD-SR, to accomplish effective and efficient image super-resolution.
arXiv Detail & Related papers (2024-08-14T11:47:22Z) - HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models [96.76995840807615]
HiRes-LLaVA is a novel framework designed to process any size of high-resolution input without altering the original contextual and geometric information.
HiRes-LLaVA comprises two innovative components: (i) a SliceRestore adapter that reconstructs sliced patches into their original form, efficiently extracting both global and local features via down-up-sampling and convolution layers, and (ii) a Self-Mining Sampler to compress the vision tokens based on themselves.
arXiv Detail & Related papers (2024-07-11T17:42:17Z) - Feature Attenuation of Defective Representation Can Resolve Incomplete Masking on Anomaly Detection [1.0358639819750703]
In unsupervised anomaly detection (UAD) research, it is necessary to develop a computationally efficient and scalable solution.
We revisit the reconstruction-by-inpainting approach and rethink how to improve it by analyzing its strengths and weaknesses.
We propose Feature Attenuation of Defective Representation (FADeR), which employs only two layers to attenuate the feature information of anomaly reconstruction.
arXiv Detail & Related papers (2024-07-05T15:44:53Z) - SOEDiff: Efficient Distillation for Small Object Editing [9.876242696640205]
A new task known as small object editing (SOE) focuses on text-based image inpainting within a constrained, small-sized area.
We introduce a novel training-based approach, SOEDiff, aimed at enhancing the capability of baseline models like StableDiffusion in editing small-sized objects.
Our method presents significant improvements on the test dataset collected from MSCOCO and OpenImage.
arXiv Detail & Related papers (2024-05-15T06:14:31Z) - MoE-FFD: Mixture of Experts for Generalized and Parameter-Efficient Face Forgery Detection [54.545054873239295]
Deepfakes have recently raised significant trust issues and security concerns among the public.
ViT-based methods take advantage of the expressivity of transformers, achieving superior detection performance.
This work introduces Mixture-of-Experts modules for Face Forgery Detection (MoE-FFD), a generalized yet parameter-efficient ViT-based approach.
arXiv Detail & Related papers (2024-04-12T13:02:08Z) - Attention to detail: inter-resolution knowledge distillation [1.927195358774599]
Development of computer vision solutions for gigapixel images in digital pathology is hampered by the large size of whole slide images.
Recent literature has proposed using knowledge distillation to enhance the model performance at reduced image resolutions.
In this work, we propose to distill this information by incorporating attention maps during training.
arXiv Detail & Related papers (2024-01-11T16:16:20Z) - Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning.
CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z) - One-stage Low-resolution Text Recognition with High-resolution Knowledge Transfer [53.02254290682613]
Current solutions for low-resolution text recognition typically rely on a two-stage pipeline.
We propose an efficient and effective knowledge distillation framework to achieve multi-level knowledge transfer.
Experiments show that the proposed one-stage pipeline significantly outperforms super-resolution based two-stage frameworks.
arXiv Detail & Related papers (2023-08-05T02:33:45Z) - Learning from Multi-Perception Features for Real-World Image Super-resolution [87.71135803794519]
We propose a novel SR method called MPF-Net that leverages multiple perceptual features of input images.
Our method incorporates a Multi-Perception Feature Extraction (MPFE) module to extract diverse perceptual information.
We also introduce a contrastive regularization term (CR) that improves the model's learning capability.
arXiv Detail & Related papers (2023-05-26T07:35:49Z) - UMat: Uncertainty-Aware Single Image High Resolution Material Capture [2.416160525187799]
We propose a learning-based method to recover normals, specularity, and roughness from a single diffuse image of a material.
Our method is the first one to deal with the problem of modeling uncertainty in material digitization.
arXiv Detail & Related papers (2023-05-25T17:59:04Z) - Activation to Saliency: Forming High-Quality Labels for Unsupervised Salient Object Detection [54.92703325989853]
We propose a two-stage Activation-to-Saliency (A2S) framework that effectively generates high-quality saliency cues.
No human annotations are involved in our framework during the whole training process.
Our framework achieves strong performance compared with existing USOD methods.
arXiv Detail & Related papers (2021-12-07T11:54:06Z) - Multi-Scale Aligned Distillation for Low-Resolution Detection [68.96325141432078]
This paper focuses on boosting the performance of low-resolution models by distilling knowledge from a high- or multi-resolution model.
On several instance-level detection tasks and datasets, the low-resolution models trained via our approach perform competitively with high-resolution models trained via conventional multi-scale training.
arXiv Detail & Related papers (2021-09-14T12:53:35Z) - Same Same But DifferNet: Semi-Supervised Defect Detection with Normalizing Flows [24.734388664558708]
We propose DifferNet, which leverages the descriptiveness of features extracted by convolutional neural networks to estimate their density.
Based on these likelihoods we develop a scoring function that indicates defects.
We demonstrate the superior performance over existing approaches on the challenging and newly proposed MVTec AD and Magnetic Tile Defects datasets.
arXiv Detail & Related papers (2020-08-28T10:49:28Z) - Invertible Image Rescaling [118.2653765756915]
We develop an Invertible Rescaling Net (IRN) to produce visually-pleasing low-resolution images.
We capture the distribution of the lost information using a latent variable following a specified distribution in the downscaling process.
arXiv Detail & Related papers (2020-05-12T09:55:53Z) - Feature Super-Resolution Based Facial Expression Recognition for Multi-scale Low-Resolution Faces [7.634398926381845]
Super-resolution methods are often used to enhance low-resolution images, but their performance on the FER task is limited for images of very low resolution.
In this work, inspired by feature super-resolution methods for object detection, we propose a novel generative adversarial network-based super-resolution method for robust facial expression recognition.
arXiv Detail & Related papers (2020-04-05T15:38:47Z)