SCL-VI: Self-supervised Context Learning for Visual Inspection of
Industrial Defects
- URL: http://arxiv.org/abs/2311.06504v2
- Date: Tue, 21 Nov 2023 14:57:09 GMT
- Authors: Peng Wang, Haiming Yao, Wenyong Yu
- Score: 4.487908181569429
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The unsupervised visual inspection of defects in industrial products poses a
significant challenge due to substantial variations in product surfaces.
Current unsupervised models struggle to strike a balance between detecting
texture and object defects, lacking the capacity to discern latent
representations and intricate features. In this paper, we present a novel
self-supervised learning algorithm designed to derive an optimal encoder by
tackling the renowned jigsaw puzzle. Our approach involves dividing the target
image into nine patches, tasking the encoder with predicting the relative
position relationships between any two patches to extract rich semantics.
Subsequently, we introduce an affinity-augmentation method to accentuate
differences between normal and abnormal latent representations. Leveraging the
classic support vector data description algorithm yields final detection
results. Experimental outcomes demonstrate that our proposed method achieves
outstanding detection and segmentation performance on the widely used MVTec AD
dataset, with rates of 95.8% and 96.8%, respectively, establishing a
state-of-the-art benchmark for both texture and object defects. Comprehensive
experimentation underscores the effectiveness of our approach in diverse
industrial applications.
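The nine-patch relative-position pretext task described in the abstract can be sketched as follows. This is a minimal illustration under assumptions: the function names, the 24-class offset encoding, and the sampling scheme are illustrative, not the authors' implementation.

```python
import numpy as np

def nine_patches(image):
    """Split an H x W image into a 3x3 grid of patches, row-major order."""
    h, w = image.shape[:2]
    ph, pw = h // 3, w // 3
    return [image[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
            for r in range(3) for c in range(3)]

def relative_position_label(i, j):
    """Encode the position of patch j relative to patch i on the 3x3 grid.
    Offsets (dr, dc) each range over -2..2, giving 24 classes for i != j
    (one possible encoding; the paper may use a different scheme)."""
    ri, ci = divmod(i, 3)
    rj, cj = divmod(j, 3)
    dr, dc = rj - ri, cj - ci
    return (dr + 2) * 5 + (dc + 2)

def sample_training_pair(image, rng):
    """Draw two distinct patches and the classification target for the encoder."""
    patches = nine_patches(image)
    i, j = rng.choice(9, size=2, replace=False)
    return patches[i], patches[j], relative_position_label(i, j)
```

An encoder trained to classify these pairwise labels is forced to learn the spatial semantics of normal surfaces, which is the representation the downstream SVDD detector then operates on.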
Related papers
- Patch-aware Vector Quantized Codebook Learning for Unsupervised Visual Defect Detection [4.081433571732692]
Unsupervised visual defect detection is critical in industrial applications.
We propose a novel approach using an enhanced VQ-VAE framework optimized for unsupervised defect detection.
arXiv Detail & Related papers (2025-01-15T22:26:26Z)
- Exploring Large Vision-Language Models for Robust and Efficient Industrial Anomaly Detection [4.691083532629246]
We propose Vision-Language Anomaly Detection via Contrastive Cross-Modal Training (CLAD)
CLAD aligns visual and textual features into a shared embedding space using contrastive learning.
We demonstrate that CLAD outperforms state-of-the-art methods in both image-level anomaly detection and pixel-level anomaly localization.
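The contrastive alignment CLAD relies on can be illustrated with a generic symmetric InfoNCE loss over paired image/text embeddings. This is a standard formulation, not necessarily the paper's exact loss; the names are illustrative.

```python
import numpy as np

def info_nce(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over L2-normalized image/text embeddings.
    Row i of each matrix is a matching pair (positive); all other rows
    in the batch serve as negatives."""
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # cosine-similarity matrix
    labels = np.arange(len(img))

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)    # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # cross-entropy in both directions: image -> text and text -> image
    return (xent(logits) + xent(logits.T)) / 2
```

Minimizing this loss pulls each image embedding toward its paired text embedding and away from the other captions in the batch, which is the shared-space alignment the summary describes.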
arXiv Detail & Related papers (2024-12-01T17:00:43Z)
- Understanding and Improving Training-Free AI-Generated Image Detections with Vision Foundation Models [68.90917438865078]
Deepfake techniques for facial synthesis and editing, enabled by generative models, pose serious risks.
In this paper, we investigate how detection performance varies across model backbones, types, and datasets.
We introduce Contrastive Blur, which enhances performance on facial images, and MINDER, which addresses noise type bias, balancing performance across domains.
arXiv Detail & Related papers (2024-11-28T13:04:45Z)
- Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection [59.41026558455904]
We focus on multi-modal anomaly detection. Specifically, we investigate early multi-modal approaches that attempted to utilize models pre-trained on large-scale visual datasets.
We propose a Local-to-global Self-supervised Feature Adaptation (LSFA) method to finetune the adaptors and learn task-oriented representation toward anomaly detection.
arXiv Detail & Related papers (2024-01-06T07:30:41Z)
- Improving Vision Anomaly Detection with the Guidance of Language Modality [64.53005837237754]
This paper tackles the challenges for vision modality from a multimodal point of view.
We propose Cross-modal Guidance (CMG) to tackle the redundant information issue and sparse space issue.
To learn a more compact latent space for the vision anomaly detector, CMLE learns a correlation structure matrix from the language modality.
arXiv Detail & Related papers (2023-10-04T13:44:56Z)
- ReConPatch: Contrastive Patch Representation Learning for Industrial Anomaly Detection [5.998761048990598]
We introduce ReConPatch, which constructs discriminative features for anomaly detection by training a linear modulation of patch features extracted from the pre-trained model.
Our method achieves the state-of-the-art anomaly detection performance (99.72%) for the widely used and challenging MVTec AD dataset.
arXiv Detail & Related papers (2023-05-26T07:59:36Z)
- Reference-based Defect Detection Network [57.89399576743665]
The first issue is texture shift: a trained defect detector is easily affected by unseen textures.
The second issue is partial visual confusion: a partial defect box is visually similar to a complete box.
We propose a Reference-based Defect Detection Network (RDDN) to tackle these two problems.
arXiv Detail & Related papers (2021-08-10T05:44:23Z)
- Contrastive Predictive Coding for Anomaly Detection [0.0]
The Contrastive Predictive Coding model (arXiv:1807.03748) is used for anomaly detection and segmentation.
We show that its patch-wise contrastive loss can directly be interpreted as an anomaly score.
The model achieves promising results for both anomaly detection and segmentation on the MVTec-AD dataset.
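The idea of reading a patch-wise contrastive loss directly as an anomaly score can be sketched as follows. This is an assumed simplification: each patch is given a context-predicted embedding, and the per-patch InfoNCE negative log-likelihood serves as the score; the actual CPC model predicts patches autoregressively.

```python
import numpy as np

def patch_anomaly_scores(patch_emb, context_pred, temperature=1.0):
    """Per-patch anomaly score from a contrastive loss: each context
    prediction is scored against all actual patch embeddings, and a patch
    whose own embedding is hard to single out gets a high score."""
    p = patch_emb / np.linalg.norm(patch_emb, axis=1, keepdims=True)
    c = context_pred / np.linalg.norm(context_pred, axis=1, keepdims=True)
    logits = c @ p.T / temperature
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(p))
    return -logp[idx, idx]     # negative log-likelihood of the true patch
```

A defect breaks the learned context-to-patch predictability, so its patch receives a higher score than its normal neighbours, yielding a segmentation map with no extra head.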
arXiv Detail & Related papers (2021-07-16T11:04:35Z)
- Zero-sample surface defect detection and classification based on semantic feedback neural network [13.796631421521765]
We propose an Ensemble Co-training algorithm, which adaptively reduces the prediction error in image tag embedding from multiple angles.
Various experiments conducted on the zero-shot dataset and the cylinder liner dataset in the industrial field provide competitive results.
arXiv Detail & Related papers (2021-06-15T08:26:36Z)
- Generative Partial Visual-Tactile Fused Object Clustering [81.17645983141773]
We propose a Generative Partial Visual-Tactile Fused (GPVTF) framework for object clustering.
A conditional cross-modal clustering generative adversarial network is then developed to synthesize one modality conditioned on the other.
To this end, two pseudo-label-based KL-divergence losses are employed to update the corresponding modality-specific encoders.
arXiv Detail & Related papers (2020-12-28T02:37:03Z)
- ESAD: End-to-end Deep Semi-supervised Anomaly Detection [85.81138474858197]
We propose a new objective function that measures the KL-divergence between normal and anomalous data.
The proposed method significantly outperforms several state-of-the-arts on multiple benchmark datasets.
arXiv Detail & Related papers (2020-12-09T08:16:35Z)
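The KL-divergence term in an objective like ESAD's can be illustrated with the closed form for two univariate Gaussians. This shows the divergence itself, not the paper's full objective; fitting Gaussians to the two score populations is an assumption made for illustration.

```python
import numpy as np

def gaussian_kl(mu0, var0, mu1, var1):
    """KL( N(mu0, var0) || N(mu1, var1) ) between univariate Gaussians,
    e.g. fitted to the anomaly scores of normal vs. anomalous samples.
    Maximizing this divergence pushes the two score distributions apart."""
    return 0.5 * (np.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)
```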
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed and is not responsible for any consequences of its use.