Automatic Prompt Generation and Grounding Object Detection for Zero-Shot Image Anomaly Detection
- URL: http://arxiv.org/abs/2411.19220v1
- Date: Thu, 28 Nov 2024 15:42:32 GMT
- Title: Automatic Prompt Generation and Grounding Object Detection for Zero-Shot Image Anomaly Detection
- Authors: Tsun-Hin Cheung, Ka-Chun Fung, Songjiang Lai, Kwan-Ho Lin, Vincent Ng, Kin-Man Lam
- Abstract summary: We propose a zero-shot training-free approach for automated industrial image anomaly detection using a multimodal machine learning pipeline.
Our proposed model enables efficient, scalable, and objective quality control in industrial manufacturing settings.
- Score: 17.06832015516288
- Abstract: Identifying defects and anomalies in industrial products is a critical quality control task. Traditional manual inspection methods are slow, subjective, and error-prone. In this work, we propose a novel zero-shot, training-free approach for automated industrial image anomaly detection using a multimodal machine learning pipeline consisting of three foundation models. Our method first uses a large language model, i.e., GPT-3, to generate text prompts describing the expected appearances of normal and abnormal products. We then use a grounding object detection model, called Grounding DINO, to locate the product in the image. Finally, we compare the cropped product image patches to the generated prompts using a zero-shot image-text matching model, called CLIP, to identify any anomalies. Our experiments on two datasets of industrial product images, namely MVTec-AD and VisA, demonstrate the effectiveness of this method, achieving high accuracy in detecting various types of defects and anomalies without the need for model training. Our proposed model enables efficient, scalable, and objective quality control in industrial manufacturing settings.
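The final CLIP-matching stage of the pipeline above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes image and prompt embeddings have already been computed (the toy 3-d vectors, prompt sets, and temperature value are hypothetical stand-ins for CLIP outputs), and scores a cropped patch by the softmax probability mass assigned to the "abnormal" prompts.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def anomaly_score(image_emb, normal_embs, abnormal_embs, temperature=0.01):
    """CLIP-style zero-shot scoring: softmax over image-text similarities.

    Returns the probability mass on the 'abnormal' prompt embeddings;
    higher values mean the patch looks more anomalous.
    """
    sims = [cosine(image_emb, t) for t in normal_embs + abnormal_embs]
    exps = [math.exp(s / temperature) for s in sims]
    z = sum(exps)
    probs = [e / z for e in exps]
    return sum(probs[len(normal_embs):])

# Toy 3-d embeddings standing in for CLIP outputs (hypothetical values).
normal_prompts = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]    # "a flawless product"
abnormal_prompts = [[0.0, 1.0, 0.0]]                    # "a product with a defect"
good_patch = [1.0, 0.05, 0.0]   # close to the normal prompts -> low score
bad_patch = [0.1, 1.0, 0.0]     # close to the abnormal prompt -> high score

print(anomaly_score(good_patch, normal_prompts, abnormal_prompts) < 0.5)
print(anomaly_score(bad_patch, normal_prompts, abnormal_prompts) > 0.5)
```

A thresholded score of this kind is what turns the image-text matching step into a binary normal/anomalous decision.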
Related papers
- DiffDoctor: Diagnosing Image Diffusion Models Before Treating [57.82359018425674]
We propose DiffDoctor, a two-stage pipeline to assist image diffusion models in generating fewer artifacts.
We collect a dataset of over 1M flawed synthesized images and set up an efficient human-in-the-loop annotation process.
The learned artifact detector is then involved in the second stage to tune the diffusion model through assigning a per-pixel confidence map for each image.
arXiv Detail & Related papers (2025-01-21T18:56:41Z) - Understanding and Improving Training-Free AI-Generated Image Detections with Vision Foundation Models [68.90917438865078]
Deepfake techniques for facial synthesis and editing, enabled by generative models, pose serious societal risks.
In this paper, we investigate how detection performance varies across model backbones, types, and datasets.
We introduce Contrastive Blur, which enhances performance on facial images, and MINDER, which addresses noise type bias, balancing performance across domains.
arXiv Detail & Related papers (2024-11-28T13:04:45Z) - Evaluating Vision Transformer Models for Visual Quality Control in Industrial Manufacturing [0.0]
One of the most promising use-cases for machine learning in industrial manufacturing is the early detection of defective products.
We evaluate current vision transformer models together with anomaly detection methods.
We give guidelines for choosing a suitable model architecture for a quality control system in practice.
arXiv Detail & Related papers (2024-11-22T14:12:35Z) - FADE: Few-shot/zero-shot Anomaly Detection Engine using Large Vision-Language Model [0.9226774742769024]
Few-shot/zero-shot anomaly detection is important for quality inspection in the manufacturing industry.
We propose the Few-shot/zero-shot Anomaly Detection Engine (FADE), which leverages the vision-language CLIP model and adapts it for the purpose of anomaly detection.
FADE outperforms other state-of-the-art methods in anomaly segmentation with pixel-AUROC of 89.6% (91.5%) in zero-shot and 95.4% (97.5%) in 1-normal-shot.
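The pixel-AUROC figures quoted above are the standard area under the ROC curve, evaluated per pixel. As a reminder of what that metric measures, here is a minimal pure-Python sketch via the Mann-Whitney rank formulation (the scores and labels below are made-up illustration data, not results from any of these papers):

```python
def auroc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic:
    the probability that a randomly chosen positive (defective) pixel
    receives a higher anomaly score than a randomly chosen negative one.
    Ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical per-pixel anomaly scores and ground-truth defect labels.
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.6]
labels = [1,   1,   0,   0,   0,   1]
print(round(auroc(scores, labels), 3))  # 8 of 9 positive-negative pairs ranked correctly
```

In practice pixel-AUROC is computed over millions of pixels with an efficient sort-based routine; this pairwise version is only for clarity.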
arXiv Detail & Related papers (2024-08-31T23:05:56Z) - Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection [59.41026558455904]
We focus on multi-modal anomaly detection. Specifically, we investigate early multi-modal approaches that attempted to utilize models pre-trained on large-scale visual datasets.
We propose a Local-to-global Self-supervised Feature Adaptation (LSFA) method to finetune the adaptors and learn task-oriented representation toward anomaly detection.
arXiv Detail & Related papers (2024-01-06T07:30:41Z) - Voxel-wise classification for porosity investigation of additive manufactured parts with 3D unsupervised and (deeply) supervised neural networks [5.467497693327066]
This study revisits recent supervised (UNet, UNet++, UNet 3+, MSS-UNet) and unsupervised (VAE, ceVAE, gmVAE, vqVAE) DL models for volumetric analysis of AM samples from X-CT images.
It extends them to accept 3D input data with a 3D-patch pipeline for lower computational requirements, improved efficiency and generalisability.
The VAE/ceVAE models demonstrated superior capabilities, particularly when leveraging post-processing techniques.
arXiv Detail & Related papers (2023-05-13T11:23:00Z) - Multimodal Industrial Anomaly Detection via Hybrid Fusion [59.16333340582885]
We propose a novel multimodal anomaly detection method with hybrid fusion scheme.
Our model outperforms the state-of-the-art (SOTA) methods in both detection and segmentation precision on the MVTec-3D AD dataset.
arXiv Detail & Related papers (2023-03-01T15:48:27Z) - Semi-Siamese Network for Robust Change Detection Across Different Domains with Applications to 3D Printing [17.176767333354636]
We present a novel Semi-Siamese deep learning model for defect detection in 3D printing processes.
Our model is designed to enable comparison of heterogeneous images from different domains while being robust against perturbations in the imaging setup.
Using our model, defect localization predictions can be made in less than half a second per layer using a standard MacBook Pro while achieving an F1-score of more than 0.9.
arXiv Detail & Related papers (2022-12-16T17:02:55Z) - Reference-based Defect Detection Network [57.89399576743665]
The first issue is the texture shift which means a trained defect detector model will be easily affected by unseen texture.
The second issue is partial visual confusion which indicates that a partial defect box is visually similar with a complete box.
We propose a Reference-based Defect Detection Network (RDDN) to tackle these two problems.
arXiv Detail & Related papers (2021-08-10T05:44:23Z) - Computer Vision and Normalizing Flow Based Defect Detection [0.0]
We present a two-stage defect detection network based on the object detection model YOLO, and the normalizing flow-based defect detection model DifferNet.
Our model has high robustness and performance on defect detection using real-world video clips taken from a production line monitoring system.
Our proposed model can learn on a small number of defect-free samples of single or multiple product types.
arXiv Detail & Related papers (2020-12-12T05:38:21Z) - Unsupervised Anomaly Detection with Adversarial Mirrored AutoEncoders [51.691585766702744]
We propose a variant of Adversarial Autoencoder which uses a mirrored Wasserstein loss in the discriminator to enforce better semantic-level reconstruction.
We put forward an alternative measure of anomaly score to replace the reconstruction-based metric.
Our method outperforms the current state-of-the-art methods for anomaly detection on several OOD detection benchmarks.
arXiv Detail & Related papers (2020-03-24T08:26:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.