OCR is All you need: Importing Multi-Modality into Image-based Defect Detection System
- URL: http://arxiv.org/abs/2403.11536v1
- Date: Mon, 18 Mar 2024 07:41:39 GMT
- Title: OCR is All you need: Importing Multi-Modality into Image-based Defect Detection System
- Authors: Chih-Chung Hsu, Chia-Ming Lee, Chun-Hung Sun, Kuang-Ming Wu
- Abstract summary: We introduce an external modality-guided data mining framework, primarily rooted in optical character recognition (OCR), to extract statistical features from images.
A key aspect of our approach is the alignment of external modality features, extracted using a single modality-aware model, with image features encoded by a convolutional neural network.
Our methodology considerably boosts the recall rate of the defect detection model and maintains high robustness even in challenging scenarios.
- Score: 7.1083241462091165
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic optical inspection (AOI) plays a pivotal role in the manufacturing process, predominantly leveraging high-resolution imaging instruments for scanning purposes. It detects anomalies by analyzing image textures or patterns, making it an essential tool in industrial manufacturing and quality control. Despite its importance, the deployment of models for AOI often faces challenges. These include limited sample sizes, which hinder effective feature learning, variations among source domains, and sensitivities to changes in lighting and camera positions during imaging. These factors collectively compromise the accuracy of model predictions. Traditional AOI often fails to capitalize on the rich mechanism-parameter information from machines or inside images, including statistical parameters, which typically benefit AOI classification. To address this, we introduce an external modality-guided data mining framework, primarily rooted in optical character recognition (OCR), to extract statistical features from images as a second modality to enhance performance, termed OANet (Ocr-Aoi-Net). A key aspect of our approach is the alignment of external modality features, extracted using a single modality-aware model, with image features encoded by a convolutional neural network. This synergy enables a more refined fusion of semantic representations from different modalities. We further introduce feature refinement and a gating function in our OANet to optimize the combination of these features, enhancing inference and decision-making capabilities. Experimental outcomes show that our methodology considerably boosts the recall rate of the defect detection model and maintains high robustness even in challenging scenarios.
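The abstract describes the fusion mechanism only at a high level. Below is a minimal PyTorch sketch of one plausible reading of it, not the authors' released OANet code: every module name, dimension, and the exact form of the gating function here are illustrative assumptions. OCR-derived statistical parameters (assumed already parsed into a numeric vector) are embedded by a small MLP, aligned to the same dimensionality as CNN-encoded image features, and combined through a learned sigmoid gate before classification.

```python
import torch
import torch.nn as nn

class GatedFusionSketch(nn.Module):
    """Illustrative OCR-statistics + image-feature fusion with a learned gate.

    Not the paper's OANet implementation; layers and sizes are assumptions.
    """

    def __init__(self, num_ocr_stats: int = 16, feat_dim: int = 128, num_classes: int = 2):
        super().__init__()
        # Image branch: a small CNN encoder standing in for the paper's backbone.
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # External-modality branch: embeds OCR-extracted statistical parameters.
        self.stats_encoder = nn.Sequential(
            nn.Linear(num_ocr_stats, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
        )
        # Gating function: decides, per feature channel, how much of the
        # external modality to mix into the image representation.
        self.gate = nn.Sequential(nn.Linear(2 * feat_dim, feat_dim), nn.Sigmoid())
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, image: torch.Tensor, ocr_stats: torch.Tensor) -> torch.Tensor:
        img_feat = self.image_encoder(image)           # (B, feat_dim)
        stat_feat = self.stats_encoder(ocr_stats)      # (B, feat_dim)
        g = self.gate(torch.cat([img_feat, stat_feat], dim=1))
        fused = g * img_feat + (1.0 - g) * stat_feat   # gated combination
        return self.classifier(fused)

# Usage on random tensors standing in for an AOI image and its OCR-parsed parameters.
model = GatedFusionSketch()
logits = model(torch.randn(4, 3, 224, 224), torch.randn(4, 16))
print(logits.shape)  # torch.Size([4, 2])
```

A gate of this kind gives the network a learned way to down-weight the OCR-derived features when recognition is noisy, which is one plausible reading of the paper's robustness claim.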
Related papers
- Time Step Generating: A Universal Synthesized Deepfake Image Detector [0.4488895231267077]
We propose a universal synthetic-image detector, Time Step Generating (TSG).
TSG does not rely on pre-trained models' reconstruction ability, specific datasets, or sampling algorithms.
We test the proposed TSG on the large-scale GenImage benchmark and it achieves significant improvements in both accuracy and generalizability.
arXiv Detail & Related papers (2024-11-17T09:39:50Z)
- Present and Future Generalization of Synthetic Image Detectors [0.6144680854063939]
This work conducts a systematic analysis and uses its insights to develop practical guidelines for training robust synthetic image detectors.
Model generalization capabilities are evaluated across different setups including real-world deployment conditions.
We show that while current approaches excel in specific scenarios, no single detector achieves universal effectiveness.
arXiv Detail & Related papers (2024-09-21T12:46:17Z)
- AssemAI: Interpretable Image-Based Anomaly Detection for Manufacturing Pipelines [0.0]
Anomaly detection in manufacturing pipelines remains a critical challenge, intensified by the complexity and variability of industrial environments.
This paper introduces AssemAI, an interpretable image-based anomaly detection system tailored for smart manufacturing pipelines.
arXiv Detail & Related papers (2024-08-05T01:50:09Z)
- DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception [78.26734070960886]
Current perceptive models heavily depend on resource-intensive datasets.
We introduce perception-aware loss (P.A. loss) through segmentation, improving both quality and controllability.
Our method customizes data augmentation by extracting and utilizing perception-aware attribute (P.A. Attr) during generation.
arXiv Detail & Related papers (2024-03-20T04:58:03Z)
- ViTaL: An Advanced Framework for Automated Plant Disease Identification in Leaf Images Using Vision Transformers and Linear Projection For Feature Reduction [0.0]
This paper introduces a robust framework for the automated identification of diseases in plant leaf images.
The framework incorporates several key stages to enhance disease recognition accuracy.
We propose a novel hardware design specifically tailored for scanning diseased leaves in an omnidirectional fashion.
arXiv Detail & Related papers (2024-02-27T11:32:37Z)
- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
- ESSAformer: Efficient Transformer for Hyperspectral Image Super-resolution [76.7408734079706]
Single hyperspectral image super-resolution (single-HSI-SR) aims to restore a high-resolution hyperspectral image from a low-resolution observation.
We propose ESSAformer, an ESSA attention-embedded Transformer network for single-HSI-SR with an iterative refining structure.
arXiv Detail & Related papers (2023-07-26T07:45:14Z)
- On Sensitivity and Robustness of Normalization Schemes to Input Distribution Shifts in Automatic MR Image Diagnosis [58.634791552376235]
Deep Learning (DL) models have achieved state-of-the-art performance in diagnosing multiple diseases using reconstructed images as input.
DL models are sensitive to varying artifacts because these change the input data distribution between the training and testing phases.
We propose using alternative normalization techniques, such as Group Normalization and Layer Normalization, to make model performance robust to varying image artifacts (a minimal normalization-swap sketch appears after this list).
arXiv Detail & Related papers (2023-06-23T03:09:03Z)
- Implicit Diffusion Models for Continuous Super-Resolution [65.45848137914592]
This paper introduces an Implicit Diffusion Model (IDM) for high-fidelity continuous image super-resolution.
IDM integrates an implicit neural representation and a denoising diffusion model in a unified end-to-end framework.
The scaling factor regulates the resolution and accordingly modulates the proportion of the LR information and generated features in the final output.
arXiv Detail & Related papers (2023-03-29T07:02:20Z)
- Patch Similarity Aware Data-Free Quantization for Vision Transformers [2.954890575035673]
We propose PSAQ-ViT, a Patch Similarity Aware data-free Quantization framework for Vision Transformers.
We analyze the self-attention module's properties and reveal a general difference (patch similarity) in its processing of Gaussian noise and real images.
Experiments and ablation studies are conducted on various benchmarks to validate the effectiveness of PSAQ-ViT.
arXiv Detail & Related papers (2022-03-04T11:47:20Z)
- Characteristic Regularisation for Super-Resolving Face Images [81.84939112201377]
Existing facial image super-resolution (SR) methods focus mostly on improving artificially down-sampled low-resolution (LR) imagery, which differs in character from genuine LR images.
Previous unsupervised domain adaptation (UDA) methods address this mismatch by training a model on unpaired genuine LR and HR data.
This overstretches the model with two tasks: reconciling the visual characteristics and enhancing the image resolution.
We formulate a method that joins the advantages of conventional SR and UDA models.
arXiv Detail & Related papers (2019-12-30T16:27:24Z)
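The normalization-schemes entry above argues for per-sample normalization over batch statistics. Below is a minimal, hedged PyTorch sketch of that swap; the conv-block architecture and the group count are illustrative assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int, norm: str = "group") -> nn.Sequential:
    """Conv block where the normalization scheme is a swappable choice.

    'batch' relies on running statistics, which drift when the input
    distribution shifts between training and testing; 'group' and 'layer'
    normalize per sample, the robustness-oriented alternative advocated above.
    """
    if norm == "batch":
        norm_layer = nn.BatchNorm2d(out_ch)
    elif norm == "group":
        norm_layer = nn.GroupNorm(num_groups=8, num_channels=out_ch)
    elif norm == "layer":
        # GroupNorm with a single group acts as LayerNorm over (C, H, W).
        norm_layer = nn.GroupNorm(num_groups=1, num_channels=out_ch)
    else:
        raise ValueError(f"unknown norm: {norm}")
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        norm_layer,
        nn.ReLU(),
    )

# Usage: swap the norm argument to compare schemes under distribution shift.
block = conv_block(3, 32, norm="group")
print(block(torch.randn(2, 3, 64, 64)).shape)  # torch.Size([2, 32, 64, 64])
```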