Related papers: A Survey on RGB, 3D, and Multimodal Approaches for Unsupervised Industrial Image Anomaly Detection

URL: http://arxiv.org/abs/2410.21982v2
Date: Fri, 21 Mar 2025 04:51:16 GMT
Title: A Survey on RGB, 3D, and Multimodal Approaches for Unsupervised Industrial Image Anomaly Detection
Authors: Yuxuan Lin, Yang Chang, Xuan Tong, Jiawen Yu, Antonio Liotta, Guofan Huang, Wei Song, Deyu Zeng, Zongze Wu, Yan Wang, Wenqiang Zhang
Abstract summary: Unsupervised industrial image anomaly detection technology effectively overcomes the scarcity of abnormal samples. This article provides a comprehensive review of UIAD tasks in the three modal settings.
Score: 24.634671653473397
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In the advancement of industrial informatization, unsupervised anomaly detection technology effectively overcomes the scarcity of abnormal samples and significantly enhances the automation and reliability of smart manufacturing. As an important branch, industrial image anomaly detection focuses on automatically identifying visual anomalies in industrial scenarios (such as product surface defects, assembly errors, and equipment appearance anomalies) through computer vision techniques. With the rapid development of Unsupervised industrial Image Anomaly Detection (UIAD), excellent detection performance has been achieved not only in the RGB setting but also in 3D and multimodal (RGB and 3D) settings. However, existing surveys primarily focus on UIAD tasks in the RGB setting, with little discussion of 3D and multimodal settings. To address this gap, this article provides a comprehensive review of UIAD tasks in the three modal settings. Specifically, we first introduce the task concept and process of UIAD. We then overview the research on UIAD in three modal settings (RGB, 3D, and multimodal), including datasets and methods, and review multimodal feature fusion strategies in the multimodal setting. Finally, we summarize the main challenges faced by UIAD tasks in the three modal settings, and offer insights into future development directions, aiming to provide researchers with a comprehensive reference and offer new perspectives for the advancement of industrial informatization. Corresponding resources are available at https://github.com/Sunny5250/Awesome-Multi-Setting-UIAD.

Related papers

BridgeNet: A Unified Multimodal Framework for Bridging 2D and 3D Industrial Anomaly Detection [26.864423488101075]
We propose a novel unified multimodal anomaly detection framework. Our contributions consist of 3 key aspects. Experiments show our method outperforms state-of-the-art (SOTA) on MVTec-3D AD and Eyecandies datasets.
arXiv Detail & Related papers (2025-07-25T13:27:25Z)
3D-ADAM: A Dataset for 3D Anomaly Detection in Advanced Manufacturing [5.096333816641487]
3D-ADAM is the first large-scale industry-relevant dataset for high-precision 3D Anomaly Detection. It comprises 14,120 high-resolution scans across 217 unique parts, captured using 4 industrial depth imaging sensors. It includes 27,346 annotated defect instances from 12 categories, covering the breadth of industrial surface defects.
arXiv Detail & Related papers (2025-07-10T15:09:20Z)
Real-IAD D3: A Real-World 2D/Pseudo-3D/3D Dataset for Industrial Anomaly Detection [53.2590751089607]
Real-IAD D3 is a high-precision multimodal dataset that incorporates an additional pseudo-3D modality generated through photometric stereo. We introduce an effective approach that integrates RGB, point cloud, and pseudo-3D depth information to leverage the complementary strengths of each modality. Our experiments highlight the importance of these modalities in boosting detection robustness and overall IAD performance.
arXiv Detail & Related papers (2025-04-19T08:05:47Z)
RADAR: Robust Two-stage Modality-incomplete Industrial Anomaly Detection [61.71770293720491]
We propose a novel two-stage Robust modAlity-incomplete fusing and Detecting frAmewoRk, abbreviated as RADAR. Our bootstrapping philosophy is to enhance two stages in MIIAD, improving the robustness of the Multimodal Transformer. Our experimental results demonstrate that the proposed RADAR significantly surpasses conventional MIAD methods in terms of effectiveness and robustness.
arXiv Detail & Related papers (2024-10-02T16:47:55Z)
M3DM-NR: RGB-3D Noisy-Resistant Industrial Anomaly Detection via Multimodal Denoising [63.39134873744748]
Existing industrial anomaly detection methods primarily concentrate on unsupervised learning with pristine RGB images. This paper proposes a novel noise-resistant M3DM-NR framework to leverage strong multi-modal discriminative capabilities of CLIP. Extensive experiments show that M3DM-NR outperforms state-of-the-art methods in 3D-RGB multi-modal noisy anomaly detection.
arXiv Detail & Related papers (2024-06-04T12:33:02Z)
IPAD: Industrial Process Anomaly Detection Dataset [71.39058003212614]
Video anomaly detection (VAD) is a challenging task aiming to recognize anomalies in video frames. We propose a new dataset, IPAD, specifically designed for VAD in industrial scenarios. This dataset covers 16 different industrial devices and contains over 6 hours of both synthetic and real-world video footage.
arXiv Detail & Related papers (2024-04-23T13:38:01Z)
Towards Unified 3D Object Detection via Algorithm and Data Unification [70.27631528933482]
We build the first unified multi-modal 3D object detection benchmark MM-Omni3D and extend the aforementioned monocular detector to its multi-modal version. We name the designed monocular and multi-modal detectors as UniMODE and MM-UniMODE, respectively.
arXiv Detail & Related papers (2024-02-28T18:59:31Z)
Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection [59.41026558455904]
We focus on multi-modal anomaly detection. Specifically, we investigate early multi-modal approaches that attempted to utilize models pre-trained on large-scale visual datasets. We propose a Local-to-global Self-supervised Feature Adaptation (LSFA) method to finetune the adaptors and learn task-oriented representation toward anomaly detection.
arXiv Detail & Related papers (2024-01-06T07:30:41Z)
Long-Tailed 3D Detection via Multi-Modal Fusion [47.03801888003686]
We study the problem of Long-Tailed 3D Detection (LT3D), which evaluates all annotated classes, including those in-the-tail. We point out that rare-class accuracy is particularly improved via multi-modal late fusion (MMLF) of independently trained uni-modal LiDAR and RGB detectors. Our proposed MMLF approach significantly improves LT3D performance over prior work, particularly improving rare class performance from 12.8 to 20.0 mAP!
arXiv Detail & Related papers (2023-12-18T07:14:25Z)
Dual-Branch Reconstruction Network for Industrial Anomaly Detection with RGB-D Data [1.861332908680942]
Multi-modal industrial anomaly detection based on 3D point clouds and RGB images is just beginning to emerge. Existing methods require longer inference time and higher memory usage, which cannot meet the real-time requirements of the industry. We propose a lightweight dual-branch reconstruction network based on RGB-D input, learning the decision boundary between normal and abnormal examples.
arXiv Detail & Related papers (2023-11-12T10:19:14Z)
ODM3D: Alleviating Foreground Sparsity for Semi-Supervised Monocular 3D Object Detection [15.204935788297226]
ODM3D framework entails cross-modal knowledge distillation at various levels to inject LiDAR-domain knowledge into a monocular detector during training. By identifying foreground sparsity as the main culprit behind existing methods' suboptimal training, we exploit the precise localisation information embedded in LiDAR points. Our method ranks 1st in both KITTI validation and test benchmarks, significantly surpassing all existing monocular methods, supervised or semi-supervised.
arXiv Detail & Related papers (2023-10-28T07:12:09Z)
Recent Advances in Multi-modal 3D Scene Understanding: A Comprehensive Survey and Evaluation [28.417029383793068]
Multi-modal 3D scene understanding has gained considerable attention due to its wide applications in many areas, such as autonomous driving and human-computer interaction. Introducing an additional modality not only elevates the richness and precision of scene interpretation but also ensures a more robust and resilient understanding. We present a novel taxonomy that delivers a thorough categorization of existing methods according to modalities and tasks, exploring their respective strengths and limitations.
arXiv Detail & Related papers (2023-10-24T09:39:05Z)
Multimodal Industrial Anomaly Detection via Hybrid Fusion [59.16333340582885]
We propose a novel multimodal anomaly detection method with a hybrid fusion scheme. Our model outperforms the state-of-the-art (SOTA) methods on both detection and segmentation precision on the MVTec-3D AD dataset.
arXiv Detail & Related papers (2023-03-01T15:48:27Z)
Deep Industrial Image Anomaly Detection: A Survey [85.44223757234671]
The recent rapid development of deep learning has laid a milestone in industrial Image Anomaly Detection (IAD). In this paper, we provide a comprehensive review of deep learning-based image anomaly detection techniques. We highlight several open challenges for image anomaly detection.
arXiv Detail & Related papers (2023-01-27T03:18:09Z)
Towards Multimodal Multitask Scene Understanding Models for Indoor Mobile Agents [49.904531485843464]
In this paper, we discuss the main challenge: insufficient, or even no, labeled data for real-world indoor environments. We describe MMISM (Multi-modality input Multi-task output Indoor Scene understanding Model) to tackle the above challenges. MMISM considers RGB images as well as sparse Lidar points as inputs and 3D object detection, depth completion, human pose estimation, and semantic segmentation as output tasks. We show that MMISM performs on par with or even better than single-task models.
arXiv Detail & Related papers (2022-09-27T04:49:19Z)

This list is automatically generated from the titles and abstracts of the papers on this site.