BridgeNet: A Unified Multimodal Framework for Bridging 2D and 3D Industrial Anomaly Detection
- URL: http://arxiv.org/abs/2507.19253v1
- Date: Fri, 25 Jul 2025 13:27:25 GMT
- Title: BridgeNet: A Unified Multimodal Framework for Bridging 2D and 3D Industrial Anomaly Detection
- Authors: An Xiang, Zixuan Huang, Xitong Gao, Kejiang Ye, Cheng-zhong Xu
- Abstract summary: We propose a novel unified multimodal anomaly detection framework. Our contributions consist of 3 key aspects. Experiments show our method outperforms state-of-the-art (SOTA) on MVTec-3D AD and Eyecandies datasets.
- Score: 26.864423488101075
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Industrial anomaly detection for 2D objects has gained significant attention and achieved progress in anomaly detection (AD) methods. However, identifying 3D depth anomalies using only 2D information is insufficient. Despite explicitly fusing depth information into RGB images or using point cloud backbone networks to extract depth features, both approaches struggle to adequately represent 3D information in multimodal scenarios due to the disparities among different modal information. Additionally, due to the scarcity of abnormal samples in industrial data, especially in multimodal scenarios, it is necessary to perform anomaly generation to simulate real-world abnormal samples. Therefore, we propose a novel unified multimodal anomaly detection framework to address these issues. Our contributions consist of 3 key aspects. (1) We extract visible depth information from 3D point cloud data simply and use 2D RGB images to represent appearance, which disentangles depth and appearance to support unified anomaly generation. (2) Benefiting from the flexible input representation, the proposed Multi-Scale Gaussian Anomaly Generator and Unified Texture Anomaly Generator can generate richer anomalies in RGB and depth. (3) All modules share parameters for both RGB and depth data, effectively bridging 2D and 3D anomaly detection. Subsequent modules can directly leverage features from both modalities without complex fusion. Experiments show our method outperforms state-of-the-art (SOTA) on MVTec-3D AD and Eyecandies datasets. Code available at: https://github.com/Xantastic/BridgeNet
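The abstract's second contribution, the Multi-Scale Gaussian Anomaly Generator, can be illustrated with a minimal sketch: Gaussian noise is drawn at several coarse resolutions, upsampled, summed, and injected into a masked region of a single-channel map, so the same routine applies to a depth map or an RGB channel, reflecting the unified treatment of both modalities. The function name, parameters, and nearest-neighbour upsampling below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def multiscale_gaussian_anomaly(image, scales=(4, 8, 16), strength=0.5, seed=0):
    """Inject a synthetic anomaly by summing Gaussian noise drawn at several
    coarse resolutions and upsampled to full size (hypothetical sketch, not
    the paper's exact generator).

    image: 2D float array in [0, 1] -- a depth map or one RGB channel.
    Returns the perturbed image and the binary anomaly mask.
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape
    noise = np.zeros((h, w), dtype=np.float64)
    for s in scales:
        # Coarse Gaussian noise, upsampled by nearest-neighbour repetition.
        coarse = rng.standard_normal((h // s, w // s))
        noise += np.kron(coarse, np.ones((s, s)))[:h, :w] / len(scales)
    # Restrict the perturbation to a random rectangular anomaly region.
    mask = np.zeros((h, w))
    y0, x0 = rng.integers(0, h // 2), rng.integers(0, w // 2)
    mask[y0:y0 + h // 4, x0:x0 + w // 4] = 1.0
    anomalous = np.clip(image + strength * noise * mask, 0.0, 1.0)
    return anomalous, mask

# Usage: perturb a flat 64x64 "depth map" of constant value 0.5.
img = np.full((64, 64), 0.5)
out, mask = multiscale_gaussian_anomaly(img)
```

Because the generator only touches the masked region and treats the input as a generic single-channel map, the same code path serves both depth and appearance, which is the disentanglement the abstract describes.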
Related papers
- Multistream Network for LiDAR and Camera-based 3D Object Detection in Outdoor Scenes [59.78696921486972]
Fusion of LiDAR and RGB data has the potential to enhance outdoor 3D object detection accuracy. We propose a MultiStream Detection (MuStD) network that meticulously extracts task-relevant information from both data modalities.
arXiv Detail & Related papers (2025-07-25T14:20:16Z) - Real-IAD D3: A Real-World 2D/Pseudo-3D/3D Dataset for Industrial Anomaly Detection [53.2590751089607]
Real-IAD D3 is a high-precision multimodal dataset that incorporates an additional pseudo-3D modality generated through photometric stereo. We introduce an effective approach that integrates RGB, point cloud, and pseudo-3D depth information to leverage the complementary strengths of each modality. Our experiments highlight the importance of these modalities in boosting detection robustness and overall IAD performance.
arXiv Detail & Related papers (2025-04-19T08:05:47Z) - M3DM-NR: RGB-3D Noisy-Resistant Industrial Anomaly Detection via Multimodal Denoising [63.39134873744748]
Existing industrial anomaly detection methods primarily concentrate on unsupervised learning with pristine RGB images.
This paper proposes a novel noise-resistant M3DM-NR framework to leverage strong multi-modal discriminative capabilities of CLIP.
Extensive experiments show that M3DM-NR outperforms state-of-the-art methods in 3D-RGB multi-modal noisy anomaly detection.
arXiv Detail & Related papers (2024-06-04T12:33:02Z) - Dual-Branch Reconstruction Network for Industrial Anomaly Detection with RGB-D Data [1.861332908680942]
Multi-modal industrial anomaly detection based on 3D point clouds and RGB images is just beginning to emerge.
Existing methods require longer inference times and higher memory usage, which cannot meet the real-time requirements of industry.
We propose a lightweight dual-branch reconstruction network based on RGB-D input, learning the decision boundary between normal and abnormal examples.
arXiv Detail & Related papers (2023-11-12T10:19:14Z) - Multimodal Industrial Anomaly Detection via Hybrid Fusion [59.16333340582885]
We propose a novel multimodal anomaly detection method with hybrid fusion scheme.
Our model outperforms the state-of-the-art (SOTA) methods in both detection and segmentation precision on the MVTec 3D-AD dataset.
arXiv Detail & Related papers (2023-03-01T15:48:27Z) - Multi-Sem Fusion: Multimodal Semantic Fusion for 3D Object Detection [11.575945934519442]
LiDAR and camera fusion techniques are promising for achieving 3D object detection in autonomous driving.
Most multi-modal 3D object detection frameworks integrate semantic knowledge from 2D images into 3D LiDAR point clouds.
We propose a general multi-modal fusion framework, Multi-Sem Fusion (MSF), to fuse semantic information from both 2D image and 3D point cloud scene-parsing results.
arXiv Detail & Related papers (2022-12-10T10:54:41Z) - Bridged Transformer for Vision and Point Cloud 3D Object Detection [92.86856146086316]
Bridged Transformer (BrT) is an end-to-end architecture for 3D object detection.
BrT learns to identify 3D and 2D object bounding boxes from both points and image patches.
We experimentally show that BrT surpasses state-of-the-art methods on SUN RGB-D and ScanNetV2 datasets.
arXiv Detail & Related papers (2022-10-04T05:44:22Z) - Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.