2D-3D Feature Fusion via Cross-Modal Latent Synthesis and Attention-Guided Restoration for Industrial Anomaly Detection
- URL: http://arxiv.org/abs/2510.21793v1
- Date: Mon, 20 Oct 2025 03:57:50 GMT
- Title: 2D-3D Feature Fusion via Cross-Modal Latent Synthesis and Attention-Guided Restoration for Industrial Anomaly Detection
- Authors: Usman Ali, Ali Zia, Abdul Rehman, Umer Ramzan, Zohaib Hassan, Talha Sattar, Jing Wang, Wei Xiang
- Abstract summary: We propose a novel unsupervised framework, Multi-Modal Attention-Driven Fusion Restoration (MAFR).
MAFR synthesises a unified latent space from RGB images and point clouds using a shared fusion encoder, followed by attention-guided, modality-specific decoders.
Anomalies are localised by measuring reconstruction errors between input features and their restored counterparts.
- Score: 9.873449426376787
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Industrial anomaly detection (IAD) increasingly benefits from integrating 2D and 3D data, but robust cross-modal fusion remains challenging. We propose a novel unsupervised framework, Multi-Modal Attention-Driven Fusion Restoration (MAFR), which synthesises a unified latent space from RGB images and point clouds using a shared fusion encoder, followed by attention-guided, modality-specific decoders. Anomalies are localised by measuring reconstruction errors between input features and their restored counterparts. Evaluations on the MVTec 3D-AD and Eyecandies benchmarks demonstrate that MAFR achieves state-of-the-art results, with a mean I-AUROC of 0.972 and 0.901, respectively. The framework also exhibits strong performance in few-shot learning settings, and ablation studies confirm the critical roles of the fusion architecture and composite loss. MAFR offers a principled approach for fusing visual and geometric information, advancing the robustness and accuracy of industrial anomaly detection. Code is available at https://github.com/adabrh/MAFR
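The abstract pins the pipeline down well enough to sketch: encode fused RGB and point-cloud features into one latent space, restore each modality with an attention-guided decoder, and score anomalies by reconstruction error. The module sizes, the channel-attention form, and every name below are illustrative assumptions, not the authors' architecture; the released code at the link above is the authoritative reference.

```python
import torch
import torch.nn as nn

class MAFRSketch(nn.Module):
    """Shared fusion encoder + attention-guided, modality-specific decoders."""

    def __init__(self, c_rgb=256, c_pc=256, c_lat=256):
        super().__init__()
        # Shared fusion encoder: concatenated modalities -> unified latent space.
        self.encoder = nn.Sequential(
            nn.Conv2d(c_rgb + c_pc, c_lat, 1), nn.ReLU(),
            nn.Conv2d(c_lat, c_lat, 3, padding=1),
        )
        # Channel attention here is a stand-in for the paper's attention guidance.
        self.attn_rgb = nn.Sequential(nn.Conv2d(c_lat, c_lat, 1), nn.Sigmoid())
        self.attn_pc = nn.Sequential(nn.Conv2d(c_lat, c_lat, 1), nn.Sigmoid())
        self.dec_rgb = nn.Conv2d(c_lat, c_rgb, 3, padding=1)
        self.dec_pc = nn.Conv2d(c_lat, c_pc, 3, padding=1)

    def forward(self, f_rgb, f_pc):
        z = self.encoder(torch.cat([f_rgb, f_pc], dim=1))
        restored_rgb = self.dec_rgb(z * self.attn_rgb(z))
        restored_pc = self.dec_pc(z * self.attn_pc(z))
        return restored_rgb, restored_pc

def anomaly_map(f_rgb, f_pc, restored_rgb, restored_pc):
    # Localise anomalies by the per-location reconstruction error between
    # input features and their restored counterparts, summed over modalities.
    return ((f_rgb - restored_rgb).pow(2).mean(dim=1)
            + (f_pc - restored_pc).pow(2).mean(dim=1))  # (B, H, W)
```

Trained only on nominal samples, the restoration should fail on defective regions, so the error map peaks exactly where the anomaly sits.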
Related papers
- Cross-Modal Mapping and Dual-Branch Reconstruction for 2D-3D Multimodal Industrial Anomaly Detection [6.632019014616859]
CMDR-IAD is an unsupervised framework for reliable anomaly detection in 2D+3D multimodal as well as single-modality settings.
CMDR-IAD achieves state-of-the-art performance while operating without memory banks, reaching 97.3% image-level AUROC (I-AUROC), 99.6% pixel-level AUROC (P-AUROC), and 97.6% AUPRO.
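One plausible reading of "cross-modal mapping and dual-branch reconstruction", sketched under assumptions: each branch predicts one modality's features from the other, and regions where either mapping fails are scored as anomalous. The summary does not specify the mappers; the 1x1-conv heads and all names here are hypothetical.

```python
import torch
import torch.nn as nn

class CrossModalMapper(nn.Module):
    """Predicts one modality's feature map from the other's."""

    def __init__(self, c_src=256, c_dst=256):
        super().__init__()
        self.map = nn.Sequential(
            nn.Conv2d(c_src, c_dst, 1), nn.ReLU(), nn.Conv2d(c_dst, c_dst, 1)
        )

    def forward(self, f_src):
        return self.map(f_src)

def dual_branch_score(f_rgb, f_pc, rgb2pc, pc2rgb):
    # Branch 1 maps RGB -> 3D, branch 2 maps 3D -> RGB; a region is
    # anomalous when either direction fails, so both error maps combine.
    err_pc = (rgb2pc(f_rgb) - f_pc).pow(2).mean(dim=1)
    err_rgb = (pc2rgb(f_pc) - f_rgb).pow(2).mean(dim=1)
    return err_pc + err_rgb  # (B, H, W)
```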
arXiv Detail & Related papers (2026-03-04T10:57:32Z)
- IAENet: An Importance-Aware Ensemble Model for 3D Point Cloud-Based Anomaly Detection [2.08058961865456]
We argue that the key bottleneck is the absence of powerful pretrained foundation backbones in 3D comparable to those in 2D.
We propose the Importance-Aware Ensemble Network (IAENet), an ensemble framework that synergizes a 2D pretrained expert with 3D expert models.
IAENet achieves a new state-of-the-art with a markedly lower false positive rate, underscoring its practical value for industrial deployment.
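A hedged sketch of what "importance-aware" ensembling could look like: per-pixel anomaly scores from the 2D and 3D experts are blended with importance weights predicted from the scores themselves. The gating network below is an assumption, not IAENet's published design.

```python
import torch
import torch.nn as nn

class ImportanceAwareEnsemble(nn.Module):
    def __init__(self):
        super().__init__()
        # Tiny gating net: looks at both score maps, emits two weights per pixel.
        self.gate = nn.Sequential(
            nn.Conv2d(2, 8, 3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 2, 1),
        )

    def forward(self, score_2d, score_3d):
        s = torch.stack([score_2d, score_3d], dim=1)  # (B, 2, H, W)
        w = torch.softmax(self.gate(s), dim=1)        # per-pixel importance
        return (w * s).sum(dim=1)                     # fused anomaly map
```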
arXiv Detail & Related papers (2025-08-28T07:19:07Z)
- Real-IAD D3: A Real-World 2D/Pseudo-3D/3D Dataset for Industrial Anomaly Detection [53.2590751089607]
Real-IAD D3 is a high-precision multimodal dataset that incorporates an additional pseudo-3D modality generated through photometric stereo.
We introduce an effective approach that integrates RGB, point cloud, and pseudo-3D depth information to leverage the complementary strengths of each modality.
Our experiments highlight the importance of these modalities in boosting detection robustness and overall IAD performance.
arXiv Detail & Related papers (2025-04-19T08:05:47Z)
- GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency [50.11520458252128]
Existing 3D affordance learning methods struggle with generalization and robustness due to limited annotated data.
We propose GEAL, a novel framework designed to enhance the generalization and robustness of 3D affordance learning by leveraging large-scale pre-trained 2D models.
GEAL consistently outperforms existing methods across seen and novel object categories, as well as corrupted data.
arXiv Detail & Related papers (2024-12-12T17:59:03Z)
- A Semantic-Aware and Multi-Guided Network for Infrared-Visible Image Fusion [41.34335755315773]
This paper focuses on how to model correlation-driven decomposed features and reason over high-level graph representations.
We propose a three-branch encoder-decoder architecture, along with corresponding fusion layers, as the fusion strategy.
Experiments demonstrate competitive results compared with state-of-the-art methods on visible-infrared image fusion and medical image fusion tasks.
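A minimal sketch of a three-branch encoder-decoder with a fusion layer, as the summary describes: two modality-private branches plus one shared branch, concatenated and decoded into a fused image. Branch widths and the concat-style fusion are illustrative choices, not the paper's exact design.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU())

class ThreeBranchFusion(nn.Module):
    def __init__(self, c=32):
        super().__init__()
        self.enc_vis = conv_block(1, c)      # visible-specific branch
        self.enc_ir = conv_block(1, c)       # infrared-specific branch
        self.enc_shared = conv_block(2, c)   # shared branch sees both inputs
        self.fuse = conv_block(3 * c, c)     # fusion layer
        self.decoder = nn.Conv2d(c, 1, 3, padding=1)

    def forward(self, vis, ir):
        feats = torch.cat(
            [self.enc_vis(vis), self.enc_ir(ir),
             self.enc_shared(torch.cat([vis, ir], dim=1))], dim=1)
        return self.decoder(self.fuse(feats))  # fused single-channel image
```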
arXiv Detail & Related papers (2024-06-11T09:32:40Z)
- mmFUSION: Multimodal Fusion for 3D Objects Detection [18.401155770778757]
Multi-sensor fusion is essential for accurate 3D object detection in self-driving systems.
In this paper, we propose a new intermediate-level multi-modal fusion approach to overcome these challenges.
The code with the mmdetection3D project plugin will be publicly available soon.
arXiv Detail & Related papers (2023-11-07T15:11:27Z)
- Multimodal Industrial Anomaly Detection via Hybrid Fusion [59.16333340582885]
We propose a novel multimodal anomaly detection method with hybrid fusion scheme.
Our model outperforms the state-of-the-art (SOTA) methods in both detection and segmentation precision on the MVTec 3D-AD dataset.
arXiv Detail & Related papers (2023-03-01T15:48:27Z)
- CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion [138.40422469153145]
We propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network.
We show that CDDFuse achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion.
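The "correlation-driven" part can be made concrete with a small loss sketch: decomposed base features of the two modalities should correlate, while modality-specific detail features should not, e.g. by minimising the ratio of the two correlations. The exact formulation in CDDFuse may differ; this is an assumption-laden illustration.

```python
import torch

def corr(a, b, eps=1e-6):
    # Pearson correlation between two flattened per-sample feature tensors.
    a, b = a.flatten(1), b.flatten(1)
    a = a - a.mean(dim=1, keepdim=True)
    b = b - b.mean(dim=1, keepdim=True)
    return (a * b).sum(dim=1) / (a.norm(dim=1) * b.norm(dim=1) + eps)

def decomposition_loss(base_vis, base_ir, detail_vis, detail_ir, eps=1e-6):
    # Minimising this ratio pushes detail features apart (decorrelated)
    # while pulling base features together (correlated).
    return corr(detail_vis, detail_ir).abs().mean() / (
        corr(base_vis, base_ir).abs().mean() + eps)
```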
arXiv Detail & Related papers (2022-11-26T02:40:28Z)
- EPNet++: Cascade Bi-directional Fusion for Multi-Modal 3D Object Detection [56.03081616213012]
We propose EPNet++ for multi-modal 3D object detection by introducing a novel Cascade Bi-directional Fusion (CB-Fusion) module.
The proposed CB-Fusion module enriches point features with the rich semantic information of image features in a cascade bi-directional interaction manner.
Experimental results on the KITTI, JRDB, and SUN-RGBD datasets demonstrate the superiority of EPNet++ over state-of-the-art methods.
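A schematic of cascade bi-directional interaction under simplifying assumptions: image features sampled at each point's projected pixel enhance the point features, which in turn feed a residual back to the image stream, stage after stage. Point-to-pixel projection is assumed done upstream; the gate and layer sizes are illustrative, not EPNet++'s exact module.

```python
import torch
import torch.nn as nn

class CBFusionStage(nn.Module):
    def __init__(self, c=64):
        super().__init__()
        self.img_to_pt = nn.Linear(c, c)   # image -> point enhancement
        self.pt_to_img = nn.Linear(c, c)   # point -> image feedback
        self.gate = nn.Sequential(nn.Linear(2 * c, c), nn.Sigmoid())

    def forward(self, f_pt, f_img_at_pt):
        # f_pt, f_img_at_pt: (B, N, C) point features and image features
        # sampled at each point's projected pixel location.
        g = self.gate(torch.cat([f_pt, f_img_at_pt], dim=-1))
        f_pt = f_pt + g * self.img_to_pt(f_img_at_pt)      # semantic boost
        f_img_at_pt = f_img_at_pt + self.pt_to_img(f_pt)   # geometric feedback
        return f_pt, f_img_at_pt

# Cascading several stages deepens the interaction:
#   for stage in stages: f_pt, f_img = stage(f_pt, f_img)
```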
arXiv Detail & Related papers (2021-12-21T10:48:34Z)
- EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation in many applications, such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
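The residual-based fusion module admits a compact sketch: each stream is updated by adding a learned residual computed from both modalities instead of overwriting its features. The layer choices below are assumptions rather than the PMF implementation.

```python
import torch
import torch.nn as nn

class ResidualFusion(nn.Module):
    def __init__(self, c=64):
        super().__init__()
        self.mix = nn.Sequential(
            nn.Conv2d(2 * c, c, 3, padding=1), nn.ReLU(),
            nn.Conv2d(c, c, 3, padding=1),
        )

    def forward(self, f_cam, f_lidar):
        # The camera stream keeps its identity and gains a residual computed
        # from both modalities, so a weak LiDAR signal degrades it gracefully.
        return f_cam + self.mix(torch.cat([f_cam, f_lidar], dim=1))
```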
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.