Cross-Modal Alignment and Fusion for RGB-D Transmission-Line Defect Detection
- URL: http://arxiv.org/abs/2602.01696v2
- Date: Tue, 03 Feb 2026 07:57:04 GMT
- Title: Cross-Modal Alignment and Fusion for RGB-D Transmission-Line Defect Detection
- Authors: Jiaming Cui, Wenqiang Li, Shuai Zhou, Ruifeng Qin, Feng Shen,
- Abstract summary: This paper proposes CMAFNet, a Cross-Modal Alignment and Fusion Network that integrates RGB appearance and depth geometry through a principled purify-then-fuse paradigm. CMAFNet includes a Semantic Recomposition Module that performs dictionary-based feature purification. A lightweight variant reaches 24.8% mAP@50 at 228 FPS with only 4.9M parameters, surpassing all YOLO-based detectors.
- Score: 11.637942429146172
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transmission line defect detection remains challenging for automated UAV inspection due to the dominance of small-scale defects, complex backgrounds, and illumination variations. Existing RGB-based detectors, despite recent progress, struggle to distinguish geometrically subtle defects from visually similar background structures under limited chromatic contrast. This paper proposes CMAFNet, a Cross-Modal Alignment and Fusion Network that integrates RGB appearance and depth geometry through a principled purify-then-fuse paradigm. CMAFNet consists of a Semantic Recomposition Module that performs dictionary-based feature purification via a learned codebook to suppress modality-specific noise while preserving defect-discriminative information, and a Contextual Semantic Integration Framework that captures global spatial dependencies using partial-channel attention to enhance structural semantic reasoning. Position-wise normalization within the purification stage enforces explicit reconstruction-driven cross-modal alignment, ensuring statistical compatibility between heterogeneous features prior to fusion. Extensive experiments on the TLRGBD benchmark, where 94.5% of instances are small objects, demonstrate that CMAFNet achieves 32.2% mAP@50 and 12.5% APs (AP on small objects), outperforming the strongest baseline by 9.8 and 4.0 percentage points, respectively. A lightweight variant reaches 24.8% mAP@50 at 228 FPS with only 4.9M parameters, surpassing all YOLO-based detectors while matching transformer-based methods at substantially lower computational cost.
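The purify-then-fuse pipeline described in the abstract can be pictured with a toy, pure-Python sketch. Everything here is an assumption for illustration, not the CMAFNet implementation: the function names are hypothetical, the codebooks are fixed lists rather than learned parameters, "position-wise normalization" is read as standardizing each spatial position across its channels, the two modalities are fused by element-wise addition, and partial-channel attention is modelled as parameter-free self-attention over all positions on a fraction of the channels.

```python
import math
from typing import List

Vector = List[float]


def position_norm(vec: Vector, eps: float = 1e-5) -> Vector:
    """Standardize one spatial position across its channels (zero mean, unit variance)."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def purify(features: List[Vector], codebook: List[Vector], temperature: float = 0.1) -> List[Vector]:
    """Dictionary-based purification (assumed form): rebuild each position as a soft mixture of codewords.

    features: N positions, each with C channels. codebook: K codewords of length C
    (fixed here; a real model would learn them).
    """
    out = []
    for vec in features:
        z = position_norm(vec)
        scores = [sum(a * b for a, b in zip(z, code)) / temperature for code in codebook]
        weights = softmax(scores)
        out.append([sum(w * code[c] for w, code in zip(weights, codebook)) for c in range(len(z))])
    return out


def partial_channel_attention(tokens: List[Vector], ratio: float = 0.5) -> List[Vector]:
    """Global self-attention over all positions on the first ceil(ratio * C) channels.

    Remaining channels pass through unchanged. Identity query/key/value projections
    keep the sketch short; a real module would learn them.
    """
    k = max(1, math.ceil(ratio * len(tokens[0])))
    part = [t[:k] for t in tokens]
    scale = 1.0 / math.sqrt(k)
    mixed = []
    for q in part:
        weights = softmax([scale * sum(a * b for a, b in zip(q, key)) for key in part])
        mixed.append([sum(w * v[j] for w, v in zip(weights, part)) for j in range(k)])
    return [m + t[k:] for m, t in zip(mixed, tokens)]


def purify_then_fuse(rgb: List[Vector], depth: List[Vector],
                     rgb_codebook: List[Vector], depth_codebook: List[Vector]) -> List[Vector]:
    """Purify each modality separately, fuse by element-wise sum, then mix globally."""
    rgb_p = purify(rgb, rgb_codebook)
    depth_p = purify(depth, depth_codebook)
    fused = [[a + b for a, b in zip(r, d)] for r, d in zip(rgb_p, depth_p)]
    return partial_channel_attention(fused)
```

Because each position is standardized before it is matched to the codebook, inputs with very different value ranges land in the same codeword space: purifying `[[10.0, 0.0]]` and `[[3.0, 1.0]]` against the codebook `[[1.0, -1.0], [-1.0, 1.0]]` gives nearly the same output (about `[1.0, -1.0]`). This mirrors, in miniature, the "statistical compatibility before fusion" argument in the abstract.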
Related papers
- Halt the Hallucination: Decoupling Signal and Semantic OOD Detection Based on Cascaded Early Rejection [7.227431306238601]
We propose the Cascaded Early Rejection (CER) framework, which realizes hierarchical filtering for anomaly detection via coarse-to-fine logic. Experimental results demonstrate that CER not only reduces computational overhead by 32% but also achieves a significant performance leap on the CIFAR-100 benchmark.
(2026-02-06T02:55:35Z)
- D3R-Net: Dual-Domain Denoising Reconstruction Network for Robust Industrial Anomaly Detection [0.0]
Unsupervised anomaly detection (UAD) is a key ingredient of automated visual inspection in modern manufacturing. We introduce D3R-Net, a Dual-Domain Denoising Reconstruction framework that couples a self-supervised 'healing' task with frequency-aware regularization. In addition to the spatial mean squared error, we employ a Fast Fourier Transform (FFT) magnitude loss that encourages consistency in the frequency domain.
(2026-01-27T23:21:59Z)
- LPCAN: Lightweight Pyramid Cross-Attention Network for Rail Surface Defect Detection Using RGB-D Data [0.0]
This paper addresses the limitations of current vision-based rail defect detection methods. We propose a Lightweight Pyramid Cross-Attention Network (LPCANet) that leverages RGB-D data for efficient and accurate defect identification. LPCANet achieves state-of-the-art performance with only 9.90 million parameters, 2.50 GFLOPs, and an inference speed of 162.60 FPS.
(2026-01-14T03:35:09Z)
- Physics-Inspired Modeling and Content Adaptive Routing in an Infrared Gas Leak Detection Network [19.83756107644484]
We present a physics-edge hybrid gas dynamic routing network (PEG-DRNet) for detecting infrared gas leaks. PEG-DRNet achieves superior overall performance with the best balance of accuracy and computational efficiency.
(2025-12-29T06:28:20Z)
- MRS-YOLO Railroad Transmission Line Foreign Object Detection Based on Improved YOLO11 and Channel Pruning [2.6795746856835785]
We propose an improved algorithm MRS-YOLO based on YOLO11. The mAP50 and mAP50:95 of the MRS-YOLO algorithm are improved to 94.8% and 86.4%, respectively.
(2025-10-12T11:38:09Z)
- Unified Unsupervised Anomaly Detection via Matching Cost Filtering [113.43366521994396]
Unsupervised anomaly detection (UAD) aims to identify image- and pixel-level anomalies using only normal training data. We present Unified Cost Filtering (UCF), a generic post-hoc framework for refining the anomaly cost volume of any UAD model.
(2025-10-03T03:28:18Z)
- CLUE: Non-parametric Verification from Experience via Hidden-State Clustering [64.50919789875233]
We show that the correctness of a solution is encoded as a geometrically separable signature within the trajectory of hidden activations. CLUE consistently outperforms LLM-as-a-judge baselines and matches or exceeds modern confidence-based methods in reranking candidates.
(2025-10-02T02:14:33Z)
- SPFFNet: Strip Perception and Feature Fusion Spatial Pyramid Pooling for Fabric Defect Detection [0.0]
We propose an improved fabric defect detection model based on YOLOv11. We introduce a Strip Perception Module (SPM) that improves feature capture through multi-scale convolution. We also propose a novel focal enhanced complete intersection over union (FECIoU) metric with adaptive weights.
(2025-02-03T15:33:11Z)
- DiAD: A Diffusion-based Framework for Multi-class Anomaly Detection [55.48770333927732]
We propose a Diffusion-based Anomaly Detection (DiAD) framework for multi-class anomaly detection.
It consists of a pixel-space autoencoder, a latent-space Semantic-Guided (SG) network connected to the Stable Diffusion denoising network, and a feature-space pre-trained feature extractor.
Experiments on MVTec-AD and VisA datasets demonstrate the effectiveness of our approach.
(2023-12-11T18:38:28Z)
- Global Context Aggregation Network for Lightweight Saliency Detection of Surface Defects [70.48554424894728]
We develop a Global Context Aggregation Network (GCANet) for lightweight saliency detection of surface defects, built on an encoder-decoder structure.
First, we introduce a novel transformer encoder on the top layer of the lightweight backbone, which captures global context information through a novel Depth-wise Self-Attention (DSA) module.
The experimental results on three public defect datasets demonstrate that the proposed network achieves a better trade-off between accuracy and running efficiency than 17 other state-of-the-art methods.
(2023-09-22T06:19:11Z)
- G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-guided Feature Imitation [49.421099172544196]
We propose a novel semantic-guided feature imitation technique, which automatically performs soft matching between feature pairs across all pyramid levels.
We also introduce contrastive distillation to effectively capture the information encoded in the relationship between different feature regions.
Our method consistently outperforms the existing detection KD techniques and works whether the components in the framework are used separately or in conjunction.
(2021-08-17T07:44:27Z)
- Progressively Guided Alternate Refinement Network for RGB-D Salient Object Detection [63.18846475183332]
We aim to develop an efficient and compact deep network for RGB-D salient object detection.
Starting from a coarse initial prediction, we propose a progressively guided alternate refinement network to refine it.
Our model outperforms existing state-of-the-art approaches by a large margin.
(2020-08-17T02:55:06Z)
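The D3R-Net entry above pairs a spatial mean squared error with an FFT magnitude loss. A minimal pure-Python sketch of such a dual-domain objective follows. The plain MSE on magnitude spectra, the weight `lam`, and all function names are assumptions for illustration, not the paper's definition; a naive DFT stands in for an FFT (same result, only slower).

```python
import cmath
import math
from typing import List

Image = List[List[float]]


def dft2_magnitude(img: Image) -> Image:
    """Magnitude of the 2-D DFT, computed with the naive O((H*W)^2) double sum."""
    h, w = len(img), len(img[0])
    mags = []
    for u in range(h):
        row = []
        for v in range(w):
            s = 0j
            for y in range(h):
                for x in range(w):
                    s += img[y][x] * cmath.exp(-2j * math.pi * (u * y / h + v * x / w))
            row.append(abs(s))
        mags.append(row)
    return mags


def mse(a: Image, b: Image) -> float:
    """Mean squared error between two equally sized 2-D arrays."""
    diffs = [(p - q) ** 2 for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def dual_domain_loss(recon: Image, target: Image, lam: float = 0.1) -> float:
    """Spatial MSE plus lam times the MSE between DFT magnitude spectra (lam is hypothetical)."""
    return mse(recon, target) + lam * mse(dft2_magnitude(recon), dft2_magnitude(target))
```

The two terms are complementary. A circular shift leaves the magnitude spectrum unchanged, so for `target = [[1.0, 0.0], [0.0, 0.0]]` and its shifted copy `[[0.0, 1.0], [0.0, 0.0]]` only the spatial term contributes (loss ≈ 0.5). Conversely, the flat reconstruction `[[0.25, 0.25], [0.25, 0.25]]` loses the impulse's high-frequency content, so the spectral term adds 0.075 on top of the spatial 0.1875.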
This list is automatically generated from the titles and abstracts of the papers on this site.