LPCAN: Lightweight Pyramid Cross-Attention Network for Rail Surface Defect Detection Using RGB-D Data
- URL: http://arxiv.org/abs/2601.09118v1
- Date: Wed, 14 Jan 2026 03:35:09 GMT
- Title: LPCAN: Lightweight Pyramid Cross-Attention Network for Rail Surface Defect Detection Using RGB-D Data
- Authors: Jackie Alex, Guoqiang Huan
- Abstract summary: This paper addresses the limitations of current vision-based rail defect detection methods. We propose a Lightweight Pyramid Cross-Attention Network (LPCANet) that leverages RGB-D data for efficient and accurate defect identification. LPCANet achieves state-of-the-art performance with only 9.90 million parameters, 2.50 G FLOPs, and 162.60 fps inference speed.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper addresses the limitations of current vision-based rail defect detection methods, including high computational complexity, excessive parameter counts, and suboptimal accuracy. We propose a Lightweight Pyramid Cross-Attention Network (LPCANet) that leverages RGB-D data for efficient and accurate defect identification. The architecture integrates MobileNetV2 as a backbone for RGB feature extraction with a lightweight pyramid module (LPM) for depth processing, coupled with a cross-attention mechanism (CAM) for multimodal fusion and a spatial feature extractor (SFE) for enhanced structural analysis. Comprehensive evaluations on three unsupervised RGB-D rail datasets (NEU-RSDDS-AUG, RSDD-TYPE1, RSDD-TYPE2) demonstrate that LPCANet achieves state-of-the-art performance with only 9.90 million parameters, 2.50 G FLOPs, and 162.60 fps inference speed. Compared to 18 existing methods, LPCANet shows significant improvements, including +1.48% in $S_\alpha$, +0.86% in IoU, and +1.77% in MAE over the best-performing baseline. Ablation studies confirm the critical roles of CAM and SFE, while experiments on non-rail datasets (DAGM2007, MT, Kolektor-SDD2) validate its generalization capability. The proposed framework effectively bridges traditional and deep learning approaches, offering substantial practical value for industrial defect inspection. Future work will focus on further model compression for real-time deployment.
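The abstract reports gains in IoU and MAE without defining them. As a rough illustration only, the sketch below computes the two metrics as they are commonly defined for saliency-style defect maps. The function names `mae` and `iou`, the 0.5 binarization threshold, and the toy arrays are assumptions made for this example; the paper's own evaluation protocol may differ.

```python
import numpy as np


def mae(pred, gt):
    """Mean absolute error between a predicted map and a ground-truth mask.

    Both inputs are expected to hold values in [0, 1]. Lower is better.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    return float(np.mean(np.abs(pred - gt)))


def iou(pred, gt, threshold=0.5):
    """Intersection over union after binarizing the prediction.

    The threshold of 0.5 is an assumption for this sketch, not a value
    taken from the paper. Higher is better.
    """
    pred_mask = np.asarray(pred) >= threshold
    gt_mask = np.asarray(gt) >= 0.5
    union = np.logical_or(pred_mask, gt_mask).sum()
    if union == 0:
        # Both masks are empty: treat the prediction as a perfect match.
        return 1.0
    intersection = np.logical_and(pred_mask, gt_mask).sum()
    return float(intersection / union)


# Toy 4x4 example: a 2x2 defect, and a prediction that spills one column over.
gt = np.zeros((4, 4))
gt[1:3, 1:3] = 1.0
pred = np.zeros((4, 4))
pred[1:3, 1:4] = 0.9

print("MAE:", mae(pred, gt))
print("IoU:", iou(pred, gt))
```

Because MAE is an error measure where lower values are better, the reported "+1.77% in MAE" is presumably an improvement in the sense of a reduced error, although the abstract does not say so explicitly.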
Related papers
- MRS-YOLO Railroad Transmission Line Foreign Object Detection Based on Improved YOLO11 and Channel Pruning [2.6795746856835785]
We propose an improved algorithm MRS-YOLO based on YOLO11. The mAP50 and mAP50:95 of the MRS-YOLO algorithm are improved to 94.8% and 86.4%, respectively.
arXiv Detail & Related papers (2025-10-12T11:38:09Z)
- A Lightweight Group Multiscale Bidirectional Interactive Network for Real-Time Steel Surface Defect Detection [15.140649886958945]
Group Multiscale Bidirectional Interactive (GMBI) modules enhance multiscale feature extraction and interaction. Experiments on SD-Saliency-900 and NRSD-MN datasets demonstrate that GMBINet delivers competitive accuracy with real-time speeds of 1048 FPS on GPU and 16.53 FPS on CPU at 512 resolution.
arXiv Detail & Related papers (2025-08-22T13:58:35Z)
- Beyond RGB and Events: Enhancing Object Detection under Adverse Lighting with Monocular Normal Maps [6.240947520777607]
We introduce NRE-Net, a novel multi-modal detection framework. It fuses three complementary modalities: monocularly predicted surface normal maps, RGB images, and event streams. NRE-Net significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2025-08-04T07:19:20Z)
- Lightweight RGB-D Salient Object Detection from a Speed-Accuracy Tradeoff Perspective [54.91271106816616]
Current RGB-D methods usually leverage large-scale backbones to improve accuracy but sacrifice efficiency. We propose a Speed-Accuracy Tradeoff Network (SATNet) for Lightweight RGB-D SOD from three fundamental perspectives. Concerning depth quality, we introduce the Depth Anything Model to generate high-quality depth maps. For modality fusion, we propose a Decoupled Attention Module (DAM) to explore the consistency within and between modalities. For feature representation, we develop a Dual Information Representation Module (DIRM) with a bi-directional inverted framework.
arXiv Detail & Related papers (2025-05-07T19:37:20Z)
- SPFFNet: Strip Perception and Feature Fusion Spatial Pyramid Pooling for Fabric Defect Detection [0.0]
We propose an improved fabric defect detection model based on YOLOv11. We introduce a Strip Perception Module (SPM) that improves feature capture through multi-scale convolution. We also propose a novel focal enhanced complete intersection over union (FECIoU) metric with adaptive weights.
arXiv Detail & Related papers (2025-02-03T15:33:11Z)
- Global Context Aggregation Network for Lightweight Saliency Detection of Surface Defects [70.48554424894728]
We develop a Global Context Aggregation Network (GCANet) for lightweight saliency detection of surface defects, built on an encoder-decoder structure.
First, we introduce a novel transformer encoder on the top layer of the lightweight backbone, which captures global context information through a novel Depth-wise Self-Attention (DSA) module.
The experimental results on three public defect datasets demonstrate that the proposed network achieves a better trade-off between accuracy and running efficiency than 17 other state-of-the-art methods.
arXiv Detail & Related papers (2023-09-22T06:19:11Z)
- Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient Object Detection [67.33924278729903]
In this work, we propose the Dual Swin-Transformer based Mutual Interactive Network (DTMINet).
We adopt Swin-Transformer as the feature extractor for both RGB and depth modality to model the long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
arXiv Detail & Related papers (2022-06-07T08:35:41Z)
- DFTR: Depth-supervised Hierarchical Feature Fusion Transformer for Salient Object Detection [44.94166578314837]
We propose a pure Transformer-based SOD framework, namely Depth-supervised hierarchical feature Fusion TRansformer (DFTR).
We extensively evaluate the proposed DFTR on ten benchmarking datasets. Experimental results show that our DFTR consistently outperforms the existing state-of-the-art methods for both RGB and RGB-D SOD tasks.
arXiv Detail & Related papers (2022-03-12T12:59:12Z)
- EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation, which serves many applications such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
- DUT-LFSaliency: Versatile Dataset and Light Field-to-RGB Saliency Detection [104.50425501764806]
We introduce a large-scale dataset to enable versatile applications for light field saliency detection.
We present an asymmetrical two-stream model consisting of the Focal stream and RGB stream.
Experiments demonstrate that our Focal stream achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-12-30T11:53:27Z)
- Hierarchical Dynamic Filtering Network for RGB-D Salient Object Detection [91.43066633305662]
The main purpose of RGB-D salient object detection (SOD) is to better integrate and utilize cross-modal fusion information.
In this paper, we explore these issues from a new perspective.
We implement a more flexible and efficient form of multi-scale cross-modal feature processing.
arXiv Detail & Related papers (2020-07-13T07:59:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.