Self-Supervised Multi-Scale Transformer with Attention-Guided Fusion for Efficient Crack Detection
- URL: http://arxiv.org/abs/2510.10378v1
- Date: Sun, 12 Oct 2025 00:25:33 GMT
- Title: Self-Supervised Multi-Scale Transformer with Attention-Guided Fusion for Efficient Crack Detection
- Authors: Blessing Agyei Kyem, Joshua Kofi Asamoah, Eugene Denteh, Andrews Danyo, Armstrong Aboah,
- Abstract summary: This paper examines the feasibility of achieving effective pixel-level crack segmentation entirely without manual annotations.<n>A fully self-supervised framework, Crack-Segmenter, is developed, integrating three complementary modules.<n>Crack-Segmenter consistently outperforms 13 state-of-the-art supervised methods across all major metrics.
- Score: 6.36678579841487
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pavement crack detection has long depended on costly and time-intensive pixel-level annotations, which limit its scalability for large-scale infrastructure monitoring. To overcome this barrier, this paper examines the feasibility of achieving effective pixel-level crack segmentation entirely without manual annotations. Building on this objective, a fully self-supervised framework, Crack-Segmenter, is developed, integrating three complementary modules: the Scale-Adaptive Embedder (SAE) for robust multi-scale feature extraction, the Directional Attention Transformer (DAT) for maintaining linear crack continuity, and the Attention-Guided Fusion (AGF) module for adaptive feature integration. Through evaluations on ten public datasets, Crack-Segmenter consistently outperforms 13 state-of-the-art supervised methods across all major metrics, including mean Intersection over Union (mIoU), Dice score, XOR, and Hausdorff Distance (HD). These findings demonstrate that annotation-free crack detection is not only feasible but also superior, enabling transportation agencies and infrastructure managers to conduct scalable and cost-effective monitoring. This work advances self-supervised learning and motivates pavement cracks detection research.
Related papers
- Rethinking Evaluation of Infrared Small Target Detection [105.59753496831739]
This paper introduces a hybrid-level metric incorporating pixel- and target-level performance, proposing a systematic error analysis method, and emphasizing the importance of cross-dataset evaluation.<n>An open-source toolkit has be released to facilitate standardized benchmarking.
arXiv Detail & Related papers (2025-09-21T02:45:07Z) - ERIS: An Energy-Guided Feature Disentanglement Framework for Out-of-Distribution Time Series Classification [51.07970070817353]
An ideal time series classification (TSC) should be able to capture invariant representations.<n>Current methods are largely unguided, lacking the semantic direction required to isolate truly universal features.<n>We propose an end-to-end Energy-Regularized Information for Shift-Robustness framework to enable guided and reliable feature disentanglement.
arXiv Detail & Related papers (2025-08-19T12:13:41Z) - HELPNet: Hierarchical Perturbations Consistency and Entropy-guided Ensemble for Scribble Supervised Medical Image Segmentation [4.034121387622003]
We propose HELPNet, a novel scribble-based weakly supervised segmentation framework.<n>HELPNet integrates three modules to bridge the gap between annotation efficiency and segmentation performance.<n>HELPNet significantly outperforms state-of-the-art methods for scribble-based weakly supervised segmentation.
arXiv Detail & Related papers (2024-12-25T01:52:01Z) - CrackESS: A Self-Prompting Crack Segmentation System for Edge Devices [5.051837985130048]
This paper introduces CrackESS, a novel system for detecting and segmenting concrete cracks.<n>We conduct experiments on three datasets(Khanhha's dataset, Crack500, CrackCR) and validate CrackESS on a climbing robot system to demonstrate the advantage and effectiveness of our approach.
arXiv Detail & Related papers (2024-12-10T05:50:50Z) - Source-Free Domain Adaptive Object Detection with Semantics Compensation [54.00183496587841]
We introduce Weak-to-strong Semantics Compensation (WSCo) for strong data augmentation.<n>WSCo compensates for the class-relevant semantics that may be lost during strong augmentation on the fly.<n>WSCo can be implemented as a generic plug-in, easily integrable with any existing SFOD pipelines.
arXiv Detail & Related papers (2024-10-07T23:32:06Z) - Hybrid-Segmentor: A Hybrid Approach to Automated Fine-Grained Crack Segmentation in Civil Infrastructure [52.2025114590481]
We introduce Hybrid-Segmentor, an encoder-decoder based approach that is capable of extracting both fine-grained local and global crack features.
This allows the model to improve its generalization capabilities in distinguish various type of shapes, surfaces and sizes of cracks.
The proposed model outperforms existing benchmark models across 5 quantitative metrics (accuracy 0.971, precision 0.804, recall 0.744, F1-score 0.770, and IoU score 0.630), achieving state-of-the-art status.
arXiv Detail & Related papers (2024-09-04T16:47:16Z) - From Classification to Segmentation with Explainable AI: A Study on Crack Detection and Growth Monitoring [8.57765854420254]
Monitoring surface cracks in infrastructure is crucial for structural health monitoring.
Machine learning approaches have proven their effectiveness but typically require large annotated datasets for supervised training.
To mitigate this cost, one can leverage explainable artificial intelligence (XAI) to derive segmentations from the explanations of a classifier, requiring only weak image-level supervision.
arXiv Detail & Related papers (2023-09-20T12:50:52Z) - Progressive Self-Guided Loss for Salient Object Detection [102.35488902433896]
We present a progressive self-guided loss function to facilitate deep learning-based salient object detection in images.
Our framework takes advantage of adaptively aggregated multi-scale features to locate and detect salient objects effectively.
arXiv Detail & Related papers (2021-01-07T07:33:38Z) - Self-supervised Video Object Segmentation [76.83567326586162]
The objective of this paper is self-supervised representation learning, with the goal of solving semi-supervised video object segmentation (a.k.a. dense tracking)
We make the following contributions: (i) we propose to improve the existing self-supervised approach, with a simple, yet more effective memory mechanism for long-term correspondence matching; (ii) by augmenting the self-supervised approach with an online adaptation module, our method successfully alleviates tracker drifts caused by spatial-temporal discontinuity; (iv) we demonstrate state-of-the-art results among the self-supervised approaches on DAVIS-2017 and YouTube
arXiv Detail & Related papers (2020-06-22T17:55:59Z) - CRAUM-Net: Contextual Recursive Attention with Uncertainty Modeling for Salient Object Detection [0.0]
We present a novel framework that integrates multi-scale context aggregation, advanced attention mechanisms, and an uncertainty-aware module for improved SOD performance.<n>Our Adaptive Cross-Scale Context Module effectively fuses features from multiple levels, leveraging Recursive Channel Spatial Attention and Convolutional Block Attention.<n>To train our network robustly, we employ a combination of boundary-sensitive and topology-preserving loss functions, including Boundary IoU, Focal Tversky, and Topological Saliency losses.
arXiv Detail & Related papers (2020-06-04T18:33:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.