Related papers: WP-CrackNet: A Collaborative Adversarial Learning Framework for End-to-End Weakly-Supervised Road Crack Detection

WP-CrackNet: A Collaborative Adversarial Learning Framework for End-to-End Weakly-Supervised Road Crack Detection

URL: http://arxiv.org/abs/2510.17566v1
Date: Mon, 20 Oct 2025 14:13:26 GMT
Title: WP-CrackNet: A Collaborative Adversarial Learning Framework for End-to-End Weakly-Supervised Road Crack Detection
Authors: Nachuan Ma, Zhengfei Song, Qiang Hu, Xiaoyu Tang, Chengxi Zhang, Rui Fan, Lihua Xie,
Abstract summary: WP-CrackNet is an end-to-end weakly-supervised method for pixel-wise crack detection.<n>We create three image-level datasets and extensive experiments show that WP-CrackNet achieves comparable results to supervised methods.
Score: 27.876703832929778
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Road crack detection is essential for intelligent infrastructure maintenance in smart cities. To reduce reliance on costly pixel-level annotations, we propose WP-CrackNet, an end-to-end weakly-supervised method that trains with only image-level labels for pixel-wise crack detection. WP-CrackNet integrates three components: a classifier generating class activation maps (CAMs), a reconstructor measuring feature inferability, and a detector producing pixel-wise road crack detection results. During training, the classifier and reconstructor alternate in adversarial learning to encourage crack CAMs to cover complete crack regions, while the detector learns from pseudo labels derived from post-processed crack CAMs. This mutual feedback among the three components improves learning stability and detection accuracy. To further boost detection performance, we design a path-aware attention module (PAAM) that fuses high-level semantics from the classifier with low-level structural cues from the reconstructor by modeling spatial and channel-wise dependencies. Additionally, a center-enhanced CAM consistency module (CECCM) is proposed to refine crack CAMs using center Gaussian weighting and consistency constraints, enabling better pseudo-label generation. We create three image-level datasets and extensive experiments show that WP-CrackNet achieves comparable results to supervised methods and outperforms existing weakly-supervised methods, significantly advancing scalable road inspection. The source code package and datasets are available at https://mias.group/WP-CrackNet/.

Related papers

Source-Free Object Detection with Detection Transformer [59.33653163035064]
Source-Free Object Detection (SFOD) enables knowledge transfer from a source domain to an unsupervised target domain for object detection without access to source data.<n>Most existing SFOD approaches are either confined to conventional object detection (OD) models like Faster R-CNN or designed as general solutions without tailored adaptations for novel OD architectures, especially Detection Transformer (DETR)<n>In this paper, we introduce Feature Reweighting ANd Contrastive Learning NetworK (FRANCK), a novel SFOD framework specifically designed to perform query-centric feature enhancement for DETRs.
arXiv Detail & Related papers (2025-10-13T07:35:04Z)
Hybrid-Segmentor: A Hybrid Approach to Automated Fine-Grained Crack Segmentation in Civil Infrastructure [52.2025114590481]
We introduce Hybrid-Segmentor, an encoder-decoder based approach that is capable of extracting both fine-grained local and global crack features. This allows the model to improve its generalization capabilities in distinguish various type of shapes, surfaces and sizes of cracks. The proposed model outperforms existing benchmark models across 5 quantitative metrics (accuracy 0.971, precision 0.804, recall 0.744, F1-score 0.770, and IoU score 0.630), achieving state-of-the-art status.
arXiv Detail & Related papers (2024-09-04T16:47:16Z)
UP-CrackNet: Unsupervised Pixel-Wise Road Crack Detection via Adversarial Image Restoration [23.71017765426465]
We propose an unsupervised pixel-wise road crack detection network, known as UP-CrackNet. Our approach first generates multi-scale square masks and randomly selects them to corrupt undamaged road images by removing certain regions. A generative adversarial network is trained to restore the corrupted regions by leveraging the semantic context learned from surrounding uncorrupted regions.
arXiv Detail & Related papers (2024-01-28T12:51:01Z)
Shape Anchor Guided Holistic Indoor Scene Understanding [9.463220988312218]
We propose a shape anchor guided learning strategy (AncLearn) for robust holistic indoor scene understanding. AncLearn generates anchors that dynamically fit instance surfaces to (i) unmix noise and target-related features for offering reliable proposals at the detection stage. We embed AncLearn into a reconstruction-from-detection learning system (AncRec) to generate high-quality semantic scene models.
arXiv Detail & Related papers (2023-09-20T08:30:20Z)
Enhancing Infrared Small Target Detection Robustness with Bi-Level Adversarial Framework [61.34862133870934]
We propose a bi-level adversarial framework to promote the robustness of detection in the presence of distinct corruptions. Our scheme remarkably improves 21.96% IOU across a wide array of corruptions and notably promotes 4.97% IOU on the general benchmark.
arXiv Detail & Related papers (2023-09-03T06:35:07Z)
A Tri-Layer Plugin to Improve Occluded Detection [100.99802831241583]
We propose a simple '' module for the detection head of two-stage object detectors to improve the recall of partially occluded objects. The module predicts a tri-layer of segmentation masks for the target object, the occluder and the occludee, and by doing so is able to better predict the mask of the target object. We also establish a COCO evaluation dataset to measure the recall performance of partially occluded and separated objects.
arXiv Detail & Related papers (2022-10-18T17:59:51Z)
Evaluation and Comparison of Deep Learning Methods for Pavement Crack Identification with Visual Images [0.0]
pavement crack identification with visual images via deep learning algorithms has the advantages of not being limited by the material of object to be detected. In the aspect of patch sample classification, the fine-tuned TL models can be equivalent to or even slightly better than the ED models in accuracy. In the aspect of accurate crack location, both ED and GAN algorithms can achieve pixel-level segmentation and is expected to be detected in real time on low computing power platform.
arXiv Detail & Related papers (2021-12-20T08:23:43Z)
PointCrack3D: Crack Detection in Unstructured Environments using a 3D-Point-Cloud-Based Deep Neural Network [20.330700719146215]
This paper presents PointCrack3D, a new 3D-point-cloud-based crack detection algorithm for unstructured surfaces. The method was validated experimentally on a new large natural rock dataset. Results demonstrate a crack detection rate of 97% overall and 100% for cracks with a maximum width of more than 3 cm.
arXiv Detail & Related papers (2021-11-23T02:33:18Z)
Dense Label Encoding for Boundary Discontinuity Free Rotation Detection [69.75559390700887]
This paper explores a relatively less-studied methodology based on classification. We propose new techniques to push its frontier in two aspects. Experiments and visual analysis on large-scale public datasets for aerial images show the effectiveness of our approach.
arXiv Detail & Related papers (2020-11-19T05:42:02Z)
Automatic Crack Detection on Road Pavements Using Encoder Decoder Architecture [9.34360241512198]
The proposed algorithm considers an encoder-decoder architecture with hierarchical feature learning and dilated convolution, named U-Hierarchical Dilated Network (U-HDN) Crack characteristics with multiple context information are automatically able to learn and perform end-to-end crack detection.
arXiv Detail & Related papers (2020-07-01T13:32:23Z)
Towards High Performance Human Keypoint Detection [87.1034745775229]
We find that context information plays an important role in reasoning human body configuration and invisible keypoints. Inspired by this, we propose a cascaded context mixer ( CCM) which efficiently integrates spatial and channel context information. To maximize CCM's representation capability, we develop a hard-negative person detection mining strategy and a joint-training strategy. We present several sub-pixel refinement techniques for postprocessing keypoint predictions to improve detection accuracy.
arXiv Detail & Related papers (2020-02-03T02:24:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.