CrossDiff: Diffusion Probabilistic Model With Cross-conditional Encoder-Decoder for Crack Segmentation
- URL: http://arxiv.org/abs/2501.12860v1
- Date: Wed, 22 Jan 2025 13:13:41 GMT
- Title: CrossDiff: Diffusion Probabilistic Model With Cross-conditional Encoder-Decoder for Crack Segmentation
- Authors: Xianglong Shi, Yunhan Jiang, Xiaoheng Jiang, Mingling Xu, Yang Liu,
- Abstract summary: We propose a novel diffusion-based model with a cross-conditional encoder-decoder, named CrossDiff.
The proposed CrossDiff model achieves impressive performance, outperforming other state-of-the-art methods by 8.0% in terms of both Dice score and IoU.
- Score: 5.69969816883978
- License:
- Abstract: Crack segmentation in industrial concrete surfaces is a challenging task because cracks usually exhibit intricate morphology with slender appearances. Traditional segmentation methods often struggle to accurately locate such cracks, leading to inefficiencies in maintenance and repair processes. In this paper, we propose a novel diffusion-based model with a cross-conditional encoder-decoder, named CrossDiff, which is the first to introduce the diffusion probabilistic model for the crack segmentation task. Specifically, CrossDiff integrates a cross-encoder and a cross-decoder into the diffusion model to constitute a cross-shaped diffusion model structure. The cross-encoder enhances the ability to retain crack details, and the cross-decoder helps extract the semantic features of cracks. As a result, CrossDiff can better handle slender cracks. Extensive experiments were conducted on five challenging crack datasets: CFD, CrackTree200, DeepCrack, GAPs384, and Rissbilder. The results demonstrate that the proposed CrossDiff model achieves impressive performance, outperforming other state-of-the-art methods by 8.0% in terms of both Dice score and IoU. The code will be open-sourced soon.
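The abstract does not give CrossDiff's noise schedule, parameterization, or how the cross-encoder and cross-decoder consume the image. The sketch below shows only the generic DDPM-style training step that diffusion segmentation models commonly build on: a binary crack mask is mapped to {-1, 1}, noised in closed form with x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, and a network conditioned on the image predicts ε. The linear beta schedule, ε-prediction objective, and the names `linear_alpha_bar`, `q_sample`, `training_loss`, and `denoiser` are all assumptions for illustration, not the authors' method.

```python
import numpy as np


def linear_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)


def q_sample(mask01, t, alpha_bar, rng):
    """Closed-form forward noising: x_t = sqrt(ab_t) * x_0 + sqrt(1 - ab_t) * eps."""
    x0 = 2.0 * mask01.astype(np.float64) - 1.0  # map {0, 1} mask to {-1, 1}
    eps = rng.standard_normal(x0.shape)
    ab = alpha_bar[t]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps, eps


def training_loss(image, mask01, denoiser, alpha_bar, rng):
    """One epsilon-prediction step; `denoiser(x_t, image, t)` is a hypothetical network."""
    t = int(rng.integers(len(alpha_bar)))        # uniform random timestep
    x_t, eps = q_sample(mask01, t, alpha_bar, rng)
    eps_hat = denoiser(x_t, image, t)            # the image enters only as a condition
    return float(np.mean((eps_hat - eps) ** 2))


# Toy usage: a 3-pixel-wide vertical "crack" and a placeholder network that predicts zeros.
rng = np.random.default_rng(0)
ab = linear_alpha_bar()
mask = np.zeros((64, 64), dtype=np.uint8)
mask[:, 30:33] = 1
image = rng.random((3, 64, 64))
zero_net = lambda x_t, img, t: np.zeros_like(x_t)
loss = training_loss(image, mask, zero_net, ab, rng)
```

With the zero-predicting placeholder the loss is just the mean of ε², so it sits near 1; a trained conditional network would drive it lower. At inference, models of this family start from pure noise and denoise step by step into a mask, again conditioned on the image.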
Related papers
- I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models [50.34197948438868]
ThinkDiff is an alignment paradigm that empowers text-to-image diffusion models with multimodal in-context understanding and reasoning capabilities.
We show that ThinkDiff significantly improves accuracy from 19.2% to 46.3% on the challenging CoBSAT benchmark for multimodal in-context reasoning generation.
We also demonstrate exceptional performance in composing multiple images and texts into logically coherent images.
arXiv Detail & Related papers (2025-02-12T05:30:08Z)
- CrackSegDiff: Diffusion Probability Model-based Multi-modal Crack Segmentation [5.534972596061796]
We propose a novel DPM-based approach for crack segmentation, named CrackSegDiff.
Our approach employs VM-UNet to efficiently capture long-range information of the original data.
CrackSegDiff outperforms state-of-the-art methods, particularly in the detection of shallow cracks.
arXiv Detail & Related papers (2024-10-10T16:44:10Z)
- Hybrid-Segmentor: A Hybrid Approach to Automated Fine-Grained Crack Segmentation in Civil Infrastructure [52.2025114590481]
We introduce Hybrid-Segmentor, an encoder-decoder based approach that is capable of extracting both fine-grained local and global crack features.
This allows the model to improve its generalization capabilities in distinguishing various types of shapes, surfaces and sizes of cracks.
The proposed model outperforms existing benchmark models across 5 quantitative metrics (accuracy 0.971, precision 0.804, recall 0.744, F1-score 0.770, and IoU score 0.630), achieving state-of-the-art status.
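Both this entry (precision, recall, F1, IoU) and CrossDiff above (Dice, IoU) report overlap metrics that, for binary masks, derive from pixel-level true-positive, false-positive and false-negative counts. On the same counts Dice equals F1, and IoU = Dice / (2 − Dice). The sketch below computes them from pooled counts; the function name `binary_seg_metrics` is hypothetical. Neither summary says whether metrics are pooled over a dataset or averaged per image, which is consistent with the harmonic mean of the reported precision and recall (2·0.804·0.744 / 1.548 ≈ 0.773) differing slightly from the reported F1 of 0.770.

```python
import numpy as np


def binary_seg_metrics(pred, target, eps=1e-12):
    """Pixel-level metrics for binary masks (1 = crack, 0 = background), pooled counts."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = np.sum(pred & target)
    fp = np.sum(pred & ~target)
    fn = np.sum(~pred & target)
    tn = np.sum(~pred & ~target)
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    dice = 2 * tp / (2 * tp + fp + fn + eps)   # identical to F1 on the same counts
    iou = tp / (tp + fp + fn + eps)            # equals dice / (2 - dice)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall, "f1": dice,
            "dice": dice, "iou": iou, "accuracy": accuracy}


# Toy usage: one true positive, one false positive, one false negative, one true negative.
m = binary_seg_metrics(np.array([[1, 1], [0, 0]]), np.array([[1, 0], [1, 0]]))
```

Because crack pixels are a small fraction of most images, accuracy is dominated by true negatives and tends to look high, which is why crack papers emphasize Dice/F1 and IoU.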
arXiv Detail & Related papers (2024-09-04T16:47:16Z)
- DiffCut: Catalyzing Zero-Shot Semantic Segmentation with Diffusion Features and Recursive Normalized Cut [62.63481844384229]
Foundation models have emerged as powerful tools across various domains including language, vision, and multimodal tasks.
In this paper, we use a diffusion UNet encoder as a foundation vision encoder and introduce DiffCut, an unsupervised zero-shot segmentation method.
Our work highlights the remarkably accurate semantic knowledge embedded within diffusion UNet encoders that could then serve as foundation vision encoders for downstream tasks.
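The DiffCut title names a recursive normalized cut over diffusion features. A minimal sketch of the classic technique (the spectral relaxation of Shi and Malik) follows: build an affinity matrix, split the graph on the sign of the second-smallest generalized eigenvector of (D − W)y = λDy, and keep splitting while the Ncut value stays below a threshold. How DiffCut derives its affinities from UNet encoder features and what stopping rule it uses are not described here; the cosine-similarity affinity, `temperature`, `max_ncut`, `min_size`, and all function names are assumptions for illustration.

```python
import numpy as np


def cosine_affinity(features, temperature=0.1):
    """Hypothetical affinity: exp((cos_sim - 1) / temperature), all entries in (0, 1]."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    return np.exp((f @ f.T - 1.0) / temperature)


def ncut_bipartition(W):
    """Spectral relaxation of the two-way normalized cut (Shi & Malik).

    Solves (D - W) y = lambda * D * y via the symmetric normalized Laplacian and
    splits on the sign of the second-smallest generalized eigenvector
    (a common simplification of searching over split thresholds).
    """
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)          # eigenvalues in ascending order
    y = d_inv_sqrt * vecs[:, 1]              # z -> y = D^{-1/2} z
    return y > 0


def ncut_value(W, mask):
    """Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V)."""
    cut = W[mask][:, ~mask].sum()
    return cut / W[mask].sum() + cut / W[~mask].sum()


def recursive_ncut(W, idx=None, max_ncut=0.5, min_size=2):
    """Split recursively while the cut is cheap; returns one index array per segment."""
    if idx is None:
        idx = np.arange(len(W))
    if len(idx) < 2 * min_size:
        return [idx]
    sub = W[np.ix_(idx, idx)]
    mask = ncut_bipartition(sub)
    too_small = min(mask.sum(), (~mask).sum()) < min_size
    if too_small or ncut_value(sub, mask) > max_ncut:
        return [idx]
    return (recursive_ncut(W, idx[mask], max_ncut, min_size)
            + recursive_ncut(W, idx[~mask], max_ncut, min_size))


# Toy usage: two groups of 2-D "features" pointing in different directions.
rng = np.random.default_rng(0)
feats = np.vstack([[1.0, 0.0] + 0.05 * rng.standard_normal((5, 2)),
                   [0.0, 1.0] + 0.05 * rng.standard_normal((5, 2))])
segments = recursive_ncut(cosine_affinity(feats))
```

The first split separates the two groups at almost no cost; splitting either group further would cut strongly connected nodes (Ncut near 1), so the recursion stops with two segments.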
arXiv Detail & Related papers (2024-06-05T01:32:31Z)
- FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models [56.71672127740099]
We focus on the task of image segmentation, which is traditionally solved by training models on closed-vocabulary datasets.
We leverage different and relatively small-sized, open-source foundation models for zero-shot open-vocabulary segmentation.
Our approach (dubbed FreeSeg-Diff), which does not rely on any training, outperforms many training-based approaches on both Pascal VOC and COCO datasets.
arXiv Detail & Related papers (2024-03-29T10:38:25Z)
- CrackCLF: Automatic Pavement Crack Detection based on Closed-Loop Feedback [14.986335013488643]
CrackCLF is a neural network model that learns to correct errors on its own.
The proposed CLF can be defined as a plug-and-play module, which can be embedded into different neural network models to improve their performance.
arXiv Detail & Related papers (2023-11-20T14:52:48Z)
- Real-time High-Resolution Neural Network with Semantic Guidance for Crack Segmentation [4.651261550392625]
This paper describes HrSegNet, a high-resolution network with semantic guidance specifically designed for crack segmentation.
HrSegNet guarantees real-time inference speed while preserving crack details.
This approach demonstrates that there is a trade-off between high-resolution modeling and real-time detection.
arXiv Detail & Related papers (2023-07-01T08:38:18Z)
- A Convolutional-Transformer Network for Crack Segmentation with Boundary Awareness [5.98717173705421]
Cracks play a crucial role in assessing the safety and durability of manufactured buildings.
We propose a novel convolutional-transformer network based on encoder-decoder architecture to solve this challenge.
arXiv Detail & Related papers (2023-02-23T01:27:57Z)
- Revisiting Code Search in a Two-Stage Paradigm [67.02322603435628]
TOSS is a two-stage fusion code search framework.
It first uses IR-based and bi-encoder models to efficiently recall a small number of top-k code candidates.
It then uses fine-grained cross-encoders for finer ranking.
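The two-stage idea above is the general retrieve-then-rerank pattern: a cheap scorer over precomputed embeddings recalls a small top-k set, and an expensive pairwise scorer (such as a cross-encoder) reorders only those k candidates. The sketch below uses a single dot-product first stage, whereas TOSS fuses IR-based and bi-encoder recall; `two_stage_search` and `toy_cross_encoder` are hypothetical stand-ins, not TOSS's code or models.

```python
from typing import Callable, List

import numpy as np


def two_stage_search(query_vec: np.ndarray, corpus_vecs: np.ndarray,
                     rerank_score: Callable[[int], float],
                     k: int = 10, top_n: int = 3) -> List[int]:
    """Retrieve-then-rerank.

    Stage 1: dot-product recall of k candidates from precomputed embeddings (cheap).
    Stage 2: rerank_score(i) scores the (query, candidate i) pair jointly (expensive),
    so it is called only k times instead of once per corpus entry.
    """
    sims = corpus_vecs @ query_vec
    k = min(k, len(sims))
    candidates = np.argpartition(-sims, k - 1)[:k]   # k highest similarities, unordered
    reranked = sorted(candidates.tolist(), key=rerank_score, reverse=True)
    return reranked[:top_n]


# Toy usage with a hypothetical stand-in for a cross-encoder.
corpus = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.8, 0.2], [-1.0, 0.0]])
query = np.array([1.0, 0.0])
scored = []


def toy_cross_encoder(i: int) -> float:
    scored.append(i)                                  # record which candidates were scored
    return {0: 0.1, 1: 0.7, 3: 0.9}.get(i, 0.0)


result = two_stage_search(query, corpus, toy_cross_encoder, k=3, top_n=2)
```

Here stage 1 keeps entries 0, 1 and 3, and the pairwise scorer reorders them so that entry 3 ranks first; entries 2 and 4 are never passed to the expensive scorer.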
arXiv Detail & Related papers (2022-08-24T02:34:27Z)
- Crosslink-Net: Double-branch Encoder Segmentation Network via Fusing Vertical and Horizontal Convolutions [58.71117402626524]
We present a novel double-branch encoder architecture for medical image segmentation.
Our architecture is inspired by two observations: 1) Since the discrimination of features learned via square convolutional kernels needs to be further improved, we propose to utilize non-square vertical and horizontal convolutional kernels.
The experiments validate the effectiveness of our model on four datasets.
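The summary says Crosslink-Net uses non-square vertical and horizontal kernels in its two encoder branches, but it gives no kernel sizes, weights, or fusion rule. The sketch below only illustrates why the shape matters: a tall 3×1 filter and a wide 1×3 filter respond differently to a one-pixel-wide vertical line. The averaging weights, kernel sizes, the stacking "fusion", and the helper `correlate2d_same` are assumptions for illustration (learned kernels in a real network would have trained weights).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def correlate2d_same(x, kernel):
    """2-D cross-correlation with zero padding; the output has the same shape as x."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, kh - 1 - ph), (pw, kw - 1 - pw)))
    windows = sliding_window_view(xp, (kh, kw))      # shape (H, W, kh, kw)
    return np.einsum("ijkl,kl->ij", windows, kernel)


# Non-square kernels: a tall 3x1 and a wide 1x3 averaging filter (illustrative weights).
vertical = np.full((3, 1), 1.0 / 3)
horizontal = np.full((1, 3), 1.0 / 3)

# Toy image: a one-pixel-wide vertical line.
img = np.zeros((5, 5))
img[:, 2] = 1.0

v_branch = correlate2d_same(img, vertical)
h_branch = correlate2d_same(img, horizontal)
fused = np.stack([v_branch, h_branch])   # hypothetical fusion: stack as two channels
```

Inside the line, the vertical branch keeps the full response (1.0) while the horizontal branch spreads it to 1/3 over three columns, so each branch is more sensitive to structures aligned with its long axis.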
arXiv Detail & Related papers (2021-07-24T02:58:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.