CrackSegDiff: Diffusion Probability Model-based Multi-modal Crack Segmentation
- URL: http://arxiv.org/abs/2410.08100v2
- Date: Sat, 12 Oct 2024 05:08:59 GMT
- Title: CrackSegDiff: Diffusion Probability Model-based Multi-modal Crack Segmentation
- Authors: Xiaoyan Jiang, Licheng Jiang, Anjie Wang, Kaiying Zhu, Yongbin Gao,
- Abstract summary: We propose a novel DPM-based approach for crack segmentation, named CrackSegDiff.
Our approach employs Vm-unet to efficiently capture long-range dependencies in the original data.
CrackSegDiff outperforms state-of-the-art methods, particularly in the detection of shallow cracks.
- Score: 5.534972596061796
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Integrating grayscale and depth data in road inspection robots could enhance the accuracy, reliability, and comprehensiveness of road condition assessments, leading to improved maintenance strategies and safer infrastructure. However, these data sources are often compromised by significant background noise from the pavement. Recent advancements in Diffusion Probabilistic Models (DPM) have demonstrated remarkable success in image segmentation tasks, showcasing potent denoising capabilities, as evidenced in studies like SegDiff. Despite these advancements, current DPM-based segmentors do not fully capitalize on the potential of original image data. In this paper, we propose a novel DPM-based approach for crack segmentation, named CrackSegDiff, which uniquely fuses grayscale and range/depth images. This method enhances the reverse diffusion process by intensifying the interaction between local feature extraction via DPM and global feature extraction. Unlike traditional methods that utilize Transformers for global features, our approach employs Vm-unet to efficiently capture long-range information of the original data. The integration of features is further refined through two innovative modules: the Channel Fusion Module (CFM) and the Shallow Feature Compensation Module (SFCM). Our experimental evaluation on the three-class crack image segmentation tasks within the FIND dataset demonstrates that CrackSegDiff outperforms state-of-the-art methods, particularly excelling in the detection of shallow cracks. Code is available at https://github.com/sky-visionX/CrackSegDiff.
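The pipeline the abstract describes — fusing grayscale and depth channels into a conditioning signal, then iteratively denoising the segmentation mask through the reverse diffusion process — can be sketched in NumPy. This is a minimal illustrative sketch only: the `fuse_channels` reweighting is a toy stand-in for the paper's CFM, and the linear "noise prediction" stands in for the learned denoising network; none of these names or formulas come from the authors' code.

```python
import numpy as np

def fuse_channels(gray, depth):
    """Toy channel fusion: stack the two modalities and reweight them
    by a softmax over per-channel mean activation (a crude stand-in
    for the paper's Channel Fusion Module)."""
    x = np.stack([gray, depth], axis=0)            # (C=2, H, W)
    w = x.mean(axis=(1, 2), keepdims=True)         # per-channel statistic
    w = np.exp(w) / np.exp(w).sum()                # softmax weights, sum to 1
    return (x * w).sum(axis=0)                     # fused (H, W) condition map

def reverse_diffusion_step(x_t, cond, t, beta=0.02, rng=None):
    """One DDPM-style reverse step: form a (toy) noise estimate from the
    current mask state and the fused condition, then take the posterior
    mean, adding noise except at the final step t == 1."""
    rng = rng or np.random.default_rng(0)
    alpha = 1.0 - beta
    eps_hat = x_t - cond                           # placeholder noise prediction
    mean = (x_t - beta / np.sqrt(1.0 - alpha**t) * eps_hat) / np.sqrt(alpha)
    noise = rng.standard_normal(x_t.shape) if t > 1 else 0.0
    return mean + np.sqrt(beta) * noise

# Tiny end-to-end run on random 8x8 "grayscale" and "depth" inputs.
gray  = np.random.default_rng(1).random((8, 8))
depth = np.random.default_rng(2).random((8, 8))
cond = fuse_channels(gray, depth)
x = np.random.default_rng(3).standard_normal((8, 8))  # start from pure noise
for t in range(10, 0, -1):                            # iterate t = T .. 1
    x = reverse_diffusion_step(x, cond, t)
print(x.shape)
```

In the actual method, `eps_hat` would come from the DPM's U-Net conditioned on Vm-unet's global features, with the CFM and SFCM mediating how the image features enter each reverse step.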
Related papers
- CrossDiff: Diffusion Probabilistic Model With Cross-conditional Encoder-Decoder for Crack Segmentation [5.69969816883978]
We propose a novel diffusion-based model with a cross-conditional encoder-decoder, named CrossDiff.
The proposed CrossDiff model achieves impressive performance, outperforming other state-of-the-art methods by 8.0% in terms of both Dice score and IoU.
arXiv Detail & Related papers (2025-01-22T13:13:41Z) - Aligning First, Then Fusing: A Novel Weakly Supervised Multimodal Violence Detection Method [11.01048485795428]
We propose a new weakly supervised violence detection framework.
It consists of unimodal multiple-instance learning for extracting unimodal semantic features, multimodal alignment, multimodal fusion, and final detection.
Experimental results on benchmark datasets demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2025-01-13T17:14:25Z) - Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable [70.77600345240867]
A novel arbitrary-in-arbitrary-out (AIAO) strategy makes watermarks resilient to fine-tuning-based removal.
Unlike the existing methods of designing a backdoor for the input/output space of diffusion models, in our method, we propose to embed the backdoor into the feature space of sampled subpaths.
Our empirical studies on the MS-COCO, AFHQ, LSUN, CUB-200, and DreamBooth datasets confirm the robustness of AIAO.
arXiv Detail & Related papers (2024-05-01T12:03:39Z) - Deep Common Feature Mining for Efficient Video Semantic Segmentation [25.851900402539467]
We present Deep Common Feature Mining (DCFM) for video semantic segmentation.
DCFM explicitly decomposes features into two complementary components.
We incorporate a self-supervised loss function to reinforce intra-class feature similarity and enhance temporal consistency.
arXiv Detail & Related papers (2024-03-05T06:17:59Z) - DiAD: A Diffusion-based Framework for Multi-class Anomaly Detection [55.48770333927732]
We propose a Diffusion-based Anomaly Detection (DiAD) framework for multi-class anomaly detection.
It consists of a pixel-space autoencoder, a latent-space Semantic-Guided (SG) network with a connection to the stable diffusion's denoising network, and a feature-space pre-trained feature extractor.
Experiments on MVTec-AD and VisA datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-12-11T18:38:28Z) - Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement Learning [53.00683059396803]
Masked image modeling (MIM) has been widely used due to its simplicity and effectiveness in recovering original information from masked images.
We propose a decision-based MIM that utilizes reinforcement learning (RL) to automatically search for optimal image masking ratio and masking strategy.
Our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation.
arXiv Detail & Related papers (2023-10-06T10:40:46Z) - Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection(VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z) - Towards Generalizable Deepfake Detection by Primary Region Regularization [52.41801719896089]
This paper enhances the generalization capability from a novel regularization perspective.
Our method consists of two stages, namely the static localization for primary region maps, and the dynamic exploitation of primary region masks.
We conduct extensive experiments over three widely used deepfake datasets - DFDC, DF-1.0, and Celeb-DF with five backbones.
arXiv Detail & Related papers (2023-07-24T05:43:34Z) - Dual flow fusion model for concrete surface crack segmentation [0.0]
Cracks and other damages pose a significant threat to the safe operation of transportation infrastructure.
Deep learning models have been widely applied to practical visual segmentation tasks.
This paper proposes a crack segmentation model based on the fusion of dual streams.
arXiv Detail & Related papers (2023-05-09T02:35:58Z) - M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection [74.19291916812921]
Forged images generated by Deepfake techniques pose a serious threat to the trustworthiness of digital information.
In this paper, we aim to capture the subtle manipulation artifacts at different scales for Deepfake detection.
We introduce a high-quality Deepfake dataset, SR-DF, which consists of 4,000 DeepFake videos generated by state-of-the-art face swapping and facial reenactment methods.
arXiv Detail & Related papers (2021-04-20T05:43:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.