Topology-aware Mamba for Crack Segmentation in Structures
- URL: http://arxiv.org/abs/2410.19894v1
- Date: Fri, 25 Oct 2024 15:17:52 GMT
- Title: Topology-aware Mamba for Crack Segmentation in Structures
- Authors: Xin Zuo, Yu Sheng, Jifeng Shen, Yongwei Shan,
- Abstract summary: CrackMamba, a Mamba-based model, is designed for efficient and accurate crack segmentation for monitoring the structural health of infrastructure.
CrackMamba addresses these challenges by utilizing the VMambaV2 with pre-trained ImageNet-1k weights as the encoder and a newly designed decoder for better performance.
Experiments show that CrackMamba achieves state-of-the-art (SOTA) performance on the CrackSeg9k and SewerCrack datasets, and demonstrates competitive performance on the retinal vessel segmentation dataset CHASEunderlineDB1.
- Score: 5.9184143707401775
- Abstract: CrackMamba, a Mamba-based model, is designed for efficient and accurate crack segmentation for monitoring the structural health of infrastructure. Traditional Convolutional Neural Network (CNN) models struggle with limited receptive fields, and while Vision Transformers (ViT) improve segmentation accuracy, they are computationally intensive. CrackMamba addresses these challenges by utilizing the VMambaV2 with pre-trained ImageNet-1k weights as the encoder and a newly designed decoder for better performance. To handle the random and complex nature of crack development, a Snake Scan module is proposed to reshape crack feature sequences, enhancing feature extraction. Additionally, the three-branch Snake Conv VSS (SCVSS) block is proposed to target cracks more effectively. Experiments show that CrackMamba achieves state-of-the-art (SOTA) performance on the CrackSeg9k and SewerCrack datasets, and demonstrates competitive performance on the retinal vessel segmentation dataset CHASE_DB1, highlighting its generalization capability. The code is publicly available at: https://github.com/shengyu27/CrackMamba
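The abstract does not spell out how the Snake Scan module reshapes feature sequences; a minimal sketch of the general idea, assuming a row-wise serpentine (boustrophedon) traversal so that consecutive tokens in the 1D sequence remain spatially adjacent in the 2D feature map (the actual CrackMamba implementation may differ):

```python
import numpy as np

def snake_scan(feature_map):
    """Flatten a 2D feature map into a 1D token sequence in serpentine
    order: left-to-right on even rows, right-to-left on odd rows.
    This keeps neighboring tokens spatially adjacent, which is the
    motivation usually given for snake-style scans in vision SSMs."""
    rows = [row if i % 2 == 0 else row[::-1]
            for i, row in enumerate(feature_map)]
    return np.concatenate(rows)

fm = np.arange(9).reshape(3, 3)
print(snake_scan(fm))  # → [0 1 2 5 4 3 6 7 8]
```

A plain raster scan would instead jump from the end of one row to the start of the next, breaking spatial continuity along thin, winding structures such as cracks.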
Related papers
- DSCformer: A Dual-Branch Network Integrating Enhanced Dynamic Snake Convolution and SegFormer for Crack Segmentation [6.898227391740093]
Current convolutional neural networks (CNNs) have demonstrated strong performance in crack segmentation tasks.
Transformers excel at capturing global context but lack precision in detailed feature extraction.
We introduce DSCformer, a novel hybrid model that integrates an enhanced Dynamic Snake Convolution (DSConv) with a Transformer architecture for crack segmentation.
arXiv Detail & Related papers (2024-11-14T11:25:32Z)
- Hybrid-Segmentor: A Hybrid Approach to Automated Fine-Grained Crack Segmentation in Civil Infrastructure [52.2025114590481]
We introduce Hybrid-Segmentor, an encoder-decoder based approach that is capable of extracting both fine-grained local and global crack features.
This allows the model to improve its generalization capability in distinguishing cracks of various shapes, surfaces, and sizes.
The proposed model outperforms existing benchmark models across 5 quantitative metrics (accuracy 0.971, precision 0.804, recall 0.744, F1-score 0.770, and IoU score 0.630), achieving state-of-the-art status.
arXiv Detail & Related papers (2024-09-04T16:47:16Z)
- Mamba meets crack segmentation [0.18416014644193066]
Cracks pose safety risks to infrastructure and cannot be overlooked.
CNNs exhibit a deficiency in global modeling capability, hindering the representation of entire crack features.
This study explores the representation capabilities of Mamba for crack features.
arXiv Detail & Related papers (2024-07-22T15:21:35Z)
- CAMS: Convolution and Attention-Free Mamba-based Cardiac Image Segmentation [0.508267104652645]
Convolutional Neural Networks (CNNs) and Transformer-based self-attention models have become the standard for medical image segmentation.
We present CAMS-Net, a convolution- and self-attention-free Mamba-based semantic segmentation network.
Our model outperforms the existing state-of-the-art CNN, self-attention, and Mamba-based methods on CMR and M&Ms-2 Cardiac segmentation datasets.
arXiv Detail & Related papers (2024-06-09T13:53:05Z)
- MambaVC: Learned Visual Compression with Selective State Spaces [74.29217829932895]
We introduce MambaVC, a simple, strong and efficient compression network based on SSM.
MambaVC develops a visual state space (VSS) block with a 2D selective scanning (2DSS) module as the nonlinear activation function after each downsampling.
On compression benchmark datasets, MambaVC achieves superior rate-distortion performance with lower computational and memory overheads.
arXiv Detail & Related papers (2024-05-24T10:24:30Z)
- ReMamber: Referring Image Segmentation with Mamba Twister [51.291487576255435]
ReMamber is a novel RIS architecture that integrates the power of Mamba with a multi-modal Mamba Twister block.
The Mamba Twister explicitly models image-text interaction, and fuses textual and visual features through its unique channel and spatial twisting mechanism.
arXiv Detail & Related papers (2024-03-26T16:27:37Z)
- Weak-Mamba-UNet: Visual Mamba Makes CNN and ViT Work Better for Scribble-based Medical Image Segmentation [13.748446415530937]
This paper introduces Weak-Mamba-UNet, an innovative weakly-supervised learning (WSL) framework for medical image segmentation.
The WSL strategy incorporates three distinct architectures sharing the same symmetrical encoder-decoder structure: a CNN-based UNet for detailed local feature extraction, a Swin Transformer-based SwinUNet for comprehensive global context understanding, and a VMamba-based Mamba-UNet for efficient long-range dependency modeling.
The effectiveness of Weak-Mamba-UNet is validated on a publicly available MRI cardiac segmentation dataset with processed annotations, where it surpasses the performance of a similar WSL framework.
arXiv Detail & Related papers (2024-02-16T18:43:39Z)
- MambaByte: Token-free Selective State Space Model [71.90159903595514]
MambaByte is a token-free adaptation of the Mamba SSM trained autoregressively on byte sequences.
We show MambaByte to be competitive with, and even to outperform, state-of-the-art subword Transformers on language modeling tasks.
arXiv Detail & Related papers (2024-01-24T18:53:53Z)
- SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers [76.13755422671822]
This paper investigates the capability of plain Vision Transformers (ViTs) for semantic segmentation using the encoder-decoder framework.
We introduce a novel Attention-to-Mask (ATM) module to design a lightweight decoder effective for plain ViTs.
Our decoder outperforms the popular decoder UPerNet using various ViT backbones while consuming only about 5% of the computational cost.
arXiv Detail & Related papers (2023-06-09T22:29:56Z)
- A Convolutional-Transformer Network for Crack Segmentation with Boundary Awareness [5.98717173705421]
Cracks play a crucial role in assessing the safety and durability of manufactured buildings.
We propose a novel convolutional-transformer network based on encoder-decoder architecture to solve this challenge.
arXiv Detail & Related papers (2023-02-23T01:27:57Z)
- Learning Fast and Robust Target Models for Video Object Segmentation [83.3382606349118]
Video object segmentation (VOS) is a highly challenging problem since the initial mask, defining the target object, is only given at test-time.
Most previous approaches fine-tune segmentation networks on the first frame, resulting in impractical frame-rates and risk of overfitting.
We propose a novel VOS architecture consisting of two network components.
arXiv Detail & Related papers (2020-02-27T21:58:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.