UCS: A Universal Model for Curvilinear Structure Segmentation
- URL: http://arxiv.org/abs/2504.04034v1
- Date: Sat, 05 Apr 2025 03:05:04 GMT
- Title: UCS: A Universal Model for Curvilinear Structure Segmentation
- Authors: Dianshuo Li, Li Chen, Yunxiang Cao, Kai Zhu, Jun Cheng
- Abstract summary: Curvilinear structure segmentation (CSS) is vital in various domains, including medical imaging, landscape analysis, industrial surface inspection, and plant analysis. We propose the Universal Curvilinear structure Segmentation (UCS) model, which adapts SAM to CSS tasks while enhancing its generalization. UCS demonstrates state-of-the-art generalization and open-set segmentation performance across medical, engineering, natural, and plant imagery.
- Score: 11.10994320036562
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Curvilinear structure segmentation (CSS) is vital in various domains, including medical imaging, landscape analysis, industrial surface inspection, and plant analysis. While existing methods achieve high performance within specific domains, their generalizability is limited. On the other hand, large-scale models such as the Segment Anything Model (SAM) exhibit strong generalization but are not optimized for curvilinear structures. Existing adaptations of SAM primarily focus on general object segmentation and lack specialized design for CSS tasks. To bridge this gap, we propose the Universal Curvilinear structure Segmentation (UCS) model, which adapts SAM to CSS tasks while enhancing its generalization. UCS features a novel encoder architecture integrating a pretrained SAM encoder with two innovations: a Sparse Adapter, strategically inserted to inherit the pre-trained SAM encoder's generalization capability while minimizing the number of fine-tuning parameters, and a Prompt Generation module, which leverages the Fast Fourier Transform with a high-pass filter to generate curve-specific prompts. Furthermore, UCS incorporates a mask decoder that eliminates reliance on manual interaction through a dual-compression module: a Hierarchical Feature Compression module, which aggregates the outputs of the sampled encoder to enhance detail preservation, and a Guidance Feature Compression module, which extracts and compresses image-driven guidance features. Evaluated on a comprehensive multi-domain dataset, including an in-house dataset covering eight natural curvilinear structures, UCS demonstrates state-of-the-art generalization and open-set segmentation performance across medical, engineering, natural, and plant imagery, establishing a new benchmark for universal CSS.
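The encoder-side design can be made concrete with a rough PyTorch sketch. The abstract gives no implementation detail, so everything below is an assumption: the Sparse Adapter is approximated by a generic residual bottleneck adapter (only these small layers would be fine-tuned while the SAM encoder stays frozen), and the Prompt Generation idea is illustrated by a Fourier-domain high-pass filter whose output emphasizes thin, high-frequency structures such as curves. Module names, shapes, and the cutoff radius are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Hypothetical stand-in for the Sparse Adapter: a small residual
    down-project / up-project block inserted next to frozen SAM layers."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual form preserves the frozen encoder's behaviour at initialization
        return x + self.up(self.act(self.down(x)))


def fft_highpass_prompt(image: torch.Tensor, cutoff: float = 0.1) -> torch.Tensor:
    """Curve-emphasizing prompt map via an FFT high-pass filter.
    image: (B, C, H, W); cutoff is the normalized radius of the suppressed
    low-frequency band (an assumed hyper-parameter)."""
    _, _, h, w = image.shape
    freq = torch.fft.fftshift(torch.fft.fft2(image), dim=(-2, -1))

    # Circular mask that zeroes the low-frequency centre of the spectrum
    yy, xx = torch.meshgrid(
        torch.linspace(-0.5, 0.5, h, device=image.device),
        torch.linspace(-0.5, 0.5, w, device=image.device),
        indexing="ij",
    )
    highpass = ((xx ** 2 + yy ** 2).sqrt() > cutoff).float()

    filtered = torch.fft.ifft2(torch.fft.ifftshift(freq * highpass, dim=(-2, -1)))
    return filtered.real.abs()  # high-frequency magnitude as the prompt map
```

In the full model the resulting map would presumably be converted into prompt embeddings for the decoder; the sketch stops at the image-space map.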
Related papers
- MGD-SAM2: Multi-view Guided Detail-enhanced Segment Anything Model 2 for High-Resolution Class-agnostic Segmentation [6.976534642198541]
We propose MGD-SAM2, which integrates SAM2 with multi-view feature interaction between a global image and local patches to achieve precise segmentation.
We first introduce MPAdapter to adapt the SAM2 encoder for enhanced extraction of local details and global semantics in HRCS images.
Then, MCEM and HMIM are proposed to further exploit local texture and global context by aggregating multi-view features within and across multiple scales, as in the sketch below.
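The summary only names the modules (MPAdapter, MCEM, HMIM), so the following snippet is a deliberately simplified, hypothetical illustration of the underlying multi-view idea: encode a downsampled global view and full-resolution local patches separately, with fusion to follow. None of the module internals are taken from the paper.

```python
import torch
import torch.nn.functional as F

def global_local_features(encoder, image, patch=512, global_size=1024):
    """Hypothetical multi-view feature extraction for high-resolution inputs.
    encoder: any callable mapping (B, 3, h, w) -> feature maps.
    Assumes H and W are multiples of `patch`."""
    # Global view keeps overall semantics at the encoder's native input size
    global_feat = encoder(F.interpolate(image, size=(global_size, global_size),
                                        mode="bilinear", align_corners=False))
    # Local views keep fine detail; fold the grid of patches into the batch
    patches = image.unfold(2, patch, patch).unfold(3, patch, patch)
    b, c, nh, nw, ph, pw = patches.shape
    local_feat = encoder(patches.permute(0, 2, 3, 1, 4, 5)
                                .reshape(b * nh * nw, c, ph, pw))
    return global_feat, local_feat  # cross-view fusion (MCEM/HMIM-like) would follow
```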
arXiv Detail & Related papers (2025-03-31T07:02:32Z) - Semi-supervised Semantic Segmentation with Multi-Constraint Consistency Learning [81.02648336552421]
We propose a Multi-Constraint Consistency Learning approach to facilitate the staged enhancement of the encoder and decoder. Self-adaptive feature masking and noise injection are designed in an instance-specific manner to perturb the features for robust learning of the decoder. Experimental results on the Pascal VOC 2012 and Cityscapes datasets demonstrate that our proposed MCCL achieves new state-of-the-art performance.
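Feature masking and noise injection are standard perturbation ideas; a minimal, hypothetical version (random channel masking plus magnitude-scaled Gaussian noise) is sketched below. MCCL's self-adaptive, instance-specific rules are not reproduced.

```python
import torch

def perturb_features(feat: torch.Tensor, mask_ratio: float = 0.3,
                     noise_scale: float = 0.1) -> torch.Tensor:
    """Toy feature perturbation applied before the decoder for consistency training.
    feat: (B, C, H, W)."""
    b, c, _, _ = feat.shape
    # Zero a random subset of channels for each instance
    keep = (torch.rand(b, c, 1, 1, device=feat.device) > mask_ratio).float()
    # Gaussian noise scaled to each instance's average feature magnitude
    scale = feat.detach().abs().mean(dim=(1, 2, 3), keepdim=True)
    return feat * keep + torch.randn_like(feat) * noise_scale * scale
```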
arXiv Detail & Related papers (2025-03-23T03:21:33Z) - FlexiCrackNet: A Flexible Pipeline for Enhanced Crack Segmentation with General Features Transfered from SAM [24.99233476254989]
FlexiCrackNet is a novel pipeline that seamlessly integrates traditional deep learning paradigms with the strengths of large-scale pre-trained models. Experiments show that FlexiCrackNet outperforms state-of-the-art methods and excels in zero-shot generalization, computational efficiency, and segmentation robustness. These advancements underscore the potential of FlexiCrackNet for real-world applications in automated crack detection and comprehensive structural health monitoring systems.
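The summary says general features from a large pre-trained model are transferred into a conventional crack-segmentation pipeline. One common way to realise that, sketched here purely as an assumption rather than the paper's architecture, is to freeze the pre-trained encoder and train only a small task-specific head on crack data.

```python
import torch.nn as nn

class CrackHead(nn.Module):
    """Hypothetical lightweight head trained on top of frozen pre-trained
    features (e.g. a SAM-style encoder); only this head is crack-specific."""
    def __init__(self, in_ch: int = 256, stride: int = 16):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 1),  # binary crack logits
            nn.Upsample(scale_factor=stride, mode="bilinear", align_corners=False),
        )

    def forward(self, frozen_feat):
        return self.head(frozen_feat)

# Usage sketch: freeze the large encoder, train only the head on crack data
# for p in pretrained_encoder.parameters():
#     p.requires_grad_(False)
```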
arXiv Detail & Related papers (2025-01-31T02:37:09Z) - CFPL-FAS: Class Free Prompt Learning for Generalizable Face Anti-spoofing [66.6712018832575]
Domain generalization (DG) based Face Anti-Spoofing (FAS) aims to improve the model's performance on unseen domains.
We make use of large-scale VLMs like CLIP and leverage textual features to dynamically adjust the classifier's weights, exploring generalizable visual features.
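Using text embeddings to set classifier weights is the standard CLIP-style mechanism; the sketch below shows that generic recipe over precomputed features. It is not the CFPL-FAS method itself, which learns prompts rather than relying on fixed class names.

```python
import torch
import torch.nn.functional as F

def text_driven_logits(image_feats: torch.Tensor,
                       text_feats: torch.Tensor,
                       temperature: float = 0.01) -> torch.Tensor:
    """Generic CLIP-style classification: L2-normalized text embeddings act as
    the rows of a dynamic classifier over image embeddings.
    image_feats: (B, D), text_feats: (K, D) for K prompts/classes."""
    img = F.normalize(image_feats, dim=-1)
    txt = F.normalize(text_feats, dim=-1)
    return img @ txt.t() / temperature  # (B, K) logits
```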
arXiv Detail & Related papers (2024-03-21T11:58:50Z) - Unite-Divide-Unite: Joint Boosting Trunk and Structure for High-accuracy Dichotomous Image Segmentation [48.995367430746086]
High-accuracy Dichotomous Image Segmentation (DIS) aims to pinpoint category-agnostic foreground objects from natural scenes.
We introduce a novel Unite-Divide-Unite Network (UDUN) that restructures and bipartitely arranges complementary features to boost the effectiveness of trunk and structure identification.
Using 1024×1024 input, our model enables real-time inference at 65.3 fps with ResNet-18.
arXiv Detail & Related papers (2023-07-26T09:04:35Z) - DepGraph: Towards Any Structural Pruning [68.40343338847664]
We study general structural pruning of arbitrary architectures such as CNNs, RNNs, GNNs, and Transformers.
We propose a general and fully automatic method, Dependency Graph (DepGraph), to explicitly model the dependency between layers and comprehensively group parameters for pruning.
In this work, we extensively evaluate our method on several architectures and tasks, including ResNe(X)t, DenseNet, MobileNet and Vision transformer for images, GAT for graph, DGCNN for 3D point cloud, alongside LSTM for language, and demonstrate that, even with a
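Dependency-aware pruning means that when one layer's output channels are removed, every layer consuming those channels must be sliced with the same indices. The toy sketch below shows that coupling for a two-convolution chain; it is not the DepGraph algorithm, which builds such dependency groups automatically for arbitrary architectures.

```python
import torch
import torch.nn as nn

def prune_conv_pair(conv1: nn.Conv2d, conv2: nn.Conv2d, keep_idx: torch.Tensor):
    """Remove output channels of conv1 and, because conv2 depends on them,
    slice conv2's input channels with the same indices (one dependency group)."""
    conv1.weight = nn.Parameter(conv1.weight.data[keep_idx].clone())
    if conv1.bias is not None:
        conv1.bias = nn.Parameter(conv1.bias.data[keep_idx].clone())
    conv1.out_channels = len(keep_idx)

    conv2.weight = nn.Parameter(conv2.weight.data[:, keep_idx].clone())
    conv2.in_channels = len(keep_idx)

# Example: keep the 32 filters of conv1 with the largest L2 norm
conv1, conv2 = nn.Conv2d(3, 64, 3), nn.Conv2d(64, 128, 3)
norms = conv1.weight.detach().flatten(1).norm(dim=1)
prune_conv_pair(conv1, conv2, norms.topk(32).indices)
```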
arXiv Detail & Related papers (2023-01-30T14:02:33Z) - SIM-Trans: Structure Information Modeling Transformer for Fine-grained Visual Categorization [59.732036564862796]
We propose the Structure Information Modeling Transformer (SIM-Trans) to incorporate object structure information into the transformer, enhancing discriminative representation learning.
The two proposed modules are lightweight, can be plugged into any transformer network, and are easily trained end-to-end.
Experiments and analyses demonstrate that the proposed SIM-Trans achieves state-of-the-art performance on fine-grained visual categorization benchmarks.
arXiv Detail & Related papers (2022-08-31T03:00:07Z) - Squeezeformer: An Efficient Transformer for Automatic Speech Recognition [99.349598600887]
Conformer is the de facto backbone model for various downstream speech tasks thanks to its hybrid attention-convolution architecture.
We propose the Squeezeformer model, which consistently outperforms the state-of-the-art ASR models under the same training schemes.
arXiv Detail & Related papers (2022-06-02T06:06:29Z) - Multi-scale and Cross-scale Contrastive Learning for Semantic Segmentation [5.281694565226513]
We apply contrastive learning to enhance the discriminative power of the multi-scale features extracted by semantic segmentation networks.
By first mapping the encoder's multi-scale representations to a common feature space, we instantiate a novel form of supervised local-global constraint.
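A minimal sketch of this idea, under the assumption of a standard supervised pixel-level contrastive formulation: features from two encoder scales are projected into a shared embedding space, and coarse-scale class prototypes act as cross-scale positives and negatives for fine-scale pixels. The paper's specific sampling strategy and loss form are not reproduced; the projection convolutions are assumed components.

```python
import torch
import torch.nn.functional as F

def cross_scale_contrastive(feat_hi, feat_lo, labels, proj_hi, proj_lo, tau=0.1):
    """Toy cross-scale supervised contrastive term.
    feat_hi: (B, C1, H, W) fine features, feat_lo: (B, C2, H', W') coarse features,
    labels: (B, H, W) class indices; proj_hi/proj_lo are 1x1 convs into a shared space."""
    z_hi = F.normalize(proj_hi(feat_hi), dim=1)                      # (B, D, H, W)
    z_lo = F.normalize(proj_lo(F.interpolate(
        feat_lo, size=feat_hi.shape[-2:], mode="bilinear",
        align_corners=False)), dim=1)                                # (B, D, H, W)

    # Coarse-scale class prototypes serve as cross-scale positives/negatives
    classes = labels.unique()
    protos = F.normalize(torch.stack(
        [z_lo.permute(0, 2, 3, 1)[labels == c].mean(0) for c in classes]), dim=-1)

    logits = torch.einsum("bdhw,kd->bkhw", z_hi, protos) / tau       # (B, K, H, W)
    target = torch.bucketize(labels, classes)                        # label -> class index
    return F.cross_entropy(logits, target)
```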
arXiv Detail & Related papers (2022-03-25T01:24:24Z) - CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the detailed spatial information captured by CNNs with the global context provided by transformers for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
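Compressive image sensing in this style has two stages: a learned sampling operator producing measurements, and a recovery network producing an initial reconstruction that a deeper CNN/transformer backbone would then refine. The block-based formulation below is a generic sketch, not CSformer's architecture; the block size and sampling ratio are assumed hyper-parameters.

```python
import torch
import torch.nn as nn

class BlockCS(nn.Module):
    """Generic learned block-based compressive sampling plus initial recovery."""
    def __init__(self, block: int = 32, ratio: float = 0.1):
        super().__init__()
        m = max(1, int(ratio * block * block))  # measurements kept per block
        # Sampling: a non-overlapping conv acts as the block-wise measurement matrix
        self.sample = nn.Conv2d(1, m, block, stride=block, bias=False)
        # Initial recovery: measurements back to a coarse image estimate
        self.recover = nn.ConvTranspose2d(m, 1, block, stride=block, bias=False)

    def forward(self, x):              # x: (B, 1, H, W), H and W multiples of block
        y = self.sample(x)             # (B, m, H/block, W/block) measurements
        x0 = self.recover(y)           # coarse reconstruction, refined by a deeper network
        return y, x0
```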
arXiv Detail & Related papers (2021-12-31T04:37:11Z)