Unite-Divide-Unite: Joint Boosting Trunk and Structure for High-accuracy
Dichotomous Image Segmentation
- URL: http://arxiv.org/abs/2307.14052v1
- Date: Wed, 26 Jul 2023 09:04:35 GMT
- Title: Unite-Divide-Unite: Joint Boosting Trunk and Structure for High-accuracy
Dichotomous Image Segmentation
- Authors: Jialun Pei, Zhangjun Zhou, Yueming Jin, He Tang, Pheng-Ann Heng
- Abstract summary: High-accuracy Dichotomous Image Segmentation (DIS) aims to pinpoint category-agnostic foreground objects from natural scenes.
We introduce a novel Unite-Divide-Unite Network (UDUN) that restructures and bipartitely arranges complementary features to boost the effectiveness of trunk and structure identification.
Using 1024*1024 input, our model enables real-time inference at 65.3 fps with ResNet-18.
- Score: 48.995367430746086
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-accuracy Dichotomous Image Segmentation (DIS) aims to pinpoint
category-agnostic foreground objects from natural scenes. The main challenge
for DIS involves identifying the highly accurate dominant area while rendering
detailed object structure. However, directly using a general encoder-decoder
architecture may result in an oversupply of high-level features and neglect the
shallow spatial information necessary for partitioning meticulous structures.
To fill this gap, we introduce a novel Unite-Divide-Unite Network (UDUN) that
restructures and bipartitely arranges complementary features to simultaneously
boost the effectiveness of trunk and structure identification. The proposed
UDUN proceeds from several strengths. First, a dual-size input feeds into the
shared backbone to produce more holistic and detailed features while keeping
the model lightweight. Second, a simple Divide-and-Conquer Module (DCM) is
proposed to decouple multiscale low- and high-level features into our structure
decoder and trunk decoder to obtain structure and trunk information
respectively. Moreover, we design a Trunk-Structure Aggregation module (TSA) in
our union decoder that performs cascade integration for uniform high-accuracy
segmentation. As a result, UDUN performs favorably against state-of-the-art
competitors in all six evaluation metrics on overall DIS-TE, i.e., achieving
0.772 weighted F-measure and 977 HCE. Using 1024*1024 input, our model enables
real-time inference at 65.3 fps with ResNet-18.
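As a reading aid, below is a minimal, illustrative PyTorch sketch of the Unite-Divide-Unite data flow described in the abstract: a dual-size input through a shared ResNet-18, shallow features routed to a structure decoder and deep features to a trunk decoder, then aggregated in a union decoder. The module internals, channel widths, and fusion operations here are placeholders for illustration, not the paper's actual DCM and TSA designs.

```python
# Illustrative sketch only: dual-size shared backbone -> divide into
# structure/trunk decoders -> unite. Fusion ops are simple placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18


class SharedBackbone(nn.Module):
    """A single ResNet-18 applied to both the full-size and the downscaled input."""
    def __init__(self):
        super().__init__()
        net = resnet18(weights=None)
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
        self.stages = nn.ModuleList([net.layer1, net.layer2, net.layer3, net.layer4])

    def forward(self, x):
        feats = []
        x = self.stem(x)
        for stage in self.stages:
            x = stage(x)
            feats.append(x)  # four feature maps, shallow -> deep
        return feats


class UDUNSketch(nn.Module):
    def __init__(self, mid=64):
        super().__init__()
        self.backbone = SharedBackbone()
        # Divide: project each backbone scale (64/128/256/512 channels) to a common width.
        self.divide = nn.ModuleList([nn.Conv2d(c, mid, 1) for c in (64, 128, 256, 512)])
        self.structure_dec = nn.Conv2d(2 * mid, mid, 3, padding=1)  # low-level features
        self.trunk_dec = nn.Conv2d(2 * mid, mid, 3, padding=1)      # high-level features
        self.union_dec = nn.Conv2d(2 * mid, mid, 3, padding=1)      # trunk-structure fusion
        self.head = nn.Conv2d(mid, 1, 1)

    def forward(self, x):
        # Unite: dual-size input through the shared backbone, merged per scale.
        half = F.interpolate(x, scale_factor=0.5, mode="bilinear", align_corners=False)
        detailed, holistic = self.backbone(x), self.backbone(half)
        feats = [proj(d + F.interpolate(h, size=d.shape[-2:], mode="bilinear",
                                        align_corners=False))
                 for proj, d, h in zip(self.divide, detailed, holistic)]

        # Divide: shallow scales -> structure decoder, deep scales -> trunk decoder.
        size = feats[0].shape[-2:]
        up = lambda f: F.interpolate(f, size=size, mode="bilinear", align_corners=False)
        structure = self.structure_dec(torch.cat([up(feats[0]), up(feats[1])], dim=1))
        trunk = self.trunk_dec(torch.cat([up(feats[2]), up(feats[3])], dim=1))

        # Unite: aggregate trunk and structure cues into one segmentation mask.
        mask = self.head(self.union_dec(torch.cat([structure, trunk], dim=1)))
        return F.interpolate(torch.sigmoid(mask), size=x.shape[-2:],
                             mode="bilinear", align_corners=False)


if __name__ == "__main__":
    print(UDUNSketch()(torch.randn(1, 3, 256, 256)).shape)  # torch.Size([1, 1, 256, 256])
```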
Related papers
- High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity [69.32473738284374]
We propose DiffDIS, a diffusion-driven segmentation model that taps into the potential of the pre-trained U-Net within diffusion models.
By leveraging the robust generalization capabilities and the rich, versatile image representation prior of the SD models, we significantly reduce the inference time while preserving high-fidelity, detailed generation.
Experiments on the DIS5K dataset demonstrate the superiority of DiffDIS, achieving state-of-the-art results through a streamlined inference process.
arXiv Detail & Related papers (2024-10-14T02:49:23Z) - FIF-UNet: An Efficient UNet Using Feature Interaction and Fusion for Medical Image Segmentation [5.510679875888542]
A novel U-shaped model, called FIF-UNet, is proposed to address the above issue, including three plug-and-play modules.
Experiments on the Synapse and ACDC datasets demonstrate that the proposed FIF-UNet outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2024-09-09T04:34:47Z) - P-MSDiff: Parallel Multi-Scale Diffusion for Remote Sensing Image Segmentation [8.46409964236009]
Diffusion models and multi-scale features are essential components in semantic segmentation tasks.
We propose a new model for semantic segmentation known as the diffusion model with parallel multi-scale branches.
Our model demonstrates superior performance based on the J1 metric on both the UAVid and Vaihingen Building datasets.
arXiv Detail & Related papers (2024-05-30T19:40:08Z) - CFPFormer: Feature-pyramid like Transformer Decoder for Segmentation and Detection [1.837431956557716]
Feature pyramids have been widely adopted in convolutional neural networks (CNNs) and transformers for tasks like medical image segmentation and object detection.
We propose a novel decoder block that integrates feature pyramids and transformers.
Our model achieves superior performance in detecting small objects compared to existing methods.
arXiv Detail & Related papers (2024-04-23T18:46:07Z) - Multi-view Aggregation Network for Dichotomous Image Segmentation [76.75904424539543]
Dichotomous Image Segmentation (DIS) has recently emerged towards high-precision object segmentation from high-resolution natural images.
Existing methods rely on tedious multiple encoder-decoder streams and stages to gradually complete the global localization and local refinement.
Inspired by the way the human visual system captures regions of interest by observing them from multiple views, we model DIS as a multi-view object perception problem and provide a parsimonious multi-view aggregation network (MVANet).
Experiments on the popular DIS-5K dataset show that our MVANet significantly outperforms state-of-the-art methods in both accuracy and speed.
arXiv Detail & Related papers (2024-04-11T03:00:00Z) - BRAU-Net++: U-Shaped Hybrid CNN-Transformer Network for Medical Image Segmentation [11.986549780782724]
We propose a hybrid yet effective CNN-Transformer network, named BRAU-Net++, for an accurate medical image segmentation task.
Specifically, BRAU-Net++ uses bi-level routing attention as the core building block to design our u-shaped encoder-decoder structure.
Our proposed approach surpasses other state-of-the-art methods including its baseline: BRAU-Net.
arXiv Detail & Related papers (2024-01-01T10:49:09Z) - SIM-Trans: Structure Information Modeling Transformer for Fine-grained
Visual Categorization [59.732036564862796]
We propose the Structure Information Modeling Transformer (SIM-Trans) to incorporate object structure information into transformer for enhancing discriminative representation learning.
The proposed two modules are light-weighted and can be plugged into any transformer network and trained end-to-end easily.
Experiments and analyses demonstrate that the proposed SIM-Trans achieves state-of-the-art performance on fine-grained visual categorization benchmarks.
arXiv Detail & Related papers (2022-08-31T03:00:07Z) - Real-Time Scene Text Detection with Differentiable Binarization and
Adaptive Scale Fusion [62.269219152425556]
Segmentation-based methods have drawn extensive attention in the scene text detection field.
We propose a Differentiable Binarization (DB) module that integrates the binarization process into a segmentation network.
An efficient Adaptive Scale Fusion (ASF) module is proposed to improve the scale robustness by fusing features of different scales adaptively.
arXiv Detail & Related papers (2022-02-21T15:30:14Z) - FBSNet: A Fast Bilateral Symmetrical Network for Real-Time Semantic
- FBSNet: A Fast Bilateral Symmetrical Network for Real-Time Semantic Segmentation [23.25851281719734]
We propose a Fast Bilateral Symmetrical Network (FBSNet) for real-time semantic segmentation.
FBSNet employs a symmetrical encoder-decoder structure with two branches: a semantic information branch and a spatial detail branch.
The experimental results of Cityscapes and CamVid show that the proposed FBSNet can strike a good balance between accuracy and efficiency.
arXiv Detail & Related papers (2021-09-02T04:16:39Z) - Combining Progressive Rethinking and Collaborative Learning: A Deep
- Combining Progressive Rethinking and Collaborative Learning: A Deep Framework for In-Loop Filtering [67.22506488158707]
We design a deep network with both progressive rethinking and collaborative learning mechanisms to improve the quality of the reconstructed intra-frames and inter-frames.
Our PRN with intra-frame side information provides 9.0% BD-rate reduction on average compared to HEVC baseline under All-intra (AI) configuration.
arXiv Detail & Related papers (2020-01-16T05:14:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.