SCTNet: Single-Branch CNN with Transformer Semantic Information for
Real-Time Segmentation
- URL: http://arxiv.org/abs/2312.17071v2
- Date: Mon, 15 Jan 2024 16:43:32 GMT
- Title: SCTNet: Single-Branch CNN with Transformer Semantic Information for
Real-Time Segmentation
- Authors: Zhengze Xu, Dongyue Wu, Changqian Yu, Xiangxiang Chu, Nong Sang,
Changxin Gao
- Abstract summary: SCTNet is a single-branch CNN with transformer semantic information for real-time segmentation.
SCTNet enjoys the rich semantic representations of an inference-free semantic branch while retaining the high efficiency of a lightweight single-branch CNN.
We conduct extensive experiments on Cityscapes, ADE20K, and COCO-Stuff-10K, and the results show that our method achieves new state-of-the-art performance.
- Score: 46.068509764538085
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent real-time semantic segmentation methods usually adopt an additional
semantic branch to pursue rich long-range context. However, the additional
branch incurs undesirable computational overhead and slows inference speed. To
resolve this dilemma, we propose SCTNet, a single-branch CNN with transformer
semantic information for real-time segmentation. SCTNet enjoys the rich
semantic representations of an inference-free semantic branch while retaining
the high efficiency of a lightweight single-branch CNN. SCTNet utilizes a
transformer as the training-only semantic branch, given its superb ability
to extract long-range context. With the help of the proposed transformer-like
CNN block CFBlock and the semantic information alignment module, SCTNet can
capture rich semantic information from the transformer branch during training.
During inference, only the single-branch CNN needs to be deployed. We
conduct extensive experiments on Cityscapes, ADE20K, and COCO-Stuff-10K, and
the results show that our method achieves new state-of-the-art performance.
The code and models are available at https://github.com/xzz777/SCTNet.
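To make the training scheme concrete, here is a minimal PyTorch sketch of the general idea, not the official SCTNet/CFBlock implementation (see the linked repository for that): a lightweight CNN produces logits and intermediate features, and during training those features are pulled toward a transformer branch's features through a simple alignment loss, while at inference only the CNN runs. The module names, the projection layer `proj`, and the MSE alignment are illustrative assumptions.

```python
# Hypothetical sketch, not the official SCTNet code: a single-branch CNN whose
# intermediate features are aligned to a training-only transformer branch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SingleBranchCNN(nn.Module):
    """Lightweight CNN; the only component deployed at inference time."""
    def __init__(self, num_classes: int, width: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, width, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.head = nn.Conv2d(width, num_classes, 1)

    def forward(self, x):
        feat = self.backbone(x)          # intermediate features to be aligned
        return self.head(feat), feat

def alignment_loss(cnn_feat, trans_feat, proj):
    """Pull projected CNN features toward detached transformer features."""
    target = F.adaptive_avg_pool2d(trans_feat.detach(), cnn_feat.shape[-2:])
    return F.mse_loss(proj(cnn_feat), target)

def train_step(cnn, transformer_branch, proj, images, labels):
    """Training only: segmentation loss + feature alignment to the transformer."""
    logits, cnn_feat = cnn(images)
    logits = F.interpolate(logits, size=labels.shape[-2:], mode="bilinear",
                           align_corners=False)
    with torch.no_grad():                # the transformer branch is not updated here
        trans_feat = transformer_branch(images)
    return F.cross_entropy(logits, labels) + alignment_loss(cnn_feat, trans_feat, proj)

# Inference (deployment): only the single-branch CNN runs.
#   logits, _ = cnn(images)
```

The key point is that the transformer branch and the alignment loss exist only in the training graph, so the deployed model pays no extra inference cost.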
Related papers
- OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation [70.17681136234202]
We reexamine the design distinctions and test the limits of what a sparse CNN can achieve.
We propose two key components, i.e., adaptive receptive fields (spatially) and adaptive relation, to bridge the gap.
This exploration led to the creation of Omni-Adaptive 3D CNNs (OA-CNNs), a family of networks that integrates a lightweight module.
arXiv Detail & Related papers (2024-03-21T14:06:38Z)
- ScribFormer: Transformer Makes CNN Work Better for Scribble-based Medical Image Segmentation [43.24187067938417]
This paper proposes a new CNN-Transformer hybrid solution for scribble-supervised medical image segmentation called ScribFormer.
The proposed ScribFormer model has a triple-branch structure, i.e., the hybrid of a CNN branch, a Transformer branch, and an attention-guided class activation map (ACAM) branch.
arXiv Detail & Related papers (2024-02-03T04:55:22Z)
- Dynamic Semantic Compression for CNN Inference in Multi-access Edge Computing: A Graph Reinforcement Learning-based Autoencoder [82.8833476520429]
We propose a novel semantic compression method, autoencoder-based CNN architecture (AECNN) for effective semantic extraction and compression in partial offloading.
In the semantic encoder, we introduce a feature compression module based on the channel attention mechanism in CNNs, to compress intermediate data by selecting the most informative features.
In the semantic decoder, we design a lightweight decoder to reconstruct the intermediate data through learning from the received compressed data to improve accuracy.
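As a rough illustration of this channel-attention-based compression, the sketch below scores channels with a squeeze-and-excitation-style module, keeps only the top-k most informative channels as the transmitted payload, and reconstructs the full tensor with a lightweight 1x1-convolution decoder on the receiver side. The keep ratio, scoring network, and decoder are assumptions for illustration, not the AECNN design.

```python
# Illustrative channel-attention feature compression: select the most
# informative channels for transmission, then reconstruct on the receiver.
import torch
import torch.nn as nn

class ChannelSelector(nn.Module):
    def __init__(self, channels: int, keep_ratio: float = 0.25):
        super().__init__()
        self.keep = max(1, int(channels * keep_ratio))
        # Squeeze-and-excitation-style scoring of per-channel importance.
        self.score = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // 4), nn.ReLU(inplace=True),
            nn.Linear(channels // 4, channels), nn.Sigmoid(),
        )

    def forward(self, feat):
        weights = self.score(feat)                       # (B, C) importance scores
        idx = weights.topk(self.keep, dim=1).indices     # indices of kept channels
        b = torch.arange(feat.size(0), device=feat.device).unsqueeze(1)
        compressed = feat[b, idx]                        # (B, keep, H, W) payload
        return compressed, idx

class LightweightDecoder(nn.Module):
    """Reconstructs the full intermediate tensor from the kept channels."""
    def __init__(self, kept: int, channels: int):
        super().__init__()
        self.expand = nn.Conv2d(kept, channels, kernel_size=1)

    def forward(self, compressed):
        return self.expand(compressed)

# Usage sketch: the sender transmits `compressed` (and optionally `idx`);
# the receiver reconstructs before running the remaining CNN layers.
selector = ChannelSelector(channels=256)
decoder = LightweightDecoder(kept=selector.keep, channels=256)
feat = torch.randn(2, 256, 28, 28)
compressed, idx = selector(feat)
recon = decoder(compressed)
```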
arXiv Detail & Related papers (2024-01-19T15:19:47Z)
- Lightweight Real-time Semantic Segmentation Network with Efficient Transformer and CNN [34.020978009518245]
We propose a lightweight real-time semantic segmentation network called LETNet.
LETNet combines a U-shaped CNN with Transformer effectively in a capsule embedding style to compensate for respective deficiencies.
Experiments performed on challenging datasets demonstrate that LETNet achieves a superior balance between accuracy and efficiency.
arXiv Detail & Related papers (2023-02-21T07:16:53Z)
- RTFormer: Efficient Design for Real-Time Semantic Segmentation with Transformer [63.25665813125223]
We propose RTFormer, an efficient dual-resolution transformer for real-time semantic segmentation.
It achieves a better trade-off between performance and efficiency than CNN-based models.
Experiments on mainstream benchmarks demonstrate the effectiveness of our proposed RTFormer.
arXiv Detail & Related papers (2022-10-13T16:03:53Z)
- Cross-receptive Focused Inference Network for Lightweight Image Super-Resolution [64.25751738088015]
Transformer-based methods have shown impressive performance in single image super-resolution (SISR) tasks.
However, the need of Transformers to incorporate contextual information in order to extract features dynamically is often neglected.
We propose a lightweight Cross-receptive Focused Inference Network (CFIN) that consists of a cascade of CT Blocks mixed with CNN and Transformer.
arXiv Detail & Related papers (2022-07-06T16:32:29Z)
- Contextual Attention Network: Transformer Meets U-Net [0.0]
Convolutional neural networks (CNNs) have become the de facto standard and attained immense success in medical image segmentation.
However, CNN-based methods fail to build long-range dependencies and global context connections.
Recent articles have exploited Transformer variants for medical image segmentation tasks.
arXiv Detail & Related papers (2022-03-02T21:10:24Z)
- SegTransVAE: Hybrid CNN -- Transformer with Regularization for Medical Image Segmentation [0.0]
A novel network named SegTransVAE is proposed in this paper.
SegTransVAE is built upon an encoder-decoder architecture, exploiting a transformer together with a variational autoencoder (VAE) branch in the network.
Evaluation on various recently introduced datasets shows that SegTransVAE outperforms previous methods in Dice Score and 95% Hausdorff Distance.
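For intuition, the snippet below is a generic sketch of how a VAE branch can regularize a segmentation encoder: a latent code is drawn from the encoder bottleneck with the reparameterization trick, and KL plus reconstruction terms are added to the training loss. The layer sizes, the reconstruction target (the bottleneck itself here, purely for simplicity), and the loss weighting are assumptions, not SegTransVAE's actual configuration.

```python
# Generic VAE regularization branch attached to a segmentation encoder
# (illustrative shapes and weights; not SegTransVAE's actual architecture).
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAEBranch(nn.Module):
    def __init__(self, feat_dim: int, latent_dim: int = 128):
        super().__init__()
        self.mu = nn.Linear(feat_dim, latent_dim)
        self.logvar = nn.Linear(feat_dim, latent_dim)
        self.decode = nn.Linear(latent_dim, feat_dim)  # stand-in reconstruction head

    def forward(self, bottleneck: torch.Tensor):       # bottleneck: (B, feat_dim)
        mu, logvar = self.mu(bottleneck), self.logvar(bottleneck)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        recon = self.decode(z)
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        rec = F.mse_loss(recon, bottleneck)  # reconstruction target: the bottleneck
        return kl + rec                      # itself, chosen here only for simplicity

# Training-time total loss (sketch): segmentation loss + weighted VAE term, e.g.
#   total = seg_loss + 0.1 * vae_branch(encoder_bottleneck)
```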
arXiv Detail & Related papers (2022-01-21T08:02:55Z)
- Semi-Supervised Medical Image Segmentation via Cross Teaching between CNN and Transformer [11.381487613753004]
We present a framework for semi-supervised medical image segmentation by introducing the cross teaching between CNN and Transformer.
Notably, this work may be the first attempt to combine CNN and transformer for semi-supervised medical image segmentation and achieve promising results on a public benchmark.
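The cross-teaching scheme can be rendered generically in a few lines: on unlabeled images, each network is supervised by the other's hard pseudo labels, in addition to the usual supervised loss on labeled images. The sketch below is an illustrative rendering under those assumptions, not the paper's exact loss.

```python
# Illustrative cross-teaching loss for unlabeled images: each network learns
# from the other's hard pseudo labels (assumed form, not the paper's exact loss).
import torch
import torch.nn.functional as F

def cross_teaching_loss(cnn_logits: torch.Tensor, transformer_logits: torch.Tensor):
    # argmax yields hard, non-differentiable pseudo labels, so no gradient
    # flows from one network into the other through its supervision signal.
    pl_from_transformer = transformer_logits.argmax(dim=1)   # (B, H, W)
    pl_from_cnn = cnn_logits.argmax(dim=1)                   # (B, H, W)
    loss_cnn = F.cross_entropy(cnn_logits, pl_from_transformer)
    loss_transformer = F.cross_entropy(transformer_logits, pl_from_cnn)
    return loss_cnn + loss_transformer

# Total semi-supervised objective (sketch):
#   L = supervised CE on labeled images
#       + lambda * cross_teaching_loss on unlabeled images
```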
arXiv Detail & Related papers (2021-12-09T13:22:38Z)
- Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers [149.78470371525754]
We treat semantic segmentation as a sequence-to-sequence prediction task. Specifically, we deploy a pure transformer to encode an image as a sequence of patches.
With the global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR).
SETR achieves a new state of the art on ADE20K (50.28% mIoU) and Pascal Context (55.83% mIoU), and competitive results on Cityscapes.
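A minimal PyTorch sketch of this sequence-to-sequence formulation: patchify the image with a strided convolution, run a standard transformer encoder over the patch tokens, and decode with a 1x1 convolution plus bilinear upsampling. Patch size, depth, and widths here are illustrative, not the SETR configuration.

```python
# Illustrative patch-sequence segmenter in the sequence-to-sequence spirit
# described above (assumed hyperparameters, not the SETR architecture).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchTransformerSegmenter(nn.Module):
    def __init__(self, num_classes, img_size=256, patch=16, dim=256, depth=4):
        super().__init__()
        self.grid = img_size // patch
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)  # patchify
        self.pos = nn.Parameter(torch.zeros(1, self.grid * self.grid, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Conv2d(dim, num_classes, kernel_size=1)           # simple decoder

    def forward(self, x):
        b = x.size(0)
        tokens = self.embed(x).flatten(2).transpose(1, 2) + self.pos     # (B, N, dim)
        tokens = self.encoder(tokens)            # global context in every layer
        feat = tokens.transpose(1, 2).reshape(b, -1, self.grid, self.grid)
        logits = self.head(feat)
        return F.interpolate(logits, scale_factor=16, mode="bilinear",
                             align_corners=False)

# model = PatchTransformerSegmenter(num_classes=19)
# out = model(torch.randn(1, 3, 256, 256))       # (1, 19, 256, 256)
```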
arXiv Detail & Related papers (2020-12-31T18:55:57Z)