Cross-CBAM: A Lightweight network for Scene Segmentation
- URL: http://arxiv.org/abs/2306.02306v1
- Date: Sun, 4 Jun 2023 09:03:05 GMT
- Title: Cross-CBAM: A Lightweight network for Scene Segmentation
- Authors: Zhengbin Zhang, Zhenhao Xu, Xingsheng Gu, Juan Xiong
- Abstract summary: We present the Cross-CBAM network, a novel lightweight network for real-time semantic segmentation.
In experiments on the Cityscapes dataset and Camvid dataset, we achieve 73.4% mIoU at 240.9 FPS and 77.2% mIoU at 88.6 FPS on an NVIDIA GTX 1080Ti.
- Score: 2.064612766965483
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scene parsing is a great challenge for real-time semantic segmentation.
Although traditional semantic segmentation networks have made remarkable
leaps in semantic accuracy, their inference speed remains unsatisfactory.
Moreover, this progress has been achieved with fairly large networks and
powerful computational resources. It is difficult to run such large models on
edge computing devices with limited computing power, which poses a major
challenge for real-time semantic segmentation tasks. In this
paper, we present the Cross-CBAM network, a novel lightweight network for
real-time semantic segmentation. Specifically, a Squeeze-and-Excitation Atrous
Spatial Pyramid Pooling module (SE-ASPP) is proposed to obtain variable
fields-of-view and multiscale information. We also propose a Cross
Convolutional Block Attention Module (CCBAM), in which a cross-multiply
operation makes high-level semantic information guide low-level detail
information. Unlike previous works, which use attention to focus on the
desired information within the backbone, CCBAM uses cross-attention
for feature fusion in the FPN structure. Extensive experiments on the
Cityscapes dataset and Camvid dataset demonstrate the effectiveness of the
proposed Cross-CBAM model by achieving a promising trade-off between
segmentation accuracy and inference speed. On the Cityscapes test set, we
achieve 73.4% mIoU at 240.9 FPS and 77.2% mIoU at 88.6 FPS on an
NVIDIA GTX 1080Ti.
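The two modules described in the abstract can be illustrated with a minimal NumPy sketch: an SE-style channel gate applied to pyramid branches (SE-ASPP), and a cross-multiply fusion in which attention computed from the high-level feature gates the low-level feature and vice versa (CCBAM). The function names, the sigmoid channel-gate form, identity stand-ins for the atrous convolutions, and fusion by addition are assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_gate(x):
    # Squeeze-and-Excitation style gate: global average pool over the
    # spatial dims, then a sigmoid to produce per-channel weights (C,).
    return sigmoid(x.mean(axis=(1, 2)))

def se_aspp(x, rates=(1, 6, 12, 18)):
    # ASPP stand-in: one branch per dilation rate (a real ASPP uses atrous
    # convs; each branch is an identity here for brevity), each branch
    # reweighted by an SE gate before averaging.
    out = np.zeros_like(x)
    for _ in rates:
        branch = x  # placeholder for an atrous conv with this rate
        out += branch * se_gate(branch)[:, None, None]
    return out / len(rates)

def ccbam_fuse(low, high):
    # Cross-multiply: attention computed from the high-level (semantic)
    # feature gates the low-level (detail) feature, and vice versa.
    low_gated = low * se_gate(high)[:, None, None]
    high_gated = high * se_gate(low)[:, None, None]
    return low_gated + high_gated

# Toy FPN-style fusion of two feature maps with matching shapes (C, H, W);
# in a real decoder the high-level map would first be upsampled to match.
low = np.random.rand(64, 32, 32)   # low-level detail feature
high = np.random.rand(64, 32, 32)  # high-level semantic feature
fused = ccbam_fuse(se_aspp(low), se_aspp(high))
print(fused.shape)  # (64, 32, 32)
```

The key point of the cross-multiply design is symmetry: each stream is modulated by attention derived from the other, rather than by self-attention within the backbone.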
Related papers
- Deep Multi-Branch Aggregation Network for Real-Time Semantic
Segmentation in Street Scenes [32.54045305607654]
Many state-of-the-art real-time semantic segmentation methods tend to sacrifice some spatial details or contextual information for fast inference.
We propose a novel Deep Multi-branch Aggregation Network (called DMA-Net) based on the encoder-decoder structure to perform real-time semantic segmentation in street scenes.
Our proposed DMA-Net respectively obtains 77.0% and 73.6% mean Intersection over Union (mIoU) at the inference speed of 46.7 FPS and 119.8 FPS by only using a single NVIDIA GTX 1080Ti GPU.
arXiv Detail & Related papers (2022-03-08T12:07:32Z) - Stage-Aware Feature Alignment Network for Real-Time Semantic
Segmentation of Street Scenes [59.81228011432776]
We present a novel Stage-aware Feature Alignment Network (SFANet) for real-time semantic segmentation of street scenes.
By taking into account the unique role of each stage in the decoder, a novel stage-aware Feature Enhancement Block (FEB) is designed to enhance spatial details and contextual information of feature maps from the encoder.
Experimental results show that the proposed SFANet exhibits a good balance between accuracy and speed for real-time semantic segmentation of street scenes.
arXiv Detail & Related papers (2022-03-08T11:46:41Z) - Prototypical Cross-Attention Networks for Multiple Object Tracking and
Segmentation [95.74244714914052]
Multiple object tracking and segmentation requires detecting, tracking, and segmenting objects belonging to a set of given classes.
We propose the Prototypical Cross-Attention Network (PCAN), capable of leveraging rich spatio-temporal information online.
PCAN outperforms current video instance tracking and segmentation competition winners on Youtube-VIS and BDD100K datasets.
arXiv Detail & Related papers (2021-06-22T17:57:24Z) - Feature Reuse and Fusion for Real-time Semantic segmentation [0.0]
How to increase inference speed while maintaining high resolution is a problem that has been widely discussed.
We hope to design a light-weight network based on previous design experience and reach the level of state-of-the-art real-time semantic segmentation.
arXiv Detail & Related papers (2021-05-27T06:47:02Z) - Rethinking BiSeNet For Real-time Semantic Segmentation [6.622485130017622]
BiSeNet has proven to be a popular two-stream network for real-time segmentation.
We propose a novel structure named Short-Term Dense Concatenate network (STDC) by removing structure redundancy.
arXiv Detail & Related papers (2021-04-27T13:49:47Z) - Real-time Semantic Segmentation with Fast Attention [94.88466483540692]
We propose a novel architecture for semantic segmentation of high-resolution images and videos in real-time.
The proposed architecture relies on our fast spatial attention, which is a simple yet efficient modification of the popular self-attention mechanism.
Results on multiple datasets demonstrate superior performance, with better accuracy and speed than existing approaches.
arXiv Detail & Related papers (2020-07-07T22:37:16Z) - Semantic Segmentation With Multi Scale Spatial Attention For Self
Driving Cars [2.7317088388886384]
We present a novel neural network using multi-scale feature fusion for accurate and efficient semantic image segmentation.
We use a ResNet-based feature extractor, dilated convolutional layers in the downsampling part, and atrous convolutional layers in the upsampling part, merging them with a concat operation.
A new attention module is proposed to encode more contextual information and enhance the receptive field of the network.
arXiv Detail & Related papers (2020-06-30T20:19:09Z) - BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time
Semantic Segmentation [118.46210049742993]
We propose an efficient and effective architecture with a good trade-off between speed and accuracy, termed Bilateral Segmentation Network (BiSeNet V2).
For a 2,048x1,024 input, we achieve 72.6% Mean IoU on the Cityscapes test set with a speed of 156 FPS on one NVIDIA GeForce 1080 Ti card, significantly faster than existing methods while achieving better segmentation accuracy.
arXiv Detail & Related papers (2020-04-05T10:26:38Z) - Real-Time High-Performance Semantic Image Segmentation of Urban Street
Scenes [98.65457534223539]
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes.
The proposed method achieves 73.6% and 68.0% mean Intersection over Union (mIoU) at inference speeds of 51.0 fps and 39.3 fps, respectively.
arXiv Detail & Related papers (2020-03-11T08:45:53Z) - Efficient Video Semantic Segmentation with Labels Propagation and
Refinement [138.55845680523908]
This paper tackles the problem of real-time semantic segmentation of high definition videos using a hybrid GPU / CPU approach.
We propose an Efficient Video Segmentation (EVS) pipeline that combines: (i) on the CPU, a very fast optical flow method, used to exploit the temporal aspect of the video and propagate semantic information from one frame to the next.
On the popular Cityscapes dataset with high resolution frames (2048 x 1024), the proposed operating points range from 80 to 1000 Hz on a single GPU and CPU.
arXiv Detail & Related papers (2019-12-26T11:45:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.