Bilateral Network with Residual U-blocks and Dual-Guided Attention for
Real-time Semantic Segmentation
- URL: http://arxiv.org/abs/2310.20305v1
- Date: Tue, 31 Oct 2023 09:20:59 GMT
- Title: Bilateral Network with Residual U-blocks and Dual-Guided Attention for
Real-time Semantic Segmentation
- Authors: Liang Liao, Liang Wan, Mingsheng Liu, Shusheng Li
- Abstract summary: We design a new attention-guided fusion mechanism for two-branch architectures.
Specifically, our proposed Dual-Guided Attention (DGA) module replaces some multi-scale transformations with attention computation.
Experiments on the Cityscapes and CamVid datasets show the effectiveness of our method.
- Score: 18.393208069320362
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In application scenarios that rely on semantic segmentation, such as
autonomous driving, the primary concern is real-time performance rather than
extremely high segmentation accuracy. To achieve a good trade-off between speed
and accuracy, two-branch architectures have been proposed in recent years. They
process spatial information and semantic information separately, which allows
the model to be composed of two lightweight networks. However, fusing features
at two different scales has become a performance bottleneck for many current
two-branch models. In this research, we design a new fusion mechanism for
two-branch architectures that is guided by attention computation. Specifically,
we use our proposed Dual-Guided Attention (DGA) module to replace some
multi-scale transformations with attention computation, so that only a few
attention layers of near-linear complexity are needed to achieve performance
comparable to the frequently used multi-layer fusion. To ensure that our module
is effective, we build one of the two branches from Residual U-blocks (RSU),
which aims to obtain better multi-scale features. Extensive experiments on the
Cityscapes and CamVid datasets show the effectiveness of our method.
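The abstract does not specify how the DGA module is built, so the following is only a generic sketch of the idea it names: using attention of near-linear complexity to pass information between two branches of different resolution. It uses kernelized linear attention with the feature map elu(x) + 1, which is a stand-in technique and not the authors' method. The function names, the token lists, and the residual addition in fuse_branches are all assumptions made for illustration.

```python
import math


def feature_map(x):
    """Positive feature map phi(x) = elu(x) + 1, applied elementwise."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def linear_cross_attention(queries, keys, values):
    """Kernelized cross-attention with cost linear in the number of tokens.

    queries: N_q vectors of length d (assumed: high-resolution branch)
    keys:    N_k vectors of length d (assumed: low-resolution branch)
    values:  N_k vectors of length d_v
    Returns N_q vectors of length d_v.
    """
    d = len(keys[0])
    d_v = len(values[0])
    # Summarize all keys and values once: O(N_k * d * d_v).
    kv = [[0.0] * d_v for _ in range(d)]
    k_sum = [0.0] * d
    for k, v in zip(keys, values):
        pk = feature_map(k)
        for i in range(d):
            k_sum[i] += pk[i]
            for j in range(d_v):
                kv[i][j] += pk[i] * v[j]
    # Each query reuses the summary: O(N_q * d * d_v), no N_q x N_k matrix.
    out = []
    for q in queries:
        pq = feature_map(q)
        norm = sum(pq[i] * k_sum[i] for i in range(d))
        out.append(
            [sum(pq[i] * kv[i][j] for i in range(d)) / norm for j in range(d_v)]
        )
    return out


def fuse_branches(detail_tokens, semantic_tokens):
    """Hypothetical fusion: detail features plus attended semantic features."""
    attended = linear_cross_attention(
        detail_tokens, semantic_tokens, semantic_tokens
    )
    return [[a + b for a, b in zip(d, s)] for d, s in zip(detail_tokens, attended)]
```

Because the queries come from the high-resolution branch, the output already has one vector per high-resolution position, so this toy version needs no explicit upsampling of the low-resolution features. The key/value summary is computed once and reused for every query, which is where the linear cost comes from.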
Related papers
- CSFNet: A Cosine Similarity Fusion Network for Real-Time RGB-X Semantic Segmentation of Driving Scenes [0.0]
Multimodal semantic segmentation methods suffer from high computational complexity and low inference speed.
We propose the Cosine Similarity Fusion Network (CSFNet) as a real-time RGB-X semantic segmentation model.
CSFNet has competitive accuracy with state-of-the-art methods while being state-of-the-art in terms of speed.
arXiv Detail & Related papers (2024-07-01T14:34:32Z)
- TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
arXiv Detail & Related papers (2024-04-15T06:01:48Z)
- ESDMR-Net: A Lightweight Network With Expand-Squeeze and Dual Multiscale Residual Connections for Medical Image Segmentation [7.921517156237902]
This paper presents an expand-squeeze dual multiscale residual network (ESDMR-Net).
It is a fully convolutional network that is well-suited for resource-constrained computing hardware such as mobile devices.
We present experiments on seven datasets from five distinct examples of applications.
arXiv Detail & Related papers (2023-12-17T02:15:49Z)
- Mobile-Seed: Joint Semantic Segmentation and Boundary Detection for Mobile Robots [17.90723909170376]
We introduce Mobile-Seed, a lightweight framework for simultaneous semantic segmentation and boundary detection.
Our framework features a two-stream encoder, an active fusion decoder (AFD) and a dual-task regularization approach.
Experiments on the Cityscapes dataset have shown that Mobile-Seed achieves notable improvement over the state-of-the-art (SOTA) baseline.
arXiv Detail & Related papers (2023-11-21T14:53:02Z)
- General-Purpose Multimodal Transformer meets Remote Sensing Semantic Segmentation [35.100738362291416]
Multimodal AI seeks to exploit complementary data sources, particularly for complex tasks like semantic segmentation.
Recent trends in general-purpose multimodal networks have shown great potential to achieve state-of-the-art performance.
We propose a UNet-inspired module that employs 3D convolution to encode vital local information and learn cross-modal features simultaneously.
arXiv Detail & Related papers (2023-07-07T04:58:34Z)
- RTFormer: Efficient Design for Real-Time Semantic Segmentation with Transformer [63.25665813125223]
We propose RTFormer, an efficient dual-resolution transformer for real-time semantic segmentation.
It achieves a better trade-off between performance and efficiency than CNN-based models.
Experiments on mainstream benchmarks demonstrate the effectiveness of our proposed RTFormer.
arXiv Detail & Related papers (2022-10-13T16:03:53Z)
- Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z)
- EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation, which is relevant to many applications such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
- Decoupled and Memory-Reinforced Networks: Towards Effective Feature Learning for One-Step Person Search [65.51181219410763]
One-step methods have been developed to handle pedestrian detection and identification sub-tasks using a single network.
There are two major challenges in the current one-step approaches.
We propose a decoupled and memory-reinforced network (DMRNet) to overcome these problems.
arXiv Detail & Related papers (2021-02-22T06:19:45Z)
- Unpaired Multi-modal Segmentation via Knowledge Distillation [77.39798870702174]
We propose a novel learning scheme for unpaired cross-modality image segmentation.
In our method, we heavily reuse network parameters, by sharing all convolutional kernels across CT and MRI.
We have extensively validated our approach on two multi-class segmentation problems.
arXiv Detail & Related papers (2020-01-06T20:03:17Z)