Mobile-Seed: Joint Semantic Segmentation and Boundary Detection for
Mobile Robots
- URL: http://arxiv.org/abs/2311.12651v3
- Date: Mon, 11 Mar 2024 04:12:45 GMT
- Title: Mobile-Seed: Joint Semantic Segmentation and Boundary Detection for
Mobile Robots
- Authors: Youqi Liao, Shuhao Kang, Jianping Li, Yang Liu, Yun Liu, Zhen Dong,
Bisheng Yang, Xieyuanli Chen
- Abstract summary: We introduce Mobile-Seed, a lightweight framework for simultaneous semantic segmentation and boundary detection.
Our framework features a two-stream encoder, an active fusion decoder (AFD) and a dual-task regularization approach.
Experiments on the Cityscapes dataset have shown that Mobile-Seed achieves notable improvement over the state-of-the-art (SOTA) baseline.
- Score: 17.90723909170376
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Precise and rapid delineation of sharp boundaries and robust semantics is
essential for numerous downstream robotic tasks, such as robot grasping and
manipulation, real-time semantic mapping, and online sensor calibration
performed on edge computing units. Although boundary detection and semantic
segmentation are complementary tasks, most studies focus on lightweight models
for semantic segmentation but overlook the critical role of boundary detection.
In this work, we introduce Mobile-Seed, a lightweight, dual-task framework
tailored for simultaneous semantic segmentation and boundary detection. Our
framework features a two-stream encoder, an active fusion decoder (AFD) and a
dual-task regularization approach. The encoder is divided into two pathways:
one captures category-aware semantic information, while the other discerns
boundaries from multi-scale features. The AFD module dynamically adapts the
fusion of semantic and boundary information by learning channel-wise
relationships, allowing for precise weight assignment of each channel.
Furthermore, we introduce a regularization loss to mitigate the conflicts in
dual-task learning and deep diversity supervision. Compared to existing
methods, the proposed Mobile-Seed offers a lightweight framework to
simultaneously improve semantic segmentation performance and accurately locate
object boundaries. Experiments on the Cityscapes dataset have shown that
Mobile-Seed achieves notable improvement over the state-of-the-art (SOTA)
baseline by 2.2 percentage points (pp) in mIoU and 4.2 pp in mF-score, while
maintaining an online inference speed of 23.9 frames-per-second (FPS) with
1024x2048 resolution input on an RTX 2080 Ti GPU. Additional experiments on
CamVid and PASCAL Context datasets confirm our method's generalizability. Code
and additional results are publicly available at
https://whu-usi3dv.github.io/Mobile-Seed/.
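The AFD's channel-wise fusion, as described in the abstract above, can be illustrated with a minimal PyTorch sketch: learn one weight per channel and take a convex combination of the two streams. The module name, pooling choice, and gate design below are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ChannelWiseFusion(nn.Module):
    """Illustrative sketch of channel-wise 'active' fusion: predict a
    per-channel weight alpha and blend the semantic and boundary
    streams. Hypothetical, not Mobile-Seed's actual AFD."""

    def __init__(self, channels: int):
        super().__init__()
        # Squeeze both streams into channel descriptors, then predict
        # a per-channel mixing weight in [0, 1].
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, semantic: torch.Tensor, boundary: torch.Tensor) -> torch.Tensor:
        # alpha has shape (N, C, 1, 1): one fusion weight per channel.
        alpha = self.gate(self.pool(torch.cat([semantic, boundary], dim=1)))
        return alpha * semantic + (1.0 - alpha) * boundary

# Toy usage: fuse two 64-channel feature maps.
sem, bnd = torch.randn(1, 64, 128, 256), torch.randn(1, 64, 128, 256)
print(ChannelWiseFusion(64)(sem, bnd).shape)  # torch.Size([1, 64, 128, 256])
```

A per-channel convex combination of this kind lets the decoder lean on the boundary stream for thin structures while keeping the semantic stream dominant elsewhere.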
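The dual-task regularization loss is only named in the abstract; one plausible reading is a consistency term that ties boundaries derived from the segmentation output to the boundary head's prediction. The sketch below is a guess at such an objective, assuming standard cross-entropy terms; `lam` and the gradient-based edge derivation are hypothetical.

```python
import torch
import torch.nn.functional as F

def dual_task_loss(seg_logits, boundary_logits, seg_target, boundary_target,
                   lam: float = 0.1):
    """Hypothetical joint objective for segmentation + boundary detection.
    Not the paper's loss; an illustration of one way to couple the tasks."""
    seg_loss = F.cross_entropy(seg_logits, seg_target, ignore_index=255)
    bnd_loss = F.binary_cross_entropy_with_logits(boundary_logits, boundary_target)

    # Derive a soft boundary map from segmentation probabilities via
    # finite-difference spatial gradients, then penalize disagreement
    # with the boundary head (the assumed 'regularization' term).
    prob = seg_logits.softmax(dim=1)
    dx = F.pad((prob[:, :, :, 1:] - prob[:, :, :, :-1]).abs().sum(1, keepdim=True), (0, 1, 0, 0))
    dy = F.pad((prob[:, :, 1:, :] - prob[:, :, :-1, :]).abs().sum(1, keepdim=True), (0, 0, 0, 1))
    seg_edges = (dx + dy).clamp(0, 1)
    consistency = F.mse_loss(torch.sigmoid(boundary_logits), seg_edges)

    return seg_loss + bnd_loss + lam * consistency
```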
Related papers
- MacFormer: Semantic Segmentation with Fine Object Boundaries [38.430631361558426]
We introduce a new semantic segmentation architecture, "MacFormer", which features two key components.
Firstly, using learnable agent tokens, a Mutual Agent Cross-Attention (MACA) mechanism effectively facilitates the bidirectional integration of features across encoder and decoder layers.
Secondly, a Frequency Enhancement Module (FEM) in the decoder leverages high-frequency and low-frequency components to boost features in the frequency domain.
MacFormer is demonstrated to be compatible with various network architectures and outperforms existing methods in both accuracy and efficiency on the benchmark datasets ADE20K and Cityscapes.
arXiv Detail & Related papers (2024-08-11T05:36:10Z)
- Bilateral Network with Residual U-blocks and Dual-Guided Attention for Real-time Semantic Segmentation [18.393208069320362]
We design a new attention-guided fusion mechanism for two-branch architectures.
Specifically, we use our proposed Dual-Guided Attention (DGA) module to replace some multi-scale transformations.
Experiments on Cityscapes and CamVid dataset show the effectiveness of our method.
arXiv Detail & Related papers (2023-10-31T09:20:59Z)
- Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features.
Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv Detail & Related papers (2023-05-17T14:30:11Z)
- Lightweight Salient Object Detection in Optical Remote-Sensing Images via Semantic Matching and Edge Alignment [61.45639694373033]
We propose SeaNet, a novel lightweight network for salient object detection in optical remote-sensing images (ORSI-SOD), based on semantic matching and edge alignment.
Specifically, SeaNet includes a lightweight MobileNet-V2 for feature extraction, a dynamic semantic matching module (DSMM) for high-level features, and a portable decoder for inference.
arXiv Detail & Related papers (2023-01-07T04:33:51Z)
- PSNet: Parallel Symmetric Network for Video Salient Object Detection [85.94443548452729]
We propose a VSOD network with up and down parallel symmetry, named PSNet.
Two parallel branches with different dominant modalities are used to achieve complete video saliency decoding.
arXiv Detail & Related papers (2022-10-12T04:11:48Z)
- DPTNet: A Dual-Path Transformer Architecture for Scene Text Detection [34.42038300372715]
We propose DPTNet, a simple yet effective architecture to model the global and local information for the scene text detection task.
We propose a parallel design that integrates the convolutional network with a powerful self-attention mechanism to provide complementary clues between the attention path and convolutional path.
Our DPTNet achieves state-of-the-art results on the MSRA-TD500 dataset, and provides competitive results on other standard benchmarks in terms of both detection accuracy and speed.
arXiv Detail & Related papers (2022-08-21T12:58:45Z)
- Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z)
- EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation, which underpins many applications such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
- Learning Comprehensive Motion Representation for Action Recognition [124.65403098534266]
2D CNN-based methods are efficient but may yield redundant features due to applying the same 2D convolution kernel to each frame.
Recent efforts attempt to capture motion information by establishing inter-frame connections, but they still suffer from a limited temporal receptive field or high latency.
We propose a Channel-wise Motion Enhancement (CME) module to adaptively emphasize the channels related to dynamic information with a channel-wise gate vector.
We also propose a Spatial-wise Motion Enhancement (SME) module to focus on the regions with the critical target in motion, according to the point-to-point similarity between adjacent feature maps.
arXiv Detail & Related papers (2021-03-23T03:06:26Z)
- Joint Semantic Segmentation and Boundary Detection using Iterative Pyramid Contexts [35.28037460530125]
We present a joint multi-task learning framework for semantic segmentation and boundary detection.
For semantic boundary detection, we propose the novel spatial gradient fusion to suppress nonsemantic edges.
Our experiments demonstrate superior performance over state-of-the-art works.
arXiv Detail & Related papers (2020-04-16T14:46:58Z)
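The "spatial gradient fusion" in the entry above is described only at a high level; the general idea of suppressing non-semantic edges can be sketched as gating predicted edges by where the semantic prediction actually changes. The function name and threshold `tau` below are illustrative assumptions, not RPCNet's method.

```python
import torch
import torch.nn.functional as F

def suppress_nonsemantic_edges(edge_logits: torch.Tensor,
                               seg_logits: torch.Tensor,
                               tau: float = 0.05) -> torch.Tensor:
    """Keep edge responses that coincide with a change in predicted
    semantic class; damp the rest. Hypothetical sketch only."""
    prob = seg_logits.softmax(dim=1)  # (N, C, H, W)
    # Finite-difference gradient magnitude of class probabilities,
    # reduced over classes to one 'semantic change' map per pixel.
    dx = F.pad((prob[:, :, :, 1:] - prob[:, :, :, :-1]).abs(), (0, 1, 0, 0))
    dy = F.pad((prob[:, :, 1:, :] - prob[:, :, :-1, :]).abs(), (0, 0, 0, 1))
    semantic_change = (dx + dy).amax(dim=1, keepdim=True)  # (N, 1, H, W)
    # Soft gate: edges without semantic support are attenuated.
    gate = torch.sigmoid((semantic_change - tau) / max(tau, 1e-6))
    return torch.sigmoid(edge_logits) * gate
```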