S$^3$M-Net: Joint Learning of Semantic Segmentation and Stereo Matching
for Autonomous Driving
- URL: http://arxiv.org/abs/2401.11414v2
- Date: Mon, 29 Jan 2024 02:07:56 GMT
- Title: S$^3$M-Net: Joint Learning of Semantic Segmentation and Stereo Matching
for Autonomous Driving
- Authors: Zhiyuan Wu, Yi Feng, Chuang-Wei Liu, Fisher Yu, Qijun Chen, Rui Fan
- Abstract summary: S$^3$M-Net is a novel joint learning framework developed to perform semantic segmentation and stereo matching simultaneously.
S$^3$M-Net shares the features extracted from RGB images between both tasks, resulting in an improved overall scene understanding capability.
- Score: 40.305452898732774
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic segmentation and stereo matching are two essential components of 3D
environmental perception systems for autonomous driving. Nevertheless,
conventional approaches often address these two problems independently,
employing separate models for each task. This approach poses practical
limitations in real-world scenarios, particularly when computational resources
are scarce or real-time performance is imperative. Hence, in this article, we
introduce S$^3$M-Net, a novel joint learning framework developed to perform
semantic segmentation and stereo matching simultaneously. Specifically,
S$^3$M-Net shares the features extracted from RGB images between both tasks,
resulting in an improved overall scene understanding capability. This feature
sharing process is realized using a feature fusion adaption (FFA) module, which
effectively transforms the shared features into semantic space and subsequently
fuses them with the encoded disparity features. The entire joint learning
framework is trained by minimizing a novel semantic consistency-guided (SCG)
loss, which places emphasis on the structural consistency in both tasks.
Extensive experimental results conducted on the vKITTI2 and KITTI datasets
demonstrate the effectiveness of our proposed joint learning framework and its
superior performance compared to other state-of-the-art single-task networks.
Our project webpage is accessible at mias.group/S3M-Net.
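To make the feature-sharing idea concrete, below is a minimal, hypothetical PyTorch-style sketch, not the actual S$^3$M-Net implementation: a shared RGB encoder feeds both tasks, and an FFA-like block projects the shared features into semantic space before fusing them with encoded disparity features. All class names, layer sizes, and the simplified disparity head are illustrative assumptions (a real stereo branch would typically build a cost volume), and the SCG loss is omitted.

```python
# Hypothetical sketch of a joint segmentation + stereo network with a shared
# RGB encoder and an FFA-style fusion block. Not the official S^3M-Net code;
# every layer choice here is an illustrative assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedEncoder(nn.Module):
    """Tiny stand-in for the shared RGB feature extractor."""
    def __init__(self, in_ch=3, feat_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)


class FeatureFusionAdaption(nn.Module):
    """Illustrative FFA-style block: adapt shared RGB features into semantic
    space, then fuse them with encoded disparity features."""
    def __init__(self, feat_ch=64):
        super().__init__()
        self.to_semantic = nn.Conv2d(feat_ch, feat_ch, 1)        # semantic-space projection
        self.fuse = nn.Conv2d(2 * feat_ch, feat_ch, 3, padding=1)

    def forward(self, shared_feat, disp_feat):
        sem = F.relu(self.to_semantic(shared_feat))
        return F.relu(self.fuse(torch.cat([sem, disp_feat], dim=1)))


class JointSegStereoNet(nn.Module):
    def __init__(self, num_classes=19, max_disp=64, feat_ch=64):
        super().__init__()
        self.encoder = SharedEncoder(feat_ch=feat_ch)
        self.disp_enc = nn.Conv2d(2 * feat_ch, feat_ch, 3, padding=1)  # left + right features
        self.ffa = FeatureFusionAdaption(feat_ch)
        self.seg_head = nn.Conv2d(feat_ch, num_classes, 1)
        self.disp_head = nn.Conv2d(feat_ch, 1, 1)
        self.max_disp = max_disp

    def forward(self, left, right):
        f_l, f_r = self.encoder(left), self.encoder(right)        # shared RGB features
        disp_feat = F.relu(self.disp_enc(torch.cat([f_l, f_r], dim=1)))
        fused = self.ffa(f_l, disp_feat)                          # FFA-style fusion
        size = left.shape[-2:]
        seg = F.interpolate(self.seg_head(fused), size=size, mode="bilinear", align_corners=False)
        disp = torch.sigmoid(self.disp_head(disp_feat)) * self.max_disp
        disp = F.interpolate(disp, size=size, mode="bilinear", align_corners=False)
        return seg, disp


left = torch.randn(1, 3, 128, 256)
right = torch.randn(1, 3, 128, 256)
seg_logits, disparity = JointSegStereoNet()(left, right)
print(seg_logits.shape, disparity.shape)  # (1, 19, 128, 256), (1, 1, 128, 256)
```

The design choice mirrored here is that both heads consume features from a single shared encoder, so the segmentation branch can benefit from geometry-aware fused features rather than RGB features alone.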
Related papers
- TiCoSS: Tightening the Coupling between Semantic Segmentation and Stereo Matching within A Joint Learning Framework [10.005854418001219]
TiCoSS is a state-of-the-art joint learning framework that simultaneously tackles semantic segmentation and stereo matching.
This study introduces three novelties: (1) a tightly coupled, gated feature fusion strategy, (2) a hierarchical deep supervision strategy, and (3) a coupling tightening loss function.
arXiv Detail & Related papers (2024-07-25T13:31:55Z)
- Multi-Grained Multimodal Interaction Network for Entity Linking [65.30260033700338]
Multimodal entity linking task aims at resolving ambiguous mentions to a multimodal knowledge graph.
We propose a novel Multi-GraIned Multimodal InteraCtion Network (MIMIC) framework for solving the MEL task.
arXiv Detail & Related papers (2023-07-19T02:11:19Z)
- Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features.
Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv Detail & Related papers (2023-05-17T14:30:11Z)
- DenseMTL: Cross-task Attention Mechanism for Dense Multi-task Learning [18.745373058797714]
We propose a novel multi-task learning architecture that leverages pairwise cross-task exchange through correlation-guided attention and self-attention.
We conduct extensive experiments across three multi-task setups, showing the advantages of our approach compared to competitive baselines in both synthetic and real-world benchmarks.
arXiv Detail & Related papers (2022-06-17T17:59:45Z)
- Global-and-Local Collaborative Learning for Co-Salient Object Detection [162.62642867056385]
The goal of co-salient object detection (CoSOD) is to discover salient objects that commonly appear in a query group containing two or more relevant images.
We propose a global-and-local collaborative learning architecture, which includes a global correspondence modeling (GCM) module and a local correspondence modeling (LCM) module.
The proposed GLNet is evaluated on three prevailing CoSOD benchmark datasets, demonstrating that our model trained on a small dataset (about 3k images) still outperforms eleven state-of-the-art competitors trained on much larger datasets (about 8k-200k images).
arXiv Detail & Related papers (2022-04-19T14:32:41Z)
- Cross-modal Consensus Network for Weakly Supervised Temporal Action Localization [74.34699679568818]
Weakly supervised temporal action localization (WS-TAL) is a challenging task that aims to localize action instances in the given video with video-level categorical supervision.
We propose a cross-modal consensus network (CO2-Net) to tackle this problem.
arXiv Detail & Related papers (2021-07-27T04:21:01Z)
- LiDAR-based Panoptic Segmentation via Dynamic Shifting Network [56.71765153629892]
LiDAR-based panoptic segmentation aims to parse both objects and scenes in a unified manner.
We propose the Dynamic Shifting Network (DS-Net), which serves as an effective panoptic segmentation framework in the point cloud realm.
Our proposed DS-Net achieves superior accuracies over current state-of-the-art methods.
arXiv Detail & Related papers (2020-11-24T08:44:46Z)
- Bi-Directional Attention for Joint Instance and Semantic Segmentation in Point Clouds [9.434847591440485]
We build a Bi-Directional Attention module on backbone neural networks for 3D point cloud perception.
It uses a similarity matrix computed from the features of one task to help aggregate non-local information for the other task (a minimal, hedged sketch of this idea follows the list below).
Comprehensive experiments and ablation studies on the S3DIS and PartNet datasets verify the superiority of our method.
arXiv Detail & Related papers (2020-03-11T17:16:07Z)
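To illustrate the similarity-matrix mechanism described in the bi-directional attention entry above, here is a minimal, hypothetical sketch, not the authors' code: an affinity matrix computed from one task's per-point features weights a non-local aggregation of the other task's features, applied in both directions. The feature shapes, the scaling factor, and the residual connection are all illustrative assumptions.

```python
# Hypothetical sketch of cross-task, similarity-guided aggregation (not the
# authors' implementation): the affinity of one task's features decides how
# non-local context is gathered for the other task's features.
import torch
import torch.nn.functional as F


def cross_task_aggregate(feat_src: torch.Tensor, feat_tgt: torch.Tensor) -> torch.Tensor:
    """feat_src, feat_tgt: (N, C) per-point features from two different tasks.
    Returns target-task features enriched with non-local context whose
    attention weights come from the source task's pairwise similarity."""
    sim = feat_src @ feat_src.t()                           # (N, N) source-task similarity
    attn = F.softmax(sim / feat_src.shape[1] ** 0.5, dim=-1)
    return feat_tgt + attn @ feat_tgt                       # residual non-local aggregation


# Toy bi-directional usage: instance features guide semantic features, and vice versa.
sem_feat = torch.randn(1024, 64)   # per-point semantic features (toy data)
ins_feat = torch.randn(1024, 64)   # per-point instance features (toy data)
sem_out = cross_task_aggregate(ins_feat, sem_feat)
ins_out = cross_task_aggregate(sem_feat, ins_feat)
print(sem_out.shape, ins_out.shape)  # torch.Size([1024, 64]) torch.Size([1024, 64])
```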