Exploiting the Complementarity of 2D and 3D Networks to Address
Domain-Shift in 3D Semantic Segmentation
- URL: http://arxiv.org/abs/2304.02991v1
- Date: Thu, 6 Apr 2023 10:59:43 GMT
- Title: Exploiting the Complementarity of 2D and 3D Networks to Address
Domain-Shift in 3D Semantic Segmentation
- Authors: Adriano Cardace, Pierluigi Zama Ramirez, Samuele Salti, Luigi Di
Stefano
- Abstract summary: 3D semantic segmentation is a critical task in many real-world applications, such as autonomous driving, robotics, and mixed reality.
A possible solution is to combine the 3D information with others coming from sensors featuring a different modality, such as RGB cameras.
Recent multi-modal 3D semantic segmentation networks exploit these modalities relying on two branches that process the 2D and 3D information independently.
- Score: 14.30113021974841
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 3D semantic segmentation is a critical task in many real-world applications,
such as autonomous driving, robotics, and mixed reality. However, the task is
extremely challenging due to ambiguities coming from the unstructured, sparse,
and uncolored nature of the 3D point clouds. A possible solution is to combine
the 3D information with others coming from sensors featuring a different
modality, such as RGB cameras. Recent multi-modal 3D semantic segmentation
networks exploit these modalities relying on two branches that process the 2D
and 3D information independently, striving to maintain the strength of each
modality. In this work, we first explain why this design choice is effective
and then show how it can be improved to make the multi-modal semantic
segmentation more robust to domain shift. Our surprisingly simple contribution
achieves state-of-the-art performances on four popular multi-modal unsupervised
domain adaptation benchmarks, as well as better results in a domain
generalization scenario.
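As a rough illustration of the two-branch design mentioned in the abstract, the sketch below keeps an image branch and a point-cloud branch separate up to their per-point predictions and fuses them only at the output. The backbones, the sampling of image features at point projections, and the averaging fusion are illustrative assumptions, not the authors' actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchSegmenter(nn.Module):
    """Minimal two-branch multi-modal 3D semantic segmentation sketch.

    Each modality is processed by its own branch; predictions are combined
    only at the very end, so each branch can keep its own strengths.
    """

    def __init__(self, backbone_2d: nn.Module, backbone_3d: nn.Module,
                 feat_dim_2d: int, feat_dim_3d: int, num_classes: int):
        super().__init__()
        self.backbone_2d = backbone_2d          # any image encoder -> (B, C2, H, W)
        self.backbone_3d = backbone_3d          # any point encoder -> (B, N, C3)
        self.head_2d = nn.Linear(feat_dim_2d, num_classes)
        self.head_3d = nn.Linear(feat_dim_3d, num_classes)

    def forward(self, image, points, uv):
        # uv: (B, N, 2) pixel coordinates of each 3D point, normalized to [-1, 1]
        feat_2d = self.backbone_2d(image)                       # (B, C2, H, W)
        # Sample the 2D feature map at the projection of each 3D point.
        sampled = F.grid_sample(feat_2d, uv.unsqueeze(2),
                                align_corners=False)            # (B, C2, N, 1)
        sampled = sampled.squeeze(-1).transpose(1, 2)           # (B, N, C2)
        feat_3d = self.backbone_3d(points)                      # (B, N, C3)

        logits_2d = self.head_2d(sampled)                       # per-point 2D prediction
        logits_3d = self.head_3d(feat_3d)                       # per-point 3D prediction
        # Simple late fusion: average the per-point class probabilities.
        fused = 0.5 * (logits_2d.softmax(-1) + logits_3d.softmax(-1))
        return logits_2d, logits_3d, fused
```

Keeping the per-branch logits exposed is what makes it straightforward to attach cross-modal consistency or adaptation losses such as those used in several of the related papers below.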
Related papers
- ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic
Reconstruction [62.599588577671796]
We propose an online 3D semantic segmentation method that incrementally reconstructs a 3D semantic map from a stream of RGB-D frames.
Unlike offline methods, ours is directly applicable to scenarios with real-time constraints, such as robotics or mixed reality.
arXiv Detail & Related papers (2023-11-29T20:30:18Z)
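The online, frame-by-frame flavour of the method above can be pictured with a toy sketch: per-frame semantic probabilities are fused into a persistent voxel map as a running average. The voxelization and averaging rule are hypothetical simplifications, not ALSTER's local spatio-temporal expert.

```python
import numpy as np

class OnlineSemanticMap:
    """Toy incremental semantic map: per-voxel running average of class scores."""

    def __init__(self, num_classes: int, voxel_size: float = 0.05):
        self.num_classes = num_classes
        self.voxel_size = voxel_size
        self.scores = {}   # voxel index (tuple) -> accumulated class probabilities
        self.counts = {}   # voxel index (tuple) -> number of observations

    def integrate(self, points_world: np.ndarray, probs: np.ndarray):
        """Fuse one frame: 3D points (N, 3) with per-point class probs (N, C)."""
        voxels = np.floor(points_world / self.voxel_size).astype(int)
        for v, p in zip(map(tuple, voxels), probs):
            self.scores[v] = self.scores.get(v, 0.0) + p
            self.counts[v] = self.counts.get(v, 0) + 1

    def label(self, voxel):
        """Current semantic label of a voxel (argmax of averaged probabilities)."""
        return int(np.argmax(self.scores[voxel] / self.counts[voxel]))
```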
- DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields [68.94868475824575]
This paper introduces a novel approach capable of generating infinite, high-quality 3D-consistent 2D annotations alongside 3D point cloud segmentations.
We leverage the strong semantic prior within a 3D generative model to train a semantic decoder.
Once trained, the decoder efficiently generalizes across the latent space, enabling the generation of infinite data.
arXiv Detail & Related papers (2023-11-18T21:58:28Z)
- SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and Quasi-Planar Segmentation [53.83313235792596]
We present a new methodology for real-time semantic mapping from RGB-D sequences.
It combines a 2D neural network and a 3D network based on a SLAM system with 3D occupancy mapping.
Our system achieves state-of-the-art semantic mapping quality among 2D-3D network-based systems.
arXiv Detail & Related papers (2023-06-28T22:36:44Z)
- SSR-2D: Semantic 3D Scene Reconstruction from 2D Images [54.46126685716471]
In this work, we explore a central 3D scene modeling task, namely, semantic scene reconstruction without using any 3D annotations.
The key idea of our approach is to design a trainable model that employs both incomplete 3D reconstructions and their corresponding source RGB-D images.
Our method achieves the state-of-the-art performance of semantic scene completion on two large-scale benchmark datasets MatterPort3D and ScanNet.
arXiv Detail & Related papers (2023-02-07T17:47:52Z)
- Towards Deeper and Better Multi-view Feature Fusion for 3D Semantic Segmentation [17.557697146752652]
Joint 2D-3D semantic segmentation has become mainstream in 3D scene understanding.
However, how to fuse and process the cross-dimensional features from these two distinct spaces remains elusive.
In this paper, we argue that, despite its simplicity, unidirectionally projecting multi-view 2D deep semantic features into the 3D space and aligning them with 3D deep semantic features can lead to better feature fusion.
arXiv Detail & Related papers (2022-12-13T15:58:25Z)
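A minimal sketch of the unidirectional 2D-to-3D projection argued for above, assuming a PyTorch-style API: per-point features are gathered from several views, pooled over the views that see the point, and concatenated with the 3D features. The visibility-weighted averaging and the tensor shapes are assumptions for illustration, not the paper's exact fusion scheme.

```python
import torch
import torch.nn.functional as F

def project_multiview_features(feat_2d, uv, valid):
    """Gather per-point 2D features from V views and pool them.

    feat_2d: (V, C, H, W) deep feature maps of the views
    uv:      (V, N, 2) normalized pixel coordinates of each of N points per view
    valid:   (V, N) 1 where the point is visible in the view, 0 otherwise
    returns: (N, C) view-pooled 2D features aligned with the N 3D points
    """
    sampled = F.grid_sample(feat_2d, uv.unsqueeze(2), align_corners=False)  # (V, C, N, 1)
    sampled = sampled.squeeze(-1)                                           # (V, C, N)
    weights = valid.float().unsqueeze(1)                                    # (V, 1, N)
    pooled = (sampled * weights).sum(0) / weights.sum(0).clamp(min=1e-6)    # (C, N)
    return pooled.transpose(0, 1)                                           # (N, C)

def fuse_2d_3d(feat_2d_per_point, feat_3d_per_point):
    """Simple fusion: concatenate projected 2D features with 3D features."""
    return torch.cat([feat_2d_per_point, feat_3d_per_point], dim=-1)        # (N, C2 + C3)
```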
- MvDeCor: Multi-view Dense Correspondence Learning for Fine-grained 3D Segmentation [91.6658845016214]
We propose to utilize self-supervised techniques in the 2D domain for fine-grained 3D shape segmentation tasks.
We render a 3D shape from multiple views, and set up a dense correspondence learning task within the contrastive learning framework.
As a result, the learned 2D representations are view-invariant and geometrically consistent.
arXiv Detail & Related papers (2022-08-18T00:48:15Z)
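The dense correspondence objective above can be sketched as a standard InfoNCE loss over pixels of two renderings that see the same surface point; the temperature and feature normalization below are generic contrastive-learning choices rather than MvDeCor's precise recipe.

```python
import torch
import torch.nn.functional as F

def dense_correspondence_loss(feat_a, feat_b, idx_a, idx_b, temperature=0.07):
    """InfoNCE over corresponding pixels of two views of the same 3D shape.

    feat_a, feat_b: (C, H, W) per-pixel embeddings of view A and view B
    idx_a, idx_b:   (M, 2) long tensors of pixel coordinates in A and B that
                    observe the same surface point
    """
    za = feat_a[:, idx_a[:, 0], idx_a[:, 1]].t()   # (M, C) embeddings in view A
    zb = feat_b[:, idx_b[:, 0], idx_b[:, 1]].t()   # (M, C) corresponding embeddings in B
    za = F.normalize(za, dim=-1)
    zb = F.normalize(zb, dim=-1)
    logits = za @ zb.t() / temperature             # (M, M) similarity matrix
    targets = torch.arange(za.size(0), device=za.device)
    # Each pixel's positive is its geometric correspondence; all others are negatives.
    return F.cross_entropy(logits, targets)
```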
- Multi-initialization Optimization Network for Accurate 3D Human Pose and Shape Estimation [75.44912541912252]
We propose a three-stage framework named Multi-Initialization Optimization Network (MION)
In the first stage, we strategically select different coarse 3D reconstruction candidates which are compatible with the 2D keypoints of the input sample.
In the second stage, we design a mesh refinement transformer (MRT) to respectively refine each coarse reconstruction result via a self-attention mechanism.
Finally, a Consistency Estimation Network (CEN) is proposed to find the best result from multiple candidates by evaluating whether the visual evidence in the RGB image matches a given 3D reconstruction.
arXiv Detail & Related papers (2021-12-24T02:43:58Z)
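The final selection stage of the framework above can be pictured as scoring each refined candidate against the image evidence and keeping the best one; score_consistency is a placeholder for the learned Consistency Estimation Network, not its actual implementation.

```python
import torch

def select_best_candidate(image_feat, candidates, score_consistency):
    """Pick the 3D reconstruction whose appearance best matches the image evidence.

    image_feat: features of the input RGB image
    candidates: list of refined 3D reconstruction candidates
    score_consistency: callable(image_feat, candidate) -> scalar tensor
                       (placeholder for a learned consistency estimator)
    """
    scores = torch.stack([score_consistency(image_feat, c) for c in candidates])
    return candidates[int(torch.argmax(scores))]
```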
- Data Augmented 3D Semantic Scene Completion with 2D Segmentation Priors [1.0973642726108543]
We present SPAwN, a novel lightweight multimodal 3D deep CNN.
A crucial difficulty in this field is the lack of fully labeled real-world 3D datasets.
We introduce the use of a 3D data augmentation strategy that can be applied to multimodal SSC networks.
arXiv Detail & Related papers (2021-11-26T04:08:34Z)
- Sparse-to-dense Feature Matching: Intra and Inter domain Cross-modal Learning in Domain Adaptation for 3D Semantic Segmentation [46.110739803985076]
We propose Dynamic sparse-to-dense Cross Modal Learning (DsCML) to increase the sufficiency of multi-modality information interaction for domain adaptation.
For inter-domain cross modal learning, we further advance Cross Modal Adversarial Learning (CMAL) on 2D and 3D data.
We evaluate our model under various multi-modality domain adaptation settings including day-to-night, country-to-country and dataset-to-dataset.
arXiv Detail & Related papers (2021-07-30T15:55:55Z)
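The core cross-modal idea above can be illustrated with a symmetric consistency loss between the 2D and 3D per-point predictions, which needs no target-domain labels; DsCML's dynamic sparse-to-dense matching and its adversarial component are more elaborate than this sketch.

```python
import torch.nn.functional as F

def cross_modal_consistency(logits_2d, logits_3d):
    """Symmetric KL divergence between 2D and 3D per-point predictions.

    logits_2d, logits_3d: (N, num_classes) predictions of the two branches
    at the same 3D points; usable on unlabeled target-domain data.
    """
    p2d = F.log_softmax(logits_2d, dim=-1)
    p3d = F.log_softmax(logits_3d, dim=-1)
    # Each branch is pushed toward the (detached) prediction of the other.
    loss_2d_to_3d = F.kl_div(p2d, p3d.exp().detach(), reduction="batchmean")
    loss_3d_to_2d = F.kl_div(p3d, p2d.exp().detach(), reduction="batchmean")
    return loss_2d_to_3d + loss_3d_to_2d
```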
- Multi-Modality Task Cascade for 3D Object Detection [22.131228757850373]
Many methods train two models in isolation and use simple feature concatenation to represent 3D sensor data.
We propose a novel Multi-Modality Task Cascade network (MTC-RCNN) that leverages 3D box proposals to improve 2D segmentation predictions.
We show that including a 2D network between two stages of 3D modules significantly improves both 2D and 3D task performance.
arXiv Detail & Related papers (2021-07-08T17:55:01Z)
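The cascade above, 3D proposals conditioning a 2D network whose output feeds a second 3D stage, can be outlined as follows; the three modules are placeholders, and passing the proposals directly as an extra input to the 2D network is an assumption made only for illustration.

```python
def multimodal_task_cascade(points, image, stage1_3d, seg_2d, stage2_3d):
    """Sketch of a 3D -> 2D -> 3D cascade.

    stage1_3d: points -> coarse 3D box proposals
    seg_2d:    (image, proposals) -> 2D segmentation conditioned on the proposals
    stage2_3d: (points, 2D segmentation) -> refined 3D predictions
    """
    proposals = stage1_3d(points)        # coarse 3D proposals from points alone
    seg = seg_2d(image, proposals)       # 2D segmentation guided by the proposals
    refined = stage2_3d(points, seg)     # refined 3D predictions using 2D cues
    return proposals, seg, refined
```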
- Virtual Multi-view Fusion for 3D Semantic Segmentation [11.259694096475766]
We show that our virtual views enable more effective training of 2D semantic segmentation networks than previous multiview approaches.
When the 2D per pixel predictions are aggregated on 3D surfaces, our virtual multiview fusion method is able to achieve significantly better 3D semantic segmentation results.
arXiv Detail & Related papers (2020-07-26T14:46:55Z)
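The aggregation step described above, projecting each 3D surface point into every virtual view and averaging the per-pixel class probabilities it lands on, might look like the following; the projection callables and the plain averaging are assumptions used only to illustrate the fusion.

```python
import numpy as np

def fuse_view_predictions(points, view_probs, view_projections):
    """Aggregate per-pixel 2D predictions onto 3D points by averaging over views.

    points:           (N, 3) 3D surface points
    view_probs:       list of (H, W, C) per-pixel class probabilities, one per view
    view_projections: list of callables mapping (N, 3) points -> ((N, 2) pixel
                      coordinates, (N,) boolean visibility mask) for that view
    returns:          (N, C) fused class probabilities per 3D point
    """
    num_classes = view_probs[0].shape[-1]
    fused = np.zeros((points.shape[0], num_classes))
    counts = np.zeros((points.shape[0], 1))
    for probs, project in zip(view_probs, view_projections):
        uv, visible = project(points)                       # pixel coords + visibility
        u, v = uv[visible, 0].astype(int), uv[visible, 1].astype(int)
        fused[visible] += probs[v, u]                       # accumulate per-view probs
        counts[visible] += 1
    return fused / np.maximum(counts, 1)                    # average over observing views
```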
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.