Related papers: seg_3D_by_PC2D: Multi-View Projection for Domain Generalization and Adaptation in 3D Semantic Segmentation

seg_3D_by_PC2D: Multi-View Projection for Domain Generalization and Adaptation in 3D Semantic Segmentation

URL: http://arxiv.org/abs/2505.15545v1
Date: Wed, 21 May 2025 14:08:42 GMT
Title: seg_3D_by_PC2D: Multi-View Projection for Domain Generalization and Adaptation in 3D Semantic Segmentation
Authors: Andrew Caunes, Thierry Chateau, Vincent Fremont,
Abstract summary: 3D semantic segmentation plays a pivotal role in autonomous driving and road infrastructure analysis.<n>We propose a novel multi-view projection framework that excels in both domain generalization (DG) and unsupervised domain adaptation (UDA)<n>We achieve state-of-the-art results in UDA and close to state-of-the-art in DG, with particularly large gains on large, static classes.
Score: 2.4549463031236396
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: 3D semantic segmentation plays a pivotal role in autonomous driving and road infrastructure analysis, yet state-of-the-art 3D models are prone to severe domain shift when deployed across different datasets. We propose a novel multi-view projection framework that excels in both domain generalization (DG) and unsupervised domain adaptation (UDA). Our approach first aligns Lidar scans into coherent 3D scenes and renders them from multiple virtual camera poses to create a large-scale synthetic 2D dataset (PC2D). We then use it to train a 2D segmentation model in-domain. During inference, the model processes hundreds of views per scene; the resulting logits are back-projected to 3D with an occlusion-aware voting scheme to generate final point-wise labels. Our framework is modular and enables extensive exploration of key design parameters, such as view generation optimization (VGO), visualization modality optimization (MODO), and 2D model choice. We evaluate on the nuScenes and SemanticKITTI datasets under both the DG and UDA settings. We achieve state-of-the-art results in UDA and close to state-of-the-art in DG, with particularly large gains on large, static classes. Our code and dataset generation tools will be publicly available at https://github.com/andrewcaunes/ia4markings

Related papers

DINO in the Room: Leveraging 2D Foundation Models for 3D Segmentation [51.43837087865105]
Vision foundation models (VFMs) trained on large-scale image datasets provide high-quality features that have significantly advanced 2D visual recognition.<n>Their potential in 3D vision remains largely untapped, despite the common availability of 2D images alongside 3D point cloud datasets.<n>We introduce DITR, a simple yet effective approach that extracts 2D foundation model features, projects them to 3D, and finally injects them into a 3D point cloud segmentation model.
arXiv Detail & Related papers (2025-03-24T17:59:11Z)
xMOD: Cross-Modal Distillation for 2D/3D Multi-Object Discovery from 2D motion [4.878192303432336]
DIOD-3D is the first baseline for multi-object discovery in 3D data using 2D motion.<n>xMOD is a cross-modal training framework that integrates 2D and 3D data while always using 2D motion cues.<n>Our approach yields a substantial performance improvement compared with the 2D object discovery state-of-the-art on all datasets.
arXiv Detail & Related papers (2025-03-19T09:20:35Z)
Unifying 2D and 3D Vision-Language Understanding [85.84054120018625]
We introduce UniVLG, a unified architecture for 2D and 3D vision-language learning.<n>UniVLG bridges the gap between existing 2D-centric models and the rich 3D sensory data available in embodied systems.
arXiv Detail & Related papers (2025-03-13T17:56:22Z)
DCSEG: Decoupled 3D Open-Set Segmentation using Gaussian Splatting [0.0]
We present a decoupled 3D segmentation pipeline to ensure modularity and adaptability to novel 3D representations.<n>We evaluate on synthetic and real-world indoor datasets, demonstrating improved performance over comparable NeRF-based pipelines.
arXiv Detail & Related papers (2024-12-14T21:26:44Z)
Zero-Shot Dual-Path Integration Framework for Open-Vocabulary 3D Instance Segmentation [19.2297264550686]
Open-vocabulary 3D instance segmentation transcends traditional closed-vocabulary methods. We introduce Zero-Shot Dual-Path Integration Framework that equally values the contributions of both 3D and 2D modalities. Our framework, utilizing pre-trained models in a zero-shot manner, is model-agnostic and demonstrates superior performance on both seen and unseen data.
arXiv Detail & Related papers (2024-08-16T07:52:00Z)
Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation [67.07112533415116]
We present a novel framework that adapts various foundational models for the 3D point cloud segmentation task. Our approach involves making initial predictions of 2D semantic masks using different large vision models. To generate robust 3D semantic pseudo labels, we introduce a semantic label fusion strategy that effectively combines all the results via voting.
arXiv Detail & Related papers (2023-11-03T15:41:15Z)
Cross-modal & Cross-domain Learning for Unsupervised LiDAR Semantic Segmentation [82.47872784972861]
Cross-modal domain adaptation has been studied on the paired 2D image and 3D LiDAR data to ease the labeling costs for 3D LiDAR semantic segmentation (3DLSS) in the target domain. This paper studies a new 3DLSS setting where a 2D dataset with semantic annotations and a paired but unannotated 2D image and 3D LiDAR data (target) are available. To achieve 3DLSS in this scenario, we propose Cross-Modal and Cross-Domain Learning (CoMoDaL)
arXiv Detail & Related papers (2023-08-05T14:00:05Z)
MvDeCor: Multi-view Dense Correspondence Learning for Fine-grained 3D Segmentation [91.6658845016214]
We propose to utilize self-supervised techniques in the 2D domain for fine-grained 3D shape segmentation tasks. We render a 3D shape from multiple views, and set up a dense correspondence learning task within the contrastive learning framework. As a result, the learned 2D representations are view-invariant and geometrically consistent.
arXiv Detail & Related papers (2022-08-18T00:48:15Z)
Sparse-to-dense Feature Matching: Intra and Inter domain Cross-modal Learning in Domain Adaptation for 3D Semantic Segmentation [46.110739803985076]
We propose Dynamic sparse-to-dense Cross Modal Learning (DsCML) to increase the sufficiency of multi-modality information interaction for domain adaptation. For inter-domain cross modal learning, we further advance Cross Modal Adversarial Learning (CMAL) on 2D and 3D data. We evaluate our model under various multi-modality domain adaptation settings including day-to-night, country-to-country and dataset-to-dataset.
arXiv Detail & Related papers (2021-07-30T15:55:55Z)
Cylinder3D: An Effective 3D Framework for Driving-scene LiDAR Semantic Segmentation [87.54570024320354]
State-of-the-art methods for large-scale driving-scene LiDAR semantic segmentation often project and process the point clouds in the 2D space. A straightforward solution to tackle the issue of 3D-to-2D projection is to keep the 3D representation and process the points in the 3D space. We develop a 3D cylinder partition and a 3D cylinder convolution based framework, termed as Cylinder3D, which exploits the 3D topology relations and structures of driving-scene point clouds.
arXiv Detail & Related papers (2020-08-04T13:56:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.