Few-shot 3D LiDAR Semantic Segmentation for Autonomous Driving
- URL: http://arxiv.org/abs/2302.08785v1
- Date: Fri, 17 Feb 2023 09:52:36 GMT
- Title: Few-shot 3D LiDAR Semantic Segmentation for Autonomous Driving
- Authors: Jilin Mei, Junbao Zhou and Yu Hu
- Abstract summary: We propose a few-shot 3D LiDAR semantic segmentation method that predicts both novel classes and base classes simultaneously.
Our method tries to solve the background ambiguity problem in generalized few-shot semantic segmentation.
- Score: 3.0033590064167317
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In autonomous driving, novel objects and the lack of annotations challenge
traditional deep-learning-based 3D LiDAR semantic segmentation. Few-shot
learning is a feasible way to solve these issues. However, current few-shot
semantic segmentation methods focus on camera data, and most of them only
predict the novel classes without considering the base classes. This setting
cannot be directly applied to autonomous driving due to safety concerns. Thus,
we propose a few-shot 3D LiDAR semantic segmentation method that predicts both
novel classes and base classes simultaneously. Our method tries to solve the
background ambiguity problem in generalized few-shot semantic segmentation. We
first review the original cross-entropy and knowledge distillation losses, then
propose a new loss function that incorporates the background information to
achieve 3D LiDAR few-shot semantic segmentation. Extensive experiments on
SemanticKITTI demonstrate the effectiveness of our method.
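For intuition, a minimal sketch of how such a loss can be assembled is shown below. It assumes per-point logits from a fine-tuned model and a frozen base model; the function name, distillation temperature, and the exact handling of background points are illustrative placeholders, not the authors' formulation.

```python
import torch
import torch.nn.functional as F

def gfs_seg_loss(student_logits, teacher_base_logits, labels,
                 bg_index=0, kd_weight=1.0, temperature=2.0):
    """Illustrative generalized few-shot segmentation loss (sketch only).

    student_logits:      (N, C_base + C_novel) per-point scores of the fine-tuned model
    teacher_base_logits: (N, C_base) per-point scores of the frozen base model
    labels:              (N,) ground-truth indices; bg_index marks unlabeled/background points
    """
    # Standard cross-entropy over all (base + novel) classes.
    ce = F.cross_entropy(student_logits, labels)

    # Knowledge distillation on the base-class slice, restricted to points that the
    # few-shot annotation leaves as background: such points may actually belong to
    # base classes, so the frozen base model provides soft targets there.
    n_base = teacher_base_logits.shape[1]
    bg_mask = labels == bg_index
    log_p_student = F.log_softmax(student_logits[bg_mask, :n_base] / temperature, dim=1)
    p_teacher = F.softmax(teacher_base_logits[bg_mask] / temperature, dim=1)
    kd = F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

    return ce + kd_weight * kd
```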
Related papers
- TeFF: Tracking-enhanced Forgetting-free Few-shot 3D LiDAR Semantic Segmentation [10.628870775939161]
This paper addresses the limitations of current few-shot semantic segmentation by exploiting the temporal continuity of LiDAR data.
We employ a tracking model to generate pseudo-ground-truths from a sequence of LiDAR frames, enhancing the model's ability to learn novel classes.
We incorporate LoRA, a technique that reduces the number of trainable parameters, thereby preserving the model's performance on base classes while improving its adaptability to novel classes.
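As a rough illustration of the LoRA idea mentioned above (not TeFF's actual integration), a frozen linear layer can be adapted through a small trainable low-rank update:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer with a trainable low-rank update (illustrative sketch)."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # base weights stay fixed
            p.requires_grad = False
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Only the low-rank matrices A and B receive gradients, which keeps
        # the number of trainable parameters small.
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scale
```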
arXiv Detail & Related papers (2024-08-28T09:18:36Z)
- Label-Efficient 3D Brain Segmentation via Complementary 2D Diffusion Models with Orthogonal Views [10.944692719150071]
We propose a novel 3D brain segmentation approach using complementary 2D diffusion models.
Our goal is to achieve reliable segmentation quality without requiring complete labels for each individual subject.
arXiv Detail & Related papers (2024-07-17T06:14:53Z)
- Exploring the Untouched Sweeps for Conflict-Aware 3D Segmentation Pretraining [41.145598142457686]
LiDAR-camera 3D representation pretraining has shown significant promise for 3D perception tasks and related applications.
We propose a novel Vision-Foundation-Model-driven sample exploring module to meticulously select LiDAR-Image pairs from unexplored frames.
Our method consistently outperforms existing state-of-the-art pretraining frameworks across three major public autonomous driving datasets.
arXiv Detail & Related papers (2024-07-10T08:46:29Z)
- Class-Imbalanced Semi-Supervised Learning for Large-Scale Point Cloud Semantic Segmentation via Decoupling Optimization [64.36097398869774]
Semi-supervised learning (SSL) has been an active research topic for large-scale 3D scene understanding.
The existing SSL-based methods suffer from severe training bias due to class imbalance and long-tail distributions of the point cloud data.
We introduce a new decoupling optimization framework, which disentangles feature representation learning and the classifier in an alternating optimization manner to shift the biased decision boundary effectively.
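A schematic of such decoupled, alternating optimization is sketched below; the phase schedule and any class re-balancing are placeholders, not the paper's algorithm.

```python
import torch
import torch.nn.functional as F

def decoupled_training_step(backbone, classifier, points, labels,
                            opt_backbone, opt_classifier, update_classifier: bool):
    """One alternating step: update either the backbone or the classifier, not both."""
    feats = backbone(points)
    logits = classifier(feats)
    loss = F.cross_entropy(logits, labels)

    if update_classifier:
        # Classifier phase: the representation is left untouched while the
        # decision boundary is adjusted (e.g. with re-balanced sampling).
        opt_classifier.zero_grad()
        loss.backward()
        opt_classifier.step()
    else:
        # Representation phase: classifier parameters are left untouched
        # while the feature extractor is refined.
        opt_backbone.zero_grad()
        loss.backward()
        opt_backbone.step()
    return loss.item()
```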
arXiv Detail & Related papers (2024-01-13T04:16:40Z)
- 3D Open-Vocabulary Panoptic Segmentation with 2D-3D Vision-Language Distillation [40.49322398635262]
We propose the first method to tackle 3D open-vocabulary panoptic segmentation.
Our model takes advantage of the fusion between learnable LiDAR features and dense frozen vision CLIP features.
We propose two novel loss functions: object-level distillation loss and voxel-level distillation loss.
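One common way to realize a voxel-level feature-distillation term is a cosine objective between paired features; the sketch below illustrates that idea only and is not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def voxel_distillation_loss(lidar_feats, clip_feats):
    """Pull per-voxel LiDAR features toward paired frozen vision-language features.

    lidar_feats: (V, D) learnable features for V voxels
    clip_feats:  (V, D) frozen image features projected onto the same voxels
    """
    # 1 - cosine similarity, averaged over voxels; the CLIP branch is detached
    # so gradients only update the LiDAR network.
    sim = F.cosine_similarity(lidar_feats, clip_feats.detach(), dim=1)
    return (1.0 - sim).mean()
```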
arXiv Detail & Related papers (2024-01-04T18:39:32Z)
- S4C: Self-Supervised Semantic Scene Completion with Neural Fields [54.35865716337547]
3D semantic scene understanding is a fundamental challenge in computer vision.
Current methods for semantic scene completion (SSC) are generally trained on 3D ground truth based on aggregated LiDAR scans.
Our work presents the first self-supervised approach to SSC called S4C that does not rely on 3D ground truth data.
arXiv Detail & Related papers (2023-10-11T14:19:05Z)
- Generalized Few-Shot 3D Object Detection of LiDAR Point Cloud for Autonomous Driving [91.39625612027386]
We propose a novel task, called generalized few-shot 3D object detection, where we have a large amount of training data for common (base) objects, but only a few data for rare (novel) classes.
Specifically, we analyze in-depth differences between images and point clouds, and then present a practical principle for the few-shot setting in the 3D LiDAR dataset.
To solve this task, we propose an incremental fine-tuning method to extend existing 3D detection models to recognize both common and rare objects.
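One ingredient of such incremental fine-tuning, growing a classification head so that novel classes are added without discarding the base-class weights, might look roughly like this (illustrative only, not the authors' recipe):

```python
import torch
import torch.nn as nn

def extend_classifier(old_head: nn.Linear, num_novel: int) -> nn.Linear:
    """Grow a classification head from base classes to base + novel classes.

    Base-class weights are copied over and the novel rows start from scratch;
    assumes the head is a plain nn.Linear with a bias term.
    """
    new_head = nn.Linear(old_head.in_features, old_head.out_features + num_novel)
    with torch.no_grad():
        new_head.weight[: old_head.out_features] = old_head.weight
        new_head.bias[: old_head.out_features] = old_head.bias
    return new_head
```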
arXiv Detail & Related papers (2023-02-08T07:11:36Z)
- ALSO: Automotive Lidar Self-supervision by Occupancy estimation [70.70557577874155]
We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds.
The core idea is to train the model on a pretext task which is the reconstruction of the surface on which the 3D points are sampled.
The intuition is that if the network is able to reconstruct the scene surface, given only sparse input points, then it probably also captures some fragments of semantic information.
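A schematic of such an occupancy-style pretext objective is shown below; the encoder/decoder interfaces and the label construction are assumptions made for illustration, not the ALSO training pipeline.

```python
import torch
import torch.nn.functional as F

def occupancy_pretext_loss(encoder, decoder, points, queries, occ_labels):
    """Self-supervised surface-reconstruction pretext task (schematic).

    points:     (N, 3) sparse input LiDAR points
    queries:    (Q, 3) query locations sampled near the surface and in free space
    occ_labels: (Q,)   1 if a query lies on/inside the surface, 0 if in free space
    """
    scene_feat = encoder(points)               # latent scene representation
    occ_logits = decoder(scene_feat, queries)  # per-query occupancy score
    # If the backbone can tell occupied from free space given only sparse points,
    # it has likely captured useful geometric and semantic structure.
    return F.binary_cross_entropy_with_logits(occ_logits, occ_labels.float())
```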
arXiv Detail & Related papers (2022-12-12T13:10:19Z)
- Unsupervised Multi-View Object Segmentation Using Radiance Field Propagation [55.9577535403381]
We present a novel approach to segmenting objects in 3D during reconstruction given only unlabeled multi-view images of a scene.
The core of our method is a novel propagation strategy for individual objects' radiance fields with a bidirectional photometric loss.
To the best of our knowledge, RFP (radiance field propagation) is the first unsupervised approach to 3D scene object segmentation for neural radiance fields (NeRF).
arXiv Detail & Related papers (2022-10-02T11:14:23Z)
- 3D Registration for Self-Occluded Objects in Context [66.41922513553367]
We introduce the first deep learning framework capable of effectively handling this self-occlusion scenario.
Our method consists of an instance segmentation module followed by a pose estimation one.
It allows us to perform 3D registration in a one-shot manner, without requiring an expensive iterative procedure.
arXiv Detail & Related papers (2020-11-23T08:05:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.