LeAP: Consistent multi-domain 3D labeling using Foundation Models
- URL: http://arxiv.org/abs/2502.03901v1
- Date: Thu, 06 Feb 2025 09:24:47 GMT
- Title: LeAP: Consistent multi-domain 3D labeling using Foundation Models
- Authors: Simon Gebraad, Andras Palffy, Holger Caesar
- Abstract summary: This work introduces Label Any Pointcloud (LeAP), leveraging 2D VFMs to automatically label 3D data with any set of classes in any kind of application.
We show that our method can generate high-quality 3D semantic labels across diverse fields without any manual labeling.
- Abstract: Availability of datasets is a strong driver for research on 3D semantic understanding, and whilst obtaining unlabeled 3D point cloud data is straightforward, manually annotating this data with semantic labels is time-consuming and costly. Recently, Vision Foundation Models (VFMs) enable open-set semantic segmentation on camera images, potentially aiding automatic labeling. However, VFMs for 3D data have been limited to adaptations of 2D models, which can introduce inconsistencies to 3D labels. This work introduces Label Any Pointcloud (LeAP), leveraging 2D VFMs to automatically label 3D data with any set of classes in any kind of application whilst ensuring label consistency. Using a Bayesian update, point labels are combined into voxels to improve spatio-temporal consistency. A novel 3D Consistency Network (3D-CN) exploits 3D information to further improve label quality. Through various experiments, we show that our method can generate high-quality 3D semantic labels across diverse fields without any manual labeling. Further, models adapted to new domains using our labels show up to a 34.2 mIoU increase in semantic segmentation tasks.
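The Bayesian voxel update described in the abstract can be illustrated with a short sketch: per-point class probabilities (e.g. obtained by projecting 2D VFM segmentations onto the point cloud) are fused into voxels as accumulated log-likelihoods, so repeated observations of the same voxel across space and time sharpen its label distribution. This is a minimal illustration under assumed names and parameters, not the authors' implementation.

```python
import numpy as np

def bayesian_voxel_update(points, point_probs, voxel_size=0.2, eps=1e-9):
    """Fuse per-point class probabilities into per-voxel label posteriors.

    A minimal sketch of a Bayesian voxel update (names and voxel size are
    illustrative assumptions, not LeAP's code).

    points:      (N, 3) float array of 3D coordinates.
    point_probs: (N, C) array of per-point class probabilities, e.g. from
                 projecting 2D VFM segmentations onto the point cloud.
    Returns a dict mapping voxel index -> (C,) posterior over classes.
    """
    # Quantize points to integer voxel grid coordinates.
    voxel_ids = np.floor(points / voxel_size).astype(np.int64)

    posteriors = {}
    for vid, probs in zip(map(tuple, voxel_ids), point_probs):
        # Accumulate log-likelihoods: with a uniform prior, multiplying the
        # per-observation likelihoods equals summing their logs, so each new
        # observation refines the voxel's posterior.
        log_p = np.log(probs + eps)
        posteriors[vid] = posteriors.get(vid, 0.0) + log_p

    # Normalize back to probabilities per voxel (stable softmax-style).
    for vid, log_p in posteriors.items():
        p = np.exp(log_p - log_p.max())
        posteriors[vid] = p / p.sum()
    return posteriors

# Usage: points observed over several frames vote into the same voxels,
# so spatio-temporally repeated evidence dominates one-off projection noise.
pts = np.random.rand(1000, 3) * 10.0
probs = np.random.dirichlet(np.ones(5), size=1000)   # 5 hypothetical classes
voxel_posteriors = bayesian_voxel_update(pts, probs)
labels = {v: int(np.argmax(p)) for v, p in voxel_posteriors.items()}
```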
Related papers
- 3DSES: an indoor Lidar point cloud segmentation dataset with real and pseudo-labels from a 3D model [1.7249361224827533]
We present 3DSES, a new dataset of indoor dense TLS colorized point clouds covering 427 m².
3DSES has a unique double annotation format: semantic labels annotated at the point level alongside a full 3D CAD model of the building.
We show that our model-to-cloud alignment can produce pseudo-labels on our point clouds with a > 95% accuracy, allowing us to train deep models with significant time savings.
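The model-to-cloud pseudo-labeling step can be pictured as a nearest-neighbor label transfer from the aligned, semantically tagged CAD model to the scanned points. The sketch below is a generic illustration under assumed names and thresholds, not the 3DSES pipeline itself.

```python
import numpy as np
from scipy.spatial import cKDTree

def transfer_labels(model_points, model_labels, scan_points, max_dist=0.05):
    """Assign each scanned point the label of its nearest point sampled from
    an aligned 3D model; points with no model surface within max_dist metres
    stay unlabeled (-1). A generic sketch, not the 3DSES implementation."""
    tree = cKDTree(model_points)             # index the model samples
    dist, idx = tree.query(scan_points, k=1)
    labels = model_labels[idx]
    labels[dist > max_dist] = -1             # reject weakly supported matches
    return labels
```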
arXiv Detail & Related papers (2025-01-29T10:09:32Z)
- Cross-Modal Self-Training: Aligning Images and Pointclouds to Learn Classification without Labels [69.55622471172941]
Large-scale 2D vision-language models such as CLIP can be aligned with a 3D encoder to learn generalizable (open-vocabulary) 3D vision models.
We propose Cross-MoST (Cross-Modal Self-Training), an optimization framework that improves the label-free classification performance of a zero-shot 3D vision model.
arXiv Detail & Related papers (2024-04-15T21:30:50Z)
- Weakly-Supervised 3D Scene Graph Generation via Visual-Linguistic Assisted Pseudo-labeling [9.440800948514449]
We propose a weakly-supervised 3D scene graph generation method via Visual-Linguistic Assisted Pseudo-labeling.
Our 3D-VLAP exploits the superior ability of current large-scale visual-linguistic models to align the semantics between texts and 2D images.
We design an edge self-attention based graph neural network to generate scene graphs of 3D point cloud scenes.
arXiv Detail & Related papers (2024-04-03T07:30:09Z)
- Weakly Supervised 3D Object Detection via Multi-Level Visual Guidance [72.6809373191638]
We propose a framework to study how to leverage constraints between 2D and 3D domains without requiring any 3D labels.
First, we design a feature-level constraint to align LiDAR and image features based on object-aware regions.
Second, the output-level constraint is developed to enforce the overlap between 2D and projected 3D box estimations.
Third, the training-level constraint is utilized by producing accurate and consistent 3D pseudo-labels that align with the visual data.
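The output-level constraint can be made concrete with a small sketch: project the eight corners of a 3D box estimate into the image with the camera intrinsics, take the enclosing rectangle, and measure its IoU against the 2D detection, which training then encourages to be high. The names and the axis-aligned simplification are assumptions for illustration, not the paper's exact loss.

```python
import numpy as np

def project_box_corners(corners_3d, K):
    """Project (8, 3) box corners given in camera coordinates with the 3x3
    intrinsic matrix K; returns (8, 2) pixel coordinates."""
    uvw = (K @ corners_3d.T).T           # (8, 3) homogeneous image points
    return uvw[:, :2] / uvw[:, 2:3]      # perspective divide

def projected_box_iou(corners_3d, box_2d, K):
    """IoU between a 2D detection (x1, y1, x2, y2) and the axis-aligned
    rectangle enclosing the projected 3D box estimate."""
    pts = project_box_corners(corners_3d, K)
    px1, py1 = pts.min(axis=0)
    px2, py2 = pts.max(axis=0)
    x1, y1, x2, y2 = box_2d
    iw = max(0.0, min(px2, x2) - max(px1, x1))   # horizontal intersection
    ih = max(0.0, min(py2, y2) - max(py1, y1))   # vertical intersection
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (x2 - x1) * (y2 - y1) - inter
    return inter / union if union > 0 else 0.0
```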
arXiv Detail & Related papers (2023-12-12T18:57:25Z)
- LABELMAKER: Automatic Semantic Label Generation from RGB-D Trajectories [59.14011485494713]
This work introduces a fully automated 2D/3D labeling framework that can generate labels for RGB-D scans at an accuracy equal to (or better than) existing human annotations.
We demonstrate the effectiveness of our LabelMaker pipeline by generating significantly better labels for the ScanNet datasets and automatically labeling the previously unlabeled ARKitScenes dataset.
arXiv Detail & Related papers (2023-11-20T20:40:24Z)
- DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields [68.94868475824575]
This paper introduces a novel approach capable of generating infinite, high-quality 3D-consistent 2D annotations alongside 3D point cloud segmentations.
We leverage the strong semantic prior within a 3D generative model to train a semantic decoder.
Once trained, the decoder efficiently generalizes across the latent space, enabling the generation of infinite data.
arXiv Detail & Related papers (2023-11-18T21:58:28Z)
- PanopticNeRF-360: Panoramic 3D-to-2D Label Transfer in Urban Scenes [54.49897326605168]
We present PanopticNeRF-360, a novel approach that combines coarse 3D annotations with noisy 2D semantic cues to generate high-quality panoptic labels.
Our experiments demonstrate PanopticNeRF-360's state-of-the-art performance over label transfer methods on the KITTI-360 dataset.
arXiv Detail & Related papers (2023-09-19T17:54:22Z)
- Panoptic NeRF: 3D-to-2D Label Transfer for Panoptic Urban Scene Segmentation [48.677336052620895]
We present a novel 3D-to-2D label transfer method, Panoptic NeRF, which aims to obtain per-pixel 2D semantic and instance labels.
By inferring in 3D space and rendering to 2D labels, our 2D semantic and instance labels are multi-view consistent by design.
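The 3D-to-2D transfer idea can be sketched naively: if every label lives in a shared 3D representation, rendering any viewpoint reads out the same labels, so the 2D maps are multi-view consistent by construction. The toy ray-caster below marches camera rays through a labeled voxel grid and returns the first occupied voxel's label per pixel; the actual method performs NeRF-style volume rendering, and all names and parameters here are assumptions.

```python
import numpy as np

def render_label_image(voxel_labels, origin, K, R, t, H, W,
                       voxel_size=0.1, t_max=20.0, step=0.05):
    """Render a per-pixel semantic label image from a labeled voxel grid.

    voxel_labels: dict mapping integer voxel index (i, j, k) -> class id.
    origin:       (3,) world position of the grid's corner.
    K, R, t:      3x3 intrinsics and world-to-camera rotation/translation.
    """
    label_img = np.full((H, W), -1, dtype=np.int64)
    Kinv = np.linalg.inv(K)
    cam_center = -R.T @ t                      # camera position in world frame
    for v in range(H):
        for u in range(W):
            ray_cam = Kinv @ np.array([u + 0.5, v + 0.5, 1.0])
            ray_world = R.T @ ray_cam
            ray_world /= np.linalg.norm(ray_world)
            # March along the ray; the first labeled voxel wins. Because the
            # label lives in 3D, every view of this voxel renders the same
            # class, giving multi-view-consistent 2D labels by construction.
            for d in np.arange(step, t_max, step):
                p = cam_center + d * ray_world
                idx = tuple(np.floor((p - origin) / voxel_size).astype(int))
                if idx in voxel_labels:
                    label_img[v, u] = voxel_labels[idx]
                    break
    return label_img
```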
arXiv Detail & Related papers (2022-03-29T04:16:40Z)