Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene
Contexts
- URL: http://arxiv.org/abs/2012.09165v1
- Date: Wed, 16 Dec 2020 18:59:26 GMT
- Title: Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene
Contexts
- Authors: Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie
- Abstract summary: Contrastive Scene Contexts is a 3D pre-training method that makes use of both point-level correspondences and spatial contexts in a scene.
Our study reveals that exhaustive labelling of 3D point clouds might be unnecessary.
On ScanNet, even using 0.1% of point labels, we still achieve 89% (instance segmentation) and 96% (semantic segmentation) of the baseline performance that uses full annotations.
- Score: 21.201984953068614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid progress in 3D scene understanding has come with growing demand for
data; however, collecting and annotating 3D scenes (e.g. point clouds) are
notoriously hard. For example, the number of scenes (e.g. indoor rooms) that
can be accessed and scanned might be limited; even given sufficient data,
acquiring 3D labels (e.g. instance masks) requires intensive human labor. In
this paper, we explore data-efficient learning for 3D point clouds. As a first
step in this direction, we propose Contrastive Scene Contexts, a 3D
pre-training method that makes use of both point-level correspondences and
spatial contexts in a scene. Our method achieves state-of-the-art results on a
suite of benchmarks where training data or labels are scarce. Our study reveals
that exhaustive labelling of 3D point clouds might be unnecessary; and
remarkably, on ScanNet, even using 0.1% of point labels, we still achieve 89%
(instance segmentation) and 96% (semantic segmentation) of the baseline
performance that uses full annotations.
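To make the mechanism concrete, below is a minimal sketch of the idea the abstract describes: a PointInfoNCE-style contrastive loss over matched point pairs from two views of a scene, computed separately within spatial-context partitions so that negatives are drawn from the same region of the scene. The partitioning itself (the paper bins matches by relative angle and distance) is abstracted into a precomputed `partition_ids` tensor; all names and the exact loss form are illustrative, not the authors' code.

```python
# Minimal sketch of a scene-context-partitioned contrastive loss, assuming:
#   feats_a, feats_b: (N, D) unit-normalized features for N matched points
#                     across two views of the same scene (row i = positive pair)
#   partition_ids:    (N,) spatial-context bin of each match, precomputed
#                     (the paper bins by relative angle/distance; abstracted here)
import torch
import torch.nn.functional as F

def scene_context_contrastive_loss(feats_a, feats_b, partition_ids,
                                   num_partitions, tau=0.07):
    losses = []
    for p in range(num_partitions):
        idx = (partition_ids == p).nonzero(as_tuple=True)[0]
        if idx.numel() < 2:          # a partition needs at least one negative
            continue
        a, b = feats_a[idx], feats_b[idx]
        logits = (a @ b.t()) / tau   # (n, n); diagonal entries are positives
        targets = torch.arange(idx.numel(), device=logits.device)
        losses.append(F.cross_entropy(logits, targets))
    if not losses:
        raise ValueError("no partition contained two or more matches")
    return torch.stack(losses).mean()
```

With `num_partitions=1` this collapses to a plain PointInfoNCE loss; the spatial context enters solely by restricting negatives to matches from the same partition.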
Related papers
- SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining [100.23919762298227]
We introduce SceneSplat, the first large-scale 3D indoor scene understanding approach that operates on 3DGS.
We also propose a self-supervised learning scheme that unlocks rich 3D feature learning from unlabeled scenes.
SceneSplat-7K is the first large-scale 3DGS dataset for indoor scenes, comprising 6868 scenes.
arXiv Detail & Related papers (2025-03-23T12:50:25Z)
- Bayesian Self-Training for Semi-Supervised 3D Segmentation [59.544558398992386]
3D segmentation is a core problem in computer vision, but densely labeling 3D point clouds for fully-supervised training remains too labor-intensive and expensive.
Semi-supervised training provides a more practical alternative, where only a small set of labeled data is given, accompanied by a larger unlabeled set (see the self-training sketch after this entry).
arXiv Detail & Related papers (2024-09-12T14:54:31Z)
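As a concrete illustration of the semi-supervised setting this entry describes, here is a minimal self-training round: fit on the labeled set, pseudo-label the unlabeled set, keep only confident predictions, and refit. The paper's contribution is a Bayesian uncertainty estimate in place of the plain softmax-confidence filter used here; `train_fn`, `model`, and the data containers are hypothetical stand-ins.

```python
# Hypothetical self-training round for semi-supervised 3D segmentation.
# `train_fn`, `model`, and the (points, labels) containers are stand-ins,
# not the authors' API.
import torch

def self_training_round(model, labeled, unlabeled, train_fn, conf_thresh=0.9):
    train_fn(model, labeled)                       # fit on the small labeled set
    pseudo = []
    model.eval()
    with torch.no_grad():
        for points in unlabeled:                   # points: (N, C) point cloud
            probs = model(points).softmax(dim=-1)  # (N, num_classes)
            conf, labels = probs.max(dim=-1)
            keep = conf > conf_thresh              # drop low-confidence points
            pseudo.append((points[keep], labels[keep]))
    train_fn(model, labeled + pseudo)              # refit on labels + pseudo-labels
    return model
```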
- U3DS$^3$: Unsupervised 3D Semantic Scene Segmentation [19.706172244951116]
This paper presents U3DS$^3$, a step towards completely unsupervised point cloud segmentation for any holistic 3D scene.
The initial step of our proposed approach involves generating superpoints based on the geometric characteristics of each scene.
Learning then proceeds via spatial clustering, followed by iterative training on pseudo-labels derived from the cluster centroids (see the sketch after this entry).
arXiv Detail & Related papers (2023-11-10T12:05:35Z)
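The clustering-to-pseudo-label step can be sketched generically: cluster per-point (or per-superpoint) features and treat each point's cluster id as its pseudo-label for the next training round. K-means below is a stand-in for the paper's spatial clustering, not their implementation.

```python
# Generic clustering-to-pseudo-label step; k-means is a stand-in for the
# paper's spatial clustering, and `feats` may be per-point or per-superpoint.
import numpy as np

def kmeans_pseudo_labels(feats, k, iters=20, seed=0):
    """feats: (N, D) float features -> (N,) integer pseudo-labels."""
    rng = np.random.default_rng(seed)
    centroids = feats[rng.choice(len(feats), size=k, replace=False)].copy()
    labels = np.zeros(len(feats), dtype=np.int64)
    for _ in range(iters):
        # assign each point to its nearest centroid
        dists = ((feats[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        # recompute centroids, skipping clusters that emptied
        for c in range(k):
            if (labels == c).any():
                centroids[c] = feats[labels == c].mean(axis=0)
    return labels  # train against these, then re-cluster and repeat
```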
- SGRec3D: Self-Supervised 3D Scene Graph Learning via Object-Level Scene Reconstruction [16.643252717745348]
We present SGRec3D, a novel self-supervised pre-training method for 3D scene graph prediction.
Pre-training SGRec3D does not require object relationship labels, making it possible to exploit large-scale 3D scene understanding datasets.
Our experiments demonstrate that, in contrast to recent point cloud-based pre-training approaches, our proposed pre-training improves 3D scene graph prediction considerably.
arXiv Detail & Related papers (2023-09-27T14:45:29Z)
- Lowis3D: Language-Driven Open-World Instance-Level 3D Scene Understanding [57.47315482494805]
Open-world instance-level scene understanding aims to locate and recognize unseen object categories that are not present in the annotated dataset.
This task is challenging because the model needs to both localize novel 3D objects and infer their semantic categories.
We propose to harness pre-trained vision-language (VL) foundation models that encode extensive knowledge from image-text pairs to generate captions for 3D scenes.
arXiv Detail & Related papers (2023-08-01T07:50:14Z)
- OpenScene: 3D Scene Understanding with Open Vocabularies [73.1411930820683]
Traditional 3D scene understanding approaches rely on labeled 3D datasets to train a model for a single task with supervision.
We propose OpenScene, an alternative approach where a model predicts dense features for 3D scene points that are co-embedded with text and image pixels in CLIP feature space.
This zero-shot approach enables task-agnostic training and open-vocabulary queries (see the sketch after this entry).
arXiv Detail & Related papers (2022-11-28T18:58:36Z)
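Once per-point features share CLIP's embedding space, an open-vocabulary query reduces to a similarity test: encode the text with a CLIP text encoder and threshold the cosine similarity against each point's feature. `point_feats`, `text_feat`, and the threshold below are illustrative assumptions, not OpenScene's API.

```python
# Sketch of an open-vocabulary point query in CLIP space. `point_feats` is
# assumed to come from the co-embedded 3D encoder and `text_feat` from a CLIP
# text encoder; the threshold is illustrative.
import torch
import torch.nn.functional as F

def query_points(point_feats, text_feat, thresh=0.25):
    """point_feats: (N, D); text_feat: (D,) -> boolean mask of matching points."""
    sim = F.normalize(point_feats, dim=-1) @ F.normalize(text_feat, dim=0)
    return sim > thresh  # (N,) True where a point matches the text query
```

Because nothing task-specific is baked into the features, the same model can answer arbitrary text queries without retraining.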
- Interactive Object Segmentation in 3D Point Clouds [27.88495480980352]
We present an interactive 3D object segmentation method in which the user interacts directly with the 3D point cloud.
Our model does not require training data from the target domain.
It performs well on several other datasets with different data characteristics as well as different object classes.
arXiv Detail & Related papers (2022-04-14T18:31:59Z)
- Weakly Supervised Learning of Rigid 3D Scene Flow [81.37165332656612]
We propose a data-driven scene flow estimation algorithm exploiting the observation that many 3D scenes can be explained by a collection of agents moving as rigid bodies.
We showcase the effectiveness and generalization capacity of our method on four different autonomous driving datasets.
arXiv Detail & Related papers (2021-02-17T18:58:02Z)
- PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding [107.02479689909164]
In this work, we aim at facilitating research on 3D representation learning.
We measure the effect of unsupervised pre-training on a large source set of 3D scenes.
arXiv Detail & Related papers (2020-07-21T17:59:22Z)
- Multi-Path Region Mining For Weakly Supervised 3D Semantic Segmentation on Point Clouds [67.0904905172941]
We propose a weakly supervised approach to predict point-level results using weak labels on 3D point clouds.
To the best of our knowledge, this is the first method that uses cloud-level weak labels on raw 3D space to train a point cloud semantic segmentation network.
arXiv Detail & Related papers (2020-03-29T14:13:29Z)