VIBUS: Data-efficient 3D Scene Parsing with VIewpoint Bottleneck and
Uncertainty-Spectrum Modeling
- URL: http://arxiv.org/abs/2210.11472v1
- Date: Thu, 20 Oct 2022 17:59:57 GMT
- Title: VIBUS: Data-efficient 3D Scene Parsing with VIewpoint Bottleneck and
Uncertainty-Spectrum Modeling
- Authors: Beiwen Tian, Liyi Luo, Hao Zhao, Guyue Zhou
- Abstract summary: Training 3D scene parsing models with sparse supervision is an intriguing alternative.
We term this task data-efficient 3D scene parsing.
We propose an effective two-stage framework named VIBUS to resolve it.
- Score: 2.0624279915507047
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recently, 3D scene parsing with deep learning approaches has been a hot
topic. However, current fully-supervised methods require manually annotated
point-wise supervision, which is extremely user-unfriendly and time-consuming to
obtain. As such, training 3D scene parsing models with sparse
supervision is an intriguing alternative. We term this task data-efficient
3D scene parsing and propose an effective two-stage framework named VIBUS to
resolve it by exploiting the enormous number of unlabeled points. In the first stage, we
perform self-supervised representation learning on unlabeled points with the
proposed Viewpoint Bottleneck loss function. The loss function is derived from
an information bottleneck objective imposed on scenes under different
viewpoints, making the process of representation learning free of degradation
and sampling. In the second stage, pseudo labels are harvested from the sparse
labels based on uncertainty-spectrum modeling. By combining data-driven
uncertainty measures and 3D mesh spectrum measures (derived from normal
directions and geodesic distances), a robust local affinity metric is obtained.
Finite gamma/beta mixture models are used to decompose category-wise
distributions of these measures, leading to automatic selection of thresholds.
We evaluate VIBUS on the public ScanNet benchmark and achieve state-of-the-art
results on both the validation set and the online test server. Ablation studies show
that both Viewpoint Bottleneck and uncertainty-spectrum modeling bring
significant improvements. Codes and models are publicly available at
https://github.com/AIR-DISCOVER/VIBUS.
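The abstract does not spell out the Viewpoint Bottleneck loss, but one natural reading of an information-bottleneck objective over two viewpoints is a Barlow-Twins-style redundancy-reduction loss on per-point features. The PyTorch sketch below is illustrative only: the function name, the known-correspondence assumption, and the weight `lam` are ours, not the paper's.

```python
import torch

def viewpoint_bottleneck_loss(z_a, z_b, lam=0.005):
    """Illustrative information-bottleneck-style loss between two viewpoints.

    z_a, z_b: (N, D) per-point features of the same scene under two
    viewpoints, with point correspondence assumed known.
    """
    n, _ = z_a.shape
    # Standardize each feature dimension so the cross-correlation is well scaled.
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-6)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-6)
    c = (z_a.T @ z_b) / n  # (D, D) cross-correlation matrix
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()               # invariance
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()  # decorrelation
    return on_diag + lam * off_diag
```

Because the statistic is a D x D feature correlation rather than a point-to-point similarity, the objective needs neither negative sampling nor feature degradation, consistent with the abstract's claim that representation learning is "free of degradation and sampling".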
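For the second stage, the abstract only states that finite gamma/beta mixture models decompose category-wise distributions of the uncertainty and spectrum measures so that thresholds can be selected automatically. Below is a minimal sketch of that idea for a two-component gamma mixture; the moment-matching M-step, the component count, and all names are our simplifications rather than the paper's exact procedure.

```python
import numpy as np
from scipy.stats import gamma

def fit_gamma_mixture(x, n_iter=50):
    """EM for a 2-component gamma mixture, with a method-of-moments M-step.

    x: 1-D array of positive per-point measures (e.g. uncertainty scores)
    for a single category.
    """
    lo, hi = np.percentile(x, [25, 75])
    params = [gamma.fit(x[x <= lo], floc=0), gamma.fit(x[x >= hi], floc=0)]
    weights = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        pdfs = np.stack([w * gamma.pdf(x, a, loc=l, scale=s)
                         for w, (a, l, s) in zip(weights, params)])
        resp = pdfs / (pdfs.sum(0, keepdims=True) + 1e-12)
        # M-step: weighted moments give shape k = m^2/v and scale theta = v/m.
        params = []
        for r in resp:
            m = np.average(x, weights=r)
            v = np.average((x - m) ** 2, weights=r)
            params.append((m * m / v, 0.0, v / m))
        weights = resp.mean(1)
    return params, weights

def select_threshold(params, weights, grid):
    """Threshold = first grid point where the high-measure component dominates."""
    p0, p1 = (w * gamma.pdf(grid, a, loc=l, scale=s)
              for w, (a, l, s) in zip(weights, params))
    crossing = np.where(p1 > p0)[0]
    return grid[crossing[0]] if len(crossing) else grid[-1]
```

Per category, points falling on the reliable side of the selected threshold would then be promoted to pseudo labels; beta mixtures would play the same role for measures bounded in [0, 1].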
Related papers
- Dual-Perspective Knowledge Enrichment for Semi-Supervised 3D Object
Detection [55.210991151015534]
We present a novel Dual-Perspective Knowledge Enrichment approach named DPKE for semi-supervised 3D object detection.
Our DPKE enriches the knowledge of limited training data, particularly unlabeled data, from two perspectives: the data perspective and the feature perspective.
arXiv Detail & Related papers (2024-01-10T08:56:07Z)
- RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z)
- Augment and Criticize: Exploring Informative Samples for Semi-Supervised Monocular 3D Object Detection [64.65563422852568]
We improve the challenging monocular 3D object detection problem with a general semi-supervised framework.
We introduce a novel, simple, yet effective 'Augment and Criticize' framework that explores abundant informative samples from unlabeled data.
The two new detectors, dubbed 3DSeMo_DLE and 3DSeMo_FLEX, achieve state-of-the-art results with remarkable improvements of over 3.5% AP_3D/BEV (Easy) on KITTI.
arXiv Detail & Related papers (2023-03-20T16:28:15Z)
- GLENet: Boosting 3D Object Detectors with Generative Label Uncertainty Estimation [70.75100533512021]
In this paper, we formulate the label uncertainty problem as the diversity of potentially plausible bounding boxes of objects.
We propose GLENet, a generative framework adapted from conditional variational autoencoders, to model the one-to-many relationship between a typical 3D object and its potential ground-truth bounding boxes with latent variables.
GLENet's label uncertainty estimation is a plug-and-play module and can be conveniently integrated into existing deep 3D detectors.
arXiv Detail & Related papers (2022-07-06T06:26:17Z)
- 3D Object Detection with a Self-supervised Lidar Scene Flow Backbone [10.341296683155973]
We propose using a self-supervised training strategy to learn a general point cloud backbone model for downstream 3D vision tasks.
Our main contribution leverages learned flow and motion representations and combines a self-supervised backbone with a 3D detection head.
Experiments on KITTI and nuScenes benchmarks show that the proposed self-supervised pre-training increases 3D detection performance significantly.
arXiv Detail & Related papers (2022-05-02T07:53:29Z)
- Pointly-supervised 3D Scene Parsing with Viewpoint Bottleneck [3.2790748006553643]
Given that point-wise semantic annotation is expensive, in this paper, we address the challenge of learning models with extremely sparse labels.
We propose a self-supervised 3D representation learning framework named viewpoint bottleneck.
arXiv Detail & Related papers (2021-09-17T13:54:20Z)
- Semi-supervised 3D Object Detection via Adaptive Pseudo-Labeling [18.209409027211404]
3D object detection is an important task in computer vision.
Most existing methods require a large number of high-quality 3D annotations, which are expensive to collect.
We propose a novel semi-supervised framework based on pseudo-labeling for outdoor 3D object detection tasks.
arXiv Detail & Related papers (2021-08-15T02:58:43Z)
- Delving into Localization Errors for Monocular 3D Object Detection [85.77319416168362]
Estimating 3D bounding boxes from monocular images is an essential component in autonomous driving.
In this work, we quantify the impact introduced by each sub-task and find that the 'localization error' is the vital factor restricting monocular 3D detection.
arXiv Detail & Related papers (2021-03-30T10:38:01Z)
- SA-Det3D: Self-Attention Based Context-Aware 3D Object Detection [9.924083358178239]
We propose two variants of self-attention for contextual modeling in 3D object detection.
We first incorporate the pairwise self-attention mechanism into the current state-of-the-art BEV, voxel and point-based detectors.
Next, we propose a self-attention variant that samples a subset of the most representative features by learning deformations over randomly sampled locations.
arXiv Detail & Related papers (2021-01-07T18:30:32Z)
- 3DIoUMatch: Leveraging IoU Prediction for Semi-Supervised 3D Object Detection [76.42897462051067]
3DIoUMatch is a novel semi-supervised method for 3D object detection applicable to both indoor and outdoor scenes.
We leverage a teacher-student mutual learning framework to propagate information from the labeled to the unlabeled train set in the form of pseudo-labels.
Our method consistently improves state-of-the-art methods on both ScanNet and SUN-RGBD benchmarks by significant margins under all label ratios.
arXiv Detail & Related papers (2020-12-08T11:06:26Z)
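Several entries above (3DIoUMatch, the adaptive pseudo-labeling framework) rely on the same core loop: an EMA teacher labels the unlabeled set and only detections passing confidence filters survive. The sketch below shows that generic mean-teacher loop, not any single paper's recipe; the model interface and threshold values are hypothetical.

```python
import torch

@torch.no_grad()
def ema_update(teacher, student, m=0.999):
    """Exponential-moving-average teacher update (standard mean-teacher step)."""
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(m).add_(ps, alpha=1 - m)

@torch.no_grad()
def harvest_pseudo_labels(teacher, unlabeled_batch, cls_thr=0.9, iou_thr=0.25):
    """Keep teacher detections whose class score and predicted IoU are both high."""
    # Hypothetical interface: the teacher returns boxes, class scores, and a
    # predicted-IoU score per box (the filter 3DIoUMatch's title refers to).
    boxes, scores, pred_ious = teacher(unlabeled_batch)
    keep = (scores > cls_thr) & (pred_ious > iou_thr)
    return boxes[keep]
```

The predicted-IoU filter is what distinguishes 3DIoUMatch: it screens out boxes that are confidently classified but poorly localized, which a class-score threshold alone would keep.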
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.