OVO: Open-Vocabulary Occupancy
- URL: http://arxiv.org/abs/2305.16133v2
- Date: Wed, 14 Jun 2023 17:30:54 GMT
- Authors: Zhiyu Tan, Zichao Dong, Cheng Zhang, Weikun Zhang, Hang Ji, Hao Li
- Abstract summary: Semantic occupancy prediction aims to infer dense geometry and semantics of the surroundings so that an autonomous agent can operate safely in the 3D environment.
Existing occupancy prediction methods are trained almost entirely on human-annotated volumetric data.
This paper proposes Open Vocabulary Occupancy (OVO), a novel approach that enables semantic occupancy prediction for arbitrary classes without the need for 3D annotations during training.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Semantic occupancy prediction aims to infer dense geometry and semantics of
surroundings for an autonomous agent to operate safely in the 3D environment.
Existing occupancy prediction methods are almost entirely trained on
human-annotated volumetric data. Although of high quality, the generation of
such 3D annotations is laborious and costly, restricting them to a few specific
object categories in the training dataset. To address this limitation, this
paper proposes Open Vocabulary Occupancy (OVO), a novel approach that enables
semantic occupancy prediction for arbitrary classes without the need for 3D
annotations during training. The keys to our approach are (1) knowledge
distillation from a pre-trained 2D open-vocabulary segmentation model to the 3D
occupancy network, and (2) pixel-voxel filtering for high-quality training data
generation. The resulting framework is simple, compact, and compatible with
most state-of-the-art semantic occupancy prediction models. On NYUv2 and
SemanticKITTI datasets, OVO achieves competitive performance compared to
supervised semantic occupancy prediction approaches. Furthermore, we conduct
extensive analyses and ablation studies to offer insights into the design of
the proposed framework. Our code is publicly available at
https://github.com/dzcgaara/OVO.
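The two keys above, distilling per-pixel open-vocabulary features into voxels and pixel-voxel filtering for high-quality training data, can be illustrated with a minimal NumPy sketch. Everything below (the function names, the pinhole projection model, and the depth-consistency threshold `tau`) is an assumption made for illustration, not the paper's actual implementation:

```python
import numpy as np

def pixel_voxel_filter(centers, K, depth, tau=0.2):
    # Project voxel centers (N, 3, camera frame) through a pinhole
    # intrinsic matrix K (3, 3), and keep only voxels whose projected
    # depth agrees with the observed depth map within tau.
    z = centers[:, 2]
    uvw = (K @ centers.T).T
    uv = uvw[:, :2] / uvw[:, 2:3]
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    H, W = depth.shape
    in_img = (u >= 0) & (u < W) & (v >= 0) & (v < H) & (z > 0)
    ok = np.zeros(len(centers), dtype=bool)
    idx = np.where(in_img)[0]
    ok[idx] = np.abs(depth[v[idx], u[idx]] - z[idx]) < tau
    return ok, u, v

def distill_and_classify(centers, K, depth, pix_feats, text_embs):
    # Copy per-pixel open-vocabulary features into the voxels that pass
    # the filter, then label each voxel by cosine similarity against a
    # set of class text embeddings; filtered-out voxels get label -1.
    ok, u, v = pixel_voxel_filter(centers, K, depth)
    feats = np.zeros((len(centers), pix_feats.shape[-1]))
    feats[ok] = pix_feats[v[ok], u[ok]]
    feats = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    text = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    labels = (feats @ text.T).argmax(axis=1)
    labels[~ok] = -1
    return labels
```

In an actual pipeline the 2D features would come from a pre-trained open-vocabulary segmentation model and the voxel features would be regressed by the 3D occupancy network; this toy version only shows how projection, depth filtering, and text-embedding classification fit together.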
Related papers
- OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks [75.10231099007494]
We introduce a self-supervised pretraining method, called OccFeat, for Bird's-Eye-View (BEV) segmentation networks.
With OccFeat, we pretrain a BEV network via occupancy prediction and feature distillation tasks.
Models pretrained with our method exhibit improved BEV semantic segmentation performance, particularly in low-data scenarios.
arXiv Detail & Related papers (2024-04-22T09:43:03Z)
- WildScenes: A Benchmark for 2D and 3D Semantic Segmentation in Large-scale Natural Environments [34.24004079703609]
We introduce WildScenes, a bi-modal benchmark dataset consisting of multiple large-scale traversals in natural environments.
The data is trajectory-centric with accurate localization and globally aligned point clouds.
We introduce benchmarks on 2D and 3D semantic segmentation and evaluate a variety of recent deep-learning techniques.
arXiv Detail & Related papers (2023-12-23T22:27:40Z)
- OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose an OccNeRF method for training occupancy networks without 3D supervision.
We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range.
For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
arXiv Detail & Related papers (2023-12-14T18:58:52Z)
- SPOT: Scalable 3D Pre-training via Occupancy Prediction for Learning Transferable 3D Representations [76.45009891152178]
The pretraining-finetuning approach can alleviate the labeling burden by fine-tuning a pre-trained backbone on various downstream datasets and tasks.
We show, for the first time, that general representation learning can be achieved through the task of occupancy prediction.
Our findings will facilitate the understanding of LiDAR points and pave the way for future advancements in LiDAR pre-training.
arXiv Detail & Related papers (2023-09-19T11:13:01Z)
- UniOcc: Unifying Vision-Centric 3D Occupancy Prediction with Geometric and Semantic Rendering [27.712689811093362]
We present our solution, named UniOCC, for the Vision-Centric 3D occupancy prediction track.
Our solution achieves 51.27% mIoU on the official leaderboard with a single model, placing 3rd in this challenge.
arXiv Detail & Related papers (2023-06-15T13:23:57Z)
- VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud [51.063494002003154]
3D semantic scene graph (3DSSG) prediction in the point cloud is challenging since the 3D point cloud only captures geometric structures with limited semantics compared to 2D images.
We propose a Visual-Linguistic Semantics Assisted Training scheme that can significantly empower 3DSSG prediction models with discrimination of long-tailed and ambiguous semantic relations.
arXiv Detail & Related papers (2023-03-25T09:14:18Z)
- ALSO: Automotive Lidar Self-supervision by Occupancy estimation [70.70557577874155]
We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds.
The core idea is to train the model on a pretext task which is the reconstruction of the surface on which the 3D points are sampled.
The intuition is that if the network is able to reconstruct the scene surface, given only sparse input points, then it probably also captures some fragments of semantic information.
arXiv Detail & Related papers (2022-12-12T13:10:19Z)
- Open-Set Semi-Supervised Learning for 3D Point Cloud Understanding [62.17020485045456]
It is commonly assumed in semi-supervised learning (SSL) that the unlabeled data are drawn from the same distribution as that of the labeled ones.
We propose to selectively utilize unlabeled data through sample weighting, so that only conducive unlabeled data is prioritized.
arXiv Detail & Related papers (2022-05-02T16:09:17Z)
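The self-supervised occupancy pretext shared by the SPOT and ALSO entries above amounts to deriving free training targets from the point cloud itself. A minimal sketch of one such target, a binary voxel occupancy grid computed from a raw point cloud (the function name and grid parameters are assumptions for illustration, not either paper's actual formulation):

```python
import numpy as np

def occupancy_targets(points, grid_min, voxel_size, dims):
    # Voxelize a raw point cloud (N, 3) into a binary occupancy grid.
    # Grids like this can serve as free self-supervised targets for
    # pretraining a perception backbone, with no human annotation.
    idx = np.floor((points - grid_min) / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < np.array(dims)), axis=1)
    grid = np.zeros(dims, dtype=bool)
    grid[tuple(idx[inside].T)] = True
    return grid
```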
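The sample-weighting idea in the last entry can be sketched with a generic confidence-based rule: keep unlabeled samples the model is confident about and zero out likely out-of-distribution ones. This is a hypothetical stand-in, not the paper's actual weighting scheme, and the threshold value is an assumption:

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def unlabeled_weights(logits, threshold=0.7):
    # Weight each unlabeled sample by its predictive confidence
    # (max softmax probability); samples below the threshold, which
    # are likely out-of-distribution, get weight 0.
    conf = softmax(logits).max(axis=1)
    return np.where(conf >= threshold, conf, 0.0)
```

The returned weights would multiply each unlabeled sample's loss term, so unconfident (potentially open-set) samples contribute nothing to training.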
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.