OVO: Open-Vocabulary Occupancy
- URL: http://arxiv.org/abs/2305.16133v2
- Date: Wed, 14 Jun 2023 17:30:54 GMT
- Title: OVO: Open-Vocabulary Occupancy
- Authors: Zhiyu Tan, Zichao Dong, Cheng Zhang, Weikun Zhang, Hang Ji, Hao Li
- Abstract summary: Semantic occupancy prediction aims to infer dense geometry and semantics of surroundings for an autonomous agent to operate safely in the 3D environment.
Existing occupancy prediction methods are almost entirely trained on human-annotated volumetric data.
This paper proposes Open Vocabulary Occupancy (OVO), a novel approach that allows semantic occupancy prediction of arbitrary classes without the need for 3D annotations during training.
- Score: 12.596828397087085
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Semantic occupancy prediction aims to infer dense geometry and semantics of
surroundings for an autonomous agent to operate safely in the 3D environment.
Existing occupancy prediction methods are almost entirely trained on
human-annotated volumetric data. Although of high quality, the generation of
such 3D annotations is laborious and costly, restricting them to a few specific
object categories in the training dataset. To address this limitation, this
paper proposes Open Vocabulary Occupancy (OVO), a novel approach that allows
semantic occupancy prediction of arbitrary classes without the need for 3D
annotations during training. Keys to our approach are (1) knowledge
distillation from a pre-trained 2D open-vocabulary segmentation model to the 3D
occupancy network, and (2) pixel-voxel filtering for high-quality training data
generation. The resulting framework is simple, compact, and compatible with
most state-of-the-art semantic occupancy prediction models. On NYUv2 and
SemanticKITTI datasets, OVO achieves competitive performance compared to
supervised semantic occupancy prediction approaches. Furthermore, we conduct
extensive analyses and ablation studies to offer insights into the design of
the proposed framework. Our code is publicly available at
https://github.com/dzcgaara/OVO.
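As a rough illustration of the two keys above, the minimal PyTorch sketch below shows one way 2D-to-3D knowledge distillation with pixel-voxel filtering could be wired up: voxel features from the occupancy network are pulled toward aligned per-pixel features from a frozen 2D open-vocabulary segmentation model, and only voxels whose matched pixels are confidently segmented contribute to the loss. The function name, inputs such as `vox2pix`, and the confidence threshold are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch (not the official OVO code): distill 2D open-vocabulary
# features into 3D voxel features with a simple pixel-voxel filter.
import torch
import torch.nn.functional as F


def distillation_loss(voxel_feats, pixel_feats, pixel_conf,
                      vox2pix, valid_mask, conf_thresh=0.5):
    """voxel_feats: (N, C) features from the 3D occupancy network.
    pixel_feats:  (H*W, C) features from a frozen 2D open-vocabulary model.
    pixel_conf:   (H*W,) per-pixel confidence of the 2D model.
    vox2pix:      (N,) index of the pixel each voxel projects to (assumed given).
    valid_mask:   (N,) True where the voxel projects inside the image.
    """
    # Pixel-voxel filtering: keep voxels that project into the image and
    # whose matched pixel is confidently segmented by the 2D model.
    keep = valid_mask & (pixel_conf[vox2pix] > conf_thresh)
    if keep.sum() == 0:
        return voxel_feats.new_zeros(())

    student = F.normalize(voxel_feats[keep], dim=-1)
    teacher = F.normalize(pixel_feats[vox2pix[keep]], dim=-1)

    # Knowledge distillation: pull voxel features toward the aligned
    # 2D open-vocabulary features (cosine distance).
    return (1.0 - (student * teacher).sum(dim=-1)).mean()
```

At inference time, open-vocabulary prediction would then amount to comparing the distilled voxel features against text embeddings of arbitrary class names, which is what removes the dependence on a fixed label set.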
Related papers
- Language Driven Occupancy Prediction [11.208411421996052]
We introduce LOcc, an effective and generalizable framework for open-vocabulary occupancy prediction.
Our pipeline presents a feasible way to exploit the valuable semantic information in images.
LOcc effectively uses the generated language ground truth to guide the learning of 3D language volume.
arXiv Detail & Related papers (2024-11-25T03:47:10Z)
- ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction [89.89610257714006]
Existing methods prioritize higher accuracy to cater to the demands of these tasks.
We introduce a series of targeted improvements for 3D semantic occupancy prediction and flow estimation.
Our purely temporal architecture, named ALOcc, achieves an optimal tradeoff between speed and accuracy.
arXiv Detail & Related papers (2024-11-12T11:32:56Z)
- OPUS: Occupancy Prediction Using a Sparse Set [64.60854562502523]
We present a framework to simultaneously predict occupied locations and classes using a set of learnable queries.
OPUS incorporates a suite of non-trivial strategies to enhance model performance.
Our lightest model achieves superior RayIoU on the Occ3D-nuScenes dataset at near 2x FPS, while our heaviest model surpasses previous best results by 6.1 RayIoU.
arXiv Detail & Related papers (2024-09-14T07:44:22Z)
- Semi-supervised 3D Semantic Scene Completion with 2D Vision Foundation Model Guidance [11.090775523892074]
We introduce a novel semi-supervised framework to alleviate the dependency on densely annotated data.
Our approach leverages 2D foundation models to generate essential 3D scene geometric and semantic cues.
Our method achieves up to 85% of the fully-supervised performance using only 10% labeled data.
arXiv Detail & Related papers (2024-08-21T12:13:18Z)
- OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks [75.10231099007494]
We introduce a self-supervised pretraining method, called OccFeat, for Bird's-Eye-View (BEV) segmentation networks.
With OccFeat, we pretrain a BEV network via occupancy prediction and feature distillation tasks.
Models pretrained with our method exhibit improved BEV semantic segmentation performance, particularly in low-data scenarios.
arXiv Detail & Related papers (2024-04-22T09:43:03Z)
- OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose OccNeRF, a method for training occupancy networks without 3D supervision.
We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range.
For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
arXiv Detail & Related papers (2023-12-14T18:58:52Z)
- SPOT: Scalable 3D Pre-training via Occupancy Prediction for Learning Transferable 3D Representations [76.45009891152178]
The pretraining-finetuning approach can alleviate the labeling burden by fine-tuning a pre-trained backbone across various downstream datasets and tasks.
We show, for the first time, that general representation learning can be achieved through the task of occupancy prediction.
Our findings will facilitate the understanding of LiDAR points and pave the way for future advancements in LiDAR pre-training.
arXiv Detail & Related papers (2023-09-19T11:13:01Z)
- Open-Set Semi-Supervised Learning for 3D Point Cloud Understanding [62.17020485045456]
It is commonly assumed in semi-supervised learning (SSL) that the unlabeled data are drawn from the same distribution as that of the labeled ones.
We propose to selectively utilize unlabeled data through sample weighting, so that only conducive unlabeled data would be prioritized (a toy sketch of this sample-weighting idea follows the list below).
arXiv Detail & Related papers (2022-05-02T16:09:17Z)
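The sample-weighting idea in the last entry above can be made concrete with a toy sketch: pseudo-label the unlabeled data with a teacher model, then weight each unlabeled sample by how trustworthy its pseudo-label looks, so unreliable (e.g. out-of-distribution) samples contribute little to the unsupervised loss. The confidence-based weighting below is an illustrative assumption, not the criterion proposed in that paper.

```python
# Toy sketch of confidence-based sample weighting for semi-supervised
# learning with possibly out-of-distribution unlabeled data.
import torch
import torch.nn.functional as F


def weighted_pseudo_label_loss(logits_u, teacher_logits_u, conf_thresh=0.9):
    """logits_u:         (B, K) student predictions on unlabeled samples.
    teacher_logits_u: (B, K) teacher predictions used to derive pseudo-labels.
    """
    probs = teacher_logits_u.softmax(dim=-1)
    conf, pseudo = probs.max(dim=-1)

    # Sample weighting: confident pseudo-labels get weight close to 1,
    # uncertain or likely out-of-distribution samples are suppressed toward 0.
    weights = torch.clamp((conf - conf_thresh) / (1.0 - conf_thresh), min=0.0)

    loss = F.cross_entropy(logits_u, pseudo, reduction="none")
    return (weights * loss).sum() / weights.sum().clamp(min=1e-6)
```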
This list is automatically generated from the titles and abstracts of the papers in this site.