SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction
- URL: http://arxiv.org/abs/2311.12754v2
- Date: Wed, 29 Nov 2023 15:19:38 GMT
- Title: SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction
- Authors: Yuanhui Huang, Wenzhao Zheng, Borui Zhang, Jie Zhou, Jiwen Lu
- Abstract summary: We propose SelfOcc to explore a self-supervised way to learn 3D occupancy using only video sequences.
We first transform the images into the 3D space (e.g., bird's eye view) to obtain a 3D representation of the scene.
We can then render 2D images of previous and future frames as self-supervision signals to learn the 3D representations.
- Score: 77.15924044466976
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D occupancy prediction is an important task for the robustness of
vision-centric autonomous driving; it aims to predict whether each point in the
surrounding 3D space is occupied. Existing methods usually require 3D
occupancy labels to produce meaningful results. However, it is very laborious
to annotate the occupancy status of each voxel. In this paper, we propose
SelfOcc to explore a self-supervised way to learn 3D occupancy using only video
sequences. We first transform the images into the 3D space (e.g., bird's eye
view) to obtain a 3D representation of the scene. We directly impose constraints
on the 3D representations by treating them as signed distance fields. We can
then render 2D images of previous and future frames as self-supervision signals
to learn the 3D representations. We propose an MVS-embedded strategy to
directly optimize the SDF-induced weights with multiple depth proposals. Our
SelfOcc outperforms the previous best method SceneRF by 58.7% using a single
frame as input on SemanticKITTI and is the first self-supervised work that
produces reasonable 3D occupancy for surround cameras on nuScenes. SelfOcc
produces high-quality depth and achieves state-of-the-art results on novel
depth synthesis, monocular depth estimation, and surround-view depth estimation
on SemanticKITTI, KITTI-2015, and nuScenes, respectively. Code:
https://github.com/huang-yh/SelfOcc.
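As a concrete illustration of the SDF-to-rendering step described above, here is a minimal NumPy sketch that turns SDF samples along a camera ray into volume-rendering weights and a rendered depth. It assumes a NeuS-style sigmoid conversion with a hypothetical `sharpness` parameter; SelfOcc's exact weight function and its MVS-embedded optimization over multiple depth proposals are not reproduced here.

```python
import numpy as np

def sdf_to_render_weights(sdf, sharpness=10.0):
    """NeuS-style conversion (assumed, not SelfOcc's exact form):
    SDF samples along a ray (near -> far) -> compositing weights."""
    # Sigmoid of the scaled SDF; it drops as the ray crosses the surface.
    phi = 1.0 / (1.0 + np.exp(-sharpness * sdf))
    # Opacity from the decrease of phi between consecutive samples.
    alpha = np.clip((phi[:-1] - phi[1:]) / (phi[:-1] + 1e-8), 0.0, 1.0)
    # Transmittance: probability the ray is unoccluded up to each sample.
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alpha[:-1])))
    return alpha * trans

t = np.linspace(0.0, 4.0, 64)               # sample depths along one ray
weights = sdf_to_render_weights(2.0 - t)    # toy SDF: surface at depth 2.0
depth = np.sum(weights * 0.5 * (t[:-1] + t[1:]))
print(f"rendered depth: {depth:.2f}")       # ~2.00, the SDF zero crossing
```

The rendered depth (and analogously rendered color from previous and future frames) can then be compared against the observed video frames, which is the self-supervision signal that replaces voxel-wise occupancy labels.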
Related papers
- OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving [67.49461023261536]
We present OccWorld, a new framework for learning a world model in the 3D occupancy space.
We simultaneously predict the movement of the ego car and the evolution of the surrounding scenes.
OccWorld produces competitive planning results without using instance and map supervision.
arXiv Detail & Related papers (2023-11-27T17:59:41Z)
- PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm [114.47216525866435]
We introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representations.
PonderV2 achieves state-of-the-art performance on 11 indoor and outdoor benchmarks for the first time, demonstrating its effectiveness.
arXiv Detail & Related papers (2023-10-12T17:59:57Z)
- RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision [36.15913507034939]
We present RenderOcc, a novel paradigm for training 3D occupancy models only using 2D labels.
Specifically, we extract a NeRF-style 3D volume representation from multi-view images.
We employ volume rendering techniques to generate 2D renderings, enabling direct 3D supervision from 2D semantic and depth labels.
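As a rough sketch of how 2D labels can supervise a 3D volume through rendering (a generic alpha-compositing illustration, not RenderOcc's actual code; the function and argument names are hypothetical):

```python
import numpy as np

def render_ray(density, sem_logits, t_vals):
    """Alpha-composite per-sample density and semantics along one ray."""
    delta = np.diff(t_vals, append=t_vals[-1] + 1e10)  # spacing between samples
    alpha = 1.0 - np.exp(-density * delta)             # per-sample opacity
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alpha[:-1])))
    w = alpha * trans                                  # compositing weights
    depth = np.sum(w * t_vals)                         # rendered depth
    semantics = w @ sem_logits                         # rendered class logits
    return depth, semantics
```

A 2D depth label then supervises `depth` directly, and a 2D semantic label supervises `semantics` via cross-entropy, so the voxel volume is trained without any 3D annotation.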
arXiv Detail & Related papers (2023-09-18T06:08:15Z)
- SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving [98.74706005223685]
3D scene understanding plays a vital role in vision-based autonomous driving.
We propose SurroundOcc, a method to predict 3D occupancy from multi-camera images.
arXiv Detail & Related papers (2023-03-16T17:59:08Z)
- Gait Recognition in the Wild with Dense 3D Representations and A Benchmark [86.68648536257588]
Existing studies of gait recognition are dominated by 2D representations, such as silhouettes or skeletons of the human body, captured in constrained scenes.
This paper aims to explore dense 3D representations for gait recognition in the wild.
We build the first large-scale 3D representation-based gait recognition dataset, named Gait3D.
arXiv Detail & Related papers (2022-04-06T03:54:06Z)
- Unsupervised Learning of Visual 3D Keypoints for Control [104.92063943162896]
Learning sensorimotor control policies from high-dimensional images crucially relies on the quality of the underlying visual representations.
We propose a framework to learn such a 3D geometric structure directly from images in an end-to-end unsupervised manner.
These discovered 3D keypoints tend to meaningfully capture robot joints as well as object movements in a consistent manner across both time and 3D space.
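A common mechanism behind such keypoint bottlenecks is a differentiable spatial soft-argmax over predicted heatmaps; the sketch below (hypothetical names, not the paper's exact model) extracts one 3D keypoint from a heatmap and a per-pixel depth prediction:

```python
import numpy as np

def soft_argmax_3d(heatmap, depthmap):
    """Differentiable 3D keypoint: soft-argmax for (x, y), expected depth for z."""
    h, w = heatmap.shape
    p = np.exp(heatmap - heatmap.max())
    p /= p.sum()                              # soft attention over pixels
    ys, xs = np.mgrid[0:h, 0:w]
    x = 2.0 * (p * xs).sum() / (w - 1) - 1.0  # normalized x in [-1, 1]
    y = 2.0 * (p * ys).sum() / (h - 1) - 1.0  # normalized y in [-1, 1]
    z = (p * depthmap).sum()                  # expected depth under p
    return np.array([x, y, z])
```

Because every step is differentiable, gradients from a downstream control or reconstruction loss can shape the keypoints without any keypoint labels.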
arXiv Detail & Related papers (2021-06-14T17:59:59Z)
- Ground-aware Monocular 3D Object Detection for Autonomous Driving [6.5702792909006735]
Estimating the 3D position and orientation of objects in the environment with a single RGB camera is a challenging task for low-cost urban autonomous driving and mobile robots.
Most existing algorithms are based on geometric constraints from 2D-3D correspondences, which stem from generic 6D object pose estimation.
We introduce a novel neural network module to fully utilize such application-specific priors in the framework of deep learning.
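One concrete example of such an application-specific prior is flat-ground geometry: for a level pinhole camera at known height, every image row below the horizon corresponds to a unique ground-plane depth. A minimal sketch (the flat-road assumption and the parameter values are illustrative, not the paper's module):

```python
import numpy as np

def ground_plane_depth(v, fy, cy, cam_height=1.65):
    """Depth (m) where the ray through image row v hits a flat road,
    for a zero-pitch camera; cam_height is an illustrative value."""
    rows_below_horizon = v - cy  # horizon lies at row cy when pitch is zero
    if rows_below_horizon <= 0:
        return np.inf            # at/above the horizon the ray never hits ground
    return fy * cam_height / rows_below_horizon

# KITTI-like intrinsics (assumed values)
print(f"{ground_plane_depth(v=300.0, fy=721.5, cy=187.0):.1f} m")  # ~10.5 m
```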
arXiv Detail & Related papers (2021-02-01T08:18:24Z)