NeurOCS: Neural NOCS Supervision for Monocular 3D Object Localization
- URL: http://arxiv.org/abs/2305.17763v1
- Date: Sun, 28 May 2023 16:18:41 GMT
- Title: NeurOCS: Neural NOCS Supervision for Monocular 3D Object Localization
- Authors: Zhixiang Min, Bingbing Zhuang, Samuel Schulter, Buyu Liu, Enrique Dunn, Manmohan Chandraker
- Abstract summary: We present NeurOCS, a framework that uses instance masks and 3D boxes as input to learn 3D object shapes by means of differentiable rendering.
Our approach rests on insights in learning a category-level shape prior directly from real driving scenes.
We make critical design choices to learn object coordinates more effectively from an object-centric view.
- Score: 80.3424839706698
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Monocular 3D object localization in driving scenes is a crucial task, but
challenging due to its ill-posed nature. Estimating 3D coordinates for each
pixel on the object surface holds great potential as it provides dense 2D-3D
geometric constraints for the underlying PnP problem. However, high-quality
ground truth supervision is not available in driving scenes due to sparsity and
various artifacts of Lidar data, as well as the practical infeasibility of
collecting per-instance CAD models. In this work, we present NeurOCS, a
framework that uses instance masks and 3D boxes as input to learn 3D object
shapes by means of differentiable rendering, which further serves as
supervision for learning dense object coordinates. Our approach rests on
insights in learning a category-level shape prior directly from real driving
scenes, while properly handling single-view ambiguities. Furthermore, we study
and make critical design choices to learn object coordinates more effectively
from an object-centric view. Altogether, our framework leads to new
state-of-the-art in monocular 3D localization that ranks 1st on the
KITTI-Object benchmark among published monocular methods.
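To make the "dense 2D-3D geometric constraints for the underlying PnP problem" concrete, here is a minimal sketch in Python. It is not the paper's solver: it swaps in a plain linear DLT solve with NumPy, and it assumes that object coordinates are normalized to [-0.5, 0.5]^3 and scaled by the 3D box dimensions. The function names, the intrinsics `K`, the box size, and the pose are all hypothetical values chosen for illustration.

```python
import numpy as np


def nocs_to_object_points(nocs, box_dims):
    """Scale normalized coordinates (assumed in [-0.5, 0.5]^3) to metres."""
    return np.asarray(nocs, dtype=float) * np.asarray(box_dims, dtype=float)


def pnp_dlt(points_obj, pixels, K):
    """Recover (R, t) from 2D-3D correspondences with a linear DLT solve.

    Needs at least 6 non-coplanar points. A practical pipeline would add
    robust estimation and nonlinear refinement on top of this.
    """
    n = len(points_obj)
    # Remove the intrinsics: pixel -> normalized image coordinates (x, y, 1).
    pix_h = np.hstack([pixels, np.ones((n, 1))])
    rays = pix_h @ np.linalg.inv(K).T
    X_h = np.hstack([points_obj, np.ones((n, 1))])

    # Each correspondence contributes two rows of the system A p = 0,
    # where p holds the 12 entries of P = [R | t] in row-major order.
    A = np.zeros((2 * n, 12))
    A[0::2, 0:4] = X_h
    A[0::2, 8:12] = -rays[:, 0:1] * X_h
    A[1::2, 4:8] = X_h
    A[1::2, 8:12] = -rays[:, 1:2] * X_h

    # The solution is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)

    # Project the left 3x3 block onto a rotation and fix scale and sign.
    U, S, Wt = np.linalg.svd(P[:, :3])
    R = U @ Wt
    t = P[:, 3] / S.mean()
    if np.linalg.det(R) < 0:
        R, t = -R, -t
    return R, t


# Synthetic check with hypothetical camera, box size, and pose.
rng = np.random.default_rng(0)
K = np.array([[720.0, 0.0, 620.0],
              [0.0, 720.0, 190.0],
              [0.0, 0.0, 1.0]])
box_dims = np.array([3.9, 1.5, 1.6])          # assumed object size in metres
nocs = rng.uniform(-0.5, 0.5, size=(50, 3))   # stand-in for predicted coordinates

yaw = 0.4
R_true = np.array([[np.cos(yaw), 0.0, np.sin(yaw)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(yaw), 0.0, np.cos(yaw)]])
t_true = np.array([2.0, 1.0, 20.0])

points_obj = nocs_to_object_points(nocs, box_dims)
cam = points_obj @ R_true.T + t_true           # object frame -> camera frame
proj = cam @ K.T
pixels = proj[:, :2] / proj[:, 2:3]            # perspective division

R_est, t_est = pnp_dlt(points_obj, pixels, K)
```

On noise-free correspondences such as these, the recovered pose should match the ground truth up to numerical precision. With predicted coordinates, the linear solve is sensitive to outliers, which is why a robust or uncertainty-weighted solver would normally be used instead.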
Related papers
- General Geometry-aware Weakly Supervised 3D Object Detection [62.26729317523975]
A unified framework is developed for learning 3D object detectors from RGB images and associated 2D boxes.
Experiments on KITTI and SUN-RGBD datasets demonstrate that our method yields surprisingly high-quality 3D bounding boxes with only 2D annotation.
(2024-07-18T17:52:08Z)
- SUGAR: Pre-training 3D Visual Representations for Robotics [85.55534363501131]
We introduce a novel 3D pre-training framework for robotics named SUGAR.
SUGAR captures semantic, geometric and affordance properties of objects through 3D point clouds.
We show that SUGAR's 3D representation outperforms state-of-the-art 2D and 3D representations.
(2024-04-01T21:23:03Z)
- HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting [53.6394928681237]
Holistic understanding of urban scenes based on RGB images is a challenging yet important problem.
Our main idea involves the joint optimization of geometry, appearance, semantics, and motion using a combination of static and dynamic 3D Gaussians.
Our approach offers the ability to render new viewpoints in real-time, yielding 2D and 3D semantic information with high accuracy.
(2024-03-19T13:39:05Z)
- Learning 3D Scene Priors with 2D Supervision [37.79852635415233]
We propose a new method to learn 3D scene priors of layout and shape without requiring any 3D ground truth.
Our method represents a 3D scene as a latent vector, from which we can progressively decode to a sequence of objects characterized by their class categories.
Experiments on 3D-FRONT and ScanNet show that our method outperforms state of the art in single-view reconstruction.
(2022-11-25T15:03:32Z)
- Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection [70.71934539556916]
We learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection.
Specifically, a principled geometry formula with projective modeling of 2D and 3D depth predictions in the monocular 3D object detection network is devised.
Our method remarkably improves the detection performance of the state-of-the-art monocular-based method without extra data by 2.80% on the moderate test setting.
(2021-07-29T12:30:39Z)
- MonoGRNet: A General Framework for Monocular 3D Object Detection [23.59839921644492]
We propose MonoGRNet for the amodal 3D object detection from a monocular image via geometric reasoning.
MonoGRNet decomposes the monocular 3D object detection task into four sub-tasks including 2D object detection, instance-level depth estimation, projected 3D center estimation and local corner regression.
Experiments are conducted on KITTI, Cityscapes and MS COCO datasets.
(2021-04-18T10:07:52Z)
- Monocular Differentiable Rendering for Self-Supervised 3D Object Detection [21.825158925459732]
3D object detection from monocular images is an ill-posed problem due to the projective entanglement of depth and scale.
We present a novel self-supervised method for textured 3D shape reconstruction and pose estimation of rigid objects.
Our method predicts the 3D location and meshes of each object in an image using differentiable rendering and a self-supervised objective.
(2020-09-30T09:21:43Z)
- Object-Aware Centroid Voting for Monocular 3D Object Detection [30.59728753059457]
We propose an end-to-end trainable monocular 3D object detector without learning the dense depth.
A novel object-aware voting approach is introduced, which considers both the region-wise appearance attention and the geometric projection distribution.
With the late fusion and the predicted 3D orientation and dimension, the 3D bounding boxes of objects can be detected from a single RGB image.
(2020-07-20T02:11:18Z)
- Kinematic 3D Object Detection in Monocular Video [123.7119180923524]
We propose a novel method for monocular video-based 3D object detection which carefully leverages kinematic motion to improve precision of 3D localization.
We achieve state-of-the-art performance on monocular 3D object detection and the Bird's Eye View tasks within the KITTI self-driving dataset.
(2020-07-19T01:15:12Z)