MonoJSG: Joint Semantic and Geometric Cost Volume for Monocular 3D
Object Detection
- URL: http://arxiv.org/abs/2203.08563v1
- Date: Wed, 16 Mar 2022 11:54:10 GMT
- Title: MonoJSG: Joint Semantic and Geometric Cost Volume for Monocular 3D
Object Detection
- Authors: Qing Lian, Peiliang Li, Xiaozhi Chen
- Abstract summary: Monocular 3D object detection lacks the ability to recover depth accurately.
Deep neural networks (DNNs) enable monocular depth sensing from high-level learned features.
We propose a joint semantic and geometric cost volume to model the depth error.
- Score: 10.377424252002792
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Because 2D-3D projection is inherently ill-posed, monocular 3D object
detection struggles to recover depth accurately. Although deep neural networks
(DNNs) enable monocular depth sensing from high-level learned features,
pixel-level cues are usually lost in the deep convolution mechanism.
To benefit from both the powerful feature representations of DNNs and pixel-level
geometric constraints, we reformulate monocular object depth estimation as
a progressive refinement problem and propose a joint semantic and geometric
cost volume to model the depth error. Specifically, we first leverage neural
networks to learn the object position, dimensions, and dense normalized 3D
object coordinates. Given a candidate object depth, the dense coordinate patch,
together with the corresponding object features, is reprojected to image
space to build a cost volume that captures both semantic and geometric error.
The final depth is obtained by feeding the cost volume to a refinement network,
in which the distribution of semantic and geometric error is regularized by direct
depth supervision. By effectively mitigating depth error through this refinement
framework, we achieve state-of-the-art results on both the KITTI and Waymo
datasets.
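To make the reprojection step concrete, below is a minimal NumPy sketch of how such a joint cost volume could be assembled for a single object. It is not the authors' code: the function names, the assumption that `obj_coords` are metric offsets from the object center (normalized coordinates already scaled by the predicted dimensions), and the nearest-neighbor feature lookup are all illustrative simplifications; the paper's implementation may differ (e.g., bilinear sampling, learned error encodings).

```python
# Illustrative sketch (not the authors' code) of a joint semantic and geometric
# cost volume built by re-projecting a dense 3D coordinate patch under a set of
# candidate object depths.
import numpy as np

def project(points_cam, K):
    """Pinhole projection of (N, 3) camera-space points to (N, 2) pixels."""
    uvw = points_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def joint_cost_volume(obj_coords, center_ray, pixel_uv, feat_map, K, depth_hyps):
    """
    obj_coords : (N, 3) dense 3D object coordinates as metric offsets from the
                 object center (assumed already scaled by predicted dimensions)
    center_ray : (3,)  unit viewing ray toward the predicted object center
    pixel_uv   : (N, 2) integer source pixels at which obj_coords were predicted
    feat_map   : (H, W, C) image feature map
    depth_hyps : (D,)  candidate object depths around the initial estimate
    Returns a (D, N, C + 1) volume: per-channel semantic error plus geometric error.
    """
    H, W, _ = feat_map.shape
    ref_feat = feat_map[pixel_uv[:, 1], pixel_uv[:, 0]]  # features at source pixels
    slices = []
    for d in depth_hyps:
        pts = center_ray * d + obj_coords  # place the coordinate patch at depth d
        uv = project(pts, K)
        # Geometric error: re-projection offset from the original pixels.
        geo_err = np.linalg.norm(uv - pixel_uv, axis=1, keepdims=True)
        # Semantic error: feature mismatch at the re-projected location
        # (nearest-neighbor lookup for simplicity).
        ui = np.clip(np.round(uv[:, 0]).astype(int), 0, W - 1)
        vi = np.clip(np.round(uv[:, 1]).astype(int), 0, H - 1)
        sem_err = np.abs(feat_map[vi, ui] - ref_feat)
        slices.append(np.concatenate([sem_err, geo_err], axis=1))
    return np.stack(slices)
```

In the paper, the resulting volume is fed to a refinement network whose semantic and geometric error distributions are regularized by direct depth supervision; this sketch stops at volume construction.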
Related papers
- MonoDGP: Monocular 3D Object Detection with Decoupled-Query and Geometry-Error Priors [24.753860375872215]
This paper presents a Transformer-based monocular 3D object detection method called MonoDGP.
It adopts perspective-invariant geometry errors to modify the projection formula.
Our method demonstrates state-of-the-art performance on the KITTI benchmark without extra data.
arXiv Detail & Related papers (2024-10-25T14:31:43Z)
- GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision [49.839374549646884]
This paper presents GEOcc, a Geometric-Enhanced Occupancy network tailored for vision-only surround-view perception.
Our approach achieves state-of-the-art performance on the Occ3D-nuScenes dataset while requiring the lowest image resolution and the lightest image backbone.
arXiv Detail & Related papers (2024-05-17T07:31:20Z)
- MonoCD: Monocular 3D Object Detection with Complementary Depths [9.186673054867866]
Depth estimation is an essential but challenging subtask of monocular 3D object detection.
We propose to increase the complementarity of depths with two novel designs.
Experiments on the KITTI benchmark demonstrate that our method achieves state-of-the-art performance without introducing extra data.
arXiv Detail & Related papers (2024-04-04T03:30:49Z)
- MonoPGC: Monocular 3D Object Detection with Pixel Geometry Contexts [6.639648061168067]
We propose MonoPGC, a novel end-to-end Monocular 3D object detection framework with rich Pixel Geometry Contexts.
We introduce pixel depth estimation as an auxiliary task and design a depth cross-attention pyramid module (DCPM) to inject local and global depth geometry knowledge into visual features.
In addition, we present the depth-space-aware transformer (DSAT) to integrate 3D space position and depth-aware features efficiently.
arXiv Detail & Related papers (2023-02-21T09:21:58Z)
- Monocular 3D Object Detection with Depth from Motion [74.29588921594853]
We take advantage of camera ego-motion for accurate object depth estimation and detection.
Our framework, named Depth from Motion (DfM), uses the established geometry to lift 2D image features into 3D space and detect 3D objects there (the underlying two-view relation is sketched after this list).
Our framework outperforms state-of-the-art methods by a large margin on the KITTI benchmark.
arXiv Detail & Related papers (2022-07-26T15:48:46Z)
- Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection [70.71934539556916]
We learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection.
Specifically, we devise a principled geometry formula with projective modeling of the 2D and 3D depth predictions in the monocular 3D object detection network (a worked form of the projective relation appears after this list).
Our method improves the detection performance of the state-of-the-art monocular method without extra data by 2.80% in the moderate test setting.
arXiv Detail & Related papers (2021-07-29T12:30:39Z)
- VR3Dense: Voxel Representation Learning for 3D Object Detection and Monocular Dense Depth Reconstruction [0.951828574518325]
We introduce a method for jointly training 3D object detection and monocular dense depth reconstruction neural networks.
It takes a LiDAR point cloud and a single RGB image as inputs during inference and produces object pose predictions as well as a densely reconstructed depth map.
While our object detection is trained in a supervised manner, the depth prediction network is trained with both self-supervised and supervised loss functions.
arXiv Detail & Related papers (2021-04-13T04:25:54Z)
- PLADE-Net: Towards Pixel-Level Accuracy for Self-Supervised Single-View Depth Estimation with Neural Positional Encoding and Distilled Matting Loss [49.66736599668501]
We propose a self-supervised single-view pixel-level accurate depth estimation network, called PLADE-Net.
Our method shows unprecedented accuracy levels, exceeding 95% in terms of the $\delta^1$ metric on the KITTI dataset (the metric is defined after this list).
arXiv Detail & Related papers (2021-03-12T15:54:46Z)
- Virtual Normal: Enforcing Geometric Constraints for Accurate and Robust Depth Prediction [87.08227378010874]
We show the importance of high-order 3D geometric constraints for depth prediction.
By designing a loss term that enforces a simple geometric constraint, we significantly improve the accuracy and robustness of monocular depth estimation (a sketch of such a loss appears after this list).
We show state-of-the-art results of learning metric depth on NYU Depth-V2 and KITTI.
arXiv Detail & Related papers (2021-03-07T00:08:21Z)
- GeoNet++: Iterative Geometric Neural Network with Edge-Aware Refinement for Joint Depth and Surface Normal Estimation [204.13451624763735]
We propose a geometric neural network with edge-aware refinement (GeoNet++) to jointly predict both depth and surface normal maps from a single image.
GeoNet++ effectively predicts depth and surface normals with strong 3D consistency and sharp boundaries.
In contrast to current metrics that focus on pixel-wise error/accuracy, the paper's proposed 3D geometric metric (3DGM) measures whether the predicted depth can reconstruct high-quality 3D surface normals.
arXiv Detail & Related papers (2020-12-13T06:48:01Z)
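For the Depth from Motion entry above, the classical two-view relation that ego-motion-based depth estimation builds on is worth recalling (a textbook identity, not the paper's full formulation):

```latex
% Two frames related by ego-motion with effective baseline b, focal length f,
% and matched-pixel disparity d give the classical stereo depth
z = \frac{f \, b}{d}
```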
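For the projective-modeling entry, the kind of geometry formula it refers to takes, in its simplest pinhole form, the following shape (a standard relation; the paper's exact parameterization may differ):

```latex
% Object depth z from focal length f, physical object height H_{3D},
% and projected 2D box height h_{2D}:
z = \frac{f \, H_{3D}}{h_{2D}},
\qquad
\frac{\partial z}{\partial h_{2D}} = -\frac{f \, H_{3D}}{h_{2D}^{2}} = -\frac{z}{h_{2D}}
```

For a fixed error in the estimated 2D height, the induced depth error therefore grows with depth itself, which motivates modeling the 2D and 3D predictions jointly.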
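The $\delta^1$ accuracy cited in the PLADE-Net entry is the standard threshold metric for depth evaluation:

```latex
% Fraction of valid pixels p whose predicted depth \hat{d}_p is within a
% factor of 1.25 of the ground truth d_p (Iverson bracket [\,\cdot\,]):
\delta^1 = \frac{1}{|P|} \sum_{p \in P}
\left[ \max\!\left( \frac{d_p}{\hat{d}_p}, \frac{\hat{d}_p}{d_p} \right) < 1.25 \right]
```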
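For the Virtual Normal entry, here is a minimal NumPy sketch of a virtual-normal style loss: compare normals of planes spanned by random point triplets reconstructed from the predicted and ground-truth depth maps. The pinhole back-projection and uniform triplet sampling are assumptions; the paper additionally filters degenerate (e.g., near-collinear) triplets.

```python
# Illustrative sketch of a virtual-normal style geometric loss for depth.
import numpy as np

def backproject(depth, K):
    """Lift an (H, W) depth map to an (H*W, 3) camera-space point cloud."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    return (pix @ np.linalg.inv(K).T) * depth.reshape(-1, 1)

def triplet_normals(points, idx):
    """Unit normals of planes through point triplets given by idx, shape (M, 3)."""
    a, b, c = points[idx[:, 0]], points[idx[:, 1]], points[idx[:, 2]]
    n = np.cross(b - a, c - a)
    return n / (np.linalg.norm(n, axis=1, keepdims=True) + 1e-8)

def virtual_normal_loss(pred_depth, gt_depth, K, num_triplets=1000, seed=0):
    """L1 difference between normals of matched triplets in both point clouds."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, pred_depth.size, size=(num_triplets, 3))
    n_pred = triplet_normals(backproject(pred_depth, K), idx)
    n_gt = triplet_normals(backproject(gt_depth, K), idx)
    return np.abs(n_pred - n_gt).mean()
```

Because triplets are sampled across the whole image rather than local neighborhoods, the constraint is long-range and high-order, unlike a pixel-wise depth loss.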