Monocular 3D Object Detection with Depth from Motion
- URL: http://arxiv.org/abs/2207.12988v1
- Date: Tue, 26 Jul 2022 15:48:46 GMT
- Title: Monocular 3D Object Detection with Depth from Motion
- Authors: Tai Wang, Jiangmiao Pang, Dahua Lin
- Abstract summary: We take advantage of camera ego-motion for accurate object depth estimation and detection.
Our framework, named Depth from Motion (DfM), then uses the established geometry to lift 2D image features to the 3D space and detects 3D objects thereon.
Our framework outperforms state-of-the-art methods by a large margin on the KITTI benchmark.
- Score: 74.29588921594853
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Perceiving 3D objects from monocular inputs is crucial for robotic systems,
given their economy compared to multi-sensor settings. It is notably difficult,
as a single image cannot provide any clue for predicting absolute depth values.
Motivated by binocular methods for 3D object detection, we take advantage of
the strong geometry structure provided by camera ego-motion for accurate object
depth estimation and detection. We first conduct a theoretical analysis of this
general two-view case and identify two challenges: 1) cumulative errors from
multiple estimations make direct depth prediction intractable; 2) inherent
dilemmas arise from static cameras and matching ambiguity. Accordingly, we
establish the stereo correspondence with a geometry-aware cost volume as the
alternative for depth estimation and further compensate it with monocular
understanding to address the second problem. Our framework, named Depth from
Motion (DfM), then uses the established geometry to lift 2D image features to
the 3D space and detects 3D objects thereon. We also present a pose-free DfM to
make it usable when the camera pose is unavailable. Our framework outperforms
state-of-the-art methods by a large margin on the KITTI benchmark. Detailed
quantitative and qualitative analyses also validate our theoretical
conclusions. The code will be released at
https://github.com/Tai-Wang/Depth-from-Motion.
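The geometry-aware cost volume at the heart of this pipeline follows the classic plane-sweep idea: hypothesize a set of depths, warp the second view's features into the reference view using the known ego-motion, and score the photometric/feature match at each depth. A minimal numpy sketch under idealized assumptions (nearest-neighbour sampling, negative L1 feature distance as the score; all names are illustrative, not DfM's actual implementation):

```python
import numpy as np

def plane_sweep_cost_volume(feat_ref, feat_src, K, R, t, depth_hypotheses):
    """Warp source-view features into the reference view at each hypothesized
    depth and score the match; the best-scoring hypothesis per pixel
    approximates stereo depth from ego-motion (illustrative sketch)."""
    H, W, C = feat_ref.shape
    K_inv = np.linalg.inv(K)
    us, vs = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([us, vs, np.ones_like(us)], axis=-1).reshape(-1, 3).T  # 3 x HW
    cost = np.zeros((len(depth_hypotheses), H, W))
    for i, d in enumerate(depth_hypotheses):
        # back-project reference pixels to 3D at depth d, move into the source frame
        pts = R @ (K_inv @ (pix * d)) + t[:, None]
        proj = K @ pts
        u = proj[0] / proj[2]
        v = proj[1] / proj[2]
        # nearest-neighbour sampling of source features, masked outside the image
        ui = np.clip(np.round(u).astype(int), 0, W - 1)
        vi = np.clip(np.round(v).astype(int), 0, H - 1)
        warped = feat_src[vi, ui].reshape(H, W, C)
        valid = ((u >= 0) & (u < W) & (v >= 0) & (v < H)).reshape(H, W)
        # negative L1 feature distance as the matching score
        cost[i] = -np.abs(feat_ref - warped).sum(-1) * valid
    return cost.argmax(0)  # index of the best depth hypothesis per pixel
```

A real detector would use bilinear sampling and learned aggregation over the volume; this sketch only shows the geometric warping that makes the volume "geometry-aware".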
Related papers
- MonoDGP: Monocular 3D Object Detection with Decoupled-Query and Geometry-Error Priors [24.753860375872215]
This paper presents a Transformer-based monocular 3D object detection method called MonoDGP.
It adopts perspective-invariant geometry errors to modify the projection formula.
Our method demonstrates state-of-the-art performance on the KITTI benchmark without extra data.
arXiv Detail & Related papers (2024-10-25T14:31:43Z)
- Tame a Wild Camera: In-the-Wild Monocular Camera Calibration [12.55056916519563]
Previous methods for monocular camera calibration rely on specific 3D objects or strong geometric priors.
Our method is assumption-free and calibrates the complete 4 Degree-of-Freedom (DoF) intrinsic parameters.
We demonstrate downstream applications in image manipulation detection & restoration, uncalibrated two-view pose estimation, and 3D sensing.
arXiv Detail & Related papers (2023-06-19T14:55:26Z)
- 3D Object Aided Self-Supervised Monocular Depth Estimation [5.579605877061333]
We propose a new method to address dynamic object movements through monocular 3D object detection.
Specifically, we first detect 3D objects in the images and build the per-pixel correspondence of the dynamic pixels with the detected object pose.
In this way, the depth of every pixel can be learned via a meaningful geometry model.
arXiv Detail & Related papers (2022-12-04T08:52:33Z)
- Coupled Iterative Refinement for 6D Multi-Object Pose Estimation [64.7198752089041]
Given a set of known 3D objects and an RGB or RGB-D input image, we detect and estimate the 6D pose of each object.
Our approach iteratively refines both pose and correspondence in a tightly coupled manner, allowing us to dynamically remove outliers to improve accuracy.
arXiv Detail & Related papers (2022-04-26T18:00:08Z)
- Homography Loss for Monocular 3D Object Detection [54.04870007473932]
A differentiable loss function, termed as Homography Loss, is proposed to achieve the goal, which exploits both 2D and 3D information.
Our method outperforms other state-of-the-art methods by a large margin on the KITTI 3D dataset.
arXiv Detail & Related papers (2022-04-02T03:48:03Z)
- Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection [70.71934539556916]
We learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection.
Specifically, a principled geometry formula with projective modeling of 2D and 3D depth predictions in the monocular 3D object detection network is devised.
Our method remarkably improves the detection performance of the state-of-the-art monocular-based method without extra data by 2.80% on the moderate test setting.
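The projective relation behind such geometry-guided depth is the pinhole height constraint: an object of physical height H at depth z spans roughly h = f·H/z pixels, so z = f·H/h. A toy sketch of that constraint (illustrative only, not this paper's actual formulation):

```python
def depth_from_height(f_pixels, height_3d_m, height_2d_px):
    """Pinhole projection: an object of physical height H at depth z spans
    roughly h = f * H / z pixels, so z = f * H / h (illustrative sketch)."""
    return f_pixels * height_3d_m / height_2d_px

# e.g. a 1.5 m-tall car spanning 75 px under a 721 px focal length
# sits at roughly 721 * 1.5 / 75 = 14.42 m
```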
arXiv Detail & Related papers (2021-07-29T12:30:39Z)
- MonoGRNet: A General Framework for Monocular 3D Object Detection [23.59839921644492]
We propose MonoGRNet for amodal 3D object detection from a monocular image via geometric reasoning.
MonoGRNet decomposes the monocular 3D object detection task into four sub-tasks including 2D object detection, instance-level depth estimation, projected 3D center estimation and local corner regression.
Experiments are conducted on KITTI, Cityscapes and MS COCO datasets.
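The lifting step in such a decomposition, combining a projected 3D center with an instance depth, is plain back-projection through the camera intrinsics. A minimal sketch (illustrative, not MonoGRNet's actual code):

```python
import numpy as np

def lift_center(K, center_2d, depth):
    """Back-project a projected 3D center (u, v) with instance depth z to a
    metric 3D center: C = z * K^{-1} [u, v, 1]^T (illustrative sketch)."""
    u, v = center_2d
    return depth * (np.linalg.inv(K) @ np.array([u, v, 1.0]))
```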
arXiv Detail & Related papers (2021-04-18T10:07:52Z)
- M3DSSD: Monocular 3D Single Stage Object Detector [82.25793227026443]
We propose a Monocular 3D Single Stage object Detector (M3DSSD) with feature alignment and asymmetric non-local attention.
The proposed M3DSSD achieves significantly better performance than existing monocular 3D object detection methods on the KITTI dataset.
arXiv Detail & Related papers (2021-03-24T13:09:11Z)
- PLUME: Efficient 3D Object Detection from Stereo Images [95.31278688164646]
Existing methods tackle the problem in two steps: depth estimation is performed first, a pseudo-LiDAR point cloud representation is computed from the depth estimates, and object detection is then performed in 3D space.
We propose a model that unifies these two tasks in the same metric space.
Our approach achieves state-of-the-art performance on the challenging KITTI benchmark, with significantly reduced inference time compared with existing methods.
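The pseudo-LiDAR step in the two-stage baseline described above is a dense back-projection of the depth map through the intrinsics. A minimal numpy sketch (illustrative, not PLUME's code):

```python
import numpy as np

def depth_to_pseudo_lidar(depth, K):
    """Back-project a dense depth map into a pseudo-LiDAR point cloud: each
    pixel (u, v) with depth z becomes z * K^{-1} [u, v, 1]^T (illustrative)."""
    H, W = depth.shape
    us, vs = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([us, vs, np.ones_like(us)], -1).reshape(-1, 3).astype(float)
    pts = (np.linalg.inv(K) @ pix.T) * depth.reshape(-1)
    return pts.T  # (H*W, 3) points in the camera frame
```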
arXiv Detail & Related papers (2021-01-17T05:11:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.