OPA-3D: Occlusion-Aware Pixel-Wise Aggregation for Monocular 3D Object
Detection
- URL: http://arxiv.org/abs/2211.01142v1
- Date: Wed, 2 Nov 2022 14:19:13 GMT
- Title: OPA-3D: Occlusion-Aware Pixel-Wise Aggregation for Monocular 3D Object
Detection
- Authors: Yongzhi Su, Yan Di, Fabian Manhardt, Guangyao Zhai, Jason Rambach,
Benjamin Busam, Didier Stricker, Federico Tombari
- Abstract summary: OPA-3D is a single-stage, end-to-end, Occlusion-Aware Pixel-Wise Aggregation network.
It jointly estimates dense scene depth with depth-bounding box residuals and object bounding boxes.
It outperforms state-of-the-art methods on the main Car category.
- Score: 51.153003057515754
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite monocular 3D object detection having recently made a significant leap
forward thanks to the use of pre-trained depth estimators for pseudo-LiDAR
recovery, such two-stage methods typically suffer from overfitting and are
incapable of explicitly encapsulating the geometric relation between depth and
object bounding box. To overcome this limitation, we instead propose OPA-3D, a
single-stage, end-to-end, Occlusion-Aware Pixel-Wise Aggregation network that
jointly estimates dense scene depth together with depth-bounding box residuals and
object bounding boxes, allowing a two-stream detection of 3D objects and leading
to significantly more robust detections. The first stream, denoted the Geometry
Stream, combines visible depth and depth-bounding box residuals
to recover the object bounding box via explicit occlusion-aware optimization.
In addition, a bounding-box-based geometry projection scheme is employed to
enhance distance perception. The second stream, named the Context
Stream, directly regresses the 3D object location and size. This novel two-stream
representation further enables us to enforce cross-stream consistency terms
that align the outputs of both streams, improving the overall performance.
Extensive experiments on the public benchmark demonstrate that OPA-3D
outperforms state-of-the-art methods on the main Car category, whilst maintaining
real-time inference speed. We plan to release all code and trained models
soon.
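To make the two-stream idea concrete, below is a minimal sketch of how a cross-stream consistency term between a geometry-derived and a directly regressed box could be written; the function names, box parameterization, and loss weighting are hypothetical and not taken from the authors' (unreleased) code.

```python
import torch
import torch.nn.functional as F

def cross_stream_consistency(geometry_boxes: torch.Tensor,
                             context_boxes: torch.Tensor) -> torch.Tensor:
    """Hypothetical consistency term aligning the two streams' outputs.

    geometry_boxes: (N, 7) boxes recovered by the Geometry Stream from
                    visible depth plus depth-bounding box residuals,
                    parameterized as (x, y, z, w, h, l, yaw).
    context_boxes:  (N, 7) boxes directly regressed by the Context Stream.
    """
    # Penalize disagreement between the two detections of the same object.
    return F.smooth_l1_loss(geometry_boxes, context_boxes)

def total_loss(det_loss: torch.Tensor, depth_loss: torch.Tensor,
               geometry_boxes: torch.Tensor, context_boxes: torch.Tensor,
               w: float = 0.5) -> torch.Tensor:
    # Placeholder weighting; the paper's actual objective differs.
    return det_loss + depth_loss + w * cross_stream_consistency(
        geometry_boxes, context_boxes)
```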
Related papers
- MonoDGP: Monocular 3D Object Detection with Decoupled-Query and Geometry-Error Priors [24.753860375872215]
This paper presents a Transformer-based monocular 3D object detection method called MonoDGP.
It adopts perspective-invariant geometry errors to modify the projection formula.
Our method demonstrates state-of-the-art performance on the KITTI benchmark without extra data.
arXiv Detail & Related papers (2024-10-25T14:31:43Z)
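For context, many monocular detectors recover depth from the pinhole projection of object height; a hedged sketch of how a geometry-error prior could modify that formula (the correction term \delta h is illustrative, not MonoDGP's exact formulation):

```latex
% Standard height-based projection: depth z from focal length f,
% physical object height H_{3D}, and pixel height h_{2D}.
z = \frac{f \cdot H_{3D}}{h_{2D}}
% With a learned geometry-error correction on the noisy pixel height:
z = \frac{f \cdot H_{3D}}{h_{2D} + \delta h}
```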
- CVCP-Fusion: On Implicit Depth Estimation for 3D Bounding Box Prediction [2.0375637582248136]
Cross-View Center Point-Fusion is a state-of-the-art model for 3D object detection.
Our architecture combines aspects of two previously established algorithms, Cross-View Transformers and CenterPoint.
arXiv Detail & Related papers (2024-10-15T02:55:07Z)
- OriCon3D: Effective 3D Object Detection using Orientation and Confidence [0.0]
We propose a method for detecting 3D objects from a single image.
We use a deep convolutional neural network-based weighted orientation regression paradigm for 3D objects.
Our approach significantly improves the accuracy of 3D object pose estimation, surpassing baseline methods.
arXiv Detail & Related papers (2023-04-27T19:52:47Z)
- Homography Loss for Monocular 3D Object Detection [54.04870007473932]
A differentiable loss function, termed the Homography Loss, is proposed to achieve this goal by exploiting both 2D and 3D information.
Our method outperforms the other state-of-the-art methods by a large margin on the KITTI 3D dataset.
arXiv Detail & Related papers (2022-04-02T03:48:03Z)
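As background for the entry above: a planar homography relates homogeneous coordinates of ground-plane points between the image and the bird's-eye view, which is the kind of differentiable 2D-3D coupling such a loss can exploit (a generic sketch, not the paper's exact formulation):

```latex
\lambda \begin{pmatrix} u \\ v \\ 1 \end{pmatrix}
  = H \begin{pmatrix} X \\ Z \\ 1 \end{pmatrix},
\qquad H \in \mathbb{R}^{3 \times 3}
```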
- Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection [70.71934539556916]
We learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection.
Specifically, a principled geometry formula with projective modeling of 2D and 3D depth predictions in the monocular 3D object detection network is devised.
Our method improves the detection performance of the state-of-the-art monocular method without extra data by 2.80% on the moderate test setting.
arXiv Detail & Related papers (2021-07-29T12:30:39Z)
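A minimal, runnable illustration of the projective relation that geometry-guided depth methods exploit, assuming a pinhole camera; the function name is hypothetical:

```python
def depth_from_projection(f_y: float, h_3d: float, h_2d_pixels: float) -> float:
    """Pinhole relation: an object of physical height h_3d that spans
    h_2d_pixels in the image lies at depth f_y * h_3d / h_2d_pixels.

    f_y:         focal length in pixels (vertical axis)
    h_3d:        predicted physical object height in meters
    h_2d_pixels: observed 2D bounding box height in pixels
    """
    return f_y * h_3d / h_2d_pixels

# Example: a 1.5 m tall car spanning 60 px under a 720 px focal length
# is roughly 18 m away.
print(depth_from_projection(720.0, 1.5, 60.0))  # 18.0
```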
- FGR: Frustum-Aware Geometric Reasoning for Weakly Supervised 3D Vehicle Detection [81.79171905308827]
We propose frustum-aware geometric reasoning (FGR) to detect vehicles in point clouds without any 3D annotations.
Our method consists of two stages: coarse 3D segmentation and 3D bounding box estimation.
It is able to accurately detect objects in 3D space with only 2D bounding boxes and sparse point clouds.
arXiv Detail & Related papers (2021-05-17T07:29:55Z)
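The frustum step can be sketched as carving candidate points out of the cloud using only a 2D box and the camera intrinsics (a generic sketch of the technique, not FGR's released code):

```python
import numpy as np

def frustum_mask(points_cam: np.ndarray, K: np.ndarray, box2d) -> np.ndarray:
    """Select points whose image projection falls inside a 2D bounding box.

    points_cam: (N, 3) points in camera coordinates (x right, y down, z forward)
    K:          (3, 3) camera intrinsic matrix
    box2d:      (u_min, v_min, u_max, v_max) detected 2D box in pixels
    """
    front = points_cam[:, 2] > 0                 # points in front of the camera
    uv = (K @ points_cam.T).T                    # homogeneous pixel coordinates
    uv = uv[:, :2] / np.clip(uv[:, 2:3], 1e-6, None)
    u_min, v_min, u_max, v_max = box2d
    inside = ((uv[:, 0] >= u_min) & (uv[:, 0] <= u_max)
              & (uv[:, 1] >= v_min) & (uv[:, 1] <= v_max))
    return front & inside
```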
- Categorical Depth Distribution Network for Monocular 3D Object Detection [7.0405916639906785]
A key challenge in monocular 3D detection is accurately predicting object depth.
Many methods attempt to directly estimate depth to assist in 3D detection, but show limited performance as a result of depth inaccuracy.
We propose Categorical Depth Distribution Network (CaDDN) to project rich contextual feature information to the appropriate depth interval in 3D space.
We validate our approach on the KITTI 3D object detection benchmark, where we rank 1st among published monocular methods.
arXiv Detail & Related papers (2021-03-01T16:08:29Z)
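The lifting operation at the heart of this approach can be sketched as an outer product between image features and a per-pixel categorical distribution over depth bins (tensor layout assumed for illustration, not CaDDN's exact code):

```python
import torch

def lift_features_to_frustum(features: torch.Tensor,
                             depth_logits: torch.Tensor) -> torch.Tensor:
    """Spread image features across discrete depth bins.

    features:     (B, C, H, W) image feature map
    depth_logits: (B, D, H, W) per-pixel scores over D depth bins
    Returns:      (B, C, D, H, W) frustum features.
    """
    depth_probs = torch.softmax(depth_logits, dim=1)         # (B, D, H, W)
    # Outer product per pixel: each feature vector is weighted into every
    # depth bin according to the predicted categorical distribution.
    return features.unsqueeze(2) * depth_probs.unsqueeze(1)  # (B, C, D, H, W)
```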
- PLUME: Efficient 3D Object Detection from Stereo Images [95.31278688164646]
Existing methods tackle the problem in two steps: first, depth estimation is performed and a pseudo-LiDAR point cloud representation is computed from the depth estimates; then, object detection is performed in 3D space.
We propose a model that unifies these two tasks in the same metric space.
Our approach achieves state-of-the-art performance on the challenging KITTI benchmark, with significantly reduced inference time compared with existing methods.
arXiv Detail & Related papers (2021-01-17T05:11:38Z)
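The pseudo-LiDAR recovery step that PLUME's unified model sidesteps is standard pinhole back-projection of a dense depth map:

```python
import numpy as np

def depth_to_pseudo_lidar(depth: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Back-project a dense depth map into a pseudo-LiDAR point cloud.

    depth: (H, W) per-pixel depth in meters
    K:     (3, 3) intrinsics with focal lengths fx, fy and
           principal point (cx, cy)
    Returns: (H*W, 3) points in camera coordinates.
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]                  # pixel grid
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)
```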
- Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
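A greedy stand-in for the learned refinement policy illustrates the one-parameter-per-step search; in the actual method the update direction is chosen by a reinforcement-learned policy that receives a reward after several steps, not by exhaustive scoring:

```python
import numpy as np

def refine_box(box, score_fn, step: float = 0.1, iters: int = 20):
    """Iteratively refine a 3D box, changing one parameter per step.

    box:      initial parameters, e.g. [x, y, z, w, h, l, yaw]
    score_fn: callable returning a fitness score for a candidate box
              (stand-in for the RL reward signal)
    """
    box = np.asarray(box, dtype=float)
    for _ in range(iters):
        best_box, best_score = box, score_fn(box)
        for axis in range(len(box)):           # one parameter at a time
            for delta in (-step, step):
                cand = box.copy()
                cand[axis] += delta
                s = score_fn(cand)
                if s > best_score:
                    best_box, best_score = cand, s
        box = best_box                         # keep the best single-axis move
    return box
```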
- Monocular 3D Object Detection with Decoupled Structured Polygon Estimation and Height-Guided Depth Estimation [41.29145717658494]
This paper proposes a novel unified framework which decomposes the detection problem into a structured polygon prediction task and a depth recovery task.
Compared to widely used 3D bounding box proposals, the structured polygon is shown to be a better representation for 3D detection.
Experiments are conducted on the challenging KITTI benchmark, in which our method achieves state-of-the-art detection accuracy.
arXiv Detail & Related papers (2020-02-05T03:25:02Z)