OCM3D: Object-Centric Monocular 3D Object Detection
- URL: http://arxiv.org/abs/2104.06041v1
- Date: Tue, 13 Apr 2021 09:15:40 GMT
- Title: OCM3D: Object-Centric Monocular 3D Object Detection
- Authors: Liang Peng, Fei Liu, Senbo Yan, Xiaofei He, Deng Cai
- Abstract summary: We propose a novel object-centric voxel representation tailored for monocular 3D object detection.
Specifically, voxels are built on each object proposal, and their sizes are adaptively determined by the 3D spatial distribution of the points.
Our method outperforms state-of-the-art methods by a large margin.
- Score: 35.804542148335706
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image-only and pseudo-LiDAR representations are commonly used for
monocular 3D object detection. However, methods based on them either fail to
capture the spatial relationships among neighboring image pixels or struggle
with the noisy nature of the monocular pseudo-LiDAR point cloud. To overcome
these issues, in this paper we propose a novel object-centric voxel
representation tailored for monocular 3D object detection. Specifically, voxels
are built on each object proposal, and their sizes are adaptively determined by
the 3D spatial distribution of the points, allowing the noisy point cloud to be
organized effectively within a voxel grid. This representation is shown to
locate objects in 3D space accurately. Furthermore, prior works estimate the
orientation via deep features extracted from an entire image or a noisy point
cloud. By contrast, we argue that the local RoI information from the object
image patch alone, combined with a proper resizing scheme, is a better input,
as it provides complete semantic cues while excluding irrelevant interference.
Besides, we decompose the confidence mechanism in monocular 3D object detection
by considering the relationship between 3D objects and their associated 2D
boxes. Evaluated on KITTI, our method outperforms state-of-the-art methods by a
large margin. The code will be made publicly available soon.
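The core idea of the representation, a voxel grid built per object proposal whose voxel sizes adapt to the 3D spread of that proposal's points, can be sketched as follows. The function name, the fixed grid resolution, and the occupancy encoding are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def object_centric_voxelize(points, grid_size=(16, 16, 16)):
    """Build a voxel grid over one object proposal's pseudo-LiDAR points.

    The grid extent is fitted to the 3D spatial spread of the points,
    so every proposal gets its own adaptive voxel size rather than a
    fixed global resolution.
    """
    lo = points.min(axis=0)                 # per-axis minimum of the point set
    hi = points.max(axis=0)                 # per-axis maximum
    extent = np.maximum(hi - lo, 1e-6)      # avoid zero-size voxels
    voxel_size = extent / np.asarray(grid_size)   # adaptive voxel size

    # Assign each point to a voxel index inside the proposal's grid.
    idx = np.floor((points - lo) / voxel_size).astype(int)
    idx = np.clip(idx, 0, np.asarray(grid_size) - 1)

    # Simple occupancy encoding: 1 where at least one point falls in a voxel.
    grid = np.zeros(grid_size, dtype=np.float32)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return grid, voxel_size
```

Because the grid is anchored to each proposal, noisy pseudo-LiDAR points are organized at a resolution matched to the object's actual extent, which is the property the abstract attributes to the representation.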
Related papers
- 3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding [58.924180772480504]
3D visual grounding aims to localize the target object in a 3D point cloud by a free-form language description.
We propose a relation-aware one-stage framework, named 3D Relative Position-aware Network (3DRP-Net).
arXiv Detail & Related papers (2023-07-25T09:33:25Z)
- Neural Correspondence Field for Object Pose Estimation [67.96767010122633]
We propose a method for estimating the 6DoF pose of a rigid object with an available 3D model from a single RGB image.
Unlike classical correspondence-based methods which predict 3D object coordinates at pixels of the input image, the proposed method predicts 3D object coordinates at 3D query points sampled in the camera frustum.
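Sampling 3D query points inside the camera frustum, as the method above does, comes down to a standard pinhole back-projection of sampled pixels and depths. The function name and the uniform sampling scheme below are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def sample_frustum_points(K, width, height, depth_range, n=1024, rng=None):
    """Sample n 3D query points inside the camera frustum.

    K is the 3x3 pinhole intrinsics matrix; depth_range is (near, far)
    in metric units. Pixels and depths are drawn uniformly, then each
    (u, v, z) sample is back-projected into the camera frame.
    """
    rng = np.random.default_rng(rng)
    u = rng.uniform(0, width, n)            # pixel x-coordinate
    v = rng.uniform(0, height, n)           # pixel y-coordinate
    z = rng.uniform(*depth_range, n)        # metric depth along the ray

    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (u - cx) * z / fx                   # back-project to camera frame
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)      # (n, 3) points
```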
arXiv Detail & Related papers (2022-07-30T01:48:23Z)
- SparseDet: Towards End-to-End 3D Object Detection [12.3069609175534]
We propose SparseDet for end-to-end 3D object detection from point cloud.
As a new detection paradigm, SparseDet maintains a fixed set of learnable proposals to represent latent candidates.
SparseDet achieves highly competitive detection accuracy while running with a more efficient speed of 34.5 FPS.
arXiv Detail & Related papers (2021-08-12T15:22:33Z)
- Progressive Coordinate Transforms for Monocular 3D Object Detection [52.00071336733109]
We propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations.
arXiv Detail & Related papers (2021-07-30T02:00:06Z)
- From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection [101.20784125067559]
We propose a new architecture, namely Hallucinated Hollow-3D R-CNN, to address the problem of 3D object detection.
In our approach, we first extract the multi-view features by sequentially projecting the point clouds into the perspective view and the bird's-eye view.
The 3D objects are detected via a box refinement module with a novel Hierarchical Voxel RoI Pooling operation.
arXiv Detail & Related papers (2021-07-30T02:00:06Z)
- M3DSSD: Monocular 3D Single Stage Object Detector [82.25793227026443]
We propose a Monocular 3D Single Stage object Detector (M3DSSD) with feature alignment and asymmetric non-local attention.
The proposed M3DSSD achieves significantly better performance than the monocular 3D object detection methods on the KITTI dataset.
arXiv Detail & Related papers (2021-03-24T13:09:11Z)
- MonoRUn: Monocular 3D Object Detection by Reconstruction and Uncertainty Propagation [4.202461384355329]
We propose MonoRUn, a novel 3D object detection framework that learns dense correspondences and geometry in a self-supervised manner.
Our proposed approach outperforms current state-of-the-art methods on the KITTI benchmark.
arXiv Detail & Related papers (2021-03-23T15:03:08Z)
- Stereo Object Matching Network [78.35697025102334]
This paper presents a stereo object matching method that exploits both 2D contextual information from images and 3D object-level information.
We present two novel strategies to handle 3D objectness in the cost volume space: selective sampling (RoISelect) and 2D-3D fusion.
arXiv Detail & Related papers (2021-03-23T12:54:43Z)
- Monocular Differentiable Rendering for Self-Supervised 3D Object Detection [21.825158925459732]
3D object detection from monocular images is an ill-posed problem due to the projective entanglement of depth and scale.
We present a novel self-supervised method for textured 3D shape reconstruction and pose estimation of rigid objects.
Our method predicts the 3D location and meshes of each object in an image using differentiable rendering and a self-supervised objective.
arXiv Detail & Related papers (2020-09-30T09:21:43Z)
- Object-Aware Centroid Voting for Monocular 3D Object Detection [30.59728753059457]
We propose an end-to-end trainable monocular 3D object detector without learning the dense depth.
A novel object-aware voting approach is introduced, which considers both the region-wise appearance attention and the geometric projection distribution.
With the late fusion and the predicted 3D orientation and dimension, the 3D bounding boxes of objects can be detected from a single RGB image.
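Assembling a 3D bounding box from a predicted center, dimensions, and orientation, the final step described above, reduces to rotating the box's canonical corners and translating them to the center. The sketch below assumes a KITTI-style camera frame (y down, yaw about the vertical axis); it is a generic utility, not the paper's code.

```python
import numpy as np

def box3d_corners(center, dims, yaw):
    """Return the 8 corners of a 3D box given its center, (l, w, h), and yaw.

    Corners are laid out in the box's local frame, rotated by the yaw
    angle about the vertical (y) axis, then shifted to the center.
    """
    l, w, h = dims
    # Canonical corners in the box's local frame, centered at the origin.
    x = np.array([ l,  l, -l, -l,  l,  l, -l, -l]) / 2
    y = np.array([ h,  h,  h,  h, -h, -h, -h, -h]) / 2
    z = np.array([ w, -w, -w,  w,  w, -w, -w,  w]) / 2
    corners = np.stack([x, y, z])           # shape (3, 8)

    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[ c, 0, s],
                  [ 0, 1, 0],
                  [-s, 0, c]])              # rotation about the vertical axis
    return (R @ corners).T + np.asarray(center)   # shape (8, 3)
```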
arXiv Detail & Related papers (2020-07-20T02:11:18Z)
- SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation [3.1542695050861544]
Estimating 3D orientation and translation of objects is essential for infrastructure-less autonomous navigation and driving.
We propose a novel 3D object detection method, named SMOKE, that combines a single keypoint estimate with regressed 3D variables.
Despite its structural simplicity, the proposed SMOKE network outperforms all existing monocular 3D detection methods on the KITTI dataset.
arXiv Detail & Related papers (2020-02-24T08:15:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.