Consistency of Implicit and Explicit Features Matters for Monocular 3D
Object Detection
- URL: http://arxiv.org/abs/2207.07933v1
- Date: Sat, 16 Jul 2022 13:00:32 GMT
- Title: Consistency of Implicit and Explicit Features Matters for Monocular 3D
Object Detection
- Authors: Qian Ye, Ling Jiang, Yuyang Du
- Abstract summary: Monocular 3D object detection is a common solution for low-cost autonomous agents to perceive their surroundings.
We present CIEF, with the first orientation-aware image backbone, to eliminate the disparity between implicit and explicit features in the subsequent 3D representation.
CIEF ranked 1st among all reported methods on both the 3D and BEV detection benchmarks of KITTI at submission time.
- Score: 4.189643331553922
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Monocular 3D object detection is a common solution for low-cost autonomous
agents to perceive their surrounding environment. Monocular detection has
progressed into two categories: (1) direct methods that infer 3D bounding boxes
directly from a frontal-view image; (2) 3D intermediate representation methods
that map image features to 3D space for subsequent 3D detection. The second
category stands out not only because more meaningful and representative
features push 3D detection forward, but also because emerging SOTA end-to-end
prediction and planning paradigms require a bird's-eye-view feature map from
the perception pipeline. However, when transforming to a 3D representation,
these methods do not guarantee that objects' implicit orientations and
locations in latent space are consistent with those explicitly observed in
Euclidean space, which hurts model performance.
Hence, we argue that the consistency of implicit and explicit features matters
and present a novel monocular detection method, named CIEF, with the first
orientation-aware image backbone to eliminate the disparity between implicit
and explicit features in the subsequent 3D representation. As a second
contribution, we introduce a ray attention mechanism. In contrast to previous
methods that repeat features along the projection ray or rely on an
intermediate frustum point cloud, we directly transform image features to voxel
representations with well-localized features. We also propose a handcrafted
Gaussian positional encoding function that outperforms the sinusoidal encoding
function while retaining the benefit of being continuous. CIEF ranked 1st among
all reported methods on both the 3D and BEV detection benchmarks of KITTI at
submission time.
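The ray attention idea described above can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: each voxel holds a learned query vector and attends over K image-feature samples taken along its projection ray, so depth is resolved by attention weights rather than by repeating the same pixel feature along the whole ray. The function name, the precomputed `ray_index` lookup, and the per-voxel queries are all assumptions introduced for this sketch.

```python
import numpy as np

def lift_by_ray_attention(img_feats, voxel_queries, ray_index):
    """Hypothetical ray-attention lift from image features to voxels.

    img_feats:     (H, W, C) image feature map
    voxel_queries: (V, C) learned per-voxel query vectors (assumption)
    ray_index:     (V, K) flattened pixel indices sampled along each
                   voxel's projection ray (assumed precomputed from
                   camera calibration)
    Returns:       (V, C) voxel features, a softmax-weighted mix of
                   the K ray samples per voxel.
    """
    H, W, C = img_feats.shape
    flat = img_feats.reshape(H * W, C)
    ray_feats = flat[ray_index]                           # (V, K, C)
    # scaled dot-product scores between each voxel query and its ray samples
    scores = np.einsum('vc,vkc->vk', voxel_queries, ray_feats) / np.sqrt(C)
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                     # softmax over K
    return np.einsum('vk,vkc->vc', w, ray_feats)
```

Because the output is a convex combination of the sampled ray features, each voxel receives a well-localized feature instead of an identical copy smeared along the entire ray.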
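The abstract's handcrafted Gaussian positional encoding is not specified in detail here, but one natural reading is a bank of Gaussian bumps with centers spread over the coordinate range, which is continuous in the input position like the standard sinusoidal encoding. The sketch below is an assumption along those lines (the `sigma` bandwidth and uniform center placement are illustrative choices), shown next to the standard Transformer sinusoidal encoding for comparison.

```python
import numpy as np

def sinusoidal_encoding(pos, dim):
    # standard Transformer sinusoidal encoding, for comparison
    i = np.arange(dim // 2)
    freq = 1.0 / (10000.0 ** (2 * i / dim))
    angles = pos[:, None] * freq[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

def gaussian_encoding(pos, dim, sigma=0.05, lo=0.0, hi=1.0):
    # hypothetical Gaussian encoding: one Gaussian bump per channel,
    # with centers spread uniformly over [lo, hi]; smooth in pos,
    # so it keeps the continuity benefit of the sinusoidal variant
    centers = np.linspace(lo, hi, dim)
    return np.exp(-((pos[:, None] - centers[None, :]) ** 2) / (2 * sigma ** 2))

pos = np.linspace(0.0, 1.0, 5)
print(gaussian_encoding(pos, 8).shape)   # (5, 8)
print(sinusoidal_encoding(pos, 8).shape) # (5, 8)
```

Each channel of the Gaussian encoding responds most strongly near its own center, giving a localized, interpretable code per position, whereas every sinusoidal channel responds globally.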
Related papers
- OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation [67.56268991234371]
OV-Uni3DETR achieves the state-of-the-art performance on various scenarios, surpassing existing methods by more than 6% on average.
Code and pre-trained models will be released later.
arXiv Detail & Related papers (2024-03-28T17:05:04Z) - 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features [70.50665869806188]
3DiffTection is a state-of-the-art method for 3D object detection from single images.
We fine-tune a diffusion model to perform novel view synthesis conditioned on a single image.
We further train the model on target data with detection supervision.
arXiv Detail & Related papers (2023-11-07T23:46:41Z) - Unleash the Potential of Image Branch for Cross-modal 3D Object
Detection [67.94357336206136]
We present a new cross-modal 3D object detector, namely UPIDet, which aims to unleash the potential of the image branch from two aspects.
First, UPIDet introduces a new 2D auxiliary task called normalized local coordinate map estimation.
Second, we discover that the representational capability of the point cloud backbone can be enhanced through the gradients backpropagated from the training objectives of the image branch.
arXiv Detail & Related papers (2023-01-22T08:26:58Z) - Image-to-Lidar Self-Supervised Distillation for Autonomous Driving Data [80.14669385741202]
We propose a self-supervised pre-training method for 3D perception models tailored to autonomous driving data.
We leverage the availability of synchronized and calibrated image and Lidar sensors in autonomous driving setups.
Our method does not require any point cloud nor image annotations.
arXiv Detail & Related papers (2022-03-30T12:40:30Z) - M3DSSD: Monocular 3D Single Stage Object Detector [82.25793227026443]
We propose a Monocular 3D Single Stage object Detector (M3DSSD) with feature alignment and asymmetric non-local attention.
The proposed M3DSSD achieves significantly better performance than the monocular 3D object detection methods on the KITTI dataset.
arXiv Detail & Related papers (2021-03-24T13:09:11Z) - IAFA: Instance-aware Feature Aggregation for 3D Object Detection from a
Single Image [37.83574424518901]
3D object detection from a single image is an important task in Autonomous Driving.
We propose an instance-aware approach to aggregate useful information for improving the accuracy of 3D object detection.
arXiv Detail & Related papers (2021-03-05T05:47:52Z) - Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z) - Object-Aware Centroid Voting for Monocular 3D Object Detection [30.59728753059457]
We propose an end-to-end trainable monocular 3D object detector without learning the dense depth.
A novel object-aware voting approach is introduced, which considers both the region-wise appearance attention and the geometric projection distribution.
With the late fusion and the predicted 3D orientation and dimension, the 3D bounding boxes of objects can be detected from a single RGB image.
arXiv Detail & Related papers (2020-07-20T02:11:18Z) - SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint
Estimation [3.1542695050861544]
Estimating 3D orientation and translation of objects is essential for infrastructure-less autonomous navigation and driving.
We propose a novel 3D object detection method, named SMOKE, that combines a single keypoint estimate with regressed 3D variables.
Despite its structural simplicity, our proposed SMOKE network outperforms all existing monocular 3D detection methods on the KITTI dataset.
arXiv Detail & Related papers (2020-02-24T08:15:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.