Perspective-aware Convolution for Monocular 3D Object Detection
- URL: http://arxiv.org/abs/2308.12938v1
- Date: Thu, 24 Aug 2023 17:25:36 GMT
- Title: Perspective-aware Convolution for Monocular 3D Object Detection
- Authors: Jia-Quan Yu, Soo-Chang Pei
- Abstract summary: We propose a novel perspective-aware convolutional layer that captures long-range dependencies in images.
By enforcing convolutional kernels to extract features along the depth axis of every image pixel, we incorporate perspective information into the network architecture.
We demonstrate improved performance on the KITTI3D dataset, achieving a 23.9% average precision in the easy benchmark.
- Score: 2.33877878310217
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Monocular 3D object detection is a crucial and challenging task for
autonomous driving vehicles, as it uses only a single camera image to infer
3D objects in the scene. To address the difficulty of predicting depth using
only pictorial clues, we propose a novel perspective-aware convolutional layer
that captures long-range dependencies in images. By enforcing convolutional
kernels to extract features along the depth axis of every image pixel, we
incorporate perspective information into the network architecture. We integrate
our perspective-aware convolutional layer into a 3D object detector and
demonstrate improved performance on the KITTI3D dataset, achieving a 23.9%
average precision in the easy benchmark. These results underscore the
importance of modeling scene clues for accurate depth inference and highlight
the benefits of incorporating scene structure in network design. Our
perspective-aware convolutional layer has the potential to enhance object
detection accuracy by providing more precise and context-aware feature
extraction.
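The idea of sampling features "along the depth axis of every image pixel" can be illustrated with a small geometric sketch. The abstract does not specify how the layer is built, so everything below is an assumption made for illustration only: the depth axis is approximated by the image-plane line from a pixel toward a vanishing point, and the hypothetical helper `depth_axis_taps` (with its `kernel_size` and `step` parameters) only computes where a 1D kernel would sample. It is not the authors' implementation.

```python
import math


def depth_axis_taps(x, y, vanishing_point, kernel_size=5, step=2.0):
    """Return kernel sampling positions along the line from (x, y) toward a vanishing point.

    Assumption (not from the paper): the per-pixel depth axis is approximated
    by the image-plane direction from the pixel to the vanishing point.
    """
    vx, vy = vanishing_point
    dx, dy = vx - x, vy - y
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        # The pixel coincides with the vanishing point: no direction is defined,
        # so all taps collapse onto the pixel itself.
        ux, uy = 0.0, 0.0
    else:
        ux, uy = dx / norm, dy / norm
    half = kernel_size // 2
    # Taps are centered on the pixel and spaced `step` pixels apart.
    return [(x + k * step * ux, y + k * step * uy) for k in range(-half, half + 1)]


# A pixel directly left of the vanishing point samples along a horizontal line.
taps = depth_axis_taps(0, 2, (5.0, 2.0), kernel_size=3, step=1.0)
print(taps)  # [(-1.0, 2.0), (0.0, 2.0), (1.0, 2.0)]
```

In a real network the features at these fractional positions would still have to be interpolated and weighted by learned kernel parameters; the sketch covers only the sampling geometry.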
Related papers
- OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection [102.0744303467713]
We propose a new multi-view 3D object detector named OPEN.
Our main idea is to effectively inject object-wise depth information into the network through our proposed object-wise position embedding.
OPEN achieves a new state-of-the-art performance with 64.4% NDS and 56.7% mAP on the nuScenes test benchmark.
arXiv Detail & Related papers (2024-07-15T14:29:15Z)
- 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features [70.50665869806188]
3DiffTection is a state-of-the-art method for 3D object detection from single images.
We fine-tune a diffusion model to perform novel view synthesis conditioned on a single image.
We further train the model on target data with detection supervision.
arXiv Detail & Related papers (2023-11-07T23:46:41Z)
- MoGDE: Boosting Mobile Monocular 3D Object Detection with Ground Depth Estimation [20.697822444708237]
We propose a novel Mono3D framework, called MoGDE, which constantly estimates the corresponding ground depth of an image.
MoGDE yields the best performance compared with the state-of-the-art methods by a large margin and is ranked number one on the KITTI 3D benchmark.
arXiv Detail & Related papers (2023-03-23T04:06:01Z)
- MonoPGC: Monocular 3D Object Detection with Pixel Geometry Contexts [6.639648061168067]
We propose MonoPGC, a novel end-to-end Monocular 3D object detection framework with rich Pixel Geometry Contexts.
We introduce the pixel depth estimation as our auxiliary task and design depth cross-attention pyramid module (DCPM) to inject local and global depth geometry knowledge into visual features.
In addition, we present the depth-space-aware transformer (DSAT) to integrate 3D space position and depth-aware features efficiently.
arXiv Detail & Related papers (2023-02-21T09:21:58Z)
- Surface-biased Multi-Level Context 3D Object Detection [1.9723551683930771]
This work addresses the object detection task in 3D point clouds using a highly efficient, surface-biased feature extraction method (wang2022rbgnet).
We propose a 3D object detector that extracts accurate feature representations of object candidates and leverages self-attention on point patches, on object candidates, and on the global 3D scene.
arXiv Detail & Related papers (2023-02-13T11:50:04Z)
- OA-BEV: Bringing Object Awareness to Bird's-Eye-View Representation for Multi-Camera 3D Object Detection [78.38062015443195]
OA-BEV is a network that can be plugged into the BEV-based 3D object detection framework.
Our method achieves consistent improvements over the BEV-based baselines in terms of both average precision and nuScenes detection score.
arXiv Detail & Related papers (2023-01-13T06:02:31Z)
- AGO-Net: Association-Guided 3D Point Cloud Object Detection Network [86.10213302724085]
We propose a novel 3D detection framework that associates intact features for objects via domain adaptation.
We achieve new state-of-the-art performance on the KITTI 3D detection benchmark in both accuracy and speed.
arXiv Detail & Related papers (2022-08-24T16:54:38Z)
- MDS-Net: A Multi-scale Depth Stratification Based Monocular 3D Object Detection Algorithm [4.958840734249869]
This paper proposes a one-stage monocular 3D object detection algorithm based on multi-scale depth stratification.
Experiments on the KITTI benchmark show that the MDS-Net outperforms the existing monocular 3D detection methods in 3D detection and BEV detection tasks.
arXiv Detail & Related papers (2022-01-12T07:11:18Z)
- Aug3D-RPN: Improving Monocular 3D Object Detection by Synthetic Images with Virtual Depth [64.29043589521308]
We propose a rendering module to augment the training data by synthesizing images with virtual-depths.
The rendering module takes as input the RGB image and its corresponding sparse depth image, and outputs a variety of photo-realistic synthetic images.
Besides, we introduce an auxiliary module to improve the detection model by jointly optimizing it through a depth estimation task.
arXiv Detail & Related papers (2021-07-28T11:00:47Z)
- VR3Dense: Voxel Representation Learning for 3D Object Detection and Monocular Dense Depth Reconstruction [0.951828574518325]
We introduce a method for jointly training 3D object detection and monocular dense depth reconstruction neural networks.
It takes a LiDAR point cloud and a single RGB image as inputs during inference and produces object pose predictions as well as a densely reconstructed depth map.
While our object detection is trained in a supervised manner, the depth prediction network is trained with both self-supervised and supervised loss functions.
arXiv Detail & Related papers (2021-04-13T04:25:54Z)
- Deep Continuous Fusion for Multi-Sensor 3D Object Detection [103.5060007382646]
We propose a novel 3D object detector that can exploit both LIDAR and cameras to perform very accurate localization.
We design an end-to-end learnable architecture that exploits continuous convolutions to fuse image and LIDAR feature maps at different levels of resolution.
arXiv Detail & Related papers (2020-12-20T18:43:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.