DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention
- URL: http://arxiv.org/abs/2212.07849v1
- Date: Thu, 15 Dec 2022 14:18:47 GMT
- Title: DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention
- Authors: Zhipeng Luo, Changqing Zhou, Gongjie Zhang, Shijian Lu
- Abstract summary: 3D object detection with surround-view images is an essential task for autonomous driving.
We propose DETR4D, a Transformer-based framework that explores sparse attention and direct feature query for 3D object detection in multi-view images.
- Score: 50.11672196146829
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D object detection with surround-view images is an essential task for
autonomous driving. In this work, we propose DETR4D, a Transformer-based
framework that explores sparse attention and direct feature query for 3D object
detection in multi-view images. We design a novel projective cross-attention
mechanism for query-image interaction to address the limitations of existing
methods in terms of geometric cue exploitation and information loss for
cross-view objects. In addition, we introduce a heatmap generation technique
that bridges 3D and 2D spaces efficiently via query initialization.
Furthermore, unlike the common practice of fusing intermediate spatial features
for temporal aggregation, we provide a new perspective by introducing a novel
hybrid approach that performs cross-frame fusion over past object queries and
image features, enabling efficient and robust modeling of temporal information.
Extensive experiments on the nuScenes dataset demonstrate the effectiveness and
efficiency of the proposed DETR4D.
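The abstract does not spell out how projective cross-attention is computed, so the following is only a minimal sketch of the geometric step such a mechanism presumably relies on: projecting a query's 3D reference point into each camera with a pinhole model and keeping the views where it lands inside the image. The function name `visible_views`, the 3x4 projection matrices, and the image size are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Assumption: each camera is described by a 3x4 projection matrix
# (intrinsics times extrinsics), as in a standard pinhole model.
Matrix = Sequence[Sequence[float]]


def visible_views(
    point: Sequence[float],
    projections: Sequence[Matrix],
    width: int,
    height: int,
) -> List[Tuple[int, float, float]]:
    """Return (camera index, u, v) for every camera that sees the 3D point."""
    homogeneous = (point[0], point[1], point[2], 1.0)
    hits = []
    for cam_idx, P in enumerate(projections):
        # Multiply the 3x4 matrix by the homogeneous point.
        u_h, v_h, depth = (
            sum(p * h for p, h in zip(row, homogeneous)) for row in P
        )
        if depth <= 0:
            continue  # point lies behind this camera
        u, v = u_h / depth, v_h / depth
        if 0 <= u < width and 0 <= v < height:
            hits.append((cam_idx, u, v))
    return hits


# Hypothetical three-camera rig: front, rear, and a front camera shifted sideways.
front = [[100, 0, 50, 0], [0, 100, 50, 0], [0, 0, 1, 0]]
rear = [[100, 0, -50, 0], [0, 100, -50, 0], [0, 0, -1, 0]]
shifted = [[100, 0, 50, -100], [0, 100, 50, 0], [0, 0, 1, 0]]

views = visible_views((0, 0, 10), [front, rear, shifted], width=100, height=100)
print(views)
```

A point that projects into more than one view, as here, is the cross-view case the abstract refers to; a detector could sample image features at each returned pixel location instead of committing to a single camera.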
Related papers
- 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features [70.50665869806188]
3DiffTection is a state-of-the-art method for 3D object detection from single images.
We fine-tune a diffusion model to perform novel view synthesis conditioned on a single image.
We further train the model on target data with detection supervision.
arXiv Detail & Related papers (2023-11-07T23:46:41Z)
- FusionViT: Hierarchical 3D Object Detection via LiDAR-Camera Vision Transformer Fusion [8.168523242105763]
We introduce FusionViT, a novel vision transformer-based 3D object detection model.
Our FusionViT model achieves state-of-the-art performance and outperforms existing baseline methods.
arXiv Detail & Related papers (2023-11-07T00:12:01Z)
- Multi-Modal Dataset Acquisition for Photometrically Challenging Object [56.30027922063559]
This paper addresses the limitations of current datasets for 3D vision tasks in terms of accuracy, size, realism, and suitable imaging modalities for photometrically challenging objects.
We propose a novel annotation and acquisition pipeline that enhances existing 3D perception and 6D object pose datasets.
arXiv Detail & Related papers (2023-08-21T10:38:32Z)
- Modeling Continuous Motion for 3D Point Cloud Object Tracking [54.48716096286417]
This paper presents a novel approach that views each tracklet as a continuous stream.
At each timestamp, only the current frame is fed into the network to interact with multi-frame historical features stored in a memory bank.
To enhance the utilization of multi-frame features for robust tracking, a contrastive sequence enhancement strategy is proposed.
arXiv Detail & Related papers (2023-03-14T02:58:27Z)
- Multi-Projection Fusion and Refinement Network for Salient Object Detection in 360° Omnidirectional Image [141.10227079090419]
We propose a Multi-Projection Fusion and Refinement Network (MPFR-Net) to detect salient objects in 360° omnidirectional images.
MPFR-Net uses the equirectangular projection image and four corresponding cube-unfolding images as inputs.
Experimental results on two omnidirectional datasets demonstrate that the proposed approach outperforms the state-of-the-art methods both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-12-23T14:50:40Z)
- A Simple Baseline for Multi-Camera 3D Object Detection [94.63944826540491]
3D object detection with surrounding cameras has been a promising direction for autonomous driving.
We present SimMOD, a Simple baseline for Multi-camera Object Detection.
We conduct extensive experiments on the 3D object detection benchmark of nuScenes to demonstrate the effectiveness of SimMOD.
arXiv Detail & Related papers (2022-08-22T03:38:01Z)
- Single-Shot 3D Detection of Vehicles from Monocular RGB Images via Geometry Constrained Keypoints in Real-Time [6.82446891805815]
We propose a novel 3D single-shot object detection method for detecting vehicles in monocular RGB images.
Our approach lifts 2D detections to 3D space by predicting additional regression and classification parameters.
We test our approach on different datasets for autonomous driving and evaluate it using the challenging KITTI 3D Object Detection and the novel nuScenes Object Detection benchmarks.
arXiv Detail & Related papers (2020-06-23T15:10:19Z)