D-Align: Dual Query Co-attention Network for 3D Object Detection Based
on Multi-frame Point Cloud Sequence
- URL: http://arxiv.org/abs/2210.00087v1
- Date: Fri, 30 Sep 2022 20:41:25 GMT
- Title: D-Align: Dual Query Co-attention Network for 3D Object Detection Based
on Multi-frame Point Cloud Sequence
- Authors: Junhyung Lee, Junho Koh, Youngwoo Lee, Jun Won Choi
- Abstract summary: Conventional 3D object detectors detect objects using a set of points acquired over a fixed duration.
Recent studies have shown that the performance of object detection can be further enhanced by utilizing point cloud sequences.
We propose D-Align, which can effectively produce strong bird's-eye-view (BEV) features by aligning and aggregating the features obtained from a sequence of point sets.
- Score: 8.21339007493213
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: LiDAR sensors are widely used for 3D object detection in various mobile
robotics applications. LiDAR sensors continuously generate point cloud data in
real-time. Conventional 3D object detectors detect objects using a set of
points acquired over a fixed duration. However, recent studies have shown that
the performance of object detection can be further enhanced by utilizing
spatio-temporal information obtained from point cloud sequences. In this paper,
we propose a new 3D object detector, named D-Align, which can effectively
produce strong bird's-eye-view (BEV) features by aligning and aggregating the
features obtained from a sequence of point sets. The proposed method includes a
novel dual-query co-attention network that uses two types of queries, including
target query set (T-QS) and support query set (S-QS), to update the features of
target and support frames, respectively. D-Align aligns S-QS to T-QS based on
the temporal context features extracted from the adjacent feature maps and then
aggregates S-QS with T-QS using a gated attention mechanism. The dual queries
are updated through multiple attention layers to progressively enhance the
target frame features used to produce the detection results. Our experiments on
the nuScenes dataset show that the proposed D-Align method greatly improves the
performance of a single-frame baseline method and significantly outperforms the
latest 3D object detectors.
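The abstract's two-step update (align S-QS to T-QS, then fuse with a gated attention mechanism) can be illustrated with a minimal NumPy sketch. All shapes, the single-head dot-product alignment, and the gate projection `W_g` below are illustrative assumptions, not the paper's actual layers (D-Align's alignment uses temporal context features from adjacent feature maps, which this sketch does not model):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical sizes: N BEV query positions, feature dimension D.
N, D = 4, 8
t_qs = rng.standard_normal((N, D))   # target query set (T-QS), current frame
s_qs = rng.standard_normal((N, D))   # support query set (S-QS), a past frame

# Step 1: align S-QS to T-QS via one cross-attention pass
# (a stand-in for the paper's temporal-context-based alignment).
attn = softmax(t_qs @ s_qs.T / np.sqrt(D))   # (N, N) attention weights
s_aligned = attn @ s_qs                      # support features resampled at target queries

# Step 2: gated aggregation of the aligned support features into T-QS.
W_g = rng.standard_normal((2 * D, D)) * 0.1  # hypothetical gate projection
gate_in = np.concatenate([t_qs, s_aligned], axis=-1) @ W_g
gate = 1.0 / (1.0 + np.exp(-gate_in))        # sigmoid gate in (0, 1)
t_updated = gate * t_qs + (1.0 - gate) * s_aligned

print(t_updated.shape)
```

Stacking this update over multiple attention layers, as the abstract describes, would progressively refine `t_qs` before it is passed to the detection head.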
Related papers
- SEED: A Simple and Effective 3D DETR in Point Clouds [72.74016394325675]
We argue that 3D detection from point clouds is challenging due to their high sparsity and uneven distribution.
We propose a simple and effective 3D DETR method (SEED) for detecting 3D objects from point clouds.
arXiv Detail & Related papers (2024-07-15T14:21:07Z) - Boosting 3D Object Detection with Semantic-Aware Multi-Branch Framework [44.44329455757931]
In autonomous driving, LiDAR sensors are vital for acquiring 3D point clouds, providing reliable geometric information.
Traditional preprocessing sampling methods often ignore semantic features, leading to detail loss and ground-point interference.
We propose a multi-branch two-stage 3D object detection framework using a Semantic-aware Multi-branch Sampling (SMS) module and multi-view constraints.
arXiv Detail & Related papers (2024-07-08T09:25:45Z) - PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection [66.94819989912823]
We propose a point-trajectory transformer with long short-term memory for efficient temporal 3D object detection.
We use point clouds of current-frame objects and their historical trajectories as input to minimize the memory bank storage requirement.
We conduct extensive experiments on a large-scale dataset to demonstrate that our approach performs well against state-of-the-art methods.
arXiv Detail & Related papers (2023-12-13T18:59:13Z) - MGTANet: Encoding Sequential LiDAR Points Using Long Short-Term
Motion-Guided Temporal Attention for 3D Object Detection [8.305942415868042]
Most LiDAR sensors generate a sequence of point clouds in real-time.
Recent studies have revealed that substantial performance improvement can be achieved by exploiting the context present in a sequence of point sets.
We propose a novel 3D object detection architecture, which can encode point cloud sequences acquired by multiple successive scans.
arXiv Detail & Related papers (2022-12-01T11:24:47Z) - 3D Cascade RCNN: High Quality Object Detection in Point Clouds [122.42455210196262]
We present 3D Cascade RCNN, which allocates multiple detectors based on the voxelized point clouds in a cascade paradigm.
We validate the superiority of our proposed 3D Cascade RCNN, when comparing to state-of-the-art 3D object detection techniques.
arXiv Detail & Related papers (2022-11-15T15:58:36Z) - 3DMODT: Attention-Guided Affinities for Joint Detection & Tracking in 3D
Point Clouds [95.54285993019843]
We propose a method for joint detection and tracking of multiple objects in 3D point clouds.
Our model exploits temporal information employing multiple frames to detect objects and track them in a single network.
arXiv Detail & Related papers (2022-11-01T20:59:38Z) - AGO-Net: Association-Guided 3D Point Cloud Object Detection Network [86.10213302724085]
We propose a novel 3D detection framework that associates intact features for objects via domain adaptation.
We achieve new state-of-the-art performance on the KITTI 3D detection benchmark in both accuracy and speed.
arXiv Detail & Related papers (2022-08-24T16:54:38Z) - Joint 3D Object Detection and Tracking Using Spatio-Temporal
Representation of Camera Image and LiDAR Point Clouds [12.334725127696395]
We propose a new joint object detection and tracking framework for 3D object detection and tracking based on camera and LiDAR sensors.
The proposed method, referred to as 3D DetecTrack, enables the detector and tracker to cooperate to generate a spatio-temporal representation of the camera and LiDAR data.
arXiv Detail & Related papers (2021-12-14T02:38:45Z) - LiDAR-based Online 3D Video Object Detection with Graph-based Message
Passing and Spatiotemporal Transformer Attention [100.52873557168637]
3D object detectors usually focus on the single-frame detection, while ignoring the information in consecutive point cloud frames.
In this paper, we propose an end-to-end online 3D video object detector that operates on point sequences.
arXiv Detail & Related papers (2020-04-03T06:06:52Z) - Boundary-Aware Dense Feature Indicator for Single-Stage 3D Object
Detection from Point Clouds [32.916690488130506]
We propose a universal module that helps 3D detectors focus on the densest region of the point clouds in a boundary-aware manner.
Experiments on KITTI dataset show that DENFI improves the performance of the baseline single-stage detector remarkably.
arXiv Detail & Related papers (2020-04-01T01:21:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.