MSF: Motion-guided Sequential Fusion for Efficient 3D Object Detection
from Point Cloud Sequences
- URL: http://arxiv.org/abs/2303.08316v1
- Date: Wed, 15 Mar 2023 02:10:27 GMT
- Title: MSF: Motion-guided Sequential Fusion for Efficient 3D Object Detection
from Point Cloud Sequences
- Authors: Chenhang He, Ruihuang Li, Yabin Zhang, Shuai Li, Lei Zhang
- Abstract summary: Point cloud sequences are commonly used to accurately detect 3D objects in applications such as autonomous driving.
Current top-performing multi-frame detectors mostly follow a Detect-and-Fuse framework, which extracts features from each frame of the sequence and fuses them to detect the objects in the current frame.
We propose an efficient Motion-guided Sequential Fusion (MSF) method, which exploits the continuity of object motion to mine useful sequential contexts for object detection in the current frame.
- Score: 21.50329070835023
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Point cloud sequences are commonly used to accurately detect 3D objects in
applications such as autonomous driving. Current top-performing multi-frame
detectors mostly follow a Detect-and-Fuse framework, which extracts features
from each frame of the sequence and fuses them to detect the objects in the
current frame. However, this inevitably leads to redundant computation since
adjacent frames are highly correlated. In this paper, we propose an efficient
Motion-guided Sequential Fusion (MSF) method, which exploits the continuity of
object motion to mine useful sequential contexts for object detection in the
current frame. We first generate 3D proposals on the current frame and
propagate them to preceding frames based on the estimated velocities. The
points-of-interest are then pooled from the sequence and encoded as proposal
features. A novel Bidirectional Feature Aggregation (BiFA) module is further
proposed to facilitate the interactions of proposal features across frames.
Besides, we optimize the point cloud pooling by a voxel-based sampling
technique so that millions of points can be processed in several milliseconds.
The proposed MSF method achieves not only better efficiency than other
multi-frame detectors but also leading accuracy, with 83.12% and 78.30% mAP on
the LEVEL1 and LEVEL2 test sets of Waymo Open Dataset, respectively. Code can
be found at https://github.com/skyhehe123/MSF.
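The core idea in the abstract — generating proposals on the current frame, propagating them backward through the sequence with estimated velocities, and pooling points-of-interest from each frame — can be sketched as follows. This is a minimal illustrative NumPy sketch, not the authors' implementation; all function names, the box parameterization, the constant-velocity assumption, and the axis-aligned pooling are assumptions for clarity (the actual method pools from rotated boxes and uses a voxel-based sampling technique).

```python
import numpy as np

def propagate_proposals(boxes, velocities, num_frames, dt=0.1):
    """Propagate current-frame 3D proposals to preceding frames.

    boxes:      (N, 7) array of [x, y, z, l, w, h, yaw] in the current frame.
    velocities: (N, 2) estimated per-object velocities [vx, vy] in m/s.
    Returns a list of (N, 7) arrays, one per preceding frame, with each box
    center shifted backward in time along the estimated motion.
    """
    propagated = []
    for k in range(1, num_frames + 1):
        shifted = boxes.copy()
        # Constant-velocity assumption: move the center back by k * dt.
        shifted[:, 0] -= velocities[:, 0] * k * dt
        shifted[:, 1] -= velocities[:, 1] * k * dt
        propagated.append(shifted)
    return propagated

def pool_points(points, box, margin=0.5):
    """Gather points-of-interest inside an enlarged box.

    Simplified to an axis-aligned test (ignoring yaw); the margin enlarges
    the box so that points near the boundary are also pooled.
    """
    x, y, z, l, w, h, _ = box
    mask = (
        (np.abs(points[:, 0] - x) <= l / 2 + margin)
        & (np.abs(points[:, 1] - y) <= w / 2 + margin)
        & (np.abs(points[:, 2] - z) <= h / 2 + margin)
    )
    return points[mask]
```

The pooled points from each frame would then be encoded as per-frame proposal features, which the BiFA module exchanges across frames.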
Related papers
- Multiway Point Cloud Mosaicking with Diffusion and Global Optimization [74.3802812773891]
We introduce a novel framework for multiway point cloud mosaicking (named Wednesday).
At the core of our approach is ODIN, a learned pairwise registration algorithm that identifies overlaps and refines attention scores.
Tested on four diverse, large-scale datasets, our method achieves state-of-the-art pairwise and rotation registration results by a large margin on all benchmarks.
arXiv Detail & Related papers (2024-03-30T17:29:13Z)
- PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection [66.94819989912823]
We propose a point-trajectory transformer with long short-term memory for efficient temporal 3D object detection.
We use point clouds of current-frame objects and their historical trajectories as input to minimize the memory bank storage requirement.
We conduct extensive experiments on a large-scale dataset to demonstrate that our approach performs well against state-of-the-art methods.
arXiv Detail & Related papers (2023-12-13T18:59:13Z)
- Modeling Continuous Motion for 3D Point Cloud Object Tracking [54.48716096286417]
This paper presents a novel approach that views each tracklet as a continuous stream.
At each timestamp, only the current frame is fed into the network to interact with multi-frame historical features stored in a memory bank.
To enhance the utilization of multi-frame features for robust tracking, a contrastive sequence enhancement strategy is proposed.
arXiv Detail & Related papers (2023-03-14T02:58:27Z)
- 3DMODT: Attention-Guided Affinities for Joint Detection & Tracking in 3D Point Clouds [95.54285993019843]
We propose a method for joint detection and tracking of multiple objects in 3D point clouds.
Our model exploits temporal information employing multiple frames to detect objects and track them in a single network.
arXiv Detail & Related papers (2022-11-01T20:59:38Z)
- FFPA-Net: Efficient Feature Fusion with Projection Awareness for 3D Object Detection [19.419030878019974]
Unstructured 3D point clouds are filled into the 2D plane, and 3D point cloud features are extracted faster using projection-aware convolution layers.
The corresponding indexes between different sensor signals are established in advance in the data preprocessing.
Two new plug-and-play fusion modules, LiCamFuse and BiLiCamFuse, are proposed.
arXiv Detail & Related papers (2022-09-15T16:13:19Z)
- TransPillars: Coarse-to-Fine Aggregation for Multi-Frame 3D Object Detection [47.941714033657675]
3D object detection using point clouds has attracted increasing attention due to its wide applications in autonomous driving and robotics.
We design TransPillars, a novel transformer-based feature aggregation technique that exploits temporal features of consecutive point cloud frames.
Our proposed TransPillars achieves state-of-the-art performance as compared to existing multi-frame detection approaches.
arXiv Detail & Related papers (2022-08-04T15:41:43Z)
- MPPNet: Multi-Frame Feature Intertwining with Proxy Points for 3D Temporal Object Detection [44.619039588252676]
We present a flexible and high-performance 3D detection framework, named MPPNet, for 3D temporal object detection with point cloud sequences.
We propose a novel three-hierarchy framework with proxy points for multi-frame feature encoding and interactions to achieve better detection.
Our approach outperforms state-of-the-art methods with large margins when applied to both short (e.g., 4-frame) and long (e.g., 16-frame) point cloud sequences.
arXiv Detail & Related papers (2022-05-12T09:38:42Z)
- Segment as Points for Efficient Online Multi-Object Tracking and Segmentation [66.03023110058464]
We propose a highly effective method for learning instance embeddings based on segments by converting the compact image representation to an unordered 2D point cloud representation.
Our method generates a new tracking-by-points paradigm where discriminative instance embeddings are learned from randomly selected points rather than images.
The resulting online MOTS framework, named PointTrack, surpasses all the state-of-the-art methods by large margins.
arXiv Detail & Related papers (2020-07-03T08:29:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.