Frame Fusion with Vehicle Motion Prediction for 3D Object Detection
- URL: http://arxiv.org/abs/2306.10699v1
- Date: Mon, 19 Jun 2023 04:57:53 GMT
- Title: Frame Fusion with Vehicle Motion Prediction for 3D Object Detection
- Authors: Xirui Li, Feng Wang, Naiyan Wang, Chao Ma
- Abstract summary: In LiDAR-based 3D detection, history point clouds contain rich temporal information helpful for future prediction.
We propose a detection enhancement method, namely FrameFusion, which improves 3D object detection results by fusing history frames.
- Score: 18.354273907772278
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In LiDAR-based 3D detection, history point clouds contain rich
temporal information helpful for future prediction. In the same way, history
detections should contribute to future detections. In this paper, we propose a
detection enhancement method, namely FrameFusion, which improves 3D object
detection results by fusing history frames. In FrameFusion, we "forward"
history frames to the current frame and apply weighted Non-Maximum Suppression
on the dense bounding boxes to obtain a fused frame with merged boxes. To
"forward" frames, we use vehicle motion models to estimate the future poses of
the bounding boxes. However, the commonly used constant velocity model
naturally fails on turning vehicles, so we explore two vehicle motion models
to address this issue. On the Waymo Open Dataset, our FrameFusion method
consistently improves the performance of various 3D detectors by about 2
vehicle level 2 APH with negligible latency, and slightly enhances the
performance of the temporal fusion method MPPNet. We also conduct extensive
experiments on motion model selection.
Related papers
- CRT-Fusion: Camera, Radar, Temporal Fusion Using Motion Information for 3D Object Detection [9.509625131289429] (2024-11-05)
We introduce CRT-Fusion, a novel framework that integrates temporal information into radar-camera fusion.
CRT-Fusion achieves state-of-the-art performance for radar-camera-based 3D object detection.
- PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection [66.94819989912823] (2023-12-13)
We propose a point-trajectory transformer with long short-term memory for efficient temporal 3D object detection.
We use point clouds of current-frame objects and their historical trajectories as input to minimize the memory bank storage requirement.
We conduct extensive experiments on the large-scale dataset to demonstrate that our approach performs well against state-of-the-art methods.
- LEF: Late-to-Early Temporal Fusion for LiDAR 3D Object Detection [40.267769862404684] (2023-09-28)
We propose a late-to-early recurrent feature fusion scheme for 3D object detection using temporal LiDAR point clouds.
Our main motivation is fusing object-aware latent embeddings into the early stages of a 3D object detector.
- DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds [55.755450273390004] (2023-06-09)
Existing offboard 3D detectors always follow a modular pipeline design to take advantage of unlimited sequential point clouds.
We have found that the full potential of offboard 3D detectors is not explored mainly due to two reasons: (1) the onboard multi-object tracker cannot generate sufficiently complete object trajectories, and (2) the motion state of objects poses an inevitable challenge for the object-centric refining stage.
To tackle these problems, we propose a novel paradigm of offboard 3D object detection, named DetZero.
- TrajectoryFormer: 3D Object Tracking Transformer with Predictive Trajectory Hypotheses [51.60422927416087] (2023-06-09)
3D multi-object tracking (MOT) is vital for many applications, including autonomous driving vehicles and service robots.
We present TrajectoryFormer, a novel point-cloud-based 3D MOT framework.
- GOOD: General Optimization-based Fusion for 3D Object Detection via LiDAR-Camera Object Candidates [10.534984939225014] (2023-03-17)
3D object detection serves as the core basis of the perception tasks in autonomous driving.
GOOD is a general optimization-based fusion framework that achieves satisfactory detection without training additional models.
Experiments on both the nuScenes and KITTI datasets show that GOOD outperforms PointPillars by 9.1% in mAP.
- Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection [58.81316192862618] (2022-05-30)
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
Fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the state-of-the-art fusion methods for the first time.
- Exploring Optical-Flow-Guided Motion and Detection-Based Appearance for Temporal Sentence Grounding [61.57847727651068] (2022-03-06)
Temporal sentence grounding aims to semantically localize a target segment in an untrimmed video according to a given sentence query.
Most previous works focus on learning frame-level features of each whole frame in the entire video and directly match them with the textual information.
We propose a novel Motion- and Appearance-guided 3D Semantic Reasoning Network (MA3SRN), which incorporates optical-flow-guided motion-aware, detection-based appearance-aware, and 3D-aware object-level features.
- Temp-Frustum Net: 3D Object Detection with Temporal Fusion [0.0] (2021-04-25)
3D object detection is a core component of automated driving systems.
Frame-by-frame 3D object detection suffers from noise, field-of-view obstruction, and sparsity.
We propose a novel Temporal Fusion Module to mitigate these problems.
- Monocular Quasi-Dense 3D Object Tracking [99.51683944057191] (2021-03-12)
A reliable and accurate 3D tracking framework is essential for predicting the future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.