Temp-Frustum Net: 3D Object Detection with Temporal Fusion
- URL: http://arxiv.org/abs/2104.12106v1
- Date: Sun, 25 Apr 2021 09:08:14 GMT
- Title: Temp-Frustum Net: 3D Object Detection with Temporal Fusion
- Authors: Emeç Erçelik, Ekim Yurtsever and Alois Knoll
- Abstract summary: 3D object detection is a core component of automated driving systems.
Frame-by-frame 3D object detection suffers from noise, field-of-view obstruction, and sparsity.
We propose a novel Temporal Fusion Module to mitigate these problems.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D object detection is a core component of automated driving systems.
State-of-the-art methods fuse RGB imagery and LiDAR point cloud data
frame-by-frame for 3D bounding box regression. However, frame-by-frame 3D
object detection suffers from noise, field-of-view obstruction, and sparsity.
We propose a novel Temporal Fusion Module (TFM) to use information from
previous time-steps to mitigate these problems. First, a state-of-the-art
frustum network extracts point cloud features from raw RGB and LiDAR point
cloud data frame-by-frame. Then, our TFM module fuses these features with a
recurrent neural network. As a result, 3D object detection becomes robust
against single-frame failures and transient occlusions. Experiments on the
KITTI object tracking dataset show the effectiveness of the proposed TFM,
where we obtain improvements of ~6%, ~4%, and ~6% on the Car, Pedestrian,
and Cyclist classes, respectively, compared to frame-by-frame baselines.
Furthermore, ablation studies confirm that temporal fusion is the source of
these gains and show the effects of placing the TFM at different points in
the object detection pipeline.
Our code is open-source and available at
https://gitlab.lrz.de/emec_ercelik/temp-frustnet.
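The pipeline described in the abstract (a frustum network extracts per-frame point cloud features, then the TFM fuses them recurrently) can be illustrated with a minimal sketch. The Elman-style recurrence, feature dimension, and random toy weights below are illustrative assumptions, not the authors' trained implementation:

```python
import numpy as np

def temporal_fusion(frame_features, hidden_dim=None, seed=0):
    """Fuse per-frame feature vectors with a simple Elman-style RNN.

    frame_features: (T, D) array, one frustum feature vector per time step,
    ordered from oldest to current frame. Returns the fused feature for the
    current frame, which a detection head would consume downstream.
    """
    T, D = frame_features.shape
    H = hidden_dim or D
    rng = np.random.default_rng(seed)
    # Toy weights; in the actual detector these would be learned end-to-end.
    W_in = rng.standard_normal((D, H)) * 0.1
    W_h = rng.standard_normal((H, H)) * 0.1
    h = np.zeros(H)
    for x in frame_features:  # carry information from past frames forward
        h = np.tanh(x @ W_in + h @ W_h)
    return h  # temporally fused feature for the current frame
```

Because the hidden state accumulates evidence across frames, a single noisy or occluded frame perturbs the fused feature less than it would a frame-by-frame detector, which is the robustness argument made in the abstract.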
Related papers
- CRT-Fusion: Camera, Radar, Temporal Fusion Using Motion Information for 3D Object Detection [9.509625131289429]
We introduce CRT-Fusion, a novel framework that integrates temporal information into radar-camera fusion.
CRT-Fusion achieves state-of-the-art performance for radar-camera-based 3D object detection.
arXiv Detail & Related papers (2024-11-05T11:25:19Z)
- PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection [66.94819989912823]
We propose a point-trajectory transformer with long short-term memory for efficient temporal 3D object detection.
We use point clouds of current-frame objects and their historical trajectories as input to minimize the memory bank storage requirement.
We conduct extensive experiments on the large-scale dataset to demonstrate that our approach performs well against state-of-the-art methods.
arXiv Detail & Related papers (2023-12-13T18:59:13Z) - LEF: Late-to-Early Temporal Fusion for LiDAR 3D Object Detection [40.267769862404684]
We propose a late-to-early recurrent feature fusion scheme for 3D object detection using temporal LiDAR point clouds.
Our main motivation is fusing object-aware latent embeddings into the early stages of a 3D object detector.
arXiv Detail & Related papers (2023-09-28T21:58:25Z) - Frame Fusion with Vehicle Motion Prediction for 3D Object Detection [18.354273907772278]
In LiDAR-based 3D detection, history point clouds contain rich temporal information helpful for future prediction.
We propose a detection enhancement method, namely FrameFusion, which improves 3D object detection results by fusing history frames.
arXiv Detail & Related papers (2023-06-19T04:57:53Z) - Multi-Modal 3D Object Detection by Box Matching [109.43430123791684]
We propose a novel Fusion network by Box Matching (FBMNet) for multi-modal 3D detection.
With the learned assignments between 3D and 2D object proposals, fusion for detection can be performed effectively by combining their ROI features.
arXiv Detail & Related papers (2023-05-12T18:08:51Z) - 3DMODT: Attention-Guided Affinities for Joint Detection & Tracking in 3D
Point Clouds [95.54285993019843]
We propose a method for joint detection and tracking of multiple objects in 3D point clouds.
Our model exploits temporal information employing multiple frames to detect objects and track them in a single network.
arXiv Detail & Related papers (2022-11-01T20:59:38Z) - MSF3DDETR: Multi-Sensor Fusion 3D Detection Transformer for Autonomous
Driving [0.0]
We propose MSF3DDETR: Multi-Sensor Fusion 3D Detection Transformer architecture to fuse image and LiDAR features to improve the detection accuracy.
Our end-to-end single-stage, anchor-free and NMS-free network takes in multi-view images and LiDAR point clouds and predicts 3D bounding boxes.
MSF3DDETR network is trained end-to-end on the nuScenes dataset using Hungarian algorithm based bipartite matching and set-to-set loss inspired by DETR.
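The Hungarian-algorithm-based bipartite matching mentioned above assigns each ground-truth box to a distinct prediction by minimizing a pairwise cost. The sketch below uses a plain L1 box cost and brute-force search over permutations; both are simplifying assumptions (the DETR-style set-to-set loss also includes classification terms, and real detectors use the O(n³) Hungarian algorithm):

```python
from itertools import permutations

def l1_cost(box_a, box_b):
    """L1 distance between two boxes, e.g. given as (x, y, z, l, w, h, yaw)."""
    return sum(abs(a - b) for a, b in zip(box_a, box_b))

def bipartite_match(preds, gts):
    """Assign each ground-truth box to a distinct prediction so the total
    cost is minimal. Brute force over permutations -- fine for this tiny
    sketch; DETR-style detectors use the Hungarian algorithm instead.

    Returns (assignment, cost) where assignment[g] is the index of the
    prediction matched to ground truth g.
    """
    best_assign, best_cost = None, float("inf")
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(l1_cost(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_assign, best_cost = list(perm), cost
    return best_assign, best_cost
```

The set-to-set loss is then computed only on matched pairs, which removes the need for NMS at inference time.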
arXiv Detail & Related papers (2022-10-27T10:55:15Z)
- 3D-VField: Learning to Adversarially Deform Point Clouds for Robust 3D Object Detection [111.32054128362427]
In safety-critical settings, robustness on out-of-distribution and long-tail samples is fundamental to circumvent dangerous issues.
We substantially improve the generalization of 3D object detectors to out-of-domain data by taking into account deformed point clouds during training.
We propose and share open source CrashD: a synthetic dataset of realistic damaged and rare cars.
arXiv Detail & Related papers (2021-12-09T08:50:54Z)
- M3DeTR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers [78.48081972698888]
We present M3DeTR, which combines different point cloud representations with different feature scales based on multi-scale feature pyramids.
M3DeTR is the first approach that unifies multiple point cloud representations, feature scales, as well as models mutual relationships between point clouds simultaneously using transformers.
arXiv Detail & Related papers (2021-04-24T06:48:23Z)
- An LSTM Approach to Temporal 3D Object Detection in LiDAR Point Clouds [16.658604637005535]
We propose a sparse LSTM-based multi-frame 3d object detection algorithm.
We use a U-Net style 3D sparse convolution network to extract features for each frame's LiDAR point-cloud.
arXiv Detail & Related papers (2020-07-24T07:34:15Z)
- LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention [100.52873557168637]
3D object detectors usually focus on the single-frame detection, while ignoring the information in consecutive point cloud frames.
In this paper, we propose an end-to-end online 3D video object detector that operates on point sequences.
arXiv Detail & Related papers (2020-04-03T06:06:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.