SUIT: Learning Significance-guided Information for 3D Temporal Detection
- URL: http://arxiv.org/abs/2307.01807v1
- Date: Tue, 4 Jul 2023 16:22:10 GMT
- Title: SUIT: Learning Significance-guided Information for 3D Temporal Detection
- Authors: Zheyuan Zhou, Jiachen Lu, Yihan Zeng, Hang Xu, Li Zhang
- Abstract summary: We learn Significance-gUided Information for 3D Temporal detection (SUIT), which distills temporal information into sparse features for fusion across frames.
We evaluate our method on the large-scale nuScenes and Waymo datasets, where SUIT not only significantly reduces the memory and computation cost of temporal fusion, but also outperforms state-of-the-art baselines.
- Score: 15.237488449422008
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D object detection from LiDAR point cloud is of critical importance for
autonomous driving and robotics. While sequential point clouds have the potential
to enhance 3D perception through temporal information, utilizing these temporal
features effectively and efficiently remains a challenging problem. Based on
the observation that foreground information is sparsely distributed in
LiDAR scenes, we believe sufficient knowledge can be provided by a sparse format
rather than dense maps. To this end, we propose to learn Significance-gUided
Information for 3D Temporal detection (SUIT), which distills temporal
information into sparse features for fusion across frames.
Specifically, we first introduce a significance sampling mechanism that extracts
information-rich yet sparse features based on predicted object centroids. On
top of that, we present an explicit geometric transformation learning
technique, which learns the object-centric transformations among sparse
features across frames. We evaluate our method on the large-scale nuScenes and
Waymo datasets, where SUIT not only significantly reduces the memory and
computation cost of temporal fusion, but also outperforms state-of-the-art
baselines.
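The abstract stops short of implementation detail, but the significance sampling step can be sketched as follows: rank dense bird's-eye-view (BEV) features by a significance score tied to the predicted object centroids and keep only the top-k as the sparse temporal state. This is a minimal sketch assuming a PyTorch BEV backbone; the score map, `top_k` value, and function names are illustrative, not the paper's API.

```python
import torch

def sample_significant_features(bev_feats, significance, top_k=512):
    """Keep only the top-k most significant BEV features as a sparse set.

    bev_feats:    (C, H, W) dense feature map from the LiDAR backbone.
    significance: (H, W) score map, e.g. a centroid heatmap predicted by
                  the detection head (an assumption; the abstract does not
                  specify how significance is scored).
    Returns sparse features (top_k, C), their (row, col) grid coordinates,
    and their scores.
    """
    C, H, W = bev_feats.shape
    scores, idx = torch.topk(significance.reshape(-1), k=top_k)
    rows = torch.div(idx, W, rounding_mode="floor")
    cols = idx % W
    coords = torch.stack((rows, cols), dim=1)            # (top_k, 2)
    sparse_feats = bev_feats.reshape(C, -1)[:, idx].t()  # (top_k, C)
    return sparse_feats, coords, scores

# Toy usage: a 64-channel 128x128 BEV map reduced to 512 sparse features.
feats = torch.randn(64, 128, 128)
heatmap = torch.rand(128, 128)
sparse, coords, _ = sample_significant_features(feats, heatmap)
print(sparse.shape, coords.shape)  # torch.Size([512, 64]) torch.Size([512, 2])
```

Only this sparse set would then be carried across frames and aligned by the learned object-centric transformations, which is where the claimed memory and computation savings of temporal fusion come from.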
Related papers
- Future Does Matter: Boosting 3D Object Detection with Temporal Motion Estimation in Point Cloud Sequences [25.74000325019015]
We introduce a novel LiDAR 3D object detection framework, namely LiSTM, to facilitate spatial-temporal feature learning with cross-frame motion forecasting information.
We have conducted experiments on the Waymo and nuScenes datasets to demonstrate that the proposed framework achieves superior 3D detection performance.
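As a rough illustration of what cross-frame motion forecasting contributes, the sketch below propagates past object centers to the current timestamp with a constant-velocity model; this is a generic stand-in for illustration, not LiSTM's actual motion estimator.

```python
import numpy as np

def forecast_centers(prev_centers, prev_velocities, dt):
    """Constant-velocity forecast of past object centers into the current
    frame -- a generic stand-in for cross-frame motion cues (LiSTM's real
    motion model is not described in this summary).

    prev_centers:    (N, 3) xyz positions at time t - dt, in metres.
    prev_velocities: (N, 3) estimated velocities, in m/s.
    dt:              time gap between frames, in seconds.
    """
    return prev_centers + prev_velocities * dt

centers = np.array([[10.0, 2.0, 0.0]])
vels = np.array([[5.0, 0.0, 0.0]])           # moving 5 m/s along x
print(forecast_centers(centers, vels, 0.1))  # [[10.5  2.   0. ]]
```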
arXiv Detail & Related papers (2024-09-06T16:29:04Z)
- TASeg: Temporal Aggregation Network for LiDAR Semantic Segmentation [80.13343299606146]
We propose a Temporal LiDAR Aggregation and Distillation (TLAD) algorithm, which leverages historical priors to assign different aggregation steps for different classes.
To make full use of temporal images, we design a Temporal Image Aggregation and Fusion (TIAF) module, which can greatly expand the camera FOV.
We also develop a Static-Moving Switch Augmentation (SMSA) algorithm, which utilizes sufficient temporal information to enable objects to switch their motion states freely.
arXiv Detail & Related papers (2024-07-13T03:00:16Z)
- TimePillars: Temporally-Recurrent 3D LiDAR Object Detection [8.955064958311517]
TimePillars is a temporally-recurrent object detection pipeline.
It exploits the pillar representation of LiDAR data across time.
We show how basic building blocks are enough to achieve robust and efficient results.
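Below is a minimal sketch of the pillar representation the pipeline builds on (PointPillars-style mean pooling of point features into a BEV grid); the recurrent temporal fusion itself is omitted, and the grid size and cell resolution are assumed values.

```python
import torch

def pillarize(points, feats, grid=(128, 128), cell=0.5):
    """Scatter per-point features into a BEV pillar grid by mean pooling.

    points: (N, 3) xyz in metres (assumed non-negative here for brevity).
    feats:  (N, C) per-point features.
    Returns a dense (C, H, W) BEV map of pillar features.
    """
    H, W = grid
    ix = (points[:, 0] / cell).long().clamp(0, H - 1)
    iy = (points[:, 1] / cell).long().clamp(0, W - 1)
    flat = ix * W + iy                             # (N,) pillar index per point
    C = feats.shape[1]
    bev = torch.zeros(H * W, C)
    cnt = torch.zeros(H * W, 1)
    bev.index_add_(0, flat, feats)                 # sum features per pillar
    cnt.index_add_(0, flat, torch.ones(len(flat), 1))
    bev = bev / cnt.clamp(min=1)                   # mean over points in a pillar
    return bev.reshape(H, W, C).permute(2, 0, 1)

# Toy usage: 1000 random points with 9-dim features into a 128x128 grid.
pts = torch.rand(1000, 3) * 64.0
f = torch.randn(1000, 9)
print(pillarize(pts, f).shape)  # torch.Size([9, 128, 128])
```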
arXiv Detail & Related papers (2023-12-22T10:25:27Z)
- AGO-Net: Association-Guided 3D Point Cloud Object Detection Network [86.10213302724085]
We propose a novel 3D detection framework that associates intact features for objects via domain adaptation.
We achieve new state-of-the-art performance on the KITTI 3D detection benchmark in both accuracy and speed.
arXiv Detail & Related papers (2022-08-24T16:54:38Z)
- Learning Spatial and Temporal Variations for 4D Point Cloud Segmentation [0.39373541926236766]
We argue that the temporal information across the frames provides crucial knowledge for 3D scene perception.
We design a temporal variation-aware module and a temporal voxel-point refiner to capture the temporal variation in the 4D point cloud.
arXiv Detail & Related papers (2022-07-11T07:36:26Z)
- Efficient Spatial-Temporal Information Fusion for LiDAR-Based 3D Moving Object Segmentation [23.666607237164186]
We propose a novel deep neural network exploiting both spatial-temporal information and different representation modalities of LiDAR scans to improve LiDAR-MOS performance.
Specifically, we first use a range image-based dual-branch structure to separately deal with spatial and temporal information.
We also use a point refinement module via 3D sparse convolution to fuse the information from both LiDAR range image and point cloud representations.
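The range-image branch relies on the standard spherical projection of a LiDAR sweep; a minimal version is sketched below, with the sensor parameters (64 beams, vertical FOV bounds) being typical assumptions rather than values from the paper.

```python
import numpy as np

def to_range_image(points, H=64, W=1024, fov_up=3.0, fov_down=-25.0):
    """Project a LiDAR sweep (N, 3) into an (H, W) range image.

    Standard spherical projection: column from azimuth, row from elevation.
    fov_up/fov_down are in degrees; when several points fall into the same
    pixel, the last one written wins (real pipelines usually keep the closest).
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)                         # azimuth in [-pi, pi]
    pitch = np.arcsin(z / np.maximum(r, 1e-8))     # elevation
    up, down = np.radians(fov_up), np.radians(fov_down)
    u = 0.5 * (1.0 - yaw / np.pi) * W              # [0, W)
    v = (up - pitch) / (up - down) * H             # [0, H)
    u = np.clip(np.floor(u), 0, W - 1).astype(int)
    v = np.clip(np.floor(v), 0, H - 1).astype(int)
    img = np.zeros((H, W), dtype=np.float32)
    img[v, u] = r
    return img

# Toy usage: 10k random points around the sensor.
pts = np.random.randn(10000, 3) * [20.0, 20.0, 1.0]
print(to_range_image(pts).shape)  # (64, 1024)
```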
arXiv Detail & Related papers (2022-07-05T17:59:17Z)
- Learning-based Point Cloud Registration for 6D Object Pose Estimation in the Real World [55.7340077183072]
We tackle the task of estimating the 6D pose of an object from point cloud data.
Recent learning-based approaches to this task have shown great success on synthetic datasets, but fail to generalize to real-world data.
We analyze the causes of these failures, which we trace back to the difference between the feature distributions of the source and target point clouds.
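For context on the pose task itself: given point correspondences, the rigid transform has the classical closed-form Kabsch/SVD solution sketched below. The learned part of methods like this one is producing reliable correspondences from features, which is precisely what a synthetic-to-real feature gap breaks.

```python
import numpy as np

def kabsch(src, tgt):
    """Closed-form rigid alignment (Kabsch/SVD) between corresponded point
    sets: returns R, t such that R @ src[i] + t ~= tgt[i].
    A classical baseline, not the learned method of the paper above.
    """
    src_c = src - src.mean(axis=0)
    tgt_c = tgt - tgt.mean(axis=0)
    U, _, Vt = np.linalg.svd(src_c.T @ tgt_c)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt.mean(axis=0) - R @ src.mean(axis=0)
    return R, t

# Sanity check: recover a known rotation about z plus a translation.
rng = np.random.default_rng(0)
src = rng.random((10, 3))
c, s = np.cos(np.pi / 6), np.sin(np.pi / 6)
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
tgt = src @ Rz.T + np.array([1.0, 2.0, 0.5])
R, t = kabsch(src, tgt)
print(np.allclose(R, Rz), np.allclose(t, [1.0, 2.0, 0.5]))  # True True
```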
arXiv Detail & Related papers (2022-03-29T07:55:04Z)
- Revisiting Point Cloud Simplification: A Learnable Feature Preserving Approach [57.67932970472768]
Mesh and Point Cloud simplification methods aim to reduce the complexity of 3D models while retaining visual quality and relevant salient features.
We propose a fast point cloud simplification method by learning to sample salient points.
The proposed method relies on a graph neural network architecture trained to select an arbitrary, user-defined, number of points from the input space and to re-arrange their positions so as to minimize the visual perception error.
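A useful non-learned reference point for this sampling task is classical farthest point sampling, sketched below; unlike the paper's GNN, it optimizes geometric coverage rather than visual perception error, and it cannot re-position the selected points.

```python
import numpy as np

def farthest_point_sampling(points, k):
    """Greedy farthest point sampling of k points from an (N, 3) cloud.

    Classical coverage-maximizing baseline for point cloud simplification;
    O(N * k), starting from an arbitrary seed point.
    """
    n = len(points)
    chosen = [0]
    dist = np.full(n, np.inf)
    for _ in range(k - 1):
        # Distance from every point to the nearest already-chosen point.
        dist = np.minimum(dist, np.linalg.norm(points - points[chosen[-1]], axis=1))
        chosen.append(int(dist.argmax()))
    return points[np.array(chosen)]

# Toy usage: simplify 5000 points down to 64.
pts = np.random.rand(5000, 3)
print(farthest_point_sampling(pts, 64).shape)  # (64, 3)
```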
arXiv Detail & Related papers (2021-09-30T10:23:55Z)
- Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds [96.9027094562957]
We introduce a spatio-temporal representation learning (STRL) framework, capable of learning from unlabeled 3D point clouds in a self-supervised fashion.
Inspired by how infants learn from visual data in the wild, we explore rich spatio-temporal cues derived from the 3D data.
STRL takes two temporally-related frames from a 3D point cloud sequence as the input, transforms them with spatial data augmentation, and learns the invariant representation self-supervisedly.
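The invariance objective can be illustrated with a common self-supervised loss: pull together the embeddings of the two augmented, temporally-related frames. The BYOL/SimSiam-style negative cosine similarity below is an assumption for illustration; STRL's exact objective and architecture may differ.

```python
import torch
import torch.nn.functional as F

def invariance_loss(emb_t0, emb_t1):
    """Negative cosine similarity between embeddings of two temporally
    related, independently augmented frames (batch, dim). Minimizing it
    makes the representation invariant to the augmentations and the
    temporal shift.
    """
    return -F.cosine_similarity(emb_t0, emb_t1, dim=-1).mean()

# Toy usage: embeddings of frame t and frame t+1 from a shared encoder.
z0, z1 = torch.randn(8, 256), torch.randn(8, 256)
print(invariance_loss(z0, z1))
```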
arXiv Detail & Related papers (2021-09-01T04:17:11Z)
- SIENet: Spatial Information Enhancement Network for 3D Object Detection from Point Cloud [20.84329063509459]
LiDAR-based 3D object detection has an immense influence on autonomous vehicles.
Due to the intrinsic properties of LiDAR, fewer points are collected from objects farther away from the sensor.
To address the challenge, we propose a novel two-stage 3D object detection framework, named SIENet.
arXiv Detail & Related papers (2021-03-29T07:45:09Z)
- InfoFocus: 3D Object Detection for Autonomous Driving with Dynamic Information Modeling [65.47126868838836]
We propose a novel 3D object detection framework with dynamic information modeling.
Coarse predictions are generated in the first stage via a voxel-based region proposal network.
Experiments are conducted on the large-scale nuScenes 3D detection benchmark.
arXiv Detail & Related papers (2020-07-16T18:27:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.