Self-supervised Point Cloud Prediction Using 3D Spatio-temporal
Convolutional Networks
- URL: http://arxiv.org/abs/2110.04076v1
- Date: Tue, 28 Sep 2021 19:58:13 GMT
- Title: Self-supervised Point Cloud Prediction Using 3D Spatio-temporal
Convolutional Networks
- Authors: Benedikt Mersch, Xieyuanli Chen, Jens Behley, Cyrill Stachniss
- Abstract summary: Exploiting past 3D LiDAR scans to predict future point clouds is a promising method for autonomous mobile systems.
We propose an end-to-end approach that exploits a 2D range image representation of each 3D LiDAR scan.
We develop an encoder-decoder architecture using 3D convolutions to jointly aggregate spatial and temporal information of the scene.
- Score: 27.49539859498477
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Exploiting past 3D LiDAR scans to predict future point clouds is a promising
method for autonomous mobile systems to realize foresighted state estimation,
collision avoidance, and planning. In this paper, we address the problem of
predicting future 3D LiDAR point clouds given a sequence of past LiDAR scans.
Estimating the future scene on the sensor level does not require any preceding
steps as in localization or tracking systems and can be trained
self-supervised. We propose an end-to-end approach that exploits a 2D range
image representation of each 3D LiDAR scan and concatenates a sequence of range
images to obtain a 3D tensor. Based on such tensors, we develop an
encoder-decoder architecture using 3D convolutions to jointly aggregate spatial
and temporal information of the scene and to predict the future 3D point
clouds. We evaluate our method on multiple datasets and the experimental
results suggest that our method outperforms existing point cloud prediction
architectures and generalizes well to new, unseen environments without
additional fine-tuning. Our method operates online and is faster than the
common LiDAR frame rate of 10 Hz.
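As a concrete illustration of the 2D range image representation, the spherical projection of a LiDAR scan can be sketched as below. This is a minimal sketch, assuming a SemanticKITTI-style 64 x 2048 image and a sensor field of view from +3 to -25 degrees; the function name and parameters are illustrative, not the authors' code.

```python
import numpy as np

def to_range_image(points, H=64, W=2048, fov_up=3.0, fov_down=-25.0):
    """Project an (N, 3) LiDAR point cloud onto an H x W range image
    via spherical projection (RangeNet++-style, assumed parameters)."""
    fov_up_rad = np.radians(fov_up)
    fov_down_rad = np.radians(fov_down)
    fov = fov_up_rad - fov_down_rad

    r = np.linalg.norm(points, axis=1)                 # range per point
    yaw = np.arctan2(points[:, 1], points[:, 0])       # azimuth angle
    pitch = np.arcsin(points[:, 2] / np.maximum(r, 1e-8))

    u = 0.5 * (1.0 - yaw / np.pi) * W                  # azimuth -> column
    v = (1.0 - (pitch - fov_down_rad) / fov) * H       # elevation -> row
    u = np.clip(np.floor(u), 0, W - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, H - 1).astype(np.int32)

    image = np.full((H, W), -1.0, dtype=np.float32)    # -1 marks empty pixels
    order = np.argsort(r)[::-1]                        # farther points first,
    image[v[order], u[order]] = r[order]               # closer points overwrite
    return image
```

Stacking T consecutive range images then yields a T x H x W tensor, on which the 3D convolutions of the encoder-decoder can aggregate spatial and temporal information jointly.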
Related papers
- OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose OccNeRF, a method for training occupancy networks without 3D supervision.
We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range.
For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
arXiv Detail & Related papers (2023-12-14T18:58:52Z) - A Lightweight and Detector-free 3D Single Object Tracker on Point Clouds [50.54083964183614]
It is non-trivial to perform accurate target-specific detection since the point cloud of objects in raw LiDAR scans is usually sparse and incomplete.
We propose DMT, a Detector-free Motion-prediction-based 3D Tracking network that entirely removes the need for complicated 3D detectors.
arXiv Detail & Related papers (2022-03-08T17:49:07Z) - FPPN: Future Pseudo-LiDAR Frame Prediction for Autonomous Driving [30.18167579599365]
We propose the first future pseudo-LiDAR frame prediction network.
We first coarsely predict a future dense depth map based on dynamic motion information.
We refine the predicted dense depth map using static contextual information.
The future pseudo-LiDAR frame can be obtained by converting the predicted dense depth map into corresponding 3D point clouds.
arXiv Detail & Related papers (2021-12-08T16:46:18Z) - Lifting 2D Object Locations to 3D by Discounting LiDAR Outliers across
Objects and Views [70.1586005070678]
We present a system for automatically converting 2D mask object predictions and raw LiDAR point clouds into full 3D bounding boxes of objects.
Our method significantly outperforms previous work, even though those methods rely on significantly more complex pipelines, 3D models, and additional human-annotated external sources of prior information.
arXiv Detail & Related papers (2021-09-16T13:01:13Z) - Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR-based
Perception [122.53774221136193]
State-of-the-art methods for driving-scene LiDAR-based perception often project the point clouds to 2D space and then process them via 2D convolution.
A natural remedy is to utilize the 3D voxelization and 3D convolution network.
We propose a new framework for outdoor LiDAR segmentation, where cylindrical partition and asymmetrical 3D convolution networks are designed to explore the 3D geometric pattern.
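A cylindrical partition of this kind can be sketched as follows: each point is converted to cylindrical coordinates and assigned to a voxel grid over radius, azimuth, and height. This is a minimal NumPy sketch; the bin counts and range limits are illustrative assumptions, not the configuration used in the paper.

```python
import numpy as np

def cylindrical_partition(points, n_rho=32, n_phi=64, n_z=16,
                          rho_max=50.0, z_min=-3.0, z_max=2.0):
    """Assign each point of an (N, 3) cloud to a cylindrical voxel.
    Returns integer grid indices (rho_bin, phi_bin, z_bin) per point."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rho = np.sqrt(x**2 + y**2)     # radial distance from the sensor axis
    phi = np.arctan2(y, x)         # azimuth in [-pi, pi)

    rho_bin = np.clip((rho / rho_max * n_rho).astype(int), 0, n_rho - 1)
    phi_bin = np.clip(((phi + np.pi) / (2 * np.pi) * n_phi).astype(int),
                      0, n_phi - 1)
    z_bin = np.clip(((z - z_min) / (z_max - z_min) * n_z).astype(int),
                    0, n_z - 1)
    return np.stack([rho_bin, phi_bin, z_bin], axis=1)
```

Compared with a uniform Cartesian grid, the cylindrical bins grow with distance, which better matches the radially decreasing density of outdoor LiDAR scans.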
arXiv Detail & Related papers (2021-09-12T06:25:11Z) - SLPC: a VRNN-based approach for stochastic lidar prediction and
completion in autonomous driving [63.87272273293804]
We propose a new LiDAR prediction framework based on generative models, namely Variational Recurrent Neural Networks (VRNNs).
Our algorithm is able to address the limitations of previous video prediction frameworks when dealing with sparse data by spatially inpainting the depth maps in the upcoming frames.
We present a sparse version of VRNNs and an effective self-supervised training method that does not require any labels.
arXiv Detail & Related papers (2021-02-19T11:56:44Z) - Anchor-Based Spatial-Temporal Attention Convolutional Networks for
Dynamic 3D Point Cloud Sequences [20.697745449159097]
An Anchor-based Spatial-Temporal Attention Convolution operation (ASTAConv) is proposed in this paper to process dynamic 3D point cloud sequences.
The proposed convolution operation builds a regular receptive field by placing several virtual anchors around each point.
The proposed method makes better use of the structured information within the local region and learns spatio-temporal embedding features from dynamic 3D point cloud sequences.
arXiv Detail & Related papers (2020-12-20T07:35:37Z) - Cylinder3D: An Effective 3D Framework for Driving-scene LiDAR Semantic
Segmentation [87.54570024320354]
State-of-the-art methods for large-scale driving-scene LiDAR semantic segmentation often project and process the point clouds in the 2D space.
A straightforward solution to tackle the issue of 3D-to-2D projection is to keep the 3D representation and process the points in the 3D space.
We develop a 3D cylinder partition and a 3D cylinder convolution based framework, termed as Cylinder3D, which exploits the 3D topology relations and structures of driving-scene point clouds.
arXiv Detail & Related papers (2020-08-04T13:56:19Z) - 3DMotion-Net: Learning Continuous Flow Function for 3D Motion Prediction [12.323767993152968]
We address the problem of predicting the future 3D motion of 3D object scans from the previous two consecutive frames.
We propose a self-supervised approach that leverages the power of the deep neural network to learn a continuous flow function of 3D point clouds.
We perform extensive experiments on the D-FAUST, SCAPE, and TOSCA benchmark datasets, and the results demonstrate that our approach is capable of handling temporally inconsistent input.
arXiv Detail & Related papers (2020-06-24T17:39:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.