Learning Spatial and Temporal Variations for 4D Point Cloud Segmentation
- URL: http://arxiv.org/abs/2207.04673v1
- Date: Mon, 11 Jul 2022 07:36:26 GMT
- Title: Learning Spatial and Temporal Variations for 4D Point Cloud Segmentation
- Authors: Hanyu Shi, Jiacheng Wei, Hao Wang, Fayao Liu, and Guosheng Lin
- Abstract summary: We argue that the temporal information across frames provides crucial knowledge for 3D scene perception.
We design a temporal variation-aware interpolation module and a temporal voxel-point refiner to capture the temporal variation in the 4D point cloud.
- Score: 0.39373541926236766
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: LiDAR-based 3D scene perception is a fundamental and important task for
autonomous driving. Most state-of-the-art methods for LiDAR-based 3D recognition
operate on single-frame point cloud data and ignore temporal information. We argue
that the temporal information across frames provides crucial knowledge for 3D scene
perception, especially in the driving scenario. In this paper, we focus on spatial
and temporal variations to better exploit the temporal information across 3D frames.
We design a temporal variation-aware interpolation module and a temporal voxel-point
refiner to capture the temporal variation in the 4D point cloud. The temporal
variation-aware interpolation generates local features from the previous and current
frames by capturing spatial coherence and temporal variation information. The
temporal voxel-point refiner builds a temporal graph on the 3D point cloud sequences
and captures the temporal variation with a graph convolution module; it also
transforms the coarse voxel-level predictions into fine point-level predictions.
With the proposed modules, our network TVSN achieves state-of-the-art performance
on SemanticKITTI and SemanticPOSS: 52.5% mIoU (+5.5% over the previous best
approach) on the SemanticKITTI multiple-scan segmentation task and 63.0% mIoU
(+2.8%) on SemanticPOSS.
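The abstract describes the two modules only at a high level. As a rough illustration, the sketch below shows one plausible form of the temporal variation-aware interpolation and a single graph-convolution step over a temporal graph. It assumes PyTorch; all class names, shapes, layer sizes, and the neighbor count k are assumptions of this sketch, not the authors' TVSN implementation.

```python
# A minimal sketch, assuming PyTorch; names, shapes, and layer sizes are
# illustrative and NOT the authors' TVSN implementation.
import torch
import torch.nn as nn

class TemporalVariationInterp(nn.Module):
    """Builds local features for each current-frame point from the previous frame."""
    def __init__(self, feat_dim: int, k: int = 8):  # k neighbors is an assumption
        super().__init__()
        self.k = k
        # MLP over [spatial offset (3) | neighbor feature | feature difference]
        self.mlp = nn.Sequential(
            nn.Linear(3 + 2 * feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
        )

    def forward(self, cur_xyz, cur_feat, prev_xyz, prev_feat):
        # cur_xyz: (N, 3), cur_feat: (N, C); prev_xyz: (M, 3), prev_feat: (M, C)
        dist = torch.cdist(cur_xyz, prev_xyz)             # (N, M) pairwise distances
        _, idx = dist.topk(self.k, dim=1, largest=False)  # k nearest previous points
        offset = prev_xyz[idx] - cur_xyz.unsqueeze(1)       # spatial coherence cue
        variation = prev_feat[idx] - cur_feat.unsqueeze(1)  # temporal variation cue
        grouped = torch.cat([offset, prev_feat[idx], variation], dim=-1)
        return self.mlp(grouped).max(dim=1).values        # (N, C) fused local feature

class TemporalGraphConv(nn.Module):
    """One graph-convolution step over a temporal graph linking frames."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.lin_self = nn.Linear(feat_dim, feat_dim)
        self.lin_nbr = nn.Linear(feat_dim, feat_dim)

    def forward(self, feat, edges):
        # feat: (P, C) node features pooled from consecutive frames
        # edges: (E, 2) long tensor of (destination, source) temporal links
        agg = torch.zeros_like(feat)
        agg.index_add_(0, edges[:, 0], feat[edges[:, 1]])   # sum neighbor features
        deg = torch.zeros(feat.size(0), 1, device=feat.device)
        deg.index_add_(0, edges[:, 0], torch.ones(edges.size(0), 1, device=feat.device))
        return torch.relu(self.lin_self(feat) + self.lin_nbr(agg / deg.clamp(min=1.0)))
```

The intent of the sketch: the previous frame contributes both where its points lie relative to each current point (spatial coherence) and how their features differ from the current feature (temporal variation), and a temporal graph then propagates information along frame-to-frame links.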
Related papers
- 3D Single-object Tracking in Point Clouds with High Temporal Variation [79.5863632942935]
High temporal variation of point clouds is the key challenge of 3D single-object tracking (3D SOT).
Existing approaches rely on the assumption that the shape variation of the point clouds and the motion of the objects across neighboring frames are smooth.
We present a novel framework for 3D SOT in point clouds with high temporal variation, called HVTrack.
arXiv Detail & Related papers (2024-08-04T14:57:28Z)
- Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion [57.232688209606515]
We present HTCL, a novel Hierarchical Temporal Context Learning paradigm for improving camera-based semantic scene completion.
Our method ranks 1st on the SemanticKITTI benchmark and even surpasses LiDAR-based methods in terms of mIoU.
arXiv Detail & Related papers (2024-07-02T09:11:17Z)
- SUIT: Learning Significance-guided Information for 3D Temporal Detection [15.237488449422008]
We learn Significance-gUided Information for 3D Temporal detection (SUIT), which simplifies temporal information into sparse features for fusion across frames.
We evaluate our method on the large-scale nuScenes dataset, where SUIT not only significantly reduces the memory and computation cost of temporal fusion but also performs favorably against state-of-the-art baselines.
arXiv Detail & Related papers (2023-07-04T16:22:10Z)
- PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences [51.53563462897779]
We propose a point spatio-temporal (PST) convolution to achieve informative representations of point cloud sequences.
The PST convolution first disentangles space and time in point cloud sequences; a spatial convolution then captures the local structure of points in 3D space, and a temporal convolution models the dynamics of the spatial regions along the time dimension.
We incorporate the proposed PST convolution into a deep network, namely PSTNet, to extract features of point cloud sequences in a hierarchical manner; a minimal sketch follows this entry.
arXiv Detail & Related papers (2022-05-27T02:14:43Z)
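A minimal sketch of the space-then-time factorization described above, assuming PyTorch. The point-wise MLP stands in for a point-based spatial convolution, and the aligned point order across frames, layer choices, and shapes are assumptions of this sketch rather than the published PSTNet operator.

```python
# A minimal sketch of the space-then-time factorization, assuming PyTorch;
# the point-wise MLP stands in for a point-based spatial convolution and is
# an assumption of this sketch, not the published PSTNet operator.
import torch
import torch.nn as nn

class SpaceThenTime(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, t_kernel: int = 3):
        super().__init__()
        # Spatial step: shared point-wise encoding within each frame.
        self.spatial = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())
        # Temporal step: a 1D convolution along the frame axis.
        self.temporal = nn.Conv1d(out_dim, out_dim, t_kernel, padding=t_kernel // 2)

    def forward(self, seq_feat):
        # seq_feat: (T, N, C) -- T frames, N points per frame (assumed aligned)
        x = self.spatial(seq_feat)     # encode each point within its own frame
        x = x.permute(1, 2, 0)         # (N, C_out, T) for Conv1d over time
        x = self.temporal(x)           # mix information across neighboring frames
        return x.permute(2, 0, 1)      # back to (T, N, C_out)
```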
- IDEA-Net: Dynamic 3D Point Cloud Interpolation via Deep Embedding Alignment [58.8330387551499]
We formulate the problem as the estimation of point-wise trajectories (i.e., smooth curves).
We propose IDEA-Net, an end-to-end deep learning framework that disentangles the problem with the assistance of explicitly learned temporal consistency.
We demonstrate the effectiveness of our method on various point cloud sequences and observe large improvements over state-of-the-art methods, both quantitatively and visually.
arXiv Detail & Related papers (2022-03-22T10:14:08Z)
- Exploring Optical-Flow-Guided Motion and Detection-Based Appearance for Temporal Sentence Grounding [61.57847727651068]
Temporal sentence grounding aims to semantically localize a target segment in an untrimmed video according to a given sentence query.
Most previous works focus on learning frame-level features of each whole frame in the entire video and directly match them with the textual information.
We propose a novel Motion- and Appearance-guided 3D Semantic Reasoning Network (MA3SRN), which incorporates optical-flow-guided motion-aware, detection-based appearance-aware, and 3D-aware object-level features.
arXiv Detail & Related papers (2022-03-06T13:57:09Z)
- Anchor-Based Spatial-Temporal Attention Convolutional Networks for Dynamic 3D Point Cloud Sequences [20.697745449159097]
This paper proposes an Anchor-based Spatial-Temporal Attention Convolution operation (ASTAConv) to process dynamic 3D point cloud sequences.
The proposed convolution builds a regular receptive field around each point by placing several virtual anchors around it; a toy sketch follows this entry.
The proposed method makes better use of the structured information within the local region and learns spatial-temporal embedding features from dynamic 3D point cloud sequences.
arXiv Detail & Related papers (2020-12-20T07:35:37Z)
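A toy sketch of the virtual-anchor idea, assuming PyTorch: fixed offsets define a regular receptive field around every point, and each anchor takes the feature of its nearest real point before a learned projection. The offsets, the nearest-neighbor sampling, and the projection are assumptions of this sketch; the attention weighting that ASTAConv's title implies is omitted here.

```python
# A toy sketch of the virtual-anchor idea, assuming PyTorch; offsets,
# nearest-neighbor sampling, and the projection are assumptions of this
# sketch, and the attention weighting of ASTAConv is omitted.
import torch
import torch.nn as nn

class VirtualAnchorConv(nn.Module):
    def __init__(self, feat_dim: int, anchor_offsets: torch.Tensor):
        super().__init__()
        # anchor_offsets: (A, 3) fixed displacements forming a regular
        # receptive field around every point, e.g. the corners of a small cube.
        self.register_buffer("offsets", anchor_offsets)
        self.proj = nn.Linear(anchor_offsets.size(0) * feat_dim, feat_dim)

    def forward(self, xyz, feat):
        # xyz: (N, 3) point coordinates, feat: (N, C) point features
        anchors = xyz.unsqueeze(1) + self.offsets        # (N, A, 3) virtual anchors
        dist = torch.cdist(anchors.reshape(-1, 3), xyz)  # (N*A, N)
        nearest = dist.argmin(dim=1)                     # nearest real point per anchor
        sampled = feat[nearest].reshape(xyz.size(0), -1) # (N, A*C) regular layout
        return torch.relu(self.proj(sampled))            # (N, C) output feature
```

Because the anchors sit at the same relative positions for every point, the gathered features form a regular layout that an ordinary learned projection (or convolution) can consume, which is the structural benefit the summary describes.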
- Multi Projection Fusion for Real-time Semantic Segmentation of 3D LiDAR Point Clouds [2.924868086534434]
This paper introduces a novel approach for 3D point cloud semantic segmentation that exploits multiple projections of the point cloud.
Our Multi-Projection Fusion framework analyzes spherical and bird's-eye-view projections using two separate, highly efficient 2D fully convolutional models; a compact fusion sketch follows this entry.
arXiv Detail & Related papers (2020-11-03T19:40:43Z)
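A compact sketch of two-branch projection fusion, assuming PyTorch and precomputed point-to-pixel indices. The single-layer "networks" are stand-ins for the paper's two efficient 2D fully convolutional models, and every name and shape here is an assumption of this sketch.

```python
# A compact sketch of two-branch projection fusion, assuming PyTorch and
# precomputed point-to-pixel indices; the single-layer "networks" stand in
# for the paper's two efficient 2D fully convolutional models.
import torch
import torch.nn as nn

class TwoViewFusion(nn.Module):
    def __init__(self, in_ch: int, n_classes: int):
        super().__init__()
        self.sphere_net = nn.Conv2d(in_ch, n_classes, 3, padding=1)  # range-view branch
        self.bev_net = nn.Conv2d(in_ch, n_classes, 3, padding=1)     # bird's-eye branch

    def forward(self, sphere_img, bev_img, sphere_idx, bev_idx):
        # *_img: (1, C, H, W) projected inputs; *_idx: (N,) flat pixel index
        # of each of the N 3D points in the corresponding projection.
        s = self.sphere_net(sphere_img).flatten(2).squeeze(0)  # (K, H*W) class scores
        b = self.bev_net(bev_img).flatten(2).squeeze(0)        # (K, H*W)
        per_point = s[:, sphere_idx] + b[:, bev_idx]           # fuse both views per point
        return per_point.argmax(dim=0)                         # (N,) per-point labels
```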
- 3DMotion-Net: Learning Continuous Flow Function for 3D Motion Prediction [12.323767993152968]
We address the problem of predicting the future 3D motion of 3D object scans from the previous two consecutive frames.
We propose a self-supervised approach that leverages the power of deep neural networks to learn a continuous flow function of 3D point clouds; a toy sketch of such a flow function follows this entry.
We perform extensive experiments on the D-FAUST, SCAPE, and TOSCA benchmarks, and the results demonstrate that our approach is capable of handling temporally inconsistent input.
arXiv Detail & Related papers (2020-06-24T17:39:19Z)
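A toy sketch of a continuous flow function, assuming PyTorch: a coordinate-based MLP maps each 3D point to a displacement, so the warped cloud varies smoothly with position and can be evaluated anywhere in space. The architecture and hidden size are assumptions of this sketch, not the 3DMotion-Net model.

```python
# A toy sketch of a continuous flow function, assuming PyTorch; the
# coordinate-based MLP and its hidden size are assumptions of this sketch,
# not the 3DMotion-Net model.
import torch
import torch.nn as nn

class FlowMLP(nn.Module):
    """Maps any 3D point to a displacement, defining a continuous flow field."""
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, xyz):
        # xyz: (N, 3); because the flow is a function of position alone,
        # it can be evaluated at points never seen during fitting.
        return xyz + self.net(xyz)  # predicted next-frame positions
```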
- Pseudo-LiDAR Point Cloud Interpolation Based on 3D Motion Representation and Spatial Supervision [68.35777836993212]
We propose a Pseudo-LiDAR point cloud network to generate temporally and spatially high-quality point cloud sequences.
By exploiting the scene flow between point clouds, the proposed network is able to learn a more accurate representation of the 3D spatial motion relationship.
arXiv Detail & Related papers (2020-06-20T03:11:04Z)
- A Graph Attention Spatio-temporal Convolutional Network for 3D Human Pose Estimation in Video [7.647599484103065]
We improve the learning of constraints on the human skeleton by modeling local and global spatial information via attention mechanisms.
Our approach effectively mitigates depth ambiguity and self-occlusion, generalizes to half-upper-body estimation, and achieves competitive performance on 2D-to-3D video pose estimation.
arXiv Detail & Related papers (2020-03-11T14:54:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.