Spatiotemporal Transformer Attention Network for 3D Voxel Level Joint Segmentation and Motion Prediction in Point Cloud
- URL: http://arxiv.org/abs/2203.00138v1
- Date: Mon, 28 Feb 2022 23:18:27 GMT
- Title: Spatiotemporal Transformer Attention Network for 3D Voxel Level Joint Segmentation and Motion Prediction in Point Cloud
- Authors: Zhensong Wei, Xuewei Qi, Zhengwei Bai, Guoyuan Wu, Saswat Nayak, Peng Hao, Matthew Barth, Yongkang Liu, and Kentaro Oguchi
- Abstract summary: Motion prediction is a key enabler for automated driving systems and intelligent transportation applications.
A current challenge is how to effectively combine different perception tasks into a single backbone.
We propose a novel attention network based on a transformer self-attention mechanism for joint semantic segmentation and motion prediction.
- Score: 9.570438238511073
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Environment perception tasks, including detection, classification, tracking, and
motion prediction, are key enablers for automated driving systems and
intelligent transportation applications. Fueled by the advances in sensing
technologies and machine learning techniques, LiDAR-based sensing systems have
become a promising solution. The current challenges of this solution are how to
effectively combine different perception tasks into a single backbone and how
to efficiently learn the spatiotemporal features directly from point cloud
sequences. In this research, we propose a novel spatiotemporal attention
network based on a transformer self-attention mechanism for joint semantic
segmentation and motion prediction within a point cloud at the voxel level. The
network is trained to simultaneously output the voxel-level class and
predicted motion by learning directly from a sequence of point clouds.
The proposed backbone includes both a temporal attention module (TAM) and a
spatial attention module (SAM) to learn and extract the complex spatiotemporal
features. The approach was evaluated on the nuScenes dataset and achieved
promising performance.
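The abstract describes a backbone that learns from a point cloud sequence with a temporal attention module (TAM) and a spatial attention module (SAM), and outputs a class and a motion estimate for every voxel. The NumPy sketch below illustrates that data flow under loud assumptions: the grid shape, voxel size, embedding width, residual connections, the choice to keep only the latest frame after TAM, and the helper names (`voxelize_sequence`, `self_attention`, `joint_forward`) are all hypothetical, and the weights are random placeholders rather than a trained model. It is not the authors' architecture.

```python
import numpy as np


def voxelize_sequence(frames, shape=(4, 4, 2), voxel=1.0, origin=(-2.0, -2.0, -1.0)):
    """Count points per voxel for each frame -> array of shape (T, X, Y, Z).

    Hypothetical helper: grid shape, voxel size, and origin are assumptions.
    """
    grid = np.zeros((len(frames), *shape), dtype=np.float32)
    for t, pts in enumerate(frames):
        idx = np.floor((pts - np.asarray(origin)) / voxel).astype(int)  # (N, 3) voxel indices
        keep = np.all((idx >= 0) & (idx < np.asarray(shape)), axis=1)   # drop out-of-grid points
        idx = idx[keep]
        np.add.at(grid[t], (idx[:, 0], idx[:, 1], idx[:, 2]), 1.0)     # in-place per-voxel count
    return grid


def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over the second-to-last axis of x (..., L, D)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v


def joint_forward(grid_seq, num_classes=3, dim=16, seed=0):
    """Toy TAM -> SAM -> two heads. All weights are random placeholders, not trained."""
    rng = np.random.default_rng(seed)
    T = grid_seq.shape[0]
    occ = grid_seq.reshape(T, -1).T[..., None]  # (V, T, 1): one time series per voxel
    w_in = rng.normal(size=(1, dim))
    frame_emb = rng.normal(size=(T, dim))       # stands in for a temporal encoding (assumption)
    tokens = occ @ w_in + frame_emb             # (V, T, dim)

    def proj():
        return rng.normal(scale=dim ** -0.5, size=(dim, dim))

    # Temporal attention (TAM-like): each voxel attends across its own T frames.
    tam = tokens + self_attention(tokens, proj(), proj(), proj())
    current = tam[:, -1, :]                     # (V, dim): keep the latest frame's token
    # Spatial attention (SAM-like): every voxel attends to every other voxel.
    sam = current + self_attention(current, proj(), proj(), proj())
    class_logits = sam @ rng.normal(size=(dim, num_classes))  # (V, num_classes) segmentation head
    motion = sam @ rng.normal(size=(dim, 3))                   # (V, 3) per-voxel displacement head
    return class_logits, motion


# Usage: three frames; the point at x = 10 falls outside the grid and is dropped.
frames = [np.array([[0.5 + 0.1 * t, 0.5, 0.0], [1.5, -1.5, 0.5], [10.0, 0.0, 0.0]]) for t in range(3)]
grid = voxelize_sequence(frames)
logits, motion = joint_forward(grid)
print(grid.shape, logits.shape, motion.shape)  # (3, 4, 4, 2) (32, 3) (32, 3)
```

Dense spatial attention over all V voxels costs O(V^2) memory, which is affordable only for a toy grid like this one; a LiDAR-scale model would need some way to restrict it, and the abstract does not say how the authors handle that.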
Related papers
- Point Cloud Understanding via Attention-Driven Contrastive Learning [64.65145700121442]
Transformer-based models have advanced point cloud understanding by leveraging self-attention mechanisms.
PointACL is an attention-driven contrastive learning framework designed to address limitations of such models.
Our method employs an attention-driven dynamic masking strategy that guides the model to focus on under-attended regions.
(2024-11-22T05:41:00Z)
- Self-Supervised Class-Agnostic Motion Prediction with Spatial and Temporal Consistency Regularizations [53.797896854533384]
Class-agnostic motion prediction methods directly predict the motion of the entire point cloud.
While most existing methods rely on fully-supervised learning, the manual labeling of point cloud data is laborious and time-consuming.
We introduce three simple spatial and temporal regularization losses, which facilitate the self-supervised training process effectively.
(2024-03-20T02:58:45Z)
- A Generic Approach to Integrating Time into Spatial-Temporal Forecasting via Conditional Neural Fields [1.7661845949769062]
This paper presents a general approach to integrating the time component into forecasting models.
The main idea is to employ conditional neural fields to represent the auxiliary features extracted from the time component.
Experiments on road traffic and cellular network traffic datasets demonstrate the effectiveness of the proposed approach.
(2023-05-11T14:20:23Z)
- Self-Supervised Pillar Motion Learning for Autonomous Driving [10.921208239968827]
We propose a learning framework that leverages free supervisory signals from point clouds and paired camera images to estimate motion purely via self-supervision.
Our model involves a point cloud based structural consistency augmented with probabilistic motion masking as well as a cross-sensor motion regularization to realize the desired self-supervision.
(2021-04-18T02:32:08Z)
- DS-Net: Dynamic Spatiotemporal Network for Video Salient Object Detection [78.04869214450963]
We propose a novel dynamic spatiotemporal network (DS-Net) for more effective fusion of temporal and spatial information.
We show that the proposed method achieves superior performance compared with state-of-the-art algorithms.
(2020-12-09T06:42:30Z)
- Self-Supervised Learning of Part Mobility from Point Cloud Sequence [9.495859862104515]
We introduce a self-supervised method for segmenting parts and predicting their motion attributes from a point sequence representing a dynamic object.
We generate trajectories by using correlations among successive frames of the sequence.
We evaluate our method on various tasks including motion part segmentation, motion axis prediction and motion range estimation.
(2020-10-20T11:29:46Z)
- A Prospective Study on Sequence-Driven Temporal Sampling and Ego-Motion Compensation for Action Recognition in the EPIC-Kitchens Dataset [68.8204255655161]
Action recognition is one of the most challenging research fields in computer vision.
Sequences recorded with ego-motion have become increasingly relevant.
The proposed method aims to cope with this by estimating the ego-motion, or camera motion.
(2020-08-26T14:44:45Z)
- Risk-Averse MPC via Visual-Inertial Input and Recurrent Networks for Online Collision Avoidance [95.86944752753564]
We propose an online path planning architecture that extends the model predictive control (MPC) formulation to consider future location uncertainties.
Our algorithm combines an object detection pipeline with a recurrent neural network (RNN) which infers the covariance of state estimates.
The robustness of our methods is validated on complex quadruped robot dynamics and can be generally applied to most robotic platforms.
(2020-07-28T07:34:30Z)
- Any Motion Detector: Learning Class-agnostic Scene Dynamics from a Sequence of LiDAR Point Clouds [4.640835690336654]
We propose a novel real-time approach to temporal context aggregation for motion detection and motion parameter estimation.
We introduce an ego-motion compensation layer to achieve real-time inference with performance comparable to a naive odometric transform of the original point cloud sequence.
(2020-04-24T10:40:07Z)
- A Spatial-Temporal Attentive Network with Spatial Continuity for Trajectory Prediction [74.00750936752418]
We propose a novel model named spatial-temporal attentive network with spatial continuity (STAN-SC).
First, a spatial-temporal attention mechanism is presented to extract the most useful and important information.
Second, we construct a joint feature sequence from the sequence and instant state information so that the generated trajectories maintain spatial continuity.
(2020-03-13T04:35:50Z)
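The Any Motion Detector summary compares its learned ego-motion compensation layer with "a naive odometric transform of the original point cloud sequence." The sketch below shows only that naive baseline, assuming each frame comes with a known 4x4 world-from-sensor pose (for example from odometry) and that the vehicle moves in the ground plane. The helper names `planar_pose`, `compensate_frame`, and `compensate_sequence` are hypothetical; this is not the paper's learned layer.

```python
import numpy as np


def planar_pose(yaw, tx, ty):
    """4x4 world-from-sensor transform for a vehicle moving in the ground plane (assumption)."""
    c, s = np.cos(yaw), np.sin(yaw)
    pose = np.eye(4)
    pose[:2, :2] = [[c, -s], [s, c]]
    pose[:2, 3] = [tx, ty]
    return pose


def compensate_frame(points, world_from_past, world_from_current):
    """Re-express (N, 3) points captured in a past sensor frame in the current sensor frame."""
    current_from_past = np.linalg.inv(world_from_current) @ world_from_past
    homogeneous = np.hstack([points, np.ones((len(points), 1))])  # (N, 4)
    return (homogeneous @ current_from_past.T)[:, :3]


def compensate_sequence(frames, poses):
    """Align every frame of a sequence to the coordinate frame of the last (current) one."""
    return [compensate_frame(p, pose, poses[-1]) for p, pose in zip(frames, poses)]


# Usage: a static landmark at world (5, 0, 0) seen before and after the sensor moves 1 m forward.
past = np.array([[5.0, 0.0, 0.0]])  # observed from the identity pose
cur = np.array([[4.0, 0.0, 0.0]])   # observed after translating by (1, 0)
aligned = compensate_sequence([past, cur], [np.eye(4), planar_pose(0.0, 1.0, 0.0)])
print(aligned[0])  # [[4. 0. 0.]]
```

After alignment a static point lands at the same coordinates in every frame, so any remaining displacement between frames reflects object motion rather than ego-motion, which is the property a motion detector relies on.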