STPOTR: Simultaneous Human Trajectory and Pose Prediction Using a
Non-Autoregressive Transformer for Robot Following Ahead
- URL: http://arxiv.org/abs/2209.07600v1
- Date: Thu, 15 Sep 2022 20:27:54 GMT
- Title: STPOTR: Simultaneous Human Trajectory and Pose Prediction Using a
Non-Autoregressive Transformer for Robot Following Ahead
- Authors: Mohammad Mahdavian, Payam Nikdel, Mahdi TaherAhmadi and Mo Chen
- Abstract summary: We develop a neural network model to predict future human motion from an observed human motion history.
We propose a non-autoregressive transformer architecture to leverage its parallel nature for easier training and fast, accurate predictions at test time.
Our model is well-suited for robotic applications, with test accuracy and speed that compare favorably with state-of-the-art methods.
- Score: 8.227864212055035
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we develop a neural network model to predict future human
motion from an observed human motion history. We propose a non-autoregressive
transformer architecture to leverage its parallel nature for easier training
and fast, accurate predictions at test time. The proposed architecture divides
human motion prediction into two parts: 1) the human trajectory, which is the
hip joint 3D position over time and 2) the human pose which is the all other
joints 3D positions over time with respect to a fixed hip joint. We propose to
make the two predictions simultaneously, as the shared representation can
improve the model performance. Therefore, the model consists of two sets of
encoders and decoders. First, a multi-head attention module applied to encoder
outputs improves human trajectory prediction. Second, another multi-head self-attention
module applied to encoder outputs concatenated with decoder outputs facilitates
learning of temporal dependencies. Our model is well-suited for robotic
applications in terms of test accuracy and speed, and compares favorably with
respect to state-of-the-art methods. We demonstrate the real-world
applicability of our work via the Robot Follow-Ahead task, a challenging yet
practical case study for our proposed model.
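The non-autoregressive decoding scheme described in the abstract can be sketched at shape level: learned future-frame queries attend to the encoded motion history once, so all future frames are produced in a single parallel pass, and two output heads emit the hip trajectory and the hip-relative pose. This is a minimal NumPy sketch, not the paper's implementation; the single attention head, the parameter names (`W_in`, `W_traj`, `W_pose`, `queries`), and all sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16                 # model width (illustrative)
T_OBS, T_FUT = 10, 20  # observed / predicted frames
N_JOINTS = 21          # pose joints besides the hip (illustrative)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attend(q, k, v):
    """Scaled dot-product attention (single head for brevity)."""
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

# Hypothetical parameters, randomly initialised for the sketch (untrained).
W_in = rng.normal(0, 0.1, (3 + 3 * N_JOINTS, D))  # embed one skeleton frame
W_traj = rng.normal(0, 0.1, (D, 3))               # hip-trajectory head
W_pose = rng.normal(0, 0.1, (D, 3 * N_JOINTS))    # hip-relative pose head
queries = rng.normal(0, 0.1, (T_FUT, D))          # learned future-frame queries

def predict(history):
    """Non-autoregressive prediction: all T_FUT frames in one parallel pass."""
    memory = history @ W_in                 # "encoder": per-frame embedding
    # Decoder queries attend to the whole encoded history at once,
    # so no step-by-step autoregressive loop is needed.
    dec = attend(queries, memory, memory)   # (T_FUT, D)
    traj = dec @ W_traj                     # hip joint 3D position over time
    pose = (dec @ W_pose).reshape(T_FUT, N_JOINTS, 3)  # relative to fixed hip
    return traj, pose

history = rng.normal(size=(T_OBS, 3 + 3 * N_JOINTS))
traj, pose = predict(history)
print(traj.shape, pose.shape)  # (20, 3) (20, 21, 3)
```

Because every future frame is decoded in parallel rather than fed back as input, a single forward pass suffices at test time, which is the property that makes this family of models fast enough for the Robot Follow-Ahead setting.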
Related papers
- Social-Transmotion: Promptable Human Trajectory Prediction [65.80068316170613]
Social-Transmotion is a generic Transformer-based model that exploits diverse and numerous visual cues to predict human behavior.
Our approach is validated on multiple datasets, including JTA, JRDB, Pedestrians and Cyclists in Road Traffic, and ETH-UCY.
arXiv Detail & Related papers (2023-12-26T18:56:49Z) - TransFusion: A Practical and Effective Transformer-based Diffusion Model
for 3D Human Motion Prediction [1.8923948104852863]
We propose TransFusion, an innovative and practical diffusion-based model for 3D human motion prediction.
Our model leverages Transformer as the backbone with long skip connections between shallow and deep layers.
In contrast to prior diffusion-based models that utilize extra modules like cross-attention and adaptive layer normalization, we treat all inputs, including conditions, as tokens to create a more lightweight model.
arXiv Detail & Related papers (2023-07-30T01:52:07Z) - Robot Learning with Sensorimotor Pre-training [98.7755895548928]
We present a self-supervised sensorimotor pre-training approach for robotics.
Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens.
We find that sensorimotor pre-training consistently outperforms training from scratch, has favorable scaling properties, and enables transfer across different tasks, environments, and robots.
arXiv Detail & Related papers (2023-06-16T17:58:10Z) - SPOTR: Spatio-temporal Pose Transformers for Human Motion Prediction [12.248428883804763]
3D human motion prediction is a research area of high significance and a challenge in computer vision.
Traditionally, autoregressive models have been used to predict human motion.
We present a non-autoregressive model for human motion prediction.
arXiv Detail & Related papers (2023-03-11T01:44:29Z) - Robust Human Motion Forecasting using Transformer-based Model [14.088942546585068]
We propose a new Transformer-based model that addresses real-time 3D human motion forecasting in both the short and long term.
Our model is tested in conditions where the human motion is severely occluded, demonstrating its robustness in reconstructing and predicting 3D human motion in a highly noisy environment.
Our model reduces the mean squared error of ST-Transformer by 8.89% in short-term prediction and by 2.57% in long-term prediction on the Human3.6M dataset with a 400ms input prefix.
arXiv Detail & Related papers (2023-02-16T13:06:39Z) - MotionBERT: A Unified Perspective on Learning Human Motion
Representations [46.67364057245364]
We present a unified perspective on tackling various human-centric video tasks by learning human motion representations from large-scale and heterogeneous data resources.
We propose a pretraining stage in which a motion encoder is trained to recover the underlying 3D motion from noisy partial 2D observations.
We implement motion encoder with a Dual-stream Spatio-temporal Transformer (DSTformer) neural network.
arXiv Detail & Related papers (2022-10-12T19:46:25Z) - T3VIP: Transformation-based 3D Video Prediction [49.178585201673364]
We propose a 3D video prediction (T3VIP) approach that explicitly models the 3D motion by decomposing a scene into its object parts.
Our model is fully unsupervised, captures the nature of the real world, and the observational cues in image and point cloud domains constitute its learning signals.
To the best of our knowledge, our model is the first generative model that provides an RGB-D video prediction of the future for a static camera.
arXiv Detail & Related papers (2022-09-19T15:01:09Z) - Multi-Person 3D Motion Prediction with Multi-Range Transformers [16.62864429495888]
We introduce a Multi-Range Transformers model, which consists of a local-range encoder for individual motion and a global-range encoder for social interactions.
Our model not only outperforms state-of-the-art methods on long-term 3D motion prediction, but also generates diverse social interactions.
arXiv Detail & Related papers (2021-11-23T18:41:13Z) - Future Frame Prediction for Robot-assisted Surgery [57.18185972461453]
We propose a ternary prior guided variational autoencoder (TPG-VAE) model for future frame prediction in robotic surgical video sequences.
Besides content distribution, our model learns motion distribution, which is novel to handle the small movements of surgical tools.
arXiv Detail & Related papers (2021-03-18T15:12:06Z) - Motion Prediction Using Temporal Inception Module [96.76721173517895]
We propose a Temporal Inception Module (TIM) to encode human motion.
Our framework produces input embeddings using convolutional layers, by using different kernel sizes for different input lengths.
The experimental results on standard motion prediction benchmark datasets Human3.6M and CMU motion capture dataset show that our approach consistently outperforms the state of the art methods.
arXiv Detail & Related papers (2020-10-06T20:26:01Z) - End-to-end Contextual Perception and Prediction with Interaction
Transformer [79.14001602890417]
We tackle the problem of detecting objects in 3D and forecasting their future motion in the context of self-driving.
To capture their spatial-temporal dependencies, we propose a recurrent neural network with a novel Transformer architecture.
Our model can be trained end-to-end, and runs in real-time.
arXiv Detail & Related papers (2020-08-13T14:30:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.