Multi-Person 3D Motion Prediction with Multi-Range Transformers
- URL: http://arxiv.org/abs/2111.12073v1
- Date: Tue, 23 Nov 2021 18:41:13 GMT
- Title: Multi-Person 3D Motion Prediction with Multi-Range Transformers
- Authors: Jiashun Wang, Huazhe Xu, Medhini Narasimhan, Xiaolong Wang
- Abstract summary: We introduce a Multi-Range Transformers model which consists of a local-range encoder for individual motion and a global-range encoder for social interactions.
Our model not only outperforms state-of-the-art methods on long-term 3D motion prediction, but also generates diverse social interactions.
- Score: 16.62864429495888
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel framework for multi-person 3D motion trajectory
prediction. Our key observation is that a human's actions and behaviors may
strongly depend on the other people around them. Thus, instead of predicting each
human pose trajectory in isolation, we introduce a Multi-Range Transformers
model which consists of a local-range encoder for individual motion and a
global-range encoder for social interactions. The Transformer decoder then
performs prediction for each person by taking a corresponding pose as a query
which attends to both local and global-range encoder features. Our model not
only outperforms state-of-the-art methods on long-term 3D motion prediction,
but also generates diverse social interactions. More interestingly, our model
can even predict 15-person motion simultaneously by automatically dividing the
persons into different interaction groups. Project page with code is available
at https://jiashunwang.github.io/MRT/.
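The local/global encoder-decoder flow described above can be sketched in plain Python. This is a minimal illustrative sketch only (single-head, unprojected attention; function names are hypothetical), not the authors' implementation, which is available at the project page:

```python
import math

def attention(queries, keys, values):
    """Scaled dot-product attention over plain Python lists of vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]  # stable softmax
        z = sum(weights)
        weights = [w / z for w in weights]
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def multi_range_decode(person_queries, local_feats, global_feats):
    """Each person's pose query attends to its own local-range features
    (individual motion) and to the shared global-range features (social
    interactions); the two attention outputs are combined per person."""
    preds = []
    for i, q in enumerate(person_queries):
        local_out = attention([q], local_feats[i], local_feats[i])[0]
        global_out = attention([q], global_feats, global_feats)[0]
        preds.append([a + b for a, b in zip(local_out, global_out)])
    return preds
```

In the actual model, the local-range encoder runs per person, the global-range encoder runs over all persons jointly, and a full Transformer decoder (with learned projections and multiple heads) replaces the bare attention above.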
Related papers
- Multi-Transmotion: Pre-trained Model for Human Motion Prediction [68.87010221355223]
Multi-Transmotion is an innovative transformer-based model designed for cross-modality pre-training.
Our methodology demonstrates competitive performance across various datasets on several downstream tasks.
arXiv Detail & Related papers (2024-11-04T23:15:21Z) - Massively Multi-Person 3D Human Motion Forecasting with Scene Context [13.197408989895102]
We propose a scene-aware social transformer model (SAST) to forecast long-term (10s) human motion.
We combine a temporal convolutional encoder-decoder architecture with a Transformer-based bottleneck that allows us to efficiently combine motion and scene information.
Our model outperforms other approaches in terms of realism and diversity on different metrics and in a user study.
arXiv Detail & Related papers (2024-09-18T17:58:51Z) - Social-Transmotion: Promptable Human Trajectory Prediction [65.80068316170613]
Social-Transmotion is a generic Transformer-based model that exploits diverse and numerous visual cues to predict human behavior.
Our approach is validated on multiple datasets, including JTA, JRDB, Pedestrians and Cyclists in Road Traffic, and ETH-UCY.
arXiv Detail & Related papers (2023-12-26T18:56:49Z) - InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint [67.6297384588837]
We introduce a novel controllable motion generation method, InterControl, which encourages the synthesized motions to maintain the desired distance between joint pairs.
We demonstrate that the distance between joint pairs for human-wise interactions can be generated using an off-the-shelf Large Language Model.
arXiv Detail & Related papers (2023-11-27T14:32:33Z) - Task-Oriented Human-Object Interactions Generation with Implicit Neural
Representations [61.659439423703155]
TOHO: Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations.
Our method generates continuous motions that are parameterized only by the temporal coordinate.
This work takes a step further toward general human-scene interaction simulation.
arXiv Detail & Related papers (2023-03-23T09:31:56Z) - STPOTR: Simultaneous Human Trajectory and Pose Prediction Using a
Non-Autoregressive Transformer for Robot Following Ahead [8.227864212055035]
We develop a neural network model to predict future human motion from an observed human motion history.
We propose a non-autoregressive transformer architecture to leverage its parallel nature for easier training and fast, accurate predictions at test time.
Our model is well suited for robotic applications, comparing favorably with state-of-the-art methods in test accuracy and speed.
arXiv Detail & Related papers (2022-09-15T20:27:54Z) - DMMGAN: Diverse Multi Motion Prediction of 3D Human Joints using
Attention-Based Generative Adverserial Network [9.247294820004143]
We propose a transformer-based generative model for forecasting multiple diverse human motions.
Our model first predicts the pose of the body relative to the hip joint. Then the Hip Prediction Module predicts the trajectory of the hip movement for each predicted pose frame.
We show that our system outperforms the state-of-the-art in human motion prediction while predicting diverse multi-motion future trajectories with hip movements.
arXiv Detail & Related papers (2022-09-13T23:22:33Z) - SoMoFormer: Multi-Person Pose Forecasting with Transformers [15.617263162155062]
We present a new method, called Social Motion Transformer (SoMoFormer), for multi-person 3D pose forecasting.
Our transformer architecture uniquely models human motion input as a joint sequence rather than a time sequence.
We show that with this problem reformulation, SoMoFormer naturally extends to multi-person scenes by using the joints of all people in a scene as input queries.
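The joint-sequence reformulation can be illustrated with a small sketch (plain Python; the function name is hypothetical, and this simplifies the paper's input encoding):

```python
def to_joint_sequence(scene):
    """Reshape scene[person][frame][joint] into one token per
    (person, joint): that joint's trajectory over all frames.
    A transformer over these tokens attends across the joints of
    every person in the scene, rather than across time steps."""
    tokens = []
    for p, person in enumerate(scene):
        num_frames = len(person)
        num_joints = len(person[0])
        for j in range(num_joints):
            trajectory = [person[t][j] for t in range(num_frames)]
            tokens.append(((p, j), trajectory))
    return tokens
```

With this layout, adding another person to the scene simply appends more joint tokens, which is how the reformulation extends naturally to multi-person input.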
arXiv Detail & Related papers (2022-08-30T06:59:28Z) - Weakly-supervised Action Transition Learning for Stochastic Human Motion
Prediction [81.94175022575966]
We introduce the task of action-driven human motion prediction.
It aims to predict multiple plausible future motions given a sequence of action labels and a short motion history.
arXiv Detail & Related papers (2022-05-31T08:38:07Z) - Generating Smooth Pose Sequences for Diverse Human Motion Prediction [90.45823619796674]
We introduce a unified deep generative network for both diverse and controllable motion prediction.
Our experiments on two standard benchmark datasets, Human3.6M and HumanEva-I, demonstrate that our approach outperforms the state-of-the-art baselines in terms of both sample diversity and accuracy.
arXiv Detail & Related papers (2021-08-19T00:58:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.