Multi-Person 3D Motion Prediction with Multi-Range Transformers
- URL: http://arxiv.org/abs/2111.12073v1
- Date: Tue, 23 Nov 2021 18:41:13 GMT
- Title: Multi-Person 3D Motion Prediction with Multi-Range Transformers
- Authors: Jiashun Wang, Huazhe Xu, Medhini Narasimhan, Xiaolong Wang
- Abstract summary: We introduce a Multi-Range Transformers model which consists of a local-range encoder for individual motion and a global-range encoder for social interactions.
Our model not only outperforms state-of-the-art methods on long-term 3D motion prediction, but also generates diverse social interactions.
- Score: 16.62864429495888
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel framework for multi-person 3D motion trajectory
prediction. Our key observation is that a person's actions and behaviors may
strongly depend on the other people around them. Thus, instead of predicting each
human pose trajectory in isolation, we introduce a Multi-Range Transformers
model which consists of a local-range encoder for individual motion and a
global-range encoder for social interactions. The Transformer decoder then
performs prediction for each person by taking a corresponding pose as a query
which attends to both local and global-range encoder features. Our model not
only outperforms state-of-the-art methods on long-term 3D motion prediction,
but also generates diverse social interactions. More interestingly, our model
can even predict 15-person motion simultaneously by automatically dividing the
persons into different interaction groups. Project page with code is available
at https://jiashunwang.github.io/MRT/.
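The local/global encoder split described in the abstract can be sketched in PyTorch. This is a minimal illustration of the attention wiring (per-person local encoding, cross-person global encoding, and a decoder query attending to both), not the authors' implementation; all module names, dimensions, and the exact query/memory construction are assumptions.

```python
import torch
import torch.nn as nn

class MultiRangeSketch(nn.Module):
    """Illustrative sketch of the multi-range attention idea (not the MRT code)."""

    def __init__(self, pose_dim=45, d_model=128, nhead=8, num_layers=2):
        super().__init__()
        self.embed = nn.Linear(pose_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        # Local-range encoder: attends over one person's own motion history.
        self.local_enc = nn.TransformerEncoder(layer, num_layers)
        # Global-range encoder: attends over all persons' motions jointly.
        self.global_enc = nn.TransformerEncoder(layer, num_layers)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers)
        self.head = nn.Linear(d_model, pose_dim)

    def forward(self, motions):
        # motions: (batch, persons, time, pose_dim)
        b, p, t, _ = motions.shape
        x = self.embed(motions)                           # (b, p, t, d_model)
        local = self.local_enc(x.reshape(b * p, t, -1))   # per-person features
        glob = self.global_enc(x.reshape(b, p * t, -1))   # cross-person features
        # Query: each person's last observed pose attends to both feature sets.
        query = local[:, -1:, :]                          # (b*p, 1, d_model)
        mem = torch.cat(
            [local, glob.repeat_interleave(p, dim=0)], dim=1
        )                                                 # (b*p, t + p*t, d_model)
        out = self.decoder(query, mem)                    # (b*p, 1, d_model)
        return self.head(out).reshape(b, p, -1)           # next pose per person

model = MultiRangeSketch()
pred = model(torch.randn(2, 3, 10, 45))  # 2 scenes, 3 persons, 10 frames
print(pred.shape)  # torch.Size([2, 3, 45])
```

Feeding each person's pose as a separate decoder query is what lets the same weights scale to scenes with more people, which is consistent with the abstract's claim of handling up to 15 persons simultaneously.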
Related papers
- Social-Transmotion: Promptable Human Trajectory Prediction [65.80068316170613]
Social-Transmotion is a generic Transformer-based model that exploits diverse and numerous visual cues to predict human behavior.
Our approach is validated on multiple datasets, including JTA, JRDB, Pedestrians and Cyclists in Road Traffic, and ETH-UCY.
arXiv Detail & Related papers (2023-12-26T18:56:49Z)
- InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint [67.6297384588837]
We introduce a novel controllable motion generation method, InterControl, to encourage synthesized motions to maintain the desired distance between joint pairs.
We demonstrate that the distance between joint pairs for human-wise interactions can be generated using an off-the-shelf Large Language Model.
arXiv Detail & Related papers (2023-11-27T14:32:33Z)
- Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations [61.659439423703155]
TOHO: Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations.
Our method generates continuous motions that are parameterized only by the temporal coordinate.
This work takes a step further toward general human-scene interaction simulation.
arXiv Detail & Related papers (2023-03-23T09:31:56Z)
- SPOTR: Spatio-temporal Pose Transformers for Human Motion Prediction [12.248428883804763]
3D human motion prediction is a research area of high significance and a challenging problem in computer vision.
Traditionally, autoregressive models have been used to predict human motion.
We present a non-autoregressive model for human motion prediction.
arXiv Detail & Related papers (2023-03-11T01:44:29Z)
- Human MotionFormer: Transferring Human Motions with Vision Transformers [73.48118882676276]
Human motion transfer aims to transfer motions from a target dynamic person to a source static one for motion synthesis.
We propose Human MotionFormer, a hierarchical ViT framework that leverages global and local perceptions to capture large and subtle motion matching.
Experiments show that our Human MotionFormer sets the new state-of-the-art performance both qualitatively and quantitatively.
arXiv Detail & Related papers (2023-02-22T11:42:44Z)
- STPOTR: Simultaneous Human Trajectory and Pose Prediction Using a Non-Autoregressive Transformer for Robot Following Ahead [8.227864212055035]
We develop a neural network model to predict future human motion from an observed human motion history.
We propose a non-autoregressive transformer architecture to leverage its parallel nature for easier training and fast, accurate predictions at test time.
Our model is well-suited for robotic applications, comparing favorably with state-of-the-art methods in terms of test accuracy and speed.
arXiv Detail & Related papers (2022-09-15T20:27:54Z)
- DMMGAN: Diverse Multi Motion Prediction of 3D Human Joints using Attention-Based Generative Adversarial Network [9.247294820004143]
We propose a transformer-based generative model for forecasting multiple diverse human motions.
Our model first predicts the pose of the body relative to the hip joint. Then the Hip Prediction Module predicts the trajectory of the hip movement for each predicted pose frame.
We show that our system outperforms the state-of-the-art in human motion prediction while it can predict diverse multi-motion future trajectories with hip movements.
arXiv Detail & Related papers (2022-09-13T23:22:33Z)
- SoMoFormer: Multi-Person Pose Forecasting with Transformers [15.617263162155062]
We present a new method, called Social Motion Transformer (SoMoFormer), for multi-person 3D pose forecasting.
Our transformer architecture uniquely models human motion input as a joint sequence rather than a time sequence.
We show that with this problem reformulation, SoMoFormer naturally extends to multi-person scenes by using the joints of all people in a scene as input queries.
arXiv Detail & Related papers (2022-08-30T06:59:28Z)
- Weakly-supervised Action Transition Learning for Stochastic Human Motion Prediction [81.94175022575966]
We introduce the task of action-driven human motion prediction.
It aims to predict multiple plausible future motions given a sequence of action labels and a short motion history.
arXiv Detail & Related papers (2022-05-31T08:38:07Z)
- Generating Smooth Pose Sequences for Diverse Human Motion Prediction [90.45823619796674]
We introduce a unified deep generative network for both diverse and controllable motion prediction.
Our experiments on two standard benchmark datasets, Human3.6M and HumanEva-I, demonstrate that our approach outperforms the state-of-the-art baselines in terms of both sample diversity and accuracy.
arXiv Detail & Related papers (2021-08-19T00:58:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.