SoMoFormer: Multi-Person Pose Forecasting with Transformers
- URL: http://arxiv.org/abs/2208.14023v1
- Date: Tue, 30 Aug 2022 06:59:28 GMT
- Title: SoMoFormer: Multi-Person Pose Forecasting with Transformers
- Authors: Edward Vendrow, Satyajit Kumar, Ehsan Adeli, Hamid Rezatofighi
- Abstract summary: We present a new method, called Social Motion Transformer (SoMoFormer), for multi-person 3D pose forecasting.
Our transformer architecture uniquely models human motion input as a joint sequence rather than a time sequence.
We show that with this problem reformulation, SoMoFormer naturally extends to multi-person scenes by using the joints of all people in a scene as input queries.
- Score: 15.617263162155062
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human pose forecasting is a challenging problem involving complex human body
motion and posture dynamics. In cases where there are multiple people in the
environment, one's motion may also be influenced by the motion and dynamic
movements of others. Although there are several previous works targeting the
problem of multi-person dynamic pose forecasting, they often model the entire
pose sequence as a time series (ignoring the underlying relationship between
joints) or only output the future pose sequence of one person at a time. In
this paper, we present a new method, called Social Motion Transformer
(SoMoFormer), for multi-person 3D pose forecasting. Our transformer
architecture uniquely models human motion input as a joint sequence rather than
a time sequence, allowing us to perform attention over joints while predicting
an entire future motion sequence for each joint in parallel. We show that with
this problem reformulation, SoMoFormer naturally extends to multi-person scenes
by using the joints of all people in a scene as input queries. Using learned
embeddings to denote the type of joint, person identity, and global position,
our model learns the relationships between joints and between people, attending
more strongly to joints from the same or nearby people. SoMoFormer outperforms
state-of-the-art methods for long-term motion prediction on the SoMoF benchmark
as well as the CMU-Mocap and MuPoTS-3D datasets. Code will be made available
after publication.
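To make the joint-as-token reformulation concrete, below is a minimal PyTorch sketch of the encoder input described in the abstract. It is an illustration only, not the authors' released code: every name (JointTokenEncoder, num_joint_types, etc.) and every size is a hypothetical choice, and the three learned embeddings simply follow the abstract's description of joint type, person identity, and global position.
```python
import torch
import torch.nn as nn

class JointTokenEncoder(nn.Module):
    """Illustrative sketch of the joint-as-token idea: each token is one
    joint of one person, carrying that joint's whole observed trajectory,
    so attention runs over joints (across all people in the scene) rather
    than over time steps. Names and sizes are hypothetical."""

    def __init__(self, obs_len=16, d_model=128, num_joint_types=13,
                 max_people=8, num_layers=4, num_heads=8):
        super().__init__()
        # Project a joint's (x, y, z) trajectory over obs_len frames
        # into a single token embedding.
        self.traj_proj = nn.Linear(obs_len * 3, d_model)
        # Learned embeddings for joint type and person identity,
        # as described in the abstract.
        self.joint_type_emb = nn.Embedding(num_joint_types, d_model)
        self.person_emb = nn.Embedding(max_people, d_model)
        # Global position of the joint (here: its last observed location).
        self.pos_proj = nn.Linear(3, d_model)
        layer = nn.TransformerEncoderLayer(d_model, num_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, trajs, joint_type_ids, person_ids):
        # trajs:          (B, P*J, obs_len, 3) observed joint trajectories
        # joint_type_ids: (B, P*J) joint-type index per token
        # person_ids:     (B, P*J) person index per token
        B, N, T, _ = trajs.shape
        tok = self.traj_proj(trajs.reshape(B, N, T * 3))
        tok = tok + self.joint_type_emb(joint_type_ids)
        tok = tok + self.person_emb(person_ids)
        tok = tok + self.pos_proj(trajs[:, :, -1, :])  # last observed position
        # Attention over all joints of all people in the scene.
        return self.encoder(tok)  # (B, P*J, d_model)
```
A per-token regression head could then decode each joint's entire future trajectory at once, matching the abstract's claim of predicting the full future motion of every joint in parallel rather than step by step.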
Related papers
- ReMoS: 3D Motion-Conditioned Reaction Synthesis for Two-Person Interactions [66.87211993793807]
We present ReMoS, a denoising-diffusion-based model that synthesizes the full-body motion of a person in a two-person interaction scenario.
We demonstrate ReMoS across challenging two-person scenarios such as pair dancing, Ninjutsu, kickboxing, and acrobatics.
We also contribute the ReMoCap dataset for two-person interactions, containing full-body and finger motions.
arXiv Detail & Related papers (2023-11-28T18:59:52Z)
- InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint [67.6297384588837]
We introduce a novel controllable motion generation method, InterControl, to encourage the synthesized motions to maintain the desired distance between joint pairs.
We demonstrate that the desired distances between joint pairs for human interactions can be generated using an off-the-shelf Large Language Model.
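As a rough sketch of what maintaining a desired joint-pair distance can look like as a training or guidance penalty (the function below is a generic assumption with hypothetical names, not InterControl's actual diffusion-guidance loss):
```python
import torch

def joint_distance_loss(poses_a, poses_b, joint_a, joint_b, target_dist):
    """Penalize deviation from a desired distance between one joint of
    person A and one joint of person B, per frame.
    poses_*: (T, J, 3) predicted joint positions; hypothetical layout."""
    d = torch.norm(poses_a[:, joint_a] - poses_b[:, joint_b], dim=-1)  # (T,)
    return ((d - target_dist) ** 2).mean()
```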
arXiv Detail & Related papers (2023-11-27T14:32:33Z)
- A Unified Masked Autoencoder with Patchified Skeletons for Motion Synthesis [14.347147051922175]
We present a novel task-independent model called UNIMASK-M, which can effectively address diverse motion synthesis tasks using a unified architecture.
Inspired by Vision Transformers (ViTs), our UNIMASK-M model decomposes a human pose into body parts to leverage the spatio-temporal relationships existing in human motion.
Experimental results show that our model successfully forecasts human motion on the Human3.6M dataset.
arXiv Detail & Related papers (2023-08-14T17:39:44Z)
- Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations [61.659439423703155]
TOHO: Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations.
Our method generates continuous motions that are parameterized only by the temporal coordinate.
This work takes a step further toward general human-scene interaction simulation.
arXiv Detail & Related papers (2023-03-23T09:31:56Z)
- DMMGAN: Diverse Multi Motion Prediction of 3D Human Joints using Attention-Based Generative Adversarial Network [9.247294820004143]
We propose a transformer-based generative model for forecasting multiple diverse human motions.
Our model first predicts the pose of the body relative to the hip joint. Then the Hip Prediction Module predicts the trajectory of the hip movement for each predicted pose frame.
We show that our system outperforms the state-of-the-art in human motion prediction while predicting diverse multi-motion future trajectories with hip movements.
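The decomposition described above recombines simply: add the predicted hip trajectory back onto the hip-relative poses. A sketch under assumed tensor layouts (names and shapes are hypothetical):
```python
import torch

def compose_global_pose(rel_pose, hip_traj):
    """Recombine hip-relative poses with a predicted hip trajectory.
    rel_pose: (T, J, 3) joint positions relative to the hip joint
    hip_traj: (T, 3) predicted hip position per frame
    Returns (T, J, 3) global joint positions."""
    return rel_pose + hip_traj[:, None, :]
```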
arXiv Detail & Related papers (2022-09-13T23:22:33Z)
- SoMoFormer: Social-Aware Motion Transformer for Multi-Person Motion Prediction [10.496276090281825]
We propose a novel Social-Aware Motion Transformer (SoMoFormer) to model individual motion and social interactions in a joint manner.
SoMoFormer extracts motion features from sub-sequences in displacement trajectory space to learn both local and global pose dynamics for each individual.
In addition, we devise a novel social-aware motion attention mechanism in SoMoFormer to further optimize dynamics representations and capture interaction dependencies simultaneously.
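A minimal sketch of preparing inputs in "displacement trajectory space", assuming the straightforward reading that poses are converted to frame-to-frame displacements and chunked into sub-sequences (the window length and tensor layout are hypothetical, not the paper's exact pipeline):
```python
import torch

def displacement_subsequences(poses, win=5):
    """Convert absolute poses to frame-to-frame displacements and split
    them into fixed-length sub-sequences, per the summary's description.
    poses: (T, J, 3); returns (num_windows, win, J, 3)."""
    disp = poses[1:] - poses[:-1]              # (T-1, J, 3) displacements
    n = disp.shape[0] // win                   # full windows only
    return disp[: n * win].reshape(n, win, *disp.shape[1:])
```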
arXiv Detail & Related papers (2022-08-19T08:57:34Z)
- Motion Prediction via Joint Dependency Modeling in Phase Space [40.54430409142653]
We introduce a novel convolutional neural model to leverage explicit prior knowledge of motion anatomy.
We then propose a global optimization module that learns the implicit relationships between individual joint features.
Our method is evaluated on large-scale 3D human motion benchmark datasets.
arXiv Detail & Related papers (2022-01-07T08:30:01Z)
- Generating Smooth Pose Sequences for Diverse Human Motion Prediction [90.45823619796674]
We introduce a unified deep generative network for both diverse and controllable motion prediction.
Our experiments on two standard benchmark datasets, Human3.6M and HumanEva-I, demonstrate that our approach outperforms the state-of-the-art baselines in terms of both sample diversity and accuracy.
arXiv Detail & Related papers (2021-08-19T00:58:00Z)
- Perpetual Motion: Generating Unbounded Human Motion [61.40259979876424]
We focus on long-term prediction; that is, generating long sequences of plausible human motion.
We propose a model to generate non-deterministic, ever-changing, perpetual human motion.
We train this model using a heavy-tailed function of the KL divergence of a white-noise Gaussian process, which allows temporal dependency in the latent sequence.
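A sketch of that training term, assuming a standard diagonal-Gaussian KL against a white-noise prior; the specific heavy-tailed transform used below (a square root) is only an illustrative stand-in, since the summary does not name the exact function:
```python
import torch

def heavy_tailed_kl(mu, logvar, eps=1e-6):
    """KL divergence of a diagonal Gaussian posterior from a standard
    (white-noise) Gaussian prior, passed through a heavy-tailed
    square-root transform so large divergences are penalized
    sub-quadratically. The sqrt choice is an illustrative assumption."""
    kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(dim=-1)
    return torch.sqrt(kl + eps).mean()
```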
arXiv Detail & Related papers (2020-07-27T21:50:36Z)
- Socially and Contextually Aware Human Motion and Pose Forecasting [48.083060946226]
We propose a novel framework to tackle both tasks of human motion (or trajectory) and body skeleton pose forecasting.
We consider incorporating both scene and social contexts, as critical clues for this prediction task.
Our proposed framework achieves a superior performance compared to several baselines on two social datasets.
arXiv Detail & Related papers (2020-07-14T06:12:13Z)