SoMoFormer: Social-Aware Motion Transformer for Multi-Person Motion Prediction
- URL: http://arxiv.org/abs/2208.09224v1
- Date: Fri, 19 Aug 2022 08:57:34 GMT
- Title: SoMoFormer: Social-Aware Motion Transformer for Multi-Person Motion Prediction
- Authors: Xiaogang Peng, Yaodi Shen, Haoran Wang, Binling Nie, Yigang Wang and Zizhao Wu
- Abstract summary: We propose a novel Social-Aware Motion Transformer (SoMoFormer) to model individual motion and social interactions in a joint manner.
SoMoFormer extracts motion features from sub-sequences in displacement trajectory space to learn both local and global pose dynamics for each individual.
In addition, we devise a novel social-aware motion attention mechanism in SoMoFormer to further optimize dynamics representations and capture interaction dependencies simultaneously.
- Score: 10.496276090281825
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-person motion prediction remains a challenging problem, especially in
the joint representation learning of individual motion and social interactions.
Most prior methods learn only local pose dynamics for individual motion
(without the global body trajectory) and struggle to capture the complex
interaction dependencies of social interactions. In this paper, we propose a
novel Social-Aware Motion Transformer (SoMoFormer) to effectively model
individual motion and social interactions in a joint manner. Specifically,
SoMoFormer extracts motion features from sub-sequences in displacement
trajectory space to effectively learn both local and global pose dynamics for
each individual. In addition, we devise a novel social-aware motion attention
mechanism in SoMoFormer to further optimize dynamics representations and
capture interaction dependencies simultaneously via motion similarity
calculation across time and social dimensions. We empirically evaluate our
framework on multi-person motion datasets over both short- and long-term
horizons and demonstrate that our method substantially outperforms
state-of-the-art methods for single- and multi-person motion prediction. Code
will be made publicly available upon acceptance.
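The abstract does not spell out how the displacement trajectory space is constructed, so the following is only a minimal sketch of one plausible reading: per-frame displacements whose root channel carries the global trajectory and whose remaining channels carry root-relative local pose, sliced into sub-sequence windows. The function names, the [T, J, 3] layout, and the root-at-index-0 convention are assumptions, not the paper's definitions.

```python
import torch

def displacement_representation(joints: torch.Tensor) -> torch.Tensor:
    """joints: [T, J, 3] absolute 3D joint positions for one person,
    with the root (hip) joint assumed to sit at index 0.

    Returns [T-1, J, 3] per-frame displacements: channel 0 carries the
    global trajectory delta, the other channels carry root-relative
    local pose deltas, so one tensor mixes local and global dynamics.
    """
    traj = joints[:, :1, :]                     # [T, 1, 3] global root trajectory
    local = joints[:, 1:, :] - traj             # root-relative local pose
    combined = torch.cat([traj, local], dim=1)  # [T, J, 3]
    return combined[1:] - combined[:-1]         # frame-to-frame displacements

def subsequences(disp: torch.Tensor, win: int, stride: int) -> torch.Tensor:
    """Slice [T', J, 3] displacements into overlapping windows
    [N, win, J, 3] that a transformer can embed as motion tokens."""
    return disp.unfold(0, win, stride).permute(0, 3, 1, 2)

# Example: 50 frames, 15 joints -> 8 windows of 10 displacement frames.
feats = subsequences(displacement_representation(torch.randn(50, 15, 3)),
                     win=10, stride=5)
```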
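The social-aware motion attention is likewise only described at a high level. One plausible reading is that flattening the social and temporal axes into a single token axis lets one scaled dot-product attention score motion similarity across time and people simultaneously; the sketch below (PyTorch, with assumed shapes and hypothetical names) illustrates only that idea, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def social_motion_attention(tokens: torch.Tensor,
                            wq: torch.Tensor,
                            wk: torch.Tensor,
                            wv: torch.Tensor) -> torch.Tensor:
    """tokens: [P, T, D] motion features for P people over T windows;
    wq/wk/wv: [D, D] projection matrices (hypothetical parameters).

    Attention runs over all P*T (person, time) tokens at once, so each
    weight is a motion-similarity score across both the temporal and
    the social dimension.
    """
    P, T, D = tokens.shape
    x = tokens.reshape(P * T, D)              # merge social and time axes
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.t() / D ** 0.5             # similarity over every token pair
    return (F.softmax(scores, dim=-1) @ v).reshape(P, T, D)

# Example: 3 people, 8 windows, 64-d features.
D = 64
out = social_motion_attention(torch.randn(3, 8, D), torch.randn(D, D),
                              torch.randn(D, D), torch.randn(D, D))
```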
Related papers
- KinMo: Kinematic-aware Human Motion Understanding and Generation [6.962697597686156]
Controlling human motion based on text presents an important challenge in computer vision.
Traditional approaches often rely on holistic action descriptions for motion synthesis.
We propose a novel motion representation that decomposes motion into distinct body joint group movements.
arXiv Detail & Related papers (2024-11-23T06:50:11Z) - Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation [52.87672306545577]
Existing motion generation methods primarily focus on the direct synthesis of global motions.
We propose the local action-guided motion diffusion model, which facilitates global motion generation by utilizing local actions as fine-grained control signals.
Our method provides flexibility in seamlessly combining various local actions and continuous guiding weight adjustment.
arXiv Detail & Related papers (2024-07-15T08:35:00Z) - FreeMotion: A Unified Framework for Number-free Text-to-Motion Synthesis [65.85686550683806]
This paper reconsiders motion generation and proposes to unify single- and multi-person motion via the conditional motion distribution.
Based on our framework, existing single-person motion spatial control methods can be seamlessly integrated, achieving precise control of multi-person motion.
arXiv Detail & Related papers (2024-05-24T17:57:57Z) - ReMoS: 3D Motion-Conditioned Reaction Synthesis for Two-Person Interactions [66.87211993793807]
We present ReMoS, a denoising-diffusion-based model that synthesizes the full-body motion of a person in a two-person interaction scenario.
We demonstrate ReMoS across challenging two-person scenarios such as pair dancing, Ninjutsu, kickboxing, and acrobatics.
We also contribute the ReMoCap dataset for two-person interactions, containing full-body and finger motions.
arXiv Detail & Related papers (2023-11-28T18:59:52Z) - InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint [67.6297384588837]
We introduce a novel controllable motion generation method, InterControl, which encourages synthesized motions to maintain the desired distance between joint pairs.
We demonstrate that the joint-pair distances for human interactions can be generated using an off-the-shelf Large Language Model.
arXiv Detail & Related papers (2023-11-27T14:32:33Z) - Persistent-Transient Duality: A Multi-mechanism Approach for Modeling
Human-Object Interaction [58.67761673662716]
Humans are highly adaptable, swiftly switching between different modes to handle different tasks, situations and contexts.
In human-object interaction (HOI) activities, these modes can be attributed to two mechanisms: (1) the large-scale consistent plan for the whole activity and (2) the small-scale child interactive actions that start and end along the timeline.
This work proposes to model two concurrent mechanisms that jointly control human motion.
arXiv Detail & Related papers (2023-07-24T12:21:33Z) - SoMoFormer: Multi-Person Pose Forecasting with Transformers [15.617263162155062]
We present a new method, called Social Motion Transformer (SoMoFormer), for multi-person 3D pose forecasting.
Our transformer architecture uniquely models human motion input as a joint sequence rather than a time sequence.
We show that with this problem reformulation, SoMoFormer naturally extends to multi-person scenes by using the joints of all people in a scene as input queries.
arXiv Detail & Related papers (2022-08-30T06:59:28Z) - Interaction Transformer for Human Reaction Generation [61.22481606720487]
We propose a novel interaction Transformer (InterFormer) consisting of a Transformer network with both temporal and spatial attentions.
Our method is general and can be used to generate more complex and long-term interactions.
arXiv Detail & Related papers (2022-07-04T19:30:41Z) - Collaborative Motion Prediction via Neural Motion Message Passing [37.72454920355321]
We propose neural motion message passing (NMMP) to explicitly model the interaction and learn representations for directed interactions between actors.
Based on the proposed NMMP, we design the motion prediction systems for two settings: the pedestrian setting and the joint pedestrian and vehicle setting.
Both systems outperform the previous state-of-the-art methods on several existing benchmarks.
arXiv Detail & Related papers (2020-03-14T10:12:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.