ActFormer: A GAN Transformer Framework towards General
Action-Conditioned 3D Human Motion Generation
- URL: http://arxiv.org/abs/2203.07706v1
- Date: Tue, 15 Mar 2022 07:50:12 GMT
- Title: ActFormer: A GAN Transformer Framework towards General
Action-Conditioned 3D Human Motion Generation
- Authors: Ziyang Song, Dongliang Wang, Nan Jiang, Zhicheng Fang, Chenjing Ding,
Weihao Gan, Wei Wu
- Abstract summary: We present a GAN Transformer framework for general action-conditioned 3D human motion generation.
Our approach consists of a powerful Action-conditioned transFormer (ActFormer) under a GAN training scheme.
ActFormer can be naturally extended to multi-person motions by alternately modeling temporal correlations and human interactions with Transformer encoders.
- Score: 16.1094669439815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a GAN Transformer framework for general action-conditioned 3D
human motion generation, including not only single-person actions but also
multi-person interactive actions. Our approach consists of a powerful
Action-conditioned motion transFormer (ActFormer) under a GAN training scheme,
equipped with a Gaussian Process latent prior. Such a design combines the
strong spatio-temporal representation capacity of Transformer, superiority in
generative modeling of GAN, and inherent temporal correlations from latent
prior. Furthermore, ActFormer can be naturally extended to multi-person motions
by alternately modeling temporal correlations and human interactions with
Transformer encoders. We validate our approach by comparison with other methods
on larger-scale benchmarks, including NTU RGB+D 120 and BABEL. We also
introduce a new synthetic dataset of complex multi-person combat behaviors to
facilitate research on multi-person motion generation. Our method demonstrates
adaptability to various human motion representations and achieves leading
performance over SOTA methods on both single-person and multi-person motion
generation tasks, indicating a hopeful step towards a universal human motion
generator.
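The abstract names the generator's main ingredients: a Gaussian Process latent prior that supplies temporally correlated noise, an action-conditioned Transformer that decodes the latent sequence into per-frame poses, and, for multi-person motion, alternating Transformer encoders over the time axis and over the person axis. The sketch below is a minimal PyTorch illustration of that design under stated assumptions, not the paper's implementation: the RBF kernel and its length-scale, all dimensions and layer counts, the absence of positional encodings, and the names `ActFormerGenerator` and `sample_gp_prior` are illustrative choices, and the GAN discriminator and training loop are omitted entirely.

```python
# Minimal sketch of a GP-prior, action-conditioned Transformer generator with
# alternating temporal / interaction encoders. Hyperparameters are assumptions.
import torch
import torch.nn as nn


def sample_gp_prior(batch, seq_len, latent_dim, length_scale=10.0):
    """Draw latent sequences z ~ GP(0, k) with an RBF kernel over frame indices."""
    t = torch.arange(seq_len, dtype=torch.float32)
    cov = torch.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / length_scale ** 2)
    cov = cov + 1e-4 * torch.eye(seq_len)                   # jitter for stability
    chol = torch.linalg.cholesky(cov)                       # (T, T)
    eps = torch.randn(batch, latent_dim, seq_len)           # white noise per latent dim
    z = torch.einsum("ts,bds->bdt", chol, eps)              # correlate across time
    return z.permute(0, 2, 1)                               # (B, T, D)


class ActFormerGenerator(nn.Module):
    def __init__(self, num_actions, pose_dim, d_model=256, n_layers=4, n_heads=8):
        super().__init__()
        self.action_emb = nn.Embedding(num_actions, d_model)
        self.latent_proj = nn.Linear(d_model, d_model)       # assumes latent_dim == d_model
        # Alternating blocks: one encoder attends over time (per person), the
        # next attends over persons (per frame) to model human interactions.
        self.temporal_layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=512,
                                        batch_first=True) for _ in range(n_layers)])
        self.interact_layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=512,
                                        batch_first=True) for _ in range(n_layers)])
        self.out = nn.Linear(d_model, pose_dim)

    def forward(self, z, action):
        # z: (B, P, T, d_model) GP latent per person; action: (B,) class indices
        B, P, T, D = z.shape
        x = self.latent_proj(z) + self.action_emb(action)[:, None, None, :]
        for t_layer, i_layer in zip(self.temporal_layers, self.interact_layers):
            x = t_layer(x.reshape(B * P, T, D)).reshape(B, P, T, D)        # temporal correlations
            x = i_layer(x.permute(0, 2, 1, 3).reshape(B * T, P, D))        # human interactions
            x = x.reshape(B, T, P, D).permute(0, 2, 1, 3)
        return self.out(x)                                                  # (B, P, T, pose_dim)


# Usage sketch: two persons, 60 frames, 24 joints x 3D = 72-dim pose vectors.
gen = ActFormerGenerator(num_actions=120, pose_dim=72)
z = torch.stack([sample_gp_prior(4, 60, 256) for _ in range(2)], dim=1)     # (4, 2, 60, 256)
motion = gen(z, torch.randint(0, 120, (4,)))
print(motion.shape)  # torch.Size([4, 2, 60, 72])
```

In this reading, single-person generation is simply the P = 1 case, and the interaction encoders become no-ops over a one-person axis; the GP prior, rather than positional encodings, is what injects temporal structure into the latent input.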
Related papers
- Temporal and Interactive Modeling for Efficient Human-Human Motion Generation [30.857021853999644]
We introduce TIM (Temporal and Interactive Modeling), an efficient and effective pioneering model for human-human motion generation.
Specifically, we first propose Causal Interactive Injection to leverage the temporal properties of motion sequences and avoid non-causal and cumbersome modeling.
Finally, to generate smoother and more rational motion, we design Localized Pattern Amplification to capture short-term motion patterns.
arXiv Detail & Related papers (2024-08-30T09:22:07Z)
- Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs [67.59291068131438]
Motion-Agent is a conversational framework designed for general human motion generation, editing, and understanding.
Motion-Agent employs an open-source pre-trained language model to develop a generative agent, MotionLLM, that bridges the gap between motion and text.
arXiv Detail & Related papers (2024-05-27T09:57:51Z)
- FreeMotion: A Unified Framework for Number-free Text-to-Motion Synthesis [65.85686550683806]
This paper reconsiders motion generation and proposes to unify the single and multi-person motion by the conditional motion distribution.
Based on our framework, the current single-person motion spatial control method could be seamlessly integrated, achieving precise control of multi-person motion.
arXiv Detail & Related papers (2024-05-24T17:57:57Z)
- InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint [67.6297384588837]
We introduce a novel controllable motion generation method, InterControl, to encourage the synthesized motions to maintain the desired distance between joint pairs.
We demonstrate that the joint-pair distances for human-human interactions can be generated using an off-the-shelf Large Language Model.
arXiv Detail & Related papers (2023-11-27T14:32:33Z)
- TransFusion: A Practical and Effective Transformer-based Diffusion Model for 3D Human Motion Prediction [1.8923948104852863]
We propose TransFusion, an innovative and practical diffusion-based model for 3D human motion prediction.
Our model leverages Transformer as the backbone with long skip connections between shallow and deep layers.
In contrast to prior diffusion-based models that utilize extra modules like cross-attention and adaptive layer normalization, we treat all inputs, including conditions, as tokens to create a more lightweight model.
arXiv Detail & Related papers (2023-07-30T01:52:07Z)
- Persistent-Transient Duality: A Multi-mechanism Approach for Modeling Human-Object Interaction [58.67761673662716]
Humans are highly adaptable, swiftly switching between different modes to handle different tasks, situations and contexts.
In human-object interaction (HOI) activities, these modes can be attributed to two mechanisms: (1) the large-scale consistent plan for the whole activity and (2) the small-scale child interactive actions that start and end along the timeline.
This work proposes to model two concurrent mechanisms that jointly control human motion.
arXiv Detail & Related papers (2023-07-24T12:21:33Z)
- Stochastic Multi-Person 3D Motion Forecasting [21.915057426589744]
We address real-world complexities that prior work on human motion forecasting has ignored.
Our framework is general; we instantiate it with different generative models.
Our approach produces diverse and accurate multi-person predictions, significantly outperforming the state of the art.
arXiv Detail & Related papers (2023-06-08T17:59:09Z)
- Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations [61.659439423703155]
TOHO: Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations.
Our method generates continuous motions that are parameterized only by the temporal coordinate.
This work takes a step further toward general human-scene interaction simulation.
arXiv Detail & Related papers (2023-03-23T09:31:56Z)
- Executing your Commands via Motion Diffusion in Latent Space [51.64652463205012]
We propose a Motion Latent-based Diffusion model (MLD) to produce vivid motion sequences conforming to the given conditional inputs.
Our MLD achieves significant improvements over the state-of-the-art methods among extensive human motion generation tasks.
arXiv Detail & Related papers (2022-12-08T03:07:00Z)
- MUGL: Large Scale Multi Person Conditional Action Generation with Locomotion [9.30315673109153]
MUGL is a novel deep neural model for large-scale, diverse generation of single and multi-person pose-based action sequences with locomotion.
Our controllable approach enables variable-length generations customizable by action category, across more than 100 categories.
arXiv Detail & Related papers (2021-10-21T20:11:53Z)