ActFormer: A GAN Transformer Framework towards General
Action-Conditioned 3D Human Motion Generation
- URL: http://arxiv.org/abs/2203.07706v1
- Date: Tue, 15 Mar 2022 07:50:12 GMT
- Title: ActFormer: A GAN Transformer Framework towards General
Action-Conditioned 3D Human Motion Generation
- Authors: Ziyang Song, Dongliang Wang, Nan Jiang, Zhicheng Fang, Chenjing Ding,
Weihao Gan, Wei Wu
- Abstract summary: We present a GAN Transformer framework for general action-conditioned 3D human motion generation.
Our approach consists of a powerful Action-conditioned transFormer (ActFormer) under a GAN training scheme.
ActFormer can be naturally extended to multi-person motions by alternately modeling temporal correlations and human interactions with Transformer encoders.
- Score: 16.1094669439815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a GAN Transformer framework for general action-conditioned 3D
human motion generation, including not only single-person actions but also
multi-person interactive actions. Our approach consists of a powerful
Action-conditioned motion transFormer (ActFormer) under a GAN training scheme,
equipped with a Gaussian Process latent prior. Such a design combines the
strong spatio-temporal representation capacity of Transformer, superiority in
generative modeling of GAN, and inherent temporal correlations from latent
prior. Furthermore, ActFormer can be naturally extended to multi-person motions
by alternately modeling temporal correlations and human interactions with
Transformer encoders. We validate our approach by comparison with other methods
on larger-scale benchmarks, including NTU RGB+D 120 and BABEL. We also
introduce a new synthetic dataset of complex multi-person combat behaviors to
facilitate research on multi-person motion generation. Our method demonstrates
adaptability to various human motion representations and achieves leading
performance over SOTA methods on both single-person and multi-person motion
generation tasks, indicating a hopeful step towards a universal human motion
generator.
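The abstract names the generator's main ingredients: a Gaussian Process latent prior that supplies temporally correlated noise, an action-conditioned Transformer that decodes the latent sequence into per-frame poses, and, for multi-person motion, alternating Transformer encoders over the time axis and over the person axis. The sketch below is a minimal PyTorch illustration of that design under stated assumptions, not the paper's implementation: the RBF kernel and its length-scale, all dimensions and layer counts, the absence of positional encodings, and the names `ActFormerGenerator` and `sample_gp_prior` are illustrative choices, and the GAN discriminator and training loop are omitted entirely.

```python
# Minimal sketch of a GP-prior, action-conditioned Transformer generator with
# alternating temporal / interaction encoders. Hyperparameters are assumptions.
import torch
import torch.nn as nn


def sample_gp_prior(batch, seq_len, latent_dim, length_scale=10.0):
    """Draw latent sequences z ~ GP(0, k) with an RBF kernel over frame indices."""
    t = torch.arange(seq_len, dtype=torch.float32)
    cov = torch.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / length_scale ** 2)
    cov = cov + 1e-4 * torch.eye(seq_len)                   # jitter for stability
    chol = torch.linalg.cholesky(cov)                       # (T, T)
    eps = torch.randn(batch, latent_dim, seq_len)           # white noise per latent dim
    z = torch.einsum("ts,bds->bdt", chol, eps)              # correlate across time
    return z.permute(0, 2, 1)                               # (B, T, D)


class ActFormerGenerator(nn.Module):
    def __init__(self, num_actions, pose_dim, d_model=256, n_layers=4, n_heads=8):
        super().__init__()
        self.action_emb = nn.Embedding(num_actions, d_model)
        self.latent_proj = nn.Linear(d_model, d_model)       # assumes latent_dim == d_model
        # Alternating blocks: one encoder attends over time (per person), the
        # next attends over persons (per frame) to model human interactions.
        self.temporal_layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=512,
                                        batch_first=True) for _ in range(n_layers)])
        self.interact_layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=512,
                                        batch_first=True) for _ in range(n_layers)])
        self.out = nn.Linear(d_model, pose_dim)

    def forward(self, z, action):
        # z: (B, P, T, d_model) GP latent per person; action: (B,) class indices
        B, P, T, D = z.shape
        x = self.latent_proj(z) + self.action_emb(action)[:, None, None, :]
        for t_layer, i_layer in zip(self.temporal_layers, self.interact_layers):
            x = t_layer(x.reshape(B * P, T, D)).reshape(B, P, T, D)        # temporal correlations
            x = i_layer(x.permute(0, 2, 1, 3).reshape(B * T, P, D))        # human interactions
            x = x.reshape(B, T, P, D).permute(0, 2, 1, 3)
        return self.out(x)                                                  # (B, P, T, pose_dim)


# Usage sketch: two persons, 60 frames, 24 joints x 3D = 72-dim pose vectors.
gen = ActFormerGenerator(num_actions=120, pose_dim=72)
z = torch.stack([sample_gp_prior(4, 60, 256) for _ in range(2)], dim=1)     # (4, 2, 60, 256)
motion = gen(z, torch.randint(0, 120, (4,)))
print(motion.shape)  # torch.Size([4, 2, 60, 72])
```

In this reading, single-person generation is simply the P = 1 case, and the interaction encoders become no-ops over a one-person axis; the GP prior, rather than positional encodings, is what injects temporal structure into the latent input.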
Related papers
- Temporal and Interactive Modeling for Efficient Human-Human Motion Generation [30.857021853999644]
We introduce TIM (Temporal and Interactive Modeling), an efficient and effective pioneering model for human-human motion generation.
Specifically, we first propose Causal Interactive Injection to leverage the temporal properties of motion sequences and avoid non-causal and cumbersome modeling.
Finally, to generate smoother and more rational motion, we design Localized Pattern Amplification to capture short-term motion patterns.
arXiv Detail & Related papers (2024-08-30T09:22:07Z)
- Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs [67.59291068131438]
Motion-Agent is a conversational framework designed for general human motion generation, editing, and understanding.
Motion-Agent employs an open-source pre-trained language model to develop a generative agent, MotionLLM, that bridges the gap between motion and text.
arXiv Detail & Related papers (2024-05-27T09:57:51Z)
- FreeMotion: A Unified Framework for Number-free Text-to-Motion Synthesis [65.85686550683806]
This paper reconsiders motion generation and proposes to unify the single and multi-person motion by the conditional motion distribution.
Based on our framework, the current single-person motion spatial control method could be seamlessly integrated, achieving precise control of multi-person motion.
arXiv Detail & Related papers (2024-05-24T17:57:57Z)
- InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint [67.6297384588837]
We introduce a novel controllable motion generation method, InterControl, to encourage the synthesized motions to maintain the desired distance between joint pairs.
We demonstrate that the joint-pair distances for human-human interactions can be generated using an off-the-shelf Large Language Model.
arXiv Detail & Related papers (2023-11-27T14:32:33Z)
- TransFusion: A Practical and Effective Transformer-based Diffusion Model for 3D Human Motion Prediction [1.8923948104852863]
We propose TransFusion, an innovative and practical diffusion-based model for 3D human motion prediction.
Our model leverages Transformer as the backbone with long skip connections between shallow and deep layers.
In contrast to prior diffusion-based models that utilize extra modules like cross-attention and adaptive layer normalization, we treat all inputs, including conditions, as tokens to create a more lightweight model.
arXiv Detail & Related papers (2023-07-30T01:52:07Z)
- Persistent-Transient Duality: A Multi-mechanism Approach for Modeling Human-Object Interaction [58.67761673662716]
Humans are highly adaptable, swiftly switching between different modes to handle different tasks, situations and contexts.
In human-object interaction (HOI) activities, these modes can be attributed to two mechanisms: (1) the large-scale consistent plan for the whole activity and (2) the small-scale child interactive actions that start and end along the timeline.
This work proposes to model two concurrent mechanisms that jointly control human motion.
arXiv Detail & Related papers (2023-07-24T12:21:33Z)
- Stochastic Multi-Person 3D Motion Forecasting [21.915057426589744]
We address real-world complexities that prior work on human motion forecasting has ignored.
Our framework is general; we instantiate it with different generative models.
Our approach produces diverse and accurate multi-person predictions, significantly outperforming the state of the art.
arXiv Detail & Related papers (2023-06-08T17:59:09Z)
- Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations [61.659439423703155]
TOHO: Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations.
Our method generates continuous motions that are parameterized only by the temporal coordinate.
This work takes a step further toward general human-scene interaction simulation.
arXiv Detail & Related papers (2023-03-23T09:31:56Z)
- Executing your Commands via Motion Diffusion in Latent Space [51.64652463205012]
We propose a Motion Latent-based Diffusion model (MLD) to produce vivid motion sequences conforming to the given conditional inputs.
Our MLD achieves significant improvements over the state-of-the-art methods among extensive human motion generation tasks.
arXiv Detail & Related papers (2022-12-08T03:07:00Z)
- MUGL: Large Scale Multi Person Conditional Action Generation with Locomotion [9.30315673109153]
MUGL is a novel deep neural model for large-scale, diverse generation of single and multi-person pose-based action sequences with locomotion.
Our controllable approach enables variable-length generations customizable by action category, across more than 100 categories.
arXiv Detail & Related papers (2021-10-21T20:11:53Z)