Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation
- URL: http://arxiv.org/abs/2407.10528v1
- Date: Mon, 15 Jul 2024 08:35:00 GMT
- Title: Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation
- Authors: Peng Jin, Hao Li, Zesen Cheng, Kehan Li, Runyi Yu, Chang Liu, Xiangyang Ji, Li Yuan, Jie Chen,
- Abstract summary: Existing motion generation methods primarily focus on the direct synthesis of global motions.
We propose the local action-guided motion diffusion model, which facilitates global motion generation by utilizing local actions as fine-grained control signals.
Our method provides flexibility in seamlessly combining various local actions and continuous guiding weight adjustment.
- Score: 52.87672306545577
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-motion generation requires not only grounding local actions in language but also seamlessly blending these individual actions to synthesize diverse and realistic global motions. However, existing motion generation methods primarily focus on the direct synthesis of global motions while neglecting the importance of generating and controlling local actions. In this paper, we propose the local action-guided motion diffusion model, which facilitates global motion generation by utilizing local actions as fine-grained control signals. Specifically, we provide an automated method for reference local action sampling and leverage graph attention networks to assess the guiding weight of each local action in the overall motion synthesis. During the diffusion process for synthesizing global motion, we calculate the local-action gradient to provide conditional guidance. This local-to-global paradigm reduces the complexity associated with direct global motion generation and promotes motion diversity via sampling diverse actions as conditions. Extensive experiments on two human motion datasets, i.e., HumanML3D and KIT, demonstrate the effectiveness of our method. Furthermore, our method provides flexibility in seamlessly combining various local actions and continuous guiding weight adjustment, accommodating diverse user preferences, which may hold potential significance for the community. The project page is available at https://jpthu17.github.io/GuidedMotion-project/.
Related papers
- Autonomous Character-Scene Interaction Synthesis from Text Instruction [45.255215402142596]
We introduce a framework for synthesizing multi-stage scene-aware interaction motions directly from a single text instruction and goal location.
Our approach employs an auto-regressive diffusion model to synthesize the next motion segment, along with an autonomous scheduler predicting the transition for each action stage.
We present a comprehensive motion-captured dataset comprising 16 hours of motion sequences in 120 indoor scenes covering 40 types of motions, each annotated with precise language descriptions.
arXiv Detail & Related papers (2024-10-04T06:58:45Z) - FreeMotion: A Unified Framework for Number-free Text-to-Motion Synthesis [65.85686550683806]
This paper reconsiders motion generation and proposes to unify the single and multi-person motion by the conditional motion distribution.
Based on our framework, the current single-person motion spatial control method could be seamlessly integrated, achieving precise control of multi-person motion.
arXiv Detail & Related papers (2024-05-24T17:57:57Z) - LGTM: Local-to-Global Text-Driven Human Motion Diffusion Model [23.864126853396527]
We introduce LGTM, a novel Local-to-Global pipeline for Text-to-Motion generation.
It aims to address the challenge of accurately translating textual descriptions into semantically coherent human motion in computer animation.
Our experiments demonstrate that LGTM gains significant improvements in generating locally semantically-aligned human motion.
arXiv Detail & Related papers (2024-05-06T13:56:56Z) - Guided Motion Diffusion for Controllable Human Motion Synthesis [18.660523853430497]
We propose Guided Motion Diffusion (GMD), a method that incorporates spatial constraints into the motion generation process.
Specifically, we propose an effective feature projection scheme that manipulates motion representation to enhance the coherency between spatial information and local poses.
Our experiments justify the development of GMD, which achieves a significant improvement over state-of-the-art methods in text-based motion generation.
arXiv Detail & Related papers (2023-05-21T21:54:31Z) - Human MotionFormer: Transferring Human Motions with Vision Transformers [73.48118882676276]
Human motion transfer aims to transfer motions from a target dynamic person to a source static one for motion synthesis.
We propose Human MotionFormer, a hierarchical ViT framework that leverages global and local perceptions to capture large and subtle motion matching.
Experiments show that our Human MotionFormer sets the new state-of-the-art performance both qualitatively and quantitatively.
arXiv Detail & Related papers (2023-02-22T11:42:44Z) - Motion Transformer with Global Intention Localization and Local Movement
Refinement [103.75625476231401]
Motion TRansformer (MTR) models motion prediction as the joint optimization of global intention localization and local movement refinement.
MTR achieves state-of-the-art performance on both the marginal and joint motion prediction challenges.
arXiv Detail & Related papers (2022-09-27T16:23:14Z) - SoMoFormer: Social-Aware Motion Transformer for Multi-Person Motion
Prediction [10.496276090281825]
We propose a novel Social-Aware Motion Transformer (SoMoFormer) to model individual motion and social interactions in a joint manner.
SoMoFormer extracts motion features from sub-sequences in displacement trajectory space to learn both local and global pose dynamics for each individual.
In addition, we devise a novel social-aware motion attention mechanism in SoMoFormer to further optimize dynamics representations and capture interaction dependencies simultaneously.
arXiv Detail & Related papers (2022-08-19T08:57:34Z) - Global-local Motion Transformer for Unsupervised Skeleton-based Action
Learning [23.051184131833292]
We propose a new transformer model for the task of unsupervised learning of skeleton motion sequences.
The proposed model successfully learns local dynamics of the joints and captures global context from the motion sequences.
arXiv Detail & Related papers (2022-07-13T10:18:07Z) - EAN: Event Adaptive Network for Enhanced Action Recognition [66.81780707955852]
We propose a unified action recognition framework to investigate the dynamic nature of video content.
First, when extracting local cues, we generate the spatial-temporal kernels of dynamic-scale to adaptively fit the diverse events.
Second, to accurately aggregate these cues into a global video representation, we propose to mine the interactions only among a few selected foreground objects by a Transformer.
arXiv Detail & Related papers (2021-07-22T15:57:18Z) - ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for
Mobile Manipulation [99.2543521972137]
ReLMoGen is a framework that combines a learned policy to predict subgoals and a motion generator to plan and execute the motion needed to reach these subgoals.
Our method is benchmarked on a diverse set of seven robotics tasks in photo-realistic simulation environments.
ReLMoGen shows outstanding transferability between different motion generators at test time, indicating a great potential to transfer to real robots.
arXiv Detail & Related papers (2020-08-18T08:05:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.