MotionGPT: Finetuned LLMs Are General-Purpose Motion Generators
- URL: http://arxiv.org/abs/2306.10900v2
- Date: Mon, 18 Mar 2024 04:14:50 GMT
- Title: MotionGPT: Finetuned LLMs Are General-Purpose Motion Generators
- Authors: Yaqi Zhang, Di Huang, Bin Liu, Shixiang Tang, Yan Lu, Lu Chen, Lei Bai, Qi Chu, Nenghai Yu, Wanli Ouyang
- Abstract summary: This paper presents a Motion General-Purpose generaTor (MotionGPT) that can use multimodal control signals.
We first quantize multimodal control signals into discrete codes and then formulate them in a unified prompt instruction.
Our MotionGPT demonstrates a unified human motion generation model with multimodal control signals by tuning a mere 0.4% of LLM parameters.
- Score: 108.67006263044772
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating realistic human motion from given action descriptions has experienced significant advancements because of the emerging requirement of digital humans. While recent works have achieved impressive results in generating motion directly from textual action descriptions, they often support only a single modality of the control signal, which limits their application in the real digital human industry. This paper presents a Motion General-Purpose generaTor (MotionGPT) that can use multimodal control signals, e.g., text and single-frame poses, for generating consecutive human motions by treating multimodal signals as special input tokens in large language models (LLMs). Specifically, we first quantize multimodal control signals into discrete codes and then formulate them in a unified prompt instruction to ask the LLMs to generate the motion answer. Our MotionGPT demonstrates a unified human motion generation model with multimodal control signals by tuning a mere 0.4% of LLM parameters. To the best of our knowledge, MotionGPT is the first method to generate human motion by multimodal control signals, which we hope can shed light on this new direction. Visit our webpage at https://qiqiapink.github.io/MotionGPT/.
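The abstract's two steps, quantizing control signals into discrete codes and then placing them in a unified prompt instruction, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and not taken from the paper: the nearest-neighbor lookup against a toy `codebook`, the `<pose_k>` token format, the prompt template, and the helper names `quantize` and `build_prompt`.

```python
from typing import List, Sequence


def quantize(frames: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each frame vector to the index of its nearest codebook entry.

    Uses squared Euclidean distance; the codebook is a hypothetical
    stand-in for whatever learned quantizer a real system would use.
    """
    def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sqdist(frame, codebook[k]))
        for frame in frames
    ]


def build_prompt(text: str, pose_codes: Sequence[int]) -> str:
    """Combine a text description and pose codes into one instruction.

    The template and the <pose_k> token spelling are illustrative only.
    """
    pose_tokens = " ".join(f"<pose_{c}>" for c in pose_codes)
    return (
        "Instruction: Generate a motion sequence matching the description "
        "and the given poses.\n"
        f"Description: {text}\n"
        f"Poses: {pose_tokens}\n"
        "Answer:"
    )


# Toy 2-D "pose features" and a 3-entry codebook (both made up).
codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
frames = [[0.1, -0.1], [1.9, 0.2]]

codes = quantize(frames, codebook)
prompt = build_prompt("a person walks forward", codes)
print(codes)
print(prompt)
```

In this sketch the language model would be asked to continue the prompt after "Answer:" with motion tokens, which a decoder would then map back to poses; that decoding step is omitted here.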
Related papers
- FreeMotion: MoCap-Free Human Motion Synthesis with Multimodal Large Language Models [19.09048969615117]
We explore open-set human motion synthesis using natural language instructions as user control signals based on MLLMs.
Our method can achieve general human motion synthesis for many downstream tasks.
arXiv Detail & Related papers (2024-06-15T21:10:37Z)
- MotionLLM: Multimodal Motion-Language Learning with Large Language Models [69.5875073447454]
We propose MotionLLM to achieve single-human, multi-human motion generation and motion captioning.
Specifically, we encode and quantize motions into discrete LLM-understandable tokens, which results in a unified vocabulary consisting of both motion and text tokens.
Our approach is scalable and flexible, allowing easy extension to multi-human motion generation through autoregressive generation of single-human motions.
arXiv Detail & Related papers (2024-05-27T09:57:51Z)
- Taming Diffusion Probabilistic Models for Character Control [46.52584236101806]
We present a novel character control framework that responds in real-time to a variety of user-supplied control signals.
At the heart of our method lies a transformer-based Conditional Autoregressive Motion Diffusion Model.
Our work represents the first model that enables real-time generation of high-quality, diverse character animations.
arXiv Detail & Related papers (2024-04-23T15:20:17Z)
- Universal Humanoid Motion Representations for Physics-Based Control [71.46142106079292]
We present a universal motion representation that encompasses a comprehensive range of motor skills for physics-based humanoid control.
We first learn a motion imitator that can imitate all of human motion from a large, unstructured motion dataset.
We then create our motion representation by distilling skills directly from the imitator.
arXiv Detail & Related papers (2023-10-06T20:48:43Z)
- Priority-Centric Human Motion Generation in Discrete Latent Space [59.401128190423535]
We introduce a Priority-Centric Motion Discrete Diffusion Model (M2DM) for text-to-motion generation.
M2DM incorporates a global self-attention mechanism and a regularization term to counteract code collapse.
We also present a motion discrete diffusion model that employs an innovative noise schedule, determined by the significance of each motion token.
arXiv Detail & Related papers (2023-08-28T10:40:16Z)
- Executing your Commands via Motion Diffusion in Latent Space [51.64652463205012]
We propose a Motion Latent-based Diffusion model (MLD) to produce vivid motion sequences conforming to the given conditional inputs.
Our MLD achieves significant improvements over the state-of-the-art methods among extensive human motion generation tasks.
arXiv Detail & Related papers (2022-12-08T03:07:00Z)
- NEURAL MARIONETTE: A Transformer-based Multi-action Human Motion Synthesis System [51.43113919042621]
We present a neural network-based system for long-term, multi-action human motion synthesis.
The system can produce meaningful motions with smooth transitions from simple user input.
We also present a new dataset dedicated to the multi-action motion synthesis task.
arXiv Detail & Related papers (2022-09-27T07:10:20Z)
- MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model [35.32967411186489]
MotionDiffuse is a diffusion model-based text-driven motion generation framework.
It excels at modeling complicated data distributions and generating vivid motion sequences.
It responds to fine-grained instructions on body parts and supports arbitrary-length motion synthesis with time-varied text prompts.
arXiv Detail & Related papers (2022-08-31T17:58:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.