MotionGPT: Human Motion as a Foreign Language
- URL: http://arxiv.org/abs/2306.14795v2
- Date: Thu, 20 Jul 2023 03:39:19 GMT
- Title: MotionGPT: Human Motion as a Foreign Language
- Authors: Biao Jiang, Xin Chen, Wen Liu, Jingyi Yu, Gang Yu, Tao Chen
- Abstract summary: Human motion displays a semantic coupling akin to human language, often perceived as a form of body language.
By fusing language data with large-scale motion models, motion-language pre-training can enhance the performance of motion-related tasks.
We propose MotionGPT, a unified, versatile, and user-friendly motion-language model to handle multiple motion-relevant tasks.
- Score: 47.21648303282788
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Though the advancement of pre-trained large language models unfolds, the
exploration of building a unified model for language and other multi-modal
data, such as motion, remains challenging and untouched so far. Fortunately,
human motion displays a semantic coupling akin to human language, often
perceived as a form of body language. By fusing language data with large-scale
motion models, motion-language pre-training that can enhance the performance of
motion-related tasks becomes feasible. Driven by this insight, we propose
MotionGPT, a unified, versatile, and user-friendly motion-language model to
handle multiple motion-relevant tasks. Specifically, we employ the discrete
vector quantization for human motion and transfer 3D motion into motion tokens,
similar to the generation process of word tokens. Building upon this "motion
vocabulary", we perform language modeling on both motion and text in a unified
manner, treating human motion as a specific language. Moreover, inspired by
prompt learning, we pre-train MotionGPT with a mixture of motion-language data
and fine-tune it on prompt-based question-and-answer tasks. Extensive
experiments demonstrate that MotionGPT achieves state-of-the-art performances
on multiple motion tasks including text-driven motion generation, motion
captioning, motion prediction, and motion in-between.
Related papers
- Human Motion Instruction Tuning [30.71209562108675]
This paper presents LLaMo, a framework for human motion instruction tuning.
LLaMo retains motion in its native form for instruction tuning.
By processing both video and motion data alongside textual inputs, LLaMo enables a flexible, human-centric analysis.
arXiv Detail & Related papers (2024-11-25T14:38:43Z) - MotionGPT-2: A General-Purpose Motion-Language Model for Motion Generation and Understanding [76.30210465222218]
MotionGPT-2 is a unified Large Motion-Language Model (LMLMLM)
It supports multimodal control conditions through pre-trained Large Language Models (LLMs)
It is highly adaptable to the challenging 3D holistic motion generation task.
arXiv Detail & Related papers (2024-10-29T05:25:34Z) - MotionBank: A Large-scale Video Motion Benchmark with Disentangled Rule-based Annotations [85.85596165472663]
We build MotionBank, which comprises 13 video action datasets, 1.24M motion sequences, and 132.9M frames of natural and diverse human motions.
Our MotionBank is beneficial for general motion-related tasks of human motion generation, motion in-context generation, and motion understanding.
arXiv Detail & Related papers (2024-10-17T17:31:24Z) - MotionLLM: Understanding Human Behaviors from Human Motions and Videos [40.132643319573205]
This study delves into the realm of multi-modality (i.e., video and motion modalities) human behavior understanding.
We present MotionLLM, a framework for human motion understanding, captioning, and reasoning.
arXiv Detail & Related papers (2024-05-30T17:59:50Z) - Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs [67.59291068131438]
Motion-Agent is a conversational framework designed for general human motion generation, editing, and understanding.
Motion-Agent employs an open-source pre-trained language model to develop a generative agent, MotionLLM, that bridges the gap between motion and text.
arXiv Detail & Related papers (2024-05-27T09:57:51Z) - MotionChain: Conversational Motion Controllers via Multimodal Prompts [25.181069337771127]
We present MotionChain, a conversational human motion controller to generate continuous and long-term human motion through multimodal prompts.
By leveraging large-scale language, vision-language, and vision-motion data, MotionChain comprehends each instruction in multi-turn conversation and generates human motions followed by these prompts.
arXiv Detail & Related papers (2024-04-02T07:09:29Z) - DiverseMotion: Towards Diverse Human Motion Generation via Discrete
Diffusion [70.33381660741861]
We present DiverseMotion, a new approach for synthesizing high-quality human motions conditioned on textual descriptions.
We show that our DiverseMotion achieves the state-of-the-art motion quality and competitive motion diversity.
arXiv Detail & Related papers (2023-09-04T05:43:48Z) - MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model [35.32967411186489]
MotionDiffuse is a diffusion model-based text-driven motion generation framework.
It excels at modeling complicated data distribution and generating vivid motion sequences.
It responds to fine-grained instructions on body parts, and arbitrary-length motion synthesis with time-varied text prompts.
arXiv Detail & Related papers (2022-08-31T17:58:54Z) - Language-Guided Face Animation by Recurrent StyleGAN-based Generator [87.56260982475564]
We study a novel task, language-guided face animation, that aims to animate a static face image with the help of languages.
We propose a recurrent motion generator to extract a series of semantic and motion information from the language and feed it along with visual information to a pre-trained StyleGAN to generate high-quality frames.
arXiv Detail & Related papers (2022-08-11T02:57:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.