Related papers: MotionChain: Conversational Motion Controllers via Multimodal Prompts

URL: http://arxiv.org/abs/2404.01700v2
Date: Wed, 3 Apr 2024 06:40:46 GMT
Title: MotionChain: Conversational Motion Controllers via Multimodal Prompts
Authors: Biao Jiang, Xin Chen, Chi Zhang, Fukun Yin, Zhuoyuan Li, Gang YU, Jiayuan Fan
Abstract summary: We present MotionChain, a conversational human motion controller to generate continuous and long-term human motion through multimodal prompts. By leveraging large-scale language, vision-language, and vision-motion data, MotionChain comprehends each instruction in multi-turn conversation and generates human motions followed by these prompts.
Score: 25.181069337771127
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Recent advancements in language models have demonstrated their adeptness in conducting multi-turn dialogues and retaining conversational context. However, this proficiency remains largely unexplored in other multimodal generative models, particularly in human motion models. By integrating multi-turn conversations in controlling continuous virtual human movements, generative human motion models can achieve an intuitive and step-by-step process of human task execution for humanoid robotics, game agents, or other embodied systems. In this work, we present MotionChain, a conversational human motion controller to generate continuous and long-term human motion through multimodal prompts. Specifically, MotionChain consists of multi-modal tokenizers that transform various data types such as text, image, and motion, into discrete tokens, coupled with a Vision-Motion-aware Language model. By leveraging large-scale language, vision-language, and vision-motion data to assist motion-related generation tasks, MotionChain thus comprehends each instruction in multi-turn conversation and generates human motions followed by these prompts. Extensive experiments validate the efficacy of MotionChain, demonstrating state-of-the-art performance in conversational motion generation, as well as more intuitive manners of controlling and interacting with virtual humans.

Related papers

Seamless Interaction: Dyadic Audiovisual Motion Modeling and Large-Scale Dataset [113.25650486482762]
We introduce the Seamless Interaction dataset, a large-scale collection of over 4,000 hours of face-to-face interaction footage. This dataset enables the development of AI technologies that understand dyadic embodied dynamics. We develop a suite of models that utilize the dataset to generate dyadic motion gestures and facial expressions aligned with human speech.
arXiv Detail & Related papers (2025-06-27T18:09:49Z)
TokenMotion: Decoupled Motion Control via Token Disentanglement for Human-centric Video Generation [7.900728371180723]
We present TokenMotion, the first DiT-based video diffusion framework that enables fine-grained control over camera motion. Our approach introduces a unified modeling framework utilizing a decouple-and-fuse strategy, bridged by a human-aware dynamic mask. Our work represents a significant advancement in controllable video generation, with particular relevance for creative production applications.
arXiv Detail & Related papers (2025-04-11T00:41:25Z)
ChatMotion: A Multimodal Multi-Agent for Human Motion Analysis [37.60532857094311]
ChatMotion is a multimodal multi-agent framework for human motion analysis. It interprets user intent, decomposes complex tasks into meta-tasks, and activates specialized function modules for motion comprehension. It integrates multiple specialized modules, such as the MotionCore, to analyze human motion from various perspectives.
arXiv Detail & Related papers (2025-02-25T13:12:55Z)
Motion Prompting: Controlling Video Generation with Motion Trajectories [57.049252242807874]
We train a video generation model conditioned on sparse or dense video trajectories. We translate high-level user requests into detailed, semi-dense motion prompts. We demonstrate our approach through various applications, including camera and object motion control, "interacting" with an image, motion transfer, and image editing.
arXiv Detail & Related papers (2024-12-03T18:59:56Z)
EMOTION: Expressive Motion Sequence Generation for Humanoid Robots with In-Context Learning [10.266351600604612]
This paper introduces a framework, called EMOTION, for generating expressive motion sequences in humanoid robots. We conduct online user studies comparing the naturalness and understandability of the motions generated by EMOTION and its human-feedback version, EMOTION++.
arXiv Detail & Related papers (2024-10-30T17:22:45Z)
Sitcom-Crafter: A Plot-Driven Human Motion Generation System in 3D Scenes [83.55301458112672]
Sitcom-Crafter is a system for human motion generation in 3D space. Central to the function generation modules is our novel 3D scene-aware human-human interaction module. Augmentation modules encompass plot comprehension for command generation, motion synchronization for seamless integration of different motion types.
arXiv Detail & Related papers (2024-10-14T17:56:19Z)
Versatile Motion Language Models for Multi-Turn Interactive Agents [28.736843383405603]
We introduce Versatile Interactive Motion language model, which integrates both language and motion modalities. We evaluate the versatility of our method across motion-related tasks, motion to text, text to motion, reaction generation, motion editing, and reasoning about motion sequences.
arXiv Detail & Related papers (2024-10-08T02:23:53Z)
MotionLLM: Understanding Human Behaviors from Human Motions and Videos [40.132643319573205]
This study delves into the realm of multi-modality (i.e., video and motion modalities) human behavior understanding. We present MotionLLM, a framework for human motion understanding, captioning, and reasoning.
arXiv Detail & Related papers (2024-05-30T17:59:50Z)
Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs [67.59291068131438]
Motion-Agent is a conversational framework designed for general human motion generation, editing, and understanding. Motion-Agent employs an open-source pre-trained language model to develop a generative agent, MotionLLM, that bridges the gap between motion and text.
arXiv Detail & Related papers (2024-05-27T09:57:51Z)
ConvoFusion: Multi-Modal Conversational Diffusion for Co-Speech Gesture Synthesis [50.69464138626748]
We present ConvoFusion, a diffusion-based approach for multi-modal gesture synthesis. Our method proposes two guidance objectives that allow the users to modulate the impact of different conditioning modalities. Our method is versatile in that it can be trained either for generating monologue gestures or even the conversational gestures.
arXiv Detail & Related papers (2024-03-26T17:59:52Z)
MotionGPT: Human Motion as a Foreign Language [47.21648303282788]
Human motion displays a semantic coupling akin to human language, often perceived as a form of body language. By fusing language data with large-scale motion models, motion-language pre-training can enhance the performance of motion-related tasks. We propose MotionGPT, a unified, versatile, and user-friendly motion-language model to handle multiple motion-relevant tasks.
arXiv Detail & Related papers (2023-06-26T15:53:02Z)
Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations [61.659439423703155]
TOHO: Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations. Our method generates continuous motions that are parameterized only by the temporal coordinate. This work takes a step further toward general human-scene interaction simulation.
arXiv Detail & Related papers (2023-03-23T09:31:56Z)
Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion [89.01668641930206]
We present a framework for modeling interactional communication in dyadic conversations. We autoregressively output multiple possibilities of corresponding listener motion. Our method organically captures the multimodal and non-deterministic nature of nonverbal dyadic interactions.
arXiv Detail & Related papers (2022-04-18T17:58:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.