Related papers: Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text

URL: http://arxiv.org/abs/2311.07446v1
Date: Mon, 13 Nov 2023 16:22:38 GMT
Title: Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text
Authors: Zhongfei Qing, Zhongang Cai, Zhitao Yang and Lei Yang
Abstract summary: A new task, Story-to-Motion, arises when characters are required to perform specific motions based on a long text description. Previous works in character control and text-to-motion have addressed related aspects, yet a comprehensive solution remains elusive. We propose a novel system that generates controllable, infinitely long motions and trajectories aligned with the input text.
Score: 14.473103773197838
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generating natural human motion from a story has the potential to transform the landscape of animation, gaming, and film industries. A new and challenging task, Story-to-Motion, arises when characters are required to move to various locations and perform specific motions based on a long text description. This task demands a fusion of low-level control (trajectories) and high-level control (motion semantics). Previous works in character control and text-to-motion have addressed related aspects, yet a comprehensive solution remains elusive: character control methods do not handle text description, whereas text-to-motion methods lack position constraints and often produce unstable motions. In light of these limitations, we propose a novel system that generates controllable, infinitely long motions and trajectories aligned with the input text. (1) We leverage contemporary Large Language Models to act as a text-driven motion scheduler to extract a series of (text, position, duration) pairs from long text. (2) We develop a text-driven motion retrieval scheme that incorporates motion matching with motion semantic and trajectory constraints. (3) We design a progressive mask transformer that addresses common artifacts in the transition motion such as unnatural pose and foot sliding. Beyond its pioneering role as the first comprehensive solution for Story-to-Motion, our system undergoes evaluation across three distinct sub-tasks: trajectory following, temporal action composition, and motion blending, where it outperforms previous state-of-the-art motion synthesis methods across the board. Homepage: https://story2motion.github.io/.

Related papers

TextOp: Real-time Interactive Text-Driven Humanoid Robot Motion Generation and Control [62.93681680333618]
TextOp is a real-time text-driven humanoid motion generation and control framework. It supports streaming language commands and on-the-fly instruction modification during execution. By bridging interactive motion generation with robust whole-body control, TextOp unlocks free-form intent expression.
arXiv Detail & Related papers (2026-02-07T08:42:11Z)
Motion Prompting: Controlling Video Generation with Motion Trajectories [57.049252242807874]
We train a video generation model conditioned on sparse or dense video trajectories. We translate high-level user requests into detailed, semi-dense motion prompts. We demonstrate our approach through various applications, including camera and object motion control, "interacting" with an image, motion transfer, and image editing.
arXiv Detail & Related papers (2024-12-03T18:59:56Z)
DART: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control [12.465927271402442]
Text-conditioned human motion generation allows for user interaction through natural language. DART is a Diffusion-based Autoregressive motion primitive model for Real-time Text-driven motion control. We present effective algorithms for both approaches, demonstrating our model's versatility and superior performance in various motion synthesis tasks.
arXiv Detail & Related papers (2024-10-07T17:58:22Z)
Unimotion: Unifying 3D Human Motion Synthesis and Understanding [47.18338511861108]
We introduce Unimotion, the first unified multi-task human motion model capable of both flexible motion control and frame-level motion understanding. Unimotion allows motion to be controlled with global text, local frame-level text, or both at once, providing more flexible control for users.
arXiv Detail & Related papers (2024-09-24T09:20:06Z)
Infinite Motion: Extended Motion Generation via Long Text Instructions [51.61117351997808]
"Infinite Motion" is a novel approach that leverages long text for extended motion generation. A key innovation of our model is its ability to accept text of arbitrary length as input. We incorporate a timestamp design for text, which allows precise editing of local segments within the generated sequences.
arXiv Detail & Related papers (2024-07-11T12:33:56Z)
Generating Human Interaction Motions in Scenes with Text Control [66.74298145999909]
We present TeSMo, a method for text-controlled scene-aware motion generation based on denoising diffusion models. Our approach begins with pre-training a scene-agnostic text-to-motion diffusion model. To facilitate training, we embed annotated navigation and interaction motions within scenes.
arXiv Detail & Related papers (2024-04-16T16:04:38Z)
LivePhoto: Real Image Animation with Text-guided Motion Control [51.31418077586208]
This work presents a practical system, named LivePhoto, which allows users to animate an image of their interest with text descriptions. We first establish a strong baseline that helps a well-learned text-to-image generator (i.e., Stable Diffusion) take an image as a further input. We then equip the improved generator with a motion module for temporal modeling and propose a carefully designed training pipeline to better link texts and motions.
arXiv Detail & Related papers (2023-12-05T17:59:52Z)
AttT2M: Text-Driven Human Motion Generation with Multi-Perspective Attention Mechanism [24.049207982022214]
We propose AttT2M, a two-stage method with a multi-perspective attention mechanism. Our method outperforms the current state-of-the-art in terms of qualitative and quantitative evaluation.
arXiv Detail & Related papers (2023-09-02T02:18:17Z)
Synthesizing Long-Term Human Motions with Diffusion Models via Coherent Sampling [74.62570964142063]
Text-to-motion generation has gained increasing attention, but most existing methods are limited to generating short-term motions. We propose a novel approach that utilizes a past-conditioned diffusion model with two optional coherent sampling methods. Our proposed method is capable of generating compositional and coherent long-term 3D human motions controlled by a user-instructed long text stream.
arXiv Detail & Related papers (2023-08-03T16:18:32Z)
MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model [35.32967411186489]
MotionDiffuse is a diffusion model-based text-driven motion generation framework. It excels at modeling complicated data distribution and generating vivid motion sequences. It responds to fine-grained instructions on body parts and supports arbitrary-length motion synthesis with time-varied text prompts.
arXiv Detail & Related papers (2022-08-31T17:58:54Z)
TM2T: Stochastic and Tokenized Modeling for the Reciprocal Generation of 3D Human Motions and Texts [20.336481832461168]
Inspired by the strong ties between vision and language, our paper aims to explore the generation of 3D human full-body motions from texts. We propose the use of motion tokens, a discrete and compact motion representation. Our approach is flexible and could be used for both text2motion and motion2text tasks.
arXiv Detail & Related papers (2022-07-04T19:52:18Z)
Synthesis of Compositional Animations from Textual Descriptions [54.85920052559239]
"How unstructured and complex can we make a sentence and still generate plausible movements from it?" "How can we animate 3D-characters from a movie script or move robots by simply telling them what we would like them to do?"
arXiv Detail & Related papers (2021-03-26T18:23:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.