Exploring Timeline Control for Facial Motion Generation
- URL: http://arxiv.org/abs/2505.20861v1
- Date: Tue, 27 May 2025 08:13:38 GMT
- Title: Exploring Timeline Control for Facial Motion Generation
- Authors: Yifeng Ma, Jinwei Qi, Chaonan Ji, Peng Zhang, Bang Zhang, Zhidong Deng, Liefeng Bo
- Abstract summary: This paper introduces a new control signal for facial motion generation: timeline control. Compared to audio and text signals, timelines provide more fine-grained control, such as generating specific facial motions with precise timing.
- Score: 24.903064994915734
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces a new control signal for facial motion generation: timeline control. Compared to audio and text signals, timelines provide more fine-grained control, such as generating specific facial motions with precise timing. Users can specify a multi-track timeline of facial actions arranged in temporal intervals, allowing precise control over the timing of each action. To model the timeline control capability, we first annotate the time intervals of facial actions in natural facial motion sequences at a frame-level granularity. This process is facilitated by Toeplitz Inverse Covariance-based Clustering to minimize human labor. Based on the annotations, we propose a diffusion-based generation model capable of generating facial motions that are natural and accurately aligned with input timelines. Our method supports text-guided motion generation by using ChatGPT to convert text into timelines. Experimental results show that our method can annotate facial action intervals with satisfactory accuracy, and produce natural facial motions accurately aligned with timelines.
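To make the control signal concrete, below is a minimal Python sketch of what a multi-track timeline and the frame-level annotation output might look like. The `ActionInterval` type, the track names, and the `labels_to_intervals` helper are illustrative assumptions, not the paper's actual data format; the helper only shows how per-frame labels (such as TICC cluster assignments mapped to action names) collapse into the temporal intervals a timeline specifies.

```python
from dataclasses import dataclass

@dataclass
class ActionInterval:
    action: str  # e.g. "blink", "brow_raise"
    start: int   # start frame (inclusive)
    end: int     # end frame (exclusive)

# A multi-track timeline: one interval list per facial track.
Timeline = dict[str, list[ActionInterval]]

def labels_to_intervals(frame_labels: list[str]) -> list[ActionInterval]:
    """Collapse per-frame action labels (e.g. TICC cluster assignments
    mapped to action names) into contiguous time intervals."""
    intervals: list[ActionInterval] = []
    for i, label in enumerate(frame_labels):
        if intervals and intervals[-1].action == label:
            intervals[-1].end = i + 1  # extend the current interval
        else:
            intervals.append(ActionInterval(label, i, i + 1))
    return intervals

# Example: per-frame labels for an "eyes" track.
timeline: Timeline = {
    "eyes": labels_to_intervals(["open", "open", "blink", "blink", "open"]),
}
print(timeline)
```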
Related papers
- MaskControl: Spatio-Temporal Control for Masked Motion Synthesis [38.16884934336603]
We propose MaskControl, the first approach to introduce controllability to the generative masked motion model. First, a Logits Regularizer implicitly perturbs logits at training time to align the distribution of motion tokens with the controlled joint positions. Second, Logit Optimization explicitly reshapes the token distribution, forcing the generated motion to accurately align with the controlled joint positions.
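A loose sketch of how these two mechanisms could be realized, reading only from the abstract: a soft expectation of joint positions under the token distribution gives a differentiable control loss, which can regularize training or be optimized directly over logits at inference. All shapes, the codebook lookup table, and the optimization loop are assumptions; the authors' formulation likely differs in detail.

```python
import torch

def control_loss(logits, codebook_joints, target_joints, mask):
    """Differentiable control loss: expected joint positions under the
    token distribution, penalized where control targets exist (mask=1).
    logits: (T, V); codebook_joints: (V, J, 3) joint positions implied
    by each token; target_joints: (T, J, 3); mask: (T, J, 1)."""
    probs = torch.softmax(logits, dim=-1)
    expected = torch.einsum("tv,vjc->tjc", probs, codebook_joints)
    return ((expected - target_joints) ** 2 * mask).mean()

def optimize_logits(logits, codebook_joints, target_joints, mask,
                    steps=10, lr=0.1):
    """Inference-time 'logit optimization': gradient-descend the logits
    themselves so the implied motion matches the controlled joints."""
    logits = logits.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([logits], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        control_loss(logits, codebook_joints, target_joints, mask).backward()
        opt.step()
    return logits.detach()
```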
arXiv Detail & Related papers (2024-10-14T17:50:27Z)
- DartControl: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control [12.465927271402442]
Text-conditioned human motion generation allows for user interaction through natural language. DartControl is a Diffusion-based Autoregressive motion primitive model for Real-time Text-driven motion control. Our model effectively learns a compact motion primitive space jointly conditioned on motion history and text inputs.
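A minimal sketch of the autoregressive rollout such a primitive model implies, with a hypothetical `model.sample` standing in for one diffusion-based primitive generation step:

```python
import torch

def rollout(model, text_emb, history, num_primitives):
    """Autoregressive generation sketch: at each step, a diffusion model
    (hypothetical `model.sample`) produces the next short motion
    primitive conditioned on the most recent motion and the text."""
    motion = [history]
    for _ in range(num_primitives):
        primitive = model.sample(history=motion[-1], text=text_emb)
        motion.append(primitive)
    return torch.cat(motion, dim=0)  # (frames, features)
```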
arXiv Detail & Related papers (2024-10-07T17:58:22Z)
- Temporal Residual Jacobians For Rig-free Motion Transfer [45.640576754352104]
We introduce Temporal Residual Jacobians as a novel representation to enable data-driven motion transfer.
Our approach does not assume access to any rigging or intermediate shapes, produces geometrically and temporally consistent motions, and can be used to transfer long motion sequences.
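Reading from the abstract alone, the representation suggests a loop that accumulates predicted residual Jacobians over time and recovers vertex positions each frame. The sketch below is a rough guess at that structure; `spatial_net`, `temporal_net`, and the Jacobian-to-vertex `solver` are all hypothetical stand-ins.

```python
import torch

def animate(base_jacobians, spatial_net, temporal_net, solver, num_frames):
    """Sketch: a spatial net predicts per-face Jacobians for the first
    frame; a temporal net predicts residual Jacobians accumulated over
    time; a Poisson-style solver (hypothetical) maps Jacobians back to
    vertex positions at every frame."""
    J = spatial_net(base_jacobians)      # (F, 3, 3) per-face Jacobians
    frames = []
    for t in range(num_frames):
        J = J + temporal_net(J, t)       # residual update per frame
        frames.append(solver(J))         # vertices recovered from Jacobians
    return torch.stack(frames)
```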
arXiv Detail & Related papers (2024-07-20T18:29:22Z)
- Infinite Motion: Extended Motion Generation via Long Text Instructions [51.61117351997808]
"Infinite Motion" is a novel approach that leverages long text to extended motion generation.
Key innovation of our model is its ability to accept arbitrary lengths of text as input.
We incorporate a timestamp design for text, which allows precise editing of local segments within the generated sequences.
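A minimal sketch of what a timestamped text design could look like, with local editing that touches only one segment; the `TimedPrompt` format is an assumption, not the paper's actual interface:

```python
from dataclasses import dataclass

@dataclass
class TimedPrompt:
    text: str
    start: float  # seconds into the sequence
    end: float

# A long instruction decomposed into timestamped segments.
script = [
    TimedPrompt("walk forward slowly", 0.0, 4.0),
    TimedPrompt("wave with the right hand", 4.0, 6.0),
    TimedPrompt("sit down on a chair", 6.0, 10.0),
]

def edit_segment(script, start, end, new_text):
    """Local editing: swap only the prompt covering [start, end), so
    just that span of the generated sequence needs re-synthesis."""
    return [TimedPrompt(new_text, start, end) if p.start <= start < p.end
            else p for p in script]

print(edit_segment(script, 4.0, 6.0, "clap both hands"))
```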
arXiv Detail & Related papers (2024-07-11T12:33:56Z) - Multi-Track Timeline Control for Text-Driven 3D Human Motion Generation [71.08922726494842]
We introduce the problem of timeline control for text-driven motion synthesis.
Instead of a single prompt, users can specify a multi-track timeline of multiple prompts organized in temporal intervals that may overlap.
We propose a new test-time denoising method to generate composite animations from a multi-track timeline.
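One common way to composite prompts at test time is to blend the per-prompt noise predictions inside each denoising step, averaging where intervals overlap. The sketch below shows that general idea only; the paper's actual stitching (e.g. body-part-aware handling) is likely more elaborate:

```python
import torch

def composite_denoise_step(model, x_t, t, tracks):
    """One denoising step sketch: run the model once per (prompt,
    interval) and average noise predictions where intervals overlap.
    `tracks` is a list of (text_emb, start, end); shapes hypothetical:
    x_t is (frames, features)."""
    eps = torch.zeros_like(x_t)
    weight = torch.zeros(x_t.shape[0], 1)
    for text_emb, start, end in tracks:
        pred = model(x_t, t, text_emb)     # per-prompt noise prediction
        eps[start:end] += pred[start:end]  # accumulate inside the interval
        weight[start:end] += 1.0
    return eps / weight.clamp(min=1.0)     # average over overlapping tracks
```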
arXiv Detail & Related papers (2024-01-16T18:39:15Z) - DiffusionPhase: Motion Diffusion in Frequency Domain [69.811762407278]
We introduce a learning-based method for generating high-quality human motion sequences from text descriptions.
Existing techniques struggle with motion diversity and smooth transitions in generating arbitrary-length motion sequences.
We develop a network encoder that converts the motion space into a compact yet expressive parameterized phase space.
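As a rough illustration of a parameterized phase space, a periodic decoder can map per-channel amplitude, frequency, phase, and offset back to motion curves. The sine form below is an assumption for illustration, not the paper's actual decoder:

```python
import torch

def phase_to_motion(amplitude, frequency, phase, offset, num_frames, fps=30.0):
    """Decode a compact periodic parameterization back to motion curves.
    All parameter inputs are (channels,) tensors; returns (frames, channels)."""
    t = torch.arange(num_frames) / fps                          # (T,)
    angle = 2 * torch.pi * (frequency[None] * t[:, None] + phase[None])
    return amplitude[None] * torch.sin(angle) + offset[None]   # (T, C)
```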
arXiv Detail & Related papers (2023-12-07T04:39:22Z) - Synthesizing Long-Term Human Motions with Diffusion Models via Coherent
Sampling [74.62570964142063]
Text-to-motion generation has gained increasing attention, but most existing methods are limited to generating short-term motions.
We propose a novel approach that utilizes a past-conditioned diffusion model with two optional coherent sampling methods.
Our proposed method is capable of generating compositional and coherent long-term 3D human motions controlled by a user-instructed long text stream.
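Past-conditioned sampling is often implemented inpainting-style: at every denoising step, the frames overlapping the previously generated segment are overwritten with a re-noised copy of that past. A minimal sketch of this general idea, with hypothetical `denoise_step` and `noise_fn` helpers:

```python
import torch

def past_conditioned_step(model, x_t, t, past, noise_fn):
    """One sampling step with past conditioning: denoise, then clamp the
    frames overlapping the already-generated past segment to a re-noised
    copy of it. `model.denoise_step` and `noise_fn(past, t)` (forward
    diffusion to step t) are hypothetical helpers."""
    x_t = model.denoise_step(x_t, t)
    k = past.shape[0]               # number of overlapping frames
    x_t = x_t.clone()
    x_t[:k] = noise_fn(past, t)     # keep the overlap consistent with the past
    return x_t
```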
arXiv Detail & Related papers (2023-08-03T16:18:32Z) - Text-driven Video Prediction [83.04845684117835]
We propose a new task called Text-driven Video Prediction (TVP).
Taking the first frame and a text caption as inputs, this task aims to synthesize the following frames.
To investigate the capability of text in causal inference for progressive motion information, our TVP framework contains a Text Inference Module (TIM).
arXiv Detail & Related papers (2022-10-06T12:43:07Z) - Real-time Controllable Motion Transition for Characters [14.88407656218885]
Real-time in-between motion generation is universally required in games and highly desirable in existing animation pipelines.
Our approach consists of two key components: motion manifold and conditional transitioning.
We show that our method is able to generate high-quality motions as measured by multiple metrics.
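A minimal sketch of how a motion manifold plus conditional transitioning might fit together, using a hypothetical VAE decoder as the manifold and a conditioning network that steers each in-between frame toward the target keyframe:

```python
import torch

@torch.no_grad()
def transition(vae, cond_net, start_pose, target_pose, num_frames):
    """Sketch: sample in-between frames from a learned motion manifold
    (hypothetical VAE decoder), with a conditioning network steering
    each frame toward the target keyframe."""
    pose, frames = start_pose, []
    for i in range(num_frames):
        progress = (i + 1) / num_frames              # 0 -> 1 over the gap
        z = torch.randn(vae.latent_dim)              # point on the manifold
        cond = cond_net(pose, target_pose, progress) # transition conditioning
        pose = vae.decode(z, cond)
        frames.append(pose)
    return torch.stack(frames)
```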
arXiv Detail & Related papers (2022-05-05T10:02:54Z)
- Hierarchical Style-based Networks for Motion Synthesis [150.226137503563]
We propose a self-supervised method for generating long-range, diverse, and plausible behaviors to reach a specific goal location.
Our proposed method learns to model human motion by decomposing the long-range generation task in a hierarchical manner.
On a large-scale skeleton dataset, we show that the proposed method is able to synthesize long-range, diverse, and plausible motion.
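A minimal sketch of that hierarchical decomposition, with a hypothetical high-level planner proposing sub-goals and a low-level policy filling in short motion clips:

```python
def generate_long_range(planner, low_level, start_state, goal, horizon=10):
    """Sketch of the hierarchical decomposition: a high-level planner
    proposes sub-goals toward the target location, and a low-level
    policy synthesizes the short motion clip reaching each sub-goal.
    All components are hypothetical stand-ins."""
    state, clips = start_state, []
    for _ in range(horizon):
        subgoal = planner(state, goal)    # next intermediate target
        clip = low_level(state, subgoal)  # short plausible motion segment
        state = clip[-1]                  # continue from the last frame
        clips.append(clip)
    return clips
```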
arXiv Detail & Related papers (2020-08-24T02:11:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.