Related papers: Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency

StreamingTalker: Audio-driven 3D Facial Animation with Autoregressive Diffusion Model [73.30619724574642]
Speech-driven 3D facial animation aims to generate realistic and synchronized facial motions driven by speech inputs. Recent methods have employed audio-conditioned diffusion models for 3D facial animation. We propose a novel autoregressive diffusion model that processes audio in a streaming manner.
arXiv Detail & Related papers (2025-11-18T07:55:16Z)
DEMO: Disentangled Motion Latent Flow Matching for Fine-Grained Controllable Talking Portrait Synthesis [15.304037069236536]
DEMO is a flow-matching generative framework for audio-driven talking-head video synthesis. It delivers disentangled, high-fidelity control of lip motion, head pose, and eye gaze.
arXiv Detail & Related papers (2025-10-12T15:10:33Z)
MOSPA: Human Motion Generation Driven by Spatial Audio [56.735282455483954]
We introduce the first comprehensive Spatial Audio-Driven Human Motion dataset, which contains diverse and high-quality spatial audio and motion data. We develop a simple yet effective diffusion-based generative framework for human MOtion generation driven by SPatial Audio, termed MOSPA. Once trained, MOSPA could generate diverse realistic human motions conditioned on varying spatial audio inputs.
arXiv Detail & Related papers (2025-07-16T06:33:11Z)
M2DAO-Talker: Harmonizing Multi-granular Motion Decoupling and Alternating Optimization for Talking-head Generation [65.08520614570288]
We reformulate talking head generation into a unified framework comprising video preprocessing, motion representation, and rendering reconstruction. M2DAO-Talker achieves state-of-the-art performance, with a 2.43 dB PSNR improvement in generation quality and a 0.64 gain in user-evaluated video realness.
arXiv Detail & Related papers (2025-07-11T04:48:12Z)
Hallo4: High-Fidelity Dynamic Portrait Animation via Direct Preference Optimization and Temporal Motion Modulation [26.597877504216196]
First, we introduce direct preference optimization tailored for human-centric animation. Second, the proposed temporal motion modulation resolves resolution mismatches. Experiments demonstrate obvious improvements in lip-audio synchronization, expression vividness, and body motion coherence over baseline methods.
arXiv Detail & Related papers (2025-05-29T15:04:00Z)
Motion Prompting: Controlling Video Generation with Motion Trajectories [57.049252242807874]
We train a video generation model conditioned on sparse or dense video trajectories. We translate high-level user requests into detailed, semi-dense motion prompts. We demonstrate our approach through various applications, including camera and object motion control, "interacting" with an image, motion transfer, and image editing.
arXiv Detail & Related papers (2024-12-03T18:59:56Z)
MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models [59.10171699717122]
MoTrans is a customized motion transfer method enabling video generation of similar motion in new contexts. Multimodal representations from recaptioned prompts and video frames promote the modeling of appearance. Our method effectively learns specific motion patterns from single or multiple reference videos.
arXiv Detail & Related papers (2024-12-02T10:07:59Z)
FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait [3.3672851080270374]
FLOAT is an audio-driven talking portrait video generation method based on a flow matching generative model. We shift the generative modeling from the pixel-based latent space to a learned motion latent space, enabling efficient design of temporally consistent motion. Our method supports speech-driven emotion enhancement, enabling a natural incorporation of expressive motions.
arXiv Detail & Related papers (2024-12-02T02:50:07Z)
Lagrangian Motion Fields for Long-term Motion Generation [51.02126882968116]
We introduce the concept of Lagrangian Motion Fields, specifically designed for long-term motion generation. By treating each joint as a Lagrangian particle with uniform velocity over short intervals, our approach condenses motion representations into a series of "supermotions". Our solution is versatile and lightweight, eliminating the need for neural network preprocessing.
arXiv Detail & Related papers (2024-09-03T01:38:06Z)
Infinite Motion: Extended Motion Generation via Long Text Instructions [51.61117351997808]
"Infinite Motion" is a novel approach that leverages long text for extended motion generation. The key innovation of our model is its ability to accept arbitrary lengths of text as input. We incorporate a timestamp design for text, which allows precise editing of local segments within the generated sequences.
arXiv Detail & Related papers (2024-07-11T12:33:56Z)
Controllable Longer Image Animation with Diffusion Models [12.565739255499594]
We introduce an open-domain controllable image animation method using motion priors with video diffusion models. Our method achieves precise control over the direction and speed of motion in the movable region by extracting the motion field information from videos. We propose an efficient long-duration video generation method based on noise reschedule specifically tailored for image animation tasks.
arXiv Detail & Related papers (2024-05-27T16:08:00Z)
Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model [17.98911328064481]
Co-speech gestures can achieve superior visual effects in human-machine interaction. We present a novel motion-decoupled framework to generate co-speech gesture videos. Our proposed framework significantly outperforms existing approaches in both motion and video-related evaluations.
arXiv Detail & Related papers (2024-04-02T11:40:34Z)
Animate Your Motion: Turning Still Images into Dynamic Videos [58.63109848837741]
We introduce Scene and Motion Conditional Diffusion (SMCD), a novel methodology for managing multimodal inputs. SMCD incorporates a recognized motion conditioning module and investigates various approaches to integrate scene conditions. Our design significantly enhances video quality, motion precision, and semantic coherence.
arXiv Detail & Related papers (2024-03-15T10:36:24Z)
MotionMix: Weakly-Supervised Diffusion for Controllable Motion Generation [19.999239668765885]
MotionMix is a weakly-supervised diffusion model that leverages both noisy and unannotated motion sequences. Our framework consistently achieves state-of-the-art performance on text-to-motion, action-to-motion, and music-to-dance tasks.
arXiv Detail & Related papers (2024-01-20T04:58:06Z)
MotionCrafter: One-Shot Motion Customization of Diffusion Models [66.44642854791807]
We introduce MotionCrafter, a one-shot instance-guided motion customization method. MotionCrafter employs a parallel spatial-temporal architecture that injects the reference motion into the temporal component of the base model. During training, a frozen base model provides appearance normalization, effectively separating appearance from motion.
arXiv Detail & Related papers (2023-12-08T16:31:04Z)
DiffusionPhase: Motion Diffusion in Frequency Domain [69.811762407278]
We introduce a learning-based method for generating high-quality human motion sequences from text descriptions. Existing techniques struggle with motion diversity and smooth transitions in generating arbitrary-length motion sequences. We develop a network encoder that converts the motion space into a compact yet expressive parameterized phase space.
arXiv Detail & Related papers (2023-12-07T04:39:22Z)
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models [58.93124686141781]
Video Motion Customization (VMC) is a novel one-shot tuning approach crafted to adapt temporal attention layers within video diffusion models. Our approach introduces a novel motion distillation objective using residual vectors between consecutive frames as a motion reference. We validate our method against state-of-the-art video generative models across diverse real-world motions and contexts.
arXiv Detail & Related papers (2023-12-01T06:50:11Z)
LaMD: Latent Motion Diffusion for Image-Conditional Video Generation [63.34574080016687]
The latent motion diffusion (LaMD) framework consists of a motion-decomposed video autoencoder and a diffusion-based motion generator. LaMD generates high-quality videos on various benchmark datasets, including BAIR, Landscape, NATOPS, MUG and CATER-GEN.
arXiv Detail & Related papers (2023-04-23T10:32:32Z)
MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis [73.52948992990191]
MoFusion is a new denoising-diffusion-based framework for high-quality conditional human motion synthesis. We present ways to introduce well-known kinematic losses for motion plausibility within the motion diffusion framework. We demonstrate the effectiveness of MoFusion compared to the state of the art on established benchmarks in the literature.
arXiv Detail & Related papers (2022-12-08T18:59:48Z)
Continuous-Time Video Generation via Learning Motion Dynamics with Neural ODE [26.13198266911874]
We propose a novel video generation approach that learns separate distributions for motion and appearance. We employ a two-stage approach where the first stage converts a noise vector to a sequence of keypoints at arbitrary frame rates, and the second stage synthesizes videos based on the given keypoint sequence and the appearance noise vector.
arXiv Detail & Related papers (2021-12-21T03:30:38Z)

This list is automatically generated from the titles and abstracts of the papers on this site.