EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation
- URL: http://arxiv.org/abs/2503.18552v1
- Date: Mon, 24 Mar 2025 11:05:41 GMT
- Title: EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation
- Authors: Qiang Qu, Ming Li, Xiaoming Chen, Tongliang Liu
- Abstract summary: EvAnimate is a framework that leverages event streams as motion cues to animate static human images. We show that EvAnimate achieves high temporal fidelity and robust performance in scenarios where traditional video-derived cues fall short.
- Score: 58.41979933166173
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Conditional human animation transforms a static reference image into a dynamic sequence by applying motion cues such as poses. These motion cues are typically derived from video data but are susceptible to limitations including low temporal resolution, motion blur, overexposure, and inaccuracies under low-light conditions. In contrast, event cameras provide data streams with exceptionally high temporal resolution, a wide dynamic range, and inherent resistance to motion blur and exposure issues. In this work, we propose EvAnimate, a framework that leverages event streams as motion cues to animate static human images. Our approach employs a specialized event representation that transforms asynchronous event streams into 3-channel slices with controllable slicing rates and appropriate slice density, ensuring compatibility with diffusion models. Subsequently, a dual-branch architecture generates high-quality videos by harnessing the inherent motion dynamics of the event streams, thereby enhancing both video quality and temporal consistency. Specialized data augmentation strategies further enhance cross-person generalization. Finally, we establish a new benchmark, including simulated event data for training and validation, and a real-world event dataset capturing human actions under normal and extreme scenarios. Experimental results demonstrate that EvAnimate achieves high temporal fidelity and robust performance in scenarios where traditional video-derived cues fall short.
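The event representation itself is not given in the abstract; the following is a minimal sketch of how asynchronous events might be binned into 3-channel slices at a controllable slicing rate. The specific channel layout (positive counts, negative counts, within-slice timestamp) and the density normalization are assumptions, not the authors' published design.

```python
# Minimal sketch (not the authors' code): one plausible way to turn an
# asynchronous event stream (x, y, t, polarity) into 3-channel slices at a
# controllable slicing rate. The channel layout is an assumption.
import numpy as np

def events_to_slices(events, height, width, slice_rate_hz=100.0):
    """events: (N, 4) array with columns [x, y, t_seconds, polarity in {-1, +1}]."""
    t = events[:, 2]
    dt = 1.0 / slice_rate_hz
    n_slices = int(np.ceil((t.max() - t.min()) / dt)) + 1
    slices = np.zeros((n_slices, 3, height, width), dtype=np.float32)

    slice_idx = ((t - t.min()) / dt).astype(int)
    xs = events[:, 0].astype(int)
    ys = events[:, 1].astype(int)
    pol = events[:, 3]

    for s, x, y, p, ts in zip(slice_idx, xs, ys, pol, t):
        if p > 0:
            slices[s, 0, y, x] += 1.0          # channel 0: positive event count
        else:
            slices[s, 1, y, x] += 1.0          # channel 1: negative event count
        # channel 2: most recent normalized timestamp within the slice
        slices[s, 2, y, x] = (ts - t.min()) / dt - s

    # normalize count channels so slice density stays in a range a
    # diffusion model can consume alongside RGB inputs
    counts = slices[:, :2]
    slices[:, :2] = counts / max(float(counts.max()), 1e-6)
    return slices
```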
Related papers
- EGVD: Event-Guided Video Diffusion Model for Physically Realistic Large-Motion Frame Interpolation [16.22243283808375]
Event-Guided Video Diffusion Model (EGVD) is a novel framework that leverages the powerful priors of pre-trained stable video diffusion models.
Our approach features a Multi-modal Motion Condition Generator (MMCG) that effectively integrates RGB frames and event signals to guide the diffusion process.
Experiments on both real and simulated datasets demonstrate that EGVD significantly outperforms existing methods in handling large motion.
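The MMCG architecture is not detailed in this summary; below is a hypothetical sketch of one way boundary RGB frames and an event voxel grid could be fused into a single conditioning tensor. The function name, normalization, and channel layout are our own assumptions.

```python
# Hypothetical sketch (the paper's MMCG is not specified here): fuse two
# boundary RGB frames with an event voxel grid into one conditioning tensor
# that a video diffusion model could attend to.
import numpy as np

def build_motion_condition(frame_prev, frame_next, event_voxel):
    """frame_prev/frame_next: (3, H, W) in [0, 1]; event_voxel: (B, H, W) time bins."""
    voxel = event_voxel / max(float(np.abs(event_voxel).max()), 1e-6)  # normalize event counts
    # channel-wise concatenation; a learned generator would replace this plain fusion
    return np.concatenate([frame_prev, frame_next, voxel], axis=0)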
arXiv Detail & Related papers (2025-03-26T06:33:32Z)
- Motion Prompting: Controlling Video Generation with Motion Trajectories [57.049252242807874]
We train a video generation model conditioned on sparse or dense video trajectories.
We translate high-level user requests into detailed, semi-dense motion prompts.
We demonstrate our approach through various applications, including camera and object motion control, "interacting" with an image, motion transfer, and image editing.
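As a rough illustration of trajectory conditioning (not the paper's implementation), the sketch below rasterizes sparse point tracks into per-frame conditioning maps; the tensor layout is an assumption.

```python
# Illustrative sketch only: rasterize sparse point trajectories into a
# per-frame conditioning map that a video generator could be conditioned on.
import numpy as np

def tracks_to_condition(tracks, height, width):
    """tracks: (T, K, 2) array of (x, y) positions for K tracked points over T frames."""
    T, K, _ = tracks.shape
    cond = np.zeros((T, 1, height, width), dtype=np.float32)
    for t in range(T):
        for k in range(K):
            x, y = tracks[t, k]
            if 0 <= int(x) < width and 0 <= int(y) < height:
                cond[t, 0, int(y), int(x)] = 1.0   # mark the tracked point
    return cond
```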
arXiv Detail & Related papers (2024-12-03T18:59:56Z)
- Disentangling Foreground and Background Motion for Enhanced Realism in Human Video Generation [15.569467643817447]
We introduce a technique that concurrently learns both foreground and background dynamics by segregating their movements using distinct motion representations.
We train on real-world videos enhanced with this innovative motion depiction approach.
To further extend video generation to longer sequences without accumulating errors, we adopt a clip-by-clip generation strategy.
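The clip-by-clip idea can be illustrated generically; the sketch below is not the authors' code, but shows the common pattern of generating a long video in fixed-length chunks and seeding each chunk with the last frame of the previous one. The function names are placeholders.

```python
# Generic clip-by-clip sketch (assumed pattern, not the paper's exact scheme):
# generate a long video in fixed-length chunks, conditioning each chunk on the
# last frame of the previous one to limit error accumulation.
def generate_long_video(generate_clip, reference_image, motion_cues, clip_len=16):
    """generate_clip(ref, cues) -> list of frames; motion_cues: list of per-frame cues."""
    frames, anchor = [], reference_image
    for start in range(0, len(motion_cues), clip_len):
        clip = generate_clip(anchor, motion_cues[start:start + clip_len])
        frames.extend(clip)
        anchor = clip[-1]   # the last generated frame seeds the next clip
    return frames
```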
arXiv Detail & Related papers (2024-05-26T00:53:26Z)
- Animate Your Motion: Turning Still Images into Dynamic Videos [58.63109848837741]
We introduce Scene and Motion Conditional Diffusion (SMCD), a novel methodology for managing multimodal inputs.
SMCD incorporates a recognized motion conditioning module and investigates various approaches to integrate scene conditions.
Our design significantly enhances video quality, motion precision, and semantic coherence.
arXiv Detail & Related papers (2024-03-15T10:36:24Z)
- VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models [58.93124686141781]
Video Motion Customization (VMC) is a novel one-shot tuning approach crafted to adapt temporal attention layers within video diffusion models.
Our approach introduces a novel motion distillation objective using residual vectors between consecutive frames as a motion reference.
We validate our method against state-of-the-art video generative models across diverse real-world motions and contexts.
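A residual-based motion objective in the spirit described above can be sketched as follows; the exact VMC formulation (for example, whether it operates on latents and how residuals are compared) may differ.

```python
# Hedged sketch of a residual motion objective: compare frame-to-frame
# residuals of the generated sequence with those of the reference motion.
import numpy as np

def motion_residual_loss(generated, reference):
    """generated/reference: (T, C, H, W) arrays of frames or latents."""
    gen_res = generated[1:] - generated[:-1]     # motion between consecutive frames
    ref_res = reference[1:] - reference[:-1]
    return float(np.mean((gen_res - ref_res) ** 2))
```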
arXiv Detail & Related papers (2023-12-01T06:50:11Z)
- Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation [27.700371215886683]
Diffusion models have become the mainstream in visual generation research, owing to their robust generative capabilities.
In this paper, we propose a novel framework tailored for character animation.
By expanding the training data, our approach can animate arbitrary characters, yielding superior results in character animation compared to other image-to-video methods.
arXiv Detail & Related papers (2023-11-28T12:27:15Z)
- MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model [74.84435399451573]
This paper studies the human image animation task, which aims to generate a video of a certain reference identity following a particular motion sequence.
Existing animation works typically employ the frame-warping technique to animate the reference image towards the target motion.
We introduce MagicAnimate, a diffusion-based framework that aims at enhancing temporal consistency, preserving reference image faithfully, and improving animation fidelity.
arXiv Detail & Related papers (2023-11-27T18:32:31Z)
- Neuromorphic Imaging and Classification with Graph Learning [11.882239213276392]
Bio-inspired neuromorphic cameras asynchronously record pixel brightness changes and generate sparse event streams.
Due to the multidimensional address-event structure, most existing vision algorithms cannot properly handle asynchronous event streams.
We propose a new graph representation of the event data and couple it with a Graph Transformer to perform accurate neuromorphic classification.
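One plausible event-graph construction, offered only as an illustration of the idea (the paper's exact scheme may differ): treat sampled events as nodes and connect k nearest neighbours in normalized (x, y, t) space.

```python
# Illustrative sketch of building a graph from events: nodes are sampled
# events, edges link k nearest neighbours in scaled (x, y, t) coordinates.
import numpy as np

def events_to_knn_graph(events, k=8, time_scale=1e3):
    """events: (N, 4) array [x, y, t, polarity]; returns node features and edge index."""
    nodes = events[:, :3].astype(np.float32)
    nodes[:, 2] *= time_scale                      # bring time onto a spatial scale (assumed)
    # pairwise squared distances (fine for a few thousand sampled events)
    d2 = ((nodes[:, None, :] - nodes[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    neighbours = np.argsort(d2, axis=1)[:, :k]     # k nearest neighbours per node
    edges = [(i, int(j)) for i in range(len(nodes)) for j in neighbours[i]]
    return events.astype(np.float32), np.array(edges, dtype=np.int64).T
```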
arXiv Detail & Related papers (2023-09-27T12:58:18Z)
- Render In-between: Motion Guided Video Synthesis for Action Interpolation [53.43607872972194]
We propose a motion-guided frame-upsampling framework that is capable of producing realistic human motion and appearance.
A novel motion model is trained to infer the non-linear skeletal motion between frames by leveraging a large-scale motion-capture dataset.
Our pipeline requires only low-frame-rate videos and unpaired human motion data; no high-frame-rate videos are needed for training.
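The learned motion model cannot be reproduced from this summary; as a point of reference, the sketch below shows the naive linear keypoint interpolation that such a non-linear, data-driven in-betweening model is meant to improve upon.

```python
# Baseline sketch only: linear interpolation of skeletal keypoints, the naive
# alternative that a learned non-linear motion model would replace.
import numpy as np

def interpolate_pose(pose_a, pose_b, n_inbetween):
    """pose_a/pose_b: (J, 2) keypoints; returns (n_inbetween, J, 2) intermediate poses."""
    alphas = np.linspace(0.0, 1.0, n_inbetween + 2)[1:-1]   # exclude the endpoint poses
    return np.stack([(1 - a) * pose_a + a * pose_b for a in alphas])
```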
arXiv Detail & Related papers (2021-11-01T15:32:51Z)