Less is More: Improving Motion Diffusion Models with Sparse Keyframes
- URL: http://arxiv.org/abs/2503.13859v1
- Date: Tue, 18 Mar 2025 03:20:02 GMT
- Title: Less is More: Improving Motion Diffusion Models with Sparse Keyframes
- Authors: Jinseok Bae, Inwoo Hwang, Young Yoon Lee, Ziyu Guo, Joseph Liu, Yizhak Ben-Shabat, Young Min Kim, Mubbasir Kapadia,
- Abstract summary: We propose a novel diffusion framework explicitly designed around sparse and geometrically meaningfuls.<n>Our method reduces by masking non-keyframes and efficiently interpolating missing frames.<n>Our approach consistently outperforms state-of-the-art methods in text alignment and motion realism.
- Score: 21.48244441857993
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in motion diffusion models have led to remarkable progress in diverse motion generation tasks, including text-to-motion synthesis. However, existing approaches represent motions as dense frame sequences, requiring the model to process redundant or less informative frames. The processing of dense animation frames imposes significant training complexity, especially when learning intricate distributions of large motion datasets even with modern neural architectures. This severely limits the performance of generative motion models for downstream tasks. Inspired by professional animators who mainly focus on sparse keyframes, we propose a novel diffusion framework explicitly designed around sparse and geometrically meaningful keyframes. Our method reduces computation by masking non-keyframes and efficiently interpolating missing frames. We dynamically refine the keyframe mask during inference to prioritize informative frames in later diffusion steps. Extensive experiments show that our approach consistently outperforms state-of-the-art methods in text alignment and motion realism, while also effectively maintaining high performance at significantly fewer diffusion steps. We further validate the robustness of our framework by using it as a generative prior and adapting it to different downstream tasks. Source code and pre-trained models will be released upon acceptance.
Related papers
- AnyMoLe: Any Character Motion In-betweening Leveraging Video Diffusion Models [5.224806515926022]
We introduce AnyMoLe, a novel method to generate motion in-between frames for arbitrary characters without external data.<n>Our approach employs a two-stage frame generation process to enhance contextual understanding.
arXiv Detail & Related papers (2025-03-11T13:28:59Z) - Thin-Plate Spline-based Interpolation for Animation Line Inbetweening [54.69811179222127]
Chamfer Distance (CD) is commonly adopted for evaluating inbetweening performance.
We propose a simple yet effective method for animation line inbetweening that adopts thin-plate spline-based transformation.
Our method outperforms existing approaches by delivering high-quality results with enhanced fluidity.
arXiv Detail & Related papers (2024-08-17T08:05:31Z) - Ada-VE: Training-Free Consistent Video Editing Using Adaptive Motion Prior [13.595032265551184]
Video-to-video synthesis poses significant challenges in maintaining character consistency, smooth temporal transitions, and preserving visual quality during fast motion.
We propose an adaptive motion-guided cross-frame attention mechanism that selectively reduces redundant computations.
This enables a greater number of cross-frame attentions over more frames within the same computational budget.
arXiv Detail & Related papers (2024-06-07T12:12:25Z) - Motion-aware Latent Diffusion Models for Video Frame Interpolation [51.78737270917301]
Motion estimation between neighboring frames plays a crucial role in avoiding motion ambiguity.
We propose a novel diffusion framework, motion-aware latent diffusion models (MADiff)
Our method achieves state-of-the-art performance significantly outperforming existing approaches.
arXiv Detail & Related papers (2024-04-21T05:09:56Z) - HumMUSS: Human Motion Understanding using State Space Models [6.821961232645209]
We propose a novel attention-free model for human motion understanding building upon recent advancements in state space models.
Our model supports both offline and real-time applications.
For real-time sequential prediction, our model is both memory efficient and several times faster than transformer-based approaches.
arXiv Detail & Related papers (2024-04-16T19:59:21Z) - Motion-Aware Video Frame Interpolation [49.49668436390514]
We introduce a Motion-Aware Video Frame Interpolation (MA-VFI) network, which directly estimates intermediate optical flow from consecutive frames.
It not only extracts global semantic relationships and spatial details from input frames with different receptive fields, but also effectively reduces the required computational cost and complexity.
arXiv Detail & Related papers (2024-02-05T11:00:14Z) - TimeLens: Event-based Video Frame Interpolation [54.28139783383213]
We introduce Time Lens, a novel indicates equal contribution method that leverages the advantages of both synthesis-based and flow-based approaches.
We show an up to 5.21 dB improvement in terms of PSNR over state-of-the-art frame-based and event-based methods.
arXiv Detail & Related papers (2021-06-14T10:33:47Z) - All at Once: Temporally Adaptive Multi-Frame Interpolation with Advanced
Motion Modeling [52.425236515695914]
State-of-the-art methods are iterative solutions interpolating one frame at the time.
This work introduces a true multi-frame interpolator.
It utilizes a pyramidal style network in the temporal domain to complete the multi-frame task in one-shot.
arXiv Detail & Related papers (2020-07-23T02:34:39Z) - Efficient Semantic Video Segmentation with Per-frame Inference [117.97423110566963]
In this work, we process efficient semantic video segmentation in a per-frame fashion during the inference process.
We employ compact models for real-time execution. To narrow the performance gap between compact models and large models, new knowledge distillation methods are designed.
arXiv Detail & Related papers (2020-02-26T12:24:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.