Motion Prior Distillation in Time Reversal Sampling for Generative Inbetweening
- URL: http://arxiv.org/abs/2602.12679v2
- Date: Thu, 19 Feb 2026 09:50:18 GMT
- Title: Motion Prior Distillation in Time Reversal Sampling for Generative Inbetweening
- Authors: Wooseok Jeon, Seunghyun Shin, Dongmin Shin, Hae-Gon Jeon
- Abstract summary: We propose Motion Prior Distillation (MPD), a simple yet effective inference-time distillation technique. MPD suppresses bidirectional mismatch by distilling the motion residual of the forward path into the backward path. Our method deliberately avoids denoising the end-conditioned path, which introduces path ambiguity.
- Score: 23.537461698380607
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent progress in image-to-video (I2V) diffusion models has significantly advanced the field of generative inbetweening, which aims to generate semantically plausible frames between two keyframes. In particular, inference-time sampling strategies, which leverage the generative priors of large-scale pre-trained I2V models without additional training, have become increasingly popular. However, existing inference-time sampling, whether fusing forward and backward paths in parallel or alternating them sequentially, often suffers from temporal discontinuities and undesirable visual artifacts due to misalignment between the two generated paths. This is because each path follows the motion prior induced by its own conditioning frame. In this work, we propose Motion Prior Distillation (MPD), a simple yet effective inference-time distillation technique that suppresses bidirectional mismatch by distilling the motion residual of the forward path into the backward path. Our method deliberately avoids denoising the end-conditioned path, which introduces path ambiguity, and yields more temporally coherent inbetweening results guided by the forward motion prior. We not only perform quantitative evaluations on standard benchmarks but also conduct extensive user studies to demonstrate the effectiveness of our approach in practical scenarios.
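As an illustration only, the bidirectional scheme described in the abstract can be sketched with a toy denoiser. Everything here is a hypothetical stand-in, not the paper's actual method: `toy_denoise` replaces a real I2V diffusion step, and `mpd_time_reversal`, the residual transfer, and the `alpha` weight merely mimic the idea of distilling the forward path's motion residual into the end-conditioned backward path before fusing the two:

```python
import numpy as np

def toy_denoise(latent, cond, t):
    # Stand-in for an I2V denoising step: pull the latent toward its
    # conditioning frame with a step size that grows as noise level t falls.
    return latent + (1.0 - t) * 0.1 * (cond - latent)

def mpd_time_reversal(start, end, num_frames=8, steps=20, alpha=0.5):
    """Hypothetical sketch of time-reversal sampling with a motion-prior
    -distillation-style correction (all names and weights illustrative)."""
    rng = np.random.default_rng(0)
    fwd = rng.normal(size=(num_frames, *start.shape))
    bwd = rng.normal(size=(num_frames, *start.shape))
    for i in range(steps):
        t = 1.0 - i / steps
        # Forward path: every frame conditioned on the start keyframe.
        fwd = np.stack([toy_denoise(f, start, t) for f in fwd])
        # Motion residual of the forward path (frame-to-frame change).
        residual = np.diff(fwd, axis=0, prepend=fwd[:1])
        # Backward path: conditioned on the end keyframe, then nudged by the
        # forward motion residual rather than denoised entirely on its own.
        bwd = np.stack([toy_denoise(b, end, t) for b in bwd]) + alpha * residual
    # Fuse the two paths, reversing the backward path in time.
    return 0.5 * (fwd + bwd[::-1])

video = mpd_time_reversal(np.zeros(4), np.ones(4))
```

The point of the sketch is only the data flow: the backward path is never left to follow its own conditioning frame's motion prior alone, which is the mismatch the abstract attributes temporal discontinuities to.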
Related papers
- STEP: Warm-Started Visuomotor Policies with Spatiotemporal Consistency Prediction [16.465783114087223]
Iterative denoising leads to substantial inference latency, limiting control frequency in real-time closed-loop systems. We propose STEP, a lightweight spatiotemporal consistency prediction mechanism to construct high-quality warm-start actions. STEP with 2 steps achieves an average 21.6% and 27.5% higher success rate than BRIDGER and DDIM on the RoboMimic benchmark and real-world tasks.
arXiv Detail & Related papers (2026-02-09T03:50:40Z) - Action-to-Action Flow Matching [25.301629044539325]
Diffusion-based policies have recently achieved remarkable success in robotics by formulating action prediction as a conditional denoising process. We propose Action-to-Action flow matching (A2A), a novel policy paradigm that shifts from random sampling to initialization informed by the previous action. A2A enables high-quality action generation in as few as a single inference step (0.56 ms latency), and exhibits superior robustness to visual perturbations and enhanced generalization to unseen configurations.
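The warm-start idea in the A2A summary can be shown with a minimal sketch, assuming a toy linear velocity field; `flow_step` and `a2a_predict` are illustrative names, not the paper's API. The only point is the initialization: integration starts from the previous action rather than from random noise, so few (here even one) integration steps suffice in this toy field:

```python
import numpy as np

def flow_step(x, target, dt):
    # Stand-in velocity field: constant-speed transport toward the target.
    return x + dt * (target - x)

def a2a_predict(prev_action, target, steps=1):
    """Illustrative action-to-action warm start (hypothetical): integrate
    the flow starting from the previous action, not from random noise."""
    x = prev_action.copy()
    dt = 1.0 / steps
    for _ in range(steps):
        x = flow_step(x, target, dt)
    return x

prev = np.array([0.2, -0.1])    # previously executed action
target = np.array([0.5, 0.3])   # stand-in for the model's predicted action
action = a2a_predict(prev, target, steps=1)
```

In this toy linear field a single Euler step lands exactly on the target; a learned velocity field would of course only approximate that, which is where the paper's reported 0.56 ms single-step latency claim applies.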
arXiv Detail & Related papers (2026-02-07T02:39:49Z) - FlowConsist: Make Your Flow Consistent with Real Trajectory [99.22869983378062]
We argue that current fast-flow training paradigms suffer from two fundamental issues. Conditional velocities constructed from randomly paired noise-data samples introduce systematic trajectory drift. We propose FlowConsist, a training framework designed to enforce trajectory consistency in fast flows.
arXiv Detail & Related papers (2026-02-06T03:24:23Z) - Accelerated Sequential Flow Matching: A Bayesian Filtering Perspective [16.29333060724397]
We introduce Sequential Flow Matching, a principled framework grounded in Bayesian filtering. By treating streaming inference as learning a probability flow that transports the predictive distribution from one time step to the next, our approach naturally aligns with the structure of Bayesian belief updates. Our method achieves performance competitive with full-step diffusion while requiring only one or a few sampling steps, and therefore samples faster.
arXiv Detail & Related papers (2026-02-05T05:37:14Z) - Euphonium: Steering Video Flow Matching via Process Reward Gradient Guided Stochastic Dynamics [49.242224984144904]
We propose Euphonium, a novel framework that steers generation via process reward gradient guided dynamics. Our key insight is to formulate the sampling process as a theoretically principled algorithm that explicitly incorporates the gradient of a Process Reward Model. We derive a distillation objective that internalizes the guidance signal into the flow network, eliminating inference-time dependency on the reward model.
arXiv Detail & Related papers (2026-02-04T08:59:57Z) - SynCast: Synergizing Contradictions in Precipitation Nowcasting via Diffusion Sequential Preference Optimization [62.958457694151384]
We introduce preference optimization into precipitation nowcasting for the first time, motivated by the success of reinforcement learning from human feedback in large language models. In the first stage, the framework focuses on reducing the false alarm rate (FAR), training the model to effectively suppress false alarms.
arXiv Detail & Related papers (2025-10-22T16:11:22Z) - SwiftVideo: A Unified Framework for Few-Step Video Generation through Trajectory-Distribution Alignment [76.60024640625478]
Diffusion-based or flow-based models have achieved significant progress in video synthesis but require multiple iterative sampling steps. We propose a unified and stable distillation framework that combines the advantages of trajectory-preserving and distribution-matching strategies. Our method maintains high-quality video generation while substantially reducing the number of inference steps.
arXiv Detail & Related papers (2025-08-08T07:26:34Z) - SCoT: Unifying Consistency Models and Rectified Flows via Straight-Consistent Trajectories [31.60548236936739]
We propose a Straight Consistent Trajectory (SCoT) model for pre-trained diffusion models. SCoT enjoys the benefits of both approaches for fast sampling, producing trajectories with consistent and straight properties simultaneously.
arXiv Detail & Related papers (2025-02-24T08:57:19Z) - Sequential Controlled Langevin Diffusions [94.82767690147865]
Two popular methods are (1) Sequential Monte Carlo (SMC), where the transport is performed through successive densities via prescribed Markov chains and resampling steps, and (2) recently developed diffusion-based sampling methods, where a learned dynamical transport is used. We present a principled framework for combining SMC with diffusion-based samplers by viewing both methods in continuous time and considering measures on path space. This culminates in the new Sequential Controlled Langevin Diffusion (SCLD) sampling method, which is able to utilize the benefits of both methods and reaches improved performance on multiple benchmark problems, in many cases using only 10% of the training budget of previous diffusion-based samplers.
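The two ingredients the SCLD summary combines can be sketched in miniature, assuming a toy standard-normal target; this is a plain SMC loop with unadjusted Langevin moves and multinomial resampling, not the paper's learned-control method, and all names (`log_target`, `smc_langevin`, the linear tempering schedule) are illustrative:

```python
import numpy as np

def log_target(x):
    # Toy target: standard normal log-density, up to an additive constant.
    return -0.5 * x**2

def smc_langevin(n_particles=500, n_steps=50, step=0.1, seed=0):
    """Minimal SMC-with-Langevin sketch (illustrative only): particles take
    unadjusted Langevin steps toward a tempered target, then are resampled
    on incremental importance weights."""
    rng = np.random.default_rng(seed)
    # Broad, badly placed initial proposal the sampler must correct.
    x = rng.normal(loc=3.0, scale=2.0, size=n_particles)
    for k in range(1, n_steps + 1):
        beta = k / n_steps                      # tempering schedule 0 -> 1
        grad = -beta * x                        # grad of tempered log-target
        x = x + step * grad + np.sqrt(2 * step) * rng.normal(size=n_particles)
        logw = (1.0 / n_steps) * log_target(x)  # incremental log-weights
        w = np.exp(logw - logw.max())
        w /= w.sum()
        x = rng.choice(x, size=n_particles, p=w)  # multinomial resampling
    return x

samples = smc_langevin()
```

SCLD's contribution, per the abstract, is replacing the prescribed Langevin dynamics above with a learned controlled diffusion while keeping the SMC weighting and resampling structure on path space.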
arXiv Detail & Related papers (2024-12-10T00:47:10Z) - A Dense Reward View on Aligning Text-to-Image Diffusion with Preference [54.43177605637759]
We propose a tractable alignment objective that emphasizes the initial steps of the T2I reverse chain.
In experiments on single and multiple prompt generation, our method is competitive with strong relevant baselines.
arXiv Detail & Related papers (2024-02-13T07:37:24Z) - Synthesizing Long-Term Human Motions with Diffusion Models via Coherent Sampling [74.62570964142063]
Text-to-motion generation has gained increasing attention, but most existing methods are limited to generating short-term motions.
We propose a novel approach that utilizes a past-conditioned diffusion model with two optional coherent sampling methods.
Our proposed method is capable of generating compositional and coherent long-term 3D human motions controlled by a user-instructed long text stream.
arXiv Detail & Related papers (2023-08-03T16:18:32Z) - Human Motion Diffusion as a Generative Prior [20.004837564647367]
We introduce three forms of composition based on diffusion priors.
We tackle the challenge of long sequence generation.
Using parallel composition, we show promising steps toward two-person generation.
arXiv Detail & Related papers (2023-03-02T17:09:27Z) - A Deep Temporal Fusion Framework for Scene Flow Using a Learnable Motion Model and Occlusions [17.66624674542256]
We propose a novel data-driven approach for temporal fusion of scene flow estimates in a multi-frame setup.
In a second step, a neural network combines bi-directional scene flow estimates from a common reference frame, yielding a refined estimate.
This way, our approach provides a fast multi-frame extension for a variety of scene flow estimators, which outperforms the underlying dual-frame approaches.
arXiv Detail & Related papers (2020-11-03T10:14:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.