Euphonium: Steering Video Flow Matching via Process Reward Gradient Guided Stochastic Dynamics
- URL: http://arxiv.org/abs/2602.04928v2
- Date: Fri, 06 Feb 2026 05:22:04 GMT
- Title: Euphonium: Steering Video Flow Matching via Process Reward Gradient Guided Stochastic Dynamics
- Authors: Ruizhe Zhong, Jiesong Lian, Xiaoyue Mi, Zixiang Zhou, Yuan Zhou, Qinglin Lu, Junchi Yan,
- Abstract summary: We propose Euphonium, a novel framework that steers generation via process reward gradient guided dynamics. Our key insight is to formulate the sampling process as a theoretically principled algorithm that explicitly incorporates the gradient of a Process Reward Model. We derive a distillation objective that internalizes the guidance signal into the flow network, eliminating inference-time dependency on the reward model.
- Score: 49.242224984144904
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While online Reinforcement Learning has emerged as a crucial technique for aligning flow matching models with human preferences, current approaches are hindered by inefficient exploration during training rollouts. Relying on undirected stochasticity and sparse outcome rewards, these methods struggle to discover high-reward samples, resulting in data-inefficient and slow optimization. To address these limitations, we propose Euphonium, a novel framework that steers generation via process reward gradient guided dynamics. Our key insight is to formulate the sampling process as a theoretically principled Stochastic Differential Equation that explicitly incorporates the gradient of a Process Reward Model into the flow drift. This design enables dense, step-by-step steering toward high-reward regions, advancing beyond the unguided exploration in prior works, and theoretically encompasses existing sampling methods (e.g., Flow-GRPO, DanceGRPO) as special cases. We further derive a distillation objective that internalizes the guidance signal into the flow network, eliminating inference-time dependency on the reward model. We instantiate this framework with a Dual-Reward Group Relative Policy Optimization algorithm, combining latent process rewards for efficient credit assignment with pixel-level outcome rewards for final visual fidelity. Experiments on text-to-video generation show that Euphonium achieves better alignment compared to existing methods while accelerating training convergence by 1.66x. Our code is available at https://github.com/zerzerzerz/Euphonium
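The core mechanism the abstract describes — adding the gradient of a process reward to the flow drift during stochastic sampling — can be illustrated with a minimal Euler–Maruyama sketch. This is not the authors' implementation; the velocity field `v`, reward gradient `grad_r`, guidance weight `lam`, and noise scale `sigma` below are hypothetical stand-ins chosen for a one-dimensional toy problem.

```python
import numpy as np

def guided_sde_step(x, t, dt, v, grad_r, rng, lam=1.0, sigma=0.5):
    """One Euler-Maruyama step of the reward-guided SDE:
    dx = [v(x, t) + lam * sigma^2 * grad_r(x, t)] dt + sigma dW.
    The reward-gradient term densely steers each step toward
    high-reward regions instead of relying on undirected noise."""
    drift = v(x, t) + lam * sigma**2 * grad_r(x, t)
    noise = sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
    return x + drift * dt + noise

# Toy instantiation: the flow pulls samples toward 0, while the reward
# r(x) = -0.5 * (x - 2)^2 prefers x near 2; guidance biases samples upward.
v = lambda x, t: -x
grad_r = lambda x, t: 2.0 - x          # gradient of the toy reward
rng = np.random.default_rng(0)
x = np.zeros(4)
for i in range(100):
    x = guided_sde_step(x, i * 0.01, 0.01, v, grad_r, rng,
                        lam=50.0, sigma=0.1)
```

Setting `lam=0` removes the reward term and recovers plain unguided stochastic sampling, which mirrors the abstract's claim that existing samplers arise as special cases of the guided dynamics.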
Related papers
- Formalizing the Sampling Design Space of Diffusion-Based Generative Models via Adaptive Solvers and Wasserstein-Bounded Timesteps [4.397130429878499]
Diffusion-based generative models have achieved remarkable performance across various domains, yet their practical deployment is often limited by high sampling costs. We propose SDM, a principled framework that aligns the numerical solver with the intrinsic properties of the diffusion trajectory. By analyzing the ODE dynamics, we show that efficient low-order solvers suffice in early high-noise stages, while higher-order solvers can be progressively deployed to handle the increasing non-linearity of later stages.
arXiv Detail & Related papers (2026-02-13T05:02:07Z) - FlowSteer: Guiding Few-Step Image Synthesis with Authentic Trajectories [82.90132015584359]
ReFlow has theoretical consistency with flow matching but suboptimal performance in practical scenarios. We propose FlowSteer, a method that unlocks the potential of ReFlow-based distillation by guiding the student along the teacher's authentic generation trajectories.
arXiv Detail & Related papers (2025-11-24T07:13:23Z) - G$^2$RPO: Granular GRPO for Precise Reward in Flow Models [74.21206048155669]
We propose a novel Granular-GRPO (G$^2$RPO) framework that achieves precise and comprehensive reward assessments of sampling directions. We introduce a Multi-Granularity Advantage Integration module that aggregates advantages computed at multiple diffusion scales. Our G$^2$RPO significantly outperforms existing flow-based GRPO baselines.
arXiv Detail & Related papers (2025-10-02T12:57:12Z) - Coefficients-Preserving Sampling for Reinforcement Learning with Flow Matching [6.238027696245818]
Reinforcement Learning (RL) has emerged as a powerful technique for improving image and video generation in Diffusion and Flow Matching models. Our investigation reveals a significant drawback to this approach: SDE-based sampling introduces pronounced noise artifacts in the generated images. Our proposed method, Coefficients-Preserving Sampling (CPS), eliminates these noise artifacts.
arXiv Detail & Related papers (2025-09-07T07:25:00Z) - Flows and Diffusions on the Neural Manifold [0.0]
Diffusion and flow-based generative models have achieved remarkable success in domains such as image synthesis, video generation, and natural language modeling. In this work, we extend these advances to weight space learning by leveraging recent techniques to incorporate structural priors derived from optimization dynamics. We unify several trajectory inference techniques towards matching a gradient flow, providing a theoretical framework for treating optimization paths as inductive bias.
arXiv Detail & Related papers (2025-07-14T02:26:06Z) - Reward-Guided Iterative Refinement in Diffusion Models at Test-Time with Applications to Protein and DNA Design [87.58981407469977]
We propose a novel framework for inference-time reward optimization with diffusion models inspired by evolutionary algorithms. Our approach employs an iterative refinement process consisting of two steps in each iteration: noising and reward-guided denoising.
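The noise-then-denoise loop this summary describes can be sketched in a few lines. This is a hedged toy, not the paper's method: the `reward` and `denoise` functions are hypothetical stand-ins, and selection keeps whichever of (old, refined) sample scores higher in each slot, in the spirit of an evolutionary step.

```python
import numpy as np

def refine(samples, reward, denoise, n_iters=10, noise_scale=0.3, rng=None):
    """Iterative reward-guided refinement: perturb (noising), denoise,
    and keep the refined sample only where it improves the reward."""
    rng = rng or np.random.default_rng(0)
    pop = samples.copy()
    for _ in range(n_iters):
        # Step 1: noising -- perturb the current population.
        noisy = pop + noise_scale * rng.standard_normal(pop.shape)
        # Step 2: reward-guided denoising, then per-slot selection.
        refined = denoise(noisy)
        keep = reward(refined) > reward(pop)
        pop = np.where(keep[:, None], refined, pop)
    return pop

# Toy example: reward peaks at (1, 1); "denoising" nudges samples toward it.
reward = lambda x: -np.sum((x - 1.0) ** 2, axis=-1)
denoise = lambda x: x + 0.5 * (1.0 - x)
out = refine(np.zeros((8, 2)), reward, denoise)
```

Because a refined sample is only accepted when it beats the incumbent, the per-sample reward is monotonically non-decreasing across iterations, which is what makes the loop a safe test-time optimizer.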
arXiv Detail & Related papers (2025-02-20T17:48:45Z) - Self-Corrected Flow Distillation for Consistent One-Step and Few-Step Text-to-Image Generation [3.8959351616076745]
Flow matching has emerged as a promising framework for training generative models. We introduce a self-corrected flow distillation method that integrates consistency models and adversarial training. This work pioneers consistent generation quality in both few-step and one-step sampling.
arXiv Detail & Related papers (2024-12-22T07:48:49Z) - Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}\big(\ln(T) / T^{1 - \frac{1}{\alpha}}\big)$.
arXiv Detail & Related papers (2024-03-11T09:10:37Z) - Guaranteed Conservation of Momentum for Learning Particle-based Fluid Dynamics [96.9177297872723]
We present a novel method for guaranteeing linear momentum in learned physics simulations.
We enforce conservation of momentum with a hard constraint, which we realize via antisymmetrical continuous convolutional layers.
In combination, the proposed method allows us to increase the physical accuracy of the learned simulator substantially.
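The momentum argument behind this summary is simple to demonstrate: if the force particle `j` exerts on `i` is the exact negative of the force `i` exerts on `j`, all internal forces cancel in the sum, so total linear momentum cannot change. The sketch below is illustrative only — the paper realizes this with antisymmetric continuous convolutional layers, whereas here a generic odd kernel stands in for the learned interaction.

```python
import numpy as np

def pairwise_forces(pos, kernel):
    """Sum pairwise interactions; kernel takes the displacement i - j."""
    n = pos.shape[0]
    f = np.zeros_like(pos)
    for i in range(n):
        for j in range(n):
            if i != j:
                f[i] += kernel(pos[i] - pos[j])
    return f

# Any odd kernel k(-d) = -k(d) makes the interaction antisymmetric:
# the force on i from j exactly cancels the force on j from i.
odd_kernel = lambda d: d * np.exp(-np.dot(d, d))
rng = np.random.default_rng(0)
pos = rng.standard_normal((16, 3))
forces = pairwise_forces(pos, odd_kernel)
# Net force (hence total momentum change) vanishes up to round-off.
```

This cancellation is a hard constraint: it holds for any particle configuration and any odd kernel, which is why enforcing antisymmetry in the layer guarantees momentum conservation rather than merely encouraging it through a loss term.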
arXiv Detail & Related papers (2022-10-12T09:12:59Z) - Adaptive Learning Rate and Momentum for Training Deep Neural Networks [0.0]
We develop a fast training method motivated by the nonlinear Conjugate Gradient (CG) framework.
Experiments in image classification datasets show that our method yields faster convergence than other local solvers.
arXiv Detail & Related papers (2021-06-22T05:06:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.