DeFlow: Decoupling Manifold Modeling and Value Maximization for Offline Policy Extraction
- URL: http://arxiv.org/abs/2601.10471v2
- Date: Sat, 17 Jan 2026 06:21:38 GMT
- Title: DeFlow: Decoupling Manifold Modeling and Value Maximization for Offline Policy Extraction
- Authors: Zhancun Mu
- Abstract summary: We present DeFlow, a decoupled offline RL framework that leverages flow matching to faithfully capture complex behavior manifolds. We address the cost of optimizing generative policies by learning a lightweight refinement module within an explicit, data-derived trust region of the flow manifold. DeFlow achieves superior performance on the challenging OGBench benchmark and demonstrates efficient offline-to-online adaptation.
- Score: 4.558338633638409
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present DeFlow, a decoupled offline RL framework that leverages flow matching to faithfully capture complex behavior manifolds. Optimizing generative policies is computationally prohibitive, typically necessitating backpropagation through ODE solvers. We address this by learning a lightweight refinement module within an explicit, data-derived trust region of the flow manifold, rather than sacrificing the iterative generation capability via single-step distillation. This way, we bypass solver differentiation and eliminate the need for balancing loss terms, ensuring stable improvement while fully preserving the flow's iterative expressivity. Empirically, DeFlow achieves superior performance on the challenging OGBench benchmark and demonstrates efficient offline-to-online adaptation.
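As a rough illustration of this decoupling, the following PyTorch sketch samples actions from a frozen flow without tracking solver gradients and trains only a small refinement network to maximize value within a trust region. All names (`flow_vel`, `q_fn`, `eps`) are hypothetical stand-ins; the paper's exact parameterization may differ.

```python
import torch
import torch.nn as nn

# Illustrative sketch of the decoupled scheme, not the paper's exact code.
# Assumed components (hypothetical names): a frozen pretrained flow-matching
# velocity field `flow_vel(s, a, t)`, a critic `q_fn(s, a)`, and a
# data-derived trust-region radius `eps`.

def sample_flow_action(flow_vel, state, act_dim, steps=10):
    """Euler-integrate the frozen flow from noise to an action.

    Runs under no_grad: value maximization never backpropagates
    through this ODE solve, which is the point of the decoupling."""
    with torch.no_grad():
        a = torch.randn(state.shape[0], act_dim)
        dt = 1.0 / steps
        for k in range(steps):
            t = torch.full((state.shape[0], 1), k * dt)
            a = a + dt * flow_vel(state, a, t)
    return a

class Refiner(nn.Module):
    """Lightweight module that nudges the flow's sample within a
    trust region of radius eps around the behavior manifold."""
    def __init__(self, state_dim, act_dim, eps, hidden=256):
        super().__init__()
        self.eps = eps
        self.net = nn.Sequential(
            nn.Linear(state_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim))

    def forward(self, state, base_action):
        delta = torch.tanh(self.net(torch.cat([state, base_action], -1)))
        return base_action + self.eps * delta   # bounded correction

def refiner_loss(refiner, q_fn, flow_vel, state, act_dim):
    base = sample_flow_action(flow_vel, state, act_dim)  # no solver grads
    return -q_fn(state, refiner(state, base)).mean()     # pure value max
```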
Related papers
- Iterative Refinement of Flow Policies in Probability Space for Online Reinforcement Learning [56.47948583452555]
We introduce the Stepwise Flow Policy (SWFP) framework, founded on the key insight that discretizing the flow matching inference process via a fixed-step Euler scheme aligns it with the variational Jordan-Kinderlehrer-Otto principle from optimal transport. SWFP decomposes the global flow into a sequence of small, incremental transformations between proximate distributions. This decomposition yields an efficient algorithm that fine-tunes pre-trained flows via a cascade of small flow blocks, offering significant advantages.
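The fixed-step Euler scheme this insight rests on fits in a few lines; `v_theta` below is a hypothetical placeholder for the learned velocity field, and the sketch covers only inference, not the JKO-based fine-tuning itself.

```python
import torch

def euler_flow_inference(v_theta, x0, n_steps=8):
    """Fixed-step Euler scheme for flow-matching inference.

    Each step x_{k+1} = x_k + dt * v_theta(x_k, t_k) is a small,
    incremental transformation between proximate distributions,
    which is what permits blockwise fine-tuning of the flow."""
    x, dt = x0, 1.0 / n_steps
    for k in range(n_steps):
        t = torch.full((x.shape[0], 1), k * dt)
        x = x + dt * v_theta(x, t)
    return x
```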
arXiv Detail & Related papers (2025-10-17T07:43:51Z)
- SAC Flow: Sample-Efficient Reinforcement Learning of Flow-Based Policies via Velocity-Reparameterized Sequential Modeling [9.936731043466699]
Training expressive flow-based policies with off-policy reinforcement learning is notoriously unstable due to gradient pathologies in the multi-step action sampling process. We trace this instability to a fundamental connection: the flow rollout is algebraically equivalent to a residual recurrent computation, making it susceptible to the same vanishing and exploding gradients as RNNs. We develop a practical SAC-based algorithm, enabled by a noise-augmented rollout, that facilitates direct end-to-end training of these policies.
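The claimed equivalence is easy to see in code: each Euler step is a residual update, so the rollout's input-output Jacobian is a product of near-identity factors, exactly as in a residual RNN. A minimal sketch (illustrative only, not the paper's algorithm):

```python
import torch

def flow_rollout(v, a0, n_steps=10):
    """K-step Euler rollout a_{k+1} = a_k + dt * v(a_k, t_k): the same
    residual-recurrence form as an RNN h_{k+1} = h_k + f(h_k), so
    d a_K / d a_0 = prod_k (I + dt * J_k), the Jacobian product behind
    vanishing/exploding gradients."""
    a, dt = a0, 1.0 / n_steps
    for k in range(n_steps):
        t = torch.full_like(a[..., :1], k * dt)
        a = a + dt * v(a, t)   # residual update, like an RNN cell
    return a
```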
arXiv Detail & Related papers (2025-09-30T04:21:20Z)
- Unleashing Flow Policies with Distributional Critics [15.149475517073258]
We introduce the Distributional Flow Critic (DFC), a novel critic architecture that learns the complete state-action return distribution. DFC provides the expressive flow-based policy with a rich, distributional Bellman target, which offers a more stable and informative learning signal.
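The abstract does not specify how DFC parameterizes the return distribution, so the sketch below uses a generic quantile representation purely to illustrate what a distributional Bellman target looks like:

```python
import torch

def quantile_bellman_target(rewards, dones, next_quantiles, gamma=0.99):
    """Distributional Bellman target: shift and scale every quantile of
    Z(s', a') instead of collapsing it to a scalar expectation.

    next_quantiles: (batch, n_quantiles) representation of Z(s', a').
    Returns a (batch, n_quantiles) target for the critic."""
    r = rewards.unsqueeze(-1)
    not_done = (1.0 - dones).unsqueeze(-1)
    return r + gamma * not_done * next_quantiles
```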
arXiv Detail & Related papers (2025-09-27T03:51:06Z)
- One-Step Flow Policy Mirror Descent [52.31612487608593]
Flow Policy Mirror Descent (FPMD) is an online RL algorithm that enables 1-step sampling during flow policy inference. Our approach exploits a theoretical connection between the distribution variance and the discretization error of single-step sampling in straight flow matching models.
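With a perfectly straight flow, a single Euler step over [0, 1] recovers the target sample exactly; otherwise the one-step error scales with the distribution's variance, which is the connection FPMD exploits. A minimal sketch with a hypothetical velocity field `v_theta`:

```python
import torch

def one_step_sample(v_theta, state, act_dim):
    """Single Euler step over [0, 1]: a0 + 1.0 * v(a0, t=0).

    Exact when the learned flow is perfectly straight; otherwise the
    discretization error grows with the target distribution's variance."""
    a0 = torch.randn(state.shape[0], act_dim)
    t0 = torch.zeros(state.shape[0], 1)
    return a0 + v_theta(state, a0, t0)
```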
arXiv Detail & Related papers (2025-07-31T15:51:10Z)
- Online Reward-Weighted Fine-Tuning of Flow Matching with Wasserstein Regularization [14.320131946691268]
We propose an easy-to-use and theoretically sound fine-tuning method for flow-based generative models. By introducing an online reward-weighting mechanism, our approach guides the model to prioritize high-reward regions in the data manifold. Our method achieves optimal policy convergence while allowing controllable trade-offs between reward and diversity.
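A generic reward-weighted flow matching loss conveys the idea; the paper's exact weighting scheme and its Wasserstein regularizer are not shown here, and all names are illustrative:

```python
import torch

def reward_weighted_fm_loss(v_theta, x0, x1, rewards, beta=1.0):
    """Reward-weighted conditional flow matching loss (generic form).

    High-reward samples receive larger weights, steering the model
    toward high-reward regions of the data manifold."""
    b = x0.shape[0]
    t = torch.rand(b, 1)
    xt = (1 - t) * x0 + t * x1                     # linear probability path
    target_v = x1 - x0                             # conditional velocity
    per_sample = ((v_theta(xt, t) - target_v) ** 2).mean(dim=-1)
    w = torch.softmax(rewards / beta, dim=0) * b   # normalized reward weights
    return (w * per_sample).mean()
```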
arXiv Detail & Related papers (2025-02-09T22:45:15Z)
- AdaFlow: Imitation Learning with Variance-Adaptive Flow-Based Policies [21.024480978703288]
We propose AdaFlow, an imitation learning framework based on flow-based generative modeling.
AdaFlow represents the policy with state-conditioned ordinary differential equations (ODEs).
We show that AdaFlow achieves high performance with fast inference speed.
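One plausible reading of variance-adaptive inference is to spend few integration steps on low-variance (near-deterministic) states and the full budget elsewhere. The sketch below assumes a hypothetical per-state variance estimator `var_est` and is not the paper's exact rule:

```python
import torch

def adaptive_flow_sample(v_theta, var_est, state, act_dim,
                         max_steps=10, thresh=0.05):
    """Variance-adaptive inference (illustrative rule, not the paper's):
    near-deterministic states get a single Euler step, multimodal ones
    the full budget. `var_est` is a hypothetical variance estimator."""
    n_steps = 1 if var_est(state).mean().item() < thresh else max_steps
    a = torch.randn(state.shape[0], act_dim)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = torch.full((state.shape[0], 1), k * dt)
        a = a + dt * v_theta(state, a, t)
    return a
```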
arXiv Detail & Related papers (2024-02-06T10:15:38Z)
- Guided Flows for Generative Modeling and Decision Making [55.42634941614435]
We show that Guided Flows significantly improves the sample quality in conditional image generation and zero-shot text-to-speech synthesis.
Notably, we are the first to apply flow models for plan generation in the offline reinforcement learning setting, achieving a significant speedup in computation compared to diffusion models.
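For flows, guidance can be applied in the classifier-free style by extrapolating from the unconditional toward the conditional velocity prediction; the sketch below shows this generic form, which may differ in detail from the paper's:

```python
def guided_velocity(v_theta, x, t, cond, w=2.0):
    """Classifier-free-style guidance for a flow velocity field
    (generic form): extrapolate from the unconditional prediction
    toward the conditional one. `None` stands in for the null
    condition; the guidance weight w is a tunable knob."""
    v_uncond = v_theta(x, t, None)
    v_cond = v_theta(x, t, cond)
    return v_uncond + w * (v_cond - v_uncond)
```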
arXiv Detail & Related papers (2023-11-22T15:07:59Z)
- AccFlow: Backward Accumulation for Long-Range Optical Flow [70.4251045372285]
This paper proposes a novel recurrent framework called AccFlow for long-range optical flow estimation.
We demonstrate the superiority of backward accumulation over conventional forward accumulation.
Experiments validate the effectiveness of AccFlow in handling long-range optical flow estimation.
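Accumulating flow across frames means composing pairwise flows by warping, and the order of composition (forward vs. AccFlow's backward scheme) determines whose errors compound. A generic composition helper, illustrative only:

```python
import torch
import torch.nn.functional as F

def compose_flows(flow_ab, flow_bc):
    """Chain two optical flows: V_ac(p) = V_ab(p) + V_bc(p + V_ab(p)).

    flow_ab, flow_bc: (B, 2, H, W) tensors in pixel units. The order in
    which a long-range flow is accumulated from such compositions
    decides which frames' errors compound."""
    B, _, H, W = flow_ab.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack([xs, ys], 0).float().expand(B, -1, -1, -1)
    tgt = grid + flow_ab                              # p + V_ab(p)
    # normalize target coords to [-1, 1] for grid_sample
    tgt_x = 2 * tgt[:, 0] / (W - 1) - 1
    tgt_y = 2 * tgt[:, 1] / (H - 1) - 1
    samp = torch.stack([tgt_x, tgt_y], dim=-1)        # (B, H, W, 2)
    warped_bc = F.grid_sample(flow_bc, samp, align_corners=True)
    return flow_ab + warped_bc
```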
arXiv Detail & Related papers (2023-08-25T01:51:26Z)
- Deep Equilibrium Optical Flow Estimation [80.80992684796566]
Recent state-of-the-art (SOTA) optical flow models use finite-step recurrent update operations to emulate traditional algorithms.
These RNNs impose large computation and memory overheads, and are not directly trained to model such stable estimation.
We propose deep equilibrium (DEQ) flow estimators, an approach that directly solves for the flow as the infinite-level fixed point of an implicit layer.
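A minimal DEQ-style solver makes the contrast with unrolled recurrent updates concrete: the fixed point is found without building a graph, and gradients are recovered from a single differentiable step (a common cheap approximation to the implicit-function-theorem gradient):

```python
import torch

def deq_fixed_point(f, x, z0, max_iter=50, tol=1e-4):
    """Solve z* = f(z*, x) directly instead of unrolling finite-step
    recurrent updates. The solve runs without autograd; one
    differentiable application of f re-attaches z* to the graph."""
    with torch.no_grad():
        z = z0
        for _ in range(max_iter):
            z_next = f(z, x)
            if (z_next - z).norm() < tol * (z.norm() + 1e-8):
                z = z_next
                break
            z = z_next
    return f(z, x)   # one differentiable step for the backward pass
```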
arXiv Detail & Related papers (2022-04-18T17:53:44Z)
- GMFlow: Learning Optical Flow via Global Matching [124.57850500778277]
We propose a GMFlow framework for learning optical flow estimation.
It consists of three main components: a customized Transformer for feature enhancement, a correlation and softmax layer for global feature matching, and a self-attention layer for flow propagation.
Our new framework outperforms 32-iteration RAFT on the challenging Sintel benchmark.
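The global-matching step can be sketched as a softmax over an all-pairs correlation volume, with flow read off as the expected displacement; this is an illustrative reconstruction from the abstract, not the paper's code:

```python
import torch

def global_matching_flow(feat0, feat1):
    """Global matching (illustrative): correlate every source pixel with
    all target pixels, softmax into a matching distribution, and read
    off flow as the expected displacement.

    feat0, feat1: (B, C, H, W) feature maps (e.g., Transformer-enhanced)."""
    B, C, H, W = feat0.shape
    f0 = feat0.flatten(2).transpose(1, 2)              # (B, HW, C)
    f1 = feat1.flatten(2)                              # (B, C, HW)
    prob = (torch.bmm(f0, f1) / C ** 0.5).softmax(-1)  # (B, HW, HW)
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    coords = torch.stack([xs, ys], -1).float().view(1, H * W, 2)
    matched = torch.matmul(prob, coords)               # expected target coords
    return (matched - coords).transpose(1, 2).reshape(B, 2, H, W)
```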
arXiv Detail & Related papers (2021-11-26T18:59:56Z)
- Self Normalizing Flows [65.73510214694987]
We propose a flexible framework for training normalizing flows by replacing expensive terms in the gradient by learned approximate inverses at each layer.
This reduces the computational complexity of each layer's exact update from $\mathcal{O}(D^3)$ to $\mathcal{O}(D^2)$.
We show experimentally that such models are remarkably stable and optimize to similar data likelihood values as their exact gradient counterparts.
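The trick can be illustrated on a single linear flow layer: a second matrix is trained by reconstruction to approximate the inverse, so its transpose can stand in for the $\mathcal{O}(D^3)$ term $W^{-\top}$ in the gradient of $\log|\det W|$ at $\mathcal{O}(D^2)$ cost. A hedged sketch:

```python
import torch
import torch.nn as nn

class SelfNormLinear(nn.Module):
    """Linear flow layer with a learned approximate inverse R ~= W^{-1}
    (illustrative sketch). Keeping R accurate via a reconstruction loss
    lets R^T replace the O(D^3) term W^{-T} in the gradient of
    log|det W|, so each update costs only O(D^2)."""
    def __init__(self, d):
        super().__init__()
        self.W = nn.Linear(d, d, bias=False)
        self.R = nn.Linear(d, d, bias=False)   # learned approximate inverse

    def forward(self, x):
        return self.W(x)

    def recon_loss(self, x):
        # drives R toward W^{-1}: || R(W x) - x ||^2
        return ((self.R(self.W(x)) - x) ** 2).mean()
```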
arXiv Detail & Related papers (2020-11-14T09:51:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.