Controllable Flow Matching for Online Reinforcement Learning
- URL: http://arxiv.org/abs/2511.06816v1
- Date: Mon, 10 Nov 2025 08:01:20 GMT
- Title: Controllable Flow Matching for Online Reinforcement Learning
- Authors: Bin Wang, Boxiang Tao, Haifeng Jing, Hongbo Dou, Zijian Wang,
- Abstract summary: We propose CtrlFlow, a trajectory-level synthetic method using conditional flow matching (CFM)<n>Our method ensures optimal trajectory sampling by minimizing the control energy governed by the non-linear Controllability Gramian Matrix.<n>In online settings, CtrlFlow demonstrates the better performance on common MuJoCo benchmark tasks than dynamics models.
- Score: 5.944099401274571
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model-based reinforcement learning (MBRL) typically relies on modeling environment dynamics for data efficiency. However, due to the accumulation of model errors over long-horizon rollouts, such methods often face challenges in maintaining modeling stability. To address this, we propose CtrlFlow, a trajectory-level synthetic method using conditional flow matching (CFM), which directly modeling the distribution of trajectories from initial states to high-return terminal states without explicitly modeling the environment transition function. Our method ensures optimal trajectory sampling by minimizing the control energy governed by the non-linear Controllability Gramian Matrix, while the generated diverse trajectory data significantly enhances the robustness and cross-task generalization of policy learning. In online settings, CtrlFlow demonstrates the better performance on common MuJoCo benchmark tasks than dynamics models and achieves superior sample efficiency compared to standard MBRL methods.
Related papers
- Solving Inverse Problems with Flow-based Models via Model Predictive Control [41.551726534223704]
MPC-Flow is a model predictive control framework that formulates inverse problem solving with flow-based generative models as a sequence of control sub-problems.<n>We show how different algorithmic choices yield a spectrum of guidance algorithms, including regimes that avoid backpropagation through the generative model trajectory.
arXiv Detail & Related papers (2026-01-30T17:59:09Z) - Model-Based Diffusion Sampling for Predictive Control in Offline Decision Making [48.998030470623384]
offline decision-making requires reliable behaviors from fixed datasets without further interaction.<n>We propose a compositional model-based diffusion framework consisting of: (i) a planner that generates diverse, task-aligned trajectories; (ii) a dynamics model that enforces consistency with the underlying system dynamics; and (iii) a ranker module that selects behaviors aligned with the task objectives.
arXiv Detail & Related papers (2025-12-09T06:26:02Z) - DAWM: Diffusion Action World Models for Offline Reinforcement Learning via Action-Inferred Transitions [6.723690093335988]
We propose a diffusion-based world model that generates future state-reward trajectories conditioned on the current state, action, and return-to-go.<n>We show that conservative offline RL algorithms such as TD3BC and IQL benefit significantly from training on these augmented trajectories.
arXiv Detail & Related papers (2025-09-23T20:06:26Z) - Sample-Efficient Reinforcement Learning of Koopman eNMPC [42.72938925647165]
Reinforcement learning can be used to tune data-driven (economic) nonlinear model predictive controllers ((e)NMPCs) for optimal performance in a specific control task.<n>We combine a model-based RL algorithm with our published method that turns Koopman (e)NMPCs into automatically differentiable policies.
arXiv Detail & Related papers (2025-03-24T15:35:16Z) - Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models [79.2162092822111]
We systematically evaluate reinforcement learning (RL) and control-based methods on a suite of navigation tasks.<n>We employ a latent dynamics model using the Joint Embedding Predictive Architecture (JEPA) and employ it for planning.<n>Our results show that model-free RL benefits most from large amounts of high-quality data, whereas model-based planning generalizes better to unseen layouts.
arXiv Detail & Related papers (2025-02-20T18:39:41Z) - Model-Based Offline Reinforcement Learning with Reliability-Guaranteed Sequence Modeling [0.0]
Model-based offline reinforcement learning (MORL) aims to learn a policy by exploiting a dynamics model derived from an existing dataset.<n>We propose a new MORL algorithm textbfReliability-guaranteed textbfTransformer (RT), which can eliminate unreliable trajectories.
arXiv Detail & Related papers (2025-02-10T14:08:55Z) - Understanding Reinforcement Learning-Based Fine-Tuning of Diffusion Models: A Tutorial and Review [63.31328039424469]
This tutorial provides a comprehensive survey of methods for fine-tuning diffusion models to optimize downstream reward functions.
We explain the application of various RL algorithms, including PPO, differentiable optimization, reward-weighted MLE, value-weighted sampling, and path consistency learning.
arXiv Detail & Related papers (2024-07-18T17:35:32Z) - Q-value Regularized Transformer for Offline Reinforcement Learning [70.13643741130899]
We propose a Q-value regularized Transformer (QT) to enhance the state-of-the-art in offline reinforcement learning (RL)
QT learns an action-value function and integrates a term maximizing action-values into the training loss of Conditional Sequence Modeling (CSM)
Empirical evaluations on D4RL benchmark datasets demonstrate the superiority of QT over traditional DP and CSM methods.
arXiv Detail & Related papers (2024-05-27T12:12:39Z) - Environment Transformer and Policy Optimization for Model-Based Offline
Reinforcement Learning [25.684201757101267]
We propose an uncertainty-aware sequence modeling architecture called Environment Transformer.
Benefiting from the accurate modeling of the transition dynamics and reward function, Environment Transformer can be combined with arbitrary planning, dynamics programming, or policy optimization algorithms for offline RL.
arXiv Detail & Related papers (2023-03-07T11:26:09Z) - Improving and generalizing flow-based generative models with minibatch
optimal transport [90.01613198337833]
We introduce the generalized conditional flow matching (CFM) technique for continuous normalizing flows (CNFs)
CFM features a stable regression objective like that used to train the flow in diffusion models but enjoys the efficient inference of deterministic flow models.
A variant of our objective is optimal transport CFM (OT-CFM), which creates simpler flows that are more stable to train and lead to faster inference.
arXiv Detail & Related papers (2023-02-01T14:47:17Z) - When to Update Your Model: Constrained Model-based Reinforcement
Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL)
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefit the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z) - Autoregressive Dynamics Models for Offline Policy Evaluation and
Optimization [60.73540999409032]
We show that expressive autoregressive dynamics models generate different dimensions of the next state and reward sequentially conditioned on previous dimensions.
We also show that autoregressive dynamics models are useful for offline policy optimization by serving as a way to enrich the replay buffer.
arXiv Detail & Related papers (2021-04-28T16:48:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.