Model-Based Diffusion Sampling for Predictive Control in Offline Decision Making
- URL: http://arxiv.org/abs/2512.08280v1
- Date: Tue, 09 Dec 2025 06:26:02 GMT
- Title: Model-Based Diffusion Sampling for Predictive Control in Offline Decision Making
- Authors: Haldun Balim, Na Li, Yilun Du
- Abstract summary: Offline decision-making requires synthesizing reliable behaviors from fixed datasets without further interaction. We propose a compositional model-based diffusion framework consisting of: (i) a planner that generates diverse, task-aligned trajectories; (ii) a dynamics model that enforces consistency with the underlying system dynamics; and (iii) a ranker module that selects behaviors aligned with the task objectives.
- Score: 48.998030470623384
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline decision-making requires synthesizing reliable behaviors from fixed datasets without further interaction, yet existing generative approaches often yield trajectories that are dynamically infeasible. We propose Model Predictive Diffuser (MPDiffuser), a compositional model-based diffusion framework consisting of: (i) a planner that generates diverse, task-aligned trajectories; (ii) a dynamics model that enforces consistency with the underlying system dynamics; and (iii) a ranker module that selects behaviors aligned with the task objectives. MPDiffuser employs an alternating diffusion sampling scheme, where planner and dynamics updates are interleaved to progressively refine trajectories for both task alignment and feasibility during the sampling process. We also provide a theoretical rationale for this procedure, showing how it balances fidelity to data priors with dynamics consistency. Empirically, the compositional design improves sample efficiency, as it leverages even low-quality data for dynamics learning and adapts seamlessly to novel dynamics. We evaluate MPDiffuser on both unconstrained (D4RL) and constrained (DSRL) offline decision-making benchmarks, demonstrating consistent gains over existing approaches. Furthermore, we present a preliminary study extending MPDiffuser to vision-based control tasks, showing its potential to scale to high-dimensional sensory inputs. Finally, we deploy our method on a real quadrupedal robot, showcasing its practicality for real-world control.
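The abstract's alternating sampling scheme can be sketched in pseudocode-style Python. This is an illustrative reconstruction from the abstract alone, not the paper's implementation: the function names (`planner_denoise`, `dynamics_denoise`, `rank`) and the simple interleaving loop are assumptions about how the three modules compose.

```python
import numpy as np

def alternating_diffusion_sampling(planner_denoise, dynamics_denoise, rank,
                                   horizon, dim, num_steps=50, num_samples=8,
                                   rng=None):
    """Hedged sketch of the alternating sampling scheme described in the
    abstract: planner and dynamics denoising updates are interleaved so each
    candidate trajectory is refined for both task alignment and feasibility,
    and a ranker selects among the final candidates."""
    rng = rng or np.random.default_rng(0)
    # Start each candidate trajectory from Gaussian noise, as in standard diffusion.
    trajs = rng.standard_normal((num_samples, horizon, dim))
    for t in reversed(range(num_steps)):
        # (i) planner update: pull samples toward diverse, task-aligned behaviors
        trajs = planner_denoise(trajs, t)
        # (ii) dynamics update: pull samples toward dynamically consistent transitions
        trajs = dynamics_denoise(trajs, t)
    # (iii) ranker: select the candidate best aligned with the task objective
    scores = np.array([rank(traj) for traj in trajs])
    return trajs[int(np.argmax(scores))]
```

With stand-in denoisers (e.g. simple contractions toward a data mean), the loop returns a single trajectory of shape `(horizon, dim)`; the real method would use learned score networks for both modules.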
Related papers
- State-Action Inpainting Diffuser for Continuous Control with Delay [28.10905055038984]
State-Action Inpainting Diffuser (SAID) is a framework that integrates the inductive bias of dynamics learning with the direct decision-making capability of policy optimization.
Our study suggests a new methodology to advance the field of continuous control and reinforcement learning with delay.
arXiv Detail & Related papers (2026-03-02T07:28:27Z)
- Consistent World Models via Foresight Diffusion [56.45012929930605]
We argue that a key bottleneck in learning consistent diffusion-based world models lies in the suboptimal predictive ability.
We propose Foresight Diffusion (ForeDiff), a diffusion-based world modeling framework that enhances consistency by decoupling condition understanding from target denoising.
arXiv Detail & Related papers (2025-05-22T10:01:59Z)
- Latent Diffusion Planning for Imitation Learning [78.56207566743154]
Latent Diffusion Planning (LDP) is a modular approach consisting of a planner and inverse dynamics model.
By separating planning from action prediction, LDP can benefit from the denser supervision signals of suboptimal and action-free data.
On simulated visual robotic manipulation tasks, LDP outperforms state-of-the-art imitation learning approaches.
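The planner / inverse-dynamics split described above can be illustrated with a minimal sketch. This is not LDP's actual implementation; the interfaces (`denoise_states`, `inverse_dynamics`) and the inpainting of the known current state are assumptions about how such a decomposition typically works: the planner denoises a state sequence, and a separate inverse dynamics model recovers the action between each pair of consecutive states.

```python
import numpy as np

def plan_actions(denoise_states, inverse_dynamics, start_state,
                 horizon, num_steps=30, rng=None):
    """Hypothetical sketch of a planner + inverse-dynamics decomposition:
    a diffusion-style planner refines a state sequence from noise, then an
    inverse dynamics model maps consecutive state pairs to actions."""
    rng = rng or np.random.default_rng(0)
    dim = start_state.shape[0]
    # Initialize the state sequence from Gaussian noise.
    states = rng.standard_normal((horizon, dim))
    for t in reversed(range(num_steps)):
        states = denoise_states(states, t)
        states[0] = start_state  # inpaint the known current state each step
    # Recover one action per consecutive state pair.
    actions = [inverse_dynamics(states[i], states[i + 1])
               for i in range(horizon - 1)]
    return states, np.stack(actions)
```

Because the planner only predicts states, it can be trained on action-free trajectories, while the inverse dynamics model needs state-action data but no reward labels; this is the supervision split the blurb refers to.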
arXiv Detail & Related papers (2025-04-23T17:53:34Z)
- Controllable Motion Generation via Diffusion Modal Coupling [19.534234002173314]
We propose a novel framework that enhances controllability in diffusion models by leveraging multi-modal prior distributions.
We evaluate our approach on motion prediction using a dataset and multi-task control in Maze2D environments.
arXiv Detail & Related papers (2025-03-04T07:22:34Z)
- Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models [79.2162092822111]
We systematically evaluate reinforcement learning (RL) and control-based methods on a suite of navigation tasks.
We employ a latent dynamics model using the Joint Embedding Predictive Architecture (JEPA) and employ it for planning.
Our results show that model-free RL benefits most from large amounts of high-quality data, whereas model-based planning generalizes better to unseen layouts.
arXiv Detail & Related papers (2025-02-20T18:39:41Z)
- FlowDAS: A Stochastic Interpolant-based Framework for Data Assimilation [15.64941169350615]
Data assimilation (DA) integrates observations with a dynamical model to estimate states of PDE-governed systems.
FlowDAS is a generative DA framework that uses stochastic interpolants to learn state transition dynamics.
We show that FlowDAS surpasses model-driven methods, neural operators, and score-based baselines in accuracy and physical plausibility.
arXiv Detail & Related papers (2025-01-13T05:03:41Z)
- Off-dynamics Conditional Diffusion Planners [15.321049697197447]
This work explores the use of more readily available, albeit off-dynamics datasets, to address the challenge of data scarcity in Offline RL.
We propose a novel approach using conditional Diffusion Probabilistic Models (DPMs) to learn the joint distribution of the large-scale off-dynamics dataset and the limited target dataset.
arXiv Detail & Related papers (2024-10-16T04:56:43Z)
- MetaDiffuser: Diffusion Model as Conditional Planner for Offline Meta-RL [25.76141096396645]
We propose a task-oriented conditioned diffusion planner for offline meta-RL (MetaDiffuser).
The proposed framework enjoys the robustness to the quality of collected warm-start data from the testing task.
Experiment results on MuJoCo benchmarks show that MetaDiffuser outperforms other strong offline meta-RL baselines.
arXiv Detail & Related papers (2023-05-31T15:01:38Z)
- Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of latent variable models for state-action value functions, which allows both a tractable variational learning algorithm and an effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z)
- Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization [60.73540999409032]
We show that expressive autoregressive dynamics models can generate each dimension of the next state and reward sequentially, conditioned on the previously generated dimensions.
We also show that autoregressive dynamics models are useful for offline policy optimization by serving as a way to enrich the replay buffer.
arXiv Detail & Related papers (2021-04-28T16:48:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.