Manifold-Constrained Energy-Based Transition Models for Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2602.02900v1
- Date: Mon, 02 Feb 2026 23:15:43 GMT
- Title: Manifold-Constrained Energy-Based Transition Models for Offline Reinforcement Learning
- Authors: Zeyu Fang, Zuyuan Zhang, Mahdi Imani, Tian Lan,
- Abstract summary: We train conditional energy-based transition models using a manifold projection--diffusion negative sampler.<n>MC-ETM learns a latent manifold of next states and generates near-manifold hard negatives.<n>We formalize MC-ETM through a hybrid pessimistic MDP formulation and derive a conservative performance bound separating in-support evaluation error from truncation risk.
- Score: 13.92596311376194
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model-based offline reinforcement learning is brittle under distribution shift: policy improvement drives rollouts into state--action regions weakly supported by the dataset, where compounding model error yields severe value overestimation. We propose Manifold-Constrained Energy-based Transition Models (MC-ETM), which train conditional energy-based transition models using a manifold projection--diffusion negative sampler. MC-ETM learns a latent manifold of next states and generates near-manifold hard negatives by perturbing latent codes and running Langevin dynamics in latent space with the learned conditional energy, sharpening the energy landscape around the dataset support and improving sensitivity to subtle out-of-distribution deviations. For policy optimization, the learned energy provides a single reliability signal: rollouts are truncated when the minimum energy over sampled next states exceeds a threshold, and Bellman backups are stabilized via pessimistic penalties based on Q-value-level dispersion across energy-guided samples. We formalize MC-ETM through a hybrid pessimistic MDP formulation and derive a conservative performance bound separating in-support evaluation error from truncation risk. Empirically, MC-ETM improves multi-step dynamics fidelity and yields higher normalized returns on standard offline control benchmarks, particularly under irregular dynamics and sparse data coverage.
Related papers
- Out-of-distribution transfer of PDE foundation models to material dynamics under extreme loading [86.6550968435969]
Most PDE foundation models are pretrained and fine-tuned on fluid-centric benchmarks.<n>We benchmark out-of-distribution transfer on two discontinuity-dominated regimes in which shocks, evolving interfaces, and fracture produce highly non-smooth fields.<n>We evaluate two open-source PDE foundation models, POSEIDON and MORPH, and compare fine-tuning from pretrained weights against training from scratch across training-set sizes to quantify sample efficiency under distribution shift.
arXiv Detail & Related papers (2026-03-04T18:19:35Z) - Energy-Guided Flow Matching Enables Few-Step Conformer Generation and Ground-State Identification [45.52894539097255]
We present EnFlow, a unified framework that couples flow matching with an explicitly learned energy model.<n>By incorporating energy-gradient guidance during sampling, our method steers trajectories toward lower-energy regions.<n>The learned energy function further enables efficient energy-based ranking of generated ensembles for accurate ground-state identification.
arXiv Detail & Related papers (2025-12-27T14:00:22Z) - MaP: A Unified Framework for Reliable Evaluation of Pre-training Dynamics [72.00014675808228]
Instability in Large Language Models evaluation process obscures true learning dynamics.<n>We introduce textbfMaP, a framework that integrates underlineMerging underlineand the underlinePass@k metric.<n>Experiments show that MaP yields significantly smoother performance curves, reduces inter-run variance, and ensures more consistent rankings.
arXiv Detail & Related papers (2025-10-10T11:40:27Z) - Drift No More? Context Equilibria in Multi-Turn LLM Interactions [58.69551510148673]
contexts drift is the gradual divergence of a model's outputs from goal-consistent behavior across turns.<n>Unlike single-turn errors, drift unfolds temporally and is poorly captured by static evaluation metrics.<n>We show that multi-turn drift can be understood as a controllable equilibrium phenomenon rather than as inevitable decay.
arXiv Detail & Related papers (2025-10-09T04:48:49Z) - Metriplectic Conditional Flow Matching for Dissipative Dynamics [5.920407670799846]
conditional flow matching learns dissipative dynamics without violating first principles.<n>MCFM trains via conditional flow matching on short transitions, avoiding long rollout adjoints.<n>We provide continuous and discrete time guarantees linking this parameterization and sampler to conservation, monotonic dissipation, and stable rollouts.
arXiv Detail & Related papers (2025-09-23T19:46:54Z) - Kolmogorov-Arnold Energy Models: Fast and Interpretable Generative Modeling [0.0]
We introduce the Kolmogorov-Arnold Energy Model (KAEM) to take advantage of structural and inductive biases.<n> KAEM balances common generative modeling trade-offs, offering fast inference, interpretability, and stable training, while being naturally suited to Zettascale Computing hardware.
arXiv Detail & Related papers (2025-06-17T04:07:32Z) - Energy Matching: Unifying Flow Matching and Energy-Based Models for Generative Modeling [5.787270247104665]
Current state-of-the-art generative models map noise to data distributions by matching flows or scores.<n>Here, we propose Energy Matching, a framework that endows flow-based approaches with the flexibility of EBMs.
arXiv Detail & Related papers (2025-04-14T18:10:58Z) - Learning Energy-Based Prior Model with Diffusion-Amortized MCMC [89.95629196907082]
Common practice of learning latent space EBMs with non-convergent short-run MCMC for prior and posterior sampling is hindering the model from further progress.
We introduce a simple but effective diffusion-based amortization method for long-run MCMC sampling and develop a novel learning algorithm for the latent space EBM based on it.
arXiv Detail & Related papers (2023-10-05T00:23:34Z) - Autoregressive Dynamics Models for Offline Policy Evaluation and
Optimization [60.73540999409032]
We show that expressive autoregressive dynamics models generate different dimensions of the next state and reward sequentially conditioned on previous dimensions.
We also show that autoregressive dynamics models are useful for offline policy optimization by serving as a way to enrich the replay buffer.
arXiv Detail & Related papers (2021-04-28T16:48:44Z) - No MCMC for me: Amortized sampling for fast and stable training of
energy-based models [62.1234885852552]
Energy-Based Models (EBMs) present a flexible and appealing way to represent uncertainty.
We present a simple method for training EBMs at scale using an entropy-regularized generator to amortize the MCMC sampling.
Next, we apply our estimator to the recently proposed Joint Energy Model (JEM), where we match the original performance with faster and stable training.
arXiv Detail & Related papers (2020-10-08T19:17:20Z) - Training Deep Energy-Based Models with f-Divergence Minimization [113.97274898282343]
Deep energy-based models (EBMs) are very flexible in distribution parametrization but computationally challenging.
We propose a general variational framework termed f-EBM to train EBMs using any desired f-divergence.
Experimental results demonstrate the superiority of f-EBM over contrastive divergence, as well as the benefits of training EBMs using f-divergences other than KL.
arXiv Detail & Related papers (2020-03-06T23:11:13Z) - Targeted free energy estimation via learned mappings [66.20146549150475]
Free energy perturbation (FEP) was proposed by Zwanzig more than six decades ago as a method to estimate free energy differences.
FEP suffers from a severe limitation: the requirement of sufficient overlap between distributions.
One strategy to mitigate this problem, called Targeted Free Energy Perturbation, uses a high-dimensional mapping in configuration space to increase overlap.
arXiv Detail & Related papers (2020-02-12T11:10:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.