Related papers: Improving Diffusion Planners by Self-Supervised Action Gating with Energies

URL: http://arxiv.org/abs/2603.02650v1
Date: Tue, 03 Mar 2026 06:36:16 GMT
Title: Improving Diffusion Planners by Self-Supervised Action Gating with Energies
Authors: Yuan Lu, Dongqi Han, Yansen Wang, Dongsheng Li
Abstract summary: We propose Self-supervised Action Gating with Energies (SAGE) to penalise dynamically inconsistent plans using a latent consistency signal. SAGE trains a Joint-Embedding Predictive Architecture (JEPA) encoder on offline state sequences and an action-conditioned latent predictor for short horizon transitions. At test time, SAGE assigns each sampled candidate an energy given by its latent prediction error and combines this feasibility score with value estimates to select actions.
Score: 31.430422680816907
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Diffusion planners are a strong approach for offline reinforcement learning, but they can fail when value-guided selection favours trajectories that score well yet are locally inconsistent with the environment dynamics, resulting in brittle execution. We propose Self-supervised Action Gating with Energies (SAGE), an inference-time re-ranking method that penalises dynamically inconsistent plans using a latent consistency signal. SAGE trains a Joint-Embedding Predictive Architecture (JEPA) encoder on offline state sequences and an action-conditioned latent predictor for short horizon transitions. At test time, SAGE assigns each sampled candidate an energy given by its latent prediction error and combines this feasibility score with value estimates to select actions. SAGE can integrate into existing diffusion planning pipelines that can sample trajectories and select actions via value scoring; it requires no environment rollouts and no policy re-training. Across locomotion, navigation, and manipulation benchmarks, SAGE improves the performance and robustness of diffusion planners.
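Illustrative sketch (not from the paper): the test-time re-ranking described in the abstract could look roughly like the Python below. Everything here is an assumption made for illustration: the names latent_energy, select_plan, toy_encode and toy_predict, the mean squared error as the energy, and the linear rule "value minus penalty_weight times energy" are all hypothetical. The abstract only states that the latent prediction error is combined with value estimates, not how.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# A candidate plan: states s_0..s_H and actions a_0..a_{H-1} (hypothetical layout).
Plan = Tuple[List[Vector], List[Vector]]


def latent_energy(
    encode: Callable[[Vector], Vector],
    predict: Callable[[Vector, Vector], Vector],
    plan: Plan,
) -> float:
    """Mean squared latent prediction error over the plan's transitions."""
    states, actions = plan
    latents = [encode(s) for s in states]
    total = 0.0
    for z_t, a_t, z_next in zip(latents[:-1], actions, latents[1:]):
        z_hat = predict(z_t, a_t)  # predicted next latent
        total += sum((p - q) ** 2 for p, q in zip(z_hat, z_next))
    return total / len(actions)


def select_plan(
    plans: Sequence[Plan],
    values: Sequence[float],
    encode: Callable[[Vector], Vector],
    predict: Callable[[Vector, Vector], Vector],
    penalty_weight: float = 1.0,
) -> Tuple[int, List[float]]:
    """Pick the plan maximising value - penalty_weight * energy (assumed rule)."""
    energies = [latent_energy(encode, predict, p) for p in plans]
    scores = [v - penalty_weight * e for v, e in zip(values, energies)]
    best = max(range(len(plans)), key=scores.__getitem__)
    return best, energies


# Toy stand-ins for the learned encoder and latent predictor (assumptions):
def toy_encode(state: Vector) -> Vector:
    return list(state)  # identity "encoder"


def toy_predict(latent: Vector, action: Vector) -> Vector:
    return [z + a for z, a in zip(latent, action)]  # additive toy dynamics


if __name__ == "__main__":
    consistent = ([[0.0], [1.0], [2.0]], [[1.0], [1.0]])
    inconsistent = ([[0.0], [3.0], [3.0]], [[1.0], [1.0]])
    best, energies = select_plan(
        [consistent, inconsistent], [1.0, 2.0], toy_encode, toy_predict
    )
    print(best, energies)
```

In this toy setup the second plan has the higher value estimate but violates the toy dynamics, so the energy penalty moves the selection to the first plan. With penalty_weight set to 0 the selection falls back to pure value ranking.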

Related papers

TopoCurate: Modeling Interaction Topology for Tool-Use Agent Training [53.93696896939915]
Training tool-use agents typically relies on Supervised Fine-Tuning (SFT) on successful trajectories and Reinforcement Learning (RL) on pass-rate-selected tasks. We propose TopoCurate, an interaction-aware framework that projects multi-trial rollouts from the same task into a unified semantic quotient topology. TopoCurate achieves consistent gains of 4.2% (SFT) and 6.9% (RL) over state-of-the-art baselines.
arXiv Detail & Related papers (2026-03-02T10:38:54Z)
Planning as Descent: Goal-Conditioned Latent Trajectory Synthesis in Learned Energy Landscapes [0.8703455323398351]
Planning as Descent (PaD) is a framework for offline goal-conditioned reinforcement learning. PaD learns a goal-conditioned energy function over entire latent trajectories, assigning low energy to feasible, goal-consistent futures. Our results suggest learning to evaluate and refine trajectories provides a robust alternative to direct policy learning.
arXiv Detail & Related papers (2025-12-19T17:49:13Z)
Model-Based Diffusion Sampling for Predictive Control in Offline Decision Making [48.998030470623384]
Offline decision-making requires reliable behaviors from fixed datasets without further interaction. We propose a compositional model-based diffusion framework consisting of: (i) a planner that generates diverse, task-aligned trajectories; (ii) a dynamics model that enforces consistency with the underlying system dynamics; and (iii) a ranker module that selects behaviors aligned with the task objectives.
arXiv Detail & Related papers (2025-12-09T06:26:02Z)
Bayesian Active Inference for Intelligent UAV Anti-Jamming and Adaptive Trajectory Planning [5.620125209890186]
This paper proposes a hierarchical trajectory planning framework for UAVs operating under adversarial jamming conditions. The approach combines expert-generated demonstrations with probabilistic generative modeling to encode high-level symbolic planning, low-level motion policies, and wireless signal feedback. During deployment, the UAV performs online inference to anticipate interference, localize jammers, and adapt its trajectory accordingly, without prior knowledge of jammer locations.
arXiv Detail & Related papers (2025-12-05T13:38:52Z)
The Cognitive Bandwidth Bottleneck: Shifting Long-Horizon Agent from Planning with Actions to Planning with Schemas [56.62286434195321]
This paper systematically studies the effectiveness of two different action representations. We propose the cognitive bandwidth perspective as a conceptual framework to qualitatively understand the differences. We provide an actionable guide for building more capable PwS agents for better scalable autonomy.
arXiv Detail & Related papers (2025-10-08T14:47:40Z)
TD-JEPA: Latent-predictive Representations for Zero-Shot Reinforcement Learning [63.73629127832652]
We introduce TD-JEPA, which leverages TD-based latent-predictive representations into unsupervised RL. TD-JEPA trains explicit state and task encoders, a policy-conditioned multi-step predictor, and a set of parameterized policies directly in latent space. Empirically, TD-JEPA matches or outperforms state-of-the-art baselines on locomotion, navigation, and manipulation tasks across 13 datasets.
arXiv Detail & Related papers (2025-10-01T10:21:18Z)
Predictive Planner for Autonomous Driving with Consistency Models [5.966385886363771]
Trajectory prediction and planning are essential for autonomous vehicles to navigate safely and efficiently in dynamic environments. Recent diffusion-based generative models have shown promise in multi-agent trajectory generation, but their slow sampling is less suitable for high-frequency planning tasks. We leverage the consistency model to build a predictive planner that samples from a joint distribution of ego and surrounding agents, conditioned on the ego vehicle's navigational goal.
arXiv Detail & Related papers (2025-02-12T00:26:01Z)
Resisting Stochastic Risks in Diffusion Planners with the Trajectory Aggregation Tree [20.855596726996712]
Trajectory Aggregation Tree (TAT) is a dynamic tree-like structure based on historical and current trajectories. TAT can be deployed without modifying the original training and sampling pipelines of diffusion planners, making it a training-free, ready-to-deploy solution. Our results highlight its remarkable ability to resist the risk from unreliable trajectories, guarantee the performance boosting of diffusion planners in $100\%$ of tasks, and exhibit an appreciable tolerance margin for sample quality, thereby enabling planning with a more than $3\times$ acceleration.
arXiv Detail & Related papers (2024-05-28T06:57:22Z)
Time-series Generation by Contrastive Imitation [87.51882102248395]
We study a generative framework that seeks to combine the strengths of both: Motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) transition policy. At inference, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality.
arXiv Detail & Related papers (2023-11-02T16:45:25Z)
Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDP). The novelty is to design an embedded product MDP (EP-MDP) between the LDGBA and the MDP. The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states.
arXiv Detail & Related papers (2021-02-24T01:11:25Z)
Active Inference and Behavior Trees for Reactive Action Planning and Execution in Robotics [2.040132783511305]
We propose a hybrid combination of active inference and behavior trees (BTs) for reactive action planning and execution in dynamic environments. The proposed approach allows handling partially observable initial states and improves the robustness of classical BTs against unexpected contingencies.
arXiv Detail & Related papers (2020-11-19T10:24:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.