Structural Information-based Hierarchical Diffusion for Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2509.21942v1
- Date: Fri, 26 Sep 2025 06:24:06 GMT
- Title: Structural Information-based Hierarchical Diffusion for Offline Reinforcement Learning
- Authors: Xianghua Zeng, Hao Peng, Angsheng Li, Yicheng Pan,
- Abstract summary: We propose Structural Information-based Hierarchical Diffusion framework for effective and stable offline policy learning.<n>We analyze structural information embedded in offline trajectories to construct the diffusion hierarchy adaptively.<n>We show that SIHD significantly outperforms state-of-the-art baselines in decision-making performance.
- Score: 13.839214658191038
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion-based generative methods have shown promising potential for modeling trajectories from offline reinforcement learning (RL) datasets, and hierarchical diffusion has been introduced to mitigate variance accumulation and computational challenges in long-horizon planning tasks. However, existing approaches typically assume a fixed two-layer diffusion hierarchy with a single predefined temporal scale, which limits adaptability to diverse downstream tasks and reduces flexibility in decision making. In this work, we propose SIHD, a novel Structural Information-based Hierarchical Diffusion framework for effective and stable offline policy learning in long-horizon environments with sparse rewards. Specifically, we analyze structural information embedded in offline trajectories to construct the diffusion hierarchy adaptively, enabling flexible trajectory modeling across multiple temporal scales. Rather than relying on reward predictions from localized sub-trajectories, we quantify the structural information gain of each state community and use it as a conditioning signal within the corresponding diffusion layer. To reduce overreliance on offline datasets, we introduce a structural entropy regularizer that encourages exploration of underrepresented states while avoiding extrapolation errors from distributional shifts. Extensive evaluations on challenging offline RL tasks show that SIHD significantly outperforms state-of-the-art baselines in decision-making performance and demonstrates superior generalization across diverse scenarios.
Related papers
- Consolidation or Adaptation? PRISM: Disentangling SFT and RL Data via Gradient Concentration [56.074760766965085]
PRISM achieves a dynamics-aware framework that arbitrates data based on its degree of cognitive conflict with the model's existing knowledge.<n>Our findings suggest that disentangling data based on internal optimization regimes is crucial for scalable and robust agent alignment.
arXiv Detail & Related papers (2026-01-12T05:43:20Z) - Regularizing Subspace Redundancy of Low-Rank Adaptation [54.473090597164834]
We propose ReSoRA, a method that explicitly models redundancy between mapping subspaces and adaptively Regularizes Subspace redundancy of Low-Rank Adaptation.<n>Our proposed method consistently facilitates existing state-of-the-art PETL methods across various backbones and datasets in vision-language retrieval and standard visual classification benchmarks.<n>As a training supervision, ReSoRA can be seamlessly integrated into existing approaches in a plug-and-play manner, with no additional inference costs.
arXiv Detail & Related papers (2025-07-28T11:52:56Z) - RAD: Retrieval High-quality Demonstrations to Enhance Decision-making [23.136426643341462]
offline reinforcement learning (RL) enables agents to learn policies from fixed datasets.<n>RL is often limited by dataset sparsity and the lack of transition overlap between suboptimal and expert trajectories.<n>We propose Retrieval High-quAlity Demonstrations (RAD) for decision-making, which combines non-parametric retrieval with diffusion-based generative modeling.
arXiv Detail & Related papers (2025-07-21T08:08:18Z) - State-Covering Trajectory Stitching for Diffusion Planners [23.945423041112036]
State-Covering Trajectory Stitching (SCoTS) is a reward-free trajectory augmentation method that stitches together short trajectory segments.<n>We demonstrate that SCoTS significantly improves the performance and generalization capabilities of diffusion planners on offline goal-conditioned benchmarks.
arXiv Detail & Related papers (2025-06-01T08:32:22Z) - STRAP: Spatio-Temporal Pattern Retrieval for Out-of-Distribution Generalization [34.53308463024231]
We propose an innovative Spatio-Temporal Retrieval-Augmented Pattern Learning framework, STRAP.<n>During inference, STRAP retrieves relevant patterns from this library based on similarity to the current input and injects them into the model via a plug-and-play prompting mechanism.<n>Experiments across multiple real-world streaming graph datasets show that STRAP consistently outperforms state-of-the-art STGNN baselines on STOOD tasks.
arXiv Detail & Related papers (2025-05-26T06:11:05Z) - Spatial-Temporal-Spectral Unified Modeling for Remote Sensing Dense Prediction [20.1863553357121]
Current deep learning architectures for remote sensing are fundamentally rigid.<n>We introduce the Spatial-Temporal-Spectral Unified Network (STSUN) for unified modeling.<n> STSUN can adapt to input and output data with arbitrary spatial sizes, temporal lengths, and spectral bands.<n>It unifies various dense prediction tasks and diverse semantic class predictions.
arXiv Detail & Related papers (2025-05-18T07:39:17Z) - Latent Diffusion Planning for Imitation Learning [78.56207566743154]
Latent Diffusion Planning (LDP) is a modular approach consisting of a planner and inverse dynamics model.<n>By separating planning from action prediction, LDP can benefit from the denser supervision signals of suboptimal and action-free data.<n>On simulated visual robotic manipulation tasks, LDP outperforms state-of-the-art imitation learning approaches.
arXiv Detail & Related papers (2025-04-23T17:53:34Z) - Hybrid Recurrent Models Support Emergent Descriptions for Hierarchical Planning and Control [0.8749675983608172]
A class of hybrid state-space model known as recurrent switching linear dynamical systems (rSLDS) discovers meaningful behavioural units.
We propose that the rich representations formed by an rSLDS can provide useful abstractions for planning and control.
We present a novel hierarchical model-based algorithm inspired by Active Inference in which a discrete MDP sits above a low-level linear-quadratic controller.
arXiv Detail & Related papers (2024-08-20T16:02:54Z) - Causal prompting model-based offline reinforcement learning [16.95292725275873]
Model-based offline RL allows agents to fully utilise pre-collected datasets without requiring additional or unethical explorations.
Applying model-based offline RL to online systems presents challenges due to the highly suboptimal (noise-filled) and diverse nature of datasets generated by online systems.
We introduce the Causal Prompting Reinforcement Learning framework, designed for highly suboptimal and resource-constrained online scenarios.
arXiv Detail & Related papers (2024-06-03T07:28:57Z) - Simple Hierarchical Planning with Diffusion [54.48129192534653]
Diffusion-based generative methods have proven effective in modeling trajectories with offline datasets.
We introduce the Hierarchical diffuser, a fast, yet surprisingly effective planning method combining the advantages of hierarchical and diffusion-based planning.
Our model adopts a "jumpy" planning strategy at the higher level, which allows it to have a larger receptive field but at a lower computational cost.
arXiv Detail & Related papers (2024-01-05T05:28:40Z) - Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution iteration to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method by optimizing the sparse structure of a randomly network at each and tweaking unimportant weights with a small amount proportional to the magnitude scale on-the-fly.
arXiv Detail & Related papers (2023-03-16T21:06:13Z) - Edge-assisted Democratized Learning Towards Federated Analytics [67.44078999945722]
We show the hierarchical learning structure of the proposed edge-assisted democratized learning mechanism, namely Edge-DemLearn.
We also validate Edge-DemLearn as a flexible model training mechanism to build a distributed control and aggregation methodology in regions.
arXiv Detail & Related papers (2020-12-01T11:46:03Z) - Network Diffusions via Neural Mean-Field Dynamics [52.091487866968286]
We propose a novel learning framework for inference and estimation problems of diffusion on networks.
Our framework is derived from the Mori-Zwanzig formalism to obtain an exact evolution of the node infection probabilities.
Our approach is versatile and robust to variations of the underlying diffusion network models.
arXiv Detail & Related papers (2020-06-16T18:45:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.