Related papers: MAGE: Multi-scale Autoregressive Generation for Offline Reinforcement Learning

MAGE: Multi-scale Autoregressive Generation for Offline Reinforcement Learning

URL: http://arxiv.org/abs/2602.23770v1
Date: Fri, 27 Feb 2026 07:56:33 GMT
Title: MAGE: Multi-scale Autoregressive Generation for Offline Reinforcement Learning
Authors: Chenxing Lin, Xinhui Gao, Haipeng Zhang, Xinran Li, Haitao Wang, Songzhu Mei, Chenglu Wen, Weiquan Liu, Siqi Shen, Cheng Wang,
Abstract summary: We propose MAGE, a Multi-scale Autoregressive GEneration-based offline RL method.<n>MAGE incorporates a condition-guided multi-scale autoencoder to learn hierarchical trajectory representations.<n>Experiments show that MAGE successfully integrates multi-scale trajectory modeling with conditional guidance.
Score: 42.779100789823055
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generative models have gained significant traction in offline reinforcement learning (RL) due to their ability to model complex trajectory distributions. However, existing generation-based approaches still struggle with long-horizon tasks characterized by sparse rewards. Some hierarchical generation methods have been developed to mitigate this issue by decomposing the original problem into shorter-horizon subproblems using one policy and generating detailed actions with another. While effective, these methods often overlook the multi-scale temporal structure inherent in trajectories, resulting in suboptimal performance. To overcome these limitations, we propose MAGE, a Multi-scale Autoregressive GEneration-based offline RL method. MAGE incorporates a condition-guided multi-scale autoencoder to learn hierarchical trajectory representations, along with a multi-scale transformer that autoregressively generates trajectory representations from coarse to fine temporal scales. MAGE effectively captures temporal dependencies of trajectories at multiple resolutions. Additionally, a condition-guided decoder is employed to exert precise control over short-term behaviors. Extensive experiments on five offline RL benchmarks against fifteen baseline algorithms show that MAGE successfully integrates multi-scale trajectory modeling with conditional guidance, generating coherent and controllable trajectories in long-horizon sparse-reward settings.

Related papers

Parallel Diffusion Solver via Residual Dirichlet Policy Optimization [88.7827307535107]
Diffusion models (DMs) have achieved state-of-the-art generative performance but suffer from high sampling latency due to their sequential denoising nature.<n>Existing solver-based acceleration methods often face significant image quality degradation under a low-dimensional budget.<n>We propose the Ensemble Parallel Direction solver (dubbed as EPD-EPr), a novel ODE solver that mitigates these errors by incorporating multiple gradient parallel evaluations in each step.
arXiv Detail & Related papers (2025-12-28T05:48:55Z)
Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning [61.380634253724594]
Large-scale autoregressive models pretrained on next-token prediction and finetuned with reinforcement learning (RL)<n>We show that it is possible to overcome this problem by acting and exploring within the internal representations of an autoregressive model.
arXiv Detail & Related papers (2025-12-23T18:51:50Z)
Efficient Generative Transformer Operators For Million-Point PDEs [12.324265832276538]
ECHO is a transformer-operator framework for generating million-point PDE trajectories.<n>We demonstrate state-of-the-art performance on million-point simulations featuring complex, high-frequency dynamics, and long-term horizons.
arXiv Detail & Related papers (2025-12-04T16:46:48Z)
DAWM: Diffusion Action World Models for Offline Reinforcement Learning via Action-Inferred Transitions [6.723690093335988]
We propose a diffusion-based world model that generates future state-reward trajectories conditioned on the current state, action, and return-to-go.<n>We show that conservative offline RL algorithms such as TD3BC and IQL benefit significantly from training on these augmented trajectories.
arXiv Detail & Related papers (2025-09-23T20:06:26Z)
State-Covering Trajectory Stitching for Diffusion Planners [29.89423911968709]
State-Covering Trajectory Stitching (SCoTS) is a reward-free trajectory augmentation method that stitches together short trajectory segments.<n>We demonstrate that SCoTS significantly improves the performance and generalization capabilities of diffusion planners on offline goal-conditioned benchmarks.
arXiv Detail & Related papers (2025-06-01T08:32:22Z)
Extendable Long-Horizon Planning via Hierarchical Multiscale Diffusion [62.91968752955649]
This paper tackles a novel problem, extendable long-horizon planning-enabling agents to plan trajectories longer than those in training data without compounding errors.<n>We propose an augmentation method that iteratively generates longer trajectories by stitching shorter ones.<n>HM-Diffuser trains on these extended trajectories using a hierarchical structure, efficiently handling tasks across multiple temporal scales.
arXiv Detail & Related papers (2025-03-25T22:52:46Z)
Multi-Agent Path Finding in Continuous Spaces with Projected Diffusion Models [57.45019514036948]
Multi-Agent Path Finding (MAPF) is a fundamental problem in robotics.<n>This work proposes a novel approach that integrates constrained optimization with diffusion models for MAPF in continuous spaces.
arXiv Detail & Related papers (2024-12-23T21:27:19Z)
Navigation with QPHIL: Quantizing Planner for Hierarchical Implicit Q-Learning [17.760679318994384]
We present a novel hierarchical transformer-based approach leveraging a learned quantizer of the space. This quantization enables the training of a simpler zone-conditioned low-level policy and simplifies planning. Our proposed approach achieves state-of-the-art results in complex long-distance navigation environments.
arXiv Detail & Related papers (2024-11-12T12:49:41Z)
Haar Wavelet based Block Autoregressive Flows for Trajectories [129.37479472754083]
Prediction of trajectories such as that of pedestrians is crucial to the performance of autonomous agents. We introduce a novel Haar wavelet based block autoregressive model leveraging split couplings. We illustrate the advantages of our approach for generating diverse and accurate trajectories on two real-world datasets.
arXiv Detail & Related papers (2020-09-21T13:57:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.