Forecasting in Offline Reinforcement Learning for Non-stationary Environments
- URL: http://arxiv.org/abs/2512.01987v2
- Date: Tue, 02 Dec 2025 17:50:43 GMT
- Title: Forecasting in Offline Reinforcement Learning for Non-stationary Environments
- Authors: Suzan Ece Ada, Georg Martius, Emre Ugur, Erhan Oztop
- Abstract summary: We introduce Forecasting in Non-stationary Offline RL (FORL), a framework that unifies conditional diffusion-based candidate state generation with zero-shot time-series foundation models. FORL targets environments prone to unexpected, potentially non-Markovian offsets, requiring robust agent performance from the onset of each episode. Empirical evaluations on offline RL benchmarks, augmented with real-world time-series data, demonstrate that FORL consistently improves performance compared to competitive baselines.
- Score: 23.889016600249295
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline Reinforcement Learning (RL) provides a promising avenue for training policies from pre-collected datasets when gathering additional interaction data is infeasible. However, existing offline RL methods often assume stationarity or only consider synthetic perturbations at test time, assumptions that often fail in real-world scenarios characterized by abrupt, time-varying offsets. These offsets can lead to partial observability, causing agents to misperceive their true state and degrade performance. To overcome this challenge, we introduce Forecasting in Non-stationary Offline RL (FORL), a framework that unifies (i) conditional diffusion-based candidate state generation, trained without presupposing any specific pattern of future non-stationarity, and (ii) zero-shot time-series foundation models. FORL targets environments prone to unexpected, potentially non-Markovian offsets, requiring robust agent performance from the onset of each episode. Empirical evaluations on offline RL benchmarks, augmented with real-world time-series data to simulate realistic non-stationarity, demonstrate that FORL consistently improves performance compared to competitive baselines. By integrating zero-shot forecasting with the agent's experience, we aim to bridge the gap between offline RL and the complexities of real-world, non-stationary environments.
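The abstract describes two pieces working together: a zero-shot forecaster that predicts the unknown observation offset from its history, and a conditional generative model that proposes candidate true states. The following is a minimal illustrative sketch of that interplay, not the authors' implementation: `forecast_offset` stands in for a time-series foundation model (here a naive persistence forecast) and `generate_candidates` stands in for conditional diffusion sampling (here plain Gaussian perturbations); all names and interfaces are assumptions for illustration.

```python
import numpy as np

def forecast_offset(offset_history: np.ndarray) -> np.ndarray:
    """Stand-in for a zero-shot time-series foundation model:
    a naive persistence forecast (repeat the last observed offset)."""
    return offset_history[-1]

def generate_candidates(obs: np.ndarray, n: int, scale: float = 0.5) -> np.ndarray:
    """Stand-in for conditional diffusion sampling: draw n candidate
    true states around the raw (offset-corrupted) observation."""
    rng = np.random.default_rng(0)  # fixed seed for a deterministic sketch
    return obs + scale * rng.standard_normal((n, obs.shape[0]))

def select_state(obs: np.ndarray, offset_history: np.ndarray,
                 n_candidates: int = 64) -> np.ndarray:
    """Combine both components: correct the observation with the
    forecasted offset, then pick the candidate state most consistent
    with that corrected observation."""
    predicted_offset = forecast_offset(offset_history)
    corrected = obs - predicted_offset                 # forecast-corrected observation
    candidates = generate_candidates(obs, n_candidates)
    dists = np.linalg.norm(candidates - corrected, axis=1)
    return candidates[np.argmin(dists)]
```

Under this toy setup, the agent's state estimate is pulled toward the forecast-corrected observation, which is the high-level intuition the abstract conveys; the real method trains the candidate generator without presupposing any specific pattern of future non-stationarity.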
Related papers
- When Learning Hurts: Fixed-Pole RNN for Real-Time Online Training [58.25341036646294]
We analytically examine why learning recurrent poles does not provide tangible benefits, and empirically evaluate real-time learning scenarios. We show that fixed-pole networks achieve superior performance with lower training complexity, making them more suitable for online real-time tasks.
arXiv Detail & Related papers (2026-02-25T00:15:13Z)
- Scaling Online Distributionally Robust Reinforcement Learning: Sample-Efficient Guarantees with General Function Approximation [18.596128578766958]
Distributionally robust RL (DR-RL) addresses mismatches between training and deployment dynamics by optimizing worst-case performance over an uncertainty set of transition dynamics. We propose an online DR-RL algorithm with general function approximation that learns an optimal robust policy purely through interaction with the environment. We provide a theoretical analysis establishing a near-optimal sublinear regret bound under a total variation uncertainty set, demonstrating the sample efficiency and effectiveness of our method.
arXiv Detail & Related papers (2025-12-22T02:12:04Z)
- Hybrid Cross-domain Robust Reinforcement Learning [26.850955692805186]
Robust reinforcement learning (RL) aims to learn policies that remain effective despite uncertainties in their environment. In this paper, we introduce HYDRO, the first Hybrid Cross-Domain Robust RL framework designed to address these challenges. By measuring and minimizing performance gaps between the simulator and the worst-case models in the uncertainty set, HYDRO employs novel uncertainty filtering and prioritized sampling to select the most relevant and reliable simulator samples.
arXiv Detail & Related papers (2025-05-29T02:25:13Z)
- Offline Robotic World Model: Learning Robotic Policies without a Physics Simulator [50.191655141020505]
Reinforcement Learning (RL) has demonstrated impressive capabilities in robotic control but remains challenging due to high sample complexity, safety concerns, and the sim-to-real gap. We introduce Offline Robotic World Model (RWM-O), a model-based approach that explicitly estimates uncertainty to improve policy learning without reliance on a physics simulator.
arXiv Detail & Related papers (2025-04-23T12:58:15Z)
- Streetwise Agents: Empowering Offline RL Policies to Outsmart Exogenous Stochastic Disturbances in RTC [21.20874303316171]
We propose a novel post-deployment shaping of policies conditioned on real-time characterization of out-of-distribution sub-spaces.
Our experimental results on BWE and other standard offline RL benchmark environments demonstrate a significant improvement.
arXiv Detail & Related papers (2024-11-11T09:22:09Z)
- Offline Reinforcement Learning with Imbalanced Datasets [23.454333727200623]
A real-world offline reinforcement learning (RL) dataset is often imbalanced over the state space due to the challenge of exploration or safety considerations.
We show that typical offline RL methods based on distributional constraints, such as conservative Q-learning (CQL), are ineffective in extracting policies under the imbalanced dataset.
Inspired by natural intelligence, we propose a novel offline RL method that utilizes the augmentation of CQL with a retrieval process to recall past related experiences.
arXiv Detail & Related papers (2023-07-06T03:22:19Z)
- FIRE: A Failure-Adaptive Reinforcement Learning Framework for Edge Computing Migrations [54.34189781923818]
FIRE is a framework that adapts to rare events by training an RL policy in an edge computing digital twin environment. We propose ImRE, an importance sampling-based Q-learning algorithm, which samples rare events proportionally to their impact on the value function. We show that FIRE reduces costs compared to vanilla RL and the greedy baseline in the event of failures.
arXiv Detail & Related papers (2022-09-28T19:49:39Z)
- Distributionally Robust Model-Based Offline Reinforcement Learning with Near-Optimal Sample Complexity [39.886149789339335]
Offline reinforcement learning aims to learn to perform decision making from history data without active exploration.
Due to uncertainties and variabilities of the environment, it is critical to learn a robust policy that performs well even when the deployed environment deviates from the nominal one used to collect the history dataset.
We consider a distributionally robust formulation of offline RL, focusing on robust Markov decision processes with an uncertainty set specified by the Kullback-Leibler divergence in both finite-horizon and infinite-horizon settings.
arXiv Detail & Related papers (2022-08-11T11:55:31Z)
- Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning [62.19209005400561]
Offline reinforcement learning (RL) extends the paradigm of classical RL algorithms to purely learning from static datasets.
A key challenge of offline RL is the instability of policy training, caused by the mismatch between the distribution of the offline data and the undiscounted stationary state-action distribution of the learned policy.
We regularize the undiscounted stationary distribution of the current policy towards the offline data during the policy optimization process.
arXiv Detail & Related papers (2022-06-14T20:56:16Z)
- Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity [51.476337785345436]
We study a pessimistic variant of Q-learning in the context of finite-horizon Markov decision processes.
A variance-reduced pessimistic Q-learning algorithm is proposed to achieve near-optimal sample complexity.
arXiv Detail & Related papers (2022-02-28T15:39:36Z)
- Instabilities of Offline RL with Pre-Trained Neural Representation [127.89397629569808]
In offline reinforcement learning (RL), we seek to utilize offline data to evaluate (or learn) policies in scenarios where the data are collected from a distribution that substantially differs from that of the target policy to be evaluated.
Recent theoretical advances have shown that such sample-efficient offline RL is indeed possible provided certain strong representational conditions hold.
This work studies these issues from an empirical perspective to gauge how stable offline RL methods are.
arXiv Detail & Related papers (2021-03-08T18:06:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.