Selecting Offline Reinforcement Learning Algorithms for Stochastic Network Control
- URL: http://arxiv.org/abs/2603.03932v1
- Date: Wed, 04 Mar 2026 10:41:10 GMT
- Title: Selecting Offline Reinforcement Learning Algorithms for Stochastic Network Control
- Authors: Nicolas Helson, Pegah Alizadeh, Anastasios Giovanidis
- Abstract summary: Offline RL is a promising approach for next-generation wireless networks. We show that Conservative Q-Learning consistently produces more robust policies across different sources of stochasticity. These findings provide practical guidance for offline RL algorithm selection in AI-driven network control pipelines.
- Score: 2.3220521366735247
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Offline Reinforcement Learning (RL) is a promising approach for next-generation wireless networks, where online exploration is unsafe and large amounts of operational data can be reused across the model lifecycle. However, the behavior of offline RL algorithms under genuinely stochastic dynamics -- inherent to wireless systems due to fading, noise, and traffic mobility -- remains insufficiently understood. We address this gap by evaluating Bellman-based (Conservative Q-Learning), sequence-based (Decision Transformers), and hybrid (Critic-Guided Decision Transformers) offline RL methods in an open-access stochastic telecom environment (mobile-env). Our results show that Conservative Q-Learning consistently produces more robust policies across different sources of stochasticity, making it a reliable default choice in lifecycle-driven AI management frameworks. Sequence-based methods remain competitive and can outperform Bellman-based approaches when sufficient high-return trajectories are available. These findings provide practical guidance for offline RL algorithm selection in AI-driven network control pipelines, such as O-RAN and future 6G functions, where robustness and data availability are key operational constraints.
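Since the headline finding favors Conservative Q-Learning, a minimal sketch of the CQL(H) objective may help make the comparison concrete. This is a generic discrete-action formulation, not the paper's exact configuration; the `alpha` weight and the `q_net`/`target_q_net` interfaces are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def cql_loss(q_net, target_q_net, batch, gamma=0.99, alpha=1.0):
    """One Conservative Q-Learning loss for discrete actions: the standard
    Bellman error plus the CQL(H) regularizer, which pushes Q-values down
    on all actions (via logsumexp) and back up on dataset actions."""
    s, a, r, s_next, done = batch                  # tensors from the offline dataset

    q_all = q_net(s)                               # (B, num_actions)
    q_data = q_all.gather(1, a.unsqueeze(1)).squeeze(1)  # Q(s, a) on logged actions

    with torch.no_grad():                          # standard TD target
        target = r + gamma * (1 - done) * target_q_net(s_next).max(dim=1).values

    bellman = F.mse_loss(q_data, target)
    conservative = (torch.logsumexp(q_all, dim=1) - q_data).mean()
    return bellman + alpha * conservative
```

The conservative term suppresses Q-values on out-of-distribution actions relative to logged ones, which is commonly cited as the source of CQL's robustness when the dynamics are stochastic.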
Related papers
- Offline Reinforcement Learning for Mobility Robustness Optimization [3.7164203452531233]
We revisit the Mobility Robustness Optimisation algorithm and study the possibility of learning the optimal Cell Individual Offset tuning using offline Reinforcement Learning. Such methods make use of collected offline datasets to learn the optimal policy, without further exploration. We adapt and apply a sequence-based method called Decision Transformers as well as a value-based method called Conservative Q-Learning to learn the optimal policy for the same target reward as the vanilla rule-based MRO.
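As a companion to the entry above, here is a hedged sketch of the return-to-go computation that conditions a Decision Transformer; the discount and token layout are generic choices, not the paper's settings.

```python
import numpy as np

def returns_to_go(rewards, gamma=1.0):
    """Suffix sums of rewards -- the conditioning signal a Decision
    Transformer sees alongside each state when predicting the next action."""
    rtg = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg

# Training pairs each timestep's token triple (return-to-go, state, action);
# at deployment, the desired return (e.g. the best seen offline) is fed as
# the initial return-to-go, then decremented by observed rewards.
```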
arXiv Detail & Related papers (2025-06-28T07:31:01Z)
- Offline and Distributional Reinforcement Learning for Wireless Communications [5.771885923067511]
Traditional online reinforcement learning (RL) and deep RL methods face limitations in real-time wireless networks. We focus on offline and distributional RL, two advanced RL techniques that can overcome these challenges. We introduce a novel framework that combines offline and distributional RL for wireless communication applications.
arXiv Detail & Related papers (2025-04-04T09:24:39Z)
- Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models [79.2162092822111]
We systematically evaluate reinforcement learning (RL) and control-based methods on a suite of navigation tasks. We train a latent dynamics model using the Joint Embedding Predictive Architecture (JEPA) and use it for planning. Our results show that model-free RL benefits most from large amounts of high-quality data, whereas model-based planning generalizes better to unseen layouts.
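The planning step above can be illustrated with a generic random-shooting planner over a learned latent model; the `encoder`, `dynamics`, and `cost_fn` interfaces are assumptions for the sketch, not the paper's JEPA implementation.

```python
import torch

def plan_with_latent_model(encoder, dynamics, cost_fn, obs,
                           action_dim, horizon=10, n_candidates=256):
    """Random-shooting planner in latent space: encode the observation once,
    roll candidate action sequences through the learned latent dynamics,
    and execute the first action of the lowest-cost sequence."""
    z0 = encoder(obs)                              # (latent_dim,)
    actions = torch.randn(n_candidates, horizon, action_dim)
    z = z0.unsqueeze(0).expand(n_candidates, -1)
    total_cost = torch.zeros(n_candidates)
    for t in range(horizon):
        z = dynamics(z, actions[:, t])             # predicted next latent state
        total_cost += cost_fn(z)                   # e.g. distance to a goal embedding
    return actions[total_cost.argmin(), 0]         # first action of the best plan
```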
arXiv Detail & Related papers (2025-02-20T18:39:41Z)
- Resilient UAV Trajectory Planning via Few-Shot Meta-Offline Reinforcement Learning [5.771885923067511]
This work proposes a novel, resilient, few-shot meta-offline RL algorithm combining offline RL and model-agnostic meta-learning. We show that the proposed few-shot meta-offline RL algorithm converges faster than baseline schemes. It is the only algorithm that can achieve optimal joint AoI and transmission power using an offline dataset.
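A first-order MAML-style meta-update over offline task datasets is one common instantiation of "offline RL plus model-agnostic meta-learning"; the inner and outer losses below are placeholders, not the paper's algorithm.

```python
import copy
import torch

def fomaml_offline_step(policy, tasks, loss_fn, inner_lr=1e-2, meta_lr=1e-3):
    """First-order MAML over offline task datasets: adapt a clone on each
    task's support batch, compute the query-batch gradient at the adapted
    parameters, and apply the averaged gradient to the shared initialization."""
    meta_grads = [torch.zeros_like(p) for p in policy.parameters()]
    for support, query in tasks:
        clone = copy.deepcopy(policy)
        # Inner step: one gradient update on the task's few-shot offline data.
        clone.zero_grad()
        loss_fn(clone, support).backward()
        with torch.no_grad():
            for p in clone.parameters():
                p -= inner_lr * p.grad
        # Outer gradient, evaluated at the adapted parameters (first-order approx.).
        clone.zero_grad()
        loss_fn(clone, query).backward()
        for g, p in zip(meta_grads, clone.parameters()):
            g += p.grad / len(tasks)
    with torch.no_grad():
        for p, g in zip(policy.parameters(), meta_grads):
            p -= meta_lr * g
```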
arXiv Detail & Related papers (2025-02-03T11:39:12Z)
- Offline Reinforcement Learning and Sequence Modeling for Downlink Link Adaptation [3.687363450234871]
Link adaptation (LA) is an essential function in modern wireless communication systems. LA dynamically adjusts the transmission rate of a communication link to match time- and frequency-varying radio link conditions. Recent research has introduced online reinforcement learning approaches as an alternative to the more commonly used rule-based algorithms.
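For context, the rule-based baseline that learned LA policies are typically measured against picks the highest-rate modulation-and-coding scheme (MCS) whose predicted error rate meets a target. This is a generic sketch, with `bler_curves` as an assumed lookup of SINR-to-BLER mappings.

```python
def select_mcs(sinr_db, bler_curves, target_bler=0.1):
    """Rule-based inner-loop link adaptation: choose the highest-rate MCS
    whose estimated block error rate at the current SINR stays within the
    target (bler_curves maps each MCS index to a SINR -> BLER function)."""
    feasible = [mcs for mcs, bler_at in bler_curves.items()
                if bler_at(sinr_db) <= target_bler]
    return max(feasible) if feasible else 0   # fall back to the most robust MCS
```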
arXiv Detail & Related papers (2024-10-30T14:01:31Z)
- MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning [52.101643259906915]
We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations.
Existing model-based offline RL methods are not suitable for offline-to-online fine-tuning in high-dimensional domains.
We propose an on-policy model-based method that can efficiently reuse prior data through model-based value expansion and policy regularization.
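A hedged sketch of the model-based value expansion target mentioned above: roll the learned model forward a few steps under the current policy and bootstrap with the critic. The `model`, `reward_fn`, `q_net`, and `policy` interfaces and the horizon are illustrative assumptions.

```python
import torch

def mve_target(model, reward_fn, q_net, policy, s, horizon=3, gamma=0.99):
    """H-step model-based value expansion: accumulate imagined rewards from
    the learned dynamics model under the current policy, then bootstrap
    with the critic at the final imagined state."""
    with torch.no_grad():
        target = torch.zeros(s.shape[0])
        discount = 1.0
        for _ in range(horizon):
            a = policy(s)
            target += discount * reward_fn(s, a)
            s = model(s, a)                        # predicted next state
            discount *= gamma
        target += discount * q_net(s, policy(s)).squeeze(-1)
    return target
```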
arXiv Detail & Related papers (2024-01-06T21:04:31Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
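The paper's scheme is adaptive and learned; a simple k-means codebook conveys the underlying idea of mapping continuous actions to a discrete set. The sketch below is a baseline illustration, not the proposed method.

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_actions(actions, n_bins=64, seed=0):
    """Cluster continuous dataset actions into a discrete codebook so that
    discrete-action offline RL methods can be applied; each continuous
    action is replaced by the index of its nearest centroid."""
    km = KMeans(n_clusters=n_bins, n_init=10, random_state=seed).fit(actions)
    discrete = km.predict(actions)        # (N,) codebook indices
    codebook = km.cluster_centers_        # (n_bins, action_dim) decode table
    return discrete, codebook
```

Discrete-action variants of IQL, CQL, or BRAC then operate on the codebook indices, decoding back through `codebook` at execution time.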
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- Environment Transformer and Policy Optimization for Model-Based Offline Reinforcement Learning [25.684201757101267]
We propose an uncertainty-aware sequence modeling architecture called Environment Transformer.
Benefiting from the accurate modeling of the transition dynamics and reward function, Environment Transformer can be combined with arbitrary planning, dynamic programming, or policy optimization algorithms for offline RL.
arXiv Detail & Related papers (2023-03-07T11:26:09Z)
- A Unified Framework for Alternating Offline Model Training and Policy Learning [62.19209005400561]
In offline model-based reinforcement learning, we learn a dynamics model from historically collected data and use the learned model together with the fixed dataset for policy learning.
We develop an iterative offline MBRL framework, where we maximize a lower bound of the true expected return.
With the proposed unified model-policy learning framework, we achieve competitive performance on a wide range of continuous-control offline reinforcement learning datasets.
arXiv Detail & Related papers (2022-10-12T04:58:51Z)
- Semantic-Aware Collaborative Deep Reinforcement Learning Over Wireless Cellular Networks [82.02891936174221]
Collaborative deep reinforcement learning (CDRL), in which multiple agents coordinate over a wireless network, is a promising approach.
In this paper, a novel semantic-aware CDRL method is proposed to enable a group of untrained agents with semantically-linked DRL tasks to collaborate efficiently across a resource-constrained wireless cellular network.
arXiv Detail & Related papers (2021-11-23T18:24:47Z)
- MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify the existing model-based RL methods by applying them with rewards artificially penalized by the uncertainty of the dynamics.
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
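MOPO's key construction, penalizing model rewards by dynamics uncertainty, can be sketched with an ensemble-disagreement estimate. MOPO itself uses the maximum norm of the ensemble's predicted Gaussian standard deviations, so the estimator below is a simplified stand-in.

```python
import numpy as np

def mopo_penalized_reward(reward, next_state_samples, lam=1.0):
    """MOPO-style reward penalty: subtract an uncertainty estimate of the
    learned dynamics (here, the norm of the std. dev. across an ensemble's
    next-state predictions) from the model reward before running any
    off-the-shelf RL algorithm on the imagined data."""
    # next_state_samples: (ensemble_size, state_dim) predictions for one (s, a)
    uncertainty = np.linalg.norm(next_state_samples.std(axis=0))
    return reward - lam * uncertainty
```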