Offline Reinforcement Learning for Wireless Network Optimization with
Mixture Datasets
- URL: http://arxiv.org/abs/2311.11423v1
- Date: Sun, 19 Nov 2023 21:02:17 GMT
- Title: Offline Reinforcement Learning for Wireless Network Optimization with
Mixture Datasets
- Authors: Kun Yang, Cong Shen, Jing Yang, Shu-ping Yeh, Jerry Sydir
- Abstract summary: The recent development of reinforcement learning (RL) has boosted the adoption of online RL for wireless radio resource management (RRM).
Online RL algorithms require direct interactions with the environment.
With a proper mixture of datasets, offline RL can produce a near-optimal RL policy even when all involved behavior policies are highly suboptimal.
- Score: 13.22086908661673
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent development of reinforcement learning (RL) has boosted the
adoption of online RL for wireless radio resource management (RRM). However,
online RL algorithms require direct interactions with the environment, which
may be undesirable given the potential performance loss due to the unavoidable
exploration in RL. In this work, we first investigate the use of offline
RL algorithms in solving the RRM problem. We evaluate several state-of-the-art
offline RL algorithms, including behavior constrained Q-learning (BCQ),
conservative Q-learning (CQL), and implicit Q-learning (IQL), for a specific
RRM problem that aims at maximizing a linear combination of sum and
5-percentile rates via user scheduling. We observe that the performance of
offline RL for the RRM problem depends critically on the behavior policy used
for data collection, and further propose a novel offline RL solution that
leverages heterogeneous datasets collected by different behavior policies. We
show that with a proper mixture of the datasets, offline RL can produce a
near-optimal RL policy even when all involved behavior policies are highly
suboptimal.
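As a rough illustration of the objective and the dataset-mixture idea above, the sketch below computes a per-step reward as a weighted combination of the sum rate and the 5-percentile user rate, and pools transitions logged by several behavior policies into a single offline dataset. The weights, the transition layout, and the sampling ratios are illustrative assumptions and are not taken from the paper.

    import numpy as np

    # Assumed weights for the linear combination; the paper does not fix
    # specific values here.
    W_SUM = 1.0
    W_5PCT = 4.0

    def rrm_reward(per_user_rates, w_sum=W_SUM, w_5pct=W_5PCT):
        """Reward = w_sum * (sum rate) + w_5pct * (5th-percentile user rate)."""
        rates = np.asarray(per_user_rates, dtype=float)
        return w_sum * rates.sum() + w_5pct * np.percentile(rates, 5)

    def mix_datasets(datasets, ratios, seed=0):
        """Pool (state, action, reward, next_state) transitions collected by
        different behavior policies, keeping a given fraction of each."""
        rng = np.random.default_rng(seed)
        mixture = []
        for data, ratio in zip(datasets, ratios):
            n = int(len(data) * ratio)
            idx = rng.choice(len(data), size=n, replace=False)
            mixture.extend(data[i] for i in idx)
        order = rng.permutation(len(mixture))
        return [mixture[i] for i in order]

    # Example: reward for ten users whose instantaneous rates range 1..10 Mbps.
    print(rrm_reward(np.linspace(1.0, 10.0, num=10)))

Any of the offline RL algorithms named above (BCQ, CQL, IQL) could then be trained on the pooled transitions; how the mixing ratios should be chosen is the subject of the paper and is not modeled here.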
Related papers
- Offline and Distributional Reinforcement Learning for Radio Resource Management [5.771885923067511]
Reinforcement learning (RL) has proved to have a promising role in future intelligent wireless networks.
Online RL has been adopted for radio resource management (RRM), taking over traditional schemes.
We propose an offline and distributional RL scheme for the RRM problem, enabling offline training using a static dataset.
arXiv Detail & Related papers (2024-09-25T09:22:23Z)
- Advancing RAN Slicing with Offline Reinforcement Learning [15.259182716723496]
This paper introduces offline reinforcement learning to solve the RAN slicing problem.
We show how offline RL can effectively learn near-optimal policies from sub-optimal datasets.
We also present empirical evidence of the efficacy of offline RL in adapting to various service-level requirements.
arXiv Detail & Related papers (2023-12-16T22:09:50Z)
- Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning [66.43003402281659]
A central question boils down to how to efficiently utilize online data collection to strengthen and complement the offline dataset.
We design a three-stage hybrid RL algorithm that beats the best of both worlds -- pure offline RL and pure online RL.
The proposed algorithm does not require any reward information during data collection.
arXiv Detail & Related papers (2023-05-17T15:17:23Z)
- OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation [59.469401906712555]
We present an offline reinforcement learning algorithm that prevents overestimation in a more principled way.
Our algorithm, OptiDICE, directly estimates the stationary distribution corrections of the optimal policy (the correction ratio is spelled out after this list).
We show that OptiDICE performs competitively with the state-of-the-art methods.
arXiv Detail & Related papers (2021-06-21T00:43:30Z)
- Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL [82.93243616342275]
We introduce Offline Model-based RL with Adaptive Behavioral Priors (MABE).
MABE is based on the finding that dynamics models, which support within-domain generalization, and behavioral priors, which support cross-domain generalization, are complementary.
In experiments that require cross-domain generalization, we find that MABE outperforms prior methods.
arXiv Detail & Related papers (2021-06-16T20:48:49Z)
- EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL [48.552287941528]
Off-policy reinforcement learning holds the promise of sample-efficient learning of decision-making policies.
In the offline RL setting, standard off-policy RL methods can significantly underperform.
We introduce the Expected-Max Q-Learning (EMaQ) operator, which more closely matches the resulting practical algorithm.
arXiv Detail & Related papers (2020-07-21T21:13:02Z)
- Critic Regularized Regression [70.8487887738354]
We propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR).
We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces.
arXiv Detail & Related papers (2020-06-26T17:50:26Z)
- MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify the existing model-based RL methods by applying them with rewards artificially penalized by the uncertainty of the dynamics; a minimal sketch of this penalty follows the list.
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
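The MOPO entry above is the one item in this list whose mechanism is concrete enough to sketch: the model-predicted reward is penalized by an uncertainty estimate of the learned dynamics. The snippet below is a minimal sketch of that idea using ensemble disagreement as the uncertainty proxy; the ensemble interface, the penalty coefficient, and the choice of disagreement measure are assumptions for illustration, not details from the MOPO paper.

    import numpy as np

    LAMBDA = 1.0  # assumed penalty coefficient; in MOPO this is a tunable hyperparameter

    def penalized_reward(reward, next_state_preds, lam=LAMBDA):
        """Uncertainty-penalized reward in the spirit of MOPO.

        next_state_preds: next-state predictions from an ensemble of learned
        dynamics models, shape (ensemble_size, state_dim). Their disagreement
        serves as a crude stand-in for model uncertainty.
        """
        preds = np.asarray(next_state_preds, dtype=float)
        uncertainty = preds.std(axis=0).max()  # largest per-dimension spread
        return reward - lam * uncertainty

    # Example: three ensemble members that roughly agree -> small penalty.
    preds = [[0.9, 1.1], [1.0, 1.0], [1.1, 0.9]]
    print(penalized_reward(reward=2.0, next_state_preds=preds))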
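For the OptiDICE entry, the phrase "stationary distribution correction" refers to the ratio between the (discounted) state-action distribution induced by a policy and the distribution underlying the offline dataset; reweighting dataset samples by this ratio recovers expectations under the policy. This is the standard DICE-family identity, written out below; the specific objective OptiDICE optimizes is not reproduced here.

    w_\pi(s,a) = \frac{d_\pi(s,a)}{d_D(s,a)}, \qquad
    \mathbb{E}_{(s,a)\sim d_\pi}\!\left[ r(s,a) \right]
      = \mathbb{E}_{(s,a)\sim d_D}\!\left[ w_\pi(s,a)\, r(s,a) \right].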