Off-policy Learning for Remote Electrical Tilt Optimization
- URL: http://arxiv.org/abs/2005.10577v1
- Date: Thu, 21 May 2020 11:30:31 GMT
- Title: Off-policy Learning for Remote Electrical Tilt Optimization
- Authors: Filippo Vannella, Jaeseong Jeong, Alexandre Proutiere
- Abstract summary: We address the problem of Remote Electrical Tilt (RET) optimization using off-policy Contextual Multi-Armed-Bandit (CMAB) techniques.
We propose CMAB learning algorithms to extract optimal tilt update policies from the data.
Our policies show consistent improvements over the rule-based logging policy used to collect the data.
- Score: 68.8204255655161
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We address the problem of Remote Electrical Tilt (RET) optimization using
off-policy Contextual Multi-Armed-Bandit (CMAB) techniques. The goal in RET
optimization is to control the orientation of the vertical tilt angle of the
antenna to optimize Key Performance Indicators (KPIs) representing the Quality
of Service (QoS) perceived by the users in cellular networks. Learning an
improved tilt update policy is hard. On the one hand, coming up with a new
policy in an online manner in a real network requires exploring tilt updates
that have never been used before, and is operationally too risky. On the other
hand, devising this policy via simulations suffers from the
simulation-to-reality gap. In this paper, we circumvent these issues by
learning an improved policy in an offline manner using existing data collected
on real networks. We formulate the problem of devising such a policy using the
off-policy CMAB framework. We propose CMAB learning algorithms to extract
optimal tilt update policies from the data. We train and evaluate these
policies on real-world 4G Long Term Evolution (LTE) cellular network data. Our
policies show consistent improvements over the rule-based logging policy used
to collect the data.
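As a concrete illustration of the off-policy CMAB setting described in the abstract, below is a minimal sketch of learning a tilt-update policy from logged data with clipped inverse propensity scoring (IPS). The synthetic dataset, the linear softmax policy class, the clipping constant, and the random parameter search are illustrative assumptions, not the paper's actual algorithms or data.

```python
# Minimal sketch of off-policy contextual-bandit policy learning from logged
# data via clipped inverse propensity scoring (IPS). All names and data below
# are illustrative assumptions, not the paper's implementation.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical logged dataset: context x_i (cell KPIs), logged action a_i
# (tilt update: 0 = down-tilt, 1 = keep, 2 = up-tilt), observed reward r_i
# (KPI gain), and the logging policy's propensity p_i = pi_0(a_i | x_i).
n, d, n_actions = 5000, 8, 3
X = rng.normal(size=(n, d))
A = rng.integers(0, n_actions, size=n)
R = rng.normal(size=n)
P = np.full(n, 1.0 / n_actions)  # e.g. a uniform rule-based logger

def softmax_policy(theta, X):
    """pi_theta(a | x) for a linear softmax policy (illustrative choice)."""
    logits = X @ theta                       # shape (n, n_actions)
    logits -= logits.max(axis=1, keepdims=True)
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def ips_value(theta, X, A, R, P, clip=10.0):
    """Clipped IPS estimate of the target policy's expected reward."""
    pi = softmax_policy(theta, X)[np.arange(len(A)), A]
    w = np.minimum(pi / P, clip)             # clipped importance weights
    return np.mean(w * R)

# Crude random search over policy parameters, kept short for illustration.
best_theta, best_v = None, -np.inf
for _ in range(200):
    theta = rng.normal(scale=0.5, size=(d, n_actions))
    v = ips_value(theta, X, A, R, P)
    if v > best_v:
        best_theta, best_v = theta, v
print(f"estimated value of best policy found: {best_v:.3f}")
```

In practice the random search would be replaced by gradient-based counterfactual risk minimization, and variance-reduced estimators (e.g. doubly robust) are commonly preferred over plain IPS.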
Related papers
- Planning to Go Out-of-Distribution in Offline-to-Online Reinforcement Learning [9.341618348621662]
We aim to find the best-performing policy within a limited budget of online interactions.
We first study the major online RL exploration methods based on intrinsic rewards and UCB.
We then introduce an algorithm for planning to go out-of-distribution that avoids these issues.
arXiv Detail & Related papers (2023-10-09T13:47:05Z)
- Model-based trajectory stitching for improved behavioural cloning and its applications [7.462336024223669]
Trajectory Stitching (TS) generates new trajectories by 'stitching' pairs of states that were disconnected in the original data.
We demonstrate that the iterative process of replacing old trajectories with new ones incrementally improves the underlying behavioural policy.
arXiv Detail & Related papers (2022-12-08T14:18:04Z)
- Offline Reinforcement Learning with Closed-Form Policy Improvement Operators [88.54210578912554]
Behavior-constrained policy optimization has been demonstrated to be a successful paradigm for tackling Offline Reinforcement Learning.
In this paper, we propose our closed-form policy improvement operators.
We empirically demonstrate their effectiveness over state-of-the-art algorithms on the standard D4RL benchmark.
arXiv Detail & Related papers (2022-11-29T06:29:26Z)
- Mutual Information Regularized Offline Reinforcement Learning [76.05299071490913]
We propose a novel MISA framework to approach offline RL from the perspective of Mutual Information between States and Actions in the dataset.
We show that optimizing this lower bound is equivalent to maximizing the likelihood of a one-step improved policy on the offline dataset.
We introduce three different variants of MISA, and empirically demonstrate that a tighter mutual information lower bound gives better offline RL performance.
arXiv Detail & Related papers (2022-10-14T03:22:43Z)
- Latent-Variable Advantage-Weighted Policy Optimization for Offline RL [70.01851346635637]
Offline reinforcement learning methods hold the promise of learning policies from pre-collected datasets without the need to query the environment for new transitions.
In practice, offline datasets are often heterogeneous, i.e., collected in a variety of scenarios.
We propose to leverage latent-variable policies that can represent a broader class of policy distributions.
Our method improves over the next best-performing offline reinforcement learning methods by 49% on average on heterogeneous datasets.
arXiv Detail & Related papers (2022-03-16T21:17:03Z)
- Learning Optimal Antenna Tilt Control Policies: A Contextual Linear Bandit Approach [65.27783264330711]
Controlling antenna tilts in cellular networks is imperative to reach an efficient trade-off between network coverage and capacity.
We devise algorithms learning optimal tilt control policies from existing data.
We show that they can produce an optimal tilt update policy using far fewer data samples than naive or existing rule-based learning algorithms (a generic contextual linear bandit sketch follows after this list).
arXiv Detail & Related papers (2022-01-06T18:24:30Z)
- Non-Stationary Off-Policy Optimization [50.41335279896062]
We study the novel problem of off-policy optimization in piecewise-stationary contextual bandits.
In the offline learning phase, we partition logged data into categorical latent states and learn a near-optimal sub-policy for each state.
In the online deployment phase, we adaptively switch between the learned sub-policies based on their performance.
arXiv Detail & Related papers (2020-06-15T09:16:09Z)
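As a companion to the contextual linear bandit entry above, the following is a minimal LinUCB sketch for online tilt-update selection under an assumed linear reward model. This is a generic textbook-style algorithm with made-up dimensions and a toy reward simulator, not the cited paper's implementation.

```python
# Minimal LinUCB sketch for learning a tilt-update policy online, assuming a
# linear reward model r = x^T theta_a + noise per action. Generic algorithm
# with illustrative dimensions; not the cited paper's exact method.
import numpy as np

class LinUCB:
    def __init__(self, n_actions, dim, alpha=1.0, lam=1.0):
        self.alpha = alpha
        self.A = [lam * np.eye(dim) for _ in range(n_actions)]   # per-action design matrices
        self.b = [np.zeros(dim) for _ in range(n_actions)]       # per-action reward sums

    def select(self, x):
        """Pick the action with the highest upper confidence bound."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)
            scores.append(x @ theta + bonus)
        return int(np.argmax(scores))

    def update(self, x, action, reward):
        self.A[action] += np.outer(x, x)
        self.b[action] += reward * x

# Toy usage: 3 tilt actions (down-tilt, keep, up-tilt), 8-dim KPI context.
rng = np.random.default_rng(0)
true_theta = rng.normal(size=(3, 8))   # hypothetical ground-truth reward model
agent = LinUCB(n_actions=3, dim=8)
for t in range(1000):
    x = rng.normal(size=8)
    a = agent.select(x)
    r = x @ true_theta[a] + 0.1 * rng.normal()
    agent.update(x, a, r)
```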