Deep Reinforcement Learning for mmWave Initial Beam Alignment
- URL: http://arxiv.org/abs/2302.08969v1
- Date: Fri, 17 Feb 2023 16:10:42 GMT
- Title: Deep Reinforcement Learning for mmWave Initial Beam Alignment
- Authors: Daniel Tandler, Sebastian Dörner, Marc Gauger, Stephan ten Brink
- Abstract summary: We investigate the applicability of deep reinforcement learning algorithms to the adaptive initial access beam alignment problem for mmWave communications.
Deep reinforcement learning has the potential to address a new and wider range of applications.
We show that, although the chosen off-the-shelf deep reinforcement learning agent fails to perform well when trained on realistic problem sizes, introducing action space shaping in the form of beamforming modules vastly improves the performance.
- Score: 6.240268911509346
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the applicability of deep reinforcement learning algorithms to
the adaptive initial access beam alignment problem for mmWave communications
using the state-of-the-art proximal policy optimization algorithm as an
example. In comparison to recent unsupervised learning based approaches
developed to tackle this problem, deep reinforcement learning has the potential
to address a new and wider range of applications, since, in principle, no
(differentiable) model of the channel and/or the whole system is required for
training, and only agent-environment interactions are necessary to learn an
algorithm (be it online or using a recorded dataset). We show that, although
the chosen off-the-shelf deep reinforcement learning agent fails to perform
well when trained on realistic problem sizes, introducing action space shaping
in the form of beamforming modules vastly improves the performance, without
sacrificing much generalizability. Using this add-on, the agent is able to
deliver competitive performance to various state-of-the-art methods on
simulated environments, even under realistic problem sizes. This demonstrates
that through well-directed modification, deep reinforcement learning may have a
chance to compete with other approaches in this area, opening up many
straightforward extensions to other/similar scenarios.
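The abstract does not describe the beamforming modules in detail, but the flavour of action space shaping can be illustrated with a minimal, hypothetical sketch: rather than emitting raw per-antenna weights, the agent emits a single steering parameter that a fixed module expands into a valid analog beam. The array size, the angle parameterisation, and all function names below are illustrative assumptions, not the authors' implementation.

import numpy as np

N_ANTENNAS = 64  # assumed uniform linear array size


def beamforming_module(steering_angle: float, n_antennas: int = N_ANTENNAS) -> np.ndarray:
    """Map a scalar action in [-1, 1] to a unit-power DFT-style steering vector."""
    n = np.arange(n_antennas)
    weights = np.exp(1j * np.pi * n * steering_angle)
    return weights / np.sqrt(n_antennas)


def beamforming_gain(channel: np.ndarray, steering_angle: float) -> float:
    """Reward proxy: received power when the shaped beam is applied to a channel."""
    w = beamforming_module(steering_angle)
    return float(np.abs(np.vdot(w, channel)) ** 2)


# Toy usage: evaluate one shaped action on a random Rayleigh channel realization.
rng = np.random.default_rng(0)
h = (rng.normal(size=N_ANTENNAS) + 1j * rng.normal(size=N_ANTENNAS)) / np.sqrt(2)
print(beamforming_gain(h, steering_angle=0.3))

With shaping of this kind, the agent's action space shrinks from 2 x N_ANTENNAS real values to a single scalar per measurement step, which illustrates why such shaping can help at realistic problem sizes.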
Related papers
- Joint Demonstration and Preference Learning Improves Policy Alignment with Human Feedback [58.049113055986375]
We develop a single-stage approach named Alignment with Integrated Human Feedback (AIHF) to train reward models and the policy.
The proposed approach admits a suite of efficient algorithms, which can easily reduce to, and leverage, popular alignment algorithms.
We demonstrate the efficiency of the proposed solutions with extensive experiments involving alignment problems in LLMs and robotic control problems in MuJoCo.
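The summary above does not give the concrete AIHF objective, so the following is only a hypothetical sketch of how a single stage can couple a demonstration (behavior-cloning) term with a Bradley-Terry preference term; the tensor shapes, weighting, and function name are assumptions.

import torch
import torch.nn.functional as F


def combined_alignment_loss(policy_logits, demo_actions,
                            reward_chosen, reward_rejected,
                            preference_weight: float = 1.0):
    # Demonstration term: imitate the demonstrated actions.
    bc_loss = F.cross_entropy(policy_logits, demo_actions)
    # Preference term: Bradley-Terry likelihood that the preferred sample
    # receives the higher scalar reward.
    pref_loss = -F.logsigmoid(reward_chosen - reward_rejected).mean()
    return bc_loss + preference_weight * pref_loss


# Toy usage with random tensors standing in for model outputs.
logits = torch.randn(8, 5)            # policy logits over 5 discrete actions
actions = torch.randint(0, 5, (8,))   # demonstrated actions
r_chosen, r_rejected = torch.randn(8), torch.randn(8)
print(combined_alignment_loss(logits, actions, r_chosen, r_rejected))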
arXiv Detail & Related papers (2024-06-11T01:20:53Z)
- Local Methods with Adaptivity via Scaling [38.99428012275441]
This paper aims to merge the local training technique with the adaptive approach to develop efficient distributed learning methods.
We consider the classical Local SGD method and enhance it with a scaling feature.
In addition to theoretical analysis, we validate the performance of our methods in practice by training a neural network.
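As a rough illustration of one way to read "Local SGD enhanced with a scaling feature", the sketch below applies an Adam-like diagonal scaling during the local steps and averages the local models at each communication round; the paper's actual scaling rule and all hyperparameters here may differ from the method.

import numpy as np


def local_sgd_with_scaling(worker_grads, x, lr=0.1, local_steps=5, eps=1e-8):
    """worker_grads: list of callables g(x), one stochastic gradient oracle per worker."""
    local_models = []
    for grad_fn in worker_grads:
        x_local, v = x.copy(), np.zeros_like(x)
        for _ in range(local_steps):
            g = grad_fn(x_local)
            v = 0.99 * v + 0.01 * g ** 2                      # running second moment
            x_local = x_local - lr * g / (np.sqrt(v) + eps)   # scaled local update
        local_models.append(x_local)
    return np.mean(local_models, axis=0)                      # communication round: average


# Toy usage: two workers minimising the same quadratic with noisy gradients.
rng = np.random.default_rng(1)
target = np.ones(3)
worker_grads = [lambda x: 2 * (x - target) + 0.1 * rng.normal(size=3) for _ in range(2)]
x = np.zeros(3)
for _ in range(20):
    x = local_sgd_with_scaling(worker_grads, x)
print(x)  # approaches the target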
arXiv Detail & Related papers (2024-06-02T19:50:05Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potentially suboptimal human expert.
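The reward construction described above can be sketched in a few lines: the agent is penalised whenever the human expert feels compelled to intervene, and is otherwise left to optimise with standard off-policy RL. The data structure and reward values below are illustrative assumptions, not the paper's implementation.

from dataclasses import dataclass
from typing import List


@dataclass
class Transition:
    obs: list
    action: int
    next_obs: list
    intervened: bool  # did the human take over at this step?


def intervention_rewards(trajectory: List[Transition]) -> List[float]:
    """Reward -1 where the expert intervened, 0 elsewhere; the agent learns to
    avoid triggering interventions rather than to copy the expert exactly."""
    return [-1.0 if t.intervened else 0.0 for t in trajectory]


# Toy usage: a three-step trajectory with one intervention.
traj = [
    Transition([0.0], 1, [0.1], False),
    Transition([0.1], 0, [0.2], True),
    Transition([0.2], 1, [0.3], False),
]
print(intervention_rewards(traj))  # [0.0, -1.0, 0.0]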
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Generalization of Deep Reinforcement Learning for Jammer-Resilient Frequency and Power Allocation [4.436632973105495]
We tackle the problem of joint frequency and power allocation while emphasizing the generalization capability of a deep reinforcement learning model.
We show the improved training and inference performance of the proposed methods when tested on previously unseen simulated wireless networks.
The end-to-end solution was implemented on the embedded software-defined radio and validated using over-the-air evaluation.
arXiv Detail & Related papers (2023-02-04T22:15:32Z)
- Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
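For readers unfamiliar with the setting, a single Q-learning update under linear function approximation looks as follows; the exploration mechanism analysed in the paper is not reproduced here, and the feature dimensions are illustrative.

import numpy as np


def linear_q_update(theta, phi_sa, reward, phi_next_all, alpha=0.1, gamma=0.99):
    """One TD(0) update of the weight vector theta.

    phi_sa:       feature vector of the current (state, action) pair
    phi_next_all: matrix of feature vectors, one row per action in the next state
    """
    q_sa = phi_sa @ theta
    q_next = np.max(phi_next_all @ theta)        # greedy bootstrap target
    td_error = reward + gamma * q_next - q_sa
    return theta + alpha * td_error * phi_sa     # semi-gradient correction


# Toy usage with 4-dimensional features and 3 actions.
rng = np.random.default_rng(2)
theta = np.zeros(4)
theta = linear_q_update(theta, rng.normal(size=4), reward=1.0,
                        phi_next_all=rng.normal(size=(3, 4)))
print(theta)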
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
- Sample-Efficient, Exploration-Based Policy Optimisation for Routing Problems [2.6782615615913348]
This paper presents a new reinforcement learning approach that is based on entropy.
In addition, we design an off-policy reinforcement learning technique that maximises the expected return.
We show that our model can generalise to various routing problems.
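The summary does not state the exact objective, but a common form of an entropy-based policy-gradient surrogate, shown below as a hypothetical stand-in, adds an entropy bonus to the expected-return term so that exploration over routes is encouraged; all quantities here are toy placeholders.

import numpy as np


def reinforce_with_entropy_loss(log_probs, returns, entropy, alpha=0.01):
    # REINFORCE-style surrogate (negated for minimisation) minus an entropy bonus.
    policy_loss = -(log_probs * returns).mean()
    return policy_loss - alpha * entropy.mean()


# Toy usage with random per-step quantities.
rng = np.random.default_rng(4)
print(reinforce_with_entropy_loss(np.log(rng.uniform(0.1, 1.0, size=16)),
                                  rng.normal(size=16),
                                  rng.uniform(0.5, 1.5, size=16)))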
arXiv Detail & Related papers (2022-05-31T09:51:48Z)
- A Regularized Implicit Policy for Offline Reinforcement Learning [54.7427227775581]
Offline reinforcement learning enables learning from a fixed dataset, without further interactions with the environment.
We propose a framework that supports learning a flexible yet well-regularized fully-implicit policy.
Experiments and ablation study on the D4RL dataset validate our framework and the effectiveness of our algorithmic designs.
arXiv Detail & Related papers (2022-02-19T20:22:04Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
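Both ingredients listed above lend themselves to a compact sketch with a Q-ensemble stored as plain arrays; the network details and the exact re-weighting function of SUNRISE are simplified here into a monotone decreasing function of the ensemble standard deviation.

import numpy as np


def weighted_bellman_targets(rewards, q_next_ensemble, gamma=0.99, temperature=10.0):
    """Down-weight Bellman targets on which the Q-ensemble disagrees (high std)."""
    target = rewards + gamma * q_next_ensemble.mean(axis=0)
    uncertainty = q_next_ensemble.std(axis=0)
    weights = 1.0 / (1.0 + temperature * uncertainty)   # simplified re-weighting
    return target, weights


def ucb_action(q_values_ensemble, beta=1.0):
    """Select the action with the highest mean-plus-uncertainty Q-value."""
    mean = q_values_ensemble.mean(axis=0)
    std = q_values_ensemble.std(axis=0)
    return int(np.argmax(mean + beta * std))


# Toy usage: an ensemble of 5 Q-heads scoring 3 candidate actions.
rng = np.random.default_rng(3)
q_ens = rng.normal(size=(5, 3))
print(ucb_action(q_ens))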
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
- Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
Reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
arXiv Detail & Related papers (2020-05-21T19:47:05Z)
- Unbiased Deep Reinforcement Learning: A General Training Framework for Existing and Future Algorithms [3.7050607140679026]
We propose a novel training framework that is conceptually comprehensible and potentially easy to generalize to all feasible reinforcement learning algorithms.
We employ Monte Carlo sampling to obtain raw data inputs and train on them in batches to form Markov decision process sequences.
We propose several algorithms embedded with our new framework to deal with typical discrete and continuous scenarios.
arXiv Detail & Related papers (2020-05-12T01:51:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.