Related papers: Physics-Informed Parametric Bandits for Beam Alignment in mmWave Communications

URL: http://arxiv.org/abs/2510.18299v1
Date: Tue, 21 Oct 2025 05:07:07 GMT
Title: Physics-Informed Parametric Bandits for Beam Alignment in mmWave Communications
Authors: Hao Qin, Thang Duong, Ming Li, Chicheng Zhang
Abstract summary: In millimeter wave (mmWave) communications, beam alignment and tracking are crucial to combat the significant path loss. We propose two physics-informed bandit algorithms \textit{pretc} and \textit{prgreedy} that exploit the sparse multipath property of mmWave channels. Our algorithms treat the parameters of each path as black boxes and maintain optimal estimates of them based on sampled historical rewards.
Score: 19.693175084764
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In millimeter wave (mmWave) communications, beam alignment and tracking are crucial to combat the significant path loss. As scanning the entire directional space is inefficient, designing an efficient and robust method to identify the optimal beam directions is essential. Since traditional bandit algorithms require a long time horizon to converge under large beam spaces, many existing works propose efficient bandit algorithms for beam alignment by relying on unimodality or multimodality assumptions on the reward function's structure. However, such assumptions often do not hold (or cannot be strictly satisfied) in practice, which causes such algorithms to converge to choosing suboptimal beams. In this work, we propose two physics-informed bandit algorithms \textit{pretc} and \textit{prgreedy} that exploit the sparse multipath property of mmWave channels - a generic but realistic assumption - which is connected to the Phase Retrieval Bandit problem. Our algorithms treat the parameters of each path as black boxes and maintain optimal estimates of them based on sampled historical rewards. \textit{pretc} starts with a random exploration phase and then commits to the optimal beam under the estimated reward function. \textit{prgreedy} performs such estimation in an online manner and chooses the best beam under current estimates. Our algorithms can also be easily adapted to beam tracking in the mobile setting. Through experiments using both the synthetic DeepMIMO dataset and the real-world DeepSense6G dataset, we demonstrate that both algorithms outperform existing approaches in a wide range of scenarios across diverse channel environments, showing their generalizability and robustness.
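The explore-then-commit idea described in the abstract (random exploration, parametric estimation, then commitment to the best beam under the estimated reward function) can be illustrated with a minimal sketch. This is not the authors' implementation: the single-path channel, the Gaussian-shaped `beam_gain` pattern, the grid-search least-squares fit, and all function and parameter names below are assumptions made for illustration only.

```python
import math
import random


def beam_gain(beam_angle, path_angle, width=0.2):
    """Hypothetical Gaussian-shaped beam pattern (an assumption, not the paper's model)."""
    return math.exp(-((beam_angle - path_angle) ** 2) / (2 * width ** 2))


def fit_single_path(history, beam_angles, candidates):
    """Least-squares fit of (path angle, path gain) to (beam index, reward) pairs.

    For each candidate angle the gain has a closed-form least-squares
    solution; the candidate with the smallest squared residual is kept.
    """
    best_err, best_theta, best_gain = float("inf"), candidates[0], 0.0
    for theta in candidates:
        p = [beam_gain(beam_angles[k], theta) for k, _ in history]
        denom = sum(x * x for x in p)
        if denom == 0.0:
            continue
        # Clamp at zero: a path gain is assumed to be non-negative.
        g = max(0.0, sum(x * r for x, (_, r) in zip(p, history)) / denom)
        err = sum((r - g * x) ** 2 for x, (_, r) in zip(p, history))
        if err < best_err:
            best_err, best_theta, best_gain = err, theta, g
    return best_theta, best_gain


def explore_then_commit(pull, beam_angles, n_explore, candidates, rng):
    """Sample random beams, fit the parametric model, return the beam to commit to."""
    history = []
    for _ in range(n_explore):
        k = rng.randrange(len(beam_angles))
        history.append((k, pull(k)))
    theta_hat, g_hat = fit_single_path(history, beam_angles, candidates)
    # Commit to the beam with the highest estimated reward.
    return max(
        range(len(beam_angles)),
        key=lambda k: g_hat * beam_gain(beam_angles[k], theta_hat),
    )


def run_demo(seed=0, n_beams=32, true_beam=20, noise_std=0.02, n_explore=300):
    """Synthetic environment: one path aligned with beam `true_beam`, Gaussian noise."""
    rng = random.Random(seed)
    beam_angles = [-math.pi / 2 + k * math.pi / (n_beams - 1) for k in range(n_beams)]
    candidates = [-math.pi / 2 + i * math.pi / 180 for i in range(181)]
    path_angle = beam_angles[true_beam]

    def pull(k):
        return beam_gain(beam_angles[k], path_angle) + rng.gauss(0.0, noise_std)

    return explore_then_commit(pull, beam_angles, n_explore, candidates, rng)
```

Calling `run_demo()` returns the index of the committed beam in this toy environment. A greedy variant in the spirit of \textit{prgreedy} would, under the same assumptions, refit the model after every pull and select the current best beam instead of sampling at random.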

Related papers

Generalized Linear Bandits: Almost Optimal Regret with One-Pass Update [70.38810219913593]
We study the generalized linear bandit (GLB) problem, a contextual multi-armed bandit framework that extends the classical linear model by incorporating a non-linear link function. GLBs are widely applicable to real-world scenarios, but their non-linear nature introduces significant challenges in achieving both computational and statistical efficiency. We propose a jointly efficient algorithm that attains a nearly optimal regret bound with $\mathcal{O}(1)$ time and space complexities per round.
arXiv Detail & Related papers (2025-07-16T02:24:21Z)
HoloBeam: Learning Optimal Beamforming in Far-Field Holographic Metasurface Transceivers [5.402030962296633]
Holographic Metasurface Transceivers (HMTs) are emerging as cost-effective substitutes to large antenna arrays for beamforming in Millimeter and TeraHertz wave communication. To achieve desired channel gains through beamforming in HMT, phase-shifts of a large number of elements need to be appropriately set, which is challenging. We develop a learning algorithm using a fixed-budget multi-armed bandit framework to beamform and maximize received signal strength at the receiver for far-field regions.
arXiv Detail & Related papers (2023-12-30T03:29:32Z)
Equivariant Deep Weight Space Alignment [54.65847470115314]
We propose a novel framework aimed at learning to solve the weight alignment problem. We first prove that weight alignment adheres to two fundamental symmetries and then propose a deep architecture that respects these symmetries.
arXiv Detail & Related papers (2023-10-20T10:12:06Z)
Implicitly normalized forecaster with clipping for linear and non-linear heavy-tailed multi-armed bandits [85.27420062094086]
Implicitly Normalized Forecaster (INF) is considered an optimal solution for adversarial multi-armed bandit (MAB) problems. We propose a new version of INF called the Implicitly Normalized Forecaster with clipping (INFclip) for MAB problems with heavy-tailed settings. We demonstrate that INFclip is optimal for linear heavy-tailed MAB problems and works well for non-linear ones.
arXiv Detail & Related papers (2023-05-11T12:00:43Z)
Online Continuous Hyperparameter Optimization for Generalized Linear Contextual Bandits [55.03293214439741]
In contextual bandits, an agent sequentially makes actions from a time-dependent action set based on past experience. We propose the first online continuous hyperparameter tuning framework for contextual bandits. We show that it could achieve a sublinear regret in theory and performs consistently better than all existing methods on both synthetic and real datasets.
arXiv Detail & Related papers (2023-02-18T23:31:20Z)
UB3: Best Beam Identification in Millimeter Wave Systems via Pure Exploration Unimodal Bandits [7.253481390411171]
We develop an algorithm that exploits the unimodal structure of the received signal strengths of the beams to identify the best beam in a finite time. Our algorithm is named Unimodal Bandit for Best Beam (UB3) and identifies the best beam with a high probability in a few rounds.
arXiv Detail & Related papers (2022-12-26T09:24:22Z)
Fast Beam Alignment via Pure Exploration in Multi-armed Bandits [91.11360914335384]
We develop a bandit-based fast BA algorithm to reduce BA latency for millimeter-wave (mmWave) communications. Our algorithm is named Two-Phase Heteroscedastic Track-and-Stop (2PHT&S).
arXiv Detail & Related papers (2022-10-23T05:57:39Z)
Efficient Beam Search for Initial Access Using Collaborative Filtering [1.496194593196997]
Beamforming-capable antenna arrays overcome the high free-space path loss at higher carrier frequencies. The beams must be properly aligned to ensure that the highest power is radiated towards (and received by) the user equipment (UE).
arXiv Detail & Related papers (2022-09-14T14:25:56Z)
Bayesian Optimization-Based Beam Alignment for MmWave MIMO Communication Systems [1.7467279441152421]
Beam alignment (BA) is a critical issue in millimeter wave (mmWave) communication. We present a novel beam alignment scheme based on a machine learning strategy, Bayesian optimization (BO). In this work, we treat the beam alignment problem as a black-box function and then use BO to find the possible optimal beam pair.
arXiv Detail & Related papers (2022-07-28T15:37:49Z)
Smoothed Online Learning is as Easy as Statistical Learning [77.00766067963195]
We provide the first oracle-efficient, no-regret algorithms in this setting. We show that if a function class is learnable in the classical setting, then there is an oracle-efficient, no-regret algorithm for contextual bandits.
arXiv Detail & Related papers (2022-02-09T19:22:34Z)
Syndicated Bandits: A Framework for Auto Tuning Hyper-parameters in Contextual Bandit Algorithms [74.55200180156906]
The contextual bandit problem models the trade-off between exploration and exploitation. We show our Syndicated Bandits framework can achieve the optimal regret upper bounds.
arXiv Detail & Related papers (2021-06-05T22:30:21Z)
Deep Learning-based Compressive Beam Alignment in mmWave Vehicular Systems [75.77033270838926]
Vehicular channels exhibit structure that can be exploited for beam alignment with fewer channel measurements. We propose a deep learning-based technique to design a structured compressed sensing (CS) matrix.
arXiv Detail & Related papers (2021-02-27T04:38:12Z)
Beamforming Learning for mmWave Communication: Theory and Experimental Validation [23.17604790640996]
We propose a beam design technique that reduces the search time and does not require CSI while guaranteeing a minimum beamforming gain. We evaluate the efficacy of the proposed scheme in terms of building the codebook and assessing its performance through real-life measurements.
arXiv Detail & Related papers (2019-12-28T05:46:39Z)