Related papers: Online Learning of Whittle Indices for Restless Bandits with Non-Stationary Transition Kernels

Online Learning of Whittle Indices for Restless Bandits with Non-Stationary Transition Kernels

URL: http://arxiv.org/abs/2506.18186v1
Date: Sun, 22 Jun 2025 22:04:52 GMT
Title: Online Learning of Whittle Indices for Restless Bandits with Non-Stationary Transition Kernels
Authors: Md Kamran Chowdhury Shisher, Vishrant Tripathi, Mung Chiang, Christopher G. Brinton,
Abstract summary: We consider optimal resource allocation for restless multi-armed bandits (RMABs) in unknown, non-stationary settings.<n>We propose an online learning algorithm for Whittle indices in this setting.<n>Our algorithm achieves superior performance in terms of lowest cumulative regret relative to baselines in non-stationary environments.
Score: 15.044145268931624
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider optimal resource allocation for restless multi-armed bandits (RMABs) in unknown, non-stationary settings. RMABs are PSPACE-hard to solve optimally, even when all parameters are known. The Whittle index policy is known to achieve asymptotic optimality for a large class of such problems, while remaining computationally efficient. In many practical settings, however, the transition kernels required to compute the Whittle index are unknown and non-stationary. In this work, we propose an online learning algorithm for Whittle indices in this setting. Our algorithm first predicts current transition kernels by solving a linear optimization problem based on upper confidence bounds and empirical transition probabilities calculated from data over a sliding window. Then, it computes the Whittle index associated with the predicted transition kernels. We design these sliding windows and upper confidence bounds to guarantee sub-linear dynamic regret on the number of episodes $T$, under the condition that transition kernels change slowly over time (rate upper bounded by $\epsilon=1/T^k$ with $k>0$). Furthermore, our proposed algorithm and regret analysis are designed to exploit prior domain knowledge and structural information of the RMABs to accelerate the learning process. Numerical results validate that our algorithm achieves superior performance in terms of lowest cumulative regret relative to baselines in non-stationary environments.

Related papers

Near-Optimal Algorithm for Non-Stationary Kernelized Bandits [6.379833644595456]
We study a non-stationary kernelized bandit (KB) problem, also called time-varying Bayesian optimization. We show the first algorithm-independent regret lower bound for non-stationary KB with squared exponential and Mat'ern kernels. We propose a novel near-optimal algorithm called restarting phased elimination with random permutation.
arXiv Detail & Related papers (2024-10-21T14:28:26Z)
Efficient Convex Algorithms for Universal Kernel Learning [46.573275307034336]
An ideal set of kernels should: admit a linear parameterization (for tractability); dense in the set of all kernels (for accuracy) Previous algorithms for optimization of kernels were limited to classification and relied on computationally complex Semidefinite Programming (SDP) algorithms. We propose a SVD-QCQPQP algorithm which dramatically reduces the computational complexity as compared with previous SDP-based approaches.
arXiv Detail & Related papers (2023-04-15T04:57:37Z)
Accelerated First-Order Optimization under Nonlinear Constraints [61.98523595657983]
We exploit between first-order algorithms for constrained optimization and non-smooth systems to design a new class of accelerated first-order algorithms.<n>An important property of these algorithms is that constraints are expressed in terms of velocities instead of sparse variables.
arXiv Detail & Related papers (2023-02-01T08:50:48Z)
Learning to Optimize with Stochastic Dominance Constraints [103.26714928625582]
In this paper, we develop a simple yet efficient approach for the problem of comparing uncertain quantities. We recast inner optimization in the Lagrangian as a learning problem for surrogate approximation, which bypasses apparent intractability. The proposed light-SD demonstrates superior performance on several representative problems ranging from finance to supply chain management.
arXiv Detail & Related papers (2022-11-14T21:54:31Z)
An Optimization-based Algorithm for Non-stationary Kernel Bandits without Prior Knowledge [23.890686553141798]
We propose an algorithm for non-stationary kernel bandits that does not require prior knowledge of the degree of non-stationarity. Our algorithm enjoys a tighter dynamic regret bound than previous work on the non-stationary kernel bandit setting.
arXiv Detail & Related papers (2022-05-29T21:32:53Z)
Large-scale Optimization of Partial AUC in a Range of False Positive Rates [51.12047280149546]
The area under the ROC curve (AUC) is one of the most widely used performance measures for classification models in machine learning. We develop an efficient approximated gradient descent method based on recent practical envelope smoothing technique. Our proposed algorithm can also be used to minimize the sum of some ranked range loss, which also lacks efficient solvers.
arXiv Detail & Related papers (2022-03-03T03:46:18Z)
Distributed stochastic optimization with large delays [59.95552973784946]
One of the most widely used methods for solving large-scale optimization problems is distributed asynchronous gradient descent (DASGD) We show that DASGD converges to a global optimal implementation model under same delay assumptions.
arXiv Detail & Related papers (2021-07-06T21:59:49Z)
An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions. We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z)
Accelerated Message Passing for Entropy-Regularized MAP Inference [89.15658822319928]
Maximum a posteriori (MAP) inference in discrete-valued random fields is a fundamental problem in machine learning. Due to the difficulty of this problem, linear programming (LP) relaxations are commonly used to derive specialized message passing algorithms. We present randomized methods for accelerating these algorithms by leveraging techniques that underlie classical accelerated gradient.
arXiv Detail & Related papers (2020-07-01T18:43:32Z)
Taming neural networks with TUSLA: Non-convex learning via adaptive stochastic gradient Langevin algorithms [0.0]
We offer on an appropriately constructed gradient algorithm based on problematic Lange dynamics (SGLD) We also provide nonasymptotic analysis of the use of new algorithm's convergence properties. The roots of the TUSLA algorithm are based on the taming processes with developed coefficients citettamed-euler.
arXiv Detail & Related papers (2020-06-25T16:06:22Z)
Better Parameter-free Stochastic Optimization with ODE Updates for Coin-Betting [31.60239268539764]
PFSGD algorithms do not require setting learning rates while achieving optimal theoretical performance. In this paper, we close the empirical gap with a new parameter-free algorithm based on continuous-time Coin-Betting on truncated models. We show empirically that this new parameter-free algorithm outperforms algorithms with the "best default" learning rates and almost matches the performance of finely tuned baselines without anything to tune.
arXiv Detail & Related papers (2020-06-12T23:10:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.