Tabular and Deep Learning for the Whittle Index
- URL: http://arxiv.org/abs/2406.02057v1
- Date: Tue, 4 Jun 2024 07:41:15 GMT
- Title: Tabular and Deep Learning for the Whittle Index
- Authors: Francisco Robledo RelaƱo, Vivek Borkar, Urtzi Ayesta, Konstantin Avrachenkov,
- Abstract summary: We present QWI and QWINN, two reinforcement learning algorithms for learning the Whittle index for the total discounted criterion.
In our main theoretical result we show that QWI converges to the real Whittle indices.
For QWINN, we show that all local minima of the Bellman error are locally stable equilibria.
Numerical computations show that QWI and QWINN converge faster than the standard Q -learning algorithm.
- Score: 0.2749898166276853
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Whittle index policy is a heuristic that has shown remarkably good performance (with guaranteed asymptotic optimality) when applied to the class of problems known as Restless Multi-Armed Bandit Problems (RMABPs). In this paper we present QWI and QWINN, two reinforcement learning algorithms, respectively tabular and deep, to learn the Whittle index for the total discounted criterion. The key feature is the use of two time-scales, a faster one to update the state-action Q -values, and a relatively slower one to update the Whittle indices. In our main theoretical result we show that QWI, which is a tabular implementation, converges to the real Whittle indices. We then present QWINN, an adaptation of QWI algorithm using neural networks to compute the Q -values on the faster time-scale, which is able to extrapolate information from one state to another and scales naturally to large state-space environments. For QWINN, we show that all local minima of the Bellman error are locally stable equilibria, which is the first result of its kind for DQN-based schemes. Numerical computations show that QWI and QWINN converge faster than the standard Q -learning algorithm, neural-network based approximate Q-learning and other state of the art algorithms.
Related papers
- Model-Based Learning of Whittle indices [5.830619388189558]
We present BLINQ, a new model-based algorithm that learns the Whittle indices of an indexable, communicating and unichain Markov Decision Process (MDP)<n>BLINQ significantly outperforms existing Q-learning approaches in terms of the number of samples needed to get an accurate approximation.<n>It has a total computational cost even lower than Q-learning for any reasonably high number of samples.
arXiv Detail & Related papers (2025-11-25T15:21:00Z) - HiQ-Lip: The First Quantum-Classical Hierarchical Method for Global Lipschitz Constant Estimation of ReLU Networks [2.44505480142099]
Estimating the global Lipschitz constant of neural networks is crucial for understanding and improving their robustness and generalization capabilities.
We propose textbfHiQ-Lip, a hybrid quantum-classical hierarchical method that leverages Coherent Ising Machines (CIMs) to estimate the global Lipschitz constant.
arXiv Detail & Related papers (2025-03-20T16:58:40Z) - Faster Q-Learning Algorithms for Restless Bandits [0.0]
We study the Whittle index learning algorithm for restless multi-armed bandits (RMAB)
We first present Q-learning algorithm and its variants -- speedy Q-learning (RMAB), generalized speedy Q-learning (G) and phase Q-learning (PhaseQL)
arXiv Detail & Related papers (2024-09-06T20:55:07Z) - Whittle Index Learning Algorithms for Restless Bandits with Constant Stepsizes [3.3918638314432945]
We study the Whittle index learning algorithm for restless multi-armed bandits.
We first present Q-learning algorithm with exploration policies -- epsilon-greedy, softmax, epsilon-softmax with constant stepsizes.
We extend the study of Q-learning to index learning for single-armed restless bandit.
arXiv Detail & Related papers (2024-09-06T20:24:19Z) - Time Elastic Neural Networks [2.1756081703276]
We introduce and detail an atypical neural network architecture, called time elastic neural network (teNN)
The novelty compared to classical neural network architecture is that it explicitly incorporates time warping ability.
We demonstrate that, during the training process, the teNN succeeds in reducing the number of neurons required within each cell.
arXiv Detail & Related papers (2024-05-27T09:01:30Z) - Finite-Time Analysis of Whittle Index based Q-Learning for Restless
Multi-Armed Bandits with Neural Network Function Approximation [13.30475927566957]
We present Neural-Q-Whittle, a Whittle index based Q-learning algorithm for RMAB with neural network function approximation.
Despite the empirical success of deep Q-learning, the non-asymptotic convergence rate of Neural-Q-Whittle remains unclear.
arXiv Detail & Related papers (2023-10-03T15:34:21Z) - Problem-Dependent Power of Quantum Neural Networks on Multi-Class
Classification [83.20479832949069]
Quantum neural networks (QNNs) have become an important tool for understanding the physical world, but their advantages and limitations are not fully understood.
Here we investigate the problem-dependent power of QCs on multi-class classification tasks.
Our work sheds light on the problem-dependent power of QNNs and offers a practical tool for evaluating their potential merit.
arXiv Detail & Related papers (2022-12-29T10:46:40Z) - A Stable, Fast, and Fully Automatic Learning Algorithm for Predictive
Coding Networks [65.34977803841007]
Predictive coding networks are neuroscience-inspired models with roots in both Bayesian statistics and neuroscience.
We show how by simply changing the temporal scheduling of the update rule for the synaptic weights leads to an algorithm that is much more efficient and stable than the original one.
arXiv Detail & Related papers (2022-11-16T00:11:04Z) - On the Global Convergence of Fitted Q-Iteration with Two-layer Neural
Network Parametrization [33.12181620473604]
We study a Fitted Q-Iteration with two-layer ReLU neural network parametrization, and find the sample complexity guarantees for the algorithm.
We show that this approach achieves a sample complexity of $tildemathcalO (1/epsilon2)$, which is order-optimal.
arXiv Detail & Related papers (2022-11-14T19:00:24Z) - Towards Better Out-of-Distribution Generalization of Neural Algorithmic
Reasoning Tasks [51.8723187709964]
We study the OOD generalization of neural algorithmic reasoning tasks.
The goal is to learn an algorithm from input-output pairs using deep neural networks.
arXiv Detail & Related papers (2022-11-01T18:33:20Z) - Finite-Time Analysis for Double Q-learning [50.50058000948908]
We provide the first non-asymptotic, finite-time analysis for double Q-learning.
We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an $epsilon$-accurate neighborhood of the global optimum.
arXiv Detail & Related papers (2020-09-29T18:48:21Z) - Momentum Q-learning with Finite-Sample Convergence Guarantee [49.38471009162477]
This paper analyzes a class of momentum-based Q-learning algorithms with finite-sample guarantee.
We establish the convergence guarantee for MomentumQ with linear function approximations and Markovian sampling.
We demonstrate through various experiments that the proposed MomentumQ outperforms other momentum-based Q-learning algorithms.
arXiv Detail & Related papers (2020-07-30T12:27:03Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study a distributed variable for large-scale AUC for a neural network as with a deep neural network.
Our model requires a much less number of communication rounds and still a number of communication rounds in theory.
Our experiments on several datasets show the effectiveness of our theory and also confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.