Related papers: Model-Based Learning of Whittle indices

Model-Based Learning of Whittle indices

URL: http://arxiv.org/abs/2511.20397v1
Date: Tue, 25 Nov 2025 15:21:00 GMT
Title: Model-Based Learning of Whittle indices
Authors: Joël Charles-Rebuffé, Nicolas Gast, Bruno Gaujal,
Abstract summary: We present BLINQ, a new model-based algorithm that learns the Whittle indices of an indexable, communicating and unichain Markov Decision Process (MDP)<n>BLINQ significantly outperforms existing Q-learning approaches in terms of the number of samples needed to get an accurate approximation.<n>It has a total computational cost even lower than Q-learning for any reasonably high number of samples.
Score: 5.830619388189558
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: We present BLINQ, a new model-based algorithm that learns the Whittle indices of an indexable, communicating and unichain Markov Decision Process (MDP). Our approach relies on building an empirical estimate of the MDP and then computing its Whittle indices using an extended version of a state-of-the-art existing algorithm. We provide a proof of convergence to the Whittle indices we want to learn as well as a bound on the time needed to learn them with arbitrary precision. Moreover, we investigate its computational complexity. Our numerical experiments suggest that BLINQ significantly outperforms existing Q-learning approaches in terms of the number of samples needed to get an accurate approximation. In addition, it has a total computational cost even lower than Q-learning for any reasonably high number of samples. These observations persist even when the Q-learning algorithms are speeded up using pre-trained neural networks to predict Q-values.

Related papers

Asymptotic Analysis of Sample-averaged Q-learning [2.2374171443798034]
This paper introduces a framework for time-varying batch-averaged Q-learning, termed sample-averaged Q-learning (SA-QL)<n>We leverage the functional central limit of the sample-averaged algorithm under mild conditions and develop a random scaling method for interval estimation.<n>This work establishes a unified theoretical foundation for sample-averaged Q-learning, providing insights into effective batch scheduling and statistical inference for RL algorithms.
arXiv Detail & Related papers (2024-10-14T17:17:19Z)
Faster Q-Learning Algorithms for Restless Bandits [0.0]
We study the Whittle index learning algorithm for restless multi-armed bandits (RMAB) We first present Q-learning algorithm and its variants -- speedy Q-learning (RMAB), generalized speedy Q-learning (G) and phase Q-learning (PhaseQL)
arXiv Detail & Related papers (2024-09-06T20:55:07Z)
Whittle Index Learning Algorithms for Restless Bandits with Constant Stepsizes [3.3918638314432945]
We study the Whittle index learning algorithm for restless multi-armed bandits. We first present Q-learning algorithm with exploration policies -- epsilon-greedy, softmax, epsilon-softmax with constant stepsizes. We extend the study of Q-learning to index learning for single-armed restless bandit.
arXiv Detail & Related papers (2024-09-06T20:24:19Z)
Tabular and Deep Learning for the Whittle Index [0.2749898166276853]
We present QWI and QWINN, two reinforcement learning algorithms for learning the Whittle index for the total discounted criterion. In our main theoretical result we show that QWI converges to the real Whittle indices. For QWINN, we show that all local minima of the Bellman error are locally stable equilibria. Numerical computations show that QWI and QWINN converge faster than the standard Q -learning algorithm.
arXiv Detail & Related papers (2024-06-04T07:41:15Z)
Pointer Networks with Q-Learning for Combinatorial Optimization [55.2480439325792]
We introduce the Pointer Q-Network (PQN), a hybrid neural architecture that integrates model-free Q-value policy approximation with Pointer Networks (Ptr-Nets) Our empirical results demonstrate the efficacy of this approach, also testing the model in unstable environments.
arXiv Detail & Related papers (2023-11-05T12:03:58Z)
Quantization-aware Interval Bound Propagation for Training Certifiably Robust Quantized Neural Networks [58.195261590442406]
We study the problem of training and certifying adversarially robust quantized neural networks (QNNs) Recent work has shown that floating-point neural networks that have been verified to be robust can become vulnerable to adversarial attacks after quantization. We present quantization-aware interval bound propagation (QA-IBP), a novel method for training robust QNNs.
arXiv Detail & Related papers (2022-11-29T13:32:38Z)
A didactic approach to quantum machine learning with a single qubit [68.8204255655161]
We focus on the case of learning with a single qubit, using data re-uploading techniques. We implement the different proposed formulations in toy and real-world datasets using the qiskit quantum computing SDK.
arXiv Detail & Related papers (2022-11-23T18:25:32Z)
Towards Diverse Evaluation of Class Incremental Learning: A Representation Learning Perspective [67.45111837188685]
Class incremental learning (CIL) algorithms aim to continually learn new object classes from incrementally arriving data. We experimentally analyze neural network models trained by CIL algorithms using various evaluation protocols in representation learning.
arXiv Detail & Related papers (2022-06-16T11:44:11Z)
Cross Learning in Deep Q-Networks [82.20059754270302]
We propose a novel cross Q-learning algorithm, aim at alleviating the well-known overestimation problem in value-based reinforcement learning methods. Our algorithm builds on double Q-learning, by maintaining a set of parallel models and estimate the Q-value based on a randomly selected network.
arXiv Detail & Related papers (2020-09-29T04:58:17Z)
Momentum Q-learning with Finite-Sample Convergence Guarantee [49.38471009162477]
This paper analyzes a class of momentum-based Q-learning algorithms with finite-sample guarantee. We establish the convergence guarantee for MomentumQ with linear function approximations and Markovian sampling. We demonstrate through various experiments that the proposed MomentumQ outperforms other momentum-based Q-learning algorithms.
arXiv Detail & Related papers (2020-07-30T12:27:03Z)
Q-Learning with Differential Entropy of Q-Tables [4.221871357181261]
We conjecture that the reduction in performance during prolonged training sessions of Q-learning is caused by a loss of information. We introduce Differential Entropy of Q-tables (DE-QT) as an external information loss detector to the Q-learning algorithm.
arXiv Detail & Related papers (2020-06-26T04:37:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.