Finite-Time Analysis of Whittle Index based Q-Learning for Restless
Multi-Armed Bandits with Neural Network Function Approximation
- URL: http://arxiv.org/abs/2310.02147v1
- Date: Tue, 3 Oct 2023 15:34:21 GMT
- Title: Finite-Time Analysis of Whittle Index based Q-Learning for Restless
Multi-Armed Bandits with Neural Network Function Approximation
- Authors: Guojun Xiong, Jian Li
- Abstract summary: We present Neural-Q-Whittle, a Whittle index based Q-learning algorithm for RMAB with neural network function approximation.
Despite the empirical success of deep Q-learning, the non-asymptotic convergence rate of Neural-Q-Whittle remains unclear.
- Score: 13.30475927566957
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Whittle index policy is a heuristic to the intractable restless multi-armed
bandits (RMAB) problem. Although it is provably asymptotically optimal, finding
Whittle indices remains difficult. In this paper, we present Neural-Q-Whittle,
a Whittle index based Q-learning algorithm for RMAB with neural network
function approximation, which is an example of nonlinear two-timescale
stochastic approximation with Q-function values updated on a faster timescale
and Whittle indices on a slower timescale. Despite the empirical success of
deep Q-learning, the non-asymptotic convergence rate of Neural-Q-Whittle, which
couples neural networks with two-timescale Q-learning largely remains unclear.
This paper provides a finite-time analysis of Neural-Q-Whittle, where data are
generated from a Markov chain, and Q-function is approximated by a ReLU neural
network. Our analysis leverages a Lyapunov drift approach to capture the
evolution of two coupled parameters, and the nonlinearity in value function
approximation further requires us to characterize the approximation error.
Combing these provide Neural-Q-Whittle with $\mathcal{O}(1/k^{2/3})$
convergence rate, where $k$ is the number of iterations.
Related papers
- Quantum-Enhanced Neural Contextual Bandit Algorithms [50.880384999888044]
This paper introduces the Quantum Neural Tangent Kernel-Upper Confidence Bound (QNTK-UCB) algorithm.<n>QNTK-UCB is a novel algorithm that leverages the Quantum Neural Tangent Kernel (QNTK) to address these limitations.
arXiv Detail & Related papers (2026-01-06T09:58:14Z) - Non-asymptotic convergence analysis of the stochastic gradient
Hamiltonian Monte Carlo algorithm with discontinuous stochastic gradient with
applications to training of ReLU neural networks [8.058385158111207]
We provide a non-asymptotic analysis of the convergence of the gradient Hamiltonian Monte Carlo to a target measure in Wasserstein-1 and Wasserstein-2 distance.
To illustrate our main results, we consider numerical experiments on quantile estimation and on several problems involving ReLU neural networks relevant in finance and artificial intelligence.
arXiv Detail & Related papers (2024-09-25T17:21:09Z) - Stochastic Gradient Descent for Two-layer Neural Networks [2.0349026069285423]
This paper presents a study on the convergence rates of the descent (SGD) algorithm when applied to overparameterized two-layer neural networks.
Our approach combines the Tangent Kernel (NTK) approximation with convergence analysis in the Reproducing Kernel Space (RKHS) generated by NTK.
Our research framework enables us to explore the intricate interplay between kernel methods and optimization processes, shedding light on the dynamics and convergence properties of neural networks.
arXiv Detail & Related papers (2024-07-10T13:58:57Z) - Tabular and Deep Learning for the Whittle Index [0.2749898166276853]
We present QWI and QWINN, two reinforcement learning algorithms for learning the Whittle index for the total discounted criterion.
In our main theoretical result we show that QWI converges to the real Whittle indices.
For QWINN, we show that all local minima of the Bellman error are locally stable equilibria.
Numerical computations show that QWI and QWINN converge faster than the standard Q -learning algorithm.
arXiv Detail & Related papers (2024-06-04T07:41:15Z) - A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametricized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(alpha-1)$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - Analyzing Convergence in Quantum Neural Networks: Deviations from Neural
Tangent Kernels [20.53302002578558]
A quantum neural network (QNN) is a parameterized mapping efficiently implementable on near-term Noisy Intermediate-Scale Quantum (NISQ) computers.
Despite the existing empirical and theoretical investigations, the convergence of QNN training is not fully understood.
arXiv Detail & Related papers (2023-03-26T22:58:06Z) - Wasserstein Flow Meets Replicator Dynamics: A Mean-Field Analysis of Representation Learning in Actor-Critic [137.04558017227583]
Actor-critic (AC) algorithms, empowered by neural networks, have had significant empirical success in recent years.
We take a mean-field perspective on the evolution and convergence of feature-based neural AC.
We prove that neural AC finds the globally optimal policy at a sublinear rate.
arXiv Detail & Related papers (2021-12-27T06:09:50Z) - Estimation of the Mean Function of Functional Data via Deep Neural
Networks [6.230751621285321]
We propose a deep neural network method to perform nonparametric regression for functional data.
The proposed method is applied to analyze positron emission tomography images of patients with Alzheimer disease.
arXiv Detail & Related papers (2020-12-08T17:18:16Z) - Finite-Time Analysis for Double Q-learning [50.50058000948908]
We provide the first non-asymptotic, finite-time analysis for double Q-learning.
We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an $epsilon$-accurate neighborhood of the global optimum.
arXiv Detail & Related papers (2020-09-29T18:48:21Z) - Momentum Q-learning with Finite-Sample Convergence Guarantee [49.38471009162477]
This paper analyzes a class of momentum-based Q-learning algorithms with finite-sample guarantee.
We establish the convergence guarantee for MomentumQ with linear function approximations and Markovian sampling.
We demonstrate through various experiments that the proposed MomentumQ outperforms other momentum-based Q-learning algorithms.
arXiv Detail & Related papers (2020-07-30T12:27:03Z) - Optimal Rates for Averaged Stochastic Gradient Descent under Neural
Tangent Kernel Regime [50.510421854168065]
We show that the averaged gradient descent can achieve the minimax optimal convergence rate.
We show that the target function specified by the NTK of a ReLU network can be learned at the optimal convergence rate.
arXiv Detail & Related papers (2020-06-22T14:31:37Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study a distributed variable for large-scale AUC for a neural network as with a deep neural network.
Our model requires a much less number of communication rounds and still a number of communication rounds in theory.
Our experiments on several datasets show the effectiveness of our theory and also confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - A Generalized Neural Tangent Kernel Analysis for Two-layer Neural
Networks [87.23360438947114]
We show that noisy gradient descent with weight decay can still exhibit a " Kernel-like" behavior.
This implies that the training loss converges linearly up to a certain accuracy.
We also establish a novel generalization error bound for two-layer neural networks trained by noisy gradient descent with weight decay.
arXiv Detail & Related papers (2020-02-10T18:56:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.