Robust Q-Learning for finite ambiguity sets
- URL: http://arxiv.org/abs/2407.04259v1
- Date: Fri, 5 Jul 2024 05:19:36 GMT
- Title: Robust Q-Learning for finite ambiguity sets
- Authors: Cécile Decker, Julian Sester
- Abstract summary: We propose a novel $Q$-learning algorithm that solves distributionally robust Markov decision problems.
Our approach goes beyond the well-studied cases involving ambiguity sets of balls around some reference measure.
- Score: 2.3020018305241337
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we propose a novel $Q$-learning algorithm that solves distributionally robust Markov decision problems in which the ambiguity set of probability measures can be chosen arbitrarily, as long as it contains only finitely many measures. Our approach therefore goes beyond the well-studied cases involving ambiguity sets of balls around some reference measure, where the distance to the reference measure is measured with respect to the Wasserstein distance or the Kullback--Leibler divergence. Hence, our approach allows the user to create ambiguity sets better tailored to her needs and to solve the associated robust Markov decision problem via a $Q$-learning algorithm whose convergence is guaranteed by our main result. Moreover, we showcase the tractability of our approach in several numerical experiments.
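To make the finite-ambiguity-set idea concrete, here is a minimal tabular sketch of a robust Q-update in Python. It assumes the finitely many candidate transition kernels are known and can be evaluated exactly, so the worst-case one-step target can be computed directly; the paper's actual algorithm is sample-based, and all names here (`robust_q_learning`, the epsilon-greedy rule, the reference-kernel simulation) are illustrative assumptions, not the authors' construction.

```python
import numpy as np

def robust_q_learning(kernels, reward, gamma=0.95, alpha=0.1,
                      n_steps=5000, eps=0.1, rng=None):
    """Tabular Q-learning sketch for a distributionally robust MDP whose
    ambiguity set is the finite collection `kernels` of transition kernels.

    kernels : list of arrays, each of shape (S, A, S), giving P_k(s' | s, a)
    reward  : array of shape (S, A)
    """
    rng = rng or np.random.default_rng(0)
    n_states, n_actions = reward.shape
    Q = np.zeros((n_states, n_actions))
    s = int(rng.integers(n_states))
    for _ in range(n_steps):
        # epsilon-greedy exploration on the current robust Q-function
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
        # robust Bellman target: worst case over the finitely many measures
        targets = [reward[s, a] + gamma * P[s, a] @ Q.max(axis=1) for P in kernels]
        Q[s, a] += alpha * (min(targets) - Q[s, a])
        # advance the simulation under one kernel (here: the first, as a
        # stand-in reference measure -- an assumption of this sketch)
        s = int(rng.choice(n_states, p=kernels[0][s, a]))
    return Q
```

Because the minimum runs over a finite list, the inner worst-case step needs no projection onto a Wasserstein or KL ball, which is the tractability point the abstract emphasizes.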
Related papers
- Best Arm Identification with Minimal Regret [55.831935724659175]
The best arm identification problem elegantly amalgamates regret minimization and BAI.
The agent's goal is to identify the best arm with a prescribed confidence level.
The Double KL-UCB algorithm achieves optimality as the confidence level tends to zero.
arXiv Detail & Related papers (2024-09-27T16:46:02Z) - Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of the arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - On Bellman's principle of optimality and Reinforcement learning for safety-constrained Markov decision process [0.0]
We study optimality for the safety-constrained Markov decision process which is the underlying framework for safe reinforcement learning.
We construct a modified $Q$-learning algorithm for learning the Lagrangian from data.
arXiv Detail & Related papers (2023-02-25T20:36:41Z) - Robust $Q$-learning Algorithm for Markov Decision Processes under Wasserstein Uncertainty [5.639904484784127]
We present a novel $Q$-learning algorithm tailored to solve distributionally robust Markov decision problems.
We prove convergence of the presented algorithm and demonstrate the benefits of considering distributional robustness when solving optimal control problems.
arXiv Detail & Related papers (2022-09-30T10:01:04Z) - Optimal variance-reduced stochastic approximation in Banach spaces [114.8734960258221]
We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space.
We establish non-asymptotic bounds for both the operator defect and the estimation error.
arXiv Detail & Related papers (2022-01-21T02:46:57Z) - Regret Analysis in Deterministic Reinforcement Learning [78.31410227443102]
We study the problem of regret minimization, which is central to the analysis and design of optimal learning algorithms.
We present logarithmic problem-specific regret lower bounds that explicitly depend on the system parameter.
arXiv Detail & Related papers (2021-06-27T23:41:57Z) - Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
The agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z) - Approximating Euclidean by Imprecise Markov Decision Processes [3.0017241250121383]
We investigate what kind of approximation guarantees are obtained when the Euclidean process is approximated by finite-state approximations.
We show that for cost functions over finite time horizons the approximations become arbitrarily precise.
arXiv Detail & Related papers (2020-06-26T11:58:04Z) - High-Dimensional Robust Mean Estimation via Gradient Descent [73.61354272612752]
We show that the problem of robust mean estimation in the presence of a constant fraction of adversarial outliers can be solved by gradient descent.
Our work establishes an intriguing connection between high-dimensional robust statistics and non-convex optimization.
arXiv Detail & Related papers (2020-05-04T10:48:04Z) - Finite-Time Analysis of Round-Robin Kullback-Leibler Upper Confidence Bounds for Optimal Adaptive Allocation with Multiple Plays and Markovian Rewards [10.66048003460524]
We study an extension of the classic multi-armed bandit problem which involves multiple plays and Markovian rewards in the rested bandits setting.
In order to tackle this problem, we consider an adaptive allocation rule which at each stage combines the information from the sample means of all the arms with the Kullback-Leibler upper confidence bound of a single arm selected in a round-robin way (a minimal sketch of this rule follows this entry).
arXiv Detail & Related papers (2020-01-30T08:09:01Z)
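As a rough illustration of that round-robin rule, the sketch below computes a KL-UCB index for a single arm per stage, assuming Bernoulli rewards and the common exploration function f(t) = log t, then plays the m arms with the highest resulting values. The helper names (`kl_bernoulli`, `kl_ucb_index`, `select_arms`) and the choice of f(t) are assumptions of this sketch; the paper's Markovian-reward analysis involves machinery not shown here.

```python
import math

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), with clipping."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, pulls, t):
    """Largest q with pulls * KL(mean, q) <= log t, found by bisection.
    Assumes pulls >= 1; f(t) = log t is one standard exploration choice."""
    target = math.log(max(t, 2)) / pulls
    lo, hi = mean, 1.0
    for _ in range(30):
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= target:
            lo = mid
        else:
            hi = mid
    return lo

def select_arms(means, pulls, t, m):
    """Round-robin KL-UCB rule: inflate one arm's index per stage via its
    KL upper confidence bound, keep sample means for the rest, then play
    the m arms with the highest resulting indices."""
    k = len(means)
    idx = list(means)   # sample means for all arms
    r = t % k           # arm whose index is inflated at this stage
    idx[r] = kl_ucb_index(means[r], pulls[r], t)
    return sorted(range(k), key=lambda a: idx[a], reverse=True)[:m]
```

Updating only one confidence bound per stage keeps the per-step cost at a single bisection while still guaranteeing that every arm's index is periodically refreshed.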
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences.