Robust $Q$-learning Algorithm for Markov Decision Processes under Wasserstein Uncertainty
- URL: http://arxiv.org/abs/2210.00898v3
- Date: Thu, 20 Jun 2024 15:50:17 GMT
- Title: Robust $Q$-learning Algorithm for Markov Decision Processes under Wasserstein Uncertainty
- Authors: Ariel Neufeld, Julian Sester,
- Abstract summary: We present a novel $Q$-learning algorithm tailored to solve distributionally robust Markov decision problems.
We prove convergence of the presented algorithm as well as the benefits of considering distributional robustness when solving optimal control problems.
- Score: 5.639904484784127
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel $Q$-learning algorithm tailored to solve distributionally robust Markov decision problems where the corresponding ambiguity set of transition probabilities for the underlying Markov decision process is a Wasserstein ball around a (possibly estimated) reference measure. We prove convergence of the presented algorithm and provide several examples also using real data to illustrate both the tractability of our algorithm as well as the benefits of considering distributional robustness when solving stochastic optimal control problems, in particular when the estimated distributions turn out to be misspecified in practice.
Related papers
- Deep Learning Methods for S Shaped Utility Maximisation with a Random Reference Point [0.0]
We develop several numerical methods for solving the problem using deep learning and duality methods.
We use deep learning methods to solve the associated Hamilton-Jacobi-Bellman equation for both the primal and dual problems.
We compare the solution of this non-concave problem to that of concavified utility, a random function depending on the benchmark, in both complete and incomplete markets.
arXiv Detail & Related papers (2024-10-07T22:07:59Z) - Robust Q-Learning for finite ambiguity sets [2.3020018305241337]
We propose a novel $Q$-learning algorithm allowing to solve distributionally robust Markov decision problems.
Our approach goes beyond the well-studied cases involving ambiguity sets of balls around some reference measure.
arXiv Detail & Related papers (2024-07-05T05:19:36Z) - Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms.
In particular, we present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Learning to Optimize with Stochastic Dominance Constraints [103.26714928625582]
In this paper, we develop a simple yet efficient approach for the problem of comparing uncertain quantities.
We recast inner optimization in the Lagrangian as a learning problem for surrogate approximation, which bypasses apparent intractability.
The proposed light-SD demonstrates superior performance on several representative problems ranging from finance to supply chain management.
arXiv Detail & Related papers (2022-11-14T21:54:31Z) - Bi-objective Ranking and Selection Using Stochastic Kriging [0.0]
We consider bi-objective ranking and selection problems in which the two objective outcomes have been observed with uncertainty.
We propose a novel Bayesian bi-objective ranking and selection method that sequentially allocates extra samples to competitive solutions.
Experimental results show that the proposed method outperforms the standard allocation method, as well as a well-known state-of-the-art algorithm.
arXiv Detail & Related papers (2022-09-05T23:51:07Z) - Optimal variance-reduced stochastic approximation in Banach spaces [114.8734960258221]
We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space.
We establish non-asymptotic bounds for both the operator defect and the estimation error.
arXiv Detail & Related papers (2022-01-21T02:46:57Z) - Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
Agent sequentially selects actions and, from the resulting system trajectory, aims at the best as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z) - Machine Unlearning via Algorithmic Stability [31.809738694273623]
We study the problem of machine unlearning and identify a notion of Total Variation (TV) stability.
For convex risk minimization problems, we design TV-stable algorithms based on noisy Gradient Descent (SGD)
Our techniques to generalize our algorithms are differentially private as well.
arXiv Detail & Related papers (2021-02-25T21:16:56Z) - Stein Variational Model Predictive Control [130.60527864489168]
Decision making under uncertainty is critical to real-world, autonomous systems.
Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex distributions.
We show that this framework leads to successful planning in challenging, non optimal control problems.
arXiv Detail & Related papers (2020-11-15T22:36:59Z) - Approximating Euclidean by Imprecise Markov Decision Processes [3.0017241250121383]
We investigate what kind of approximation guarantees are obtained when the Euclidean process is approximated by finite state approximations.
We show that for cost functions over finite time horizons the approximations become arbitrarily precise.
arXiv Detail & Related papers (2020-06-26T11:58:04Z) - Stochastic Saddle-Point Optimization for Wasserstein Barycenters [69.68068088508505]
We consider the populationimation barycenter problem for random probability measures supported on a finite set of points and generated by an online stream of data.
We employ the structure of the problem and obtain a convex-concave saddle-point reformulation of this problem.
In the setting when the distribution of random probability measures is discrete, we propose an optimization algorithm and estimate its complexity.
arXiv Detail & Related papers (2020-06-11T19:40:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.