Online Robust Reinforcement Learning with Model Uncertainty
- URL: http://arxiv.org/abs/2109.14523v1
- Date: Wed, 29 Sep 2021 16:17:47 GMT
- Title: Online Robust Reinforcement Learning with Model Uncertainty
- Authors: Yue Wang, Shaofeng Zou
- Abstract summary: We develop a sample-based approach to estimate the unknown uncertainty set and design robust Q-learning and robust TDC algorithms.
For the robust Q-learning algorithm, we prove that it converges to the optimal robust Q function, and for the robust TDC algorithm, we prove that it converges to some stationary points.
Our approach can be readily extended to robustify many other algorithms, e.g., TD, SARSA, and other GTD algorithms.
- Score: 24.892994430374912
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Robust reinforcement learning (RL) aims to find a policy that optimizes the worst-case performance over an uncertainty set of MDPs. In this paper, we focus on model-free robust RL, where the uncertainty set is defined to be centered at a misspecified MDP that generates a single sample trajectory sequentially
and is assumed to be unknown. We develop a sample-based approach to estimate
the unknown uncertainty set and design a robust Q-learning algorithm (tabular
case) and robust TDC algorithm (function approximation setting), which can be
implemented in an online and incremental fashion. For the robust Q-learning
algorithm, we prove that it converges to the optimal robust Q function, and for
the robust TDC algorithm, we prove that it converges asymptotically to some
stationary points. Unlike the results in [Roy et al., 2017], our algorithms do not need any additional conditions on the discount factor to guarantee convergence. We further characterize the finite-time error bounds of the two algorithms and show that both the robust Q-learning and robust TDC algorithms converge as fast as their vanilla counterparts (within a constant factor). Our
numerical experiments further demonstrate the robustness of our algorithms. Our
approach can be readily extended to robustify many other algorithms, e.g., TD,
SARSA, and other GTD algorithms.
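
For the tabular case, the abstract's description suggests an update of roughly the following shape. This is a hedged sketch assuming an R-contamination uncertainty set (a standard choice in model-free robust RL, where nature may redirect probability mass R arbitrarily); the environment interface and all hyperparameters are illustrative, not the paper's exact construction.

```python
import numpy as np

def robust_q_learning(env, num_states, num_actions, R=0.1, gamma=0.9,
                      alpha=0.1, num_steps=100_000, eps=0.1):
    """Sketch of an online robust Q-learning update for a tabular MDP with an
    R-contamination uncertainty set: nature can move probability mass R from
    the nominal kernel to an arbitrary kernel, so the worst case places that
    mass on the lowest-value state. `env` is assumed to expose Gym-style
    reset() -> s and step(a) -> (s_next, r, done); R, gamma, alpha, and eps
    are illustrative hyperparameters."""
    Q = np.zeros((num_states, num_actions))
    rng = np.random.default_rng(0)
    s = env.reset()
    for _ in range(num_steps):
        # epsilon-greedy behavior policy on the current robust Q estimate
        a = rng.integers(num_actions) if rng.random() < eps else int(Q[s].argmax())
        s_next, r, done = env.step(a)
        v = Q.max(axis=1)  # V(s) = max_a Q(s, a)
        # Robust Bellman target: weight (1 - R) on the sampled next state,
        # weight R on the globally worst state value.
        target = r + gamma * ((1.0 - R) * v[s_next] + R * v.min())
        Q[s, a] += alpha * (target - Q[s, a])
        s = env.reset() if done else s_next
    return Q
```

The same worst-case correction term can be dropped into other temporal-difference updates, which is how the abstract's claim about robustifying TD, SARSA, and GTD algorithms would play out in practice.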
Related papers
- Q-learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis [30.713243690224207]
In Markov decision processes (MDPs), quantile risk measures such as Value-at-Risk are a standard metric for modeling RL agents' preferences for certain outcomes.
This paper proposes a new Q-learning algorithm for quantile optimization in MDPs with strong convergence and performance guarantees.
arXiv Detail & Related papers (2024-10-31T16:53:20Z)
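
The entry above does not spell out its update rule. As background, the basic online primitive for quantile objectives is the stochastic quantile-regression (pinball-loss) step, shown here in isolation; the target stream and parameters are illustrative, and this is not a reconstruction of the paper's algorithm.

```python
import numpy as np

def quantile_td_update(q, target, tau, lr=0.05):
    """One stochastic quantile-regression (pinball-loss) step: nudges the
    estimate q toward the tau-quantile of the target distribution. This is
    the subgradient step for rho_tau(u) = u * (tau - 1{u < 0})."""
    indicator = 1.0 if target < q else 0.0
    return q - lr * (indicator - tau)  # +lr*tau if target >= q, else -lr*(1-tau)

# Illustration: estimate the 0.1-quantile (a VaR-style statistic) of a noisy
# return stream; the estimate settles near the true quantile.
rng = np.random.default_rng(0)
q = 0.0
for _ in range(50_000):
    sample = rng.normal(loc=1.0, scale=1.0)  # stand-in for a sampled return
    q = quantile_td_update(q, sample, tau=0.1)
print(q)  # close to the true 0.1-quantile of N(1, 1), about -0.28
```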
- Provably Efficient UCB-type Algorithms For Learning Predictive State Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs).
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
arXiv Detail & Related papers (2023-07-01T18:35:21Z)
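
The bonus construction for PSRs is more involved, but the idea of a bonus that dominates the total variation error between estimated and true models can be illustrated in the simpler tabular setting, which is an assumption here: a count-based L1 concentration bound (Weissman et al.-style) for an empirical transition model.

```python
import numpy as np

def l1_confidence_bonus(counts, delta=0.05):
    """Count-based bonus for a tabular transition model: with probability at
    least 1 - delta, the L1 distance between the empirical distribution
    p_hat(.|s, a) built from `counts[s, a, s_next]` and the true one is
    bounded by the returned value. Total variation distance is half of
    the L1 distance, so this bonus upper-bounds 2 * TV."""
    d = counts.shape[-1]                     # number of next states
    n = np.maximum(counts.sum(axis=-1), 1)   # visit counts per (s, a)
    return np.sqrt(2.0 * (d * np.log(2.0) + np.log(1.0 / delta)) / n)

# Optimistic planning then adds this bonus (scaled by the value-function
# span) to the estimated rewards before solving the empirical model.
```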
- Policy Gradient Algorithms for Robust MDPs with Non-Rectangular Uncertainty Sets [10.26382228865201]
We propose policy gradient algorithms for robust infinite-horizon Markov decision processes (MDPs) with non-rectangular uncertainty sets.
The corresponding robust MDPs cannot be solved with dynamic programming techniques and are in fact provably intractable.
We thus present the first complete solution scheme for robust MDPs with non-rectangular uncertainty sets offering global optimality guarantees.
arXiv Detail & Related papers (2023-05-30T13:02:25Z)
- Stochastic Ratios Tracking Algorithm for Large Scale Machine Learning Problems [0.7614628596146599]
We propose a novel algorithm for adaptive step length selection in the classical SGD framework.
Under reasonable conditions, the algorithm produces step lengths in line with well-established theoretical requirements.
We show that the algorithm can generate step lengths comparable to the best step length obtained from manual tuning.
arXiv Detail & Related papers (2023-05-17T06:22:11Z)
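
The summary above does not give the selection rule. As a point of reference only, one well-known automatic step-length scheme for SGD is the Barzilai-Borwein step computed once per epoch (as in SGD-BB of Tan et al., 2016); the sketch below follows that scheme, not the stochastic-ratios algorithm of the paper, and `grad` and all constants are illustrative.

```python
import numpy as np

def sgd_bb(grad, x0, num_epochs=50, steps_per_epoch=100, eta0=0.01):
    """SGD with a Barzilai-Borwein (BB) step length recomputed each epoch.
    `grad(x)` is assumed to return a stochastic gradient at x."""
    x = x0.copy()
    eta = eta0
    x_prev, g_prev = None, None
    for _ in range(num_epochs):
        x_start = x.copy()
        g_avg = np.zeros_like(x)
        for _ in range(steps_per_epoch):
            g = grad(x)
            g_avg += g / steps_per_epoch
            x = x - eta * g                 # inner SGD steps with current eta
        if x_prev is not None:
            s = x_start - x_prev            # iterate difference across epochs
            y = g_avg - g_prev              # averaged-gradient difference
            denom = abs(s @ y)
            if denom > 1e-12:
                # BB1 step, scaled down by the epoch length
                eta = (s @ s) / denom / steps_per_epoch
        x_prev, g_prev = x_start, g_avg
    return x
```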
- On Practical Robust Reinforcement Learning: Practical Uncertainty Set and Double-Agent Algorithm [11.748284119769039]
Robust reinforcement learning (RRL) aims at seeking a robust policy to optimize the worst-case performance over an uncertainty set of Markov decision processes (MDPs).
arXiv Detail & Related papers (2023-05-11T08:52:09Z)
- Fully Stochastic Trust-Region Sequential Quadratic Programming for Equality-Constrained Optimization Problems [62.83783246648714]
We propose a trust-region stochastic sequential quadratic programming algorithm (TR-StoSQP) to solve nonlinear optimization problems with stochastic objectives and deterministic equality constraints.
The algorithm adaptively selects the trust-region radius and, compared to the existing line-search StoSQP schemes, allows us to utilize indefinite Hessian matrices.
arXiv Detail & Related papers (2022-11-29T05:52:17Z)
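
For orientation, the classical deterministic mechanism for choosing the trust-region radius is the ratio test below; the fully stochastic scheme in the paper above replaces this test (it avoids objective evaluations), so treat this only as background, with conventional constants.

```python
def update_trust_region(radius, actual_red, pred_red,
                        eta_low=0.25, eta_high=0.75,
                        shrink=0.5, grow=2.0, max_radius=10.0):
    """Classical trust-region radius update: compare the actual objective
    reduction against the reduction predicted by the local quadratic model,
    then shrink, grow, or keep the radius accordingly."""
    rho = actual_red / pred_red if pred_red > 0 else 0.0
    if rho < eta_low:      # model was a poor predictor: shrink the region
        return shrink * radius
    if rho > eta_high:     # model was accurate: allow a larger step
        return min(grow * radius, max_radius)
    return radius          # otherwise keep the radius unchanged
```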
- Instance-Dependent Confidence and Early Stopping for Reinforcement Learning [99.57168572237421]
Various algorithms for reinforcement learning (RL) exhibit dramatic variation in their convergence rates as a function of problem structure.
This research provides guarantees that explain ex post the performance differences observed.
A natural next step is to convert these theoretical guarantees into guidelines that are useful in practice.
arXiv Detail & Related papers (2022-01-21T04:25:35Z)
- Large-Scale Methods for Distributionally Robust Optimization [53.98643772533416]
We prove that our algorithms require a number of gradient evaluations independent of the training set size and number of parameters. Experiments on MNIST and ImageNet confirm the theoretical scaling of our algorithms, which are 9-36 times more efficient than full-batch methods.
arXiv Detail & Related papers (2020-10-12T17:41:44Z)
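
A per-iteration cost independent of the training set size typically comes from running SGD on a dual reformulation of the DRO objective with minibatches. The sketch below does this for the CVaR form of DRO; the choice of uncertainty set, the `loss_and_grad` and `sample_batch` interfaces, and the hyperparameters are assumptions for illustration, not necessarily the paper's formulation.

```python
import numpy as np

def cvar_dro_sgd(loss_and_grad, sample_batch, theta0, alpha=0.1,
                 lr=0.01, num_steps=10_000, batch_size=32):
    """Minibatch SGD on the CVaR dual form of a DRO objective:
        min_{theta, eta}  eta + (1/alpha) * E[ max(loss(theta; xi) - eta, 0) ].
    Each step touches only `batch_size` samples, so the per-iteration cost is
    independent of the dataset size. `loss_and_grad(theta, xi)` is assumed to
    return (loss_value, grad_wrt_theta) for a single sample xi."""
    theta, eta = theta0.copy(), 0.0
    for _ in range(num_steps):
        batch = sample_batch(batch_size)
        g_theta = np.zeros_like(theta)
        g_eta = 1.0
        for xi in batch:
            loss, grad = loss_and_grad(theta, xi)
            if loss > eta:                 # only tail samples contribute
                g_theta += grad / (alpha * batch_size)
                g_eta -= 1.0 / (alpha * batch_size)
        theta -= lr * g_theta
        eta -= lr * g_eta
    return theta, eta
```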
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- A Dynamical Systems Approach for Convergence of the Bayesian EM Algorithm [59.99439951055238]
We show how (discrete-time) Lyapunov stability theory can serve as a powerful tool to aid, or even lead, in the analysis (and potential design) of optimization algorithms that are not necessarily gradient-based.
The particular ML problem that this paper focuses on is that of parameter estimation in an incomplete-data Bayesian framework via the popular optimization algorithm known as maximum a posteriori expectation-maximization (MAP-EM).
We show that fast convergence (linear or quadratic) is achieved, which could have been difficult to unveil without our adopted S&C approach.
arXiv Detail & Related papers (2020-06-23T01:34:18Z)