Model-Free Robust Average-Reward Reinforcement Learning
- URL: http://arxiv.org/abs/2305.10504v1
- Date: Wed, 17 May 2023 18:19:23 GMT
- Title: Model-Free Robust Average-Reward Reinforcement Learning
- Authors: Yue Wang, Alvaro Velasquez, George Atia, Ashley Prater-Bennette,
Shaofeng Zou
- Abstract summary: We focus on robust average-reward MDPs under the model-free setting.
We design two model-free algorithms, robust relative value iteration (RVI) TD and robust RVI Q-learning, and theoretically prove their convergence to the optimal solution.
- Score: 25.125481838479256
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Robust Markov decision processes (MDPs) address the challenge of model
uncertainty by optimizing the worst-case performance over an uncertainty set of
MDPs. In this paper, we focus on the robust average-reward MDPs under the
model-free setting. We first theoretically characterize the structure of
solutions to the robust average-reward Bellman equation, which is essential for
our later convergence analysis. We then design two model-free algorithms,
robust relative value iteration (RVI) TD and robust RVI Q-learning, and
theoretically prove their convergence to the optimal solution. We provide
several widely used uncertainty sets as examples, including those defined by
the contamination model, total variation, Chi-squared divergence,
Kullback-Leibler (KL) divergence and Wasserstein distance.
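As a concrete illustration of the model-free update, below is a minimal tabular sketch of robust RVI Q-learning under the contamination uncertainty set, for which the worst-case next-state value admits a simple single-sample estimate, (1 - delta) * max_a' Q(s', a') + delta * min_s max_a Q(s, a). The gymnasium-style environment interface, the step size, the exploration rule, and the reference pair (s_ref, a_ref) are illustrative assumptions here, not the paper's exact construction.
```python
import numpy as np

def robust_rvi_q_learning(env, n_states, n_actions,
                          delta=0.1,       # contamination level of the uncertainty set
                          alpha=0.05,      # step size
                          eps=0.1,         # exploration probability
                          n_steps=100_000,
                          s_ref=0, a_ref=0):
    """Hedged sketch: tabular robust RVI Q-learning, delta-contamination set,
    assuming a continuing (non-episodic) finite MDP with a gymnasium-style API."""
    Q = np.zeros((n_states, n_actions))
    s, _ = env.reset()
    for _ in range(n_steps):
        # epsilon-greedy behavior policy for exploration
        a = np.random.randint(n_actions) if np.random.rand() < eps else int(Q[s].argmax())
        s_next, r, _, _, _ = env.step(a)

        V = Q.max(axis=1)  # greedy value of each state
        # Single-sample estimate of the worst-case next-state value under the
        # delta-contamination set: (1 - delta) * V(s') + delta * min_s V(s).
        sigma = (1.0 - delta) * V[s_next] + delta * V.min()

        # RVI-style update: subtract the reference entry Q[s_ref, a_ref] so the
        # iterates stay bounded (relative value iteration trick).
        td_target = r - Q[s_ref, a_ref] + sigma
        Q[s, a] += alpha * (td_target - Q[s, a])
        s = s_next
    return Q
```
Swapping in one of the other uncertainty sets from the abstract (total variation, chi-squared, KL, Wasserstein) would amount to replacing the `sigma` estimate with the corresponding worst-case value estimator.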
Related papers
- Annealed Stein Variational Gradient Descent for Improved Uncertainty Estimation in Full-Waveform Inversion [25.714206592953545]
Variational Inference (VI) provides an approximate solution to the posterior distribution in the form of a parametric or non-parametric proposal distribution.
This study aims to improve the performance of VI within the context of Full-Waveform Inversion.
arXiv Detail & Related papers (2024-10-17T06:15:26Z) - On Convergence of Average-Reward Q-Learning in Weakly Communicating Markov Decision Processes [11.868402302316131]
This paper analyzes reinforcement learning (RL) algorithms for Markov decision processes (MDPs) under the average-reward criterion.
We focus on Q-learning algorithms based on relative value iteration (RVI), which are model-free counterparts of the classical RVI method for weakly communicating MDPs.
arXiv Detail & Related papers (2024-08-29T04:57:44Z) - Total Uncertainty Quantification in Inverse PDE Solutions Obtained with Reduced-Order Deep Learning Surrogate Models [50.90868087591973]
We propose an approximate Bayesian method for quantifying the total uncertainty in inverse PDE solutions obtained with machine learning surrogate models.
We test the proposed framework by comparing it with the iterative ensemble smoother and deep ensembling methods for a non-linear diffusion equation.
arXiv Detail & Related papers (2024-08-20T19:06:02Z) - The Curious Price of Distributional Robustness in Reinforcement Learning with a Generative Model [61.87673435273466]
This paper investigates model robustness in reinforcement learning (RL) to reduce the sim-to-real gap in practice.
We adopt the framework of distributionally robust Markov decision processes (RMDPs), aimed at learning a policy that optimizes the worst-case performance when the deployed environment falls within a prescribed uncertainty set around the nominal MDP.
arXiv Detail & Related papers (2023-05-26T02:32:03Z) - Sample Complexity of Robust Reinforcement Learning with a Generative
Model [0.0]
We propose a model-based reinforcement learning (RL) algorithm for learning an $\epsilon$-optimal robust policy.
We consider three different forms of uncertainty sets, characterized by the total variation distance, chi-square divergence, and KL divergence.
In addition to the sample complexity results, we also present a formal analytical argument on the benefit of using robust policies.
arXiv Detail & Related papers (2021-12-02T18:55:51Z) - Modeling the Second Player in Distributionally Robust Optimization [90.25995710696425]
We argue for the use of neural generative models to characterize the worst-case distribution.
This approach poses a number of implementation and optimization challenges.
We find that the proposed approach yields models that are more robust than comparable baselines.
arXiv Detail & Related papers (2021-03-18T14:26:26Z) - Stein Variational Model Predictive Control [130.60527864489168]
Decision making under uncertainty is critical to real-world, autonomous systems.
Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex distributions.
We show that this framework leads to successful planning in challenging, non-convex optimal control problems.
arXiv Detail & Related papers (2020-11-15T22:36:59Z) - Wasserstein Distributionally Robust Inverse Multiobjective Optimization [14.366265951396587]
We develop a Wasserstein distributionally robust inverse multiobjective optimization problem (WRO-IMOP).
We show that the excess risk of the WRO-IMOP estimator has a sub-linear convergence rate.
We demonstrate the effectiveness of our method on both a synthetic multiobjective quadratic program and a real world portfolio optimization problem.
arXiv Detail & Related papers (2020-09-30T10:44:07Z) - Robust, Accurate Stochastic Optimization for Variational Inference [68.83746081733464]
We show that common optimization methods lead to poor variational approximations if the problem is moderately large.
Motivated by these findings, we develop a more robust and accurate optimization framework by viewing the underlying algorithm as producing a Markov chain.
arXiv Detail & Related papers (2020-09-01T19:12:11Z) - Distributional Robustness and Regularization in Reinforcement Learning [62.23012916708608]
We introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function.
It suggests using regularization as a practical tool for dealing with external uncertainty in reinforcement learning.
arXiv Detail & Related papers (2020-03-05T19:56:23Z)