Distributional Robustness and Regularization in Reinforcement Learning
- URL: http://arxiv.org/abs/2003.02894v2
- Date: Tue, 14 Jul 2020 06:01:03 GMT
- Title: Distributional Robustness and Regularization in Reinforcement Learning
- Authors: Esther Derman and Shie Mannor
- Abstract summary: We introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function.
It suggests using regularization as a practical tool for dealing with $\textit{external uncertainty}$ in reinforcement learning.
- Score: 62.23012916708608
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distributionally Robust Optimization (DRO) has made it possible to prove the
equivalence between robustness and regularization in classification and
regression, thus providing an analytical reason why regularization generalizes
well in statistical learning. Although DRO's extension to sequential
decision-making overcomes $\textit{external uncertainty}$ through the robust
Markov Decision Process (MDP) setting, the resulting formulation is hard to
solve, especially on large domains. On the other hand, existing regularization
methods in reinforcement learning only address $\textit{internal uncertainty}$
due to stochasticity. Our study aims to facilitate robust reinforcement
learning by establishing a dual relation between robust MDPs and
regularization. We introduce Wasserstein distributionally robust MDPs and prove
that they enjoy out-of-sample performance guarantees. Then, we introduce a new
regularizer for empirical value functions and show that it lower bounds the
Wasserstein distributionally robust value function. We extend the result to
linear value function approximation for large state spaces. Our approach
provides an alternative formulation of robustness with guaranteed finite-sample
performance. Moreover, it suggests using regularization as a practical tool for
dealing with $\textit{external uncertainty}$ in reinforcement learning methods.
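As a rough illustration of the robustness-regularization duality invoked above (a generic 1-Wasserstein bound, not the paper's exact theorem), Kantorovich-Rubinstein duality already links worst-case expectations over a Wasserstein ball to a Lipschitz-type regularizer:

```latex
% Generic sketch: f is L-Lipschitz, \hat{P} is the empirical distribution,
% and the ambiguity set is a 1-Wasserstein ball of radius \epsilon.
% For any Q with W_1(Q, \hat{P}) \le \epsilon, Kantorovich-Rubinstein gives
% |\mathbb{E}_Q[f] - \mathbb{E}_{\hat{P}}[f]| \le L \, W_1(Q, \hat{P}) \le L\epsilon,
% so the robust (worst-case) value is lower bounded by a regularized empirical value:
\inf_{Q \,:\, W_1(Q, \hat{P}) \le \epsilon} \mathbb{E}_Q[f]
  \;\ge\; \mathbb{E}_{\hat{P}}[f] \;-\; \epsilon \, \mathrm{Lip}(f)
```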
Related papers
- Statistical Inference for Temporal Difference Learning with Linear Function Approximation [62.69448336714418]
Temporal Difference (TD) learning, arguably the most widely used algorithm for policy evaluation, serves as a natural framework for such statistical inference.
In this paper, we study the consistency properties of TD learning with Polyak-Ruppert averaging and linear function approximation, and obtain three significant improvements over existing results.
arXiv Detail & Related papers (2024-10-21T15:34:44Z)
- UncertaintyRAG: Span-Level Uncertainty Enhanced Long-Context Modeling for Retrieval-Augmented Generation [93.38604803625294]
We present UncertaintyRAG, a novel approach for long-context Retrieval-Augmented Generation (RAG).
We use Signal-to-Noise Ratio (SNR)-based span uncertainty to estimate similarity between text chunks.
UncertaintyRAG outperforms baselines by 2.03% on LLaMA-2-7B, achieving state-of-the-art results.
arXiv Detail & Related papers (2024-10-03T17:39:38Z)
- Sample Complexity of Offline Distributionally Robust Linear Markov Decision Processes [37.15580574143281]
This paper studies offline reinforcement learning (RL), considering the sample complexity of distributionally robust linear Markov decision processes (MDPs) with an uncertainty set characterized by the total variation distance, using offline data.
We develop a pessimistic model-based algorithm and establish its sample complexity bound under minimal data coverage assumptions.
arXiv Detail & Related papers (2024-03-19T17:48:42Z)
- Distributionally Robust Model-based Reinforcement Learning with Large State Spaces [55.14361269378122]
Three major challenges in reinforcement learning are complex dynamical systems with large state spaces, costly data acquisition processes, and the deviation of real-world dynamics from the training environment at deployment.
We study distributionally robust Markov decision processes with continuous state spaces under the widely used Kullback-Leibler, chi-square, and total variation uncertainty sets.
We propose a model-based approach that utilizes Gaussian Processes and the maximum variance reduction algorithm to efficiently learn multi-output nominal transition dynamics.
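A minimal single-output sketch of this idea (not the paper's implementation): fit a Gaussian Process to observed transitions and query the state-action pair with the largest predictive variance. The toy dynamics and all names below are illustrative assumptions.

```python
# Sketch: learn nominal transition dynamics with a GP and pick the most
# uncertain (state, action) pair to sample next (a maximum-variance rule).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def true_dynamics(sa):
    # Hypothetical ground-truth next state: s' = s + 0.1 * a, plus noise.
    return sa[:, 0] + 0.1 * sa[:, 1] + 0.01 * rng.standard_normal(len(sa))

# Small initial dataset of (state, action) -> next-state transitions.
X = rng.uniform(-1.0, 1.0, size=(10, 2))
y = true_dynamics(X)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-4)
gp.fit(X, y)

# Among candidate queries, choose the one with the largest predictive std.
candidates = rng.uniform(-1.0, 1.0, size=(200, 2))
_, std = gp.predict(candidates, return_std=True)
next_query = candidates[np.argmax(std)]
print("most informative (s, a) to sample next:", next_query)
```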
arXiv Detail & Related papers (2023-09-05T13:42:11Z)
- Federated Distributionally Robust Optimization with Non-Convex Objectives: Algorithm and Analysis [24.64654924173679]
An asynchronous distributed algorithm named Asynchronous Single-looP alternatIve gRadient projEction (ASPIRE) is proposed.
A new uncertainty set, the constrained D-norm uncertainty set, is developed to leverage the prior distribution and flexibly control the degree of robustness.
Empirical studies on real-world datasets demonstrate that the proposed method not only achieves fast convergence but also remains robust against data heterogeneity as well as malicious attacks.
arXiv Detail & Related papers (2023-07-25T01:56:57Z)
- Robust Reinforcement Learning in Continuous Control Tasks with Uncertainty Set Regularization [17.322284328945194]
Reinforcement learning (RL) is recognized as lacking generalization and robustness under environmental perturbations.
We propose a new regularizer named $\textbf{U}$ncertainty $\textbf{S}$et $\textbf{R}$egularizer (USR).
arXiv Detail & Related papers (2022-07-05T12:56:08Z)
- Fast Distributionally Robust Learning with Variance Reduced Min-Max Optimization [85.84019017587477]
Distributionally robust supervised learning (DRSL) is emerging as a key paradigm for building reliable machine learning systems for real-world applications.
Existing algorithms for solving Wasserstein DRSL involve solving complex subproblems or fail to make use of gradients.
We revisit Wasserstein DRSL through the lens of min-max optimization and derive scalable and efficiently implementable extra-gradient algorithms.
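For context, the basic extra-gradient template for a min-max problem is sketched below on a toy bilinear saddle (the paper's variance-reduced DRSL algorithms build on this template but are more involved; the toy objective is an assumption for illustration).

```python
# Sketch: extra-gradient for min_x max_y f(x, y), with f(x, y) = x * y.
# Plain gradient descent-ascent diverges here; extra-gradient converges.
def grad_x(x, y):  # df/dx for f(x, y) = x * y
    return y

def grad_y(x, y):  # df/dy
    return x

x, y, eta = 1.0, 1.0, 0.1
for _ in range(1000):
    # 1) Extrapolation (half) step from the current point.
    x_half = x - eta * grad_x(x, y)
    y_half = y + eta * grad_y(x, y)
    # 2) Update the current point using gradients at the half point.
    x = x - eta * grad_x(x_half, y_half)
    y = y + eta * grad_y(x_half, y_half)

print(x, y)  # both approach the saddle point (0, 0)
```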
arXiv Detail & Related papers (2021-04-27T16:56:09Z)
- From Majorization to Interpolation: Distributionally Robust Learning using Kernel Smoothing [1.2891210250935146]
We study the function approximation aspect of distributionally robust optimization (DRO) based on probability metrics.
This paper instead proposes robust learning algorithms based on smooth function approximation and convolution.
arXiv Detail & Related papers (2021-02-16T22:25:18Z)
- Distributional Robustness with IPMs and links to Regularization and GANs [10.863536797169148]
We study robustness via divergence-based uncertainty sets in machine learning.
We extend our results to shed light on adversarial generative modelling via $f$-GANs.
arXiv Detail & Related papers (2020-06-08T04:41:29Z)
- A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms [67.67377846416106]
We present a distributional approach to the theoretical analysis of reinforcement learning algorithms with constant step-sizes.
We show that value-based methods such as TD($\lambda$) and $Q$-Learning have update rules which are contractive in the space of distributions of functions.
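The update rule in question is the standard constant step-size $Q$-Learning iteration, sketched below in generic textbook form (this is not the paper's analysis, and the transition used in the usage line is a hypothetical example).

```python
# Sketch: constant step-size tabular Q-Learning update.
import numpy as np

n_states, n_actions, alpha, gamma = 5, 2, 0.1, 0.9
Q = np.zeros((n_states, n_actions))

def q_learning_update(Q, s, a, r, s_next):
    # Stochastic fixed-point iteration toward Q*; the induced operator on
    # distributions of Q-functions is what the paper shows is contractive.
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

# One illustrative transition (s=0, a=1, r=1.0, s'=3).
Q = q_learning_update(Q, 0, 1, 1.0, 3)
```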
arXiv Detail & Related papers (2020-03-27T05:13:29Z)