Soft-Robust Algorithms for Batch Reinforcement Learning
- URL: http://arxiv.org/abs/2011.14495v2
- Date: Fri, 26 Feb 2021 17:46:32 GMT
- Title: Soft-Robust Algorithms for Batch Reinforcement Learning
- Authors: Elita A. Lobo, Mohammad Ghavamzadeh, Marek Petrik
- Abstract summary: In reinforcement learning, robust policies for decision-making problems with limited data are usually computed by optimizing the percentile criterion.
We show that the percentile criterion is non-convex, difficult to optimize, and ignores the mean performance.
We propose and analyze two algorithms to approximately optimize the soft-robust criterion.
- Score: 36.78967245470449
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In reinforcement learning, robust policies for high-stakes decision-making
problems with limited data are usually computed by optimizing the percentile
criterion, which minimizes the probability of a catastrophic failure.
Unfortunately, such policies are typically overly conservative as the
percentile criterion is non-convex, difficult to optimize, and ignores the mean
performance. To overcome these shortcomings, we study the soft-robust
criterion, which uses risk measures to balance the mean and percentile
criterion better. In this paper, we establish the soft-robust criterion's
fundamental properties, show that it is NP-hard to optimize, and propose and
analyze two algorithms to approximately optimize it. Our theoretical analyses
and empirical evaluations demonstrate that our algorithms compute much less
conservative solutions than the existing approximate methods for optimizing the
percentile criterion.
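To make the abstract's idea concrete, a minimal sketch of the soft-robust criterion is shown below, assuming (as the abstract suggests) that it balances mean performance against a tail risk measure such as CVaR over sampled models. The function names, the weight `lam`, and the example returns are illustrative assumptions, not the paper's actual algorithm.

```python
from statistics import mean

def cvar(values, alpha):
    """Conditional value-at-risk: average of the worst alpha-fraction of outcomes."""
    n_worst = max(1, int(len(values) * alpha))
    return mean(sorted(values)[:n_worst])

def soft_robust_objective(returns, alpha=0.1, lam=0.5):
    """Soft-robust value: lam * mean + (1 - lam) * CVaR_alpha over model samples."""
    return lam * mean(returns) + (1.0 - lam) * cvar(returns, alpha)

# Hypothetical returns of one policy under five sampled models.
returns = [1.0, 2.0, 3.0, 4.0, 10.0]
print(soft_robust_objective(returns, alpha=0.2, lam=0.5))  # 2.5
```

With `lam = 1.0` this reduces to the risk-neutral mean, and with `lam = 0.0` to a pure tail-risk criterion, which is the trade-off the abstract describes.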
Related papers
- Percentile Criterion Optimization in Offline Reinforcement Learning [22.42041973113997]
We propose a novel Value-at-Risk based dynamic programming algorithm to optimize the percentile criterion without explicitly constructing any ambiguity sets.
Our theoretical and empirical results show that our algorithm implicitly learns robust conservative policies.
arXiv Detail & Related papers (2024-04-07T19:29:09Z) - Parameter-Free Algorithms for Performative Regret Minimization under
Decision-Dependent Distributions [15.396561118589577]
Performative risk minimization is a formulation of optimization under decision-dependent distributions.
Our algorithms significantly improve upon existing methods that rely on knowledge of Lipschitz constants and distribution parameters.
We provide experimental results that demonstrate the numerical superiority of our algorithms over the existing method and other black-box optimistic optimization methods.
arXiv Detail & Related papers (2024-02-23T08:36:28Z) - Online Resource Allocation with Convex-set Machine-Learned Advice [27.662388663465006]
We introduce a parameterized class of optimal online resource allocation algorithms that strike a balance between consistent and robust ratios.
Specifically, in a C-Pareto optimal setting, we maximize the consistent ratio while ensuring that the robust ratio is at least C.
arXiv Detail & Related papers (2023-06-21T14:09:33Z) - Exploring the Algorithm-Dependent Generalization of AUPRC Optimization
with List Stability [107.65337427333064]
Optimization of the Area Under the Precision-Recall Curve (AUPRC) is a crucial problem for machine learning.
In this work, we present the first trial in the algorithm-dependent generalization of AUPRC optimization.
Experiments on three image retrieval datasets speak to the effectiveness and soundness of our framework.
arXiv Detail & Related papers (2022-09-27T09:06:37Z) - Outlier-Robust Sparse Estimation via Non-Convex Optimization [73.18654719887205]
We explore the connection between high-dimensional statistics and non-convex optimization in the presence of sparsity constraints.
We develop novel and simple optimization formulations for these problems.
As a corollary, we obtain that any first-order method that efficiently converges to stationarity yields an efficient algorithm for these tasks.
arXiv Detail & Related papers (2021-09-23T17:38:24Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds that depend on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
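The summary above refers to methods with gradient clipping; the paper's specific stepsize rules are not reproduced here, but a generic sketch of a clipped gradient step (all names and thresholds are illustrative assumptions) looks like this:

```python
import math

def clip_gradient(grad, threshold):
    """Scale the gradient down if its Euclidean norm exceeds the threshold."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= threshold or norm == 0.0:
        return list(grad)
    scale = threshold / norm
    return [g * scale for g in grad]

def clipped_sgd_step(params, grad, lr, threshold):
    """One SGD update using the clipped gradient instead of the raw one."""
    clipped = clip_gradient(grad, threshold)
    return [p - lr * g for p, g in zip(params, clipped)]

# Gradient of norm 5.0 clipped to norm 2.5 before the update.
print(clip_gradient([3.0, 4.0], 2.5))  # [1.5, 2.0]
```

Clipping bounds the influence of any single heavy-tailed gradient sample, which is what enables the high-probability guarantees the summary mentions.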
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - On the Optimality of Batch Policy Optimization Algorithms [106.89498352537682]
Batch policy optimization considers leveraging existing data for policy construction before interacting with an environment.
We show that any confidence-adjusted index algorithm is minimax optimal, whether it be optimistic, pessimistic or neutral.
We introduce a new weighted-minimax criterion that considers the inherent difficulty of optimal value prediction.
arXiv Detail & Related papers (2021-04-06T05:23:20Z) - An Asymptotically Optimal Primal-Dual Incremental Algorithm for
Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z) - Variance-Reduced Off-Policy Memory-Efficient Policy Search [61.23789485979057]
Off-policy policy optimization is a challenging problem in reinforcement learning.
Off-policy algorithms are memory-efficient and capable of learning from off-policy samples.
arXiv Detail & Related papers (2020-09-14T16:22:46Z) - Robust Reinforcement Learning using Least Squares Policy Iteration with
Provable Performance Guarantees [3.8073142980733]
This paper addresses the problem of model-free reinforcement learning for Robust Markov Decision Process (RMDP) with large state spaces.
We first propose the Robust Least Squares Policy Evaluation algorithm, which is a multi-step online model-free learning algorithm for policy evaluation.
We then propose Robust Least Squares Policy Iteration (RLSPI) algorithm for learning the optimal robust policy.
arXiv Detail & Related papers (2020-06-20T16:26:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.