Soft-Robust Algorithms for Batch Reinforcement Learning
- URL: http://arxiv.org/abs/2011.14495v2
- Date: Fri, 26 Feb 2021 17:46:32 GMT
- Title: Soft-Robust Algorithms for Batch Reinforcement Learning
- Authors: Elita A. Lobo, Mohammad Ghavamzadeh, Marek Petrik
- Abstract summary: In reinforcement learning, robust policies for decision-making problems with limited data are usually computed by optimizing the percentile criterion.
We show that the percentile criterion is non-convex, difficult to optimize, and ignores the mean performance.
We propose and analyze two algorithms to approximately optimize the soft-robust criterion.
- Score: 36.78967245470449
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In reinforcement learning, robust policies for high-stakes decision-making
problems with limited data are usually computed by optimizing the percentile
criterion, which minimizes the probability of a catastrophic failure.
Unfortunately, such policies are typically overly conservative as the
percentile criterion is non-convex, difficult to optimize, and ignores the mean
performance. To overcome these shortcomings, we study the soft-robust
criterion, which uses risk measures to balance the mean and percentile
criterion better. In this paper, we establish the soft-robust criterion's
fundamental properties, show that it is NP-hard to optimize, and propose and
analyze two algorithms to approximately optimize it. Our theoretical analyses
and empirical evaluations demonstrate that our algorithms compute much less
conservative solutions than the existing approximate methods for optimizing the
percentile criterion.
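To make the abstract's idea concrete, a minimal sketch of the soft-robust criterion is shown below, assuming (as the abstract suggests) that it balances mean performance against a tail risk measure such as CVaR over sampled models. The function names, the weight `lam`, and the example returns are illustrative assumptions, not the paper's actual algorithm.

```python
from statistics import mean

def cvar(values, alpha):
    """Conditional value-at-risk: average of the worst alpha-fraction of outcomes."""
    n_worst = max(1, int(len(values) * alpha))
    return mean(sorted(values)[:n_worst])

def soft_robust_objective(returns, alpha=0.1, lam=0.5):
    """Soft-robust value: lam * mean + (1 - lam) * CVaR_alpha over model samples."""
    return lam * mean(returns) + (1.0 - lam) * cvar(returns, alpha)

# Hypothetical returns of one policy under five sampled models.
returns = [1.0, 2.0, 3.0, 4.0, 10.0]
print(soft_robust_objective(returns, alpha=0.2, lam=0.5))  # 2.5
```

With `lam = 1.0` this reduces to the risk-neutral mean, and with `lam = 0.0` to a pure tail-risk criterion, which is the trade-off the abstract describes.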
Related papers
- Percentile Criterion Optimization in Offline Reinforcement Learning [22.42041973113997]
We propose a novel Value-at-Risk based dynamic programming algorithm to optimize the percentile criterion without explicitly constructing any ambiguity sets.
Our theoretical and empirical results show that our algorithm implicitly learns robust conservative policies.
arXiv Detail & Related papers (2024-04-07T19:29:09Z) - Parameter-Free Algorithms for Performative Regret Minimization under
Decision-Dependent Distributions [15.396561118589577]
Performative risk minimization is a formulation of optimization under decision-dependent distributions.
Our algorithms significantly improve upon existing methods that rely on knowledge of Lipschitz constants and distribution parameters.
We provide experimental results that demonstrate the numerical superiority of our algorithms over the existing method and other black-box optimistic optimization methods.
arXiv Detail & Related papers (2024-02-23T08:36:28Z) - Online Resource Allocation with Convex-set Machine-Learned Advice [27.662388663465006]
We introduce a parameterized class of optimal online resource allocation algorithms that strike a balance between consistent and robust ratios.
Specifically, in a C-Pareto optimal setting, we maximize the consistent ratio while ensuring that the robust ratio is at least C.
arXiv Detail & Related papers (2023-06-21T14:09:33Z) - Exploring the Algorithm-Dependent Generalization of AUPRC Optimization
with List Stability [107.65337427333064]
Optimization of the Area Under the Precision-Recall Curve (AUPRC) is a crucial problem for machine learning.
In this work, we present the first trial in the algorithm-dependent generalization of AUPRC optimization.
Experiments on three image retrieval datasets speak to the effectiveness and soundness of our framework.
arXiv Detail & Related papers (2022-09-27T09:06:37Z) - Outlier-Robust Sparse Estimation via Non-Convex Optimization [73.18654719887205]
We explore the connection between high-dimensional statistics and non-convex optimization in the presence of sparsity constraints.
We develop novel and simple optimization formulations for these problems.
As a corollary, we obtain that any first-order method that efficiently converges to stationarity yields an efficient algorithm for these tasks.
arXiv Detail & Related papers (2021-09-23T17:38:24Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds that depend on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
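The summary above refers to methods with gradient clipping; the paper's specific stepsize rules are not reproduced here, but a generic sketch of a clipped gradient step (all names and thresholds are illustrative assumptions) looks like this:

```python
import math

def clip_gradient(grad, threshold):
    """Scale the gradient down if its Euclidean norm exceeds the threshold."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= threshold or norm == 0.0:
        return list(grad)
    scale = threshold / norm
    return [g * scale for g in grad]

def clipped_sgd_step(params, grad, lr, threshold):
    """One SGD update using the clipped gradient instead of the raw one."""
    clipped = clip_gradient(grad, threshold)
    return [p - lr * g for p, g in zip(params, clipped)]

# Gradient of norm 5.0 clipped to norm 2.5 before the update.
print(clip_gradient([3.0, 4.0], 2.5))  # [1.5, 2.0]
```

Clipping bounds the influence of any single heavy-tailed gradient sample, which is what enables the high-probability guarantees the summary mentions.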
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - On the Optimality of Batch Policy Optimization Algorithms [106.89498352537682]
Batch policy optimization considers leveraging existing data for policy construction before interacting with an environment.
We show that any confidence-adjusted index algorithm is minimax optimal, whether it be optimistic, pessimistic or neutral.
We introduce a new weighted-minimax criterion that considers the inherent difficulty of optimal value prediction.
arXiv Detail & Related papers (2021-04-06T05:23:20Z) - An Asymptotically Optimal Primal-Dual Incremental Algorithm for
Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z) - Variance-Reduced Off-Policy Memory-Efficient Policy Search [61.23789485979057]
Off-policy policy optimization is a challenging problem in reinforcement learning.
Off-policy algorithms are memory-efficient and capable of learning from off-policy samples.
arXiv Detail & Related papers (2020-09-14T16:22:46Z) - Robust Reinforcement Learning using Least Squares Policy Iteration with
Provable Performance Guarantees [3.8073142980733]
This paper addresses the problem of model-free reinforcement learning for Robust Markov Decision Process (RMDP) with large state spaces.
We first propose the Robust Least Squares Policy Evaluation algorithm, which is a multi-step online model-free learning algorithm for policy evaluation.
We then propose Robust Least Squares Policy Iteration (RLSPI) algorithm for learning the optimal robust policy.
arXiv Detail & Related papers (2020-06-20T16:26:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.