A policy gradient approach for optimization of smooth risk measures
- URL: http://arxiv.org/abs/2202.11046v4
- Date: Sun, 23 Jun 2024 10:03:38 GMT
- Title: A policy gradient approach for optimization of smooth risk measures
- Authors: Nithia Vijayan, Prashanth L. A.
- Abstract summary: We consider episodic Markov decision processes, and model the risk using the broad class of smooth risk measures of the cumulative discounted reward.
We propose two template policy gradient algorithms that optimize a smooth risk measure in on-policy and off-policy RL settings.
- Score: 8.087699764574788
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose policy gradient algorithms for solving a risk-sensitive reinforcement learning (RL) problem in on-policy as well as off-policy settings. We consider episodic Markov decision processes, and model the risk using the broad class of smooth risk measures of the cumulative discounted reward. We propose two template policy gradient algorithms that optimize a smooth risk measure in on-policy and off-policy RL settings, respectively. We derive non-asymptotic bounds that quantify the rate of convergence of our proposed algorithms to a stationary point of the smooth risk measure. As special cases, we establish that our algorithms apply to optimization of mean-variance and distortion risk measures, respectively.
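To make the on-policy template more concrete, the sketch below performs REINFORCE-style likelihood-ratio gradient ascent on the mean-variance objective J(theta) = E[G] - lambda * Var(G), one of the smooth risk measures covered as a special case in the abstract. The toy two-action MDP, the one-parameter softmax policy, and the lambda trade-off weight are illustrative assumptions; this is a minimal sketch of the estimator family, not the paper's algorithm.

```python
# A minimal, illustrative sketch (not the paper's algorithm): REINFORCE-style
# likelihood-ratio gradient ascent on the mean-variance objective
#   J(theta) = E[G] - RISK_LAMBDA * Var(G),
# where G is the cumulative discounted reward of an episode. The toy two-action
# episodic MDP, the softmax policy, and RISK_LAMBDA are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
GAMMA, HORIZON, RISK_LAMBDA = 0.99, 10, 0.1

def rollout(theta):
    """Run one episode with a state-independent softmax policy over 2 actions.
    Returns the discounted return G and the accumulated score sum_t d/dtheta log pi(a_t)."""
    logits = np.array([0.0, theta])
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    G, score = 0.0, 0.0
    for t in range(HORIZON):
        a = rng.choice(2, p=probs)
        # Toy reward: action 1 pays more on average but is noisier.
        r = rng.normal(1.0, 2.0) if a == 1 else rng.normal(0.5, 0.1)
        G += (GAMMA ** t) * r
        # d/dtheta log pi(a) = 1{a=1} - pi(1) for this one-parameter softmax.
        score += (1.0 if a == 1 else 0.0) - probs[1]
    return G, score

def mean_variance_gradient(theta, n_episodes=300):
    """Likelihood-ratio estimate of grad_theta [ E[G] - RISK_LAMBDA * Var(G) ]."""
    G, S = map(np.array, zip(*(rollout(theta) for _ in range(n_episodes))))
    grad_mean = np.mean(G * S)                          # grad of E[G]
    grad_second = np.mean(G ** 2 * S)                   # grad of E[G^2]
    grad_var = grad_second - 2.0 * G.mean() * grad_mean  # grad of Var(G)
    return grad_mean - RISK_LAMBDA * grad_var

theta = 0.0
for it in range(50):   # plain stochastic gradient ascent on the smooth risk measure
    theta += 0.05 * mean_variance_gradient(theta)
print("final policy parameter:", theta)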
Related papers
- Spectral-Risk Safe Reinforcement Learning with Convergence Guarantees [13.470544618339506]
We propose a spectral risk measure-constrained RL algorithm, spectral-risk-constrained policy optimization (SRCPO).
In the bilevel optimization structure, the outer problem involves optimizing dual variables derived from the risk measures, while the inner problem involves finding an optimal policy.
The proposed method was evaluated on continuous control tasks and achieved the best performance among RCRL algorithms that satisfy the constraints.
arXiv Detail & Related papers (2024-05-29T02:17:25Z) - Risk-Sensitive RL with Optimized Certainty Equivalents via Reduction to Standard RL [48.1726560631463]
We study Risk-Sensitive Reinforcement Learning with the Optimized Certainty Equivalent (OCE) risk.
We propose two general meta-algorithms via reductions to standard RL.
We show that the resulting algorithm learns the optimal risk-sensitive policy in settings where prior algorithms provably fail.
arXiv Detail & Related papers (2024-03-10T21:45:12Z) - Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC), that can be applied to either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z) - High-probability sample complexities for policy evaluation with linear function approximation [88.87036653258977]
We investigate the sample complexities required to guarantee a predefined estimation error of the best linear coefficients for two widely-used policy evaluation algorithms.
We establish the first sample complexity bound with high-probability convergence guarantee that attains the optimal dependence on the tolerance level.
arXiv Detail & Related papers (2023-05-30T12:58:39Z) - Distributional Method for Risk Averse Reinforcement Learning [0.0]
We introduce a distributional method for learning the optimal policy in risk-averse Markov decision processes.
We assume sequential observations of states, actions, and costs and assess the performance of a policy using dynamic risk measures.
arXiv Detail & Related papers (2023-02-27T19:48:42Z) - Efficient Risk-Averse Reinforcement Learning [79.61412643761034]
In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.
We prove that, under certain conditions, optimizing this risk measure directly inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypass it.
We demonstrate improved risk aversion in maze navigation, autonomous driving, and resource allocation benchmarks.
arXiv Detail & Related papers (2022-05-10T19:40:52Z) - Reinforcement Learning with Dynamic Convex Risk Measures [0.0]
We develop an approach for solving time-consistent risk-sensitive optimization problems using model-free reinforcement learning (RL).
We employ a time-consistent dynamic programming principle to determine the value of a particular policy, and develop policy gradient update rules.
arXiv Detail & Related papers (2021-12-26T16:41:05Z) - Policy Gradient Methods for Distortion Risk Measures [9.554545881355377]
We propose policy gradient algorithms which learn risk-sensitive policies in a reinforcement learning framework.
We derive a variant of the policy gradient theorem that caters to the DRM objective, and integrate it with a likelihood ratio-based gradient estimation scheme; a short sketch of estimating a DRM from sampled returns is given after this list.
arXiv Detail & Related papers (2021-07-09T13:14:12Z) - Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average-reward setting with the variance risk criterion.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
arXiv Detail & Related papers (2020-12-28T05:02:26Z) - Variance-Reduced Off-Policy Memory-Efficient Policy Search [61.23789485979057]
Off-policy policy optimization is a challenging problem in reinforcement learning.
The proposed algorithms are memory-efficient and capable of learning from off-policy samples.
arXiv Detail & Related papers (2020-09-14T16:22:46Z) - Entropic Risk Constrained Soft-Robust Policy Optimization [12.362670630646805]
It is important in high-stakes domains to quantify and manage risk induced by model uncertainties.
We propose entropic risk constrained policy gradient and actor-critic algorithms that are risk-averse to model uncertainty.
arXiv Detail & Related papers (2020-06-20T23:48:28Z)
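As a companion to the distortion risk measure (DRM) entries above, and to the DRM special case in the main abstract, the following sketch estimates a DRM of non-negative episodic returns by re-weighting order statistics with increments of a distortion function. The CVaR-style distortion function, the Gamma-distributed toy returns, and the sample size are illustrative assumptions rather than any paper's exact estimator.

```python
# A hedged sketch (assumptions, not the estimator from any specific paper above):
# estimate a distortion risk measure (DRM) of non-negative episodic returns by
# re-weighting order statistics with increments of a distortion function g.
# With g(s) = s the estimate reduces to the sample mean; the CVaR-style g below
# emphasises the worst (lowest) returns.
import numpy as np

def cvar_distortion(s, alpha=0.25):
    """Distortion function whose DRM equals the lower-tail CVaR at level alpha."""
    return np.maximum((s - (1.0 - alpha)) / alpha, 0.0)

def drm_estimate(returns, g=cvar_distortion):
    """Sample DRM estimate: sum_i G_(i) * [g((n-i+1)/n) - g((n-i)/n)],
    with G_(1) <= ... <= G_(n) the sorted returns (assumed non-negative)."""
    g_sorted = np.sort(np.asarray(returns, dtype=float))
    n = len(g_sorted)
    levels = np.arange(n, -1, -1) / n           # (n/n, (n-1)/n, ..., 0)
    weights = g(levels[:-1]) - g(levels[1:])    # g((n-i+1)/n) - g((n-i)/n)
    return float(np.dot(g_sorted, weights))

rng = np.random.default_rng(0)
sample_returns = rng.gamma(shape=2.0, scale=5.0, size=10_000)
print("mean return    :", sample_returns.mean())
print("CVaR-style DRM :", drm_estimate(sample_returns))  # lower than the mean
```

With the identity distortion g(s) = s the same estimator reduces to the sample mean, which is one simple way to sanity-check an implementation.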