Risk-Averse Constrained Reinforcement Learning with Optimized Certainty Equivalents
- URL: http://arxiv.org/abs/2510.20199v1
- Date: Thu, 23 Oct 2025 04:33:32 GMT
- Title: Risk-Averse Constrained Reinforcement Learning with Optimized Certainty Equivalents
- Authors: Jane H. Lee, Baturay Saglam, Spyridon Pougkakiotis, Amin Karbasi, Dionysis Kalogerias
- Abstract summary: Constrained optimization provides a common framework for dealing with conflicting objectives in reinforcement learning (RL). We propose a framework for risk-aware constrained RL, which exhibits per-stage robustness properties jointly in reward values and time using optimized certainty equivalents (OCEs). Our framework ensures an exact equivalent to the original constrained problem within a parameterized strong Lagrangian duality framework under appropriate constraint qualifications.
- Score: 29.698100324454362
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Constrained optimization provides a common framework for dealing with conflicting objectives in reinforcement learning (RL). In most of these settings, the objectives (and constraints) are expressed through the expected accumulated reward. However, this formulation neglects risky or even possibly catastrophic events at the tails of the reward distribution, and is often insufficient for high-stakes applications in which the risk involved in outliers is critical. In this work, we propose a framework for risk-aware constrained RL, which exhibits per-stage robustness properties jointly in reward values and time using optimized certainty equivalents (OCEs). Our framework ensures an exact equivalent to the original constrained problem within a parameterized strong Lagrangian duality framework under appropriate constraint qualifications, and yields a simple algorithmic recipe which can be wrapped around standard RL solvers, such as PPO. Lastly, we establish the convergence of the proposed algorithm under common assumptions, and verify the risk-aware properties of our approach through several numerical experiments.
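
The abstract does not spell out what an OCE is, so the following is a minimal illustrative sketch, not the authors' algorithm. It assumes the standard reward-form definition OCE_u(X) = sup_t { t + E[u(X - t)] } and picks the utility u(x) = min(x, 0) / alpha, for which the OCE reduces to the lower-tail conditional value-at-risk (the mean of the worst alpha-fraction of rewards). The function name `oce_cvar` and the sample data are hypothetical.

```python
def oce_cvar(rewards, alpha):
    """Empirical OCE of a reward sample with u(x) = min(x, 0) / alpha.

    Assumption: OCE_u(X) = sup_t { t + E[u(X - t)] }; with this utility
    the value is the lower-tail CVaR at level alpha (0 < alpha <= 1).
    """
    xs = [float(r) for r in rewards]
    n = len(xs)

    def objective(t):
        # t + (1 / alpha) * mean(min(x - t, 0)) over the sample
        return t + sum(min(x - t, 0.0) for x in xs) / (alpha * n)

    # The objective is concave and piecewise linear in t with breakpoints
    # at the sample values, so its maximum is attained at one of them.
    return max(objective(t) for t in set(xs))


if __name__ == "__main__":
    sample = range(10)               # rewards 0, 1, ..., 9
    print(oce_cvar(sample, 0.2))     # mean of the two worst rewards (0 and 1)
    print(oce_cvar(sample, 1.0))     # alpha = 1 recovers the plain mean
```

Smaller values of alpha put more weight on the worst outcomes, which is the tail sensitivity that an expected-reward objective lacks.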
Related papers
- BAPO: Boundary-Aware Policy Optimization for Reliable Agentic Search [72.87861928940929]
Boundary-Aware Policy Optimization (BAPO) is a novel RL framework designed to cultivate reliable boundary awareness without compromising accuracy.
BAPO introduces two key components: (i) a group-based boundary-aware reward that encourages an IDK response only when the reasoning reaches its limit, and (ii) an adaptive reward modulator that strategically suspends this reward during early exploration, preventing the model from exploiting IDK as a shortcut.
arXiv Detail & Related papers (2026-01-16T07:06:58Z)
- RiskPO: Risk-based Policy Optimization via Verifiable Reward for LLM Post-Training [13.309653291779233]
Reinforcement learning with verifiable reward has emerged as a central paradigm for post-training large language models (LLMs).
We argue that these issues stem from overemphasizing high-probability output sequences while neglecting rare but informative reasoning paths.
We propose Risk-based Policy Optimization (RiskPO), which substitutes classical mean-based objectives with principled risk measures.
arXiv Detail & Related papers (2025-10-01T13:53:09Z)
- Nonconvex Regularization for Feature Selection in Reinforcement Learning [7.408148824204063]
This work proposes an efficient batch algorithm for feature selection in reinforcement learning (RL) with theoretical convergence guarantees.
Numerical experiments demonstrate that the proposed approach substantially outperforms state-selection scenarios.
arXiv Detail & Related papers (2025-09-19T06:21:20Z)
- Rectified Robust Policy Optimization for Model-Uncertain Constrained Reinforcement Learning without Strong Duality [53.525547349715595]
We propose a novel primal-only algorithm called Rectified Robust Policy Optimization (RRPO).
RRPO operates directly on the primal problem without relying on dual formulations.
We show convergence to an approximately optimal feasible policy with complexity matching the best-known lower bound.
arXiv Detail & Related papers (2025-08-24T16:59:38Z)
- Risk-Averse Best Arm Set Identification with Fixed Budget and Fixed Confidence [0.562479170374811]
We introduce a novel problem setting in bandit optimization that addresses maximizing expected reward and minimizing associated uncertainty.
We propose a unified meta-algorithmic framework capable of operating under both fixed-confidence and fixed-budget regimes.
Our approach outperforms existing methods in terms of both accuracy and sample efficiency.
arXiv Detail & Related papers (2025-06-27T14:21:03Z)
- Spectral-Risk Safe Reinforcement Learning with Convergence Guarantees [13.470544618339506]
We propose a spectral risk measure-constrained RL algorithm, spectral-risk-constrained policy optimization (SRCPO).
In the bilevel optimization structure, the outer problem involves optimizing dual variables derived from the risk measures, while the inner problem involves finding an optimal policy.
The proposed method has been evaluated on continuous control tasks and showed the best performance among other RCRL algorithms satisfying the constraints.
arXiv Detail & Related papers (2024-05-29T02:17:25Z)
- Resilient Constrained Reinforcement Learning [87.4374430686956]
We study a class of constrained reinforcement learning (RL) problems in which multiple constraint specifications are not identified before study.
It is challenging to identify appropriate constraint specifications due to the undefined trade-off between the reward training objective and the constraint satisfaction.
We propose a new constrained RL approach that searches for policy and constraint specifications together.
arXiv Detail & Related papers (2023-12-28T18:28:23Z)
- Domain Generalization without Excess Empirical Risk [83.26052467843725]
A common approach is designing a data-driven surrogate penalty to capture generalization and minimize the empirical risk jointly with the penalty.
We argue that a significant failure mode of this recipe is an excess risk due to an erroneous penalty or hardness in joint optimization.
We present an approach that eliminates this problem. Instead of jointly minimizing empirical risk with the penalty, we minimize the penalty under the constraint of optimality of the empirical risk.
arXiv Detail & Related papers (2023-08-30T08:46:46Z)
- Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average reward setting with the variance risk criteria.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
arXiv Detail & Related papers (2020-12-28T05:02:26Z)
- Learning Bounds for Risk-sensitive Learning [86.50262971918276]
In risk-sensitive learning, one aims to find a hypothesis that minimizes a risk-averse (or risk-seeking) measure of loss.
We study the generalization properties of risk-sensitive learning schemes whose optimand is described via optimized certainty equivalents.
arXiv Detail & Related papers (2020-06-15T05:25:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.