Risk-Averse Learning with Varying Risk Levels
- URL: http://arxiv.org/abs/2512.22986v1
- Date: Sun, 28 Dec 2025 16:09:29 GMT
- Title: Risk-Averse Learning with Varying Risk Levels
- Authors: Siyi Wang, Zifan Wang, Karl H. Johansson
- Abstract summary: This work investigates risk-averse online optimization in dynamic environments with varying risk levels. To capture the dynamics of the environment and risk levels, we employ the function variation metric and introduce a novel risk-level variation metric. We develop risk-averse learning algorithms with a limited sampling budget and analyze their dynamic regret bounds in terms of function variation, risk-level variation, and the total number of samples.
- Score: 8.646001948552264
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: In safety-critical decision-making, the environment may evolve over time, and the learner adjusts its risk level accordingly. This work investigates risk-averse online optimization in dynamic environments with varying risk levels, employing Conditional Value-at-Risk (CVaR) as the risk measure. To capture the dynamics of the environment and risk levels, we employ the function variation metric and introduce a novel risk-level variation metric. Two information settings are considered: a first-order scenario, where the learner observes both function values and their gradients; and a zeroth-order scenario, where only function evaluations are available. For both cases, we develop risk-averse learning algorithms with a limited sampling budget and analyze their dynamic regret bounds in terms of function variation, risk-level variation, and the total number of samples. The regret analysis demonstrates the adaptability of the algorithms in non-stationary and risk-sensitive settings. Finally, numerical experiments are presented to demonstrate the efficacy of the methods.
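As background for the abstract above, the sketch below illustrates two ingredients it names: empirical CVaR estimation and a one-point zeroth-order gradient estimate, the kind of estimator the zeroth-order setting relies on when only function evaluations are available. This is a minimal sketch under standard assumptions, not the paper's algorithm; the names `cvar_estimate` and `one_point_gradient` are hypothetical.

```python
import numpy as np

def cvar_estimate(losses: np.ndarray, alpha: float) -> float:
    """Empirical CVaR_alpha of a loss sample: the mean of the worst
    (1 - alpha) fraction of losses, i.e. the expected loss beyond the
    alpha-quantile (the Value-at-Risk)."""
    var = np.quantile(losses, alpha)   # Value-at-Risk threshold
    return float(losses[losses >= var].mean())

def one_point_gradient(f, x: np.ndarray, delta: float, rng) -> np.ndarray:
    """One-point zeroth-order gradient estimate of f at x: a single
    evaluation at a randomly perturbed point, usable when gradients
    cannot be observed."""
    u = rng.standard_normal(x.shape)
    u /= np.linalg.norm(u)             # random direction on the unit sphere
    return (x.size / delta) * f(x + delta * u) * u
```

In the first-order setting the learner uses observed gradients directly; the zeroth-order setting must rely on estimators like the one above, which is why the regret bounds also depend on the total sampling budget.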
Related papers
- Constrained Language Model Policy Optimization via Risk-aware Stepwise Alignment [49.2305683068875]
We propose Risk-aware Stepwise Alignment (RSA), a novel alignment method that incorporates risk awareness into the policy optimization process. RSA mitigates risks induced by excessive model shift away from a reference policy, and it explicitly suppresses low-probability yet high-impact harmful behaviors. Experimental results demonstrate that our method achieves high levels of helpfulness while ensuring strong safety.
arXiv Detail & Related papers (2025-12-30T14:38:02Z) - RADAR: A Risk-Aware Dynamic Multi-Agent Framework for LLM Safety Evaluation via Role-Specialized Collaboration [81.38705556267917]
Existing safety evaluation methods for large language models (LLMs) suffer from inherent limitations. We introduce a theoretical framework that reconstructs the underlying risk concept space. We propose RADAR, a multi-agent collaborative evaluation framework.
arXiv Detail & Related papers (2025-09-28T09:35:32Z) - Risk-Averse Reinforcement Learning with Itakura-Saito Loss [63.620958078179356]
Risk-averse agents choose policies that minimize risk, occasionally sacrificing expected value. We introduce a numerically stable and mathematically sound loss function based on the Itakura-Saito divergence for learning state-value and action-value functions. In the experimental section, we explore multiple scenarios, some with known analytical solutions, and show that the considered loss function outperforms the alternatives.
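The snippet does not state the loss in closed form; as background, here is a minimal sketch of the Itakura-Saito divergence the loss is built on. The divergence itself is standard, while its exact role in the paper's loss is not reproduced here.

```python
import numpy as np

def itakura_saito(p: np.ndarray, q: np.ndarray) -> np.ndarray:
    """Itakura-Saito divergence d_IS(p || q) = p/q - log(p/q) - 1.
    Nonnegative and zero iff p == q; defined only for strictly
    positive inputs."""
    r = p / q
    return r - np.log(r) - 1.0
```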
arXiv Detail & Related papers (2025-05-22T17:18:07Z) - Risk-averse learning with delayed feedback [17.626195546400247]
Delayed feedback makes it challenging to assess and manage risk effectively. We develop two risk-averse learning algorithms that rely on one-point and two-point zeroth-order optimization approaches.
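The snippet names one-point and two-point zeroth-order approaches; a generic sketch of the two-point estimator (complementing the one-point sketch earlier) is shown below. This is the textbook form, not necessarily the paper's exact update.

```python
import numpy as np

def two_point_gradient(f, x: np.ndarray, delta: float, rng) -> np.ndarray:
    """Two-point zeroth-order gradient estimate: a symmetric difference
    of two evaluations along a random unit direction. Its variance is
    typically much lower than the one-point variant, at the cost of a
    second query per step, which matters when feedback arrives late."""
    u = rng.standard_normal(x.shape)
    u /= np.linalg.norm(u)
    return (x.size / (2.0 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u
```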
arXiv Detail & Related papers (2024-09-25T12:32:22Z) - Continuous-time Risk-sensitive Reinforcement Learning via Quadratic Variation Penalty [5.710971447109951]
This paper studies continuous-time risk-sensitive reinforcement learning (RL).
I highlight that the conventional policy gradient representation is inadequate for risk-sensitive problems due to the nonlinear nature of quadratic variation.
I prove the convergence of the proposed algorithm for Merton's investment problem and quantify the impact of temperature parameter on the behavior of the learning procedure.
arXiv Detail & Related papers (2024-04-19T03:05:41Z) - Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC), that can be applied to either risk-seeking or risk-averse policy optimization.
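The snippet describes the UBE only in words; the tabular backup below sketches the generic shape of an uncertainty Bellman recursion. It is an illustrative assumption: the paper's exact equation may differ, and `ube_backup` is a hypothetical name.

```python
import numpy as np

def ube_backup(local_unc, P, policy, U, gamma):
    """One backup of a generic uncertainty Bellman equation:
    U(s,a) <- u(s,a) + gamma^2 * E_{s'~P, a'~pi}[U(s',a')],
    where u(s,a) is the local epistemic uncertainty. The gamma^2
    factor appears because variances, not values, are propagated.
    Shapes: local_unc, U: (S, A); P: (S, A, S); policy: (S, A)."""
    next_u = np.einsum("sap,pb,pb->sa", P, policy, U)
    return local_unc + gamma ** 2 * next_u
```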
arXiv Detail & Related papers (2023-12-07T15:55:58Z) - DRL-ORA: Distributional Reinforcement Learning with Online Risk Adaption [9.191326295161725]
We propose a new framework, Distributional RL with Online Risk Adaptation (DRL-ORA). The framework unifies the existing variants of risk adaptation approaches and offers better explainability and flexibility. We show that DRL-ORA outperforms existing methods that rely on fixed risk levels or manually designed risk-level adaptation in multiple classes of tasks.
arXiv Detail & Related papers (2023-10-08T14:32:23Z) - Is Risk-Sensitive Reinforcement Learning Properly Resolved? [54.00107408956307]
We propose a novel algorithm, namely Trajectory Q-Learning (TQL), for RSRL problems with provable policy improvement. Based on our new learning architecture, we can introduce a general and practical implementation for different risk measures to learn disparate risk-sensitive policies.
arXiv Detail & Related papers (2023-07-02T11:47:21Z) - Conditionally Elicitable Dynamic Risk Measures for Deep Reinforcement Learning [0.0]
We develop an efficient approach to estimate a class of dynamic spectral risk measures with deep neural networks.
We also develop a risk-sensitive actor-critic algorithm that uses full episodes and does not require any additional nested transitions.
arXiv Detail & Related papers (2022-06-29T14:11:15Z) - Efficient Risk-Averse Reinforcement Learning [79.61412643761034]
In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.
We prove that under certain conditions this inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypass it.
We demonstrate improved risk aversion in maze navigation, autonomous driving, and resource allocation benchmarks.
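The abstract does not spell the soft risk mechanism out; one plausible reading is a schedule that trains risk-neutrally at first and anneals the CVaR tail fraction toward its target, roughly as sketched below. This schedule is an assumption for illustration, not the paper's exact rule.

```python
def soft_risk_level(step: int, total_steps: int, target_tail: float) -> float:
    """Anneal the fraction of worst-case returns the CVaR objective
    averages over, from 1.0 (risk-neutral: all returns) down to a small
    target fraction (risk-averse: only the worst returns). Starting
    risk-neutral lets the agent find rewarding behavior before the risk
    objective prunes it, bypassing the local-optimum barrier above."""
    frac = min(1.0, step / (0.5 * total_steps))  # warm-up over half of training
    return 1.0 + frac * (target_tail - 1.0)      # linear: 1.0 -> target_tail
```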
arXiv Detail & Related papers (2022-05-10T19:40:52Z) - Adaptive Risk Tendency: Nano Drone Navigation in Cluttered Environments with Distributional Reinforcement Learning [17.940958199767234]
We present a distributional reinforcement learning framework to learn adaptive risk tendency policies.
We show our algorithm can adjust its risk sensitivity on the fly in both simulation and real-world experiments.
arXiv Detail & Related papers (2022-03-28T13:39:58Z) - Automatic Risk Adaptation in Distributional Reinforcement Learning [26.113528145137497]
The use of Reinforcement Learning (RL) agents in practical applications requires the consideration of suboptimal outcomes.
This is especially important in safety-critical environments, where errors can lead to high costs or damage.
We show reduced failure rates by up to a factor of 7 and improved generalization performance by up to 14% compared to both risk-aware and risk-agnostic agents.
arXiv Detail & Related papers (2021-06-11T11:31:04Z) - Learning Bounds for Risk-sensitive Learning [86.50262971918276]
In risk-sensitive learning, one aims to find a hypothesis that minimizes a risk-averse (or risk-seeking) measure of loss.
We study the generalization properties of risk-sensitive learning schemes whose optimand is described via optimized certainty equivalents.
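Optimized certainty equivalents have a simple variational form, OCE(X) = inf over lambda of {lambda + E[phi(X - lambda)]}, with CVaR recovered by the choice phi(t) = max(t, 0) / (1 - alpha). The sketch below evaluates an OCE numerically; the function name and the use of scipy are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def oce(losses: np.ndarray, phi) -> float:
    """Optimized certainty equivalent of a loss sample:
    OCE(X) = inf_lambda { lambda + E[phi(X - lambda)] }."""
    objective = lambda lam: lam + np.mean(phi(losses - lam))
    return minimize_scalar(objective).fun

# CVaR at level 0.9 via its OCE representation: phi(t) = max(t, 0) / (1 - 0.9)
rng = np.random.default_rng(0)
losses = rng.standard_normal(10_000)
cvar_90 = oce(losses, lambda t: np.maximum(t, 0.0) / (1 - 0.9))
```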
arXiv Detail & Related papers (2020-06-15T05:25:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and accepts no responsibility for any consequences of its use.