Related papers: Decoupling Time and Risk: Risk-Sensitive Reinforcement Learning with General Discounting

Decoupling Time and Risk: Risk-Sensitive Reinforcement Learning with General Discounting

URL: http://arxiv.org/abs/2602.04131v1
Date: Wed, 04 Feb 2026 01:49:12 GMT
Title: Decoupling Time and Risk: Risk-Sensitive Reinforcement Learning with General Discounting
Authors: Mehrdad Moghimi, Anthony Coache, Hyejin Ku,
Abstract summary: We propose a novel framework that supports flexible discounting of future rewards and optimization of risk measures in distributional RL.<n>Our results highlight that discounting is a cornerstone in decision-making problems for capturing more expressive temporal and risk preferences profiles.
Score: 2.179313476241343
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Distributional reinforcement learning (RL) is a powerful framework increasingly adopted in safety-critical domains for its ability to optimize risk-sensitive objectives. However, the role of the discount factor is often overlooked, as it is typically treated as a fixed parameter of the Markov decision process or tunable hyperparameter, with little consideration of its effect on the learned policy. In the literature, it is well-known that the discounting function plays a major role in characterizing time preferences of an agent, which an exponential discount factor cannot fully capture. Building on this insight, we propose a novel framework that supports flexible discounting of future rewards and optimization of risk measures in distributional RL. We provide a technical analysis of the optimality of our algorithms, show that our multi-horizon extension fixes issues raised with existing methodologies, and validate the robustness of our methods through extensive experiments. Our results highlight that discounting is a cornerstone in decision-making problems for capturing more expressive temporal and risk preferences profiles, with potential implications for real-world safety-critical applications.

Related papers

Towards Sample-Efficient and Stable Reinforcement Learning for LLM-based Recommendation [56.92367609590823]
Long Chain-of-Thought (Long CoT) reasoning has shown promise in Large Language Models (LLMs)<n>We argue that Long CoT is inherently ill-suited for the sequential recommendation domain.<n>We propose RISER, a novel Reinforced Item Space Exploration framework for Recommendation.
arXiv Detail & Related papers (2026-01-31T10:02:43Z)
BAPO: Boundary-Aware Policy Optimization for Reliable Agentic Search [72.87861928940929]
Boundary-Aware Policy Optimization (BAPO) is a novel RL framework designed to cultivate reliable boundary awareness without compromising accuracy.<n>BAPO introduces two key components: (i) a group-based boundary-aware reward that encourages an IDK response only when the reasoning reaches its limit, and (ii) an adaptive reward modulator that strategically suspends this reward during early exploration, preventing the model from exploiting IDK as a shortcut.
arXiv Detail & Related papers (2026-01-16T07:06:58Z)
Risk-Averse Constrained Reinforcement Learning with Optimized Certainty Equivalents [29.698100324454362]
Constrained optimization provides a common framework for dealing with conflicting objectives in reinforcement learning (RL)<n>We propose a framework for risk-aware constrained RL, which exhibits per-stage properties jointly in reward values and time using optimized certainty equivalents (OCEs)<n>Our framework ensures an exact equivalent to the original constrained problem within a parameterized strong Lagrangian duality framework under appropriate constraint qualifications.
arXiv Detail & Related papers (2025-10-23T04:33:32Z)
Risk-sensitive Actor-Critic with Static Spectral Risk Measures for Online and Offline Reinforcement Learning [4.8342038441006805]
We propose a novel framework for optimizing static Spectral Risk Measures (SRM)<n>Our algorithms consistently outperform existing risk-sensitive methods in both online and offline environments across diverse domains.
arXiv Detail & Related papers (2025-07-05T04:41:54Z)
Risk-sensitive Reinforcement Learning Based on Convex Scoring Functions [8.758206783988404]
We propose a reinforcement learning framework under a broad class of risk objectives, characterized by convex scoring functions.<n>This class covers many common risk measures, such as variance, Expected Shortfall, entropic Value-at-Risk, and mean-risk utility.<n>We validate our approach in simulation experiments with a financial application in statistical arbitrage trading, demonstrating the effectiveness of the algorithm.
arXiv Detail & Related papers (2025-05-07T16:31:42Z)
Efficient Risk-sensitive Planning via Entropic Risk Measures [51.42922439693624]
We show that only Entropic Risk Measures (EntRM) can be efficiently optimized through dynamic programming.<n>We prove that this optimality front can be computed effectively thanks to a novel structural analysis and smoothness properties of entropic risks.
arXiv Detail & Related papers (2025-02-27T09:56:51Z)
Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning [19.292214425524303]
We study risk-sensitive reinforcement learning (RL), a crucial field due to its ability to enhance decision-making in scenarios where it is essential to manage uncertainty and minimize potential adverse outcomes. Our work focuses on applying the entropic risk measure to RL problems. We center on the linear Markov Decision Process (MDP) setting, a well-regarded theoretical framework that has yet to be examined from a risk-sensitive standpoint.
arXiv Detail & Related papers (2024-07-10T13:09:52Z)
Provable Risk-Sensitive Distributional Reinforcement Learning with General Function Approximation [54.61816424792866]
We introduce a general framework on Risk-Sensitive Distributional Reinforcement Learning (RS-DisRL), with static Lipschitz Risk Measures (LRM) and general function approximation. We design two innovative meta-algorithms: textttRS-DisRL-M, a model-based strategy for model-based function approximation, and textttRS-DisRL-V, a model-free approach for general value function approximation.
arXiv Detail & Related papers (2024-02-28T08:43:18Z)
Risk-sensitive Markov Decision Process and Learning under General Utility Functions [3.069335774032178]
Reinforcement Learning (RL) has gained substantial attention across diverse application domains and theoretical investigations.<n>We consider a scenario where the decision-maker seeks to optimize a general utility function of the cumulative reward in the framework of a decision process (MDP)<n>We propose a modified value iteration algorithm that employs an epsilon-covering over the space of cumulative reward.<n>In the absence of a simulator, our algorithm, designed with an upper-confidence-bound exploration approach, identifies a near-optimal policy.
arXiv Detail & Related papers (2023-11-22T18:50:06Z)
Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation and Human Feedback [57.6775169085215]
Risk-sensitive reinforcement learning aims to optimize policies that balance the expected reward and risk. We present a novel framework that employs an Iterated Conditional Value-at-Risk (CVaR) objective under both linear and general function approximations. We propose provably sample-efficient algorithms for this Iterated CVaR RL and provide rigorous theoretical analysis.
arXiv Detail & Related papers (2023-07-06T08:14:54Z)
False Correlation Reduction for Offline Reinforcement Learning [115.11954432080749]
We propose falSe COrrelation REduction (SCORE) for offline RL, a practically effective and theoretically provable algorithm. We empirically show that SCORE achieves the SoTA performance with 3.1x acceleration on various tasks in a standard benchmark (D4RL)
arXiv Detail & Related papers (2021-10-24T15:34:03Z)
Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average reward setting with the variance risk criteria. We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
arXiv Detail & Related papers (2020-12-28T05:02:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.