Related papers: Optimal Transport-Assisted Risk-Sensitive Q-Learning

Optimal Transport-Assisted Risk-Sensitive Q-Learning

URL: http://arxiv.org/abs/2406.11774v2
Date: Wed, 11 Sep 2024 22:30:25 GMT
Title: Optimal Transport-Assisted Risk-Sensitive Q-Learning
Authors: Zahra Shahrooei, Ali Baheri,
Abstract summary: This paper presents a risk-sensitive Q-learning algorithm that leverages optimal transport theory to enhance the agent safety. We validate the proposed algorithm in a Gridworld environment.
Score: 4.14360329494344
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The primary goal of reinforcement learning is to develop decision-making policies that prioritize optimal performance without considering risk or safety. In contrast, safe reinforcement learning aims to mitigate or avoid unsafe states. This paper presents a risk-sensitive Q-learning algorithm that leverages optimal transport theory to enhance the agent safety. By integrating optimal transport into the Q-learning framework, our approach seeks to optimize the policy's expected return while minimizing the Wasserstein distance between the policy's stationary distribution and a predefined risk distribution, which encapsulates safety preferences from domain experts. We validate the proposed algorithm in a Gridworld environment. The results indicate that our method significantly reduces the frequency of visits to risky states and achieves faster convergence to a stable policy compared to the traditional Q-learning algorithm.

Related papers

Representation-based Reward Modeling for Efficient Safety Alignment of Large Language Model [84.00480999255628]
Reinforcement Learning algorithms for safety alignment of Large Language Models (LLMs) encounter the challenge of distribution shift. Current approaches typically address this issue through online sampling from the target policy. We propose a new framework that leverages the model's intrinsic safety judgment capability to extract reward signals.
arXiv Detail & Related papers (2025-03-13T06:40:34Z)
CAT: Caution Aware Transfer in Reinforcement Learning via Distributional Risk [19.698719925388513]
We introduce a novel Caution-Aware Transfer Learning (CAT) framework. Unlike traditional approaches, we define "caution" as a more generalized and comprehensive notion of risk. Our core innovation lies in optimizing a weighted sum of reward return and caution-based on state-action occupancy measures-during the transfer process.
arXiv Detail & Related papers (2024-08-16T15:47:08Z)
A Reductions Approach to Risk-Sensitive Reinforcement Learning with Optimized Certainty Equivalents [44.09686403685058]
We study risk-sensitive RL where the goal is learn a history-dependent policy that optimize some risk measure of cumulative rewards. We propose two meta-algorithms: one grounded in optimism and another based on policy gradients. We empirically show that our algorithms learn the optimal history-dependent policy in a proof-of-concept MDP.
arXiv Detail & Related papers (2024-03-10T21:45:12Z)
SCPO: Safe Reinforcement Learning with Safety Critic Policy Optimization [1.3597551064547502]
This study introduces a novel safe reinforcement learning algorithm, Safety Critic Policy Optimization. In this study, we define the safety critic, a mechanism that nullifies rewards obtained through violating safety constraints. Our theoretical analysis indicates that the proposed algorithm can automatically balance the trade-off between adhering to safety constraints and maximizing rewards.
arXiv Detail & Related papers (2023-11-01T22:12:50Z)
Risk-Aware Reinforcement Learning through Optimal Transport Theory [4.8951183832371]
This paper pioneers the integration of Optimal Transport theory with reinforcement learning (RL) to create a risk-aware framework. Our approach modifies the objective function, ensuring that the resulting policy not only maximizes expected rewards but also respects risk constraints dictated by OT distances. Our contributions are substantiated with a series of theorems, mapping the relationships between risk distributions, optimal value functions, and policy behaviors.
arXiv Detail & Related papers (2023-09-12T13:55:01Z)
Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation and Human Feedback [57.6775169085215]
Risk-sensitive reinforcement learning aims to optimize policies that balance the expected reward and risk. We present a novel framework that employs an Iterated Conditional Value-at-Risk (CVaR) objective under both linear and general function approximations. We propose provably sample-efficient algorithms for this Iterated CVaR RL and provide rigorous theoretical analysis.
arXiv Detail & Related papers (2023-07-06T08:14:54Z)
Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent. Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control. The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs [71.47895794305883]
We study the problem of Safe Policy Improvement (SPI) under constraints in the offline Reinforcement Learning setting. We present an SPI for this RL setting that takes into account the preferences of the algorithm's user for handling the trade-offs for different reward signals.
arXiv Detail & Related papers (2021-05-31T21:04:21Z)
Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average reward setting with the variance risk criteria. We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
arXiv Detail & Related papers (2020-12-28T05:02:26Z)
Reliable Off-policy Evaluation for Reinforcement Learning [53.486680020852724]
In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy. We propose a novel framework that provides robust and optimistic cumulative reward estimates using one or multiple logged data.
arXiv Detail & Related papers (2020-11-08T23:16:19Z)
Improving Robustness via Risk Averse Distributional Reinforcement Learning [13.467017642143581]
Robustness is critical when the policies are trained in simulations instead of real world environment. We propose a risk-aware algorithm to learn robust policies in order to bridge the gap between simulation training and real-world implementation.
arXiv Detail & Related papers (2020-05-01T20:03:10Z)
Cautious Reinforcement Learning with Logical Constraints [78.96597639789279]
An adaptive safe padding forces Reinforcement Learning (RL) to synthesise optimal control policies while ensuring safety during the learning process. Theoretical guarantees are available on the optimality of the synthesised policies and on the convergence of the learning algorithm.
arXiv Detail & Related papers (2020-02-26T00:01:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.