Optimal Transport-Assisted Risk-Sensitive Q-Learning
- URL: http://arxiv.org/abs/2406.11774v2
- Date: Wed, 11 Sep 2024 22:30:25 GMT
- Title: Optimal Transport-Assisted Risk-Sensitive Q-Learning
- Authors: Zahra Shahrooei, Ali Baheri,
- Abstract summary: This paper presents a risk-sensitive Q-learning algorithm that leverages optimal transport theory to enhance the agent safety.
We validate the proposed algorithm in a Gridworld environment.
- Score: 4.14360329494344
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The primary goal of reinforcement learning is to develop decision-making policies that prioritize optimal performance without considering risk or safety. In contrast, safe reinforcement learning aims to mitigate or avoid unsafe states. This paper presents a risk-sensitive Q-learning algorithm that leverages optimal transport theory to enhance the agent safety. By integrating optimal transport into the Q-learning framework, our approach seeks to optimize the policy's expected return while minimizing the Wasserstein distance between the policy's stationary distribution and a predefined risk distribution, which encapsulates safety preferences from domain experts. We validate the proposed algorithm in a Gridworld environment. The results indicate that our method significantly reduces the frequency of visits to risky states and achieves faster convergence to a stable policy compared to the traditional Q-learning algorithm.
Related papers
- CAT: Caution Aware Transfer in Reinforcement Learning via Distributional Risk [19.698719925388513]
We introduce a novel Caution-Aware Transfer Learning (CAT) framework.
Unlike traditional approaches, we define "caution" as a more generalized and comprehensive notion of risk.
Our core innovation lies in optimizing a weighted sum of reward return and caution-based on state-action occupancy measures-during the transfer process.
arXiv Detail & Related papers (2024-08-16T15:47:08Z) - SCPO: Safe Reinforcement Learning with Safety Critic Policy Optimization [1.3597551064547502]
This study introduces a novel safe reinforcement learning algorithm, Safety Critic Policy Optimization.
In this study, we define the safety critic, a mechanism that nullifies rewards obtained through violating safety constraints.
Our theoretical analysis indicates that the proposed algorithm can automatically balance the trade-off between adhering to safety constraints and maximizing rewards.
arXiv Detail & Related papers (2023-11-01T22:12:50Z) - Risk-Aware Reinforcement Learning through Optimal Transport Theory [4.8951183832371]
This paper pioneers the integration of Optimal Transport theory with reinforcement learning (RL) to create a risk-aware framework.
Our approach modifies the objective function, ensuring that the resulting policy not only maximizes expected rewards but also respects risk constraints dictated by OT distances.
Our contributions are substantiated with a series of theorems, mapping the relationships between risk distributions, optimal value functions, and policy behaviors.
arXiv Detail & Related papers (2023-09-12T13:55:01Z) - Provably Efficient Iterated CVaR Reinforcement Learning with Function
Approximation and Human Feedback [57.6775169085215]
Risk-sensitive reinforcement learning aims to optimize policies that balance the expected reward and risk.
We present a novel framework that employs an Iterated Conditional Value-at-Risk (CVaR) objective under both linear and general function approximations.
We propose provably sample-efficient algorithms for this Iterated CVaR RL and provide rigorous theoretical analysis.
arXiv Detail & Related papers (2023-07-06T08:14:54Z) - Safety Correction from Baseline: Towards the Risk-aware Policy in
Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z) - Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety
Constraints in Finite MDPs [71.47895794305883]
We study the problem of Safe Policy Improvement (SPI) under constraints in the offline Reinforcement Learning setting.
We present an SPI for this RL setting that takes into account the preferences of the algorithm's user for handling the trade-offs for different reward signals.
arXiv Detail & Related papers (2021-05-31T21:04:21Z) - Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds
Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average reward setting with the variance risk criteria.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
arXiv Detail & Related papers (2020-12-28T05:02:26Z) - Reliable Off-policy Evaluation for Reinforcement Learning [53.486680020852724]
In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy.
We propose a novel framework that provides robust and optimistic cumulative reward estimates using one or multiple logged data.
arXiv Detail & Related papers (2020-11-08T23:16:19Z) - Improving Robustness via Risk Averse Distributional Reinforcement
Learning [13.467017642143581]
Robustness is critical when the policies are trained in simulations instead of real world environment.
We propose a risk-aware algorithm to learn robust policies in order to bridge the gap between simulation training and real-world implementation.
arXiv Detail & Related papers (2020-05-01T20:03:10Z) - Cautious Reinforcement Learning with Logical Constraints [78.96597639789279]
An adaptive safe padding forces Reinforcement Learning (RL) to synthesise optimal control policies while ensuring safety during the learning process.
Theoretical guarantees are available on the optimality of the synthesised policies and on the convergence of the learning algorithm.
arXiv Detail & Related papers (2020-02-26T00:01:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.