Risk-Sensitive Soft Actor-Critic for Robust Deep Reinforcement Learning
under Distribution Shifts
- URL: http://arxiv.org/abs/2402.09992v1
- Date: Thu, 15 Feb 2024 14:55:38 GMT
- Title: Risk-Sensitive Soft Actor-Critic for Robust Deep Reinforcement Learning
under Distribution Shifts
- Authors: Tobias Enders, James Harrison, Maximilian Schiffer
- Abstract summary: We study the robustness of deep reinforcement learning algorithms against distribution shifts within contextual multi-stage optimization problems.
We show that our algorithm is superior to risk-neutral Soft Actor-Critic as well as to two benchmark approaches for robust deep reinforcement learning.
- Score: 11.765000124617186
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the robustness of deep reinforcement learning algorithms against
distribution shifts within contextual multi-stage stochastic combinatorial
optimization problems from the operations research domain. In this context,
risk-sensitive algorithms promise to learn robust policies. While this field is
of general interest to the reinforcement learning community, most studies
to date focus on theoretical results rather than real-world performance.
With this work, we aim to bridge this gap by formally deriving a novel
risk-sensitive deep reinforcement learning algorithm while providing numerical
evidence for its efficacy. Specifically, we introduce discrete Soft
Actor-Critic for the entropic risk measure by deriving a version of the Bellman
equation for the respective Q-values. We establish a corresponding policy
improvement result and infer a practical algorithm. We introduce an environment
that represents typical contextual multi-stage stochastic combinatorial
optimization problems and perform numerical experiments to empirically validate
our algorithm's robustness against realistic distribution shifts, without
compromising performance on the training distribution. We show that our
algorithm is superior to risk-neutral Soft Actor-Critic as well as to two
benchmark approaches for robust deep reinforcement learning. Thereby, we
provide the first structured analysis on the robustness of reinforcement
learning under distribution shifts in the realm of contextual multi-stage
stochastic combinatorial optimization problems.
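As background for the abstract above: the entropic risk measure of a random return X is rho_beta(X) = (1/beta) * log E[exp(beta * X)], which recovers the ordinary expectation as beta approaches 0 and becomes risk-averse for suitable beta. The sketch below is purely illustrative and is not the authors' implementation: it computes this measure over samples and shows one plausible way a discrete Soft Actor-Critic critic target could apply it over sampled next-state soft values. The function names, the default parameters (alpha, gamma, beta), and the exact placement of the risk operator are assumptions made for illustration.

```python
import numpy as np

def entropic_risk(samples, beta):
    """Entropic risk (1/beta) * log E[exp(beta * X)] over return samples.

    beta -> 0 recovers the plain mean; beta < 0 is risk-averse for
    reward maximization. Uses a log-sum-exp shift for numerical stability.
    """
    x = np.asarray(samples, dtype=float)
    if abs(beta) < 1e-8:                      # risk-neutral limit
        return float(x.mean())
    z = beta * x
    m = z.max()
    return float((m + np.log(np.mean(np.exp(z - m)))) / beta)

def soft_state_value(q_values, log_probs, alpha):
    """Discrete-SAC soft value: sum_a pi(a|s) * (Q(s,a) - alpha * log pi(a|s))."""
    probs = np.exp(log_probs)
    return float(np.sum(probs * (q_values - alpha * log_probs)))

def risk_sensitive_target(reward, next_q, next_log_probs,
                          alpha=0.1, gamma=0.99, beta=-1.0):
    """Hypothetical risk-sensitive critic target (illustration only):
    r + gamma * entropic risk over soft values of sampled next states.
    The paper's derived Bellman equation may nest these terms differently."""
    values = [soft_state_value(q, lp, alpha)
              for q, lp in zip(next_q, next_log_probs)]
    return reward + gamma * entropic_risk(values, beta)

# Example: two sampled next states, the second with a high-variance Q profile.
q1, lp1 = np.array([1.0, 2.0]), np.log([0.4, 0.6])
q2, lp2 = np.array([5.0, -4.0]), np.log([0.5, 0.5])
print(risk_sensitive_target(0.5, [q1, q2], [lp1, lp2], beta=-1.0))
```

In this sketch, beta < 0 makes the target pessimistic about high-variance next-state values, which is the qualitative effect a risk-sensitive critic is meant to have; the paper's formal Bellman equation and policy improvement result define the precise recursion.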
Related papers
- Regularization for Adversarial Robust Learning [18.46110328123008]
We develop a novel approach to adversarial training that integrates $\phi$-divergence regularization into the distributionally robust risk function.
This regularization brings a notable improvement in computation compared with the original formulation.
We validate our proposed method in supervised learning, reinforcement learning, and contextual learning and showcase its state-of-the-art performance against various adversarial attacks.
arXiv Detail & Related papers (2024-08-19T03:15:41Z) - Distributional Bellman Operators over Mean Embeddings [37.5480897544168]
We propose a novel framework for distributional reinforcement learning, based on learning finite-dimensional mean embeddings of return distributions.
We derive several new algorithms for dynamic programming and temporal-difference learning based on this framework.
arXiv Detail & Related papers (2023-12-09T11:36:14Z) - Distributionally Robust Model-based Reinforcement Learning with Large
State Spaces [55.14361269378122]
Three major challenges in reinforcement learning are the complex dynamical systems with large state spaces, the costly data acquisition processes, and the deviation of real-world dynamics at deployment from the training environment.
We study distributionally robust Markov decision processes with continuous state spaces under the widely used Kullback-Leibler, chi-square, and total variation uncertainty sets.
We propose a model-based approach that utilizes Gaussian Processes and the maximum variance reduction algorithm to efficiently learn multi-output nominal transition dynamics.
arXiv Detail & Related papers (2023-09-05T13:42:11Z) - Provably Efficient Iterated CVaR Reinforcement Learning with Function
Approximation and Human Feedback [57.6775169085215]
Risk-sensitive reinforcement learning aims to optimize policies that balance the expected reward and risk.
We present a novel framework that employs an Iterated Conditional Value-at-Risk (CVaR) objective under both linear and general function approximations.
We propose provably sample-efficient algorithms for this Iterated CVaR RL and provide rigorous theoretical analysis.
arXiv Detail & Related papers (2023-07-06T08:14:54Z) - Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce Stochastic UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z) - Multivariate Systemic Risk Measures and Computation by Deep Learning
Algorithms [63.03966552670014]
We discuss the key related theoretical aspects, with a particular focus on the fairness properties of primal optima and associated risk allocations.
The algorithms we provide allow for learning primals, optima for the dual representation and corresponding fair risk allocations.
arXiv Detail & Related papers (2023-02-02T22:16:49Z) - Risk-Sensitive Reinforcement Learning with Exponential Criteria [0.0]
We provide a definition of robust reinforcement learning policies and formulate a risk-sensitive reinforcement learning problem to approximate them.
We introduce a novel online Actor-Critic algorithm based on solving a multiplicative Bellman equation using approximation updates.
The implementation, performance, and robustness properties of the proposed methods are evaluated in simulated experiments.
arXiv Detail & Related papers (2022-12-18T04:44:38Z) - Conditionally Elicitable Dynamic Risk Measures for Deep Reinforcement
Learning [0.0]
We develop an efficient approach to estimate a class of dynamic spectral risk measures with deep neural networks.
We also develop a risk-sensitive actor-critic algorithm that uses full episodes and does not require any additional nested transitions.
arXiv Detail & Related papers (2022-06-29T14:11:15Z) - Distributed Statistical Min-Max Learning in the Presence of Byzantine
Agents [34.46660729815201]
We consider a multi-agent min-max learning problem, and focus on the emerging challenge of contending with Byzantine adversarial agents.
Our main contribution is to provide a crisp analysis of the proposed robust extra-gradient algorithm for smooth convex-concave and smooth strongly convex-strongly concave functions.
Our rates are near-optimal, and reveal both the effect of adversarial corruption and the benefit of collaboration among the non-faulty agents.
arXiv Detail & Related papers (2022-04-07T03:36:28Z) - Adversarial Robustness with Semi-Infinite Constrained Learning [177.42714838799924]
The susceptibility of deep learning to input perturbations has raised serious questions about its use in safety-critical domains.
We propose a hybrid Langevin Monte Carlo training approach to mitigate this issue.
We show that our approach can mitigate the trade-off between state-of-the-art performance and robustness.
arXiv Detail & Related papers (2021-10-29T13:30:42Z) - Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds
Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average reward setting with the variance risk criteria.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
arXiv Detail & Related papers (2020-12-28T05:02:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.