Risk-Aware Continuous Control with Neural Contextual Bandits
- URL: http://arxiv.org/abs/2312.09961v1
- Date: Fri, 15 Dec 2023 17:16:04 GMT
- Title: Risk-Aware Continuous Control with Neural Contextual Bandits
- Authors: Jose A. Ayala-Romero, Andres Garcia-Saavedra, Xavier Costa-Perez
- Abstract summary: We propose a risk-aware decision-making framework for contextual bandit problems.
Our framework is designed to cater to various risk levels, effectively balancing constraint satisfaction against performance.
We evaluate our framework in a real-world use case involving a 5G mobile network.
- Score: 8.911816419902427
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in learning techniques have garnered attention for their
applicability to a diverse range of real-world sequential decision-making
problems. Yet, many practical applications have critical constraints for
operation in real environments. Most learning solutions neglect the risk
of failing to meet these constraints, hindering their implementation in
real-world contexts. In this paper, we propose a risk-aware decision-making
framework for contextual bandit problems, accommodating constraints and
continuous action spaces. Our approach employs an actor multi-critic
architecture, with each critic characterizing the distribution of performance
and constraint metrics. Our framework is designed to cater to various risk
levels, effectively balancing constraint satisfaction against performance. To
demonstrate the effectiveness of our approach, we first compare it against
state-of-the-art baseline methods in a synthetic environment, highlighting the
impact of intrinsic environmental noise across different risk configurations.
Finally, we evaluate our framework in a real-world use case involving a 5G
mobile network where only our approach consistently satisfies the system
constraint (a signal processing reliability target) at a small performance
cost (an 8.5% increase in power consumption).
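The paper itself ships no code, but the architecture described in the abstract lends itself to a short sketch: an actor maps a context to a continuous action, and separate distributional (quantile) critics model the performance and constraint metrics, so a risk level can be expressed as an upper-tail (CVaR-style) penalty on the predicted constraint cost. Everything below is an illustrative assumption (class names, layer sizes, the scoring rule), not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of an actor multi-critic design for
# contextual bandits: one distributional critic per metric, each predicting
# quantiles of its outcome distribution. All names here are illustrative.
import torch
import torch.nn as nn

class QuantileCritic(nn.Module):
    """Predicts n_quantiles quantiles of one metric for a (context, action) pair."""
    def __init__(self, ctx_dim, act_dim, n_quantiles=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ctx_dim + act_dim, 128), nn.ReLU(),
            nn.Linear(128, n_quantiles),
        )
        # Midpoints of n_quantiles equally spaced quantile levels.
        self.register_buffer("taus", (torch.arange(n_quantiles) + 0.5) / n_quantiles)

    def forward(self, ctx, act):
        return self.net(torch.cat([ctx, act], dim=-1))  # (batch, n_quantiles)

    def loss(self, ctx, act, target):
        """Quantile-regression (pinball) loss against observed outcomes."""
        u = target.unsqueeze(-1) - self(ctx, act)        # (batch, n_quantiles)
        return torch.max(self.taus * u, (self.taus - 1.0) * u).mean()

class Actor(nn.Module):
    """Maps a context to a continuous action, squashed to [-1, 1]."""
    def __init__(self, ctx_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ctx_dim, 128), nn.ReLU(),
            nn.Linear(128, act_dim), nn.Tanh(),
        )

    def forward(self, ctx):
        return self.net(ctx)

def risk_aware_score(perf_critic, cost_critic, ctx, act,
                     cost_budget, alpha=0.1, penalty=10.0):
    """Score an action by mean predicted performance, penalizing the upper
    alpha-tail of the constraint metric (a CVaR-style risk measure)."""
    perf = perf_critic(ctx, act).mean(dim=-1)
    cost_q = cost_critic(ctx, act)
    k = max(1, int(alpha * cost_q.shape[-1]))
    tail_cost = cost_q.topk(k, dim=-1).values.mean(dim=-1)
    return perf - penalty * torch.relu(tail_cost - cost_budget)
```

At decision time one could sample candidate actions around the actor's output and keep the highest-scoring one; tightening `alpha` or raising `penalty` trades performance for stricter constraint satisfaction, which is one plausible reading of the "various risk levels" the abstract refers to.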
Related papers
- A CMDP-within-online framework for Meta-Safe Reinforcement Learning [23.57318558833378]
We study the problem of meta-safe reinforcement learning (Meta-SRL) through the CMDP-within-online framework.
We obtain task-averaged regret bounds for unseen tasks (optimality gap) and for constraint violations using gradient-based meta-learning.
We propose a meta-algorithm that performs inexact online learning on the upper bounds of within-task optimality gap and constraint violations.
arXiv Detail & Related papers (2024-05-26T15:28:42Z)
- Uniformly Safe RL with Objective Suppression for Multi-Constraint Safety-Critical Applications [73.58451824894568]
The widely adopted CMDP model constrains the risks in expectation, which makes room for dangerous behaviors in long-tail states.
In safety-critical domains, such behaviors could lead to disastrous outcomes.
We propose Objective Suppression, a novel method that adaptively suppresses the task's reward-maximizing objectives according to a safety critic.
arXiv Detail & Related papers (2024-02-23T23:22:06Z)
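As a rough illustration of the suppression idea (assumed form only; the paper defines its own weighting scheme), the task objective's weight can shrink as a safety critic predicts higher constraint cost:

```python
# Illustrative sketch (assumed form, not the paper's implementation) of
# adaptively suppressing the task objective with a safety critic.
import torch

def suppressed_objective(task_loss, predicted_cost, cost_threshold, sharpness=5.0):
    """Blend per-sample task loss with predicted constraint cost.
    `predicted_cost` comes from a safety critic; all names are hypothetical."""
    # Suppression weight in (0, 1): near 1 while predicted cost stays below
    # the threshold, near 0 once it exceeds it. Detached so the weight itself
    # receives no gradients.
    w = torch.sigmoid(sharpness * (cost_threshold - predicted_cost)).detach()
    return (w * task_loss + (1.0 - w) * predicted_cost).mean()
```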
arXiv Detail & Related papers (2024-02-23T23:22:06Z) - Learning Safety Constraints From Demonstration Using One-Class Decision
Trees [1.81343777902022]
We present a novel approach that leverages one-class decision trees to facilitate learning from expert demonstrations.
The learned constraints are subsequently employed within an oracle constrained reinforcement learning framework.
In contrast to other methods, our approach offers an interpretable representation of the constraints, a vital feature in safety-critical environments.
arXiv Detail & Related papers (2023-12-14T11:48:22Z)
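A hedged sketch of this recipe, substituting scikit-learn's IsolationForest (a tree-based one-class ensemble, not the paper's interpretable one-class decision trees) to flag state-action pairs that fall outside the expert-demonstrated region:

```python
# Stand-in sketch: fit a tree-based one-class model on expert demonstrations,
# then treat state-action pairs outside the learned region as constraint
# violations. Data and names are placeholders.
import numpy as np
from sklearn.ensemble import IsolationForest

expert_sa = np.random.rand(1000, 6)  # placeholder (state, action) pairs
model = IsolationForest(contamination=0.01, random_state=0).fit(expert_sa)

def violates_constraint(state_action: np.ndarray) -> bool:
    # predict() returns -1 for points outside the expert region
    return model.predict(state_action.reshape(1, -1))[0] == -1
```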
arXiv Detail & Related papers (2023-12-14T11:48:22Z) - ASSERT: Automated Safety Scenario Red Teaming for Evaluating the
Robustness of Large Language Models [65.79770974145983]
ASSERT, Automated Safety Scenario Red Teaming, consists of three methods -- semantically aligned augmentation, target bootstrapping, and adversarial knowledge injection.
We partition our prompts into four safety domains for a fine-grained analysis of how the domain affects model performance.
We find statistically significant performance differences of up to 11% in absolute classification accuracy among semantically related scenarios, and absolute error rates of up to 19% in zero-shot adversarial settings.
arXiv Detail & Related papers (2023-10-14T17:10:28Z) - Maximum Causal Entropy Inverse Constrained Reinforcement Learning [3.409089945290584]
We propose a novel method that utilizes the principle of maximum causal entropy to learn constraints and an optimal policy.
We evaluate the effectiveness of the learned policy by assessing the reward received and the number of constraint violations.
Our method outperforms state-of-the-art approaches across a variety of tasks and environments.
arXiv Detail & Related papers (2023-05-04T14:18:19Z) - When Demonstrations Meet Generative World Models: A Maximum Likelihood
Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have uses in safety-sensitive domains such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z) - Reinforcement Learning with Stepwise Fairness Constraints [50.538878453547966]
We introduce the study of reinforcement learning with stepwise fairness constraints.
We provide learning algorithms with strong theoretical guarantees in regard to policy optimality and fairness violation.
arXiv Detail & Related papers (2022-11-08T04:06:23Z) - Constrained Policy Optimization for Controlled Self-Learning in
Conversational AI Systems [18.546197100318693]
We introduce a scalable framework for supporting fine-grained exploration targets for individual domains via user-defined constraints.
We present a novel meta-gradient learning approach that is scalable and practical for addressing this problem.
We conduct extensive experiments using data from a real-world conversational AI on a set of realistic constraint benchmarks.
arXiv Detail & Related papers (2022-09-17T23:44:13Z) - Constrained Policy Optimization via Bayesian World Models [79.0077602277004]
LAMBDA is a model-based approach for policy optimization in safety-critical tasks modeled via constrained Markov decision processes.
We demonstrate LAMBDA's state-of-the-art performance on the Safety-Gym benchmark suite in terms of sample efficiency and constraint violation.
arXiv Detail & Related papers (2022-01-24T17:02:22Z) - Constrained Markov Decision Processes via Backward Value Functions [43.649330976089004]
We model the problem of learning with constraints as a Constrained Markov Decision Process.
A key contribution of our approach is to translate cumulative cost constraints into state-based constraints.
We provide theoretical guarantees under which the agent converges while ensuring safety over the course of training.
arXiv Detail & Related papers (2020-08-26T20:56:16Z)
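A minimal sketch of that translation, with all names hypothetical: the cost accrued so far (the "backward" quantity) converts the episode's cumulative budget into a per-state bound that candidate actions must respect:

```python
# Hypothetical interface: turn a cumulative cost budget into a state-wise
# check, so that accrued cost plus predicted cost-to-go stays within budget.
import torch

def remaining_budget(accrued_cost: torch.Tensor, budget: float) -> torch.Tensor:
    """Per-state bound induced by the cumulative constraint."""
    return budget - accrued_cost

def safe_action_mask(accrued_cost: torch.Tensor,
                     cost_to_go: torch.Tensor,
                     budget: float) -> torch.Tensor:
    """True for candidate actions whose predicted remaining cost fits the
    budget; `cost_to_go` is a critic's estimate per candidate action."""
    return cost_to_go <= remaining_budget(accrued_cost, budget)
```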
- SAMBA: Safe Model-Based & Active Reinforcement Learning [59.01424351231993]
SAMBA is a framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics.
We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low- and high-dimensional state representations.
We provide intuition as to the effectiveness of the framework by a detailed analysis of our active metrics and safety constraints.
arXiv Detail & Related papers (2020-06-12T10:40:46Z)