Risk-Aware Continuous Control with Neural Contextual Bandits
- URL: http://arxiv.org/abs/2312.09961v1
- Date: Fri, 15 Dec 2023 17:16:04 GMT
- Title: Risk-Aware Continuous Control with Neural Contextual Bandits
- Authors: Jose A. Ayala-Romero, Andres Garcia-Saavedra, Xavier Costa-Perez
- Abstract summary: We propose a risk-aware decision-making framework for contextual bandit problems.
Our framework is designed to cater to various risk levels, effectively balancing constraint satisfaction against performance.
We evaluate our framework in a real-world use case involving a 5G mobile network.
- Score: 8.911816419902427
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in learning techniques have garnered attention for their
applicability to a diverse range of real-world sequential decision-making
problems. Yet, many practical applications have critical constraints for
operation in real environments. Most learning solutions neglect the risk
of failing to meet these constraints, hindering their implementation in
real-world contexts. In this paper, we propose a risk-aware decision-making
framework for contextual bandit problems, accommodating constraints and
continuous action spaces. Our approach employs an actor multi-critic
architecture, with each critic characterizing the distribution of performance
and constraint metrics. Our framework is designed to cater to various risk
levels, effectively balancing constraint satisfaction against performance. To
demonstrate the effectiveness of our approach, we first compare it against
state-of-the-art baseline methods in a synthetic environment, highlighting the
impact of intrinsic environmental noise across different risk configurations.
Finally, we evaluate our framework in a real-world use case involving a 5G
mobile network where only our approach consistently satisfies the system
constraint (a signal processing reliability target) at a small performance
cost (an 8.5% increase in power consumption).
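The paper itself ships no code, but the architecture described in the abstract lends itself to a short sketch: an actor maps a context to a continuous action, and separate distributional (quantile) critics model the performance and constraint metrics, so a risk level can be expressed as an upper-tail (CVaR-style) penalty on the predicted constraint cost. Everything below is an illustrative assumption (class names, layer sizes, the scoring rule), not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of an actor multi-critic design for
# contextual bandits: one distributional critic per metric, each predicting
# quantiles of its outcome distribution. All names here are illustrative.
import torch
import torch.nn as nn

class QuantileCritic(nn.Module):
    """Predicts n_quantiles quantiles of one metric for a (context, action) pair."""
    def __init__(self, ctx_dim, act_dim, n_quantiles=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ctx_dim + act_dim, 128), nn.ReLU(),
            nn.Linear(128, n_quantiles),
        )
        # Midpoints of n_quantiles equally spaced quantile levels.
        self.register_buffer("taus", (torch.arange(n_quantiles) + 0.5) / n_quantiles)

    def forward(self, ctx, act):
        return self.net(torch.cat([ctx, act], dim=-1))  # (batch, n_quantiles)

    def loss(self, ctx, act, target):
        """Quantile-regression (pinball) loss against observed outcomes."""
        u = target.unsqueeze(-1) - self(ctx, act)        # (batch, n_quantiles)
        return torch.max(self.taus * u, (self.taus - 1.0) * u).mean()

class Actor(nn.Module):
    """Maps a context to a continuous action, squashed to [-1, 1]."""
    def __init__(self, ctx_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ctx_dim, 128), nn.ReLU(),
            nn.Linear(128, act_dim), nn.Tanh(),
        )

    def forward(self, ctx):
        return self.net(ctx)

def risk_aware_score(perf_critic, cost_critic, ctx, act,
                     cost_budget, alpha=0.1, penalty=10.0):
    """Score an action by mean predicted performance, penalizing the upper
    alpha-tail of the constraint metric (a CVaR-style risk measure)."""
    perf = perf_critic(ctx, act).mean(dim=-1)
    cost_q = cost_critic(ctx, act)
    k = max(1, int(alpha * cost_q.shape[-1]))
    tail_cost = cost_q.topk(k, dim=-1).values.mean(dim=-1)
    return perf - penalty * torch.relu(tail_cost - cost_budget)
```

At decision time one could sample candidate actions around the actor's output and keep the highest-scoring one; tightening `alpha` or raising `penalty` trades performance for stricter constraint satisfaction, which is one plausible reading of the "various risk levels" the abstract refers to.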
Related papers
- A CMDP-within-online framework for Meta-Safe Reinforcement Learning [23.57318558833378]
We study the problem of meta-safe reinforcement learning (Meta-SRL) through the CMDP-within-online framework.
We obtain task-averaged regret bounds for unseen tasks (optimality gap) and for constraint violations using gradient-based meta-learning.
We propose a meta-algorithm that performs inexact online learning on the upper bounds of within-task optimality gap and constraint violations.
arXiv Detail & Related papers (2024-05-26T15:28:42Z)
- Uniformly Safe RL with Objective Suppression for Multi-Constraint Safety-Critical Applications [73.58451824894568]
The widely adopted CMDP model constrains the risks in expectation, which makes room for dangerous behaviors in long-tail states.
In safety-critical domains, such behaviors could lead to disastrous outcomes.
We propose Objective Suppression, a novel method that adaptively suppresses the task's reward-maximizing objectives according to a safety critic.
arXiv Detail & Related papers (2024-02-23T23:22:06Z)
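As a rough illustration of the suppression idea (assumed form only; the paper defines its own weighting scheme), the task objective's weight can shrink as a safety critic predicts higher constraint cost:

```python
# Illustrative sketch (assumed form, not the paper's implementation) of
# adaptively suppressing the task objective with a safety critic.
import torch

def suppressed_objective(task_loss, predicted_cost, cost_threshold, sharpness=5.0):
    """Blend per-sample task loss with predicted constraint cost.
    `predicted_cost` comes from a safety critic; all names are hypothetical."""
    # Suppression weight in (0, 1): near 1 while predicted cost stays below
    # the threshold, near 0 once it exceeds it. Detached so the weight itself
    # receives no gradients.
    w = torch.sigmoid(sharpness * (cost_threshold - predicted_cost)).detach()
    return (w * task_loss + (1.0 - w) * predicted_cost).mean()
```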
arXiv Detail & Related papers (2024-02-23T23:22:06Z) - Learning Safety Constraints From Demonstration Using One-Class Decision
Trees [1.81343777902022]
We present a novel approach that leverages one-class decision trees to facilitate learning from expert demonstrations.
The learned constraints are subsequently employed within an oracle constrained reinforcement learning framework.
In contrast to other methods, our approach offers an interpretable representation of the constraints, a vital feature in safety-critical environments.
arXiv Detail & Related papers (2023-12-14T11:48:22Z)
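A hedged sketch of this recipe, substituting scikit-learn's IsolationForest (a tree-based one-class ensemble, not the paper's interpretable one-class decision trees) to flag state-action pairs that fall outside the expert-demonstrated region:

```python
# Stand-in sketch: fit a tree-based one-class model on expert demonstrations,
# then treat state-action pairs outside the learned region as constraint
# violations. Data and names are placeholders.
import numpy as np
from sklearn.ensemble import IsolationForest

expert_sa = np.random.rand(1000, 6)  # placeholder (state, action) pairs
model = IsolationForest(contamination=0.01, random_state=0).fit(expert_sa)

def violates_constraint(state_action: np.ndarray) -> bool:
    # predict() returns -1 for points outside the expert region
    return model.predict(state_action.reshape(1, -1))[0] == -1
```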
arXiv Detail & Related papers (2023-12-14T11:48:22Z) - ASSERT: Automated Safety Scenario Red Teaming for Evaluating the
Robustness of Large Language Models [65.79770974145983]
ASSERT, Automated Safety Scenario Red Teaming, consists of three methods -- semantically aligned augmentation, target bootstrapping, and adversarial knowledge injection.
We partition our prompts into four safety domains for a fine-grained analysis of how the domain affects model performance.
We find statistically significant performance differences of up to 11% in absolute classification accuracy among semantically related scenarios, and absolute error rates of up to 19% in zero-shot adversarial settings.
arXiv Detail & Related papers (2023-10-14T17:10:28Z) - Maximum Causal Entropy Inverse Constrained Reinforcement Learning [3.409089945290584]
We propose a novel method that utilizes the principle of maximum causal entropy to learn constraints and an optimal policy.
We evaluate the effectiveness of the learned policy by assessing the reward received and the number of constraint violations.
Our method outperforms state-of-the-art approaches across a variety of tasks and environments.
arXiv Detail & Related papers (2023-05-04T14:18:19Z) - When Demonstrations Meet Generative World Models: A Maximum Likelihood
Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have uses in safety-sensitive domains such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z) - Reinforcement Learning with Stepwise Fairness Constraints [50.538878453547966]
We introduce the study of reinforcement learning with stepwise fairness constraints.
We provide learning algorithms with strong theoretical guarantees in regard to policy optimality and fairness violation.
arXiv Detail & Related papers (2022-11-08T04:06:23Z) - Constrained Policy Optimization for Controlled Self-Learning in
Conversational AI Systems [18.546197100318693]
We introduce a scalable framework for supporting fine-grained exploration targets for individual domains via user-defined constraints.
We present a novel meta-gradient learning approach that is scalable and practical for addressing this problem.
We conduct extensive experiments using data from a real-world conversational AI on a set of realistic constraint benchmarks.
arXiv Detail & Related papers (2022-09-17T23:44:13Z) - Constrained Policy Optimization via Bayesian World Models [79.0077602277004]
LAMBDA is a model-based approach for policy optimization in safety-critical tasks modeled via constrained Markov decision processes.
We demonstrate LAMBDA's state-of-the-art performance on the Safety-Gym benchmark suite in terms of sample efficiency and constraint violation.
arXiv Detail & Related papers (2022-01-24T17:02:22Z) - Constrained Markov Decision Processes via Backward Value Functions [43.649330976089004]
We model the problem of learning with constraints as a Constrained Markov Decision Process.
A key contribution of our approach is to translate cumulative cost constraints into state-based constraints.
We provide theoretical guarantees under which the agent converges while ensuring safety over the course of training.
arXiv Detail & Related papers (2020-08-26T20:56:16Z)
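A minimal sketch of that translation, with all names hypothetical: the cost accrued so far (the "backward" quantity) converts the episode's cumulative budget into a per-state bound that candidate actions must respect:

```python
# Hypothetical interface: turn a cumulative cost budget into a state-wise
# check, so that accrued cost plus predicted cost-to-go stays within budget.
import torch

def remaining_budget(accrued_cost: torch.Tensor, budget: float) -> torch.Tensor:
    """Per-state bound induced by the cumulative constraint."""
    return budget - accrued_cost

def safe_action_mask(accrued_cost: torch.Tensor,
                     cost_to_go: torch.Tensor,
                     budget: float) -> torch.Tensor:
    """True for candidate actions whose predicted remaining cost fits the
    budget; `cost_to_go` is a critic's estimate per candidate action."""
    return cost_to_go <= remaining_budget(accrued_cost, budget)
```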
- SAMBA: Safe Model-Based & Active Reinforcement Learning [59.01424351231993]
SAMBA is a framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics.
We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low- and high-dimensional state representations.
We provide intuition as to the effectiveness of the framework by a detailed analysis of our active metrics and safety constraints.
arXiv Detail & Related papers (2020-06-12T10:40:46Z)