Learning to Be Cautious
- URL: http://arxiv.org/abs/2110.15907v1
- Date: Fri, 29 Oct 2021 16:52:45 GMT
- Title: Learning to Be Cautious
- Authors: Montaser Mohammedalamen, Dustin Morrill, Alexander Sieusahai, Yash
Satsangi, Michael Bowling
- Abstract summary: A key challenge in the field of reinforcement learning is to develop agents that behave cautiously in novel situations.
We present a sequence of tasks where cautious behavior becomes increasingly non-obvious, as well as an algorithm to demonstrate that it is possible for a system to learn to be cautious.
- Score: 71.9871661858886
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A key challenge in the field of reinforcement learning is to develop agents
that behave cautiously in novel situations. It is generally impossible to
anticipate all situations that an autonomous system may face or what behavior
would best avoid bad outcomes. An agent that could learn to be cautious would
overcome this challenge by discovering for itself when and how to behave
cautiously. In contrast, current approaches typically embed task-specific
safety information or explicit cautious behaviors into the system, which is
error-prone and imposes extra burdens on practitioners. In this paper, we
present both a sequence of tasks where cautious behavior becomes increasingly
non-obvious and an algorithm to demonstrate that it is possible for a
system to \emph{learn} to be cautious. The essential features of our algorithm
are that it characterizes reward function uncertainty without task-specific
safety information and uses this uncertainty to construct a robust policy.
Specifically, we construct robust policies with a $k$-of-$N$ counterfactual
regret minimization (CFR) subroutine given a learned reward function
uncertainty represented by a neural network ensemble belief. These policies
exhibit caution in each of our tasks without any task-specific safety tuning.
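To make the abstract's main idea concrete, below is a minimal Python sketch of k-of-N robust action selection under a learned reward-uncertainty belief. It is an illustration only, not the paper's algorithm: the paper computes robust policies for sequential decisions with a CFR subroutine and a neural-network ensemble belief, whereas this sketch collapses the problem to a single decision and fakes the ensemble with random numbers; all names and values here are hypothetical.

```python
# Minimal sketch of k-of-N robust action selection (illustrative only; the
# paper's method uses a CFR subroutine over sequential decisions).
import numpy as np

rng = np.random.default_rng(0)

n_actions = 4
ensemble_size = 8

# Hypothetical ensemble belief over reward functions: each member predicts a
# reward for every action. In the paper this role is played by an ensemble of
# neural networks trained from reward observations.
ensemble = rng.normal(loc=[1.0, 0.8, 0.5, 0.2], scale=0.6,
                      size=(ensemble_size, n_actions))

def k_of_N_value(action_rewards, k, N, rng):
    """Average of the k worst of N reward samples drawn for one action."""
    samples = rng.choice(action_rewards, size=N, replace=True)
    return np.sort(samples)[:k].mean()

k, N = 2, 10  # a smaller k/N ratio yields a more pessimistic (cautious) choice
values = [k_of_N_value(ensemble[:, a], k, N, rng) for a in range(n_actions)]
cautious_action = int(np.argmax(values))
print("robust value per action:", np.round(values, 3))
print("selected (cautious) action:", cautious_action)
```

Shrinking k relative to N puts more weight on the worst sampled reward functions, which is what produces increasingly cautious behavior without any task-specific safety tuning.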
Related papers
- No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery [53.08822154199948]
Unsupervised Environment Design (UED) methods have gained recent attention as their adaptive curricula promise to enable agents to be robust to in- and out-of-distribution tasks.
This work investigates how existing UED methods select training environments, focusing on task prioritisation metrics.
We develop a method that directly trains on scenarios with high learnability.
arXiv Detail & Related papers (2024-08-27T14:31:54Z)
- Towards Interpretable Reinforcement Learning with Constrained Normalizing Flow Policies [5.6872893893453105]
Reinforcement learning policies are typically represented by black-box neural networks.
We propose constrained normalizing flow policies as interpretable and safe-by-construction policy models.
arXiv Detail & Related papers (2024-05-02T11:40:15Z)
- Safety Margins for Reinforcement Learning [53.10194953873209]
We show how to leverage proxy criticality metrics to generate safety margins.
We evaluate our approach on learned policies from APE-X and A3C within an Atari environment.
arXiv Detail & Related papers (2023-07-25T16:49:54Z)
- Skill-Based Reinforcement Learning with Intrinsic Reward Matching [77.34726150561087]
We present Intrinsic Reward Matching (IRM), which unifies task-agnostic skill pretraining and task-aware finetuning.
IRM enables us to utilize pretrained skills far more effectively than previous skill selection methods.
arXiv Detail & Related papers (2022-10-14T00:04:49Z)
- Distributional Actor-Critic Ensemble for Uncertainty-Aware Continuous Control [13.767812547998735]
Uncertainty quantification is one of the central challenges for machine learning in real-world applications.
Disentangling and evaluating these uncertainties simultaneously has the potential to improve the agent's final performance.
We propose an uncertainty-aware reinforcement learning algorithm for continuous control tasks.
arXiv Detail & Related papers (2022-07-27T18:11:04Z)
- Safer Reinforcement Learning through Transferable Instinct Networks [6.09170287691728]
We present an approach where an additional policy can override the main policy and offer a safer alternative action.
In our instinct-regulated RL (IR2L) approach, an "instinctual" network is trained to recognize undesirable situations.
We demonstrate IR2L in the OpenAI Safety Gym domain, where it incurs significantly fewer safety violations.
arXiv Detail & Related papers (2021-07-14T13:22:04Z)
- Learning Uncertainty For Safety-Oriented Semantic Segmentation In Autonomous Driving [77.39239190539871]
We show how uncertainty estimation can be leveraged to enable safety critical image segmentation in autonomous driving.
We introduce a new uncertainty measure based on disagreeing predictions as measured by a dissimilarity function.
We show experimentally that our proposed approach is much less computationally intensive at inference time than competing methods.
arXiv Detail & Related papers (2021-05-28T09:23:05Z)
- Coverage as a Principle for Discovering Transferable Behavior in Reinforcement Learning [16.12658895065585]
We argue that representation alone is not enough for efficient transfer in challenging domains and explore how to transfer knowledge through behavior.
The behavior of pre-trained policies may be used for solving the task at hand (exploitation) or for collecting useful data to solve the problem (exploration).
arXiv Detail & Related papers (2021-02-24T16:51:02Z)
- Safe Reinforcement Learning via Curriculum Induction [94.67835258431202]
In safety-critical applications, autonomous agents may need to learn in an environment where mistakes can be very costly.
Existing safe reinforcement learning methods make an agent rely on priors that let it avoid dangerous situations.
This paper presents an alternative approach inspired by human teaching, where an agent learns under the supervision of an automatic instructor.
arXiv Detail & Related papers (2020-06-22T10:48:17Z)