Safe Reinforcement Learning with Natural Language Constraints
- URL: http://arxiv.org/abs/2010.05150v2
- Date: Wed, 4 Aug 2021 02:46:48 GMT
- Title: Safe Reinforcement Learning with Natural Language Constraints
- Authors: Tsung-Yen Yang and Michael Hu and Yinlam Chow and Peter J. Ramadge and
Karthik Narasimhan
- Abstract summary: We propose learning to interpret natural language constraints for safe RL.
HazardWorld is a new multi-task benchmark that requires an agent to optimize reward while not violating constraints specified in free-form text.
We show that our method achieves higher rewards (up to 11x) and fewer constraint violations (by 1.8x) compared to existing approaches.
- Score: 39.70152978025088
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While safe reinforcement learning (RL) holds great promise for many practical
applications like robotics or autonomous cars, current approaches require
specifying constraints in mathematical form. Such specifications demand domain
expertise, limiting the adoption of safe RL. In this paper, we propose learning
to interpret natural language constraints for safe RL. To this end, we first
introduce HazardWorld, a new multi-task benchmark that requires an agent to
optimize reward while not violating constraints specified in free-form text. We
then develop an agent with a modular architecture that can interpret and adhere
to such textual constraints while learning new tasks. Our model consists of (1)
a constraint interpreter that encodes textual constraints into spatial and
temporal representations of forbidden states, and (2) a policy network that
uses these representations to produce a policy achieving minimal constraint
violations during training. Across different domains in HazardWorld, we show
that our method achieves higher rewards (up to 11x) and fewer constraint
violations (by 1.8x) compared to existing approaches. However, in terms of
absolute performance, HazardWorld still poses significant challenges for agents
to learn efficiently, motivating the need for future work.
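As a rough illustration of the two-module design described in the abstract, the sketch below pairs a constraint interpreter (mapping constraint text to a spatial mask over grid cells plus a temporal budget) with a policy network conditioned on both. The bag-of-words text encoder, grid size, and all names and dimensions are illustrative assumptions, not the paper's actual architecture.

```python
# Hypothetical sketch of the two-module design: a constraint interpreter that
# maps free-form text to (spatial, temporal) representations of forbidden
# states, and a policy network that conditions on them. All names, dimensions,
# and the bag-of-words encoder are assumptions for illustration only.
import torch
import torch.nn as nn

class ConstraintInterpreter(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, grid_cells=49):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, embed_dim)   # encode the constraint text
        self.spatial_head = nn.Linear(embed_dim, grid_cells)  # per-cell "forbidden" logits
        self.temporal_head = nn.Linear(embed_dim, 1)          # e.g. a budget / horizon scalar

    def forward(self, token_ids):
        h = self.embed(token_ids)
        spatial = torch.sigmoid(self.spatial_head(h))   # soft mask over grid cells
        temporal = torch.relu(self.temporal_head(h))    # non-negative budget
        return spatial, temporal

class ConstrainedPolicy(nn.Module):
    def __init__(self, obs_dim, grid_cells=49, n_actions=4, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + grid_cells + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs, spatial, temporal):
        x = torch.cat([obs, spatial, temporal], dim=-1)  # condition on constraint info
        return torch.distributions.Categorical(logits=self.net(x))

# Toy forward pass
interp = ConstraintInterpreter(vocab_size=1000)
policy = ConstrainedPolicy(obs_dim=32)
tokens = torch.randint(0, 1000, (1, 12))      # tokenized constraint text
obs = torch.randn(1, 32)                      # agent observation
spatial, temporal = interp(tokens)
action = policy(obs, spatial, temporal).sample()
```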
Related papers
- DeepLTL: Learning to Efficiently Satisfy Complex LTL Specifications [59.01527054553122]
Linear temporal logic (LTL) has recently been adopted as a powerful formalism for specifying complex, temporally extended tasks in reinforcement learning (RL).
Existing approaches suffer from several shortcomings: they are often only applicable to finite-horizon fragments, are restricted to suboptimal solutions, and do not adequately handle safety constraints.
In this work, we propose a novel learning approach to address these concerns.
Our method leverages the structure of Büchi automata, which explicitly represent the semantics of LTL specifications, to learn policies conditioned on sequences of truth assignments that lead to satisfying the desired formulae.
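As a loose illustration (not DeepLTL's implementation), the sketch below conditions a policy on a sequence of truth assignments encoded with a GRU; the automaton-derived sequence, dimensions, and all names are assumptions.

```python
# Hypothetical sketch: condition a policy on a sequence of truth assignments
# (one binary vector of atomic propositions per step), standing in for the
# Büchi-automaton-derived sequences described above. The GRU encoder and all
# names and sizes are assumptions for illustration only.
import torch
import torch.nn as nn

class SequenceConditionedPolicy(nn.Module):
    def __init__(self, obs_dim=16, n_props=4, hidden=64, n_actions=5):
        super().__init__()
        self.seq_encoder = nn.GRU(n_props, hidden, batch_first=True)  # encode the assignment sequence
        self.head = nn.Sequential(
            nn.Linear(obs_dim + hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs, assignment_seq):
        _, h = self.seq_encoder(assignment_seq)        # h: (1, batch, hidden)
        x = torch.cat([obs, h.squeeze(0)], dim=-1)
        return torch.distributions.Categorical(logits=self.head(x))

policy = SequenceConditionedPolicy()
obs = torch.randn(2, 16)                               # batch of observations
seq = torch.randint(0, 2, (2, 3, 4)).float()           # 3-step truth-assignment sequence
action = policy(obs, seq).sample()
```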
arXiv Detail & Related papers (2024-10-06T21:30:38Z)
- Safe Multi-agent Reinforcement Learning with Natural Language Constraints [49.01100552946231]
The role of natural language constraints in Safe Multi-agent Reinforcement Learning (MARL) is crucial, yet often overlooked.
We propose a novel approach named Safe Multi-agent Reinforcement Learning with Natural Language constraints (SMALL).
Our method leverages fine-tuned language models to interpret and process free-form textual constraints, converting them into semantic embeddings.
These embeddings are then integrated into the multi-agent policy learning process, enabling agents to learn policies that minimize constraint violations while optimizing rewards.
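A hypothetical sketch of this pipeline is shown below: a pre-trained LM embeds the free-form constraint, and each agent's policy receives the embedding alongside its observation. The model name, mean pooling, and policy shape are illustrative assumptions (fine-tuning is omitted), not the paper's setup.

```python
# Hypothetical sketch: embed a free-form textual constraint with a pre-trained
# LM, then feed the embedding to each agent's policy. Model choice, pooling,
# and all names and sizes are assumptions for illustration only.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed_constraint(text: str) -> torch.Tensor:
    """Mean-pool the LM's last hidden states into one constraint embedding."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state   # (1, seq_len, 768)
    return hidden.mean(dim=1)                          # (1, 768)

class AgentPolicy(nn.Module):
    def __init__(self, obs_dim=20, constraint_dim=768, n_actions=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + constraint_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, obs, constraint_emb):
        return torch.distributions.Categorical(
            logits=self.net(torch.cat([obs, constraint_emb], dim=-1))
        )

constraint = embed_constraint("never enter the flooded corridor while carrying cargo")
policies = [AgentPolicy() for _ in range(3)]            # one policy per agent
actions = [p(torch.randn(1, 20), constraint).sample() for p in policies]
```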
arXiv Detail & Related papers (2024-05-30T12:57:35Z)
- Uniformly Safe RL with Objective Suppression for Multi-Constraint Safety-Critical Applications [73.58451824894568]
The widely adopted CMDP model constrains the risks in expectation, which makes room for dangerous behaviors in long-tail states.
In safety-critical domains, such behaviors could lead to disastrous outcomes.
We propose Objective Suppression, a novel method that adaptively suppresses the task reward maximizing objectives according to a safety critic.
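An illustrative sketch of this idea (not the paper's exact algorithm) follows: the reward-maximizing term is scaled down wherever a safety critic predicts high violation risk. The gating rule and all names are assumptions.

```python
# Illustrative sketch: suppress the reward-maximizing objective when a safety
# critic predicts high risk of constraint violation. The specific gating rule
# and names are assumptions, not the paper's method.
import torch

def suppressed_objective(reward_q, safety_q, risk_threshold=0.5):
    """Scale the task objective down where predicted violation risk is high.

    reward_q: estimated task return for the chosen actions, shape (batch,)
    safety_q: predicted probability of future constraint violation, shape (batch,)
    """
    suppression = torch.clamp(1.0 - safety_q / risk_threshold, min=0.0)
    # Maximize task return only where it is judged safe; always penalize risk.
    return (suppression * reward_q - safety_q).mean()

reward_q = torch.tensor([2.0, 1.5, 3.0])
safety_q = torch.tensor([0.1, 0.6, 0.9])   # the last two states look risky
print(suppressed_objective(reward_q, safety_q))
```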
arXiv Detail & Related papers (2024-02-23T23:22:06Z)
- Safe Reinforcement Learning with Free-form Natural Language Constraints and Pre-Trained Language Models [36.44404825103045]
Safe reinforcement learning (RL) agents accomplish given tasks while adhering to specific constraints.
We propose to use pre-trained language models (LM) to facilitate RL agents' comprehension of natural language constraints.
Our method enhances safe policy learning under a diverse set of human-derived free-form natural language constraints.
arXiv Detail & Related papers (2024-01-15T09:37:03Z)
- A Multiplicative Value Function for Safe and Efficient Reinforcement Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
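A minimal sketch in the spirit of this summary is given below: the overall value estimate is the constraint-free return discounted by the predicted probability of staying safe. Network sizes and names are illustrative assumptions.

```python
# Minimal sketch of a multiplicative value estimate: the constraint-free
# return is discounted by the predicted probability of remaining safe.
# Architecture details are assumptions for illustration only.
import torch
import torch.nn as nn

class MultiplicativeValue(nn.Module):
    def __init__(self, obs_dim=8, hidden=64):
        super().__init__()
        self.reward_critic = nn.Sequential(        # estimates constraint-free return
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )
        self.safety_critic = nn.Sequential(        # estimates P(constraint violation)
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1), nn.Sigmoid()
        )

    def forward(self, obs):
        p_violate = self.safety_critic(obs)
        value = (1.0 - p_violate) * self.reward_critic(obs)   # discount by safety
        return value, p_violate

v = MultiplicativeValue()
value, p_violate = v(torch.randn(5, 8))
```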
arXiv Detail & Related papers (2023-03-07T18:29:15Z)
- State-wise Safe Reinforcement Learning: A Survey [5.826308050755618]
State-wise constraints are one of the most common constraints in real-world applications.
This paper provides a review of existing approaches that address state-wise constraints in RL.
arXiv Detail & Related papers (2023-02-06T21:11:29Z)
- Learning Barrier Certificates: Towards Safe Reinforcement Learning with Zero Training-time Violations [64.39401322671803]
This paper explores the possibility of safe RL algorithms with zero training-time safety violations.
We propose an algorithm, Co-trained Barrier Certificate for Safe RL (CRABS), which iteratively learns barrier certificates, dynamics models, and policies.
arXiv Detail & Related papers (2021-08-04T04:59:05Z)
- Constrained Markov Decision Processes via Backward Value Functions [43.649330976089004]
We model the problem of learning with constraints as a Constrained Markov Decision Process.
A key contribution of our approach is to translate cumulative cost constraints into state-based constraints.
We provide theoretical guarantees under which the agent converges while ensuring safety over the course of training.
arXiv Detail & Related papers (2020-08-26T20:56:16Z)
- Deep Constrained Q-learning [15.582910645906145]
In many real world applications, reinforcement learning agents have to optimize multiple objectives while following certain rules or satisfying a set of constraints.
We propose Constrained Q-learning, a novel off-policy reinforcement learning framework restricting the action space directly in the Q-update to learn the optimal Q-function for the induced constrained MDP and the corresponding safe policy.
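A rough tabular sketch of this idea follows: the bootstrap max in the Q-update ranges only over actions deemed admissible by the constraints. The tabular setting and all names are assumptions, not the paper's deep-RL implementation.

```python
# Rough sketch of a constrained Q-update: the bootstrap max is restricted to
# actions allowed by the constraints in the next state. Tabular setting and
# all names are assumptions for illustration only.
import numpy as np

def constrained_q_update(Q, s, a, r, s_next, safe_actions, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step with the max restricted to safe actions.

    Q:            array of shape (n_states, n_actions)
    safe_actions: indices of actions allowed in s_next under the constraints
    """
    target = r + gamma * Q[s_next, safe_actions].max()
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

Q = np.zeros((4, 3))
# In state 2, suppose only actions 0 and 2 satisfy the constraints.
Q = constrained_q_update(Q, s=0, a=1, r=1.0, s_next=2, safe_actions=[0, 2])
```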
arXiv Detail & Related papers (2020-03-20T17:26:03Z)