Your Learned Constraint is Secretly a Backward Reachable Tube
- URL: http://arxiv.org/abs/2501.15618v1
- Date: Sun, 26 Jan 2025 17:54:43 GMT
- Title: Your Learned Constraint is Secretly a Backward Reachable Tube
- Authors: Mohamad Qadri, Gokul Swamy, Jonathan Francis, Michael Kaess, Andrea Bajcsy
- Abstract summary: We show that Inverse Constraint Learning (ICL) recovers the set of states where failure is inevitable, rather than the set of states where failure has already happened.
In contrast to the failure set, this backward reachable tube (BRT) depends on the dynamics of the data collection system.
We discuss the implications of the dynamics-conditionedness of the recovered constraint on both the sample-efficiency of policy search and the transferability of learned constraints.
- Score: 27.63547210632307
- Abstract: Inverse Constraint Learning (ICL) is the problem of inferring constraints from safe (i.e., constraint-satisfying) demonstrations. The hope is that these inferred constraints can then be used downstream to search for safe policies for new tasks and, potentially, under different dynamics. Our paper explores the question of what mathematical entity ICL recovers. Somewhat surprisingly, we show that both in theory and in practice, ICL recovers the set of states where failure is inevitable, rather than the set of states where failure has already happened. In the language of safe control, this means we recover a backwards reachable tube (BRT) rather than a failure set. In contrast to the failure set, the BRT depends on the dynamics of the data collection system. We discuss the implications of the dynamics-conditionedness of the recovered constraint on both the sample-efficiency of policy search and the transferability of learned constraints.
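To make the contrast concrete, here is a standard reachability-style definition of the two objects the abstract distinguishes (the notation below is ours, not taken from the paper):

```latex
% Failure set: states that already violate the safety specification
% (e.g., in-collision configurations). Defined by the task alone,
% independent of any dynamics.
\[ \mathcal{F} \subseteq \mathcal{S} \]

% Backward reachable tube of F under dynamics s_{t+1} = f(s_t, a_t):
% the states from which every admissible action sequence eventually
% reaches F, i.e., the states where failure is inevitable.
\[
  \mathrm{BRT}(\mathcal{F}, f) =
  \left\{\, s_0 \in \mathcal{S} \;\middle|\;
    \forall\, a_0, a_1, \ldots \;\; \exists\, t \ge 0 : \; s_t \in \mathcal{F}
  \,\right\}
\]
```

In particular \(\mathcal{F} \subseteq \mathrm{BRT}(\mathcal{F}, f)\), and the BRT inherits a dependence on the dynamics \(f\) that the failure set does not have, which is exactly why a constraint recovered by ICL on one platform need not transfer to a platform with different dynamics.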
Related papers
- Bayesian scaling laws for in-context learning [72.17734205418502]
In-context learning (ICL) is a powerful technique for getting language models to perform complex tasks with no training updates.
We show that ICL approximates a Bayesian learner and develop a family of novel Bayesian scaling laws for ICL.
arXiv Detail & Related papers (2024-10-21T21:45:22Z) - Do No Harm: A Counterfactual Approach to Safe Reinforcement Learning [5.862025534776996]
Reinforcement Learning for control has become increasingly popular due to its ability to learn rich feedback policies that take into account uncertainty and complex representations of the environment.
In such methods, if agents are in, or must visit, states where constraint violation might be inevitable, it is unclear how much they should be penalized.
We address this challenge by formulating a constraint on the counterfactual harm of the learned policy compared to a default, safe policy.
In a philosophical sense this formulation only penalizes the learner for constraint violations that it caused; in a practical sense it maintains feasibility of the optimal control problem.
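A hedged sketch of the kind of constraint this describes, in our own notation (the paper's precise definition of counterfactual harm may differ): with \(\pi_0\) a default safe policy and \(C(\cdot)\) the expected constraint violation a policy incurs, the learner is only charged for violations beyond what the default policy would have caused:

```latex
\[
  \max_{\pi} \; J_r(\pi)
  \quad \text{s.t.} \quad
  \underbrace{\mathbb{E}\!\left[ C(\pi) - C(\pi_0) \right]}_{\text{counterfactual harm}} \;\le\; \delta,
  \qquad \delta \ge 0
\]
```

Because \(\pi = \pi_0\) always satisfies this constraint, the problem stays feasible even from states where some violation is unavoidable, which matches the two claims in the summary above.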
arXiv Detail & Related papers (2024-05-19T20:33:21Z) - CaT: Constraints as Terminations for Legged Locomotion Reinforcement Learning [23.76366118253271]
Current solvers fail to produce efficient policies respecting hard constraints.
We present Constraints as terminations (CaT), a novel constrained RL algorithm.
Videos and code are available at https://constraints-as-terminations.io.
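The summary does not spell out the mechanism, but the name suggests turning constraint violations into (possibly stochastic) episode terminations, so that violating states forfeit future reward. Below is a minimal, hypothetical gymnasium-style wrapper illustrating that reading; the `constraint_cost` function and the termination rule are our assumptions, not the paper's implementation:

```python
import numpy as np
import gymnasium as gym


class ConstraintsAsTerminations(gym.Wrapper):
    """Illustrative sketch: the larger a constraint violation, the more likely
    the episode ends early, so plain RL already avoids violating states."""

    def __init__(self, env, constraint_cost, max_term_prob=1.0):
        super().__init__(env)
        self.constraint_cost = constraint_cost  # maps (obs, action) -> violation in [0, inf)
        self.max_term_prob = max_term_prob

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        violation = self.constraint_cost(obs, action)
        # Termination probability grows with the violation magnitude.
        p_term = min(self.max_term_prob, float(violation))
        if np.random.rand() < p_term:
            terminated = True
            info["constraint_termination"] = True
        return obs, reward, terminated, truncated, info
```

Terminating on violations removes the need for an explicit penalty weight: losing all future reward is itself the penalty, which is one way hard constraints can be respected without a Lagrangian trade-off.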
arXiv Detail & Related papers (2024-03-27T17:03:31Z) - Resilient Constrained Reinforcement Learning [87.4374430686956]
We study a class of constrained reinforcement learning (RL) problems in which multiple constraint specifications are not identified before training.
Identifying appropriate constraint specifications in advance is challenging because the trade-off between the reward objective and constraint satisfaction is not defined a priori.
We propose a new constrained RL approach that searches for policy and constraint specifications together.
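One hedged way to read "searching for policy and constraint specifications together" (our notation, not necessarily the paper's formulation) is to make the constraint budget itself a decision variable and charge a price for relaxing it:

```latex
\[
  \max_{\pi,\; u \ge 0} \;\; J_r(\pi) \;-\; h(u)
  \quad \text{s.t.} \quad
  J_c(\pi) \;\le\; b + u
\]
```

Here \(b\) is a nominal constraint budget, \(u\) is a learned relaxation, and \(h\) is an increasing cost on that relaxation; the reward-versus-safety trade-off is then resolved by optimization rather than fixed before training.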
arXiv Detail & Related papers (2023-12-28T18:28:23Z) - SaFormer: A Conditional Sequence Modeling Approach to Offline Safe Reinforcement Learning [64.33956692265419]
Offline safe RL is of great practical relevance for deploying agents in real-world applications.
We present a novel offline safe RL approach referred to as SaFormer.
arXiv Detail & Related papers (2023-01-28T13:57:01Z) - Dichotomy of Control: Separating What You Can Control from What You Cannot [129.62135987416164]
We propose Dichotomy of Control (DoC), a future-conditioned supervised learning framework that separates mechanisms within a policy's control (actions) from those beyond a policy's control (environment stochasticity).
We show that DoC yields policies that are consistent with their conditioning inputs, ensuring that conditioning a learned policy on a desired high-return future outcome will correctly induce high-return behavior.
arXiv Detail & Related papers (2022-10-24T17:49:56Z) - Interactively Learning Preference Constraints in Linear Bandits [100.78514640066565]
We study sequential decision-making with known rewards and unknown constraints.
As an application, we consider learning constraints to represent human preferences in a driving simulation.
arXiv Detail & Related papers (2022-06-10T17:52:58Z) - Learn Zero-Constraint-Violation Policy in Model-Free Constrained Reinforcement Learning [7.138691584246846]
We propose the safe set actor-critic (SSAC) algorithm, which confines the policy update using safety-oriented energy functions.
The safety index is designed to increase rapidly for potentially dangerous actions.
We claim that we can learn the energy function in a model-free manner similar to learning a value function.
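A hedged sketch of what a policy update "confined by a safety-oriented energy function" might look like, in our notation (the paper's exact objective may differ): the actor maximizes reward while being constrained so that, in states whose safety index \(\phi\) is already non-negative (potentially unsafe), the expected energy decreases:

```latex
\[
  \max_{\pi} \;\; \mathbb{E}_{s}\!\left[ Q^r\!\bigl(s, \pi(s)\bigr) \right]
  \quad \text{s.t.} \quad
  \mathbb{E}\!\left[ \phi(s') - \phi(s) \,\middle|\, a = \pi(s) \right] \;\le\; -\eta
  \;\;\text{whenever}\;\; \phi(s) \ge 0
\]
```

The expected one-step change of \(\phi\) plays the same role as a value function, which is why it can be estimated model-free from transitions, as the summary claims.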
arXiv Detail & Related papers (2021-11-25T07:24:30Z) - Continuous Doubly Constrained Batch Reinforcement Learning [93.23842221189658]
We propose an algorithm for batch RL, where effective policies are learned using only a fixed offline dataset instead of online interactions with the environment.
The limited data in batch RL produces inherent uncertainty in value estimates of states/actions that were insufficiently represented in the training data.
We propose to mitigate this issue via two straightforward penalties: a policy-constraint that keeps the learned policy close to the behavior policy that generated the data, and a value-constraint that discourages overly optimistic value estimates.
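A hedged sketch of how the two penalties typically enter an actor objective in this setting (our notation; the divergence measure and coefficient are assumptions, not the paper's exact design): with \(\pi_\beta\) the behavior policy that produced the dataset \(\mathcal{D}\) and \(\hat{Q}\) a pessimistic value estimate,

```latex
\[
  \max_{\pi} \;\;
  \mathbb{E}_{s \sim \mathcal{D}}\!\Bigl[
    \underbrace{\hat{Q}\bigl(s, \pi(s)\bigr)}_{\text{pessimistic value (value-constraint)}}
    \;-\;
    \alpha \,
    \underbrace{D\!\bigl(\pi(\cdot \mid s) \,\|\, \pi_\beta(\cdot \mid s)\bigr)}_{\text{stay close to the data (policy-constraint)}}
  \Bigr]
\]
```

The policy-constraint keeps the learned policy within the support of the data; the value-constraint keeps \(\hat{Q}\) from rewarding actions whose value the data cannot support.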
arXiv Detail & Related papers (2021-02-18T08:54:14Z) - Inverse Constrained Reinforcement Learning [12.669649178762718]
In this work, we consider the problem of learning constraints from demonstrations of a constraint-abiding agent's behavior.
We show that our framework can successfully learn the most likely constraints that the agent respects.
These learned constraints are transferable to new agents that may have different morphologies and/or reward functions.
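A hedged sketch of the learning problem this describes, in our notation (not necessarily the paper's formulation): among candidate constraints, pick the one under which the observed constraint-abiding behavior is most likely to be (near-)optimal,

```latex
\[
  \hat{c} \;=\; \arg\max_{c \,\in\, \mathcal{C}} \; p\bigl(\mathcal{D} \mid \pi^{*}_{c}\bigr),
  \qquad
  \pi^{*}_{c} \;=\; \arg\max_{\pi} \; J_r(\pi)
  \;\; \text{s.t.} \;\; J_c(\pi) \le 0
\]
```

Transfer then means reusing \(\hat{c}\) with a new reward function or a new embodiment; the main paper above argues that what is actually recovered is dynamics-dependent (a BRT), which qualifies how far such transfer can go.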
arXiv Detail & Related papers (2020-11-19T17:56:33Z) - Robot Learning with Crash Constraints [37.685515446816105]
In robot applications where failing is undesired but not catastrophic, many algorithms struggle with leveraging data obtained from failures.
This is usually caused by (i) the failed experiment ending prematurely, or (ii) the acquired data being scarce or corrupted.
We consider failing behaviors as those that violate a constraint and address the problem of learning with crash constraints.
arXiv Detail & Related papers (2020-10-16T23:56:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.