Learning Behavioral Soft Constraints from Demonstrations
- URL: http://arxiv.org/abs/2202.10407v1
- Date: Mon, 21 Feb 2022 18:09:56 GMT
- Title: Learning Behavioral Soft Constraints from Demonstrations
- Authors: Arie Glazier, Andrea Loreggia, Nicholas Mattei, Taher Rahgooy,
Francesca Rossi, Brent Venable
- Abstract summary: We present a novel inverse reinforcement learning (IRL) method for learning implicit hard and soft constraints over states, actions, and state features.
Our method enables agents to implicitly learn human constraints and desires without the need for explicit modeling by the agent designer.
- Score: 31.34800444313487
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many real-life scenarios require humans to make difficult trade-offs: do we
always follow all the traffic rules or do we violate the speed limit in an
emergency? These scenarios force us to weigh collective rules and norms
against our own personal objectives and desires. To
create effective AI-human teams, we must equip AI agents with a model of how
humans make these trade-offs in complex environments when there are implicit
and explicit rules and constraints. Agents equipped with these models will be
able to mirror human behavior and/or to draw human attention to situations
where decision making could be improved. To this end, we propose a novel
inverse reinforcement learning (IRL) method, Max Entropy Inverse Soft
Constraint IRL (MESC-IRL), for learning implicit hard and soft constraints over
states, actions, and state features from demonstrations in deterministic and
non-deterministic environments modeled as Markov Decision Processes (MDPs). Our
method enables agents to implicitly learn human constraints and desires without
the need for explicit modeling by the agent designer and to transfer these
constraints between environments. Our novel method generalizes prior work which
only considered deterministic hard constraints and achieves state-of-the-art
performance.
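The abstract does not include code, but the core mechanism can be made concrete. Below is a minimal, hypothetical NumPy sketch of maximum-entropy IRL that learns additive per-state soft-constraint penalties from demonstrations in a toy deterministic gridworld MDP. The environment, the helper names (soft_policy, expected_visits, learn_penalties), and the particular gradient update are illustrative assumptions; this is not the authors' MESC-IRL implementation, which also covers action and feature constraints and non-deterministic transitions.

```python
# Hypothetical illustration of soft-constraint learning via maximum-entropy IRL
# on a toy 5x5 deterministic gridworld. Not the authors' MESC-IRL code.
import numpy as np

SIZE, GAMMA, HORIZON = 5, 0.9, 9              # horizon roughly matches demo length
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
N_STATES, GOAL = SIZE * SIZE, SIZE * SIZE - 1

def step(s, a):
    r, c = divmod(s, SIZE)
    dr, dc = ACTIONS[a]
    return min(max(r + dr, 0), SIZE - 1) * SIZE + min(max(c + dc, 0), SIZE - 1)

# Deterministic transition table: T[s, a] = next state
T = np.array([[step(s, a) for a in range(len(ACTIONS))] for s in range(N_STATES)])
nominal_reward = np.full(N_STATES, -0.1)  # small per-step cost
nominal_reward[GOAL] = 1.0                # bonus for reaching the goal

def logsumexp(x, axis):
    m = x.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis) + np.log(np.exp(x - m).sum(axis=axis))

def soft_policy(reward):
    """Soft (max-entropy) value iteration; returns the stochastic policy pi[s, a]."""
    v = np.zeros(N_STATES)
    for _ in range(HORIZON):
        q = reward[:, None] + GAMMA * v[T]
        v = logsumexp(q, axis=1)
    q = reward[:, None] + GAMMA * v[T]
    return np.exp(q - logsumexp(q, axis=1)[:, None])

def expected_visits(pi, start=0):
    """Expected state-visitation counts over the horizon under policy pi."""
    d = np.zeros(N_STATES)
    d[start] = 1.0
    total = d.copy()
    for _ in range(HORIZON - 1):
        nxt = np.zeros(N_STATES)
        for s in range(N_STATES):
            for a in range(len(ACTIONS)):
                nxt[T[s, a]] += d[s] * pi[s, a]
        d = nxt
        total += d
    return total

def learn_penalties(demos, lr=0.05, iters=200):
    """Fit non-negative per-state penalties so that the penalized reward
    explains the demonstrations under a max-entropy (Boltzmann) policy."""
    emp = np.zeros(N_STATES)
    for traj in demos:
        for s in traj:
            emp[s] += 1.0
    emp /= len(demos)
    penalty = np.zeros(N_STATES)
    for _ in range(iters):
        pi = soft_policy(nominal_reward - penalty)
        grad = emp - expected_visits(pi)                # max-ent IRL gradient
        penalty = np.maximum(penalty - lr * grad, 0.0)  # under-visited states get penalized
    return penalty

# Demonstrations skirt the centre cell (state 12), as if it carried a "keep out" soft constraint.
demos = [[0, 1, 2, 3, 4, 9, 14, 19, 24], [0, 5, 10, 15, 20, 21, 22, 23, 24]]
penalty = learn_penalties(demos)
print("most penalized state:", int(penalty.argmax()))
```

The update follows the standard maximum-entropy IRL gradient: states that the demonstrations visit less often than the current soft-optimal policy predicts accumulate penalty mass, which is the sense in which the learned constraints are "soft" rather than hard.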
Related papers
- Teaching AI to Handle Exceptions: Supervised Fine-Tuning with Human-Aligned Judgment [0.0]
Large language models (LLMs) are evolving into agentic AI systems, but their decision-making processes remain poorly understood.
We show that even LLMs that excel at reasoning deviate significantly from human judgments because they adhere strictly to policies.
We then evaluate three approaches to tuning AI agents to handle exceptions: ethical framework prompting, chain-of-thought reasoning, and supervised fine-tuning.
arXiv Detail & Related papers (2025-03-04T20:00:37Z)
- A Moral Imperative: The Need for Continual Superalignment of Large Language Models [1.0499611180329806]
Superalignment is a theoretical framework that aspires to ensure that superintelligent AI systems act in accordance with human values and goals.
This paper examines the challenges associated with achieving life-long superalignment in AI systems, particularly large language models (LLMs).
arXiv Detail & Related papers (2024-03-13T05:44:50Z)
- Tuning-Free Accountable Intervention for LLM Deployment -- A Metacognitive Approach [55.613461060997004]
Large Language Models (LLMs) have catalyzed transformative advances across a spectrum of natural language processing tasks.
We propose an innovative metacognitive approach, dubbed CLEAR, to equip LLMs with capabilities for self-aware error identification and correction.
arXiv Detail & Related papers (2024-03-08T19:18:53Z)
- HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments [93.94020724735199]
HAZARD consists of three unexpected disaster scenarios: fire, flood, and wind.
This benchmark enables us to evaluate autonomous agents' decision-making capabilities across various pipelines.
arXiv Detail & Related papers (2024-01-23T18:59:43Z)
- SAFE-SIM: Safety-Critical Closed-Loop Traffic Simulation with Diffusion-Controllable Adversaries [94.84458417662407]
We introduce SAFE-SIM, a controllable closed-loop safety-critical simulation framework.
Our approach yields two distinct advantages: 1) generating realistic long-tail safety-critical scenarios that closely reflect real-world conditions, and 2) providing controllable adversarial behavior for more comprehensive and interactive evaluations.
We validate our framework empirically using the nuScenes and nuPlan datasets across multiple planners, demonstrating improvements in both realism and controllability.
arXiv Detail & Related papers (2023-12-31T04:14:43Z)
- Modeling Boundedly Rational Agents with Latent Inference Budgets [56.24971011281947]
We introduce a latent inference budget model (L-IBM) that models agents' computational constraints explicitly.
L-IBMs make it possible to learn agent models using data from diverse populations of suboptimal actors.
We show that L-IBMs match or outperform Boltzmann models of decision-making under uncertainty.
arXiv Detail & Related papers (2023-12-07T03:55:51Z)
- Learning Vision-based Pursuit-Evasion Robot Policies [54.52536214251999]
We develop a fully-observable robot policy that generates supervision for a partially-observable one.
We deploy our policy on a physical quadruped robot with an RGB-D camera for pursuit-evasion interactions in the wild.
arXiv Detail & Related papers (2023-08-30T17:59:05Z)
- Maximum Causal Entropy Inverse Constrained Reinforcement Learning [3.409089945290584]
We propose a novel method that utilizes the principle of maximum causal entropy to learn constraints and an optimal policy.
We evaluate the effectiveness of the learned policy by assessing the reward received and the number of constraint violations.
Our method has been shown to outperform state-of-the-art approaches across a variety of tasks and environments.
arXiv Detail & Related papers (2023-05-04T14:18:19Z)
- Making Human-Like Trade-offs in Constrained Environments by Learning from Demonstrations [30.738257457765755]
We present a novel inverse reinforcement learning (IRL) method for learning implicit hard and soft constraints from demonstrations.
We then use the constraint learning method to implement a novel system architecture that orchestrates competing objectives.
We evaluate the resulting agent on trajectory length, number of violated constraints, and total reward, demonstrating that our agent architecture is both general and achieves strong performance.
arXiv Detail & Related papers (2021-09-22T20:12:01Z)
- Generalizing Decision Making for Automated Driving with an Invariant Environment Representation using Deep Reinforcement Learning [55.41644538483948]
Current approaches either do not generalize well beyond the training data or are not capable of considering a variable number of traffic participants.
We propose an invariant environment representation from the perspective of the ego vehicle.
We show that the agents are capable of generalizing successfully to unseen scenarios due to the abstraction.
arXiv Detail & Related papers (2021-02-12T20:37:29Z)
- Safe Reinforcement Learning with Natural Language Constraints [39.70152978025088]
We propose learning to interpret natural language constraints for safe RL.
HazardWorld is a new multi-task benchmark that requires an agent to optimize reward while not violating constraints specified in free-form text.
We show that our method achieves higher rewards (up to 11x) and fewer constraint violations (by 1.8x) compared to existing approaches.
arXiv Detail & Related papers (2020-10-11T03:41:56Z)
- Learning a Directional Soft Lane Affordance Model for Road Scenes Using Self-Supervision [0.0]
Humans navigate complex environments in an organized yet flexible manner, adapting to the context and implicit social rules.
This work proposes a novel self-supervised method for training a probabilistic network model to estimate the regions humans are most likely to drive in.
The model is shown to successfully generalize to new road scenes, demonstrating potential for real-world application.
arXiv Detail & Related papers (2020-02-17T00:57:34Z)