Making Human-Like Trade-offs in Constrained Environments by Learning
from Demonstrations
- URL: http://arxiv.org/abs/2109.11018v1
- Date: Wed, 22 Sep 2021 20:12:01 GMT
- Title: Making Human-Like Trade-offs in Constrained Environments by Learning
from Demonstrations
- Authors: Arie Glazier, Andrea Loreggia, Nicholas Mattei, Taher Rahgooy,
Francesca Rossi, K. Brent Venable
- Abstract summary: We present a novel inverse reinforcement learning (IRL) method for learning implicit hard and soft constraints from demonstrations.
We then use the constraint learning method to implement a novel system architecture that orchestrates competing objectives.
We evaluate the resulting agent on trajectory length, number of violated constraints, and total reward, demonstrating that our agent architecture is general and achieves strong performance.
- Score: 30.738257457765755
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many real-life scenarios require humans to make difficult trade-offs: do we
always follow all the traffic rules or do we violate the speed limit in an
emergency? These scenarios force us to evaluate the trade-off between
collective norms and our own personal objectives. To create effective AI-human
teams, we must equip AI agents with a model of how humans make trade-offs in
complex, constrained environments. These agents will be able to mirror human
behavior or to draw human attention to situations where decision making could
be improved. To this end, we propose a novel inverse reinforcement learning
(IRL) method for learning implicit hard and soft constraints from
demonstrations, enabling agents to quickly adapt to new settings. In addition,
learning soft constraints over states, actions, and state features allows
agents to transfer this knowledge to new domains that share similar aspects. We
then use the constraint learning method to implement a novel system
architecture that leverages a cognitive model of human decision making,
multi-alternative decision field theory (MDFT), to orchestrate competing
objectives. We evaluate the resulting agent on trajectory length, number of
violated constraints, and total reward, demonstrating that our agent
architecture is general and achieves strong performance. Thus we are able
to capture and replicate human-like trade-offs from demonstrations in
environments where constraints are not explicit.
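As a companion to the abstract, the following is a minimal sketch of the MDFT deliberation loop the architecture builds on: a preference state accumulates contrasted valences while attention switches stochastically between attributes, and the first alternative to cross a threshold is chosen. The two-objective value matrix, attention weights, feedback parameter, and threshold below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def mdft_choose(M, w, s=0.95, theta=2.0, max_steps=500, rng=None):
    """Pick one of n alternatives with multi-alternative decision field theory (MDFT).

    M     : (n, m) value matrix; row i scores alternative i under each of m attributes.
    w     : (m,) attention probabilities over the attributes.
    s     : self-feedback (memory) applied to the accumulated preference state.
    theta : preference threshold that ends deliberation (illustrative value).
    """
    rng = np.random.default_rng() if rng is None else rng
    n, m = M.shape
    # Contrast matrix: each alternative is compared against the mean of the others.
    C = np.eye(n) - (np.ones((n, n)) - np.eye(n)) / (n - 1)
    P = np.zeros(n)                      # accumulated preference state
    for _ in range(max_steps):
        attended = rng.choice(m, p=w)    # attend to one attribute at a time
        P = s * P + C @ M[:, attended]   # accumulate contrasted valences
        if P.max() >= theta:
            break
    return int(P.argmax())

# Hypothetical use: two candidate actions scored under two competing objectives,
# a task-reward estimate and a (negated) learned soft-constraint penalty.
M = np.array([[0.9, -0.8],   # fast action: high reward, violates a learned constraint
              [0.4,  0.0]])  # cautious action: lower reward, no violation
w = np.array([0.5, 0.5])     # equal attention to the reward and constraint objectives
print(mdft_choose(M, w))
```

In the architecture described above, the attention weights govern how strongly the agent favors its own objective over the learned norms; shifting them reproduces different human-like trade-offs.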
Related papers
- Teaching AI to Handle Exceptions: Supervised Fine-Tuning with Human-Aligned Judgment [0.0]
Large language models (LLMs) are evolving into agentic AI systems, but their decision-making processes remain poorly understood.
We show that even LLMs that excel at reasoning deviate significantly from human judgments because they adhere strictly to policies.
We then evaluate three approaches to tuning AI agents to handle exceptions: ethical framework prompting, chain-of-thought reasoning, and supervised fine-tuning.
arXiv Detail & Related papers (2025-03-04T20:00:37Z) - Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks.
However, they still struggle with problems requiring multi-step decision-making and environmental feedback.
We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z) - Policy Learning with a Language Bottleneck [65.99843627646018]
We introduce Policy Learning with a Language Bottleneck (PLLB), a framework enabling AI agents to generate linguistic rules.
PLLB alternates between a *rule generation* step guided by language models, and an *update* step where agents learn new policies guided by rules.
We show that PLLB agents are able to learn more interpretable and generalizable behaviors, and can also share the learned rules with human users.
arXiv Detail & Related papers (2024-05-07T08:40:21Z) - A Moral Imperative: The Need for Continual Superalignment of Large Language Models [1.0499611180329806]
Superalignment is a theoretical framework that aspires to ensure that superintelligent AI systems act in accordance with human values and goals.
This paper examines the challenges associated with achieving life-long superalignment in AI systems, particularly large language models (LLMs).
arXiv Detail & Related papers (2024-03-13T05:44:50Z) - Reinforcement Learning Interventions on Boundedly Rational Human Agents
in Frictionful Tasks [25.507656595628376]
We introduce a framework in which an AI agent intervenes on the parameters of a Markov Decision Process (MDP) belonging to a boundedly rational human agent.
We show that AI planning with our human models can lead to helpful policies for a wide range of more complex, ground-truth humans.
arXiv Detail & Related papers (2024-01-26T14:59:48Z) - HAZARD Challenge: Embodied Decision Making in Dynamically Changing
Environments [93.94020724735199]
HAZARD consists of three unexpected disaster scenarios: fire, flood, and wind.
This benchmark enables us to evaluate autonomous agents' decision-making capabilities across various pipelines.
arXiv Detail & Related papers (2024-01-23T18:59:43Z) - Enabling High-Level Machine Reasoning with Cognitive Neuro-Symbolic
Systems [67.01132165581667]
We propose to enable high-level reasoning in AI systems by integrating cognitive architectures with external neuro-symbolic components.
We illustrate a hybrid framework centered on ACT-R and we discuss the role of generative models in recent and future applications.
arXiv Detail & Related papers (2023-11-13T21:20:17Z) - AI planning in the imagination: High-level planning on learned abstract
search spaces [68.75684174531962]
We propose a new method, called PiZero, that gives an agent the ability to plan in an abstract search space that the agent learns during training.
We evaluate our method on multiple domains, including the traveling salesman problem, Sokoban, 2048, the facility location problem, and Pacman.
arXiv Detail & Related papers (2023-08-16T22:47:16Z) - DREAMWALKER: Mental Planning for Continuous Vision-Language Navigation [107.5934592892763]
We propose DREAMWALKER -- a world model based VLN-CE agent.
The world model is built to summarize the visual, topological, and dynamic properties of the complicated continuous environment.
It can simulate and evaluate possible plans entirely in such internal abstract world, before executing costly actions.
arXiv Detail & Related papers (2023-08-14T23:45:01Z) - Learning Behavioral Soft Constraints from Demonstrations [31.34800444313487]
We present a novel inverse reinforcement learning (IRL) method for learning implicit hard and soft constraints over states, actions, and state features.
Our method enables agents to implicitly learn human constraints and desires without explicit modeling by the agent designer (a rough sketch of this idea follows this list).
arXiv Detail & Related papers (2022-02-21T18:09:56Z) - Conditional Imitation Learning for Multi-Agent Games [89.897635970366]
We study the problem of conditional multi-agent imitation learning, where we have access to joint trajectory demonstrations at training time.
We propose a novel approach to address the difficulties of scalability and data scarcity.
Our model learns a low-rank subspace over ego and partner agent strategies, then infers and adapts to a new partner strategy by interpolating in the subspace.
arXiv Detail & Related papers (2022-01-05T04:40:13Z) - Autonomous Reinforcement Learning: Formalism and Benchmarking [106.25788536376007]
Real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world.
Common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts.
This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms.
arXiv Detail & Related papers (2021-12-17T16:28:06Z) - Learning Human Rewards by Inferring Their Latent Intelligence Levels in
Multi-Agent Games: A Theory-of-Mind Approach with Application to Driving Data [18.750834997334664]
We argue that humans are boundedly rational and have different intelligence levels when reasoning about others' decision-making processes.
We propose a new multi-agent Inverse Reinforcement Learning framework that reasons about humans' latent intelligence levels during learning.
arXiv Detail & Related papers (2021-03-07T07:48:31Z) - Safe Reinforcement Learning with Natural Language Constraints [39.70152978025088]
We propose learning to interpret natural language constraints for safe RL.
HazardWorld is a new multi-task benchmark that requires an agent to optimize reward while not violating constraints specified in free-form text.
We show that our method achieves higher rewards (up to 11x) and fewer constraint violations (by 1.8x) compared to existing approaches.
arXiv Detail & Related papers (2020-10-11T03:41:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.