Cooperative Inverse Reinforcement Learning
- URL: http://arxiv.org/abs/1606.03137v4
- Date: Sat, 17 Feb 2024 16:13:12 GMT
- Title: Cooperative Inverse Reinforcement Learning
- Authors: Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell
- Abstract summary: We propose a formal definition of the value alignment problem as cooperative inverse reinforcement learning (CIRL).
A CIRL problem is a cooperative, partial-information game with two agents, human and robot; both are rewarded according to the human's reward function, but the robot does not initially know what this is.
In contrast to classical IRL, where the human is assumed to act optimally in isolation, optimal CIRL solutions produce behaviors such as active teaching, active learning, and communicative actions.
- Score: 64.60722062217417
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For an autonomous system to be helpful to humans and to pose no unwarranted
risks, it needs to align its values with those of the humans in its environment
in such a way that its actions contribute to the maximization of value for the
humans. We propose a formal definition of the value alignment problem as
cooperative inverse reinforcement learning (CIRL). A CIRL problem is a
cooperative, partial-information game with two agents, human and robot; both
are rewarded according to the human's reward function, but the robot does not
initially know what this is. In contrast to classical IRL, where the human is
assumed to act optimally in isolation, optimal CIRL solutions produce behaviors
such as active teaching, active learning, and communicative actions that are
more effective in achieving value alignment. We show that computing optimal
joint policies in CIRL games can be reduced to solving a POMDP, prove that
optimality in isolation is suboptimal in CIRL, and derive an approximate CIRL
algorithm.
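The reduction mentioned in the abstract folds the unknown reward parameter into the hidden state: the robot maintains a belief over the human's reward parameters, updates it from the human's observed behavior, and plans in the resulting belief space. The following is a minimal sketch of that idea on a toy fetching game; the two-valued reward parameter, the noisily rational human model, and the greedy belief-space policy are illustrative assumptions, not the paper's construction or experiments.
```python
# Minimal sketch of the CIRL -> POMDP reduction idea: the unknown reward
# parameter theta becomes part of the hidden state, the robot holds a belief
# over theta, updates it from the human's observed action, and then acts to
# maximize expected shared reward under that belief.
# Toy setup (two preference values, two robot actions, a simple noisily
# rational human model) is an illustrative assumption only.

THETAS = [0, 1]                          # hidden parameter: which item the human values
HUMAN_ACTIONS = ["point_0", "point_1"]
ROBOT_ACTIONS = ["fetch_0", "fetch_1"]


def reward(theta, robot_action):
    """Shared reward: the robot fetches the item the human actually values."""
    return 1.0 if robot_action == f"fetch_{theta}" else 0.0


def human_policy(theta, human_action):
    """Noisily rational human model: mostly points at the valued item."""
    return 0.9 if human_action == f"point_{theta}" else 0.1


def belief_update(belief, human_action):
    """Bayes rule over theta given the observed human action."""
    posterior = {t: belief[t] * human_policy(t, human_action) for t in THETAS}
    z = sum(posterior.values())
    return {t: p / z for t, p in posterior.items()}


def robot_best_action(belief):
    """Greedy belief-space policy: maximize expected shared reward."""
    return max(ROBOT_ACTIONS,
               key=lambda a: sum(belief[t] * reward(t, a) for t in THETAS))


if __name__ == "__main__":
    belief = {0: 0.5, 1: 0.5}            # uniform prior over theta
    observed = "point_1"                 # human acts first; robot observes
    belief = belief_update(belief, observed)
    print("posterior over theta:", belief)
    print("robot action:", robot_best_action(belief))
```
In the full CIRL formulation the human's policy is optimized jointly with the robot's, which is what produces the active-teaching and communicative behaviors described in the abstract; the fixed human model above is there only to make the belief update concrete.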
Related papers
- MEReQ: Max-Ent Residual-Q Inverse RL for Sample-Efficient Alignment from Intervention [81.56607128684723]
We introduce MEReQ (Maximum-Entropy Residual-Q Inverse Reinforcement Learning), designed for sample-efficient alignment from human intervention.
MEReQ infers a residual reward function that captures the discrepancy between the human expert's and the prior policy's underlying reward functions.
It then employs Residual Q-Learning (RQL) to align the policy with human preferences using this residual reward function.
arXiv Detail & Related papers (2024-06-24T01:51:09Z) - Advantage Alignment Algorithms [20.125992203908744]
We introduce Advantage Alignment, a family of algorithms that perform opponent shaping efficiently and intuitively.
We achieve this by aligning the advantages of interacting agents, increasing the probability of mutually beneficial actions when their interaction has been positive.
We demonstrate the effectiveness of our algorithms across a range of social dilemmas, achieving state-of-the-art cooperation and robustness against exploitation.
arXiv Detail & Related papers (2024-06-20T18:30:09Z) - REBEL: A Regularization-Based Solution for Reward Overoptimization in Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and user intentions, values, or social norms can be catastrophic in the real world.
Current methods to mitigate this misalignment work by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z) - Contrastive Preference Learning: Learning from Human Feedback without RL [71.77024922527642]
We introduce Contrastive Preference Learning (CPL), an algorithm for learning optimal policies from preferences without learning reward functions.
CPL is fully off-policy, uses only a simple contrastive objective (a sketch appears after this related-papers list), and can be applied to arbitrary MDPs.
arXiv Detail & Related papers (2023-10-20T16:37:56Z) - Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism [91.52263068880484]
We study offline Reinforcement Learning with Human Feedback (RLHF)
We aim to learn the human's underlying reward and the MDP's optimal policy from a set of trajectories induced by human choices.
RLHF is challenging for multiple reasons: large state space but limited human feedback, the bounded rationality of human decisions, and the off-policy distribution shift.
arXiv Detail & Related papers (2023-05-29T01:18:39Z) - Coherent Soft Imitation Learning [17.345411907902932]
Imitation learning methods seek to learn from an expert either through behavioral cloning (BC) of the policy or inverse reinforcement learning (IRL) of the reward.
This work derives an imitation method that captures the strengths of both BC and IRL.
arXiv Detail & Related papers (2023-05-25T21:54:22Z) - Offline Reinforcement Learning for Human-Guided Human-Machine Interaction with Private Information [110.42866062614912]
We study human-guided human-machine interaction involving private information.
We focus on offline reinforcement learning (RL) in this game.
We develop a novel identification result and use it to propose a new off-policy evaluation method.
arXiv Detail & Related papers (2022-12-23T06:26:44Z) - Privacy-Preserving Reinforcement Learning Beyond Expectation [6.495883501989546]
Cyber and cyber-physical systems equipped with machine learning algorithms such as autonomous cars share environments with humans.
It is important to align system (or agent) behaviors with the preferences of one or more human users.
We consider the case when an agent has to learn behaviors in an unknown environment.
arXiv Detail & Related papers (2022-03-18T21:28:29Z) - Bayesian Robust Optimization for Imitation Learning [34.40385583372232]
Inverse reinforcement learning can enable generalization to new states by learning a parameterized reward function.
Existing safe imitation learning approaches based on IRL handle the resulting uncertainty over the reward function with a max-min framework.
The proposed BROIL (Bayesian Robust Optimization for Imitation Learning) instead provides a natural way to interpolate between return-maximizing and risk-minimizing behaviors.
arXiv Detail & Related papers (2020-07-24T01:52:11Z)
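As referenced in the Contrastive Preference Learning entry above, its "simple contrastive objective" can be sketched as a logistic loss that scores the policy's log-likelihood on the human-preferred segment against the dispreferred one. The segment scoring, the temperature alpha, and the toy linear policy below are illustrative assumptions (discounting is omitted); see the CPL paper for the exact regret-based objective.
```python
# Hedged sketch of a contrastive preference objective in the spirit of CPL:
# prefer policies whose (scaled) log-likelihood is higher on the human-preferred
# segment than on the dispreferred one.  No reward model is learned.

import torch
import torch.nn.functional as F


def segment_score(policy_logits, actions, alpha=0.1):
    """Sum of alpha-scaled log pi(a_t | s_t) over one trajectory segment.

    policy_logits: (T, num_actions) logits from the policy network
    actions:       (T,) integer actions taken in the segment
    """
    log_probs = F.log_softmax(policy_logits, dim=-1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    return alpha * chosen.sum()


def contrastive_preference_loss(logits_pref, acts_pref, logits_rej, acts_rej):
    """Logistic loss pushing the preferred segment's score above the rejected one."""
    return -F.logsigmoid(segment_score(logits_pref, acts_pref)
                         - segment_score(logits_rej, acts_rej))


if __name__ == "__main__":
    torch.manual_seed(0)
    T, num_actions = 5, 3
    policy = torch.nn.Linear(4, num_actions)     # toy policy head over 4-dim states
    states_pref, states_rej = torch.randn(T, 4), torch.randn(T, 4)
    acts_pref = torch.randint(num_actions, (T,))
    acts_rej = torch.randint(num_actions, (T,))
    loss = contrastive_preference_loss(policy(states_pref), acts_pref,
                                       policy(states_rej), acts_rej)
    loss.backward()                              # gradients flow into the policy directly
    print("loss:", float(loss))
```
Because the objective depends only on the policy's log-probabilities over logged segments, it can be optimized fully off-policy and never materializes a reward function, which is the property the entry highlights.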