Multi-Principal Assistance Games
- URL: http://arxiv.org/abs/2007.09540v1
- Date: Sun, 19 Jul 2020 00:23:25 GMT
- Title: Multi-Principal Assistance Games
- Authors: Arnaud Fickinger, Simon Zhuang, Dylan Hadfield-Menell, Stuart Russell
- Abstract summary: Impossibility theorems in social choice theory and voting theory can be applied to such games.
We analyze in particular a bandit apprentice game in which the humans act first to demonstrate their individual preferences for the arms.
We propose a social choice method that uses shared control of a system to combine preference inference with social welfare optimization.
- Score: 11.85513759444069
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Assistance games (also known as cooperative inverse reinforcement learning
games) have been proposed as a model for beneficial AI, wherein a robotic agent
must act on behalf of a human principal but is initially uncertain about the
human's payoff function. This paper studies multi-principal assistance games,
which cover the more general case in which the robot acts on behalf of N humans
who may have widely differing payoffs. Impossibility theorems in social choice
theory and voting theory can be applied to such games, suggesting that
strategic behavior by the human principals may complicate the robot's task in
learning their payoffs. We analyze in particular a bandit apprentice game in
which the humans act first to demonstrate their individual preferences for the
arms and then the robot acts to maximize the sum of human payoffs. We explore
the extent to which the cost of choosing suboptimal arms reduces the incentive
to mislead, a form of natural mechanism design. In this context we propose a
social choice method that uses shared control of a system to combine preference
inference with social welfare optimization.
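The two-phase structure of the bandit apprentice game described above (humans act first to demonstrate preferences, then the robot acts to maximize the sum of human payoffs) can be sketched as follows. This is an illustrative toy, not the paper's mechanism: the payoff values, the naive one-vote-per-demonstration inference, and all function names are assumptions.

```python
# Toy sketch of a bandit apprentice game: N humans demonstrate arm
# preferences, then the robot maximizes inferred utilitarian welfare.
# All numbers and the inference rule are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

n_humans, n_arms = 3, 4
# True (privately known) payoff of each arm for each human.
payoffs = rng.uniform(0, 1, size=(n_humans, n_arms))

def demonstrate(human_payoffs):
    """A truthful human pulls their individually best arm; the paper's
    concern is that a strategic human might instead misreport."""
    return int(np.argmax(human_payoffs))

# Phase 1: humans act first, demonstrating individual preferences.
demonstrations = [demonstrate(payoffs[i]) for i in range(n_humans)]

# Phase 2: the robot infers preferences (naively: one vote per
# demonstrated arm) and then picks the arm maximizing the inferred
# sum of human payoffs (utilitarian social welfare).
votes = np.bincount(demonstrations, minlength=n_arms)
robot_arm = int(np.argmax(votes))

# Achieved social welfare can only fall short of the true optimum.
welfare = payoffs[:, robot_arm].sum()
optimal = payoffs.sum(axis=0).max()
print(robot_arm, welfare, optimal)
```

The gap between `welfare` and `optimal` is where the paper's "natural mechanism design" argument enters: misreporting in phase 1 costs a human real bandit payoff, which dampens the incentive to mislead the robot.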
Related papers
- Learning to Assist Humans without Inferring Rewards [65.28156318196397]
We build upon prior work that studies assistance through the lens of empowerment.
An assistive agent aims to maximize the influence of the human's actions.
We prove that these representations estimate a similar notion of empowerment to that studied by prior work.
arXiv Detail & Related papers (2024-11-04T21:31:04Z)
- HumanoidBench: Simulated Humanoid Benchmark for Whole-Body Locomotion and Manipulation [50.616995671367704]
We present a high-dimensional, simulated robot learning benchmark, HumanoidBench, featuring a humanoid robot equipped with dexterous hands.
Our findings reveal that state-of-the-art reinforcement learning algorithms struggle with most tasks, whereas a hierarchical learning approach achieves superior performance when supported by robust low-level policies.
arXiv Detail & Related papers (2024-03-15T17:45:44Z)
- RoboPianist: Dexterous Piano Playing with Deep Reinforcement Learning [61.10744686260994]
We introduce RoboPianist, a system that enables simulated anthropomorphic hands to learn an extensive repertoire of 150 piano pieces.
We additionally introduce an open-sourced environment, benchmark of tasks, interpretable evaluation metrics, and open challenges for future study.
arXiv Detail & Related papers (2023-04-09T03:53:05Z)
- Learning Preferences for Interactive Autonomy [1.90365714903665]
This thesis is an attempt to learn reward functions from human users by using other, more reliable data modalities.
We first propose various forms of comparative feedback, e.g., pairwise comparisons, best-of-many choices, rankings, scaled comparisons; and describe how a robot can use these various forms of human feedback to infer a reward function.
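Inferring a reward function from pairwise comparisons, as in the entry above, is commonly done with a Bradley-Terry style likelihood. The following sketch is a minimal, hypothetical illustration of that idea, not the thesis's actual pipeline; the feature vectors and toy data are invented.

```python
# Minimal sketch: fit a linear reward from pairwise comparisons via
# gradient ascent on a Bradley-Terry log-likelihood. Data is toy data.
import numpy as np

def fit_reward(features, comparisons, lr=0.1, steps=2000):
    """features: (n_items, d) array; comparisons: list of (winner, loser)
    index pairs. Returns weights w such that reward(x) = w @ x."""
    _, d = features.shape
    w = np.zeros(d)
    for _ in range(steps):
        grad = np.zeros(d)
        for i, j in comparisons:
            diff = features[i] - features[j]
            p = 1.0 / (1.0 + np.exp(-(w @ diff)))  # P(i preferred over j)
            grad += (1.0 - p) * diff  # gradient of the log-likelihood
        w += lr * grad / len(comparisons)
    return w

# Toy example: item 0 is preferred over 1 and 2, and 2 over 1.
X = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
w = fit_reward(X, [(0, 1), (0, 2), (2, 1)])
rewards = X @ w
print(rewards.argmax())  # item 0 gets the highest inferred reward
```

The same likelihood extends to best-of-many choices and rankings by treating each as a set of implied pairwise wins.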
arXiv Detail & Related papers (2022-10-19T21:34:51Z)
- Two ways to make your robot proactive: reasoning about human intentions, or reasoning about possible futures [69.03494351066846]
We investigate two ways to make robots proactive.
One way is to recognize humans' intentions and to act to fulfill them, like opening the door that you are about to cross.
The other way is to reason about possible future threats or opportunities and to act to prevent or to foster them.
arXiv Detail & Related papers (2022-05-11T13:33:14Z)
- REvolveR: Continuous Evolutionary Models for Robot-to-robot Policy Transfer [57.045140028275036]
We consider the problem of transferring a policy across two different robots with significantly different parameters such as kinematics and morphology.
Existing approaches that train a new policy by matching the action or state transition distribution, including imitation learning methods, fail due to optimal action and/or state distribution being mismatched in different robots.
We propose a novel method named REvolveR that uses continuous evolutionary models for robot-to-robot policy transfer, implemented in a physics simulator.
arXiv Detail & Related papers (2022-02-10T18:50:25Z)
- Doing Right by Not Doing Wrong in Human-Robot Collaboration [8.078753289996417]
We propose a novel approach to learning fair and sociable behavior, not by reproducing positive behavior, but rather by avoiding negative behavior.
In this study, we highlight the importance of incorporating sociability in robot manipulation, as well as the need to consider fairness in human-robot interactions.
arXiv Detail & Related papers (2022-02-05T23:05:10Z)
- Human-centered mechanism design with Democratic AI [9.832311262933285]
We develop a human-in-the-loop research pipeline called Democratic AI.
Reinforcement learning is used to design a social mechanism that humans prefer by majority.
By optimizing for human preferences, Democratic AI may be a promising method for value-aligned policy innovation.
arXiv Detail & Related papers (2022-01-27T10:56:33Z)
- Multi-Principal Assistance Games: Definition and Collegial Mechanisms [16.491889275389457]
We introduce the concept of a multi-principal assistance game (MPAG).
In an MPAG, a single agent assists N human principals who may have widely different preferences.
We analyze in particular a generalization of apprenticeship learning in which the humans first perform some work to obtain utility and demonstrate their preferences.
arXiv Detail & Related papers (2020-12-29T00:06:47Z)
- Human Grasp Classification for Reactive Human-to-Robot Handovers [50.91803283297065]
We propose an approach for human-to-robot handovers in which the robot meets the human halfway.
We collect a human grasp dataset which covers typical ways of holding objects with various hand shapes and poses.
We present a planning and execution approach that takes the object from the human hand according to the detected grasp and hand position.
arXiv Detail & Related papers (2020-03-12T19:58:03Z)
- When Humans Aren't Optimal: Robots that Collaborate with Risk-Aware Humans [16.21572727245082]
In order to collaborate safely and efficiently, robots need to anticipate how their human partners will behave.
In this paper, we adopt a well-known Risk-Aware human model from behavioral economics called Cumulative Prospect Theory.
We find that this increased modeling accuracy results in safer and more efficient human-robot collaboration.
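The Cumulative Prospect Theory model referenced above has standard parametric forms (Tversky and Kahneman's value and probability-weighting functions) that can be written down directly. The sketch below uses the commonly cited parameter estimates; treating it as the paper's exact human model is an assumption.

```python
# Standard CPT value and probability-weighting functions; parameter
# values are the commonly cited Tversky-Kahneman estimates.
def cpt_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Gains are diminished; losses loom larger (loss aversion lam)."""
    if x >= 0:
        return x ** alpha
    return -lam * ((-x) ** beta)

def cpt_weight(p, gamma=0.61):
    """Small probabilities are overweighted, large ones underweighted."""
    return p ** gamma / ((p ** gamma + (1 - p) ** gamma) ** (1 / gamma))

# A risk-aware human compares a sure $50 with a 50% chance at $100:
sure = cpt_value(50.0)
gamble = cpt_weight(0.5) * cpt_value(100.0)
print(sure > gamble)  # True: the CPT human prefers the sure thing
```

A robot that anticipates this risk-averse choice, rather than assuming an expected-value maximizer, can plan around the human's actual behavior.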
arXiv Detail & Related papers (2020-01-13T16:27:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.