Reward-rational (implicit) choice: A unifying formalism for reward
learning
- URL: http://arxiv.org/abs/2002.04833v4
- Date: Fri, 11 Dec 2020 17:56:03 GMT
- Title: Reward-rational (implicit) choice: A unifying formalism for reward
learning
- Authors: Hong Jun Jeon, Smitha Milli, Anca D. Dragan
- Abstract summary: Researchers have aimed to learn reward functions from human behavior or feedback.
The types of behavior interpreted as evidence of the reward function have expanded greatly in recent years.
How will a robot make sense of all these diverse types of behavior?
- Score: 35.57436895497646
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is often difficult to hand-specify what the correct reward function is for
a task, so researchers have instead aimed to learn reward functions from human
behavior or feedback. The types of behavior interpreted as evidence of the
reward function have expanded greatly in recent years. We've gone from
demonstrations, to comparisons, to reading into the information leaked when the
human is pushing the robot away or turning it off. And surely, there is more to
come. How will a robot make sense of all these diverse types of behavior? Our
key insight is that different types of behavior can be interpreted in a single
unifying formalism - as a reward-rational choice that the human is making,
often implicitly. The formalism offers both a unifying lens with which to view
past work, as well as a recipe for interpreting new sources of information that
are yet to be uncovered. We provide two examples to showcase this: interpreting
a new feedback type, and reading into how the choice of feedback itself leaks
information about the reward.
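The formalism can be made concrete as follows: the human picks an option from a choice set, each option grounds to robot behavior (a trajectory or distribution over trajectories), and the pick is modeled as Boltzmann-rational in the unknown reward, so observing the choice supports Bayesian inference over candidate rewards. Below is a minimal sketch of that inference; the candidate rewards, choice set, grounding, and rationality coefficient are illustrative assumptions rather than the paper's experimental setup.

```python
import numpy as np

# Minimal sketch of reward-rational (implicit) choice: the human picks an
# option from a choice set, each option grounds to trajectory features, and
# the pick is assumed Boltzmann-rational in the candidate reward.
# All numbers below are illustrative.

# Candidate linear rewards r(xi) = theta . features(xi), with a uniform prior.
candidate_thetas = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.7, 0.7])]
prior = np.ones(len(candidate_thetas)) / len(candidate_thetas)

# Choice set: each option grounds to trajectory features (e.g., [speed, comfort]).
grounding = {
    "demo_a": np.array([0.9, 0.2]),
    "demo_b": np.array([0.3, 0.8]),
    "do_nothing": np.array([0.0, 0.0]),
}

beta = 5.0  # rationality coefficient: higher means closer to perfectly rational

def choice_likelihood(choice, theta):
    """P(choice | theta) under Boltzmann-rational selection from the choice set."""
    utilities = np.array([beta * (theta @ feats) for feats in grounding.values()])
    probs = np.exp(utilities - utilities.max())
    probs /= probs.sum()
    return probs[list(grounding).index(choice)]

def reward_posterior(choice, prior):
    """Bayesian update over candidate rewards after observing one human choice."""
    likelihoods = np.array([choice_likelihood(choice, th) for th in candidate_thetas])
    post = prior * likelihoods
    return post / post.sum()

print(reward_posterior("demo_a", prior))  # mass shifts toward rewards favoring demo_a
```

Under this lens, different feedback types differ only in what the choice set and grounding are: a demonstration is a choice among feasible trajectories, a comparison is a choice between the presented options, and turning the robot off is a choice against letting it continue.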
Related papers
- Adaptive Language-Guided Abstraction from Contrastive Explanations [53.48583372522492]
It is necessary to determine which features of the environment are relevant before determining how these features should be used to compute reward.
End-to-end methods for joint feature and reward learning often yield brittle reward functions that are sensitive to spurious state features.
This paper describes ALGAE, a method that alternates between using language models to iteratively identify human-meaningful features and learning a reward defined over those features.
arXiv Detail & Related papers (2024-09-12T16:51:58Z)
- Fine-Grained Human Feedback Gives Better Rewards for Language Model Training [108.25635150124539]
Language models (LMs) often exhibit undesirable text generation behaviors, including generating false, toxic, or irrelevant outputs.
We introduce Fine-Grained RLHF, a framework that enables training and learning from reward functions that are fine-grained in two respects.
arXiv Detail & Related papers (2023-06-02T17:11:37Z)
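The summary above leaves the "two respects" unstated; a plausible reading is that rewards are fine-grained (i) in density, scoring segments of the output rather than only the full sequence, and (ii) in type, with separate reward models for different kinds of feedback combined by weights. A sketch under that reading, with placeholder reward models and weights:

```python
from typing import Callable, List, Tuple

# Sketch of fine-grained reward aggregation: each output segment (e.g., a
# sentence) gets scores from several reward models, each tied to a different
# feedback type, and the weighted sum gives a dense per-segment training signal.
# The reward models and weights below are placeholders.

RewardModel = Callable[[str], float]

def toy_relevance(segment: str) -> float:   # placeholder reward model
    return 0.1 * len(segment.split())

def toy_factuality(segment: str) -> float:  # placeholder reward model
    return -1.0 if "unverified" in segment else 0.5

reward_models: List[Tuple[str, RewardModel, float]] = [
    ("relevance", toy_relevance, 0.3),
    ("factuality", toy_factuality, 0.7),
]

def fine_grained_rewards(segments: List[str]) -> List[float]:
    """One combined reward per segment instead of a single sequence-level score."""
    return [
        sum(weight * model(segment) for _, model, weight in reward_models)
        for segment in segments
    ]

print(fine_grained_rewards(["This claim is unverified.", "Paris is in France."]))
```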
- Learning Reward Functions for Robotic Manipulation by Observing Humans [92.30657414416527]
We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies.
The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective.
arXiv Detail & Related papers (2022-11-16T16:26:48Z)
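A common way to turn such an embedding into a reward, consistent with the summary above, is to score an observation by its negative distance to the goal's embedding. In the sketch below the encoder is a stand-in; in practice it would be a network trained with a time-contrastive objective on the human videos.

```python
import numpy as np

# Sketch: reward as negative distance to the goal in a learned embedding space.
# `embed` stands in for a learned encoder (time-contrastive training pulls
# temporally nearby frames together); here it just truncates the observation.

def embed(observation: np.ndarray) -> np.ndarray:
    return observation[:8]  # placeholder: pretend the first 8 dims are the embedding

def reward(observation: np.ndarray, goal_observation: np.ndarray) -> float:
    """Higher reward the closer the observation is to the goal in embedding space."""
    return -float(np.linalg.norm(embed(observation) - embed(goal_observation)))

observation, goal = np.random.rand(64), np.random.rand(64)
print(reward(observation, goal))
```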
- Learning Preferences for Interactive Autonomy [1.90365714903665]
This thesis aims to learn reward functions from human users by using other, more reliable data modalities.
We first propose various forms of comparative feedback, e.g., pairwise comparisons, best-of-many choices, rankings, scaled comparisons; and describe how a robot can use these various forms of human feedback to infer a reward function.
arXiv Detail & Related papers (2022-10-19T21:34:51Z)
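Each comparative feedback form listed in the entry above can be modeled as a softmax (Boltzmann) choice over the presented options, with pairwise comparisons as the two-option special case and best-of-many as the general one. A minimal sketch, assuming linear rewards over hand-chosen trajectory features:

```python
import numpy as np

# Sketch: comparative feedback as a softmax choice. The human is assumed to
# pick their preferred option with probability proportional to exp(beta * reward);
# the linear reward and feature vectors are illustrative assumptions.

def comparison_logprob(preferred_feats, other_feats, theta, beta=1.0):
    """Log-likelihood that the preferred option is chosen over the others.
    One alternative = pairwise comparison; several = best-of-many choice."""
    utilities = beta * np.array(
        [theta @ preferred_feats] + [theta @ f for f in other_feats])
    max_u = utilities.max()
    return float(utilities[0] - (max_u + np.log(np.exp(utilities - max_u).sum())))

theta_hat = np.array([0.8, 0.2])                    # candidate reward weights
preferred = np.array([1.0, 0.0])                    # features of the chosen option
alternatives = [np.array([0.0, 1.0]), np.array([0.5, 0.5])]  # best-of-many query
print(comparison_logprob(preferred, alternatives, theta_hat))
```

Rankings and scaled comparisons can be handled in the same template by changing the likelihood term over the same kind of feature differences.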
- The Effect of Modeling Human Rationality Level on Learning Rewards from Multiple Feedback Types [38.37216644899506]
We argue that grounding the rationality coefficient in real data for each feedback type has a significant positive effect on reward learning.
We find that when learning from a single feedback type, overestimating human rationality can have dire effects on reward accuracy and regret.
arXiv Detail & Related papers (2022-08-23T02:19:10Z)
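One simple way to ground the rationality coefficient in data, in the spirit of the entry above, is to fit it per feedback type by maximum likelihood on observed human choices under a Boltzmann model; the sketch below does this by grid search on synthetic placeholder data.

```python
import numpy as np

# Sketch: estimate a rationality coefficient beta for one feedback type by
# maximum likelihood under a Boltzmann choice model. Option rewards and the
# recorded human choices below are synthetic placeholders.

def choice_loglik(beta, option_rewards_per_query, chosen_indices):
    """Sum of log P(chosen option | beta) over all recorded queries."""
    total = 0.0
    for rewards, chosen in zip(option_rewards_per_query, chosen_indices):
        u = beta * np.asarray(rewards, dtype=float)
        total += u[chosen] - (u.max() + np.log(np.exp(u - u.max()).sum()))
    return total

queries = [[0.2, 0.9], [0.5, 0.4], [0.1, 0.8]]  # option rewards per query
chosen = [1, 0, 1]                               # which option the human picked

betas = np.linspace(0.1, 20.0, 200)
best_beta = max(betas, key=lambda b: choice_loglik(b, queries, chosen))
print(f"estimated rationality coefficient: {best_beta:.2f}")
```

Repeating the fit for each feedback type yields a separate, data-grounded coefficient per type rather than a single assumed value.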
- Choice Set Misspecification in Reward Inference [14.861109950708999]
A promising alternative to manually specifying reward functions is to enable robots to infer them from human feedback.
In this work, we introduce the idea that the choice set itself might be difficult to specify, and analyze choice set misspecification.
We propose a classification of different kinds of choice set misspecification, and show that these different classes lead to meaningful differences in the inferred reward.
arXiv Detail & Related papers (2021-01-19T15:35:30Z)
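The effect described in the entry above can be seen by running the same reward-rational inference twice: once with the choice set the human actually considered and once with a misspecified set that omits an alternative. The candidate rewards and option features below are illustrative.

```python
import numpy as np

# Sketch: the same observed choice implies different rewards depending on which
# choice set the robot assumes the human was choosing from.

candidate_thetas = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]

def reward_posterior(chosen_feats, assumed_choice_set, beta=5.0):
    """Posterior over candidate rewards given a Boltzmann-rational choice."""
    posterior = []
    for theta in candidate_thetas:
        utilities = beta * np.array([theta @ f for f in assumed_choice_set])
        probs = np.exp(utilities - utilities.max())
        probs /= probs.sum()
        idx = next(i for i, f in enumerate(assumed_choice_set)
                   if np.allclose(f, chosen_feats))
        posterior.append(probs[idx])
    posterior = np.array(posterior)
    return posterior / posterior.sum()

chosen = np.array([0.6, 0.5])
true_set = [chosen, np.array([0.9, 0.1]), np.array([0.1, 0.9])]
misspecified_set = [chosen, np.array([0.1, 0.9])]   # one alternative missing

print(reward_posterior(chosen, true_set))          # inference with the real choice set
print(reward_posterior(chosen, misspecified_set))  # same choice, different conclusion
```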
- Semi-supervised reward learning for offline reinforcement learning [71.6909757718301]
Training agents usually requires reward functions, but rewards are seldom available in practice and their engineering is challenging and laborious.
We propose semi-supervised learning algorithms that learn from limited annotations and incorporate unlabelled data.
In our experiments with a simulated robotic arm, we greatly improve upon behavioural cloning and closely approach the performance achieved with ground truth rewards.
arXiv Detail & Related papers (2020-12-12T20:06:15Z)
- Understanding Learned Reward Functions [6.714172005695389]
We investigate techniques for interpreting learned reward functions.
In particular, we apply saliency methods to identify failure modes and predict the robustness of reward functions.
We find that learned reward functions often implement surprising algorithms that rely on contingent aspects of the environment.
arXiv Detail & Related papers (2020-12-10T18:19:48Z)
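A concrete instance of such a saliency method is to measure how strongly the learned reward's output responds to perturbations of each input dimension; dimensions with large gradients are the features the reward actually relies on. The "learned" reward below is a toy stand-in for a trained network.

```python
import numpy as np

# Sketch: gradient-based saliency for a learned reward function, computed by
# finite differences. The toy reward secretly depends only on dims 0 and 3,
# which the saliency map should reveal.

def learned_reward(state: np.ndarray) -> float:
    return float(2.0 * state[0] - 0.5 * state[3] ** 2)

def saliency(reward_fn, state: np.ndarray, eps: float = 1e-4) -> np.ndarray:
    """Absolute finite-difference gradient of the reward w.r.t. each state dim."""
    grads = np.zeros_like(state)
    for i in range(state.size):
        bumped = state.copy()
        bumped[i] += eps
        grads[i] = (reward_fn(bumped) - reward_fn(state)) / eps
    return np.abs(grads)

state = np.random.rand(6)
print(saliency(learned_reward, state))  # large entries flag influential features
```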
- Feature Expansive Reward Learning: Rethinking Human Input [31.413656752926208]
We introduce a new type of human input in which the person guides the robot from states where the feature being taught is highly expressed to states where it is not.
We propose an algorithm for learning the feature from the raw state space and integrating it into the reward function.
arXiv Detail & Related papers (2020-06-23T17:59:34Z)
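One way to operationalize this kind of guidance is to treat the human-provided trace as an ordering along which the taught feature should decrease, fit a feature function to respect that ordering, and then append it to the reward. The linear feature model, margin loss, and toy trace below are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

# Sketch: learn a feature from a guided state trace that moves from "feature
# highly expressed" to "feature not expressed", then fold it into the reward.

rng = np.random.default_rng(0)
trace = np.linspace([1.0, 0.2, 0.9], [0.0, 0.2, 0.1], num=20)  # guided states
trace += 0.01 * rng.standard_normal(trace.shape)

w = np.zeros(3)   # linear feature phi(s) = w . s
lr, margin = 0.1, 0.05
for _ in range(200):
    grad = np.zeros(3)
    for s_now, s_next in zip(trace[:-1], trace[1:]):
        if w @ s_now - w @ s_next < margin:   # ordering violated along the trace
            grad -= s_now - s_next            # increase phi(s_now) - phi(s_next)
    w -= lr * grad / len(trace)

def learned_feature(state):
    return w @ state

def reward(state, theta_existing=np.array([0.5, -0.2, 0.0]), theta_new=1.0):
    """Existing linear reward plus the newly taught feature."""
    return theta_existing @ state + theta_new * learned_feature(state)

print(reward(trace[0]), reward(trace[-1]))  # the new feature now shapes the reward
```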
- Intrinsic Motivation for Encouraging Synergistic Behavior [55.10275467562764]
We study the role of intrinsic motivation as an exploration bias for reinforcement learning in sparse-reward synergistic tasks.
Our key idea is that a good guiding principle for intrinsic motivation in synergistic tasks is to take actions which affect the world in ways that would not be achieved if the agents were acting on their own.
arXiv Detail & Related papers (2020-02-12T19:34:51Z)
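The key idea of the last entry can be sketched as an intrinsic reward equal to the discrepancy between the observed effect of the agents' joint action and the effect predicted by composing each agent acting alone; the single-agent prediction model and the states below are placeholders.

```python
import numpy as np

# Sketch: intrinsic reward for synergy = how much the joint action's observed
# effect differs from the sum of effects each agent could achieve on its own.

def predict_alone(state: np.ndarray, action: np.ndarray) -> np.ndarray:
    # Placeholder for a learned model of a single agent acting by itself.
    return state + 0.1 * action

def composed_prediction(state, action_a, action_b):
    """Predicted next state if individual effects simply added up."""
    delta_a = predict_alone(state, action_a) - state
    delta_b = predict_alone(state, action_b) - state
    return state + delta_a + delta_b

def intrinsic_reward(state, action_a, action_b, observed_next_state):
    """Large when the joint outcome could not be explained by independent action."""
    return float(np.linalg.norm(
        observed_next_state - composed_prediction(state, action_a, action_b)))

state = np.zeros(4)
action_a, action_b = np.ones(4), -np.ones(4)
observed_next = np.array([0.0, 0.0, 0.5, 0.5])  # stand-in environment outcome
print(intrinsic_reward(state, action_a, action_b, observed_next))
```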
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.