The Effect of Modeling Human Rationality Level on Learning Rewards from
Multiple Feedback Types
- URL: http://arxiv.org/abs/2208.10687v1
- Date: Tue, 23 Aug 2022 02:19:10 GMT
- Title: The Effect of Modeling Human Rationality Level on Learning Rewards from
Multiple Feedback Types
- Authors: Gaurav R. Ghosal, Matthew Zurek, Daniel S. Brown, Anca D. Dragan
- Abstract summary: We argue that grounding the rationality coefficient in real data for each feedback type has a significant positive effect on reward learning.
We find that when learning from a single feedback type, overestimating human rationality can have dire effects on reward accuracy and regret.
- Score: 38.37216644899506
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When inferring reward functions from human behavior (be it demonstrations,
comparisons, physical corrections, or e-stops), it has proven useful to model
the human as making noisy-rational choices, with a "rationality coefficient"
capturing how much noise or entropy we expect to see in the human behavior.
Many existing works have opted to fix this coefficient regardless of the type,
or quality, of human feedback. However, in some settings, giving a
demonstration may be much more difficult than answering a comparison query. In
this case, we should expect to see more noise or suboptimality in
demonstrations than in comparisons, and should interpret the feedback
accordingly. In this work, we advocate that grounding the rationality
coefficient in real data for each feedback type, rather than assuming a default
value, has a significant positive effect on reward learning. We test this in
experiments with both simulated feedback and a user study. We find that
when learning from a single feedback type, overestimating human rationality can
have dire effects on reward accuracy and regret. Further, we find that the
rationality level affects the informativeness of each feedback type:
surprisingly, demonstrations are not always the most informative -- when the
human acts very suboptimally, comparisons actually become more informative,
even when the rationality level is the same for both. Moreover, when the robot
gets to decide which feedback type to ask for, it gets a large advantage from
accurately modeling the rationality level of each type. Ultimately, our results
emphasize the importance of paying attention to the assumed rationality level,
not only when learning from a single feedback type, but especially when agents
actively learn from multiple feedback types.
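As a concrete illustration of the noisy-rational model described in the abstract, below is a minimal sketch (not the authors' code) of a Boltzmann-rational choice likelihood and a per-feedback-type fit of the rationality coefficient beta from observed choices. The function names, toy reward values, and grid search over beta are illustrative assumptions.

```python
import numpy as np

def boltzmann_likelihood(rewards, chosen_idx, beta):
    # P(human picks option `chosen_idx`) under a noisy-rational model with
    # rationality coefficient beta: probability proportional to exp(beta * reward).
    logits = beta * np.asarray(rewards, dtype=float)
    logits -= logits.max()                    # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return probs[chosen_idx]

def fit_beta(feedback, betas=np.linspace(0.01, 10.0, 200)):
    # "Ground the rationality coefficient in real data": pick the beta that
    # maximizes the log-likelihood of observed (rewards, chosen_idx) pairs
    # for one feedback type.
    def loglik(beta):
        return sum(np.log(boltzmann_likelihood(r, c, beta)) for r, c in feedback)
    return max(betas, key=loglik)

# Toy data: comparisons (2 options, answered reliably) vs. demonstrations
# (10 candidate trajectories, chosen noisily), so the fitted betas differ.
rng = np.random.default_rng(0)

def simulate(n, n_options, true_beta):
    data = []
    for _ in range(n):
        r = rng.uniform(0.0, 1.0, n_options)
        p = np.exp(true_beta * r) / np.exp(true_beta * r).sum()
        data.append((r, rng.choice(n_options, p=p)))
    return data

comparisons = simulate(200, n_options=2, true_beta=5.0)
demonstrations = simulate(200, n_options=10, true_beta=1.0)
print("fitted beta, comparisons:   ", fit_beta(comparisons))
print("fitted beta, demonstrations:", fit_beta(demonstrations))
```

Under this model, assuming a beta that is too high treats noisy demonstrations as near-optimal evidence about the reward, which is the overestimation failure mode the abstract warns about.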
Related papers
- Beyond Thumbs Up/Down: Untangling Challenges of Fine-Grained Feedback for Text-to-Image Generation [67.88747330066049]
Fine-grained feedback captures nuanced distinctions in image quality and prompt-alignment.
We show that its superiority over coarse-grained feedback is not automatic.
We identify key challenges in eliciting and utilizing fine-grained feedback.
arXiv Detail & Related papers (2024-06-24T17:19:34Z)
- What if you said that differently?: How Explanation Formats Affect Human Feedback Efficacy and User Perception [53.4840989321394]
We analyze the rationales that QA models generate to support their answers.
We present users with incorrect answers and corresponding rationales in various formats.
We measure how effective users' feedback is at patching these rationales through in-context learning.
arXiv Detail & Related papers (2023-11-16T04:26:32Z)
- Towards Understanding Sycophancy in Language Models [49.99654432561934]
We investigate the prevalence of sycophancy in models whose finetuning procedure made use of human feedback.
We show that five state-of-the-art AI assistants consistently exhibit sycophancy across four varied free-form text-generation tasks.
Our results indicate that sycophancy is a general behavior of state-of-the-art AI assistants, likely driven in part by human preference judgments favoring sycophantic responses.
arXiv Detail & Related papers (2023-10-20T14:46:48Z)
- Human Feedback is not Gold Standard [28.63384327791185]
We critically analyse the use of human feedback for both training and evaluation.
We find that while preference scores have fairly good coverage, they under-represent important aspects like factuality.
arXiv Detail & Related papers (2023-09-28T11:18:20Z)
- Fine-Grained Human Feedback Gives Better Rewards for Language Model Training [108.25635150124539]
Language models (LMs) often exhibit undesirable text generation behaviors, including generating false, toxic, or irrelevant outputs.
We introduce Fine-Grained RLHF, a framework that enables training and learning from reward functions that are fine-grained in two respects.
arXiv Detail & Related papers (2023-06-02T17:11:37Z)
- Human irrationality: both bad and good for reward inference [3.706222947143855]
This work aims to better understand the effect irrationalities can have on reward inference.
We operationalize irrationality in the language of MDPs, by altering the Bellman optimality equation.
We show that an irrational human, when correctly modelled, can communicate more information about the reward than a perfectly rational human can.
arXiv Detail & Related papers (2021-11-12T21:44:15Z)
- Utilizing Self-supervised Representations for MOS Prediction [51.09985767946843]
Existing evaluations usually require clean references or parallel ground truth data.
Subjective tests, on the other hand, do not need any additional clean or parallel data and correlate better with human perception.
We develop an automatic evaluation approach that correlates well with human perception while not requiring ground truth data.
arXiv Detail & Related papers (2021-04-07T09:44:36Z)
- Reward-rational (implicit) choice: A unifying formalism for reward learning [35.57436895497646]
Researchers have aimed to learn reward functions from human behavior or feedback.
The types of behavior interpreted as evidence of the reward function have expanded greatly in recent years.
How will a robot make sense of all these diverse types of behavior?
arXiv Detail & Related papers (2020-02-12T08:07:49Z)