Human irrationality: both bad and good for reward inference
- URL: http://arxiv.org/abs/2111.06956v1
- Date: Fri, 12 Nov 2021 21:44:15 GMT
- Title: Human irrationality: both bad and good for reward inference
- Authors: Lawrence Chan, Andrew Critch, Anca Dragan
- Abstract summary: This work aims to better understand the effect irrationalities can have on reward inference.
We operationalize irrationality in the language of MDPs, by altering the Bellman optimality equation.
We show that an irrational human, when correctly modelled, can communicate more information about the reward than a perfectly rational human can.
- Score: 3.706222947143855
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Assuming humans are (approximately) rational enables robots to infer reward
functions by observing human behavior. But people exhibit a wide array of
irrationalities, and our goal with this work is to better understand the effect
they can have on reward inference. The challenge with studying this effect is
that there are many types of irrationality, with varying degrees of
mathematical formalization. We thus operationalize irrationality in the
language of MDPs, by altering the Bellman optimality equation, and use this
framework to study how these alterations would affect inference.
We find that wrongly modeling a systematically irrational human as
noisy-rational performs a lot worse than correctly capturing these biases -- so
much so that it can be better to skip inference altogether and stick to the
prior! More importantly, we show that an irrational human, when correctly
modelled, can communicate more information about the reward than a perfectly
rational human can. That is, if a robot has the correct model of a human's
irrationality, it can make an even stronger inference than it ever could if the
human were rational. Irrationality fundamentally helps rather than hinders
reward inference, but it needs to be correctly accounted for.
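
The abstract's idea of altering the Bellman optimality equation and then running reward inference can be illustrated with a toy sketch. Everything below is an assumption for illustration and not the paper's code or experimental setup: the three-state chain MDP, the function names (`soft_q_values`, `boltzmann_policy`, `reward_posterior`), the two candidate rewards, and the choice of a soft-max backup with a tunable discount `gamma` as the "altered" Bellman update.

```python
import numpy as np

def soft_q_values(reward, transitions, gamma=0.9, beta=5.0, iters=200):
    """Value iteration with a soft-max backup replacing the usual max.

    reward:      shape (S,), reward for entering each state (assumed convention)
    transitions: shape (S, A, S), transition probabilities
    gamma:       discount the human plans with; lowering it models myopia
    beta:        rationality coefficient; large beta approaches the rational max
    """
    n_states = transitions.shape[0]
    v = np.zeros(n_states)
    for _ in range(iters):
        q = transitions @ (reward + gamma * v)  # shape (S, A)
        m = q.max(axis=1)
        # Numerically stable (1/beta) * log sum_a exp(beta * q[s, a])
        v = m + np.log(np.exp(beta * (q - m[:, None])).sum(axis=1)) / beta
    return q

def boltzmann_policy(q, beta):
    """Action probabilities proportional to exp(beta * Q)."""
    z = np.exp(beta * (q - q.max(axis=1, keepdims=True)))
    return z / z.sum(axis=1, keepdims=True)

def reward_posterior(demos, candidates, transitions, prior, gamma=0.9, beta=5.0):
    """Bayesian posterior over candidate rewards given (state, action) pairs,
    under the assumed human model (gamma, beta)."""
    log_post = np.log(np.asarray(prior, dtype=float))
    for i, r in enumerate(candidates):
        pi = boltzmann_policy(soft_q_values(r, transitions, gamma, beta), beta)
        log_post[i] += sum(np.log(pi[s, a]) for s, a in demos)
    post = np.exp(log_post - log_post.max())
    return post / post.sum()

# Hypothetical 3-state chain: action 0 moves left, action 1 moves right.
T = np.zeros((3, 2, 3))
for s in range(3):
    T[s, 0, max(s - 1, 0)] = 1.0
    T[s, 1, min(s + 1, 2)] = 1.0

candidates = [np.array([1.0, 0.0, 0.0]),  # reward at the left end
              np.array([0.0, 0.0, 1.0])]  # reward at the right end
demos = [(1, 1)]                          # human moves right from the middle
post = reward_posterior(demos, candidates, T, prior=[0.5, 0.5])
print(post)
```

Changing `gamma` or `beta` in `reward_posterior` changes the human model the robot assumes. Passing values that differ from the ones that generated the demonstrations is one simple way to experiment with the misspecification effect the abstract describes.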
Related papers
- Can Language Models Learn to Skip Steps? [59.84848399905409]
We study the ability to skip steps in reasoning.
Unlike humans, who may skip steps to enhance efficiency or to reduce cognitive load, models do not possess such motivations.
Our work presents the first exploration into human-like step-skipping ability.
arXiv Detail & Related papers (2024-11-04T07:10:24Z)
- Infinite Ends from Finite Samples: Open-Ended Goal Inference as Top-Down Bayesian Filtering of Bottom-Up Proposals [48.437581268398866]
We introduce a sequential Monte Carlo model of open-ended goal inference.
We validate this model in a goal inference task called Block Words.
Our experiments highlight the importance of uniting top-down and bottom-up models for explaining the speed, accuracy, and generality of human theory-of-mind.
arXiv Detail & Related papers (2024-07-23T18:04:40Z)
- How Ambiguous are the Rationales for Natural Language Reasoning? A Simple Approach to Handling Rationale Uncertainty [0.0]
Rationales behind answers not only explain model decisions but boost language models to reason well on complex reasoning tasks.
It is non-trivial to estimate the degree to which the rationales are faithful enough to encourage model performance.
We propose how to deal with imperfect rationales causing aleatoric uncertainty.
arXiv Detail & Related papers (2024-02-22T07:12:34Z)
- Are Machine Rationales (Not) Useful to Humans? Measuring and Improving Human Utility of Free-Text Rationales [62.02328001381361]
We show that human utility of existing rationales is far from satisfactory, and expensive to estimate with human studies.
We translate this finding into an automated score, GEN-U, that can help improve LMs' ability to generate rationales with better human utility.
arXiv Detail & Related papers (2023-05-11T19:01:13Z)
- On the Sensitivity of Reward Inference to Misspecified Human Models [27.94055657571769]
Inferring reward functions from human behavior is at the center of value alignment - aligning AI objectives with what we, humans, actually want.
This begs the question: how accurate do these models need to be in order for the reward inference to be accurate?
We show that it is unfortunately possible to construct small adversarial biases in behavior that lead to arbitrarily large errors in the inferred reward.
arXiv Detail & Related papers (2022-12-09T08:16:20Z)
- Misspecification in Inverse Reinforcement Learning [80.91536434292328]
The aim of Inverse Reinforcement Learning (IRL) is to infer a reward function $R$ from a policy $\pi$.
One of the primary motivations behind IRL is to infer human preferences from human behaviour.
This means that they are misspecified, which raises the worry that they might lead to unsound inferences if applied to real-world data.
arXiv Detail & Related papers (2022-12-06T18:21:47Z)
- The Effect of Modeling Human Rationality Level on Learning Rewards from Multiple Feedback Types [38.37216644899506]
We argue that grounding the rationality coefficient in real data for each feedback type has a significant positive effect on reward learning.
We find that when learning from a single feedback type, overestimating human rationality can have dire effects on reward accuracy and regret.
arXiv Detail & Related papers (2022-08-23T02:19:10Z)
- Inductive Biases for Deep Learning of Higher-Level Cognition [108.89281493851358]
A fascinating hypothesis is that human and animal intelligence could be explained by a few principles.
This work considers a larger list, focusing on those which concern mostly higher-level and sequential conscious processing.
The objective of clarifying these particular principles is that they could potentially help us build AI systems benefiting from humans' abilities.
arXiv Detail & Related papers (2020-11-30T18:29:25Z)
- Machine Common Sense [77.34726150561087]
Machine common sense remains a broad, potentially unbounded problem in artificial intelligence (AI).
This article deals with aspects of modeling commonsense reasoning, focusing on domains such as interpersonal interactions.
arXiv Detail & Related papers (2020-06-15T13:59:47Z)
- Implications of Human Irrationality for Reinforcement Learning [26.76732313120685]
We argue that human decision making may be a better source of ideas for constraining how machine learning problems are defined than would otherwise be the case.
One promising idea concerns human decision making that is dependent on apparently irrelevant aspects of the choice context.
We propose a novel POMDP model for contextual choice tasks and show that, despite the apparent irrationalities, a reinforcement learner can take advantage of the way that humans make decisions.
arXiv Detail & Related papers (2020-06-07T07:44:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.