LESS is More: Rethinking Probabilistic Models of Human Behavior
- URL: http://arxiv.org/abs/2001.04465v1
- Date: Mon, 13 Jan 2020 18:59:01 GMT
- Title: LESS is More: Rethinking Probabilistic Models of Human Behavior
- Authors: Andreea Bobu, Dexter R.R. Scobee, Jaime F. Fisac, S. Shankar Sastry,
Anca D. Dragan
- Abstract summary: The Boltzmann noisily-rational decision model assumes people approximately optimize a reward function.
Human trajectories lie in a continuous space, with continuous-valued features that influence the reward function.
We introduce a model that explicitly accounts for distances between trajectories, rather than only their rewards.
- Score: 36.020541093946925
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robots need models of human behavior for both inferring human goals and
preferences, and predicting what people will do. A common model is the
Boltzmann noisily-rational decision model, which assumes people approximately
optimize a reward function and choose trajectories in proportion to their
exponentiated reward. While this model has been successful in a variety of
robotics domains, its roots lie in econometrics, and in modeling decisions
among different discrete options, each with its own utility or reward. In
contrast, human trajectories lie in a continuous space, with continuous-valued
features that influence the reward function. We propose that it is time to
rethink the Boltzmann model, and design it from the ground up to operate over
such trajectory spaces. We introduce a model that explicitly accounts for
distances between trajectories, rather than only their rewards. Rather than
each trajectory affecting the decision independently, similar trajectories now
affect the decision together. We start by showing that our model better
explains human behavior in a user study. We then analyze the implications this
has for robot inference, first in toy environments where we have ground truth
and find more accurate inference, and finally for a 7DOF robot arm learning
from user demonstrations.
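To make the contrast concrete, here is a minimal numerical sketch. `boltzmann_probs` implements the standard model over a discretized set of candidate trajectories, while `similarity_aware_probs` is one plausible similarity-aware variant in the spirit of LESS; the RBF kernel, the density discount, and all function names here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def boltzmann_probs(rewards, beta=1.0):
    """Standard Boltzmann noisily-rational model: P(xi) is proportional
    to exp(beta * R(xi)) over a discretized set of candidate trajectories."""
    logits = beta * np.asarray(rewards, dtype=float)
    logits -= logits.max()                      # numerical stability
    expd = np.exp(logits)
    return expd / expd.sum()

def similarity_aware_probs(features, rewards, beta=1.0, gamma=1.0):
    """Similarity-aware variant in the spirit of LESS: each trajectory's
    exponentiated reward is discounted by how crowded its neighborhood is,
    so near-duplicate trajectories share probability mass instead of each
    counting independently. The RBF kernel and this exact discount are
    illustrative assumptions, not the paper's formulation."""
    phi = np.asarray(features, dtype=float)     # (n, d) trajectory features
    dists = np.linalg.norm(phi[:, None, :] - phi[None, :, :], axis=-1)
    sim = np.exp(-gamma * dists ** 2)           # similarity kernel in feature space
    density = sim.sum(axis=1)                   # how many near-duplicates each has
    weights = np.exp(beta * np.asarray(rewards, dtype=float)) / density
    return weights / weights.sum()

# Three trajectories: two near-identical high-reward ones, one distinct.
feats = [[0.0, 0.0], [0.05, 0.0], [1.0, 1.0]]
rewards = [1.0, 1.0, 0.8]
print(boltzmann_probs(rewards))                 # ~[0.36, 0.36, 0.29]
print(similarity_aware_probs(feats, rewards))   # ~[0.30, 0.30, 0.41]
```

Under the standard model the two near-duplicates jointly dominate (roughly 0.71 of the probability mass); under the similarity-aware variant they split one trajectory's worth of mass between them and the distinct trajectory becomes the modal choice.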
Related papers
- Humanoid Locomotion as Next Token Prediction [84.21335675130021]
Our model is a causal transformer trained via autoregressive prediction of sensorimotor trajectories.
We show that our model enables a full-sized humanoid to walk in San Francisco zero-shot.
Our model can transfer to the real world even when trained on only 27 hours of walking data, and can generalize to commands not seen during training, such as walking backward; a sketch of the autoregressive setup follows.
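The phrase "autoregressive prediction of sensorimotor trajectories" can be made concrete with a short sketch: discretize each trajectory into tokens and train a causal transformer with a next-token cross-entropy loss. This is a generic GPT-style sketch under assumed vocabulary and model sizes, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class TrajectoryGPT(nn.Module):
    """Generic GPT-style causal transformer over tokenized sensorimotor
    trajectories; all sizes and names here are illustrative."""
    def __init__(self, vocab=1024, d=256, heads=4, layers=4, ctx=512):
        super().__init__()
        self.tok = nn.Embedding(vocab, d)       # one embedding per trajectory token
        self.pos = nn.Embedding(ctx, d)         # learned positional embedding
        block = nn.TransformerEncoderLayer(d, heads, 4 * d, batch_first=True)
        self.blocks = nn.TransformerEncoder(block, layers)
        self.head = nn.Linear(d, vocab)         # next-token logits

    def forward(self, ids):                     # ids: (batch, time) token ids
        t = ids.shape[1]
        causal = nn.Transformer.generate_square_subsequent_mask(t)
        x = self.tok(ids) + self.pos(torch.arange(t, device=ids.device))
        return self.head(self.blocks(x, mask=causal))

model = TrajectoryGPT()
ids = torch.randint(0, 1024, (2, 64))           # stand-in tokenized trajectories
logits = model(ids)                             # (2, 64, 1024)
loss = nn.functional.cross_entropy(             # predict token t+1 from its prefix
    logits[:, :-1].reshape(-1, 1024), ids[:, 1:].reshape(-1))
```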
arXiv Detail & Related papers (2024-02-29T18:57:37Z)
- Learning Latent Representations to Co-Adapt to Humans [12.71953776723672]
Non-stationary humans are challenging for robot learners.
In this paper we introduce an algorithmic formalism that enables robots to co-adapt alongside dynamic humans.
arXiv Detail & Related papers (2022-12-19T16:19:24Z)
- On the Sensitivity of Reward Inference to Misspecified Human Models [27.94055657571769]
Inferring reward functions from human behavior is at the center of value alignment: aligning AI objectives with what we, humans, actually want.
This raises the question: how accurate do these models of human behavior need to be in order for the reward inference to be accurate?
We show that it is unfortunately possible to construct small adversarial biases in behavior that lead to arbitrarily large errors in the inferred reward.
arXiv Detail & Related papers (2022-12-09T08:16:20Z)
- Misspecification in Inverse Reinforcement Learning [80.91536434292328]
The aim of Inverse Reinforcement Learning (IRL) is to infer a reward function $R$ from a policy $\pi$.
One of the primary motivations behind IRL is to infer human preferences from human behaviour. However, the behavioural models that IRL relies on are at best approximations of how preferences actually produce behaviour.
This means that they are misspecified, which raises the worry that they might lead to unsound inferences if applied to real-world data; a sketch of one standard behavioural model follows.
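For intuition, the behavioural model most commonly assumed in IRL is Boltzmann rationality; if real humans deviate from it, the inferred reward can be systematically wrong. This is a standard textbook formulation given for illustration, not necessarily the exact model class analysed in the paper.

```latex
% Boltzmann-rational behavioural model linking reward R to policy \pi:
% the demonstrator picks action a in state s in proportion to the
% exponentiated optimal state-action value under R.
\pi(a \mid s) \;=\;
  \frac{\exp\!\big(\beta\, Q^{*}_{R}(s,a)\big)}
       {\sum_{a'} \exp\!\big(\beta\, Q^{*}_{R}(s,a')\big)}
% \beta \ge 0 is a rationality (inverse temperature) coefficient:
% \beta \to \infty recovers a perfectly rational demonstrator,
% \beta = 0 a uniformly random one.
```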
arXiv Detail & Related papers (2022-12-06T18:21:47Z)
- Learning Preferences for Interactive Autonomy [1.90365714903665]
This thesis is an attempt to learn reward functions from human users by using other, more reliable data modalities.
We first propose various forms of comparative feedback, e.g., pairwise comparisons, best-of-many choices, rankings, and scaled comparisons, and describe how a robot can use each of these forms of human feedback to infer a reward function; the pairwise case is sketched below.
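As a concrete instance of the pairwise case, here is a minimal sketch of fitting a linear reward from pairwise comparisons under the Bradley-Terry / Boltzmann choice model, which is standard in this literature; the feature map, toy data, and function names are illustrative assumptions rather than the thesis's method.

```python
import numpy as np

def fit_reward_from_pairs(phi_a, phi_b, prefers_a, lr=0.1, iters=500):
    """Learn weights w for a linear reward R(xi) = w . phi(xi) from
    pairwise comparisons, assuming the Bradley-Terry / Boltzmann choice
    model  P(a preferred to b) = sigmoid(R(a) - R(b))."""
    phi_a, phi_b = np.asarray(phi_a, float), np.asarray(phi_b, float)
    y = np.asarray(prefers_a, dtype=float)        # 1 if a was chosen, else 0
    w = np.zeros(phi_a.shape[1])
    for _ in range(iters):                        # gradient ascent on log-likelihood
        p = 1.0 / (1.0 + np.exp(-(phi_a - phi_b) @ w))
        w += lr * (phi_a - phi_b).T @ (y - p) / len(y)
    return w

# Toy data: feature 0 drives the user's choices, feature 1 is a distractor.
rng = np.random.default_rng(0)
A, B = rng.normal(size=(100, 2)), rng.normal(size=(100, 2))
prefs = (A[:, 0] > B[:, 0]).astype(float)         # user prefers higher feature 0
print(fit_reward_from_pairs(A, B, prefs))         # weight on feature 0 dominates
```

Each comparison contributes a logistic-regression-style update on the feature difference, so off-the-shelf logistic regression on `phi_a - phi_b` (without an intercept) would recover the same weights.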
arXiv Detail & Related papers (2022-10-19T21:34:51Z)
- Humans are not Boltzmann Distributions: Challenges and Opportunities for Modelling Human Feedback and Interaction in Reinforcement Learning [13.64577704565643]
We argue that such Boltzmann-style models of human feedback are too simplistic and that RL researchers need to develop more realistic human models to design and evaluate their algorithms.
This paper calls for research from different disciplines to address key questions about how humans provide feedback to AIs and how we can build more robust human-in-the-loop RL systems.
arXiv Detail & Related papers (2022-06-27T13:58:51Z)
- Probabilistic Human Motion Prediction via A Bayesian Neural Network [71.16277790708529]
We propose a probabilistic model for human motion prediction in this paper.
Our model can generate several future motions when given an observed motion sequence.
We extensively validate our approach on the large-scale benchmark dataset Human3.6M.
arXiv Detail & Related papers (2021-07-14T09:05:33Z)
- Dynamically Switching Human Prediction Models for Efficient Planning [32.180808286226075]
We give the robot access to a suite of human models and enable it to assess the performance-computation trade-off online.
Our experiments in a driving simulator showcase how the robot can achieve performance comparable to always using the best human model.
arXiv Detail & Related papers (2021-03-13T23:48:09Z)
- Model-Based Visual Planning with Self-Supervised Functional Distances [104.83979811803466]
We present a self-supervised method for model-based visual goal reaching.
Our approach learns entirely using offline, unlabeled data.
We find that this approach substantially outperforms both model-free and model-based prior methods.
arXiv Detail & Related papers (2020-12-30T23:59:09Z)
- Learning Predictive Models From Observation and Interaction [137.77887825854768]
Learning predictive models from interaction with the world allows an agent, such as a robot, to learn about how the world works.
However, learning a model that captures the dynamics of complex skills represents a major challenge.
We propose a method to augment the training set with observational data of other agents, such as humans.
arXiv Detail & Related papers (2019-12-30T01:10:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.