Observation Interference in Partially Observable Assistance Games
- URL: http://arxiv.org/abs/2412.17797v1
- Date: Mon, 23 Dec 2024 18:53:33 GMT
- Title: Observation Interference in Partially Observable Assistance Games
- Authors: Scott Emmons, Caspar Oesterheld, Vincent Conitzer, Stuart Russell
- Abstract summary: We study a model of the human-AI value alignment problem which allows the human and the AI assistant to have partial observations.
We show that sometimes an optimal assistant must take observation-interfering actions, even when the human is playing optimally.
We show that if the human acts according to the Boltzmann model of irrationality, this can create an incentive for the assistant to interfere with observations.
- Score: 34.53170543153206
- Abstract: We study partially observable assistance games (POAGs), a model of the human-AI value alignment problem which allows the human and the AI assistant to have partial observations. Motivated by concerns of AI deception, we study a qualitatively new phenomenon made possible by partial observability: would an AI assistant ever have an incentive to interfere with the human's observations? First, we prove that sometimes an optimal assistant must take observation-interfering actions, even when the human is playing optimally, and even when there are otherwise-equivalent actions available that do not interfere with observations. Though this result seems to contradict the classic theorem from single-agent decision making that the value of perfect information is nonnegative, we resolve this seeming contradiction by developing a notion of interference defined on entire policies. This can be viewed as an extension of the classic result that the value of perfect information is nonnegative into the cooperative multiagent setting. Second, we prove that if the human is simply making decisions based on their immediate outcomes, the assistant might need to interfere with observations as a way to query the human's preferences. We show that this incentive for interference goes away if the human is playing optimally, or if we introduce a communication channel for the human to communicate their preferences to the assistant. Third, we show that if the human acts according to the Boltzmann model of irrationality, this can create an incentive for the assistant to interfere with observations. Finally, we use an experimental model to analyze tradeoffs faced by the AI assistant in practice when considering whether or not to take observation-interfering actions.
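The Boltzmann model referenced in the abstract takes the human to choose actions with probability exponentially weighted by their value, with a rationality parameter (here called `beta`) interpolating between uniform randomness and optimal play. A minimal sketch of that model, assuming discrete actions with known Q-values (illustrative code, not the paper's):

```python
# Minimal sketch of a Boltzmann-rational human model (not the paper's code).
# The human picks action a with probability proportional to exp(beta * Q(a));
# beta -> infinity recovers an optimal human, beta -> 0 a uniformly random one.
import numpy as np

def boltzmann_policy(q_values: np.ndarray, beta: float) -> np.ndarray:
    """Return P(a) proportional to exp(beta * Q(a)), computed stably."""
    logits = beta * q_values
    logits -= logits.max()          # shift for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Example: two actions whose values differ only slightly. Even with moderate
# rationality, substantial probability remains on the worse action -- the
# kind of noise that can create incentives for an assistant to intervene.
print(boltzmann_policy(np.array([1.0, 0.8]), beta=2.0))
```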
Related papers
- Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption [64.07607726562841]
Existing multi-person human reconstruction approaches mainly focus on recovering accurate poses or avoiding penetration.
In this work, we tackle the task of reconstructing closely interactive humans from a monocular video.
We propose to leverage knowledge from proxemic behavior and physics to compensate for the lack of visual information.
arXiv Detail & Related papers (2024-04-17T11:55:45Z)
- When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback [16.540715313676994]
We show that when human feedback is based only on partial observations, it can result in deceptive inflation and overjustification.
We show that sometimes, the human's feedback determines the return function uniquely up to an additive constant, but in other realistic cases, there is irreducible ambiguity.
arXiv Detail & Related papers (2024-02-27T18:32:11Z)
- The Duet of Representations and How Explanations Exacerbate It [0.0]
An algorithm induces a causal representation of the relations between features and labels in the human's perception.
Explanations can direct the human's attention to the conflicting feature and away from other relevant features.
This leads to causal overattribution and may adversely affect the human's information processing.
arXiv Detail & Related papers (2024-02-13T11:18:27Z)
- Towards Understanding Sycophancy in Language Models [49.99654432561934]
We investigate the prevalence of sycophancy in models whose finetuning procedure made use of human feedback.
We show that five state-of-the-art AI assistants consistently exhibit sycophancy across four varied free-form text-generation tasks.
Our results indicate that sycophancy is a general behavior of state-of-the-art AI assistants, likely driven in part by human preference judgments favoring sycophantic responses.
arXiv Detail & Related papers (2023-10-20T14:46:48Z)
- HODN: Disentangling Human-Object Feature for HOI Detection [51.48164941412871]
We propose a Human and Object Disentangling Network (HODN) to model the Human-Object Interaction (HOI) relationships explicitly.
Considering that human features are more contributive to interaction, we propose a Human-Guided Linking method to make sure the interaction decoder focuses on the human-centric regions.
Our proposed method achieves competitive performance on both the V-COCO and HICO-Det datasets.
arXiv Detail & Related papers (2023-08-20T04:12:50Z)
- When to Ask for Help: Proactive Interventions in Autonomous Reinforcement Learning [57.53138994155612]
A long-term goal of reinforcement learning is to design agents that can autonomously interact and learn in the world.
A critical challenge is the presence of irreversible states which require external assistance to recover from, such as when a robot arm has pushed an object off of a table.
We propose an algorithm that efficiently learns to detect and avoid irreversible states, and that proactively asks for help when the agent does enter one.
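A rough sketch of the ask-for-help loop this entry describes, under stated assumptions: `reversibility_estimate` is a hypothetical learned scorer, and `env.request_human_reset()` is a hypothetical intervention hook; neither is the paper's API.

```python
# Illustrative sketch only (names are hypothetical, not the paper's API):
# an agent that queries a learned reversibility score after each step and
# requests external help when it believes it has entered an irreversible state.
def step_with_interventions(env, policy, reversibility_estimate, threshold=0.1):
    obs = env.reset()
    done = False
    while not done:
        action = policy(obs)
        obs, reward, done, info = env.step(action)
        # A low estimated reversibility score means the agent likely cannot
        # recover on its own, so it proactively asks for help.
        if reversibility_estimate(obs) < threshold:
            env.request_human_reset()   # hypothetical intervention hook
            obs = env.reset()
```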
arXiv Detail & Related papers (2022-10-19T17:57:24Z)
- Best-Response Bayesian Reinforcement Learning with Bayes-adaptive POMDPs for Centaurs [22.52332536886295]
We present a novel formulation of the interaction between the human and the AI as a sequential game.
We show that in this case the AI's problem of helping bounded-rational humans make better decisions reduces to a Bayes-adaptive POMDP.
We also discuss ways in which the machine can learn, with the help of the human, to improve upon its own limitations.
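The Bayes-adaptive reduction means the assistant maintains a posterior over candidate models of the human and environment, updating it by Bayes' rule as interaction unfolds. A minimal sketch of that update for a discrete set of candidate models (illustrative, not the paper's implementation):

```python
# Bayes' rule over a discrete set of candidate models (illustrative sketch).
import numpy as np

def bayes_update(belief, likelihoods):
    """belief[i]: P(model_i); likelihoods[i]: P(observation | model_i)."""
    posterior = belief * likelihoods
    total = posterior.sum()
    if total == 0:
        raise ValueError("observation impossible under all candidate models")
    return posterior / total

# Example: two candidate human models; the observed choice is twice as
# likely under the second model, so the posterior shifts toward it.
print(bayes_update(np.array([0.5, 0.5]), np.array([0.2, 0.4])))
# -> [0.333..., 0.666...]
```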
arXiv Detail & Related papers (2022-04-03T21:00:51Z)
- Empirical Estimates on Hand Manipulation are Recoverable: A Step Towards Individualized and Explainable Robotic Support in Everyday Activities [80.37857025201036]
A key challenge for robotic systems is to figure out the behavior of another agent.
Drawing correct inferences is especially challenging when (confounding) factors are not controlled experimentally.
We propose equipping robots with the necessary tools to conduct observational studies on people.
arXiv Detail & Related papers (2022-01-27T22:15:56Z)
- Individual vs. Joint Perception: a Pragmatic Model of Pointing as Communicative Smithian Helping [16.671443846399836]
The simple gesture of pointing can greatly augment one's ability to comprehend states of the world based on observations.
We model an agent's update to its belief about the world, based on individual observations, using a partially observable Markov decision process (POMDP).
On top of that, we model pointing as a communicative act between agents who have a mutual understanding that the pointed observation must be relevant and interpretable.
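The pragmatic step is the key addition: the observer conditions not only on the pointed-at observation itself but also on the fact that a helpful partner chose to point there. A small sketch under the assumption of discrete world states (hypothetical code, not the paper's model):

```python
# Pragmatic belief update (illustrative sketch): condition both on the
# observation and on the partner's choice to point at it, i.e.
#   P(s | o, point) is proportional to P(s) * P(o | s) * P(point | s).
import numpy as np

def pragmatic_update(prior, obs_likelihood, pointing_likelihood):
    posterior = prior * obs_likelihood * pointing_likelihood
    return posterior / posterior.sum()

# Two world states: the raw observation is ambiguous (equal likelihood),
# but a helpful partner would point here far more often in state 1, so
# the communicative act itself carries information.
prior = np.array([0.5, 0.5])
obs_likelihood = np.array([0.5, 0.5])
pointing_likelihood = np.array([0.2, 0.8])
print(pragmatic_update(prior, obs_likelihood, pointing_likelihood))
# -> [0.2, 0.8]
```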
arXiv Detail & Related papers (2021-06-03T17:21:23Z)
- Understanding the Effect of Out-of-distribution Examples and Interactive Explanations on Human-AI Decision Making [19.157591744997355]
We argue that the typical experimental setup limits the potential of human-AI teams.
We develop novel interfaces to support interactive explanations so that humans can actively engage with AI assistance.
arXiv Detail & Related papers (2021-01-13T19:01:32Z)