The AI off-switch problem as a signalling game: bounded rationality and incomparability
- URL: http://arxiv.org/abs/2502.06403v3
- Date: Mon, 31 Mar 2025 08:18:33 GMT
- Title: The AI off-switch problem as a signalling game: bounded rationality and incomparability
- Authors: Alessio Benavoli, Alessandro Facchini, Marco Zaffalon
- Abstract summary: We model the off-switch problem as a signalling game, where a human decision-maker communicates its preferences to an AI agent. We show that a necessary condition for an AI system to refrain from disabling its off-switch is its uncertainty about the human's utility. We also analyse how message costs influence optimal strategies and extend the analysis to scenarios involving incomparability.
- Score: 45.76759085727843
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The off-switch problem is a critical challenge in AI control: if an AI system resists being switched off, it poses a significant risk. In this paper, we model the off-switch problem as a signalling game, where a human decision-maker communicates its preferences about some underlying decision problem to an AI agent, which then selects actions to maximise the human's utility. We assume that the human is a bounded rational agent and explore various bounded rationality mechanisms. Using real machine learning models, we reprove prior results and demonstrate that a necessary condition for an AI system to refrain from disabling its off-switch is its uncertainty about the human's utility. We also analyse how message costs influence optimal strategies and extend the analysis to scenarios involving incomparability.
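The following is a minimal illustrative sketch (not the paper's code) of the intuition behind the necessary-condition result, in the spirit of the classic off-switch game: the AI can disable its off-switch and act, switch itself off, or defer to the human, who allows the action only when it has positive utility. The Gaussian belief, function names, and payoffs are assumptions for illustration only.

```python
# Toy off-switch game: why uncertainty about the human's utility is needed
# for the AI to prefer keeping its off-switch enabled (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)

def expected_values(mu, sigma, n_samples=100_000):
    """Compare the AI's options under a Gaussian belief U ~ N(mu, sigma^2)
    about the human's utility for the proposed action."""
    u = rng.normal(mu, sigma, n_samples)

    act_now = u.mean()               # disable the off-switch and act: E[U]
    switch_off = 0.0                 # switch itself off: utility 0
    # defer: the human lets the action through iff U > 0,
    # otherwise presses the off-switch, giving E[max(U, 0)]
    defer = np.maximum(u, 0.0).mean()
    return act_now, switch_off, defer

# No uncertainty: deferring adds nothing over acting or switching off.
print(expected_values(mu=1.0, sigma=1e-6))

# With uncertainty about the sign of U, deferring strictly dominates,
# since E[max(U, 0)] > max(E[U], 0) when both signs have positive probability.
print(expected_values(mu=0.2, sigma=1.0))
```

Under this toy model, the value of keeping the off-switch functional comes entirely from the chance that the human's choice corrects the AI when the action turns out to be harmful; with a confident point estimate of the human's utility, that value vanishes.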
Related papers
- Neurodivergent Influenceability as a Contingent Solution to the AI Alignment Problem [1.3905735045377272]
The AI alignment problem, which focusses on ensuring that artificial intelligence (AI) systems act according to human values, presents profound challenges. With the progression from narrow AI to Artificial General Intelligence (AGI) and Superintelligence, fears about control and existential risk have escalated. Here, we investigate whether embracing inevitable AI misalignment can be a contingent strategy to foster a dynamic ecosystem of competing agents.
arXiv Detail & Related papers (2025-05-05T11:33:18Z) - Human Decision-making is Susceptible to AI-driven Manipulation [87.24007555151452]
AI systems may exploit users' cognitive biases and emotional vulnerabilities to steer them toward harmful outcomes.
This study examined human susceptibility to such manipulation in financial and emotional decision-making contexts.
arXiv Detail & Related papers (2025-02-11T15:56:22Z) - Raising the Stakes: Performance Pressure Improves AI-Assisted Decision Making [57.53469908423318]
We show the effects of performance pressure on AI advice reliance when laypeople complete a common AI-assisted task.
We find that when the stakes are high, people use AI advice more appropriately than when stakes are lower, regardless of the presence of an AI explanation.
arXiv Detail & Related papers (2024-10-21T22:39:52Z) - Combining AI Control Systems and Human Decision Support via Robustness and Criticality [53.10194953873209]
We extend a methodology for adversarial explanations (AE) to state-of-the-art reinforcement learning frameworks.
We show that the learned AI control system demonstrates robustness against adversarial tampering.
In a training / learning framework, this technology can improve both the AI's decisions and explanations through human interaction.
arXiv Detail & Related papers (2024-07-03T15:38:57Z) - Best-Response Bayesian Reinforcement Learning with Bayes-adaptive POMDPs for Centaurs [22.52332536886295]
We present a novel formulation of the interaction between the human and the AI as a sequential game.
We show that in this case the AI's problem of helping bounded-rational humans make better decisions reduces to a Bayes-adaptive POMDP.
We discuss ways in which the machine can learn to improve upon its own limitations as well with the help of the human.
arXiv Detail & Related papers (2022-04-03T21:00:51Z) - A User-Centred Framework for Explainable Artificial Intelligence in Human-Robot Interaction [70.11080854486953]
We propose a user-centred framework for XAI that focuses on its social-interactive aspect.
The framework aims to provide a structure for interactive XAI solutions thought for non-expert users.
arXiv Detail & Related papers (2021-09-27T09:56:23Z) - The human-AI relationship in decision-making: AI explanation to support people on justifying their decisions [4.169915659794568]
In decision-making scenarios, people need more awareness of how AI works and its outcomes to build a relationship with that system.
arXiv Detail & Related papers (2021-02-10T14:28:34Z) - Towards AI Forensics: Did the Artificial Intelligence System Do It? [2.5991265608180396]
We focus on AI that is potentially "malicious by design" and on grey-box analysis.
Our evaluation using convolutional neural networks illustrates challenges and ideas for identifying malicious AI.
arXiv Detail & Related papers (2020-05-27T20:28:19Z) - Is the Most Accurate AI the Best Teammate? Optimizing AI for Teamwork [54.309495231017344]
We argue that AI systems should be trained in a human-centered manner, directly optimized for team performance.
We study this proposal for a specific type of human-AI teaming, where the human overseer chooses to either accept the AI recommendation or solve the task themselves.
Our experiments with linear and non-linear models on real-world, high-stakes datasets show that the most accurate AI may not lead to the highest team performance.
arXiv Detail & Related papers (2020-04-27T19:06:28Z) - A Model-Based, Decision-Theoretic Perspective on Automated Cyber Response [0.0]
This paper describes an approach to automated cyber response that is designed along these lines.
We combine a simulation of the system to be defended with an anytime online planner to solve cyber defense problems characterized as partially observable Markov decision processes (POMDPs).
arXiv Detail & Related papers (2020-02-20T15:30:59Z) - A Case for Humans-in-the-Loop: Decisions in the Presence of Erroneous Algorithmic Scores [85.12096045419686]
We study the adoption of an algorithmic tool used to assist child maltreatment hotline screening decisions.
We first show that humans do alter their behavior when the tool is deployed.
We show that humans are less likely to adhere to the machine's recommendation when the score displayed is an incorrect estimate of risk.
arXiv Detail & Related papers (2020-02-19T07:27:32Z)