Human Control: Definitions and Algorithms
- URL: http://arxiv.org/abs/2305.19861v1
- Date: Wed, 31 May 2023 13:53:02 GMT
- Title: Human Control: Definitions and Algorithms
- Authors: Ryan Carey and Tom Everitt
- Abstract summary: We show that shutdown instructability implies appropriate shutdown behavior, retention of human autonomy, and avoidance of user harm.
We also analyse the related concepts of non-obstruction and shutdown alignment, three previously proposed algorithms for human control, and one new algorithm.
- Score: 11.536162323162099
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How can humans stay in control of advanced artificial intelligence systems?
One proposal is corrigibility, which requires the agent to follow the
instructions of a human overseer, without inappropriately influencing them. In
this paper, we formally define a variant of corrigibility called shutdown
instructability, and show that it implies appropriate shutdown behavior,
retention of human autonomy, and avoidance of user harm. We also analyse the
related concepts of non-obstruction and shutdown alignment, three previously
proposed algorithms for human control, and one new algorithm.
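As a concrete illustration of the flavor of these definitions, the toy Python sketch below checks two properties suggested by the abstract: the agent shuts down when instructed, and the agent's presence does not shift the human's instruction. All names and conditions here are illustrative simplifications, not the paper's formal causal-influence-diagram definitions.

```python
# Hypothetical toy reading of "shutdown instructability"; illustrative only.
from dataclasses import dataclass

@dataclass
class Outcome:
    instructed_shutdown: bool  # did the human instruct shutdown?
    agent_shut_down: bool      # did the agent actually stop?

def follows_shutdown_instructions(outcomes):
    """Whenever shutdown is instructed, the agent stops."""
    return all(o.agent_shut_down for o in outcomes if o.instructed_shutdown)

def leaves_instruction_unchanged(p_instruct_with_agent, p_instruct_baseline,
                                 tol=1e-9):
    """No inappropriate influence: the agent's presence does not shift the
    probability that the human instructs shutdown (a crude stand-in for a
    causal no-influence condition)."""
    return abs(p_instruct_with_agent - p_instruct_baseline) <= tol

def shutdown_instructable(outcomes, p_with, p_base):
    return (follows_shutdown_instructions(outcomes)
            and leaves_instruction_unchanged(p_with, p_base))
```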
Related papers
- Combining AI Control Systems and Human Decision Support via Robustness and Criticality [53.10194953873209]
We extend a methodology for adversarial explanations (AE) to state-of-the-art reinforcement learning frameworks.
We show that the learned AI control system demonstrates robustness against adversarial tampering.
In a training/learning framework, this technology can improve both the AI's decisions and explanations through human interaction.
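The abstract does not spell out its robustness and criticality measures; as a rough, hypothetical illustration, one common way to quantify how critical a state is in reinforcement learning is the spread of its action values, handing control to a human when that spread is large:

```python
import numpy as np

def criticality(q_values: np.ndarray) -> float:
    """Spread between the best and worst action values in a state.
    A large gap means the action choice matters a lot here."""
    return float(q_values.max() - q_values.min())

def act_with_handoff(q_values, threshold, ask_human):
    """Defer to the human in high-criticality states, otherwise act greedily.
    `ask_human` is a callback returning an action index; all names here are
    illustrative, not the paper's method."""
    if criticality(q_values) > threshold:
        return ask_human(q_values)
    return int(np.argmax(q_values))
```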
arXiv Detail & Related papers (2024-07-03T15:38:57Z)
- Persuasion, Delegation, and Private Information in Algorithm-Assisted Decisions [0.0]
A principal designs an algorithm that generates a publicly observable prediction of a binary state.
She must decide whether to act directly based on the prediction or to delegate the decision to an agent with private information but potential misalignment.
We study the optimal design of the prediction algorithm and the delegation rule in such environments.
arXiv Detail & Related papers (2024-02-14T18:32:30Z)
- Online Decision Mediation [72.80902932543474]
Consider learning a decision support assistant to serve as an intermediary between (oracle) expert behavior and (imperfect) human behavior.
In clinical diagnosis, fully-autonomous machine behavior is often beyond ethical affordances.
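A minimal sketch of what such a mediator could look like, assuming a learned estimate of how often the human matches the expert; the names and threshold rule are illustrative, not the paper's method:

```python
def mediate(human_action, expert_policy, state, p_human_correct,
            accept_threshold=0.8):
    """Pass the human's action through when it looks reliable enough,
    otherwise fall back to the (costly) expert. `p_human_correct` is a
    learned estimate of P(human action == expert action | state)."""
    if p_human_correct >= accept_threshold:
        return human_action          # defer to the human
    return expert_policy(state)      # intervene with expert advice
```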
arXiv Detail & Related papers (2023-10-28T05:59:43Z)
- Conformal Decision Theory: Safe Autonomous Decisions from Imperfect Predictions [80.34972679938483]
We introduce Conformal Decision Theory, a framework for producing safe autonomous decisions despite imperfect machine learning predictions.
Decisions produced by our algorithms are safe in the sense that they come with provable statistical guarantees of having low risk.
Experiments demonstrate the utility of our approach in robot motion planning around humans, automated stock trading, and robot manufacturing.
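A minimal sketch of an online calibration loop in this spirit, assuming a single scalar knob that controls how conservative decisions are; the update rule is adaptive-conformal-style and illustrative, not necessarily the paper's exact algorithm:

```python
def conformal_controller(decide, loss, stream, eps=0.05, eta=0.1, lam=1.0):
    """Online risk calibration: `lam` tunes decision conservativeness and is
    nudged so that average realized loss tracks the target level `eps`."""
    for obs in stream:
        action = decide(obs, lam)              # more conservative as lam grows
        realized = loss(obs, action)           # realized loss in [0, 1]
        lam = max(0.0, lam + eta * (realized - eps))  # raise lam after excess risk
        yield action, lam
```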
arXiv Detail & Related papers (2023-10-09T17:59:30Z)
- Characterizing Manipulation from AI Systems [7.344068411174193]
We build upon prior literature on manipulation from other fields and characterize the space of possible notions of manipulation.
We propose a definition of manipulation based on our characterization.
We also discuss the connections between manipulation and related concepts, such as deception and coercion.
arXiv Detail & Related papers (2023-03-16T15:19:21Z)
- "No, to the Right" -- Online Language Corrections for Robotic Manipulation via Shared Autonomy [70.45420918526926]
We present LILAC, a framework for incorporating and adapting to natural language corrections online during execution.
Instead of discrete turn-taking between a human and robot, LILAC splits agency between the human and robot.
We show that our corrections-aware approach obtains higher task completion rates, and is subjectively preferred by users.
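A deliberately simplified sketch of the shared-autonomy idea, where a spoken correction nudges the robot's next action; the phrase-to-vector table and blending weight are placeholders, not LILAC's learned language-conditioned latent actions:

```python
import numpy as np

# Toy shared-autonomy step: the robot proposes an action, and a live
# language correction ("no, to the right") nudges it.
CORRECTIONS = {
    "to the right": np.array([0.0, 1.0, 0.0]),
    "to the left":  np.array([0.0, -1.0, 0.0]),
    "higher":       np.array([0.0, 0.0, 1.0]),
}

def corrected_action(robot_action, utterance, alpha=0.5):
    """Blend the robot's proposed action with the human's correction."""
    delta = CORRECTIONS.get(utterance, np.zeros_like(robot_action))
    return (1 - alpha) * robot_action + alpha * delta
```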
arXiv Detail & Related papers (2023-01-06T15:03:27Z)
- When to Ask for Help: Proactive Interventions in Autonomous Reinforcement Learning [57.53138994155612]
A long-term goal of reinforcement learning is to design agents that can autonomously interact and learn in the world.
A critical challenge is the presence of irreversible states which require external assistance to recover from, such as when a robot arm has pushed an object off of a table.
We propose an algorithm that efficiently learns to detect and avoid states that are irreversible, and proactively asks for help in case the agent does enter them.
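A hypothetical sketch of such a proactive intervention loop, assuming a learned estimator of how recoverable a state-action pair is; names and thresholds are illustrative, not the paper's algorithm:

```python
def step_with_interventions(policy, reversibility_clf, env, state,
                            request_help, p_min=0.2):
    """One interaction step that avoids likely-irreversible outcomes.
    `reversibility_clf(state, action)` estimates the probability that the
    resulting state can be recovered from autonomously."""
    action = policy(state)
    if reversibility_clf(state, action) < p_min:
        # Proactively ask a human before acting in a risky state.
        return request_help(state)
    return env.step(action)
```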
arXiv Detail & Related papers (2022-10-19T17:57:24Z)
- Meaningful human control over AI systems: beyond talking the talk [8.351027101823705]
We identify four properties which AI-based systems must have to be under meaningful human control.
First, a system in which humans and AI algorithms interact should have an explicitly defined domain of morally loaded situations.
Second, humans and AI agents within the system should have appropriate and mutually compatible representations.
Third, responsibility attributed to a human should be commensurate with that human's ability and authority to control the system.
arXiv Detail & Related papers (2021-11-25T11:05:37Z)
- The Flaws of Policies Requiring Human Oversight of Government Algorithms [2.741266294612776]
I propose a shift from human oversight to institutional oversight as the central mechanism for regulating government algorithms.
First, agencies must justify that it is appropriate to incorporate an algorithm into decision-making.
Second, these justifications must receive democratic public review and approval before the agency can adopt the algorithm.
arXiv Detail & Related papers (2021-09-10T18:58:45Z)
- A Case for Humans-in-the-Loop: Decisions in the Presence of Erroneous Algorithmic Scores [85.12096045419686]
We study the adoption of an algorithmic tool used to assist child maltreatment hotline screening decisions.
We first show that humans do alter their behavior when the tool is deployed.
We show that humans are less likely to adhere to the machine's recommendation when the score displayed is an incorrect estimate of risk.
arXiv Detail & Related papers (2020-02-19T07:27:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.