On Avoiding Power-Seeking by Artificial Intelligence
- URL: http://arxiv.org/abs/2206.11831v1
- Date: Thu, 23 Jun 2022 16:56:21 GMT
- Title: On Avoiding Power-Seeking by Artificial Intelligence
- Authors: Alexander Matt Turner
- Abstract summary: We do not know how to align a very intelligent AI agent's behavior with human interests.
I investigate whether we can build smart AI agents which have limited impact on the world, and which do not autonomously seek power.
- Score: 93.9264437334683
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We do not know how to align a very intelligent AI agent's behavior with human
interests. I investigate whether -- absent a full solution to this AI alignment
problem -- we can build smart AI agents which have limited impact on the world,
and which do not autonomously seek power. In this thesis, I introduce the
attainable utility preservation (AUP) method. I demonstrate that AUP produces
conservative, option-preserving behavior within toy gridworlds and within
complex environments based on Conway's Game of Life. I formalize the
problem of side effect avoidance, which provides a way to quantify the side
effects an agent had on the world. I also give a formal definition of
power-seeking in the context of AI agents and show that optimal policies tend
to seek power. In particular, most reward functions have optimal policies which
avoid deactivation. This is a problem if we want to deactivate or correct an
intelligent agent after we have deployed it. My theorems suggest that since
most agent goals conflict with ours, the agent would very probably resist
correction. I extend these theorems to show that power-seeking incentives occur
not just for optimal decision-makers, but under a wide range of decision-making
procedures.
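To make the attainable utility preservation (AUP) idea concrete, here is a minimal sketch of the penalized reward it computes, assuming tabular Q-values for a set of auxiliary reward functions are already available; the function and variable names are illustrative rather than taken from the thesis.

```python
# Minimal sketch of the attainable utility preservation (AUP) penalty.
# Assumes tabular auxiliary Q-functions have already been learned; the
# names (aup_reward, aux_q, noop_action) are illustrative, not the
# thesis's notation.

def aup_reward(primary_reward, aux_q, state, action, noop_action, lam=0.1):
    """Primary reward minus a penalty for shifting attainable utility.

    aux_q: list of dicts mapping (state, action) -> Q-value, one per
           auxiliary reward function.
    lam:   penalty strength; larger values yield more conservative,
           option-preserving behavior.
    """
    penalty = 0.0
    for q in aux_q:
        # How much does `action` change the agent's ability to pursue
        # this auxiliary goal, relative to doing nothing?
        penalty += abs(q[(state, action)] - q[(state, noop_action)])
    penalty /= len(aux_q)
    return primary_reward - lam * penalty
```

In this sketch the auxiliary Q-values stand in for the agent's attainable utilities; penalizing changes relative to inaction is what drives the conservative, option-preserving behavior described in the abstract.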
Related papers
- Raising the Stakes: Performance Pressure Improves AI-Assisted Decision Making [57.53469908423318]
We show the effects of performance pressure on AI advice reliance when laypeople complete a common AI-assisted task.
We find that when the stakes are high, people use AI advice more appropriately than when stakes are lower, regardless of the presence of an AI explanation.
arXiv Detail & Related papers (2024-10-21T22:39:52Z)
- Aligning AI Agents via Information-Directed Sampling [20.617552198581024]
The bandit alignment problem involves maximizing long-run expected reward by interacting with an environment and a human.
We study these trade-offs theoretically and empirically in a toy bandit alignment problem which resembles the beta-Bernoulli bandit.
We demonstrate that naive exploration algorithms which reflect current practice, and even touted algorithms such as Thompson sampling, fail to provide acceptable solutions to this problem.
arXiv Detail & Related papers (2024-10-18T18:23:41Z)
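For reference, the Thompson sampling baseline that the paper above finds inadequate looks roughly like the following on a beta-Bernoulli bandit; this is the textbook algorithm on a plain bandit, not the paper's information-directed method or its full alignment setup.

```python
import numpy as np

# Standard Thompson sampling for a beta-Bernoulli bandit -- the kind of
# baseline the paper argues is insufficient in the alignment setting.

def thompson_sampling(true_probs, horizon=1000, rng=None):
    rng = rng or np.random.default_rng(0)
    k = len(true_probs)
    alpha = np.ones(k)   # Beta posterior successes + 1 per arm
    beta = np.ones(k)    # Beta posterior failures + 1 per arm
    total_reward = 0
    for _ in range(horizon):
        # Sample a plausible success rate for each arm, act greedily on it.
        samples = rng.beta(alpha, beta)
        arm = int(np.argmax(samples))
        reward = rng.random() < true_probs[arm]
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total_reward += reward
    return total_reward

print(thompson_sampling([0.3, 0.5, 0.7]))
```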
- FlyAI -- The Next Level of Artificial Intelligence is Unpredictable! Injecting Responses of a Living Fly into Decision Making [6.694375709641935]
We introduce a new type of bionic AI that enhances decision-making unpredictability by incorporating responses from a living fly.
Our approach uses a fly's varied reactions to tune an AI agent in the game of Gobang.
arXiv Detail & Related papers (2024-09-30T17:19:59Z)
- Who Wrote this? How Smart Replies Impact Language and Agency in the Workplace [0.0]
This study uses smart replies (SRs) to show how AI influences humans without any intent on the part of the developer.
I propose a loss of agency theory as a viable approach for studying the impact of AI on human agency.
arXiv Detail & Related papers (2022-10-07T20:06:25Z)
- Parametrically Retargetable Decision-Makers Tend To Seek Power [91.93765604105025]
In fully observable environments, most reward functions have an optimal policy which seeks power by keeping options open and staying alive.
We consider a range of models of AI decision-making, from optimal, to random, to choices informed by learning and interacting with an environment.
We show that a range of qualitatively dissimilar decision-making procedures incentivize agents to seek power.
arXiv Detail & Related papers (2022-06-27T17:39:23Z)
- Formalizing the Problem of Side Effect Regularization [81.97441214404247]
We propose a formal criterion for side effect regularization via the assistance game framework.
In these games, the agent solves a partially observable Markov decision process.
We show that this POMDP is solved by trading off the proxy reward with the agent's ability to achieve a range of future tasks.
arXiv Detail & Related papers (2022-06-23T16:36:13Z)
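As a loose illustration of the trade-off described above (not the paper's POMDP formalism), one can score an action by its proxy reward plus a term for how well it preserves the agent's estimated ability to complete tasks drawn from a set of possible future goals; the weighting and value-function interface below are assumptions.

```python
# Loose illustration of trading off proxy reward against preserved
# ability to achieve future tasks. The weighting scheme and the
# value-function interface are assumptions, not the paper's formalism.

def regularized_score(proxy_reward, future_task_values, trade_off=0.5):
    """future_task_values: estimated attainable value for each sampled
    future task if this action is taken."""
    preserved_ability = sum(future_task_values) / len(future_task_values)
    return (1 - trade_off) * proxy_reward + trade_off * preserved_ability

# Example: a high-proxy-reward action that destroys future options can
# score worse than a modest action that keeps them available.
print(regularized_score(1.0, [0.1, 0.0, 0.2]))
print(regularized_score(0.6, [0.9, 0.8, 0.9]))
```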
- Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
- Intelligence and Unambitiousness Using Algorithmic Information Theory [22.710015392064083]
We show that an agent learns to accrue reward at least as well as a human mentor, while relying on that mentor with diminishing probability.
We show that eventually, the agent's world-model incorporates the following true fact: intervening in the "outside world" will have no effect on reward acquisition.
arXiv Detail & Related papers (2021-05-13T13:10:28Z)
- Is the Most Accurate AI the Best Teammate? Optimizing AI for Teamwork [54.309495231017344]
We argue that AI systems should be trained in a human-centered manner, directly optimized for team performance.
We study this proposal for a specific type of human-AI teaming, where the human overseer chooses to either accept the AI recommendation or solve the task themselves.
Our experiments with linear and non-linear models on real-world, high-stakes datasets show that the most accurate AI may not lead to the highest team performance.
arXiv Detail & Related papers (2020-04-27T19:06:28Z)