Parametrically Retargetable Decision-Makers Tend To Seek Power
- URL: http://arxiv.org/abs/2206.13477v1
- Date: Mon, 27 Jun 2022 17:39:23 GMT
- Title: Parametrically Retargetable Decision-Makers Tend To Seek Power
- Authors: Alexander Matt Turner, Prasad Tadepalli
- Abstract summary: In fully observable environments, most reward functions have an optimal policy which seeks power by keeping options open and staying alive.
We consider a range of models of AI decision-making, from optimal, to random, to choices informed by learning and interacting with an environment.
We show that a range of qualitatively dissimilar decision-making procedures incentivize agents to seek power.
- Score: 91.93765604105025
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: If capable AI agents are generally incentivized to seek power in service of
the objectives we specify for them, then these systems will pose enormous
risks, in addition to enormous benefits. In fully observable environments, most
reward functions have an optimal policy which seeks power by keeping options
open and staying alive. However, the real world is neither fully observable,
nor will agents be perfectly optimal. We consider a range of models of AI
decision-making, from optimal, to random, to choices informed by learning and
interacting with an environment. We discover that many decision-making
functions are retargetable, and that retargetability is sufficient to cause
power-seeking tendencies. Our functional criterion is simple and broad. We show
that a range of qualitatively dissimilar decision-making procedures incentivize
agents to seek power. We demonstrate the flexibility of our results by
reasoning about learned policy incentives in Montezuma's Revenge. These results
suggest a safety risk: Eventually, highly retargetable training procedures may
train real-world agents which seek power over humans.
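A rough, hypothetical illustration of the headline claim (not the paper's formal orbit-based argument): the toy script below invents a tiny deterministic environment and reward distribution, samples many random reward functions, and counts how often the optimal first action is the one that keeps more terminal states reachable.

```python
"""Toy illustration (not the paper's formal setup): for most randomly drawn
reward functions over a small deterministic environment, the optimal policy
takes the action that keeps more options reachable."""
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy environment: from the start state, action 0 ("shut down")
# reaches only terminal state 0, while action 1 ("stay alive") can still
# reach terminal states 1, 2, and 3.
REACHABLE = {0: [0], 1: [1, 2, 3]}

def optimal_action(reward):
    """Return the first action of an optimal policy for this reward vector."""
    best_value = {a: max(reward[s] for s in states)
                  for a, states in REACHABLE.items()}
    return max(best_value, key=best_value.get)

n_samples = 50_000
rewards = rng.uniform(size=(n_samples, 4))   # one reward per terminal state
prefers_options = sum(optimal_action(r) == 1 for r in rewards)

# With i.i.d. uniform rewards, the option-preserving action is optimal whenever
# the best of terminal states 1-3 beats terminal state 0, i.e. about 3/4 of the time.
print(f"fraction preferring the option-preserving action: {prefers_options / n_samples:.3f}")
```

With i.i.d. uniform rewards over four terminal states, the option-preserving action is optimal roughly three quarters of the time, mirroring the "most reward functions" flavor of the result.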
Related papers
- Non-maximizing policies that fulfill multi-criterion aspirations in expectation [0.7874708385247353]
In dynamic programming and reinforcement learning, the policy for an agent's sequential decision-making is usually determined by expressing the goal as a scalar reward function.
We consider finite acyclic Markov Decision Processes with multiple distinct evaluation metrics, which do not necessarily represent quantities that the user wants to be maximized.
Our algorithm guarantees that the given aspirations are fulfilled in expectation by using simplices to approximate feasibility sets and by propagating aspirations forward while ensuring they remain feasible.
arXiv Detail & Related papers (2024-08-08T11:41:04Z)
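A minimal, single-step sketch of the simplex idea in the entry above (the vertices, metrics, and aspiration point are invented, and the paper's full forward-propagation of aspirations is omitted): represent the feasibility set by a few achievable expected-value vectors and solve for a mixture whose expectation matches a multi-criterion aspiration.

```python
"""Minimal one-step sketch (invented example, not the paper's algorithm):
represent the feasibility set by a simplex of achievable expected-value
vectors and pick a mixture of them whose expectation equals the aspiration."""
import numpy as np

# Hypothetical achievable expected-value vectors (simplex vertices), one per
# "pure" option, over two evaluation metrics.
vertices = np.array([[0.0, 1.0],
                     [1.0, 0.0],
                     [0.5, 0.8]])

aspiration = np.array([0.5, 0.6])   # target expectations for the two metrics

# Barycentric coordinates: solve  vertices.T @ w = aspiration,  sum(w) = 1.
A = np.vstack([vertices.T, np.ones(len(vertices))])
b = np.append(aspiration, 1.0)
weights, *_ = np.linalg.lstsq(A, b, rcond=None)

feasible = np.all(weights >= -1e-9) and np.allclose(A @ weights, b)
print("mixture weights:", np.round(weights, 3), "feasible:", feasible)
print("achieved expectation:", vertices.T @ weights)
```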
- Power-seeking can be probable and predictive for trained agents [3.616948583169635]
Power-seeking behavior is a key source of risk from advanced AI.
We investigate how the training process affects power-seeking incentives.
We show that power-seeking incentives can be probable and predictive.
arXiv Detail & Related papers (2023-04-13T13:29:01Z)
- On Avoiding Power-Seeking by Artificial Intelligence [93.9264437334683]
We do not know how to align a very intelligent AI agent's behavior with human interests.
I investigate whether we can build smart AI agents which have limited impact on the world, and which do not autonomously seek power.
arXiv Detail & Related papers (2022-06-23T16:56:21Z)
- Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
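A heavily simplified bandit analogy of the adversarial game described in the entry above (the rooms, observation model, and learning rule are invented, not the paper's algorithm): an Explorer is rewarded for observation surprise under a count-based model while a Controller is penalized for it, so the two policies learn opposite preferences.

```python
"""Toy bandit sketch of the adversarial-surprise idea (invented setup):
an Explorer is rewarded for observation surprise under a count-based model,
a Controller is penalized for it, and they learn opposite room preferences."""
import numpy as np

rng = np.random.default_rng(0)
counts = np.ones((2, 8))            # count-based observation model per room

def step(room):
    """Sample an observation; room 0 is near-deterministic, room 1 is noisy."""
    probs = np.full(8, 0.01)
    probs[0] = 0.93
    obs = rng.choice(8) if room == 1 else rng.choice(8, p=probs)
    surprise = -np.log(counts[room, obs] / counts[room].sum())
    counts[room, obs] += 1          # update the shared observation model
    return surprise

q_explore, q_control = np.zeros(2), np.zeros(2)
for t in range(2000):
    for q, sign in ((q_explore, +1.0), (q_control, -1.0)):
        room = rng.integers(2) if rng.random() < 0.1 else int(np.argmax(q))
        q[room] += 0.05 * (sign * step(room) - q[room])

print("explorer prefers room", int(np.argmax(q_explore)))    # expected: 1 (noisy)
print("controller prefers room", int(np.argmax(q_control)))  # expected: 0 (quiet)
```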
- Who Leads and Who Follows in Strategic Classification? [82.44386576129295]
We argue that the order of play in strategic classification is fundamentally determined by the relative frequencies at which the decision-maker and the agents adapt to each other's actions.
We show that a decision-maker with the freedom to choose their update frequency can induce learning dynamics that converge to Stackelberg equilibria with either order of play.
arXiv Detail & Related papers (2021-06-23T16:48:46Z)
- Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z)
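A hedged sketch of the value-disagreement curriculum idea in the entry above, with stand-in numbers rather than learned value functions: goals are sampled with probability proportional to the disagreement of an ensemble of value estimates, which concentrates training on the frontier of the agent's competence.

```python
"""Hedged sketch of a value-disagreement curriculum (invented toy values, not
the paper's implementation): sample training goals with probability
proportional to the disagreement of an ensemble of value estimates."""
import numpy as np

rng = np.random.default_rng(0)

candidate_goals = np.linspace(0.0, 1.0, 20)          # hypothetical 1-D goal space

# Stand-in for an ensemble of learned value functions evaluated at each goal:
# easy goals agree near 1, hard goals agree near 0, frontier goals disagree.
ensemble_values = np.stack([
    1.0 / (1.0 + np.exp((candidate_goals - 0.5 + 0.05 * rng.standard_normal(20)) * 12))
    for _ in range(5)
])

disagreement = ensemble_values.std(axis=0)           # per-goal ensemble disagreement
probs = disagreement / disagreement.sum()            # curriculum distribution over goals

training_goals = rng.choice(candidate_goals, size=5, p=probs)
print("most-sampled region near goal:", candidate_goals[np.argmax(probs)])
print("sampled training goals:", np.round(training_goals, 2))
```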
- Learning to Incentivize Other Learning Agents [73.03133692589532]
We show how to equip RL agents with the ability to give rewards directly to other agents, using a learned incentive function.
Such agents significantly outperform standard RL and opponent-shaping agents in challenging general-sum Markov games.
Our work points toward more opportunities and challenges along the path to ensure the common good in a multi-agent future.
arXiv Detail & Related papers (2020-06-10T20:12:38Z)
- Curiosity Killed or Incapacitated the Cat and the Asymptotically Optimal Agent [21.548271801592907]
Reinforcement learners are agents that learn to pick actions that lead to high reward.
We show that if an agent is guaranteed to be "asymptotically optimal" in any environment, then, subject to an assumption about the true environment, this agent will be either "destroyed" or "incapacitated".
We present an agent, Mentee, with the modest guarantee of approaching the performance of a mentor, doing safe exploration instead of reckless exploration.
arXiv Detail & Related papers (2020-06-05T10:42:29Z)
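A generic sketch of mentor-guided exploration in the spirit of the Mentee entry above (an invented bandit setting, not the paper's formal construction): instead of exploring randomly, the learner defers to a mentor with decreasing probability, so it approaches the mentor's performance without ever trying an obviously dangerous action.

```python
"""Generic sketch of mentor-guided exploration (invented bandit setting, not
the paper's construction): explore by deferring to a mentor with decreasing
probability instead of acting randomly."""
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.7, -5.0])   # arm 2 is "dangerous"; the mentor avoids it

def mentor_action():
    return 1                               # the mentor reliably plays a safe, good arm

q = np.zeros(3)
pulls = np.zeros(3)
for t in range(1, 2001):
    if rng.random() < 1.0 / np.sqrt(t):    # defer to the mentor, decreasingly often
        a = mentor_action()
    else:                                  # otherwise act greedily on own estimates
        a = int(np.argmax(q))
    reward = true_means[a] + rng.standard_normal() * 0.1
    pulls[a] += 1
    q[a] += (reward - q[a]) / pulls[a]

print("estimated values:", np.round(q, 2))
print("times the dangerous arm was tried:", int(pulls[2]))   # expected: 0
```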
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.