Intelligence and Unambitiousness Using Algorithmic Information Theory
- URL: http://arxiv.org/abs/2105.06268v1
- Date: Thu, 13 May 2021 13:10:28 GMT
- Title: Intelligence and Unambitiousness Using Algorithmic Information Theory
- Authors: Michael K. Cohen, Badri Vellambi, Marcus Hutter
- Abstract summary: We show that an agent learns to accrue reward at least as well as a human mentor, while relying on that mentor with diminishing probability.
We show that eventually, the agent's world-model incorporates the following true fact: intervening in the "outside world" will have no effect on reward acquisition.
- Score: 22.710015392064083
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Algorithmic Information Theory has inspired intractable constructions of
general intelligence (AGI), and undiscovered tractable approximations are
likely feasible. Reinforcement Learning (RL), the dominant paradigm by which an
agent might learn to solve arbitrary solvable problems, gives an agent a
dangerous incentive: to gain arbitrary "power" in order to intervene in the
provision of their own reward. We review the arguments that generally
intelligent algorithmic-information-theoretic reinforcement learners such as
Hutter's (2005) AIXI would seek arbitrary power, including over us. Then, using
an information-theoretic exploration schedule, and a setup inspired by causal
influence theory, we present a variant of AIXI which learns to not seek
arbitrary power; we call it "unambitious". We show that our agent learns to
accrue reward at least as well as a human mentor, while relying on that mentor
with diminishing probability. And given a formal assumption that we probe
empirically, we show that eventually, the agent's world-model incorporates the
following true fact: intervening in the "outside world" will have no effect on
reward acquisition; hence, it has no incentive to shape the outside world.
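The agent construction itself is intractable, but the deferral dynamic described above can be caricatured in a few lines. The following is a minimal sketch assuming a bandit-style interface; `MentorGuidedAgent`, its `decay` parameter, and the `1/(1 + decay*t)` schedule are illustrative stand-ins for the paper's information-theoretic exploration schedule, not its actual construction.
```python
import random

class MentorGuidedAgent:
    """Toy sketch: defer to a human mentor with diminishing probability.

    Illustrative only -- not the paper's AIXI variant. The deferral
    schedule below is an assumed stand-in for the information-theoretic
    exploration schedule used in the paper.
    """

    def __init__(self, actions, decay=0.01):
        self.actions = actions
        self.q = {a: 0.0 for a in actions}     # crude value estimates
        self.counts = {a: 0 for a in actions}
        self.decay = decay
        self.t = 0

    def deferral_probability(self):
        # Shrinks toward 0 as experience accumulates.
        return 1.0 / (1.0 + self.decay * self.t)

    def act(self, mentor_policy):
        self.t += 1
        if random.random() < self.deferral_probability():
            return mentor_policy()             # rely on the mentor
        return max(self.q, key=self.q.get)     # otherwise act greedily

    def update(self, action, reward):
        self.counts[action] += 1
        self.q[action] += (reward - self.q[action]) / self.counts[action]

# Usage: the agent leans on the mentor early, then acts on its own.
agent = MentorGuidedAgent(actions=["a", "b"])
for _ in range(100):
    chosen = agent.act(mentor_policy=lambda: "a")
    agent.update(chosen, reward=1.0 if chosen == "a" else 0.0)
```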
Related papers
- Strategy Masking: A Method for Guardrails in Value-based Reinforcement Learning Agents [0.27309692684728604]
We study methods for constructing guardrails for AI agents that use reward functions to learn decision making.
We introduce a novel approach, which we call strategy masking, to explicitly learn and then suppress undesirable AI agent behavior.
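On the assumption that "suppressing" a behavior amounts to excluding flagged actions from the greedy choice over learned values, the core masking step might look like the sketch below; `masked_action` and its `penalty` argument are hypothetical names, not the paper's interface.
```python
import numpy as np

def masked_action(q_values, undesirable, penalty=-np.inf):
    """Suppress flagged actions by masking their values before the
    greedy choice (assumed mechanics, not the paper's exact method)."""
    q = np.asarray(q_values, dtype=float).copy()
    q[list(undesirable)] = penalty   # masked actions can never win the argmax
    return int(np.argmax(q))

# Action 1 has the highest value but is flagged as undesirable, so the
# agent falls back to the best unmasked action.
print(masked_action([0.2, 0.9, 0.5], undesirable={1}))  # -> 2
```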
arXiv Detail & Related papers (2025-01-09T18:43:05Z)
- MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization [91.80034860399677]
Reinforcement learning algorithms aim to balance exploiting the current best strategy with exploring new options that could lead to higher rewards.
We introduce a framework, MaxInfoRL, for balancing intrinsic and extrinsic exploration.
We show that our approach achieves sublinear regret in the simplified setting of multi-armed bandits.
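Because the bandit setting is mentioned, the trade-off can be sketched there with a UCB-style uncertainty bonus standing in for information gain; `maxinfo_bandit` and the `beta` weight are illustrative assumptions, not the authors' algorithm.
```python
import math, random

def maxinfo_bandit(true_means, horizon=2000, beta=1.0):
    """Toy trade-off between extrinsic value and an intrinsic bonus.
    The bonus is a crude uncertainty proxy for information gain."""
    k = len(true_means)
    counts, means = [0] * k, [0.0] * k
    for t in range(1, horizon + 1):
        scores = [
            means[a] + beta * math.sqrt(math.log(t) / counts[a])
            if counts[a] else float("inf")     # pull every arm once first
            for a in range(k)
        ]
        a = scores.index(max(scores))
        reward = random.gauss(true_means[a], 1.0)
        counts[a] += 1
        means[a] += (reward - means[a]) / counts[a]
    return counts  # pulls concentrate on the best arm over time

print(maxinfo_bandit([0.1, 0.5, 0.9]))
```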
arXiv Detail & Related papers (2024-12-16T18:59:53Z)
- Position Paper: Agent AI Towards a Holistic Intelligence [53.35971598180146]
We emphasize developing Agent AI -- an embodied system that integrates large foundation models into agent actions.
In this paper, we propose a novel large action model to achieve embodied intelligent behavior, the Agent Foundation Model.
arXiv Detail & Related papers (2024-02-28T16:09:56Z)
- Efficient Open-world Reinforcement Learning via Knowledge Distillation and Autonomous Rule Discovery [5.680463564655267]
The rule-driven deep Q-learning agent (RDQ) is presented as one possible implementation of this framework.
We show that RDQ successfully extracts task-specific rules as it interacts with the world.
In experiments, we show that the RDQ agent is significantly more resilient to novelties than the baseline agents.
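A hedged sketch of what a rule-guided decision step could look like, assuming discovered rules are condition-action pairs consulted before the value function; the names and the example rule are hypothetical, and the paper's actual rule discovery and distillation are omitted.
```python
def rule_guided_action(state, rules, q_values):
    """Apply a matching symbolic rule if one exists; otherwise fall
    back to the greedy choice over learned values."""
    for condition, action in rules:
        if condition(state):
            return action                      # a rule overrides the values
    return max(q_values, key=q_values.get)

# Hypothetical rule: avoid a novel hazard regardless of learned values.
rules = [(lambda s: s.get("hazard_ahead", False), "turn_left")]
print(rule_guided_action({"hazard_ahead": True}, rules,
                         {"forward": 1.0, "turn_left": 0.2}))  # -> turn_left
```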
arXiv Detail & Related papers (2023-11-24T04:12:50Z)
- Parametrically Retargetable Decision-Makers Tend To Seek Power [91.93765604105025]
In fully observable environments, most reward functions have an optimal policy which seeks power by keeping options open and staying alive.
We consider a range of models of AI decision-making, from optimal, to random, to choices informed by learning and interacting with an environment.
We show that a range of qualitatively dissimilar decision-making procedures incentivize agents to seek power.
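The "keeping options open" intuition admits a small worked example: in a toy directed graph, a decision-maker that prefers successors from which more states remain reachable steers away from absorbing "dead" states. The reachability count is only a simple proxy for the paper's formal notion of power.
```python
def reachable(graph, start):
    """Set of states reachable from `start` in a directed graph."""
    seen, stack = set(), [start]
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.add(s)
            stack.extend(graph.get(s, []))
    return seen

def option_preserving_move(graph, state):
    """Prefer the successor that keeps the most states reachable."""
    return max(graph[state], key=lambda s: len(reachable(graph, s)))

# 'live' keeps every state reachable; 'dead' is absorbing.
graph = {"start": ["live", "dead"], "live": ["start", "goal"],
         "dead": ["dead"], "goal": ["goal"]}
print(option_preserving_move(graph, "start"))  # -> 'live'
```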
arXiv Detail & Related papers (2022-06-27T17:39:23Z)
- On Avoiding Power-Seeking by Artificial Intelligence [93.9264437334683]
We do not know how to align a very intelligent AI agent's behavior with human interests.
I investigate whether we can build smart AI agents which have limited impact on the world, and which do not autonomously seek power.
arXiv Detail & Related papers (2022-06-23T16:56:21Z)
- An Algorithmic Theory of Metacognition in Minds and Machines [1.52292571922932]
We present an algorithmic theory of metacognition based on a well-understood trade-off in reinforcement learning.
We show how to create metacognition in machines by implementing a deep MAC.
arXiv Detail & Related papers (2021-11-05T22:31:09Z)
- Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
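The sketch below shows only the zero-sum objective, with a count-based surprise signal and a random placeholder for the environment; the two competing policies themselves, and the paper's environments, are omitted entirely.
```python
import math, random
from collections import Counter

def surprise(counts, obs, total):
    """Count-based surprise: -log empirical probability (with smoothing)."""
    return -math.log((counts[obs] + 1) / (total + len(counts) + 1))

counts, total = Counter(), 0
explorer_score = controller_score = 0.0
for step in range(1000):
    obs = random.choice("abcd")   # placeholder for an environment observation
    s = surprise(counts, obs, total)
    explorer_score += s           # the explorer seeks surprising observations
    controller_score -= s         # the controller is rewarded for familiarity
    counts[obs] += 1
    total += 1
print(round(explorer_score, 2), round(controller_score, 2))
```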
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
- Learning Human Rewards by Inferring Their Latent Intelligence Levels in Multi-Agent Games: A Theory-of-Mind Approach with Application to Driving Data [18.750834997334664]
We argue that humans are boundedly rational and exhibit different intelligence levels when reasoning about others' decision-making processes.
We propose a new multi-agent Inverse Reinforcement Learning framework that reasons about humans' latent intelligence levels during learning.
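One common way to formalize "intelligence levels" is level-k reasoning, sketched below on a matching-pennies game; the inverse-RL machinery the paper builds around such levels is omitted, and the payoff layout here is an assumption for illustration.
```python
def best_response(payoff, opp_action):
    """Action maximizing payoff[a][opp_action] against a fixed opponent action."""
    return max(range(len(payoff)), key=lambda a: payoff[a][opp_action])

def level_k_action(payoffs, player, k, level0=(0, 0)):
    """A level-0 agent plays a fixed naive action; a level-k agent
    best-responds to a level-(k-1) model of its opponent."""
    if k == 0:
        return level0[player]
    opp_action = level_k_action(payoffs, 1 - player, k - 1, level0)
    return best_response(payoffs[player], opp_action)

# Matching pennies: player 0 wants to match, player 1 wants to mismatch.
payoffs = [[[1, 0], [0, 1]],   # player 0: payoff[a][b] = 1 if a == b
           [[0, 1], [1, 0]]]   # player 1: payoff[a][b] = 1 if a != b
print(level_k_action(payoffs, player=0, k=1))  # 0: matches a naive opponent
print(level_k_action(payoffs, player=0, k=2))  # 1: anticipates the level-1 reply
```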
arXiv Detail & Related papers (2021-03-07T07:48:31Z)
- Explainability in Deep Reinforcement Learning [68.8204255655161]
We review recent work toward attaining Explainable Reinforcement Learning (XRL).
In critical situations where it is essential to justify and explain the agent's behaviour, better explainability and interpretability of RL models could help gain scientific insight into the inner workings of what is still considered a black box.
arXiv Detail & Related papers (2020-08-15T10:11:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.