Intelligence and Unambitiousness Using Algorithmic Information Theory
- URL: http://arxiv.org/abs/2105.06268v1
- Date: Thu, 13 May 2021 13:10:28 GMT
- Title: Intelligence and Unambitiousness Using Algorithmic Information Theory
- Authors: Michael K. Cohen, Badri Vellambi, Marcus Hutter
- Abstract summary: We show that an agent learns to accrue reward at least as well as a human mentor, while relying on that mentor with diminishing probability.
We show that eventually, the agent's world-model incorporates the following true fact: intervening in the "outside world" will have no effect on reward acquisition.
- Score: 22.710015392064083
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Algorithmic Information Theory has inspired intractable constructions of
general intelligence (AGI), and undiscovered tractable approximations are
likely feasible. Reinforcement Learning (RL), the dominant paradigm by which an
agent might learn to solve arbitrary solvable problems, gives an agent a
dangerous incentive: to gain arbitrary "power" in order to intervene in the
provision of their own reward. We review the arguments that generally
intelligent algorithmic-information-theoretic reinforcement learners such as
Hutter's (2005) AIXI would seek arbitrary power, including over us. Then, using
an information-theoretic exploration schedule, and a setup inspired by causal
influence theory, we present a variant of AIXI which learns to not seek
arbitrary power; we call it "unambitious". We show that our agent learns to
accrue reward at least as well as a human mentor, while relying on that mentor
with diminishing probability. And given a formal assumption that we probe
empirically, we show that eventually, the agent's world-model incorporates the
following true fact: intervening in the "outside world" will have no effect on
reward acquisition; hence, it has no incentive to shape the outside world.
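The agent construction itself is intractable, but the deferral dynamic described above can be caricatured in a few lines. The following is a minimal sketch assuming a bandit-style interface; `MentorGuidedAgent`, its `decay` parameter, and the `1/(1 + decay*t)` schedule are illustrative stand-ins for the paper's information-theoretic exploration schedule, not its actual construction.
```python
import random

class MentorGuidedAgent:
    """Toy sketch: defer to a human mentor with diminishing probability.

    Illustrative only -- not the paper's AIXI variant. The deferral
    schedule below is an assumed stand-in for the information-theoretic
    exploration schedule used in the paper.
    """

    def __init__(self, actions, decay=0.01):
        self.actions = actions
        self.q = {a: 0.0 for a in actions}     # crude value estimates
        self.counts = {a: 0 for a in actions}
        self.decay = decay
        self.t = 0

    def deferral_probability(self):
        # Shrinks toward 0 as experience accumulates.
        return 1.0 / (1.0 + self.decay * self.t)

    def act(self, mentor_policy):
        self.t += 1
        if random.random() < self.deferral_probability():
            return mentor_policy()             # rely on the mentor
        return max(self.q, key=self.q.get)     # otherwise act greedily

    def update(self, action, reward):
        self.counts[action] += 1
        self.q[action] += (reward - self.q[action]) / self.counts[action]

# Usage: the agent leans on the mentor early, then acts on its own.
agent = MentorGuidedAgent(actions=["a", "b"])
for _ in range(100):
    chosen = agent.act(mentor_policy=lambda: "a")
    agent.update(chosen, reward=1.0 if chosen == "a" else 0.0)
```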
Related papers
- Strategy Masking: A Method for Guardrails in Value-based Reinforcement Learning Agents [0.27309692684728604]
We study methods for constructing guardrails for AI agents that use reward functions to learn decision making.
We introduce a novel approach, which we call strategy masking, to explicitly learn and then suppress undesirable AI agent behavior.
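On the assumption that "suppressing" a behavior amounts to excluding flagged actions from the greedy choice over learned values, the core masking step might look like the sketch below; `masked_action` and its `penalty` argument are hypothetical names, not the paper's interface.
```python
import numpy as np

def masked_action(q_values, undesirable, penalty=-np.inf):
    """Suppress flagged actions by masking their values before the
    greedy choice (assumed mechanics, not the paper's exact method)."""
    q = np.asarray(q_values, dtype=float).copy()
    q[list(undesirable)] = penalty   # masked actions can never win the argmax
    return int(np.argmax(q))

# Action 1 has the highest value but is flagged as undesirable, so the
# agent falls back to the best unmasked action.
print(masked_action([0.2, 0.9, 0.5], undesirable={1}))  # -> 2
```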
arXiv Detail & Related papers (2025-01-09T18:43:05Z)
- MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization [91.80034860399677]
Reinforcement learning algorithms aim to balance exploiting the current best strategy with exploring new options that could lead to higher rewards.
We introduce a framework, MaxInfoRL, for balancing intrinsic and extrinsic exploration.
We show that our approach achieves sublinear regret in the simplified setting of multi-armed bandits.
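Because the bandit setting is mentioned, the trade-off can be sketched there with a UCB-style uncertainty bonus standing in for information gain; `maxinfo_bandit` and the `beta` weight are illustrative assumptions, not the authors' algorithm.
```python
import math, random

def maxinfo_bandit(true_means, horizon=2000, beta=1.0):
    """Toy trade-off between extrinsic value and an intrinsic bonus.
    The bonus is a crude uncertainty proxy for information gain."""
    k = len(true_means)
    counts, means = [0] * k, [0.0] * k
    for t in range(1, horizon + 1):
        scores = [
            means[a] + beta * math.sqrt(math.log(t) / counts[a])
            if counts[a] else float("inf")     # pull every arm once first
            for a in range(k)
        ]
        a = scores.index(max(scores))
        reward = random.gauss(true_means[a], 1.0)
        counts[a] += 1
        means[a] += (reward - means[a]) / counts[a]
    return counts  # pulls concentrate on the best arm over time

print(maxinfo_bandit([0.1, 0.5, 0.9]))
```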
arXiv Detail & Related papers (2024-12-16T18:59:53Z)
- Position Paper: Agent AI Towards a Holistic Intelligence [53.35971598180146]
We emphasize developing Agent AI -- an embodied system that integrates large foundation models into agent actions.
In this paper, we propose a novel large action model to achieve embodied intelligent behavior, the Agent Foundation Model.
arXiv Detail & Related papers (2024-02-28T16:09:56Z)
- Efficient Open-world Reinforcement Learning via Knowledge Distillation and Autonomous Rule Discovery [5.680463564655267]
The rule-driven deep Q-learning agent (RDQ) is presented as one possible implementation of this framework.
We show that RDQ successfully extracts task-specific rules as it interacts with the world.
In experiments, we show that the RDQ agent is significantly more resilient to novelties than the baseline agents.
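A hedged sketch of what a rule-guided decision step could look like, assuming discovered rules are condition-action pairs consulted before the value function; the names and the example rule are hypothetical, and the paper's actual rule discovery and distillation are omitted.
```python
def rule_guided_action(state, rules, q_values):
    """Apply a matching symbolic rule if one exists; otherwise fall
    back to the greedy choice over learned values."""
    for condition, action in rules:
        if condition(state):
            return action                      # a rule overrides the values
    return max(q_values, key=q_values.get)

# Hypothetical rule: avoid a novel hazard regardless of learned values.
rules = [(lambda s: s.get("hazard_ahead", False), "turn_left")]
print(rule_guided_action({"hazard_ahead": True}, rules,
                         {"forward": 1.0, "turn_left": 0.2}))  # -> turn_left
```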
arXiv Detail & Related papers (2023-11-24T04:12:50Z)
- Parametrically Retargetable Decision-Makers Tend To Seek Power [91.93765604105025]
In fully observable environments, most reward functions have an optimal policy which seeks power by keeping options open and staying alive.
We consider a range of models of AI decision-making, from optimal, to random, to choices informed by learning and interacting with an environment.
We show that a range of qualitatively dissimilar decision-making procedures incentivize agents to seek power.
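The "keeping options open" intuition admits a small worked example: in a toy directed graph, a decision-maker that prefers successors from which more states remain reachable steers away from absorbing "dead" states. The reachability count is only a simple proxy for the paper's formal notion of power.
```python
def reachable(graph, start):
    """Set of states reachable from `start` in a directed graph."""
    seen, stack = set(), [start]
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.add(s)
            stack.extend(graph.get(s, []))
    return seen

def option_preserving_move(graph, state):
    """Prefer the successor that keeps the most states reachable."""
    return max(graph[state], key=lambda s: len(reachable(graph, s)))

# 'live' keeps every state reachable; 'dead' is absorbing.
graph = {"start": ["live", "dead"], "live": ["start", "goal"],
         "dead": ["dead"], "goal": ["goal"]}
print(option_preserving_move(graph, "start"))  # -> 'live'
```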
arXiv Detail & Related papers (2022-06-27T17:39:23Z)
- On Avoiding Power-Seeking by Artificial Intelligence [93.9264437334683]
We do not know how to align a very intelligent AI agent's behavior with human interests.
I investigate whether we can build smart AI agents which have limited impact on the world, and which do not autonomously seek power.
arXiv Detail & Related papers (2022-06-23T16:56:21Z)
- An Algorithmic Theory of Metacognition in Minds and Machines [1.52292571922932]
We present an algorithmic theory of metacognition based on a well-understood trade-off in reinforcement learning.
We show how to create metacognition in machines by implementing a deep MAC.
arXiv Detail & Related papers (2021-11-05T22:31:09Z)
- Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
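The sketch below shows only the zero-sum objective, with a count-based surprise signal and a random placeholder for the environment; the two competing policies themselves, and the paper's environments, are omitted entirely.
```python
import math, random
from collections import Counter

def surprise(counts, obs, total):
    """Count-based surprise: -log empirical probability (with smoothing)."""
    return -math.log((counts[obs] + 1) / (total + len(counts) + 1))

counts, total = Counter(), 0
explorer_score = controller_score = 0.0
for step in range(1000):
    obs = random.choice("abcd")   # placeholder for an environment observation
    s = surprise(counts, obs, total)
    explorer_score += s           # the explorer seeks surprising observations
    controller_score -= s         # the controller is rewarded for familiarity
    counts[obs] += 1
    total += 1
print(round(explorer_score, 2), round(controller_score, 2))
```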
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
- Learning Human Rewards by Inferring Their Latent Intelligence Levels in Multi-Agent Games: A Theory-of-Mind Approach with Application to Driving Data [18.750834997334664]
We argue that humans are boundedly rational and exhibit different intelligence levels when reasoning about others' decision-making processes.
We propose a new multi-agent Inverse Reinforcement Learning framework that reasons about humans' latent intelligence levels during learning.
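One common way to formalize "intelligence levels" is level-k reasoning, sketched below on a matching-pennies game; the inverse-RL machinery the paper builds around such levels is omitted, and the payoff layout here is an assumption for illustration.
```python
def best_response(payoff, opp_action):
    """Action maximizing payoff[a][opp_action] against a fixed opponent action."""
    return max(range(len(payoff)), key=lambda a: payoff[a][opp_action])

def level_k_action(payoffs, player, k, level0=(0, 0)):
    """A level-0 agent plays a fixed naive action; a level-k agent
    best-responds to a level-(k-1) model of its opponent."""
    if k == 0:
        return level0[player]
    opp_action = level_k_action(payoffs, 1 - player, k - 1, level0)
    return best_response(payoffs[player], opp_action)

# Matching pennies: player 0 wants to match, player 1 wants to mismatch.
payoffs = [[[1, 0], [0, 1]],   # player 0: payoff[a][b] = 1 if a == b
           [[0, 1], [1, 0]]]   # player 1: payoff[a][b] = 1 if a != b
print(level_k_action(payoffs, player=0, k=1))  # 0: matches a naive opponent
print(level_k_action(payoffs, player=0, k=2))  # 1: anticipates the level-1 reply
```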
arXiv Detail & Related papers (2021-03-07T07:48:31Z)
- Explainability in Deep Reinforcement Learning [68.8204255655161]
We review recent work toward attaining Explainable Reinforcement Learning (XRL).
In critical situations where it is essential to justify and explain the agent's behaviour, better explainability and interpretability of RL models could help gain scientific insight into the inner workings of what is still considered a black box.
arXiv Detail & Related papers (2020-08-15T10:11:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.