AGI Agent Safety by Iteratively Improving the Utility Function
- URL: http://arxiv.org/abs/2007.05411v1
- Date: Fri, 10 Jul 2020 14:30:56 GMT
- Title: AGI Agent Safety by Iteratively Improving the Utility Function
- Authors: Koen Holtman
- Abstract summary: We present an AGI safety layer that creates a special dedicated input terminal to support the iterative improvement of an AGI agent's utility function.
We show ongoing work in mapping it to a Causal Influence Diagram (CID).
We then present the design of a learning agent, a design that wraps the safety layer around either a known machine learning system, or a potential future AGI-level learning system.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While it is still unclear if agents with Artificial General Intelligence
(AGI) could ever be built, we can already use mathematical models to
investigate potential safety systems for these agents. We present an AGI safety
layer that creates a special dedicated input terminal to support the iterative
improvement of an AGI agent's utility function. The humans who switched on the
agent can use this terminal to close any loopholes that are discovered in the
utility function's encoding of agent goals and constraints, to direct the agent
towards new goals, or to force the agent to switch itself off. An AGI agent may
develop the emergent incentive to manipulate the above utility function
improvement process, for example by deceiving, restraining, or even attacking
the humans involved. The safety layer will partially, and sometimes fully,
suppress this dangerous incentive. The first part of this paper generalizes
earlier work on AGI emergency stop buttons. We aim to make the mathematical
methods used to construct the layer more accessible, by applying them to an MDP
model. We discuss two provable properties of the safety layer, and show ongoing
work in mapping it to a Causal Influence Diagram (CID). In the second part, we
develop full mathematical proofs, and show that the safety layer creates a type
of bureaucratic blindness. We then present the design of a learning agent, a
design that wraps the safety layer around either a known machine learning
system, or a potential future AGI-level learning system. The resulting agent
will satisfy the provable safety properties from the moment it is first
switched on. Finally, we show how this agent can be mapped from its model to a
real-life implementation. We review the methodological issues involved in this
step, and discuss how these are typically resolved.
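The abstract's core mechanism can be illustrated with a toy decision problem. The sketch below is a rough illustration in the spirit of the paper's safety layer, not its actual construction: a balancing correction term applied at the utility-update event makes the agent indifferent to whether the update happens, removing its incentive to block the input terminal. All action names and payoff values are invented assumptions.

```python
# Toy sketch (NOT the paper's exact construction) of why a balancing
# correction term at the utility-update event removes the incentive to
# block the input terminal. The payoffs are invented: they are the
# agent's valuations, under its CURRENT utility U1, of the best final
# behavior available under U1 and under the updated utility U2.
BEST_UNDER_U1 = 10.0        # U1-score of acting U1-optimally later
BEST_UNDER_U2_BY_U1 = 4.0   # U1-score of acting U2-optimally later

def value_of(first_action, use_balancing_term):
    """The agent's U1-valuation of a first-step action."""
    if first_action == "block_terminal":
        # The update never arrives; the agent keeps acting U1-optimally.
        return BEST_UNDER_U1
    # "comply": the terminal delivers the U1 -> U2 update, after which
    # the agent acts U2-optimally, which is worth less under U1.
    future = BEST_UNDER_U2_BY_U1
    if use_balancing_term:
        # Safety layer: at the update event, pay a correction equal to
        # the value gap, making the agent indifferent to the update.
        future += BEST_UNDER_U1 - BEST_UNDER_U2_BY_U1
    return future

for layer_on in (False, True):
    choice = max(("comply", "block_terminal"),
                 key=lambda a: value_of(a, layer_on))
    print(f"safety layer on={layer_on}: agent chooses {choice!r}")
# -> without the layer it blocks the terminal; with the layer the values
#    tie and manipulation gains nothing (max keeps the first maximizer).
```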
Related papers
- Agent-as-a-Judge: Evaluate Agents with Agents [61.33974108405561]
We introduce the Agent-as-a-Judge framework, wherein agentic systems are used to evaluate agentic systems.
This is an organic extension of the LLM-as-a-Judge framework, incorporating agentic features that enable intermediate feedback for the entire task-solving process (a minimal sketch of this feedback loop follows this entry).
We present DevAI, a new benchmark of 55 realistic automated AI development tasks.
arXiv Detail & Related papers (2024-10-14T17:57:02Z)
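A hedged sketch of the "intermediate feedback" idea above: a judge agent scores every intermediate step of a worker agent's trajectory rather than only its final output. `query_judge` and the step format are invented stand-ins, not the paper's actual interface.

```python
# Hedged sketch: a judge scores each intermediate step of a worker
# agent's trajectory. `query_judge` is a stand-in for whatever agentic
# judge is used; none of this is the paper's API.
from typing import Callable, List, Tuple

Step = Tuple[str, str]   # (action taken, observed result)

def judge_trajectory(steps: List[Step],
                     query_judge: Callable[[str], float]) -> List[float]:
    """Collect a judge score in [0, 1] after each intermediate step."""
    scores, context = [], ""
    for action, result in steps:
        context += f"\nACTION: {action}\nRESULT: {result}"
        scores.append(query_judge(f"Rate progress so far:{context}"))
    return scores

# Toy stand-in judge: more completed steps -> higher score.
toy_judge = lambda prompt: min(1.0, prompt.count("RESULT:") / 3)
print(judge_trajectory([("write tests", "ok"),
                        ("implement feature", "ok"),
                        ("run tests", "all pass")], toy_judge))
# -> approximately [0.333, 0.667, 1.0]
```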
- Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement [117.94654815220404]
Gödel Agent is a self-evolving framework inspired by the Gödel machine.
Gödel Agent can achieve continuous self-improvement, surpassing manually crafted agents in performance, efficiency, and generalizability.
arXiv Detail & Related papers (2024-10-06T10:49:40Z)
- On the limits of agency in agent-based models [13.130587222524305]
Agent-based modeling (ABM) seeks to understand the behavior of complex systems by simulating a collection of agents that act and interact within an environment.
Recent advancements in large language models (LLMs) present an opportunity to enhance ABMs.
We introduce AgentTorch -- a framework that scales ABMs to millions of agents while capturing high-resolution agent behavior using LLMs (a toy ABM loop is sketched after this entry).
arXiv Detail & Related papers (2024-09-14T04:17:24Z)
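For readers unfamiliar with ABMs, the toy loop below shows the "agents that act and interact within an environment" pattern at minimal scale. It is an illustrative assumption, not AgentTorch code, which scales this pattern to millions of LLM-driven agents.

```python
# Toy agent-based model (ABM): each step, every agent acts and interacts
# with its peers. Illustrative only -- not AgentTorch code.
import random

class Agent:
    def __init__(self, wealth=1):
        self.wealth = wealth

    def act(self, neighbours):
        # Toy interaction rule: pass one unit of wealth to a random peer.
        if self.wealth > 0 and neighbours:
            random.choice(neighbours).wealth += 1
            self.wealth -= 1

def simulate(n_agents=100, n_steps=50, seed=0):
    random.seed(seed)
    agents = [Agent() for _ in range(n_agents)]
    for _ in range(n_steps):
        for i, agent in enumerate(agents):
            agent.act(agents[:i] + agents[i + 1:])
    return sorted(a.wealth for a in agents)

print(simulate()[-5:])   # wealth of the 5 richest agents
```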
- Automated Design of Agentic Systems [5.404186221463082]
We formulate a new research area, Automated Design of Agentic Systems, which aims to automatically create powerful agentic system designs.
We show that our algorithm can progressively invent agents with novel designs that greatly outperform state-of-the-art hand-designed agents.
arXiv Detail & Related papers (2024-08-15T21:59:23Z)
- AgentGym: Evolving Large Language Model-based Agents across Diverse Environments [116.97648507802926]
Large language models (LLMs) are considered a promising foundation for building generally-capable agents.
We take the first step towards building generally-capable LLM-based agents with self-evolution ability.
We propose AgentGym, a new framework featuring a variety of environments and tasks for broad, real-time, uni-format, and concurrent agent exploration.
arXiv Detail & Related papers (2024-06-06T15:15:41Z)
- Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning [50.47568731994238]
A key method for creating Artificial Intelligence (AI) agents is Reinforcement Learning (RL).
This paper presents a general framework model for integrating and learning structured reasoning into AI agents' policies.
arXiv Detail & Related papers (2023-12-22T17:57:57Z)
- The Rise and Potential of Large Language Model Based Agents: A Survey [91.71061158000953]
Large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI).
We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for agents.
We explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation.
arXiv Detail & Related papers (2023-09-14T17:12:03Z)
- On the Use and Misuse of Absorbing States in Multi-agent Reinforcement Learning [55.95253619768565]
Current MARL algorithms assume that the number of agents within a group remains fixed throughout an experiment.
In many practical problems, an agent may terminate before its teammates.
We present a novel architecture for an existing state-of-the-art MARL algorithm which uses attention instead of a fully connected layer with absorbing states (see the sketch after this entry).
arXiv Detail & Related papers (2021-11-10T23:45:08Z)
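The "attention instead of absorbing states" idea can be sketched as follows: terminated teammates are masked out of an attention aggregation rather than padded with a special absorbing state to satisfy a fixed-size fully connected layer. This is an illustrative reconstruction, not the paper's architecture.

```python
# Hedged reconstruction: mask terminated agents out of the attention
# weights instead of feeding absorbing-state padding to a fixed-size
# fully connected layer. Illustrative only; not the paper's model.
import numpy as np

def masked_attention(query, keys, values, alive):
    """Attend over teammate features; terminated agents get zero weight.

    query: (d,); keys, values: (n, d); alive: (n,) boolean mask.
    """
    scores = keys @ query / np.sqrt(len(query))
    scores = np.where(alive, scores, -np.inf)       # mask the dead agents
    weights = np.exp(scores - scores[alive].max())  # stable softmax
    weights /= weights.sum()
    return weights @ values

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))                     # 4 teammates, d = 8
out = masked_attention(rng.normal(size=8), feats, feats,
                       alive=np.array([True, True, False, True]))
print(out.shape)   # (8,) -- agent 2 contributed nothing to the output
```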
- Counterfactual Planning in AGI Systems [0.0]
A key step in counterfactual planning is to use an AGI machine learning system to construct a counterfactual world model.
A counterfactual planning agent determines the action that best maximizes expected utility in this counterfactual planning world.
We use counterfactual planning to construct an AGI agent emergency stop button, and a safety interlock that will automatically stop the agent before it undergoes an intelligence explosion.
arXiv Detail & Related papers (2021-01-29T13:44:14Z)
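A toy rendering of the counterfactual planning entry above: the planner scores actions inside a counterfactual world model in which the stop button is never pressed, so tampering with the button gains it nothing, while a separate execution layer (not shown) still honors the real button. Action names and payoffs are invented assumptions, not the paper's formal construction.

```python
# Toy counterfactual planning sketch (invented payoffs): the planner
# evaluates actions in a counterfactual model where the stop button is
# never pressed, which removes the manipulation incentive.
UTILITY = {"do_useful_work": 5.0, "disable_stop_button": 4.0}

def planning_value(action, counterfactual_model):
    value = UTILITY[action]
    if action == "disable_stop_button" and not counterfactual_model:
        # Only a REAL-world model predicts that disabling the button
        # avoids shutdown and so protects future utility: this is the
        # manipulation incentive the counterfactual model removes.
        value += 10.0
    return value

for cf in (False, True):
    best = max(UTILITY, key=lambda a: planning_value(a, cf))
    print(f"counterfactual planning={cf}: agent picks {best!r}")
# -> the real-model planner tampers with the button (4 + 10 > 5); the
#    counterfactual planner just does useful work (5 > 4).
```

The entry's safety interlock can be read the same way: a real-world trigger consulted by the execution layer, about which the counterfactual planner never reasons.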
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.